Give Python a debug-mode pymalloc, much as sketched on Python-Dev.

When WITH_PYMALLOC is defined, define PYMALLOC_DEBUG to enable the debug
allocator.  This can be done independently of build type (release or
debug).  A debug build automatically defines PYMALLOC_DEBUG when pymalloc
is enabled.  Defining PYMALLOC_DEBUG when pymalloc isn't enabled is
detected as an error.
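A minimal sketch of the configuration rule above, in preprocessor terms.
Py_DEBUG is CPython's debug-build macro; the two #defines at the top only
simulate a debug build with pymalloc enabled (normally they'd come from
the compiler command line).  Using #error for the detected error, and the
helper pymalloc_debug_enabled(), are hypothetical illustrations, not the
actual header code.

```c
/* Simulate a debug build with pymalloc enabled (assumed flags, for illustration). */
#define Py_DEBUG
#define WITH_PYMALLOC

/* A debug build turns on the debug allocator whenever pymalloc is enabled. */
#if defined(Py_DEBUG) && defined(WITH_PYMALLOC) && !defined(PYMALLOC_DEBUG)
#define PYMALLOC_DEBUG
#endif

/* Asking for the debug allocator without pymalloc is a compile-time error. */
#if defined(PYMALLOC_DEBUG) && !defined(WITH_PYMALLOC)
#error "PYMALLOC_DEBUG requires WITH_PYMALLOC"
#endif

/* Hypothetical helper: reports which allocator this translation unit got. */
int pymalloc_debug_enabled(void)
{
#ifdef PYMALLOC_DEBUG
    return 1;
#else
    return 0;
#endif
}
```

Dropping the WITH_PYMALLOC line while defining PYMALLOC_DEBUG by hand
would stop the compile at the #error.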

Two debugging entry points are defined only under PYMALLOC_DEBUG:

+ _PyMalloc_DebugCheckAddress(const void *p) can be used (e.g., from gdb)
  to sanity-check a memory block obtained from pymalloc.  It sprays
  info to stderr (see next) and dies via Py_FatalError if the block is
  detectably damaged.

+ _PyMalloc_DebugDumpAddress(const void *p) can be used to spray info
  about a debug memory block to stderr.
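To illustrate what "detectably damaged" can mean, here's a hypothetical
guard-byte scheme in plain C on top of malloc().  The layout, the
0xFD/0xCB byte values, and all function names below are assumptions for
illustration, not pymalloc's actual debug layout.  Unlike
_PyMalloc_DebugCheckAddress, debug_check_address() returns 0 instead of
dying, so it can be tested; debug_free() dumps and aborts instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout (NOT pymalloc's real one):
 *   [size_t nbytes | unused | PAD guard bytes][data][PAD guard bytes]
 * HEADER bytes precede the data; the caller gets a pointer to the data. */
#define PAD 4
#define HEADER 16   /* assumed to be a multiple of the maximum alignment */
#define GUARD 0xFD  /* assumed "forbidden" byte surrounding the data */
#define CLEAN 0xCB  /* assumed fill for freshly allocated data */

_Static_assert(HEADER >= sizeof(size_t) + PAD, "header too small");
_Static_assert(HEADER % _Alignof(max_align_t) == 0, "data would be misaligned");

static size_t stored_size(const void *p)
{
    size_t n;
    memcpy(&n, (const unsigned char *)p - HEADER, sizeof n);
    return n;
}

void *debug_malloc(size_t nbytes)
{
    if (nbytes > SIZE_MAX - HEADER - PAD)
        return NULL; /* total size would overflow */
    unsigned char *base = malloc(HEADER + nbytes + PAD);
    if (base == NULL)
        return NULL;
    unsigned char *data = base + HEADER;
    memcpy(base, &nbytes, sizeof nbytes); /* remember the requested size */
    memset(data - PAD, GUARD, PAD);       /* leading guard */
    memset(data, CLEAN, nbytes);          /* make uninitialized reads visible */
    memset(data + nbytes, GUARD, PAD);    /* trailing guard */
    return data;
}

/* Returns 1 if both guard regions are intact, 0 if the block is damaged. */
int debug_check_address(const void *p)
{
    const unsigned char *data = p;
    const unsigned char *lead = data - PAD;
    size_t n = stored_size(p);
    for (size_t i = 0; i < PAD; i++) {
        if (lead[i] != GUARD || data[n + i] != GUARD)
            return 0;
    }
    return 1;
}

/* Sprays what we know about the block to stderr. */
void debug_dump_address(const void *p)
{
    fprintf(stderr, "block at %p: %zu bytes requested, guards %s\n",
            (void *)p, stored_size(p),
            debug_check_address(p) ? "intact" : "DAMAGED");
}

/* Checks the guards before releasing the block; aborts if they're damaged. */
void debug_free(void *p)
{
    if (p == NULL)
        return;
    if (!debug_check_address(p)) {
        debug_dump_address(p);
        abort();
    }
    free((unsigned char *)p - HEADER);
}
```

A one-byte overrun lands in the trailing guard, which is exactly what a
check like this catches; a wild write that skips past the guards goes
unnoticed.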

A tiny start has been made on "API family" checks (the int family
argument to the debug entry points, which the macros always pass as 0),
but it isn't good for anything yet.
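One plausible reading of "API family" checks, assumed here: each block
remembers which family of allocation functions produced it, and the free
path refuses a block from a different family.  A hypothetical sketch
(family_malloc(), family_matches() and family_free() are illustrative
names, not part of this patch):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header recording the allocating family in front of the data.
   The max_align_t member keeps the data behind the header suitably aligned. */
typedef union {
    int family;
    max_align_t align;
} family_header;

void *family_malloc(size_t nbytes, int family)
{
    if (nbytes > SIZE_MAX - sizeof(family_header))
        return NULL; /* total size would overflow */
    family_header *h = malloc(sizeof *h + nbytes);
    if (h == NULL)
        return NULL;
    h->family = family;
    return h + 1; /* data starts right after the header */
}

/* Returns 1 if p was allocated by `family`, 0 otherwise. */
int family_matches(const void *p, int family)
{
    const family_header *h = (const family_header *)p - 1;
    return h->family == family;
}

/* Frees p only if the families agree; returns 0 (p stays allocated) on mismatch. */
int family_free(void *p, int family)
{
    if (p == NULL)
        return 1;
    if (!family_matches(p, family))
        return 0;
    free((family_header *)p - 1);
    return 1;
}
```

With every caller passing family 0, as the macros in the diff do, a
check like this can never fire, which fits the remark that it isn't good
for anything yet.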

_PyMalloc_DebugRealloc() has been optimized to do little when the new
size is <= the old size.  However, if the new size is larger, it really
can't call the underlying realloc() routine without either violating its
contract or knowing something non-trivial about how the underlying
realloc() works, so in that case it always allocates a new block and
memcpys the old contents into it.
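The realloc strategy just described can be sketched on top of a plain
size header.  The header layout and the names sized_malloc(),
sized_realloc() and sized_free() are hypothetical; only the policy
(shrink in place, grow by allocate + memcpy + free) comes from the text
above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header remembering the size the caller asked for. */
typedef union {
    size_t nbytes;
    max_align_t align; /* keep the data behind the header suitably aligned */
} size_header;

static size_header *header_of(void *p) { return (size_header *)p - 1; }

void *sized_malloc(size_t nbytes)
{
    if (nbytes > SIZE_MAX - sizeof(size_header))
        return NULL; /* total size would overflow */
    size_header *h = malloc(sizeof *h + nbytes);
    if (h == NULL)
        return NULL;
    h->nbytes = nbytes;
    return h + 1;
}

void sized_free(void *p)
{
    if (p != NULL)
        free(header_of(p));
}

void *sized_realloc(void *p, size_t nbytes)
{
    if (p == NULL)
        return sized_malloc(nbytes);
    size_header *h = header_of(p);
    if (nbytes <= h->nbytes) {
        /* Shrinking (or same size): keep the block, just record the new size. */
        h->nbytes = nbytes;
        return p;
    }
    /* Growing: don't rely on realloc() internals; allocate, copy, free. */
    void *q = sized_malloc(nbytes);
    if (q == NULL)
        return NULL; /* like realloc(), leave p untouched on failure */
    memcpy(q, p, h->nbytes);
    sized_free(p);
    return q;
}
```

The cost of this choice is that every growth copies the whole old block,
which is harmless for occasional growth but adds up when a buffer is grown
many times in small steps.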

This was a disaster for one (and only one) of the std tests:  test_bufio
creates text files containing single lines up to a million characters
long.  On Windows, fileobject.c's get_line() uses the horridly funky
getline_via_fgets(), which keeps growing and growing a string object
hoping to find a newline.  It grew the string object by 1000 bytes each
time, and every such growth copies the whole string built so far, so for
a million-character line it took approximately forever (I gave up after
a few minutes).
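The copying cost of fixed-size growth is easy to tally.  A small sketch,
assuming the buffer starts at 1000 bytes and that every growth copies the
entire old buffer (fixed_step_bytes_copied() is a hypothetical helper):

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes copied when a buffer that starts at `step` bytes is grown by a
   fixed `step` until it can hold `target` bytes, assuming every growth
   copies the whole old buffer.  The number of growths is stored through
   `grows` when it isn't NULL. */
unsigned long long fixed_step_bytes_copied(size_t target, size_t step,
                                           unsigned long *grows)
{
    unsigned long long copied = 0;
    unsigned long count = 0;
    for (size_t size = step; step != 0 && size < target; size += step) {
        copied += size; /* the whole old buffer moves to the new block */
        count++;
        if (size > SIZE_MAX - step)
            break; /* the next size would wrap around */
    }
    if (grows != NULL)
        *grows = count;
    return copied;
}
```

For a million-character line with 1000-byte steps this gives 999 growths
copying 499,500,000 bytes, about half a gigabyte of memcpy for one line,
before counting any other per-call debug overhead.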

So, also:

fileobject.c, getline_via_fgets():  When a single line is outrageously
long, grow the string object at a mildly exponential rate, instead of
just 1000 bytes at a time.
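A growth rule of the kind described might look like the sketch below.
The 25% rate, the 1000-byte floor, and the names next_buffer_size() and
growths_needed() are assumptions for illustration; the exact rate used in
getline_via_fgets() isn't stated here.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_INCREMENT 1000 /* the old fixed step, kept as a floor */

/* Hypothetical rule: grow by a quarter of the current size (an assumed
   "mildly exponential" rate), but never by less than MIN_INCREMENT bytes.
   Returns 0 if the new size would overflow size_t. */
size_t next_buffer_size(size_t current)
{
    size_t increment = current >> 2;
    if (increment < MIN_INCREMENT)
        increment = MIN_INCREMENT;
    if (current > SIZE_MAX - increment)
        return 0;
    return current + increment;
}

/* Growths needed before a MIN_INCREMENT-byte buffer can hold n bytes,
   or -1 if the sizes would overflow. */
int growths_needed(size_t n)
{
    int growths = 0;
    size_t size = MIN_INCREMENT;
    while (size < n) {
        size = next_buffer_size(size);
        if (size == 0)
            return -1;
        growths++;
    }
    return growths;
}
```

Short lines still grow 1000 bytes at a time, but a million-character line
needs fewer than 30 growths instead of 999.  Because each step is
proportional to the current size, the total bytes copied stay within a
constant factor of the final line length rather than growing
quadratically.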

That's enough to make a debug-build test_bufio finish in about 5 seconds
on my Win98SE box.  I'm curious to try this on Win2K, because its memory
behavior is very different from Win9X's, and test_bufio always took about
10 times as long to complete on Win2K.  It *could* be that the endless
reallocs were simply killing it on Win2K even in the release build.
diff --git a/Include/pymem.h b/Include/pymem.h
index 10e915f..a321f5b 100644
--- a/Include/pymem.h
+++ b/Include/pymem.h
@@ -89,20 +89,34 @@
    it is recommended to write the test explicitly in the code.
    Note that according to ANSI C, free(NULL) has no effect. */
 
-	
+
 /* pymalloc (private to the interpreter) */
 #ifdef WITH_PYMALLOC
 DL_IMPORT(void *) _PyMalloc_Malloc(size_t nbytes);
 DL_IMPORT(void *) _PyMalloc_Realloc(void *p, size_t nbytes);
 DL_IMPORT(void) _PyMalloc_Free(void *p);
+
+#ifdef PYMALLOC_DEBUG
+DL_IMPORT(void *) _PyMalloc_DebugMalloc(size_t nbytes, int family);
+DL_IMPORT(void *) _PyMalloc_DebugRealloc(void *p, size_t nbytes, int family);
+DL_IMPORT(void) _PyMalloc_DebugFree(void *p, int family);
+DL_IMPORT(void) _PyMalloc_DebugDumpAddress(const void *p);
+DL_IMPORT(void) _PyMalloc_DebugCheckAddress(const void *p);
+#define _PyMalloc_MALLOC(N) _PyMalloc_DebugMalloc(N, 0)
+#define _PyMalloc_REALLOC(P, N) _PyMalloc_DebugRealloc(P, N, 0)
+#define _PyMalloc_FREE(P) _PyMalloc_DebugFree(P, 0)
+
+#else	/* WITH_PYMALLOC && ! PYMALLOC_DEBUG */
 #define _PyMalloc_MALLOC _PyMalloc_Malloc
 #define _PyMalloc_REALLOC _PyMalloc_Realloc
 #define _PyMalloc_FREE _PyMalloc_Free
-#else
+#endif
+
+#else	/* ! WITH_PYMALLOC */
 #define _PyMalloc_MALLOC PyMem_MALLOC
 #define _PyMalloc_REALLOC PyMem_REALLOC
 #define _PyMalloc_FREE PyMem_FREE
-#endif
+#endif	/* WITH_PYMALLOC */
 
 
 #ifdef __cplusplus