Add PYTHONMALLOC env var

Issue #26516:

* Add PYTHONMALLOC environment variable to set the Python memory
  allocators and/or install debug hooks.
* PyMem_SetupDebugHooks() can now also be used on Python compiled in release
  mode.
* The PYTHONMALLOCSTATS environment variable can now also be used on Python
  compiled in release mode. It now has no effect if set to an empty string.
* In debug mode, debug hooks are now also installed on Python memory allocators
  when Python is configured without pymalloc.
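
As a rough illustration of the new variable (not part of the patch itself), the
sketch below starts a child interpreter with different PYTHONMALLOC values and
reports whether each one is accepted. It assumes the interpreter running it
already includes this change (Python 3.6 or later); the helper name
``run_with_pythonmalloc`` is hypothetical and exists only for this example.

```python
import os
import subprocess
import sys


def run_with_pythonmalloc(value, code="pass"):
    """Run a child interpreter with PYTHONMALLOC set to `value`.

    Hypothetical helper for illustration; returns the CompletedProcess.
    """
    # Copy the current environment and override only PYTHONMALLOC.
    env = dict(os.environ, PYTHONMALLOC=value)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    # Known allocator names should start normally; an unknown name is
    # expected to make the interpreter exit with an error at startup.
    for value in ("malloc", "debug", "malloc_debug", "not_an_allocator"):
        proc = run_with_pythonmalloc(value)
        status = "accepted" if proc.returncode == 0 else "rejected"
        print(value, "->", status)
```

``pymalloc`` and ``pymalloc_debug`` are left out of the loop because they are
only available when Python is configured with pymalloc support.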
diff --git a/Doc/c-api/memory.rst b/Doc/c-api/memory.rst
index 290ef09..fe1cd5f 100644
--- a/Doc/c-api/memory.rst
+++ b/Doc/c-api/memory.rst
@@ -85,9 +85,12 @@
 
 .. seealso::
 
+   The :envvar:`PYTHONMALLOC` environment variable can be used to configure
+   the memory allocators used by Python.
+
    The :envvar:`PYTHONMALLOCSTATS` environment variable can be used to print
-   memory allocation statistics every time a new object arena is created, and
-   on shutdown.
+   statistics of the :ref:`pymalloc memory allocator <pymalloc>` every time a
+   new pymalloc object arena is created, and on shutdown.
 
 
 Raw Memory Interface
@@ -343,25 +346,36 @@
    - detect write before the start of the buffer (buffer underflow)
    - detect write after the end of the buffer (buffer overflow)
 
-   The function does nothing if Python is not compiled is debug mode.
+   These hooks are installed by default if Python is compiled in debug
+   mode. The :envvar:`PYTHONMALLOC` environment variable can be used to install
+   debug hooks on a Python compiled in release mode.
+
+   .. versionchanged:: 3.6
+      This function now also works on Python compiled in release mode.
 
 
-Customize PyObject Arena Allocator
-==================================
+.. _pymalloc:
 
-Python has a *pymalloc* allocator for allocations smaller than 512 bytes. This
-allocator is optimized for small objects with a short lifetime. It uses memory
-mappings called "arenas" with a fixed size of 256 KB. It falls back to
-:c:func:`PyMem_RawMalloc` and :c:func:`PyMem_RawRealloc` for allocations larger
-than 512 bytes.  *pymalloc* is the default allocator used by
-:c:func:`PyObject_Malloc`.
+The pymalloc allocator
+======================
 
-The default arena allocator uses the following functions:
+Python has a *pymalloc* allocator optimized for small, short-lived objects
+(512 bytes or smaller). It uses memory mappings called "arenas" with a fixed
+size of 256 KB. It falls back to :c:func:`PyMem_RawMalloc` and
+:c:func:`PyMem_RawRealloc` for allocations larger than 512 bytes.
+
+*pymalloc* is the default allocator of the :c:data:`PYMEM_DOMAIN_OBJ` domain
+(:c:func:`PyObject_Malloc` and related functions).
+
+The arena allocator uses the following functions:
 
 * :c:func:`VirtualAlloc` and :c:func:`VirtualFree` on Windows,
 * :c:func:`mmap` and :c:func:`munmap` if available,
 * :c:func:`malloc` and :c:func:`free` otherwise.
 
+Customize pymalloc Arena Allocator
+----------------------------------
+
 .. versionadded:: 3.4
 
 .. c:type:: PyObjectArenaAllocator
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index ec744a3..684ccb6 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -621,6 +621,51 @@
    .. versionadded:: 3.4
 
 
+.. envvar:: PYTHONMALLOC
+
+   Set the Python memory allocators and/or install debug hooks.
+
+   Set the family of memory allocators used by Python:
+
+   * ``malloc``: use the :c:func:`malloc` function of the C library
+     for all Python memory allocators (:c:func:`PyMem_RawMalloc`,
+     :c:func:`PyMem_Malloc`, :c:func:`PyObject_Malloc` and related functions).
+   * ``pymalloc``: :c:func:`PyObject_Malloc`, :c:func:`PyObject_Calloc` and
+     :c:func:`PyObject_Realloc` use the :ref:`pymalloc allocator <pymalloc>`.
+     Other Python memory allocators (:c:func:`PyMem_RawMalloc`,
+     :c:func:`PyMem_Malloc` and related functions) use :c:func:`malloc`.
+
+   Install debug hooks:
+
+   * ``debug``: install debug hooks on top of the default memory allocator
+   * ``malloc_debug``: same as ``malloc`` but also install debug hooks
+   * ``pymalloc_debug``: same as ``pymalloc`` but also install debug hooks
+
+   See the :c:func:`PyMem_SetupDebugHooks` function for debug hooks on Python
+   memory allocators.
+
+   .. note::
+      ``pymalloc`` and ``pymalloc_debug`` are not available if Python is
+      configured without ``pymalloc`` support.
+
+   .. versionadded:: 3.6
+
+
+.. envvar:: PYTHONMALLOCSTATS
+
+   If set to a non-empty string, Python will print statistics of the
+   :ref:`pymalloc memory allocator <pymalloc>` every time a new pymalloc object
+   arena is created, and on shutdown.
+
+   This variable is ignored if the :envvar:`PYTHONMALLOC` environment variable
+   is used to force the :c:func:`malloc` allocator of the C library, or if
+   Python is configured without ``pymalloc`` support.
+
+   .. versionchanged:: 3.6
+      This variable can now also be used on Python compiled in release mode.
+      It now has no effect if set to an empty string.
+
+
 Debug-mode variables
 ~~~~~~~~~~~~~~~~~~~~
 
@@ -636,9 +681,3 @@
 
    If set, Python will dump objects and reference counts still alive after
    shutting down the interpreter.
-
-
-.. envvar:: PYTHONMALLOCSTATS
-
-   If set, Python will print memory allocation statistics every time a new
-   object arena is created, and on shutdown.
diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst
index 3afe2d4..588826b 100644
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -80,6 +80,9 @@
        PEP written by Carl Meyer
 
 
+New Features
+============
+
 .. _whatsnew-fstrings:
 
 PEP 498: Formatted string literals
@@ -98,6 +101,34 @@
 See :pep:`498` and the main documentation at :ref:`f-strings`.
 
 
+PYTHONMALLOC environment variable
+---------------------------------
+
+The new :envvar:`PYTHONMALLOC` environment variable allows setting the Python
+memory allocators and/or install debug hooks.
+
+It is now possible to install debug hooks on Python memory allocators on Python
+compiled in release mode using ``PYTHONMALLOC=debug``. Effects of debug hooks:
+
+* Newly allocated memory is filled with the byte ``0xCB``
+* Freed memory is filled with the byte ``0xDB``
+* Detect violations of the Python memory allocator API. For example,
+  :c:func:`PyObject_Free` called on a memory block allocated by
+  :c:func:`PyMem_Malloc`.
+* Detect write before the start of the buffer (buffer underflow)
+* Detect write after the end of the buffer (buffer overflow)
+
+See the :c:func:`PyMem_SetupDebugHooks` function for debug hooks on Python
+memory allocators.
+
+It is now also possible to force the usage of the :c:func:`malloc` allocator of
+the C library for all Python memory allocations using ``PYTHONMALLOC=malloc``.
+This helps when using external memory debuggers like Valgrind on a Python
+compiled in release mode.
+
+(Contributed by Victor Stinner in :issue:`26516`.)
+
+
 Other Language Changes
 ======================
 
diff --git a/Include/pymem.h b/Include/pymem.h
index 043db64..b1f06ef 100644
--- a/Include/pymem.h
+++ b/Include/pymem.h
@@ -16,8 +16,17 @@
 PyAPI_FUNC(void *) PyMem_RawCalloc(size_t nelem, size_t elsize);
 PyAPI_FUNC(void *) PyMem_RawRealloc(void *ptr, size_t new_size);
 PyAPI_FUNC(void) PyMem_RawFree(void *ptr);
+
+/* Configure the Python memory allocators. Pass NULL to use default
+   allocators. */
+PyAPI_FUNC(int) _PyMem_SetupAllocators(const char *opt);
+
+#ifdef WITH_PYMALLOC
+PyAPI_FUNC(int) _PyMem_PymallocEnabled(void);
 #endif
 
+#endif   /* !Py_LIMITED_API */
+
 
 /* BEWARE:
 
diff --git a/Lib/test/test_capi.py b/Lib/test/test_capi.py
index 74ec6c5..d56d702 100644
--- a/Lib/test/test_capi.py
+++ b/Lib/test/test_capi.py
@@ -6,6 +6,7 @@
 import random
 import subprocess
 import sys
+import sysconfig
 import textwrap
 import time
 import unittest
@@ -521,6 +522,7 @@
         self.assertRaises(ValueError, _testcapi.parse_tuple_and_keywords,
                           (), {}, b'', [42])
 
+
 @unittest.skipUnless(threading, 'Threading required for this test.')
 class TestThreadState(unittest.TestCase):
 
@@ -545,6 +547,7 @@
         t.start()
         t.join()
 
+
 class Test_testcapi(unittest.TestCase):
     def test__testcapi(self):
         for name in dir(_testcapi):
@@ -553,5 +556,61 @@
                     test = getattr(_testcapi, name)
                     test()
 
+
+class MallocTests(unittest.TestCase):
+    ENV = 'debug'
+
+    def check(self, code):
+        with support.SuppressCrashReport():
+            out = assert_python_failure('-c', code, PYTHONMALLOC=self.ENV)
+        stderr = out.err
+        return stderr.decode('ascii', 'replace')
+
+    def test_buffer_overflow(self):
+        out = self.check('import _testcapi; _testcapi.pymem_buffer_overflow()')
+        regex = (r"Debug memory block at address p=0x[0-9a-f]+: API 'm'\n"
+                 r"    16 bytes originally requested\n"
+                 r"    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.\n"
+                 r"    The 8 pad bytes at tail=0x[0-9a-f]+ are not all FORBIDDENBYTE \(0x[0-9a-f]{2}\):\n"
+                 r"        at tail\+0: 0x78 \*\*\* OUCH\n"
+                 r"        at tail\+1: 0xfb\n"
+                 r"        at tail\+2: 0xfb\n"
+                 r"        at tail\+3: 0xfb\n"
+                 r"        at tail\+4: 0xfb\n"
+                 r"        at tail\+5: 0xfb\n"
+                 r"        at tail\+6: 0xfb\n"
+                 r"        at tail\+7: 0xfb\n"
+                 r"    The block was made by call #[0-9]+ to debug malloc/realloc.\n"
+                 r"    Data at p: cb cb cb cb cb cb cb cb cb cb cb cb cb cb cb cb\n"
+                 r"Fatal Python error: bad trailing pad byte")
+        self.assertRegex(out, regex)
+
+    def test_api_misuse(self):
+        out = self.check('import _testcapi; _testcapi.pymem_api_misuse()')
+        regex = (r"Debug memory block at address p=0x[0-9a-f]+: API 'm'\n"
+                 r"    16 bytes originally requested\n"
+                 r"    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.\n"
+                 r"    The 8 pad bytes at tail=0x[0-9a-f]+ are FORBIDDENBYTE, as expected.\n"
+                 r"    The block was made by call #[0-9]+ to debug malloc/realloc.\n"
+                 r"    Data at p: .*\n"
+                 r"Fatal Python error: bad ID: Allocated using API 'm', verified using API 'r'\n")
+        self.assertRegex(out, regex)
+
+
+class MallocDebugTests(MallocTests):
+    ENV = 'malloc_debug'
+
+
+@unittest.skipUnless(sysconfig.get_config_var('WITH_PYMALLOC') == 1,
+                     'need pymalloc')
+class PymallocDebugTests(MallocTests):
+    ENV = 'pymalloc_debug'
+
+
+@unittest.skipUnless(Py_DEBUG, 'need Py_DEBUG')
+class DefaultMallocDebugTests(MallocTests):
+    ENV = ''
+
+
 if __name__ == "__main__":
     unittest.main()
diff --git a/Misc/NEWS b/Misc/NEWS
index 1c894a4..852be06 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -10,6 +10,19 @@
 Core and Builtins
 -----------------
 
+- Issue #26516: Add :envvar:`PYTHONMALLOC` environment variable to set the
+  Python memory allocators and/or install debug hooks.
+
+- Issue #26516: The :c:func:`PyMem_SetupDebugHooks` function can now also be
+  used on Python compiled in release mode.
+
+- Issue #26516: The :envvar:`PYTHONMALLOCSTATS` environment variable can now
+  also be used on Python compiled in release mode. It now has no effect if
+  set to an empty string.
+
+- Issue #26516: In debug mode, debug hooks are now also installed on Python
+  memory allocators when Python is configured without pymalloc.
+
 - Issue #26464: Fix str.translate() when string is ASCII and first replacements
   removes character, but next replacement uses a non-ASCII character or a
   string longer than 1 character. Regression introduced in Python 3.5.0.
diff --git a/Misc/README.valgrind b/Misc/README.valgrind
index b5a9a32..908f137 100644
--- a/Misc/README.valgrind
+++ b/Misc/README.valgrind
@@ -2,6 +2,9 @@
 Python.  Valgrind is used periodically by Python developers to try
 to ensure there are no memory leaks or invalid memory reads/writes.
 
+UPDATE: Python 3.6 now supports the PYTHONMALLOC=malloc environment variable,
+which can be used to force the use of the malloc() allocator of the C library.
+
 If you don't want to read about the details of using Valgrind, there
 are still two things you must do to suppress the warnings.  First,
 you must use a suppressions file.  One is supplied in
diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index a21a584..babffc4 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -3616,6 +3616,33 @@
     return PyLong_FromLong(tstate->recursion_depth - 1);
 }
 
+static PyObject*
+pymem_buffer_overflow(PyObject *self, PyObject *args)
+{
+    char *buffer;
+
+    /* Deliberate buffer overflow to check that PyMem_Free() detects
+       the overflow when debug hooks are installed. */
+    buffer = PyMem_Malloc(16);
+    buffer[16] = 'x';
+    PyMem_Free(buffer);
+
+    Py_RETURN_NONE;
+}
+
+static PyObject*
+pymem_api_misuse(PyObject *self, PyObject *args)
+{
+    char *buffer;
+
+    /* Deliberate misuse of Python allocators:
+       allocate with PyMem but release with PyMem_Raw. */
+    buffer = PyMem_Malloc(16);
+    PyMem_RawFree(buffer);
+
+    Py_RETURN_NONE;
+}
+
 
 static PyMethodDef TestMethods[] = {
     {"raise_exception",         raise_exception,                 METH_VARARGS},
@@ -3798,6 +3825,8 @@
     {"PyTime_AsMilliseconds", test_PyTime_AsMilliseconds, METH_VARARGS},
     {"PyTime_AsMicroseconds", test_PyTime_AsMicroseconds, METH_VARARGS},
     {"get_recursion_depth", get_recursion_depth, METH_NOARGS},
+    {"pymem_buffer_overflow", pymem_buffer_overflow, METH_NOARGS},
+    {"pymem_api_misuse", pymem_api_misuse, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };
 
diff --git a/Modules/main.c b/Modules/main.c
index ee129a5..b6dcdd0 100644
--- a/Modules/main.c
+++ b/Modules/main.c
@@ -93,14 +93,15 @@
 "               The default module search path uses %s.\n"
 "PYTHONCASEOK : ignore case in 'import' statements (Windows).\n"
 "PYTHONIOENCODING: Encoding[:errors] used for stdin/stdout/stderr.\n"
-"PYTHONFAULTHANDLER: dump the Python traceback on fatal errors.\n\
-";
-static const char usage_6[] = "\
-PYTHONHASHSEED: if this variable is set to 'random', a random value is used\n\
-   to seed the hashes of str, bytes and datetime objects.  It can also be\n\
-   set to an integer in the range [0,4294967295] to get hash values with a\n\
-   predictable seed.\n\
-";
+"PYTHONFAULTHANDLER: dump the Python traceback on fatal errors.\n";
+static const char usage_6[] =
+"PYTHONHASHSEED: if this variable is set to 'random', a random value is used\n"
+"   to seed the hashes of str, bytes and datetime objects.  It can also be\n"
+"   set to an integer in the range [0,4294967295] to get hash values with a\n"
+"   predictable seed.\n"
+"PYTHONMALLOC: set the Python memory allocators and/or install debug hooks\n"
+"   on Python memory allocators. Use PYTHONMALLOC=debug to install debug\n"
+"   hooks.\n";
 
 static int
 usage(int exitcode, const wchar_t* program)
@@ -341,6 +342,7 @@
     int help = 0;
     int version = 0;
     int saw_unbuffered_flag = 0;
+    char *opt;
     PyCompilerFlags cf;
     PyObject *warning_option = NULL;
     PyObject *warning_options = NULL;
@@ -365,6 +367,13 @@
         }
     }
 
+    opt = Py_GETENV("PYTHONMALLOC");
+    if (_PyMem_SetupAllocators(opt) < 0) {
+        fprintf(stderr,
+                "Error in PYTHONMALLOC: unknown allocator \"%s\"!\n", opt);
+        exit(1);
+    }
+
     Py_HashRandomizationFlag = 1;
     _PyRandom_Init();
 
diff --git a/Objects/obmalloc.c b/Objects/obmalloc.c
index 7cc889f..e4bd8ac 100644
--- a/Objects/obmalloc.c
+++ b/Objects/obmalloc.c
@@ -2,7 +2,19 @@
 
 /* Python's malloc wrappers (see pymem.h) */
 
-#ifdef PYMALLOC_DEBUG   /* WITH_PYMALLOC && PYMALLOC_DEBUG */
+/*
+ * Basic types
+ * I don't care if these are defined in <sys/types.h> or elsewhere. Axiom.
+ */
+#undef  uchar
+#define uchar   unsigned char   /* assuming == 8 bits  */
+
+#undef  uint
+#define uint    unsigned int    /* assuming >= 16 bits */
+
+#undef uptr
+#define uptr    Py_uintptr_t
+
 /* Forward declaration */
 static void* _PyMem_DebugMalloc(void *ctx, size_t size);
 static void* _PyMem_DebugCalloc(void *ctx, size_t nelem, size_t elsize);
@@ -11,7 +23,6 @@
 
 static void _PyObject_DebugDumpAddress(const void *p);
 static void _PyMem_DebugCheckAddress(char api_id, const void *p);
-#endif
 
 #if defined(__has_feature)  /* Clang */
  #if __has_feature(address_sanitizer)  /* is ASAN enabled? */
@@ -147,7 +158,6 @@
 #endif
 #define PYMEM_FUNCS PYRAW_FUNCS
 
-#ifdef PYMALLOC_DEBUG
 typedef struct {
     /* We tag each block with an API ID in order to tag API violations */
     char api_id;
@@ -164,10 +174,9 @@
     };
 
 #define PYDBG_FUNCS _PyMem_DebugMalloc, _PyMem_DebugCalloc, _PyMem_DebugRealloc, _PyMem_DebugFree
-#endif
 
 static PyMemAllocatorEx _PyMem_Raw = {
-#ifdef PYMALLOC_DEBUG
+#ifdef Py_DEBUG
     &_PyMem_Debug.raw, PYDBG_FUNCS
 #else
     NULL, PYRAW_FUNCS
@@ -175,7 +184,7 @@
     };
 
 static PyMemAllocatorEx _PyMem = {
-#ifdef PYMALLOC_DEBUG
+#ifdef Py_DEBUG
     &_PyMem_Debug.mem, PYDBG_FUNCS
 #else
     NULL, PYMEM_FUNCS
@@ -183,13 +192,71 @@
     };
 
 static PyMemAllocatorEx _PyObject = {
-#ifdef PYMALLOC_DEBUG
+#ifdef Py_DEBUG
     &_PyMem_Debug.obj, PYDBG_FUNCS
 #else
     NULL, PYOBJ_FUNCS
 #endif
     };
 
+int
+_PyMem_SetupAllocators(const char *opt)
+{
+    if (opt == NULL || *opt == '\0') {
+        /* PYTHONMALLOC is empty or is not set or ignored (-E/-I command line
+           options): use default allocators */
+#ifdef Py_DEBUG
+#  ifdef WITH_PYMALLOC
+        opt = "pymalloc_debug";
+#  else
+        opt = "malloc_debug";
+#  endif
+#else
+   /* !Py_DEBUG */
+#  ifdef WITH_PYMALLOC
+        opt = "pymalloc";
+#  else
+        opt = "malloc";
+#  endif
+#endif
+    }
+
+    if (strcmp(opt, "debug") == 0) {
+        PyMem_SetupDebugHooks();
+    }
+    else if (strcmp(opt, "malloc") == 0 || strcmp(opt, "malloc_debug") == 0)
+    {
+        PyMemAllocatorEx alloc = {NULL, PYRAW_FUNCS};
+
+        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
+        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
+        PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
+
+        if (strcmp(opt, "malloc_debug") == 0)
+            PyMem_SetupDebugHooks();
+    }
+#ifdef WITH_PYMALLOC
+    else if (strcmp(opt, "pymalloc") == 0
+             || strcmp(opt, "pymalloc_debug") == 0)
+    {
+        PyMemAllocatorEx mem_alloc = {NULL, PYRAW_FUNCS};
+        PyMemAllocatorEx obj_alloc = {NULL, PYOBJ_FUNCS};
+
+        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &mem_alloc);
+        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &mem_alloc);
+        PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &obj_alloc);
+
+        if (strcmp(opt, "pymalloc_debug") == 0)
+            PyMem_SetupDebugHooks();
+    }
+#endif
+    else {
+        /* unknown allocator */
+        return -1;
+    }
+    return 0;
+}
+
 #undef PYRAW_FUNCS
 #undef PYMEM_FUNCS
 #undef PYOBJ_FUNCS
@@ -205,12 +272,34 @@
 #endif
     };
 
+static int
+_PyMem_DebugEnabled(void)
+{
+    return (_PyObject.malloc == _PyMem_DebugMalloc);
+}
+
+#ifdef WITH_PYMALLOC
+int
+_PyMem_PymallocEnabled(void)
+{
+    if (_PyMem_DebugEnabled()) {
+        return (_PyMem_Debug.obj.alloc.malloc == _PyObject_Malloc);
+    }
+    else {
+        return (_PyObject.malloc == _PyObject_Malloc);
+    }
+}
+#endif
+
 void
 PyMem_SetupDebugHooks(void)
 {
-#ifdef PYMALLOC_DEBUG
     PyMemAllocatorEx alloc;
 
+    /* hooks already installed */
+    if (_PyMem_DebugEnabled())
+        return;
+
     alloc.malloc = _PyMem_DebugMalloc;
     alloc.calloc = _PyMem_DebugCalloc;
     alloc.realloc = _PyMem_DebugRealloc;
@@ -233,7 +322,6 @@
         PyMem_GetAllocator(PYMEM_DOMAIN_OBJ, &_PyMem_Debug.obj.alloc);
         PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
     }
-#endif
 }
 
 void
@@ -264,7 +352,6 @@
     case PYMEM_DOMAIN_OBJ: _PyObject = *allocator; break;
     /* ignore unknown domain */
     }
-
 }
 
 void
@@ -642,22 +729,6 @@
 #define SIMPLELOCK_LOCK(lock)   /* acquire released lock */
 #define SIMPLELOCK_UNLOCK(lock) /* release acquired lock */
 
-/*
- * Basic types
- * I don't care if these are defined in <sys/types.h> or elsewhere. Axiom.
- */
-#undef  uchar
-#define uchar   unsigned char   /* assuming == 8 bits  */
-
-#undef  uint
-#define uint    unsigned int    /* assuming >= 16 bits */
-
-#undef  ulong
-#define ulong   unsigned long   /* assuming >= 32 bits */
-
-#undef uptr
-#define uptr    Py_uintptr_t
-
 /* When you say memory, my mind reasons in terms of (pointers to) blocks */
 typedef uchar block;
 
@@ -949,11 +1020,15 @@
     struct arena_object* arenaobj;
     uint excess;        /* number of bytes above pool alignment */
     void *address;
+    static int debug_stats = -1;
 
-#ifdef PYMALLOC_DEBUG
-    if (Py_GETENV("PYTHONMALLOCSTATS"))
+    if (debug_stats == -1) {
+        char *opt = Py_GETENV("PYTHONMALLOCSTATS");
+        debug_stats = (opt != NULL && *opt != '\0');
+    }
+    if (debug_stats)
         _PyObject_DebugMallocStats(stderr);
-#endif
+
     if (unused_arena_objects == NULL) {
         uint i;
         uint numarenas;
@@ -1709,7 +1784,7 @@
 
 #endif /* WITH_PYMALLOC */
 
-#ifdef PYMALLOC_DEBUG
+
 /*==========================================================================*/
 /* A x-platform debugging allocator.  This doesn't manage memory directly,
  * it wraps a real allocator, adding extra debugging info to the memory blocks.
@@ -1767,31 +1842,6 @@
     }
 }
 
-#ifdef Py_DEBUG
-/* Is target in the list?  The list is traversed via the nextpool pointers.
- * The list may be NULL-terminated, or circular.  Return 1 if target is in
- * list, else 0.
- */
-static int
-pool_is_in_list(const poolp target, poolp list)
-{
-    poolp origlist = list;
-    assert(target != NULL);
-    if (list == NULL)
-        return 0;
-    do {
-        if (target == list)
-            return 1;
-        list = list->nextpool;
-    } while (list != NULL && list != origlist);
-    return 0;
-}
-
-#else
-#define pool_is_in_list(X, Y) 1
-
-#endif  /* Py_DEBUG */
-
 /* Let S = sizeof(size_t).  The debug malloc asks for 4*S extra bytes and
    fills them with useful stuff, here calling the underlying malloc's result p:
 
@@ -2106,7 +2156,6 @@
     }
 }
 
-#endif  /* PYMALLOC_DEBUG */
 
 static size_t
 printone(FILE *out, const char* msg, size_t value)
@@ -2158,8 +2207,30 @@
     (void)printone(out, buf2, num_blocks * sizeof_block);
 }
 
+
 #ifdef WITH_PYMALLOC
 
+#ifdef Py_DEBUG
+/* Is target in the list?  The list is traversed via the nextpool pointers.
+ * The list may be NULL-terminated, or circular.  Return 1 if target is in
+ * list, else 0.
+ */
+static int
+pool_is_in_list(const poolp target, poolp list)
+{
+    poolp origlist = list;
+    assert(target != NULL);
+    if (list == NULL)
+        return 0;
+    do {
+        if (target == list)
+            return 1;
+        list = list->nextpool;
+    } while (list != NULL && list != origlist);
+    return 0;
+}
+#endif
+
 /* Print summary info to "out" about the state of pymalloc's structures.
  * In Py_DEBUG mode, also perform some expensive internal consistency
  * checks.
@@ -2233,7 +2304,9 @@
 
             if (p->ref.count == 0) {
                 /* currently unused */
+#ifdef Py_DEBUG
                 assert(pool_is_in_list(p, arenas[i].freepools));
+#endif
                 continue;
             }
             ++numpools[sz];
@@ -2273,9 +2346,8 @@
         quantization += p * ((POOL_SIZE - POOL_OVERHEAD) % size);
     }
     fputc('\n', out);
-#ifdef PYMALLOC_DEBUG
-    (void)printone(out, "# times object malloc called", serialno);
-#endif
+    if (_PyMem_DebugEnabled())
+        (void)printone(out, "# times object malloc called", serialno);
     (void)printone(out, "# arenas allocated total", ntimes_arena_allocated);
     (void)printone(out, "# arenas reclaimed", ntimes_arena_allocated - narenas);
     (void)printone(out, "# arenas highwater mark", narenas_highwater);
@@ -2303,6 +2375,7 @@
 
 #endif /* #ifdef WITH_PYMALLOC */
 
+
 #ifdef Py_USING_MEMORY_DEBUGGER
 /* Make this function last so gcc won't inline it since the definition is
  * after the reference.
diff --git a/Programs/python.c b/Programs/python.c
index 37b10b8..a7afbc7 100644
--- a/Programs/python.c
+++ b/Programs/python.c
@@ -24,6 +24,9 @@
     int i, res;
     char *oldloc;
 
+    /* Force malloc() allocator to bootstrap Python */
+    (void)_PyMem_SetupAllocators("malloc");
+
     argv_copy = (wchar_t **)PyMem_RawMalloc(sizeof(wchar_t*) * (argc+1));
     argv_copy2 = (wchar_t **)PyMem_RawMalloc(sizeof(wchar_t*) * (argc+1));
     if (!argv_copy || !argv_copy2) {
@@ -62,7 +65,13 @@
 
     setlocale(LC_ALL, oldloc);
     PyMem_RawFree(oldloc);
+
     res = Py_Main(argc, argv_copy);
+
+    /* Force the malloc() allocator again to release memory blocks allocated
+       before Py_Main() */
+    (void)_PyMem_SetupAllocators("malloc");
+
     for (i = 0; i < argc; i++) {
         PyMem_RawFree(argv_copy2[i]);
     }
diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
index e9db7f6..715a547 100644
--- a/Python/pylifecycle.c
+++ b/Python/pylifecycle.c
@@ -702,9 +702,12 @@
     if (Py_GETENV("PYTHONDUMPREFS"))
         _Py_PrintReferenceAddresses(stderr);
 #endif /* Py_TRACE_REFS */
-#ifdef PYMALLOC_DEBUG
-    if (Py_GETENV("PYTHONMALLOCSTATS"))
-        _PyObject_DebugMallocStats(stderr);
+#ifdef WITH_PYMALLOC
+    if (_PyMem_PymallocEnabled()) {
+        char *opt = Py_GETENV("PYTHONMALLOCSTATS");
+        if (opt != NULL && *opt != '\0')
+            _PyObject_DebugMallocStats(stderr);
+    }
 #endif
 
     call_ll_exitfuncs();
diff --git a/Python/sysmodule.c b/Python/sysmodule.c
index 702e8f0..c5b4ac1 100644
--- a/Python/sysmodule.c
+++ b/Python/sysmodule.c
@@ -1151,8 +1151,10 @@
 sys_debugmallocstats(PyObject *self, PyObject *args)
 {
 #ifdef WITH_PYMALLOC
-    _PyObject_DebugMallocStats(stderr);
-    fputc('\n', stderr);
+    if (_PyMem_PymallocEnabled()) {
+        _PyObject_DebugMallocStats(stderr);
+        fputc('\n', stderr);
+    }
 #endif
     _PyObject_DebugTypeStats(stderr);