Add a new PYMALLOC_DEBUG function, void _PyMalloc_DebugDumpStats(void).
It prints statistics to stderr on the number of arenas, pools, blocks and
bytes, both in use and reserved but unused.

CAUTION:  Because PYMALLOC_DEBUG is on, the debug malloc routine adds
16 bytes to each request.  This makes each block appear two size classes
higher than it would be if PYMALLOC_DEBUG weren't on.
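
The shift described in the caution above can be checked with a little
arithmetic.  The sketch below is not the allocator's code: it assumes
size classes are 8 bytes apart (which is what "16 bytes = two size
classes" implies) and uses hypothetical names (size_class_index,
debug_size_class_index, show_shift, ALIGNMENT_SHIFT, DEBUG_PAD_BYTES)
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumptions for this sketch: size classes are 8 bytes apart, and the
   debug routine pads every request by the 16 bytes mentioned above. */
#define ALIGNMENT_SHIFT 3
#define DEBUG_PAD_BYTES 16

/* Zero-based size class index for a request of nbytes (nbytes > 0):
   1..8 -> 0, 9..16 -> 1, 17..24 -> 2, and so on. */
static unsigned int
size_class_index(size_t nbytes)
{
    assert(nbytes > 0);
    return (unsigned int)((nbytes - 1) >> ALIGNMENT_SHIFT);
}

/* Index the same request lands in once the debug padding is added. */
static unsigned int
debug_size_class_index(size_t nbytes)
{
    return size_class_index(nbytes + DEBUG_PAD_BYTES);
}

/* Hypothetical helper: print both indices for one request size. */
static void
show_shift(size_t nbytes)
{
    printf("%lu-byte request: class %u without debug, class %u with debug\n",
           (unsigned long)nbytes,
           size_class_index(nbytes),
           debug_size_class_index(nbytes));
}
```

Under these assumptions, calling show_shift(8) reports class 0 without
debug and class 2 with debug.  Even a 1-byte request is padded to 17
bytes, so it is counted in the third class (index 2), never in the
8-byte or 16-byte classes.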

So far, playing with this confirms the obvious:  there's a lot of activity
in the "small dict" size class, but nothing in the core makes any use of
the 8-byte or 16-byte classes.
diff --git a/Include/pymem.h b/Include/pymem.h
index 5d9beed..18c49d7 100644
--- a/Include/pymem.h
+++ b/Include/pymem.h
@@ -102,6 +102,7 @@
 DL_IMPORT(void) _PyMalloc_DebugFree(void *p);
 DL_IMPORT(void) _PyMalloc_DebugDumpAddress(const void *p);
 DL_IMPORT(void) _PyMalloc_DebugCheckAddress(const void *p);
+DL_IMPORT(void) _PyMalloc_DebugDumpStats(void);
 #define _PyMalloc_MALLOC _PyMalloc_DebugMalloc
 #define _PyMalloc_REALLOC _PyMalloc_DebugRealloc
 #define _PyMalloc_FREE _PyMalloc_DebugFree