- pythonrun.c, Py_Finalize(): move the call to _Py_PrintReferences()
  even farther down, to just before the call to
  _PyObject_DebugMallocStats().  This required the following changes:

- pystate.c, PyThreadState_GetDict(): changed to return NULL when no
  current thread state is available, instead of raising an exception
  or issuing a fatal error.  It now never raises an exception.

- object.c, Py_ReprEnter(): when PyThreadState_GetDict() returns NULL,
  don't raise an exception but return 0.  This means that without a
  current thread state recursion goes undetected, so printing a
  recursive container will go on and on and on.  But that shouldn't
  happen in the case we care about (see first bullet).

- Updated Misc/NEWS and Doc/api/init.tex to reflect changes to
  PyThreadState_GetDict() definition.
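
A minimal, standalone sketch of the new Py_ReprEnter() behaviour, to
illustrate the NULL case described above.  It is a model only, not the
code in object.c: thread_dict, current_dict, get_thread_dict(),
repr_enter() and repr_leave() are hypothetical stand-ins, and the
fixed-size array replaces the real per-thread dictionary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIVE 16

/* Hypothetical stand-in for the per-thread dict: the set of objects
   whose repr is currently being computed. */
typedef struct {
    const void *active[MAX_ACTIVE];
    size_t count;
} thread_dict;

/* NULL models "no current thread state", as late in Py_Finalize(). */
static thread_dict *current_dict = NULL;

/* Models the new PyThreadState_GetDict(): may return NULL, never
   reports an error. */
static thread_dict *get_thread_dict(void)
{
    return current_dict;
}

/* Returns 1 if obj is already being printed, 0 if it is not (or if
   there is no thread state to record it in), -1 if the table is full. */
static int repr_enter(const void *obj)
{
    thread_dict *d = get_thread_dict();
    size_t i;

    assert(obj != NULL);
    if (d == NULL)
        return 0;               /* cannot track recursion; carry on */
    for (i = 0; i < d->count; i++) {
        if (d->active[i] == obj)
            return 1;           /* recursive reference detected */
    }
    if (d->count == MAX_ACTIVE)
        return -1;
    d->active[d->count++] = obj;
    return 0;
}

/* Forget obj again; a no-op when there is no thread state. */
static void repr_leave(const void *obj)
{
    thread_dict *d = get_thread_dict();
    size_t i;

    if (d == NULL)
        return;
    for (i = 0; i < d->count; i++) {
        if (d->active[i] == obj) {
            d->active[i] = d->active[--d->count];
            return;
        }
    }
}
```

With current_dict left at NULL, a second repr_enter() on the same
object still returns 0, which is the undetected-recursion case noted
in the object.c bullet.
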
diff --git a/Doc/api/init.tex b/Doc/api/init.tex
index f0ca287..388f479 100644
--- a/Doc/api/init.tex
+++ b/Doc/api/init.tex
@@ -677,9 +677,12 @@
 \begin{cfuncdesc}{PyObject*}{PyThreadState_GetDict}{}
   Return a dictionary in which extensions can store thread-specific
   state information.  Each extension should use a unique key to use to
-  store state in the dictionary.  If this function returns \NULL, an
-  exception has been raised and the caller should allow it to
-  propagate.
+  store state in the dictionary.  It is okay to call this function
+  when no current thread state is available.
+  If this function returns \NULL, no exception has been raised and the
+  caller should assume no current thread state is available.
+  \versionchanged[Previously this could only be called when a current
+  thread is active, and \NULL meant that an exception was raised]{2.3}
 \end{cfuncdesc}