bpo-39984: Move pending calls to PyInterpreterState (GH-19066)

If Py_AddPendingCall() is called in a subinterpreter, the function is
now scheduled to be called from the subinterpreter, rather than being
called from the main interpreter.

Each subinterpreter now has its own list of scheduled calls.

* Move pending and eval_breaker fields from _PyRuntimeState.ceval
  to PyInterpreterState.ceval.
* new_interpreter() now calls _PyEval_InitThreads() to create the
  pending calls lock.
* Fix Py_AddPendingCall() for subinterpreters. It now calls
  _PyThreadState_GET() which works in a subinterpreter if the
  caller holds the GIL, and only falls back on
  PyGILState_GetThisThreadState() if _PyThreadState_GET()
  returns NULL.
diff --git a/Doc/c-api/init.rst b/Doc/c-api/init.rst
index f309ad0..a4ec0e3 100644
--- a/Doc/c-api/init.rst
+++ b/Doc/c-api/init.rst
@@ -1389,6 +1389,10 @@
    This function doesn't need a current thread state to run, and it doesn't
    need the global interpreter lock.
 
+   To call this function in a subinterpreter, the caller must hold the GIL.
+   Otherwise, the function *func* can be scheduled to be called from the wrong
+   interpreter.
+
    .. warning::
       This is a low-level function, only useful for very special cases.
       There is no guarantee that *func* will be called as quick as
@@ -1397,6 +1401,12 @@
       function is generally **not** suitable for calling Python code from
       arbitrary C threads.  Instead, use the :ref:`PyGILState API<gilstate>`.
 
+   .. versionchanged:: 3.9
+      If this function is called in a subinterpreter, the function *func* is
+      now scheduled to be called from the subinterpreter, rather than being
+      called from the main interpreter. Each subinterpreter now has its own
+      list of scheduled calls.
+
    .. versionadded:: 3.1
 
 .. _profiling: