bpo-35081: Add _PyThreadState_GET() internal macro (GH-10266)

If Py_BUILD_CORE is defined, the PyThreadState_GET() macro accesses
_PyRuntime, which comes from the internal pycore_state.h header.
Public headers must not require internal headers.

Move PyThreadState_GET() and _PyInterpreterState_GET_UNSAFE() from
Include/pystate.h to Include/internal/pycore_state.h, and rename
PyThreadState_GET() to _PyThreadState_GET() there.

The PyThreadState_GET() macro of pystate.h is now redefined when
pycore_state.h is included, to use the fast _PyThreadState_GET().
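The header layering described above can be sketched in a single file. This is a minimal illustration under stated assumptions, not CPython's code. All names (thread_state_t, runtime, threadstate_get, THREADSTATE_GET, FAST_THREADSTATE_GET, BUILD_CORE, set_current) are hypothetical stand-ins for PyThreadState, _PyRuntime, PyThreadState_Get(), PyThreadState_GET(), _PyThreadState_GET() and Py_BUILD_CORE. The real fast macro reads the current thread state out of _PyRuntime; here that is reduced to a plain struct field.

```c
#include <stddef.h>

/* Hypothetical stand-ins for PyThreadState and _PyRuntime. */
typedef struct thread_state {
    int id;
} thread_state_t;

typedef struct runtime_state {
    thread_state_t *current;   /* current thread state, may be NULL */
} runtime_state_t;

static runtime_state_t runtime = { NULL };

/* ---- "public header": no access to runtime internals ---- */
thread_state_t *threadstate_get(void);           /* exported function */
#define THREADSTATE_GET() threadstate_get()

/* ---- the exported function, defined in a core source file ---- */
thread_state_t *threadstate_get(void) { return runtime.current; }

/* ---- "internal header": refuses to compile outside the core ---- */
#define BUILD_CORE  /* simulates a core build; normally set by the build system */
#ifndef BUILD_CORE
#  error "internal header requires BUILD_CORE to be defined"
#endif
/* Fast path: read the runtime field directly, no function call. */
#define FAST_THREADSTATE_GET() (runtime.current)
/* Redefine the public macro so core code also gets the fast path. */
#undef THREADSTATE_GET
#define THREADSTATE_GET() FAST_THREADSTATE_GET()

/* Helper for this example only: install a current thread state. */
static void set_current(thread_state_t *ts) { runtime.current = ts; }
```

Code that only sees the public part gets a function call and never needs the runtime layout; code that also pulls in the internal part (which requires BUILD_CORE) transparently switches to the direct field read.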

Changes:

* Add _PyThreadState_GET() macro
* Replace "PyThreadState_GET()->interp" with
  _PyInterpreterState_GET_UNSAFE()
* Replace PyThreadState_GET() with _PyThreadState_GET() in internal C
  files (compiled with Py_BUILD_CORE defined), but keep
  PyThreadState_GET() in the public header files.
* _testcapimodule.c: replace PyThreadState_GET() with
  PyThreadState_Get(); the module is not compiled with Py_BUILD_CORE
  defined.
* pycore_state.h now requires Py_BUILD_CORE to be defined.
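The dictobject.c hunk below depends on the difference between the two getters that its comment describes: PyThreadState_Get() aborts Python when there is no current thread state, while the unchecked macro simply yields NULL. A minimal sketch of that contrast follows; the names current_tstate, FAST_THREADSTATE_GET, checked_threadstate_get and has_pending_exception are hypothetical, and the curexc_set flag is a simplification of checking tstate->curexc_type.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical thread state; curexc_set stands in for curexc_type != NULL. */
typedef struct thread_state {
    int curexc_set;
} thread_state_t;

/* NULL until "initialization" installs a thread state. */
static thread_state_t *current_tstate = NULL;

/* Unchecked fast accessor: may yield NULL, e.g. during early startup. */
#define FAST_THREADSTATE_GET() (current_tstate)

/* Checked accessor: treats a missing thread state as a fatal error. */
thread_state_t *checked_threadstate_get(void)
{
    thread_state_t *ts = FAST_THREADSTATE_GET();
    if (ts == NULL) {
        fprintf(stderr, "fatal: no current thread state\n");
        abort();
    }
    return ts;
}

/* Mirrors the dictobject.c pattern: tolerate a NULL thread state
   instead of aborting, and only then look at the pending exception. */
int has_pending_exception(void)
{
    thread_state_t *ts = FAST_THREADSTATE_GET();
    return ts != NULL && ts->curexc_set;
}
```

Code that can run before a thread state exists has to use the unchecked form and test for NULL itself, which is why the hunk below switches to _PyThreadState_GET() rather than PyThreadState_Get().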
diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index a9ae907..ea564a2 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -1314,9 +1314,9 @@
     /* We can arrive here with a NULL tstate during initialization: try
        running "python -Wi" for an example related to string interning.
        Let's just hope that no exception occurs then...  This must be
-       PyThreadState_GET() and not PyThreadState_Get() because the latter
+       _PyThreadState_GET() and not PyThreadState_Get() because the latter
        abort Python if tstate is NULL. */
-    tstate = PyThreadState_GET();
+    tstate = _PyThreadState_GET();
     if (tstate != NULL && tstate->curexc_type != NULL) {
         /* preserve the existing exception */
         PyObject *err_type, *err_value, *err_tb;