bpo-35081: Add _PyThreadState_GET() internal macro (GH-10266)

If Py_BUILD_CORE is defined, the PyThreadState_GET() macro access
_PyRuntime which comes from the internal pycore_state.h header.
Public headers must not require internal headers.

Move PyThreadState_GET() and _PyInterpreterState_GET_UNSAFE() from
Include/pystate.h to Include/internal/pycore_state.h, and rename
PyThreadState_GET() to _PyThreadState_GET() there.

The PyThreadState_GET() macro of pystate.h is now redefined when
pycore_state.h is included, to use the fast _PyThreadState_GET().

Changes:

* Add _PyThreadState_GET() macro
* Replace "PyThreadState_GET()->interp" with
  _PyInterpreterState_GET_UNSAFE()
* Replace PyThreadState_GET() with _PyThreadState_GET() in internal C
  files (compiled with Py_BUILD_CORE defined), but keep
  PyThreadState_GET() in the public header files.
* _testcapimodule.c: replace PyThreadState_GET() with
  PyThreadState_Get(); the module is not compiled with Py_BUILD_CORE
  defined.
* pycore_state.h now requires Py_BUILD_CORE to be defined.
diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index b2cda51..018af4a 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -4160,7 +4160,7 @@
 static PyObject*
 get_recursion_depth(PyObject *self, PyObject *args)
 {
-    PyThreadState *tstate = PyThreadState_GET();
+    PyThreadState *tstate = PyThreadState_Get();
 
     /* subtract one to ignore the frame of the get_recursion_depth() call */
     return PyLong_FromLong(tstate->recursion_depth - 1);