Reword and restructure the GIL API doc
diff --git a/Doc/c-api/init.rst b/Doc/c-api/init.rst
index 8d793a4..f920909 100644
--- a/Doc/c-api/init.rst
+++ b/Doc/c-api/init.rst
@@ -366,48 +366,47 @@
    single: lock, interpreter
 
 The Python interpreter is not fully thread-safe.  In order to support
-multi-threaded Python programs, there's a global lock, called the :dfn:`global
-interpreter lock` or :dfn:`GIL`, that must be held by the current thread before
+multi-threaded Python programs, there's a global lock, called the :term:`global
+interpreter lock` or :term:`GIL`, that must be held by the current thread before
 it can safely access Python objects. Without the lock, even the simplest
 operations could cause problems in a multi-threaded program: for example, when
 two threads simultaneously increment the reference count of the same object, the
 reference count could end up being incremented only once instead of twice.
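
The lost-update problem can be shown with a minimal sketch in plain C with POSIX threads. It is a toy model, not CPython code. ``toy_object``, ``toy_gil`` and ``incref_from_two_threads`` are hypothetical names, and a pthread mutex stands in for the GIL. Every increment happens while the lock is held, so two threads doing *N* increments each always add exactly 2 *N*. With the lock calls removed, the final count can come up short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy model, not CPython code: toy_object stands in for an
   object header and toy_gil for the global interpreter lock. */
typedef struct {
    long refcnt;
} toy_object;

static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    toy_object *obj;
    long times;
} incref_job;

/* "refcnt++" is a load, an add and a store.  Holding toy_gil around it
   means no other thread can run between the load and the store, so no
   increment is lost. */
static void *incref_worker(void *arg)
{
    incref_job *job = arg;
    for (long i = 0; i < job->times; i++) {
        pthread_mutex_lock(&toy_gil);
        job->obj->refcnt++;
        pthread_mutex_unlock(&toy_gil);
    }
    return NULL;
}

/* Start two threads that each increment obj->refcnt `times` times and
   return the final count, or -1 if a thread could not be started. */
static long incref_from_two_threads(toy_object *obj, long times)
{
    assert(obj != NULL && times >= 0);
    incref_job job = { obj, times };
    pthread_t a, b;
    if (pthread_create(&a, NULL, incref_worker, &job) != 0)
        return -1;
    if (pthread_create(&b, NULL, incref_worker, &job) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return obj->refcnt;
}
```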
 
-.. index:: single: setcheckinterval() (in module sys)
+.. index:: single: setswitchinterval() (in module sys)
 
-Therefore, the rule exists that only the thread that has acquired the global
-interpreter lock may operate on Python objects or call Python/C API functions.
-In order to support multi-threaded Python programs, the interpreter regularly
-releases and reacquires the lock --- by default, every 100 bytecode instructions
-(this can be changed with  :func:`sys.setcheckinterval`).  The lock is also
-released and reacquired around potentially blocking I/O operations like reading
-or writing a file, so that other threads can run while the thread that requests
-the I/O is waiting for the I/O operation to complete.
+Therefore, the rule exists that only the thread that has acquired the
+:term:`GIL` may operate on Python objects or call Python/C API functions.
+To emulate concurrent execution, the interpreter regularly
+tries to switch threads (see :func:`sys.setswitchinterval`).  The lock is also
+released around potentially blocking I/O operations like reading or writing
+a file, so that other Python threads can run in the meantime.
 
 .. index::
    single: PyThreadState
    single: PyThreadState
 
-The Python interpreter needs to keep some bookkeeping information separate per
-thread --- for this it uses a data structure called :c:type:`PyThreadState`.
-There's one global variable, however: the pointer to the current
-:c:type:`PyThreadState` structure.  Before the addition of :dfn:`thread-local
-storage` (:dfn:`TLS`) the current thread state had to be manipulated
-explicitly.
+The Python interpreter keeps some thread-specific bookkeeping information
+inside a data structure called :c:type:`PyThreadState`.  There's also one
+global variable pointing to the current :c:type:`PyThreadState`: it can
+be retrieved using :c:func:`PyThreadState_Get`.
 
-This is easy enough in most cases.  Most code manipulating the global
-interpreter lock has the following simple structure::
+Releasing the GIL from extension code
+-------------------------------------
+
+Most extension code manipulating the :term:`GIL` has the following simple
+structure::
 
    Save the thread state in a local variable.
    Release the global interpreter lock.
-   ...Do some blocking I/O operation...
+   ... Do some blocking I/O operation ...
    Reacquire the global interpreter lock.
    Restore the thread state from the local variable.
 
 This is so common that a pair of macros exists to simplify it::
 
    Py_BEGIN_ALLOW_THREADS
-   ...Do some blocking I/O operation...
+   ... Do some blocking I/O operation ...
    Py_END_ALLOW_THREADS
 
 .. index::
@@ -416,9 +415,8 @@
 
 The :c:macro:`Py_BEGIN_ALLOW_THREADS` macro opens a new block and declares a
 hidden local variable; the :c:macro:`Py_END_ALLOW_THREADS` macro closes the
-block.  Another advantage of using these two macros is that when Python is
-compiled without thread support, they are defined empty, thus saving the thread
-state and GIL manipulations.
+block.  These two macros are still available when Python is compiled without
+thread support (they simply have an empty expansion).
 
 When thread support is enabled, the block above expands to the following code::
 
@@ -428,65 +426,60 @@
    ...Do some blocking I/O operation...
    PyEval_RestoreThread(_save);
 
-Using even lower level primitives, we can get roughly the same effect as
-follows::
-
-   PyThreadState *_save;
-
-   _save = PyThreadState_Swap(NULL);
-   PyEval_ReleaseLock();
-   ...Do some blocking I/O operation...
-   PyEval_AcquireLock();
-   PyThreadState_Swap(_save);
-
 .. index::
    single: PyEval_RestoreThread()
-   single: errno
    single: PyEval_SaveThread()
-   single: PyEval_ReleaseLock()
-   single: PyEval_AcquireLock()
 
-There are some subtle differences; in particular, :c:func:`PyEval_RestoreThread`
-saves and restores the value of the  global variable :c:data:`errno`, since the
-lock manipulation does not guarantee that :c:data:`errno` is left alone.  Also,
-when thread support is disabled, :c:func:`PyEval_SaveThread` and
-:c:func:`PyEval_RestoreThread` don't manipulate the GIL; in this case,
-:c:func:`PyEval_ReleaseLock` and :c:func:`PyEval_AcquireLock` are not available.
-This is done so that dynamically loaded extensions compiled with thread support
-enabled can be loaded by an interpreter that was compiled with disabled thread
-support.
+Here is how these functions work: the global interpreter lock is used to protect
+the pointer to the current thread state.  When releasing the lock and saving
+the thread state, the current thread state pointer must be retrieved before
+the lock is released (since another thread could immediately acquire the lock
+and store its own thread state in the global variable).  Conversely, when
+acquiring the lock and restoring the thread state, the lock must be acquired
+before storing the thread state pointer.
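
The ordering rule can be shown with a small toy model in plain C. This is a sketch under stated assumptions, not CPython's implementation. A pthread mutex (``toy_gil``) stands in for the GIL, and ``toy_current`` stands in for the global current-thread-state pointer. ``toy_save_thread``, ``toy_restore_thread`` and the ``TOY_*_ALLOW_THREADS`` macros are hypothetical counterparts of :c:func:`PyEval_SaveThread`, :c:func:`PyEval_RestoreThread`, :c:macro:`Py_BEGIN_ALLOW_THREADS` and :c:macro:`Py_END_ALLOW_THREADS`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy model, not CPython code. */
typedef struct toy_tstate {
    int id;
} toy_tstate;

static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;
static toy_tstate *toy_current = NULL;   /* protected by toy_gil */

/* Save: read (and clear) the pointer while still holding the lock, and
   only then release it.  Releasing first would let another thread take
   the lock and overwrite toy_current before we read it. */
static toy_tstate *toy_save_thread(void)
{
    toy_tstate *saved = toy_current;
    assert(saved != NULL);
    toy_current = NULL;
    pthread_mutex_unlock(&toy_gil);
    return saved;
}

/* Restore: acquire the lock first, then store the pointer, so that no
   other lock holder can observe or clobber it. */
static void toy_restore_thread(toy_tstate *saved)
{
    assert(saved != NULL);
    pthread_mutex_lock(&toy_gil);
    toy_current = saved;
}

/* Same shape as the real macro pair: open a block, declare a hidden
   local, close the block.  Used without a trailing semicolon. */
#define TOY_BEGIN_ALLOW_THREADS { toy_tstate *_save = toy_save_thread();
#define TOY_END_ALLOW_THREADS   toy_restore_thread(_save); }

/* Returns 1 if, inside the section, the lock was free and no thread
   state was current (as another thread would observe). */
static int toy_lock_free_during_section(void)
{
    int lock_was_free = 0;
    TOY_BEGIN_ALLOW_THREADS
    /* Stand-in for a blocking I/O call: probe the lock with trylock. */
    if (pthread_mutex_trylock(&toy_gil) == 0) {
        lock_was_free = (toy_current == NULL);
        pthread_mutex_unlock(&toy_gil);
    }
    TOY_END_ALLOW_THREADS
    return lock_was_free;
}
```

The caller must hold ``toy_gil`` with ``toy_current`` set on entry, just as the real macros require the GIL to be held.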
 
-The global interpreter lock is used to protect the pointer to the current thread
-state.  When releasing the lock and saving the thread state, the current thread
-state pointer must be retrieved before the lock is released (since another
-thread could immediately acquire the lock and store its own thread state in the
-global variable). Conversely, when acquiring the lock and restoring the thread
-state, the lock must be acquired before storing the thread state pointer.
+.. note::
+   Calling system I/O functions is the most common use case for releasing
+   the GIL, but it can also be useful before calling long-running computations
+   which don't need access to Python objects, such as compression or
+   cryptographic functions operating over memory buffers.  For example, the
+   standard :mod:`zlib` and :mod:`hashlib` modules release the GIL when
+   compressing or hashing data.
 
-It is important to note that when threads are created from C, they don't have
-the global interpreter lock, nor is there a thread state data structure for
-them.  Such threads must bootstrap themselves into existence, by first
-creating a thread state data structure, then acquiring the lock, and finally
-storing their thread state pointer, before they can start using the Python/C
-API.  When they are done, they should reset the thread state pointer, release
-the lock, and finally free their thread state data structure.
+Non-Python created threads
+--------------------------
 
-Threads can take advantage of the :c:func:`PyGILState_\*` functions to do all of
-the above automatically.  The typical idiom for calling into Python from a C
-thread is now::
+When threads are created using the dedicated Python APIs (such as the
+:mod:`threading` module), a thread state is automatically associated with them
+and the code shown above is therefore correct.  However, when threads are
+created from C (for example by a third-party library with its own thread
+management), they don't hold the GIL, nor is there a thread state structure
+for them.
+
+If you need to call Python code from these threads (often this will be part
+of a callback API provided by the aforementioned third-party library),
+you must first register these threads with the interpreter by
+creating a thread state data structure, then acquiring the GIL, and finally
+storing their thread state pointer, before you can start using the Python/C
+API.  When you are done, you should reset the thread state pointer, release
+the GIL, and finally free the thread state data structure.
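
The registration steps can be sketched with a toy model in plain C. This is hypothetical code, not the Python/C API. ``foreign_thread`` plays the part of a thread started by a third-party library. A pthread mutex stands in for the GIL, and ``toy_current`` stands in for the current thread state pointer. In real code these steps are normally left to :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical toy model, not CPython code. */
typedef struct toy_tstate {
    int id;
} toy_tstate;

static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;
static toy_tstate *toy_current = NULL;   /* protected by toy_gil */

/* Body of a thread that the interpreter did not create. */
static void *foreign_thread(void *arg)
{
    int *used_own_state = arg;

    /* 1. Create a thread state structure for this thread. */
    toy_tstate *ts = malloc(sizeof *ts);
    if (ts == NULL)
        return NULL;
    ts->id = 42;

    /* 2. Acquire the lock, then 3. store the thread state pointer. */
    pthread_mutex_lock(&toy_gil);
    assert(toy_current == NULL);   /* no other thread is registered here */
    toy_current = ts;

    /* ... calls into the (toy) interpreter would go here ... */
    *used_own_state = (toy_current == ts);

    /* Teardown: reset the pointer, release the lock, free the structure. */
    toy_current = NULL;
    pthread_mutex_unlock(&toy_gil);
    free(ts);
    return NULL;
}

/* Simulate a third-party library spawning its own thread. */
static int run_foreign_thread(void)
{
    pthread_t t;
    int used_own_state = 0;
    if (pthread_create(&t, NULL, foreign_thread, &used_own_state) != 0)
        return -1;
    pthread_join(t, NULL);
    return used_own_state;
}
```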
+
+The :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release` functions do
+all of the above automatically.  The typical idiom for calling into Python
+from a C thread is::
 
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
 
-   /* Perform Python actions here.  */
+   /* Perform Python actions here. */
    result = CallSomeFunction();
-   /* evaluate result */
+   /* evaluate result or handle exception */
 
    /* Release the thread. No Python API allowed beyond this point. */
    PyGILState_Release(gstate);
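
A simplified model shows why :c:func:`PyGILState_Ensure` calls may nest but their handles may not be shared. This is an assumption-laden sketch, not how CPython implements the GILState API. ``toy_gilstate_ensure``, ``toy_gilstate_release`` and the ``TOY_GILSTATE_*`` values are hypothetical, and a C11 thread-local flag replaces the real per-thread bookkeeping. Each handle records whether that particular call acquired the lock, so only the outermost release drops it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical handle values: they record what Ensure found. */
typedef enum { TOY_GILSTATE_LOCKED, TOY_GILSTATE_UNLOCKED } toy_gilstate;

static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;

/* Thread-local (C11): does *this* thread currently hold toy_gil? */
static _Thread_local int toy_holds_gil = 0;

static toy_gilstate toy_gilstate_ensure(void)
{
    if (toy_holds_gil)
        return TOY_GILSTATE_LOCKED;     /* nested call: lock already held */
    pthread_mutex_lock(&toy_gil);
    toy_holds_gil = 1;
    return TOY_GILSTATE_UNLOCKED;       /* lock was free: Release must drop it */
}

/* Undo exactly what the matching Ensure call did. */
static void toy_gilstate_release(toy_gilstate old)
{
    assert(toy_holds_gil);
    if (old == TOY_GILSTATE_UNLOCKED) {
        toy_holds_gil = 0;
        pthread_mutex_unlock(&toy_gil);
    }
}
```

Passing the outer handle to the inner release would drop the lock while the outer caller still expects to hold it. This is why each call must keep its own handle.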
 
 Note that the :c:func:`PyGILState_\*` functions assume there is only one global
-interpreter (created automatically by :c:func:`Py_Initialize`).  Python still
+interpreter (created automatically by :c:func:`Py_Initialize`).  Python
 supports the creation of additional interpreters (using
 :c:func:`Py_NewInterpreter`), but mixing multiple interpreters and the
 :c:func:`PyGILState_\*` API is unsupported.
@@ -509,6 +502,12 @@
 always able to.
 
 
+High-level API
+--------------
+
+These are the most commonly used types and functions when writing C extension
+code, or when embedding the Python interpreter:
+
 .. c:type:: PyInterpreterState
 
    This data structure represents the state shared by a number of cooperating
@@ -550,21 +549,22 @@
 
    .. index:: module: _thread
 
-   When only the main thread exists, no GIL operations are needed. This is a
-   common situation (most Python programs do not use threads), and the lock
-   operations slow the interpreter down a bit. Therefore, the lock is not
-   created initially.  This situation is equivalent to having acquired the lock:
-   when there is only a single thread, all object accesses are safe.  Therefore,
-   when this function initializes the global interpreter lock, it also acquires
-   it.  Before the Python :mod:`_thread` module creates a new thread, knowing
-   that either it has the lock or the lock hasn't been created yet, it calls
-   :c:func:`PyEval_InitThreads`.  When this call returns, it is guaranteed that
-   the lock has been created and that the calling thread has acquired it.
+   .. note::
+      When only the main thread exists, no GIL operations are needed. This is a
+      common situation (most Python programs do not use threads), and the lock
+      operations slow the interpreter down a bit. Therefore, the lock is not
+      created initially.  This situation is equivalent to having acquired the lock:
+      when there is only a single thread, all object accesses are safe.  Therefore,
+      when this function initializes the global interpreter lock, it also acquires
+      it.  Before the Python :mod:`_thread` module creates a new thread, knowing
+      that either it has the lock or the lock hasn't been created yet, it calls
+      :c:func:`PyEval_InitThreads`.  When this call returns, it is guaranteed that
+      the lock has been created and that the calling thread has acquired it.
 
-   It is **not** safe to call this function when it is unknown which thread (if
-   any) currently has the global interpreter lock.
+      It is **not** safe to call this function when it is unknown which thread (if
+      any) currently has the global interpreter lock.
 
-   This function is not available when thread support is disabled at compile time.
+      This function is not available when thread support is disabled at compile time.
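
The lazy lock creation described in the note above can be modeled in a few lines. This hypothetical sketch is not CPython code. ``toy_init_threads`` and ``toy_threads_initialized`` mirror :c:func:`PyEval_InitThreads` and :c:func:`PyEval_ThreadsInitialized`, and a heap-allocated pthread mutex stands in for the GIL. The creating thread returns holding the new lock. This matches the single-threaded situation, where holding no lock was already equivalent to holding it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GIL; not created until first needed. */
static pthread_mutex_t *toy_gil = NULL;

static int toy_threads_initialized(void)
{
    return toy_gil != NULL;
}

/* Create the lock on first use and return with the caller holding it;
   later calls do nothing.  In this model the check-then-create step is
   unsynchronized, so it is only safe while one thread is running. */
static int toy_init_threads(void)
{
    if (toy_gil != NULL)
        return 0;
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m == NULL || pthread_mutex_init(m, NULL) != 0) {
        free(m);
        return -1;
    }
    pthread_mutex_lock(m);   /* the creating thread owns the new lock */
    toy_gil = m;
    return 0;
}
```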
 
 
 .. c:function:: int PyEval_ThreadsInitialized()
@@ -575,37 +575,6 @@
    not available when thread support is disabled at compile time.
 
 
-.. c:function:: void PyEval_AcquireLock()
-
-   Acquire the global interpreter lock.  The lock must have been created earlier.
-   If this thread already has the lock, a deadlock ensues.  This function is not
-   available when thread support is disabled at compile time.
-
-
-.. c:function:: void PyEval_ReleaseLock()
-
-   Release the global interpreter lock.  The lock must have been created earlier.
-   This function is not available when thread support is disabled at compile time.
-
-
-.. c:function:: void PyEval_AcquireThread(PyThreadState *tstate)
-
-   Acquire the global interpreter lock and set the current thread state to
-   *tstate*, which should not be *NULL*.  The lock must have been created earlier.
-   If this thread already has the lock, deadlock ensues.  This function is not
-   available when thread support is disabled at compile time.
-
-
-.. c:function:: void PyEval_ReleaseThread(PyThreadState *tstate)
-
-   Reset the current thread state to *NULL* and release the global interpreter
-   lock.  The lock must have been created earlier and must be held by the current
-   thread.  The *tstate* argument, which must not be *NULL*, is only used to check
-   that it represents the current thread state --- if it isn't, a fatal error is
-   reported. This function is not available when thread support is disabled at
-   compile time.
-
-
 .. c:function:: PyThreadState* PyEval_SaveThread()
 
    Release the global interpreter lock (if it has been created and thread
@@ -624,6 +593,20 @@
    when thread support is disabled at compile time.)
 
 
+.. c:function:: PyThreadState* PyThreadState_Get()
+
+   Return the current thread state.  The global interpreter lock must be held.
+   When the current thread state is *NULL*, this issues a fatal error (so that
+   the caller needn't check for *NULL*).
+
+
+.. c:function:: PyThreadState* PyThreadState_Swap(PyThreadState *tstate)
+
+   Swap the current thread state with the thread state given by the argument
+   *tstate*, which may be *NULL*.  The global interpreter lock must be held
+   and is not released.
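
This toy sketch shows the contract of these two functions. The names (``toy_tstate_get``, ``toy_tstate_swap``) are hypothetical, the locking is left out for brevity, and the code is not CPython's implementation. ``toy_tstate_get`` treats a missing thread state as fatal, so its callers never check for ``NULL``. ``toy_tstate_swap`` returns the previous state so that it can be reinstalled later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy model, not CPython code. */
typedef struct toy_tstate {
    int id;
} toy_tstate;

static toy_tstate *toy_current = NULL;

/* No current thread state is a fatal error, so callers need no check. */
static toy_tstate *toy_tstate_get(void)
{
    if (toy_current == NULL) {
        fprintf(stderr, "fatal: no current thread state\n");
        abort();
    }
    return toy_current;
}

/* Install tstate (which may be NULL) and return the previous one.  In
   the real API the caller holds the GIL, and it is not released. */
static toy_tstate *toy_tstate_swap(toy_tstate *tstate)
{
    toy_tstate *old = toy_current;
    toy_current = tstate;
    return old;
}
```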
+
+
 .. c:function:: void PyEval_ReInitThreads()
 
    This function is called from :c:func:`PyOS_AfterFork` to ensure that newly
@@ -631,6 +614,43 @@
    are not running in the child process.
 
 
+The following functions use thread-local storage and are not compatible
+with sub-interpreters:
+
+.. c:function:: PyGILState_STATE PyGILState_Ensure()
+
+   Ensure that the current thread is ready to call the Python C API regardless
+   of the current state of Python, or of the global interpreter lock. This may
+   be called as many times as desired by a thread as long as each call is
+   matched with a call to :c:func:`PyGILState_Release`. In general, other
+   thread-related APIs may be used between :c:func:`PyGILState_Ensure` and
+   :c:func:`PyGILState_Release` calls as long as the thread state is restored to
+   its previous state before calling :c:func:`PyGILState_Release`.  For example,
+   normal usage of the :c:macro:`Py_BEGIN_ALLOW_THREADS` and
+   :c:macro:`Py_END_ALLOW_THREADS` macros is acceptable.
+
+   The return value is an opaque "handle" to the thread state when
+   :c:func:`PyGILState_Ensure` was called, and must be passed to
+   :c:func:`PyGILState_Release` to ensure Python is left in the same state. Even
+   though recursive calls are allowed, these handles *cannot* be shared; each
+   unique call to :c:func:`PyGILState_Ensure` must save the handle for its call
+   to :c:func:`PyGILState_Release`.
+
+   When the function returns, the current thread will hold the GIL and be able
+   to call arbitrary Python code.  Failure is a fatal error.
+
+
+.. c:function:: void PyGILState_Release(PyGILState_STATE)
+
+   Release any resources previously acquired.  After this call, Python's state will
+   be the same as it was prior to the corresponding :c:func:`PyGILState_Ensure` call
+   (but generally this state will be unknown to the caller, hence the use of the
+   GILState API).
+
+   Every call to :c:func:`PyGILState_Ensure` must be matched by a call to
+   :c:func:`PyGILState_Release` on the same thread.
+
+
 The following macros are normally used without a trailing semicolon; look for
 example usage in the Python source distribution.
 
@@ -664,6 +684,10 @@
    :c:macro:`Py_BEGIN_ALLOW_THREADS` without the opening brace and variable
    declaration.  It is a no-op when thread support is disabled at compile time.
 
+
+Low-level API
+-------------
+
 All of the following functions are only available when thread support is enabled
 at compile time, and must be called only when the global interpreter lock has
 been created.
@@ -709,19 +733,6 @@
    :c:func:`PyThreadState_Clear`.
 
 
-.. c:function:: PyThreadState* PyThreadState_Get()
-
-   Return the current thread state.  The global interpreter lock must be held.
-   When the current thread state is *NULL*, this issues a fatal error (so that
-   the caller needn't check for *NULL*).
-
-
-.. c:function:: PyThreadState* PyThreadState_Swap(PyThreadState *tstate)
-
-   Swap the current thread state with the thread state given by the argument
-   *tstate*, which may be *NULL*.  The global interpreter lock must be held.
-
-
 .. c:function:: PyObject* PyThreadState_GetDict()
 
    Return a dictionary in which extensions can store thread-specific state
@@ -742,38 +753,31 @@
    exception (if any) for the thread is cleared. This raises no exceptions.
 
 
-.. c:function:: PyGILState_STATE PyGILState_Ensure()
+.. c:function:: void PyEval_AcquireThread(PyThreadState *tstate)
 
-   Ensure that the current thread is ready to call the Python C API regardless
-   of the current state of Python, or of the global interpreter lock. This may
-   be called as many times as desired by a thread as long as each call is
-   matched with a call to :c:func:`PyGILState_Release`. In general, other
-   thread-related APIs may be used between :c:func:`PyGILState_Ensure` and
-   :c:func:`PyGILState_Release` calls as long as the thread state is restored to
-   its previous state before the Release().  For example, normal usage of the
-   :c:macro:`Py_BEGIN_ALLOW_THREADS` and :c:macro:`Py_END_ALLOW_THREADS` macros is
-   acceptable.
-
-   The return value is an opaque "handle" to the thread state when
-   :c:func:`PyGILState_Ensure` was called, and must be passed to
-   :c:func:`PyGILState_Release` to ensure Python is left in the same state. Even
-   though recursive calls are allowed, these handles *cannot* be shared - each
-   unique call to :c:func:`PyGILState_Ensure` must save the handle for its call
-   to :c:func:`PyGILState_Release`.
-
-   When the function returns, the current thread will hold the GIL. Failure is a
-   fatal error.
+   Acquire the global interpreter lock and set the current thread state to
+   *tstate*, which should not be *NULL*.  The lock must have been created earlier.
+   If this thread already has the lock, deadlock ensues.
 
 
-.. c:function:: void PyGILState_Release(PyGILState_STATE)
+.. c:function:: void PyEval_ReleaseThread(PyThreadState *tstate)
 
-   Release any resources previously acquired.  After this call, Python's state will
-   be the same as it was prior to the corresponding :c:func:`PyGILState_Ensure` call
-   (but generally this state will be unknown to the caller, hence the use of the
-   GILState API.)
+   Reset the current thread state to *NULL* and release the global interpreter
+   lock.  The lock must have been created earlier and must be held by the current
+   thread.  The *tstate* argument, which must not be *NULL*, is only used to check
+   that it represents the current thread state --- if it isn't, a fatal error is
+   reported.
 
-   Every call to :c:func:`PyGILState_Ensure` must be matched by a call to
-   :c:func:`PyGILState_Release` on the same thread.
+
+.. c:function:: void PyEval_AcquireLock()
+
+   Acquire the global interpreter lock.  The lock must have been created earlier.
+   If this thread already has the lock, a deadlock ensues.
+
+
+.. c:function:: void PyEval_ReleaseLock()
+
+   Release the global interpreter lock.  The lock must have been created earlier.
 
 
 Sub-interpreter support