Per-thread -fstack-protector guards for x86.

Based on a pair of patches from Intel:

  https://android-review.googlesource.com/#/c/43909/
  https://android-review.googlesource.com/#/c/44903/

For x86, this patch supports _both_ the global (__stack_chk_guard) that
ARM/MIPS use and the per-thread TLS entry (%gs:20) that GCC uses by
default. This lets us support binaries built with any x86 toolchain
(right now, the NDK emits x86 code that uses the global).
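
As a rough illustration of why the slot number matters (a sketch, not
code from this patch): assuming TLS slots are pointer-sized and
pointers are 4 bytes on 32-bit x86, slot 5 sits 20 bytes past the TLS
base, which matches the %gs:20 that GCC reads. X86_32_POINTER_SIZE and
tls_slot_offset() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 32-bit x86, so each pointer-sized TLS slot is 4 bytes. */
#define X86_32_POINTER_SIZE  4
/* Slot number taken from the bionic_tls.h change in this patch. */
#define TLS_SLOT_STACK_GUARD 5

/* Hypothetical helper: byte offset of a slot from the TLS base (%gs:0). */
static size_t tls_slot_offset(size_t slot) {
  return slot * X86_32_POINTER_SIZE;
}

/* Compile-time check that slot 5 lines up with GCC's %gs:20. */
static_assert(TLS_SLOT_STACK_GUARD * X86_32_POINTER_SIZE == 20,
              "stack guard slot must be at %gs:20");
```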

I've also extended the original tests to cover ARM/MIPS, and made
them a little more thorough for x86.

Change-Id: I02f279a80c6b626aecad449771dec91df235ad01
diff --git a/libc/private/bionic_tls.h b/libc/private/bionic_tls.h
index a626d21..f661ccf 100644
--- a/libc/private/bionic_tls.h
+++ b/libc/private/bionic_tls.h
@@ -43,24 +43,19 @@
  ** pre-allocated slot directly for performance reason).
  **/
 
-/* maximum number of elements in the TLS array */
+/* Maximum number of elements in the TLS array. */
 #define BIONIC_TLS_SLOTS            64
 
-/* note that slot 0, called TLS_SLOT_SELF must point to itself.
- * this is required to implement thread-local storage with the x86
- * Linux kernel, that reads the TLS from fs:[0], where 'fs' is a
- * thread-specific segment descriptor...
- */
-
-/* Well-known TLS slots. */
-#define TLS_SLOT_SELF               0
+/* Well-known TLS slots. What data goes in which slot is arbitrary unless otherwise noted. */
+#define TLS_SLOT_SELF               0  /* The kernel requires this specific slot for x86. */
 #define TLS_SLOT_THREAD_ID          1
 #define TLS_SLOT_ERRNO              2
 
 #define TLS_SLOT_OPENGL_API         3
 #define TLS_SLOT_OPENGL             4
 
-#define TLS_SLOT_DLERROR            5
+#define TLS_SLOT_STACK_GUARD        5  /* GCC requires this specific slot for x86. */
+#define TLS_SLOT_DLERROR            6
 
 #define TLS_SLOT_MAX_WELL_KNOWN     TLS_SLOT_DLERROR