Make (most of) Python's tests pass under Thread Sanitizer.

http://code.google.com/p/data-race-test/wiki/ThreadSanitizer is a dynamic data
race detector that runs on top of valgrind. With this patch, the binaries at
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer#Binaries pass many
but not all of the Python tests. All of regrtest still passes outside of tsan.

I've implemented part of the C1x atomic types so that we can explicitly mark
variables that are used across threads, and get defined behavior as compilers
advance.

I've added tsan's client header and implementation to the codebase in
dynamic_annotations.{h,c} (docs at
http://code.google.com/p/data-race-test/wiki/DynamicAnnotations).
Unfortunately, I haven't been able to get helgrind and drd to give sensible
error messages, even when I use their client annotations, so I'm not supporting
them.
diff --git a/Include/pystate.h b/Include/pystate.h
index f85fa8c..7b6b602 100644
--- a/Include/pystate.h
+++ b/Include/pystate.h
@@ -131,12 +131,15 @@
 
 /* Variable and macro for in-line access to current thread state */
 
-PyAPI_DATA(PyThreadState *) _PyThreadState_Current;
+/* Assuming the current thread holds the GIL, this is the
+   PyThreadState for the current thread. */
+PyAPI_DATA(_Py_atomic_address) _PyThreadState_Current;
 
 #ifdef Py_DEBUG
 #define PyThreadState_GET() PyThreadState_Get()
 #else
-#define PyThreadState_GET() (_PyThreadState_Current)
+#define PyThreadState_GET() \
+    ((PyThreadState*)_Py_atomic_load_relaxed(&_PyThreadState_Current))
 #endif
 
 typedef