Move thread flags and state into 32 bits.

We need to ensure that transitions to Runnable are atomic with respect
to a thread modifying the suspend count. Currently this is achieved by
holding the thread_suspend_count_lock_. This change creates a set of bit
flags that summarize whether the suspend_count_ is raised, plus other
flags that signify that managed code should take a slow path.

The effects of this change are two-fold:
1) transitions from suspended to runnable can CAS the thread state
rather than holding the suspend_count_lock_. This will make JNI
transitions cheaper.
2) the exception/suspend/interpreter poll needed for shadow frames can
be rolled into a single compare of the bit fields against 0.
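
As a rough illustration only (not the code in this change), the two
effects above can be sketched as follows. Every name here, the flag
values and the 16/16 bit split are assumptions made for the sketch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical flags; the real set and values are not shown in this message.
enum SketchFlag : uint16_t {
  kSketchSuspendRequest = 1,    // summarizes "suspend count is raised"
  kSketchExceptionPending = 2,  // another reason to take the slow path
};

// Hypothetical states.
enum SketchState : uint16_t {
  kSketchRunnable = 0,
  kSketchSuspended = 1,
};

class SketchThread {
 public:
  explicit SketchThread(SketchState state) : state_and_flags_(Pack(0, state)) {}

  void SetFlag(SketchFlag flag) { state_and_flags_.fetch_or(flag); }
  void ClearFlag(SketchFlag flag) {
    state_and_flags_.fetch_and(~static_cast<uint32_t>(flag));
  }

  SketchState State() const {
    return static_cast<SketchState>(state_and_flags_.load() >> 16);
  }

  // Effect 2: one compare of the flag bits against 0 covers every
  // slow-path reason at once.
  bool NeedsSlowPath() const {
    return (state_and_flags_.load(std::memory_order_relaxed) & kFlagMask) != 0;
  }

  // Effect 1: move to Runnable with a CAS instead of taking a lock.
  // Returns false if a suspend request is pending; the caller would then
  // have to wait and retry.
  bool TryTransitionToRunnable() {
    uint32_t old_value = state_and_flags_.load(std::memory_order_relaxed);
    while (true) {
      if ((old_value & kSketchSuspendRequest) != 0) {
        return false;
      }
      // Keep the flag bits, replace only the state bits.
      uint32_t new_value = Pack(old_value & kFlagMask, kSketchRunnable);
      // On failure old_value is reloaded, so a flag set concurrently by
      // another thread is seen on the next iteration.
      if (state_and_flags_.compare_exchange_weak(old_value, new_value,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed)) {
        return true;
      }
    }
  }

 private:
  static constexpr uint32_t kFlagMask = 0xFFFF;  // low 16 bits hold the flags

  static constexpr uint32_t Pack(uint32_t flags, SketchState state) {
    return flags | (static_cast<uint32_t>(state) << 16);
  }

  std::atomic<uint32_t> state_and_flags_;
};
```

Because flags and state share one word, a suspend request that races
with the transition makes the CAS fail rather than being missed.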

Change-Id: I589f84e3dca396c3db448bf32d814565acf3d11f
diff --git a/src/signal_catcher.cc b/src/signal_catcher.cc
index 229edf6..7239374 100644
--- a/src/signal_catcher.cc
+++ b/src/signal_catcher.cc
@@ -121,7 +121,7 @@
   thread_list->SuspendAll();
 
   // We should exclusively hold the mutator lock, set state to Runnable without a pending
-  // suspension to avoid giving away or trying re-acquire the mutator lock.
+  // suspension to avoid giving away or trying to re-acquire the mutator lock.
   Locks::mutator_lock_->AssertExclusiveHeld();
   Thread* self = Thread::Current();
   ThreadState old_state;
@@ -133,7 +133,7 @@
       CHECK_EQ(suspend_count, 1);
       self->ModifySuspendCount(-1, false);
     }
-    old_state = self->SetState(kRunnable);
+    old_state = self->SetStateUnsafe(kRunnable);
   }
 
   std::ostringstream os;