ART: JNI thread state transition optimization

This patch improves the JNI performance by removing the explicit acquiring and
releasing the mutator lock when a thread state transits between suspended and
runnable states.

The functions responsible for changing the state were found to be the costliest
part of the JNI. Originally, a thread needs to acquire a shared mutator lock by
a CAS instruction when entering the runnable state and also needs to release
the lock by a CAS when entering the native state from runnable. This patch
removes these CAS operations when a thread state transits between suspended and
runnable. A thread in the runnable state is considered to have shared ownership
of the mutator lock and therefore transitions in and out of the runnable state
have associated implication on the mutator lock ownership. Meanwhile, a barrier
is added to control suspending all threads from running.

JNI transition overhead was reduced by 25% on IA platform and by 17% on ARM
platform by this patch, while it has little impact on GC pause time (measured
with "suspend all histogram").

Change-Id: Icee95d8ffff1bbfc95309a41cc48836536fec689
Signed-off-by: Yu, Li <yu.l.li@intel.com>
Signed-off-by: Haitao, Feng <haitao.feng@intel.com>
Signed-off-by: Lei, Li <lei.l.li@intel.com>
diff --git a/runtime/thread_list.h b/runtime/thread_list.h
index 2c1f813..edd1e05 100644
--- a/runtime/thread_list.h
+++ b/runtime/thread_list.h
@@ -155,6 +155,8 @@
 
   bool Contains(Thread* thread) EXCLUSIVE_LOCKS_REQUIRED(Locks::thread_list_lock_);
   bool Contains(pid_t tid) EXCLUSIVE_LOCKS_REQUIRED(Locks::thread_list_lock_);
+  size_t RunCheckpoint(Closure* checkpoint_function, bool includeSuspended)
+      LOCKS_EXCLUDED(Locks::thread_list_lock_, Locks::thread_suspend_count_lock_);
 
   void DumpUnattachedThreads(std::ostream& os)
       LOCKS_EXCLUDED(Locks::thread_list_lock_);
@@ -166,6 +168,11 @@
       LOCKS_EXCLUDED(Locks::thread_list_lock_,
                      Locks::thread_suspend_count_lock_);
 
+  void SuspendAllInternal(Thread* self, Thread* ignore1, Thread* ignore2 = nullptr,
+                          bool debug_suspend = false)
+      LOCKS_EXCLUDED(Locks::thread_list_lock_,
+                     Locks::thread_suspend_count_lock_);
+
   void AssertThreadsAreSuspended(Thread* self, Thread* ignore1, Thread* ignore2 = nullptr)
       LOCKS_EXCLUDED(Locks::thread_list_lock_,
                      Locks::thread_suspend_count_lock_);