More parallel GC, rewritten parallel mark stack processing.

Card scanning may now be done in parallel. This speeds up sticky GC and
reduces pause times for all GC types.
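
A minimal sketch of the idea (not the runtime's actual code): the card table
is split into per-thread chunks and each worker scans its chunk for dirty
cards. The byte-per-card layout, kCardDirty value, and visitor callback below
are illustrative assumptions.

  // Illustrative sketch only: splits a byte-per-card table into chunks and
  // scans them on std::threads; the real runtime uses its own thread pool
  // and card table classes.
  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <thread>
  #include <vector>

  static constexpr uint8_t kCardDirty = 0x70;  // Assumed dirty-card value.

  // Hypothetical callback invoked for every dirty card found.
  using DirtyCardVisitor = void (*)(size_t card_index);

  void ScanCardsParallel(const std::vector<uint8_t>& cards,
                         DirtyCardVisitor visit,
                         size_t num_threads) {
    num_threads = std::max<size_t>(1, num_threads);
    const size_t chunk = (cards.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (size_t t = 0; t < num_threads; ++t) {
      const size_t begin = t * chunk;
      const size_t end = std::min(cards.size(), begin + chunk);
      workers.emplace_back([&cards, visit, begin, end] {
        for (size_t i = begin; i < end; ++i) {
          if (cards[i] == kCardDirty) {
            visit(i);  // Scan the objects covered by this card.
          }
        }
      });
    }
    for (std::thread& w : workers) {
      w.join();
    }
  }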

Speedup on my mako (ritz perf):
Average pause time for sticky GC (~250 samples):
Without parallel card scanning: 2.524904215ms
With parallel card scanning (num_gc_threads_): 1.552123552ms
Throughput (~250 samples):
Sticky GC throughput with parallel card scanning: 69MB/s
Sticky GC throughput without parallel card scanning: 51MB/s

Rewrote the mark stack processing to be LIFO and use a prefetch queue
like the non-parallel version.
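
Roughly, the pattern looks like the following sketch (a simplification; the
Object type, ScanObject, and kPrefetchDepth names are placeholders, not the
runtime's own):

  #include <cstddef>
  #include <deque>
  #include <vector>

  struct Object;                 // Opaque heap object.
  void ScanObject(Object* obj);  // Visits the object's reference fields.

  static constexpr size_t kPrefetchDepth = 8;  // Assumed window size.

  void ProcessMarkStack(std::vector<Object*>& mark_stack) {
    std::deque<Object*> prefetch_queue;
    for (;;) {
      // Refill the prefetch window from the top of the stack (LIFO order).
      while (prefetch_queue.size() < kPrefetchDepth && !mark_stack.empty()) {
        Object* obj = mark_stack.back();
        mark_stack.pop_back();
        __builtin_prefetch(obj);  // Warm the cache before the object is scanned.
        prefetch_queue.push_back(obj);
      }
      if (prefetch_queue.empty()) {
        break;
      }
      Object* obj = prefetch_queue.front();
      prefetch_queue.pop_front();
      ScanObject(obj);  // May push newly marked objects back onto mark_stack.
    }
  }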

Cleaned up some of the logcat printing for the activity manager
process state listening.

Added unlikely hints to object scanning since arrays and classes are
scanned much less often than normal objects.
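
For illustration, the shape of the hint (the UNLIKELY macro is spelled out
here via __builtin_expect; the Object helpers are stand-ins):

  #define UNLIKELY(x) __builtin_expect(!!(x), 0)

  struct Object {
    bool IsClass() const;
    bool IsArrayInstance() const;
  };

  void VisitClass(Object* obj);
  void VisitArray(Object* obj);
  void VisitInstanceFields(Object* obj);

  void ScanObjectVisit(Object* obj) {
    if (UNLIKELY(obj->IsClass())) {
      VisitClass(obj);           // Rare: classes are a small fraction of objects.
    } else if (UNLIKELY(obj->IsArrayInstance())) {
      VisitArray(obj);           // Rare relative to plain instances.
    } else {
      VisitInstanceFields(obj);  // The common, hot path.
    }
  }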

Fixed a bug where the number of GC threads was clamped to 1 due to a
bool instead of a size_t.
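
The bug pattern in miniature (the field name and the "- 1" sizing are
illustrative, not the exact runtime code):

  #include <unistd.h>
  #include <cstddef>

  // Before (buggy): holding the count in a bool collapses every value above
  // one down to one, so the GC thread pool was never sized larger.
  //   bool num_gc_threads_ = sysconf(_SC_NPROCESSORS_CONF) - 1;
  // After: a size_t preserves the real processor count.
  size_t NumGcThreads() {
    const long cpus = sysconf(_SC_NPROCESSORS_CONF);
    return cpus > 1 ? static_cast<size_t>(cpus - 1) : 1u;
  }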

Fixed a race condition when we added references to the reference
queues. Sharded the reference queue lock into one lock for each reference
type (weak, soft, phantom, finalizer).
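
A minimal sketch of the sharded layout, assuming a simple vector-backed
queue (the class and field names are placeholders):

  #include <mutex>
  #include <vector>

  struct Reference;  // Opaque java.lang.ref.Reference mirror.

  class ReferenceQueue {
   public:
    void Enqueue(Reference* ref) {
      std::lock_guard<std::mutex> lock(lock_);  // Contention is now per type.
      refs_.push_back(ref);
    }

   private:
    std::mutex lock_;
    std::vector<Reference*> refs_;
  };

  // One independent queue (and lock) per reference strength, so GC threads
  // enqueuing different types no longer serialize on a single lock.
  ReferenceQueue soft_reference_queue_;
  ReferenceQueue weak_reference_queue_;
  ReferenceQueue finalizer_reference_queue_;
  ReferenceQueue phantom_reference_queue_;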

Changed timing splits to be different for processing gray objects with
and without mutators paused since sticky GC does both.
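
Roughly (the Timings interface and split names below are stand-ins for the
real timing logger):

  struct Timings {
    void StartSplit(const char* name);
    void EndSplit();
  };

  void ScanGrayObjects(Timings& timings, bool paused) {
    // Record the phase under a different split name depending on whether
    // mutators are suspended, so the two cases show up separately in logs.
    timings.StartSplit(paused ? "(Paused)ScanGrayObjects" : "ScanGrayObjects");
    // ... scan dirty cards / gray objects ...
    timings.EndSplit();
  }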

Masked out the class bit when visiting fields as an optimization; this is
valid since classes are held live by the class linker.
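
A sketch of that mask, assuming a reference-offsets bitmap where one bit
(kClassBit here, purely illustrative) covers the class field:

  #include <cstdint>

  struct Object;
  void VisitReferenceAtIndex(Object* obj, uint32_t field_index);

  static constexpr uint32_t kClassBit = 1u;  // Assumed position of the class field.

  void VisitInstanceFieldsReferences(Object* obj, uint32_t ref_offsets) {
    // Skip the class pointer: the class linker roots all classes, so it never
    // needs to be traced from instances.
    ref_offsets &= ~kClassBit;
    while (ref_offsets != 0) {
      const uint32_t field_index = static_cast<uint32_t>(__builtin_ctz(ref_offsets));
      VisitReferenceAtIndex(obj, field_index);
      ref_offsets &= ref_offsets - 1u;  // Clear the lowest set bit.
    }
  }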

Partially completed: Parallel recursive mark + finger.

Bug: 10245302
Bug: 9969166
Bug: 9986532
Bug: 9961698

Change-Id: I142d09718c4609b7c2387cb28f517a6983c73288
diff --git a/runtime/thread_pool.cc b/runtime/thread_pool.cc
index 067ef2d..f7fdcfb 100644
--- a/runtime/thread_pool.cc
+++ b/runtime/thread_pool.cc
@@ -23,6 +23,8 @@
 
 namespace art {
 
+static const bool kMeasureWaitTime = false;
+
 ThreadPoolWorker::ThreadPoolWorker(ThreadPool* thread_pool, const std::string& name,
                                    size_t stack_size)
     : thread_pool_(thread_pool),
@@ -64,7 +66,7 @@
   MutexLock mu(self, task_queue_lock_);
   tasks_.push_back(task);
   // If we have any waiters, signal one.
-  if (waiting_count_ != 0) {
+  if (started_ && waiting_count_ != 0) {
     task_queue_condition_.Signal(self);
   }
 }
@@ -129,11 +131,13 @@
       // We may be done, lets broadcast to the completion condition.
       completion_condition_.Broadcast(self);
     }
-    const uint64_t wait_start = NanoTime();
+    const uint64_t wait_start = kMeasureWaitTime ? NanoTime() : 0;
     task_queue_condition_.Wait(self);
-    const uint64_t wait_end = NanoTime();
-    total_wait_time_ += wait_end - std::max(wait_start, start_time_);
-    waiting_count_--;
+    if (kMeasureWaitTime) {
+      const uint64_t wait_end = NanoTime();
+      total_wait_time_ += wait_end - std::max(wait_start, start_time_);
+    }
+    --waiting_count_;
   }
 
   // We are shutting down, return NULL to tell the worker thread to stop looping.