Enable reading page map without lock in RosAlloc::BulkFree

Enabling this flag greatly reduces the time spent in the GC. It was
not enabled previously because it regressed MemAllocTest. With these
RosAlloc changes, the benchmark score no longer regresses when the
flag is enabled.

Changed Run::AllocSlot to have only one mode of allocation: finding
the first free bit in the bitmap. This was previously the slow path
but is now the fast path. One optimization that enables this is
always setting the alloc bitmap bits that correspond to invalid slots
to 1. This removes the need for a bounds check, since we will never
allocate from those slots.
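
A minimal sketch of the first-free-bit scheme, assuming 32-bit bitmap
words. SlotBitmap, FirstSetBit and every other name here are
hypothetical and are not the RosAlloc code. The padding bits past the
last valid slot are preset to 1, so the scan needs no bounds check:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the lowest set bit; the caller guarantees bits != 0.
inline int FirstSetBit(uint32_t bits) {
  int index = 0;
  while ((bits & 1u) == 0) {
    bits >>= 1;
    ++index;
  }
  return index;
}

// Hypothetical, simplified stand-in for a run's allocation bitmap.
// A bit value of 1 means "slot taken".
class SlotBitmap {
 public:
  explicit SlotBitmap(size_t num_slots)
      : words_((num_slots + 31) / 32, 0u) {
    // Pre-set the bits past the last valid slot so they never look free.
    size_t tail = num_slots % 32;
    if (tail != 0) {
      words_.back() = ~uint32_t{0} << tail;
    }
  }

  // Returns the index of the first free slot, or -1 if the run is full.
  // No comparison against the slot count is needed: invalid bits are 1.
  int AllocSlot() {
    for (size_t w = 0; w < words_.size(); ++w) {
      uint32_t free_bits = ~words_[w];
      if (free_bits != 0) {
        int bit = FirstSetBit(free_bits);
        words_[w] |= uint32_t{1} << bit;
        return static_cast<int>(w * 32 + bit);
      }
    }
    return -1;
  }

  // Marks a previously allocated slot as free again.
  void FreeSlot(int slot) {
    words_[slot / 32] &= ~(uint32_t{1} << (slot % 32));
  }

 private:
  std::vector<uint32_t> words_;
};
```

With five valid slots, the sixth allocation fails without the sketch
ever comparing an index against the slot count.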

Changed revoking a thread local buffer to point it at an invalid run.
The invalid run is a run that always has all of its allocation bits
set to 1. When a thread attempts a thread local allocation from it,
the allocation always fails and takes the slow path. This eliminates
the need for a null check for revoked runs.
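
A minimal sketch of the invalid-run idea, assuming a run with a single
32-bit bitmap word. SketchRun, InvalidRun, ThreadState and the other
names are hypothetical and are not the RosAlloc code. The thread local
pointer is never null, so the fast path dereferences it unconditionally:

```cpp
#include <cstdint>

// Hypothetical one-word run; a bit value of 1 means "slot taken".
struct SketchRun {
  uint32_t alloc_bits;

  // Returns the first free slot index, or -1 if every bit is set.
  int AllocSlot() {
    uint32_t free_bits = ~alloc_bits;
    if (free_bits == 0) {
      return -1;
    }
    int bit = 0;
    while ((free_bits & 1u) == 0) {
      free_bits >>= 1;
      ++bit;
    }
    alloc_bits |= uint32_t{1} << bit;
    return bit;
  }
};

// Shared sentinel whose bits are all 1, so allocation from it always fails.
inline SketchRun* InvalidRun() {
  static SketchRun full{~uint32_t{0}};
  return &full;
}

// Hypothetical per-thread state; starts out pointing at the sentinel.
struct ThreadState {
  SketchRun* local_run = InvalidRun();
};

// Fast path: no null check. A result of -1 means "take the slow path".
inline int AllocThreadLocal(ThreadState& thread) {
  return thread.local_run->AllocSlot();
}

// Revoking points the thread back at the sentinel instead of nullptr.
inline void RevokeThreadLocalRun(ThreadState& thread) {
  thread.local_run = InvalidRun();
}
```

The design choice is to reuse the existing "run is full" failure case
rather than add a separate branch for revoked runs.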

Changed zeroing of memory to happen during free; AllocPages should
always return zeroed memory. Added prefetching, which happens when we
allocate a run.
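
A minimal sketch of zeroing on free, assuming a fixed-size slot pool.
ZeroedPool and its members are hypothetical and are not the RosAlloc
code. Memory is cleared when it is returned, so the allocation path
can hand it out without touching it:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical fixed-size pool that keeps every free slot zeroed.
class ZeroedPool {
 public:
  ZeroedPool(size_t slot_size, size_t num_slots)
      : slot_size_(slot_size), storage_(slot_size * num_slots, 0) {
    for (size_t i = 0; i < num_slots; ++i) {
      free_slots_.push_back(i);
    }
  }

  // Returns zeroed memory, or nullptr if the pool is exhausted.
  // No memset here: slots were cleared at construction or on Free.
  unsigned char* Alloc() {
    if (free_slots_.empty()) {
      return nullptr;
    }
    size_t index = free_slots_.back();
    free_slots_.pop_back();
    return storage_.data() + index * slot_size_;
  }

  // Clears the slot before putting it back on the free list.
  void Free(unsigned char* slot) {
    std::memset(slot, 0, slot_size_);
    free_slots_.push_back(
        static_cast<size_t>(slot - storage_.data()) / slot_size_);
  }

 private:
  size_t slot_size_;
  std::vector<unsigned char> storage_;
  std::vector<size_t> free_slots_;
};
```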

Some refactoring to reduce duplicated code.

Ergonomics changes: changed kStickyGcThroughputAdjustment to 1.0,
which helps reduce GC time.

Measurements (3 samples per benchmark):
Before: MemAllocTest scores: 3463, 3445, 3431
EvaluateAndApplyChanges score | total GC time:
Iter 1: 3485, 23.602436s
Iter 2: 3434, 22.499882s
Iter 3: 3483, 23.253274s

After: MemAllocTest scores: 3495, 3417, 3409
EvaluateAndApplyChanges score | total GC time:
Iter 1: 3375, 17.463462s
Iter 2: 3358, 16.185188s
Iter 3: 3367, 15.822312s

Bug: 8788501
Bug: 11790317
Bug: 9986565
Change-Id: Ifd273a054824028dabed27c07c081dde1816f93c
diff --git a/runtime/base/mutex.cc b/runtime/base/mutex.cc
index fdd0249..2bc17bf 100644
--- a/runtime/base/mutex.cc
+++ b/runtime/base/mutex.cc
@@ -206,16 +206,16 @@
       os << "never contended";
     } else {
       os << "contended " << contention_count
-         << " times, average wait of contender " << PrettyDuration(wait_time / contention_count);
+         << " total wait of contender " << PrettyDuration(wait_time)
+         << " average " << PrettyDuration(wait_time / contention_count);
       SafeMap<uint64_t, size_t> most_common_blocker;
       SafeMap<uint64_t, size_t> most_common_blocked;
-      typedef SafeMap<uint64_t, size_t>::const_iterator It;
       for (size_t i = 0; i < kContentionLogSize; ++i) {
         uint64_t blocked_tid = log[i].blocked_tid;
         uint64_t owner_tid = log[i].owner_tid;
         uint32_t count = log[i].count;
         if (count > 0) {
-          It it = most_common_blocked.find(blocked_tid);
+          auto it = most_common_blocked.find(blocked_tid);
           if (it != most_common_blocked.end()) {
             most_common_blocked.Overwrite(blocked_tid, it->second + count);
           } else {
@@ -231,10 +231,10 @@
       }
       uint64_t max_tid = 0;
       size_t max_tid_count = 0;
-      for (It it = most_common_blocked.begin(); it != most_common_blocked.end(); ++it) {
-        if (it->second > max_tid_count) {
-          max_tid = it->first;
-          max_tid_count = it->second;
+      for (const auto& pair : most_common_blocked) {
+        if (pair.second > max_tid_count) {
+          max_tid = pair.first;
+          max_tid_count = pair.second;
         }
       }
       if (max_tid != 0) {
@@ -242,10 +242,10 @@
       }
       max_tid = 0;
       max_tid_count = 0;
-      for (It it = most_common_blocker.begin(); it != most_common_blocker.end(); ++it) {
-        if (it->second > max_tid_count) {
-          max_tid = it->first;
-          max_tid_count = it->second;
+      for (const auto& pair : most_common_blocker) {
+        if (pair.second > max_tid_count) {
+          max_tid = pair.first;
+          max_tid_count = pair.second;
         }
       }
       if (max_tid != 0) {