Change ProcessReferences to not use RecursiveMarkObject.

Calling ProcessMarkStack in RecursiveMarkObject caused a lot of
overhead due to timing logger splits. Changed the logic to be the
same as prior to the reference queue refactoring which involves
calling process mark stack after preserving soft references and
enqueueing finalizer references.

FinalizingGC longest pause is reduced by around 1/2 down to ~300ms.
Benchmark score ~400000 -> ~600000.

Also changed the timing logger splits in the GC to have (Paused) if
the split is a paused part of the GC.

Bug: 12129382

Change-Id: I7476d4f23670b19d70738e2fd48e37ec2f57e9f4
diff --git a/runtime/gc/collector/mark_sweep.h b/runtime/gc/collector/mark_sweep.h
index 6a48cf7..963b9ea 100644
--- a/runtime/gc/collector/mark_sweep.h
+++ b/runtime/gc/collector/mark_sweep.h
@@ -176,7 +176,7 @@
       SHARED_LOCKS_REQUIRED(Locks::heap_bitmap_lock_,
                             Locks::mutator_lock_);
 
-  static mirror::Object* RecursiveMarkObjectCallback(mirror::Object* obj, void* arg)
+  static mirror::Object* MarkObjectCallback(mirror::Object* obj, void* arg)
       SHARED_LOCKS_REQUIRED(Locks::mutator_lock_)
       EXCLUSIVE_LOCKS_REQUIRED(Locks::heap_bitmap_lock_);
 
@@ -185,9 +185,8 @@
       SHARED_LOCKS_REQUIRED(Locks::mutator_lock_)
       EXCLUSIVE_LOCKS_REQUIRED(Locks::heap_bitmap_lock_);
 
-  static mirror::Object* MarkObjectCallback(mirror::Object* object, void* arg)
-        SHARED_LOCKS_REQUIRED(Locks::mutator_lock_)
-        EXCLUSIVE_LOCKS_REQUIRED(Locks::heap_bitmap_lock_);
+  static void ProcessMarkStackPausedCallback(void* arg)
+      EXCLUSIVE_LOCKS_REQUIRED(Locks::heap_bitmap_lock_, Locks::mutator_lock_);
 
   static void MarkRootParallelCallback(mirror::Object** root, void* arg, uint32_t thread_id,
                                        RootType root_type)