Change ProcessReferences to not use RecursiveMarkObject.

Calling ProcessMarkStack in RecursiveMarkObject caused a lot of
overhead due to timing logger splits. Changed the logic to be the
same as prior to the reference queue refactoring which involves
calling process mark stack after preserving soft references and
enqueueing finalizer references.

FinalizingGC longest pause is reduced by around 1/2 down to ~300ms.
Benchmark score ~400000 -> ~600000.

Also changed the timing logger splits in the GC to have (Paused) if
the split is a paused part of the GC.

Bug: 12129382

Change-Id: I7476d4f23670b19d70738e2fd48e37ec2f57e9f4
diff --git a/runtime/object_callbacks.h b/runtime/object_callbacks.h
index 6af338b..468ba08 100644
--- a/runtime/object_callbacks.h
+++ b/runtime/object_callbacks.h
@@ -60,6 +60,7 @@
 // address the object (if the object didn't move, returns the object input parameter).
 typedef mirror::Object* (IsMarkedCallback)(mirror::Object* object, void* arg)
     __attribute__((warn_unused_result));
+typedef void (ProcessMarkStackCallback)(void* arg);
 
 }  // namespace art