Move caller-saves saving/restoring to ReadBarrierMarkRegX.

Instead of saving/restoring live caller-save registers
before/after the call to read barrier mark entry points
ReadBarrierMarkRegX, have these entry points save/restore
all the caller-save registers themselves (except register
rX, which contains the return value).

Also refactor the assembly code of these entry points
using macros.

* Boot image code size variation on Nexus 5X
  (aosp_bullhead-userdebug build):
  - total ARM64 framework Oat files size change:
    119196792 bytes -> 115575920 bytes (-3.04%)
  - total ARM framework Oat files size change:
    100435212 bytes -> 97621188 bytes (-2.80%)

* Benchmarks (ARM64) score variations on Nexus 5X
  (aosp_bullhead-userdebug build):
  - RitzPerf (lower is better)
    - average score difference: -2.71%
  - CaffeineMark (higher is better)
    - no real difference for most tests
      (absolute variation lower than 1%)
    - better score on the "Method" benchmark:
      score variation 41253 -> 44891 (+8.82%)

Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I881bf73139a3f1c2bee9ffc6fc8c00f9a392afa6
diff --git a/compiler/optimizing/code_generator_x86_64.cc b/compiler/optimizing/code_generator_x86_64.cc
index a015893..0d85bea 100644
--- a/compiler/optimizing/code_generator_x86_64.cc
+++ b/compiler/optimizing/code_generator_x86_64.cc
@@ -493,11 +493,9 @@
         << instruction_->DebugName();
 
     __ Bind(GetEntryLabel());
-    // Save live registers before the runtime call, and in particular
-    // RDI and/or RAX (if they are live), as they are clobbered by
-    // functions art_quick_read_barrier_mark_regX.
-    SaveLiveRegisters(codegen, locations);
-
+    // No need to save live registers; it's taken care of by the
+    // entrypoint. Also, there is no need to update the stack mask,
+    // as this runtime call will not trigger a garbage collection.
     InvokeRuntimeCallingConvention calling_convention;
     CodeGeneratorX86_64* x86_64_codegen = down_cast<CodeGeneratorX86_64*>(codegen);
     DCHECK_NE(reg, RSP);
@@ -523,8 +521,6 @@
                                   instruction_,
                                   instruction_->GetDexPc(),
                                   this);
-
-    RestoreLiveRegisters(codegen, locations);
     __ jmp(GetExitLabel());
   }