Add concurrent card graying for immune spaces

We now age the cards and gray the objects concurrently, before the GC
pause. This reduces the work required during the pause and allows
increasing the card size without regressing the GC pause time.

We rescan the cards in the pause and process only the cards that were
dirtied since the concurrent graying.
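
The two passes above can be illustrated with a minimal, single-threaded
sketch. It is not the ART implementation: ToyCardTable, the Card states,
and AgeAndGrayDirtyCards are hypothetical names invented for this
example, and a real collector would need atomic card updates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified card states; not ART's actual encoding.
enum class Card : uint8_t { kClean, kAged, kDirty };

// Toy card table: one state per card, no real heap behind it.
struct ToyCardTable {
  std::vector<Card> cards;

  explicit ToyCardTable(size_t num_cards) : cards(num_cards, Card::kClean) {}

  // Stand-in for the write barrier: a mutator write dirties the card.
  void MarkDirty(size_t index) {
    assert(index < cards.size());
    cards[index] = Card::kDirty;
  }
};

// Ages every dirty card and invokes gray_objects_on(card_index) for it.
// Clean and already-aged cards are skipped. Returns the number of cards
// processed. The card is aged before the visitor runs so that a later
// write dirties it again and a following pass picks it up.
template <typename Visitor>
size_t AgeAndGrayDirtyCards(ToyCardTable& table, Visitor&& gray_objects_on) {
  size_t processed = 0;
  for (size_t i = 0; i < table.cards.size(); ++i) {
    if (table.cards[i] == Card::kDirty) {
      table.cards[i] = Card::kAged;
      std::forward<Visitor>(gray_objects_on)(i);
      ++processed;
    }
  }
  return processed;
}
```

In this sketch the same function is called twice: once before the pause
over all dirty cards, and once inside the pause, where it only touches
cards that MarkDirty hit after the first call.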

The timings below are the pause time spent graying objects on maps
(Pixel). Avg is the per-GC value.

Disabled entrypoint switching for x86 and x86_64. This fixes a case
where the gray bit is set but the entrypoint is null, which results in
crashes.

Also reverted to checking "is gc marking" for x86 and x86_64 codegen
to prevent performance regressions.

128-byte cards without the change:
Sum: 1.912ms 99% C.I. 125us-244us Avg: 159.333us Max: 244us

512-byte cards without the change:
Sum: 12.027ms 99% C.I. 0.940ms-1.495ms Avg: 1.202ms Max: 1.495ms

512-byte cards with concurrent graying:
Sum: 1.385ms 99% C.I. 51us-239us Avg: 86.562us Max: 239us

Bug: 36457259
Bug: 12687968
Bug: 31022084

Test: test-art-host

(cherry picked from commit a3856d0d801f066b9b09649b3a17bdbb747f012d)

Change-Id: I7e8f8a5716f96dde827377234f854482452bc9cd
9 files changed