Use CLREX in ARM/ARM64 CAS intrinsic Baker read barrier slow paths.

Follow clang's implementation, which uses CLREX in
compare-and-exchange operations on the failure path, i.e.
when the value read by the LDREX (ARM) or LDXR (ARM64)
instruction is not the expected value, in order to release
the monitor.  The previous implementation was perfectly
correct, but this one may improve performance on some
micro-architectures.

This change only affects the
art::arm::ReadBarrierMarkAndUpdateFieldSlowPathARM and
art::arm64::ReadBarrierMarkAndUpdateFieldSlowPathARM64 slow
paths.

Test: make test-art-target-run-test-004-UnsafeTest
Bug: 29516905
Bug: 12687968
Change-Id: I99edd1ae6489dcec4a0089bfef52736114c6cd48
2 files changed