Reduce the number of fences needed for monitors

Add the necessary CasWeakAcquire primitives for LockWords.

Have MonitorEnter initially read the lockword using a
memory_order_relaxed operation. In the unlikely case we need more,
compensate with an explicit fence.

In the uncontended case, install the thin lock with Acquire,
rather than SequentiallyConsistent semantics.

Have MonitorExit use a Release instead of SequentiallyConsistent
CAS in the ReadBarrier case. Add TODO for the other case.

Together, these should usually eliminate 3 fences (or acq/rel)
per critical section.

Have Install() only use Release ordering.

Add TODO for inflation spinning, which looks to me like it could be
improved appreciably.

Drive-by fix:

GetMaxSpinsBeforeThinLockInflation spelling

Test: Build for several targets, boot, m art-test-host art-test-target

Change-Id: I2cab09723252065f6365e4234ee3249c69ece888
4 files changed