Rosalloc fast path in assembly for x86_64.

Measurements (host, ms)
    BinaryTrees:   324 ->  360 (+11%)
    BinaryTrees with 64 MB alloc stack + 1 GB heap:
                   299 ->  275  (-8%)
    MemAllocTest:  414 ->  368 (-11%)

Interestingly, BinaryTrees gets slower with the default settings because
of more blocking (gc-for-alloc) collections. It seems that, because
allocations are faster, the allocation stack size and the heap size
become the bottleneck (note that both an allocation stack overflow and
heap exhaustion cause gc-for-alloc collections). With a larger
allocation stack and heap, where no blocking collections are observed,
BinaryTrees gets faster.
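
As a rough illustration of the two triggers mentioned above, here is a
toy model, not ART code. Every name in it (ToyHeap, CollectBlocking,
CountBlockingGcs) is hypothetical, and it assumes a blocking collection
frees the whole heap and drains the allocation stack. It only counts
how often each limit forces a blocking collection; it does not model
timing or the concurrent collector.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: hypothetical names, not the ART heap implementation.
struct ToyHeap {
  std::size_t heap_capacity;         // bytes the heap can hold
  std::size_t alloc_stack_capacity;  // entries the allocation stack can hold
  std::size_t heap_used;
  std::size_t alloc_stack_used;
  std::size_t blocking_gcs;

  ToyHeap(std::size_t heap_cap, std::size_t stack_cap)
      : heap_capacity(heap_cap), alloc_stack_capacity(stack_cap),
        heap_used(0), alloc_stack_used(0), blocking_gcs(0) {}

  // Assumption: a blocking collection reclaims everything.
  void CollectBlocking() {
    heap_used = 0;
    alloc_stack_used = 0;
    ++blocking_gcs;
  }

  void Allocate(std::size_t bytes) {
    assert(bytes <= heap_capacity);
    // Either limit being hit forces a blocking (gc-for-alloc) collection.
    const bool stack_full = alloc_stack_used == alloc_stack_capacity;
    const bool heap_full = heap_used + bytes > heap_capacity;
    if (stack_full || heap_full) {
      CollectBlocking();
    }
    heap_used += bytes;
    ++alloc_stack_used;
  }
};

// Runs n_allocs allocations of bytes_each and returns the number of
// blocking collections the toy heap needed.
std::size_t CountBlockingGcs(std::size_t heap_capacity,
                             std::size_t alloc_stack_capacity,
                             std::size_t n_allocs,
                             std::size_t bytes_each) {
  ToyHeap heap(heap_capacity, alloc_stack_capacity);
  for (std::size_t i = 0; i < n_allocs; ++i) {
    heap.Allocate(bytes_each);
  }
  return heap.blocking_gcs;
}
```

In this model, a small allocation stack or a small heap each force
blocking collections on their own, while making both large enough for
the workload removes them entirely.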

Bug: 9986565
Change-Id: I642b9fecd0a583cc133998c2f3932de815c4a757
1 file changed