ARM/ARM64: Use trampolines for slow-path entrypoint calls.

This reduces the size of the generated code. We do this only
for AOT compilation where we get the most benefit.

Sizes of aosp_taimen-userdebug prebuilts:
 - before:
   arm/boot*.oat: 19624804
   arm64/boot*.oat: 23265752
   oat/arm64/services.odex: 22417968
 - after:
   arm/boot*.oat: 19460500 (-160KiB)
   arm64/boot*.oat: 22957928 (-301KiB)
   oat/arm64/services.odex: 21957864 (-449KiB)

Test: m test-art-host-gtest
Test: aosp_taimen-userdebug boots.
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 12607709
Change-Id: Ie9dbd1ba256173e4e439e8bbb8832a791965cbe6
29 files changed