ARM64: Tune SIMD loop unrolling factor heuristic.

Improve the SIMD loop unrolling factor heuristic for ARM64 by
accounting for the maximum desired loop size and the loop trip
count, among other factors. The following byte-array copy loop
shows a 21% performance improvement:

  for (int i = 0; i < LENGTH; i++) {
    bc[i] = ba[i];  // Byte arrays
  }
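
For illustration only, a minimal sketch of how such a heuristic could
combine a maximum desired loop size with the trip count. This is not
the code in this change: the class name, the constants (64
instructions, factor cap of 8), the power-of-two rounding, and the
convention that a trip count of 0 means "unknown" are all assumptions.

```java
// Hypothetical sketch; names and constants are NOT taken from the ART sources.
final class UnrollHeuristicSketch {
    // Assumed cap on the number of instructions in the unrolled loop body.
    static final int MAX_UNROLLED_LOOP_SIZE = 64;
    // Assumed upper bound on the unrolling factor.
    static final int MAX_UNROLL_FACTOR = 8;
    static final int NO_UNROLLING = 1;

    /**
     * @param bodySize     instructions in the vectorized loop body
     * @param tripCount    scalar trip count, or 0 if unknown (assumed convention)
     * @param vectorLength elements processed per SIMD operation
     * @return the unrolling factor, a power of two in [1, MAX_UNROLL_FACTOR]
     */
    static int unrollFactor(int bodySize, long tripCount, int vectorLength) {
        if (bodySize <= 0 || vectorLength <= 0) {
            return NO_UNROLLING;
        }
        // A known trip count that cannot fill two vector iterations
        // leaves nothing to unroll.
        if (tripCount != 0 && tripCount < 2L * vectorLength) {
            return NO_UNROLLING;
        }
        // Limit imposed by the maximum desired loop size.
        int bySize = MAX_UNROLLED_LOOP_SIZE / bodySize;
        // Limit imposed by the number of available vector iterations.
        long byTrip = (tripCount == 0) ? MAX_UNROLL_FACTOR : tripCount / vectorLength;
        long factor = Math.min(Math.min(bySize, byTrip), MAX_UNROLL_FACTOR);
        if (factor < 2) {
            return NO_UNROLLING;
        }
        // Round down to a power of two to keep the cleanup loop simple.
        return Integer.highestOneBit((int) factor);
    }
}
```

With these assumed constants, a 4-instruction body over 16-lane byte
vectors and an unknown trip count would be unrolled by the capped
factor, while a 40-instruction body would not be unrolled at all.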

Test: test-art-host, test-art-target.
Change-Id: Ic587759c51aa4354df621ffb1c7ce4ebd798dfc1
3 files changed