Optimize stack overflow handling.

For methods whose frame is smaller than a certain size, we now subtract
the frame size from the stack pointer before checking it against the
stack limit. Also changed the code to use slow paths instead of
launchpads.
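The small-frame check can be modelled roughly as follows. This is a sketch, not ART's code generator: StackModel, Headroom, RecordHeadroom and EnterFrame are hypothetical names. The threshold is assumed to be kStackOverflowReservedUsableBytes, since the text above only says "a certain size". The 16 KB reserve is taken from the #else branch in thread.h. The model shows why subtracting first is safe: sp can dip below the limit by at most the usable bytes, so at least kStackOverflowSignalReservedBytes stay free for a signal handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the constants added to runtime/thread.h, using the 16 KB
// kStackOverflowReservedBytes value from the #else branch (an assumption).
constexpr size_t KB = 1024;
constexpr size_t kStackOverflowReservedBytes = 16 * KB;
constexpr size_t kStackOverflowSignalReservedBytes = 2 * KB;
constexpr size_t kStackOverflowReservedUsableBytes =
    kStackOverflowReservedBytes - kStackOverflowSignalReservedBytes;

// Hypothetical model of a downward-growing stack (not an ART type).
// stack_end is the lowest address a frame may occupy; the reserved
// region is [stack_end - kStackOverflowReservedBytes, stack_end).
struct StackModel {
  uintptr_t sp;
  uintptr_t stack_end;
  size_t min_headroom;  // smallest free space ever left below sp
};

// Bytes between sp and the bottom of the reserved region.
size_t Headroom(const StackModel& s) {
  return s.sp - (s.stack_end - kStackOverflowReservedBytes);
}

void RecordHeadroom(StackModel& s) {
  if (Headroom(s) < s.min_headroom) s.min_headroom = Headroom(s);
}

// Models a prologue stack overflow check. Returns true if the frame was
// allocated, false if the slow path (throw StackOverflowError) is taken.
bool EnterFrame(StackModel& s, size_t frame_size) {
  assert(s.sp >= s.stack_end);  // code runs above the limit on entry
  if (frame_size <= kStackOverflowReservedUsableBytes) {
    // Small frame (assumed threshold): subtract first, then compare.
    // sp may briefly go below stack_end, but by at most the usable
    // bytes, leaving >= kStackOverflowSignalReservedBytes for a signal.
    s.sp -= frame_size;
    RecordHeadroom(s);
    if (s.sp < s.stack_end) {
      s.sp += frame_size;  // slow path: restore sp, then throw
      return false;
    }
    return true;
  }
  // Large frame: compute the new sp in a temporary and check it before
  // moving sp, so sp never goes below stack_end.
  uintptr_t new_sp = s.sp - frame_size;
  if (new_sp < s.stack_end) return false;
  s.sp = new_sp;
  RecordHeadroom(s);
  return true;
}
```

In the worst case (sp exactly at stack_end, frame of kStackOverflowReservedUsableBytes), the model leaves exactly kStackOverflowSignalReservedBytes of headroom during the check. Frames above the threshold must use the explicit compare instead.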

Delete kStackOverflow launchpad since it is no longer needed.

ARM optimizations:
One fewer move per stack overflow check (when the fault handler is not
used for stack overflows). Use ldr pc instead of ldr r12 followed by a
branch to r12.
Code size (boot.oat):
Before: 58405348
After:  57803236 (602112 smaller, about 1%)

TODO: X86 doesn't handle the large-frame case yet. This could cause an
incoming signal to go past the end of the stack (though this is unlikely).

Change-Id: Ie3a5635cd6fb09de27960e1f8cee45bfae38fb33
diff --git a/runtime/thread.h b/runtime/thread.h
index fdf976d..6cbd3d9 100644
--- a/runtime/thread.h
+++ b/runtime/thread.h
@@ -101,6 +101,12 @@
 #else
   static constexpr size_t kStackOverflowReservedBytes = 16 * KB;
 #endif
+  // How much of the reserved bytes is reserved for incoming signals.
+  static constexpr size_t kStackOverflowSignalReservedBytes = 2 * KB;
+  // How much of the reserved bytes we may temporarily use during stack overflow checks as an
+  // optimization.
+  static constexpr size_t kStackOverflowReservedUsableBytes =
+      kStackOverflowReservedBytes - kStackOverflowSignalReservedBytes;
 
   // Creates a new native thread corresponding to the given managed peer.
   // Used to implement Thread.start.