Move 64-bit multiplication to helper

We're right on the edge of what our current temp register pool
allocation can support for inline 64-bit arithmetic.  Move 64-bit
multiplication out of line to sidestep the problem, and add some temp
frees to 3-operand long ops.  In the latter case there was a potential
problem if the result long was located in a part of the frame outside
the range of a single base+displacement store.
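
For reference, a rough sketch of the arithmetic a 64-bit multiply
involves when only 32-bit registers are available.  This is an
illustration, not the code in __aeabi_lmul or in the compiler; the
name MulLongSketch is hypothetical.  It shows why the inline version
is register-hungry: four 32-bit input halves plus intermediates are
live at once.

```cpp
#include <cstdint>

// Hypothetical illustration only: computes the low 64 bits of a * b from
// 32-bit halves. Not the actual runtime helper.
uint64_t MulLongSketch(uint64_t a, uint64_t b) {
  uint32_t a_lo = static_cast<uint32_t>(a);
  uint32_t a_hi = static_cast<uint32_t>(a >> 32);
  uint32_t b_lo = static_cast<uint32_t>(b);
  uint32_t b_hi = static_cast<uint32_t>(b >> 32);

  // Full 32x32->64 product of the low halves.
  uint64_t low = static_cast<uint64_t>(a_lo) * b_lo;

  // Cross terms only affect the high word, so 32-bit wraparound is fine.
  // The a_hi * b_hi term lands entirely above bit 63 and is dropped.
  uint32_t cross = a_lo * b_hi + a_hi * b_lo;

  return low + (static_cast<uint64_t>(cross) << 32);
}
```

Because the result is truncated to 64 bits, the same bit pattern is
correct for signed (two's complement) operands as well.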

Change-Id: I6f8e0a11b440ed35e08f2e3457de6cbea89cfccc
diff --git a/src/thread.cc b/src/thread.cc
index ed902ab..5fdff78 100644
--- a/src/thread.cc
+++ b/src/thread.cc
@@ -59,6 +59,7 @@
   pArtF2l = artF2L;
   pArtD2l = artD2L;
   pLdivmod = __aeabi_ldivmod;
+  pLmul = __aeabi_lmul;
 #endif
   pArtAllocArrayByClass = Array::Alloc;
   pMemcpy = memcpy;