am 8881a809: Sweep concurrently.

Merge commit '8881a8098e259a1faf392d20c1fefc1ee4a63b20' into dalvik-dev

* commit '8881a8098e259a1faf392d20c1fefc1ee4a63b20':
  Sweep concurrently.
diff --git a/docs/jni-tips.html b/docs/jni-tips.html
index 76cb03f..33f2c25 100644
--- a/docs/jni-tips.html
+++ b/docs/jni-tips.html
@@ -35,6 +35,15 @@
 </li>
 
 <li> <a href="#Unsupported">Unsupported Features</a>
+</li>
+
+<li> <a href="#FAQUnsatisfied">FAQ: UnsatisfiedLinkError</a>
+</li>
+<li> <a href="#FAQFindClass">FAQ: FindClass didn't find my class</a>
+</li>
+<li> <a href="#FAQSharing">FAQ: Sharing raw data with native code</a>
+</li>
+
 </ul>
 <p>
 <noautolink>
@@ -72,7 +81,8 @@
 If a piece of code has no other way to get its JNIEnv, you should share
 the JavaVM, and use JavaVM-&gt;GetEnv to discover the thread's JNIEnv.
 </p><p>
-The C and C++ declarations of JNIEnv and JavaVM are different.  "jni.h" provides different typedefs
+The C declarations of JNIEnv and JavaVM are different from the C++
+declarations.  "jni.h" provides different typedefs
 depending on whether it's included into ".c" or ".cpp".  For this reason it's a bad idea to
 include JNIEnv arguments in header files included by both languages.  (Put another way: if your
 header file requires "#ifdef __cplusplus", you may have to do some extra work if anything in
@@ -284,7 +294,7 @@
 </p><p>
 One reason for checking the <code>isCopy</code> flag is to know if
 you need to call <code>Release</code> with <code>JNI_COMMIT</code>
-after making changes to an array -- if you're alternating between making
+after making changes to an array &mdash; if you're alternating between making
 changes and executing code that uses the contents of the array, you may be
 able to
 skip the no-op commit.  Another possible reason for checking the flag is for
@@ -331,11 +341,11 @@
     env->GetByteArrayRegion(array, 0, len, buffer);
 </pre>
 </p><p>
-This accomplishes the same thing, with several advantages:
+This has several advantages:
 <ul>
     <li>Requires one JNI call instead of 2, reducing overhead.
     <li>Doesn't require pinning or extra data copies.
-    <li>Reduces the risk of programmer error -- no risk of forgetting
+    <li>Reduces the risk of programmer error &mdash; no risk of forgetting
     to call <code>Release</code> after something fails.
 </ul>
 </p><p>
@@ -372,6 +382,13 @@
     <li>ReleaseStringUTFChars
 </ul></font>
 </p><p>
+Many JNI calls can throw an exception, but often provide a simpler way
+of checking for failure.  For example, if <code>NewString</code> returns
+a non-NULL value, you don't need to check for an exception.  However, if
+you call a method (using a function like <code>CallObjectMethod</code>),
+you must always check for an exception, because the return value is not
+going to be valid if an exception was thrown.
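+For example, a sketch of the invoke-and-check pattern (assuming a
+valid object "obj" and method ID "mid"):
+<pre>    jobject result = (*env)->CallObjectMethod(env, obj, mid);
+    if ((*env)->ExceptionCheck(env)) {
+        /* an exception was thrown; "result" must not be used */
+    }</pre>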
+</p><p>
 Note that exceptions thrown by interpreted code do not "leap over" native code,
 and C++ exceptions thrown by native code are not handled by Dalvik.
 The JNI <code>Throw</code> and <code>ThrowNew</code> instructions just
@@ -400,12 +417,14 @@
 goal is to minimize the overhead on the assumption that, if you've written it in native code,
 you probably did it for performance reasons.
 </p><p>
-Some VMs support extended checking with the "<code>-Xcheck:jni</code>" flag.  If the flag is set, the VM
-puts a different table of functions into the JavaVM and JNIEnv pointers.  These functions do
-an extended series of checks before calling the standard implementation.
+In Dalvik, you can enable additional checks by setting the
+"<code>-Xcheck:jni</code>" flag.  If the flag is set, the VM directs
+the JavaVM and JNIEnv pointers to a different table of functions.
+These functions perform an extended series of checks before calling the
+standard implementation.
 
 </p><p>
-Some things that may be checked:
+The additional tests include:
 </p><p>
 </p>
 <ul>
@@ -416,6 +435,9 @@
 </li>
 <li> Field type correctness, e.g. don't store a HashMap in a String field.
 </li>
+<li> Ensure jmethodID is appropriate when making a static or virtual
+method call.
+</li>
 <li> Check to see if an exception is pending on calls where pending exceptions are not legal.
 </li>
 <li> Check for calls to inappropriate functions between Critical get/release calls.
@@ -431,8 +453,7 @@
 <p>Accessibility of methods and fields (i.e. public vs. private) is not
 checked.
 <p>
-The Dalvik VM supports the <code>-Xcheck:jni</code> flag.  For a
-description of how to enable it for Android apps, see
+For a description of how to enable CheckJNI for Android apps, see
 <a href="embedded-vm-control.html">Controlling the Embedded VM</a>.
 It's currently enabled by default in the Android emulator and on
 "engineering" device builds.
@@ -483,13 +504,13 @@
 </p><blockquote><pre>jint JNI_OnLoad(JavaVM* vm, void* reserved)
 {
     JNIEnv* env;
-    if ((*vm)->GetEnv(vm, (void**) &env, JNI_VERSION_1_4) != JNI_OK)
+    if ((*vm)->GetEnv(vm, (void**) &env, JNI_VERSION_1_6) != JNI_OK)
         return -1;
 
     /* get class with (*env)->FindClass */
     /* register methods with (*env)->RegisterNatives */
 
-    return JNI_VERSION_1_4;
+    return JNI_VERSION_1_6;
 }
 </pre></blockquote>
 </p><p>
@@ -513,7 +534,9 @@
 that was used to load the shared library.  Normally <code>FindClass</code>
 uses the loader associated with the method at the top of the interpreted
 stack, or if there isn't one (because the thread was just attached to
-the VM) it uses the "system" class loader.
+the VM) it uses the "system" class loader.  This makes
+<code>JNI_OnLoad</code> a convenient place to look up and cache class
+object references.
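+For example, a minimal caching sketch (the class name and the global
+variable are hypothetical):
+<pre>    jclass localClass = (*env)->FindClass(env, "com/example/Foo");
+    /* promote to a global ref so it outlives JNI_OnLoad */
+    gFooClass = (*env)->NewGlobalRef(env, localClass);</pre>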
 </p><p>
 
 
@@ -543,10 +566,169 @@
     programmers to create hard references to weak globals before doing
     anything with them, so this should not be at all limiting.)</li>
     <li><code>GetObjectRefType</code> (new in 1.6) is implemented but not fully
-    functional -- it can't always tell the difference between "local" and
+    functional &mdash; it can't always tell the difference between "local" and
     "global" references.</li>
 </ul>
 
+<p>For backward compatibility, you may need to be aware of:
+<ul>
+    <li>Until 2.0 ("Eclair"), the '$' character was not properly
+    converted to "_00024" during searches for method names.  Working
+    around this requires using explicit registration or moving the
+    native methods out of inner classes.
+    <li>"Weak global" references were not implemented until 2.2 ("Froyo").
+    Older VMs will vigorously reject attempts to use them.  You can use
+    the Android platform version constants to test for support.
+</ul>
+
+
+</p><h2><a name="FAQUnsatisfied"> FAQ: UnsatisfiedLinkError </a></h2>
+<p>
+When working on native code it's not uncommon to see a failure like this:
+<pre>java.lang.UnsatisfiedLinkError: Library foo not found</pre>
+<p>
+In some cases it means what it says &mdash; the library wasn't found.  In
+other cases the library exists but couldn't be opened by dlopen(), and
+the details of the failure can be found in logcat.  For example:
+<pre>D/dalvikvm(  870): Trying to load lib /sdcard/libfoo.so 0x4001ff48
+I/dalvikvm(  870): Unable to dlopen(/sdcard/libfoo.so): Cannot load library: /sdcard/libfoo.so is not a valid ELF object
+</pre>
+<p>
+Common reasons why you might encounter "library not found" exceptions:
+<ul>
+    <li>The library doesn't exist or isn't accessible to the app.  Use
+    <code>adb shell ls -l &lt;path&gt;</code> to check its presence
+    and permissions.
+    <li>The library wasn't built with the NDK.  This can result in
+    dependencies on functions or libraries that don't exist on the device.
+</ul>
+</p><p>
+Another class of <code>UnsatisfiedLinkError</code> failures looks like:
+<pre>java.lang.UnsatisfiedLinkError: myfunc
+        at Foo.myfunc(Native Method)
+        at Foo.main(Foo.java:10)</pre>
+<p>
+In logcat, you'll see:
+<pre>W/dalvikvm(  880): No implementation found for native LFoo;.myfunc ()V</pre>
+<p>
+This means that the VM tried to find a matching method but was unsuccessful.
+Some common reasons for this are:
+<ul>
+    <li>The library isn't getting loaded.  Check the logcat output for
+    messages about library loading.
+    <li>The method isn't being found due to a name or signature mismatch.  This
+    is commonly caused by:
+    <ul>
+        <li>For lazy method lookup, failing to declare C++ functions
+        with <code>extern "C"</code>.  You can use <code>arm-eabi-nm</code>
+        to see the symbols as they appear in the library; if they look
+        mangled (e.g. <code>_Z15Java_Foo_myfuncP7_JNIEnvP7_jclass</code>
+        rather than <code>Java_Foo_myfunc</code>) then you need to
+        adjust the declaration (see the sketch after this list).
+        <li>For explicit registration, minor errors when entering the
+        method signature (a registration sketch also follows this list).
+        Make sure that what you're passing to the registration call
+        matches the signature in the log file.
+        Remember that 'B' is <code>byte</code> and 'Z' is <code>boolean</code>.
+        Class name components in signatures start with 'L', end with ';',
+        use '/' to separate package/class names, and use '$' to separate
+        inner-class names
+        (e.g. <code>Ljava/util/Map$Entry;</code>).
+    </ul>
+</ul>
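+<p>
+For the lazy-lookup case, the C++ declaration needs C linkage.  A
+sketch, reusing the <code>Foo.myfunc</code> example above:
+<pre>    extern "C" void Java_Foo_myfunc(JNIEnv* env, jclass clazz);</pre>
+<p>
+For explicit registration, a sketch of a matching method table (error
+checks omitted):
+<pre>    static void myfunc(JNIEnv* env, jclass clazz);
+
+    static const JNINativeMethod gMethods[] = {
+        { "myfunc", "()V", (void*) myfunc },
+    };
+    jclass fooClass = (*env)->FindClass(env, "Foo");
+    (*env)->RegisterNatives(env, fooClass, gMethods,
+        sizeof(gMethods) / sizeof(gMethods[0]));</pre>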
+<p>
+Using <code>javah</code> to automatically generate JNI headers may help
+avoid some problems.
+
+
+</p><h2><a name="FAQFindClass"> FAQ: FindClass didn't find my class </a></h2>
+<p>
+Make sure that the class name string has the correct format.  JNI class
+names start with the package name and are separated with slashes,
+e.g. <code>java/lang/String</code>.  If you're looking up an array class,
+you need to start with the appropriate number of square brackets and
+must also wrap the class with 'L' and ';', so a one-dimensional array of
+<code>String</code> would be <code>[Ljava/lang/String;</code>.
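+For example:
+<pre>    jclass stringArray = (*env)->FindClass(env, "[Ljava/lang/String;");</pre>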
+</p><p>
+If the class name looks right, you could be running into a class loader
+issue.  <code>FindClass</code> wants to start the class search in the
+class loader associated with your code.  It examines the VM call stack,
+which will look something like:
+<pre>    Foo.myfunc(Native Method)
+    Foo.main(Foo.java:10)
+    dalvik.system.NativeStart.main(Native Method)</pre>
+<p>
+The topmost method is <code>Foo.myfunc</code>.  <code>FindClass</code>
+finds the <code>ClassLoader</code> object associated with the <code>Foo</code>
+class and uses that.
+</p><p>
+This usually does what you want.  You can get into trouble if you
+create a thread outside the VM (perhaps by calling <code>pthread_create</code>
+and then attaching it to the VM with <code>AttachCurrentThread</code>).
+Now the stack trace looks like this:
+<pre>    dalvik.system.NativeStart.run(Native Method)</pre>
+<p>
+The topmost method is <code>NativeStart.run</code>, which isn't part of
+your application.  If you call <code>FindClass</code> from this thread, the
+VM will start in the "system" class loader instead of the one associated
+with your application, so attempts to find app-specific classes will fail.
+</p><p>
+There are a few ways to work around this:
+<ul>
+    <li>Do your <code>FindClass</code> lookups once, in
+    <code>JNI_OnLoad</code>, and cache the class references for later
+    use.  Any <code>FindClass</code> calls made as part of executing
+    <code>JNI_OnLoad</code> will use the class loader associated with
+    the function that called <code>System.loadLibrary</code> (this is a
+    special rule, provided to make library initialization more convenient).
+    If your app code is loading the library, <code>FindClass</code>
+    will use the correct class loader.
+    <li>Pass an instance of the class into the functions that need
+    it, e.g. declare your native method to take a Class argument and
+    then pass <code>Foo.class</code> in.
+    <li>Cache a reference to the <code>ClassLoader</code> object somewhere
+    handy, and issue <code>loadClass</code> calls directly (a sketch
+    follows this list).  This requires some effort.
+</ul>
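+<p>
+A sketch of the last approach, assuming <code>gClassLoader</code> and
+<code>gLoadClass</code> (the <code>loadClass</code> method ID) were
+cached earlier, e.g. in <code>JNI_OnLoad</code>:
+<pre>    jstring name = (*env)->NewStringUTF(env, "com.example.Foo");
+    jclass clazz = (jclass) (*env)->CallObjectMethod(env,
+            gClassLoader, gLoadClass, name);
+    if ((*env)->ExceptionCheck(env)) {
+        /* lookup failed; "clazz" is not valid */
+    }</pre>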
+
+</p><p>
+
+
+</p><h2><a name="FAQSharing"> FAQ: Sharing raw data with native code </a></h2>
+<p>
+You may find yourself in a situation where you need to access a large
+buffer of raw data from code written in Java and C/C++.  Common examples
+include manipulation of bitmaps or sound samples.  There are two
+basic approaches.
+</p><p>
+You can store the data in a <code>byte[]</code>.  This allows very fast
+access from code written in Java.  On the native side, however, you're
+not guaranteed to be able to access the data without having to copy it.  In
+some implementations, <code>GetByteArrayElements</code> and
+<code>GetPrimitiveArrayCritical</code> will return actual pointers to the
+raw data in the managed heap, but in others they will allocate a buffer
+on the native heap and copy the data over.
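+For example, a sketch of the array approach (releasing with 0 copies
+the data back, if necessary, and frees any native-side buffer):
+<pre>    jboolean isCopy;
+    jbyte* data = (*env)->GetByteArrayElements(env, array, &isCopy);
+    /* ... read or write the data ... */
+    (*env)->ReleaseByteArrayElements(env, array, data, 0);</pre>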
+</p><p>
+The alternative is to store the data in a direct byte buffer.  These
+can be created with <code>java.nio.ByteBuffer.allocateDirect</code>, or
+the JNI <code>NewDirectByteBuffer</code> function.  Unlike regular
+byte buffers, the storage is not allocated on the managed heap, and can
+always be accessed directly from native code (get the address
+with <code>GetDirectBufferAddress</code>).  Depending on how direct
+byte buffer access is implemented in the VM, accessing the data from code
+written in Java can be very slow.
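+For example, a sketch of the native side, given a direct
+<code>ByteBuffer</code> passed in as a <code>jobject</code>:
+<pre>    void* data = (*env)->GetDirectBufferAddress(env, buffer);
+    jlong capacity = (*env)->GetDirectBufferCapacity(env, buffer);</pre>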
+</p><p>
+The choice of which to use depends on two factors:
+<ol>
+    <li>Will most of the data accesses happen from code written in Java
+    or in C/C++?
+    <li>If the data is eventually being passed to a system API, what form
+    must it be in?  (For example, if the data is ultimately passed to a
+    function that takes a <code>byte[]</code>, doing processing in a direct
+    <code>ByteBuffer</code> might be unwise.)
+</ol>
+If there's no clear winner, use a direct byte buffer.  Support for them
+is built directly into JNI, and access to them from code written in
+Java can be made faster with VM improvements.
 </p>
 
 <address>Copyright &copy; 2008 The Android Open Source Project</address>
diff --git a/vm/AtomicCache.c b/vm/AtomicCache.c
index bf8639e..a6f48c2 100644
--- a/vm/AtomicCache.c
+++ b/vm/AtomicCache.c
@@ -135,24 +135,25 @@
      * while we work.  We use memory barriers to ensure that the state
      * is always consistent when the version number is even.
      */
-    pEntry->version++;
-    ANDROID_MEMBAR_FULL();
+    u4 newVersion = (firstVersion | ATOMIC_LOCK_FLAG) + 1;
+    assert((newVersion & 0x01) == 1);
 
-    pEntry->key1 = key1;
+    pEntry->version = newVersion;
+
+    android_atomic_release_store(key1, (int32_t*) &pEntry->key1);
     pEntry->key2 = key2;
     pEntry->value = value;
 
-    ANDROID_MEMBAR_FULL();
-    pEntry->version++;
+    newVersion++;
+    android_atomic_release_store(newVersion, (int32_t*) &pEntry->version);
 
     /*
      * Clear the lock flag.  Nobody else should have been able to modify
      * pEntry->version, so if this fails the world is broken.
      */
-    firstVersion += 2;
-    assert((firstVersion & 0x01) == 0);
+    assert(newVersion == ((firstVersion + 2) | ATOMIC_LOCK_FLAG));
     if (android_atomic_release_cas(
-            firstVersion | ATOMIC_LOCK_FLAG, firstVersion,
+            newVersion, newVersion & ~ATOMIC_LOCK_FLAG,
             (volatile s4*) &pEntry->version) != 0)
     {
         //LOGE("unable to reset the instanceof cache ownership\n");
diff --git a/vm/AtomicCache.h b/vm/AtomicCache.h
index 66d222e..686eac5 100644
--- a/vm/AtomicCache.h
+++ b/vm/AtomicCache.h
@@ -95,28 +95,26 @@
 #define ATOMIC_CACHE_LOOKUP(_cache, _cacheSize, _key1, _key2) ({            \
     AtomicCacheEntry* pEntry;                                               \
     int hash;                                                               \
-    u4 firstVersion;                                                        \
+    u4 firstVersion, secondVersion;                                         \
     u4 value;                                                               \
                                                                             \
     /* simple hash function */                                              \
     hash = (((u4)(_key1) >> 2) ^ (u4)(_key2)) & ((_cacheSize)-1);           \
     pEntry = (_cache)->entries + hash;                                      \
                                                                             \
-    firstVersion = pEntry->version;                                         \
-    ANDROID_MEMBAR_FULL();                                                  \
+    firstVersion = android_atomic_acquire_load((int32_t*)&pEntry->version); \
                                                                             \
     if (pEntry->key1 == (u4)(_key1) && pEntry->key2 == (u4)(_key2)) {       \
         /*                                                                  \
          * The fields match.  Get the value, then read the version a        \
          * second time to verify that we didn't catch a partial update.     \
          * We're also hosed if "firstVersion" was odd, indicating that      \
-         * an update was in progress before we got here.                    \
+         * an update was in progress before we got here (unlikely).         \
          */                                                                 \
-        value = pEntry->value;                                              \
-        ANDROID_MEMBAR_FULL();                                              \
+        value = android_atomic_acquire_load((int32_t*) &pEntry->value);     \
+        secondVersion = pEntry->version;                                    \
                                                                             \
-        if ((firstVersion & 0x01) != 0 || firstVersion != pEntry->version)  \
-        {                                                                   \
+        if ((firstVersion & 0x01) != 0 || firstVersion != secondVersion) {  \
             /*                                                              \
              * We clashed with another thread.  Instead of sitting and      \
              * spinning, which might not complete if we're a high priority  \
diff --git a/vm/Globals.h b/vm/Globals.h
index 90e9e88..954bb5d 100644
--- a/vm/Globals.h
+++ b/vm/Globals.h
@@ -296,7 +296,7 @@
     //int         offJavaNioDirectByteBufferImpl_pointer;
 
     /* method pointers - java.security.AccessController */
-    volatile bool javaSecurityAccessControllerReady;
+    volatile int javaSecurityAccessControllerReady;
     Method*     methJavaSecurityAccessController_doPrivileged[4];
 
     /* constructor method pointers; no vtable involved, so use Method* */
diff --git a/vm/Init.c b/vm/Init.c
index 6c8ec9e..232b7a6 100644
--- a/vm/Init.c
+++ b/vm/Init.c
@@ -170,7 +170,7 @@
 #ifdef WITH_EXTRA_GC_CHECKS
         " extra_gc_checks"
 #endif
-#ifdef WITH_DALVIK_ASSERT
+#if !defined(NDEBUG) && defined(WITH_DALVIK_ASSERT)
         " dalvik_assert"
 #endif
 #ifdef WITH_JNI_STACK_CHECK
diff --git a/vm/Profile.c b/vm/Profile.c
index 57c96df..92d4001 100644
--- a/vm/Profile.c
+++ b/vm/Profile.c
@@ -407,13 +407,11 @@
     storeLongLE(state->buf + 8, state->startWhen);
     state->curOffset = TRACE_HEADER_LEN;
 
-    ANDROID_MEMBAR_FULL();
-
     /*
      * Set the "enabled" flag.  Once we do this, threads will wait to be
      * signaled before exiting, so we have to make sure we wake them up.
      */
-    state->traceEnabled = true;
+    android_atomic_release_store(true, &state->traceEnabled);
     dvmUnlockMutex(&state->startStopLock);
     return;
 
diff --git a/vm/Profile.h b/vm/Profile.h
index a294f83..afa529d 100644
--- a/vm/Profile.h
+++ b/vm/Profile.h
@@ -53,7 +53,7 @@
     int     bufferSize;
     int     flags;
 
-    bool    traceEnabled;
+    int     traceEnabled;
     u1*     buf;
     volatile int curOffset;
     u8      startWhen;
diff --git a/vm/Thread.c b/vm/Thread.c
index 40c79c4..9024312 100644
--- a/vm/Thread.c
+++ b/vm/Thread.c
@@ -472,6 +472,11 @@
  * the lock.  Nobody can suspend all threads without holding the thread list
  * lock while they do it, so by definition there isn't a GC in progress.
  *
+ * This function deliberately avoids the use of dvmChangeStatus(),
+ * which could grab threadSuspendCountLock.  To avoid deadlock, threads
+ * are required to grab the thread list lock before the thread suspend
+ * count lock.  (See comment in DvmGlobals.)
+ *
  * TODO: consider checking for suspend after acquiring the lock, and
  * backing off if set.  As stated above, it can't happen during normal
  * execution, but it *can* happen during shutdown when daemon threads
@@ -2262,7 +2267,7 @@
      * parallel with us from here out.  It's important to do this if
      * profiling is enabled, since we can wait indefinitely.
      */
-    self->status = THREAD_VMWAIT;
+    android_atomic_release_store(THREAD_VMWAIT, &self->status);
 
 #ifdef WITH_PROFILER
     /*
@@ -2974,8 +2979,13 @@
  *
  * As with all operations on foreign threads, the caller should hold
  * the thread list lock before calling.
+ *
+ * If the thread is suspending or waking, these fields could be changing
+ * out from under us (or the thread could change state right after we
+ * examine it), making this generally unreliable.  This is chiefly
+ * intended for use by the debugger.
  */
-bool dvmIsSuspended(Thread* thread)
+bool dvmIsSuspended(const Thread* thread)
 {
     /*
      * The thread could be:
@@ -3052,27 +3062,27 @@
     lockThreadSuspendCount();   /* grab gDvm.threadSuspendCountLock */
 
     didSuspend = (self->suspendCount != 0);
-    self->isSuspended = true;
-    LOG_THREAD("threadid=%d: self-suspending\n", self->threadId);
-    while (self->suspendCount != 0) {
-        /* wait for wakeup signal; releases lock */
-        int cc;
-        cc = pthread_cond_wait(&gDvm.threadSuspendCountCond,
-                &gDvm.threadSuspendCountLock);
-        assert(cc == 0);
+    if (didSuspend) {
+        self->isSuspended = true;
+        LOG_THREAD("threadid=%d: self-suspending\n", self->threadId);
+        while (self->suspendCount != 0) {
+            /* wait for wakeup signal; releases lock */
+            dvmWaitCond(&gDvm.threadSuspendCountCond,
+                    &gDvm.threadSuspendCountLock);
+        }
+        assert(self->suspendCount == 0 && self->dbgSuspendCount == 0);
+        self->isSuspended = false;
+        LOG_THREAD("threadid=%d: self-reviving, status=%d\n",
+            self->threadId, self->status);
     }
-    assert(self->suspendCount == 0 && self->dbgSuspendCount == 0);
-    self->isSuspended = false;
-    LOG_THREAD("threadid=%d: self-reviving, status=%d\n",
-        self->threadId, self->status);
 
     /*
      * The status change needs to happen while the suspend count lock is
      * held.  Otherwise we could switch to RUNNING after another thread
      * increases our suspend count, which isn't a "bad" state for us
      * (we'll suspend on the next check) but could be a problem for the
-     * other thread (which thinks we're safely in VMWAIT or NATIVE with
-     * a nonzero suspend count, and proceeds to initate GC).
+     * other thread (which thinks we're still safely in VMWAIT or NATIVE
+     * with a nonzero suspend count, and proceeds to initiate GC).
      */
     if (newStatus != THREAD_UNDEFINED)
         self->status = newStatus;
@@ -3128,8 +3138,12 @@
     } else {
         /*
          * Not changing to THREAD_RUNNING.  No additional work required.
+         *
+         * We use a releasing store to ensure that, if we were RUNNING,
+         * any updates we previously made to objects on the managed heap
+         * will be observed before the state change.
          */
-        self->status = newStatus;
+        android_atomic_release_store(newStatus, &self->status);
     }
 
     return oldStatus;
@@ -4084,6 +4098,12 @@
      * (Newly-created threads shouldn't be able to shift themselves to
      * RUNNING without a suspend-pending check, so this shouldn't cause
      * a false-positive.)
+     *
+     * Since the thread is supposed to be suspended, we should have already
+     * observed all changes to thread state, and don't need to use a memory
+     * barrier here.  It's possible we might miss a wayward unsuspended
+     * thread on SMP, but as this is primarily a sanity check it's not
+     * worth slowing the common case.
      */
     if (thread->status == THREAD_RUNNING && !thread->isSuspended &&
         thread != dvmThreadSelf())
diff --git a/vm/Thread.h b/vm/Thread.h
index 204135a..b01676c 100644
--- a/vm/Thread.h
+++ b/vm/Thread.h
@@ -319,7 +319,7 @@
 /*
  * Check suspend state.  Grab threadListLock before calling.
  */
-bool dvmIsSuspended(Thread* thread);
+bool dvmIsSuspended(const Thread* thread);
 
 /*
  * Wait until a thread has suspended.  (Used by debugger support.)
diff --git a/vm/alloc/MarkSweep.c b/vm/alloc/MarkSweep.c
index f673907..b6ea3fd 100644
--- a/vm/alloc/MarkSweep.c
+++ b/vm/alloc/MarkSweep.c
@@ -963,6 +963,11 @@
 static void sweepBitmapCallback(size_t numPtrs, void **ptrs,
                                 const void *finger, void *arg)
 {
+#ifndef NDEBUG
+    /*
+     * Verify that any classes we're freeing have had their innards
+     * discarded.
+     */
     const ClassObject *const classJavaLangClass = gDvm.classJavaLangClass;
     SweepContext *ctx = arg;
     size_t i;
@@ -979,12 +984,22 @@
          * value.
          */
         if (obj->clazz == classJavaLangClass) {
+            const char* descr = ((ClassObject*)obj)->descriptor;
+            if (descr != NULL) {
+                LOGW("WARNING: unfreed innards in class '%s'\n", descr);
+                /* not that important; keep going */
+            } else {
+                LOGV("found freed innards on object %p\n", obj);
+            }
+
             /* dvmFreeClassInnards() may have already been called,
              * but it's safe to call on the same ClassObject twice.
              */
-            dvmFreeClassInnards((ClassObject *)obj);
+            //dvmFreeClassInnards((ClassObject *)obj);
         }
     }
+#endif
+
     // TODO: dvmHeapSourceFreeList has a loop, just like the above
     // does. Consider collapsing the two loops to save overhead.
     if (ctx->isConcurrent) {
diff --git a/vm/compiler/Compiler.h b/vm/compiler/Compiler.h
index 1d074f2..9aeb661 100644
--- a/vm/compiler/Compiler.h
+++ b/vm/compiler/Compiler.h
@@ -225,6 +225,13 @@
 #define METHOD_IS_GETTER        (1 << kIsGetter)
 #define METHOD_IS_SETTER        (1 << kIsSetter)
 
+/* Vectors to provide optimization hints */
+typedef enum JitOptimizationHints {
+    kJitOptNoLoop = 0,          // Disable loop formation/optimization
+} JitOptimizationHints;
+
+#define JIT_OPT_NO_LOOP         (1 << kJitOptNoLoop)
+
 typedef struct CompilerMethodStats {
     const Method *method;       // Used as hash entry signature
     int dalvikSize;             // # of bytes for dalvik bytecodes
@@ -254,7 +261,7 @@
 bool dvmCompileMethod(struct CompilationUnit *cUnit, const Method *method,
                       JitTranslationInfo *info);
 bool dvmCompileTrace(JitTraceDescription *trace, int numMaxInsts,
-                     JitTranslationInfo *info, jmp_buf *bailPtr);
+                     JitTranslationInfo *info, jmp_buf *bailPtr, int optHints);
 void dvmCompilerDumpStats(void);
 void dvmCompilerDrainQueue(void);
 void dvmJitUnchainAll(void);
@@ -263,7 +270,7 @@
 void dvmCompilerInlineMIR(struct CompilationUnit *cUnit);
 void dvmInitializeSSAConversion(struct CompilationUnit *cUnit);
 int dvmConvertSSARegToDalvik(struct CompilationUnit *cUnit, int ssaReg);
-void dvmCompilerLoopOpt(struct CompilationUnit *cUnit);
+bool dvmCompilerLoopOpt(struct CompilationUnit *cUnit);
 void dvmCompilerNonLoopAnalysis(struct CompilationUnit *cUnit);
 void dvmCompilerFindLiveIn(struct CompilationUnit *cUnit,
                            struct BasicBlock *bb);
diff --git a/vm/compiler/CompilerIR.h b/vm/compiler/CompilerIR.h
index 21aadec..82b97e5 100644
--- a/vm/compiler/CompilerIR.h
+++ b/vm/compiler/CompilerIR.h
@@ -152,6 +152,12 @@
 struct LoopAnalysis;
 struct RegisterPool;
 
+typedef enum AssemblerStatus {
+    kSuccess,
+    kRetryAll,
+    kRetryHalve
+} AssemblerStatus;
+
 typedef struct CompilationUnit {
     int numInsts;
     int numBlocks;
@@ -166,11 +172,12 @@
     int headerSize;                     // bytes before the first code ptr
     int dataOffset;                     // starting offset of literal pool
     int totalSize;                      // header + code size
+    AssemblerStatus assemblerStatus;    // Success or fix and retry
+    int assemblerRetries;               // How many times tried to fix assembly
     unsigned char *codeBuffer;
     void *baseAddr;
     bool printMe;
     bool allSingleStep;
-    bool halveInstCount;
     bool executionCount;                // Add code to count trace executions
     bool hasLoop;                       // Contains a loop
     bool hasInvoke;                     // Contains an invoke instruction
diff --git a/vm/compiler/Frontend.c b/vm/compiler/Frontend.c
index 40da0e1..8e5a5e2 100644
--- a/vm/compiler/Frontend.c
+++ b/vm/compiler/Frontend.c
@@ -399,7 +399,8 @@
  * bytecode into machine code.
  */
 bool dvmCompileTrace(JitTraceDescription *desc, int numMaxInsts,
-                     JitTranslationInfo *info, jmp_buf *bailPtr)
+                     JitTranslationInfo *info, jmp_buf *bailPtr,
+                     int optHints)
 {
     const DexCode *dexCode = dvmGetMethodCode(desc->method);
     const JitTraceRun* currRun = &desc->trace[0];
@@ -671,10 +672,12 @@
                        kInstrInvoke)) == 0) ||
             (lastInsn->dalvikInsn.opCode == OP_INVOKE_DIRECT_EMPTY);
 
+        /* Only form a loop if JIT_OPT_NO_LOOP is not set */
         if (curBB->taken == NULL &&
             curBB->fallThrough == NULL &&
             flags == (kInstrCanBranch | kInstrCanContinue) &&
-            fallThroughOffset == startBB->startOffset) {
+            fallThroughOffset == startBB->startOffset &&
+            (optHints & JIT_OPT_NO_LOOP) == 0) {
             BasicBlock *loopBranch = curBB;
             BasicBlock *exitBB;
             BasicBlock *exitChainingCell;
@@ -883,7 +886,21 @@
     dvmInitializeSSAConversion(&cUnit);
 
     if (cUnit.hasLoop) {
-        dvmCompilerLoopOpt(&cUnit);
+        /*
+         * If the loop turns out not to be optimizable (for example, it
+         * lacks a single induction variable), punt and recompile the
+         * trace with loop optimization disabled.
+         */
+        bool loopOpt = dvmCompilerLoopOpt(&cUnit);
+        if (loopOpt == false) {
+            if (cUnit.printMe) {
+                LOGD("Loop is not optimizable - retry codegen");
+            }
+            /* Reset the compiler resource pool */
+            dvmCompilerArenaReset();
+            return dvmCompileTrace(desc, cUnit.numInsts, info, bailPtr,
+                                   optHints | JIT_OPT_NO_LOOP);
+        }
     }
     else {
         dvmCompilerNonLoopAnalysis(&cUnit);
@@ -901,15 +918,17 @@
     /* Convert MIR to LIR, etc. */
     dvmCompilerMIR2LIR(&cUnit);
 
-    /* Convert LIR into machine code. */
-    dvmCompilerAssembleLIR(&cUnit, info);
+    /* Convert LIR into machine code. Loop for recoverable retries */
+    do {
+        dvmCompilerAssembleLIR(&cUnit, info);
+        cUnit.assemblerRetries++;
+        if (cUnit.printMe && cUnit.assemblerStatus != kSuccess)
+            LOGD("Assembler abort #%d on %d",cUnit.assemblerRetries,
+                  cUnit.assemblerStatus);
+    } while (cUnit.assemblerStatus == kRetryAll);
 
     if (cUnit.printMe) {
-        if (cUnit.halveInstCount) {
-            LOGD("Assembler aborted");
-        } else {
-            dvmCompilerCodegenDump(&cUnit);
-        }
+        dvmCompilerCodegenDump(&cUnit);
         LOGD("End %s%s, %d Dalvik instructions",
              desc->method->clazz->descriptor, desc->method->name,
              cUnit.numInsts);
@@ -918,17 +937,17 @@
     /* Reset the compiler resource pool */
     dvmCompilerArenaReset();
 
-    /* Success */
-    if (!cUnit.halveInstCount) {
-#if defined(WITH_JIT_TUNING)
-        methodStats->nativeSize += cUnit.totalSize;
-#endif
-        return info->codeAddress != NULL;
-
-    /* Halve the instruction count and retry again */
-    } else {
-        return dvmCompileTrace(desc, cUnit.numInsts / 2, info, bailPtr);
+    if (cUnit.assemblerStatus == kRetryHalve) {
+        /* Halve the instruction count and start from the top */
+        return dvmCompileTrace(desc, cUnit.numInsts / 2, info, bailPtr,
+                               optHints);
     }
+
+    assert(cUnit.assemblerStatus == kSuccess);
+#if defined(WITH_JIT_TUNING)
+    methodStats->nativeSize += cUnit.totalSize;
+#endif
+    return info->codeAddress != NULL;
 }
 
 /*
@@ -1272,7 +1291,7 @@
     /* Convert LIR into machine code. */
     dvmCompilerAssembleLIR(cUnit, info);
 
-    if (cUnit->halveInstCount) {
+    if (cUnit->assemblerStatus != kSuccess) {
         return false;
     }
 
diff --git a/vm/compiler/Loop.c b/vm/compiler/Loop.c
index dede0ee..78f5717 100644
--- a/vm/compiler/Loop.c
+++ b/vm/compiler/Loop.c
@@ -159,6 +159,8 @@
  * 2) The loop back branch compares the BIV with a constant
  * 3) If it is a count-up loop, the condition is GE/GT, or LE/LT/LEZ/LTZ for
  *    a count-down loop.
+ *
+ * Return false if the loop is not optimizable.
  */
 static bool isLoopOptimizable(CompilationUnit *cUnit)
 {
@@ -173,6 +175,10 @@
         ivInfo = GET_ELEM_N(loopAnalysis->ivList, InductionVariableInfo*, i);
         /* Count up or down loop? */
         if (ivInfo->ssaReg == ivInfo->basicSSAReg) {
+            /* Infinite loop */
+            if (ivInfo->inc == 0) {
+                return false;
+            }
             loopAnalysis->isCountUpLoop = ivInfo->inc > 0;
             break;
         }
@@ -207,11 +213,14 @@
         }
         /*
          * If the comparison is not between the BIV and a loop invariant,
-         * return false.
+         * return false. endReg is loop invariant if one of the following is
+         * true:
+         * - It is not defined in the loop (i.e. DECODE_SUB returns 0)
+         * - It is reloaded with a constant
          */
         int endReg = dvmConvertSSARegToDalvik(cUnit, branch->ssaRep->uses[1]);
-
-        if (DECODE_SUB(endReg) != 0) {
+        if (DECODE_SUB(endReg) != 0 &&
+            !dvmIsBitSet(cUnit->isConstantV, branch->ssaRep->uses[1])) {
             return false;
         }
         loopAnalysis->endConditionReg = DECODE_REG(endReg);
@@ -457,8 +466,11 @@
     }
 }
 
-/* Main entry point to do loop optimization */
-void dvmCompilerLoopOpt(CompilationUnit *cUnit)
+/*
+ * Main entry point to do loop optimization.
+ * Return false if sanity checks for loop formation/optimization failed.
+ */
+bool dvmCompilerLoopOpt(CompilationUnit *cUnit)
 {
     LoopAnalysis *loopAnalysis = dvmCompilerNew(sizeof(LoopAnalysis), true);
 
@@ -497,7 +509,7 @@
 
     /* If the loop turns out to be non-optimizable, return early */
     if (!isLoopOptimizable(cUnit))
-        return;
+        return false;
 
     loopAnalysis->arrayAccessInfo = dvmCompilerNew(sizeof(GrowableList), true);
     dvmInitGrowableList(loopAnalysis->arrayAccessInfo, 4);
@@ -509,4 +521,5 @@
      * header.
      */
     genHoistedChecks(cUnit);
+    return true;
 }
diff --git a/vm/compiler/codegen/arm/ArchUtility.c b/vm/compiler/codegen/arm/ArchUtility.c
index d5acd13..2e68459 100644
--- a/vm/compiler/codegen/arm/ArchUtility.c
+++ b/vm/compiler/codegen/arm/ArchUtility.c
@@ -18,6 +18,12 @@
 #include "libdex/OpCodeNames.h"
 #include "ArmLIR.h"
 
+static char *shiftNames[4] = {
+    "lsl",
+    "lsr",
+    "asr",
+    "ror"};
+
 /* Decode and print a ARM register name */
 static char * decodeRegList(int vector, char *buf)
 {
@@ -83,6 +89,14 @@
                assert((unsigned)(nc-'0') < 4);
                operand = lir->operands[nc-'0'];
                switch(*fmt++) {
+                   case 'H':
+                       if (operand != 0) {
+                           sprintf(tbuf, ", %s %d",shiftNames[operand & 0x3],
+                                   operand >> 2);
+                       } else {
+                           strcpy(tbuf,"");
+                       }
+                       break;
                    case 'B':
                        switch (operand) {
                            case kSY:
diff --git a/vm/compiler/codegen/arm/Assemble.c b/vm/compiler/codegen/arm/Assemble.c
index c1b08a3..226a942 100644
--- a/vm/compiler/codegen/arm/Assemble.c
+++ b/vm/compiler/codegen/arm/Assemble.c
@@ -20,9 +20,12 @@
 
 #include "../../CompilerInternals.h"
 #include "ArmLIR.h"
+#include "Codegen.h"
 #include <unistd.h>             /* for cacheflush */
 #include <sys/mman.h>           /* for protection change */
 
+#define MAX_ASSEMBLER_RETRIES 10
+
 /*
  * opcode: ArmOpCode enum
  * skeleton: pre-designated bit-pattern for this opcode
@@ -71,6 +74,7 @@
  *     M -> Thumb2 16-bit zero-extended immediate
  *     b -> 4-digit binary
  *     B -> dmb option string (sy, st, ish, ishst, nsh, hshst)
+ *     H -> operand shift
  *
  *  [!] escape.  To insert "!", use "!!"
  */
@@ -540,17 +544,17 @@
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1,
                  IS_QUAD_OP | REG_DEF0_USE12 | SETS_CCODES,
-                 "adds", "r!0d, r!1d, r!2d", 2),
+                 "adds", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2SubRRR,       0xebb00000, /* setflags enconding */
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1,
                  IS_QUAD_OP | REG_DEF0_USE12 | SETS_CCODES,
-                 "subs", "r!0d, r!1d, r!2d", 2),
+                 "subs", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2SbcRRR,       0xeb700000, /* setflags encoding */
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1,
                  IS_QUAD_OP | REG_DEF0_USE12 | USES_CCODES | SETS_CCODES,
-                 "sbcs", "r!0d, r!1d, r!2d", 2),
+                 "sbcs", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2CmpRR,       0xebb00f00,
                  kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0, kFmtShift, -1, -1,
                  kFmtUnused, -1, -1,
@@ -653,15 +657,15 @@
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1,
                  IS_QUAD_OP | REG_DEF0_USE12 | SETS_CCODES,
-                 "adcs", "r!0d, r!1d, r!2d, shift !3d", 2),
+                 "adcs", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2AndRRR,  0xea000000,
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1, IS_QUAD_OP | REG_DEF0_USE12,
-                 "and", "r!0d, r!1d, r!2d, shift !3d", 2),
+                 "and", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2BicRRR,  0xea200000,
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1, IS_QUAD_OP | REG_DEF0_USE12,
-                 "bic", "r!0d, r!1d, r!2d, shift !3d", 2),
+                 "bic", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2CmnRR,  0xeb000000,
                  kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0, kFmtShift, -1, -1,
                  kFmtUnused, -1, -1,
@@ -670,7 +674,7 @@
     ENCODING_MAP(kThumb2EorRRR,  0xea800000,
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1, IS_QUAD_OP | REG_DEF0_USE12,
-                 "eor", "r!0d, r!1d, r!2d, shift !3d", 2),
+                 "eor", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2MulRRR,  0xfb00f000,
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtUnused, -1, -1, IS_TERTIARY_OP | REG_DEF0_USE12,
@@ -692,7 +696,7 @@
     ENCODING_MAP(kThumb2OrrRRR,  0xea400000,
                  kFmtBitBlt, 11, 8, kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0,
                  kFmtShift, -1, -1, IS_QUAD_OP | REG_DEF0_USE12,
-                 "orr", "r!0d, r!1d, r!2d, shift !3d", 2),
+                 "orr", "r!0d, r!1d, r!2d!3H", 2),
     ENCODING_MAP(kThumb2TstRR,       0xea100f00,
                  kFmtBitBlt, 19, 16, kFmtBitBlt, 3, 0, kFmtShift, -1, -1,
                  kFmtUnused, -1, -1,
@@ -914,8 +918,14 @@
     return sizeof(JitTraceDescription) + ((runCount+1) * sizeof(JitTraceRun));
 }
 
-/* Return TRUE if error happens */
-static bool assembleInstructions(CompilationUnit *cUnit, intptr_t startAddr)
+/*
+ * Assemble the LIR into binary instruction format.  Note that we may
+ * discover that pc-relative displacements don't fit the selected
+ * instruction.  In those cases we will try to substitute a new code
+ * sequence or request that the trace be shortened and retried.
+ */
+static AssemblerStatus assembleInstructions(CompilationUnit *cUnit,
+                                            intptr_t startAddr)
 {
     short *bufferAddr = (short *) cUnit->codeBuffer;
     ArmLIR *lir;
@@ -940,21 +950,16 @@
             ((lir->opCode == kThumb2Vldrs) && (lir->operands[1] == rpc))) {
             ArmLIR *lirTarget = (ArmLIR *) lir->generic.target;
             intptr_t pc = (lir->generic.offset + 4) & ~3;
-            /*
-             * Allow an offset (stored in operands[2] to be added to the
-             * PC-relative target. Useful to get to a fixed field inside a
-             * chaining cell.
-             */
-            intptr_t target = lirTarget->generic.offset + lir->operands[2];
+            intptr_t target = lirTarget->generic.offset;
             int delta = target - pc;
             if (delta & 0x3) {
                 LOGE("PC-rel distance is not multiples of 4: %d\n", delta);
                 dvmCompilerAbort(cUnit);
             }
             if ((lir->opCode == kThumb2LdrPcRel12) && (delta > 4091)) {
-                return true;
+                return kRetryHalve;
             } else if (delta > 1020) {
-                return true;
+                return kRetryHalve;
             }
             if (lir->opCode == kThumb2Vldrs) {
                 lir->operands[2] = delta >> 2;
@@ -968,11 +973,23 @@
             intptr_t target = targetLIR->generic.offset;
             int delta = target - pc;
             if (delta > 126 || delta < 0) {
-                /*
-                 * TODO: allow multiple kinds of assembler failure to allow
-                 * change of code patterns when things don't fit.
-                 */
-                return true;
+                /* Convert to cmp rx,#0 / b[eq/ne] tgt pair */
+                ArmLIR *newInst = dvmCompilerNew(sizeof(ArmLIR), true);
+                /* Make new branch instruction and insert after */
+                newInst->opCode = kThumbBCond;
+                newInst->operands[0] = 0;
+                newInst->operands[1] = (lir->opCode == kThumb2Cbz) ?
+                                        kArmCondEq : kArmCondNe;
+                newInst->generic.target = lir->generic.target;
+                dvmCompilerSetupResourceMasks(newInst);
+                dvmCompilerInsertLIRAfter((LIR *)lir, (LIR *)newInst);
+                /* Convert the cb[n]z to a cmp rx, #0 */
+                lir->opCode = kThumbCmpRI8;
+                lir->operands[0] = lir->operands[1];
+                lir->operands[1] = 0;
+                lir->generic.target = 0;
+                dvmCompilerSetupResourceMasks(lir);
+                return kRetryAll;
             } else {
                 lir->operands[1] = delta >> 1;
             }
@@ -983,7 +1000,7 @@
             intptr_t target = targetLIR->generic.offset;
             int delta = target - pc;
             if ((lir->opCode == kThumbBCond) && (delta > 254 || delta < -256)) {
-                return true;
+                return kRetryHalve;
             }
             lir->operands[0] = delta >> 1;
         } else if (lir->opCode == kThumbBUncond) {
@@ -1029,18 +1046,12 @@
                     bits |= value;
                     break;
                 case kFmtBrOffset:
-                    /*
-                     * NOTE: branch offsets are not handled here, but
-                     * in the main assembly loop (where label values
-                     * are known).  For reference, here is what the
-                     * encoder handing would be:
-                         value = ((operand  & 0x80000) >> 19) << 26;
-                         value |= ((operand & 0x40000) >> 18) << 11;
-                         value |= ((operand & 0x20000) >> 17) << 13;
-                         value |= ((operand & 0x1f800) >> 11) << 16;
-                         value |= (operand  & 0x007ff);
-                         bits |= value;
-                     */
+                    value = ((operand  & 0x80000) >> 19) << 26;
+                    value |= ((operand & 0x40000) >> 18) << 11;
+                    value |= ((operand & 0x20000) >> 17) << 13;
+                    value |= ((operand & 0x1f800) >> 11) << 16;
+                    value |= (operand  & 0x007ff);
+                    bits |= value;
                     break;
                 case kFmtShift5:
                     value = ((operand & 0x1c) >> 2) << 12;
@@ -1117,7 +1128,7 @@
         }
         *bufferAddr++ = bits & 0xffff;
     }
-    return false;
+    return kSuccess;
 }
 
 #if defined(SIGNATURE_BREAKPOINT)
@@ -1277,16 +1288,31 @@
         return;
     }
 
-    bool assemblerFailure = assembleInstructions(
-        cUnit, (intptr_t) gDvmJit.codeCache + gDvmJit.codeCacheByteUsed);
-
     /*
-     * Currently the only reason that can cause the assembler to fail is due to
-     * trace length - cut it in half and retry.
+     * Attempt to assemble the trace.  Note that assembleInstructions
+     * may rewrite the code sequence and request a retry.
      */
-    if (assemblerFailure) {
-        cUnit->halveInstCount = true;
-        return;
+    cUnit->assemblerStatus = assembleInstructions(cUnit,
+          (intptr_t) gDvmJit.codeCache + gDvmJit.codeCacheByteUsed);
+
+    switch(cUnit->assemblerStatus) {
+        case kSuccess:
+            break;
+        case kRetryAll:
+            if (cUnit->assemblerRetries < MAX_ASSEMBLER_RETRIES) {
+                /* Restore pristine chain cell marker on retry */
+                chainCellOffsetLIR->operands[0] = CHAIN_CELL_OFFSET_TAG;
+                return;
+            }
+            /* Too many retries - reset and try cutting the trace in half */
+            cUnit->assemblerRetries = 0;
+            cUnit->assemblerStatus = kRetryHalve;
+            return;
+        case kRetryHalve:
+            return;
+        default:
+             LOGE("Unexpected assembler status: %d", cUnit->assemblerStatus);
+             dvmAbort();
     }
 
 #if defined(SIGNATURE_BREAKPOINT)
@@ -1461,8 +1487,8 @@
          * The update order matters - make sure clazz is updated last since it
          * will bring the uninitialized chaining cell to life.
          */
-        ANDROID_MEMBAR_FULL();
-        cellAddr->clazz = newContent->clazz;
+        android_atomic_release_store((int32_t)newContent->clazz,
+            (void*) &cellAddr->clazz);
         cacheflush((intptr_t) cellAddr, (intptr_t) (cellAddr+1), 0);
         UPDATE_CODE_CACHE_PATCHES();
 
@@ -1821,6 +1847,7 @@
     u2* pCellOffset;
     JitTraceDescription *desc;
     const Method* method;
+    int idx;
 
     traceBase = getTraceBase(p);
 
@@ -1874,6 +1901,24 @@
          method->clazz->descriptor, method->name, methodDesc);
     free(methodDesc);
 
+    /* Find the last fragment (i.e. runEnd is set) */
+    for (idx = 0;
+         desc->trace[idx].frag.isCode && !desc->trace[idx].frag.runEnd;
+         idx++) {
+    }
+
+    /*
+     * runEnd must come with a JitCodeDesc frag. If isCode is false it must
+     * be a meta info field (only used by callsite info for now).
+     */
+    if (!desc->trace[idx].frag.isCode) {
+        const Method *method = desc->trace[idx+1].meta;
+        char *methodDesc = dexProtoCopyMethodDescriptor(&method->prototype);
+        /* Print the callee info in the trace */
+        LOGD("    -> %s%s;%s", method->clazz->descriptor, method->name,
+             methodDesc);
+    }
+
     return executionCount;
 }
 
diff --git a/vm/compiler/codegen/arm/Codegen.h b/vm/compiler/codegen/arm/Codegen.h
index ca0a843..be74e3f 100644
--- a/vm/compiler/codegen/arm/Codegen.h
+++ b/vm/compiler/codegen/arm/Codegen.h
@@ -78,6 +78,8 @@
 extern void dvmCompilerRegCopyWide(CompilationUnit *cUnit, int destLo,
                                    int destHi, int srcLo, int srcHi);
 
+extern void dvmCompilerSetupResourceMasks(ArmLIR *lir);
+
 extern void dvmCompilerFlushRegImpl(CompilationUnit *cUnit, int rBase,
                                     int displacement, int rSrc, OpSize size);
 
diff --git a/vm/compiler/codegen/arm/CodegenDriver.c b/vm/compiler/codegen/arm/CodegenDriver.c
index a1e5449..1ff9409 100644
--- a/vm/compiler/codegen/arm/CodegenDriver.c
+++ b/vm/compiler/codegen/arm/CodegenDriver.c
@@ -4339,14 +4339,14 @@
         case kWorkOrderTrace:
             /* Start compilation with maximally allowed trace length */
             res = dvmCompileTrace(work->info, JIT_MAX_TRACE_LEN, &work->result,
-                                  work->bailPtr);
+                                  work->bailPtr, 0 /* no hints */);
             break;
         case kWorkOrderTraceDebug: {
             bool oldPrintMe = gDvmJit.printMe;
             gDvmJit.printMe = true;
             /* Start compilation with maximally allowed trace length */
             res = dvmCompileTrace(work->info, JIT_MAX_TRACE_LEN, &work->result,
-                                  work->bailPtr);
+                                  work->bailPtr, 0 /* no hints */);
             gDvmJit.printMe = oldPrintMe;
             break;
         }
@@ -4426,6 +4426,12 @@
                       templateEntryOffsets[TEMPLATE_INTERPRET]);
 }
 
+/* Needed by the Assembler */
+void dvmCompilerSetupResourceMasks(ArmLIR *lir)
+{
+    setupResourceMasks(lir);
+}
+
 /* Needed by the ld/st optmizatons */
 ArmLIR* dvmCompilerRegCopyNoInsert(CompilationUnit *cUnit, int rDest, int rSrc)
 {
diff --git a/vm/compiler/codegen/arm/Thumb2/Factory.c b/vm/compiler/codegen/arm/Thumb2/Factory.c
index c7b52fd..2bf2940 100644
--- a/vm/compiler/codegen/arm/Thumb2/Factory.c
+++ b/vm/compiler/codegen/arm/Thumb2/Factory.c
@@ -171,7 +171,7 @@
         dataTarget = addWordData(cUnit, value, false);
     }
     ArmLIR *loadPcRel = dvmCompilerNew(sizeof(ArmLIR), true);
-    loadPcRel->opCode = LOWREG(rDest) ? kThumbLdrPcRel : kThumb2LdrPcRel12;
+    loadPcRel->opCode = kThumb2LdrPcRel12;
     loadPcRel->generic.target = (LIR *) dataTarget;
     loadPcRel->operands[0] = rDest;
     setupResourceMasks(loadPcRel);
@@ -227,7 +227,7 @@
 
 static ArmLIR *opCondBranch(CompilationUnit *cUnit, ArmConditionCode cc)
 {
-    return newLIR2(cUnit, kThumbBCond, 0 /* offset to be patched */, cc);
+    return newLIR2(cUnit, kThumb2BCond, 0 /* offset to be patched */, cc);
 }
 
 static ArmLIR *opImm(CompilationUnit *cUnit, OpKind op, int value)
@@ -1100,15 +1100,7 @@
 {
     ArmLIR *branch;
     int modImm;
-    /*
-     * TODO: re-enable usage of kThumb2Cbz & kThumb2Cbnz once assembler is
-     * enhanced to allow us to replace code patterns when instructions don't
-     * reach.  Currently, CB[N]Z is causing too many assembler aborts.
-     * What we want to do is emit the short forms, and then replace them with
-     * longer versions when needed.
-     */
-
-    if (0 && (LOWREG(reg)) && (checkValue == 0) &&
+    if ((LOWREG(reg)) && (checkValue == 0) &&
        ((cond == kArmCondEq) || (cond == kArmCondNe))) {
         branch = newLIR2(cUnit,
                          (cond == kArmCondEq) ? kThumb2Cbz : kThumb2Cbnz,
diff --git a/vm/interp/Interp.c b/vm/interp/Interp.c
index ab61882..d3de730 100644
--- a/vm/interp/Interp.c
+++ b/vm/interp/Interp.c
@@ -450,6 +450,14 @@
 #ifdef WITH_DEBUGGER
 /*
  * Get the original opcode from under a breakpoint.
+ *
+ * On SMP hardware it's possible one core might try to execute a breakpoint
+ * after another core has cleared it.  We need to handle the case where
+ * there's no entry in the breakpoint set.  (The memory barriers in the
+ * locks and in the breakpoint update code should ensure that, once we've
+ * observed the absence of a breakpoint entry, we will also now observe
+ * the restoration of the original opcode.  The fact that we're holding
+ * the lock prevents other threads from confusing things further.)
  */
 u1 dvmGetOriginalOpCode(const u2* addr)
 {
diff --git a/vm/jdwp/JdwpMain.c b/vm/jdwp/JdwpMain.c
index 1688e5e..24e5c6c 100644
--- a/vm/jdwp/JdwpMain.c
+++ b/vm/jdwp/JdwpMain.c
@@ -231,8 +231,7 @@
      */
     state->debugThreadHandle = dvmThreadSelf()->handle;
     state->run = true;
-    ANDROID_MEMBAR_FULL();
-    state->debugThreadStarted = true;       // touch this last
+    android_atomic_release_store(true, &state->debugThreadStarted);
 
     dvmDbgLockMutex(&state->threadStartLock);
     dvmDbgCondBroadcast(&state->threadStartCond);
diff --git a/vm/jdwp/JdwpPriv.h b/vm/jdwp/JdwpPriv.h
index 85f9ec2..bc249f1 100644
--- a/vm/jdwp/JdwpPriv.h
+++ b/vm/jdwp/JdwpPriv.h
@@ -79,7 +79,7 @@
     pthread_mutex_t threadStartLock;
     pthread_cond_t  threadStartCond;
 
-    bool            debugThreadStarted;
+    int             debugThreadStarted;
     pthread_t       debugThreadHandle;
     ObjectId        debugThreadId;
     bool            run;
diff --git a/vm/mterp/Mterp.c b/vm/mterp/Mterp.c
index 4b0ceb8..569e49f 100644
--- a/vm/mterp/Mterp.c
+++ b/vm/mterp/Mterp.c
@@ -86,9 +86,7 @@
     if (gDvm.jdwpConfigured) {
         glue->pDebuggerActive = &gDvm.debuggerActive;
     } else {
-        /* TODO: fix x86 impl before enabling this */
-        //glue->pDebuggerActive = NULL;
-        glue->pDebuggerActive = &gDvm.debuggerActive;
+        glue->pDebuggerActive = NULL;
     }
 #endif
 #if defined(WITH_PROFILER)
diff --git a/vm/mterp/out/InterpAsm-x86.S b/vm/mterp/out/InterpAsm-x86.S
index 5dd2601..563f9ca 100644
--- a/vm/mterp/out/InterpAsm-x86.S
+++ b/vm/mterp/out/InterpAsm-x86.S
@@ -9153,45 +9153,44 @@
  * On entry:
  *   ebx  -> PC adjustment in 16-bit words (must be preserved)
  *   ecx  -> GLUE pointer
+ *   reentry type, e.g. kInterpEntryInstr stored in rGLUE->entryPoint
  *
  * Note: A call will normally kill %eax, rPC/%edx and %ecx.  To
  *       streamline the normal case, this routine will preserve rPC and
  *       %ecx in addition to the normal caller save regs.  The save/restore
  *       is a bit ugly, but will happen in the relatively uncommon path.
- * TUNING: Might be worthwhile to inline this.
  * TODO: Basic-block style Jit will need a hook here as well.  Fold it into
  *       the suspendCount check so we can get both in 1 shot.
- * TODO: to match the other intepreters, this should handle suspension
- *       and then check for debugger/profiling after dvmCheckSuspendPending
- *       returns.
  */
 common_periodicChecks:
     movl    offGlue_pSelfSuspendCount(%ecx),%eax    # eax <- &suspendCount
     cmpl    $0,(%eax)
     jne     1f
 
-#if defined(WITH_DEBUGGER) || defined(WITH_PROFILER)
-#if defined(WITH_DEBUGGER)
-    movl   offGlue_pDebuggerActive(%ecx),%eax      # eax <- &DebuggerActive
-#endif
-#if defined(WITH_PROFILER)
-    movl   offGlue_pActiveProfilers(%ecx),%ecx     # ecx <- &ActiveProfilers
-#endif
+6:
 #if defined(WITH_DEBUGGER) && defined(WITH_PROFILER)
-    # TODO: check for NULL before load
-    movzbl (%eax),%eax             # eax <- debuggerActive (boolean)
-    orl    (%ecx),%eax             # eax <- debuggerActive || activeProfilers
-#elif defined(WITH_DEBUGGER)
-    # TODO: check for NULL before load
-    movzbl (%eax),%eax             # eax <- debuggerActive (boolean)
-#elif defined(WITH_PROFILER)
-    movl   (%ecx),%eax             # eax <= activeProfilers
-#endif
+    movl   offGlue_pDebuggerActive(%ecx),%eax      # eax <- &DebuggerActive
+    movl   offGlue_pActiveProfilers(%ecx),%ecx     # ecx <- &ActiveProfilers
+    testl  %eax,%eax               # debugger enabled?
+    je     2f
+    movzbl (%eax),%eax             # get active count
+2:
+    orl    (%ecx),%eax             # eax <- debuggerActive | activeProfilers
     GET_GLUE(%ecx)                 # restore rGLUE
-    testl  %eax,%eax
     jne    3f                      # one or both active - switch interp
+#elif defined(WITH_DEBUGGER)
+    movl   offGlue_pDebuggerActive(%ecx),%eax      # eax <- &DebuggerActive
+    testl  %eax,%eax
+    je     5f
+    cmpl   $0,(%eax)              # debugger active?
+    jne    3f                      # switch interp if so
+#elif defined(WITH_PROFILER)
+    movl   offGlue_pActiveProfilers(%ecx),%eax     # eax <- &ActiveProfilers
+    cmpl   $0,(%eax)              # profiler active?
+    jne    3f                      # switch interp if so
 #endif
 
+5:
     ret
 
     /* Check for suspend */
@@ -9215,7 +9214,16 @@
     pop     %ebp
     UNSPILL(rPC)
     GET_GLUE(%ecx)
+
+    /*
+     * Need to check to see if debugger or profiler flags got set
+     * while we were suspended.
+     */
+#if defined(WITH_DEBUGGER) || defined(WITH_PROFILER)
+     jmp    6b
+#else
     ret
+#endif
 
     /* Switch interpreters */
     /* Note: %ebx contains the 16-bit word offset to be applied to rPC to
diff --git a/vm/mterp/x86/footer.S b/vm/mterp/x86/footer.S
index 09eeb6e..800f7d3 100644
--- a/vm/mterp/x86/footer.S
+++ b/vm/mterp/x86/footer.S
@@ -268,45 +268,44 @@
  * On entry:
  *   ebx  -> PC adjustment in 16-bit words (must be preserved)
  *   ecx  -> GLUE pointer
+ *   reentry type, e.g. kInterpEntryInstr stored in rGLUE->entryPoint
  *
  * Note: A call will normally kill %eax, rPC/%edx and %ecx.  To
  *       streamline the normal case, this routine will preserve rPC and
  *       %ecx in addition to the normal caller save regs.  The save/restore
  *       is a bit ugly, but will happen in the relatively uncommon path.
- * TUNING: Might be worthwhile to inline this.
  * TODO: Basic-block style Jit will need a hook here as well.  Fold it into
  *       the suspendCount check so we can get both in 1 shot.
- * TODO: to match the other intepreters, this should handle suspension
- *       and then check for debugger/profiling after dvmCheckSuspendPending
- *       returns.
  */
 common_periodicChecks:
     movl    offGlue_pSelfSuspendCount(%ecx),%eax    # eax <- &suspendCount
     cmpl    $$0,(%eax)
     jne     1f
 
-#if defined(WITH_DEBUGGER) || defined(WITH_PROFILER)
-#if defined(WITH_DEBUGGER)
-    movl   offGlue_pDebuggerActive(%ecx),%eax      # eax <- &DebuggerActive
-#endif
-#if defined(WITH_PROFILER)
-    movl   offGlue_pActiveProfilers(%ecx),%ecx     # ecx <- &ActiveProfilers
-#endif
+6:
 #if defined(WITH_DEBUGGER) && defined(WITH_PROFILER)
-    # TODO: check for NULL before load
-    movzbl (%eax),%eax             # eax <- debuggerActive (boolean)
-    orl    (%ecx),%eax             # eax <- debuggerActive || activeProfilers
-#elif defined(WITH_DEBUGGER)
-    # TODO: check for NULL before load
-    movzbl (%eax),%eax             # eax <- debuggerActive (boolean)
-#elif defined(WITH_PROFILER)
-    movl   (%ecx),%eax             # eax <= activeProfilers
-#endif
+    movl   offGlue_pDebuggerActive(%ecx),%eax      # eax <- &DebuggerActive
+    movl   offGlue_pActiveProfilers(%ecx),%ecx     # ecx <- &ActiveProfilers
+    testl  %eax,%eax               # debugger enabled?
+    je     2f
+    movzbl (%eax),%eax             # get active count
+2:
+    orl    (%ecx),%eax             # eax <- debuggerActive | activeProfilers
     GET_GLUE(%ecx)                 # restore rGLUE
-    testl  %eax,%eax
     jne    3f                      # one or both active - switch interp
+#elif defined(WITH_DEBUGGER)
+    movl   offGlue_pDebuggerActive(%ecx),%eax      # eax <- &DebuggerActive
+    testl  %eax,%eax
+    je     5f
+    cmpl   $$0,(%eax)              # debugger active?
+    jne    3f                      # switch interp if so
+#elif defined(WITH_PROFILER)
+    movl   offGlue_pActiveProfilers(%ecx),%eax     # eax <- &ActiveProfilers
+    cmpl   $$0,(%eax)              # profiler active?
+    jne    3f                      # switch interp if so
 #endif
 
+5:
     ret
 
     /* Check for suspend */
@@ -330,7 +329,16 @@
     pop     %ebp
     UNSPILL(rPC)
     GET_GLUE(%ecx)
+
+    /*
+     * Need to check to see if debugger or profiler flags got set
+     * while we were suspended.
+     */
+#if defined(WITH_DEBUGGER) || defined(WITH_PROFILER)
+     jmp    6b
+#else
     ret
+#endif
 
     /* Switch interpreters */
     /* Note: %ebx contains the 16-bit word offset to be applied to rPC to
diff --git a/vm/native/InternalNative.c b/vm/native/InternalNative.c
index 700c8af..62d4d04 100644
--- a/vm/native/InternalNative.c
+++ b/vm/native/InternalNative.c
@@ -335,7 +335,8 @@
         }
 
         /* all good, raise volatile readiness flag */
-        gDvm.javaSecurityAccessControllerReady = true;
+        android_atomic_release_store(true,
+            &gDvm.javaSecurityAccessControllerReady);
     }
 
     for (i = 0; i < NUM_DOPRIV_FUNCS; i++) {
diff --git a/vm/oo/Class.c b/vm/oo/Class.c
index f10a9c6..e0d002d 100644
--- a/vm/oo/Class.c
+++ b/vm/oo/Class.c
@@ -1123,17 +1123,8 @@
  */
 void dvmSetClassSerialNumber(ClassObject* clazz)
 {
-    u4 oldValue, newValue;
-
     assert(clazz->serialNumber == 0);
-
-    do {
-        oldValue = gDvm.classSerialNumber;
-        newValue = oldValue + 1;
-    } while (android_atomic_release_cas(oldValue, newValue,
-            &gDvm.classSerialNumber) != 0);
-
-    clazz->serialNumber = (u4) oldValue;
+    clazz->serialNumber = android_atomic_inc(&gDvm.classSerialNumber);
 }
 
 
@@ -1444,7 +1435,10 @@
         clazz = loadClassFromDex(pDvmDex, pClassDef, loader);
         if (dvmCheckException(self)) {
             /* class was found but had issues */
-            dvmReleaseTrackedAlloc((Object*) clazz, NULL);
+            if (clazz != NULL) {
+                dvmFreeClassInnards(clazz);
+                dvmReleaseTrackedAlloc((Object*) clazz, NULL);
+            }
             goto bail;
         }
 
@@ -1476,6 +1470,7 @@
 
             /* Let the GC free the class.
              */
+            dvmFreeClassInnards(clazz);
             dvmReleaseTrackedAlloc((Object*) clazz, NULL);
 
             /* Grab the winning class.