| <html> |
| <head> |
| <title>Dalvik Bytecode Verifier Notes</title> |
| </head> |
| |
| <body> |
| <h1>Dalvik Bytecode Verifier Notes</h1> |
| |
| <p> |
| The bytecode verifier in the Dalvik VM attempts to provide the same sorts |
| of checks and guarantees that other popular virtual machines do. We |
| perform generally the same set of checks as are described in _The Java |
| Virtual Machine Specification, Second Edition_, including the updates |
| planned for the Third Edition. |
| |
| <p> |
| Verification can be enabled for all classes, disabled for all, or enabled |
| only for "remote" (non-bootstrap) classes. It should be performed for any |
| class that will be processed with the DEX optimizer, and in fact the |
| default VM behavior is to only optimize verified classes. |
| |
| |
| <h2>Why Verify?</h2> |
| |
| <p> |
| The verification process adds additional time to the build and to |
| the installation of new applications. It's fairly quick for app-sized |
| DEX files, but rather slow for the big "core" and "framework" files. |
| Why do it all, when our system relies on UNIX processes for security? |
| <p> |
| <ol> |
| <li>Optimizations. The interpreter can ignore a lot of potential |
| error cases because the verifier guarantees that they are impossible. |
| Also, we can optimize the DEX file more aggressively if we start |
| with a stronger set of assumptions about the bytecode. |
| <li>"Precise" GC. The work peformed during verification has significant |
| overlap with the work required to compute register use maps for |
| type-precise GC. |
| <li>Intra-application security. If an app wants to download bits |
| of interpreted code over the network and execute them, it can safely |
| do so using well-established security mechanisms. |
| <li>3rd party app failure analysis. We have no way to control the |
| tools and post-processing utilities that external developers employ, |
| so when we get bug reports with a weird exception or native crash |
| it's very helpful to start with the assumption that the bytecode |
| is valid. |
| </ol> |
| <p> |
| It's also a convenient framework to deal with certain situations, notably |
| replacement of instructions that access volatile 64-bit fields with |
| more rigorous versions that guarantee atomicity. |
| |
| |
| <h2>Verifier Differences</h2> |
| |
| <p> |
| There are a few checks that the Dalvik bytecode verifier does not perform, |
| because they're not relevant. For example: |
| <ul> |
| <li>Type restrictions on constant pool references are not enforced, |
| because Dalvik does not have a pool of typed constants. (Dalvik |
| uses a simple index into type-specific pools.) |
| <li>Verification of the operand stack size is not performed, because |
| Dalvik does not have an operand stack. |
| <li>Limitations on <code>jsr</code> and <code>ret</code> do not apply, |
| because Dalvik doesn't support subroutines. |
| </ul> |
| |
| In some cases they are implemented differently, e.g.: |
| <ul> |
| <li>In a conventional VM, backward branches and exceptions are |
| forbidden when a local variable holds an uninitialized reference. The |
| restriction was changed to mark registers as invalid when they hold |
| references to the uninitialized result of a previous invocation of the |
| same <code>new-instance</code> instruction. |
| This solves the same problem -- trickery potentially allowing |
| uninitialized objects to slip past the verifier -- without unduly |
| limiting branches. |
| </ul> |
| |
| There are also some new ones, such as: |
| <ul> |
| <li>The <code>move-exception</code> instruction can only appear as |
| the first instruction in an exception handler. |
| <li>The <code>move-result*</code> instructions can only appear |
| immediately after an appropriate <code>invoke-*</code> |
| or <code>filled-new-array</code> instruction. |
| </ul> |
| |
| <p> |
| The VM is permitted but not required to enforce "structured locking" |
| constraints, which are designed to ensure that, when a method returns, all |
| monitors locked by the method have been unlocked an equal number of times. |
| This is not currently implemented. |
| |
| <p> |
| The Dalvik verifier is more restrictive than other VMs in one area: |
| type safety on sub-32-bit integer widths. These additional restrictions |
| should make it impossible to, say, pass a value outside the range |
| [-128, 127] to a function that takes a <code>byte</code> as an argument. |
| |
| |
| <h2>Monitor Verification</h2> |
| |
| <p> |
| If a method locks an object with a <code>synchronized</code> statement, the |
| object must be unlocked before the method returns. At the bytecode level, |
| this means the method must execute a matching <code>monitor-exit</code> |
| for every <code>monitor-enter</code> instruction, whether the function |
| completes normally or abnormally. The bytecode verifier optionally |
| enforces this. |
| |
| <p> |
| The verifier uses a fairly simple-minded model. If you enter a monitor |
| held in register N, you can exit the monitor using register N or any |
| subsequently-made copies of register N. The verifier does not attempt |
| to identify previously-made copies, track loads and stores through |
| fields, or recognize identical constant values (for example, the result |
| values from two <code>const-class</code> instructions on the same class |
| will be the same reference, but the verifier doesn't recognize this). |
| |
| <p> |
| Further, you may only exit the monitor most recently entered. "Hand |
| over hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B", |
| are not allowed. |
| |
| <p> |
| This means that there are a number of situations in which the verifier |
| will throw an exception on code that would execute correctly at run time. |
| This is not expected to be an issue for compiler-generated bytecode. |
| |
| <p> |
| For implementation convenience, the maximum nesting depth of |
| <code>synchronized</code> statements has been set to 32. This is not |
| a limitation on the recursion count. The only way to trip this would be |
| to have a single method with more than 32 nested <code>synchronized</code> |
| statements, something that is unlikely to occur. |
| |
| |
| <h2>Verification Failures</h2> |
| |
| <p> |
| The verifier may reject a class immediately, or it may defer throwing |
| an exception until the code is actually used. For example, if a class |
| attempts to perform an illegal access on a field, the VM should throw |
| an IllegalAccessError the first time the instruction is encountered. |
| On the other hand, if a class contains an invalid bytecode, it should be |
| rejected immediately with a VerifyError. |
| |
| <p> |
| Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic, |
| information in the log file. From this it's possible to determine the |
| exact instruction that failed, and the reason for the failure. |
| |
| <p> |
| It's a bit tricky to implement deferred verification errors in Dalvik. |
| A few approaches were considered: |
| |
| <ol> |
| <li>We could replace the invalid field access instruction with a special |
| instruction that generates an illegal access error, and allow class |
| verification to complete successfully. This type of verification must |
| be deferred to first class load, rather than be performed ahead of time |
| during DEX optimization, because some failures will depend on the current |
| execution environment (e.g. not all classes are available at dexopt time). |
| At that point the bytecode instructions are mapped read-only during |
| verification, so rewriting them isn't possible. |
| </li> |
| |
| <li>We can perform the access checks when the field/method/class is |
| resolved. In a typical VM implementation we would do the check when the |
| entry is resolved in the context of the current classfile, but our DEX |
| files combine multiple classfiles together, merging the field/method/class |
| resolution results into a single large table. Once one class successfully |
| resolves the field, every other class in the same DEX file would be able |
| to access the field. This is incorrect. |
| </li> |
| |
| <li>Perform the access checks on every field/method/class access. |
| This adds significant overhead. This is mitigated somewhat by the DEX |
| optimizer, which will convert many field/method/class accesses into a |
| simpler form after performing the access check. However, not all accesses |
| can be optimized (e.g. accesses to classes unknown at dexopt time), |
| and we don't currently have an optimized form of certain instructions |
| (notably static field operations). |
| </li> |
| </ol> |
| |
| <p> |
| In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier |
| simply regarded all problems as immediately fatal. This generally worked, |
| but in some cases the VM was rejecting classes because of bits of code |
| that were never used. The VerifyError itself was sometimes difficult to |
| decipher, because it was thrown during verification rather than at the |
| point where the problem was first noticed during execution. |
| <p> |
| The current version uses a variation of approach #1. The dexopt |
| command works the way it did before, leaving the code untouched and |
| flagging fully-correct classes as "pre-verified". When the VM loads a |
| class that didn't pass pre-verification, the verifier is invoked. If a |
| "deferrable" problem is detected, a modifiable copy of the instructions |
| in the problematic method is made. In that copy, the troubled instruction |
| is replaced with an "always throw" opcode, and verification continues. |
| |
| <p> |
| In the example used earlier, an attempt to read from an inaccessible |
| field would result in the "field get" instruction being replaced by |
| "always throw IllegalAccessError on field X". Creating copies of method |
| bodies requires additional heap space, but since this affects very few |
| methods overall the memory impact should be minor. |
| |
| <p> |
| <address>Copyright © 2008 The Android Open Source Project</address> |
| |
| </body> |
| </html> |