auto import from //depot/cupcake/@135843

commit: f6c387128427e121477c1b32ad35cdcaa5101ba3 [log] [tgz]
author: The Android Open Source Project <initial-contribution@android.com> Tue Mar 03 19:28:47 2009 -0800
committer: The Android Open Source Project <initial-contribution@android.com> Tue Mar 03 19:28:47 2009 -0800
tree: 2aa25fa8c8c3a9caeecf98fd8ac4cd9b12717997
parent: f72d5de56a522ac3be03873bdde26f23a5eeeb3c [diff] [blame]
diff --git a/docs/dexopt.html b/docs/dexopt.html
new file mode 100644
index 0000000..7f0b4bc
--- /dev/null
+++ b/docs/dexopt.html

@@ -0,0 +1,326 @@
+<html>
+<head>
+    <title>Dalvik Optimization and Verification</title>
+</head>
+
+<body>
+<h1>Dalvik Optimization and Verification With <i>dexopt</i></h1>
+
+<p>
+The Dalvik virtual machine was designed specifically for the Android
+mobile platform.  The target systems have little RAM, store data on slow
+internal flash memory, and generally have the performance characteristics
+of decade-old desktop systems.  They also run Linux, which provides
+virtual memory, processes and threads, and UID-based security mechanisms.
+<p>
+The features and limitations caused us to focus on certain goals:
+
+<ul>
+    <li>Class data, notably bytecode, must be shared between multiple
+    processes to minimize total system memory usage.
+    <li>The overhead in launching a new app must be minimized to keep
+    the device responsive.
+    <li>Storing class data in individual files results in a lot of
+    redundancy, especially with respect to strings.  To conserve disk
+    space we need to factor this out.
+    <li>Parsing class data fields adds unnecessary overhead during
+    class loading.  Accessing data values (e.g. integers and strings)
+    directly as C types is better.
+    <li>Bytecode verification is necessary, but slow, so we want to verify
+    as much as possible outside app execution.
+    <li>Bytecode optimization (quickened instructions, method pruning) is
+    important for speed and battery life.
+    <li>For security reasons, processes may not edit shared code.
+</ul>
+
+<p>
+The typical VM implementation uncompresses individual classes from a
+compressed archive and stores them on the heap.  This implies a separate
+copy of each class in every process, and slows application startup because
+the code must be uncompressed (or at least read off disk in many small
+pieces).  On the other hand, having the bytecode on the local heap makes
+it easy to rewrite instructions on first use, facilitating a number of
+different optimizations.
+<p>
+The goals led us to make some fundamental decisions:
+
+<ul>
+    <li>Multiple classes are aggregated into a single "DEX" file.
+    <li>DEX files are mapped read-only and shared between processes.
+    <li>Byte ordering and word alignment are adjusted to suit the local
+    system.
+    <li>Bytecode verification is mandatory for all classes, but we want
+    to "pre-verify" whatever we can.
+    <li>Optimizations that require rewriting bytecode must be done ahead
+    of time.
+</ul>
+
+<p>
+The consequences of these decisions are explained in the following sections.
+
+
+<h2>VM Operation</h2>
+
+<p>
+Application code is delivered to the system in a <code>.jar</code>
+or <code>.apk</code> file.  These are really just <code>.zip</code>
+archives with some meta-data files added.  The Dalvik DEX data file
+is always called <code>classes.dex</code>.
+<p>
+The bytecode cannot be memory-mapped and executed directly from the zip
+file, because the data is compressed and the start of the file is not
+guaranteed to be word-aligned.  These problems could be addressed by
+storing <code>classes.dex</code> without compression and padding out the zip
+file, but that would increase the size of the package sent across the
+data network.
+<p>
+We need to extract <code>classes.dex</code> from the zip archive before
+we can use it.  While we have the file available, we might as well perform
+some of the other actions (realignment, optimization, verification) described
+earlier.  This raises a new question however: who is responsible for doing
+this, and where do we keep the output?
+
+<h3>Preparation</h3>
+
+<p>
+There are at least three different ways to create a "prepared" DEX file,
+sometimes known as "ODEX" (for Optimized DEX):
+<ol>
+    <li>The VM does it "just in time".  The output goes into a special
+    <code>dalvik-cache</code> directory.  This works on the desktop and
+    engineering-only device builds where the permissions on the
+    <code>dalvik-cache</code> directory are not restricted.  On production
+    devices, this is not allowed.
+    <li>The system installer does it when an application is first added.
+    It has the privileges required to write to <code>dalvik-cache</code>.
+    <li>The build system does it ahead of time.  The relevant <code>jar</code>
+    / <code>apk</code> files are present, but the <code>classes.dex</code>
+    is stripped out.  The optimized DEX is stored next to the original
+    zip archive, not in <code>dalvik-cache</code>, and is part of the
+    system image.
+</ol>
+<p>
+The <code>dalvik-cache</code> directory is more accurately
+<code>$ANDROID_DATA/data/dalvik-cache</code>.  The files inside it have
+names derived from the full path of the source DEX.  On the device the
+directory is owned by <code>system</code> / <code>system</code>
+and has 0771 permissions, and the optimized DEX files stored there are
+owned by <code>system</code> and the
+application's group, with 0644 permissions.  DRM-locked applications will
+use 640 permissions to prevent other user applications from examining them.
+The bottom line is that you can read your own DEX file and those of most
+other applications, but you cannot create, modify, or remove them.
+<p>
+Preparation of the DEX file for the "just in time" and "system installer"
+approaches proceeds in three steps:
+<p>
+First, the dalvik-cache file is created.  This must be done in a process
+with appropriate privileges, so for the "system installer" case this is
+done within <code>installd</code>, which runs as root.
+<p>
+Second, the <code>classes.dex</code> entry is extracted from the the zip
+archive.  A small amount of space is left at the start of the file for
+the ODEX header.
+<p>
+Third, the file is memory-mapped for easy access and tweaked for use on
+the current system.  This includes byte-swapping and structure realigning,
+but no meaningful changes to the DEX file.  We also do some basic
+structure checks, such as ensuring that file offsets and data indices
+fall within valid ranges.
+<p>
+The build system uses a hairy process that involves starting the
+emulator, forcing just-in-time optimization of all relevant DEX files,
+and then extracting the results from <code>dalvik-cache</code>.  The
+reasons for doing this, rather than using a tool that runs on the desktop,
+will become more apparent when the optimizations are explained.
+<p>
+Once the code is byte-swapped and aligned, we're ready to go.  We append
+some pre-computed data, fill in the ODEX header at the start of the file,
+and start executing.  (The header is filled in last, so that we don't
+try to use a partial file.)  If we're interested in verification and
+optimization, however, we need to insert a step after the initial prep.
+
+<h3>dexopt</h3>
+
+<p>
+We want to verify and optimize all of the classes in the DEX file.  The
+easiest and safest way to do this is to load all of the classes into
+the VM and run through them.  Anything that fails to load is simply not
+verified or optimized.  Unfortunately, this can cause allocation of some
+resources that are difficult to release (e.g. loading of native shared
+libraries), so we don't want to do it in the same virtual machine that
+we're running applications in.
+<p>
+The solution is to invoke a program called <code>dexopt</code>, which
+is really just a back door into the VM.  It performs an abbreviated VM
+initialization, loads zero or more DEX files from the bootstrap class
+path, and then sets about verifying and optimizing whatever it can from
+the target DEX.  On completion, the process exits, freeing all resources.
+<p>
+It is possible for multiple VMs to want the same DEX file at the same
+time.  File locking is used to ensure that dexopt is only run once.
+
+
+<h2>Verification</h2>
+
+<p>
+The bytecode verification process involves scanning through the instructions
+in every method in every class in a DEX file.  The goal is to identify
+illegal instruction sequences so that we don't have to check for them at
+run time.  Many of the computations involved are also necessary for "exact"
+garbage collection.  See
+<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more
+information.
+<p>
+For performance reasons, the optimizer (described in the next section)
+assumes that the verifier has run successfully, and makes some potentially
+unsafe assumptions.  By default, Dalvik insists upon verifying all classes,
+and only optimizes classes that have been verified.  If you want to
+disable the verifier, you can use command-line flags to do so.  See also
+<a href="embedded-vm-control.html"> Controlling the Embedded VM</a>
+for instructions on controlling these
+features within the Android application framework.
+<p>
+Reporting of verification failures is a tricky issue.  For example,
+calling a package-scope method on a class in a different package is
+illegal and will be caught by the verifier.  We don't necessarily want
+to report it during verification though -- we actually want to throw
+an exception when the method call is attempted.  Checking the access
+flags on every method call is expensive though.  The
+<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document
+addresses this issue.
+<p>
+Classes that have been verified successfully have a flag set in the ODEX.
+They will not be re-verified when loaded.  The Linux access permissions
+are expected to prevent tampering; if you can get around those, installing
+faulty bytecode is far from the easiest line of attack.  The ODEX file has
+a 32-bit checksum, but that's chiefly present as a quick check for
+corrupted data.
+
+
+<h2>Optimization</h2>
+
+<p>
+Virtual machine interpreters typically perform certain optimizations the
+first time a piece of code is used.  Constant pool references are replaced
+with pointers to internal data structures, operations that always succeed
+or always work a certain way are replaced with simpler forms.  Some of
+these require information only available at runtime, others can be inferred
+statically when certain assumptions are made.
+<p>
+The Dalvik optimizer does the following:
+<ul>
+    <li>For virtual method calls, replace the method index with a
+    vtable index.
+    <li>For instance field get/put, replace the field index with
+    a byte offset.  Also, merge the boolean / byte / char / short
+    variants into a single 32-bit form (less code in the interpreter
+    means more room in the CPU I-cache).
+    <li>Replace a handful of high-volume calls, like String.length(),
+    with "inline" replacements.  This skips the usual method call
+    overhead, directly switching from the interpreter to a native
+    implementation.
+    <li>Prune empty methods.  The simplest example is
+    <code>Object.&lt;init&gt;</code>, which does nothing, but must be
+    called whenever any object is allocated.  The instruction is
+    replaced with a new version that acts as a no-op unless a debugger
+    is attached.
+    <li>Append pre-computed data.  For example, the VM wants to have a
+    hash table for lookups on class name.  Instead of computing this
+    when the DEX file is loaded, we can compute it now, saving heap
+    space and computation time in every VM where the DEX is loaded.
+</ul>
+
+<p>
+All of the instruction modifications involve replacing the opcode with
+one not defined by the Dalvik specification.  This allows us to freely
+mix optimized and unoptimized instructions.  The set of optimized
+instructions, and their exact representation, is tied closely to the VM
+version.
+<p>
+Most of the optimizations are obvious "wins".  The use of raw indices
+and offsets not only allows us to execute more quickly, we can also
+skip the initial symbolic resolution.  Pre-computation eats up
+disk space, and so must be done in moderation.
+<p>
+There are a couple of potential sources of trouble with these
+optimizations.  First, vtable indices and byte offsets are subject to
+change if the VM is updated.  Second, if a superclass is in a different
+DEX, and that other DEX is updated, we need to ensure that our optimized
+indices and offsets are updated as well.  A similar but more subtle
+problem emerges when user-defined class loaders are employed: the class
+we actually call may not be the one we expected to call.
+<p>These problems are addressed with dependency lists and some limitations
+on what can be optimized.
+
+
+<h2>Dependencies and Limitations</h2>
+
+<p>
+The optimized DEX file includes a list of dependencies on other DEX files,
+plus the CRC-32 and modification date from the originating
+<code>classes.dex</code> zip file entry.  The dependency list includes the
+full path to the <code>dalvik-cache</code> file, and the file's SHA-1
+signature.  The timestamps of files on the device are unreliable and
+not used.  The dependency area also includes the VM version number.
+<p>
+An optimized DEX is dependent upon all of the DEX files in the bootstrap
+class path.  DEX files that are part of the bootstrap class path depend
+upon the DEX files that appeared earlier.  To ensure that nothing outside
+the dependent DEX files is available, <code>dexopt</code> only loads the
+bootstrap classes.  References to classes in other DEX files fail, which
+causes class loading and/or verification to fail, and classes with
+external dependencies are simply not optimized.
+<p>
+This means that splitting code out into many separate DEX files has a
+disadvantage: virtual method calls and instance field lookups between
+non-boot DEX files can't be optimized.  Because verification is pass/fail
+with class granularity, no method in a class that has any reliance on
+classes in external DEX files can be optimized.  This may be a bit
+heavy-handed, but it's the only way to guarantee that nothing breaks
+when individual pieces are updated.
+<p>
+Another negative consequence: any change to a bootstrap DEX will result
+in rejection of all optimized DEX files.  This makes it hard to keep
+system updates small.
+<p>
+Despite our caution, there is still a possibility that a class in a DEX
+file loaded by a user-defined class loader could ask for a bootstrap class
+(say, String) and be given a different class with the same name.  If a
+class in the DEX file being processed has the same name as a class in the
+bootstrap DEX files, the class will be flagged as ambiguous and references
+to it will not be resolved during verification / optimization.  The class
+linking code in the VM does additional checks to plug another hole;
+see the verbose description in the VM sources for details (vm/oo/Class.c).
+<p>
+If one of the dependencies is updated, we need to re-verify and
+re-optimize the DEX file.  If we can do a just-in-time <code>dexopt</code>
+invocation, this is easy.  If we have to rely on the installer daemon, or
+the DEX was shipped only in ODEX, then the VM has to reject the DEX.
+<p>
+The output of <code>dexopt</code> is byte-swapped and struct-aligned
+for the host, and contains indices and offsets that are highly VM-specific
+(both version-wise and platform-wise).  For this reason it's tricky to
+write a version of <code>dexopt</code> that runs on the desktop but
+generates output suitable for a particular device.  The safest way to
+invoke it is on the target device, or on an emulator for that device.
+
+
+<h2>Generated DEX</h2>
+
+<p>
+Some languages and frameworks rely on the ability to generate bytecode
+and execute it.  The rather heavy <code>dexopt</code> verification and
+optimization model doesn't work well with that.
+<p>
+We intend to support this in a future release, but the exact method is
+to be determined.  We may allow individual classes to be added or whole
+DEX files; may allow Java bytecode or Dalvik bytecode in instructions;
+may perform the usual set of optimizations, or use a separate interpreter
+that performs on-first-use optimizations directly on the bytecode (which
+won't be mapped read-only, since it's locally defined).
+
+<address>Copyright &copy; 2008 The Android Open Source Project</address>
+
+</body>
+</html>
commit	f6c387128427e121477c1b32ad35cdcaa5101ba3	[log] [tgz]
author	The Android Open Source Project <initial-contribution@android.com>	Tue Mar 03 19:28:47 2009 -0800
committer	The Android Open Source Project <initial-contribution@android.com>	Tue Mar 03 19:28:47 2009 -0800
tree	2aa25fa8c8c3a9caeecf98fd8ac4cd9b12717997
parent	f72d5de56a522ac3be03873bdde26f23a5eeeb3c [diff] [blame]