| Dalvik "mterp" README |
| |
| NOTE: Find rebuilding instructions at the bottom of this file. |
| |
| |
| ==== Overview ==== |
| |
| This is the source code for the Dalvik interpreter. The core of the |
| original version was implemented as a single C function, but to improve |
| performance we rewrote it in assembly. To make this and future assembly |
| ports easier and less error-prone, we used a modular approach that allows |
| development of platform-specific code one opcode at a time. |
| |
| The original all-in-one-function C version still exists as the "portable" |
| interpreter, and is generated using the same sources and tools that |
| generate the platform-specific versions. |
| |
| Every configuration has a "config-*" file that controls how the sources |
| are generated. The sources are written into the "out" directory, where |
| they are picked up by the Android build system. |
| |
| The best way to become familiar with the interpreter is to look at the |
| generated files in the "out" directory, such as out/InterpC-portstd.c, |
| rather than trying to look at the various component pieces in (say) |
| armv5te. |
| |
| |
| ==== Platform-specific source generation ==== |
| |
| The architecture-specific config files determine what goes into two |
| generated output files (InterpC-<arch>.c, InterpAsm-<arch>.S). The goal is |
| to make it easy to swap C and assembly sources during initial development |
| and testing, and to provide a way to use architecture-specific versions of |
| some operations (e.g. making use of PLD instructions on ARMv6 or avoiding |
| CLZ on ARMv4T). |
| |
| Depending on architecture, instruction-to-instruction transitions may |
| be done as either computed goto or jump table. In the computed goto |
| variant, each instruction handler is allocated a fixed-size area (e.g. 64 |
| byte). "Overflow" code is tacked on to the end. In the jump table variant, |
| all of the instructions handlers are contiguous and may be of any size. |
| The interpreter style is selected via the "handler-size" command (see below). |
| |
| When a C implementation for an instruction is desired, the assembly |
| version packs all local state into the Thread structure and passes |
| that to the C function. Updates to the state are pulled out of |
| "Thread" on return. |
| |
| The "arch" value should indicate an architecture family with common |
| programming characteristics, so "armv5te" would work for all ARMv5TE CPUs, |
| but might not be backward- or forward-compatible. (We *might* want to |
| specify the ABI model as well, e.g. "armv5te-eabi", but currently that adds |
| verbosity without value.) |
| |
| |
| ==== Config file format ==== |
| |
| The config files are parsed from top to bottom. Each line in the file |
| may be blank, hold a comment (line starts with '#'), or be a command. |
| |
| The commands are: |
| |
| handler-style <computed-goto|jump-table|all-c> |
| |
| Specify which style of interpreter to generate. In computed-goto, |
| each handler is allocated a fixed region, allowing transitions to |
| be done via table-start-address + (opcode * handler-size). With |
| jump-table style, handlers may be of any length, and the generated |
| table is an array of pointers to the handlers. The "all-c" style is |
| for the portable interpreter (which is implemented completely in C). |
| [Note: all-c is distinct from an "allstubs" configuration. In both |
| configurations, all handlers are the C versions, but the allstubs |
| configuration uses the assembly outer loop and assembly stubs to |
| transition to the handlers]. This command is required, and must be |
| the first command in the config file. |
| |
| handler-size <bytes> |
| |
| Specify the size of the fixed region, in bytes. On most platforms |
| this will need to be a power of 2. For jump-table and all-c |
| implementations, this command is ignored. |
| |
| import <filename> |
| |
| The specified file is included immediately, in its entirety. No |
| substitutions are performed. ".cpp" and ".h" files are copied to the |
| C output, ".S" files are copied to the asm output. |
| |
| asm-stub <filename> |
| |
| The named file will be included whenever an assembly "stub" is needed |
| to transfer control to a handler written in C. Text substitution is |
| performed on the opcode name. This command is not applicable to |
| to "all-c" configurations. |
| |
| asm-alt-stub <filename> |
| |
| When present, this command will cause the generation of an alternate |
| set of entry points (for computed-goto interpreters) or an alternate |
| jump table (for jump-table interpreters). |
| |
| op-start <directory> |
| |
| Indicates the start of the opcode list. Must precede any "op" |
| commands. The specified directory is the default location to pull |
| instruction files from. |
| |
| op <opcode> <directory> |
| |
| Can only appear after "op-start" and before "op-end". Overrides the |
| default source file location of the specified opcode. The opcode |
| definition will come from the specified file, e.g. "op OP_NOP armv5te" |
| will load from "armv5te/OP_NOP.S". A substitution dictionary will be |
| applied (see below). |
| |
| alt <opcode> <directory> |
| |
| Can only appear after "op-start" and before "op-end". Similar to the |
| "op" command above, but denotes a source file to override the entry |
| in the alternate handler table. The opcode definition will come from |
| the specified file, e.g. "alt OP_NOP armv5te" will load from |
| "armv5te/ALT_OP_NOP.S". A substitution dictionary will be applied |
| (see below). |
| |
| op-end |
| |
| Indicates the end of the opcode list. All kNumPackedOpcodes |
| opcodes are emitted when this is seen, followed by any code that |
| didn't fit inside the fixed-size instruction handler space. |
| |
| The order of "op" and "alt" directives are not significant; the generation |
| tool will extract ordering info from the VM sources. |
| |
| Typically the form in which most opcodes currently exist is used in |
| the "op-start" directive. For a new port you would start with "c", |
| and add architecture-specific "op" entries as you write instructions. |
| When complete it will default to the target architecture, and you insert |
| "c" ops to stub out platform-specific code. |
| |
| For the <directory> specified in the "op" command, the "c" directory |
| is special in two ways: (1) the sources are assumed to be C code, and |
| will be inserted into the generated C file; (2) when a C implementation |
| is emitted, a "glue stub" is emitted in the assembly source file. |
| (The generator script always emits kNumPackedOpcodes assembly |
| instructions, unless "asm-stub" was left blank, in which case it only |
| emits some labels.) |
| |
| |
| ==== Instruction file format ==== |
| |
| The assembly instruction files are simply fragments of assembly sources. |
| The starting label will be provided by the generation tool, as will |
| declarations for the segment type and alignment. The expected target |
| assembler is GNU "as", but others will work (may require fiddling with |
| some of the pseudo-ops emitted by the generation tool). |
| |
| The C files do a bunch of fancy things with macros in an attempt to share |
| code with the portable interpreter. (This is expected to be reduced in |
| the future.) |
| |
| A substitution dictionary is applied to all opcode fragments as they are |
| appended to the output. Substitutions can look like "$value" or "${value}". |
| |
| The dictionary always includes: |
| |
| $opcode - opcode name, e.g. "OP_NOP" |
| $opnum - opcode number, e.g. 0 for OP_NOP |
| $handler_size_bytes - max size of an instruction handler, in bytes |
| $handler_size_bits - max size of an instruction handler, log 2 |
| |
| Both C and assembly sources will be passed through the C pre-processor, |
| so you can take advantage of C-style comments and preprocessor directives |
| like "#define". |
| |
| Some generator operations are available. |
| |
| %include "filename" [subst-dict] |
| |
| Includes the file, which should look like "armv5te/OP_NOP.S". You can |
| specify values for the substitution dictionary, using standard Python |
| syntax. For example, this: |
| %include "armv5te/unop.S" {"result":"r1"} |
| would insert "armv5te/unop.S" at the current file position, replacing |
| occurrences of "$result" with "r1". |
| |
| %default <subst-dict> |
| |
| Specify default substitution dictionary values, using standard Python |
| syntax. Useful if you want to have a "base" version and variants. |
| |
| %break |
| |
| Identifies the split between the main portion of the instruction |
| handler (which must fit in "handler-size" bytes) and the "sister" |
| code, which is appended to the end of the instruction handler block. |
| In jump table implementations, %break is ignored. |
| |
| %verify "message" |
| |
| Leave a note to yourself about what needs to be tested. (This may |
| turn into something more interesting someday; for now, it just gets |
| stripped out before the output is generated.) |
| |
| The generation tool does *not* print a warning if your instructions |
| exceed "handler-size", but the VM will abort on startup if it detects an |
| oversized handler. On architectures with fixed-width instructions this |
| is easy to work with, on others this you will need to count bytes. |
| |
| |
| ==== Using C constants from assembly sources ==== |
| |
| The file "common/asm-constants.h" has some definitions for constant |
| values, structure sizes, and struct member offsets. The format is fairly |
| restricted, as simple macros are used to massage it for use with both C |
| (where it is verified) and assembly (where the definitions are used). |
| |
| If a constant in the file becomes out of sync, the VM will log an error |
| message and abort during startup. |
| |
| |
| ==== Development tips ==== |
| |
| If you need to debug the initial piece of an opcode handler, and your |
| debug code expands it beyond the handler size limit, you can insert a |
| generic header at the top: |
| |
| b ${opcode}_start |
| %break |
| ${opcode}_start: |
| |
| If you already have a %break, it's okay to leave it in place -- the second |
| %break is ignored. |
| |
| |
| ==== Rebuilding ==== |
| |
| If you change any of the source file fragments, you need to rebuild the |
| combined source files in the "out" directory. Make sure the files in |
| "out" are editable, then: |
| |
| $ cd mterp |
| $ ./rebuild.sh |
| |
| As of this writing, this requires Python 2.5. You may see inscrutible |
| error messages or just general failure if you have a different version |
| of Python installed. |
| |
| The ultimate goal is to have the build system generate the necessary |
| output files without requiring this separate step, but we're not yet |
| ready to require Python in the build. |
| |
| ==== Interpreter Control ==== |
| |
| The central mechanism for interpreter control is the InterpBreak struture |
| that is found in each thread's Thread struct (see vm/Thread.h). There |
| is one mandatory field, and two optional fields: |
| |
| subMode - required, describes debug/profile/special operation |
| breakFlags & curHandlerTable - optional, used lower subMode polling costs |
| |
| The subMode field is a bitmask which records all currently active |
| special modes of operation. For example, when Traceview profiling |
| is active, kSubModeMethodTrace is set. This bit informs the interpreter |
| that it must notify the profiling subsystem on each method entry and |
| return. There are similar bits for an active debugging session, |
| instruction count profiling, pending thread suspension request, etc. |
| |
| To support special subMode operation the simplest mechanism for the |
| interpreter is to poll the subMode field before interpreting each Dalvik |
| bytecode and take any required action. In fact, this is precisely |
| what the portable interpreter does. The "FINISH" macro expands to |
| include a test of subMode and subsequent call to the "dvmCheckBefore()". |
| |
| Per-instruction polling, however, is expensive and subMode operation is |
| relative rare. For normal operation we'd like to avoid having to perform |
| any checks unless a special subMode is actually in effect. This is |
| where curHandlerTable and breakFlags come in to play. |
| |
| The mterp fast interpreter achieves much of its performance advantage |
| over the portable interpreter through its efficient mechanism of |
| transitioning from one Dalvik bytecode to the next. Mterp for ARM targets |
| uses a computed-goto mechanism, in which the handler entrypoints are |
| located at the base of the handler table + (opcode * 64). Mterp for x86 |
| targets instead uses a jump table of handler entry points indexed |
| by the Dalvik opcode. To support efficient handling of special subModes, |
| mterp supports two sets of handler entries (for ARM) or two jump |
| tables (for x86). One handler set is optimized for speed and performs no |
| inter-instruction checks (mainHandlerTable in the Thread structure), while |
| the other includes a test of the subMode field (altHandlerTable). |
| |
| In normal operation (i.e. subMode == 0), the dedicated register rIBASE |
| (r8 for ARM, edx for x86) holds a mainHandlerTable. If we need to switch |
| to a subMode that requires inter-instruction checking, rIBASE is changed |
| to altHandlerTable. Note that this change is not immediate. What is actually |
| changed is the value of curHandlerTable - which is part of the interpBreak |
| structure. Rather than explicitly check for changes, each thread will |
| blindly refresh rIBASE at backward branches, exception throws and returns. |
| |
| The breakFlags field tells the interpreter control mechanism whether |
| curHandlerTable should hold the real or alternate handler base. If |
| non-zero, we use the altHandlerBase. The bits within breakFlags |
| tells dvmCheckBefore which set of subModes need to be checked. |
| |
| See dvmCheckBefore() for subMode handling, and dvmEnableSubMode(), |
| dvmDisableSubMode() for switching on and off. |