Andy McFadden | 6b45cff | 2009-05-12 15:26:25 -0700 | [diff] [blame] | 1 | <html> |
| 2 | <head> |
| 3 | <title>Dalvik Porting Guide</title> |
| 4 | </head> |
| 5 | |
| 6 | <body> |
| 7 | <h1>Dalvik Porting Guide</h1> |
| 8 | |
| 9 | <p> |
| 10 | The Dalvik virtual machine is intended to run on a variety of platforms. |
| 11 | The baseline system is expected to be a variant of UNIX (Linux, BSD, Mac |
| 12 | OS X) running the GNU C compiler. Little-endian CPUs have been exercised |
| 13 | the most heavily, but big-endian systems are explicitly supported. |
| 14 | </p><p> |
| 15 | There are two general categories of work: porting to a Linux system |
| 16 | with a previously unseen CPU architecture, and porting to a different |
| 17 | operating system. This document covers the former. |
| 18 | </p><p> |
| 19 | Basic familiarity with the Android platform, source code structure, and |
| 20 | build system is assumed. |
| 21 | </p> |
| 22 | |
| 23 | |
| 24 | <h2>Core Libraries</h2> |
| 25 | |
| 26 | <p> |
| 27 | The native code in the core libraries (chiefly <code>dalvik/libcore</code>, |
| 28 | but also <code>dalvik/vm/native</code>) is written in C/C++ and is expected |
| 29 | to work without modification in a Linux environment. Much of the code |
| 30 | comes directly from the Apache Harmony project. |
| 31 | </p><p> |
| 32 | The core libraries pull in code from many other projects, including |
| 33 | OpenSSL, zlib, and ICU. These will also need to be ported before the VM |
| 34 | can be used. |
| 35 | </p> |
| 36 | |
| 37 | |
| 38 | <h2>JNI Call Bridge</h2> |
| 39 | |
| 40 | <p> |
| 41 | Most of the Dalvik VM runtime is written in portable C. The one |
| 42 | non-portable component of the runtime is the JNI call bridge. Simply put, |
| 43 | this converts an array of integers into function arguments of various |
| 44 | types, and calls a function. This must be done according to the C calling |
| 45 | conventions for the platform. The task could be as simple as pushing all |
| 46 | of the arguments onto the stack, or involve complex rules for register |
| 47 | assignment and stack alignment. |
| 48 | </p><p> |
| 49 | To ease porting to new platforms, the <a href="http://sourceware.org/libffi/"> |
| 50 | open-source FFI library</a> (Foreign Function Interface) is used when a |
| 51 | custom bridge is unavailable. FFI is not as fast as a native implementation, |
| 52 | and the optional performance improvements it does offer are not used, so |
| 53 | writing a replacement is a good first step. |
| 54 | </p><p> |
| 55 | The code lives in <code>dalvik/vm/arch/*</code>, with the FFI-based version |
| 56 | in the "generic" directory. There are two source files for each architecture. |
| 57 | One defines the call bridge itself: |
| 58 | </p><p><blockquote> |
| 59 | <code>void dvmPlatformInvoke(void* pEnv, ClassObject* clazz, int argInfo, |
| 60 | int argc, const u4* argv, const char* signature, void* func, |
| 61 | JValue* pReturn)</code> |
| 62 | </blockquote></p><p> |
| 63 | This will invoke a C/C++ function declared: |
| 64 | </p><p><blockquote> |
| 65 | <code>return_type func(JNIEnv* pEnv, Object* this [, <i>args</i>])</code> |
| 66 | </blockquote>or (for a "static" method):<blockquote> |
| 67 | <code>return_type func(JNIEnv* pEnv, ClassObject* clazz [, <i>args</i>])</code> |
| 68 | </blockquote></p><p> |
| 69 | The role of <code>dvmPlatformInvoke</code> is to convert the values in |
| 70 | <code>argv</code> into C-style calling conventions, call the method, and |
| 71 | then place the return value into <code>pReturn</code> (a union that holds |
| 72 | all of the basic JNI types). The code may use the method signature |
| 73 | (a DEX "shorty" signature, with one character for the return type and one |
| 74 | per argument) to determine how to handle the values. |
| 75 | </p><p> |
| 76 | The other source file involved here defines a 32-bit "hint". The hint |
| 77 | is computed when the method's class is loaded, and passed in as the |
| 78 | "argInfo" argument. The hint can be used to avoid scanning the ASCII |
| 79 | method signature for things like the return value, total argument size, |
| 80 | or inter-argument 64-bit alignment restrictions. |
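</p><p>
To make the idea of a hint concrete, here is a minimal sketch of how one
might be computed from a "shorty" signature. The bit layout, and the names
<code>compute_hint</code>, <code>hint_return_type</code>, and
<code>hint_arg_words</code>, are assumptions made up for this example; they
are not the layout used by any of the existing architectures. The sketch
counts only the words taken from <code>argv</code>, not the
<code>JNIEnv*</code> or <code>this</code>/<code>clazz</code> arguments.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hint layout, for illustration only:
 *   bits  0..15  total argument size, in 32-bit words
 *   bits 24..31  return type character from the shorty signature
 */
static uint32_t compute_hint(const char* shorty)
{
    assert(shorty != NULL && shorty[0] != '\0');

    uint32_t words = 0;
    /* shorty[0] is the return type; the arguments start at shorty[1]. */
    for (const char* p = shorty + 1; *p != '\0'; p++) {
        /* long (J) and double (D) take two 32-bit slots, the rest take one. */
        words += (*p == 'J' || *p == 'D') ? 2 : 1;
    }
    return ((uint32_t)(unsigned char)shorty[0] << 24) | (words & 0xffff);
}

/* Pull the return type character back out of the hint. */
static char hint_return_type(uint32_t hint)
{
    return (char)(hint >> 24);
}

/* Pull the argument word count back out of the hint. */
static uint32_t hint_arg_words(uint32_t hint)
{
    return hint & 0xffff;
}
```

<p>
With this layout, a bridge could read the return type and argument size
from <code>argInfo</code> with a shift and a mask instead of walking the
signature string on every call. A real hint would likely also encode the
alignment information your calling convention needs.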
| 81 | </p> |
| 82 | |
| 83 | <h2>Interpreter</h2> |
| 84 | |
| 85 | <p> |
| 86 | The Dalvik runtime includes two interpreters, labeled "portable" and "fast". |
| 87 | The portable interpreter is largely contained within a single C function, |
| 88 | and should compile on any system that supports gcc. (If you don't have gcc, |
| 89 | you may need to disable the "threaded" execution model, which relies on |
| 90 | gcc's "goto table" implementation; look for the THREADED_INTERP define.) |
| 91 | </p><p> |
| 92 | The fast interpreter uses hand-coded assembly fragments. If none are |
| 93 | available for the current architecture, the build system will create an |
| 94 | interpreter out of C "stubs". The resulting "all stubs" interpreter is |
| 95 | quite a bit slower than the portable interpreter, making "fast" something |
| 96 | of a misnomer. |
| 97 | </p><p> |
| 98 | The fast interpreter is enabled by default. On platforms without native |
| 99 | support, you may want to switch to the portable interpreter. This can |
| 100 | be controlled with the <code>dalvik.vm.execution-mode</code> system |
| 101 | property. For example, if you run: |
| 102 | </p><p><blockquote> |
| 103 | <code>adb shell "echo dalvik.vm.execution-mode = int:portable >> /data/local.prop"</code> |
| 104 | </blockquote></p><p> |
| 105 | and reboot, the Android app framework will start the VM with the portable |
| 106 | interpreter enabled. |
| 107 | </p> |
| 108 | |
| 109 | |
| 110 | <h3>Mterp Interpreter Structure</h3> |
| 111 | |
| 112 | <p> |
| 113 | There may be significant performance advantages to rewriting the |
| 114 | interpreter core in assembly language, using architecture-specific |
| 115 | optimizations. In Dalvik this can be done one instruction at a time. |
| 116 | </p><p> |
| 117 | The simplest way to implement an interpreter is to have a large "switch" |
| 118 | statement. After each instruction is handled, the interpreter returns to |
| 119 | the top of the loop, fetches the next instruction, and jumps to the |
| 120 | appropriate label. |
| 121 | </p><p> |
| 122 | An improvement on this is called "threaded" execution. The instruction |
| 123 | fetch and dispatch are included at the end of every instruction handler. |
| 124 | This makes the interpreter a little larger overall, but you get to avoid |
| 125 | the (potentially expensive) branch back to the top of the switch statement. |
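</p><p>
The difference between the two approaches is easier to see in code. The
following is a toy sketch, not Dalvik code: the three opcodes, the
accumulator, and the function names <code>run_switch</code> and
<code>run_threaded</code> are invented for this example. The threaded
version relies on gcc's "labels as values" extension, which Clang also
accepts.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Toy opcodes for illustration; these are not real Dalvik opcodes. */
enum { OP_HALT = 0, OP_INC = 1, OP_DOUBLE = 2 };

/* Switch dispatch: every handler branches back to the top of the loop. */
static int run_switch(const uint8_t* code)
{
    int acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        assert(!"unknown opcode"); return -1;
        }
    }
}

/*
 * Threaded dispatch: the fetch-and-jump sequence is repeated at the end
 * of each handler, so there is no shared loop to return to.
 */
static int run_threaded(const uint8_t* code)
{
    /* Table of handler addresses, indexed by opcode (GCC extension). */
    static void* const handlers[] = { &&op_halt, &&op_inc, &&op_double };
    int acc = 0;

#define DISPATCH() goto *handlers[*code++]
    DISPATCH();
op_inc:
    acc += 1;
    DISPATCH();
op_double:
    acc *= 2;
    DISPATCH();
op_halt:
    return acc;
#undef DISPATCH
}
```

<p>
Both functions compute the same result for the same program. Note that the
toy threaded version does no bounds checking on the opcode, so it must only
be fed the three opcodes defined above.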
| 126 | </p><p> |
| 127 | Dalvik mterp goes one step further, using a computed goto instead of a goto |
| 128 | table. Instead of looking up the address in a table, which requires an |
| 129 | extra memory fetch on every instruction, mterp multiplies the opcode number |
| 130 | by a fixed value. By default, each handler is allowed 64 bytes of space. |
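</p><p>
As a rough sketch of the address arithmetic, the two lookups might be
compared as follows. The names <code>HANDLER_SIZE</code>,
<code>lookup_handler</code>, and <code>compute_handler</code> are
hypothetical, and in practice this computation is done in assembly as part
of each handler rather than in a C function.
</p>

```c
#include <assert.h>
#include <stdint.h>

#define HANDLER_SIZE 64u   /* bytes reserved per handler (the default) */

/* Goto-table style: one extra memory load to find the handler address. */
static uintptr_t lookup_handler(const uintptr_t* table, uint8_t opcode)
{
    return table[opcode];
}

/*
 * Computed style: handlers sit at fixed intervals from a base address,
 * so the address is pure arithmetic. Multiplying by 64 is a shift by 6.
 */
static uintptr_t compute_handler(uintptr_t base, uint8_t opcode)
{
    /* The scheme only works if the first handler is itself aligned. */
    assert(base % HANDLER_SIZE == 0);
    return base + (uintptr_t)opcode * HANDLER_SIZE;
}
```

<p>
For a base address of <code>0x1000</code>, opcode 3 lands at
<code>0x1000 + 3 * 64 = 0x10c0</code>, with no table access required.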
| 131 | </p><p> |
| 132 | Not all handlers fit in 64 bytes. Those that don't can have subroutines |
| 133 | or simply continue on to additional code outside the basic space. Some of |
| 134 | this is handled automatically by Dalvik, but there's no portable way to detect |
| 135 | overflow of a 64-byte handler until the VM starts executing. |
| 136 | </p><p> |
| 137 | The choice of 64 bytes is somewhat arbitrary, but has worked out well for |
| 138 | ARM and x86. |
| 139 | </p><p> |
| 140 | In the course of development it's useful to have C and assembly |
| 141 | implementations of each handler, and be able to flip back and forth |
| 142 | between them when hunting problems down. In mterp this is relatively |
| 143 | straightforward. You can always see the files being fed to the compiler |
| 144 | and assembler for your platform by looking in the |
| 145 | <code>dalvik/vm/mterp/out</code> directory. |
| 146 | </p><p> |
| 147 | The interpreter sources live in <code>dalvik/vm/mterp</code>. If you |
| 148 | haven't yet, you should read <code>dalvik/vm/mterp/README.txt</code> now. |
| 149 | </p> |
| 150 | |
| 151 | |
| 152 | <h3>Getting Started With Mterp</h3> |
| 153 | |
| 154 | <p> |
| 155 | Getting started: |
| 156 | <ol> |
| 157 | <li>Decide on the name of your architecture. For the sake of discussion, |
| 158 | let's call it <code>myarch</code>. |
| 159 | <li>Make a copy of <code>dalvik/vm/mterp/config-allstubs</code> to |
| 160 | <code>dalvik/vm/mterp/config-myarch</code>. |
| 161 | <li>Create a <code>dalvik/vm/mterp/myarch</code> directory to hold your |
| 162 | source files. |
| 163 | <li>Add <code>myarch</code> to the list in |
| 164 | <code>dalvik/vm/mterp/rebuild.sh</code>. |
| 165 | <li>Make sure <code>dalvik/vm/Android.mk</code> will find the files for |
| 166 | your architecture. If <code>$(TARGET_ARCH)</code> is configured this |
| 167 | will happen automatically. |
| 168 | </ol> |
| 169 | </p><p> |
| 170 | You now have the basic framework in place. Whenever you make a change, you |
| 171 | need to perform two steps: regenerate the mterp output, and build the |
| 172 | core VM library. (It's two steps because we didn't want the build system |
| 173 | to require Python 2.5, which, incidentally, you still need to have installed.) |
| 174 | <ol> |
| 175 | <li>In the <code>dalvik/vm/mterp</code> directory, regenerate the contents |
| 176 | of the files in <code>dalvik/vm/mterp/out</code> by executing |
| 177 | <code>./rebuild.sh</code>. Note there are two files, one in C and one |
| 178 | in assembly. |
| 179 | <li>In the <code>dalvik</code> directory, regenerate the |
| 180 | <code>libdvm.so</code> library with <code>mm</code>. You can also use |
| 181 | <code>make libdvm</code> from the top of the tree. |
| 182 | </ol> |
| 183 | </p><p> |
| 184 | This will leave you with an updated libdvm.so, which can be pushed out to |
| 185 | a device with <code>adb sync</code> or <code>adb push</code>. If you're |
| 186 | using the emulator, you also need to run <code>make snod</code> (System image, |
| 187 | NO Dependency check) to rebuild the system image file. You should not |
| 188 | need to do a top-level "make" and rebuild the dependent binaries. |
| 189 | </p><p> |
| 190 | At this point you have an "all stubs" interpreter. You can see how it |
| 191 | works by examining <code>dalvik/vm/mterp/cstubs/entry.c</code>. The |
| 192 | code runs in a loop, pulling out the next opcode, and invoking the |
| 193 | handler through a function pointer. Each handler takes a "glue" argument |
| 194 | that contains all of the useful state. |
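</p><p>
The shape of that loop can be sketched as follows. This is a simplified
stand-in, not the contents of <code>entry.c</code>: the <code>Glue</code>
fields, the two toy handlers, and <code>run_stubs</code> are all
assumptions made for the example. The one detail taken from the DEX format
is that the opcode sits in the low 8 bits of a 16-bit code unit.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "glue" struct; the real one holds more. */
typedef struct Glue {
    const uint16_t* pc;   /* next 16-bit code unit to execute */
    int acc;              /* toy state, not present in the real VM */
    int done;             /* set by a handler to stop the loop */
} Glue;

/* Every handler has the same signature and receives the glue pointer. */
typedef void (*Handler)(Glue* glue);

static void op_inc(Glue* glue)  { glue->acc += 1; glue->pc += 1; }
static void op_halt(Glue* glue) { glue->done = 1; }

/* Handler table indexed by opcode: 0 = halt, 1 = increment. */
static const Handler kHandlers[] = { op_halt, op_inc };

static int run_stubs(const uint16_t* code)
{
    Glue glue = { code, 0, 0 };
    while (!glue.done) {
        /* The opcode lives in the low 8 bits of the code unit. */
        uint8_t opcode = (uint8_t)(*glue.pc & 0xff);
        assert(opcode < sizeof(kHandlers) / sizeof(kHandlers[0]));
        /* Call through the function pointer, then come back for more. */
        kHandlers[opcode](&glue);
    }
    return glue.acc;
}
```

<p>
Each handler returns to the loop after doing its work, which is the
overhead that the assembly handlers avoid by jumping straight to the next
handler.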
| 195 | </p><p> |
| 196 | Your goal is to replace the entry method, exit method, and each individual |
| 197 | instruction with custom implementations. The first thing you need to do |
| 198 | is create an entry function that calls the handler for the first instruction. |
| 199 | After that, the instructions chain together, so you don't need a loop. |
| 200 | (Look at the ARM or x86 implementation to see how they work.) |
| 201 | </p><p> |
| 202 | Once you have that, you need something to jump to. You can't branch |
| 203 | directly to the C stub because it's expecting to be called with a "glue" |
| 204 | argument and then return. We need a C stub "wrapper" that does the |
| 205 | setup and jumps directly to the next handler. We write this in assembly |
| 206 | and then add it to the config file definition. |
| 207 | </p><p> |
| 208 | To see how this works, create a file called |
| 209 | <code>dalvik/vm/mterp/myarch/stub.S</code> that contains one line: |
| 210 | <pre> |
| 211 | /* stub for ${opcode} */ |
| 212 | </pre> |
| 213 | Then, in <code>dalvik/vm/mterp/config-myarch</code>, add this below the |
| 214 | <code>handler-size</code> directive: |
| 215 | <pre> |
| 216 | # source for the instruction table stub |
| 217 | asm-stub myarch/stub.S |
| 218 | </pre> |
| 219 | </p><p> |
| 220 | Regenerate the sources with <code>./rebuild.sh</code>, and take a look |
| 221 | inside <code>dalvik/vm/mterp/out/InterpAsm-myarch.S</code>. You should |
| 222 | see 256 copies of the stub function in a single large block after the |
| 223 | <code>dvmAsmInstructionStart</code> label. The <code>stub.S</code> |
| 224 | code will be used anywhere you don't provide an assembly implementation. |
| 225 | </p><p> |
| 226 | Note that each block begins with a <code>.balign 64</code> directive. |
| 227 | This is what pads each handler out to 64 bytes. Note also that the |
| 228 | <code>${opcode}</code> text changed into an opcode name, which should |
| 229 | be used to call the C implementation (<code>dvmMterp_${opcode}</code>). |
| 230 | </p><p> |
| 231 | The actual contents of <code>stub.S</code> are up to you to define. |
| 232 | See <code>entry.S</code> and <code>stub.S</code> in the <code>armv5te</code> |
| 233 | or <code>x86</code> directories for working examples. |
| 234 | </p><p> |
| 235 | If you're working on a variation of an existing architecture, you may be |
| 236 | able to use most of the existing code and just provide replacements for |
| 237 | a few instructions. Look at the <code>armv4t</code> implementation as |
| 238 | an example. |
| 239 | </p> |
| 240 | |
| 241 | |
| 242 | <h3>Replacing Stubs</h3> |
| 243 | |
| 244 | <p> |
| 245 | There are roughly 230 Dalvik opcodes, including some that are inserted by |
| 246 | <a href="dexopt.html">dexopt</a> and aren't described in the |
| 247 | <a href="dalvik-bytecode.html">Dalvik bytecode</a> documentation. Each |
| 248 | one must perform the appropriate actions, fetch the next opcode, and |
| 249 | branch to the next handler. The actions performed by the assembly version |
| 250 | must exactly match those performed by the C version (in |
| 251 | <code>dalvik/vm/mterp/c/OP_*</code>). |
| 252 | </p><p> |
| 253 | It is possible to customize the set of "optimized" instructions for your |
| 254 | platform. This is possible because optimized DEX files are not expected |
| 255 | to work on multiple devices. Adding, removing, or redefining instructions |
| 256 | is beyond the scope of this document, and for simplicity it's best to stick |
| 257 | with the basic set defined by the portable interpreter. |
| 258 | </p><p> |
| 259 | Once you have written a handler that looks like it should work, add |
| 260 | it to the config file. For example, suppose we have a working version |
| 261 | of <code>OP_NOP</code>. For demonstration purposes, fake it for now by |
| 262 | putting this into <code>dalvik/vm/mterp/myarch/OP_NOP.S</code>: |
| 263 | <pre> |
| 264 | /* This is my NOP handler */ |
| 265 | </pre> |
| 266 | </p><p> |
| 267 | Then, in the <code>op-start</code> section of <code>config-myarch</code>, add: |
| 268 | <pre> |
| 269 | op OP_NOP myarch |
| 270 | </pre> |
| 271 | </p><p> |
| 272 | This tells the generation script to use the assembly version from the |
| 273 | <code>myarch</code> directory instead of the C version from the <code>c</code> |
| 274 | directory. |
| 275 | </p><p> |
| 276 | Execute <code>./rebuild.sh</code>. Look at <code>InterpAsm-myarch.S</code> |
| 277 | and <code>InterpC-myarch.c</code> in the <code>out</code> directory. You |
| 278 | will see that the <code>OP_NOP</code> stub wrapper has been replaced with our |
| 279 | new code in the assembly file, and the C stub implementation is no longer |
| 280 | included. |
| 281 | </p><p> |
| 282 | As you implement instructions, the C version and corresponding stub wrapper |
| 283 | will disappear from the output files. Eventually you will have a 100% |
| 284 | assembly interpreter. You may find it saves a little time to examine |
| 285 | the output of your compiler for some of the operations. The |
| 286 | <a href="porting-proto.c.txt">porting-proto.c</a> sample code can be |
| 287 | helpful here. |
| 288 | </p> |
| 289 | |
| 290 | |
| 291 | <h3>Interpreter Switching</h3> |
| 292 | |
| 293 | <p> |
| 294 | The Dalvik VM actually includes a third interpreter implementation: the debug |
| 295 | interpreter. This is a variation of the portable interpreter that includes |
| 296 | support for debugging and profiling. |
| 297 | </p><p> |
| 298 | When a debugger attaches, or a profiling feature is enabled, the VM |
| 299 | will switch interpreters at a convenient point. This is done at the |
| 300 | same time as the GC safe point check: on a backward branch, a method |
| 301 | return, or an exception throw. Similarly, when the debugger detaches |
| 302 | or profiling is discontinued, execution transfers back to the "fast" or |
| 303 | "portable" interpreter. |
| 304 | </p><p> |
| 305 | Your entry function needs to test the "entryPoint" value in the "glue" |
| 306 | pointer to determine where execution should begin. Your exit function |
| 307 | will need to return a boolean that indicates whether the interpreter is |
| 308 | exiting (because we reached the "bottom" of a thread stack) or wants to |
| 309 | switch to the other implementation. |
| 310 | </p><p> |
| 311 | See the <code>entry.S</code> file in <code>x86</code> or <code>armv5te</code> |
| 312 | for examples. |
| 313 | </p> |
| 314 | |
| 315 | |
| 316 | <h3>Testing</h3> |
| 317 | |
| 318 | <p> |
| 319 | A number of VM tests can be found in <code>dalvik/tests</code>. The most |
| 320 | useful during interpreter development is <code>003-omnibus-opcodes</code>, |
| 321 | which tests many different instructions. |
| 322 | </p><p> |
| 323 | The basic invocation is: |
| 324 | <pre> |
| 325 | $ cd dalvik/tests |
| 326 | $ ./run-test 003 |
| 327 | </pre> |
| 328 | </p><p> |
| 329 | This will run test 003 on an attached device or emulator. You can run |
| 330 | the test against your desktop VM by specifying <code>--reference</code> |
| 331 | if you suspect the test may be faulty. You can also use |
| 332 | <code>--portable</code> and <code>--fast</code> to explicitly specify |
| 333 | one Dalvik interpreter or the other. |
| 334 | </p><p> |
| 335 | Some instructions are replaced by <code>dexopt</code>, notably when |
| 336 | "quickening" field accesses and method invocations. To ensure |
| 337 | that you are testing the basic form of the instruction, add the |
| 338 | <code>--no-optimize</code> option. |
| 339 | </p><p> |
| 340 | There is no in-built instruction tracing mechanism. If you want |
| 341 | to know for sure that your implementation of an opcode handler |
| 342 | is being used, the easiest approach is to insert a "printf" |
| 343 | call. For an example, look at <code>common_squeak</code> in |
| 344 | <code>dalvik/vm/mterp/armv5te/footer.S</code>. |
| 345 | </p><p> |
| 346 | At some point you need to ensure that debuggers and profiling work with |
| 347 | your interpreter. The easiest way to do this is to simply connect a |
| 348 | debugger or toggle profiling. (A future test suite may include some |
| 349 | tests for this.) |
| 350 | </p> |
| 351 | |
| 352 | <p> |
| 353 | <address>Copyright © 2009 The Android Open Source Project</address> |
| 354 | |
| 355 | </body> |
| 356 | </html> |