Marc R. Hoffmann | e571f3f | 2012-05-13 12:18:02 +0000 | [diff] [blame] | 1 | <?xml version="1.0" encoding="UTF-8" ?> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 2 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| 3 | <html xmlns="http://www.w3.org/1999/xhtml" lang="en"> |
| 4 | <head> |
Marc R. Hoffmann | e571f3f | 2012-05-13 12:18:02 +0000 | [diff] [blame] | 5 | <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 6 | <link rel="stylesheet" href="resources/doc.css" charset="UTF-8" type="text/css" /> |
Evgeny Mandrikov | 28d5985 | 2016-06-11 12:41:17 +0200 | [diff] [blame] | 7 | <link rel="stylesheet" href="../coverage/jacoco-resources/prettify.css" charset="UTF-8" type="text/css" /> |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 8 | <link rel="shortcut icon" href="resources/report.gif" type="image/gif" /> |
Evgeny Mandrikov | 28d5985 | 2016-06-11 12:41:17 +0200 | [diff] [blame] | 9 | <script type="text/javascript" src="../coverage/jacoco-resources/prettify.js"></script> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 10 | <title>JaCoCo - Control Flow Analysis</title> |
| 11 | </head> |
| 12 | <body onload="prettyPrint()"> |
| 13 | |
| 14 | <div class="breadcrumb"> |
| 15 | <a href="../index.html" class="el_report">JaCoCo</a> > |
| 16 | <a href="index.html" class="el_group">Documentation</a> > |
| 17 | <span class="el_source">Control Flow Analysis</span> |
| 18 | </div> |
| 19 | <div id="content"> |
| 20 | |
| 21 | <h1>Control Flow Analysis for Java Methods</h1> |
| 22 | |
| 23 | <p class="hint"> |
| 24 | Implementing a coverage tool that supports statement (C0) as well as branch |
| 25 | coverage coverage (C1) requires detailed analysis of the internal control flow |
| 26 | of Java methods. Due to the architecture of JaCoCo this analysis happens on |
| 27 | the bytecode of compiled class files. This document describes JaCoCo's |
| 28 | strategies for inserting probes into the control flow at runtime and analyzing |
| 29 | the actual code coverage. Marc R. Hoffmann, November 2011 |
| 30 | </p> |
| 31 | |
| 32 | <h2>Control Flow Graphs for Java Bytecode</h2> |
| 33 | |
| 34 | <p> |
| 35 | As an starting point we take the following example method that contains a |
| 36 | single branching point: |
| 37 | </p> |
| 38 | |
| 39 | <pre class="source lang-java linenums"> |
| 40 | public static void example() { |
| 41 | a(); |
| 42 | if (cond()) { |
| 43 | b(); |
| 44 | } else { |
| 45 | c(); |
| 46 | } |
| 47 | d(); |
| 48 | } |
| 49 | </pre> |
| 50 | |
| 51 | <p> |
| 52 | A Java compiler will create the following bytecode from this example method. |
| 53 | Java bytecode is a linear sequence of instructions. Control flow is |
| 54 | implemented with <i>jump</i> instructions like the conditional |
| 55 | <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump |
| 56 | targets are technically relative offsets to the target instruction. For better |
| 57 | readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead |
| 58 | (also the ASM API uses such symbolic labels): |
| 59 | </p> |
| 60 | |
| 61 | <pre class="source linenums"> |
| 62 | public static example()V |
| 63 | INVOKESTATIC a()V |
| 64 | INVOKESTATIC cond()Z |
| 65 | IFEQ L1 |
| 66 | INVOKESTATIC b()V |
| 67 | GOTO L2 |
| 68 | L1: INVOKESTATIC c()V |
| 69 | L2: INVOKESTATIC d()V |
| 70 | RETURN |
| 71 | </pre> |
| 72 | |
| 73 | <p> |
| 74 | The possible control flow in the bytecode above can be represented by a graph. |
| 75 | The nodes are byte code instruction, the edged of the graph represent the |
| 76 | possible control flow between the instructions. The control flow of the |
| 77 | example is shown in the left box of this diagram: |
| 78 | </p> |
| 79 | |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 80 | <img src="resources/flow-example.png" alt="Bytecode Control Flow"/> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 81 | |
| 82 | |
| 83 | <h3>Flow Edges</h3> |
| 84 | |
| 85 | <p> |
| 86 | The control flow graph of a Java method defined by Java byte code may have |
| 87 | the following Edges. Each edge connects a source instruction with a target |
| 88 | instruction. In some cases the source instruction or the target instruction |
| 89 | does not exist (virtual edges for method entry and exit) or cannot be |
| 90 | exactly specified (exception handlers). |
| 91 | </p> |
| 92 | |
| 93 | <table class="coverage"> |
| 94 | <thead> |
| 95 | <tr> |
| 96 | <td>Type</td> |
| 97 | <td>Source</td> |
| 98 | <td>Target</td> |
| 99 | <td>Remarks</td> |
| 100 | </tr> |
| 101 | </thead> |
| 102 | <tbody> |
| 103 | <tr> |
| 104 | <td>ENTRY</td> |
| 105 | <td>-</td> |
| 106 | <td>First instruction in method</td> |
| 107 | <td></td> |
| 108 | </tr> |
| 109 | <tr> |
| 110 | <td>SEQUENCE</td> |
| 111 | <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>, |
| 112 | <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td> |
| 113 | <td>Subsequent instruction</td> |
| 114 | <td></td> |
| 115 | </tr> |
| 116 | <tr> |
| 117 | <td>JUMP</td> |
| 118 | <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or |
| 119 | <code>LOOKUPSWITCH</code> instruction</td> |
| 120 | <td>Target instruction</td> |
| 121 | <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define |
| 122 | multiple edges.</td> |
| 123 | </tr> |
| 124 | <tr> |
| 125 | <td>EXHANDLER</td> |
| 126 | <td>Any instruction in handler scope</td> |
| 127 | <td>Target instruction</td> |
| 128 | <td></td> |
| 129 | </tr> |
| 130 | <tr> |
| 131 | <td>EXIT</td> |
| 132 | <td><code>xRETURN</code> or <code>THROW</code> instruction</td> |
| 133 | <td>-</td> |
| 134 | <td></td> |
| 135 | </tr> |
| 136 | <tr> |
| 137 | <td>EXEXIT</td> |
| 138 | <td>Any instruction</td> |
| 139 | <td>-</td> |
| 140 | <td>Unhandled exception.</td> |
| 141 | </tr> |
| 142 | </tbody> |
| 143 | </table> |
| 144 | |
| 145 | <p> |
| 146 | The current JaCoCo implementation ignores edges caused by implicit exceptions |
| 147 | and the the method entry. This means we consider SEQUENCE, JUMP, EXIT. |
| 148 | </p> |
| 149 | |
| 150 | |
| 151 | <h2>Probe Insertion Strategy</h2> |
| 152 | |
| 153 | <p> |
| 154 | Probes are additional instructions that can be inserted between existing |
| 155 | instructions. They do not change the behavior of the method but record the |
| 156 | fact that they have been executed. One can think probes are placed on edges of |
| 157 | the control flow graph. Theoretically we could insert a probe at every edge of |
| 158 | the control flow graph. As a probe implementation itself requires multiple |
| 159 | bytecode instructions this would increase the size of the class files several |
| 160 | times and significantly slow down execution speed of the instrumented classes. |
| 161 | Fortunately this is not required, in fact we only need a few probes per method |
| 162 | depending on the control flow of the method. For example a method without any |
| 163 | branches requires a single probe only. The reason for this is that starting |
| 164 | from a certain probe we can back-trace the execution path and typically get |
| 165 | coverage information for multiple instructions. |
| 166 | </p> |
| 167 | |
| 168 | <p> |
| 169 | If a probe has been executed we know that the corresponding edge has been |
| 170 | visited. From this edge we can conclude to other preceding nodes and edges: |
| 171 | </p> |
| 172 | |
| 173 | <ul> |
| 174 | <li>If a edge has been visited, we know that the source node of the this edge |
| 175 | has been executed.</li> |
| 176 | <li>If a node has been executed and the node is the target of only one edge |
| 177 | we know that this edge has been visited.</li> |
| 178 | </ul> |
| 179 | |
| 180 | <p> |
| 181 | Recursively applying these rules allows to determine the execution status of |
| 182 | all instructions of a method – given that we have probes at the right |
| 183 | positions. Therefore JaCoCo inserts probes |
| 184 | </p> |
| 185 | |
| 186 | <ul> |
| 187 | <li>at every method exit (return or throws) and</li> |
| 188 | <li>at every edge where the target instruction is the target of more than one |
| 189 | edge.</li> |
| 190 | </ul> |
| 191 | |
| 192 | <p> |
| 193 | We recall that a probe is simply a small sequence of additional instructions |
| 194 | that needs to be inserted at a control flow edge. The following table |
| 195 | illustrates how this extra instructions are added in case of different edge |
| 196 | types. |
| 197 | </p> |
| 198 | |
| 199 | <table class="coverage"> |
| 200 | <thead> |
| 201 | <tr> |
| 202 | <td>Type</td> |
| 203 | <td>Before</td> |
| 204 | <td>After</td> |
| 205 | <td>Remarks</td> |
| 206 | </tr> |
| 207 | </thead> |
| 208 | <tbody> |
| 209 | <tr> |
| 210 | <td>SEQUENCE</td> |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 211 | <td><img src="resources/flow-sequence.png" alt="Sequence"/></td> |
| 212 | <td><img src="resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 213 | <td> |
| 214 | In case of a simple sequence the probe is simply inserted between the |
| 215 | two instructions. |
| 216 | </td> |
| 217 | </tr> |
| 218 | <tr> |
| 219 | <td>JUMP (unconditional)</td> |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 220 | <td><img src="resources/flow-goto.png" alt="Unconditional Jump"/></td> |
| 221 | <td><img src="resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 222 | <td> |
| 223 | As an unconditional jump is executed in any case, we can also insert the |
| 224 | probe just before the GOTO instruction. |
| 225 | </td> |
| 226 | </tr> |
| 227 | <tr> |
| 228 | <td>JUMP (conditional)</td> |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 229 | <td><img src="resources/flow-cond.png" alt="Conditional Jump"/></td> |
| 230 | <td><img src="resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 231 | <td> |
Marc R. Hoffmann | 99f7409 | 2012-06-21 18:46:52 +0000 | [diff] [blame] | 232 | Adding a probe to an conditional jump is little bit more tricky. We |
| 233 | invert the semantic of the opcode and add the probe right after the |
| 234 | conditional jump. With a subsequent <code>GOTO</code> instruction we |
| 235 | jump to the original target. Note that this approach will not introduce |
| 236 | a backward jump, which would cause trouble with the Java verifier if we |
| 237 | have an uninitialized object on the stack. |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 238 | </td> |
| 239 | </tr> |
| 240 | <tr> |
| 241 | <td>EXIT</td> |
Evgeny Mandrikov | 8b21298 | 2016-06-12 17:55:49 +0200 | [diff] [blame] | 242 | <td><img src="resources/flow-exit.png" alt="Exit"/></td> |
| 243 | <td><img src="resources/flow-exit-probe.png" alt="Exit with Probe"/></td> |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 244 | <td> |
| 245 | As is is the nature of RETURN and THROW statements to actually leave the |
| 246 | method we add the probe right before these statements. |
| 247 | </td> |
| 248 | </tr> |
| 249 | </tbody> |
| 250 | </table> |
| 251 | |
| 252 | <p> |
| 253 | Now let's see how this rules apply to the example snippet above. We see that |
| 254 | <code>INVOKE d()</code> instruction is the only node with more than one |
| 255 | incoming edge. So we need to place probes on those edges and another probe on |
| 256 | the only exit node. The result is shown the the right box of the diagram |
| 257 | above. |
| 258 | </p> |
| 259 | |
Marc R. Hoffmann | 8731145 | 2015-02-16 09:09:25 +0100 | [diff] [blame] | 260 | <h2>Additional Probes Between Lines</h2> |
| 261 | |
| 262 | <p> |
| 263 | The probe insertion strategy described so far does not consider implicit |
| 264 | exceptions thrown for example from invoked methods. If the control flow |
| 265 | between two probes is interrupted by a exception not explicitly created with |
| 266 | a <code>throw</code> statement all instruction in between are considered as |
| 267 | not covered. This leads to unexpected results especially when the the block of |
| 268 | instructions spans multiple lines of source code. |
| 269 | </p> |
| 270 | |
| 271 | <p> |
| 272 | Therefore JaCoCo adds an additional probe between the instructions of two |
| 273 | lines whenever the subsequent line contains at least one method invocation. |
| 274 | This limits the effect of implicit exceptions from method invocations to |
| 275 | single lines of source. The approach only works for class files compiled with |
| 276 | debug information (line numbers) and does not consider implicit exceptions |
| 277 | from other instructions than method invocations (e.g. |
| 278 | <code>NullPointerException</code> or <code>ArrayIndexOutOfBoundsException</code>). |
| 279 | </p> |
| 280 | |
Evgeny Mandrikov | 82a92ca | 2012-01-15 20:25:48 +0000 | [diff] [blame] | 281 | <h2>Probe Implementation</h2> |
| 282 | |
| 283 | <p> |
| 284 | Code coverage analysis is a runtime metric that provides execution details |
| 285 | of the software under test. This requires detailed recording about the |
| 286 | instructions (instruction coverage) that have been executed. For branch |
| 287 | coverage also the outcome of decisions has to be recorded. In any case |
| 288 | execution data is collected by so called probes: |
| 289 | </p> |
| 290 | |
| 291 | <p class="hint"> |
| 292 | A <b>probe</b> is a sequence of bytecode instructions that can be inserted |
| 293 | into a Java method. When the probe is executed, this fact is recorded and can |
| 294 | be reported by the coverage runtime. The probe must not change the behavior |
| 295 | of the original code. |
| 296 | </p> |
| 297 | |
| 298 | <p> |
| 299 | The only purpose of the probe is to record that it has been executed at least |
| 300 | once. The probe does not record the number of times it has been called or |
| 301 | collect any timing information. The latter is out of scope for code coverage |
| 302 | analysis and more in the objective of a performance analysis tool. Typically |
| 303 | multiple probes needs to be inserted into each method, therefore probes needs |
| 304 | to be identified. Also the probe implementation and the storage mechanism it |
| 305 | depends on needs to be thread safe as multi-threaded execution is a common |
| 306 | scenario for java applications (albeit not for plain unit tests). Probes must |
| 307 | not have any side effects on the original code of the method. Also they should |
| 308 | add minimal overhead. |
| 309 | </p> |
| 310 | |
| 311 | <p> |
| 312 | So to summarize the requirements for execution probes: |
| 313 | </p> |
| 314 | |
| 315 | <ul> |
| 316 | <li>Record execution</li> |
| 317 | <li>Identification for different probes</li> |
| 318 | <li>Thread safe</li> |
| 319 | <li>No side effects on application code</li> |
| 320 | <li>Minimal runtime overhead</li> |
| 321 | </ul> |
| 322 | |
| 323 | <p> |
| 324 | JaCoCo implements probes with a <code>boolean[]</code> array instance per |
| 325 | class. Each probe corresponds to a entry in this array. Whenever the probe is |
| 326 | executed the entry is set to <code>true</code> with the following four |
| 327 | bytecode instructions: |
| 328 | </p> |
| 329 | |
| 330 | <pre class="source"> |
| 331 | ALOAD probearray |
| 332 | xPUSH probeid |
| 333 | ICONST_1 |
| 334 | BASTORE |
| 335 | </pre> |
| 336 | |
| 337 | <p> |
| 338 | Note that this probe code is thread safe, does not modify the operand stack |
| 339 | or modify local variables and is also thread safe. It does also not leave the |
| 340 | method though an external call. The only prerequisite is that the probe array |
| 341 | is available as a local variable. For this at the beginning of each method |
| 342 | additional instrumentation code needs to be added to obtain the array instance |
| 343 | associated with the belonging class. To avoid code duplication the |
| 344 | initialization is delegated to a static private method |
| 345 | <code>$jacocoinit()</code> which is added to every non-interface class. |
| 346 | </p> |
| 347 | |
| 348 | <p> |
| 349 | The size of the probe code above depends on the position of the probe array |
| 350 | variable and the value of the probe identifier as different opcodes can be |
| 351 | used. As calculated in the table below the overhead per probe ranges between 4 |
| 352 | and 7 bytes of additional bytecode: |
| 353 | </p> |
| 354 | |
| 355 | <table class="coverage"> |
| 356 | <thead> |
| 357 | <tr> |
| 358 | <td>Possible Opcodes</td> |
| 359 | <td>Min. Size [bytes]</td> |
| 360 | <td>Max. Size [bytes]</td> |
| 361 | </tr> |
| 362 | </thead> |
| 363 | <tfoot> |
| 364 | <tr> |
| 365 | <td>Total:</td> |
| 366 | <td>4</td> |
| 367 | <td>7</td> |
| 368 | </tr> |
| 369 | </tfoot> |
| 370 | <tbody> |
| 371 | <tr> |
| 372 | <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td> |
| 373 | <td>1</td> |
| 374 | <td>2</td> |
| 375 | </tr> |
| 376 | <tr> |
| 377 | <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td> |
| 378 | <td>1</td> |
| 379 | <td>3</td> |
| 380 | </tr> |
| 381 | <tr> |
| 382 | <td><code>ICONST_1</code></td> |
| 383 | <td>1</td> |
| 384 | <td>1</td> |
| 385 | </tr> |
| 386 | <tr> |
| 387 | <td><code>BASTORE</code></td> |
| 388 | <td>1</td> |
| 389 | <td>1</td> |
| 390 | </tr> |
| 391 | </tbody> |
| 392 | </table> |
| 393 | |
| 394 | <p> |
| 395 | <sup>1</sup> The probe array is the first variable after the arguments. |
| 396 | If the method arguments do not consume more that 3 slots the 1-byte opcode can |
| 397 | be used.<br/> |
| 398 | <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127, |
| 399 | 3-byte opcode for ids up to 32767. Ids values of 32768 or more require an |
| 400 | additional constant pool entry. For normal class files it is very unlikely to |
| 401 | require more than 32,000 probes. |
| 402 | </p> |
| 403 | |
| 404 | <h2>Performance</h2> |
| 405 | |
| 406 | <p> |
| 407 | The control flow analysis and probe insertion strategy described in this |
| 408 | document allows to efficiently record instruction and branch coverage. In |
| 409 | total classes instrumented with JaCoCo increase their size by about 30%. Due |
| 410 | to the fact that probe execution does not require any method calls, only local |
| 411 | instructions, the observed execution time overhead for instrumented |
| 412 | applications typically is less than 10%. |
| 413 | </p> |
| 414 | |
| 415 | <h2>References</h2> |
| 416 | |
| 417 | <ul> |
| 418 | <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li> |
| 419 | <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li> |
| 420 | <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li> |
| 421 | </ul> |
| 422 | |
| 423 | </div> |
| 424 | <div class="footer"> |
| 425 | <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div> |
| 426 | <a href="license.html">Copyright</a> © @copyright.years@ Mountainminds GmbH & Co. KG and Contributors |
| 427 | </div> |
| 428 | |
| 429 | </body> |
| 430 | </html> |