blob: 7f294fb429b518e4df2a290c5dfc66434a237b47 [file] [log] [blame]
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +00001<?xml version="1.0" encoding="ISO-8859-1" ?>
2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
4<head>
5 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
6 <link rel="stylesheet" href=".resources/doc.css" charset="ISO-8859-1" type="text/css" />
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +00007 <link rel="stylesheet" href="../coverage/.resources/prettify.css" charset="ISO-8859-1" type="text/css" />
Marc R. Hoffmannd7d2f752010-05-06 21:12:31 +00008 <link rel="shortcut icon" href=".resources/report.gif" type="image/gif" />
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +00009 <script type="text/javascript" src="../coverage/.resources/prettify.js"></script>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000010 <title>JaCoCo - Control Flow Analysis</title>
11</head>
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +000012<body onload="prettyPrint()">
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000013
14<div class="breadcrumb">
Marc R. Hoffmannd7d2f752010-05-06 21:12:31 +000015 <a href="../index.html" class="el_report">JaCoCo</a> &gt;
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000016 <a href="index.html" class="el_group">Documentation</a> &gt;
17 <span class="el_source">Control Flow Analysis</span>
18</div>
19<div id="content">
20
21<h1>Control Flow Analysis for Java Methods</h1>
22
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000023<p class="hint">
Marc R. Hoffmannb6372422010-11-14 17:30:00 +000024 Implementing a coverage tool that supports statement (C0) as well as branch
25 coverage coverage (C1) requires detailed analysis of the internal control flow
26 of Java methods. Due to the architecture of JaCoCo this analysis happens on
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +000027 the bytecode of compiled class files. This document describes JaCoCo's
28 strategies for inserting probes into the control flow at runtime and analyzing
29 the actual code coverage. Marc R. Hoffmann, November 2011
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000030</p>
31
Marc R. Hoffmannb6372422010-11-14 17:30:00 +000032<h2>Control Flow Graphs for Java Bytecode</h2>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000033
Marc R. Hoffmannb6372422010-11-14 17:30:00 +000034<p>
35 As an starting point we take the following example method that contains a
36 single branching point:
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +000037</p>
38
Marc R. Hoffmann0fd9c832011-03-16 20:42:40 +000039<pre class="source lang-java linenums">
40public static void example() {
41 a();
42 if (cond()) {
43 b();
44 } else {
45 c();
46 }
47 d();
48}
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +000049</pre>
50
Marc R. Hoffmannb6372422010-11-14 17:30:00 +000051<p>
52 A Java compiler will create the following bytecode from this example method.
53 Java bytecode is a linear sequence of instructions. Control flow is
54 implemented with <i>jump</i> instructions like the conditional
55 <code>IFEQ</code> or the unconditional <code>GOTO</code> opcode. The jump
56 targets are technically relative offsets to the target instruction. For better
57 readability we use symbolic labels (<code>L1</code>, <code>L2</code>) instead
58 (also the ASM API uses such symbolic labels):
59</p>
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +000060
Marc R. Hoffmann0fd9c832011-03-16 20:42:40 +000061<pre class="source linenums">
62public static example()V
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +000063 INVOKESTATIC a()V
64 INVOKESTATIC cond()Z
65 IFEQ L1
66 INVOKESTATIC b()V
67 GOTO L2
68 L1: INVOKESTATIC c()V
69 L2: INVOKESTATIC d()V
70 RETURN
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +000071</pre>
72
Marc R. Hoffmannb6372422010-11-14 17:30:00 +000073<p>
74 The possible control flow in the bytecode above can be represented by a graph.
75 The nodes are byte code instruction, the edged of the graph represent the
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +000076 possible control flow between the instructions. The control flow of the
77 example is shown in the left box of this diagram:
Marc R. Hoffmannb6372422010-11-14 17:30:00 +000078</p>
79
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +000080<img src=".resources/flow-example.png" alt="Bytecode Control Flow"/>
81
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +000082
Marc R. Hoffmann66234cf2010-10-28 15:24:37 +000083<h3>Flow Edges</h3>
84
85<p>
86 The control flow graph of a Java method defined by Java byte code may have
87 the following Edges. Each edge connects a source instruction with a target
88 instruction. In some cases the source instruction or the target instruction
89 does not exist (virtual edges for method entry and exit) or cannot be
90 exactly specified (exception handlers).
91</p>
92
93<table class="coverage">
94 <thead>
95 <tr>
96 <td>Type</td>
97 <td>Source</td>
98 <td>Target</td>
99 <td>Remarks</td>
100 </tr>
101 </thead>
102 <tbody>
103 <tr>
104 <td>ENTRY</td>
105 <td>-</td>
106 <td>First instruction in method</td>
107 <td></td>
108 </tr>
109 <tr>
110 <td>SEQUENCE</td>
111 <td>Instruction, except <code>GOTO</code>, <code>xRETURN</code>,
112 <code>THROW</code>, <code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code></td>
113 <td>Subsequent instruction</td>
114 <td></td>
115 </tr>
116 <tr>
117 <td>JUMP</td>
118 <td><code>GOTO</code>, <code>IFx</code>, <code>TABLESWITCH</code> or
119 <code>LOOKUPSWITCH</code> instruction</td>
120 <td>Target instruction</td>
121 <td><code>TABLESWITCH</code> and <code>LOOKUPSWITCH</code> will define
122 multiple edges.</td>
123 </tr>
124 <tr>
125 <td>EXHANDLER</td>
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000126 <td>Any instruction in handler scope</td>
Marc R. Hoffmann66234cf2010-10-28 15:24:37 +0000127 <td>Target instruction</td>
128 <td></td>
129 </tr>
130 <tr>
131 <td>EXIT</td>
132 <td><code>xRETURN</code> or <code>THROW</code> instruction</td>
133 <td>-</td>
134 <td></td>
135 </tr>
136 <tr>
137 <td>EXEXIT</td>
138 <td>Any instruction</td>
139 <td>-</td>
140 <td>Unhandled exception.</td>
141 </tr>
142 </tbody>
143</table>
144
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000145<p>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000146 The current JaCoCo implementation ignores edges caused by implicit exceptions
147 and the the method entry. This means we consider SEQUENCE, JUMP, EXIT.
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000148</p>
149
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000150
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000151<h2>Probe Insertion Strategy</h2>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000152
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000153<p>
154 Probes are additional instructions that can be inserted between existing
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000155 instructions. They do not change the behavior of the method but record the
156 fact that they have been executed. One can think probes are placed on edges of
157 the control flow graph. Theoretically we could insert a probe at every edge of
158 the control flow graph. As a probe implementation itself requires multiple
159 bytecode instructions this would increase the size of the class files several
160 times and significantly slow down execution speed of the instrumented classes.
161 Fortunately this is not required, in fact we only need a few probes per method
162 depending on the control flow of the method. For example a method without any
163 branches requires a single probe only. The reason for this is that starting
164 from a certain probe we can back-trace the execution path and typically get
165 coverage information for multiple instructions.
166</p>
167
168<p>
169 If a probe has been executed we know that the corresponding edge has been
170 visited. From this edge we can conclude to other preceding nodes and edges:
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000171</p>
172
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000173<ul>
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000174 <li>If a edge has been visited, we know that the source node of the this edge
175 has been executed.</li>
176 <li>If a node has been executed and the node is the target of only one edge
177 we know that this edge has been visited.</li>
178</ul>
179
180<p>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000181 Recursively applying these rules allows to determine the execution status of
182 all instructions of a method &ndash; given that we have probes at the right
183 positions. Therefore JaCoCo inserts probes
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000184</p>
185
186<ul>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000187 <li>at every method exit (return or throws) and</li>
188 <li>at every edge where the target instruction is the target of more than one
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000189 edge.</li>
190</ul>
191
192<p>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000193 We recall that a probe is simply a small sequence of additional instructions
194 that needs to be inserted at a control flow edge. The following table
195 illustrates how this extra instructions are added in case of different edge
196 types.
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000197</p>
198
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000199<table class="coverage">
200 <thead>
201 <tr>
202 <td>Type</td>
203 <td>Before</td>
204 <td>After</td>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000205 <td>Remarks</td>
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000206 </tr>
207 </thead>
208 <tbody>
209 <tr>
210 <td>SEQUENCE</td>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000211 <td><img src=".resources/flow-sequence.png" alt="Sequence"/></td>
212 <td><img src=".resources/flow-sequence-probe.png" alt="Sequence with Probe"/></td>
213 <td>
214 In case of a simple sequence the probe is simply inserted between the
215 two instructions.
216 </td>
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000217 </tr>
218 <tr>
219 <td>JUMP (unconditional)</td>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000220 <td><img src=".resources/flow-goto.png" alt="Unconditional Jump"/></td>
221 <td><img src=".resources/flow-goto-probe.png" alt="Unconditional Jump with Probe"/></td>
222 <td>
223 As an unconditional jump is executed in any case, we can also insert the
224 probe just before the GOTO instruction.
225 </td>
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000226 </tr>
227 <tr>
228 <td>JUMP (conditional)</td>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000229 <td><img src=".resources/flow-cond.png" alt="Conditional Jump"/></td>
230 <td><img src=".resources/flow-cond-probe.png" alt="Conditional Jump with Probe"/></td>
231 <td>
232 Adding a probe to an conditional jump is little bit more tricky. We add
233 a probe to the end of the method and jump to this probe. Then we jump
234 back to the original target.
235 </td>
236 </tr>
237 <tr>
238 <td>EXIT</td>
239 <td><img src=".resources/flow-exit.png" alt="Exit"/></td>
240 <td><img src=".resources/flow-exit-probe.png" alt="Exit with Probe"/></td>
241 <td>
242 As is is the nature of RETURN and THROW statements to actually leave the
243 method we add the probe right before these statements.
244 </td>
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000245 </tr>
246 </tbody>
247</table>
Marc R. Hoffmannff410a62010-11-14 20:27:07 +0000248
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000249<p>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000250 Now let's see how this rules apply to the example snippet above. We see that
251 <code>INVOKE d()</code> instruction is the only node with more than one
252 incoming edge. So we need to place probes on those edges and another probe on
253 the only exit node. The result is shown the the right box of the diagram
254 above.
Marc R. Hoffmannecf74502010-12-11 02:54:22 +0000255</p>
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000256
257<h2>Probe Implementation</h2>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000258
259<p>
260 Code coverage analysis is a runtime metric that provides execution details
261 of the software under test. This requires detailed recording about the
Marc R. Hoffmannae8c2a82010-10-05 16:20:10 +0000262 instructions (instruction coverage) that have been executed. For branch
263 coverage also the outcome of decisions has to be recorded. In any case
264 execution data is collected by so called probes:
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000265</p>
266
267<p class="hint">
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000268 A <b>probe</b> is a sequence of bytecode instructions that can be inserted
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000269 into a Java method. When the probe is executed, this fact is recorded and can
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000270 be reported by the coverage runtime. The probe must not change the behavior
271 of the original code.
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000272</p>
273
274<p>
275 The only purpose of the probe is to record that it has been executed at least
276 once. The probe does not record the number of times it has been called or
277 collect any timing information. The latter is out of scope for code coverage
278 analysis and more in the objective of a performance analysis tool. Typically
279 multiple probes needs to be inserted into each method, therefore probes needs
280 to be identified. Also the probe implementation and the storage mechanism it
281 depends on needs to be thread safe as multi-threaded execution is a common
282 scenario for java applications (albeit not for plain unit tests). Probes must
283 not have any side effects on the original code of the method. Also they should
284 add minimal overhead.
285</p>
286
287<p>
288 So to summarize the requirements for execution probes:
289</p>
290
291<ul>
292 <li>Record execution</li>
293 <li>Identification for different probes</li>
294 <li>Thread safe</li>
295 <li>No side effects on application code</li>
296 <li>Minimal runtime overhead</li>
297</ul>
298
299<p>
300 JaCoCo implements probes with a <code>boolean[]</code> array instance per
301 class. Each probe corresponds to a entry in this array. Whenever the probe is
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000302 executed the entry is set to <code>true</code> with the following four
303 bytecode instructions:
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000304</p>
305
306<pre class="source">
307ALOAD probearray
308xPUSH probeid
309ICONST_1
310BASTORE
311</pre>
312
313<p>
314 Note that this probe code is thread safe, does not modify the operand stack
315 or modify local variables and is also thread safe. It does also not leave the
316 method though an external call. The only prerequisite is that the probe array
317 is available as a local variable. For this at the beginning of each method
318 additional instrumentation code needs to be added to obtain the array instance
Marc R. Hoffmann97728292010-04-27 20:29:23 +0000319 associated with the belonging class. To avoid code duplication the
320 initialization is delegated to a static private method
321 <code>$jacocoinit()</code> which is added to every non-interface class.
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000322</p>
323
324<p>
325 The size of the probe code above depends on the position of the probe array
326 variable and the value of the probe identifier as different opcodes can be
327 used. As calculated in the table below the overhead per probe ranges between 4
328 and 7 bytes of additional bytecode:
329</p>
330
331<table class="coverage">
332 <thead>
333 <tr>
334 <td>Possible Opcodes</td>
335 <td>Min. Size [bytes]</td>
336 <td>Max. Size [bytes]</td>
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000337 </tr>
338 </thead>
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000339 <tfoot>
340 <tr>
341 <td>Total:</td>
342 <td>4</td>
343 <td>7</td>
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000344 </tr>
345 </tfoot>
Marc R. Hoffmann97728292010-04-27 20:29:23 +0000346 <tbody>
347 <tr>
348 <td><code>ALOAD_x</code>, <code>ALOAD</code> <sup>1</sup></td>
349 <td>1</td>
350 <td>2</td>
351 </tr>
352 <tr>
353 <td><code>ICONST_x</code>, <code>BIPUSH</code>, <code>SIPUSH</code>, <code>LDC</code>, <code>LDC_W</code> <sup>2</sup></td>
354 <td>1</td>
355 <td>3</td>
356 </tr>
357 <tr>
358 <td><code>ICONST_1</code></td>
359 <td>1</td>
360 <td>1</td>
361 </tr>
362 <tr>
363 <td><code>BASTORE</code></td>
364 <td>1</td>
365 <td>1</td>
366 </tr>
367 </tbody>
Marc R. Hoffmann102e8372010-04-27 15:46:23 +0000368</table>
369
370<p>
Marc R. Hoffmann97728292010-04-27 20:29:23 +0000371 <sup>1</sup> The probe array is the first variable after the arguments.
372 If the method arguments do not consume more that 3 slots the 1-byte opcode can
373 be used.<br/>
374 <sup>2</sup> 1-byte opcodes for ids 0 to 5, 2-byte opcode for ids up to 127,
375 3-byte opcode for ids up to 32767. Ids values of 32768 or more require an
376 additional constant pool entry. For normal class files it is very unlikely to
377 require more than 32,000 probes.
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000378</p>
379
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000380<h2>Performance</h2>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000381
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000382<p>
383 The control flow analysis and probe insertion strategy described in this
384 document allows to efficiently record instruction and branch coverage. In
385 total classes instrumented with JaCoCo increase their size by about 30%. Due
386 to the fact that probe execution does not require any method calls, only local
387 instructions, the observed execution time overhead for instrumented
388 applications typically is less than 10%.
389</p>
390
391<h2>References</h2>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000392
393<ul>
Marc R. Hoffmannc44971f2011-11-05 12:42:28 +0000394 <li><a href="http://asm.objectweb.org/">ASM byte code library</a> by Eric Bruneton at al.</li>
Marc R. Hoffmannb6372422010-11-14 17:30:00 +0000395 <li><a href="http://andrei.gmxhome.de/bytecode/index.html">Bytecode Outline Plug-In</a> by Andrei Loskutov</li>
396 <li><a href="http://en.wikipedia.org/wiki/Glossary_of_graph_theory">Wikipedia: Glossary of Graph Theory</a></li>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000397</ul>
398
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000399</div>
400<div class="footer">
Marc R. Hoffmann6c037452012-01-09 18:23:32 +0000401 <div class="right"><a href="@jacoco.home.url@">JaCoCo</a> @qualified.bundle.version@</div>
Marc R. Hoffmann096bd1d2010-04-27 04:21:39 +0000402 <a href="license.html">Copyright</a> &copy; @copyright.years@ Mountainminds GmbH &amp; Co. KG and Contributors
403</div>
404
405</body>
406</html>