.. role:: raw-html(raw)
   :format: html

=================================
LLVM Code Coverage Mapping Format
=================================

.. contents::
   :local:

Introduction
============

LLVM's code coverage mapping format is used to provide code coverage
analysis using LLVM's and Clang's instrumentation-based profiling
(Clang's ``-fprofile-instr-generate`` option).

This document is aimed at those who would like to know how LLVM's code coverage
mapping works under the hood. Prior knowledge of how Clang's profile-guided
optimization works is useful, but not required. For those interested in using
LLVM to provide code coverage analysis for their own programs, see the `Clang
documentation <https://clang.llvm.org/docs/SourceBasedCodeCoverage.html>`_.

We start by briefly describing LLVM's code coverage mapping format and the
way that Clang and LLVM's code coverage tool work with this format. After
the basics are down, more advanced features of the coverage mapping format
are discussed, such as the data structures, the LLVM IR representation and
the binary encoding.

High Level Overview
===================

LLVM's code coverage mapping format is designed to be a self-contained
data format that can be embedded into the LLVM IR and into object files.
It's described in this document as a **mapping** format because its goal is
to store the data that is required for a code coverage tool to map between
the specific source ranges in a file and the execution counts obtained
after running the instrumented version of the program.

The mapping data is used in two places in the code coverage process:

1. When Clang compiles a source file with ``-fcoverage-mapping``, it
   generates the mapping information that describes the mapping between the
   source ranges and the profiling instrumentation counters.
   This information gets embedded into the LLVM IR and conveniently
   ends up in the final executable file when the program is linked.

2. It is also used by *llvm-cov*: the mapping information is extracted from an
   object file and is used to associate the execution counts (the values of the
   profile instrumentation counters) with the source ranges in a file.
   After that, the tool is able to generate various code coverage reports
   for the program.

The coverage mapping format aims to be a "universal format" that would be
suitable for usage by any frontend, and not just by Clang. It also aims to
let the frontend generate minimal coverage mapping
data in order to reduce the size of the IR and object files. For example,
instead of emitting mapping information for each statement in a function, the
frontend is allowed to group the statements with the same execution count into
regions of code, and emit the mapping information only for those regions.

Advanced Concepts
=================

The remainder of this guide is meant to give you insight into the way the
coverage mapping format works.

The coverage mapping format operates on a per-function level as the
profile instrumentation counters are associated with a specific function.
For each function that requires code coverage, the frontend has to create
coverage mapping data that can map between the source code ranges and
the profile instrumentation counters for that function.

Mapping Region
--------------

The function's coverage mapping data contains an array of mapping regions.
A mapping region stores the `source code range`_ that is covered by this region,
the `file id <coverage file id_>`_, the `coverage mapping counter`_ and
the region's kind.
There are several kinds of mapping regions:

* Code regions associate portions of source code and `coverage mapping
  counters`_. They make up the majority of the mapping regions. They are used
  by the code coverage tool to compute the execution counts for lines,
  highlight the regions of code that were never executed, and to obtain
  the various code coverage statistics for a function.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:40 to 9:2</span>
  <span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Code Region from 3:17 to 5:4</span>
  <span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span>
  <span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Code Region from 5:10 to 7:4</span>
  <span style='background-color:#F6D55D'>    printf("\n"); </span>
  <span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Skipped regions are used to represent source ranges that were skipped
  by Clang's preprocessor. They don't associate with
  `coverage mapping counters`_, as the frontend knows that they are never
  executed. They are used by the code coverage tool to mark the skipped lines
  inside a function as non-code lines that don't have execution counts.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:12 to 6:2</span>
  <span style='background-color:#85C1F5'>#ifdef DEBUG </span> <span class='c1'>// Skipped Region from 2:1 to 4:2</span>
  <span style='background-color:#85C1F5'>  printf("Hello world"); </span>
  <span style='background-color:#85C1F5'>#</span><span style='background-color:#4A789C'>endif </span>
  <span style='background-color:#4A789C'>  return 0; </span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`
* Expansion regions are used to represent Clang's macro expansions. They
  have an additional property - *expanded file id*. This property can be
  used by the code coverage tool to find the mapping regions that are created
  as a result of this macro expansion, by checking if their file id matches the
  expanded file id. They don't associate with `coverage mapping counters`_,
  as the code coverage tool can determine the execution count for this region
  by looking up the execution count of the first region with a corresponding
  file id.
  For example:

  :raw-html:`<pre class='highlight' style='line-height:initial;'><span>int func(int x) </span><span style='background-color:#4A789C'>{ </span>
  <span style='background-color:#4A789C'>  #define MAX(x,y) </span><span style='background-color:#85C1F5'>((x) > (y)? </span><span style='background-color:#F6D55D'>(x)</span><span style='background-color:#85C1F5'> : </span><span style='background-color:#F4BA70'>(y)</span><span style='background-color:#85C1F5'>)</span><span style='background-color:#4A789C'> </span>
  <span style='background-color:#4A789C'>  return </span><span style='background-color:#7FCA9F'>MAX</span><span style='background-color:#4A789C'>(x, 42); </span> <span class='c1'>// Expansion Region from 3:10 to 3:13</span>
  <span style='background-color:#4A789C'>}</span>
  </pre>`

.. _source code range:

Source Range:
^^^^^^^^^^^^^

The source range record contains the starting and ending location of a certain
mapping region. Both locations include the line and the column numbers.

.. _coverage file id:

File ID:
^^^^^^^^

The file id is an integer value that tells us
which source file or macro expansion this region is located in.
It enables Clang to produce mapping information for the code
defined inside macros, as this example demonstrates:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>void func(const char *str) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Code Region from 1:28 to 6:2 with file id 0</span>
<span style='background-color:#4A789C'>  #define PUT </span><span style='background-color:#85C1F5'>printf("%s\n", str)</span><span style='background-color:#4A789C'> </span> <span class='c1'>// 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2</span>
<span style='background-color:#4A789C'>  if(*str) </span>
<span style='background-color:#4A789C'>    </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1</span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#F6D55D'>PUT</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

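The relationship between expansion regions and file ids can be sketched in a few lines of Python. This is only an illustrative model, not LLVM's actual data structures: the names ``MappingRegion``, ``RegionKind`` and ``regions_created_by`` are hypothetical, and the region list simply transcribes the ``PUT`` example above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class RegionKind(Enum):
    CODE = "code"
    SKIPPED = "skipped"
    EXPANSION = "expansion"


@dataclass
class MappingRegion:
    kind: RegionKind
    file_id: int
    start: Tuple[int, int]  # (line, column)
    end: Tuple[int, int]    # (line, column)
    expanded_file_id: Optional[int] = None  # only set for expansion regions


def regions_created_by(expansion: MappingRegion,
                       regions: List[MappingRegion]) -> List[MappingRegion]:
    """Return the regions whose file id matches the expansion's expanded file id."""
    return [r for r in regions if r.file_id == expansion.expanded_file_id]


# Regions transcribed from the PUT example: one function body (file id 0),
# two copies of the macro body (file ids 1 and 2), and two expansion regions.
regions = [
    MappingRegion(RegionKind.CODE, 0, (1, 28), (6, 2)),
    MappingRegion(RegionKind.CODE, 1, (2, 15), (2, 34)),
    MappingRegion(RegionKind.CODE, 2, (2, 15), (2, 34)),
    MappingRegion(RegionKind.EXPANSION, 0, (4, 5), (4, 8), expanded_file_id=1),
    MappingRegion(RegionKind.EXPANSION, 0, (5, 3), (5, 6), expanded_file_id=2),
]

for expansion in (r for r in regions if r.kind is RegionKind.EXPANSION):
    created = regions_created_by(expansion, regions)
    print(expansion.start, "expands file id", expansion.expanded_file_id,
          "->", [(r.start, r.end) for r in created])
```

Each expansion region resolves to exactly one copy of the macro body, which is why the two uses of ``PUT`` can carry different execution counts.
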
.. _coverage mapping counter:
.. _coverage mapping counters:

Counter:
^^^^^^^^

A coverage mapping counter can represent a reference to the profile
instrumentation counter. The execution count for a region with such a counter
is determined by looking up the value of the corresponding profile
instrumentation counter.

It can also represent a binary arithmetical expression that operates on
coverage mapping counters or other expressions.
The execution count for a region with an expression counter is determined by
evaluating the expression's arguments and then adding them together or
subtracting them from one another.
In the example below, a subtraction expression is used to compute the execution
count for the compound statement that follows the *else* keyword:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main(int argc, const char *argv[]) </span><span style='background-color:#4A789C'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #0</span>
<span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  if (argc > 1) </span><span style='background-color:#85C1F5'>{ </span> <span class='c1'>// Region's counter is a reference to the profile counter #1</span>
<span style='background-color:#85C1F5'>    printf("%s\n", argv[1]); </span><span> </span>
<span style='background-color:#85C1F5'>  }</span><span style='background-color:#4A789C'> else </span><span style='background-color:#F6D55D'>{ </span> <span class='c1'>// Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)</span>
<span style='background-color:#F6D55D'>    printf("\n"); </span>
<span style='background-color:#F6D55D'>  }</span><span style='background-color:#4A789C'> </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>}</span>
</pre>`

Finally, a coverage mapping counter can also represent an execution count
of zero. The zero counter is used to provide coverage mapping for
unreachable statements and expressions, like in the example below:

:raw-html:`<pre class='highlight' style='line-height:initial;'><span>int main() </span><span style='background-color:#4A789C'>{ </span>
<span style='background-color:#4A789C'>  return 0; </span>
<span style='background-color:#4A789C'>  </span><span style='background-color:#85C1F5'>printf("Hello world!\n")</span><span style='background-color:#4A789C'>; </span> <span class='c1'>// Unreachable region's counter is zero</span>
<span style='background-color:#4A789C'>}</span>
</pre>`

The zero counters allow the code coverage tool to display proper line execution
counts for the unreachable lines and highlight the unreachable code.
Without them, the tool would think that those lines and regions were still
executed, as it doesn't possess the frontend's knowledge.

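The three kinds of counters described in this section can be modelled as a small expression tree. The sketch below is illustrative only: the class names ``Zero``, ``Ref`` and ``Expr`` and the function ``evaluate`` are hypothetical, and the profile counts are made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Zero:
    """A counter whose execution count is always zero."""


@dataclass(frozen=True)
class Ref:
    """A reference to a profile instrumentation counter."""
    counter_id: int


@dataclass(frozen=True)
class Expr:
    """A binary expression over two counters; op is '+' or '-'."""
    op: str
    lhs: "Counter"
    rhs: "Counter"


Counter = Union[Zero, Ref, Expr]


def evaluate(counter: Counter, profile_counts: List[int]) -> int:
    """Compute the execution count of a coverage mapping counter."""
    if isinstance(counter, Zero):
        return 0
    if isinstance(counter, Ref):
        return profile_counts[counter.counter_id]
    left = evaluate(counter.lhs, profile_counts)
    right = evaluate(counter.rhs, profile_counts)
    return left + right if counter.op == "+" else left - right


# The else branch from the example: counter #0 minus counter #1.
else_counter = Expr("-", Ref(0), Ref(1))

# Hypothetical profile: main ran 5 times, the if branch ran 3 times.
print(evaluate(else_counter, [5, 3]))  # -> 2
```

Because the else count is derived from two existing counters, the frontend does not need to instrument the else branch separately.
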
LLVM IR Representation
======================

The coverage mapping data is stored in the LLVM IR using a global constant
structure variable called *__llvm_coverage_mapping* with the *IPSK_covmap*
section specifier (i.e. ".lcovmap$M" on Windows and "__llvm_covmap" elsewhere).

For example, let's consider a C file and how it gets compiled to LLVM:

.. _coverage mapping sample:

.. code-block:: c

  int foo() {
    return 42;
  }
  int bar() {
    return 13;
  }

The coverage mapping variable generated by Clang has 2 fields:

* Coverage mapping header.

* An optionally compressed list of filenames present in the translation unit.

The variable has 8-byte alignment because ld64 cannot always pack symbols from
different object files tightly (the word-level alignment assumption is baked in
too deeply).

.. code-block:: llvm

  @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [32 x i8] }
  {
    { i32, i32, i32, i32 } ; Coverage map header
    {
      i32 0,  ; Always 0. In prior versions, the number of affixed function records
      i32 32, ; The length of the string that contains the encoded translation unit filenames
      i32 0,  ; Always 0. In prior versions, the length of the affixed string that contains the encoded coverage mapping data
      i32 3,  ; Coverage mapping format version (version 4 is encoded as 3)
    },
    [32 x i8] c"..." ; Encoded data (dissected later)
  }, section "__llvm_covmap", align 8

The current version of the format is version 4. There are two differences from version 3:

* Function records are now named symbols, and are marked *linkonce_odr*. This
  allows linkers to merge duplicate function records. Merging of duplicate
  *dummy* records (emitted for functions included-but-not-used in a translation
  unit) reduces size bloat in the coverage mapping data. As part of this
  change, region mapping information for a function is now included within the
  function record, instead of being affixed to the coverage header.

* The filename list for a translation unit may optionally be zlib-compressed.

The only difference between versions 3 and 2 is that a special encoding for
column end locations was introduced to indicate gap regions.

In version 1, the function record for *foo* was defined as follows:

.. code-block:: llvm

  { i8*, i32, i32, i64 } { i8* getelementptr inbounds ([3 x i8]* @__profn_foo, i32 0, i32 0), ; Function's name
    i32 3, ; Function's name length
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }

In version 2, the function record for *foo* was defined as follows:

.. code-block:: llvm

  { i64, i32, i64 } {
    i64 0x5cf8c24cdb18bdac, ; Function's name MD5
    i32 9, ; Function's encoded coverage mapping data string length
    i64 0  ; Function's structural hash
  }

Coverage Mapping Header:
------------------------

The coverage mapping header has the following fields:

* The number of function records affixed to the coverage header. Always 0, but present for backwards compatibility.

* The length of the string in the second field of *__llvm_coverage_mapping* that contains the encoded translation unit filenames.

* The length of the string that, in prior versions, contained the encoded coverage mapping data affixed to the coverage header. Always 0, but present for backwards compatibility.

* The format version. The current version is 4 (encoded as a 3).

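As a rough illustration, the four header fields can be unpacked from raw bytes as shown below. This is a sketch under stated assumptions: the function name ``parse_covmap_header`` and the dictionary keys are hypothetical, and the byte order is assumed to be little-endian, which this document does not specify.

```python
import struct


def parse_covmap_header(data: bytes) -> dict:
    """Unpack the four i32 header fields (little-endian is an assumption)."""
    n_records, filenames_size, coverage_size, encoded_version = struct.unpack_from(
        "<4I", data, 0
    )
    return {
        "n_records": n_records,            # always 0 in the current version
        "filenames_size": filenames_size,  # length of the encoded filenames
        "coverage_size": coverage_size,    # always 0 in the current version
        "version": encoded_version + 1,    # version 4 is stored as 3
    }


# Header values taken from the IR sample above: 0, 32, 0, 3.
sample_header = struct.pack("<4I", 0, 32, 0, 3)
print(parse_covmap_header(sample_header))
```

The ``+ 1`` reflects the rule that the stored value is one less than the format version.
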
.. _function records:

Function record:
----------------

A function record is a structure of the following type:

.. code-block:: llvm

  { i64, i32, i64, i64, [? x i8] }

It contains the function name's MD5, the length of the encoded mapping data for
that function, the function's structural hash value, the hash of the filenames
in the function's translation unit, and the encoded mapping data.

Dissecting the sample:
^^^^^^^^^^^^^^^^^^^^^^

Here's an overview of the encoded data that was stored in the
IR for the `coverage mapping sample`_ that was shown earlier:

* The IR contains the following string constant that represents the encoded
  coverage mapping data for the sample translation unit:

  .. code-block:: llvm

    c"\01\15\1Dx\DA\13\D1\0F-N-*\D6/+\CE\D6/\C9-\D0O\CB\CF\D7K\06\00N+\07]"

* The string contains values that are encoded in the LEB128 format, which is
  used throughout for storing integers. It also contains a compressed payload.

* The first three LEB128-encoded numbers in the sample specify the number of
  filenames, the length of the uncompressed filenames, and the length of the
  compressed payload (or 0 if compression is disabled). In this sample, there
  is 1 filename that is 21 bytes in length (uncompressed), and stored in 29
  bytes (compressed).

* The coverage mapping from the first function record is encoded in this string:

  .. code-block:: llvm

    c"\01\00\00\01\01\01\0C\02\02"

  This string consists of the following bytes:

  .. list-table::
     :widths: 10 90

     * - ``0x01``
       - The number of file ids used by this function. There is only one file
         id used by the mapping data in this function.
     * - ``0x00``
       - An index into the filenames array which corresponds to the file
         "/Users/alex/test.c".
     * - ``0x00``
       - The number of counter expressions used by this function. This function
         doesn't use any expressions.
     * - ``0x01``
       - The number of mapping regions that are stored in an array for the
         function's file id #0.
     * - ``0x01``
       - The coverage mapping counter for the first region in this function.
         The value of 1 tells us that it's a coverage mapping counter that is
         a reference to the profile instrumentation counter with an index of 0.
     * - ``0x01``
       - The starting line of the first mapping region in this function.
     * - ``0x0C``
       - The starting column of the first mapping region in this function.
     * - ``0x02``
       - The number of lines between the starting line and the ending line of
         the first mapping region in this function (*numLines*), so the region
         ends on line 3.
     * - ``0x02``
       - The ending column of the first mapping region in this function.

* The length of the substring that contains the encoded coverage mapping data
  for the second function record is also 9. It's structured like the mapping data
  for the first function record.

* The two trailing bytes are zeroes and are used to pad the coverage mapping
  data to give it the 8 byte alignment.

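The layout of the filename string described above (three LEB128 numbers followed by an optionally zlib-compressed payload) can be read with a short Python sketch. The function names are hypothetical, and the sketch assumes that the payload is a sequence of length-prefixed strings, one per filename. The filename ``/tmp/example.c`` is made up for the round trip; only the first three bytes of the sample string are reused.

```python
import zlib


def read_uleb128(data: bytes, pos: int):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def decode_filenames(blob: bytes):
    """Decode [numFilenames, uncompressedLen, compressedLen, payload]."""
    num_filenames, pos = read_uleb128(blob, 0)
    uncompressed_len, pos = read_uleb128(blob, pos)
    compressed_len, pos = read_uleb128(blob, pos)
    if compressed_len:
        payload = zlib.decompress(blob[pos:pos + compressed_len])
    else:  # a compressed length of 0 means the payload is stored as-is
        payload = blob[pos:pos + uncompressed_len]
    names = []
    pos = 0
    for _ in range(num_filenames):
        length, pos = read_uleb128(payload, pos)
        names.append(payload[pos:pos + length].decode())
        pos += length
    return names


# The first three bytes of the sample string: \01 \15 \1D.
sample_prefix = b"\x01\x15\x1d"
pos = 0
numbers = []
for _ in range(3):
    value, pos = read_uleb128(sample_prefix, pos)
    numbers.append(value)
print(numbers)  # -> [1, 21, 29]

# Round trip with a hypothetical filename (all lengths here fit in one byte).
payload = bytes([14]) + b"/tmp/example.c"
compressed = zlib.compress(payload)
blob = bytes([1, len(payload), len(compressed)]) + compressed
print(decode_filenames(blob))
```

The same function handles the uncompressed case, where the third number is 0 and the payload follows directly.
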
Encoding
========

The per-function coverage mapping data is encoded as a stream of bytes
with a simple structure. The structure is built from the encoding
`types <cvmtypes_>`_, such as variable-length unsigned integers, that
are used to encode the `File ID Mapping`_, the `Counter Expressions`_ and
the `Mapping Regions`_.

The format of the structure follows:

  ``[file id mapping, counter expressions, mapping regions]``

The translation unit filenames are encoded using the same encoding
`types <cvmtypes_>`_ as the per-function coverage mapping data, with the
following structure:

  ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]``

.. _cvmtypes:

Types
-----

This section describes the basic types that are used by the encoding format
and can appear after ``:`` in the ``[foo : type]`` description.

.. _LEB128:

LEB128
^^^^^^

LEB128 is an unsigned integer value that is encoded using DWARF's LEB128
encoding, optimizing for the case where values are small
(1 byte for values less than 128).

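For readers unfamiliar with the encoding, here is a minimal sketch of unsigned LEB128 in Python. The function names ``encode_uleb128`` and ``decode_uleb128`` are hypothetical and are not part of any LLVM API.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_uleb128(data: bytes, pos: int = 0):
    """Decode one value starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


print(encode_uleb128(12).hex())   # -> 0c
print(encode_uleb128(128).hex())  # -> 8001
print(decode_uleb128(b"\x80\x01"))  # -> (128, 2)
```

Values below 128 occupy a single byte, which is why most bytes in the sample strings can be read directly as numbers.
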
.. _CoverageStrings:

Strings
^^^^^^^

``[length : LEB128, characters...]``

String values are encoded with a `LEB value <LEB128_>`_ for the length
of the string and a sequence of bytes for its characters.

.. _file id mapping:

File ID Mapping
---------------

``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]``

File id mapping in a function's coverage mapping stream
contains the indices into the translation unit's filenames array.

Counter
-------

``[value : LEB128]``

A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_.
It is composed of two things: the `tag <counter-tag_>`_,
which is stored in the lowest 2 bits, and the `counter data`_, which is stored
in the remaining bits.

.. _counter-tag:

Tag:
^^^^

The counter's tag encodes the counter's kind
and, if the counter is an expression, the expression's kind.
The possible tag values are:

* 0 - The counter is zero.

* 1 - The counter is a reference to the profile instrumentation counter.

* 2 - The counter is a subtraction expression.

* 3 - The counter is an addition expression.

.. _counter data:

Data:
^^^^^

The counter's data is interpreted in the following manner:

* When the counter is a reference to the profile instrumentation counter,
  then the counter's data is the id of the profile counter.
* When the counter is an expression, then the counter's data
  is the index into the array of counter expressions.

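Splitting an already-decoded counter value into its tag and data is a matter of masking and shifting. The helper names and the ``TAG_NAMES`` labels below are hypothetical and chosen only for this sketch.

```python
TAG_NAMES = {0: "zero", 1: "reference", 2: "subtract", 3: "add"}


def split_counter(value: int):
    """Split a decoded counter value into (tag name, data)."""
    tag = value & 0x3   # lowest 2 bits
    data = value >> 2   # remaining bits
    return TAG_NAMES[tag], data


def make_counter(tag: int, data: int) -> int:
    """Inverse of split_counter, taking the numeric tag."""
    return (data << 2) | tag


print(split_counter(0x01))  # -> ('reference', 0)
print(split_counter(0x05))  # -> ('reference', 1)
print(split_counter(0x02))  # -> ('subtract', 0)
```

The value ``0x01`` is the counter that appears in the dissected sample: a reference to profile counter 0.
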
.. _Counter Expressions:

Counter Expressions
-------------------

``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]``

Counter expressions consist of two counters as they
represent binary arithmetic operations.
The expression's kind is determined from the `tag <counter-tag_>`_ of the
counter that references this expression.

.. _Mapping Regions:

Mapping Regions
---------------

``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]``

The mapping regions are stored in an array of sub-arrays where every
region in a particular sub-array has the same file id.

The file id for a sub-array of regions is the index of that
sub-array in the main array. For example, the first sub-array has a file id
of 0.

Sub-Array of Regions
^^^^^^^^^^^^^^^^^^^^

``[numRegions : LEB128, region0, region1, ...]``

The mapping regions for a specific file id are stored in an array that is
sorted in ascending order by the region's starting location.

Mapping Region
^^^^^^^^^^^^^^

``[header, source range]``

The mapping region record contains two sub-records:
the `header`_, which stores the counter and/or the region's kind,
and the `source range`_ that contains the starting and ending
location of this region.

.. _header:

Header
^^^^^^

``[counter]``

or

``[pseudo-counter]``

The header encodes the region's counter and the region's kind.

The value of the counter's tag distinguishes between counters and
pseudo-counters: if the tag is zero, then this header contains a
pseudo-counter; otherwise, this header contains an ordinary counter.

Counter:
""""""""

A mapping region whose header has a counter with a non-zero tag is
a code region.

Pseudo-Counter:
"""""""""""""""

``[value : LEB128]``

A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like
the ordinary counter. It has the following interpretation:

* bits 0-1: tag, which is always 0.

* bit 2: expansionRegionTag. If this bit is set, then this mapping region
  is an expansion region.

* remaining bits: data. If this region is an expansion region, then the data
  contains the expanded file id of that region.

  Otherwise, the data contains the region's kind. The possible region
  kind values are:

  * 0 - This mapping region is a code region with a counter of zero.
  * 2 - This mapping region is a skipped region.

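The header rules above can be summarised as a small classification function. This is a sketch with hypothetical names (``classify_header`` and the string labels it returns), operating on a header value that has already been decoded from LEB128.

```python
def classify_header(value: int):
    """Classify a decoded region header; returns (kind, payload)."""
    tag = value & 0x3
    if tag != 0:
        # Ordinary counter: the region is a code region and the
        # whole value is the counter (tag plus data).
        return "code", value
    if value & 0x4:
        # Pseudo-counter with expansionRegionTag set:
        # the remaining bits are the expanded file id.
        return "expansion", value >> 3
    kind = value >> 3
    if kind == 0:
        return "code-zero", None   # code region with a counter of zero
    if kind == 2:
        return "skipped", None
    return "unknown", kind         # kinds not described in this section


print(classify_header(0x01))  # -> ('code', 1)
print(classify_header(0x0C))  # -> ('expansion', 1)
print(classify_header(0x10))  # -> ('skipped', None)
```

For example, ``0x0C`` is binary ``1100``: tag 0, expansion bit set, and data 1, so it denotes an expansion region whose expanded file id is 1.
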
.. _source range:

Source Range
^^^^^^^^^^^^

``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]``

The source range record contains the following fields:

* *deltaLineStart*: The difference between the starting line of the
  current mapping region and the starting line of the previous mapping region.

  If the current mapping region is the first region in the current
  sub-array, then it stores the starting line of that region.

* *columnStart*: The starting column of the mapping region.

* *numLines*: The difference between the ending line and the starting line
  of the current mapping region.

* *columnEnd*: The ending column of the mapping region. If the high bit is set,
  the current mapping region is a gap area. A count for a gap area is only used
  as the line execution count if there are no other regions on a line.
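
The pieces of this section can be combined into a small decoder for one function's mapping data. This is a minimal sketch, not LLVM's reader, and it rests on two assumptions. First, following the nine sample bytes dissected earlier, it reads one sub-array of regions per file id and does not read a separate ``numRegionArrays`` value. Second, it takes the "high bit" of *columnEnd* to be bit 31. The function names and dictionary keys are hypothetical.

```python
def read_uleb128(data: bytes, pos: int):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


GAP_BIT = 1 << 31  # assumption: the high bit of a 32-bit column value


def decode_function_mapping(data: bytes):
    """Decode [file id mapping, counter expressions, mapping regions]."""
    pos = 0
    # File id mapping: indices into the translation unit's filenames array.
    num_indices, pos = read_uleb128(data, pos)
    file_indices = []
    for _ in range(num_indices):
        index, pos = read_uleb128(data, pos)
        file_indices.append(index)
    # Counter expressions: pairs of (LHS, RHS) counters.
    num_expressions, pos = read_uleb128(data, pos)
    expressions = []
    for _ in range(num_expressions):
        lhs, pos = read_uleb128(data, pos)
        rhs, pos = read_uleb128(data, pos)
        expressions.append((lhs, rhs))
    # Mapping regions: one sub-array per file id (assumption, see above).
    regions = []
    for file_id in range(num_indices):
        num_regions, pos = read_uleb128(data, pos)
        line = 0  # deltaLineStart is relative to the previous region
        for _ in range(num_regions):
            header, pos = read_uleb128(data, pos)
            delta_line, pos = read_uleb128(data, pos)
            column_start, pos = read_uleb128(data, pos)
            num_lines, pos = read_uleb128(data, pos)
            column_end, pos = read_uleb128(data, pos)
            line += delta_line
            regions.append({
                "file_id": file_id,
                "header": header,
                "start": (line, column_start),
                "end": (line + num_lines, column_end & ~GAP_BIT),
                "is_gap": bool(column_end & GAP_BIT),
            })
    return file_indices, expressions, regions


# The mapping data of the first function record in the sample.
sample = b"\x01\x00\x00\x01\x01\x01\x0c\x02\x02"
file_indices, expressions, regions = decode_function_mapping(sample)
print(file_indices, expressions)
print(regions[0])
```

For the sample bytes this yields a single region in file id 0 that starts at line 1, column 12 and ends at line 3, column 2, with header value 1 (a reference to profile counter 0).
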