=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
UndefinedBehaviorSanitizer, or without any sanitizer. Pass one of the
following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic (``0x64`` or ``0x32``) defines the size, in bits, of the offsets
that follow. The rest of the data is the offsets in the corresponding
binary/DSO that were executed during the run.
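
As a minimal sketch, the format described above can be decoded as follows.
``ParseSancov`` is a hypothetical helper name (it is not part of ``sancov.py``
or any LLVM tool), and the sketch assumes the file was written on a host with
the same byte order as the reader:

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>
    #include <vector>

    // Hypothetical helper: decodes the raw bytes of a .sancov file into a
    // list of offsets. Assumes native byte order (an assumption of this sketch).
    std::vector<uint64_t> ParseSancov(const std::vector<uint8_t> &bytes) {
      const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
      const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;
      if (bytes.size() < sizeof(uint64_t))
        throw std::runtime_error("file too short to hold the magic");

      uint64_t magic;
      std::memcpy(&magic, bytes.data(), sizeof(magic));

      // The last byte of the magic selects 64-bit or 32-bit offsets.
      std::size_t width;
      if (magic == kMagic64)
        width = 8;
      else if (magic == kMagic32)
        width = 4;
      else
        throw std::runtime_error("unknown magic");

      if ((bytes.size() - sizeof(magic)) % width != 0)
        throw std::runtime_error("truncated offset at end of file");

      std::vector<uint64_t> offsets;
      for (std::size_t pos = sizeof(magic); pos < bytes.size(); pos += width) {
        if (width == 8) {
          uint64_t value;
          std::memcpy(&value, bytes.data() + pos, sizeof(value));
          offsets.push_back(value);
        } else {
          uint32_t value;
          std::memcpy(&value, bytes.data() + pos, sizeof(value));
          offsets.push_back(value);
        }
      }
      return offsets;
    }

Copying through ``memcpy`` avoids unaligned reads; a reader that must handle
files from a host with a different byte order would need to swap bytes
explicitly.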

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass it the ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and the paths to all corresponding binary ELF
files. Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -symbolize                - Symbolizes the report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports


Coverage Reports (Experimental)
================================

``.sancov`` files do not contain enough information to generate a source-level
coverage report. The missing information is contained
in the debug info of the binary. Thus the ``.sancov`` file has to be symbolized
to produce a ``.symcov`` file first:

.. code-block:: console

    sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

The ``.symcov`` file can be browsed overlaid on the source code by
running the ``tools/sancov/coverage-report-server.py`` script, which starts
an HTTP server.


How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c
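
The subtraction itself is a plain set difference. A minimal sketch, assuming
both PC sets have already been loaded into memory (``MissingPCs`` is a
hypothetical name and is unrelated to how ``sancov.py`` is implemented):

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Hypothetical helper: returns the instrumented PCs that never appear
    // in the covered set, in ascending order.
    std::vector<uint64_t> MissingPCs(const std::set<uint64_t> &instrumented,
                                     const std::set<uint64_t> &covered) {
      std::vector<uint64_t> missing;
      // std::set iterates in sorted order, which std::set_difference requires.
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }

With three instrumented PCs of which two are covered, the result holds the
single remaining PC, matching the shape of the ``sancov.py missing`` output.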

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C
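
Whether an edge is critical can be checked mechanically: by the usual
definition, an edge is critical when its source block has more than one
successor and its destination block has more than one predecessor. The sketch
below applies that definition to the A/B/C graph. ``CriticalEdges`` and the
edge-list representation are illustrative only and do not reflect how the
compiler pass is implemented:

.. code-block:: c++

    #include <map>
    #include <utility>
    #include <vector>

    // An edge is a (source block, destination block) pair of block names.
    using Edge = std::pair<char, char>;

    // Hypothetical helper: returns the edges whose source has several
    // successors and whose destination has several predecessors.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<char, int> successors, predecessors;
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges)
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      return critical;
    }

For the edges A=>B, A=>C and B=>C, only A=>C satisfies both conditions, which
is the edge that receives the dummy block D.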

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
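
One possible reading of bitset-based distillation is a greedy pass that keeps
a run only if its bitset sets a bit no earlier run has set. The sketch below
assumes the bitset files have already been read into strings; ``DistillCorpus``
is a hypothetical name, and a real distillation tool may use a different
selection strategy:

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical helper: returns the indices of the runs whose bitset
    // contributes at least one '1' that earlier runs did not have.
    std::vector<std::size_t> DistillCorpus(
        const std::vector<std::string> &bitsets) {
      std::vector<std::size_t> kept;
      std::string merged;  // Union of all bitsets kept so far.
      for (std::size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (merged.size() < bits.size())
          merged.resize(bits.size(), '0');
        bool adds_coverage = false;
        for (std::size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && merged[j] == '0') {
            merged[j] = '1';
            adds_coverage = true;
          }
        }
        if (adds_coverage)
          kept.push_back(i);
      }
      return kept;
    }

The greedy order matters: an input seen early is kept even if a later input
covers a superset of its blocks.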

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and the callee. At shutdown the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
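
To make the eight value ranges concrete, the sketch below maps a counter value
to the bit for its range. ``CounterToRangeBit`` is a hypothetical helper, not
a run-time function, and the choice of the least significant bit for the first
range is an assumption of this sketch:

.. code-block:: c++

    #include <cstdint>

    // Hypothetical helper: returns a byte with the single bit set that
    // corresponds to the range containing 'counter', or 0 if the counter
    // is 0. Ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    // Assumption: bit 0 (the least significant bit) is the first range.
    uint8_t CounterToRangeBit(uint8_t counter) {
      if (counter == 0) return 0;
      if (counter >= 128) return 1 << 7;
      if (counter >= 32) return 1 << 6;
      if (counter >= 16) return 1 << 5;
      if (counter >= 8) return 1 << 4;
      if (counter >= 4) return 1 << 3;
      return 1 << (counter - 1);  // 1 -> bit 0, 2 -> bit 1, 3 -> bit 2.
    }

OR-ing these bits across runs gives a per-counter byte in which a newly set
bit means that a new hit-count range was reached.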

Tracing basic blocks
====================

Experimental support for basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).
Example:

.. code-block:: console

    % clang -g -fsanitize=address -fsanitize-coverage=edge,trace-bb foo.cc
    % ASAN_OPTIONS=coverage=1 ./a.out

This will produce two files after the process exits:
``trace-points.PID.sancov`` and ``trace-events.PID.sancov``.
The first file will contain a textual description of all the instrumented points in the program
in a form that you can feed into llvm-symbolizer (e.g. ``a.out 0x4dca89``), one per line.
The second file will contain the actual execution trace as a sequence of 4-byte integers.
These integers are indices into the array of instrumented points (the first file).

Basic block tracing is currently supported only for single-threaded applications.
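
A minimal sketch of how the two files could be joined after the run, assuming
both have already been read into memory, that the indices are zero-based, and
that they are stored in the reader's native byte order (none of which is stated
above). ``DecodeTrace`` is a hypothetical helper name:

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Hypothetical helper: 'points' holds the lines of trace-points.PID.sancov,
    // 'events' holds the raw bytes of trace-events.PID.sancov.
    // Returns the executed points in order.
    std::vector<std::string> DecodeTrace(const std::vector<std::string> &points,
                                         const std::vector<uint8_t> &events) {
      if (events.size() % sizeof(uint32_t) != 0)
        throw std::runtime_error("trace-events size is not a multiple of 4");
      std::vector<std::string> trace;
      for (std::size_t pos = 0; pos < events.size(); pos += sizeof(uint32_t)) {
        uint32_t index;
        std::memcpy(&index, events.data() + pos, sizeof(index));
        // at() throws std::out_of_range if the index is not a valid point.
        trace.push_back(points.at(index));
      }
      return trace;
    }

Each returned line has the ``module 0xADDR`` shape and can be passed on to
``llvm-symbolizer``.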


Tracing PCs
===========

*Experimental* feature similar to tracing basic blocks, but with a different API.
With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be defined
by the user. So, these flags do not require any other sanitizer to be used.
This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
and can be used with `AFL <http://lcamtuf.coredump.cx/afl>`__.
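
A minimal sketch of a user-defined callback that records the PC of each
executed edge in a fixed-size buffer. Only ``__sanitizer_cov_trace_pc`` is
named in the text above; the buffer, its size and the ``g_`` variables are
hypothetical, and the sketch is not thread-safe:

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>

    // Hypothetical fixed-size log; entries past the capacity are dropped.
    constexpr std::size_t kMaxPCs = 1 << 16;
    static uintptr_t g_pcs[kMaxPCs];
    static std::size_t g_num_pcs;

    // Called by instrumented code on every edge. The return address of the
    // callback is a PC inside the instrumented edge.
    extern "C" void __sanitizer_cov_trace_pc() {
      if (g_num_pcs < kMaxPCs)
        g_pcs[g_num_pcs++] =
            reinterpret_cast<uintptr_t>(__builtin_return_address(0));
    }

It is probably safest to compile the file that defines the callback without
``-fsanitize-coverage=trace-pc``, so that the callback is not itself
instrumented.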

Tracing PCs with guards
=======================

Another *experimental* feature that tries to combine the functionality of
``trace-pc``, ``8bit-counters`` and boolean coverage.

With ``-fsanitize-coverage=trace-pc-guard`` the compiler will insert the following code
on every edge:

.. code-block:: none

    if (guard_variable)
      __sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge will have its own ``guard_variable`` (``uint32_t``).

The compiler will also insert a module constructor that will call:

.. code-block:: c++

    // The guards are [start, stop).
    // This function may be called multiple times with the same values of start/stop.
    __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

Similarly to ``trace-pc,indirect-calls``, with ``trace-pc-guard,indirect-calls``
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.

The functions ``__sanitizer_cov_trace_pc_*`` should be defined by the user.

Example:

.. code-block:: c++

    // trace-pc-guard-cb.cc
    #include <stdint.h>
    #include <stdio.h>
    #include <sanitizer/coverage_interface.h>

    // This callback is inserted by the compiler as a module constructor
    // into every compilation unit. 'start' and 'stop' correspond to the
    // beginning and end of the section with the guards for the entire
    // binary (executable or DSO) and so it will be called multiple times
    // with the same parameters.
    extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                        uint32_t *stop) {
      static uint64_t N;  // Counter for the guards.
      if (start == stop || *start) return;  // Initialize only once.
      printf("INIT: %p %p\n", start, stop);
      for (uint32_t *x = start; x < stop; x++)
        *x = ++N;  // Guards should start from 1.
    }

    // This callback is inserted by the compiler on every edge in the
    // control flow (some optimizations apply).
    // Typically, the compiler will emit the code like this:
    //    if(*guard)
    //      __sanitizer_cov_trace_pc_guard(guard);
    // But for large functions it will emit a simple call:
    //    __sanitizer_cov_trace_pc_guard(guard);
    extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
      if (!*guard) return;  // Duplicate the guard check.
      // If you set *guard to 0 this code will not be called again for this edge.
      // Now you can get the PC and do whatever you want:
      //   store it somewhere or symbolize it and print right away.
      // The values of `*guard` are as you set them in
      // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
      // and use them to dereference an array or a bit vector.
      void *PC = __builtin_return_address(0);
      char PcDescr[1024];
      // This function is a part of the sanitizer run-time.
      // To use it, link with AddressSanitizer or other sanitizer.
      __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
      printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
    }

.. code-block:: c++

    // trace-pc-guard-example.cc
    void foo() { }
    int main(int argc, char **argv) {
      if (argc > 1) foo();
    }

.. code-block:: console

    clang++ -g  -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
    clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
    ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

.. code-block:: console

    INIT: 0x71bcd0 0x71bce0
    guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:2
    guard: 0x71bcd8 3 PC 0x4ecd9e in main trace-pc-guard-example.cc:3:7

.. code-block:: console

    ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out with-foo

.. code-block:: console

    INIT: 0x71bcd0 0x71bce0
    guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:3
    guard: 0x71bcdc 4 PC 0x4ecdc7 in main trace-pc-guard-example.cc:4:17
    guard: 0x71bcd0 1 PC 0x4ecd20 in foo() trace-pc-guard-example.cc:2:14


Tracing data flow
=================

Support for data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
Similarly, with ``-fsanitize-coverage=trace-div`` the compiler will instrument
integer division instructions (to capture the right argument of division)
and with ``-fsanitize-coverage=trace-gep`` it will instrument
the `LLVM GEP instructions <http://llvm.org/docs/GetElementPtr.html>`_
(to capture array indices).

.. code-block:: c++

    // Called before a comparison instruction.
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
    void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
    void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
    void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

    // Called before a division statement.
    // Val is the second argument of division.
    void __sanitizer_cov_trace_div4(uint32_t Val);
    void __sanitizer_cov_trace_div8(uint64_t Val);

    // Called before a GetElementPtr (GEP) instruction
    // for every non-constant array index.
    void __sanitizer_cov_trace_gep(uintptr_t Idx);


This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
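
As an illustration of what a fuzzer might do with these callbacks, the sketch
below defines ``__sanitizer_cov_trace_cmp4`` so that it remembers the closest
comparison seen so far, measured by the number of equal bits. The scoring
scheme, ``MatchingBits`` and ``g_best_matching_bits`` are hypothetical and are
not taken from any particular fuzzer; like the interface itself, the sketch is
not thread-safe:

.. code-block:: c++

    #include <cstdint>

    // Hypothetical state: the best score observed across all comparisons.
    static uint32_t g_best_matching_bits;

    // Number of bit positions (out of 32) in which the two values agree.
    static uint32_t MatchingBits(uint32_t a, uint32_t b) {
      return static_cast<uint32_t>(__builtin_popcount(~(a ^ b)));
    }

    // Called before every 4-byte comparison in instrumented code.
    extern "C" void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2) {
      uint32_t matching = MatchingBits(Arg1, Arg2);
      if (matching > g_best_matching_bits)
        g_best_matching_bits = matching;
    }

An input that raises the score is closer to satisfying some comparison, so a
fuzzer could treat it as interesting even when no new edge was covered.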

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data could be useful for fuzzers and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>`` which returns the number of currently
covered entities in the program. This will tell the fuzzer if the coverage has
increased after testing every new input.
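
The check described above can be sketched as a loop that keeps only the inputs
that increase the coverage count. To stay self-contained, the sketch takes the
coverage query as a callback; in an instrumented build that callback would
presumably wrap ``__sanitizer_get_total_unique_coverage()``.
``KeepInterestingInputs``, ``Input`` and both callback parameters are
hypothetical names:

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <vector>

    using Input = std::vector<uint8_t>;

    // Hypothetical helper: runs every candidate through 'run_target' and
    // returns the candidates after which 'get_coverage' reported a new
    // maximum.
    std::vector<Input> KeepInterestingInputs(
        const std::vector<Input> &candidates,
        const std::function<void(const Input &)> &run_target,
        const std::function<uintptr_t()> &get_coverage) {
      std::vector<Input> corpus;
      uintptr_t best = get_coverage();  // Coverage before any candidate ran.
      for (const Input &input : candidates) {
        run_target(input);
        uintptr_t now = get_coverage();
        if (now > best) {  // This input reached something new.
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

A real in-process fuzzer would also mutate the kept inputs and feed them back
into the loop; that part is omitted here.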

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.