=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
UndefinedBehaviorSanitizer, or without any sanitizer. Pass one of the
following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic defines the size of the following offsets. The rest of the data is
the offsets in the corresponding binary/DSO that were executed during the run.

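The layout described above can be decoded in a few lines. The following is a
minimal sketch, not the code used by the actual tools: ``ParseSancov`` and the
constant names are hypothetical, and it assumes that the file is read on a host
with the same byte order as the one that wrote it and that the magic's last
byte selects 8-byte (``64``) or 4-byte (``32``) offsets.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Magic values from the text above; names are illustrative only.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Decodes the raw bytes of a .sancov file into a list of offsets.
    // Assumption: the file uses the byte order of the host running this code.
    std::vector<uint64_t> ParseSancov(const std::string &data) {
      uint64_t magic = 0;
      if (data.size() < sizeof(magic))
        throw std::runtime_error("too short to hold the magic");
      std::memcpy(&magic, data.data(), sizeof(magic));
      if (magic != kMagic64 && magic != kMagic32)
        throw std::runtime_error("unexpected magic");
      // The magic selects the width of every offset that follows.
      const size_t width = (magic == kMagic64) ? 8 : 4;
      if ((data.size() - sizeof(magic)) % width != 0)
        throw std::runtime_error("trailing partial offset");
      std::vector<uint64_t> offsets;
      for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
        uint64_t wide = 0;
        uint32_t narrow = 0;
        if (width == 8) {
          std::memcpy(&wide, data.data() + pos, 8);
        } else {
          std::memcpy(&narrow, data.data() + pos, 4);
          wide = narrow;
        }
        offsets.push_back(wide);
      }
      return offsets;
    }

To use it, read the whole file into a ``std::string`` in binary mode and pass
it to ``ParseSancov``; each returned value is an offset to symbolize.
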
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass it the ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports


Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is automatically generated alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``; otherwise, the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool location.


How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

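The subtraction itself is an ordinary set difference. A minimal sketch,
assuming both sets of PCs have already been extracted (the helper name
``MissingPCs`` is hypothetical and not part of ``sancov.py``):

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Returns the instrumented PCs that do not appear in the covered set.
    // std::set keeps both inputs sorted, which std::set_difference requires.
    std::vector<uint64_t> MissingPCs(const std::set<uint64_t> &instrumented,
                                     const std::set<uint64_t> &covered) {
      std::vector<uint64_t> missing;
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }

With three instrumented PCs and two covered ones, as in the console session
above, the result holds the single uncovered PC.
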
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

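An edge is critical when its source block has more than one successor and its
destination block has more than one predecessor. The sketch below applies that
definition to a control flow graph given as a plain edge list; it only
illustrates the idea and is not how the compiler pass is implemented
(``CriticalEdges`` and the ``Edge`` alias are hypothetical names).

.. code-block:: c++

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using Edge = std::pair<std::string, std::string>;  // (source, destination)

    // Returns the edges whose source has several successors and whose
    // destination has several predecessors.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<std::string, int> out_degree, in_degree;
      for (const Edge &e : edges) {
        ++out_degree[e.first];
        ++in_degree[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges)
        if (out_degree[e.first] > 1 && in_degree[e.second] > 1)
          critical.push_back(e);
      return critical;
    }

For the graph above, with edges A=>B, B=>C and A=>C, only A=>C qualifies, which
is the edge that receives the dummy block D.
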
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.

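One possible reading of bitset-based corpus distillation is a greedy pass that
keeps an input only if its bitset covers a block no earlier input covered. The
sketch below makes that assumption; ``Distill`` is a hypothetical helper, and
the bitsets are the textual ``0``/``1`` strings shown above, one per input.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Returns the indices of the bitsets that add at least one newly covered
    // block, processing the inputs in the order given.
    std::vector<size_t> Distill(const std::vector<std::string> &bitsets) {
      std::vector<size_t> kept;
      std::string seen;  // union of all bitsets kept so far
      for (size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (seen.size() < bits.size())
          seen.resize(bits.size(), '0');
        bool adds_coverage = false;
        for (size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && seen[j] == '0') {
            seen[j] = '1';
            adds_coverage = true;
          }
        }
        if (adds_coverage)
          kept.push_back(i);
      }
      return kept;
    }

The result depends on input order; sorting inputs by size first is a common
way to favor small reproducers, though nothing in this document prescribes it.
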
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact, since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

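Reading the caller/callee file back amounts to consuming the lines two at a
time. A minimal sketch, assuming each line is ``<module> <address>`` separated
by whitespace and that module names contain no spaces (``CodeLocation`` and
``ParseCallerCallee`` are hypothetical names):

.. code-block:: c++

    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    struct CodeLocation {
      std::string module;   // e.g. "a.out"
      std::string address;  // e.g. "0x4a2e0c", kept as text
    };

    // Parses the dump into (caller, callee) pairs: odd lines are callers,
    // even lines are callees. An unpaired trailing line is dropped.
    std::vector<std::pair<CodeLocation, CodeLocation>>
    ParseCallerCallee(const std::string &text) {
      std::vector<std::pair<CodeLocation, CodeLocation>> pairs;
      std::istringstream in(text);
      CodeLocation caller, callee;
      while (in >> caller.module >> caller.address
                >> callee.module >> callee.address)
        pairs.emplace_back(caller, callee);
      return pairs;
    }
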
Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

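The mapping from a counter value to its range can be sketched as follows. This
is an illustration of the eight ranges listed above, not the run-time's own
code: ``CounterToRangeBit`` is a hypothetical name, and numbering the ranges
from the least significant bit is an assumption.

.. code-block:: c++

    #include <cstdint>

    // Maps a raw counter value to a one-hot 8-bit mask for the ranges
    // 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+. A zero counter maps to 0.
    // Assumption: range i corresponds to bit i counted from the low end.
    uint8_t CounterToRangeBit(uint32_t counter) {
      if (counter == 0)
        return 0;
      unsigned range;
      if (counter >= 128)     range = 7;
      else if (counter >= 32) range = 6;
      else if (counter >= 16) range = 5;
      else if (counter >= 8)  range = 4;
      else if (counter >= 4)  range = 3;
      else                    range = counter - 1;  // 1, 2, 3 -> 0, 1, 2
      return static_cast<uint8_t>(1u << range);
    }

Under that assumption, a byte of ``01`` in the dump above would correspond to
an edge executed exactly once and ``00`` to an edge that was never executed.
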
These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

Experimental support for basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).
Example:

.. code-block:: console

    % clang -g -fsanitize=address -fsanitize-coverage=edge,trace-bb foo.cc
    % ASAN_OPTIONS=coverage=1 ./a.out

This will produce two files after the process exits:
``trace-points.PID.sancov`` and ``trace-events.PID.sancov``.
The first file will contain a textual description of all the instrumented points in the program
in a form that you can feed into llvm-symbolizer (e.g. ``a.out 0x4dca89``), one per line.
The second file will contain the actual execution trace as a sequence of 4-byte integers;
these integers are indices into the array of instrumented points (the first file).

Basic block tracing is currently supported only for single-threaded applications.


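Given the two trace files described above, the execution trace can be
reconstructed by using each 4-byte integer of the events file as an index into
the lines of the points file. The following is a sketch under stated
assumptions: the helper names are hypothetical, the integers are read in host
byte order, and out-of-range indices are skipped.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <sstream>
    #include <string>
    #include <vector>

    // Splits the textual trace-points file into one entry per non-empty line.
    std::vector<std::string> ParseTracePoints(const std::string &text) {
      std::vector<std::string> points;
      std::istringstream in(text);
      std::string line;
      while (std::getline(in, line))
        if (!line.empty())
          points.push_back(line);
      return points;
    }

    // Maps every 4-byte index in the events data to its instrumented point.
    // Assumption: indices are stored in the byte order of this host.
    std::vector<std::string> DecodeTrace(const std::vector<std::string> &points,
                                         const std::string &events) {
      std::vector<std::string> trace;
      for (size_t pos = 0; pos + 4 <= events.size(); pos += 4) {
        uint32_t index = 0;
        std::memcpy(&index, events.data() + pos, 4);
        if (index < points.size())
          trace.push_back(points[index]);
      }
      return trace;
    }

The resulting lines are in execution order and can be fed to
``llvm-symbolizer`` as is.
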
Tracing PCs
===========

An *experimental* feature similar to tracing basic blocks, but with a different API.
With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With the additional ``...=trace-pc,indirect-calls`` flag,
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be defined
by the user, so these flags do not require any other sanitizer to be used.
This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
and can be used with `AFL <http://lcamtuf.coredump.cx/afl>`__.

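Since the callbacks are user-defined, a minimal implementation might simply
remember every PC it sees. The sketch below assumes a GCC/Clang toolchain (for
``__builtin_return_address``) and a single-threaded target; ``g_seen_pcs`` and
``NumSeenPCs`` are hypothetical names. The file defining the callbacks should
presumably be compiled without ``-fsanitize-coverage=trace-pc``, otherwise the
callbacks would themselves be instrumented.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <set>

    // PCs observed so far. A std::set keeps the sketch short; a real fuzzer
    // would likely want a cheaper, allocation-free structure.
    static std::set<uintptr_t> g_seen_pcs;

    // Inserted by the compiler on every edge. The return address identifies
    // the instrumented location that made the call.
    extern "C" void __sanitizer_cov_trace_pc() {
      g_seen_pcs.insert(
          reinterpret_cast<uintptr_t>(__builtin_return_address(0)));
    }

    // Inserted on every indirect call when indirect-calls is also given.
    // This sketch records the callee address in the same set.
    extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
      g_seen_pcs.insert(reinterpret_cast<uintptr_t>(callee));
    }

    // Number of distinct addresses recorded so far.
    size_t NumSeenPCs() { return g_seen_pcs.size(); }
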
Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer will need to define the following functions;
they will be called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.

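A fuzzer-side definition of these two functions could unpack the arguments and
log them for later use, for example to suggest operand values as mutations.
The sketch below only demonstrates the unpacking documented in the comments
above; ``CmpRecord``, ``g_cmp_log`` and ``kSwitchType`` are hypothetical names,
and treating each switch case as a comparison against ``Val`` is a design
choice of this example.

.. code-block:: c++

    #include <cstdint>
    #include <vector>

    struct CmpRecord {
      uint32_t size_bits;  // operand size in bits
      uint32_t type;       // comparison type, or kSwitchType for switches
      uint64_t arg1;
      uint64_t arg2;
    };

    // Hypothetical marker for records that come from a switch statement.
    constexpr uint32_t kSwitchType = 0xffffffffu;

    static std::vector<CmpRecord> g_cmp_log;

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      CmpRecord r;
      r.size_bits = static_cast<uint32_t>(SizeAndType >> 32);      // [63:32]
      r.type = static_cast<uint32_t>(SizeAndType & 0xffffffffu);   // [31:0]
      r.arg1 = Arg1;
      r.arg2 = Arg2;
      g_cmp_log.push_back(r);
    }

    extern "C" void __sanitizer_cov_trace_switch(uint64_t Val,
                                                 uint64_t *Cases) {
      const uint64_t num_cases = Cases[0];
      const uint32_t size_bits = static_cast<uint32_t>(Cases[1]);
      // Record one entry per case constant, as if Val were compared to it.
      for (uint64_t i = 0; i < num_cases; ++i)
        g_cmp_log.push_back({size_bits, kSwitchType, Val, Cases[2 + i]});
    }

Like the implementation it plugs into, this sketch is not thread-safe.
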
Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

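The coverage check described in this section can be sketched as a loop that
runs each candidate input and keeps it only if the total coverage grew. To stay
self-contained, the sketch takes the coverage query as a parameter instead of
calling the run-time directly; ``KeepInterestingInputs`` is a hypothetical
helper. In an instrumented build one would presumably pass a wrapper around
``__sanitizer_get_total_unique_coverage()``, whose exact signature should be
checked against the header.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Runs every candidate through `target` and returns those after which
    // `total_coverage` reports a larger value than any seen before.
    std::vector<std::string> KeepInterestingInputs(
        const std::vector<std::string> &candidates,
        const std::function<void(const std::string &)> &target,
        const std::function<uintptr_t()> &total_coverage) {
      std::vector<std::string> corpus;
      uintptr_t best = total_coverage();
      for (const std::string &input : candidates) {
        target(input);
        const uintptr_t now = total_coverage();
        if (now > best) {
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

A real fuzzer would also mutate the kept inputs and register a death callback
to save the reproducer, as noted above; both are omitted here.
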
Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
     benchmark       cov0       cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage?
  * We needed something that is lightning fast, plays well with
    AddressSanitizer, and does not significantly increase the binary size.
  * Traditional coverage implementations based on global counters
    `suffer from contention on counters
    <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.
455 <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.