=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It lets you
collect function-level, basic-block-level, and edge-level coverage at a very
low cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At runtime, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic defines the size of the following offsets. The rest of the data is
the offsets in the corresponding binary/DSO that were executed during the run.
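
The layout described above can be read with a few lines of code. The following
is a minimal sketch, not the code used by ``sancov.py``: ``ParseSancov`` and
``ReadLE`` are hypothetical helper names, and the sketch assumes the file was
written on a little-endian host (for example x86_64), so the last byte of the
magic is the first byte in the file.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    // Magic values from the format description; the low byte (0x64 or 0x32)
    // selects 64-bit or 32-bit offsets.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Reads a little-endian unsigned integer of `width` bytes at `pos`.
    // Assumption: the file was produced on a little-endian host.
    static uint64_t ReadLE(const std::vector<uint8_t> &data, size_t pos,
                           size_t width) {
      uint64_t value = 0;
      for (size_t i = 0; i < width; ++i)
        value |= static_cast<uint64_t>(data[pos + i]) << (8 * i);
      return value;
    }

    // Hypothetical helper: returns the offsets stored in a .sancov image.
    std::vector<uint64_t> ParseSancov(const std::vector<uint8_t> &data) {
      if (data.size() < 8)
        throw std::runtime_error("file too short for magic");
      const uint64_t magic = ReadLE(data, 0, 8);
      size_t width = 0;
      if (magic == kMagic64)
        width = 8;
      else if (magic == kMagic32)
        width = 4;
      else
        throw std::runtime_error("unexpected magic");
      if ((data.size() - 8) % width != 0)
        throw std::runtime_error("truncated offset table");
      std::vector<uint64_t> offsets;
      for (size_t pos = 8; pos < data.size(); pos += width)
        offsets.push_back(ReadLE(data, pos, width));
      return offsets;
    }

Rejecting an unknown magic up front keeps a corrupted or unrelated file from
being silently interpreted as a list of offsets.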

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass .sancov files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports


Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is generated automatically alongside the coverage files.
The ``sancov`` binary must be present in ``PATH``; alternatively, use the
``sancov_path=<path_to_sancov>`` option to specify the tool location.


How good is the coverage?
=========================

It is possible to find out which PCs are not covered, by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to binary and a list of covered PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c
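
The subtraction itself is a plain set difference. The sketch below illustrates
the idea only; ``MissingPCs`` is a hypothetical helper, and extracting the
instrumented PCs from the binary (which ``sancov.py missing`` does for you) is
not shown.

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Hypothetical helper: returns the instrumented PCs that never appear in
    // the covered set. std::set iterates in sorted order, which is what
    // std::set_difference requires of both input ranges.
    std::vector<uint64_t> MissingPCs(const std::set<uint64_t> &instrumented,
                                     const std::set<uint64_t> &covered) {
      std::vector<uint64_t> missing;
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }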

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C
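
To make the notion of a critical edge concrete, here is a small sketch that
finds such edges in a graph given as a list of edges. It uses the usual
definition (the source block has more than one successor and the destination
block has more than one predecessor); ``CriticalEdges`` is a hypothetical
helper for illustration and is not how the compiler pass is implemented.

.. code-block:: c++

    #include <map>
    #include <utility>
    #include <vector>

    // An edge from a block named `first` to a block named `second`.
    using Edge = std::pair<char, char>;

    // Hypothetical helper: returns the edges whose source has several
    // successors and whose destination has several predecessors.
    std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
      std::map<char, int> successors, predecessors;
      for (const Edge &e : edges) {
        ++successors[e.first];
        ++predecessors[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge &e : edges)
        if (successors[e.first] > 1 && predecessors[e.second] > 1)
          critical.push_back(e);
      return critical;
    }

For the three-block example, the input edges are A=>B, A=>C and B=>C, and only
A=>C satisfies both conditions, which is the edge that receives the dummy
block D.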

Bitset
======

When ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
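
One way to use the bitsets for distillation is to keep an accumulated bitset
and retain only those inputs whose run sets a bit that was not set before. The
sketch below shows that merge step under the assumption that each bitset has
already been read into a string of ``0`` and ``1`` characters; ``MergeBitset``
is a hypothetical helper, not part of the sanitizer run-time.

.. code-block:: c++

    #include <cstddef>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: ORs `run` into `accumulated` and returns how many
    // bits were newly set. Both arguments are strings of '0' and '1'.
    size_t MergeBitset(std::string &accumulated, const std::string &run) {
      if (accumulated.size() != run.size())
        throw std::invalid_argument("bitset length mismatch");
      size_t new_bits = 0;
      for (size_t i = 0; i < run.size(); ++i) {
        if (run[i] == '1' && accumulated[i] == '0') {
          accumulated[i] = '1';
          ++new_bits;
        }
      }
      return new_bits;
    }

An input is worth keeping when the returned count is non-zero. With the two
bitsets from the example, merging ``11011`` into ``01101`` sets two new bits.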

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown time the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01
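
The mapping from a counter value to its range bit can be sketched as follows.
This is an illustration of the ranges listed above, not the run-time's code:
``CounterToRangeBit`` is a hypothetical name, and the sketch assumes that the
i-th range corresponds to the i-th least significant bit and that a counter
value of 0 sets no bit.

.. code-block:: c++

    #include <cstdint>

    // Hypothetical helper: maps an 8-bit counter value to a one-hot byte for
    // the ranges 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    // Assumption: range i is reported in bit i, counting from the least
    // significant bit; a zero counter yields zero.
    uint8_t CounterToRangeBit(uint8_t counter) {
      if (counter == 0) return 0;
      if (counter <= 3) return static_cast<uint8_t>(1u << (counter - 1));
      if (counter <= 7) return static_cast<uint8_t>(1u << 3);
      if (counter <= 15) return static_cast<uint8_t>(1u << 4);
      if (counter <= 31) return static_cast<uint8_t>(1u << 5);
      if (counter <= 127) return static_cast<uint8_t>(1u << 6);
      return static_cast<uint8_t>(1u << 7);
    }

Under these assumptions, a block executed once maps to ``0x01`` and a block
that was never executed maps to ``0x00``, which matches the kind of bytes shown
in the ``xxd`` output.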

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

An *experimental* feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).

Tracing PCs
===========

An *experimental* feature similar to tracing basic blocks, but with a different API.
With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be defined
by the user, so these flags do not require any other sanitizer to be used.
This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
and can be used with `AFL <http://lcamtuf.coredump.cx/afl>`__.
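
A user-provided definition of these callbacks might look like the sketch below.
The two callback names and signatures come from the text above; everything else
(the fixed-size log, the global names, using ``__builtin_return_address(0)`` to
identify the edge) is an assumption made for illustration. The sketch is not
thread-safe, and the file defining the callbacks should presumably be built
without ``trace-pc`` instrumentation so that the callbacks do not call
themselves.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>

    // Hypothetical fixed-size log, so that the callback never allocates.
    constexpr size_t kMaxPCs = 1 << 16;
    static uintptr_t g_pcs[kMaxPCs];
    static size_t g_num_pcs = 0;

    // Hypothetical bookkeeping for indirect calls.
    static size_t g_num_indirect_calls = 0;
    static void *g_last_callee = nullptr;

    // Inserted by the compiler on every edge. The return address (a GCC/Clang
    // builtin) points into the instrumented code and identifies the edge.
    extern "C" void __sanitizer_cov_trace_pc() {
      if (g_num_pcs < kMaxPCs)
        g_pcs[g_num_pcs++] =
            reinterpret_cast<uintptr_t>(__builtin_return_address(0));
    }

    // Inserted by the compiler on every indirect call.
    extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
      ++g_num_indirect_calls;
      g_last_callee = callee;
    }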

Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer will need to define the following functions,
which will be called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
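
As a rough illustration of what a fuzzer could do in these callbacks, the
sketch below unpacks ``SizeAndType`` as documented above and records how many
bits differ between the two operands, plus whether a switch operand matched any
case constant. The helper ``DecodeSizeAndType``, the ``CmpInfo`` struct and the
global variables are hypothetical, and the choice of signal (differing bits) is
just one possibility. ``__builtin_popcountll`` is a GCC/Clang builtin.

.. code-block:: c++

    #include <cstdint>

    // Hypothetical view of the packed SizeAndType argument.
    struct CmpInfo {
      uint32_t size_in_bits;  // bits [63:32]
      uint32_t type;          // bits [31:0]
    };

    CmpInfo DecodeSizeAndType(uint64_t SizeAndType) {
      CmpInfo info;
      info.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);
      info.type = static_cast<uint32_t>(SizeAndType & 0xFFFFFFFFu);
      return info;
    }

    // Hypothetical state a fuzzer might inspect after running an input.
    static uint64_t g_num_cmps = 0;
    static int g_last_differing_bits = 0;
    static bool g_last_switch_matched = false;

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      const CmpInfo info = DecodeSizeAndType(SizeAndType);
      // Only the low size_in_bits bits of the operands are meaningful.
      const uint64_t mask = info.size_in_bits >= 64
                                ? ~0ULL
                                : (1ULL << info.size_in_bits) - 1;
      ++g_num_cmps;
      g_last_differing_bits = __builtin_popcountll((Arg1 ^ Arg2) & mask);
    }

    extern "C" void __sanitizer_cov_trace_switch(uint64_t Val,
                                                 uint64_t *Cases) {
      const uint64_t num_cases = Cases[0];  // Cases[1] is the bit size of Val.
      g_last_switch_matched = false;
      for (uint64_t i = 0; i < num_cases; ++i)
        if (Cases[2 + i] == Val) g_last_switch_matched = true;
    }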
Kostya Serebryanyb17e2982015-07-31 21:48:10 +0000333
Sergey Matveev07e2d282015-04-23 20:40:04 +0000334Output directory
335================
336
337By default, .sancov files are created in the current working directory.
338This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:
339
340.. code-block:: console
341
342 % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
343 % ls -l /tmp/cov/*sancov
344 -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
345 -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
346
Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data could be useful for fuzzers and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>`` which returns the number of currently
covered entities in the program. This will tell the fuzzer if the coverage has
increased after testing every new input.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.
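
The "did coverage increase?" check can be sketched as a small loop. To keep
the example self-contained, the coverage query is passed in as a callable; in
an instrumented build one would presumably pass a wrapper around
``__sanitizer_get_total_unique_coverage()``. ``KeepInterestingInputs`` and its
parameters are hypothetical names, and saving the reproducer from a death
callback is not shown.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical helper: runs every candidate through `run_target` and keeps
    // the inputs after which `get_coverage` reports a larger value than
    // before.
    std::vector<std::string> KeepInterestingInputs(
        const std::vector<std::string> &candidates,
        const std::function<void(const std::string &)> &run_target,
        const std::function<uintptr_t()> &get_coverage) {
      std::vector<std::string> corpus;
      uintptr_t best = get_coverage();
      for (const std::string &input : candidates) {
        run_target(input);
        const uintptr_t now = get_coverage();
        if (now > best) {  // This input reached something new.
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

Passing the coverage query as a parameter also makes the loop easy to exercise
with a fake coverage source in a unit test.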

An example of such fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
     benchmark       cov0       cov1   diff 0-1       cov2   diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
 400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
     401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
       403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
       429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
     445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
     456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
     458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
   464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
   471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
     473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
      433.milc     616.00     627.00       1.02     627.00       1.02       1.00
      444.namd     602.00     601.00       1.00     654.00       1.09       1.09
    447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
    450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
    453.povray     427.00     434.00       1.02     495.00       1.16       1.14
       470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
   482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.