=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It lets you
get function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At runtime, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

  % cat -n cov.cc
       1  #include <stdio.h>
       2  __attribute__((noinline))
       3  void foo() { printf("foo\n"); }
       4
       5  int main(int argc, char **argv) {
       6    if (argc == 2)
       7      foo();
       8    printf("main\n");
       9  }
  % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
  % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
  main
  -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
  foo
  main
  -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic (``64`` or ``32``) gives the size in bits of the offsets that
follow. The rest of the data consists of the offsets in the corresponding
binary/DSO that were executed during the run.

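As an illustration of this layout, here is a minimal C++ sketch that decodes a ``*.sancov`` file already loaded into memory. It assumes the magic and the offsets are stored in little-endian byte order (the byte order of the host that wrote the file; the text above does not state this explicitly). ``ReadLE`` and ``ParseSancov`` are hypothetical names, not part of any LLVM API.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Decodes a little-endian unsigned integer that is `width` bytes long.
  // Assumption: the file was written on a little-endian host.
  static uint64_t ReadLE(const uint8_t *p, std::size_t width) {
    uint64_t value = 0;
    for (std::size_t i = 0; i < width; ++i)
      value |= static_cast<uint64_t>(p[i]) << (8 * i);
    return value;
  }

  // Hypothetical helper: returns the offsets stored in a .sancov file, or
  // std::nullopt if the magic is unknown or the payload is truncated.
  std::optional<std::vector<uint64_t>> ParseSancov(
      const std::vector<uint8_t> &data) {
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;
    if (data.size() < 8) return std::nullopt;
    const uint64_t magic = ReadLE(data.data(), 8);
    std::size_t width;  // size of each offset in bytes
    if (magic == kMagic64)
      width = 8;
    else if (magic == kMagic32)
      width = 4;
    else
      return std::nullopt;
    if ((data.size() - 8) % width != 0) return std::nullopt;
    std::vector<uint64_t> offsets;
    for (std::size_t pos = 8; pos < data.size(); pos += width)
      offsets.push_back(ReadLE(data.data() + pos, width));
    return offsets;
  }

For example, a 16-byte buffer holding the 32-bit magic followed by the bytes ``50 52 46 00 A0 52 46 00`` decodes to the two offsets ``0x465250`` and ``0x4652a0``.
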
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

  % sancov.py print a.out.22679.sancov a.out.22673.sancov
  sancov.py: read 2 PCs from a.out.22679.sancov
  sancov.py: read 1 PCs from a.out.22673.sancov
  sancov.py: 2 files merged; 2 PCs total
  0x465250
  0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

  % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
  cov.cc:3
  cov.cc:5

Sancov Tool
===========

A new, experimental ``sancov`` tool has been developed to process coverage
files. The tool is part of the LLVM project and is currently supported only on
Linux. It can handle symbolization tasks on its own, without needing any extra
support from the environment.

.. code-block:: console

  USAGE: sancov [options] <action> <filenames...>

  Action (required)
    -print                    - Print coverage addresses
    -covered-functions        - Print all covered functions.
    -not-covered-functions    - Print all not covered functions.
    -html-report              - Print HTML coverage report.

  Options
    -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
    -demangle                   - Print demangled function name.
    -obj=<string>               - Path to object file to be symbolized
    -strip_path_prefix=<string> - Strip this prefix from file paths in reports

Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is generated automatically alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``; alternatively, the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool
location.

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

  % sancov.py print a.out.12345.sancov > covered.txt
  sancov.py: read 2 64-bit PCs from a.out.12345.sancov
  sancov.py: 1 file merged; 2 PCs total
  % sancov.py missing a.out < covered.txt
  sancov.py: found 3 instrumented PCs in a.out
  sancov.py: read 2 PCs from stdin
  sancov.py: 1 PCs missing from coverage
  0x4cc61c

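The subtraction itself is an ordinary set difference. The C++ sketch below assumes both lists of addresses have already been extracted (for example, from ``sancov.py print`` output and from the call sites of ``__sanitizer_cov()``); it does not read a binary. ``MissingPCs`` is a hypothetical helper.

.. code-block:: c++

  #include <algorithm>
  #include <cstdint>
  #include <iterator>
  #include <set>
  #include <vector>

  // Returns the instrumented PCs that never appear in the covered list, in
  // ascending order. Duplicates in either input are ignored.
  std::vector<uint64_t> MissingPCs(const std::vector<uint64_t> &instrumented,
                                   const std::vector<uint64_t> &covered) {
    const std::set<uint64_t> all(instrumented.begin(), instrumented.end());
    const std::set<uint64_t> seen(covered.begin(), covered.end());
    std::vector<uint64_t> missing;
    // Both sets are sorted, which is what std::set_difference requires.
    std::set_difference(all.begin(), all.end(), seen.begin(), seen.end(),
                        std::back_inserter(missing));
    return missing;
  }

For instance, with three made-up instrumented PCs ``0x4cc600``, ``0x4cc61c``, ``0x4cc640`` and a covered list of ``0x4cc640``, ``0x4cc600``, the result contains only ``0x4cc61c``.
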
Edge coverage
=============

Consider this code:

.. code-block:: c++

  void foo(int *a) {
    if (a)
      *a = 0;
  }

It contains three basic blocks; let's name them A, B, and C:

.. code-block:: none

      A
      |\
      | \
      |  B
      | /
      |/
      C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

      A
      |\
      | \
      D  B
      | /
      |/
      C

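An edge is critical when its source has more than one successor and its destination has more than one predecessor, which is exactly the situation of A=>C. The C++ sketch below finds such edges; it models a CFG as adjacency lists (a representation chosen here for illustration, not the one LLVM uses), and ``CFG`` and ``CriticalEdges`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <utility>
  #include <vector>

  // succs[b] lists the successors of block b.
  using CFG = std::vector<std::vector<int>>;

  // Returns every edge (from, to) whose source has more than one successor
  // and whose destination has more than one predecessor. These are the
  // edges that edge-level coverage splits with a new block.
  std::vector<std::pair<int, int>> CriticalEdges(const CFG &succs) {
    std::vector<int> preds(succs.size(), 0);
    for (const auto &out : succs)
      for (int to : out) ++preds[to];
    std::vector<std::pair<int, int>> result;
    for (std::size_t from = 0; from < succs.size(); ++from) {
      if (succs[from].size() < 2) continue;
      for (int to : succs[from])
        if (preds[to] > 1) result.emplace_back(static_cast<int>(from), to);
    }
    return result;
  }

For ``foo`` above, numbering A, B, C as 0, 1, 2 gives the CFG ``{{1, 2}, {2}, {}}``, and the only edge reported is (0, 2), that is, A=>C.
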
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage is also
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

  % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
  % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
  main
  % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
  foo
  main
  % head *bitset*
  ==> a.out.38214.bitset-sancov <==
  01101
  ==> a.out.6128.bitset-sancov <==
  11011%

For a given executable the length of the bitset is always the same (unless
``dlopen``/``dlclose`` come into play), so the bitset coverage can easily be
used for bitset-based corpus distillation.

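One simple way to use these bitsets for corpus distillation is a greedy pass: keep an input only if it sets a bit that no previously kept input set. The sketch below assumes each input's bitset has already been read into a string of ``'0'``/``'1'`` characters; ``DistillCorpus`` is a hypothetical helper. The result depends on input order, so it is not guaranteed to be the smallest possible corpus.

.. code-block:: c++

  #include <cstddef>
  #include <string>
  #include <vector>

  // Greedy distillation over "01101"-style bitsets: walk the inputs in order
  // and keep one only if it covers a block that no kept input covered yet.
  // Returns the indices of the kept inputs.
  std::vector<std::size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
    std::string covered;  // union of the kept bitsets, as '0'/'1' characters
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < bitsets.size(); ++i) {
      const std::string &b = bitsets[i];
      if (covered.size() < b.size()) covered.resize(b.size(), '0');
      bool adds_new = false;
      for (std::size_t j = 0; j < b.size(); ++j) {
        if (b[j] == '1' && covered[j] == '0') {
          covered[j] = '1';
          adds_new = true;
        }
      }
      if (adds_new) kept.push_back(i);
    }
    return kept;
  }

With the two bitsets shown above (``01101`` and ``11011``) plus a third input whose bitset is ``01001``, the function keeps the first two inputs and drops the third, because ``01001`` adds no new blocks.
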
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

  a.out 0x4a2e0c
  a.out 0x4a6510
  a.out 0x4a2e0c
  a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are
  silently ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

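Because the ``caller-callee.PID.sancov`` file is just alternating caller and callee lines, it can be folded into a caller-to-callees map. The C++ sketch below assumes every line has the ``<module> 0x<pc>`` shape shown in the example above; ``Location``, ``ParseLocation`` and ``ParseCallerCallee`` are hypothetical names.

.. code-block:: c++

  #include <cstdint>
  #include <istream>
  #include <map>
  #include <set>
  #include <sstream>
  #include <string>
  #include <utility>

  // A code location: module name plus PC, e.g. {"a.out", 0x4a2e0c}.
  using Location = std::pair<std::string, uint64_t>;

  // Parses one "<module> 0x<pc>" line. std::stoull with base 16 accepts the
  // optional "0x" prefix; it throws std::invalid_argument on a malformed line.
  Location ParseLocation(const std::string &line) {
    std::istringstream fields(line);
    std::string module, pc;
    fields >> module >> pc;
    return {module, std::stoull(pc, nullptr, 16)};
  }

  // Reads caller/callee pairs (odd lines: caller, even lines: callee) and
  // groups the distinct callees observed for each caller.
  std::map<Location, std::set<Location>> ParseCallerCallee(std::istream &in) {
    std::map<Location, std::set<Location>> callees;
    std::string caller_line, callee_line;
    while (std::getline(in, caller_line) && std::getline(in, callee_line))
      callees[ParseLocation(caller_line)].insert(ParseLocation(callee_line));
    return callees;
  }

Applied to the four example lines above, it yields one caller, ``a.out 0x4a2e0c``, with two distinct callees.
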
Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to the boolean value assigned
to every basic block (edge), the instrumentation collects imprecise counters.
On exit, every counter is mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+``, and those 8-bit bitsets
are dumped to disk.

.. code-block:: console

  % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
  % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
  % ls -l *counters-sancov
  ... a.out.17110.counters-sancov
  % xxd *counters-sancov
  0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

  // The coverage instrumentation may optionally provide imprecise counters.
  // Rather than exposing the counter values to the user we instead map
  // the counters to a bitset.
  // Every counter is associated with 8 bits in the bitset.
  // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
  // The i-th bit is set to 1 if the counter value is in the i-th range.
  // This counter-based coverage implementation is *not* thread-safe.

  // Returns the number of registered coverage counters.
  uintptr_t __sanitizer_get_number_of_counters();
  // Updates the counter 'bitset', clears the counters and returns the number of
  // new bits in 'bitset'.
  // If 'bitset' is nullptr, only clears the counters.
  // Otherwise 'bitset' should be at least
  // __sanitizer_get_number_of_counters bytes long and 8-aligned.
  uintptr_t
  __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

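To make the range mapping concrete, the following C++ sketch models the documented behavior of ``__sanitizer_update_counter_bitset_and_clear_counters`` on plain vectors: each counter sets one range bit in its own byte of the bitset, newly set bits are counted, and the counters are reset. It is an illustration only, not the runtime's code; ``CounterToRangeBit`` and ``UpdateBitsetAndClear`` are hypothetical names, and the ``nullptr`` case is not modeled.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Maps a raw counter to its range bit: the ranges 1, 2, 3, 4-7, 8-15,
  // 16-31, 32-127, 128+ become bits 0..7. A zero counter sets no bit.
  uint8_t CounterToRangeBit(uint64_t counter) {
    if (counter == 0) return 0;  // never executed: no bit is set
    int bit;
    if (counter <= 3) bit = static_cast<int>(counter) - 1;  // 1, 2, 3
    else if (counter <= 7) bit = 3;
    else if (counter <= 15) bit = 4;
    else if (counter <= 31) bit = 5;
    else if (counter <= 127) bit = 6;
    else bit = 7;  // 128+
    return static_cast<uint8_t>(1u << bit);
  }

  // Model of the documented semantics: one bitset byte per counter, OR in
  // the range bit, count bits that were not set before, clear the counters.
  std::size_t UpdateBitsetAndClear(std::vector<uint64_t> &counters,
                                   std::vector<uint8_t> &bitset) {
    if (bitset.size() < counters.size()) bitset.resize(counters.size(), 0);
    std::size_t new_bits = 0;
    for (std::size_t i = 0; i < counters.size(); ++i) {
      const uint8_t bit = CounterToRangeBit(counters[i]);
      if (bit != 0 && (bitset[i] & bit) == 0) {
        bitset[i] |= bit;
        ++new_bits;
      }
      counters[i] = 0;
    }
    return new_bits;
  }

For instance, a counter value of 5 falls in the ``4-7`` range and sets bit 3 (``0x08``); feeding counters in the same ranges again returns 0 because no new bits appear.
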
Tracing basic blocks
====================

An *experimental* feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).

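Conceptually, ``-fsanitize-coverage=trace-bb,bb`` makes every basic block report itself as it runs. The hand-written model below shows that effect on the ``foo`` function from the edge-coverage discussion. ``TraceBlock`` is a hypothetical stand-in for ``__sanitizer_cov_trace_basic_block``, ``s32`` is assumed to be a 32-bit signed integer, and the block ids are made up; how the compiler actually assigns ids is not described here.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // Hypothetical stand-in for __sanitizer_cov_trace_basic_block(s32 *id):
  // it appends the block id to an in-memory trace.
  std::vector<int32_t> g_trace;
  void TraceBlock(int32_t *id) { g_trace.push_back(*id); }

  // Hand-instrumented model of foo(): one callback at the start of each
  // basic block, with made-up ids 0 = A, 1 = B, 2 = C.
  void foo_traced(int *a) {
    static int32_t id_a = 0, id_b = 1, id_c = 2;
    TraceBlock(&id_a);  // block A: the branch on `a`
    if (a) {
      TraceBlock(&id_b);  // block B: the store
      *a = 0;
    }
    TraceBlock(&id_c);  // block C: the return
  }

Calling ``foo_traced(&x)`` and then ``foo_traced(nullptr)`` leaves the trace ``0, 1, 2, 0, 2``: unlike plain coverage, the trace records the order of blocks and shows that the second call skipped block B.
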
Tracing data flow
=================

An *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer needs to define the following functions,
which are called by the instrumented code:

.. code-block:: c++

  // Called before a comparison instruction.
  // SizeAndType is a packed value containing
  //   - [63:32] the Size of the operands of comparison in bits
  //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
  // Arg1 and Arg2 are arguments of the comparison.
  void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

  // Called before a switch statement.
  // Val is the switch operand.
  // Cases[0] is the number of case constants.
  // Cases[1] is the size of Val in bits.
  // Cases[2:] are the case constants.
  void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.

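A fuzzer's definitions of these callbacks might, for example, collect the operand values it sees so that they can later be spliced into inputs. The sketch below does that; the collection idea, the ``g_cmp_values`` name, and the choice to skip comparisons narrower than 8 bits are assumptions of this example, not part of the interface. The functions are declared ``extern "C"`` so that their unmangled names match the calls emitted by the instrumentation, and, like the current implementation, the sketch is not thread-safe.

.. code-block:: c++

  #include <cstdint>
  #include <set>

  // Operand values observed in comparisons and switches (hypothetical).
  static std::set<uint64_t> g_cmp_values;

  extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1,
                                            uint64_t Arg2) {
    const uint64_t size_in_bits = SizeAndType >> 32;  // bits [63:32]
    // Bits [31:0] hold the comparison type (ICMP_EQ ... ICMP_SLE); this
    // sketch records the operands regardless of the predicate.
    if (size_in_bits < 8) return;  // arbitrary choice: skip 1-bit compares
    g_cmp_values.insert(Arg1);
    g_cmp_values.insert(Arg2);
  }

  extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
    const uint64_t num_cases = Cases[0];
    // Cases[1] is the width of Val in bits; Cases[2:] are the case constants.
    g_cmp_values.insert(Val);
    for (uint64_t i = 0; i < num_cases; ++i)
      g_cmp_values.insert(Cases[2 + i]);
  }

For a ``switch`` on the value 5 with cases 10 and 20 and a 32-bit operand, ``Cases`` would be ``{2, 32, 10, 20}``, and the set would gain 5, 10, and 20.
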
Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

  % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
  % ls -l /tmp/cov/*sancov
  -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1``, coverage data is written to
a memory-mapped file as soon as it is collected.

.. code-block:: console

  % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
  main
  % ls
  7036.sancov.map  7036.sancov.raw  a.out
  % sancov.py rawunpack 7036.sancov.raw
  sancov.py: reading map 7036.sancov.map
  sancov.py: unpacking 7036.sancov.raw
  writing 1 PCs to a.out.7036.sancov
  % sancov.py print a.out.7036.sancov
  sancov.py: read 1 PCs from a.out.7036.sancov
  sancov.py: 1 files merged; 1 PCs total
  0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (an in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage
has increased after testing each new input.

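A minimal in-process loop built on this idea keeps an input only when the coverage total grows after running it. In the C++ sketch below, the ``total_coverage`` parameter stands in for ``__sanitizer_get_total_unique_coverage()`` so that the example runs without a sanitizer runtime; ``KeepInterestingInputs`` is a hypothetical name, and input generation, mutation, and crash handling are left out.

.. code-block:: c++

  #include <cstdint>
  #include <functional>
  #include <string>
  #include <vector>

  // Runs each candidate through `target` and keeps it only if the total
  // coverage reported afterwards is higher than the best seen so far.
  std::vector<std::string> KeepInterestingInputs(
      const std::vector<std::string> &candidates,
      const std::function<void(const std::string &)> &target,
      const std::function<uintptr_t()> &total_coverage) {
    std::vector<std::string> corpus;
    uintptr_t best = total_coverage();  // coverage before any candidate runs
    for (const std::string &input : candidates) {
      target(input);
      const uintptr_t now = total_coverage();
      if (now > best) {  // this input reached something new: keep it
        best = now;
        corpus.push_back(input);
      }
    }
    return corpus;
  }

In a real in-process fuzzer, ``total_coverage`` could wrap ``__sanitizer_get_total_unique_coverage()`` and ``target`` would call the code under test.
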
If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

============== ========= ========= ========= ========= ========= =========
     benchmark      cov0      cov1  diff 0-1      cov2  diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
 400.perlbench   1296.00   1307.00      1.01   1465.00      1.13      1.12
     401.bzip2    858.00    854.00      1.00   1010.00      1.18      1.18
       403.gcc    613.00    617.00      1.01    683.00      1.11      1.11
       429.mcf    605.00    582.00      0.96    610.00      1.01      1.05
     445.gobmk    896.00    880.00      0.98   1050.00      1.17      1.19
     456.hmmer    892.00    892.00      1.00    918.00      1.03      1.03
     458.sjeng    995.00   1009.00      1.01   1217.00      1.22      1.21
462.libquantum    497.00    492.00      0.99    534.00      1.07      1.09
   464.h264ref   1461.00   1467.00      1.00   1543.00      1.06      1.05
   471.omnetpp    575.00    590.00      1.03    660.00      1.15      1.12
     473.astar    658.00    652.00      0.99    715.00      1.09      1.10
 483.xalancbmk    471.00    491.00      1.04    582.00      1.24      1.19
      433.milc    616.00    627.00      1.02    627.00      1.02      1.00
      444.namd    602.00    601.00      1.00    654.00      1.09      1.09
    447.dealII    630.00    634.00      1.01    653.00      1.04      1.03
    450.soplex    365.00    368.00      1.01    395.00      1.08      1.07
    453.povray    427.00    434.00      1.02    495.00      1.16      1.14
       470.lbm    357.00    375.00      1.05    370.00      1.04      0.99
   482.sphinx3    927.00    928.00      1.00   1000.00      1.08      1.08
============== ========= ========= ========= ========= ========= =========

Why another coverage?
=====================

Why did we implement yet another code coverage?
  * We needed something that is lightning fast, plays well with
    AddressSanitizer, and does not significantly increase the binary size.
  * Traditional coverage implementations based on global counters
    `suffer from contention on counters
    <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.