=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It lets you
collect function-level, basic-block-level, and edge-level coverage at a very
low cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At runtime, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage, one
``*.sancov`` file is created at process shutdown. If the executable is
dynamically linked against instrumented DSOs, one ``*.sancov`` file is also
created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic, one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte
of the magic (``0x64`` or ``0x32``) defines the size of the following offsets
(64 or 32 bits). The rest of the data is the offsets in the corresponding
binary/DSO that were executed during the run.
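
As an illustration of this layout, here is a minimal C++17 sketch of a reader.
It assumes the magic and the offsets are stored in the byte order of the
machine that wrote the file, and that the reader runs on a machine with the
same byte order; ``SancovData`` and ``ParseSancov`` are hypothetical names, and
``sancov.py`` remains the tool to use in practice.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <optional>
    #include <vector>

    // Magic values from the format description above.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Hypothetical result type: offset width plus the executed offsets.
    struct SancovData {
      int bits = 0;                   // 32 or 64
      std::vector<uint64_t> offsets;  // offsets in the binary/DSO
    };

    // Parses the raw bytes of a *.sancov file. Assumes the file was written
    // with the same byte order as the machine reading it.
    std::optional<SancovData> ParseSancov(const std::vector<uint8_t> &bytes) {
      if (bytes.size() < 8) return std::nullopt;
      uint64_t magic = 0;
      std::memcpy(&magic, bytes.data(), sizeof(magic));
      SancovData data;
      if (magic == kMagic64) {
        data.bits = 64;
      } else if (magic == kMagic32) {
        data.bits = 32;
      } else {
        return std::nullopt;  // not a .sancov file
      }
      const size_t width = static_cast<size_t>(data.bits) / 8;  // bytes/offset
      assert(width == 4 || width == 8);
      if ((bytes.size() - 8) % width != 0) return std::nullopt;  // truncated
      for (size_t pos = 8; pos < bytes.size(); pos += width) {
        if (width == 8) {
          uint64_t offset = 0;
          std::memcpy(&offset, bytes.data() + pos, 8);
          data.offsets.push_back(offset);
        } else {
          uint32_t offset = 0;
          std::memcpy(&offset, bytes.data() + pos, 4);
          data.offsets.push_back(offset);
        }
      }
      return data;
    }

The bytes could come from reading the file in binary mode, for example with a
``std::ifstream`` opened with ``std::ios::binary``.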

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c
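
The subtraction itself is a plain set difference. A minimal sketch, assuming
both lists are already available as numbers and use the same PC convention
(all instrumented PCs from the binary, covered PCs from ``sancov.py print``);
``MissingPcs`` is a hypothetical helper, not part of ``sancov.py``:

.. code-block:: c++

    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Returns the instrumented PCs that were never executed, in ascending
    // order. std::set keeps both inputs sorted, as set_difference requires.
    std::vector<uint64_t> MissingPcs(const std::set<uint64_t> &instrumented,
                                     const std::set<uint64_t> &covered) {
      std::vector<uint64_t> missing;
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }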

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

     A
     |\
     | \
     |  B
     | /
     |/
     C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph, which leave a block with several
successors and enter a block with several predecessors, are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

     A
     |\
     | \
     D  B
     | /
     |/
     C
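
The splitting step can be illustrated on a toy control flow graph. This is a
minimal sketch of the general technique, not the LLVM pass behind
``-fsanitize-coverage=edge``; the ``Cfg`` type and ``SplitCriticalEdges`` are
hypothetical.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Hypothetical toy CFG: block i is named names[i], succs[i] lists its
    // successors.
    struct Cfg {
      std::vector<std::string> names;
      std::vector<std::vector<int>> succs;

      int AddBlock(const std::string &name) {
        names.push_back(name);
        succs.emplace_back();
        return static_cast<int>(names.size()) - 1;
      }
    };

    // Splits every critical edge From->To (From has 2+ successors and To has
    // 2+ predecessors) by routing it through a new empty block named D0, D1...
    // An instrumentation pass could then instrument the new blocks like any
    // other block. Returns the number of edges split.
    int SplitCriticalEdges(Cfg &cfg) {
      std::vector<int> preds(cfg.succs.size(), 0);
      for (const auto &out : cfg.succs)
        for (int to : out) ++preds[to];

      const size_t original_blocks = cfg.succs.size();
      int split = 0;
      for (size_t from = 0; from < original_blocks; ++from) {
        if (cfg.succs[from].size() < 2) continue;
        for (size_t i = 0; i < cfg.succs[from].size(); ++i) {
          const int to = cfg.succs[from][i];
          if (preds[to] < 2) continue;
          const int dummy = cfg.AddBlock("D" + std::to_string(split));
          cfg.succs[dummy].push_back(to);
          cfg.succs[from][i] = dummy;  // From now reaches To only via dummy
          ++split;
        }
      }
      return split;
    }

On the graph above (A=>B, A=>C, B=>C), only A=>C is critical, so exactly one
dummy block is created, matching block D in the second diagram.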

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable, the length of the bitset is always the same (unless
dlopen/dlclose come into play), so the bitset coverage can easily be used for
bitset-based corpus distillation.
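
For example, a greedy distillation pass could keep only the inputs whose
bitset sets a bit that no previously kept input has set. A minimal C++17
sketch, assuming each bitset file has already been read into a string and all
bitsets come from the same executable; ``DistillCorpus`` is a hypothetical
helper, not part of the sanitizer runtime or ``sancov.py``:

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    // runs: (input name, bitset text such as "01101"), visited in order.
    // An input is kept only if it covers a block no kept input covers.
    std::vector<std::string> DistillCorpus(
        const std::vector<std::pair<std::string, std::string>> &runs) {
      std::vector<std::string> kept;
      std::string seen;  // union of the bitsets of the kept inputs
      for (const auto &[input, bits] : runs) {
        if (seen.empty()) seen.assign(bits.size(), '0');
        assert(bits.size() == seen.size());  // same executable, same length
        bool adds_new = false;
        for (size_t i = 0; i < bits.size(); ++i) {
          if (bits[i] == '1' && seen[i] == '0') {
            seen[i] = '1';
            adds_new = true;
          }
        }
        if (adds_new) kept.push_back(input);
      }
      return kept;
    }

The result depends on the order of ``runs``: it is a small covering subset of
the corpus, not necessarily the smallest one.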

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and the callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.
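
The pair-of-lines format is easy to consume from other tools. A minimal C++17
sketch of a reader, assuming each line holds exactly one module name and one
hexadecimal address separated by whitespace, as in the sample output above;
``Location``, ``CallPair`` and ``ParseCallerCallee`` are hypothetical names:

.. code-block:: c++

    #include <cassert>
    #include <cstdint>
    #include <istream>
    #include <optional>
    #include <sstream>
    #include <string>
    #include <vector>

    // One "module 0xaddress" line of caller-callee.PID.sancov.
    struct Location {
      std::string module;
      uint64_t pc = 0;
    };

    struct CallPair {
      Location caller;
      Location callee;
    };

    // Parses "a.out 0x4a2e0c"; returns nullopt on malformed input.
    std::optional<Location> ParseLocation(const std::string &line) {
      std::istringstream in(line);
      Location loc;
      std::string pc_text;
      if (!(in >> loc.module >> pc_text)) return std::nullopt;
      try {
        loc.pc = std::stoull(pc_text, nullptr, 16);  // accepts a "0x" prefix
      } catch (...) {
        return std::nullopt;
      }
      return loc;
    }

    // Reads odd lines as callers and even lines as callees.
    std::vector<CallPair> ParseCallerCallee(std::istream &in) {
      std::vector<CallPair> pairs;
      std::string caller_line, callee_line;
      while (std::getline(in, caller_line) && std::getline(in, callee_line)) {
        auto caller = ParseLocation(caller_line);
        auto callee = ParseLocation(callee_line);
        if (caller && callee) pairs.push_back({*caller, *callee});
      }
      return pairs;
    }

Given the first limitation above, a single caller will appear with at most 14
distinct callees in the parsed pairs.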

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
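
To make the range encoding concrete, here is a minimal sketch of the
counter-to-bit mapping and of an update-and-clear step on plain arrays. It
assumes that "the i-th bit" means the value ``1 << i``, which is consistent
with the ``01`` bytes in the ``xxd`` output above for blocks executed once;
``CounterToRangeBit`` and ``UpdateBitsetAndClear`` are hypothetical stand-ins,
not the sanitizer runtime's implementation.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Maps an 8-bit counter to its range bit (assuming bit i == 1 << i):
    // 1 -> bit 0, 2 -> bit 1, 3 -> bit 2, 4-7 -> bit 3, 8-15 -> bit 4,
    // 16-31 -> bit 5, 32-127 -> bit 6, 128+ -> bit 7. Zero maps to no bit.
    uint8_t CounterToRangeBit(uint8_t counter) {
      if (counter == 0) return 0;
      if (counter <= 3) return static_cast<uint8_t>(1u << (counter - 1));
      if (counter <= 7) return 1u << 3;
      if (counter <= 15) return 1u << 4;
      if (counter <= 31) return 1u << 5;
      if (counter <= 127) return 1u << 6;
      return 1u << 7;
    }

    // Hypothetical emulation of update-and-clear: ORs each counter's range
    // bit into 'bitset', zeroes the counters, and returns how many bits were
    // newly set.
    size_t UpdateBitsetAndClear(std::vector<uint8_t> &counters,
                                std::vector<uint8_t> &bitset) {
      assert(bitset.size() >= counters.size());
      size_t new_bits = 0;
      for (size_t i = 0; i < counters.size(); ++i) {
        const uint8_t bit = CounterToRangeBit(counters[i]);
        if (bit != 0 && (bitset[i] & bit) == 0) {
          bitset[i] |= bit;
          ++new_bits;
        }
        counters[i] = 0;
      }
      return new_bits;
    }

With the real interface, a fuzzer would allocate a bitset of
``__sanitizer_get_number_of_counters()`` bytes once and treat a non-zero return
from ``__sanitizer_update_counter_bitset_and_clear_counters`` as a sign that the
last input reached a new counter range.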

Tracing data flow
=================

This is an *experimental* feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer needs to define the following functions, which are called by the
instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only
for single-threaded targets.
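
A fuzzer-side definition mostly needs to unpack the arguments. The sketch below
decodes ``SizeAndType`` and the ``Cases`` array exactly as documented above and
records operands of comparisons whose operands differed, as candidate values
for future inputs. That heuristic, the helper names and ``g_cmp_candidates``
are hypothetical, and declaring the callback ``extern "C"`` is an assumption
(so that a C++ definition matches the unmangled name the instrumentation
calls).

.. code-block:: c++

    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Unpacked SizeAndType: [63:32] operand size in bits, [31:0] type.
    struct CmpInfo {
      uint32_t size_in_bits;
      uint32_t type;  // one of ICMP_EQ ... ICMP_SLE; numeric values not assumed
    };

    CmpInfo DecodeSizeAndType(uint64_t size_and_type) {
      return {static_cast<uint32_t>(size_and_type >> 32),
              static_cast<uint32_t>(size_and_type & 0xFFFFFFFFu)};
    }

    // Copies the case constants out of a Cases array laid out as documented:
    // Cases[0] = number of constants, Cases[1] = size of Val in bits,
    // Cases[2:] = the constants themselves.
    std::vector<uint64_t> SwitchCaseConstants(const uint64_t *cases) {
      return std::vector<uint64_t>(cases + 2, cases + 2 + cases[0]);
    }

    // Hypothetical fuzzer state; like the instrumentation, not thread-safe.
    std::vector<uint64_t> g_cmp_candidates;

    extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType,
                                              uint64_t Arg1, uint64_t Arg2) {
      const CmpInfo info = DecodeSizeAndType(SizeAndType);
      if (info.size_in_bits == 0 || Arg1 == Arg2) return;
      g_cmp_candidates.push_back(Arg1);
      g_cmp_candidates.push_back(Arg2);
    }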

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1``, coverage data is written to
a memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (an in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.
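
The feedback loop described above can be sketched as follows. This is a
minimal illustration, not the LLVM fuzzer: the coverage query is passed in as a
function so that in a real build it could be
``__sanitizer_get_total_unique_coverage``, while a test can substitute a fake.
``FuzzLoop`` and its parameters are hypothetical.

.. code-block:: c++

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Runs 'iterations' mutated inputs through 'target' and keeps every input
    // after which 'total_coverage' (e.g. the number of covered entities)
    // grew. Returns the grown corpus.
    std::vector<std::string> FuzzLoop(
        std::vector<std::string> corpus, size_t iterations,
        const std::function<void(const std::string &)> &target,
        const std::function<std::string(const std::string &, size_t)> &mutate,
        const std::function<uintptr_t()> &total_coverage) {
      assert(!corpus.empty());
      for (const auto &input : corpus) target(input);  // baseline coverage
      uintptr_t best = total_coverage();
      for (size_t i = 0; i < iterations; ++i) {
        std::string candidate = mutate(corpus[i % corpus.size()], i);
        target(candidate);
        const uintptr_t now = total_coverage();
        if (now > best) {  // the input reached new code: keep it
          best = now;
          corpus.push_back(candidate);
        }
      }
      return corpus;
    }

In an ASan build, saving the reproducer would happen in a callback registered
with ``__asan_set_death_callback``, for example by writing the input currently
under test to disk.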

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%. In the table below, the ``diff`` columns are ratios of the
run times: ``diff 0-1`` is ``cov1/cov0``, ``diff 0-2`` is ``cov2/cov0``, and
``diff 1-2`` is ``cov2/cov1``.

============== ========= ========= ========= ========= ========= =========
     benchmark      cov0      cov1  diff 0-1      cov2  diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
 400.perlbench   1296.00   1307.00      1.01   1465.00      1.13      1.12
     401.bzip2    858.00    854.00      1.00   1010.00      1.18      1.18
       403.gcc    613.00    617.00      1.01    683.00      1.11      1.11
       429.mcf    605.00    582.00      0.96    610.00      1.01      1.05
     445.gobmk    896.00    880.00      0.98   1050.00      1.17      1.19
     456.hmmer    892.00    892.00      1.00    918.00      1.03      1.03
     458.sjeng    995.00   1009.00      1.01   1217.00      1.22      1.21
462.libquantum    497.00    492.00      0.99    534.00      1.07      1.09
   464.h264ref   1461.00   1467.00      1.00   1543.00      1.06      1.05
   471.omnetpp    575.00    590.00      1.03    660.00      1.15      1.12
     473.astar    658.00    652.00      0.99    715.00      1.09      1.10
 483.xalancbmk    471.00    491.00      1.04    582.00      1.24      1.19
      433.milc    616.00    627.00      1.02    627.00      1.02      1.00
      444.namd    602.00    601.00      1.00    654.00      1.09      1.09
    447.dealII    630.00    634.00      1.01    653.00      1.04      1.03
    450.soplex    365.00    368.00      1.01    395.00      1.08      1.07
    453.povray    427.00    434.00      1.02    495.00      1.16      1.14
       470.lbm    357.00    375.00      1.05    370.00      1.04      0.99
   482.sphinx3    927.00    928.00      1.00   1000.00      1.08      1.08
============== ========= ========= ========= ========= ========= =========

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.