=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer` or :doc:`MemorySanitizer`. In addition to
``-fsanitize=address``, ``leak`` or ``memory``, pass one of the following
compile-time flags:

* ``-fsanitize-coverage=1`` for function-level coverage (very fast).
* ``-fsanitize-coverage=2`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=3`` for edge-level coverage (up to 40% slowdown).
* ``-fsanitize-coverage=4`` for additional caller-callee coverage.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS`` or
``MSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-mllvm -sanitizer-coverage-8bit-counters=1``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=1
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic,
either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte of the
magic defines the size of the following offsets (``64`` for 64-bit offsets,
``32`` for 32-bit offsets). The rest of the data is the offsets in the
corresponding binary/DSO that were executed during the run.

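Reading that layout takes only a few lines. The parser below is a minimal sketch, not part of the sanitizer tooling: the function name ``ParseSancov`` is hypothetical, and it assumes the file was written in the byte order of the machine reading it, since the format description does not state the endianness.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <optional>
    #include <vector>

    // Magic values from the format description. The last byte selects the width
    // of the offsets that follow: 0x64 for 64-bit, 0x32 for 32-bit.
    constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
    constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

    // Reads the raw bytes of a *.sancov file into a list of offsets.
    // Assumption: the file uses the byte order of the machine reading it.
    // Returns std::nullopt for an unknown magic or a truncated payload.
    std::optional<std::vector<uint64_t>> ParseSancov(const std::vector<uint8_t>& data) {
      uint64_t magic = 0;
      if (data.size() < sizeof(magic)) return std::nullopt;
      std::memcpy(&magic, data.data(), sizeof(magic));

      size_t width = 0;
      if (magic == kMagic64) {
        width = 8;
      } else if (magic == kMagic32) {
        width = 4;
      } else {
        return std::nullopt;
      }
      if ((data.size() - sizeof(magic)) % width != 0) return std::nullopt;

      std::vector<uint64_t> offsets;
      for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
        if (width == 8) {
          uint64_t value = 0;
          std::memcpy(&value, data.data() + pos, sizeof(value));
          offsets.push_back(value);
        } else {
          uint32_t value = 0;
          std::memcpy(&value, data.data() + pos, sizeof(value));
          offsets.push_back(value);  // widened to 64 bits for uniform handling
        }
      }
      return offsets;
    }

``memcpy`` is used instead of casting the buffer pointer so that unaligned offsets are read safely.
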
A simple script,
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py``, is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

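Merging several files, as in the ``2 files merged; 2 PCs total`` line above, amounts to a set union: an offset reached in more than one run is counted once. A minimal sketch with a hypothetical ``MergeCoverage`` helper, assuming the offsets have already been read from each file:

.. code-block:: c++

    #include <cstdint>
    #include <set>
    #include <vector>

    // Merges the offset lists of several runs into one sorted list in which
    // every offset appears exactly once.
    std::vector<uint64_t> MergeCoverage(const std::vector<std::vector<uint64_t>>& runs) {
      std::set<uint64_t> merged;
      for (const auto& run : runs)
        merged.insert(run.begin(), run.end());  // std::set drops duplicates
      return std::vector<uint64_t>(merged.begin(), merged.end());
    }

Merging a run with two offsets and a run whose single offset also appears in the first yields two offsets, which matches the totals printed above.
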
You can then filter the output of ``sancov.py`` through
``addr2line --exe ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get
file names and line numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

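The subtraction itself is an ordinary set difference. The sketch below uses a hypothetical ``MissingPCs`` helper and assumes both lists have already been collected (for example, the instrumented PCs from the ``__sanitizer_cov()`` callsites and the covered PCs from ``sancov.py print``):

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <vector>

    // Returns the instrumented PCs that do not appear among the covered PCs.
    // The inputs are taken by value and sorted, because std::set_difference
    // requires sorted ranges.
    std::vector<uint64_t> MissingPCs(std::vector<uint64_t> instrumented,
                                     std::vector<uint64_t> covered) {
      std::sort(instrumented.begin(), instrumented.end());
      std::sort(covered.begin(), covered.end());
      std::vector<uint64_t> missing;
      std::set_difference(instrumented.begin(), instrumented.end(),
                          covered.begin(), covered.end(),
                          std::back_inserter(missing));
      return missing;
    }

With three instrumented PCs and two covered ones, as in the session above, the result holds the one PC that was never reached.
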
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains three basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know whether the edge A=>C was
executed. Such edges of the control flow graph, which leave a block with
several successors and enter a block with several predecessors, are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=3``) simply splits all critical edges
by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

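The splitting step can be sketched on an explicit edge list. This illustrates the textbook transformation rather than LLVM's implementation: blocks are identified by strings, and the helper names and the dummy block names ``D0``, ``D1``, ... are hypothetical.

.. code-block:: c++

    #include <algorithm>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // A directed CFG edge "from => to"; blocks are identified by name.
    using Edge = std::pair<std::string, std::string>;

    // An edge is critical when its source has several successors and its
    // destination has several predecessors (A=>C in the example above).
    std::vector<Edge> CriticalEdges(const std::vector<Edge>& edges) {
      std::map<std::string, int> num_succs, num_preds;
      for (const Edge& e : edges) {
        ++num_succs[e.first];
        ++num_preds[e.second];
      }
      std::vector<Edge> critical;
      for (const Edge& e : edges)
        if (num_succs[e.first] > 1 && num_preds[e.second] > 1) critical.push_back(e);
      return critical;
    }

    // Replaces every critical edge X=>Y with X=>Dn and Dn=>Y, where Dn is a new,
    // empty block that the instrumentation could then count.
    std::vector<Edge> SplitCriticalEdges(const std::vector<Edge>& edges) {
      const std::vector<Edge> critical = CriticalEdges(edges);
      std::vector<Edge> result;
      int next_dummy = 0;
      for (const Edge& e : edges) {
        if (std::find(critical.begin(), critical.end(), e) == critical.end()) {
          result.push_back(e);  // non-critical edges are kept as-is
          continue;
        }
        const std::string dummy = "D" + std::to_string(next_dummy++);
        result.push_back({e.first, dummy});
        result.push_back({dummy, e.second});
      }
      return result;
    }

On the A, B, C graph, ``CriticalEdges`` reports only A=>C. After splitting, no critical edge remains, so each edge is implied by the coverage of one of its endpoint blocks.
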
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=3 cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable, the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can easily be
used for bitset-based corpus distillation.

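One simple way to use these files for corpus distillation is a greedy pass: keep an input only if its bitset marks a block that no previously kept input reached. The sketch below is one possible approach, not a tool shipped with the sanitizers; the ``DistillCorpus`` name is hypothetical, each bitset is assumed to have been read into a string of ``0``/``1`` characters, and the order-dependent result is not guaranteed to be the smallest possible corpus.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Walks the inputs in order and keeps an input only if its bitset has a '1'
    // where every previously kept input had a '0'. Returns the kept indices.
    std::vector<std::size_t> DistillCorpus(const std::vector<std::string>& bitsets) {
      std::vector<std::size_t> kept;
      std::string covered;  // union of the kept bitsets, grown as needed
      for (std::size_t i = 0; i < bitsets.size(); ++i) {
        const std::string& bits = bitsets[i];
        if (covered.size() < bits.size()) covered.resize(bits.size(), '0');
        bool adds_new_block = false;
        for (std::size_t b = 0; b < bits.size(); ++b) {
          if (bits[b] == '1' && covered[b] == '0') {
            covered[b] = '1';
            adds_new_block = true;
          }
        }
        if (adds_new_block) kept.push_back(i);
      }
      return kept;
    }

With the two bitsets from the session above, ``01101`` and ``11011``, both inputs are kept because the second one adds blocks 0 and 3; a third input with bitset ``01001`` would be dropped.
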
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

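Because the pairing is purely positional, a reader only has to group consecutive lines. A minimal sketch with a hypothetical ``ParseCallerCallee`` helper, assuming each line is ``<module> <hex PC>`` and module names contain no whitespace:

.. code-block:: c++

    #include <cstddef>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // One "<module> <hex PC>" line of the caller-callee file.
    struct Location {
      std::string module;
      unsigned long long pc;
    };

    // Pairs up consecutive lines: the 1st, 3rd, ... lines are callers and the
    // 2nd, 4th, ... lines are the matching callees. A trailing unpaired line
    // is ignored.
    std::vector<std::pair<Location, Location>> ParseCallerCallee(const std::string& text) {
      std::istringstream in(text);
      std::vector<Location> lines;
      std::string module, pc;
      while (in >> module >> pc)
        lines.push_back({module, std::stoull(pc, nullptr, 16)});  // accepts "0x..."

      std::vector<std::pair<Location, Location>> pairs;
      for (std::size_t i = 0; i + 1 < lines.size(); i += 2)
        pairs.emplace_back(lines[i], lines[i + 1]);
      return pairs;
    }

Grouping the resulting pairs by caller shows, for example, that the caller at ``0x4a2e0c`` in the output above reached two different callees.
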
Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+``, and those 8-bit bitsets
will be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=3 -mllvm -sanitizer-coverage-8bit-counters=1
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

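To read the dumped file back, each byte can be decoded into the range it encodes. The sketch below assumes bit 0 (the least significant bit) marks the first range ``1`` and bit 7 marks ``128+``. That fits the ``01`` bytes in the ``xxd`` output above, since ``cov.cc`` executes each of its blocks at most once, but it remains an assumption, as does the helper name ``DecodeCounterByte``.

.. code-block:: c++

    #include <cstdint>
    #include <string>

    // Decodes one byte of a *.counters-sancov file. Assumes bit i (bit 0 being
    // the least significant) marks the i-th counter range listed in the text.
    std::string DecodeCounterByte(uint8_t byte) {
      static const char* const kRanges[8] = {"1",    "2",     "3",      "4-7",
                                             "8-15", "16-31", "32-127", "128+"};
      if (byte == 0) return "not executed";
      std::string ranges;
      for (int i = 0; i < 8; ++i) {
        if (byte & (1u << i)) {
          if (!ranges.empty()) ranges += ",";  // several bits: several ranges seen
          ranges += kRanges[i];
        }
      }
      return ranges;
    }

Under that assumption, the five bytes ``00 01 01 00 01`` describe three entries executed exactly once and two never executed. A byte written at exit comes from a single counter and should have one bit set; a bitset accumulated across many inputs by a fuzzer may have several, which is why the helper joins them.
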
These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is *not* thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

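The contract described in these comments can be modelled in plain C++ to make it concrete. This is a hypothetical model, not the runtime's implementation: ``counters`` stands in for the runtime's internal counters, the names ``RangeBit`` and ``UpdateBitsetAndClearCounters`` are invented for illustration, and the bit order (bit ``i`` for range ``i``, least significant first) is an assumption.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // One-hot range bit for a counter value: bit i for the i-th range
    // 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+; 0 for a counter that is still 0.
    uint8_t RangeBit(uint8_t count) {
      static const uint8_t kUpperBounds[7] = {1, 2, 3, 7, 15, 31, 127};
      if (count == 0) return 0;
      for (int i = 0; i < 7; ++i)
        if (count <= kUpperBounds[i]) return static_cast<uint8_t>(1u << i);
      return 0x80;  // 128+
    }

    // Models the documented behaviour: ORs each counter's range bit into
    // 'bitset' (one byte per counter), returns how many bits were newly set,
    // and always resets the counters. A null 'bitset' only clears.
    size_t UpdateBitsetAndClearCounters(std::vector<uint8_t>& counters, uint8_t* bitset) {
      size_t new_bits = 0;
      for (size_t i = 0; i < counters.size(); ++i) {
        if (bitset != nullptr) {
          const uint8_t bit = RangeBit(counters[i]);
          if (bit != 0 && (bitset[i] & bit) == 0) {
            bitset[i] |= bit;
            ++new_bits;
          }
        }
        counters[i] = 0;
      }
      return new_bits;
    }

An in-process fuzzer could call the real function after each input and keep the input whenever the return value is non-zero, that is, whenever some counter landed in a range not seen before.
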
Output directory
================

By default, ``*.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1``, coverage data is written to
a memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full 64-bit PC values instead of 32-bit offsets.

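The property that makes this mode robust is that stores into a ``MAP_SHARED`` file mapping go straight into the kernel's page cache, so they are kept even if the process is killed with SIGKILL. The POSIX sketch below illustrates only that idea; the class name ``DirectPcLog`` and the flat array of 64-bit PCs are assumptions, and it does not reproduce the runtime's ``.sancov.raw``/``.sancov.map`` layout.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    // Stores PCs directly into a MAP_SHARED file mapping, so each store is
    // already in the page cache and survives a later SIGKILL.
    class DirectPcLog {
     public:
      DirectPcLog() = default;
      DirectPcLog(const DirectPcLog&) = delete;
      DirectPcLog& operator=(const DirectPcLog&) = delete;

      // Creates (or truncates) 'path', sizes it for 'capacity' 64-bit PCs and
      // maps it into memory. Returns false if any step fails.
      bool Open(const char* path, size_t capacity) {
        fd_ = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd_ < 0) return false;
        size_t bytes = capacity * sizeof(uint64_t);
        if (ftruncate(fd_, static_cast<off_t>(bytes)) != 0) return false;  // zero-filled
        void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (mem == MAP_FAILED) return false;
        slots_ = static_cast<uint64_t*>(mem);
        capacity_ = capacity;
        return true;
      }

      // Appends one full PC value; returns false when the file is full.
      bool Add(uint64_t pc) {
        if (slots_ == nullptr || size_ >= capacity_) return false;
        slots_[size_++] = pc;
        return true;
      }

      ~DirectPcLog() {
        if (slots_ != nullptr) munmap(slots_, capacity_ * sizeof(uint64_t));
        if (fd_ >= 0) close(fd_);
      }

     private:
      int fd_ = -1;
      uint64_t* slots_ = nullptr;
      size_t capacity_ = 0;
      size_t size_ = 0;
    };

Slots that were never written stay zero because ``ftruncate`` zero-fills the file, so a reader can stop at the first zero entry. Storing full 64-bit PCs is also why such a log is twice the size of one holding 32-bit offsets.
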
In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (an in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.

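The resulting feedback loop can be sketched as follows. In this sketch, the target and the coverage query are passed in as callbacks, so the loop runs without the sanitizer runtime; the function name ``KeepCoverageIncreasingInputs`` is hypothetical. In an instrumented build, ``total_coverage`` could simply wrap ``__sanitizer_get_total_unique_coverage()``.

.. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <vector>

    // Runs each candidate through the target and keeps it only if the total
    // coverage reported afterwards is higher than the best value seen so far.
    std::vector<std::string> KeepCoverageIncreasingInputs(
        const std::vector<std::string>& candidates,
        const std::function<void(const std::string&)>& run_target,
        const std::function<uintptr_t()>& total_coverage) {
      std::vector<std::string> corpus;
      uintptr_t best = total_coverage();
      for (const std::string& input : candidates) {
        run_target(input);
        uintptr_t now = total_coverage();
        if (now > best) {  // this input reached something new
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

Since covered entities never become uncovered within a process, comparing against the highest value seen so far is enough. A real fuzzer would also mutate the kept inputs to produce new candidates, which this sketch omits.
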
If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=1``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=2``) the overhead varies
between 0 and 25%.

The table below lists SPEC CPU2006 run times without coverage (``cov0``), with
function-level coverage (``cov1``), and with basic-block-level coverage
(``cov2``). Each ``diff`` column is the ratio of two of these; for example,
``diff 0-2`` is ``cov2 / cov0``, so 1.13 means 13% slower.

============== ========= ========= ========= ========= ========= =========
     benchmark      cov0      cov1  diff 0-1      cov2  diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
 400.perlbench   1296.00   1307.00      1.01   1465.00      1.13      1.12
     401.bzip2    858.00    854.00      1.00   1010.00      1.18      1.18
       403.gcc    613.00    617.00      1.01    683.00      1.11      1.11
       429.mcf    605.00    582.00      0.96    610.00      1.01      1.05
     445.gobmk    896.00    880.00      0.98   1050.00      1.17      1.19
     456.hmmer    892.00    892.00      1.00    918.00      1.03      1.03
     458.sjeng    995.00   1009.00      1.01   1217.00      1.22      1.21
462.libquantum    497.00    492.00      0.99    534.00      1.07      1.09
   464.h264ref   1461.00   1467.00      1.00   1543.00      1.06      1.05
   471.omnetpp    575.00    590.00      1.03    660.00      1.15      1.12
     473.astar    658.00    652.00      0.99    715.00      1.09      1.10
 483.xalancbmk    471.00    491.00      1.04    582.00      1.24      1.19
      433.milc    616.00    627.00      1.02    627.00      1.02      1.00
      444.namd    602.00    601.00      1.00    654.00      1.09      1.09
    447.dealII    630.00    634.00      1.01    653.00      1.04      1.03
    450.soplex    365.00    368.00      1.01    395.00      1.08      1.07
    453.povray    427.00    434.00      1.02    495.00      1.16      1.14
       470.lbm    357.00    375.00      1.05    370.00      1.04      0.99
   482.sphinx3    927.00    928.00      1.00   1000.00      1.08      1.08
============== ========= ========= ========= ========= ========= =========

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.