=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer` or :doc:`MemorySanitizer`. In addition to
``-fsanitize=address``, ``-fsanitize=leak`` or ``-fsanitize=memory``, pass one
of the following compile-time flags:

* ``-fsanitize-coverage=1`` for function-level coverage (very fast).
* ``-fsanitize-coverage=2`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=3`` for edge-level coverage (up to 40% slowdown).
* ``-fsanitize-coverage=4`` for additional caller-callee coverage.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS`` or
``MSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-mllvm -sanitizer-coverage-8bit-counters=1``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=1
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic defines the size of the following offsets (64 or 32 bits).
The rest of the data is the offsets in the corresponding binary/DSO that were
executed during the run.

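The layout described above can be read with a few lines of Python. This is a
minimal sketch, not the code used by ``sancov.py``: the function name
``parse_sancov`` is hypothetical, and little-endian byte order is an
assumption (the file is written in the byte order of the machine that produced
it).

.. code-block:: python

    import struct

    MAGIC_64 = 0xC0BFFFFFFFFFFF64
    MAGIC_32 = 0xC0BFFFFFFFFFFF32


    def parse_sancov(data):
        """Return the sorted offsets stored in the bytes of one .sancov file."""
        if len(data) < 8:
            raise ValueError("file too short to contain the 8-byte magic")
        # Assumption: little-endian data, as produced on x86_64.
        (magic,) = struct.unpack_from("<Q", data, 0)
        if magic == MAGIC_64:
            fmt, size = "<Q", 8
        elif magic == MAGIC_32:
            fmt, size = "<I", 4
        else:
            raise ValueError("unknown magic: %#x" % magic)
        payload = data[8:]
        if len(payload) % size != 0:
            raise ValueError("truncated offset table")
        return sorted(value for (value,) in struct.iter_unpack(fmt, payload))

Given the contents of a file, for example
``parse_sancov(open(path, "rb").read())``, the function returns the offsets as
integers, which can then be printed in hexadecimal or merged across files with
a set union.
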
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all call sites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

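The subtraction itself is a plain set difference. The sketch below only
illustrates that step; the helper names ``parse_pcs`` and ``missing_pcs`` are
hypothetical, and it assumes both PC lists are already available as text with
one hexadecimal address per line (extracting the instrumented PCs from the
binary is not shown).

.. code-block:: python

    def parse_pcs(text):
        """Parse whitespace-separated hexadecimal PCs such as '0x4cc61c'."""
        return {int(token, 16) for token in text.split()}


    def missing_pcs(instrumented, covered):
        """Return the instrumented PCs that never appear in the covered set."""
        return sorted(instrumented - covered)

For example, ``missing_pcs(parse_pcs(all_text), parse_pcs(covered_text))``
yields the PCs to investigate, which can be formatted with ``hex()`` and passed
to a symbolizer.
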
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=3``) simply splits all critical edges
by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

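The splitting step can be sketched on a toy control flow graph. This is an
illustration only, not the LLVM pass: the function name
``split_critical_edges`` and the dictionary representation are hypothetical.
It uses the usual definition of a critical edge, namely an edge whose source
has more than one successor and whose destination has more than one
predecessor, and it assumes every block appears as a key of the dictionary.

.. code-block:: python

    def split_critical_edges(cfg):
        """cfg maps each block name to its list of successors.

        Returns a new CFG in which every critical edge goes through a
        freshly created dummy block.
        """
        # Count the predecessors of every block.
        pred_count = {}
        for succs in cfg.values():
            for dst in succs:
                pred_count[dst] = pred_count.get(dst, 0) + 1

        new_cfg = {block: [] for block in cfg}
        dummies = 0
        for src, succs in cfg.items():
            for dst in succs:
                if len(succs) > 1 and pred_count[dst] > 1:
                    # Critical edge: route it through a new dummy block.
                    dummies += 1
                    dummy = "split%d" % dummies
                    new_cfg[src].append(dummy)
                    new_cfg[dummy] = [dst]
                else:
                    new_cfg[src].append(dst)
        return new_cfg

For the graph above, ``{"A": ["B", "C"], "B": ["C"], "C": []}``, only A=>C is
critical, so a single dummy block (called D in the diagram) is inserted between
A and C.
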
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also
be dumped as a bitset (a text file with 1 for blocks that have been executed
and 0 for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=3 cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.

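One way to do such a distillation is a greedy pass that keeps an input only if
its bitset covers a block not covered by the inputs kept so far. The sketch
below is one possible approach under that assumption, not a tool shipped with
the sanitizers; the function name ``distill`` and the input names are
hypothetical.

.. code-block:: python

    def distill(bitsets):
        """bitsets maps an input name to its bitset string of '0' and '1'.

        Returns the names of the inputs to keep, largest coverage first.
        """
        covered = set()
        kept = []
        # Visit inputs with the most covered blocks first.
        by_size = sorted(bitsets, key=lambda name: bitsets[name].count("1"),
                         reverse=True)
        for name in by_size:
            bits = {i for i, ch in enumerate(bitsets[name]) if ch == "1"}
            if not bits <= covered:  # contributes at least one new block
                kept.append(name)
                covered |= bits
        return kept

A greedy pass like this does not guarantee the smallest possible corpus, but it
is simple and never drops coverage that some input provided.
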
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

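The pairs-of-lines layout shown above can be grouped by caller with a short
script. This is a sketch that assumes each line has exactly the form shown
(module name, whitespace, hexadecimal address); the function name
``parse_caller_callee`` is hypothetical.

.. code-block:: python

    from collections import defaultdict


    def parse_line(line):
        """Split 'module 0xADDR' into (module, address)."""
        module, address = line.rsplit(None, 1)
        return module, int(address, 16)


    def parse_caller_callee(text):
        """Map each (module, pc) caller to the set of callees recorded for it."""
        lines = [line for line in text.splitlines() if line.strip()]
        if len(lines) % 2 != 0:
            raise ValueError("expected an even number of lines")
        callees = defaultdict(set)
        # Odd lines (1st, 3rd, ...) are callers, even lines are callees.
        for caller, callee in zip(lines[0::2], lines[1::2]):
            callees[parse_line(caller)].add(parse_line(callee))
        return dict(callees)

Applied to the four lines above, this yields one caller, ``a.out`` at
``0x4a2e0c``, with two distinct callees.
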
Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=3 -mllvm -sanitizer-coverage-8bit-counters=1
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

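The mapping from counters to bitsets described in the header comments can be
modelled in Python. This is a sketch of the documented behaviour, not the
runtime's implementation: the names ``counter_to_bit`` and
``update_bitset_and_clear`` are hypothetical, and it assumes that the first
range corresponds to the least significant bit.

.. code-block:: python

    import bisect

    # Lower bounds of the ranges 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    RANGE_STARTS = [1, 2, 3, 4, 8, 16, 32, 128]


    def counter_to_bit(value):
        """Return a byte with the i-th bit set if value is in the i-th range.

        A counter of 0 (never executed) maps to 0.
        Assumption: range 0 is the least significant bit.
        """
        if value <= 0:
            return 0
        index = bisect.bisect_right(RANGE_STARTS, value) - 1
        return 1 << index


    def update_bitset_and_clear(counters, bitset):
        """Merge counters into bitset, zero the counters, count new bits."""
        new_bits = 0
        for i, value in enumerate(counters):
            bit = counter_to_bit(value)
            if bit and not (bitset[i] & bit):
                bitset[i] |= bit
                new_bits += 1
            counters[i] = 0
        return new_bits

Here ``counters`` is a list of integers and ``bitset`` is a ``bytearray`` of
the same length. A fuzzer would call the update function after each input and
treat a non-zero return value as a sign that the input reached a new counter
range.
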
Output directory
================

By default, ``*.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map 7036.sancov.raw a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and it is sometimes preferable to run
a fuzzer in the same process as the code being fuzzed (an in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

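The feedback loop described in this section can be illustrated with a toy
model. Everything in the sketch below is hypothetical: ``ToyTarget`` stands in
for instrumented code, its ``total_unique_coverage()`` method plays the role of
``__sanitizer_get_total_unique_coverage()``, and the mutation strategy is
deliberately trivial. It does not call the sanitizer runtime and is not how the
fuzzer in the LLVM tree is implemented.

.. code-block:: python

    import random


    class ToyTarget:
        """Stand-in for instrumented code; records which branches ran."""

        def __init__(self):
            self.covered = set()

        def run(self, data):
            self.covered.add("entry")
            if data[:1] == b"F":
                self.covered.add("first byte is F")
                if data[1:2] == b"U":
                    self.covered.add("second byte is U")

        def total_unique_coverage(self):
            # Number of distinct entities covered so far.
            return len(self.covered)


    def random_mutate(data, rng):
        """Replace one randomly chosen byte with a random value."""
        if not data:
            return bytes([rng.randrange(256)])
        pos = rng.randrange(len(data))
        return data[:pos] + bytes([rng.randrange(256)]) + data[pos + 1:]


    def fuzz(target, seeds, mutate, iterations):
        """Keep every mutated input that increases total unique coverage.

        seeds must contain at least one input.
        """
        corpus = list(seeds)
        for data in corpus:
            target.run(data)
        best = target.total_unique_coverage()
        for _ in range(iterations):
            child = mutate(corpus[-1])  # mutate the most recent keeper
            target.run(child)
            now = target.total_unique_coverage()
            if now > best:
                corpus.append(child)
                best = now
        return corpus

A run could look like ``rng = random.Random(0)`` followed by
``fuzz(ToyTarget(), [b"AA"], lambda d: random_mutate(d, rng), 100000)``. The
important part is the comparison of the coverage total before and after each
input, which is the signal the text above describes.
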
Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=1``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=2``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========

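Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.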
				351	AddressSanitizer, and does not significantly increase the binary size.
				352	* Traditional coverage implementations based in global counters
				353	`suffer from contention on counters
				354	<https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.