=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic,
one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte of the
magic (``0x64`` or ``0x32``) defines the size of the following offsets: 64-bit
or 32-bit. The rest of the data is the offsets in the corresponding binary/DSO
that were executed during the run.
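
As a rough illustration of the layout described above, the following Python
sketch decodes the bytes of a ``.sancov`` file. It is not the code of
``sancov.py``: the function name ``parse_sancov`` is made up for this example,
and little-endian byte order is an assumption, since the text above does not
specify one.

.. code-block:: python

    import struct

    MAGIC_64 = 0xC0BFFFFFFFFFFF64
    MAGIC_32 = 0xC0BFFFFFFFFFFF32


    def parse_sancov(data):
        """Return the sorted, de-duplicated offsets stored in .sancov bytes."""
        if len(data) < 8:
            raise ValueError("too short to hold the 8-byte magic")
        # Assumption: the file is little-endian.
        (magic,) = struct.unpack_from("<Q", data, 0)
        if magic == MAGIC_64:
            fmt, size = "<Q", 8   # 64-bit offsets
        elif magic == MAGIC_32:
            fmt, size = "<I", 4   # 32-bit offsets
        else:
            raise ValueError("unexpected magic: %#x" % magic)
        payload = data[8:]
        if len(payload) % size:
            raise ValueError("offset data is not a multiple of %d bytes" % size)
        return sorted({pc for (pc,) in struct.iter_unpack(fmt, payload)})


    # Build a tiny 32-bit file in memory and decode it again.
    sample = struct.pack("<QII", MAGIC_32, 0x4652a0, 0x465250)
    print([hex(pc) for pc in parse_sancov(sample)])

To decode a real file, pass ``open(path, "rb").read()`` instead of ``sample``.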

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports

Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is automatically generated alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``, or the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool location.

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c
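
The subtraction itself is a plain set difference. A minimal Python sketch,
assuming both PC lists are available as text with one hexadecimal address per
line (the helper names and every address other than ``0x4cc61c`` are invented
for illustration):

.. code-block:: python

    def read_pcs(text):
        """Parse one hexadecimal PC per non-empty line into a set of ints."""
        return {int(line, 16) for line in text.splitlines() if line.strip()}


    def missing_pcs(instrumented, covered):
        """PCs that were instrumented but never executed, in ascending order."""
        return sorted(instrumented - covered)


    # Hypothetical input: three instrumented PCs, two of them covered.
    instrumented = read_pcs("0x4cc5f0\n0x4cc61c\n0x4cc640\n")
    covered = read_pcs("0x4cc5f0\n0x4cc640\n")
    for pc in missing_pcs(instrumented, covered):
        print(hex(pc))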

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
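
One possible reading of bitset-based corpus distillation is a greedy pass that
keeps an input only if its bitset sets a bit that no earlier input has set. The
sketch below is an illustration in Python, not part of the sanitizer tooling;
the function name ``distill`` and the third bitset are hypothetical.

.. code-block:: python

    def distill(runs):
        """Keep the names of runs whose bitset covers a block not seen before.

        `runs` is an iterable of (name, bitset) pairs, where bitset is a string
        of '0' and '1' characters with the same length for every run.
        """
        seen = 0
        kept = []
        for name, bits in runs:
            mask = int(bits, 2)
            if mask & ~seen:        # at least one newly covered block
                kept.append(name)
                seen |= mask
        return kept


    runs = [
        ("a.out.38214", "01101"),
        ("a.out.6128", "11011"),
        ("a.out.9999", "01001"),    # hypothetical third run, adds nothing new
    ]
    print(distill(runs))

The result depends on the order of the inputs and is not guaranteed to be the
smallest possible set.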

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.
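
Given the line layout shown above, the pairs can be regrouped per caller with a
few lines of Python. This is a sketch under the assumption that each line holds
a module name and a hexadecimal address separated by whitespace, as in the
sample; ``parse_caller_callee`` is a name invented here.

.. code-block:: python

    from collections import defaultdict


    def parse_caller_callee(text):
        """Map each caller (module, pc) to the list of its callees (module, pc)."""
        entries = []
        for line in text.splitlines():
            if line.strip():
                module, addr = line.split()
                entries.append((module, int(addr, 16)))
        if len(entries) % 2:
            raise ValueError("expected an even number of lines")
        calls = defaultdict(list)
        # Odd lines (1st, 3rd, ...) are callers, even lines are callees.
        for caller, callee in zip(entries[0::2], entries[1::2]):
            calls[caller].append(callee)
        return dict(calls)


    sample = (
        "a.out 0x4a2e0c\n"
        "a.out 0x4a6510\n"
        "a.out 0x4a2e0c\n"
        "a.out 0x4a87f0\n"
    )
    for caller, callees in parse_caller_callee(sample).items():
        print(caller[0], hex(caller[1]), "->", [hex(pc) for _, pc in callees])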

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01
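
The range mapping can be written down compactly. The Python sketch below only
illustrates the ranges listed above and is not the run-time's code. It assumes
that the first range corresponds to the least significant bit and that a
counter of zero maps to an empty bitset; ``counter_to_bitset`` is a
hypothetical name.

.. code-block:: python

    import bisect

    # Lower bounds of the ranges 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
    RANGE_STARTS = (1, 2, 3, 4, 8, 16, 32, 128)


    def counter_to_bitset(counter):
        """Return an 8-bit value with bit i set if counter is in the i-th range."""
        if counter <= 0:
            return 0  # assumption: an untouched counter sets no bit
        # Index of the last range whose lower bound does not exceed the counter.
        i = bisect.bisect_right(RANGE_STARTS, counter) - 1
        return 1 << i  # assumption: range 0 is the least significant bit


    for value in (0, 1, 3, 5, 31, 127, 128, 1000):
        print(value, format(counter_to_bitset(value), "08b"))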

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Tracing basic blocks
====================

An experimental feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).

Tracing PCs
===========

An experimental feature similar to tracing basic blocks, but with a different API.
With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the Sanitizer run-time and should be defined
by the user. As a result, these flags do not require any other sanitizer to be used.
This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
and can be used with `AFL <http://lcamtuf.coredump.cx/afl>`__.

Tracing data flow
=================

An experimental feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer will need to define the following functions,
which will be called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
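
The packed arguments described in the comments above can be taken apart with
shifts and masks. The following Python sketch only illustrates the bit layout;
the real callbacks have to be defined by the fuzzer in C or C++, and the helper
names and sample values here are hypothetical.

.. code-block:: python

    def unpack_size_and_type(size_and_type):
        """Split SizeAndType into (operand size in bits, comparison type)."""
        size_bits = (size_and_type >> 32) & 0xFFFFFFFF   # bits [63:32]
        cmp_type = size_and_type & 0xFFFFFFFF            # bits [31:0]
        return size_bits, cmp_type


    def decode_switch_cases(cases):
        """Split the Cases array into (size of Val in bits, case constants)."""
        count, size_bits = cases[0], cases[1]
        constants = list(cases[2:2 + count])
        if len(constants) != count:
            raise ValueError("Cases array is shorter than Cases[0] announces")
        return size_bits, constants


    # Hypothetical values: a 32-bit comparison of type 7, and a switch over a
    # 16-bit value with the case constants 1, 5 and 9.
    print(unpack_size_and_type((32 << 32) | 7))
    print(decode_switch_cases([3, 16, 1, 5, 9]))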

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data could be useful for fuzzers and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>`` which returns the number of currently
covered entities in the program. This will tell the fuzzer if the coverage has
increased after testing every new input.
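
The feedback loop this enables might look like the following. The sketch is
written in Python purely to show the control flow: ``get_coverage`` stands in
for ``__sanitizer_get_total_unique_coverage()``, and the toy target and its
coverage bookkeeping are invented for the example. A real in-process fuzzer
would be C or C++ code linked with the sanitizer run-time.

.. code-block:: python

    def try_input(run_target, get_coverage, corpus, candidate):
        """Run one candidate and keep it if the total coverage has increased."""
        before = get_coverage()
        run_target(candidate)
        if get_coverage() > before:
            corpus.append(candidate)
            return True
        return False


    # Toy stand-ins for the instrumented target and the coverage query.
    covered = set()


    def toy_target(data):
        covered.add("entry")
        if data.startswith(b"F"):
            covered.add("F-branch")


    def toy_coverage():
        return len(covered)


    corpus = []
    for candidate in (b"A", b"B", b"F1", b"F2"):
        try_input(toy_target, toy_coverage, corpus, candidate)
    print(corpus)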

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%. The ``diff`` columns in the table are ratios between the
corresponding columns (for example, ``diff 0-2`` is ``cov2`` divided by
``cov0``).

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.