=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic
number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte of
the magic defines the size of the following offsets (``0x64`` for 64-bit
offsets, ``0x32`` for 32-bit offsets). The rest of the data is the
offsets in the corresponding binary/DSO that were executed during the run.
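
The layout described above can be read with a few lines of Python. This is a
minimal sketch, not the code of ``sancov.py``: the function name
``parse_sancov`` is made up for illustration, and the byte order is assumed to
be little-endian (as on x86_64), which the description above does not state.

```python
import struct

MAGIC_64 = 0xC0BFFFFFFFFFFF64  # offsets are 64 bits wide
MAGIC_32 = 0xC0BFFFFFFFFFFF32  # offsets are 32 bits wide


def parse_sancov(data: bytes) -> list[int]:
    """Return the offsets stored in the raw contents of a .sancov file."""
    if len(data) < 8:
        raise ValueError("too short to contain the 8-byte magic")
    # Assumption: the file was written on a little-endian machine.
    (magic,) = struct.unpack_from("<Q", data, 0)
    if magic == MAGIC_64:
        code, size = "Q", 8
    elif magic == MAGIC_32:
        code, size = "I", 4
    else:
        raise ValueError(f"unexpected magic {magic:#x}")
    payload = data[8:]
    if len(payload) % size:
        raise ValueError("offset table is truncated")
    count = len(payload) // size
    return list(struct.unpack(f"<{count}{code}", payload))


# Build an in-memory file holding the two PCs from the example output.
sample = struct.pack("<QQQ", MAGIC_64, 0x465250, 0x4652A0)
offsets = parse_sancov(sample)
print([hex(o) for o in offsets])
```

Checking the magic first keeps the reader from misinterpreting a file written
with the other offset width.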

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c
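
The subtraction itself is a plain set difference. The sketch below illustrates
the idea only; it does not extract instrumented PCs from a binary, and every
address except ``0x4cc61c`` is invented for the example.

```python
def parse_pcs(text: str) -> set[int]:
    """Parse hexadecimal PCs separated by whitespace (one per line)."""
    return {int(token, 16) for token in text.split()}


def missing_pcs(instrumented: set[int], covered: set[int]) -> list[int]:
    """Instrumented PCs that never show up in the covered set, sorted."""
    return sorted(instrumented - covered)


# Hypothetical input: three instrumented PCs, two of which were executed.
instrumented = parse_pcs("0x4cc5d0\n0x4cc5f4\n0x4cc61c\n")
covered = parse_pcs("0x4cc5d0\n0x4cc5f4\n")
missing = missing_pcs(instrumented, covered)
print([hex(pc) for pc in missing])
```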

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C
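
Critical-edge splitting can be sketched on a toy control flow graph. The code
below is an illustration under simple assumptions, not the compiler pass: the
graph is a dictionary from block name to successor list, and the dummy block
names (``split0``, ``split1``, ...) are invented and assumed not to clash with
existing block names. An edge is treated as critical when its source has more
than one successor and its destination has more than one predecessor.

```python
from collections import defaultdict


def split_critical_edges(cfg: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of cfg with a dummy block on every critical edge."""
    preds = defaultdict(list)
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].append(src)

    result = {block: list(succs) for block, succs in cfg.items()}
    counter = 0
    for src, succs in cfg.items():
        for i, dst in enumerate(succs):
            if len(succs) > 1 and len(preds[dst]) > 1:
                dummy = f"split{counter}"  # hypothetical naming scheme
                counter += 1
                result[src][i] = dummy     # src now branches to the dummy
                result[dummy] = [dst]      # the dummy falls through to dst
    return result


# The graph from the text: A branches to B and C, B falls through to C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
split = split_critical_edges(cfg)
print(split)
```

Here ``split0`` plays the role of block D: it is executed only when the edge
A=>C is taken, so covering it proves that the edge was executed.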

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
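
One possible shape of bitset-based distillation is sketched below. It is a
simple one-pass heuristic, not a tool shipped with the sanitizers: inputs are
visited in order of decreasing bit count, and an input is kept only if its
bitset contains a bit that no kept input has set. The input names are
hypothetical; the first two bitsets are the ones from the example above.

```python
def distill(bitsets: dict[str, str]) -> list[str]:
    """Pick a subset of inputs that covers every bit set in any bitset."""
    if len({len(bits) for bits in bitsets.values()}) > 1:
        raise ValueError("all bitsets must have the same length")
    covered_by = {
        name: {i for i, bit in enumerate(bits) if bit == "1"}
        for name, bits in bitsets.items()
    }
    kept, seen = [], set()
    # Visit inputs with the most bits first; break ties by name.
    for name in sorted(covered_by, key=lambda n: (-len(covered_by[n]), n)):
        if not covered_by[name] <= seen:
            kept.append(name)
            seen |= covered_by[name]
    return kept


corpus = {
    "input_a": "01101",
    "input_b": "11011",
    "input_c": "01001",  # hypothetical third run, adds nothing new
}
kept = distill(corpus)
print(kept)
```

The equal-length check reflects the property stated above: bitsets from the
same executable can be compared position by position.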

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown time the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0
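
A file in this format can be grouped by caller as follows. This is a sketch
based only on the sample above; it assumes that each line is a module name
followed by a hexadecimal address and that module names contain no whitespace.
The function name is made up for illustration.

```python
from collections import defaultdict


def parse_caller_callee(text: str) -> dict:
    """Map each (module, pc) caller to the set of (module, pc) callees."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2:
        raise ValueError("expected caller/callee lines in pairs")
    edges = defaultdict(set)
    # Odd lines (1st, 3rd, ...) are callers, even lines are callees.
    for caller, callee in zip(rows[0::2], rows[1::2]):
        source = (caller[0], int(caller[1], 16))
        target = (callee[0], int(callee[1], 16))
        edges[source].add(target)
    return dict(edges)


sample = """\
a.out 0x4a2e0c
a.out 0x4a6510
a.out 0x4a2e0c
a.out 0x4a87f0
"""
edges = parse_caller_callee(sample)
for source, targets in edges.items():
    print(source, sorted(targets))
```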

Current limitations:

* Only the first 14 callees for every caller are recorded, the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
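
The mapping from counters to bits can be modeled in a few lines. The sketch
below follows the comments in the header and is not the runtime's code: the
helper names are invented, and it assumes that the first range (``1``)
corresponds to the least significant bit of each byte.

```python
# Lower bounds of the 8 ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
RANGE_STARTS = [1, 2, 3, 4, 8, 16, 32, 128]


def counter_to_bits(value: int) -> int:
    """Return a mask with the bit of value's range set (0 for a zero counter)."""
    mask = 0
    for i, start in enumerate(RANGE_STARTS):
        if value >= start:
            mask = 1 << i  # assumption: range i maps to bit i, counted from the LSB
    return mask


def update_bitset_and_clear(counters: list[int], bitset: bytearray) -> int:
    """Fold counters into bitset, zero the counters, return the number of new bits."""
    new_bits = 0
    for i, value in enumerate(counters):
        added = counter_to_bits(value) & ~bitset[i]
        new_bits += bin(added).count("1")
        bitset[i] |= added
        counters[i] = 0
    return new_bits


counters = [0, 1, 5, 200]
bitset = bytearray(len(counters))
first = update_bitset_and_clear(counters, bitset)

counters[0], counters[1] = 1, 1  # a second run hits blocks 0 and 1 once each
second = update_bitset_and_clear(counters, bitset)
print(first, second, bitset.hex())
```

Only the first block contributes a new bit in the second call, because the
second block had already been seen with a count in the same range.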

Tracing basic blocks
====================

An experimental feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).

Tracing data flow
=================

An experimental feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer needs to define the following functions,
which are called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
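
The packed arguments can be decoded as shown in this sketch, which models the
layout from the comments above in Python rather than implementing the
callbacks themselves. The helper names are invented, and the value ``32`` used
for the comparison type is only a stand-in.

```python
def unpack_size_and_type(size_and_type: int) -> tuple[int, int]:
    """Split SizeAndType into (operand size in bits, comparison type)."""
    size_bits = (size_and_type >> 32) & 0xFFFFFFFF  # bits [63:32]
    cmp_type = size_and_type & 0xFFFFFFFF           # bits [31:0]
    return size_bits, cmp_type


def unpack_cases(cases: list[int]) -> tuple[int, list[int]]:
    """Split the Cases array into (size of Val in bits, case constants)."""
    if len(cases) < 2:
        raise ValueError("Cases needs at least a count and a size")
    count, val_bits = cases[0], cases[1]
    constants = cases[2:2 + count]
    if len(constants) != count:
        raise ValueError("fewer case constants than Cases[0] announces")
    return val_bits, constants


packed = (64 << 32) | 32  # 64-bit operands, stand-in comparison type 32
print(unpack_size_and_type(packed))
print(unpack_cases([3, 32, 10, 20, 30]))
```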

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data could be useful for fuzzers and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>`` which returns the number of currently
covered entities in the program. This will tell the fuzzer if the coverage has
increased after testing every new input.
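
The feedback loop can be illustrated with a toy model. Everything in the
sketch below is hypothetical: ``target`` stands for the code being fuzzed,
``total_unique_coverage`` plays the role of
``__sanitizer_get_total_unique_coverage()``, and the mutation strategy is
deliberately trivial. A real in-process fuzzer would call the instrumented
code and the sanitizer interface instead.

```python
import random

_covered: set[str] = set()  # stand-in for the runtime's coverage state


def total_unique_coverage() -> int:
    """Toy replacement for __sanitizer_get_total_unique_coverage()."""
    return len(_covered)


def target(data: bytes) -> None:
    """Hypothetical code under test; records each branch it reaches."""
    _covered.add("entry")
    if data[:1] == b"F":
        _covered.add("F")
        if data[1:2] == b"U":
            _covered.add("FU")
            if data[2:3] == b"Z":
                _covered.add("FUZ")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a random uppercase letter."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(ord("A"), ord("Z") + 1)
    return bytes(buf)


def fuzz(iterations: int, seed: int = 0) -> list[bytes]:
    """Keep every mutated input that increases the coverage count."""
    rng = random.Random(seed)
    corpus = [b"AAA"]
    target(corpus[0])
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        before = total_unique_coverage()
        target(candidate)
        if total_unique_coverage() > before:
            corpus.append(candidate)
    return corpus


corpus = fuzz(5000)
print(len(corpus), total_unique_coverage())
```

Comparing the count before and after each input is the whole feedback signal:
inputs that reach something new are kept and mutated further.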

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

============== ========= ========= ========= ========= ========= =========
benchmark      cov0      cov1      diff 0-1  cov2      diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
400.perlbench  1296.00   1307.00   1.01      1465.00   1.13      1.12
401.bzip2      858.00    854.00    1.00      1010.00   1.18      1.18
403.gcc        613.00    617.00    1.01      683.00    1.11      1.11
429.mcf        605.00    582.00    0.96      610.00    1.01      1.05
445.gobmk      896.00    880.00    0.98      1050.00   1.17      1.19
456.hmmer      892.00    892.00    1.00      918.00    1.03      1.03
458.sjeng      995.00    1009.00   1.01      1217.00   1.22      1.21
462.libquantum 497.00    492.00    0.99      534.00    1.07      1.09
464.h264ref    1461.00   1467.00   1.00      1543.00   1.06      1.05
471.omnetpp    575.00    590.00    1.03      660.00    1.15      1.12
473.astar      658.00    652.00    0.99      715.00    1.09      1.10
483.xalancbmk  471.00    491.00    1.04      582.00    1.24      1.19
433.milc       616.00    627.00    1.02      627.00    1.02      1.00
444.namd       602.00    601.00    1.00      654.00    1.09      1.09
447.dealII     630.00    634.00    1.01      653.00    1.04      1.03
450.soplex     365.00    368.00    1.01      395.00    1.08      1.07
453.povray     427.00    434.00    1.02      495.00    1.16      1.14
470.lbm        357.00    375.00    1.05      370.00    1.04      0.99
482.sphinx3    927.00    928.00    1.00      1000.00   1.08      1.08
============== ========= ========= ========= ========= ========= =========
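
The ``diff`` columns appear to be ratios of the measurement columns, rounded
to two decimals. The short check below recomputes them for three rows copied
from the table; the function name is made up for illustration.

```python
# (cov0, cov1, cov2) for three rows copied from the table above.
rows = {
    "400.perlbench": (1296.00, 1307.00, 1465.00),
    "401.bzip2": (858.00, 854.00, 1010.00),
    "429.mcf": (605.00, 582.00, 610.00),
}


def diffs(cov0: float, cov1: float, cov2: float) -> tuple[float, float, float]:
    """Return (diff 0-1, diff 0-2, diff 1-2) as ratios rounded to 2 decimals."""
    return (round(cov1 / cov0, 2), round(cov2 / cov0, 2), round(cov2 / cov1, 2))


for name, values in rows.items():
    print(name, diffs(*values))
```

A ratio of 1.13 in the ``diff 0-2`` column therefore reads as a 13% increase
of ``cov2`` over ``cov0``.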

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.