=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic number defines the size of the offsets that follow. The rest
of the data is the offsets in the corresponding binary/DSO that were executed
during the run.

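The layout described above can be read back with a few lines of code. The following is only a sketch and not the ``sancov.py`` implementation: the helper names ``read_le`` and ``parse_sancov`` are hypothetical, and the code assumes the file was written on a little-endian machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Magic values quoted in the text above.
constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Reads `width` bytes starting at `pos` as a little-endian integer.
// Little-endian byte order is an assumption of this sketch.
static uint64_t read_le(const std::vector<uint8_t> &data, size_t pos,
                        size_t width) {
  uint64_t value = 0;
  for (size_t i = 0; i < width; ++i)
    value |= static_cast<uint64_t>(data[pos + i]) << (8 * i);
  return value;
}

// Returns the offsets stored in the raw contents of one .sancov file.
std::vector<uint64_t> parse_sancov(const std::vector<uint8_t> &data) {
  if (data.size() < 8)
    throw std::runtime_error("file too short to contain the magic");
  const uint64_t magic = read_le(data, 0, 8);
  size_t width = 0;
  if (magic == kMagic64)
    width = 8;  // 64-bit offsets follow
  else if (magic == kMagic32)
    width = 4;  // 32-bit offsets follow
  else
    throw std::runtime_error("unrecognized magic");
  if ((data.size() - 8) % width != 0)
    throw std::runtime_error("truncated offset at end of file");
  std::vector<uint64_t> offsets;
  for (size_t pos = 8; pos < data.size(); pos += width)
    offsets.push_back(read_le(data, pos, width));
  return offsets;
}
```

Passing the bytes of a file that starts with the 32-bit magic followed by two 4-byte values yields a vector with two offsets.
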
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered, by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

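The subtraction itself is an ordinary set difference. A minimal sketch, assuming both PC lists have already been loaded into sorted sets (the function name ``missing_pcs`` is hypothetical and this is not how ``sancov.py`` is implemented):

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Returns every instrumented PC that does not appear in the covered set.
std::vector<uint64_t> missing_pcs(const std::set<uint64_t> &instrumented,
                                  const std::set<uint64_t> &covered) {
  std::vector<uint64_t> missing;
  // std::set iterates in sorted order, which std::set_difference requires.
  std::set_difference(instrumented.begin(), instrumented.end(),
                      covered.begin(), covered.end(),
                      std::back_inserter(missing));
  return missing;
}
```

With three instrumented PCs and two covered PCs, as in the console session above, the result holds the single uncovered PC.
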
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

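To make the notion of a critical edge concrete, the sketch below uses the usual definition: an edge is critical when its source block has more than one successor and its destination block has more than one predecessor. The ``Edge`` alias and the ``critical_edges`` function are hypothetical names for illustration, not part of the compiler pass, and the code assumes the edge list contains no duplicates.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// An edge of the control flow graph: (source block, destination block).
using Edge = std::pair<std::string, std::string>;

// Returns the edges whose source has several successors and whose
// destination has several predecessors. Assumes no duplicate edges.
std::vector<Edge> critical_edges(const std::vector<Edge> &edges) {
  std::map<std::string, int> successors, predecessors;
  for (const Edge &e : edges) {
    ++successors[e.first];
    ++predecessors[e.second];
  }
  std::vector<Edge> result;
  for (const Edge &e : edges)
    if (successors[e.first] > 1 && predecessors[e.second] > 1)
      result.push_back(e);
  return result;
}
```

For the three-block graph above (A=>B, A=>C, B=>C) only A=>C qualifies, which is the edge that receives the dummy block D.
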
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (unless
dlopen/dlclose come into play), so the bitset coverage can easily be used for
bitset-based corpus distillation.

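One simple way to distill a corpus from such bitsets is a greedy pass that keeps an input only if its bitset sets at least one position not set by any earlier input. The sketch below illustrates that idea under the assumption that each bitset has already been read into a string of ``0`` and ``1`` characters; ``distill`` is a hypothetical helper, and greedy selection is just one possible strategy.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the indices of the inputs worth keeping: those whose bitset
// contains a '1' at a position no earlier input has covered.
std::vector<size_t> distill(const std::vector<std::string> &bitsets) {
  std::vector<size_t> kept;
  std::string seen;  // union of all bitsets processed so far
  for (size_t i = 0; i < bitsets.size(); ++i) {
    const std::string &bits = bitsets[i];
    if (seen.size() < bits.size())
      seen.resize(bits.size(), '0');
    bool adds_coverage = false;
    for (size_t j = 0; j < bits.size(); ++j) {
      if (bits[j] == '1' && seen[j] == '0') {
        seen[j] = '1';
        adds_coverage = true;
      }
    }
    if (adds_coverage)
      kept.push_back(i);
  }
  return kept;
}
```

Given the two bitsets from the console session above, ``01101`` and ``11011``, both inputs are kept because the second one covers positions the first does not.
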
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

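The range mapping described in the header comment can be illustrated with a small function. This is a sketch of the mapping only, not the runtime's code: ``counter_range_bit`` is a hypothetical name, and it assumes the ranges are numbered from the least significant bit (range ``1`` is bit 0, range ``128+`` is bit 7).

```cpp
#include <cstdint>

// Maps an 8-bit counter value to the single bit that represents its range:
// 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+ (bit 0 through bit 7).
// A counter of 0 means "never executed" and maps to no bit at all.
// Numbering the ranges from the least significant bit is an assumption.
uint8_t counter_range_bit(uint8_t counter) {
  if (counter == 0) return 0;
  if (counter <= 3) return static_cast<uint8_t>(1u << (counter - 1));
  if (counter <= 7) return static_cast<uint8_t>(1u << 3);
  if (counter <= 15) return static_cast<uint8_t>(1u << 4);
  if (counter <= 31) return static_cast<uint8_t>(1u << 5);
  if (counter <= 127) return static_cast<uint8_t>(1u << 6);
  return static_cast<uint8_t>(1u << 7);
}
```

OR-ing the returned bit into a per-counter byte across several runs gives a bitset in which a new bit appears whenever a block's hit count moves into a range not seen before.
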
Tracing data flow
=================

An experimental feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer will need to define the following functions, which
will be called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.

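As an illustration of how a fuzzer might consume these callbacks, the sketch below defines both functions and simply logs the decoded arguments. The ``CmpEvent`` and ``SwitchEvent`` structs and the two global logs are hypothetical names introduced for this example, the ``extern "C"`` linkage is an assumption about how the instrumented code refers to the callbacks, and a real fuzzer would likely feed the values into its mutation strategy instead. Like the interface itself, this code is not thread-safe.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record of one comparison seen by the instrumented code.
struct CmpEvent {
  uint32_t size_bits;  // bits [63:32] of SizeAndType
  uint32_t type;       // bits [31:0] of SizeAndType
  uint64_t arg1;
  uint64_t arg2;
};

// Hypothetical record of one switch statement seen by the instrumented code.
struct SwitchEvent {
  uint64_t val;
  uint64_t num_cases;  // Cases[0]
  uint64_t size_bits;  // Cases[1]
  bool matched;        // true if Val equals one of the case constants
};

std::vector<CmpEvent> g_cmp_log;        // illustration only
std::vector<SwitchEvent> g_switch_log;  // illustration only

extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1,
                                          uint64_t Arg2) {
  // Unpack the two 32-bit halves described in the interface comment.
  const uint32_t size_bits = static_cast<uint32_t>(SizeAndType >> 32);
  const uint32_t type = static_cast<uint32_t>(SizeAndType & 0xffffffffu);
  g_cmp_log.push_back({size_bits, type, Arg1, Arg2});
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  const uint64_t num_cases = Cases[0];
  bool matched = false;
  // The case constants start at Cases[2].
  for (uint64_t i = 0; i < num_cases; ++i)
    if (Cases[2 + i] == Val)
      matched = true;
  g_switch_log.push_back({Val, num_cases, Cases[1], matched});
}
```

After a run, the logs show which operand values the target compared against, which is the information a data-flow-guided fuzzer needs to steer its inputs.
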
Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes twice as much data as the
default, because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. Calling it after testing every new input tells
the fuzzer whether the coverage has increased.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

============== ========= ========= ========= ========= ========= =========
benchmark      cov0      cov1      diff 0-1  cov2      diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
400.perlbench  1296.00   1307.00   1.01      1465.00   1.13      1.12
401.bzip2      858.00    854.00    1.00      1010.00   1.18      1.18
403.gcc        613.00    617.00    1.01      683.00    1.11      1.11
429.mcf        605.00    582.00    0.96      610.00    1.01      1.05
445.gobmk      896.00    880.00    0.98      1050.00   1.17      1.19
456.hmmer      892.00    892.00    1.00      918.00    1.03      1.03
458.sjeng      995.00    1009.00   1.01      1217.00   1.22      1.21
462.libquantum 497.00    492.00    0.99      534.00    1.07      1.09
464.h264ref    1461.00   1467.00   1.00      1543.00   1.06      1.05
471.omnetpp    575.00    590.00    1.03      660.00    1.15      1.12
473.astar      658.00    652.00    0.99      715.00    1.09      1.10
483.xalancbmk  471.00    491.00    1.04      582.00    1.24      1.19
433.milc       616.00    627.00    1.02      627.00    1.02      1.00
444.namd       602.00    601.00    1.00      654.00    1.09      1.09
447.dealII     630.00    634.00    1.01      653.00    1.04      1.03
450.soplex     365.00    368.00    1.01      395.00    1.08      1.07
453.povray     427.00    434.00    1.02      495.00    1.16      1.14
470.lbm        357.00    375.00    1.05      370.00    1.04      0.99
482.sphinx3    927.00    928.00    1.00      1000.00   1.08      1.08
============== ========= ========= ========= ========= ========= =========

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.