.. Source: clang/docs/SanitizerCoverage.rst (toolchain/llvm-project), blob efcb49e6eb42caa474f5c71d858f4118fdadc119

=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic number defines the size of the following offsets. The rest of
the data consists of the offsets in the corresponding binary/DSO that were
executed during the run.
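
As a rough illustration of the layout just described, the following sketch decodes the contents of a ``*.sancov`` file that has already been read into memory. The helper names ``ReadLE`` and ``ParseSancov`` are hypothetical, and the code assumes the file was written in little-endian byte order, which the text does not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Magic values quoted in the text; the low byte selects the offset width.
constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Reads `width` bytes starting at `p` as a little-endian unsigned integer.
// Assumption: the file was produced on a little-endian machine.
static uint64_t ReadLE(const uint8_t *p, size_t width) {
  uint64_t value = 0;
  for (size_t i = 0; i < width; ++i)
    value |= static_cast<uint64_t>(p[i]) << (8 * i);
  return value;
}

// Hypothetical helper: returns the offsets stored in the bytes of a
// *.sancov file, or throws if the header is not recognized.
std::vector<uint64_t> ParseSancov(const std::vector<uint8_t> &data) {
  if (data.size() < 8)
    throw std::runtime_error("file too short for the 8-byte magic");
  const uint64_t magic = ReadLE(data.data(), 8);
  size_t width = 0;
  if (magic == kMagic64)
    width = 8;  // 64-bit offsets
  else if (magic == kMagic32)
    width = 4;  // 32-bit offsets
  else
    throw std::runtime_error("unknown magic");
  if ((data.size() - 8) % width != 0)
    throw std::runtime_error("trailing bytes do not form a whole offset");
  std::vector<uint64_t> offsets;
  for (size_t pos = 8; pos < data.size(); pos += width)
    offsets.push_back(ReadLE(data.data() + pos, width));
  return offsets;
}
```

A caller would read the whole file into a ``std::vector<uint8_t>`` and pass it to ``ParseSancov``; the returned values are offsets into the binary or DSO the file belongs to.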

A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c
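
The subtraction itself is an ordinary set difference. The sketch below is illustrative only and is not how ``sancov.py`` is implemented. ``MissingPCs`` is a hypothetical name, and both sets are assumed to have been collected already.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Returns the instrumented PCs that do not appear in the covered set.
// std::set iterates in sorted order, which std::set_difference requires.
std::vector<uint64_t> MissingPCs(const std::set<uint64_t> &instrumented,
                                 const std::set<uint64_t> &covered) {
  std::vector<uint64_t> missing;
  std::set_difference(instrumented.begin(), instrumented.end(),
                      covered.begin(), covered.end(),
                      std::back_inserter(missing));
  return missing;
}
```

With three instrumented PCs and two covered ones, as in the console session above, the result holds the single uncovered PC.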

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C
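
To make the idea concrete, here is a small sketch of critical-edge splitting on a graph stored as a list of edges. It is not the compiler's implementation. It treats an edge as critical when its source has more than one successor and its destination has more than one predecessor. The function name and the dummy block names ``D0``, ``D1``, ... are hypothetical, and the input is assumed to contain no duplicate edges and no blocks already using those names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (source, destination)

// Replaces every critical edge X=>Y with X=>D and D=>Y, where D is a new
// dummy block. All other edges are copied unchanged, in the original order.
std::vector<Edge> SplitCriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> successors, predecessors;
  for (const Edge &e : edges) {
    ++successors[e.first];
    ++predecessors[e.second];
  }
  std::vector<Edge> result;
  int next_dummy = 0;
  for (const Edge &e : edges) {
    const bool critical =
        successors[e.first] > 1 && predecessors[e.second] > 1;
    if (!critical) {
      result.push_back(e);
      continue;
    }
    const std::string dummy = "D" + std::to_string(next_dummy++);
    result.emplace_back(e.first, dummy);
    result.emplace_back(dummy, e.second);
  }
  return result;
}
```

For the three-block graph above (A=>B, A=>C, B=>C), only A=>C is critical, so the sketch yields A=>B, A=>D0, D0=>C, B=>C, which matches the second diagram.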

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that have not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.
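
One way to picture bitset-based corpus distillation is to keep a running union of all bitsets seen so far and retain an input only if its bitset adds at least one new ``1``. The sketch below assumes each bitset has been read into a string of ``'0'`` and ``'1'`` characters. ``MergeBitset`` is a hypothetical helper, not part of any sanitizer tool.

```cpp
#include <cstddef>
#include <string>

// Merges `bitset` into `accumulated` and returns how many positions were
// newly set. `accumulated` grows if `bitset` is longer.
size_t MergeBitset(std::string &accumulated, const std::string &bitset) {
  if (accumulated.size() < bitset.size())
    accumulated.resize(bitset.size(), '0');
  size_t added = 0;
  for (size_t i = 0; i < bitset.size(); ++i) {
    if (bitset[i] == '1' && accumulated[i] != '1') {
      accumulated[i] = '1';
      ++added;
    }
  }
  return added;
}
```

Feeding in the two bitsets from the console session, ``01101`` and then ``11011``, adds three positions and then two more, so both runs would be kept.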

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.
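
The pairs-of-lines layout is straightforward to read back. The following sketch assumes each line holds a module name and a hexadecimal address separated by whitespace, as in the sample above. The types and function names are hypothetical.

```cpp
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Location {
  std::string module;
  uint64_t pc = 0;
};

struct CallPair {
  Location caller;
  Location callee;
};

// Parses one "module 0xADDR" line; returns false if the line is malformed.
static bool ParseLocation(const std::string &line, Location *out) {
  std::istringstream in(line);
  std::string pc_text;
  if (!(in >> out->module >> pc_text))
    return false;
  try {
    out->pc = std::stoull(pc_text, nullptr, 16);  // accepts a "0x" prefix
  } catch (const std::exception &) {
    return false;
  }
  return true;
}

// Reads caller/callee pairs: odd lines are callers, even lines are callees.
// Stops at the first malformed or unpaired line.
std::vector<CallPair> ParseCallerCallee(std::istream &in) {
  std::vector<CallPair> pairs;
  std::string caller_line, callee_line;
  while (std::getline(in, caller_line) && std::getline(in, callee_line)) {
    CallPair pair;
    if (!ParseLocation(caller_line, &pair.caller) ||
        !ParseLocation(callee_line, &pair.callee))
      break;
    pairs.push_back(pair);
  }
  return pairs;
}
```

Applied to the four sample lines, this yields two pairs that share the caller ``0x4a2e0c``.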

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
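
The mapping from a counter value to its range bit can be sketched as follows. This only illustrates the eight ranges listed above. ``CounterToRangeBit`` is a hypothetical name, and numbering the bits from the least significant one is an assumption rather than something the header states.

```cpp
#include <cstdint>

// Returns a byte with exactly one bit set, identifying the range that
// contains `counter`, or 0 if the counter is zero (never executed).
// Assumption: range i (counting from 0) maps to bit i, LSB first.
uint8_t CounterToRangeBit(uint32_t counter) {
  if (counter == 0) return 0;
  if (counter <= 3)                                   // ranges 1, 2, 3
    return static_cast<uint8_t>(1u << (counter - 1));
  if (counter <= 7) return 1u << 3;                   // 4-7
  if (counter <= 15) return 1u << 4;                  // 8-15
  if (counter <= 31) return 1u << 5;                  // 16-31
  if (counter <= 127) return 1u << 6;                 // 32-127
  return 1u << 7;                                     // 128+
}
```

OR-ing the result into a per-counter byte after each input gives a bitset of the kind the interface above describes, in which a new bit means a block was executed a noticeably different number of times.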

Tracing data flow
=================

An experimental feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer will need to define the following functions,
which will be called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
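
A fuzzer-side definition of these two hooks might look like the sketch below, which simply unpacks the arguments and records them. The event structs and the global vectors are hypothetical. A real fuzzer would likely use a cheaper, thread-safe data structure and would make sure the hooks themselves are not instrumented.

```cpp
#include <cstdint>
#include <vector>

struct CmpEvent {
  uint32_t size_bits;  // operand size, from bits [63:32] of SizeAndType
  uint32_t type;       // comparison type, from bits [31:0] of SizeAndType
  uint64_t arg1;
  uint64_t arg2;
};

struct SwitchEvent {
  uint64_t val;                 // the switch operand
  uint64_t size_bits;           // Cases[1]
  std::vector<uint64_t> cases;  // Cases[2:]
};

// Hypothetical in-memory logs; not thread-safe.
static std::vector<CmpEvent> g_cmp_events;
static std::vector<SwitchEvent> g_switch_events;

extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1,
                                          uint64_t Arg2) {
  CmpEvent event;
  event.size_bits = static_cast<uint32_t>(SizeAndType >> 32);
  event.type = static_cast<uint32_t>(SizeAndType & 0xFFFFFFFFu);
  event.arg1 = Arg1;
  event.arg2 = Arg2;
  g_cmp_events.push_back(event);
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  SwitchEvent event;
  event.val = Val;
  event.size_bits = Cases[1];
  // Cases[0] holds the number of case constants that follow Cases[1].
  event.cases.assign(Cases + 2, Cases + 2 + Cases[0]);
  g_switch_events.push_back(event);
}
```

The recorded operands can then guide mutation, for example by trying inputs that make ``arg1`` equal to ``arg2``.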

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map  7036.sancov.raw  a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.
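
The feedback loop this enables can be sketched as below. To keep the example buildable without the sanitizer runtime, the coverage query is passed in as a callable. In an instrumented build one would presumably pass ``__sanitizer_get_total_unique_coverage``. ``KeepInteresting`` and ``Input`` are hypothetical names, and a real fuzzer would also mutate inputs rather than only filter a fixed list.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Input = std::vector<uint8_t>;

// Runs every candidate through `target` and keeps the ones after which
// `coverage()` reports a larger number than any seen before.
std::vector<Input> KeepInteresting(
    const std::vector<Input> &candidates,
    const std::function<void(const Input &)> &target,
    const std::function<uintptr_t()> &coverage) {
  std::vector<Input> corpus;
  uintptr_t best = coverage();
  for (const Input &input : candidates) {
    target(input);
    const uintptr_t now = coverage();
    if (now > best) {  // this input reached something new
      best = now;
      corpus.push_back(input);
    }
  }
  return corpus;
}
```

Because the total only grows, comparing it before and after each run is enough to tell whether an input is worth keeping.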

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

============== ========= ========= ========= ========= ========= =========
benchmark      cov0      cov1      diff 0-1  cov2      diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
400.perlbench  1296.00   1307.00   1.01      1465.00   1.13      1.12
401.bzip2      858.00    854.00    1.00      1010.00   1.18      1.18
403.gcc        613.00    617.00    1.01      683.00    1.11      1.11
429.mcf        605.00    582.00    0.96      610.00    1.01      1.05
445.gobmk      896.00    880.00    0.98      1050.00   1.17      1.19
456.hmmer      892.00    892.00    1.00      918.00    1.03      1.03
458.sjeng      995.00    1009.00   1.01      1217.00   1.22      1.21
462.libquantum 497.00    492.00    0.99      534.00    1.07      1.09
464.h264ref    1461.00   1467.00   1.00      1543.00   1.06      1.05
471.omnetpp    575.00    590.00    1.03      660.00    1.15      1.12
473.astar      658.00    652.00    0.99      715.00    1.09      1.10
483.xalancbmk  471.00    491.00    1.04      582.00    1.24      1.19
433.milc       616.00    627.00    1.02      627.00    1.02      1.00
444.namd       602.00    601.00    1.00      654.00    1.09      1.09
447.dealII     630.00    634.00    1.01      653.00    1.04      1.03
450.soplex     365.00    368.00    1.01      395.00    1.08      1.07
453.povray     427.00    434.00    1.02      495.00    1.16      1.14
470.lbm        357.00    375.00    1.05      370.00    1.04      0.99
482.sphinx3    927.00    928.00    1.00      1000.00   1.08      1.08
============== ========= ========= ========= ========= ========= =========

Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.