=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.
				14
				15	How to build and run
				16	====================
				17
SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At runtime, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.
				36
				37	Example:
				38
				39	.. code-block:: console
				40
    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
				60
Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.
				65
				66	Postprocessing
				67	==============
				68
The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic,
				70	one of ``0xC0BFFFFFFFFFFF64`` and ``0xC0BFFFFFFFFFFF32``. The last byte of the
				71	magic defines the size of the following offsets. The rest of the data is the
				72	offsets in the corresponding binary/DSO that were executed during the run.
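
The layout described above is simple enough to read by hand. The following is
a minimal sketch, not the logic of ``sancov.py`` itself. It assumes
little-endian byte order (the text above does not specify one), and
``parse_sancov`` is a hypothetical helper name.

```python
import struct

MAGIC_64 = 0xC0BFFFFFFFFFFF64
MAGIC_32 = 0xC0BFFFFFFFFFFF32


def parse_sancov(data):
    """Return the list of offsets stored in the bytes of one .sancov file.

    Assumption: values are little-endian; the document does not say.
    """
    if len(data) < 8:
        raise ValueError("too short to contain the 8-byte magic")
    (magic,) = struct.unpack_from("<Q", data, 0)
    if magic == MAGIC_64:
        fmt, size = "Q", 8  # 64-bit offsets follow
    elif magic == MAGIC_32:
        fmt, size = "I", 4  # 32-bit offsets follow
    else:
        raise ValueError("unexpected magic %#x" % magic)
    payload = data[8:]
    if len(payload) % size:
        raise ValueError("truncated offset list")
    count = len(payload) // size
    return list(struct.unpack("<%d%s" % (count, fmt), payload))


# Build an in-memory sample instead of reading a real file.
sample = struct.pack("<QQQ", MAGIC_64, 0x465250, 0x4652A0)
for offset in parse_sancov(sample):
    print(hex(offset))
```

To read a real file, pass ``open(path, "rb").read()`` to the helper.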
				73
				74	A simple script
				75	``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
				76	provided to dump these offsets.
				77
				78	.. code-block:: console
				79
				80	% sancov.py print a.out.22679.sancov a.out.22673.sancov
				81	sancov.py: read 2 PCs from a.out.22679.sancov
				82	sancov.py: read 1 PCs from a.out.22673.sancov
				83	sancov.py: 2 files merged; 2 PCs total
				84	0x465250
				85	0x4652a0
				86
				87	You can then filter the output of ``sancov.py`` through ``addr2line --exe
				88	ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
				89	numbers:
				90
				91	.. code-block:: console
				92
    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5
				96
Mike Aizatsky	3828cbb	2016-01-27 23:56:12 +0000	[diff] [blame]	97	Sancov Tool
				98	===========
				99
A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass the ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.
Mike Aizatsky	3828cbb	2016-01-27 23:56:12 +0000	[diff] [blame]	106
				107	.. code-block:: console
				108
    USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports
				121
				122
				123	Automatic HTML Report Generation
				124	================================
				125
If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is automatically generated alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``; otherwise, the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool location.
				130
				131
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	132	How good is the coverage?
				133	=========================
				134
Sergey Matveev	ea558e0	2015-05-06 21:09:00 +0000	[diff] [blame]	135	It is possible to find out which PCs are not covered, by subtracting the covered
				136	set from the set of all instrumented PCs. The latter can be obtained by listing
				137	all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
				138	can do this for you. Just supply the path to binary and a list of covered PCs:
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	139
				140	.. code-block:: console
				141
Sergey Matveev	ea558e0	2015-05-06 21:09:00 +0000	[diff] [blame]	142	% sancov.py print a.out.12345.sancov > covered.txt
				143	sancov.py: read 2 64-bit PCs from a.out.12345.sancov
				144	sancov.py: 1 file merged; 2 PCs total
				145	% sancov.py missing a.out < covered.txt
				146	sancov.py: found 3 instrumented PCs in a.out
				147	sancov.py: read 2 PCs from stdin
				148	sancov.py: 1 PCs missing from coverage
				149	0x4cc61c
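
The subtraction described above amounts to a set difference. A minimal sketch
follows; ``missing_pcs`` is a hypothetical helper, and every address other than
``0x4cc61c`` is made up for illustration.

```python
def missing_pcs(instrumented, covered):
    """Return instrumented PCs that are absent from the covered set, sorted."""
    return sorted(set(instrumented) - set(covered))


# Hypothetical addresses: three instrumented PCs, two of them covered.
instrumented = [0x4CC5D0, 0x4CC61C, 0x4CC640]
covered = [0x4CC640, 0x4CC5D0]

for pc in missing_pcs(instrumented, covered):
    print(hex(pc))
```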
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	150
				151	Edge coverage
				152	=============
				153
				154	Consider this code:
				155
				156	.. code-block:: c++
				157
				158	void foo(int *a) {
				159	if (a)
				160	*a = 0;
				161	}
				162
				163	It contains 3 basic blocks, let's name them A, B, C:
				164
				165	.. code-block:: none
				166
    A
    |\
    | \
    |  B
    | /
    |/
    C
				174
				175	If blocks A, B, and C are all covered we know for certain that the edges A=>B
				176	and B=>C were executed, but we still don't know if the edge A=>C was executed.
				177	Such edges of control flow graph are called
				178	`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
Alexey Samsonov	8fffba1	2015-05-07 23:04:19 +0000	[diff] [blame]	179	edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
				180	edges by introducing new dummy blocks and then instruments those blocks:
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	181
				182	.. code-block:: none
				183
    A
    |\
    | \
    D  B
    | /
    |/
    C
				191
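Critical-edge splitting can be illustrated on a tiny graph. The sketch below is
not the compiler's implementation. It treats an edge as critical when its
source has more than one successor and its destination has more than one
predecessor, and it represents the control flow graph as a plain dictionary.
``split_critical_edges`` and the ``split_A_C`` naming scheme are hypothetical.

```python
def split_critical_edges(cfg):
    """Return a copy of cfg with a dummy block inserted on each critical edge.

    cfg maps a block name to the list of its successor block names.
    """
    pred_count = {block: 0 for block in cfg}
    for succs in cfg.values():
        for dst in succs:
            pred_count[dst] += 1

    result = {block: list(succs) for block, succs in cfg.items()}
    for src, succs in cfg.items():
        if len(succs) < 2:
            continue  # a single outgoing edge is never critical
        for i, dst in enumerate(succs):
            if pred_count[dst] >= 2:
                dummy = "split_%s_%s" % (src, dst)
                result[src][i] = dummy  # redirect src -> dummy
                result[dummy] = [dst]   # dummy -> original destination
    return result


# The A/B/C graph from the text: A branches to B and C, B falls through to C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
print(split_critical_edges(cfg))
```

Only the A=>C edge is split here; the new block plays the role of D in the
diagram.
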
				192	Bitset
				193	======
				194
When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (text file with 1 for blocks that have been executed and 0
for blocks that were not).
				198
				199	.. code-block:: console
				200
    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%
				212
				213	For a given executable the length of the bitset is always the same (well,
				214	unless dlopen/dlclose come into play), so the bitset coverage can be
				215	easily used for bitset-based corpus distillation.
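
One simple way to use the bitsets for corpus distillation is a greedy pass that
keeps an input only if it sets a bit no earlier input has set. This is a sketch
of that idea, not a tool shipped with the sanitizers. ``distill`` and the input
names are hypothetical; the two bitsets are the ones shown above.

```python
def distill(runs):
    """Greedily keep inputs whose bitset covers a position not yet seen.

    runs is a list of (input_name, bitset_string) pairs of equal length.
    """
    seen = set()
    kept = []
    for name, bits in runs:
        covered = {i for i, bit in enumerate(bits) if bit == "1"}
        if not covered <= seen:  # at least one new position
            kept.append(name)
            seen |= covered
    return kept


runs = [
    ("input-a", "01101"),
    ("input-b", "11011"),
    ("input-c", "01001"),  # adds nothing beyond input-a
]
print(distill(runs))
```

The result depends on the order of the inputs, so this greedy pass does not
guarantee a minimal corpus.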
				216
				217	Caller-callee coverage
				218	======================
				219
(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):
				225
				226	.. code-block:: console
				227
				228	a.out 0x4a2e0c
				229	a.out 0x4a6510
				230	a.out 0x4a2e0c
				231	a.out 0x4a87f0
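
The paired-line layout shown above can be grouped per caller with a few lines
of code. This is a minimal sketch under the assumption that each line is a
module name followed by a hexadecimal address, as in the sample.
``parse_caller_callee`` is a hypothetical helper.

```python
def parse_caller_callee(text):
    """Map each (module, address) caller to the set of its callees."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        module, address = line.rsplit(None, 1)
        entries.append((module, int(address, 16)))
    if len(entries) % 2:
        raise ValueError("expected an even number of lines")

    edges = {}
    # Odd lines (index 0, 2, ...) are callers, even lines are callees.
    for caller, callee in zip(entries[0::2], entries[1::2]):
        edges.setdefault(caller, set()).add(callee)
    return edges


sample = """\
a.out 0x4a2e0c
a.out 0x4a6510
a.out 0x4a2e0c
a.out 0x4a87f0
"""

for (module, address), callees in parse_caller_callee(sample).items():
    targets = sorted(hex(a) for _, a in callees)
    print(module, hex(address), "->", targets)
```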
				232
				233	Current limitations:
				234
				235	* Only the first 14 callees for every caller are recorded, the rest are silently
				236	ignored.
				237	* The output format is not very compact since caller and callee may reside in
				238	different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
				240	* Only Linux x86_64 is tested so far.
				241	* Sandboxes are not supported.
				242
				243	Coverage counters
				244	=================
				245
				246	This experimental feature is inspired by
				247	`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
				248	instrumentation. With additional compile-time and run-time flags you can get
				249	more sensitive coverage information. In addition to boolean values assigned to
				250	every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
				252	ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
				253	be dumped to disk.
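
The mapping from a counter value to its range can be sketched as follows. This
is an illustration, not the runtime's code. It assumes that the first range
corresponds to the least significant bit, which the text above does not state.
``counter_to_bit`` is a hypothetical helper.

```python
# Lower bounds of the ranges 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
RANGE_STARTS = (1, 2, 3, 4, 8, 16, 32, 128)


def counter_to_bit(counter):
    """Return a byte with one bit set for the range containing counter.

    Assumption: range i maps to bit i, counting from the least
    significant bit. A counter of 0 maps to 0 (no bit set).
    """
    if counter <= 0:
        return 0
    index = 0
    for i, start in enumerate(RANGE_STARTS):
        if counter >= start:
            index = i
    return 1 << index


for value in (0, 1, 5, 31, 127, 128):
    print(value, format(counter_to_bit(value), "08b"))
```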
				254
				255	.. code-block:: console
				256
Alexey Samsonov	8fffba1	2015-05-07 23:04:19 +0000	[diff] [blame]	257	% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	258	% ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
				259	% ls -l *counters-sancov
				260	... a.out.17110.counters-sancov
				261	% xxd *counters-sancov
				262	0000000: 0001 0100 01
				263
				264	These counters may also be used for in-process coverage-guided fuzzers. See
				265	``include/sanitizer/coverage_interface.h``:
				266
				267	.. code-block:: c++
				268
				269	// The coverage instrumentation may optionally provide imprecise counters.
				270	// Rather than exposing the counter values to the user we instead map
				271	// the counters to a bitset.
				272	// Every counter is associated with 8 bits in the bitset.
				273	// We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
				274	// The i-th bit is set to 1 if the counter value is in the i-th range.
				275	// This counter-based coverage implementation is not thread-safe.
				276
				277	// Returns the number of registered coverage counters.
				278	uintptr_t __sanitizer_get_number_of_counters();
				279	// Updates the counter 'bitset', clears the counters and returns the number of
				280	// new bits in 'bitset'.
				281	// If 'bitset' is nullptr, only clears the counters.
				282	// Otherwise 'bitset' should be at least
				283	// __sanitizer_get_number_of_counters bytes long and 8-aligned.
				284	uintptr_t
				285	__sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
				286
Tracing basic blocks
====================

An experimental feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
(depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).
				293
Kostya Serebryany	b17e298	2015-07-31 21:48:10 +0000	[diff] [blame]	294	Tracing data flow
				295	=================
				296
				297	An experimental feature to support data-flow-guided fuzzing.
				298	With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
				299	around comparison instructions and switch statements.
				300	The fuzzer will need to define the following functions,
				301	they will be called by the instrumented code.
				302
				303	.. code-block:: c++
				304
				305	// Called before a comparison instruction.
				306	// SizeAndType is a packed value containing
				307	// - [63:32] the Size of the operands of comparison in bits
				308	// - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
				309	// Arg1 and Arg2 are arguments of the comparison.
				310	void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);
				311
				312	// Called before a switch statement.
				313	// Val is the switch operand.
				314	// Cases[0] is the number of case constants.
				315	// Cases[1] is the size of Val in bits.
				316	// Cases[2:] are the case constants.
				317	void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);
				318
This interface is subject to change.
Kostya Serebryany	a94e6e7	2015-11-30 22:17:19 +0000	[diff] [blame]	320	The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
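
The packed arguments described in the comments above can be decoded as shown
below. This sketch only illustrates the bit layout; a real fuzzer would
implement the callbacks in C or C++. ``unpack_size_and_type`` and
``unpack_cases`` are hypothetical helpers, and the type code used in the
example is an arbitrary placeholder rather than a real ``ICMP_*`` value.

```python
MASK32 = 0xFFFFFFFF


def unpack_size_and_type(size_and_type):
    """Split SizeAndType into (operand_size_in_bits, comparison_type)."""
    size = (size_and_type >> 32) & MASK32  # bits [63:32]
    cmp_type = size_and_type & MASK32      # bits [31:0]
    return size, cmp_type


def unpack_cases(cases):
    """Split a Cases array into (value_size_in_bits, case_constants)."""
    count, bits = cases[0], cases[1]
    constants = list(cases[2:2 + count])
    if len(constants) != count:
        raise ValueError("Cases array is shorter than its declared count")
    return bits, constants


# 7 is a placeholder type code chosen for illustration only.
packed = (32 << 32) | 7
print(unpack_size_and_type(packed))
print(unpack_cases([3, 8, 10, 20, 30]))
```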
Kostya Serebryany	b17e298	2015-07-31 21:48:10 +0000	[diff] [blame]	321
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	322	Output directory
				323	================
				324
				325	By default, .sancov files are created in the current working directory.
				326	This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:
				327
				328	.. code-block:: console
				329
				330	% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
				331	% ls -l /tmp/cov/*sancov
				332	-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
				333	-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
				334
				335	Sudden death
				336	============
				337
				338	Normally, coverage data is collected in memory and saved to disk when the
				339	program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
				340	``__sanitizer_cov_dump()`` is called.
				341
				342	If the program ends with a signal that ASan does not handle (or can not handle
				343	at all, like SIGKILL), coverage data will be lost. This is a big problem on
				344	Android, where SIGKILL is a normal way of evicting applications from memory.
				345
With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.
				348
				349	.. code-block:: console
				350
				351	% ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
				352	main
				353	% ls
				354	7036.sancov.map 7036.sancov.raw a.out
				355	% sancov.py rawunpack 7036.sancov.raw
				356	sancov.py: reading map 7036.sancov.map
				357	sancov.py: unpacking 7036.sancov.raw
				358	writing 1 PCs to a.out.7036.sancov
				359	% sancov.py print a.out.7036.sancov
				360	sancov.py: read 1 PCs from a.out.7036.sancov
				361	sancov.py: 1 files merged; 1 PCs total
				362	0x4b2bae
				363
				364	Note that on 64-bit platforms, this method writes 2x more data than the default,
				365	because it stores full PC values instead of 32-bit offsets.
				366
				367	In-process fuzzing
				368	==================
				369
				370	Coverage data could be useful for fuzzers and sometimes it is preferable to run
				371	a fuzzer in the same process as the code being fuzzed (in-process fuzzer).
				372
				373	You can use ``__sanitizer_get_total_unique_coverage()`` from
				374	``<sanitizer/coverage_interface.h>`` which returns the number of currently
				375	covered entities in the program. This will tell the fuzzer if the coverage has
				376	increased after testing every new input.
				377
				378	If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
				379	before exiting the process. Use ``__asan_set_death_callback`` from
				380	``<sanitizer/asan_interface.h>`` to do that.
				381
				382	An example of such fuzzer can be found in `the LLVM tree
				383	<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.
				384
				385	Performance
				386	===========
				387
				388	This coverage implementation is fast. With function-level coverage
Alexey Samsonov	8fffba1	2015-05-07 23:04:19 +0000	[diff] [blame]	389	(``-fsanitize-coverage=func``) the overhead is not measurable. With
				390	basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
Sergey Matveev	07e2d28	2015-04-23 20:40:04 +0000	[diff] [blame]	391	between 0 and 25%.
				392
==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========
				416
				417	Why another coverage?
				418	=====================
				419
Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.