=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At runtime, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic value, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic value gives the size in bits (64 or 32) of each offset that
follows. The rest of the data is the offsets in the corresponding binary/DSO
that were executed during the run.

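The layout described above can be decoded in a few lines. The following is a
minimal sketch rather than the code used by ``sancov.py``: ``ParseSancov`` is a
hypothetical helper name, and the sketch assumes the file is read on a machine
with the same byte order as the one that wrote it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// The two magic values named in the text; the low byte selects the offset width.
constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Hypothetical helper: decodes the bytes of a .sancov file into offsets.
// Assumes host byte order matches the byte order used when the file was written.
// Returns false if the magic is unknown or the payload size is not a multiple
// of the offset width.
bool ParseSancov(const std::vector<uint8_t> &data,
                 std::vector<uint64_t> *offsets) {
  assert(offsets != nullptr);
  offsets->clear();

  uint64_t magic;
  if (data.size() < sizeof(magic)) return false;
  std::memcpy(&magic, data.data(), sizeof(magic));

  size_t width;  // bytes per offset
  if (magic == kMagic64) width = 8;
  else if (magic == kMagic32) width = 4;
  else return false;

  size_t payload = data.size() - sizeof(magic);
  if (payload % width != 0) return false;

  for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
    uint64_t value = 0;
    if (width == 8) {
      std::memcpy(&value, &data[pos], 8);
    } else {
      uint32_t narrow;
      std::memcpy(&narrow, &data[pos], 4);
      value = narrow;  // widen 32-bit offsets for a uniform result type
    }
    offsets->push_back(value);
  }
  return true;
}
```

To use it, read the whole ``*.sancov`` file into a ``std::vector<uint8_t>`` and
pass it in; on success ``offsets`` holds one entry per covered PC.
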
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

Sancov Tool
===========

A new experimental ``sancov`` tool is being developed to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without needing any extra
support from the environment.

.. code-block:: console

    USAGE: sancov [options] <action> <filenames...>

    Action (required)
      -print                    - Print coverage addresses
      -covered-functions        - Print all covered functions.
      -not-covered-functions    - Print all not covered functions.
      -html-report              - Print HTML coverage report.

    Options
      -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
      -demangle                   - Print demangled function name.
      -obj=<string>               - Path to object file to be symbolized
      -strip_path_prefix=<string> - Strip this prefix from file paths in reports

Automatic HTML Report Generation
================================

If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
coverage report is generated automatically alongside the coverage files.
The ``sancov`` binary should be present in ``PATH``; otherwise, the
``sancov_path=<path_to_sancov>`` option can be used to specify the tool location.

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

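The subtraction itself is an ordinary set difference. The sketch below
illustrates the idea only; ``MissingPCs`` is a hypothetical name, and how the
list of instrumented PCs is extracted from the binary is outside its scope.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Returns the instrumented PCs that do not appear in the covered list.
// Inputs are taken by value so they can be sorted and deduplicated locally,
// which std::set_difference requires.
std::vector<uint64_t> MissingPCs(std::vector<uint64_t> instrumented,
                                 std::vector<uint64_t> covered) {
  auto normalize = [](std::vector<uint64_t> &v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
  };
  normalize(instrumented);
  normalize(covered);

  std::vector<uint64_t> missing;
  std::set_difference(instrumented.begin(), instrumented.end(),
                      covered.begin(), covered.end(),
                      std::back_inserter(missing));
  return missing;
}
```

The result is sorted by address, so it can be fed directly to a symbolizer.
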
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks, let's name them A, B, C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C

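To make the definition concrete, the sketch below finds critical edges in a
small graph: an edge is critical when its source has more than one successor
and its destination has more than one predecessor. This is an illustration on
a toy edge list, not the compiler's implementation; ``CriticalEdges`` is a
hypothetical name, and the input is assumed to contain no duplicate edges.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (from, to)

// Returns the edges whose source has several successors and whose
// destination has several predecessors. Assumes no duplicate edges.
std::vector<Edge> CriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> successors, predecessors;
  for (const Edge &e : edges) {
    ++successors[e.first];
    ++predecessors[e.second];
  }
  std::vector<Edge> critical;
  for (const Edge &e : edges) {
    if (successors[e.first] > 1 && predecessors[e.second] > 1)
      critical.push_back(e);
  }
  return critical;
}
```

For the graph above (edges A=>B, A=>C, B=>C), only A=>C qualifies, which is
the edge that receives the dummy block D.
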
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (well,
unless dlopen/dlclose come into play), so the bitset coverage can be
easily used for bitset-based corpus distillation.

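One possible way to distill a corpus from such bitsets is a greedy pass that
keeps an input only if it sets a bit no earlier input has set. The sketch
below is an assumption about how this could be done, not a tool shipped with
the sanitizers; ``DistillCorpus`` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy distillation over textual bitsets ('0'/'1' characters), one per input.
// Returns the indices of inputs that add at least one previously unseen '1'.
std::vector<size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
  std::vector<size_t> kept;
  std::string seen;  // union of all bits covered by the kept inputs
  for (size_t i = 0; i < bitsets.size(); ++i) {
    const std::string &bits = bitsets[i];
    if (seen.size() < bits.size()) seen.resize(bits.size(), '0');
    bool adds_coverage = false;
    for (size_t j = 0; j < bits.size(); ++j) {
      if (bits[j] == '1' && seen[j] == '0') {
        seen[j] = '1';
        adds_coverage = true;
      }
    }
    if (adds_coverage) kept.push_back(i);
  }
  return kept;
}
```

The result depends on input order; sorting inputs by size first is a common
way to favor small reproducers, though that choice is left to the caller.
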
Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures caller and callee. At shutdown the process dumps a separate
file called ``caller-callee.PID.sancov`` which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

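The line-pair layout shown above is easy to read back. The following sketch
assumes each line is exactly ``module 0xADDR`` with no whitespace inside the
module name; ``ParseCallerCallee`` and the two structs are hypothetical names
introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Location {
  std::string module;
  uint64_t pc;
};

struct CallPair {
  Location caller;
  Location callee;
};

// Reads "module 0xADDR" lines two at a time: the first line of each pair is
// the caller, the second is the callee. Stops at the first malformed or
// incomplete pair. Assumes module names contain no whitespace.
std::vector<CallPair> ParseCallerCallee(std::istream &in) {
  std::vector<CallPair> pairs;
  Location caller, callee;
  while (in >> caller.module >> std::hex >> caller.pc
            >> callee.module >> callee.pc) {
    pairs.push_back({caller, callee});
  }
  return pairs;
}
```

Grouping the resulting pairs by caller gives the set of observed targets for
each indirect call site.
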
Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge) the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

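The range mapping can be written out as a small function. This is a sketch of
the mapping as described in the text, not the runtime's code:
``CounterToRangeBit`` is a hypothetical name, and the sketch assumes that the
least significant bit corresponds to the first range (a count of exactly 1).

```cpp
#include <cassert>
#include <cstdint>

// Maps a raw hit count to a byte with a single bit set. Bit i (counting from
// the least significant bit, an assumption of this sketch) corresponds to the
// i-th range in: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
// A count of 0 yields 0, meaning the block was never executed.
uint8_t CounterToRangeBit(uint64_t count) {
  if (count == 0) return 0;
  if (count <= 3) return static_cast<uint8_t>(1u << (count - 1));
  if (count <= 7) return 1u << 3;
  if (count <= 15) return 1u << 4;
  if (count <= 31) return 1u << 5;
  if (count <= 127) return 1u << 6;
  return 1u << 7;
}
```

OR-ing the byte from each run into a per-counter accumulator records which
ranges have ever been observed, so a fuzzer can treat a newly set bit as new
coverage.
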
Tracing basic blocks
====================

An experimental feature to support basic block (or edge) tracing.
With ``-fsanitize-coverage=trace-bb`` the compiler will insert
``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic
block, or edge (depending on the value of
``-fsanitize-coverage=[func,bb,edge]``).

Tracing data flow
=================

An experimental feature to support data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
The fuzzer will need to define the following functions, which
will be called by the instrumented code.

.. code-block:: c++

    // Called before a comparison instruction.
    // SizeAndType is a packed value containing
    //   - [63:32] the Size of the operands of comparison in bits
    //   - [31:0]  the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
    // Arg1 and Arg2 are arguments of the comparison.
    void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);

    // Called before a switch statement.
    // Val is the switch operand.
    // Cases[0] is the number of case constants.
    // Cases[1] is the size of Val in bits.
    // Cases[2:] are the case constants.
    void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

This interface is subject to change.
The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.

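A fuzzer-side definition of these two callbacks might look like the sketch
below, which only unpacks the arguments and records them. The ``CmpEvent``
struct and the two global vectors are hypothetical, introduced for
illustration; like the interface itself, the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one observed comparison.
struct CmpEvent {
  uint32_t size_in_bits;  // from SizeAndType[63:32]
  uint32_t type;          // from SizeAndType[31:0]
  uint64_t arg1;
  uint64_t arg2;
};

// Hypothetical global logs; a real fuzzer would feed these into mutation.
static std::vector<CmpEvent> g_cmp_events;
static std::vector<uint64_t> g_switch_constants;

extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1,
                                          uint64_t Arg2) {
  CmpEvent event;
  event.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);
  event.type = static_cast<uint32_t>(SizeAndType & 0xffffffffu);
  event.arg1 = Arg1;
  event.arg2 = Arg2;
  g_cmp_events.push_back(event);
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  assert(Cases != nullptr);
  (void)Val;  // the operand itself is not recorded in this sketch
  uint64_t num_cases = Cases[0];
  // Cases[1] is the bit width of Val; the constants start at Cases[2].
  for (uint64_t i = 0; i < num_cases; ++i)
    g_switch_constants.push_back(Cases[2 + i]);
}
```

Recording the operands lets a fuzzer try inputs that make ``Arg1`` equal
``Arg2``, or that hit each recorded case constant.
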
Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map 7036.sancov.raw a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. Calling it after each new input tells the
fuzzer whether that input increased the coverage.

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

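The feedback loop described in this section can be sketched as follows. To
keep the example buildable without the sanitizer runtime, the coverage query is
passed in as a callable; in an instrumented build it would wrap
``__sanitizer_get_total_unique_coverage()``. ``KeepInterestingInputs`` and its
parameters are hypothetical names, and a real fuzzer would also mutate inputs
rather than only filter a fixed list.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Runs every candidate through the target and keeps those after which the
// total coverage grew. 'total_coverage' stands in for
// __sanitizer_get_total_unique_coverage() (an assumption of this sketch).
std::vector<std::string> KeepInterestingInputs(
    const std::vector<std::string> &candidates,
    const std::function<void(const std::string &)> &run_target,
    const std::function<uintptr_t()> &total_coverage) {
  assert(run_target && total_coverage);
  std::vector<std::string> corpus;
  uintptr_t best = total_coverage();
  for (const std::string &input : candidates) {
    run_target(input);
    uintptr_t now = total_coverage();
    if (now > best) {  // this input reached something new
      best = now;
      corpus.push_back(input);
    }
  }
  return corpus;
}
```

Because the total only ever grows within a process, comparing it before and
after each run is enough to decide whether an input is worth keeping.
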
Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========

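Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.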
				421	AddressSanitizer, and does not significantly increase the binary size.
				422	* Traditional coverage implementations based in global counters
				423	`suffer from contention on counters
				424	<https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.