=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer` or :doc:`MemorySanitizer`. In addition to
``-fsanitize=address``, ``leak`` or ``memory``, pass one of the following
compile-time flags:

* ``-fsanitize-coverage=1`` for function-level coverage (very fast).
* ``-fsanitize-coverage=2`` for basic-block-level coverage (may add up to 30%
  extra slowdown).
* ``-fsanitize-coverage=3`` for edge-level coverage (up to 40% slowdown).
* ``-fsanitize-coverage=4`` for additional caller-callee coverage.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS`` or
``MSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-mllvm -sanitizer-coverage-8bit-counters=1``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=1
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic defines the size of the offsets that follow. The rest of the
data is the offsets in the corresponding binary/DSO that were executed during
the run.
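
The layout described above can be read with a few lines of Python. This is only
a sketch, not the bundled ``sancov.py``: the name ``parse_sancov`` is
hypothetical, and the code assumes the file was written on a little-endian
machine.

```python
import struct

MAGIC_64 = 0xC0BFFFFFFFFFFF64  # offsets that follow are 64-bit
MAGIC_32 = 0xC0BFFFFFFFFFFF32  # offsets that follow are 32-bit


def parse_sancov(data):
    """Return the list of offsets stored in the bytes of one .sancov file."""
    if len(data) < 8:
        raise ValueError("file too short to contain the 8-byte magic")
    # Assumption: little-endian byte order.
    (magic,) = struct.unpack_from("<Q", data, 0)
    if magic == MAGIC_64:
        code, size = "Q", 8
    elif magic == MAGIC_32:
        code, size = "I", 4
    else:
        raise ValueError("unknown magic: %#x" % magic)
    payload = data[8:]
    if len(payload) % size != 0:
        raise ValueError("truncated offset table")
    count = len(payload) // size
    return list(struct.unpack("<%d%s" % (count, code), payload))
```

A caller would pass the raw file contents, for example
``parse_sancov(open(path, "rb").read())``.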

A simple script,
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py``, is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

If you want to know which PCs are still not covered, you can get the list of
all instrumented PCs and then subtract all covered PCs from it. You can use
``objdump`` to get all instrumented PCs:

.. code-block:: console

    % objdump -d ./your-binary | grep '__sanitizer_cov\>' | grep -o "^ *[0-9a-f]\+"

TODO: implement scripts for doing this.
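
Until such a script exists, the subtraction itself is a set difference. The
sketch below uses a hypothetical helper, ``uncovered_pcs``, that takes two
lists of hexadecimal address lines. It assumes both lists use the same address
convention; if the two tools report addresses that differ by a fixed amount,
the inputs would need to be normalized first.

```python
def uncovered_pcs(instrumented_lines, covered_lines):
    """Return the sorted PCs that are instrumented but were never executed."""

    def to_set(lines):
        # Accepts "0x465250", "465250", and lines with surrounding whitespace.
        return {int(line.strip(), 16) for line in lines if line.strip()}

    return sorted(to_set(instrumented_lines) - to_set(covered_lines))
```

The first argument would come from the ``objdump`` pipeline above and the
second from ``sancov.py print``.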

Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

    A
    |\
    | \
    |  B
    | /
    |/
    C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know whether the edge A=>C was
executed. Such edges of the control flow graph are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=3``) simply splits all critical edges
by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

    A
    |\
    | \
    D  B
    | /
    |/
    C
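
The transformation can be illustrated on a toy graph. The sketch below is not
the LLVM pass; it models the control flow graph as a Python dictionary that
maps each block to its successors, and the function names are hypothetical. An
edge is treated as critical when its source has more than one successor and its
destination has more than one predecessor.

```python
def critical_edges(cfg):
    """Return the (source, destination) pairs that are critical edges."""
    preds = {}
    for src, succs in cfg.items():
        for dst in succs:
            preds.setdefault(dst, []).append(src)
    return [
        (src, dst)
        for src, succs in cfg.items()
        for dst in succs
        if len(succs) > 1 and len(preds[dst]) > 1
    ]


def split_critical_edges(cfg, new_names):
    """Return a copy of cfg with a dummy block inserted on each critical edge."""
    names = iter(new_names)
    result = {block: list(succs) for block, succs in cfg.items()}
    for src, dst in critical_edges(cfg):
        dummy = next(names)
        # Redirect src -> dst to src -> dummy -> dst.
        result[src][result[src].index(dst)] = dummy
        result[dummy] = [dst]
    return result


# The graph from the example above: A branches to B or C, and B falls into C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
split = split_critical_edges(cfg, ["D"])
```

For this graph the only critical edge is A=>C, so ``split`` routes it through
the new block D, matching the second diagram.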

Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also
be dumped as a bitset (a text file with 1 for blocks that have been executed
and 0 for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=3 cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable the length of the bitset is always the same (unless
dlopen/dlclose come into play), so the bitset coverage can be easily used for
bitset-based corpus distillation.
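
One possible form of bitset-based distillation is a greedy pass that keeps an
input only if its bitset contains a block not covered by the inputs kept so
far. The sketch below is one such pass under that assumption; the function name
``distill`` and the input labels are hypothetical.

```python
def distill(bitsets):
    """Return the names of inputs that add at least one newly covered block.

    bitsets maps an input name to its bitset text, e.g. "01101".
    """
    covered = 0
    kept = []
    length = None
    for name, bits in bitsets.items():
        bits = bits.strip()
        if length is None:
            length = len(bits)
        elif len(bits) != length:
            raise ValueError("bitsets must all have the same length")
        value = int(bits, 2)
        if value & ~covered:  # at least one block not seen before
            kept.append(name)
            covered |= value
    return kept
```

With the two bitsets from the example above, both runs are kept, because each
one covers a block that the other does not.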

Caller-callee coverage
======================

(Experimental!)
Every indirect function call is instrumented with a run-time function call that
captures the caller and the callee. At shutdown the process dumps a separate
file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact, since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.
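
The pairs-of-lines layout shown above is straightforward to group by caller.
The following sketch assumes each line is a module name followed by a
hexadecimal PC, as in the sample; ``parse_caller_callee`` is a hypothetical
helper, not part of ``sancov.py``.

```python
def parse_caller_callee(lines):
    """Map each (module, pc) caller to the list of (module, pc) callees."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last whitespace run: module name first, PC last.
        module, pc = line.rsplit(None, 1)
        entries.append((module, int(pc, 16)))
    if len(entries) % 2 != 0:
        raise ValueError("expected an even number of lines")
    calls = {}
    # Odd lines (1st, 3rd, ...) are callers; even lines are callees.
    for caller, callee in zip(entries[0::2], entries[1::2]):
        calls.setdefault(caller, []).append(callee)
    return calls
```

For the four sample lines this yields a single caller, ``a.out 0x4a2e0c``,
with two callees.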

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to boolean values assigned to
every basic block (edge), the instrumentation will collect imprecise counters.
On exit, every counter will be mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+``, and those 8-bit bitsets
will be dumped to disk.

.. code-block:: console

    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=3 -mllvm -sanitizer-coverage-8bit-counters=1
    % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
    % ls -l *counters-sancov
    ... a.out.17110.counters-sancov
    % xxd *counters-sancov
    0000000: 0001 0100 01
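
The mapping from a counter value to its range bit can be sketched as follows.
This is an illustration in Python, not the runtime's implementation: the
function names are hypothetical, and the sketch assumes that the i-th range
corresponds to the i-th least significant bit. ``update_bitset`` additionally
shows how such bits could be merged into a cumulative bitset while counting
newly set bits.

```python
# Lower bounds of the ranges 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
RANGE_STARTS = (1, 2, 3, 4, 8, 16, 32, 128)


def counter_to_bits(counter):
    """Return a byte with one bit set for the range containing counter."""
    if counter < 1:
        return 0  # never executed: no bit set
    index = 0
    for i, start in enumerate(RANGE_STARTS):
        if counter >= start:
            index = i
    return 1 << index  # assumption: range i maps to bit i, LSB first


def update_bitset(bitset, counters):
    """Merge counters into bitset, zero the counters, return new-bit count."""
    new_bits = 0
    for i, counter in enumerate(counters):
        added = counter_to_bits(counter) & ~bitset[i]
        if added:
            new_bits += 1  # a counter contributes at most one bit per update
            bitset[i] |= added
        counters[i] = 0
    return new_bits
```

Here ``bitset`` is expected to be a ``bytearray`` with one byte per counter and
``counters`` a list of integers of the same length.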

These counters may also be used for in-process coverage-guided fuzzers. See
``include/sanitizer/coverage_interface.h``:

.. code-block:: c++

    // The coverage instrumentation may optionally provide imprecise counters.
    // Rather than exposing the counter values to the user we instead map
    // the counters to a bitset.
    // Every counter is associated with 8 bits in the bitset.
    // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
    // The i-th bit is set to 1 if the counter value is in the i-th range.
    // This counter-based coverage implementation is not thread-safe.

    // Returns the number of registered coverage counters.
    uintptr_t __sanitizer_get_number_of_counters();
    // Updates the counter 'bitset', clears the counters and returns the number of
    // new bits in 'bitset'.
    // If 'bitset' is nullptr, only clears the counters.
    // Otherwise 'bitset' should be at least
    // __sanitizer_get_number_of_counters bytes long and 8-aligned.
    uintptr_t
    __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);

Output directory
================

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
    % ls -l /tmp/cov/*sancov
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Sudden death
============

Normally, coverage data is collected in memory and saved to disk when the
program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
``__sanitizer_cov_dump()`` is called.

If the program ends with a signal that ASan does not handle (or cannot handle
at all, like SIGKILL), coverage data will be lost. This is a big problem on
Android, where SIGKILL is a normal way of evicting applications from memory.

With ``ASAN_OPTIONS=coverage=1:coverage_direct=1``, coverage data is written to
a memory-mapped file as soon as it is collected.

.. code-block:: console

    % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
    main
    % ls
    7036.sancov.map 7036.sancov.raw a.out
    % sancov.py rawunpack 7036.sancov.raw
    sancov.py: reading map 7036.sancov.map
    sancov.py: unpacking 7036.sancov.raw
    writing 1 PCs to a.out.7036.sancov
    % sancov.py print a.out.7036.sancov
    sancov.py: read 1 PCs from a.out.7036.sancov
    sancov.py: 1 files merged; 1 PCs total
    0x4b2bae

Note that on 64-bit platforms, this method writes 2x more data than the default,
because it stores full PC values instead of 32-bit offsets.

In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.
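
The feedback loop this enables can be sketched independently of the sanitizer
runtime. In the Python illustration below every name is hypothetical:
``total_coverage`` stands in for a call to
``__sanitizer_get_total_unique_coverage()``, and ``run_input`` stands in for
executing the code under test on one input. A real in-process fuzzer would be
written against the C interface and would also mutate inputs, which is omitted
here.

```python
def fuzz_loop(run_input, total_coverage, candidates):
    """Return the candidate inputs that increased the total coverage."""
    corpus = []
    best = total_coverage()
    for data in candidates:
        run_input(data)
        now = total_coverage()
        if now > best:  # coverage grew, so this input is worth keeping
            corpus.append(data)
            best = now
    return corpus


# Toy stand-in for an instrumented target: "coverage" is the set of byte
# values seen so far.
_seen = set()


def _toy_run(data):
    _seen.update(data)


def _toy_coverage():
    return len(_seen)


kept = fuzz_loop(_toy_run, _toy_coverage, [b"a", b"a", b"ab", b"b"])
```

In the toy run, only the first ``b"a"`` and ``b"ab"`` introduce new byte
values, so only those two inputs are kept.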

If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is fast. With function-level coverage
(``-fsanitize-coverage=1``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=2``) the overhead varies
between 0 and 25%.

============== ========= ========= ========= ========= ========= =========
benchmark      cov0      cov1      diff 0-1  cov2      diff 0-2  diff 1-2
============== ========= ========= ========= ========= ========= =========
400.perlbench  1296.00   1307.00   1.01      1465.00   1.13      1.12
401.bzip2      858.00    854.00    1.00      1010.00   1.18      1.18
403.gcc        613.00    617.00    1.01      683.00    1.11      1.11
429.mcf        605.00    582.00    0.96      610.00    1.01      1.05
445.gobmk      896.00    880.00    0.98      1050.00   1.17      1.19
456.hmmer      892.00    892.00    1.00      918.00    1.03      1.03
458.sjeng      995.00    1009.00   1.01      1217.00   1.22      1.21
462.libquantum 497.00    492.00    0.99      534.00    1.07      1.09
464.h264ref    1461.00   1467.00   1.00      1543.00   1.06      1.05
471.omnetpp    575.00    590.00    1.03      660.00    1.15      1.12
473.astar      658.00    652.00    0.99      715.00    1.09      1.10
483.xalancbmk  471.00    491.00    1.04      582.00    1.24      1.19
433.milc       616.00    627.00    1.02      627.00    1.02      1.00
444.namd       602.00    601.00    1.00      654.00    1.09      1.09
447.dealII     630.00    634.00    1.01      653.00    1.04      1.03
450.soplex     365.00    368.00    1.01      395.00    1.08      1.07
453.povray     427.00    434.00    1.02      495.00    1.16      1.14
470.lbm        357.00    375.00    1.05      370.00    1.04      0.99
482.sphinx3    927.00    928.00    1.00      1000.00   1.08      1.08
============== ========= ========= ========= ========= ========= =========
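
The ``diff`` columns are consistent with plain ratios of the corresponding
``cov`` columns, rounded to two decimals. The sketch below reproduces a few of
them under that assumption; ``slowdown`` is a hypothetical helper and the
inputs are copied from the table.

```python
def slowdown(baseline, instrumented):
    """Return instrumented/baseline rounded to two decimals."""
    return round(instrumented / baseline, 2)


# 400.perlbench: cov0 = 1296.00, cov1 = 1307.00, cov2 = 1465.00
perlbench = (slowdown(1296.00, 1307.00),   # diff 0-1
             slowdown(1296.00, 1465.00),   # diff 0-2
             slowdown(1307.00, 1465.00))   # diff 1-2

# 483.xalancbmk has the largest diff 0-2 in the table.
xalancbmk_0_2 = slowdown(471.00, 582.00)
```

A ratio of 1.24 corresponds to 24% overhead, which matches the "between 0 and
25%" range quoted for basic-block-level coverage.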

Why another coverage?
=====================

Why did we implement yet another code coverage tool?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.