Valgrind FAQ, version 2.1.2
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last revised 18 July 2004
~~~~~~~~~~~~~~~~~~~~~~~~~

1. Background
2. Compiling, installing and configuring
3. Valgrind aborts unexpectedly
4. Valgrind behaves unexpectedly
5. Memcheck doesn't find my bug
6. Miscellaneous


-----------------------------------------------------------------
1. Background
-----------------------------------------------------------------

1.1. How do you pronounce "Valgrind"?

The "Val" as in the word "value". The "grind" is pronounced with a
short 'i' -- i.e. "grinned" (rhymes with "tinned") rather than "grined"
(rhymes with "find").

Don't feel bad: almost everyone gets it wrong at first.

-----------------------------------------------------------------

1.2. Where does the name "Valgrind" come from?

From Nordic mythology. Originally (before release) the project was
named Heimdall, after the watchman of the Nordic gods. He could "see a
hundred miles by day or night, hear the grass growing, see the wool
growing on a sheep's back" (etc). This would have been a great name,
but it was already taken by a security package "Heimdal".

Keeping with the Nordic theme, Valgrind was chosen. Valgrind is the
name of the main entrance to Valhalla (the Hall of the Chosen Slain in
Asgard). Over this entrance there resides a wolf and over it there is
the head of a boar and on it perches a huge eagle, whose eyes can see to
the far regions of the nine worlds. Only those judged worthy by the
guardians are allowed to pass through Valgrind. All others are refused
entrance.

It's not short for "value grinder", although that's not a bad guess.


-----------------------------------------------------------------
2. Compiling, installing and configuring
-----------------------------------------------------------------

2.1. When I try building Valgrind, 'make' dies partway with an
     assertion failure, something like this:

  make: expand.c:489: allocated_variable_append: Assertion
  `current_variable_set_list->next != 0' failed.

It's probably a bug in 'make'. Some, but not all, instances of version 3.79.1
have this bug; see www.mail-archive.com/bug-make@gnu.org/msg01658.html. Try
upgrading to a more recent version of 'make'. Alternatively, we have heard
that unsetting the CFLAGS environment variable avoids the problem.


-----------------------------------------------------------------
3. Valgrind aborts unexpectedly
-----------------------------------------------------------------

3.1. Programs run OK on Valgrind, but at exit produce a bunch of errors a bit
     like this

  ==20755== Invalid read of size 4
  ==20755==    at 0x40281C8A: _nl_unload_locale (loadlocale.c:238)
  ==20755==    by 0x4028179D: free_mem (findlocale.c:257)
  ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)
  ==20755==    by 0x40048DCC: vgPlain___libc_freeres_wrapper (vg_clientfuncs.c:585)
  ==20755== Address 0x40CC304C is 8 bytes inside a block of size 380 free'd
  ==20755==    at 0x400484C9: free (vg_clientfuncs.c:180)
  ==20755==    by 0x40281CBA: _nl_unload_locale (loadlocale.c:246)
  ==20755==    by 0x40281218: free_mem (setlocale.c:461)
  ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)

     and then die with a segmentation fault.

When the program exits, Valgrind runs the procedure __libc_freeres() in
glibc. This is a hook for memory debuggers, so they can ask glibc to
free up any memory it has used. Doing that is needed to ensure that
Valgrind doesn't incorrectly report space leaks in glibc.

The problem is that running __libc_freeres() in older glibc versions
causes this crash.

WORKAROUND FOR 1.1.X and later versions of Valgrind: use the
--run-libc-freeres=no flag. You may then get space leak reports for
glibc allocations (please _don't_ report these to the glibc people,
since they are not real leaks), but at least the program runs.

-----------------------------------------------------------------

3.2. My (buggy) program dies like this:

  valgrind: vg_malloc2.c:442 (bszW_to_pszW): Assertion `pszW >= 0' failed.

If Memcheck (the memory checker) shows any invalid reads, invalid writes
or invalid frees in your program, the above may happen. The reason is
that your program may trash Valgrind's low-level memory manager, which
then dies with the above assertion, or something like it. The cure is to
fix your program so that it doesn't do any illegal memory accesses. The
above failure will hopefully go away after that.

-----------------------------------------------------------------

3.3. My program dies, printing a message like this along the way:

  disInstr: unhandled instruction bytes: 0x66 0xF 0x2E 0x5

Older versions did not support some x86 instructions, particularly
SSE/SSE2 instructions. Try a newer Valgrind; we now support almost all
instructions. If it still happens with newer versions and the failing
instruction is an SSE/SSE2 instruction, you might be able to recompile
your program without it by passing the -march flag to gcc. Either way,
let us know and we'll try to fix it.

Another possibility is that your program has a bug and erroneously jumps
to a non-code address, in which case you'll get a SIGILL signal.
Memcheck/Addrcheck may issue a warning just before this happens, but they
might not if the jump happens to land in addressable memory.


-----------------------------------------------------------------
4. Valgrind behaves unexpectedly
-----------------------------------------------------------------

4.1. My threaded server process runs unbelievably slowly on Valgrind.
     So slowly, in fact, that at first I thought it had completely
     locked up.

We are not completely sure about this, but one possibility is that
laptops with power management fool Valgrind's timekeeping mechanism,
which is (somewhat in error) based on the x86 RDTSC instruction. A
"fix" which is claimed to work is to run some other cpu-intensive
process at the same time, so that the laptop's power-management
clock-slowing does not kick in. We would be interested in hearing more
feedback on this.

Another possible cause is that versions prior to 1.9.6 did not support
threading on glibc 2.3.X systems well. Hopefully the situation is much
improved with 1.9.6 and later versions.

-----------------------------------------------------------------

4.2. My program uses the C++ STL and string classes. Valgrind
     reports 'still reachable' memory leaks involving these classes
     at the exit of the program, but there should be none.

First of all: relax, it's probably not a bug, but a feature. Many
implementations of the C++ standard libraries use their own memory pool
allocators. Memory for quite a number of destructed objects is not
immediately freed and given back to the OS, but kept in the pool(s) for
later re-use. The fact that the pools are not freed at the exit() of
the program causes Valgrind to report this memory as still reachable.
Not freeing the pools at exit() could be called a bug in the library,
though.

Using gcc, you can force the STL to use malloc and to free memory as
soon as possible by globally disabling memory caching. Beware! Doing
so will probably slow down your program, sometimes drastically.

- With gcc 2.91, 2.95, 3.0 and 3.1, compile all source using the STL
  with -D__USE_MALLOC. Beware! This option was removed from gcc starting
  with version 3.3.

- With gcc 3.2.2 and later, you should export the environment variable
  GLIBCPP_FORCE_NEW before running your program.

- With gcc 3.4 and later, that variable has changed name to
  GLIBCXX_FORCE_NEW.

There are other ways to disable memory pooling: using the malloc_alloc
template with your objects (not portable, but should work for gcc) or
even writing your own memory allocators. But all this goes beyond the
scope of this FAQ. Start by reading
http://gcc.gnu.org/onlinedocs/libstdc++/ext/howto.html#3 if you
absolutely want to do that. But beware:

1) there are currently changes underway for gcc which are not totally
   reflected in the docs right now ("now" == 26 Apr 03)

2) allocators belong to the more messy parts of the STL and people went
   to great lengths to make it portable across platforms. Chances are
   good that your solution will work on your platform, but not on
   others.

-----------------------------------------------------------------

4.3. The stack traces given by Memcheck (or another tool) aren't helpful.
     How can I improve them?

If they're not long enough, use --num-callers to make them longer.

If they're not detailed enough, make sure you are compiling with -g to add
debug information. And don't strip symbol tables (programs should be
unstripped unless you run 'strip' on them; some libraries ship stripped).

Also, for leak reports involving shared objects, if the shared object is
unloaded before the program terminates, Valgrind will discard the debug
information and the error message will be full of "???" entries. The
workaround here is to avoid calling dlclose() on these shared objects.

Also, -fomit-frame-pointer and -fstack-check can make stack traces worse.

Some example sub-traces:

 With debug information and unstripped (best):

    Invalid write of size 1
       at 0x80483BF: really (malloc1.c:20)
       by 0x8048370: main (malloc1.c:9)

 With no debug information, unstripped:

    Invalid write of size 1
       at 0x80483BF: really (in /auto/homes/njn25/grind/head5/a.out)
       by 0x8048370: main (in /auto/homes/njn25/grind/head5/a.out)

 With no debug information, stripped:

    Invalid write of size 1
       at 0x80483BF: (within /auto/homes/njn25/grind/head5/a.out)
       by 0x8048370: (within /auto/homes/njn25/grind/head5/a.out)
       by 0x42015703: __libc_start_main (in /lib/tls/libc-2.3.2.so)
       by 0x80482CC: (within /auto/homes/njn25/grind/head5/a.out)

 With debug information and -fomit-frame-pointer:

    Invalid write of size 1
       at 0x80483C4: really (malloc1.c:20)
       by 0x42015703: __libc_start_main (in /lib/tls/libc-2.3.2.so)
       by 0x80482CC: ??? (start.S:81)

 A leak error message involving an unloaded shared object:

    84 bytes in 1 blocks are possibly lost in loss record 488 of 713
       at 0x1B9036DA: operator new(unsigned) (vg_replace_malloc.c:132)
       by 0x1DB63EEB: ???
       by 0x1DB4B800: ???
       by 0x1D65E007: ???
       by 0x8049EE6: main (main.cpp:24)

-----------------------------------------------------------------

4.4. The stack traces given by Memcheck (or another tool) seem to
     have the wrong function name in them. What's happening?

Occasionally Valgrind stack traces get the wrong function names.
This is caused by glibc using aliases to effectively give one function
two names. Most of the time Valgrind chooses a suitable name, but
very occasionally it gets it wrong.

Examples we know of are printing 'bcmp' instead of 'memcmp', 'index'
instead of 'strchr', and 'rindex' instead of 'strrchr'.

-----------------------------------------------------------------
5. Memcheck doesn't find my bug
-----------------------------------------------------------------

5.1. I try running "valgrind --tool=memcheck my_program" and get
     Valgrind's startup message, but I don't get any errors and I know
     my program has errors.

There are two possible causes of this.

First, by default, Valgrind only traces the top-level process. So if your
program spawns children, they won't be traced by Valgrind by default.
Also, if your program is started by a shell script, Perl script, or
something similar, Valgrind will trace the shell, or the Perl
interpreter, or equivalent.

To trace child processes, use the --trace-children=yes option.
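
As a rough illustration of what "spawns children" means here, the sketch
below starts another program with fork() and execvp(). The helper name
spawn_and_wait is made up for this example and is not part of Valgrind.
Under the default settings described above, errors inside the program
started this way would not appear in the report; with
--trace-children=yes they should.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of Valgrind): run argv[0] as a child
   process, wait for it, and return its exit status.  Returns -1 if the
   child could not be created or did not exit normally, and 127 if the
   program could not be executed. */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fork() failed */

    if (pid == 0) {
        execvp(argv[0], argv);      /* replaces the child's image on success */
        _exit(127);                 /* only reached if execvp() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, if my_program called spawn_and_wait() to run a helper
binary, then

  valgrind --tool=memcheck --trace-children=yes my_program

would be expected to check the helper as well as my_program itself.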

If you are tracing large trees of processes, it can be less disruptive
to have the output sent over the network. Give Valgrind the flag
--log-socket=127.0.0.1:12345 (if you want logging output sent to port
12345 on localhost). You can use the valgrind-listener program to
listen on that port:

  valgrind-listener 12345

Obviously you have to start the listener process first. See the
documentation for more details.

Second, if your program is statically linked, most Valgrind tools won't
work as well, because they won't be able to replace certain functions,
such as malloc(), with their own versions. A key indicator of this is
if Memcheck says:

  No malloc'd blocks -- no leaks are possible

when you know your program calls malloc(). The workaround is to avoid
statically linking your program.

-----------------------------------------------------------------

5.2. Why doesn't Memcheck find the array overruns in this program?

  int static_arr[5];

  int main(void)
  {
    int stack_arr[5];

    static_arr[5] = 0;
    stack_arr [5] = 0;

    return 0;
  }

Unfortunately, Memcheck doesn't do bounds checking on static or stack
arrays. We'd like to, but it's just not possible to do in a reasonable
way that fits with how Memcheck works. Sorry.
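
One thing that may help while hunting such an overrun is to move the
suspect array onto the heap temporarily, because Memcheck does check
accesses against the bounds of malloc'd blocks. The following is only a
sketch: heap_copy is a hypothetical helper written for this FAQ entry,
not a Valgrind facility.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Valgrind): return a malloc'd copy of
   the n ints at src, or NULL on failure.  The caller must free() it. */
int *heap_copy(const int *src, size_t n)
{
    int *dst;

    if (n == 0 || n > SIZE_MAX / sizeof *dst)
        return NULL;        /* nothing to copy, or the size would overflow */

    dst = malloc(n * sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, n * sizeof *dst);
    return dst;
}
```

If a five-element array is replaced by the pointer returned from
heap_copy(), a write to element 5 lands just past a malloc'd block, which
Memcheck would be expected to report as an invalid write.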


-----------------------------------------------------------------
6. Miscellaneous
-----------------------------------------------------------------

6.1. I tried writing a suppression but it didn't work. Can you
     write my suppression for me?

Yes! Use the --gen-suppressions=yes feature to spit out suppressions
automatically for you. You can then edit them if you like, e.g.
combining similar automatically generated suppressions using wildcards
like '*'.

If you really want to write suppressions by hand, read the manual
carefully. Note particularly that C++ function names must be _mangled_.

-----------------------------------------------------------------

6.2. With Memcheck/Addrcheck's memory leak detector, what's the
     difference between "definitely lost", "possibly lost", "still
     reachable", and "suppressed"?

The details are in section 3.6 of the manual.

In short:

- "definitely lost" means your program is leaking memory -- fix it!

- "possibly lost" means your program is probably leaking memory,
  unless you're doing funny things with pointers.

- "still reachable" means your program is probably ok -- it didn't
  free some memory it could have. This is quite common and often
  reasonable. Don't use --show-reachable=yes if you don't want to see
  these reports.

- "suppressed" means that a leak error has been suppressed. There are
  some suppressions in the default suppression files. You can ignore
  suppressed errors.
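
To make the first and third categories more concrete, here is a small
sketch. The function and variable names are invented for this example,
and the classifications in the comments are what we would normally
expect from the leak checker, not a guarantee. Build without
optimisation so the compiler does not remove the unused allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example code, not part of Valgrind. */

/* Global pointer that still refers to its block when the program exits. */
char *kept_block;

/* Allocate a block and remember it in kept_block; it is never freed.
   A pointer to it survives until exit, so the block would normally be
   listed as "still reachable". */
char *keep_block(void)
{
    kept_block = malloc(64);
    if (kept_block != NULL)
        memset(kept_block, 0, 64);
    return kept_block;
}

/* Allocate a block and forget the only pointer to it.  Once this
   function returns nothing refers to the block, so it would normally
   be reported as "definitely lost". */
void lose_block(void)
{
    char *p = malloc(32);

    if (p != NULL)
        memset(p, 0, 32);
}
```

Calling each function once from main() and running the program under
Memcheck with --leak-check=yes and --show-reachable=yes should give one
report of each kind.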

-----------------------------------------------------------------

(this is the end of the FAQ.)