Blame - coregrind/docs/manual.html - platform/external/valgrind

sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1	<html>
				2	<head>
				3	<style type="text/css">
				4	body { background-color: #ffffff;
				5	color: #000000;
				6	font-family: Times, Helvetica, Arial;
				7	font-size: 14pt}
				8	h4 { margin-bottom: 0.3em}
				9	code { color: #000000;
				10	font-family: Courier;
				11	font-size: 13pt }
				12	pre { color: #000000;
				13	font-family: Courier;
				14	font-size: 13pt }
				15	a:link { color: #0000C0;
				16	text-decoration: none; }
				17	a:visited { color: #0000C0;
				18	text-decoration: none; }
				19	a:active { color: #0000C0;
				20	text-decoration: none; }
				21	</style>
				22	</head>
				23
				24	<body bgcolor="#ffffff">
				25
				26	<a name="title"> </a>
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	27	<h1 align=center>Valgrind, snapshot 20020516</h1>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	28	<center>This manual was majorly updated on 20020501</center>
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	29	<center>This manual was minorly updated on 20020516</center>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	30	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	31
				32	<center>
				33	<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	34	Copyright © 2000-2002 Julian Seward
				35	<p>
				36	Valgrind is licensed under the GNU General Public License,
				37	version 2<br>
				38	An open-source tool for finding memory-management problems in
				39	Linux-x86 executables.
				40	</center>
				41
				42	<p>
				43
				44	<hr width="100%">
				45	<a name="contents"></a>
				46	<h2>Contents of this manual</h2>
				47
				48	<h4>1  <a href="#intro">Introduction</a></h4>
				49	1.1  <a href="#whatfor">What Valgrind is for</a><br>
				50	1.2  <a href="#whatdoes">What it does with your program</a>
				51
				52	<h4>2  <a href="#howtouse">How to use it, and how to make sense
				53	of the results</a></h4>
				54	2.1  <a href="#starta">Getting started</a><br>
				55	2.2  <a href="#comment">The commentary</a><br>
				56	2.3  <a href="#report">Reporting of errors</a><br>
				57	2.4  <a href="#suppress">Suppressing errors</a><br>
				58	2.5  <a href="#flags">Command-line flags</a><br>
				59	2.6  <a href="#errormsgs">Explanation of error messages</a><br>
				60	2.7  <a href="#suppfiles">Writing suppressions files</a><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	61	2.8  <a href="#clientreq">The Client Request mechanism</a><br>
				62	2.9  <a href="#pthreads">Support for POSIX pthreads</a><br>
				63	2.10  <a href="#install">Building and installing</a><br>
				64	2.11  <a href="#problems">If you have problems</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	65
				66	<h4>3  <a href="#machine">Details of the checking machinery</a></h4>
				67	3.1  <a href="#vvalue">Valid-value (V) bits</a><br>
				68	3.2  <a href="#vaddress">Valid-address (A) bits</a><br>
				69	3.3  <a href="#together">Putting it all together</a><br>
				70	3.4  <a href="#signals">Signals</a><br>
				71	3.5  <a href="#leaks">Memory leak detection</a><br>
				72
				73	<h4>4  <a href="#limits">Limitations</a></h4>
				74
				75	<h4>5  <a href="#howitworks">How it works -- a rough overview</a></h4>
				76	5.1  <a href="#startb">Getting started</a><br>
				77	5.2  <a href="#engine">The translation/instrumentation engine</a><br>
				78	5.3  <a href="#track">Tracking the status of memory</a><br>
				79	5.4  <a href="#sys_calls">System calls</a><br>
				80	5.5  <a href="#sys_signals">Signals</a><br>
				81
				82	<h4>6  <a href="#example">An example</a></h4>
				83
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	84	<h4>7  <a href="#cache">Cache profiling</a></h4>
				85
				86	<h4>8  <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	87
				88	<hr width="100%">
				89
				90	<a name="intro"></a>
				91	<h2>1  Introduction</h2>
				92
				93	<a name="whatfor"></a>
				94	<h3>1.1  What Valgrind is for</h3>
				95
				96	Valgrind is a tool to help you find memory-management problems in your
				97	programs. When a program is run under Valgrind's supervision, all
				98	reads and writes of memory are checked, and calls to
				99	malloc/new/free/delete are intercepted. As a result, Valgrind can
				100	detect problems such as:
				101	<ul>
				102	<li>Use of uninitialised memory</li>
				103	<li>Reading/writing memory after it has been free'd</li>
				104	<li>Reading/writing off the end of malloc'd blocks</li>
				105	<li>Reading/writing inappropriate areas on the stack</li>
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	106	<li>Memory leaks -- where pointers to malloc'd blocks are lost
				107	forever</li>
				108	<li>Mismatched use of malloc/new/new [] vs free/delete/delete []</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	109	</ul>
				110
				111	Problems like these can be difficult to find by other means, often
				112	lying undetected for long periods, then causing occasional,
				113	difficult-to-diagnose crashes.
				114
				115	<p>
				116	Valgrind is closely tied to details of the CPU, operating system and
				117	to a lesser extent, compiler and basic C libraries. This makes it
				118	difficult to make it portable, so I have chosen at the outset to
				119	concentrate on what I believe to be a widely used platform: Red Hat
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	120	Linux 7.2, on x86s. Valgrind uses the standard Unix
				121	<code>./configure</code>, <code>make</code>, <code>make install</code>
				122	mechanism, and I have attempted to ensure that it works on machines
				123	with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover
				124	the vast majority of modern Linux installations.
				125
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	126
				127	<p>
				128	Valgrind is licensed under the GNU General Public License, version
				129	2. Read the file LICENSE in the source distribution for details.
				130
				131	<a name="whatdoes"></a>
				132	<h3>1.2  What it does with your program</h3>
				133
				134	Valgrind is designed to be as non-intrusive as possible. It works
				135	directly with existing executables. You don't need to recompile,
				136	relink, or otherwise modify, the program to be checked. Simply place
				137	the word <code>valgrind</code> at the start of the command line
				138	normally used to run the program. So, for example, if you want to run
				139	the command <code>ls -l</code> on Valgrind, simply issue the
				140	command: <code>valgrind ls -l</code>.
				141
				142	<p>Valgrind takes control of your program before it starts. Debugging
				143	information is read from the executable and associated libraries, so
				144	that error messages can be phrased in terms of source code
				145	locations. Your program is then run on a synthetic x86 CPU which
				146	checks every memory access. All detected errors are written to a
				147	log. When the program finishes, Valgrind searches for and reports on
				148	leaked memory.
				149
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	150	<p>You can run pretty much any dynamically linked ELF x86 executable
				151	using Valgrind. Programs run 25 to 50 times slower, and take a lot
				152	more memory, than they usually would. It works well enough to run
				153	large programs. For example, the Konqueror web browser from the KDE
				154	Desktop Environment, version 3.0, runs slowly but usably on Valgrind.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	155
				156	<p>Valgrind simulates every single instruction your program executes.
				157	Because of this, it finds errors not only in your application but also
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	158	in all supporting dynamically-linked (<code>.so</code>-format)
				159	libraries, including the GNU C library, the X client libraries, Qt, if
				160	you work with KDE, and so on. That often includes libraries, for
				161	example the GNU C library, which contain memory access violations, but
				162	which you cannot or do not want to fix.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	163
				164	<p>Rather than swamping you with errors in which you are not
				165	interested, Valgrind allows you to selectively suppress errors, by
				166	recording them in a suppressions file which is read when Valgrind
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	167	starts up. The build mechanism attempts to select suppressions which
				168	give reasonable behaviour for the libc and XFree86 versions detected
				169	on your machine.
				170
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	171
				172	<p><a href="#example">Section 6</a> shows an example of use.
				173	<p>
				174	<hr width="100%">
				175
				176	<a name="howtouse"></a>
				177	<h2>2  How to use it, and how to make sense of the results</h2>
				178
				179	<a name="starta"></a>
				180	<h3>2.1  Getting started</h3>
				181
				182	First off, consider whether it might be beneficial to recompile your
				183	application and supporting libraries with optimisation disabled and
				184	debugging info enabled (the <code>-g</code> flag). You don't have to
				185	do this, but doing so helps Valgrind produce more accurate and less
				186	confusing error reports. Chances are you're set up like this already,
				187	if you intended to debug your program with GNU gdb, or some other
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	188	debugger.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	189
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	190	<p>
				191	A plausible compromise is to use <code>-g -O</code>.
				192	Optimisation levels above <code>-O</code> have been observed, on very
				193	rare occasions, to cause gcc to generate code which fools Valgrind's
				194	error tracking machinery into wrongly reporting uninitialised value
				195	errors. <code>-O</code> gets you the vast majority of the benefits of
				196	higher optimisation levels anyway, so you don't lose much there.
				197
				198	<p>
				199	Note that as of 1 May 2002 Valgrind does not understand the DWARF
				200	debugging format, which is unfortunate since the upcoming gcc-3.1 uses
				201	it by default. Valgrind only knows about the older "stabs" format.
				202	If you use gcc-3.1 or above, you can still ask for stabs-format debug
				203	info by passing <code>-gstabs</code> to gcc.
				204
				205	<p>
				206	Then just run your application, but place the word
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	207	<code>valgrind</code> in front of your usual command-line invocation.
				208	Note that you should run the real (machine-code) executable here. If
				209	your application is started by, for example, a shell or perl script,
				210	you'll need to modify it to invoke Valgrind on the real executables.
				211	Running such scripts directly under Valgrind will result in you
				212	getting error reports pertaining to <code>/bin/sh</code>,
				213	<code>/usr/bin/perl</code>, or whatever interpreter you're using.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	214	This almost certainly isn't what you want and can be confusing.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	215
				216	<a name="comment"></a>
				217	<h3>2.2  The commentary</h3>
				218
				219	Valgrind writes a commentary, detailing error reports and other
				220	significant events. The commentary goes to standard error by
				221	default. This may interfere with your program, so you can ask for it
				222	to be directed elsewhere.
				223
				224	<p>All lines in the commentary are of the following form:<br>
				225	<pre>
				226	==12345== some-message-from-Valgrind
				227	</pre>
				228	<p>The <code>12345</code> is the process ID. This scheme makes it easy
				229	to distinguish program output from Valgrind commentary, and also easy
				230	to differentiate commentaries from different processes which have
				231	become merged together, for whatever reason.
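<p>As a rough illustration of how a script or test harness might
separate commentary lines from program output, the following C sketch
checks for the <code>==PID==</code> prefix described above. The
function name <code>commentary_pid</code> is an assumption made for
this example; it is not part of Valgrind. It handles only the
<code>==PID==</code> form and makes no attempt to recognise any other
prefix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Valgrind.
   Returns the process ID if `line` starts with "==PID==",
   where PID is a run of decimal digits, and -1 otherwise. */
long commentary_pid(const char *line)
{
    char *end;
    long pid;

    /* The line must open with "==" followed by at least one digit. */
    if (strncmp(line, "==", 2) != 0)
        return -1;
    if (line[2] < '0' || line[2] > '9')
        return -1;

    /* Parse the digits, then require the closing "==". */
    pid = strtol(line + 2, &end, 10);
    if (strncmp(end, "==", 2) != 0)
        return -1;

    return pid;
}
```

<p>Calling <code>commentary_pid</code> on each line of a merged log
gives a simple way to group commentary by process and to pass through
everything else as ordinary program output.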
				232
				233	<p>By default, Valgrind writes only essential messages to the commentary,
				234	so as to avoid flooding you with information of secondary importance.
				235	If you want more information about what is happening, re-run, passing
				236	the <code>-v</code> flag to Valgrind.
				237
				238
				239	<a name="report"></a>
				240	<h3>2.3  Reporting of errors</h3>
				241
				242	When Valgrind detects something bad happening in the program, an error
				243	message is written to the commentary. For example:<br>
				244	<pre>
				245	==25832== Invalid read of size 4
				246	==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
				247	==25832== by 0x80487AF: main (bogon.cpp:66)
				248	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				249	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				250	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				251	</pre>
				252
				253	<p>This message says that the program did an illegal 4-byte read of
				254	address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
				255	address, nor corresponds to any currently malloc'd or free'd blocks.
				256	The read is happening at line 45 of <code>bogon.cpp</code>, called
				257	from line 66 of the same file, etc. For errors associated with an
				258	identified malloc'd/free'd block, for example reading free'd memory,
				259	Valgrind reports not only the location where the error happened, but
				260	also where the associated block was malloc'd/free'd.
				261
				262	<p>Valgrind remembers all error reports. When an error is detected,
				263	it is compared against old reports, to see if it is a duplicate. If
				264	so, the error is noted, but no further commentary is emitted. This
				265	avoids you being swamped with bazillions of duplicate error reports.
				266
				267	<p>If you want to know how many times each error occurred, run with
				268	the <code>-v</code> option. When execution finishes, all the reports
				269	are printed out, along with, and sorted by, their occurrence counts.
				270	This makes it easy to see which errors have occurred most frequently.
				271
				272	<p>Errors are reported before the associated operation actually
				273	happens. For example, if your program decides to read from address
				274	zero, Valgrind will emit a message to this effect, and the program
				275	will then duly die with a segmentation fault.
				276
				277	<p>In general, you should try and fix errors in the order that they
				278	are reported. Not doing so can be confusing. For example, a program
				279	which copies uninitialised values to several memory locations, and
				280	later uses them, will generate several error messages. The first such
				281	error message may well give the most direct clue to the root cause of
				282	the problem.
				283
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	284	<p>The process of detecting duplicate errors is quite an expensive
				285	one and can become a significant performance overhead if your program
				286	generates huge quantities of errors. To avoid serious problems here,
				287	Valgrind will simply stop collecting errors after 300 different errors
				288	have been seen, or 30000 errors in total have been seen. In this
				289	situation you might as well stop your program and fix it, because
				290	Valgrind won't tell you anything else useful after this. Note that
				291	the 300/30000 limits apply after suppressed errors are removed. These
				292	limits are defined in <code>vg_include.h</code> and can be increased
				293	if necessary.
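<p>The duplicate detection and the collection limits described in this
section can be sketched in C as follows. This is a simplified model
under stated assumptions, not Valgrind's actual code: the type and
function names are invented for illustration, errors are compared on
an error kind plus the top three code addresses (as the description of
<code>--num-callers</code> suggests), and suppressions are ignored.

```c
#include <string.h>

#define MAX_DISTINCT_ERRORS 300    /* limits quoted in the text above */
#define MAX_TOTAL_ERRORS    30000
#define DEDUP_DEPTH         3      /* assumed: compare the top three call sites */

/* Hypothetical record of one distinct error. */
typedef struct {
    int           kind;                /* illustrative error-kind code */
    unsigned long where[DEDUP_DEPTH];  /* code addresses, innermost first */
    unsigned int  count;               /* how many times it occurred */
} ErrorRecord;

typedef struct {
    ErrorRecord  records[MAX_DISTINCT_ERRORS];
    unsigned int n_distinct;
    unsigned int n_total;
} ErrorLog;

/* Returns 1 if the error is new and should be shown in the commentary,
   0 if it is a duplicate or if collection has already stopped. */
int record_error(ErrorLog *log, int kind,
                 const unsigned long where[DEDUP_DEPTH])
{
    unsigned int i;
    ErrorRecord *r;

    /* Past either limit, nothing more is collected. */
    if (log->n_distinct >= MAX_DISTINCT_ERRORS ||
        log->n_total >= MAX_TOTAL_ERRORS)
        return 0;

    log->n_total++;

    /* Linear search of old reports: this is the expensive part. */
    for (i = 0; i < log->n_distinct; i++) {
        r = &log->records[i];
        if (r->kind == kind &&
            memcmp(r->where, where, sizeof r->where) == 0) {
            r->count++;        /* noted, but no further commentary */
            return 0;
        }
    }

    /* First occurrence: remember it and report it. */
    r = &log->records[log->n_distinct++];
    r->kind = kind;
    memcpy(r->where, where, sizeof r->where);
    r->count = 1;
    return 1;
}
```

<p>The per-record <code>count</code> field corresponds to the
occurrence counts printed at the end of a <code>-v</code> run. The
linear search also shows why a program generating huge numbers of
distinct errors slows down noticeably.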
				294
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	295	<a name="suppress"></a>
				296	<h3>2.4  Suppressing errors</h3>
				297
				298	Valgrind detects numerous problems in the base libraries, such as the
				299	GNU C library, and the XFree86 client libraries, which come
				300	pre-installed on your GNU/Linux system. You can't easily fix these,
				301	but you don't want to see these errors (and yes, there are many!) So
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	302	Valgrind reads a list of errors to suppress at startup.
				303	A default suppression file is cooked up by the
				304	<code>./configure</code> script.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	305
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	306	<p>You can modify and add to the suppressions file at your leisure,
				307	or, better, write your own. Multiple suppression files are allowed.
				308	This is useful if part of your project contains errors you can't or
				309	don't want to fix, yet you don't want to continuously be reminded of
				310	them.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	311
				312	<p>Each error to be suppressed is described very specifically, to
				313	minimise the possibility that a suppression-directive inadvertently
				314	suppresses a bunch of similar errors which you did want to see. The
				315	suppression mechanism is designed to allow precise yet flexible
				316	specification of errors to suppress.
				317
				318	<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
				319	prints out one line for each used suppression, giving its name and the
				320	number of times it got used. Here's the suppressions used by a run of
				321	<code>ls -l</code>:
				322	<pre>
				323	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
				324	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
				325	--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
				326	</pre>
				327
				328	<a name="flags"></a>
				329	<h3>2.5  Command-line flags</h3>
				330
				331	You invoke Valgrind like this:
				332	<pre>
				333	valgrind [options-for-Valgrind] your-prog [options for your-prog]
				334	</pre>
				335
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	336	<p>Note that Valgrind also reads options from the environment variable
				337	<code>$VALGRIND</code>, and processes them before the command-line
				338	options.
				339
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	340	<p>Valgrind's default settings succeed in giving reasonable behaviour
				341	in most cases. Available options, in no particular order, are as
				342	follows:
				343	<ul>
				344	<li><code>--help</code></li><br>
				345
				346	<li><code>--version</code><br>
				347	<p>The usual deal.</li><br><p>
				348
				349	<li><code>-v --verbose</code><br>
				350	<p>Be more verbose. Gives extra information on various aspects
				351	of your program, such as: the shared objects loaded, the
				352	suppressions used, the progress of the instrumentation engine,
				353	and warnings about unusual behaviour.
				354	</li><br><p>
				355
				356	<li><code>-q --quiet</code><br>
				357	<p>Run silently, and only print error messages. Useful if you
				358	are running regression tests or have some other automated test
				359	machinery.
				360	</li><br><p>
				361
				362	<li><code>--demangle=no</code><br>
				363	<code>--demangle=yes</code> [the default]
				364	<p>Disable/enable automatic demangling (decoding) of C++ names.
				365	Enabled by default. When enabled, Valgrind will attempt to
				366	translate encoded C++ procedure names back to something
				367	approaching the original. The demangler handles symbols mangled
				368	by g++ versions 2.X and 3.X.
				369
				370	<p>An important fact about demangling is that function
				371	names mentioned in suppressions files should be in their mangled
				372	form. Valgrind does not demangle function names when searching
				373	for applicable suppressions, because to do otherwise would make
				374	suppressions file contents dependent on the state of Valgrind's
				375	demangling machinery, and would also be slow and pointless.
				376	</li><br><p>
				377
				378	<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
				379	<p>By default, Valgrind shows four levels of function call names
				380	to help you identify program locations. You can change that
				381	number with this option. This can help in determining the
				382	program's location in deeply-nested call chains. Note that errors
				383	are treated as duplicates based only on the top three function locations (the
				384	place in the current function, and that of its two immediate
				385	callers). So this doesn't affect the total number of errors
				386	reported.
				387	<p>
				388	The maximum value for this is 50. Note that higher settings
				389	will make Valgrind run a bit more slowly and take a bit more
				390	memory, but can be useful when working with programs with
				391	deeply-nested call chains.
				392	</li><br><p>
				393
				394	<li><code>--gdb-attach=no</code> [the default]<br>
				395	<code>--gdb-attach=yes</code>
				396	<p>When enabled, Valgrind will pause after every error shown,
				397	and print the line
				398	<br>
				399	<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
				400	<p>
				401	Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
				402	or <code>n</code> <code>Ret</code>, causes Valgrind not to
				403	start GDB for this error.
				404	<p>
				405	<code>Y</code> <code>Ret</code>
				406	or <code>y</code> <code>Ret</code> causes Valgrind to
				407	start GDB, for the program at this point. When you have
				408	finished with GDB, quit from it, and the program will continue.
				409	Trying to continue from inside GDB doesn't work.
				410	<p>
				411	<code>C</code> <code>Ret</code>
				412	or <code>c</code> <code>Ret</code> causes Valgrind not to
				413	start GDB, and not to ask again.
				414	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	415	<code>--gdb-attach=yes</code> conflicts with
				416	<code>--trace-children=yes</code>. You can't use them together.
				417	Valgrind refuses to start up in this situation. 1 May 2002:
				418	this is a historical relic which could be easily fixed if it
				419	gets in your way. Mail me and complain if this is a problem for
				420	you. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	421
				422	<li><code>--partial-loads-ok=yes</code> [the default]<br>
				423	<code>--partial-loads-ok=no</code>
				424	<p>Controls how Valgrind handles word (4-byte) loads from
				425	addresses for which some bytes are addressable and others
				426	are not. When <code>yes</code> (the default), such loads
				427	do not elicit an address error. Instead, the loaded V bytes
				428	corresponding to the illegal addresses indicate undefined, and
				429	those corresponding to legal addresses are loaded from shadow
				430	memory, as usual.
				431	<p>
				432	When <code>no</code>, loads from partially
				433	invalid addresses are treated the same as loads from completely
				434	invalid addresses: an illegal-address error is issued,
				435	and the resulting V bytes indicate valid data.
				436	</li><br><p>
				437
				438	<li><code>--sloppy-malloc=no</code> [the default]<br>
				439	<code>--sloppy-malloc=yes</code>
				440	<p>When enabled, all requests for malloc/calloc are rounded up
				441	to a whole number of machine words -- in other words, made
				442	divisible by 4. For example, a request for 17 bytes of space
				443	would result in a 20-byte area being made available. This works
				444	around bugs in sloppy libraries which assume that they can
				445	safely rely on malloc/calloc requests being rounded up in this
				446	fashion. Without the workaround, these libraries tend to
				447	generate large numbers of errors when they access the ends of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	448	these areas.
				449	<p>
				450	Valgrind snapshots dated 17 Feb 2002 and later are
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	451	cleverer about this problem, and you should no longer need to
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	452	use this flag. To put it bluntly, if you do need to use this
				453	flag, your program violates the ANSI C semantics defined for
				454	<code>malloc</code> and <code>free</code>, even if it appears to
				455	work correctly, and you should fix it, at least if you hope for
				456	maximum portability.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	457	</li><br><p>
				458
				459	<li><code>--trace-children=no</code> [the default]<br>
				460	<code>--trace-children=yes</code>
				461	<p>When enabled, Valgrind will trace into child processes. This
				462	is confusing and usually not what you want, so is disabled by
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	463	default. As of 1 May 2002, tracing into a child process from a
				464	parent which uses <code>libpthread.so</code> is probably broken
				465	and is likely to cause breakage. Please report any such
				466	problems to me. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	467
				468	<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
				469	<p>When the client program releases memory using free (in C) or
				470	delete (C++), that memory is not immediately made available for
				471	re-allocation. Instead it is marked inaccessible and placed in
				472	a queue of freed blocks. The purpose is to delay the point at
				473	which freed-up memory comes back into circulation. This
				474	increases the chance that Valgrind will be able to detect
				475	invalid accesses to blocks for some significant period of time
				476	after they have been freed.
				477	<p>
				478	This flag specifies the maximum total size, in bytes, of the
				479	blocks in the queue. The default value is one million bytes.
				480	Increasing this increases the total amount of memory used by
				481	Valgrind but may detect invalid uses of freed blocks which would
				482	otherwise go undetected.</li><br><p>
				483
				484	<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
				485	<p>Specifies the file descriptor on which Valgrind communicates
				486	all of its messages. The default, 2, is the standard error
				487	channel. This may interfere with the client's own use of
				488	stderr. To dump Valgrind's commentary in a file without using
				489	stderr, something like the following works well (sh/bash
				490	syntax):<br>
				491	<code>
				492	valgrind --logfile-fd=9 my_prog 9&gt; logfile</code><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	497	<li><code>--suppressions=&lt;filename&gt;</code>
				494	and ask the shell to route file descriptor 9 to "logfile".
				495	</li><br><p>
				496
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	497	<li><code>--suppressions=<filename></code>
				498	[default: $PREFIX/lib/valgrind/default.supp]
				499	<p>Specifies an extra
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	500	file from which to read descriptions of errors to suppress. You
				501	may use as many extra suppressions files as you
				502	like.</li><br><p>
				503
				504	<li><code>--leak-check=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	505	<code>--leak-check=yes</code>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	506	<p>When enabled, search for memory leaks when the client program
				507	finishes. A memory leak means a malloc'd block, which has not
				508	yet been free'd, but to which no pointer can be found. Such a
				509	block can never be free'd by the program, since no pointer to it
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	510	exists. Leak checking is disabled by default because it tends
				511	to generate dozens of error messages. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	512
				513	<li><code>--show-reachable=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	514	<code>--show-reachable=yes</code>
				515	<p>When disabled, the memory leak detector only shows blocks to
				516	which it cannot find any pointer, or to which it can only find a
				517	pointer into the middle. These blocks are prime candidates for
				518	memory leaks. When enabled, the leak detector also reports on
				519	blocks which it could find a pointer to. Your program could, at
				520	least in principle, have freed such blocks before exit.
				521	Contrast this to blocks for which no pointer, or only an
				522	interior pointer could be found: they are more likely to
				523	indicate memory leaks, because you do not actually have a
				524	pointer to the start of the block which you can hand to
				525	<code>free</code>, even if you wanted to. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	526
				527	<li><code>--leak-resolution=low</code> [default]<br>
				528	<code>--leak-resolution=med</code> <br>
				529	<code>--leak-resolution=high</code>
				530	<p>When doing leak checking, determines how willing Valgrind is
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	531	to consider different backtraces to be the same. When set to
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	532	<code>low</code>, the default, only the first two entries need
				533	match. When <code>med</code>, four entries have to match. When
				534	<code>high</code>, all entries need to match.
				535	<p>
				536	For hardcore leak debugging, you probably want to use
				537	<code>--leak-resolution=high</code> together with
				538	<code>--num-callers=40</code> or some such large number. Note
				539	however that this can give an overwhelming amount of
				540	information, which is why the defaults are 4 callers and
				541	low-resolution matching.
				542	<p>
				543	Note that the <code>--leak-resolution=</code> setting does not
				544	affect Valgrind's ability to find leaks. It only changes how
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	545	the results are presented.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	546	</li><br><p>
				547
				548	<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
				549	<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
				550	assume that reads and writes some small distance below the stack
				551	pointer <code>%esp</code> are due to bugs in gcc 2.96, and does
				552	not report them. The "small distance" is 256 bytes by default.
				553	Note that gcc 2.96 is the default compiler on some popular Linux
				554	distributions (RedHat 7.X, Mandrake) and so you may well need to
				555	use this flag. Do not use it if you do not have to, as it can
				556	cause real errors to be overlooked. A better option is to use a
				557	gcc/g++ which works properly; 2.95.3 seems to be a good choice.
				558	<p>
				559	Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	560	buggy, so you may need to issue this flag if you use 3.0.4. A
				561	while later (early Apr 02) this is confirmed as a scheduling bug
				562	in g++-3.0.4.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	563	</li><br><p>
				564
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	565	<li><code>--cachesim=no</code> [default]<br>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	566	<code>--cachesim=yes</code> <p>When enabled, turns off memory
				567	checking, and turns on cache profiling. Cache profiling is
sewardj	3984b85	2002-05-12 03:00:17 +0000	[diff] [blame]	568	described in detail in <a href="#cache">Section 7</a>.
				569	</li><br><p>
				570
sewardj	8d365b5	2002-05-12 10:52:16 +0000	[diff] [blame]	571	<li><code>--weird-hacks=hack1,hack2,...</code>
sewardj	3984b85	2002-05-12 03:00:17 +0000	[diff] [blame]	572	Pass miscellaneous hints to Valgrind which slightly modify the
				573	simulated behaviour in nonstandard or dangerous ways, possibly
				574	to help the simulation of strange features. By default no hacks
				575	are enabled. Use with caution! Currently known hacks are:
				576	<p>
				577	<ul>
				578	<li><code>ioctl-VTIME</code> Use this if you have a program
				579	which sets readable file descriptors to have a timeout by
				580	doing <code>ioctl</code> on them with a
				581	<code>TCSETA</code>-style command <b>and</b> a non-zero
				582	<code>VTIME</code> timeout value. This is considered
				583	potentially dangerous and therefore is not engaged by
				584	default, because it is (remotely) conceivable that it could
				585	cause threads doing <code>read</code> to incorrectly block
				586	the entire process.
				587	<p>
				588	You probably want to try this one if you have a program
				589	which unexpectedly blocks in a <code>read</code> from a file
				590	descriptor which you know to have been messed with by
				591	<code>ioctl</code>. This could happen, for example, if the
				592	descriptor is used to read input from some kind of screen
				593	handling library.
				594	<p>
				595	To find out if your program is blocking unexpectedly in the
				596	<code>read</code> system call, run with the
				597	<code>--trace-syscalls=yes</code> flag.
				598	</ul>
				599
				600	</li><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	601	</ul>
				602
				603	There are also some options for debugging Valgrind itself. You
				604	shouldn't need to use them in the normal run of things. Nevertheless:
				605
				606	<ul>
				607
				608	<li><code>--single-step=no</code> [default]<br>
				609	<code>--single-step=yes</code>
				610	<p>When enabled, each x86 instruction is translated separately into
				611	instrumented code. When disabled, translation is done on a
				612	per-basic-block basis, giving much better translations.</li><br>
				613	<p>
				614
				615	<li><code>--optimise=no</code><br>
				616	<code>--optimise=yes</code> [default]
				617	<p>When enabled, various improvements are applied to the
				618	intermediate code, mainly aimed at allowing the simulated CPU's
				619	registers to be cached in the real CPU's registers over several
				620	simulated instructions.</li><br>
				621	<p>
				622
				623	<li><code>--instrument=no</code><br>
				624	<code>--instrument=yes</code> [default]
				625	<p>When disabled, the translations don't actually contain any
				626	instrumentation.</li><br>
				627	<p>
				628
				629	<li><code>--cleanup=no</code><br>
				630	<code>--cleanup=yes</code> [default]
				631	<p>When enabled, various improvements are applied to the
				632	post-instrumented intermediate code, aimed at removing redundant
				633	value checks.</li><br>
				634	<p>
				635
				636	<li><code>--trace-syscalls=no</code> [default]<br>
				637	<code>--trace-syscalls=yes</code>
				638	<p>Enable/disable tracing of system call intercepts.</li><br>
				639	<p>
				640
				641	<li><code>--trace-signals=no</code> [default]<br>
				642	<code>--trace-signals=yes</code>
				643	<p>Enable/disable tracing of signal handling.</li><br>
				644	<p>
				645
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	646	<li><code>--trace-sched=no</code> [default]<br>
				647	<code>--trace-sched=yes</code>
				648	<p>Enable/disable tracing of thread scheduling events.</li><br>
				649	<p>
				650
sewardj	45b4b37	2002-04-16 22:50:32 +0000	[diff] [blame]	651	<li><code>--trace-pthread=none</code> [default]<br>
				652	<code>--trace-pthread=some</code> <br>
				653	<code>--trace-pthread=all</code>
				654	<p>Specifies amount of trace detail for pthread-related events.</li><br>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	655	<p>
				656
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	657	<li><code>--trace-symtab=no</code> [default]<br>
				658	<code>--trace-symtab=yes</code>
				659	<p>Enable/disable tracing of symbol table reading.</li><br>
				660	<p>
				661
				662	<li><code>--trace-malloc=no</code> [default]<br>
				663	<code>--trace-malloc=yes</code>
				664	<p>Enable/disable tracing of malloc/free (et al) intercepts.
				665	</li><br>
				666	<p>
				667
				668	<li><code>--stop-after=&lt;number&gt;</code>
				669	[default: infinity, more or less]
				670	<p>After &lt;number&gt; basic blocks have been executed, shut down
				671	Valgrind and switch back to running the client on the real CPU.
				672	</li><br>
				673	<p>
				674
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	675	<li><code>--dump-error=&lt;number&gt;</code> [default: inactive]
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	676	<p>After the program has exited, show gory details of the
				677	translation of the basic block containing the &lt;number&gt;'th
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	678	error context. When used with <code>--single-step=yes</code>,
				679	can show the exact x86 instruction causing an error. This is
				680	all fairly dodgy and doesn't work at all if threads are
				681	involved.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	682	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	683	</ul>
				684
				685
				686	<a name="errormsgs"></a>
				687	<h3>2.6  Explanation of error messages</h3>
				688
				689	Despite considerable sophistication under the hood, Valgrind can only
				690	really detect two kinds of errors: use of illegal addresses, and use
				691	of undefined values. Nevertheless, this is enough to help you
				692	discover all sorts of memory-management nasties in your code. This
				693	section presents a quick summary of what error messages mean. The
				694	precise behaviour of the error-checking machinery is described in
				695	<a href="#machine">Section 4</a>.
				696
				697
				698	<h4>2.6.1  Illegal read / Illegal write errors</h4>
				699	For example:
				700	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	701	Invalid read of size 4
				702	at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
				703	by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
				704	by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
				705	by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
				706	Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	707	</pre>
				708
				709	<p>This happens when your program reads or writes memory at a place
				710	which Valgrind reckons it shouldn't. In this example, the program did
				711	a 4-byte read at address 0xBFFFF0E0, somewhere within the
				712	system-supplied library libpng.so.2.1.0.9, which was called from
				713	somewhere else in the same library, called from line 326 of
				714	qpngio.cpp, and so on.
				715
				716	<p>Valgrind tries to establish what the illegal address might relate
				717	to, since that's often useful. So, if it points into a block of
				718	memory which has already been freed, you'll be informed of this, and
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	719	also where the block was freed. Likewise, if it should turn out
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	720	to be just off the end of a malloc'd block, a common result of
				721	off-by-one-errors in array subscripting, you'll be informed of this
				722	fact, and also where the block was malloc'd.
				723
				724	<p>In this example, Valgrind can't identify the address. Actually the
				725	address is on the stack, but, for some reason, this is not a valid
				726	stack address -- it is below the stack pointer, %esp, and that isn't
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	727	allowed. In this particular case it's probably caused by gcc
				728	generating invalid code, a known bug in various flavours of gcc.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	729
				730	<p>Note that Valgrind only tells you that your program is about to
				731	access memory at an illegal address. It can't stop the access from
				732	happening. So, if your program makes an access which normally would
				733	result in a segmentation fault, your program will still suffer the same
				734	fate -- but you will get a message from Valgrind immediately prior to
				735	this. In this particular example, reading junk on the stack is
				736	non-fatal, and the program stays alive.
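<p>A minimal sketch of the sort of code that leads to such a report, here for
the off-by-one case mentioned above rather than the below-<code>%esp</code>
case in the example. The function <code>sum_block</code> and its parameters
are invented for illustration; the output shown above did not come from it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a block of n ints, fill it with
   0, 1, ..., n-1, then add up the first `count` of them.
   Returns the sum, or -1 if the allocation fails. */
int sum_block(size_t n, size_t count)
{
    int *a = malloc(n * sizeof *a);
    int total = 0;
    size_t i;

    if (a == NULL)
        return -1;
    for (i = 0; i < n; i++)
        a[i] = (int)i;          /* every element is initialised */
    for (i = 0; i < count; i++)
        total += a[i];          /* reads past the block once i >= n */
    free(a);
    return total;
}
```

<p>Calling <code>sum_block(4, 4)</code> stays inside the block. Calling
<code>sum_block(4, 5)</code> makes the final iteration read one
<code>int</code> just off the end of the malloc'd block, which is the kind of
access you would expect Valgrind to report, together with where the block was
malloc'd.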
				737
				738
				739	<h4>2.6.2  Use of uninitialised values</h4>
				740	For example:
				741	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	742	Conditional jump or move depends on uninitialised value(s)
				743	at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
				744	by 0x402E8476: _IO_printf (printf.c:36)
				745	by 0x8048472: main (tests/manuel1.c:8)
				746	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	747	</pre>
				748
				749	<p>An uninitialised-value use error is reported when your program uses
				750	a value which hasn't been initialised -- in other words, is undefined.
				751	Here, the undefined value is used somewhere inside the printf()
				752	machinery of the C library. This error was reported when running the
				753	following small program:
				754	<pre>
				755	int main()
				756	{
				757	int x;
				758	printf ("x = %d\n", x);
				759	}
				760	</pre>
				761
				762	<p>It is important to understand that your program can copy around
				763	junk (uninitialised) data to its heart's content. Valgrind observes
				764	this and keeps track of the data, but does not complain. A complaint
				765	is issued only when your program attempts to make use of uninitialised
				766	data. In this example, x is uninitialised. Valgrind observes the
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	767	value being passed to _IO_printf and thence to _IO_vfprintf, but makes
				768	no comment. However, _IO_vfprintf has to examine the value of x so it
				769	can turn it into the corresponding ASCII string, and it is at this
				770	point that Valgrind complains.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	771
				772	<p>Sources of uninitialised data tend to be:
				773	<ul>
				774	<li>Local variables in procedures which have not been initialised,
				775	as in the example above.</li><br><p>
				776
				777	<li>The contents of malloc'd blocks, before you write something
				778	there. In C++, the new operator is a wrapper round malloc, so
				779	if you create an object with new, its fields will be
				780	uninitialised until you fill them in, which is only Right and
				781	Proper.</li>
				782	</ul>
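<p>The distinction between copying junk and using it can be sketched as
follows. This is only an illustration: the function name
<code>copy_then_test</code> is made up, and exactly which line a complaint is
attributed to depends on the code your compiler generates.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper.  Returns 1 if the copied value is positive,
   0 if it is not, and -1 if the allocation fails. */
int copy_then_test(int initialise)
{
    int *src = malloc(sizeof *src);   /* contents undefined at this point */
    int copy;
    int result;

    if (src == NULL)
        return -1;
    if (initialise)
        *src = 42;

    /* Copying the value around: per the text above, no complaint
       is expected here even when *src was never written. */
    memcpy(&copy, src, sizeof copy);

    /* Making a decision based on the value: this is the kind of use
       at which a complaint would be expected if *src was never written. */
    result = (copy > 0) ? 1 : 0;

    free(src);
    return result;
}
```

<p>With <code>initialise</code> non-zero the value is defined throughout and
nothing should be reported. With <code>initialise</code> zero the copy is
silent, and it is the comparison that depends on uninitialised data.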
				783
				784
				785
				786	<h4>2.6.3  Illegal frees</h4>
				787	For example:
				788	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	789	Invalid free()
				790	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				791	by 0x80484C7: main (tests/doublefree.c:10)
				792	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				793	by 0x80483B1: (within tests/doublefree)
				794	Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
				795	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				796	by 0x80484C7: main (tests/doublefree.c:10)
				797	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				798	by 0x80483B1: (within tests/doublefree)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	799	</pre>
				800	<p>Valgrind keeps track of the blocks allocated by your program with
				801	malloc/new, so it knows exactly whether the argument to
				802	free/delete is legitimate. Here, this test program has
				803	freed the same block twice. As with the illegal read/write errors,
				804	Valgrind attempts to make sense of the address freed. If, as
				805	here, the address is one which has previously been freed, you will
				806	be told that -- making duplicate frees of the same block easy to spot.
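<p>The source of <code>tests/doublefree.c</code> is not reproduced here, so
the following is only a guess at the general shape of such a bug, plus one
common defensive idiom. The helper name <code>release</code> is invented for
this sketch.

```c
#include <stdlib.h>

/* The bug, in its simplest form, looks like this:
 *
 *     char *p = malloc(177);
 *     free(p);
 *     free(p);      // second free of the same block
 *
 * One common way to make a repeated free harmless is to clear the
 * pointer as part of freeing it, since free(NULL) does nothing. */
void release(char **pp)
{
    free(*pp);
    *pp = NULL;
}
```

<p>This only protects the pointer you pass in. If other copies of the same
address are still lying around, freeing through one of them is still a
duplicate free.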
				807
				808
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	809	<h4>2.6.4  When a block is freed with an inappropriate
				810	deallocation function</h4>
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame]	811	In the following example, a block allocated with <code>new []</code>
				812	has wrongly been deallocated with <code>free</code>:
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	813	<pre>
				814	Mismatched free() / delete / delete []
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame]	815	at 0x40043249: free (vg_clientfuncs.c:171)
				816	by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
				817	by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
				818	by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
				819	Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
				820	at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
				821	by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
				822	by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
				823	by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	824	</pre>
				825	The following was told to me by the KDE 3 developers. I didn't know
				826	any of it myself. They also implemented the check itself.
				827	<p>
				828	In C++ it's important to deallocate memory in a way compatible with
				829	how it was allocated. The deal is:
				830	<ul>
				831	<li>If allocated with <code>malloc</code>, <code>calloc</code>,
				832	<code>realloc</code>, <code>valloc</code> or
				833	<code>memalign</code>, you must deallocate with <code>free</code>.
				834	<li>If allocated with <code>new []</code>, you must deallocate with
				835	<code>delete []</code>.
				836	<li>If allocated with <code>new</code>, you must deallocate with
				837	<code>delete</code>.
				838	</ul>
				839	The worst thing is that on Linux apparently it doesn't matter if you
				840	do muddle these up, and it all seems to work ok, but the same program
				841	may then crash on a different platform, Solaris for example. So it's
				842	best to fix it properly. According to the KDE folks "it's amazing how
				843	many C++ programmers don't know this".
				844
				845
				846
				847	<h4>2.6.5  Passing system call parameters with inadequate
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	848	read/write permissions</h4>
				849
				850	Valgrind checks all parameters to system calls. If a system call
				851	needs to read from a buffer provided by your program, Valgrind checks
				852	that the entire buffer is addressable and has valid data, i.e., it is
				853	readable. And if the system call needs to write to a user-supplied
				854	buffer, Valgrind checks that the buffer is addressable. After the
				855	system call, Valgrind updates its administrative information to
				856	precisely reflect any changes in memory permissions caused by the
				857	system call.
				858
				859	<p>Here's an example of a system call with an invalid parameter:
				860	<pre>
				861	#include &lt;stdlib.h&gt;
				862	#include &lt;unistd.h&gt;
				863	int main( void )
				864	{
				865	char* arr = malloc(10);
				866	(void) write( 1 /* stdout */, arr, 10 );
				867	return 0;
				868	}
				869	</pre>
				870
				871	<p>You get this complaint ...
				872	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	873	Syscall param write(buf) contains uninitialised or unaddressable byte(s)
				874	at 0x4035E072: __libc_write
				875	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				876	by 0x80483B1: (within tests/badwrite)
				877	by &lt;bogus frame pointer&gt; ???
				878	Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
				879	at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
				880	by 0x80484A0: main (tests/badwrite.c:6)
				881	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				882	by 0x80483B1: (within tests/badwrite)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	883	</pre>
				884
				885	<p>... because the program has tried to write uninitialised junk from
				886	the malloc'd block to the standard output.
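<p>One way to avoid the complaint is to make sure every byte of the buffer
has been written before it is handed to the system call. A small sketch; the
helper name <code>make_defined_buffer</code> is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd block of n bytes with every
   byte set to `fill`, or NULL if the allocation fails.
   The caller is responsible for freeing it. */
char *make_defined_buffer(size_t n, int fill)
{
    char *buf = malloc(n);

    if (buf != NULL)
        memset(buf, fill, n);   /* all n bytes are now defined */
    return buf;
}
```

<p>Replacing <code>malloc(10)</code> in the program above with
<code>make_defined_buffer(10, '.')</code> means <code>write</code> is given
ten defined bytes, so the buffer should pass the readability check.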
				887
				888
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	889	<h4>2.6.6  Warning messages you might see</h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	890
				891	Most of these only appear if you run in verbose mode (enabled by
				892	<code>-v</code>):
				893	<ul>
				894	<li> <code>More than 50 errors detected. Subsequent errors
				895	will still be recorded, but in less detail than before.</code>
				896	<br>
				897	After 50 different errors have been shown, Valgrind becomes
				898	more conservative about collecting them. It then requires only
				899	the program counters in the top two stack frames to match when
				900	deciding whether or not two errors are really the same one.
				901	Prior to this point, the PCs in the top four frames are required
				902	to match. This hack has the effect of slowing down the
				903	appearance of new errors after the first 50. The 50 constant can
				904	be changed by recompiling Valgrind.
				905	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	906	<li> <code>More than 300 errors detected. I'm not reporting any more.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	907	Final error counts may be inaccurate. Go fix your
				908	program!</code>
				909	<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	910	After 300 different errors have been detected, Valgrind ignores
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	911	any more. It seems unlikely that collecting even more different
				912	ones would be of practical help to anybody, and it avoids the
				913	danger that Valgrind spends more and more of its time comparing
				914	new errors against an ever-growing collection. As above, the 300
				915	number is a compile-time constant.
				916	<p>
				917	<li> <code>Warning: client exiting by calling exit(&lt;number&gt;).
				918	Bye!</code>
				919	<br>
				920	Your program has called the <code>exit</code> system call, which
				921	will immediately terminate the process. You'll get no exit-time
				922	error summaries or leak checks. Note that this is not the same
				923	as your program calling the ANSI C function <code>exit()</code>
				924	-- that causes a normal, controlled shutdown of Valgrind.
				925	<p>
				926	<li> <code>Warning: client switching stacks?</code>
				927	<br>
				928	Valgrind spotted such a large change in the stack pointer, %esp,
				929	that it guesses the client is switching to a different stack.
				930	At this point it makes a kludgey guess where the base of the new
				931	stack is, and sets memory permissions accordingly. You may get
				932	many bogus error messages following this, if Valgrind guesses
				933	wrong. At the moment "large change" is defined as a change of
				934	more than 2000000 in the value of the %esp (stack pointer)
				935	register.
				936	<p>
				937	<li> <code>Warning: client attempted to close Valgrind's logfile fd &lt;number&gt;
				938	</code>
				939	<br>
				940	Valgrind doesn't allow the client
				941	to close the logfile, because you'd never see any diagnostic
				942	information after that point. If you see this message,
				943	you may want to use the <code>--logfile-fd=&lt;number&gt;</code>
				944	option to specify a different logfile file-descriptor number.
				945	<p>
				946	<li> <code>Warning: noted but unhandled ioctl &lt;number&gt;</code>
				947	<br>
				948	Valgrind observed a call to one of the vast family of
				949	<code>ioctl</code> system calls, but did not modify its
				950	memory status info (because I have not yet got round to it).
				951	The call will still have gone through, but you may get spurious
				952	errors after this as a result of the non-update of the memory info.
				953	<p>
				954	<li> <code>Warning: unblocking signal &lt;number&gt; due to
				955	sigprocmask</code>
				956	<br>
				957	Really just a diagnostic from the signal simulation machinery.
				958	This message will appear if your program handles a signal by
				959	first <code>longjmp</code>ing out of the signal handler,
				960	and then unblocking the signal with <code>sigprocmask</code>
				961	-- a standard signal-handling idiom.
				962	<p>
				963	<li> <code>Warning: bad signal number &lt;number&gt; in __NR_sigaction.</code>
				964	<br>
				965	Probably indicates a bug in the signal simulation machinery.
				966	<p>
				967	<li> <code>Warning: set address range perms: large range &lt;number&gt;</code>
				968	<br>
				969	Diagnostic message, mostly for my benefit, to do with memory
				970	permissions.
				971	</ul>
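<p>The duplicate-detection rule described in the first warning above (top
four program counters must match before 50 errors have been shown, top two
afterwards) can be sketched like this. The type and function names are
invented for the sketch and are not taken from Valgrind's sources.

```c
#include <stddef.h>

#define FRAMES_KEPT 4   /* frames recorded per error in this sketch */

/* Hypothetical record of where an error happened: the program
   counters of the innermost stack frames, innermost first. */
typedef struct {
    unsigned long pc[FRAMES_KEPT];
} ErrorContext;

/* Returns 1 if a and b count as "the same error", 0 otherwise.
   Before 50 errors have been shown, the top four PCs must match;
   from then on only the top two are compared. */
int same_error(const ErrorContext *a, const ErrorContext *b,
               size_t errors_shown)
{
    size_t depth = (errors_shown < 50) ? 4 : 2;
    size_t i;

    for (i = 0; i < depth; i++) {
        if (a->pc[i] != b->pc[i])
            return 0;
    }
    return 1;
}
```

<p>Two errors that agree in their top two frames but differ further down are
treated as distinct early on and as the same error later, which is why new
errors appear more slowly after the first 50.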
				972
				973
				974	<a name="suppfiles"></a>
				975	<h3>2.7  Writing suppressions files</h3>
				976
				977	A suppression file describes a bunch of errors which, for one reason
				978	or another, you don't want Valgrind to tell you about. Usually the
				979	reason is that the system libraries are buggy but unfixable, at least
				980	within the scope of the current debugging session. Multiple
				981	suppressions files are allowed. By default, Valgrind uses
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	982	<code>$PREFIX/lib/valgrind/default.supp</code>.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	983
				984	<p>
				985	You can ask to add suppressions from another file, by specifying
				986	<code>--suppressions=/path/to/file.supp</code>.
				987
				988	<p>Each suppression has the following components:<br>
				989	<ul>
				990
				991	<li>Its name. This merely gives a handy name to the suppression, by
				992	which it is referred to in the summary of used suppressions
				993	printed out when a program finishes. It's not important what
				994	the name is; any identifying string will do.
				995	<p>
				996
				997	<li>The nature of the error to suppress. Either:
				998	<code>Value1</code>,
				999	<code>Value2</code>,
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	1000	<code>Value4</code> or
				1001	<code>Value8</code>,
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1002	meaning an uninitialised-value error when
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	1003	using a value of 1, 2, 4 or 8 bytes.
				1004	Or
				1005	<code>Cond</code> (or its old name, <code>Value0</code>),
				1006	meaning use of an uninitialised CPU condition code. Or:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1007	<code>Addr1</code>,
				1008	<code>Addr2</code>,
				1009	<code>Addr4</code> or
				1010	<code>Addr8</code>, meaning an invalid address during a
				1011	memory access of 1, 2, 4 or 8 bytes respectively. Or
				1012	<code>Param</code>,
				1013	meaning an invalid system call parameter error. Or
				1014	<code>Free</code>, meaning an invalid or mismatching free.</li><br>
				1015	<p>
				1016
				1017	<li>The "immediate location" specification. For Value and Addr
				1018	errors, this is either the name of the function in which the error
				1019	occurred, or, failing that, the full path to the .so file
				1020	containing the error location. For Param errors, it is the name of
				1021	the offending system call parameter. For Free errors, it is the
				1022	name of the function doing the freeing (e.g. <code>free</code>,
				1023	<code>__builtin_vec_delete</code>, etc)</li><br>
				1024	<p>
				1025
				1026	<li>The caller of the above "immediate location". Again, either a
				1027	function or shared-object name.</li><br>
				1028	<p>
				1029
				1030	<li>Optionally, one or two extra calling-function or object names,
				1031	for greater precision.</li>
				1032	</ul>
				1033
				1034	<p>
				1035	Locations may be either names of shared objects or wildcards matching
				1036	function names. They begin <code>obj:</code> and <code>fun:</code>
				1037	respectively. Function and object names to match against may use the
				1038	wildcard characters <code>*</code> and <code>?</code>.
				1039
				1040	A suppression only suppresses an error when the error matches all the
				1041	details in the suppression. Here's an example:
				1042	<pre>
				1043	{
				1044	__gconv_transform_ascii_internal/__mbrtowc/mbtowc
				1045	Value4
				1046	fun:__gconv_transform_ascii_internal
				1047	fun:__mbr*toc
				1048	fun:mbtowc
				1049	}
				1050	</pre>
				1051
				1052	<p>What this means is: suppress a use-of-uninitialised-value error, when
				1053	the data size is 4, when it occurs in the function
				1054	<code>__gconv_transform_ascii_internal</code>, when that is called
				1055	from any function of name matching <code>__mbr*toc</code>,
				1056	when that is called from
				1057	<code>mbtowc</code>. It doesn't apply under any other circumstances.
				1058	The string by which this suppression is identified to the user is
				1059	__gconv_transform_ascii_internal/__mbrtowc/mbtowc.
				1060
				1061	<p>Another example:
				1062	<pre>
				1063	{
				1064	libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
				1065	Value4
				1066	obj:/usr/X11R6/lib/libX11.so.6.2
				1067	obj:/usr/X11R6/lib/libX11.so.6.2
				1068	obj:/usr/X11R6/lib/libXaw.so.7.0
				1069	}
				1070	</pre>
				1071
				1072	<p>Suppress any size 4 uninitialised-value error which occurs anywhere
				1073	in <code>libX11.so.6.2</code>, when called from anywhere in the same
				1074	library, when called from anywhere in <code>libXaw.so.7.0</code>. The
				1075	inexact specification of locations is regrettable, but is about all
				1076	you can hope for, given that the X11 libraries shipped with Red Hat
				1077	7.2 have had their symbol tables removed.
				1078
				1079	<p>Note -- since the above two examples did not make it clear -- that
				1080	you can freely mix the <code>obj:</code> and <code>fun:</code>
				1081	styles of description within a single suppression record.
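<p>To make the wildcard rules concrete, here is a small matcher. It assumes
the usual shell-style meaning, namely that <code>*</code> stands for any run
of characters (including none) and <code>?</code> for exactly one character;
the text above does not spell this out, so treat it as an assumption. The
function name <code>wild_match</code> is invented, and this is not Valgrind's
own matching code.

```c
/* Returns 1 if str matches pat, 0 otherwise.
   '*' matches any run of characters, including an empty one.
   '?' matches exactly one character.
   Anything else must match literally. */
int wild_match(const char *pat, const char *str)
{
    if (*pat == '\0')
        return *str == '\0';

    if (*pat == '*') {
        /* Either '*' matches nothing more ... */
        if (wild_match(pat + 1, str))
            return 1;
        /* ... or it absorbs one more character of str. */
        return *str != '\0' && wild_match(pat, str + 1);
    }

    if (*str != '\0' && (*pat == '?' || *pat == *str))
        return wild_match(pat + 1, str + 1);

    return 0;
}
```

<p>Under these assumptions a pattern such as <code>__gconv_*</code> would
match <code>__gconv_transform_ascii_internal</code>, and
<code>lib?11.so</code> would match <code>libX11.so</code> but not
<code>lib11.so</code>.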
				1082
				1083
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1084	<a name="clientreq"></a>
				1085	<h3>2.8  The Client Request mechanism</h3>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1086
				1087	Valgrind has a trapdoor mechanism via which the client program can
				1088	pass all manner of requests and queries to Valgrind. Internally, this
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1089	is used extensively to make malloc, free, signals, threads, etc, work,
				1090	although you don't see that.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1091	<p>
				1092	For your convenience, a subset of these so-called client requests is
				1093	provided to allow you to tell Valgrind facts about the behaviour of
				1094	your program, and conversely to make queries. In particular, your
				1095	program can tell Valgrind about changes in memory range permissions
				1096	that Valgrind would not otherwise know about, which allows clients to
				1097	get Valgrind to do arbitrary custom checks.
				1098	<p>
				1099	Clients need to include the header file <code>valgrind.h</code> to
				1100	make this work. The macros therein have the magical property that
				1101	they generate code in-line which Valgrind can spot. However, the code
				1102	does nothing when not run on Valgrind, so you are not forced to run
				1103	your program on Valgrind just because you use the macros in this file.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1104	Also, you are not required to link your program with any extra
				1105	supporting libraries.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1106	<p>
				1107	A brief description of the available macros:
				1108	<ul>
				1109	<li><code>VALGRIND_MAKE_NOACCESS</code>,
				1110	<code>VALGRIND_MAKE_WRITABLE</code> and
				1111	<code>VALGRIND_MAKE_READABLE</code>. These mark address
				1112	ranges as completely inaccessible, accessible but containing
				1113	undefined data, and accessible and containing defined data,
				1114	respectively. Subsequent errors may have their faulting
				1115	addresses described in terms of these blocks. Returns a
				1116	"block handle". Returns zero when not run on Valgrind.
				1117	<p>
				1118	<li><code>VALGRIND_DISCARD</code>: At some point you may want
				1119	Valgrind to stop reporting errors in terms of the blocks
				1120	defined by the previous three macros. To do this, the above
				1121	macros return a small-integer "block handle". You can pass
				1122	this block handle to <code>VALGRIND_DISCARD</code>. After
				1123	doing so, Valgrind will no longer be able to relate
				1124	addressing errors to the user-defined block associated with
				1125	the handle. The permissions settings associated with the
				1126	handle remain in place; this just affects how errors are
				1127	reported, not whether they are reported. Returns 1 for an
				1128	invalid handle and 0 for a valid handle (although passing
				1129	invalid handles is harmless). Always returns 0 when not run
				1130	on Valgrind.
				1131	<p>
				1132	<li><code>VALGRIND_CHECK_NOACCESS</code>,
				1133	<code>VALGRIND_CHECK_WRITABLE</code> and
				1134	<code>VALGRIND_CHECK_READABLE</code>: check immediately
				1135	whether or not the given address range has the relevant
				1136	property, and if not, print an error message. Also, for the
				1137	convenience of the client, returns zero if the relevant
				1138	property holds; otherwise, the returned value is the address
				1139	of the first byte for which the property is not true.
				1140	Always returns 0 when not run on Valgrind.
				1141	<p>
				1142	<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
				1143	to find out whether Valgrind thinks a particular variable
				1144	(lvalue, to be precise) is addressable and defined. Prints
				1145	an error message if not. Returns no value.
				1146	<p>
				1147	<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
				1148	experimental feature. Similarly to
				1149	<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
				1150	range as inaccessible, so that subsequent accesses to an
				1151	address in the range give an error. However, this macro
				1152	does not return a block handle. Instead, all annotations
				1153	created like this are reviewed at each client
				1154	<code>ret</code> (subroutine return) instruction, and those
				1155	which now define an address range below the client's stack
				1156	pointer register (<code>%esp</code>) are automatically
				1157	deleted.
				1158	<p>
				1159	In other words, this macro allows the client to tell
				1160	Valgrind about red-zones on its own stack. Valgrind
				1161	automatically discards this information when the stack
				1162	retreats past such blocks. Beware: hacky and flaky, and
				1163	probably interacts badly with the new pthread support.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1164	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1165	<li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on
				1166	Valgrind, 0 if running on the real CPU.
				1167	<p>
				1168	<li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector
				1169	right now. Returns no value. I guess this could be used to
				1170	incrementally check for leaks between arbitrary places in the
				1171	program's execution. Warning: not properly tested!
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	1172	<p>
				1173	<li><code>VALGRIND_DISCARD_TRANSLATIONS</code>: discard translations
				1174	of code in the specified address range. Useful if you are
				1175	debugging a JITter or some other dynamic code generation system.
				1176	After this call, attempts to execute code in the invalidated
				1177	address range will cause Valgrind to make new translations of that
				1178	code, which is probably the semantics you want. Note that this is
				1179	implemented naively, and involves checking all 200191 entries in
				1180	the translation table to see if any of them overlap the specified
				1181	address range. So try not to call it often, or performance will
				1182	nosedive. Note that you can be clever about this: you only need
				1183	to call it when an area which previously contained code is
				1184	overwritten with new code. You can choose to write code into
				1185	fresh memory, and just call this occasionally to discard large
				1186	chunks of old code all at once.
				1187	<p>
				1188	Warning: minimally tested. Also, doesn't interact well with the
				1189	cache simulator.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1190	</ul>
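<p>
The naive scan described for <code>VALGRIND_DISCARD_TRANSLATIONS</code>
can be pictured roughly as below. This is only a sketch: the
<code>TTEntry</code> type and <code>discard_range</code> function are
hypothetical names invented for illustration, not Valgrind's actual data
structures. It shows why the cost depends on the size of the table
rather than on the size of the range being discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation-table entry: the original code range that a
   translation covers, plus a flag saying whether it is still usable. */
typedef struct {
    uintptr_t orig_addr;  /* start of the original code */
    size_t    orig_size;  /* length of the original code, in bytes */
    int       in_use;     /* 0 once the translation has been discarded */
} TTEntry;

/* Discard every translation whose original code overlaps
   [start, start+len).  Every entry is visited, so the cost is
   proportional to the table size. */
size_t discard_range(TTEntry *tt, size_t n_entries,
                     uintptr_t start, size_t len)
{
    size_t discarded = 0;
    for (size_t i = 0; i < n_entries; i++) {
        if (!tt[i].in_use)
            continue;
        uintptr_t lo = tt[i].orig_addr;
        uintptr_t hi = lo + tt[i].orig_size;   /* one past the end */
        if (lo < start + len && start < hi) {  /* half-open overlap test */
            tt[i].in_use = 0;
            discarded++;
        }
    }
    return discarded;
}
```

Calling such a routine once for a large range costs the same as calling
it for a tiny one, which is why batching discards is the cheaper
strategy.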
				1191	<p>
				1192
				1193
				1194	<a name="pthreads"></a>
				1195	<h3>2.9  Support for POSIX Pthreads</h3>
				1196
				1197	As of late April 02, Valgrind supports programs which use POSIX
				1198	pthreads. Doing this has proved technically challenging and is still
				1199	in progress, but it works well enough, as of 1 May 02, for significant
				1200	threaded applications to work.
				1201	<p>
				1202	It works as follows: threaded apps are (dynamically) linked against
				1203	<code>libpthread.so</code>. Usually this is the one installed with
				1204	your Linux distribution. Valgrind, however, supplies its own
				1205	<code>libpthread.so</code> and automatically connects your program to
				1206	it instead.
				1207	<p>
				1208	The fake <code>libpthread.so</code> and Valgrind cooperate to
				1209	implement a user-space pthreads package. This approach avoids the
				1210	horrible implementation problems of implementing a truly
				1211	multiprocessor version of Valgrind, but it does mean that threaded
				1212	apps run only on one CPU, even if you have a multiprocessor machine.
				1213	<p>
				1214	Valgrind schedules your threads in a round-robin fashion, with all
				1215	threads having equal priority. It switches threads every 20000 basic
				1216	blocks (typically around 120000 x86 instructions), which means you'll
				1217	get a much finer interleaving of thread executions than when run
				1218	natively. This in itself may cause your program to behave differently
				1219	if you have some kind of concurrency, critical race, locking, or
				1220	similar, bugs.
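<p>
The scheduling policy just described can be sketched as follows. The
<code>Sched</code> type and <code>sched_tick</code> function are
hypothetical names used only for illustration; the quantum of 20000
basic blocks is taken from the text, everything else is an assumption.

```c
#include <assert.h>

#define QUANTUM_BLOCKS 20000   /* basic blocks per time slice, from the text */

/* Hypothetical scheduler state: which thread is running and how many
   basic blocks it has executed in its current slice. */
typedef struct {
    int  n_threads;
    int  current;
    long blocks_in_slice;
} Sched;

/* Account for one executed basic block and return the thread that
   should run next.  All threads have equal priority, so the successor
   is simply the next thread id, wrapping around. */
int sched_tick(Sched *s)
{
    s->blocks_in_slice++;
    if (s->blocks_in_slice >= QUANTUM_BLOCKS) {
        s->blocks_in_slice = 0;
        s->current = (s->current + 1) % s->n_threads;
    }
    return s->current;
}
```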
				1221	<p>
				1222	The current (1 May 02) state of pthread support is as follows. Please
				1223	note that things are advancing rapidly, so the situation may have
				1224	improved by the time you read this -- check the web site for further
				1225	updates.
				1226	<ul>
				1227	<li>Mutexes, condition variables, thread-specific data and
				1228	<code>pthread_once</code> currently work.
				1229	<p>
				1230	<li>Various attribute-like calls are handled but ignored.
				1231	You get a warning message.
				1232	<p>
				1233	<li>The main big omission is proper cleanup support for cancellation.
				1234	<code>pthread_cancel</code> works, but instantly nukes the target
				1235	thread without giving it any chance to clean up. Also, when a
				1236	thread exits, it does not run any cleanup handlers.
				1237	<p>
				1238	<li>Currently the following syscalls are thread-safe (nonblocking):
				1239	<code>write</code> <code>read</code> <code>nanosleep</code>
				1240	<code>sleep</code> <code>select</code> and <code>poll</code>.
				1241	<p>
				1242	<li>The POSIX requirement that each thread have its own
				1243	signal-blocking mask is not done; the signal handling mechanism is
				1244	thread-unaware and all signals are delivered to the main thread,
				1245	antidisirregardless.
				1246	</ul>
				1247
				1248
				1249	As of 1 May 02, the following programs now work fine on my RedHat 7.2
				1250	box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and
				1251	Galeon-0.11.3, both as supplied with RedHat 7.2.
				1252	<p>
sewardj	1f13ab1	2002-05-02 03:57:00 +0000	[diff] [blame]	1253	Mozilla 1.0RC1 works fine too, provided that you patch it as described
				1254	here: <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=124335">
				1255	http://bugzilla.mozilla.org/show_bug.cgi?id=124335</a>. This fixes a
				1256	bug in Mozilla which assumes that memory returned from
				1257	<code>malloc</code> is 8-aligned. Valgrind's allocator only
				1258	guarantees 4-alignment, so without the patch Mozilla makes an illegal
				1259	memory access, which Valgrind of course spots, and then bombs.
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	1260	Mozilla 1.0RC2 works fine out-of-the-box.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1261
				1262
				1263	<a name="install"></a>
				1264	<h3>2.10  Building and installing</h3>
				1265
				1266	We now use the standard Unix <code>./configure</code>,
				1267	<code>make</code>, <code>make install</code> mechanism, and I have
				1268	attempted to ensure that it works on machines with kernel 2.2 or 2.4
				1269	and glibc 2.1.X or 2.2.X. I don't think there is much else to say.
				1270	There are no options apart from the usual <code>--prefix</code> that
				1271	you should give to <code>./configure</code>.
				1272	<p>
				1273	Let me know if you have build problems.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1274
				1275
				1276
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1277	<a name="problems"></a>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1278	<h3>2.11  If you have problems</h3>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1279	Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
				1280
				1281	<p>See <a href="#limits">Section 4</a> for the known limitations of
				1282	Valgrind, and for a list of programs which are known not to work on
				1283	it.
				1284
				1285	<p>The translator/instrumentor has a lot of assertions in it. They
				1286	are permanently enabled, and I have no plans to disable them. If one
				1287	of these breaks, please mail me!
				1288
				1289	<p>If you get an assertion failure on the expression
				1290	<code>chunkSane(ch)</code> in <code>vg_free()</code> in
				1291	<code>vg_malloc.c</code>, this may have happened because your program
				1292	wrote off the end of a malloc'd block, or before its beginning.
				1293	Valgrind should have emitted a proper message to that effect before
				1294	dying in this way. This is a known problem which I should fix.
				1295	<p>
				1296
				1297	<hr width="100%">
				1298
				1299	<a name="machine"></a>
				1300	<h2>3  Details of the checking machinery</h2>
				1301
				1302	Read this section if you want to know, in detail, exactly what and how
				1303	Valgrind is checking.
				1304
				1305	<a name="vvalue"></a>
				1306	<h3>3.1  Valid-value (V) bits</h3>
				1307
				1308	It is simplest to think of Valgrind implementing a synthetic Intel x86
				1309	CPU which is identical to a real CPU, except for one crucial detail.
				1310	Every bit (literally) of data processed, stored and handled by the
				1311	real CPU has, in the synthetic CPU, an associated "valid-value" bit,
				1312	which says whether or not the accompanying bit has a legitimate value.
				1313	In the discussions which follow, this bit is referred to as the V
				1314	(valid-value) bit.
				1315
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1316	<p>Each byte in the system therefore has 8 V bits which follow
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1317	it wherever it goes. For example, when the CPU loads a word-size item
				1318	(4 bytes) from memory, it also loads the corresponding 32 V bits from
				1319	a bitmap which stores the V bits for the process' entire address
				1320	space. If the CPU should later write the whole or some part of that
				1321	value to memory at a different address, the relevant V bits will be
				1322	stored back in the V-bit bitmap.
				1323
				1324	<p>In short, each bit in the system has an associated V bit, which
				1325	follows it around everywhere, even inside the CPU. Yes, the CPU's
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1326	(integer and <code>%eflags</code>) registers have their own V bit
				1327	vectors.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1328
				1329	<p>Copying values around does not cause Valgrind to check for, or
				1330	report on, errors. However, when a value is used in a way which might
				1331	conceivably affect the outcome of your program's computation, the
				1332	associated V bits are immediately checked. If any of these indicate
				1333	that the value is undefined, an error is reported.
				1334
				1335	<p>Here's an (admittedly nonsensical) example:
				1336	<pre>
				1337	int i, j;
				1338	int a[10], b[10];
				1339	for (i = 0; i < 10; i++) {
				1340	j = a[i];
				1341	b[i] = j;
				1342	}
				1343	</pre>
				1344
				1345	<p>Valgrind emits no complaints about this, since it merely copies
				1346	uninitialised values from <code>a[]</code> into <code>b[]</code>, and
				1347	doesn't use them in any way. However, if the loop is changed to
				1348	<pre>
				1349	for (i = 0; i < 10; i++) {
				1350	j += a[i];
				1351	}
				1352	if (j == 77)
				1353	printf("hello there\n");
				1354	</pre>
				1355	then Valgrind will complain, at the <code>if</code>, that the
				1356	condition depends on uninitialised values.
				1357
				1358	<p>Most low level operations, such as adds, cause Valgrind to
				1359	use the V bits for the operands to calculate the V bits for the
				1360	result. Even if the result is partially or wholly undefined,
				1361	it does not complain.
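<p>
To give a flavour of what "calculating the V bits for the result"
might look like, here is a deliberately conservative sketch. It is not
Valgrind's actual instrumentation: the function names are hypothetical,
and the convention that a set bit means "defined" is chosen only for
this example.

```c
#include <assert.h>
#include <stdint.h>

/* Convention used only in this sketch: a set bit in a V mask means the
   corresponding data bit is defined. */

/* Bitwise operations, approximated: a result bit is treated as defined
   only if both operand bits are defined. */
uint32_t vbits_bitwise(uint32_t va, uint32_t vb)
{
    return va & vb;
}

/* Addition: result bit k depends on operand bits 0..k (because of
   carries), so every bit at or above the lowest undefined operand bit
   is treated as undefined. */
uint32_t vbits_add(uint32_t va, uint32_t vb)
{
    uint32_t undef = ~(va & vb);
    if (undef == 0)
        return 0xFFFFFFFFu;                    /* everything defined */
    uint32_t lowest = undef & (~undef + 1u);   /* isolate lowest set bit */
    return lowest - 1u;                        /* bits below it stay defined */
}
```

Note that neither function reports anything. They only compute the
definedness of the result, which is carried along with the value.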
				1362
				1363	<p>Checks on definedness only occur in two places: when a value is
				1364	used to generate a memory address, and where a control flow decision
				1365	needs to be made. Also, when a system call is detected, Valgrind
				1366	checks definedness of parameters as required.
				1367
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1368	<p>If a check should detect undefinedness, an error message is
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1369	issued. The resulting value is subsequently regarded as well-defined.
				1370	To do otherwise would give long chains of error messages. In effect,
				1371	we say that undefined values are non-infectious.
				1372
				1373	<p>This sounds overcomplicated. Why not just check all reads from
				1374	memory, and complain if an undefined value is loaded into a CPU register?
				1375	Well, that doesn't work well, because perfectly legitimate C programs routinely
				1376	copy uninitialised values around in memory, and we don't want endless complaints
				1377	about that. Here's the canonical example. Consider a struct
				1378	like this:
				1379	<pre>
				1380	struct S { int x; char c; };
				1381	struct S s1, s2;
				1382	s1.x = 42;
				1383	s1.c = 'z';
				1384	s2 = s1;
				1385	</pre>
				1386
				1387	<p>The question to ask is: how large is <code>struct S</code>, in
				1388	bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
				1389	occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
				1390	round the size of <code>struct S</code> up to a whole number of words,
				1391	in this case 8 bytes. Not doing this forces compilers to generate
				1392	truly appalling code for subscripting arrays of <code>struct
				1393	S</code>'s.
				1394
				1395	<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
				1396	For the assignment <code>s2 = s1</code>, gcc generates code to copy
				1397	all 8 bytes wholesale into <code>s2</code> without regard for their
				1398	meaning. If Valgrind simply checked values as they came out of
				1399	memory, it would yelp every time a structure assignment like this
				1400	happened. So the more complicated semantics described above is
				1401	necessary. This allows gcc to copy <code>s1</code> into
				1402	<code>s2</code> any way it likes, and a warning will only be emitted
				1403	if the uninitialised values are later used.
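<p>
You can measure the padding for yourself. The helper below is a small
illustration (the name <code>padding_bytes</code> is made up for this
example); it computes how many bytes of <code>struct S</code> belong to
no member.

```c
#include <assert.h>
#include <stddef.h>

struct S { int x; char c; };

/* Number of bytes in struct S that belong to no member.  These are the
   bytes that  s1.x = 42; s1.c = 'z';  never writes, yet which a
   wholesale copy such as  s2 = s1  may still move. */
size_t padding_bytes(void)
{
    return sizeof(struct S) - (sizeof(int) + sizeof(char));
}
```

On a typical x86 compiler with 4-byte ints, one would expect
<code>sizeof(struct S)</code> to be 8 and the padding to be 3 bytes,
matching the discussion above, but the exact figures are up to the
compiler.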
				1404
				1405	<p>One final twist to this story. The above scheme allows garbage to
				1406	pass through the CPU's integer registers without complaint. It does
				1407	this by giving the integer registers V tags, passing these around in
				1408	the expected way. This is complicated and computationally expensive to
				1409	do, but is necessary. Valgrind is more simplistic about
				1410	floating-point loads and stores. In particular, V bits for data read
				1411	as a result of floating-point loads are checked at the load
				1412	instruction. So if your program uses the floating-point registers to
				1413	do memory-to-memory copies, you will get complaints about
				1414	uninitialised values. Fortunately, I have not yet encountered a
				1415	program which (ab)uses the floating-point registers in this way.
				1416
				1417	<a name="vaddress"></a>
				1418	<h3>3.2  Valid-address (A) bits</h3>
				1419
				1420	Notice that the previous section describes how the validity of values
				1421	is established and maintained without having to say whether the
				1422	program does or does not have the right to access any particular
				1423	memory location. We now consider the latter issue.
				1424
				1425	<p>As described above, every bit in memory or in the CPU has an
				1426	associated valid-value (V) bit. In addition, all bytes in memory, but
				1427	not in the CPU, have an associated valid-address (A) bit. This
				1428	indicates whether or not the program can legitimately read or write
				1429	that location. It does not give any indication of the validity of the
				1430	data at that location -- that's the job of the V bits -- only whether
				1431	or not the location may be accessed.
				1432
				1433	<p>Every time your program reads or writes memory, Valgrind checks the
				1434	A bits associated with the address. If any of them indicate an
				1435	invalid address, an error is emitted. Note that the reads and writes
				1436	themselves do not change the A bits, only consult them.
				1437
				1438	<p>So how do the A bits get set/cleared? Like this:
				1439
				1440	<ul>
				1441	<li>When the program starts, all the global data areas are marked as
				1442	accessible.</li><br>
				1443	<p>
				1444
				1445	<li>When the program does malloc/new, the A bits for exactly the
				1446	area allocated, and not a byte more, are marked as accessible.
				1447	Upon freeing the area the A bits are changed to indicate
				1448	inaccessibility.</li><br>
				1449	<p>
				1450
				1451	<li>When the stack pointer register (%esp) moves up or down, A bits
				1452	are set. The rule is that the area from %esp up to the base of
				1453	the stack is marked as accessible, and below %esp is
				1454	inaccessible. (If that sounds illogical, bear in mind that the
				1455	stack grows down, not up, on almost all Unix systems, including
				1456	GNU/Linux.) Tracking %esp like this has the useful side-effect
				1457	that the section of stack used by a function for local variables
				1458	etc is automatically marked accessible on function entry and
				1459	inaccessible on exit.</li><br>
				1460	<p>
				1461
				1462	<li>When doing system calls, A bits are changed appropriately. For
				1463	example, mmap() magically makes files appear in the process's
				1464	address space, so the A bits must be updated if mmap()
				1465	succeeds.</li><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1466	<p>
				1467
				1468	<li>Optionally, your program can tell Valgrind about such changes
				1469	explicitly, using the client request mechanism described above.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1470	</ul>
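<p>
The <code>%esp</code> rule in the list above can be modelled with a toy
stack. This is a sketch under simplifying assumptions: one A flag per
byte, a 64-byte stack, and hypothetical names (<code>ToyStack</code>,
<code>stack_set_esp</code>) that do not come from Valgrind itself.

```c
#include <assert.h>
#include <string.h>

#define STACK_SIZE 64            /* toy stack, one A flag per byte */

/* Hypothetical model: a_bit[i] is 1 if stack byte i is accessible.
   Index STACK_SIZE is the base of the stack; the stack grows down. */
typedef struct {
    unsigned char a_bit[STACK_SIZE];
    int esp;                     /* current stack pointer, 0..STACK_SIZE */
} ToyStack;

void stack_init(ToyStack *st)
{
    memset(st->a_bit, 0, sizeof st->a_bit);
    st->esp = STACK_SIZE;        /* empty stack: nothing accessible yet */
}

/* Move the stack pointer and update A bits so that [esp, base) is
   accessible and everything below esp is not. */
void stack_set_esp(ToyStack *st, int new_esp)
{
    if (new_esp < st->esp)       /* stack grew: newly exposed bytes */
        memset(st->a_bit + new_esp, 1, (size_t)(st->esp - new_esp));
    else                         /* stack shrank: retire dead bytes */
        memset(st->a_bit + st->esp, 0, (size_t)(new_esp - st->esp));
    st->esp = new_esp;
}
```

A function's locals become accessible when its prologue lowers the
stack pointer, and inaccessible again when the epilogue raises it.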
				1471
				1472
				1473	<a name="together"></a>
				1474	<h3>3.3  Putting it all together</h3>
				1475	Valgrind's checking machinery can be summarised as follows:
				1476
				1477	<ul>
				1478	<li>Each byte in memory has 8 associated V (valid-value) bits,
				1479	saying whether or not the byte has a defined value, and a single
				1480	A (valid-address) bit, saying whether or not the program
				1481	currently has the right to read/write that address.</li><br>
				1482	<p>
				1483
				1484	<li>When memory is read or written, the relevant A bits are
				1485	consulted. If they indicate an invalid address, Valgrind emits
				1486	an Invalid read or Invalid write error.</li><br>
				1487	<p>
				1488
				1489	<li>When memory is read into the CPU's integer registers, the
				1490	relevant V bits are fetched from memory and stored in the
				1491	simulated CPU. They are not consulted.</li><br>
				1492	<p>
				1493
				1494	<li>When an integer register is written out to memory, the V bits
				1495	for that register are written back to memory too.</li><br>
				1496	<p>
				1497
				1498	<li>When memory is read into the CPU's floating point registers, the
				1499	relevant V bits are read from memory and they are immediately
				1500	checked. If any are invalid, an uninitialised value error is
				1501	emitted. This precludes using the floating-point registers to
				1502	copy possibly-uninitialised memory, but simplifies Valgrind in
				1503	that it does not have to track the validity status of the
				1504	floating-point registers.</li><br>
				1505	<p>
				1506
				1507	<li>As a result, when a floating-point register is written to
				1508	memory, the associated V bits are set to indicate a valid
				1509	value.</li><br>
				1510	<p>
				1511
				1512	<li>When values in integer CPU registers are used to generate a
				1513	memory address, or to determine the outcome of a conditional
				1514	branch, the V bits for those values are checked, and an error
				1515	emitted if any of them are undefined.</li><br>
				1516	<p>
				1517
				1518	<li>When values in integer CPU registers are used for any other
				1519	purpose, Valgrind computes the V bits for the result, but does
				1520	not check them.</li><br>
				1521	<p>
				1522
				1523	<li>Once the V bits for a value in the CPU have been checked, they
				1524	are then set to indicate validity. This avoids long chains of
				1525	errors.</li><br>
				1526	<p>
				1527
				1528	<li>When values are loaded from memory, Valgrind checks the A bits
				1529	for that location and issues an illegal-address warning if
				1530	needed. In that case, the V bits loaded are forced to indicate
				1531	Valid, despite the location being invalid.
				1532	<p>
				1533	This apparently strange choice reduces the amount of confusing
				1534	information presented to the user. It avoids the
				1535	unpleasant phenomenon in which memory is read from a place which
				1536	is both unaddressible and contains invalid values, and, as a
				1537	result, you get not only an invalid-address (read/write) error,
				1538	but also a potentially large set of uninitialised-value errors,
				1539	one for every time the value is used.
				1540	<p>
				1541	There is a hazy boundary case to do with multi-byte loads from
				1542	addresses which are partially valid and partially invalid. See
				1543	the description of the flag <code>--partial-loads-ok</code> for details.
				1544	</li><br>
				1545	</ul>
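<p>
The integer load and store rules summarised above can be sketched with
a toy shadow memory. All names here (<code>Shadow</code>,
<code>shadow_load32</code>, <code>shadow_store32</code>) are
hypothetical, a set V bit means "defined" only by the convention of
this example, and the partially-valid-load boundary case is not
modelled: any unaddressable byte counts as an error.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 32

/* Hypothetical shadow state for a toy memory: one A flag and eight
   V bits per byte.  In this sketch a set V bit means "defined". */
typedef struct {
    uint8_t a[MEM_SIZE];   /* 1 = addressable */
    uint8_t v[MEM_SIZE];   /* per-bit definedness of the byte */
    int     addr_errors;   /* invalid read/write reports issued */
} Shadow;

/* Integer load of 4 bytes at addr: check the A bits, then fetch the
   V bits without checking them.  If any byte is unaddressable, report
   once and force the loaded V bits to "defined" so that no follow-on
   uninitialised-value errors appear. */
uint32_t shadow_load32(Shadow *sh, int addr)
{
    uint32_t vbits = 0;
    int bad = 0;
    for (int i = 0; i < 4; i++) {
        if (!sh->a[addr + i])
            bad = 1;
        vbits |= (uint32_t)sh->v[addr + i] << (8 * i);
    }
    if (bad) {
        sh->addr_errors++;
        return 0xFFFFFFFFu;
    }
    return vbits;
}

/* Integer store: check the A bits, then write the register's V bits
   back to the shadow memory. */
void shadow_store32(Shadow *sh, int addr, uint32_t vbits)
{
    for (int i = 0; i < 4; i++) {
        if (!sh->a[addr + i]) {
            sh->addr_errors++;
            return;
        }
    }
    for (int i = 0; i < 4; i++)
        sh->v[addr + i] = (uint8_t)(vbits >> (8 * i));
}
```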
				1546
				1547	Valgrind intercepts calls to malloc, calloc, realloc, valloc,
				1548	memalign, free, new and delete. The behaviour you get is:
				1549
				1550	<ul>
				1551
				1552	<li>malloc/new: the returned memory is marked as addressible but not
				1553	having valid values. This means you have to write on it before
				1554	you can read it.</li><br>
				1555	<p>
				1556
				1557	<li>calloc: returned memory is marked both addressible and valid,
				1558	since calloc() clears the area to zero.</li><br>
				1559	<p>
				1560
				1561	<li>realloc: if the new size is larger than the old, the new section
				1562	is addressible but invalid, as with malloc.
				1565	If the new size is smaller, the dropped-off section is marked as
				1566	unaddressible. You may only pass to realloc a pointer
				1567	previously issued to you by malloc/calloc/new/realloc.</li><br>
				1568	<p>
				1569
				1570	<li>free/delete: you may only pass to free a pointer previously
				1571	issued to you by malloc/calloc/new/realloc, or the value
				1572	NULL. Otherwise, Valgrind complains. If the pointer is indeed
				1573	valid, Valgrind marks the entire area it points at as
				1574	unaddressible, and places the block in the freed-blocks-queue.
				1575	The aim is to defer as long as possible reallocation of this
				1576	block. Until that happens, all attempts to access it will
				1577	elicit an invalid-address error, as you would hope.</li><br>
				1578	</ul>
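<p>
The allocator behaviour listed above amounts to a small per-byte state
machine. The sketch below models it on a toy arena. The three state
names and the <code>on_*</code> functions are hypothetical, and
realloc is assumed to resize in place, which a real allocator need not
do.

```c
#include <assert.h>
#include <string.h>

/* Per-byte state in this sketch: not addressable, addressable but
   holding no valid value, or addressable and valid. */
enum { NOACCESS = 0, UNDEFINED = 1, DEFINED = 2 };

#define ARENA 64
typedef struct { unsigned char state[ARENA]; } ToyHeap;

/* malloc/new: addressable, but must be written before being read. */
void on_malloc(ToyHeap *h, int off, int n)
{
    memset(h->state + off, UNDEFINED, (size_t)n);
}

/* calloc: addressable and valid, since the area is zeroed. */
void on_calloc(ToyHeap *h, int off, int n)
{
    memset(h->state + off, DEFINED, (size_t)n);
}

/* free/delete: the whole block becomes unaddressable. */
void on_free(ToyHeap *h, int off, int n)
{
    memset(h->state + off, NOACCESS, (size_t)n);
}

/* realloc (assumed in place): a grown tail is like fresh malloc'd
   memory, a dropped-off tail becomes unaddressable. */
void on_realloc(ToyHeap *h, int off, int old_n, int new_n)
{
    if (new_n > old_n)
        memset(h->state + off + old_n, UNDEFINED, (size_t)(new_n - old_n));
    else
        memset(h->state + off + new_n, NOACCESS, (size_t)(old_n - new_n));
}
```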
				1579
				1580
				1581
				1582	<a name="signals"></a>
				1583	<h3>3.4  Signals</h3>
				1584
				1585	Valgrind provides suitable handling of signals, so, provided you stick
				1586	to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
				1587	are handled. Signal handlers may return in the normal way or do
				1588	longjmp(); both should work ok. As specified by POSIX, a signal is
				1589	blocked in its own handler. Default actions for signals should work
				1590	as before. Etc, etc.
				1591
				1592	<p>Under the hood, dealing with signals is a real pain, and Valgrind's
				1593	simulation leaves much to be desired. If your program does
				1594	way-strange stuff with signals, bad things may happen. If so, let me
				1595	know. I don't promise to fix it, but I'd at least like to be aware of
				1596	it.
				1597
				1598
				1599	<a name="leaks"></a>
				1600	<h3>3.5  Memory leak detection</h3>
				1601
				1602	Valgrind keeps track of all memory blocks issued in response to calls
				1603	to malloc/calloc/realloc/new. So when the program exits, it knows
				1604	which blocks are still outstanding -- have not been returned, in other
				1605	words. Ideally, you want your program to have no blocks still in use
				1606	at exit. But many programs do.
				1607
				1608	<p>For each such block, Valgrind scans the entire address space of the
				1609	process, looking for pointers to the block. One of three situations
				1610	may result:
				1611
				1612	<ul>
				1613	<li>A pointer to the start of the block is found. This usually
				1614	indicates programming sloppiness; since the block is still
				1615	pointed at, the programmer could, at least in principle, have free'd
				1616	it before program exit.</li><br>
				1617	<p>
				1618
				1619	<li>A pointer to the interior of the block is found. The pointer
				1620	might originally have pointed to the start and have been moved
				1621	along, or it might be entirely unrelated. Valgrind deems such a
				1622	block as "dubious", that is, possibly leaked,
				1623	because it's unclear whether or
				1624	not a pointer to it still exists.</li><br>
				1625	<p>
				1626
				1627	<li>The worst outcome is that no pointer to the block can be found.
				1628	The block is classified as "leaked", because the
				1629	programmer could not possibly have free'd it at program exit,
				1630	since no pointer to it exists. This might be a symptom of
				1631	having lost the pointer at some earlier point in the
				1632	program.</li>
				1633	</ul>
				1634
				1635	Valgrind reports summaries about leaked and dubious blocks.
				1636	For each such block, it will also tell you where the block was
				1637	allocated. This should help you figure out why the pointer to it has
				1638	been lost. In general, you should attempt to ensure your programs do
				1639	not have any leaked or dubious blocks at exit.
				1640
				1641	<p>The precise area of memory in which Valgrind searches for pointers
				1642	is: all naturally-aligned 4-byte words for which all A bits indicate
				1643	addressibility and all V bits indicate that the stored value is
				1644	actually valid.
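<p>
The three-way classification described in this section can be sketched
as below. This is an illustration only: the <code>LeakKind</code> names
and <code>classify_block</code> are hypothetical, and the
<code>words</code> array stands in for every aligned, addressible,
valid word that the real scan would examine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REACHABLE, DUBIOUS, LEAKED } LeakKind;

/* Classify one heap block by scanning an array of candidate words.
   A pointer to the block start wins over interior pointers; if no
   word points anywhere into the block, it is leaked. */
LeakKind classify_block(uintptr_t block, size_t size,
                        const uintptr_t *words, size_t n_words)
{
    LeakKind kind = LEAKED;
    for (size_t i = 0; i < n_words; i++) {
        uintptr_t w = words[i];
        if (w == block)
            return REACHABLE;              /* pointer to the start */
        if (w > block && w < block + size)
            kind = DUBIOUS;                /* pointer into the interior */
    }
    return kind;
}
```

Note that the scan cannot tell a genuine pointer from an integer that
happens to hold the same bit pattern, which is one reason interior
matches are only regarded as dubious.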
				1645
				1646	<p><hr width="100%">
				1647
				1648
				1649	<a name="limits"></a>
				1650	<h2>4  Limitations</h2>
				1651
				1652	The following list of limitations seems depressingly long. However,
				1653	most programs actually work fine.
				1654
				1655	<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1656	a kernel 2.2.X or 2.4.X system, subject to the following constraints:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1657
				1658	<ul>
				1659	<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
				1660	encounters these, Valgrind will simply give up. It may be
				1661	possible to add support for them at a later time. Intel added a
				1662	few instructions such as "cmov" to the integer instruction set
				1663	on Pentium and later processors, and these are supported.
				1664	Nevertheless it's safest to think of Valgrind as implementing
				1665	the 486 instruction set.</li><br>
				1666	<p>
				1667
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1668	<li>Pthreads support is improving, but there are still significant
				1669	limitations in that department. See the section above on
				1670	Pthreads. Note that your program must be dynamically linked
				1671	against <code>libpthread.so</code>, so that Valgrind can
				1672	substitute its own implementation at program startup time. If
				1673	you're statically linked against it, things will fail
				1674	badly.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1675	<p>
				1676
				1677	<li>Valgrind assumes that the floating point registers are not used
				1678	as intermediaries in memory-to-memory copies, so it immediately
				1679	checks V bits in floating-point loads/stores. If you want to
				1680	write code which copies around possibly-uninitialised values,
				1681	you must ensure these travel through the integer registers, not
				1682	the FPU.</li><br>
				1683	<p>
				1684
				1685	<li>If your program does its own memory management, rather than
				1686	using malloc/new/free/delete, it should still work, but
				1687	Valgrind's error checking won't be so effective.</li><br>
				1688	<p>
				1689
				1690	<li>Valgrind's signal simulation is not as robust as it could be.
				1691	Basic POSIX-compliant sigaction and sigprocmask functionality is
				1692	supplied, but it's conceivable that things could go badly awry
				1693	if you do weird things with signals. Workaround: don't.
				1694	Programs that do non-POSIX signal tricks are in any case
				1695	inherently unportable, so should be avoided if
				1696	possible.</li><br>
				1697	<p>
				1698
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1699	<li>Programs which try to handle signals on
				1700	an alternate stack (sigaltstack) are not supported, although
				1701	they could be, with a bit of effort.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1702	<p>
				1703
				1704	<li>Programs which switch stacks are not well handled. Valgrind
				1705	does have support for this, but I don't have great faith in it.
				1706	It's difficult -- there's no cast-iron way to decide whether a
				1707	large change in %esp is as a result of the program switching
				1708	stacks, or merely allocating a large object temporarily on the
				1709	current stack -- yet Valgrind needs to handle the two situations
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1710	differently. 1 May 02: this probably interacts badly with the
				1711	new pthread support. I haven't checked properly.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1712	<p>
				1713
				1714	<li>x86 instructions, and system calls, have been implemented on
				1715	demand. So it's possible, although unlikely, that a program
				1716	will fall over with a message to that effect. If this happens,
				1717	please mail me ALL the details printed out, so I can try and
				1718	implement the missing feature.</li><br>
				1719	<p>
				1720
				1721	<li>x86 floating point works correctly, but floating-point code may
				1722	run even more slowly than integer code, due to my simplistic
				1723	approach to FPU emulation.</li><br>
				1724	<p>
				1725
				1726	<li>You can't Valgrind-ize statically linked binaries. Valgrind
				1727	relies on the dynamic-link mechanism to gain control at
				1728	startup.</li><br>
				1729	<p>
				1730
				1731	<li>Memory consumption of your program is majorly increased whilst
				1732	running under Valgrind. This is due to the large amount of
				1733	administrative information maintained behind the scenes. Another
				1734	cause is that Valgrind dynamically translates the original
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	1735	executable. Translated, instrumented code is 14-16 times larger
				1736	than the original (!) so you can easily end up with 30+ MB of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1737	translations when running (eg) a web browser.
				1738	</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1739	</ul>
				1740
				1741
				1742	Programs which are known not to work are:
				1743
				1744	<ul>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1745	<li>emacs starts up but immediately concludes it is out of memory
				1746	and aborts. Emacs has its own memory-management scheme, but I
				1747	don't understand why this should interact so badly with
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1748	Valgrind. Emacs works fine if you build it to use the standard
				1749	malloc/free routines.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1750	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1751	</ul>
				1752
				1753
				1754	<p><hr width="100%">
				1755
				1756
				1757	<a name="howitworks"></a>
				1758	<h2>5  How it works -- a rough overview</h2>
				1759	Some gory details, for those with a passion for gory details. You
				1760	don't need to read this section if all you want to do is use Valgrind.
				1761
				1762	<a name="startb"></a>
				1763	<h3>5.1  Getting started</h3>
				1764
				1765	Valgrind is compiled into a shared object, valgrind.so. The shell
				1766	script valgrind sets the LD_PRELOAD environment variable to point to
				1767	valgrind.so. This causes the .so to be loaded as an extra library to
				1768	any subsequently executed dynamically-linked ELF binary, viz, the
				1769	program you want to debug.
				1770
				1771	<p>The dynamic linker allows each .so in the process image to have an
				1772	initialisation function which is run before main(). It also allows
				1773	each .so to have a finalisation function run after main() exits.
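<p>
One common way to get such initialisation and finalisation functions
is the GCC/Clang <code>constructor</code> and <code>destructor</code>
function attributes, shown below. This is an assumption made for
illustration; the text does not say which mechanism valgrind.so uses,
and the function and variable names are invented.

```c
#include <assert.h>

int init_ran = 0;   /* set by the initialisation function */
int fini_ran = 0;   /* set by the finalisation function */

/* GCC/Clang extension: run before main(), in the same way as a shared
   object's initialisation function. */
__attribute__((constructor))
void my_init(void)
{
    init_ran = 1;
}

/* GCC/Clang extension: run after main() returns or exit() is called. */
__attribute__((destructor))
void my_fini(void)
{
    fini_ran = 1;
}
```

By the time <code>main()</code> starts, <code>init_ran</code> is
already 1, even though nothing in the program calls
<code>my_init</code> explicitly.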
				1774
				1775	<p>When valgrind.so's initialisation function is called by the dynamic
				1776	linker, the synthetic CPU starts up. The real CPU remains locked
				1777	in valgrind.so for the entire rest of the program, but the synthetic
				1778	CPU returns from the initialisation function. Startup of the program
				1779	now continues as usual -- the dynamic linker calls all the other .so's
				1780	initialisation routines, and eventually runs main(). This all runs on
				1781	the synthetic CPU, not the real one, but the client program cannot
				1782	tell the difference.
				1783
				1784	<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
				1785	finalisation function. Valgrind detects this, and uses it as its cue
				1786	to exit. It prints summaries of all errors detected, possibly checks
				1787	for memory leaks, and then exits the finalisation routine, but now on
				1788	the real CPU. The synthetic CPU has now lost control -- permanently
				1789	-- so the program exits back to the OS on the real CPU, just as it
				1790	would have done anyway.
				1791
				1792	<p>On entry, Valgrind switches stacks, so it runs on its own stack.
				1793	On exit, it switches back. This means that the client program
				1794	continues to run on its own stack, so we can switch back and forth
				1795	between running it on the simulated and real CPUs without difficulty.
				1796	This was an important design decision, because it makes it easy (well,
				1797	significantly less difficult) to debug the synthetic CPU.
				1798
				1799
				1800	<a name="engine"></a>
				1801	<h3>5.2  The translation/instrumentation engine</h3>
				1802
				1803	Valgrind does not directly run any of the original program's code. Only
				1804	instrumented translations are run. Valgrind maintains a translation
				1805	table, which allows it to find the translation quickly for any branch
				1806	target (code address). If no translation has yet been made, the
				1807	translator - a just-in-time translator - is summoned. This makes an
				1808	instrumented translation, which is added to the collection of
				1809	translations. Subsequent jumps to that address will use this
				1810	translation.
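<p>The lookup-then-translate behaviour described above can be sketched
roughly as follows. This is only an illustration, not Valgrind's code: the
class <code>TranslationTable</code>, the function <code>fake_translate</code>
and the use of strings as "translations" are all made up for the example.

```python
from typing import Callable, Dict


class TranslationTable:
    """Maps a code address to its instrumented translation (illustrative only)."""

    def __init__(self, translate: Callable[[int], str]) -> None:
        self._translate = translate          # stand-in for the JIT translator
        self._table: Dict[int, str] = {}     # code address -> translation
        self.misses = 0                      # how often the translator was summoned

    def lookup(self, target: int) -> str:
        """Return the translation for a branch target, making it on first use."""
        translation = self._table.get(target)
        if translation is None:
            self.misses += 1
            translation = self._translate(target)
            self._table[target] = translation
        return translation


def fake_translate(addr: int) -> str:
    # Hypothetical translator: a real one would emit instrumented machine code.
    return f"translation-of-{addr:#x}"


table = TranslationTable(fake_translate)
first = table.lookup(0x8048724)
second = table.lookup(0x8048724)   # served from the table, no retranslation
```

The second lookup returns the stored object, so the translator runs only once
per address.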
				1811
sewardj	18d7513	2002-05-16 11:06:21 +0000	[diff] [blame^]	1812	<p>Valgrind no longer directly supports detection of self-modifying
				1813	code. Such checking is expensive, and in practice (fortunately)
				1814	almost no applications need it. However, to help people who are
				1815	debugging dynamic code generation systems, there is a Client Request
				1816	(basically a macro you can put in your program) which directs Valgrind
				1817	to discard translations in a given address range. So Valgrind can
				1818	still work in this situation provided the client tells it when
				1819	code has become out-of-date and needs to be retranslated.
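<p>As a rough picture of what discarding translations in an address range
amounts to, here is a small sketch. The function name
<code>discard_translations</code> and the dictionary keyed by block start
address are assumptions for the example; the real request and its data
structures are not shown in this manual.

```python
from typing import Dict


def discard_translations(table: Dict[int, str], start: int, length: int) -> int:
    """Drop every translation whose code address lies in [start, start + length).

    Returns the number of translations discarded. Keying by the block's start
    address only is a simplification: a block that begins before `start` but
    extends into the range would also have to go in a real implementation.
    """
    stale = [addr for addr in table if start <= addr < start + length]
    for addr in stale:
        del table[addr]
    return len(stale)


translations = {0x1000: "A", 0x1010: "B", 0x2000: "C"}
dropped = discard_translations(translations, 0x1000, 0x20)
```

After the call, the next jump to a discarded address would find no
translation and so trigger a fresh one.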
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1820
				1821	<p>The JITter translates basic blocks -- blocks of straight-line-code
				1822	-- as single entities. To minimise the considerable difficulties of
				1823	dealing with the x86 instruction set, x86 instructions are first
				1824	translated to a RISC-like intermediate code, similar to SPARC code,
				1825	but with an infinite number of virtual integer registers. Initially
				1826	each insn is translated separately, and there is no attempt at
				1827	instrumentation.
				1828
				1829	<p>The intermediate code is improved, mostly so as to try and cache
				1830	the simulated machine's registers in the real machine's registers over
				1831	several simulated instructions. This is often very effective. Also,
				1832	we try to remove redundant updates of the simulated machine's
				1833	condition-code register.
				1834
				1835	<p>The intermediate code is then instrumented, giving more
				1836	intermediate code. There are a few extra intermediate-code operations
				1837	to support instrumentation; it is all refreshingly simple. After
				1838	instrumentation there is a cleanup pass to remove redundant value
				1839	checks.
				1840
				1841	<p>This gives instrumented intermediate code which mentions arbitrary
				1842	numbers of virtual registers. A linear-scan register allocator is
				1843	used to assign real registers and possibly generate spill code. All
				1844	of this is still phrased in terms of the intermediate code. This
				1845	machinery is inspired by the work of Reuben Thomas (MITE).
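<p>For readers who have not met linear-scan allocation, the following is a
sketch of the textbook algorithm, not of Valgrind's allocator. Live ranges
are given as hypothetical <code>(name, first use, last use)</code> triples,
and a spilled virtual register is simply marked <code>"spill"</code> rather
than having spill code generated for it.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (virtual register, first use, last use)


def linear_scan(intervals: List[Interval], num_regs: int) -> Dict[str, str]:
    """Assign each virtual register a real register name or the marker 'spill'."""
    assignment: Dict[str, str] = {}
    free = [f"r{i}" for i in range(num_regs)]
    active: List[Interval] = []  # live intervals currently holding a register

    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts.
        for old in [iv for iv in active if iv[2] < start]:
            active.remove(old)
            free.append(assignment[old[0]])

        if free:
            assignment[name] = free.pop(0)
            active.append((name, start, end))
            continue

        # No register free: spill whichever live interval ends last.
        victim = max(active, key=lambda iv: iv[2], default=None)
        if victim is not None and victim[2] > end:
            assignment[name] = assignment[victim[0]]
            assignment[victim[0]] = "spill"
            active.remove(victim)
            active.append((name, start, end))
        else:
            assignment[name] = "spill"

    return assignment


demo = linear_scan([("t0", 0, 5), ("t1", 1, 3), ("t2", 2, 8), ("t3", 4, 6)], 2)
```

With two registers, <code>t2</code> is spilled because both registers are
busy when it starts and it outlives everything else; <code>t3</code> then
reuses the register freed by <code>t1</code>.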
				1846
				1847	<p>Then, and only then, is the final x86 code emitted. The
				1848	intermediate code is carefully designed so that x86 code can be
				1849	generated from it without need for spare registers or other
				1850	inconveniences.
				1851
				1852	<p>The translations are managed using a traditional LRU-based caching
				1853	scheme. The translation cache has a default size of about 14MB.
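<p>A least-recently-used cache with a byte budget can be sketched as below.
This shows the general idea only: the class name, the per-entry eviction and
the tiny 8-byte budget are assumptions for the example, and Valgrind's own
cache management may well differ in detail.

```python
from collections import OrderedDict
from typing import Optional


class LRUTranslationCache:
    """Byte-budgeted LRU cache: the least recently used entries go first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[int, bytes]" = OrderedDict()

    def get(self, addr: int) -> Optional[bytes]:
        code = self._entries.get(addr)
        if code is not None:
            self._entries.move_to_end(addr)   # mark as most recently used
        return code

    def put(self, addr: int, code: bytes) -> None:
        if addr in self._entries:
            self.used_bytes -= len(self._entries.pop(addr))
        self._entries[addr] = code
        self.used_bytes += len(code)
        # Evict from the least recently used end until we fit the budget.
        while self.used_bytes > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = LRUTranslationCache(capacity_bytes=8)
cache.put(0x10, b"aaaa")
cache.put(0x20, b"bbbb")
cache.get(0x10)             # touch 0x10 so that 0x20 becomes the oldest entry
cache.put(0x30, b"cccc")    # over budget: 0x20 is evicted
```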
				1854
				1855	<a name="track"></a>
				1856
				1857	<h3>5.3  Tracking the status of memory</h3> Each byte in the
				1858	process' address space has nine bits associated with it: one A bit and
				1859	eight V bits. The A and V bits for each byte are stored using a
				1860	sparse array, which flexibly and efficiently covers arbitrary parts of
				1861	the 32-bit address space without imposing significant space or
				1862	performance overheads for the parts of the address space never
				1863	visited. The scheme used, and speedup hacks, are described in detail
				1864	at the top of the source file vg_memory.c, so you should read that for
				1865	the gory details.
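<p>To give a feel for why a sparse array keeps the overhead low, here is a
minimal two-level sketch. It is not the scheme in vg_memory.c: the 64KB chunk
size, the default values for unvisited memory and the use of a whole byte to
hold each A bit are all simplifying assumptions.

```python
from typing import Dict, Tuple

CHUNK_BITS = 16                 # assumed chunk size: 64KB of address space
CHUNK_SIZE = 1 << CHUNK_BITS


class ShadowMemory:
    """Two-level sparse map holding one A bit and eight V bits per byte."""

    def __init__(self) -> None:
        # Only chunks that have been written to are materialised.
        self._chunks: Dict[int, Tuple[bytearray, bytearray]] = {}

    def _chunk(self, addr: int) -> Tuple[bytearray, bytearray]:
        key = addr >> CHUNK_BITS
        if key not in self._chunks:
            # Fresh chunks start with all A and V bits clear; what "clear"
            # means is left to the caller in this sketch.
            self._chunks[key] = (bytearray(CHUNK_SIZE), bytearray(CHUNK_SIZE))
        return self._chunks[key]

    def set(self, addr: int, a_bit: int, v_bits: int) -> None:
        a, v = self._chunk(addr)
        offset = addr & (CHUNK_SIZE - 1)
        a[offset] = a_bit & 1
        v[offset] = v_bits & 0xFF

    def get(self, addr: int) -> Tuple[int, int]:
        chunk = self._chunks.get(addr >> CHUNK_BITS)
        if chunk is None:
            return (0, 0)       # never-visited memory costs no storage
        offset = addr & (CHUNK_SIZE - 1)
        return (chunk[0][offset], chunk[1][offset])

    def chunks_allocated(self) -> int:
        return len(self._chunks)


shadow = ShadowMemory()
shadow.set(0xBFFFF74C, a_bit=1, v_bits=0xFF)
```

Only one chunk is allocated here, even though the address is near the top of
the 32-bit address space.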
				1866
				1867	<a name="sys_calls"></a>
				1868
				1869	<h3>5.4 System calls</h3>
				1870	All system calls are intercepted. The memory status map is consulted
				1871	before and updated after each call. It's all rather tiresome. See
				1872	vg_syscall_mem.c for details.
				1873
				1874	<a name="sys_signals"></a>
				1875
				1876	<h3>5.5  Signals</h3>
				1877	All system calls to sigaction() and sigprocmask() are intercepted. If
				1878	the client program is trying to set a signal handler, Valgrind makes a
				1879	note of the handler address and which signal it is for. Valgrind then
				1880	arranges for the same signal to be delivered to its own handler.
				1881
				1882	<p>When such a signal arrives, Valgrind's own handler catches it, and
				1883	notes the fact. At a convenient safe point in execution, Valgrind
				1884	builds a signal delivery frame on the client's stack and runs the client's
				1885	handler. If the handler longjmp()s, there is nothing more to be said.
				1886	If the handler returns, Valgrind notices this, zaps the delivery
				1887	frame, and carries on where it left off before delivering the signal.
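<p>The note-now, deliver-later idea can be modelled in a few lines. This is a
pure simulation with no real signals involved; <code>SignalRouter</code> and
its method names are invented for the example and do not correspond to
anything in vg_signals.c.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class SignalRouter:
    """Records client handlers and defers delivery to a safe point."""

    def __init__(self) -> None:
        self._client_handlers: Dict[int, Callable[[int], None]] = {}
        self._pending: Deque[int] = deque()

    def intercept_sigaction(self, signo: int, handler: Callable[[int], None]) -> None:
        # Note the client's handler instead of handing it to the kernel.
        self._client_handlers[signo] = handler

    def on_real_signal(self, signo: int) -> None:
        # Runs in the tool's own handler: just note that the signal arrived.
        if signo in self._client_handlers:
            self._pending.append(signo)

    def deliver_at_safe_point(self) -> int:
        """Run queued client handlers; return how many were run."""
        delivered = 0
        while self._pending:
            signo = self._pending.popleft()
            self._client_handlers[signo](signo)
            delivered += 1
        return delivered


log: List[int] = []
router = SignalRouter()
router.intercept_sigaction(10, log.append)
router.on_real_signal(10)
seen_before_safe_point = list(log)      # still empty: nothing delivered yet
count = router.deliver_at_safe_point()
```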
				1888
				1889	<p>The purpose of this nonsense is that setting signal handlers
				1890	essentially amounts to giving callback addresses to the Linux kernel.
				1891	We can't allow this to happen, because if it did, signal handlers
				1892	would run on the real CPU, not the simulated one. This means the
				1893	checking machinery would not operate during the handler run, and,
				1894	worse, memory permissions maps would not be updated, which could cause
				1895	spurious error reports once the handler had returned.
				1896
				1897	<p>An even worse thing would happen if the signal handler longjmp'd
				1898	rather than returned: Valgrind would completely lose control of the
				1899	client program.
				1900
				1901	<p>Upshot: we can't allow the client to install signal handlers
				1902	directly. Instead, Valgrind must catch, on behalf of the client, any
				1903	signal the client asks to catch, and must deliver it to the client on
				1904	the simulated CPU, not the real one. This involves considerable
				1905	gruesome fakery; see vg_signals.c for details.
				1906	<p>
				1907
				1908	<hr width="100%">
				1909
				1910	<a name="example"></a>
				1911	<h2>6  Example</h2>
				1912	This is the log for a run of a small program. The program is in fact
				1913	correct, and the reported error is the result of a potentially serious
				1914	code generation bug in GNU g++ (snapshot 20010527).
				1915	<pre>
				1916	sewardj@phoenix:~/newmat10$
				1917	~/Valgrind-6/valgrind -v ./bogon
				1918	==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
				1919	==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
				1920	==25832== Startup, with flags:
				1921	==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
				1922	==25832== reading syms from /lib/ld-linux.so.2
				1923	==25832== reading syms from /lib/libc.so.6
				1924	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
				1925	==25832== reading syms from /lib/libm.so.6
				1926	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
				1927	==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
				1928	==25832== reading syms from /proc/self/exe
				1929	==25832== loaded 5950 symbols, 142333 line number locations
				1930	==25832==
				1931	==25832== Invalid read of size 4
				1932	==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
				1933	==25832== by 0x80487AF: main (bogon.cpp:66)
				1934	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				1935	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				1936	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				1937	==25832==
				1938	==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
				1939	==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
				1940	==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
				1941	==25832== For a detailed leak analysis, rerun with: --leak-check=yes
				1942	==25832==
				1943	==25832== exiting, did 1881 basic blocks, 0 misses.
				1944	==25832== 223 translations, 3626 bytes in, 56801 bytes out.
				1945	</pre>
				1946	<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
				1947	<hr width="100%">
				1948	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1949
				1950
				1951
				1952	<a name="cache"></a>
				1953	<h2>7  Cache profiling</h2>
				1954	As well as memory debugging, Valgrind also allows you to do cache simulations
				1955	and annotate your source line-by-line with the number of cache misses. In
				1956	particular, it records:
				1957	<ul>
				1958	<li>L1 instruction cache reads and misses;
				1959	<li>L1 data cache reads and read misses, writes and write misses;
				1960	<li>L2 unified cache reads and read misses, writes and write misses.
				1961	</ul>
				1962	On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
				1963	and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
njn	7cfd572	2002-05-03 17:51:10 +0000	[diff] [blame]	1964	very useful for improving the performance of your program.<p>
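<p>To see what those per-miss costs can add up to, here is a
back-of-envelope calculation. It assumes the two penalties simply add and
uses the rough figures quoted above (10 and 200 cycles); the miss counts are
hypothetical, chosen only to make the arithmetic easy to follow.

```python
def estimated_stall_cycles(l1_misses: int, l2_misses: int,
                           l1_miss_cost: int = 10, l2_miss_cost: int = 200) -> int:
    """Crude estimate of cycles lost to cache misses."""
    return l1_misses * l1_miss_cost + l2_misses * l2_miss_cost


# Hypothetical counts: 40,000 L1 misses and 20,000 L2 misses.
cycles = estimated_stall_cycles(l1_misses=40_000, l2_misses=20_000)
```

Here the L2 misses account for 4,000,000 of the 4,400,000 estimated cycles,
which is why L2 misses usually deserve attention first.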
				1965
				1966	Also, since one instruction cache read is performed per instruction executed,
				1967	you can find out how many instructions are executed per line, which can be
				1968	useful for optimisation and test coverage.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1969
				1970	Please note that this is an experimental feature. Any feedback, bug-fixes,
				1971	suggestions, etc, welcome.
				1972
				1973
				1974	<h3>7.1  Overview</h3>
				1975	First off, as for normal Valgrind use, you probably want to turn on debugging
				1976	info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
				1977	probably <b>do</b> want to turn optimisation on, since you should profile your
				1978	program as it will be normally run.
				1979
				1980	The three steps are:
				1981	<ol>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1982	<li>Generate a cache simulator for your machine's cache
				1983	configuration with the supplied <code>vg_cachegen</code>
				1984	program, and recompile Valgrind with <code>make install</code>.
				1985	<p>
				1986	The default settings are for an AMD Athlon, and you will get
				1987	useful information with the defaults, so you can skip this step
				1988	if you want. Nevertheless, for accurate cache profiles you will
				1989	need to use <code>vg_cachegen</code> to customise
				1990	<code>cachegrind</code> for your system.
				1991	<p>
				1992	This step only needs to be done once, unless you are interested
				1993	in simulating different cache configurations (eg. first
				1994	concentrating on instruction cache misses, then on data cache
				1995	misses).
				1996	</li>
				1997	<p>
				1998	<li>Run your program with <code>cachegrind</code> in front of the
				1999	normal command line invocation. When the program finishes,
				2000	Valgrind will print summary cache statistics. It also collects
				2001	line-by-line information in a file <code>cachegrind.out</code>.
				2002	<p>
				2003	This step should be done every time you want to collect
				2004	information about a new program, a changed program, or about the
				2005	same program with different input.
				2006	</li>
				2007	<p>
				2008	<li>Generate a function-by-function summary, and possibly annotate
				2009	source files with <code>vg_annotate</code>. Source files to annotate can be
				2010	specified manually on the command line, or
				2011	"interesting" source files can be annotated automatically with
				2012	the <code>--auto=yes</code> option. You can annotate C/C++
				2013	files or assembly language files equally easily.</li>
				2014	<p>
				2015	This step can be performed as many times as you like for each
				2016	Step 2. You may want to do multiple annotations showing
				2017	different information each time.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2018	</ol>
				2019
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2020	The steps are described in detail in the following sections.<p>
				2021
				2022
				2023	<a name="generate"></a>
				2024	<h3>7.3  Generating a cache simulator</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2025
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2026	Although Valgrind comes with a pre-generated cache simulator, it most
				2027	likely won't match the cache configuration of your machine, so you
				2028	should generate a new simulator.<p>
				2029
				2030	You need to generate three files, one for each of the I1, D1 and L2
				2031	caches. For each cache, you need to know the:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2032	<ul>
				2033	<li>Cache size (bytes);
				2034	<li>Line size (bytes);
				2035	<li>Associativity.
				2036	</ul>
				2037
				2038	vg_cachegen takes three options:
				2039	<ul>
				2040	<li><code>--I1=size,line_size,associativity</code>
				2041	<li><code>--D1=size,line_size,associativity</code>
				2042	<li><code>--L2=size,line_size,associativity</code>
				2043	</ul>
				2044
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2045	You can specify one, two or all three caches per invocation of
				2046	vg_cachegen. It checks that the configuration is sensible before
				2047	generating the simulators; to see the allowed values, run
				2048	<code>vg_cachegen -h</code>.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2049
				2050	An example invocation would be:
				2051
				2052	<blockquote><code>
				2053	vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
				2054	</code></blockquote>
				2055
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2056	This simulates a machine with a 128KB split L1 cache (64KB instruction plus
				2057	64KB data, each 2-way associative) and a 256KB unified 8-way associative L2
				2058	cache. All three caches have 64B lines.<p>
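<p>The <code>size,line_size,associativity</code> triples are easy to
sanity-check by hand or with a few lines of code. The sketch below parses one
and derives the number of sets. The names <code>CacheConfig</code> and
<code>parse_cache_spec</code> are made up, and the validity checks are only a
guess at what "sensible" might mean, not a description of what
<code>vg_cachegen</code> actually checks.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    size: int            # total bytes
    line_size: int       # bytes per line
    associativity: int   # lines per set

    @property
    def num_sets(self) -> int:
        return self.size // (self.line_size * self.associativity)


def _is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def parse_cache_spec(spec: str) -> CacheConfig:
    """Parse a 'size,line_size,associativity' string into a CacheConfig."""
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {spec!r}")
    config = CacheConfig(*(int(p) for p in parts))
    # These sanity checks are this sketch's guess at "sensible".
    if not (_is_power_of_two(config.size) and _is_power_of_two(config.line_size)):
        raise ValueError("size and line_size must be powers of two")
    if config.associativity < 1 or config.size % (config.line_size * config.associativity):
        raise ValueError("associativity must divide the number of lines")
    return config


i1 = parse_cache_spec("65536,64,2")
l2 = parse_cache_spec("262144,64,8")
```

Both caches in the example invocation work out to 512 sets: the L2 is four
times larger but also four times more associative.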
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2059
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2060	If you don't know your cache configuration, you'll have to find it
				2061	out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
				2062	configuration using the CPUID instruction, which could be done
				2063	automatically during installation, and this whole step could be
				2064	skipped.)<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2065
				2066
				2067	<h3>7.4  Cache simulation specifics</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2068
				2069	<code>vg_cachegen</code> only generates simulations for a machine with
				2070	a split L1 cache and a unified L2 cache. This configuration is used
				2071	for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
				2072	had a unified I and D L1 cache, but they are ancient history now.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2073
				2074	The more specific characteristics of the simulation are as follows.
				2075
				2076	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2077	<li>Write-allocate: when a write miss occurs, the block written to
				2078	is brought into the D1 cache. Most modern caches have this
				2079	property.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2080
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2081	<li>Bit-selection hash function: the line(s) in the cache to which a
				2082	memory block maps is chosen by the middle bits M--(M+N-1) of the
				2083	byte address, where:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2084	<ul>
				2085	<li> line size = 2^M bytes </li>
				2086	<li>(cache size / line size) = 2^N lines</li>
				2087	</ul> </li><p>
				2088
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2089	<li>Inclusive L2 cache: the L2 cache replicates all the entries of
				2090	the L1 cache. This is standard on Pentium chips, but AMD
				2091	Athlons use an exclusive L2 cache that only holds blocks evicted
				2092	from L1. Ditto AMD Durons and most modern VIAs.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2093	</ul>
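<p>Bit selection is simple enough to show directly. In the sketch below, an
associativity of 1 gives exactly the formula in the list above. For higher
associativities the sketch divides the line count by the associativity to get
the number of sets, which is the usual convention but an assumption here. The
function names are made up for the example.

```python
def set_index(addr: int, cache_size: int, line_size: int, associativity: int = 1) -> int:
    """Select the cache set for a byte address using its middle bits."""
    m = line_size.bit_length() - 1                     # line size = 2^M
    num_sets = cache_size // (line_size * associativity)
    n = num_sets.bit_length() - 1                      # number of sets = 2^N
    return (addr >> m) & ((1 << n) - 1)                # bits M .. M+N-1


def block_tag(addr: int, cache_size: int, line_size: int, associativity: int = 1) -> int:
    """The remaining high bits, which tell apart blocks sharing a set."""
    m = line_size.bit_length() - 1
    n = (cache_size // (line_size * associativity)).bit_length() - 1
    return addr >> (m + n)


# 64KB, 64B lines, 2-way: M = 6 and N = 9, so bits 6..14 choose the set.
index = set_index(0xBFFFF74C, 65536, 64, 2)
tag = block_tag(0xBFFFF74C, 65536, 64, 2)
```

Addresses that differ only in their low M bits share a line, and addresses
that are a multiple of (number of sets × line size) apart compete for the
same set.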
				2094
				2095	Other noteworthy behaviour:
				2096
				2097	<ul>
				2098	<li>References that straddle two cache lines are treated as follows:</li>
				2099	<ul>
				2100	<li>If both blocks hit --> counted as one hit</li>
				2101	<li>If one block hits, the other misses --> counted as one miss</li>
				2102	<li>If both blocks miss --> counted as one miss (not two)</li>
				2103	</ul><p>
				2104
				2105	<li>Instructions that modify a memory location (eg. <code>inc</code> and
				2106	<code>dec</code>) are counted as doing just a read, ie. a single data
				2107	reference. This may seem strange, but since the write can never cause a
				2108	miss (the read guarantees the block is in the cache) it's not very
				2109	interesting.<p>
				2110
				2111	Thus it measures not the number of times the data cache is accessed, but
				2112	the number of times a data cache miss could occur.<p>
				2113	</li>
				2114	</ul>
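<p>The counting rule for straddling references is easiest to see in a toy
simulator. The sketch below is not the generated simulator code: the class
<code>SimCache</code> is invented, and LRU replacement within a set is an
assumption, since the replacement policy is not stated here. Writes would
take the same path as reads, in keeping with write-allocate.

```python
from collections import OrderedDict
from typing import Dict


class SimCache:
    """Set-associative cache that counts one access per reference."""

    def __init__(self, size: int, line_size: int, associativity: int) -> None:
        self.line_size = line_size
        self.associativity = associativity
        self.num_sets = size // (line_size * associativity)
        # Each set is an ordered map of resident block numbers, oldest first.
        self._sets: Dict[int, "OrderedDict[int, None]"] = {}
        self.accesses = 0
        self.misses = 0

    def _touch_block(self, block: int) -> bool:
        """Bring one block into the cache; return True if it was already there."""
        lines = self._sets.setdefault(block % self.num_sets, OrderedDict())
        hit = block in lines
        if hit:
            lines.move_to_end(block)
        else:
            if len(lines) >= self.associativity:
                lines.popitem(last=False)      # LRU eviction (assumed policy)
            lines[block] = None
        return hit

    def reference(self, addr: int, size: int) -> bool:
        """Record one reference of `size` bytes; return True on a hit."""
        first = addr // self.line_size
        last = (addr + size - 1) // self.line_size
        # Touch every block so both halves of a straddling access get loaded.
        hits = [self._touch_block(b) for b in range(first, last + 1)]
        hit = all(hits)
        self.accesses += 1
        if not hit:
            self.misses += 1                   # at most one miss per reference
        return hit


d1 = SimCache(size=65536, line_size=64, associativity=2)
d1.reference(62, 4)     # straddles blocks 0 and 1, both cold: one miss
d1.reference(0, 4)      # block 0 now resident: hit
d1.reference(126, 4)    # block 1 hits, block 2 misses: one miss
d1.reference(62, 4)     # both blocks resident: one hit
```

Four references give four accesses and two misses, even though three
distinct blocks were loaded.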
				2115
				2116	If you are interested in simulating a cache with different properties, it is
				2117	not particularly hard to write your own cache simulator, or to modify existing
				2118	ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
				2119	<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
				2120	does.
				2121
				2122
				2123	<a name="profile"></a>
				2124	<h3>7.5  Profiling programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2125
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2126	Cache profiling is enabled by using the <code>--cachesim=yes</code>
				2127	option to the <code>valgrind</code> shell script. Alternatively, it
				2128	is probably more convenient to use the <code>cachegrind</code> script.
				2129	This automatically turns off Valgrind's memory checking functions,
				2130	since the cache simulation is slow enough already, and you probably
				2131	don't want to do both at once.
				2132	<p>
				2133	To gather cache profiling information about the program <code>ls
				2134	-l</code>, type:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2135
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2136	<blockquote><code>cachegrind ls -l</code></blockquote>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2137
				2138	The program will execute (slowly). Upon completion, summary statistics
				2139	that look like this will be printed:
				2140
				2141	<pre>
				2142	==31751== I refs: 27,742,716
				2143	==31751== I1 misses: 276
				2144	==31751== L2 misses: 275
				2145	==31751== I1 miss rate: 0.0%
				2146	==31751== L2i miss rate: 0.0%
				2147	==31751==
				2148	==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
				2149	==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
				2150	==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
				2151	==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
				2152	==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
				2153	==31751==
				2154	==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
				2155	==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
				2156	</pre>
				2157
				2158	Cache accesses for instruction fetches are summarised first, giving the
				2159	number of fetches made (this is the number of instructions executed, which
				2160	can be useful to know in its own right), the number of I1 misses, and the
				2161	number of L2 instruction (<code>L2i</code>) misses.<p>
				2162
				2163	Cache accesses for data follow. The information is similar to that of the
				2164	instruction fetches, except that the values are also shown split between reads
				2165	and writes (note each row's <code>rd</code> and <code>wr</code> values add up
				2166	to the row's total).<p>
				2167
				2168	Combined instruction and data figures for the L2 cache follow that.<p>
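<p>If you want more decimal places than the summary prints, the rates are
easy to recompute from the counts. The sketch below parses three of the data
lines from the sample output above (the column spacing is a guess, and the
parser ignores it) and checks that each row's <code>rd</code> and
<code>wr</code> values add up to its total. The helper names are made up.

```python
import re
from typing import Dict, List

SUMMARY = """\
==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
==31751== L2  misses:        23,085  (     3,987 rd +    19,098 wr)
"""

NUMBER = re.compile(r"\d[\d,]*")


def parse_counts(line: str) -> List[int]:
    """Return the comma-grouped integers that follow the label on a summary line."""
    body = line.split("==")[-1]            # drop the leading '==PID=='
    values = body.split(":", 1)[1]         # drop the label, which may hold digits
    return [int(tok.replace(",", "")) for tok in NUMBER.findall(values)]


def miss_rate(misses: int, refs: int) -> float:
    """Miss rate as a percentage; 0.0 when there were no references."""
    return 100.0 * misses / refs if refs else 0.0


rows: Dict[str, List[int]] = {}
for raw in SUMMARY.splitlines():
    label = " ".join(raw.split("==")[-1].split(":")[0].split())
    rows[label] = parse_counts(raw)

d_refs, d1_misses = rows["D refs"], rows["D1 misses"]
d1_rate = miss_rate(d1_misses[0], d_refs[0])
```

For this run the D1 miss rate comes out at roughly 0.27%.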
				2169
				2170
				2171	<h3>7.6  Output file</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2172
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2173	As well as printing summary information, Cachegrind also writes
				2174	line-by-line cache profiling information to a file named
				2175	<code>cachegrind.out</code>. This file is human-readable, but is best
				2176	interpreted by the accompanying program <code>vg_annotate</code>,
				2177	described in the next section.
				2178	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2179	Things to note about the <code>cachegrind.out</code> file:
				2180	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2181	<li>It is written every time <code>valgrind --cachesim=yes</code> or
				2182	<code>cachegrind</code> is run, and will overwrite any existing
				2183	<code>cachegrind.out</code> in the current directory.</li>
				2184	<p>
				2185	<li>It can be huge: <code>ls -l</code> generates a file of about
				2186	350KB. Browsing a few files and web pages with a Konqueror
				2187	built with full debugging information generates a file
				2188	of around 15 MB.</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2189	</ul>
				2190
				2191
				2192	<a name="annotate"></a>
				2193	<h3>7.7  Annotating C/C++ programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2194
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2195	Before using <code>vg_annotate</code>, it is worth widening your
				2196	window to be at least 120 characters wide if possible, as the output
				2197	lines can be quite long.
				2198	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2199	To get a function-by-function summary, run <code>vg_annotate</code> in
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2200	a directory containing a <code>cachegrind.out</code> file. The output
				2201	looks like this:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2202
				2203	<pre>
				2204	--------------------------------------------------------------------------------
				2205	I1 cache: 65536 B, 64 B, 2-way associative
				2206	D1 cache: 65536 B, 64 B, 2-way associative
				2207	L2 cache: 262144 B, 64 B, 8-way associative
				2208	Command: concord vg_to_ucode.c
				2209	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2210	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2211	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2212	Threshold: 99%
				2213	Chosen for annotation:
				2214	Auto-annotation: on
				2215
				2216	--------------------------------------------------------------------------------
				2217	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2218	--------------------------------------------------------------------------------
				2219	27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
				2220
				2221	--------------------------------------------------------------------------------
				2222	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
				2223	--------------------------------------------------------------------------------
				2224	8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
				2225	5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
				2226	2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
				2227	2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
				2228	2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
				2229	1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
				2230	897,991 51 51 897,831 95 30 62 1 1 ???:???
				2231	598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
				2232	598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
				2233	598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
				2234	446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
				2235	341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
				2236	320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
				2237	298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
				2238	149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
				2239	149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
				2240	95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
				2241	85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
				2242	</pre>
				2243
				2244	First up is a summary of the annotation options:
				2245
				2246	<ul>
				2247	<li>I1 cache, D1 cache, L2 cache: the cache configuration, so you know the
				2248	configuration with which these results were obtained.</li><p>
				2249
				2250	<li>Command: the command line invocation of the program under
				2251	examination.</li><p>
				2252
				2253	<li>Events recorded: event abbreviations are:<p>
				2254	<ul>
				2255	<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
				2256	<li><code>I1mr</code>: I1 cache read misses</li>
				2257	<li><code>I2mr</code>: L2 cache instruction read misses</li>
				2258	<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
				2259	<li><code>D1mr</code>: D1 cache read misses</li>
				2260	<li><code>D2mr</code>: L2 cache data read misses</li>
				2261	<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
				2262	<li><code>D1mw</code>: D1 cache write misses</li>
				2263	<li><code>D2mw</code>: L2 cache data write misses</li>
				2264	</ul><p>
				2265	Note that total D1 misses are given by <code>D1mr</code> +
				2266	<code>D1mw</code>, and that total L2 misses are given by
				2267	<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
				2268
				2269	<li>Events shown: the events shown (a subset of events gathered). This can
				2270	be adjusted with the <code>--show</code> option.</li><p>
				2271
				2272	<li>Event sort order: the sort order in which functions are shown. For
				2273	example, in this case the functions are sorted from highest
				2274	<code>Ir</code> counts to lowest. If two functions have identical
				2275	<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
				2276	counts, and so on. This order can be adjusted with the
				2277	<code>--sort</code> option.<p>
				2278
				2279	Note that this dictates the order the functions appear. It is <b>not</b>
				2280	the order in which the columns appear; that is dictated by the "events
				2281	shown" line (and can be changed with the <code>--show</code> option).
				2282	</li><p>
				2283
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2284	<li>Threshold: <code>vg_annotate</code> by default omits functions
				2285	that cause very low numbers of misses to avoid drowning you in
				2286	information. In this case, vg_annotate shows summaries of the
				2287	functions that account for 99% of the <code>Ir</code> counts;
				2288	<code>Ir</code> is chosen as the threshold event since it is the
				2289	primary sort event. The threshold can be adjusted with the
				2290	<code>--threshold</code> option.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2291
				2292	<li>Chosen for annotation: names of files specified manually for annotation;
				2293	in this case none.</li><p>
				2294
				2295	<li>Auto-annotation: whether auto-annotation was requested via the
				2296	<code>--auto=yes</code> option. In this case no.</li><p>
				2297	</ul>
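<p>The interplay of sort order and threshold can be sketched as follows. This
is a guess at the behaviour described above, not <code>vg_annotate</code>'s
actual code: the function names are invented, the rows are hypothetical, and
the threshold is taken as a share of the rows passed in rather than of a
separately recorded program total.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, Dict[str, int]]   # (file:function, event counts)


def sort_rows(rows: Sequence[Row], sort_events: Sequence[str]) -> List[Row]:
    """Order rows from highest to lowest, comparing events in the given order."""
    return sorted(rows, key=lambda r: tuple(r[1].get(e, 0) for e in sort_events),
                  reverse=True)


def apply_threshold(rows: Sequence[Row], event: str, percent: float) -> List[Row]:
    """Keep the leading rows that together reach `percent` of the event total."""
    total = sum(counts.get(event, 0) for _, counts in rows)
    kept: List[Row] = []
    running = 0
    for row in rows:
        if total and running * 100 >= percent * total:
            break
        kept.append(row)
        running += row[1].get(event, 0)
    return kept


rows = [
    ("a.c:f", {"Ir": 50, "I1mr": 1}),
    ("b.c:g", {"Ir": 900, "I1mr": 2}),
    ("c.c:h", {"Ir": 50, "I1mr": 3}),
]
ordered = sort_rows(rows, ["Ir", "I1mr"])
shown = apply_threshold(ordered, "Ir", 95)
```

The two functions with equal <code>Ir</code> counts are ordered by
<code>I1mr</code>, and the last one is dropped because the first two already
account for 95% of <code>Ir</code>.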
				2298
				2299	Then follow summary statistics for the whole program. These are similar
				2300	to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
				2301
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2302	Then follow function-by-function statistics. Each function is
				2303	identified by a <code>file_name:function_name</code> pair. If a column
				2304	contains only a dot it means the function never performs
				2305	that event (eg. the third row shows that <code>strcmp()</code>
				2306	contains no instructions that write to memory). The name
				2307	<code>???</code> is used if the file name and/or function name
				2308	could not be determined from debugging information. If most of the
				2309	entries have the form <code>???:???</code> the program probably wasn't
				2310	compiled with <code>-g</code>. <p>
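<p>If you want to post-process the function-by-function summary yourself, a
row can be picked apart as in this sketch. It assumes the nine default event
columns in the order shown above and a file name without a colon in it; the
function <code>parse_function_row</code> is made up for the example.

```python
from typing import Dict, Optional, Sequence, Tuple

EVENTS = ("Ir", "I1mr", "I2mr", "Dr", "D1mr", "D2mr", "Dw", "D1mw", "D2mw")


def parse_function_row(line: str, events: Sequence[str] = EVENTS
                       ) -> Tuple[str, str, Dict[str, Optional[int]]]:
    """Split one summary row into (file, function, counts).

    A '.' column becomes None (the event can never happen there), which keeps
    it distinct from a genuine count of 0.
    """
    fields = line.strip().split(None, len(events))
    if len(fields) != len(events) + 1:
        raise ValueError(f"expected {len(events)} counts and a name: {line!r}")
    counts = {
        event: None if text == "." else int(text.replace(",", ""))
        for event, text in zip(events, fields)
    }
    # Split at the first colon so that C++ names containing '::' stay whole.
    file_name, _, function = fields[-1].partition(":")
    return file_name, function, counts


row = "2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp"
file_name, function, counts = parse_function_row(row)
```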
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2311
				2312	It is worth noting that functions will come from three types of source files:
				2313	<ol>
				2314	<li> From the profiled program (<code>concord.c</code> in this example).</li>
				2315	<li>From libraries (eg. <code>getc.c</code>)</li>
				2316	<li>From Valgrind's implementation of some libc functions (eg.
				2317	<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
				2318	the filename begins with <code>vg_</code>, and is probably one of
				2319	<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
				2320	<code>vg_mylibc.c</code>.
				2321	</li>
				2322	</ol>
				2323
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2324	There are two ways to annotate source files -- by choosing them
				2325	manually, or with the <code>--auto=yes</code> option. To do it
				2326	manually, just specify the filenames as arguments to
				2327	<code>vg_annotate</code>. For example, the output from running
				2328	<code>vg_annotate concord.c</code> for our example produces the same
				2329	output as above followed by an annotated version of
				2330	<code>concord.c</code>, a section of which looks like:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2331
				2332	<pre>
				2333	--------------------------------------------------------------------------------
				2334	-- User-annotated source: concord.c
				2335	--------------------------------------------------------------------------------
				2336	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2337
				2338	[snip]
				2339
				2340	. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
				2341	3 1 1 . . . 1 0 0 {
				2342	. . . . . . . . . FILE *file_ptr;
				2343	. . . . . . . . . Word_Info *data;
				2344	1 0 0 . . . 1 1 1 int line = 1, i;
				2345	. . . . . . . . .
				2346	5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
				2347	. . . . . . . . .
				2348	4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
				2349	3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
				2350	. . . . . . . . .
				2351	. . . . . . . . . /* Open file, check it. */
				2352	6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
				2353	2 0 0 1 0 0 . . . if (!(file_ptr)) {
				2354	. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
				2355	1 1 1 . . . . . . exit(EXIT_FAILURE);
				2356	. . . . . . . . . }
				2357	. . . . . . . . .
				2358	165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
				2359	146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
				2360	. . . . . . . . .
				2361	4 0 0 1 0 0 2 0 0 free(data);
				2362	4 0 0 1 0 0 2 0 0 fclose(file_ptr);
				2363	3 0 0 2 0 0 . . . }
				2364	</pre>
				2365
				2366	(Although column widths are automatically minimised, a wide terminal is clearly
				2367	useful.)<p>
				2368
				2369	Each source file is clearly marked (<code>User-annotated source</code>) as
				2370	having been chosen manually for annotation. If the file was found in one of
				2371	the directories specified with the <code>-I</code>/<code>--include</code>
				2372	option, the directory and file are both given.<p>
				2373
				2374	Each line is annotated with its event counts. Events not applicable for a line
				2375	are represented by a `.'; this is useful for distinguishing between an event
				2376	which cannot happen, and one which can but did not.<p>
				2377
				2378	Sometimes only a small section of a source file is executed. To minimise
				2379	uninteresting output, Valgrind only shows annotated lines and lines within a
				2380	small distance of annotated lines. Gaps are marked with the line numbers so
				2381	you know which part of a file the shown code comes from, eg:
				2382
				2383	<pre>
				2384	(figures and code for line 704)
				2385	-- line 704 ----------------------------------------
				2386	-- line 878 ----------------------------------------
				2387	(figures and code for line 878)
				2388	</pre>
				2389
				2390	The amount of context to show around annotated lines is controlled by the
				2391	<code>--context</code> option.<p>
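The effect of the context setting can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the code vg_annotate uses; the function <code>shown_ranges</code> and its parameters are hypothetical.<p>

```python
from typing import List, Tuple


def shown_ranges(annotated: List[int], context: int,
                 last_line: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) line ranges that would be printed.

    Each annotated line contributes itself plus `context` lines on either
    side; overlapping or adjacent ranges are merged into one.
    """
    ranges: List[Tuple[int, int]] = []
    for n in sorted(set(annotated)):
        lo = max(1, n - context)
        hi = min(last_line, n + context)
        if ranges and lo <= ranges[-1][1] + 1:
            # Touches the previous range: extend it instead of starting anew.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], hi))
        else:
            ranges.append((lo, hi))
    return ranges


# Lines 700, 702 and 880 carry counts; show 2 lines of context.
print(shown_ranges([700, 702, 880], context=2, last_line=1000))
# [(698, 704), (878, 882)]
```

With these inputs the first block ends at line 704 and the next starts at line 878, which is the kind of gap the markers in the example above delimit.<p>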
				2392
				2393	To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
				2394	vg_annotate will automatically annotate every source file it can find that is
				2395	mentioned in the function-by-function summary. Therefore, the files chosen for
				2396	auto-annotation are affected by the <code>--sort</code> and
				2397	<code>--threshold</code> options. Each source file is clearly marked
				2398	(<code>Auto-annotated source</code>) as being chosen automatically. Any files
				2399	that could not be found are mentioned at the end of the output, eg:
				2400
				2401	<pre>
				2402	--------------------------------------------------------------------------------
				2403	The following files chosen for auto-annotation could not be found:
				2404	--------------------------------------------------------------------------------
				2405	getc.c
				2406	ctype.c
				2407	../sysdeps/generic/lockfile.c
				2408	</pre>
				2409
				2410	This is quite common for library files, since libraries are usually compiled
				2411	with debugging information, but the source files are often not present on a
				2412	system. If a file is chosen for annotation <b>both</b> manually and
				2413	automatically, it is marked as <code>User-annotated source</code>.<p>
				2414
				2415	Use the <code>-I/--include</code> option to tell Valgrind where to look for
				2416	source files if the filenames found from the debugging information aren't
				2417	specific enough.<p>
				2418
				2419	Beware that vg_annotate can take some time to digest large
				2420	<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
				2421	auto-annotation can produce a lot of output if your program is large!
				2422
				2423
				2424	<h3>7.8  Annotating assembler programs</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2425
				2426	Valgrind can annotate assembler programs too, or annotate the
				2427	assembler generated for your C program. Sometimes this is useful for
				2428	understanding what is really happening when an interesting line of C
				2429	code is translated into multiple instructions.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2430
				2431	To do this, you just need to assemble your <code>.s</code> files with
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2432	assembler-level debug information. gcc doesn't do this, but you can
				2433	use the GNU assembler with the <code>--gstabs</code> option to
				2434	generate object files with this information, eg:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2435
				2436	<blockquote><code>as --gstabs foo.s</code></blockquote>
				2437
				2438	You can then profile and annotate source files in the same way as for C/C++
				2439	programs.
				2440
				2441
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2442	<h3>7.9  <code>vg_annotate</code> options</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2443	<ul>
				2444	<li><code>-h, --help</code></li><p>
				2445	<li><code>-v, --version</code><p>
				2446
				2447	Help and version, as usual.</li>
				2448
				2449	<li><code>--sort=A,B,C</code> [default: order in
				2450	<code>cachegrind.out</code>]<p>
				2451	Specifies the events upon which the sorting of the function-by-function
				2452	entries will be based. Useful if you want to concentrate on eg. I cache
				2453	misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
				2454	(<code>--sort=D1mr,D2mr</code>), or L2 misses
				2455	(<code>--sort=D2mr,I2mr</code>).</li><p>
				2456
				2457	<li><code>--show=A,B,C</code> [default: all, using order in
				2458	<code>cachegrind.out</code>]<p>
				2459	Specifies which events to show (and the column order). Default is to use
				2460	all present in the <code>cachegrind.out</code> file (and use the order in
				2461	the file).</li><p>
				2462
				2463	<li><code>--threshold=X</code> [default: 99%] <p>
				2464	Sets the threshold for the function-by-function summary. Functions are
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame]	2465	shown that account for more than X% of the primary sort event. If
				2466	auto-annotating, also affects which files are annotated.
				2467
				2468	Note: thresholds can be set for more than one of the events by following
				2469	any event in the <code>--sort</code> option with a colon and a number
				2470	(no spaces, though). E.g. if you want to see the functions that cover
				2471	99% of L2 read misses and 99% of L2 write misses, use this option:
				2472
				2473	<blockquote><code>--sort=D2mr:99,D2mw:99</code></blockquote>
				2474	</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2475
				2476	<li><code>--auto=no</code> [default]<br>
				2477	<code>--auto=yes</code> <p>
				2478	When enabled, automatically annotates every file that is mentioned in the
				2479	function-by-function summary that can be found. Also gives a list of
				2480	those that couldn't be found.</li><p>
				2481
				2482	<li><code>--context=N</code> [default: 8]<p>
				2483	Print N lines of context before and after each annotated line. Avoids
				2484	printing large sections of source files that were not executed. Use a
				2485	large number (eg. 10,000) to show all source lines.
				2486	</li><p>
				2487
				2488	<li><code>-I=&lt;dir&gt;, --include=&lt;dir&gt;</code>
				2489	[default: empty string]<p>
				2490	Adds a directory to the list in which to search for files. Multiple
				2491	-I/--include options can be given to add multiple directories.
				2492	</ul>
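To make the <code>--sort</code> and <code>--threshold</code> descriptions above more concrete, here is a small Python sketch. It is not taken from vg_annotate: <code>parse_sort_spec</code> and <code>functions_to_show</code> are hypothetical names, the per-function counts are made up, and reading the threshold as cumulative coverage of the sort event is an assumption based on the "cover 99%" wording.<p>

```python
from typing import Dict, List, Optional, Tuple


def parse_sort_spec(spec: str) -> List[Tuple[str, Optional[float]]]:
    """Parse 'D2mr:99,D2mw:99' into [('D2mr', 99.0), ('D2mw', 99.0)].

    An event given without ':N' gets None, meaning no per-event threshold.
    """
    result = []
    for item in spec.split(","):
        event, _, percent = item.partition(":")
        result.append((event, float(percent) if percent else None))
    return result


def functions_to_show(counts: Dict[str, int], threshold: float) -> List[str]:
    """Pick functions, largest count first, until together they cover
    `threshold` percent of the total for one event (assumed semantics)."""
    total = sum(counts.values())
    shown: List[str] = []
    covered = 0
    for name, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        if total == 0 or 100.0 * covered / total >= threshold:
            break
        shown.append(name)
        covered += count
    return shown


print(parse_sort_spec("D2mr:99,D2mw:99"))
# Made-up D2mr counts per function.
print(functions_to_show({"insert": 900, "get_word": 95, "main": 5}, 99))
# ['insert', 'get_word']
```

Under this reading, lowering the threshold shortens the function-by-function summary and, when auto-annotating, the list of files that get annotated.<p>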
				2493
				2494
				2495	<h3>7.10  Warnings</h3>
				2496	There are a couple of situations in which vg_annotate issues warnings.
				2497
				2498	<ul>
				2499	<li>If a source file is more recent than the <code>cachegrind.out</code>
				2500	file. This is because the information in <code>cachegrind.out</code> is
				2501	only recorded with line numbers, so if the line numbers change at all in
				2502	the source (eg. lines added, deleted, swapped), any annotations will be
				2503	incorrect.<p>
				2504
				2505	<li>If information is recorded about line numbers past the end of a file.
				2506	This can be caused by the above problem, ie. shortening the source file
				2507	while using an old <code>cachegrind.out</code> file. If this happens,
				2508	the figures for the bogus lines are printed anyway (clearly marked as
				2509	bogus) in case they are important.</li><p>
				2510	</ul>
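The two checks above can be approximated as follows. This Python sketch only mirrors the conditions described in the list; it is not how vg_annotate implements them, and the function name <code>annotation_warnings</code> and the message wording are invented for the example.<p>

```python
import os
from typing import List


def annotation_warnings(source_path: str, profile_path: str,
                        recorded_lines: List[int]) -> List[str]:
    """Return warnings in the spirit of the two cases listed above.

    `recorded_lines` holds the line numbers for which the profile file
    recorded counts for this source file.
    """
    warnings = []
    # Case 1: the source was modified after the profile was written.
    if os.path.getmtime(source_path) > os.path.getmtime(profile_path):
        warnings.append(f"{source_path} is newer than {profile_path}")
    # Case 2: counts exist for lines beyond the end of the file.
    with open(source_path, errors="replace") as f:
        n_lines = sum(1 for _ in f)
    bogus = sorted(n for n in set(recorded_lines) if n > n_lines)
    if bogus:
        warnings.append(
            f"{source_path} has {n_lines} lines, but counts were recorded "
            f"for line(s) {bogus}")
    return warnings
```

A caller would pass the path of a source file, the path of the <code>cachegrind.out</code> file, and the line numbers seen for that source file, then print whatever comes back before the annotated listing.<p>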
				2511
				2512
				2513	<h3>7.10  Things to watch out for</h3>
				2514	Some odd things that can occur during annotation:
				2515
				2516	<ul>
				2517	<li>If annotating at the assembler level, you might see something like this:
				2518
				2519	<pre>
				2520	1 0 0 . . . . . . leal -12(%ebp),%eax
				2521	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				2522	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				2523	. . . . . . . . . .align 4,0x90
				2524	1 0 0 . . . . . . movl $.LnrB,%eax
				2525	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
				2526	</pre>
				2527
				2528	How can the third instruction be executed twice when the others are
				2529	executed only once? As it turns out, it isn't. Here's a dump of the
				2530	executable, from objdump:
				2531
				2532	<pre>
				2533	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				2534	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				2535	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				2536	8048f32: 89 f6 mov %esi,%esi
				2537	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				2538	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
				2539	</pre>
				2540
				2541	Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
				2542	come from? The GNU assembler inserted it to serve as the two bytes of
				2543	padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
				2544	a four-byte boundary, but pretended it didn't exist when adding debug
				2545	information. Thus when Valgrind reads the debug info it thinks that the
				2546	<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
				2547	range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
				2548	<code>mov %esi,%esi</code> to it.<p>
				2549	</li>
				2550
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2551	<li>Inlined functions can cause strange results in the function-by-function
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2552	summary. If a function <code>inline_me()</code> is defined in
				2553	<code>foo.h</code> and inlined in the functions <code>f1()</code>,
				2554	<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
				2555	not be a <code>foo.h:inline_me()</code> function entry. Instead, there
				2556	will be separate function entries for each inlining site, ie.
				2557	<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
				2558	<code>foo.h:f3()</code>. To find the total counts for
				2559	<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
				2560
				2561	The reason for this is that although the debug info output by gcc
				2562	indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
				2563	doesn't indicate the name of the function in <code>foo.h</code>, so
				2564	Valgrind keeps using the old one.<p>
				2565
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2566	<li>Sometimes, the same filename might be represented with a relative name
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2567	and with an absolute name in different parts of the debug info, eg:
				2568	<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
				2569	case, if you use auto-annotation, the file will be annotated twice with
				2570	the counts split between the two.<p>
				2571	</li>
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2572
				2573	<li>Files with more than 65,535 lines cause difficulties for the stabs debug
				2574	info reader. This is because the line number in the <code>struct
				2575	nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit
				2576	number. Valgrind can handle some files with more than 65,535 lines
				2577	correctly by making some guesses to identify line number overflows. But
				2578	some cases are beyond it, in which case you'll get a warning message
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame]	2579	explaining that annotations for the file might be incorrect.<p>
				2580	</li>
				2581
				2582	<li>If you compile some files with <code>-g</code> and some without, some
				2583	events that take place in a file without debug info could be attributed
				2584	to the last line of a file with debug info (whichever one gets placed
				2585	before the non-debug-info file in the executable).<p>
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2586	</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2587	</ul>
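The arithmetic behind the assembler-level case can be checked with a short Python sketch. The addresses and the instruction length are read off the objdump listing above; the helper <code>padding_needed</code> is a name made up for the example.<p>

```python
def padding_needed(address: int, alignment: int) -> int:
    """Bytes of padding required to move `address` up to the next multiple
    of `alignment` (0 if it is already aligned)."""
    return -address % alignment


# Values taken from the objdump listing above.
movl_start = 0x8048f2b          # movl $0x1,0xffffffec(%ebp)
movl_length = 7                 # c7 45 ec 01 00 00 00
pad_start = movl_start + movl_length
pad = padding_needed(pad_start, 4)
next_start = pad_start + pad

print(hex(pad_start), pad, hex(next_start))   # 0x8048f32 2 0x8048f34
# With the padding missing from the debug info, the movl appears to own
# every byte up to the start of the next instruction:
print(hex(movl_start), hex(next_start - 1))   # 0x8048f2b 0x8048f33
```

The two padding bytes are exactly the <code>mov %esi,%esi</code> instruction, which is why its count ends up added to the preceding <code>movl</code>.<p>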
				2588
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame]	2589	This list looks long, but these cases should be fairly rare.<p>
				2590
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2591	Note: stabs is not an easy format to read. If you come across bizarre
				2592	annotations that look like they might be caused by a bug in the stabs reader,
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame]	2593	please let us know.<p>
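To give a feel for the 16-bit line number problem mentioned in the list above, here is one possible guessing strategy, sketched in Python. It is an illustration only and is not claimed to be the heuristic Valgrind uses; the function <code>unwrap_line_numbers</code> and its "large drop means overflow" rule are assumptions.<p>

```python
from typing import Iterable, List


def unwrap_line_numbers(raw: Iterable[int]) -> List[int]:
    """Guess full line numbers from 16-bit values that may have wrapped.

    Assumes line numbers within one file never go backwards by more than
    half the 16-bit range, so any larger drop is taken to be an overflow.
    """
    result: List[int] = []
    base = 0
    previous = None
    for value in raw:
        if previous is not None and previous - value > 0x8000:
            base += 0x10000     # the 16-bit counter wrapped around
        result.append(base + value)
        previous = value
    return result


print(unwrap_line_numbers([65530, 65535, 3, 10]))
# [65530, 65535, 65539, 65546]
```

A guess like this fails when the debug info really does jump backwards a long way, which is the kind of case where a warning is the only honest answer.<p>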
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2594
				2595
				2596	<h3>7.11  Accuracy</h3>
				2597	Valgrind's cache profiling has a number of shortcomings:
				2598
				2599	<ul>
				2600	<li>It doesn't account for kernel activity -- the effect of system calls on
				2601	the cache contents is ignored.</li><p>
				2602
				2603	<li>It doesn't account for other process activity (although this is probably
				2604	desirable when considering a single program).</li><p>
				2605
				2606	<li>It doesn't account for virtual-to-physical address mappings; hence the
				2607	entire simulation is not a true representation of what's happening in the
				2608	cache.</li><p>
				2609
				2610	<li>It doesn't account for cache misses not visible at the instruction level,
				2611	eg. those arising from TLB misses, or speculative execution.</li><p>
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2612
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame]	2613	<li>Valgrind's custom <code>malloc()</code> will allocate memory in different
				2614	ways to the standard <code>malloc()</code>, which could warp the results.
				2615	</li><p>
				2616
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2617	<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
				2618	will incorrectly be counted as doing a data read if both the arguments
				2619	are registers, eg:
				2620
				2621	<blockquote><code>btsl %eax, %edx</code></blockquote>
				2622
				2623	This should only happen rarely.
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2624	</ul>
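The point about virtual-to-physical mappings can be illustrated with a little arithmetic. In the Python sketch below the cache geometry (64-byte lines, 1024 sets) and both addresses are hypothetical, chosen only so that the set index depends on address bits above the 4KB page offset.<p>

```python
def set_index(address: int, line_size: int, n_sets: int) -> int:
    """Cache set selected by `address` in a set-associative cache."""
    return (address // line_size) % n_sets


LINE_SIZE = 64      # bytes per line (assumed)
N_SETS = 1024       # number of sets (assumed)

virtual = 0x0804a040    # address the program sees (hypothetical)
physical = 0x1f3c5040   # where the kernel happened to map it (hypothetical)

print(set_index(virtual, LINE_SIZE, N_SETS))    # 641
print(set_index(physical, LINE_SIZE, N_SETS))   # 321
```

Both addresses share the same offset within their page, yet they select different sets, so a simulation driven by virtual addresses can see different conflicts from a cache indexed by physical addresses.<p>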
				2625
				2626	Another thing worth noting is that results are very sensitive. Changing the
				2627	size of the <code>valgrind.so</code> file, the size of the program being
				2628	profiled, or even the length of its name can perturb the results. Variations
				2629	will be small, but don't expect perfectly repeatable results if your program
				2630	changes at all.<p>
				2631
				2632	While these factors mean you shouldn't trust the results to be super-accurate,
				2633	hopefully they should be close enough to be useful.<p>
				2634
				2635
				2636	<h3>7.12  Todo</h3>
				2637	<ul>
				2638	<li>Use CPUID instruction to auto-identify cache configuration during
				2639	installation. This would save the user from having to know their cache
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2640	configuration and using vg_cachegen.</li>
				2641	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2642	<li>Program start-up/shut-down calls a lot of functions that aren't
				2643	interesting and just complicate the output. Would be nice to exclude
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2644	these somehow.</li>
				2645	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2646	</ul>
				2647	<hr width="100%">
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	2648	</body>
				2649	</html>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2650