Blame - memcheck/docs/manual.html - platform/external/valgrind

blob: 4b6b773915b4cbedccde9c96d2b785ebb75d76a4

sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1	<html>
				2	<head>
				3	<style type="text/css">
				4	body { background-color: #ffffff;
				5	color: #000000;
				6	font-family: Times, Helvetica, Arial;
				7	font-size: 14pt}
				8	h4 { margin-bottom: 0.3em}
				9	code { color: #000000;
				10	font-family: Courier;
				11	font-size: 13pt }
				12	pre { color: #000000;
				13	font-family: Courier;
				14	font-size: 13pt }
				15	a:link { color: #0000C0;
				16	text-decoration: none; }
				17	a:visited { color: #0000C0;
				18	text-decoration: none; }
				19	a:active { color: #0000C0;
				20	text-decoration: none; }
				21	</style>
				22	</head>
				23
				24	<body bgcolor="#ffffff">
				25
				26	<a name="title"> </a>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	27	<h1 align=center>Valgrind, snapshot 20020501</h1>
				28	<center>This manual was majorly updated on 20020501</center>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	29	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	30
				31	<center>
				32	<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	33	Copyright © 2000-2002 Julian Seward
				34	<p>
				35	Valgrind is licensed under the GNU General Public License,
				36	version 2<br>
				37	An open-source tool for finding memory-management problems in
				38	Linux-x86 executables.
				39	</center>
				40
				41	<p>
				42
				43	<hr width="100%">
				44	<a name="contents"></a>
				45	<h2>Contents of this manual</h2>
				46
				47	<h4>1  <a href="#intro">Introduction</a></h4>
				48	1.1  <a href="#whatfor">What Valgrind is for</a><br>
				49	1.2  <a href="#whatdoes">What it does with your program</a>
				50
				51	<h4>2  <a href="#howtouse">How to use it, and how to make sense
				52	of the results</a></h4>
				53	2.1  <a href="#starta">Getting started</a><br>
				54	2.2  <a href="#comment">The commentary</a><br>
				55	2.3  <a href="#report">Reporting of errors</a><br>
				56	2.4  <a href="#suppress">Suppressing errors</a><br>
				57	2.5  <a href="#flags">Command-line flags</a><br>
				58	2.6  <a href="#errormsgs">Explanation of error messages</a><br>
				59	2.7  <a href="#suppfiles">Writing suppressions files</a><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	60	2.8  <a href="#clientreq">The Client Request mechanism</a><br>
				61	2.9  <a href="#pthreads">Support for POSIX pthreads</a><br>
				62	2.10  <a href="#install">Building and installing</a><br>
				63	2.11  <a href="#problems">If you have problems</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	64
				65	<h4>3  <a href="#machine">Details of the checking machinery</a></h4>
				66	3.1  <a href="#vvalue">Valid-value (V) bits</a><br>
				67	3.2  <a href="#vaddress">Valid-address (A) bits</a><br>
				68	3.3  <a href="#together">Putting it all together</a><br>
				69	3.4  <a href="#signals">Signals</a><br>
				70	3.5  <a href="#leaks">Memory leak detection</a><br>
				71
				72	<h4>4  <a href="#limits">Limitations</a></h4>
				73
				74	<h4>5  <a href="#howitworks">How it works -- a rough overview</a></h4>
				75	5.1  <a href="#startb">Getting started</a><br>
				76	5.2  <a href="#engine">The translation/instrumentation engine</a><br>
				77	5.3  <a href="#track">Tracking the status of memory</a><br>
				78	5.4  <a href="#sys_calls">System calls</a><br>
				79	5.5  <a href="#sys_signals">Signals</a><br>
				80
				81	<h4>6  <a href="#example">An example</a></h4>
				82
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	83	<h4>7  <a href="#cache">Cache profiling</a></h4>
				84
				85	<h4>8  <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	86
				87	<hr width="100%">
				88
				89	<a name="intro"></a>
				90	<h2>1  Introduction</h2>
				91
				92	<a name="whatfor"></a>
				93	<h3>1.1  What Valgrind is for</h3>
				94
				95	Valgrind is a tool to help you find memory-management problems in your
				96	programs. When a program is run under Valgrind's supervision, all
				97	reads and writes of memory are checked, and calls to
				98	malloc/new/free/delete are intercepted. As a result, Valgrind can
				99	detect problems such as:
				100	<ul>
				101	<li>Use of uninitialised memory</li>
				102	<li>Reading/writing memory after it has been free'd</li>
				103	<li>Reading/writing off the end of malloc'd blocks</li>
				104	<li>Reading/writing inappropriate areas on the stack</li>
				105	<li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li>
				106	</ul>
				107
				108	Problems like these can be difficult to find by other means, often
				109	lying undetected for long periods, then causing occasional,
				110	difficult-to-diagnose crashes.
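<p>As a rough illustration of the problem classes listed above, here is a
small C sketch. The function names and the <code>SHOW_BUGS</code> macro are
invented for this example and are not part of Valgrind. The first function is
correct; the second, compiled only when <code>SHOW_BUGS</code> is defined,
contains one instance of each kind of error. Whether and how each one is
reported depends on the checking rules described later in this manual.</p>

```c
#include <stdlib.h>

/* Correct version: allocate, initialise, use, release. */
int sum_block(size_t n)
{
    int *block = calloc(n, sizeof *block);   /* calloc zero-fills the block */
    int total = 0;

    if (block == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)           /* stays inside the block */
        total += block[i];
    free(block);                             /* released exactly once */
    return total;
}

#ifdef SHOW_BUGS
/* Deliberately wrong version, compiled only with -DSHOW_BUGS. */
int sum_block_buggy(size_t n)
{
    int *block = malloc(n * sizeof *block);  /* contents are uninitialised */
    int *lost = malloc(16);                  /* never freed: a leak */
    int total = 0;

    (void)lost;
    if (block == NULL)
        return 0;
    for (size_t i = 0; i <= n; i++)          /* uninitialised reads, then one past the end */
        total += block[i];
    free(block);
    total += block[0];                       /* reads the block after it has been free'd */
    return total;
}
#endif
```

<p>Built normally, only <code>sum_block</code> exists and the program is
well behaved. Building with <code>-DSHOW_BUGS</code> and calling
<code>sum_block_buggy</code> gives a program whose behaviour is undefined,
which is exactly the kind of code this tool is meant to help with.</p>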
				111
				112	<p>
				113	Valgrind is closely tied to details of the CPU, operating system and
				114	to a lesser extent, compiler and basic C libraries. This makes it
				115	difficult to make it portable, so I have chosen at the outset to
				116	concentrate on what I believe to be a widely used platform: Red Hat
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	117	Linux 7.2, on x86s. Valgrind uses the standard Unix
				118	<code>./configure</code>, <code>make</code>, <code>make install</code>
				119	mechanism, and I have attempted to ensure that it works on machines
				120	with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover
				121	the vast majority of modern Linux installations.
				122
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	123
				124	<p>
				125	Valgrind is licensed under the GNU General Public License, version
				126	2. Read the file LICENSE in the source distribution for details.
				127
				128	<a name="whatdoes"></a>
				129	<h3>1.2  What it does with your program</h3>
				130
				131	Valgrind is designed to be as non-intrusive as possible. It works
				132	directly with existing executables. You don't need to recompile,
				133	relink, or otherwise modify, the program to be checked. Simply place
				134	the word <code>valgrind</code> at the start of the command line
				135	normally used to run the program. So, for example, if you want to run
				136	the command <code>ls -l</code> on Valgrind, simply issue the
				137	command: <code>valgrind ls -l</code>.
				138
				139	<p>Valgrind takes control of your program before it starts. Debugging
				140	information is read from the executable and associated libraries, so
				141	that error messages can be phrased in terms of source code
				142	locations. Your program is then run on a synthetic x86 CPU which
				143	checks every memory access. All detected errors are written to a
				144	log. When the program finishes, Valgrind searches for and reports on
				145	leaked memory.
				146
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	147	<p>You can run pretty much any dynamically linked ELF x86 executable
				148	using Valgrind. Programs run 25 to 50 times slower, and take a lot
				149	more memory, than they usually would. It works well enough to run
				150	large programs. For example, the Konqueror web browser from the KDE
				151	Desktop Environment, version 3.0, runs slowly but usably on Valgrind.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	152
				153	<p>Valgrind simulates every single instruction your program executes.
				154	Because of this, it finds errors not only in your application but also
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	155	in all supporting dynamically-linked (<code>.so</code>-format)
				156	libraries, including the GNU C library, the X client libraries, Qt, if
				157	you work with KDE, and so on. That often includes libraries, for
				158	example the GNU C library, which contain memory access violations, but
				159	which you cannot or do not want to fix.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	160
				161	<p>Rather than swamping you with errors in which you are not
				162	interested, Valgrind allows you to selectively suppress errors, by
				163	recording them in a suppressions file which is read when Valgrind
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	164	starts up. The build mechanism attempts to select suppressions which
				165	give reasonable behaviour for the libc and XFree86 versions detected
				166	on your machine.
				167
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	168
				169	<p><a href="#example">Section 6</a> shows an example of use.
				170	<p>
				171	<hr width="100%">
				172
				173	<a name="howtouse"></a>
				174	<h2>2  How to use it, and how to make sense of the results</h2>
				175
				176	<a name="starta"></a>
				177	<h3>2.1  Getting started</h3>
				178
				179	First off, consider whether it might be beneficial to recompile your
				180	application and supporting libraries with optimisation disabled and
				181	debugging info enabled (the <code>-g</code> flag). You don't have to
				182	do this, but doing so helps Valgrind produce more accurate and less
				183	confusing error reports. Chances are you're set up like this already,
				184	if you intended to debug your program with GNU gdb, or some other
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	185	debugger.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	186
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	187	<p>
				188	A plausible compromise is to use <code>-g -O</code>.
				189	Optimisation levels above <code>-O</code> have been observed, on very
				190	rare occasions, to cause gcc to generate code which fools Valgrind's
				191	error tracking machinery into wrongly reporting uninitialised value
				192	errors. <code>-O</code> gets you the vast majority of the benefits of
				193	higher optimisation levels anyway, so you don't lose much there.
				194
				195	<p>
				196	Note that as of 1 May 2002 Valgrind does not understand the DWARF
				197	debugging format, which is unfortunate since the upcoming gcc-3.1 uses
				198	it by default. Valgrind only knows about the older "stabs" format.
				199	If you use gcc-3.1 or above, you can still ask for stabs-format debug
				200	info by passing <code>-gstabs</code> to gcc.
				201
				202	<p>
				203	Then just run your application, but place the word
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	204	<code>valgrind</code> in front of your usual command-line invocation.
				205	Note that you should run the real (machine-code) executable here. If
				206	your application is started by, for example, a shell or perl script,
				207	you'll need to modify it to invoke Valgrind on the real executables.
				208	Running such scripts directly under Valgrind will result in you
				209	getting error reports pertaining to <code>/bin/sh</code>,
				210	<code>/usr/bin/perl</code>, or whatever interpreter you're using.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	211	This almost certainly isn't what you want and can be confusing.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	212
				213	<a name="comment"></a>
				214	<h3>2.2  The commentary</h3>
				215
				216	Valgrind writes a commentary, detailing error reports and other
				217	significant events. The commentary goes to standard error by
				218	default. This may interfere with your program, so you can ask for it
				219	to be directed elsewhere.
				220
				221	<p>All lines in the commentary are of the following form:<br>
				222	<pre>
				223	==12345== some-message-from-Valgrind
				224	</pre>
				225	<p>The <code>12345</code> is the process ID. This scheme makes it easy
				226	to distinguish program output from Valgrind commentary, and also easy
				227	to differentiate commentaries from different processes which have
				228	become merged together, for whatever reason.
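<p>Because every commentary line carries the <code>==pid==</code> prefix, a
log filter can separate commentary from program output mechanically. The
following C sketch shows one way to do that. The function name
<code>commentary_pid</code> is hypothetical, and the sketch assumes the
prefix is always two equals signs, decimal digits, and two equals signs, as in
the example above.</p>

```c
#include <ctype.h>
#include <stddef.h>

/* If line starts with "==<digits>==", return the process ID and, when
   msg is non-NULL, point *msg at the text after the prefix.
   Otherwise return -1 and leave *msg untouched. */
long commentary_pid(const char *line, const char **msg)
{
    const char *p = line;
    long pid = 0;

    if (p[0] != '=' || p[1] != '=')
        return -1;
    p += 2;
    if (!isdigit((unsigned char)*p))
        return -1;
    while (isdigit((unsigned char)*p)) {
        if (pid > 99999999L)             /* implausibly long: avoid overflow */
            return -1;
        pid = pid * 10 + (*p - '0');
        p++;
    }
    if (p[0] != '=' || p[1] != '=')
        return -1;
    p += 2;
    if (*p == ' ')                       /* skip the single separating space */
        p++;
    if (msg != NULL)
        *msg = p;
    return pid;
}
```

<p>A caller would feed it one line at a time and treat any line for which it
returns -1 as ordinary program output.</p>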
				229
				230	<p>By default, Valgrind writes only essential messages to the commentary,
				231	so as to avoid flooding you with information of secondary importance.
				232	If you want more information about what is happening, re-run, passing
				233	the <code>-v</code> flag to Valgrind.
				234
				235
				236	<a name="report"></a>
				237	<h3>2.3  Reporting of errors</h3>
				238
				239	When Valgrind detects something bad happening in the program, an error
				240	message is written to the commentary. For example:<br>
				241	<pre>
				242	==25832== Invalid read of size 4
				243	==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
				244	==25832== by 0x80487AF: main (bogon.cpp:66)
				245	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				246	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				247	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				248	</pre>
				249
				250	<p>This message says that the program did an illegal 4-byte read of
				251	address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
				252	address, nor corresponds to any currently malloc'd or free'd blocks.
				253	The read is happening at line 45 of <code>bogon.cpp</code>, called
				254	from line 66 of the same file, etc. For errors associated with an
				255	identified malloc'd/free'd block, for example reading free'd memory,
				256	Valgrind reports not only the location where the error happened, but
				257	also where the associated block was malloc'd/free'd.
				258
				259	<p>Valgrind remembers all error reports. When an error is detected,
				260	it is compared against old reports, to see if it is a duplicate. If
				261	so, the error is noted, but no further commentary is emitted. This
				262	avoids you being swamped with bazillions of duplicate error reports.
				263
				264	<p>If you want to know how many times each error occurred, run with
				265	the <code>-v</code> option. When execution finishes, all the reports
				266	are printed out, along with, and sorted by, their occurrence counts.
				267	This makes it easy to see which errors have occurred most frequently.
				268
				269	<p>Errors are reported before the associated operation actually
				270	happens. For example, if your program decides to read from address
				271	zero, Valgrind will emit a message to this effect, and the program
				272	will then duly die with a segmentation fault.
				273
				274	<p>In general, you should try to fix errors in the order that they
				275	are reported. Not doing so can be confusing. For example, a program
				276	which copies uninitialised values to several memory locations, and
				277	later uses them, will generate several error messages. The first such
				278	error message may well give the most direct clue to the root cause of
				279	the problem.
				280
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	281	<p>The process of detecting duplicate errors is quite an expensive
				282	one and can become a significant performance overhead if your program
				283	generates huge quantities of errors. To avoid serious problems here,
				284	Valgrind will simply stop collecting errors after 300 different errors
				285	have been seen, or 30000 errors in total have been seen. In this
				286	situation you might as well stop your program and fix it, because
				287	Valgrind won't tell you anything else useful after this. Note that
				288	the 300/30000 limits apply after suppressed errors are removed. These
				289	limits are defined in <code>vg_include.h</code> and can be increased
				290	if necessary.
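<p>The duplicate check and the two limits can be pictured with the C sketch
below. It is not Valgrind's implementation: the type and function names are
invented, and it assumes an error is identified purely by its top three code
addresses, which is a simplification of the matching described under
<code>--num-callers</code>.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DISTINCT 300      /* limits quoted in the text above */
#define MAX_TOTAL    30000
#define KEY_DEPTH    3        /* assumed: top three locations form the key */

typedef struct {
    unsigned long frames[KEY_DEPTH];
    unsigned long count;      /* how many times this error occurred */
} ErrorRecord;

typedef struct {
    ErrorRecord records[MAX_DISTINCT];
    size_t distinct;          /* number of different errors seen */
    unsigned long total;      /* number of errors seen in total */
} ErrorLog;

/* Returns true if this error is new and should be printed. Duplicates are
   counted but not printed; once either limit is reached, nothing more is
   collected. */
bool record_error(ErrorLog *log, const unsigned long frames[KEY_DEPTH])
{
    if (log->distinct >= MAX_DISTINCT || log->total >= MAX_TOTAL)
        return false;
    log->total++;
    for (size_t i = 0; i < log->distinct; i++) {
        if (memcmp(log->records[i].frames, frames,
                   sizeof log->records[i].frames) == 0) {
            log->records[i].count++;
            return false;
        }
    }
    memcpy(log->records[log->distinct].frames, frames,
           sizeof log->records[log->distinct].frames);
    log->records[log->distinct].count = 1;
    log->distinct++;
    return true;
}
```

<p>The linear scan over earlier records is what makes the check expensive
when a program produces very many errors, which is the motivation for capping
the table size.</p>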
				291
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	292	<a name="suppress"></a>
				293	<h3>2.4  Suppressing errors</h3>
				294
				295	Valgrind detects numerous problems in the base libraries, such as the
				296	GNU C library, and the XFree86 client libraries, which come
				297	pre-installed on your GNU/Linux system. You can't easily fix these,
				298	but you don't want to see these errors (and yes, there are many!) So
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	299	Valgrind reads a list of errors to suppress at startup.
				300	A default suppression file is cooked up by the
				301	<code>./configure</code> script.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	302
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	303	<p>You can modify and add to the suppressions file at your leisure,
				304	or, better, write your own. Multiple suppression files are allowed.
				305	This is useful if part of your project contains errors you can't or
				306	don't want to fix, yet you don't want to continuously be reminded of
				307	them.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	308
				309	<p>Each error to be suppressed is described very specifically, to
				310	minimise the possibility that a suppression-directive inadvertently
				311	suppresses a bunch of similar errors which you did want to see. The
				312	suppression mechanism is designed to allow precise yet flexible
				313	specification of errors to suppress.
				314
				315	<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
				316	prints out one line for each used suppression, giving its name and the
				317	number of times it got used. Here's the suppressions used by a run of
				318	<code>ls -l</code>:
				319	<pre>
				320	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
				321	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
				322	--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
				323	</pre>
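<p>If you want to post-process these lines, for instance to total up
suppression usage across a test suite, a sketch along the following lines may
help. The function name is hypothetical, and the sketch assumes every such
line has exactly the shape shown above: <code>--pid-- supp:</code>, then a
count, then a name containing no spaces.</p>

```c
#include <stdio.h>

#define SUPP_NAME_MAX 256

/* Parses one "--<pid>-- supp: <count> <name>" line.
   Returns 1 and fills in *count and name on success, 0 otherwise. */
int parse_supp_line(const char *line, unsigned long *count,
                    char name[SUPP_NAME_MAX])
{
    long pid;   /* read to validate the prefix, then discarded */

    return sscanf(line, "--%ld-- supp: %lu %255s", &pid, count, name) == 3;
}
```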
				324
				325	<a name="flags"></a>
				326	<h3>2.5  Command-line flags</h3>
				327
				328	You invoke Valgrind like this:
				329	<pre>
				330	valgrind [options-for-Valgrind] your-prog [options for your-prog]
				331	</pre>
				332
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	333	<p>Note that Valgrind also reads options from the environment variable
				334	<code>$VALGRIND</code>, and processes them before the command-line
				335	options.
				336
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	337	<p>Valgrind's default settings succeed in giving reasonable behaviour
				338	in most cases. Available options, in no particular order, are as
				339	follows:
				340	<ul>
				341	<li><code>--help</code></li><br>
				342
				343	<li><code>--version</code><br>
				344	<p>The usual deal.</li><br><p>
				345
				346	<li><code>-v --verbose</code><br>
				347	<p>Be more verbose. Gives extra information on various aspects
				348	of your program, such as: the shared objects loaded, the
				349	suppressions used, the progress of the instrumentation engine,
				350	and warnings about unusual behaviour.
				351	</li><br><p>
				352
				353	<li><code>-q --quiet</code><br>
				354	<p>Run silently, and only print error messages. Useful if you
				355	are running regression tests or have some other automated test
				356	machinery.
				357	</li><br><p>
				358
				359	<li><code>--demangle=no</code><br>
				360	<code>--demangle=yes</code> [the default]
				361	<p>Disable/enable automatic demangling (decoding) of C++ names.
				362	Enabled by default. When enabled, Valgrind will attempt to
				363	translate encoded C++ procedure names back to something
				364	approaching the original. The demangler handles symbols mangled
				365	by g++ versions 2.X and 3.X.
				366
				367	<p>An important fact about demangling is that function
				368	names mentioned in suppressions files should be in their mangled
				369	form. Valgrind does not demangle function names when searching
				370	for applicable suppressions, because to do otherwise would make
				371	suppressions file contents dependent on the state of Valgrind's
				372	demangling machinery, and would also be slow and pointless.
				373	</li><br><p>
				374
				375	<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
				376	<p>By default, Valgrind shows four levels of function call names
				377	to help you identify program locations. You can change that
				378	number with this option. This can help in determining the
				379	program's location in deeply-nested call chains. Note that errors
				380	are merged as duplicates using only the top three function locations (the
				381	place in the current function, and that of its two immediate
				382	callers). So this doesn't affect the total number of errors
				383	reported.
				384	<p>
				385	The maximum value for this is 50. Note that higher settings
				386	will make Valgrind run a bit more slowly and take a bit more
				387	memory, but can be useful when working with programs with
				388	deeply-nested call chains.
				389	</li><br><p>
				390
				391	<li><code>--gdb-attach=no</code> [the default]<br>
				392	<code>--gdb-attach=yes</code>
				393	<p>When enabled, Valgrind will pause after every error shown,
				394	and print the line
				395	<br>
				396	<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
				397	<p>
				398	Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
				399	or <code>n</code> <code>Ret</code>, causes Valgrind not to
				400	start GDB for this error.
				401	<p>
				402	<code>Y</code> <code>Ret</code>
				403	or <code>y</code> <code>Ret</code> causes Valgrind to
				404	start GDB, for the program at this point. When you have
				405	finished with GDB, quit from it, and the program will continue.
				406	Trying to continue from inside GDB doesn't work.
				407	<p>
				408	<code>C</code> <code>Ret</code>
				409	or <code>c</code> <code>Ret</code> causes Valgrind not to
				410	start GDB, and not to ask again.
				411	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	412	<code>--gdb-attach=yes</code> conflicts with
				413	<code>--trace-children=yes</code>. You can't use them together.
				414	Valgrind refuses to start up in this situation. 1 May 2002:
				415	this is a historical relic which could be easily fixed if it
				416	gets in your way. Mail me and complain if this is a problem for
				417	you. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	418
				419	<li><code>--partial-loads-ok=yes</code> [the default]<br>
				420	<code>--partial-loads-ok=no</code>
				421	<p>Controls how Valgrind handles word (4-byte) loads from
				422	addresses for which some bytes are addressable and others
				423	are not. When <code>yes</code> (the default), such loads
				424	do not elicit an address error. Instead, the loaded V bytes
				425	corresponding to the illegal addresses indicate undefined, and
				426	those corresponding to legal addresses are loaded from shadow
				427	memory, as usual.
				428	<p>
				429	When <code>no</code>, loads from partially
				430	invalid addresses are treated the same as loads from completely
				431	invalid addresses: an illegal-address error is issued,
				432	and the resulting V bytes indicate valid data.
				433	</li><br><p>
				434
				435	<li><code>--sloppy-malloc=no</code> [the default]<br>
				436	<code>--sloppy-malloc=yes</code>
				437	<p>When enabled, all requests for malloc/calloc are rounded up
				438	to a whole number of machine words -- in other words, made
				439	divisible by 4. For example, a request for 17 bytes of space
				440	would result in a 20-byte area being made available. This works
				441	around bugs in sloppy libraries which assume that they can
				442	safely rely on malloc/calloc requests being rounded up in this
				443	fashion. Without the workaround, these libraries tend to
				444	generate large numbers of errors when they access the ends of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	445	these areas.
				446	<p>
				447	Valgrind snapshots dated 17 Feb 2002 and later are
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	448	cleverer about this problem, and you should no longer need to
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	449	use this flag. To put it bluntly, if you do need to use this
				450	flag, your program violates the ANSI C semantics defined for
				451	<code>malloc</code> and <code>free</code>, even if it appears to
				452	work correctly, and you should fix it, at least if you hope for
				453	maximum portability.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	454	</li><br><p>
				455
				456	<li><code>--trace-children=no</code> [the default]<br>
				457	<code>--trace-children=yes</code>
				458	<p>When enabled, Valgrind will trace into child processes. This
				459	is confusing and usually not what you want, so is disabled by
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	460	default. As of 1 May 2002, tracing into a child process from a
				461	parent which uses <code>libpthread.so</code> is probably broken.
				462	Please report any such
				463	problems to me. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	464
				465	<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
				466	<p>When the client program releases memory using free (in C) or
				467	delete (C++), that memory is not immediately made available for
				468	re-allocation. Instead it is marked inaccessible and placed in
				469	a queue of freed blocks. The purpose is to delay the point at
				470	which freed-up memory comes back into circulation. This
				471	increases the chance that Valgrind will be able to detect
				472	invalid accesses to blocks for some significant period of time
				473	after they have been freed.
				474	<p>
				475	This flag specifies the maximum total size, in bytes, of the
				476	blocks in the queue. The default value is one million bytes.
				477	Increasing this increases the total amount of memory used by
				478	Valgrind but may detect invalid uses of freed blocks which would
				479	otherwise go undetected.</li><br><p>
				480
				481	<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
				482	<p>Specifies the file descriptor on which Valgrind communicates
				483	all of its messages. The default, 2, is the standard error
				484	channel. This may interfere with the client's own use of
				485	stderr. To dump Valgrind's commentary in a file without using
				486	stderr, something like the following works well (sh/bash
				487	syntax):<br>
				488	<code>
				489	valgrind --logfile-fd=9 my_prog 9> logfile</code><br>
				490	That is: tell Valgrind to send all output to file descriptor 9,
				491	and ask the shell to route file descriptor 9 to "logfile".
				492	</li><br><p>
				493
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	494	<li><code>--suppressions=&lt;filename&gt;</code>
				495	[default: $PREFIX/lib/valgrind/default.supp]
				496	<p>Specifies an extra
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	497	file from which to read descriptions of errors to suppress. You
				498	may use as many extra suppressions files as you
				499	like.</li><br><p>
				500
				501	<li><code>--leak-check=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	502	<code>--leak-check=yes</code>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	503	<p>When enabled, search for memory leaks when the client program
				504	finishes. A memory leak means a malloc'd block, which has not
				505	yet been free'd, but to which no pointer can be found. Such a
				506	block can never be free'd by the program, since no pointer to it
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	507	exists. Leak checking is disabled by default because it tends
				508	to generate dozens of error messages. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	509
				510	<li><code>--show-reachable=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	511	<code>--show-reachable=yes</code>
				512	<p>When disabled, the memory leak detector only shows blocks to
				513	which it can find no pointer at all, or only a pointer into the
				514	middle. These blocks are prime candidates for
				515	memory leaks. When enabled, the leak detector also reports on
				516	blocks which it could find a pointer to. Your program could, at
				517	least in principle, have freed such blocks before exit.
				518	Contrast this to blocks for which no pointer, or only an
				519	interior pointer could be found: they are more likely to
				520	indicate memory leaks, because you do not actually have a
				521	pointer to the start of the block which you can hand to
				522	<code>free</code>, even if you wanted to. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	523
				524	<li><code>--leak-resolution=low</code> [default]<br>
				525	<code>--leak-resolution=med</code> <br>
				526	<code>--leak-resolution=high</code>
				527	<p>When doing leak checking, determines how willing Valgrind is
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	528	to consider different backtraces to be the same. When set to
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	529	<code>low</code>, the default, only the first two entries need
				530	match. When <code>med</code>, four entries have to match. When
				531	<code>high</code>, all entries need to match.
				532	<p>
				533	For hardcore leak debugging, you probably want to use
				534	<code>--leak-resolution=high</code> together with
				535	<code>--num-callers=40</code> or some such large number. Note
				536	however that this can give an overwhelming amount of
				537	information, which is why the defaults are 4 callers and
				538	low-resolution matching.
				539	<p>
				540	Note that the <code>--leak-resolution=</code> setting does not
				541	affect Valgrind's ability to find leaks. It only changes how
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	542	the results are presented.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	543	</li><br><p>
				544
				545	<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
				546	<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
				547	assume that reads and writes some small distance below the stack
				548	pointer <code>%esp</code> are due to bugs in gcc 2.96, and do
				549	not report them. The "small distance" is 256 bytes by default.
				550	Note that gcc 2.96 is the default compiler on some popular Linux
				551	distributions (RedHat 7.X, Mandrake) and so you may well need to
				552	use this flag. Do not use it if you do not have to, as it can
				553	cause real errors to be overlooked. A better option is to use a
				554	gcc/g++ which works properly; 2.95.3 seems to be a good choice.
				555	<p>
				556	Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	557	buggy, so you may need to issue this flag if you use 3.0.4. A
				558	while later (early Apr 02) this was confirmed as a scheduling bug
				559	in g++-3.0.4.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	560	</li><br><p>
				561
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	562	<li><code>--cachesim=no</code> [default]<br>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	563	<code>--cachesim=yes</code> <p>When enabled, turns off memory
				564	checking, and turns on cache profiling. Cache profiling is
				565	described in detail in <a href="#cache">Section 7</a>. </li><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	566	</ul>
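<p>The delayed-release scheme described under <code>--freelist-vol</code>
above can be sketched in C as follows. This is an illustration only, not
Valgrind's code: the names are invented, and the sketch assumes the oldest
blocks are released first whenever the queued volume exceeds the limit. The
step of marking a queued block inaccessible is omitted because it has no
portable C equivalent.</p>

```c
#include <stdlib.h>

typedef struct FreedBlock {
    struct FreedBlock *next;
    void *payload;            /* the block the client asked to free */
    size_t size;              /* its size in bytes */
} FreedBlock;

typedef struct {
    FreedBlock *head;         /* oldest queued block */
    FreedBlock *tail;         /* newest queued block */
    size_t volume;            /* total bytes currently queued */
    size_t max_volume;        /* the limit, e.g. 1000000 */
} FreeQueue;

/* Queues payload instead of releasing it at once, then releases the oldest
   queued blocks until the total volume is within the limit again.
   Returns 0 on success, or -1 if the bookkeeping node could not be
   allocated, in which case payload is released immediately. */
int deferred_free(FreeQueue *q, void *payload, size_t size)
{
    FreedBlock *node = malloc(sizeof *node);

    if (node == NULL) {
        free(payload);
        return -1;
    }
    node->next = NULL;
    node->payload = payload;
    node->size = size;
    if (q->tail != NULL)
        q->tail->next = node;
    else
        q->head = node;
    q->tail = node;
    q->volume += size;

    while (q->volume > q->max_volume && q->head != NULL) {
        FreedBlock *old = q->head;

        q->head = old->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->volume -= old->size;
        free(old->payload);   /* only now does the memory recirculate */
        free(old);
    }
    return 0;
}
```

<p>A larger limit keeps freed blocks out of circulation for longer, which is
why raising it costs memory but can catch more late accesses to freed
blocks.</p>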
				567
				568	There are also some options for debugging Valgrind itself. You
				569	shouldn't need to use them in the normal run of things. Nevertheless:
				570
				571	<ul>
				572
				573	<li><code>--single-step=no</code> [default]<br>
				574	<code>--single-step=yes</code>
				575	<p>When enabled, each x86 insn is translated separately into
				576	instrumented code. When disabled, translation is done on a
				577	per-basic-block basis, giving much better translations.</li><br>
				578	<p>
				579
				580	<li><code>--optimise=no</code><br>
				581	<code>--optimise=yes</code> [default]
				582	<p>When enabled, various improvements are applied to the
				583	intermediate code, mainly aimed at allowing the simulated CPU's
				584	registers to be cached in the real CPU's registers over several
				585	simulated instructions.</li><br>
				586	<p>
				587
				588	<li><code>--instrument=no</code><br>
				589	<code>--instrument=yes</code> [default]
				590	<p>When disabled, the translations don't actually contain any
				591	instrumentation.</li><br>
				592	<p>
				593
				594	<li><code>--cleanup=no</code><br>
				595	<code>--cleanup=yes</code> [default]
<p>When enabled, various improvements are applied to the
				597	post-instrumented intermediate code, aimed at removing redundant
				598	value checks.</li><br>
				599	<p>
				600
				601	<li><code>--trace-syscalls=no</code> [default]<br>
				602	<code>--trace-syscalls=yes</code>
				603	<p>Enable/disable tracing of system call intercepts.</li><br>
				604	<p>
				605
				606	<li><code>--trace-signals=no</code> [default]<br>
				607	<code>--trace-signals=yes</code>
				608	<p>Enable/disable tracing of signal handling.</li><br>
				609	<p>
				610
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	611	<li><code>--trace-sched=no</code> [default]<br>
				612	<code>--trace-sched=yes</code>
				613	<p>Enable/disable tracing of thread scheduling events.</li><br>
				614	<p>
				615
sewardj	45b4b37	2002-04-16 22:50:32 +0000	[diff] [blame]	616	<li><code>--trace-pthread=none</code> [default]<br>
				617	<code>--trace-pthread=some</code> <br>
				618	<code>--trace-pthread=all</code>
				619	<p>Specifies amount of trace detail for pthread-related events.</li><br>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	620	<p>
				621
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	622	<li><code>--trace-symtab=no</code> [default]<br>
				623	<code>--trace-symtab=yes</code>
				624	<p>Enable/disable tracing of symbol table reading.</li><br>
				625	<p>
				626
				627	<li><code>--trace-malloc=no</code> [default]<br>
				628	<code>--trace-malloc=yes</code>
				629	<p>Enable/disable tracing of malloc/free (et al) intercepts.
				630	</li><br>
				631	<p>
				632
				633	<li><code>--stop-after=<number></code>
				634	[default: infinity, more or less]
				635	<p>After <number> basic blocks have been executed, shut down
				636	Valgrind and switch back to running the client on the real CPU.
				637	</li><br>
				638	<p>
				639
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	640	<li><code>--dump-error=<number></code> [default: inactive]
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	641	<p>After the program has exited, show gory details of the
				642	translation of the basic block containing the <number>'th
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	643	error context. When used with <code>--single-step=yes</code>,
				644	can show the exact x86 instruction causing an error. This is
				645	all fairly dodgy and doesn't work at all if threads are
				646	involved.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	647	<p>
				648
				649	<li><code>--smc-check=none</code><br>
				650	<code>--smc-check=some</code> [default]<br>
				651	<code>--smc-check=all</code>
				652	<p>How carefully should Valgrind check for self-modifying code
				653	writes, so that translations can be discarded?  When
				654	"none", no writes are checked. When "some", only writes
				655	resulting from moves from integer registers to memory are
				656	checked. When "all", all memory writes are checked, even those
with which no sane program would generate code -- for
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	658	example, floating-point writes.
				659	<p>
				660	NOTE that this is all a bit bogus. This mechanism has never
				661	been enabled in any snapshot of Valgrind which was made
				662	available to the general public, because the extra checks reduce
				663	performance, increase complexity, and I have yet to come across
				664	any programs which actually use self-modifying code. I think
				665	the flag is ignored.
				666	</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	667	</ul>
				668
				669
<a name="errormsgs"></a>
<h3>2.6  Explanation of error messages</h3>
				672
Despite considerable sophistication under the hood, Valgrind can only
really detect two kinds of errors: use of illegal addresses, and use
of undefined values. Nevertheless, this is enough to help you
				676	discover all sorts of memory-management nasties in your code. This
				677	section presents a quick summary of what error messages mean. The
				678	precise behaviour of the error-checking machinery is described in
				679	<a href="#machine">Section 4</a>.
				680
				681
				682	<h4>2.6.1  Illegal read / Illegal write errors</h4>
				683	For example:
				684	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	685	Invalid read of size 4
				686	at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
				687	by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
				688	by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
				689	by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
				690	Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	691	</pre>
				692
				693	<p>This happens when your program reads or writes memory at a place
				694	which Valgrind reckons it shouldn't. In this example, the program did
				695	a 4-byte read at address 0xBFFFF0E0, somewhere within the
				696	system-supplied library libpng.so.2.1.0.9, which was called from
				697	somewhere else in the same library, called from line 326 of
				698	qpngio.cpp, and so on.
				699
				700	<p>Valgrind tries to establish what the illegal address might relate
				701	to, since that's often useful. So, if it points into a block of
				702	memory which has already been freed, you'll be informed of this, and
also where the block was free'd. Likewise, if it should turn out
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	704	to be just off the end of a malloc'd block, a common result of
				705	off-by-one-errors in array subscripting, you'll be informed of this
				706	fact, and also where the block was malloc'd.
				707
				708	<p>In this example, Valgrind can't identify the address. Actually the
				709	address is on the stack, but, for some reason, this is not a valid
				710	stack address -- it is below the stack pointer, %esp, and that isn't
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	711	allowed. In this particular case it's probably caused by gcc
				712	generating invalid code, a known bug in various flavours of gcc.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	713
				714	<p>Note that Valgrind only tells you that your program is about to
				715	access memory at an illegal address. It can't stop the access from
				716	happening. So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
				718	fate -- but you will get a message from Valgrind immediately prior to
				719	this. In this particular example, reading junk on the stack is
				720	non-fatal, and the program stays alive.
				721
				722
				723	<h4>2.6.2  Use of uninitialised values</h4>
				724	For example:
				725	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	726	Conditional jump or move depends on uninitialised value(s)
				727	at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
				728	by 0x402E8476: _IO_printf (printf.c:36)
				729	by 0x8048472: main (tests/manuel1.c:8)
				730	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	731	</pre>
				732
				733	<p>An uninitialised-value use error is reported when your program uses
				734	a value which hasn't been initialised -- in other words, is undefined.
				735	Here, the undefined value is used somewhere inside the printf()
				736	machinery of the C library. This error was reported when running the
				737	following small program:
				738	<pre>
#include <stdio.h>

int main(void)
{
int x;                    /* deliberately left uninitialised */
printf ("x = %d\n", x);
return 0;
}
				744	</pre>
				745
				746	<p>It is important to understand that your program can copy around
				747	junk (uninitialised) data to its heart's content. Valgrind observes
				748	this and keeps track of the data, but does not complain. A complaint
				749	is issued only when your program attempts to make use of uninitialised
				750	data. In this example, x is uninitialised. Valgrind observes the
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	751	value being passed to _IO_printf and thence to _IO_vfprintf, but makes
				752	no comment. However, _IO_vfprintf has to examine the value of x so it
				753	can turn it into the corresponding ASCII string, and it is at this
				754	point that Valgrind complains.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	755
				756	<p>Sources of uninitialised data tend to be:
				757	<ul>
				758	<li>Local variables in procedures which have not been initialised,
				759	as in the example above.</li><br><p>
				760
				761	<li>The contents of malloc'd blocks, before you write something
				762	there. In C++, the new operator is a wrapper round malloc, so
				763	if you create an object with new, its fields will be
				764	uninitialised until you fill them in, which is only Right and
				765	Proper.</li>
				766	</ul>
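
<p>The distinction between copying undefined data and using it can be
sketched as follows. This is only an illustration: the function name
<code>copy_then_use</code> is made up for this example, and the comments
describe the behaviour the text above leads one to expect rather than
output captured from a real run.

```c
#include <stdlib.h>
#include <string.h>

/* Copies an uninitialised malloc'd block, then defines the copy before
   any value in it is examined.  Returns the sum of the copy's bytes,
   or -1 if allocation fails. */
int copy_then_use(size_t n)
{
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    int sum = 0;
    size_t i;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    /* Copying junk: the undefinedness travels with the data, but by
       the rule described above this alone draws no complaint. */
    memcpy(dst, src, n);

    /* Summing dst right now and then testing the result, for example
       with "if (sum > 0)", would make a decision depend on undefined
       data.  Define the bytes first instead. */
    memset(dst, 1, n);

    for (i = 0; i < n; i++)
        sum += dst[i];

    free(src);
    free(dst);
    return sum;
}
```

<p>Calling <code>copy_then_use(10)</code> returns 10, and nothing in it
makes use of a value that is still undefined.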
				767
				768
				769
				770	<h4>2.6.3  Illegal frees</h4>
				771	For example:
				772	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	773	Invalid free()
				774	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				775	by 0x80484C7: main (tests/doublefree.c:10)
				776	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				777	by 0x80483B1: (within tests/doublefree)
				778	Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
				779	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				780	by 0x80484C7: main (tests/doublefree.c:10)
				781	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				782	by 0x80483B1: (within tests/doublefree)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	783	</pre>
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether or not the argument to
free/delete is legitimate. Here, this test program has
freed the same block twice. As with the illegal read/write errors,
Valgrind attempts to make sense of the address free'd. If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
				791
				792
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	793	<h4>2.6.4  When a block is freed with an inappropriate
				794	deallocation function</h4>
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame]	795	In the following example, a block allocated with <code>new []</code>
				796	has wrongly been deallocated with <code>free</code>:
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	797	<pre>
				798	Mismatched free() / delete / delete []
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame]	799	at 0x40043249: free (vg_clientfuncs.c:171)
				800	by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
				801	by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
				802	by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
				803	Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
				804	at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
				805	by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
				806	by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
				807	by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	808	</pre>
The following was told to me by the KDE 3 developers. I didn't know
				810	any of it myself. They also implemented the check itself.
				811	<p>
				812	In C++ it's important to deallocate memory in a way compatible with
				813	how it was allocated. The deal is:
				814	<ul>
				815	<li>If allocated with <code>malloc</code>, <code>calloc</code>,
				816	<code>realloc</code>, <code>valloc</code> or
				817	<code>memalign</code>, you must deallocate with <code>free</code>.
				818	<li>If allocated with <code>new []</code>, you must deallocate with
				819	<code>delete []</code>.
				820	<li>If allocated with <code>new</code>, you must deallocate with
				821	<code>delete</code>.
				822	</ul>
				823	The worst thing is that on Linux apparently it doesn't matter if you
				824	do muddle these up, and it all seems to work ok, but the same program
				825	may then crash on a different platform, Solaris for example. So it's
				826	best to fix it properly. According to the KDE folks "it's amazing how
				827	many C++ programmers don't know this".
				828
				829
				830
				831	<h4>2.6.5  Passing system call parameters with inadequate
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	832	read/write permissions</h4>
				833
				834	Valgrind checks all parameters to system calls. If a system call
				835	needs to read from a buffer provided by your program, Valgrind checks
that the entire buffer is addressable and has valid data, i.e., it is
readable. And if the system call needs to write to a user-supplied
buffer, Valgrind checks that the buffer is addressable. After the
				839	system call, Valgrind updates its administrative information to
				840	precisely reflect any changes in memory permissions caused by the
				841	system call.
				842
				843	<p>Here's an example of a system call with an invalid parameter:
				844	<pre>
				845	#include <stdlib.h>
				846	#include <unistd.h>
				847	int main( void )
				848	{
				849	char* arr = malloc(10);
				850	(void) write( 1 /* stdout */, arr, 10 );
				851	return 0;
				852	}
				853	</pre>
				854
				855	<p>You get this complaint ...
				856	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	857	Syscall param write(buf) contains uninitialised or unaddressable byte(s)
				858	at 0x4035E072: __libc_write
				859	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				860	by 0x80483B1: (within tests/badwrite)
				861	by <bogus frame pointer> ???
				862	Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
				863	at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
				864	by 0x80484A0: main (tests/badwrite.c:6)
				865	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				866	by 0x80483B1: (within tests/badwrite)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	867	</pre>
				868
				869	<p>... because the program has tried to write uninitialised junk from
				870	the malloc'd block to the standard output.
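
<p>One way to avoid this complaint is to give every byte of the buffer
a defined value before handing it to the system call. The sketch below
assumes a helper named <code>write_defined_block</code>, which is
invented for this example and is not part of the test program above.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocates n bytes, defines every one of them, and writes them to fd.
   Returns what write() returned, or -1 if allocation fails. */
ssize_t write_defined_block(int fd, size_t n)
{
    char *arr = malloc(n);
    ssize_t written;

    if (arr == NULL)
        return -1;

    memset(arr, 'x', n);          /* the whole buffer now holds valid data */
    written = write(fd, arr, n);  /* buf is both addressable and defined   */

    free(arr);
    return written;
}
```

<p>With the <code>memset</code> in place, the buffer passed as
<code>write(buf)</code> is readable in the sense described above.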
				871
				872
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	873	<h4>2.6.6  Warning messages you might see</h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	874
				875	Most of these only appear if you run in verbose mode (enabled by
				876	<code>-v</code>):
				877	<ul>
				878	<li> <code>More than 50 errors detected. Subsequent errors
				879	will still be recorded, but in less detail than before.</code>
				880	<br>
				881	After 50 different errors have been shown, Valgrind becomes
				882	more conservative about collecting them. It then requires only
				883	the program counters in the top two stack frames to match when
				884	deciding whether or not two errors are really the same one.
				885	Prior to this point, the PCs in the top four frames are required
				886	to match. This hack has the effect of slowing down the
				887	appearance of new errors after the first 50. The 50 constant can
				888	be changed by recompiling Valgrind.
				889	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	890	<li> <code>More than 300 errors detected. I'm not reporting any more.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	891	Final error counts may be inaccurate. Go fix your
				892	program!</code>
				893	<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	894	After 300 different errors have been detected, Valgrind ignores
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	895	any more. It seems unlikely that collecting even more different
				896	ones would be of practical help to anybody, and it avoids the
				897	danger that Valgrind spends more and more of its time comparing
new errors against an ever-growing collection. As above, the 300
number is a compile-time constant.
				900	<p>
				901	<li> <code>Warning: client exiting by calling exit(<number>).
				902	Bye!</code>
				903	<br>
				904	Your program has called the <code>exit</code> system call, which
				905	will immediately terminate the process. You'll get no exit-time
				906	error summaries or leak checks. Note that this is not the same
				907	as your program calling the ANSI C function <code>exit()</code>
				908	-- that causes a normal, controlled shutdown of Valgrind.
				909	<p>
				910	<li> <code>Warning: client switching stacks?</code>
				911	<br>
				912	Valgrind spotted such a large change in the stack pointer, %esp,
				913	that it guesses the client is switching to a different stack.
				914	At this point it makes a kludgey guess where the base of the new
				915	stack is, and sets memory permissions accordingly. You may get
				916	many bogus error messages following this, if Valgrind guesses
				917	wrong. At the moment "large change" is defined as a change of
more than 2000000 in the value of the %esp (stack pointer)
				919	register.
				920	<p>
				921	<li> <code>Warning: client attempted to close Valgrind's logfile fd <number>
				922	</code>
				923	<br>
				924	Valgrind doesn't allow the client
				925	to close the logfile, because you'd never see any diagnostic
				926	information after that point. If you see this message,
				927	you may want to use the <code>--logfile-fd=<number></code>
				928	option to specify a different logfile file-descriptor number.
				929	<p>
				930	<li> <code>Warning: noted but unhandled ioctl <number></code>
				931	<br>
				932	Valgrind observed a call to one of the vast family of
				933	<code>ioctl</code> system calls, but did not modify its
				934	memory status info (because I have not yet got round to it).
				935	The call will still have gone through, but you may get spurious
				936	errors after this as a result of the non-update of the memory info.
				937	<p>
				938	<li> <code>Warning: unblocking signal <number> due to
				939	sigprocmask</code>
				940	<br>
				941	Really just a diagnostic from the signal simulation machinery.
				942	This message will appear if your program handles a signal by
				943	first <code>longjmp</code>ing out of the signal handler,
				944	and then unblocking the signal with <code>sigprocmask</code>
				945	-- a standard signal-handling idiom.
				946	<p>
				947	<li> <code>Warning: bad signal number <number> in __NR_sigaction.</code>
				948	<br>
				949	Probably indicates a bug in the signal simulation machinery.
				950	<p>
				951	<li> <code>Warning: set address range perms: large range <number></code>
				952	<br>
				953	Diagnostic message, mostly for my benefit, to do with memory
				954	permissions.
				955	</ul>
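
<p>The duplicate-detection rule behind the "More than 50 errors"
warning can be sketched like this. The type <code>ErrorContext</code>
and both function names are hypothetical, chosen for illustration;
Valgrind's own data structures are not shown in this manual and may
look quite different.

```c
#define MAX_FRAMES 4   /* frames compared before the 50-error threshold */

/* Hypothetical record of one error: the program counters of its
   innermost stack frames, innermost first. */
typedef struct {
    unsigned long pc[MAX_FRAMES];
} ErrorContext;

/* Number of top frames whose PCs must match, given how many different
   errors have been shown so far. */
int frames_to_compare(int errors_shown)
{
    return errors_shown < 50 ? 4 : 2;
}

/* Returns 1 if the two errors count as the same one, 0 otherwise. */
int same_error(const ErrorContext *a, const ErrorContext *b,
               int errors_shown)
{
    int n = frames_to_compare(errors_shown);
    int i;

    for (i = 0; i < n; i++)
        if (a->pc[i] != b->pc[i])
            return 0;
    return 1;
}
```

<p>Two errors that differ only in their third frame are distinct while
four frames are compared, but merge into one once only the top two
frames are compared, which is why new errors appear more slowly after
the threshold.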
				956
				957
				958	<a name="suppfiles"></a>
				959	<h3>2.7  Writing suppressions files</h3>
				960
				961	A suppression file describes a bunch of errors which, for one reason
				962	or another, you don't want Valgrind to tell you about. Usually the
				963	reason is that the system libraries are buggy but unfixable, at least
				964	within the scope of the current debugging session. Multiple
suppressions files are allowed. By default, Valgrind uses
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	966	<code>$PREFIX/lib/valgrind/default.supp</code>.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	967
				968	<p>
				969	You can ask to add suppressions from another file, by specifying
				970	<code>--suppressions=/path/to/file.supp</code>.
				971
				972	<p>Each suppression has the following components:<br>
				973	<ul>
				974
				975	<li>Its name. This merely gives a handy name to the suppression, by
				976	which it is referred to in the summary of used suppressions
				977	printed out when a program finishes. It's not important what
				978	the name is; any identifying string will do.
				979	<p>
				980
				981	<li>The nature of the error to suppress. Either:
				982	<code>Value1</code>,
				983	<code>Value2</code>,
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	984	<code>Value4</code> or
				985	<code>Value8</code>,
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	986	meaning an uninitialised-value error when
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	987	using a value of 1, 2, 4 or 8 bytes.
				988	Or
				989	<code>Cond</code> (or its old name, <code>Value0</code>),
				990	meaning use of an uninitialised CPU condition code. Or:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	991	<code>Addr1</code>,
				992	<code>Addr2</code>,
				993	<code>Addr4</code> or
				994	<code>Addr8</code>, meaning an invalid address during a
				995	memory access of 1, 2, 4 or 8 bytes respectively. Or
				996	<code>Param</code>,
				997	meaning an invalid system call parameter error. Or
				998	<code>Free</code>, meaning an invalid or mismatching free.</li><br>
				999	<p>
				1000
<li>The "immediate location" specification. For Value and Addr
errors, this is either the name of the function in which the error
occurred, or, failing that, the full path to the .so file
containing the error location. For Param errors, it is the name of
the offending system call parameter. For Free errors, it is the
name of the function doing the freeing (e.g. <code>free</code>,
<code>__builtin_vec_delete</code>, etc.).</li><br>
				1008	<p>
				1009
				1010	<li>The caller of the above "immediate location". Again, either a
				1011	function or shared-object name.</li><br>
				1012	<p>
				1013
				1014	<li>Optionally, one or two extra calling-function or object names,
				1015	for greater precision.</li>
				1016	</ul>
				1017
				1018	<p>
				1019	Locations may be either names of shared objects or wildcards matching
				1020	function names. They begin <code>obj:</code> and <code>fun:</code>
				1021	respectively. Function and object names to match against may use the
				1022	wildcard characters <code>*</code> and <code>?</code>.
				1023
				1024	A suppression only suppresses an error when the error matches all the
				1025	details in the suppression. Here's an example:
<pre>
{
__gconv_transform_ascii_internal/__mbrtowc/mbtowc
Value4
fun:__gconv_transform_ascii_internal
fun:__mbr*towc
fun:mbtowc
}
</pre>

<p>What it means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function of name matching <code>__mbr*towc</code>,
when that is called from
<code>mbtowc</code>. It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
__gconv_transform_ascii_internal/__mbrtowc/mbtowc.
				1044
				1045	<p>Another example:
				1046	<pre>
				1047	{
				1048	libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
				1049	Value4
				1050	obj:/usr/X11R6/lib/libX11.so.6.2
				1051	obj:/usr/X11R6/lib/libX11.so.6.2
				1052	obj:/usr/X11R6/lib/libXaw.so.7.0
				1053	}
				1054	</pre>
				1055
				1056	<p>Suppress any size 4 uninitialised-value error which occurs anywhere
				1057	in <code>libX11.so.6.2</code>, when called from anywhere in the same
				1058	library, when called from anywhere in <code>libXaw.so.7.0</code>. The
				1059	inexact specification of locations is regrettable, but is about all
				1060	you can hope for, given that the X11 libraries shipped with Red Hat
				1061	7.2 have had their symbol tables removed.
				1062
				1063	<p>Note -- since the above two examples did not make it clear -- that
				1064	you can freely mix the <code>obj:</code> and <code>fun:</code>
				1065	styles of description within a single suppression record.
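
<p>To make the meaning of the wildcard characters concrete, here is a
minimal sketch of a matcher in which <code>*</code> stands for any run
of characters and <code>?</code> for exactly one. The function
<code>wild_match</code> is written for this illustration only; it is an
assumption about the matching semantics, not Valgrind's actual code.

```c
/* Returns 1 if name matches pattern, where '*' matches any run of
   characters (including none) and '?' matches exactly one character.
   Returns 0 otherwise. */
int wild_match(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';

    if (*pattern == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of name in turn. */
        for (;;) {
            if (wild_match(pattern + 1, name))
                return 1;
            if (*name == '\0')
                return 0;
            name++;
        }
    }

    if (*name != '\0' && (*pattern == '?' || *pattern == *name))
        return wild_match(pattern + 1, name + 1);

    return 0;
}
```

<p>Under these rules <code>__mbr*</code> matches
<code>__mbrtowc</code>, and <code>libX11.so.?.?</code> matches
<code>libX11.so.6.2</code>, whereas <code>mbtowc</code> does not match
<code>mbtowc_l</code> because a pattern without wildcards must match
the whole name.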
				1066
				1067
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1068	<a name="clientreq"></a>
				1069	<h3>2.8  The Client Request mechanism</h3>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1070
				1071	Valgrind has a trapdoor mechanism via which the client program can
				1072	pass all manner of requests and queries to Valgrind. Internally, this
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1073	is used extensively to make malloc, free, signals, threads, etc, work,
				1074	although you don't see that.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1075	<p>
				1076	For your convenience, a subset of these so-called client requests is
				1077	provided to allow you to tell Valgrind facts about the behaviour of
				1078	your program, and conversely to make queries. In particular, your
				1079	program can tell Valgrind about changes in memory range permissions
				1080	that Valgrind would not otherwise know about, and so allows clients to
				1081	get Valgrind to do arbitrary custom checks.
				1082	<p>
				1083	Clients need to include the header file <code>valgrind.h</code> to
				1084	make this work. The macros therein have the magical property that
				1085	they generate code in-line which Valgrind can spot. However, the code
				1086	does nothing when not run on Valgrind, so you are not forced to run
				1087	your program on Valgrind just because you use the macros in this file.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1088	Also, you are not required to link your program with any extra
				1089	supporting libraries.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1090	<p>
				1091	A brief description of the available macros:
				1092	<ul>
				1093	<li><code>VALGRIND_MAKE_NOACCESS</code>,
				1094	<code>VALGRIND_MAKE_WRITABLE</code> and
				1095	<code>VALGRIND_MAKE_READABLE</code>. These mark address
				1096	ranges as completely inaccessible, accessible but containing
				1097	undefined data, and accessible and containing defined data,
				1098	respectively. Subsequent errors may have their faulting
				1099	addresses described in terms of these blocks. Returns a
				1100	"block handle". Returns zero when not run on Valgrind.
				1101	<p>
				1102	<li><code>VALGRIND_DISCARD</code>: At some point you may want
				1103	Valgrind to stop reporting errors in terms of the blocks
				1104	defined by the previous three macros. To do this, the above
				1105	macros return a small-integer "block handle". You can pass
				1106	this block handle to <code>VALGRIND_DISCARD</code>. After
				1107	doing so, Valgrind will no longer be able to relate
				1108	addressing errors to the user-defined block associated with
				1109	the handle. The permissions settings associated with the
				1110	handle remain in place; this just affects how errors are
				1111	reported, not whether they are reported. Returns 1 for an
				1112	invalid handle and 0 for a valid handle (although passing
				1113	invalid handles is harmless). Always returns 0 when not run
				1114	on Valgrind.
				1115	<p>
				1116	<li><code>VALGRIND_CHECK_NOACCESS</code>,
				1117	<code>VALGRIND_CHECK_WRITABLE</code> and
				1118	<code>VALGRIND_CHECK_READABLE</code>: check immediately
				1119	whether or not the given address range has the relevant
				1120	property, and if not, print an error message. Also, for the
				1121	convenience of the client, returns zero if the relevant
				1122	property holds; otherwise, the returned value is the address
				1123	of the first byte for which the property is not true.
				1124	Always returns 0 when not run on Valgrind.
				1125	<p>
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
to find out whether Valgrind thinks a particular variable
(lvalue, to be precise) is addressable and defined. Prints
an error message if not. Returns no value.
				1130	<p>
				1131	<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
				1132	experimental feature. Similarly to
				1133	<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
range as inaccessible, so that subsequent accesses to an
address in the range give an error. However, this macro
does not return a block handle. Instead, all annotations
created like this are reviewed at each client
<code>ret</code> (subroutine return) instruction, and those
which now define an address range below the client's stack
pointer register (<code>%esp</code>) are automatically
deleted.
				1142	<p>
				1143	In other words, this macro allows the client to tell
				1144	Valgrind about red-zones on its own stack. Valgrind
				1145	automatically discards this information when the stack
				1146	retreats past such blocks. Beware: hacky and flaky, and
				1147	probably interacts badly with the new pthread support.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1148	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1149	<li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on
				1150	Valgrind, 0 if running on the real CPU.
				1151	<p>
				1152	<li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector
				1153	right now. Returns no value. I guess this could be used to
				1154	incrementally check for leaks between arbitrary places in the
				1155	program's execution. Warning: not properly tested!
				1156	</ul>
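
<p>As a rough sketch of how a client might use these requests, consider
a program with its own pool allocator. The code assumes that
<code>VALGRIND_MAKE_NOACCESS</code> and
<code>VALGRIND_MAKE_WRITABLE</code> each take an address and a length,
which this section does not spell out. The build flag
<code>HAVE_VALGRIND_H</code> and the <code>pool_</code> names are
invented for the example. When the flag is not defined, stand-in macros
returning zero are used so that the sketch compiles on its own.

```c
#include <stddef.h>

#ifdef HAVE_VALGRIND_H            /* hypothetical build flag */
#include "valgrind.h"
#else
/* Stand-ins so the sketch compiles without the header.  They mimic the
   documented result when not run on Valgrind: zero.  The (addr, len)
   argument list is an assumption. */
#define VALGRIND_MAKE_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#endif

#define POOL_SIZE 1024

static char   pool[POOL_SIZE];
static size_t pool_used = 0;

/* Forget all allocations and mark the whole pool as off limits. */
void pool_reset(void)
{
    pool_used = 0;
    (void) VALGRIND_MAKE_NOACCESS(pool, POOL_SIZE);
}

/* Hand out n bytes from the pool, or NULL if it is exhausted.  The
   chunk becomes accessible but undefined, like a fresh malloc'd block.
   Alignment is ignored to keep the sketch short. */
void *pool_alloc(size_t n)
{
    void *p;

    if (n > POOL_SIZE - pool_used)
        return NULL;

    p = pool + pool_used;
    pool_used += n;
    (void) VALGRIND_MAKE_WRITABLE(p, n);
    return p;
}
```

<p>The block handles returned by the macros are discarded here. A client
that wants to stop errors being described in terms of these blocks
would keep the handles and pass them to <code>VALGRIND_DISCARD</code>.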
				1157	<p>
				1158
				1159
				1160	<a name="pthreads"></a>
				1161	<h3>2.9  Support for POSIX Pthreads</h3>
				1162
				1163	As of late April 02, Valgrind supports programs which use POSIX
				1164	pthreads. Doing this has proved technically challenging and is still
				1165	in progress, but it works well enough, as of 1 May 02, for significant
				1166	threaded applications to work.
				1167	<p>
				1168	It works as follows: threaded apps are (dynamically) linked against
				1169	<code>libpthread.so</code>. Usually this is the one installed with
				1170	your Linux distribution. Valgrind, however, supplies its own
				1171	<code>libpthread.so</code> and automatically connects your program to
				1172	it instead.
				1173	<p>
				1174	The fake <code>libpthread.so</code> and Valgrind cooperate to
				1175	implement a user-space pthreads package. This approach avoids the
horrible problems of implementing a truly
				1177	multiprocessor version of Valgrind, but it does mean that threaded
				1178	apps run only on one CPU, even if you have a multiprocessor machine.
				1179	<p>
				1180	Valgrind schedules your threads in a round-robin fashion, with all
				1181	threads having equal priority. It switches threads every 20000 basic
				1182	blocks (typically around 120000 x86 instructions), which means you'll
				1183	get a much finer interleaving of thread executions than when run
				1184	natively. This in itself may cause your program to behave differently
				1185	if you have some kind of concurrency, critical race, locking, or
				1186	similar, bugs.
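
<p>The scheduling policy just described can be pictured with the
following sketch. It is not Valgrind's scheduler. The names
<code>next_thread</code>, <code>quantum_expired</code> and
<code>QUANTUM_BLOCKS</code> are made up for illustration, and only the
round-robin order, the equal priorities and the 20000-block figure come
from the text above.

```c
#define QUANTUM_BLOCKS 20000  /* basic blocks between thread switches */

/* Returns the index of the next runnable thread after current,
   scanning round-robin with no priorities, or -1 if no thread is
   runnable.  runnable[i] is nonzero if thread i can run. */
int next_thread(const int runnable[], int n_threads, int current)
{
    int step;

    for (step = 1; step <= n_threads; step++) {
        int candidate = (current + step) % n_threads;
        if (runnable[candidate])
            return candidate;
    }
    return -1;
}

/* Returns 1 once the running thread has used up its quantum. */
int quantum_expired(long blocks_run)
{
    return blocks_run >= QUANTUM_BLOCKS;
}
```

<p>Because every thread gets the same short quantum, two threads that
would rarely interleave when run natively may alternate many times
within a single critical section here.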
				1187	<p>
				1188	The current (1 May 02) state of pthread support is as follows. Please
				1189	note that things are advancing rapidly, so the situation may have
				1190	improved by the time you read this -- check the web site for further
				1191	updates.
				1192	<ul>
				1193	<li>Mutexes, condition variables, thread-specific data and
				1194	<code>pthread_once</code> currently work.
				1195	<p>
				1196	<li>Various attribute-like calls are handled but ignored.
				1197	You get a warning message.
				1198	<p>
				1199	<li>The main big omission is proper cleanup support for cancellation.
				1200	<code>pthread_cancel</code> works, but instantly nukes the target
				1201	thread without giving it any chance to clean up. Also, when a
				1202	thread exits, it does not run any cleanup handlers.
				1203	<p>
				1204	<li>Currently the following syscalls are thread-safe (nonblocking):
				1205	<code>write</code> <code>read</code> <code>nanosleep</code>
				1206	<code>sleep</code> <code>select</code> and <code>poll</code>.
				1207	<p>
				1208	<li>The POSIX requirement that each thread have its own
				1209	signal-blocking mask is not done; the signal handling mechanism is
				1210	thread-unaware and all signals are delivered to the main thread,
regardless.
				1212	</ul>
				1213
				1214
				1215	As of 1 May 02, the following programs now work fine on my RedHat 7.2
				1216	box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and
				1217	Galeon-0.11.3, both as supplied with RedHat 7.2.
				1218	<p>
Mozilla 1.0RC1 works fine too, provided that you patch it as described
				1220	here: <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=124335">
				1221	http://bugzilla.mozilla.org/show_bug.cgi?id=124335</a>. This fixes a
				1222	bug in Mozilla which assumes that memory returned from
				1223	<code>malloc</code> is 8-aligned. Valgrind's allocator only
				1224	guarantees 4-alignment, so without the patch Mozilla makes an illegal
				1225	memory access, which Valgrind of course spots, and then bombs.
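<p>
If you suspect your own code makes a similar alignment assumption, a small check like the following can help to find it. This is a generic sketch, not taken from the Mozilla patch; <code>is_aligned_addr</code> and <code>is_aligned</code> are names made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if `addr` is a multiple of `alignment`.
   `alignment` must be a power of two (4, 8, 16, ...). */
int is_aligned_addr(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}

/* Same check for a pointer, for example one returned by malloc(). */
int is_aligned(const void *p, size_t alignment)
{
    return is_aligned_addr((uintptr_t)p, alignment);
}
```

Code that needs 8-alignment can then test <code>is_aligned(p, 8)</code> on the pointers it gets back, rather than assuming the allocator provides it.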
				1227
				1228
				1229	<a name="install"></a>
				1230	<h3>2.10  Building and installing</h3>
				1231
				1232	We now use the standard Unix <code>./configure</code>,
				1233	<code>make</code>, <code>make install</code> mechanism, and I have
				1234	attempted to ensure that it works on machines with kernel 2.2 or 2.4
				1235	and glibc 2.1.X or 2.2.X. I don't think there is much else to say.
				1236	There are no options apart from the usual <code>--prefix</code> that
				1237	you should give to <code>./configure</code>.
				1238	<p>
				1239	Let me know if you have build problems.
				1241
				1242
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1243	<a name="problems"></a>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1244	<h3>2.11  If you have problems</h3>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1245	Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
				1246
				1247	<p>See <a href="#limits">Section 4</a> for the known limitations of
				1248	Valgrind, and for a list of programs which are known not to work on
				1249	it.
				1250
<p>The translator/instrumentor has a lot of assertions in it. They
are permanently enabled, and I have no plans to disable them. If one
of these fails, please mail me!
				1254
				1255	<p>If you get an assertion failure on the expression
				1256	<code>chunkSane(ch)</code> in <code>vg_free()</code> in
				1257	<code>vg_malloc.c</code>, this may have happened because your program
				1258	wrote off the end of a malloc'd block, or before its beginning.
				1259	Valgrind should have emitted a proper message to that effect before
				1260	dying in this way. This is a known problem which I should fix.
				1261	<p>
				1262
				1263	<hr width="100%">
				1264
				1265	<a name="machine"></a>
				1266	<h2>3  Details of the checking machinery</h2>
				1267
				1268	Read this section if you want to know, in detail, exactly what and how
				1269	Valgrind is checking.
				1270
				1271	<a name="vvalue"></a>
				1272	<h3>3.1  Valid-value (V) bits</h3>
				1273
				1274	It is simplest to think of Valgrind implementing a synthetic Intel x86
				1275	CPU which is identical to a real CPU, except for one crucial detail.
				1276	Every bit (literally) of data processed, stored and handled by the
				1277	real CPU has, in the synthetic CPU, an associated "valid-value" bit,
				1278	which says whether or not the accompanying bit has a legitimate value.
				1279	In the discussions which follow, this bit is referred to as the V
				1280	(valid-value) bit.
				1281
<p>Each byte in the system therefore has 8 V bits which follow
it wherever it goes. For example, when the CPU loads a word-size item
				1284	(4 bytes) from memory, it also loads the corresponding 32 V bits from
				1285	a bitmap which stores the V bits for the process' entire address
				1286	space. If the CPU should later write the whole or some part of that
				1287	value to memory at a different address, the relevant V bits will be
				1288	stored back in the V-bit bitmap.
				1289
				1290	<p>In short, each bit in the system has an associated V bit, which
				1291	follows it around everywhere, even inside the CPU. Yes, the CPU's
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1292	(integer and <code>%eflags</code>) registers have their own V bit
				1293	vectors.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1294
				1295	<p>Copying values around does not cause Valgrind to check for, or
				1296	report on, errors. However, when a value is used in a way which might
				1297	conceivably affect the outcome of your program's computation, the
				1298	associated V bits are immediately checked. If any of these indicate
				1299	that the value is undefined, an error is reported.
				1300
				1301	<p>Here's an (admittedly nonsensical) example:
<pre>
int i, j;
int a[10], b[10];
for (i = 0; i &lt; 10; i++) {
   j = a[i];
   b[i] = j;
}
</pre>
				1310
				1311	<p>Valgrind emits no complaints about this, since it merely copies
				1312	uninitialised values from <code>a[]</code> into <code>b[]</code>, and
				1313	doesn't use them in any way. However, if the loop is changed to
<pre>
for (i = 0; i &lt; 10; i++) {
   j += a[i];
}
if (j == 77)
   printf("hello there\n");
</pre>
				1321	then Valgrind will complain, at the <code>if</code>, that the
				1322	condition depends on uninitialised values.
				1323
				1324	<p>Most low level operations, such as adds, cause Valgrind to
				1325	use the V bits for the operands to calculate the V bits for the
				1326	result. Even if the result is partially or wholly undefined,
				1327	it does not complain.
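<p>To make the idea concrete, here is a minimal sketch of a value travelling with its V bits. It is an illustration under stated assumptions, not Valgrind's instrumentation: the <code>shadowed</code> struct, the function names, and the convention that a 1 bit means "undefined" are all choices made for this example, and the rule used for addition is just one conservative possibility.

```c
#include <stdint.h>

/* A 32-bit value plus its shadow. Convention assumed for this sketch:
   a 1 bit in `undef` means the corresponding value bit is undefined. */
struct shadowed {
    uint32_t value;
    uint32_t undef;
};

/* Copying moves the V bits along with the value; nothing is checked. */
struct shadowed sh_copy(struct shadowed src)
{
    return src;
}

/* Addition: a carry can spread undefinedness upwards, so every result
   bit at or above the lowest undefined input bit is treated as
   undefined. No complaint is made here, whatever the result. */
struct shadowed sh_add(struct shadowed a, struct shadowed b)
{
    uint32_t u = a.undef | b.undef;
    struct shadowed r;
    r.value = a.value + b.value;
    r.undef = u | (0u - u);   /* smear leftwards from the lowest set bit */
    return r;
}
```

Adding two fully defined values gives a fully defined result, whereas a single undefined bit in either operand taints that bit and everything above it.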
				1328
<p>Checks on definedness only occur in two places: when a value is
used to generate a memory address, and when a control-flow decision
needs to be made. In addition, when a system call is detected, Valgrind
checks the definedness of its parameters as required.
				1333
<p>If a check should detect undefinedness, an error message is
issued. The resulting value is subsequently regarded as well-defined.
				1336	To do otherwise would give long chains of error messages. In effect,
				1337	we say that undefined values are non-infectious.
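<p>The "non-infectious" rule can be sketched in a few lines. Again this is only an illustration: <code>check_and_clear</code> and the <code>shadowed</code> struct are hypothetical names, and a 1 bit in <code>undef</code> is assumed to mean "undefined".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A value and its shadow; a 1 bit in `undef` means "undefined"
   (a convention assumed for this sketch). */
struct shadowed {
    uint32_t value;
    uint32_t undef;
};

/* Check a value that is about to be used as an address or as a branch
   condition. Returns 1 if an error should be reported. Afterwards the
   value is marked fully defined, so later uses of it stay quiet. */
int check_and_clear(struct shadowed *v)
{
    assert(v != NULL);
    int is_undefined = (v->undef != 0);
    v->undef = 0;
    return is_undefined;
}
```

Calling it twice on the same undefined value reports an error the first time only, which is exactly what prevents the long chains of messages.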
				1338
				1339	<p>This sounds overcomplicated. Why not just check all reads from
				1340	memory, and complain if an undefined value is loaded into a CPU register?
				1341	Well, that doesn't work well, because perfectly legitimate C programs routinely
				1342	copy uninitialised values around in memory, and we don't want endless complaints
				1343	about that. Here's the canonical example. Consider a struct
				1344	like this:
				1345	<pre>
				1346	struct S { int x; char c; };
				1347	struct S s1, s2;
				1348	s1.x = 42;
				1349	s1.c = 'z';
				1350	s2 = s1;
				1351	</pre>
				1352
				1353	<p>The question to ask is: how large is <code>struct S</code>, in
				1354	bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
				1355	occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
				1356	round the size of <code>struct S</code> up to a whole number of words,
				1357	in this case 8 bytes. Not doing this forces compilers to generate
				1358	truly appalling code for subscripting arrays of <code>struct
				1359	S</code>'s.
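<p>You can ask the compiler directly how much padding it adds. The helper names below are invented for this example; on a typical x86 system with 4-byte ints I would expect a size of 8 with 3 padding bytes, but the exact figures depend on the compiler and ABI.

```c
#include <stddef.h>

struct S { int x; char c; };

/* Offset of the first byte after the last member of struct S. */
size_t end_of_members(void)
{
    return offsetof(struct S, c) + sizeof(char);
}

/* Number of bytes in struct S which belong to no member. These are
   the bytes that stay uninitialised after s1.x and s1.c are set. */
size_t padding_bytes(void)
{
    return sizeof(struct S) - sizeof(int) - sizeof(char);
}
```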
				1360
				1361	<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
				1362	For the assignment <code>s2 = s1</code>, gcc generates code to copy
				1363	all 8 bytes wholesale into <code>s2</code> without regard for their
				1364	meaning. If Valgrind simply checked values as they came out of
				1365	memory, it would yelp every time a structure assignment like this
				1366	happened. So the more complicated semantics described above is
				1367	necessary. This allows gcc to copy <code>s1</code> into
				1368	<code>s2</code> any way it likes, and a warning will only be emitted
				1369	if the uninitialised values are later used.
				1370
				1371	<p>One final twist to this story. The above scheme allows garbage to
				1372	pass through the CPU's integer registers without complaint. It does
				1373	this by giving the integer registers V tags, passing these around in
the expected way. This is complicated and computationally expensive to
				1375	do, but is necessary. Valgrind is more simplistic about
				1376	floating-point loads and stores. In particular, V bits for data read
				1377	as a result of floating-point loads are checked at the load
				1378	instruction. So if your program uses the floating-point registers to
				1379	do memory-to-memory copies, you will get complaints about
				1380	uninitialised values. Fortunately, I have not yet encountered a
				1381	program which (ab)uses the floating-point registers in this way.
				1382
				1383	<a name="vaddress"></a>
				1384	<h3>3.2  Valid-address (A) bits</h3>
				1385
				1386	Notice that the previous section describes how the validity of values
				1387	is established and maintained without having to say whether the
				1388	program does or does not have the right to access any particular
				1389	memory location. We now consider the latter issue.
				1390
				1391	<p>As described above, every bit in memory or in the CPU has an
				1392	associated valid-value (V) bit. In addition, all bytes in memory, but
				1393	not in the CPU, have an associated valid-address (A) bit. This
				1394	indicates whether or not the program can legitimately read or write
that location. It does not give any indication of the validity of the
				1396	data at that location -- that's the job of the V bits -- only whether
				1397	or not the location may be accessed.
				1398
				1399	<p>Every time your program reads or writes memory, Valgrind checks the
				1400	A bits associated with the address. If any of them indicate an
				1401	invalid address, an error is emitted. Note that the reads and writes
				1402	themselves do not change the A bits, only consult them.
				1403
				1404	<p>So how do the A bits get set/cleared? Like this:
				1405
				1406	<ul>
				1407	<li>When the program starts, all the global data areas are marked as
				1408	accessible.</li><br>
				1409	<p>
				1410
<li>When the program does malloc/new, the A bits for exactly the
				1412	area allocated, and not a byte more, are marked as accessible.
				1413	Upon freeing the area the A bits are changed to indicate
				1414	inaccessibility.</li><br>
				1415	<p>
				1416
				1417	<li>When the stack pointer register (%esp) moves up or down, A bits
				1418	are set. The rule is that the area from %esp up to the base of
				1419	the stack is marked as accessible, and below %esp is
				1420	inaccessible. (If that sounds illogical, bear in mind that the
				1421	stack grows down, not up, on almost all Unix systems, including
				1422	GNU/Linux.) Tracking %esp like this has the useful side-effect
				1423	that the section of stack used by a function for local variables
				1424	etc is automatically marked accessible on function entry and
				1425	inaccessible on exit.</li><br>
				1426	<p>
				1427
				1428	<li>When doing system calls, A bits are changed appropriately. For
				1429	example, mmap() magically makes files appear in the process's
				1430	address space, so the A bits must be updated if mmap()
				1431	succeeds.</li><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1432	<p>
				1433
				1434	<li>Optionally, your program can tell Valgrind about such changes
				1435	explicitly, using the client request mechanism described above.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1436	</ul>
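<p>The rules above boil down to two operations: something changes the A bits for a range (malloc, free, stack pointer movement, mmap), and every access consults them. Here is a toy model over a 64-byte "address space". It is a sketch only; the <code>abits_*</code> names are made up, and one A bit is stored per <code>char</code> purely for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64   /* toy address space of 64 bytes */

/* One A bit per byte, stored one per char here for clarity. */
struct abits {
    unsigned char ok[ARENA_SIZE];
};

/* At the start nothing is accessible. */
void abits_init(struct abits *m)
{
    memset(m->ok, 0, sizeof m->ok);
}

/* Mark [addr, addr+len) accessible (as after malloc) or inaccessible
   (as after free). Exactly that range is changed, not a byte more. */
void abits_set(struct abits *m, size_t addr, size_t len, int accessible)
{
    assert(addr <= ARENA_SIZE && len <= ARENA_SIZE - addr);
    memset(m->ok + addr, accessible ? 1 : 0, len);
}

/* Consult the A bits for an access; they are never modified here.
   Returns 1 if every byte of the access is accessible. */
int abits_check(const struct abits *m, size_t addr, size_t len)
{
    if (addr > ARENA_SIZE || len > ARENA_SIZE - addr)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!m->ok[addr + i])
            return 0;
    return 1;
}
```

With a 10-byte block marked accessible at offset 8, an 11-byte read from offset 8 fails the check: that is the classic one-past-the-end error.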
				1437
				1438
				1439	<a name="together"></a>
				1440	<h3>3.3  Putting it all together</h3>
				1441	Valgrind's checking machinery can be summarised as follows:
				1442
				1443	<ul>
				1444	<li>Each byte in memory has 8 associated V (valid-value) bits,
				1445	saying whether or not the byte has a defined value, and a single
				1446	A (valid-address) bit, saying whether or not the program
				1447	currently has the right to read/write that address.</li><br>
				1448	<p>
				1449
				1450	<li>When memory is read or written, the relevant A bits are
				1451	consulted. If they indicate an invalid address, Valgrind emits
				1452	an Invalid read or Invalid write error.</li><br>
				1453	<p>
				1454
				1455	<li>When memory is read into the CPU's integer registers, the
				1456	relevant V bits are fetched from memory and stored in the
				1457	simulated CPU. They are not consulted.</li><br>
				1458	<p>
				1459
				1460	<li>When an integer register is written out to memory, the V bits
				1461	for that register are written back to memory too.</li><br>
				1462	<p>
				1463
				1464	<li>When memory is read into the CPU's floating point registers, the
				1465	relevant V bits are read from memory and they are immediately
				1466	checked. If any are invalid, an uninitialised value error is
				1467	emitted. This precludes using the floating-point registers to
				1468	copy possibly-uninitialised memory, but simplifies Valgrind in
				1469	that it does not have to track the validity status of the
				1470	floating-point registers.</li><br>
				1471	<p>
				1472
				1473	<li>As a result, when a floating-point register is written to
				1474	memory, the associated V bits are set to indicate a valid
				1475	value.</li><br>
				1476	<p>
				1477
				1478	<li>When values in integer CPU registers are used to generate a
				1479	memory address, or to determine the outcome of a conditional
				1480	branch, the V bits for those values are checked, and an error
				1481	emitted if any of them are undefined.</li><br>
				1482	<p>
				1483
				1484	<li>When values in integer CPU registers are used for any other
				1485	purpose, Valgrind computes the V bits for the result, but does
				1486	not check them.</li><br>
				1487	<p>
				1488
<li>Once the V bits for a value in the CPU have been checked, they
				1490	are then set to indicate validity. This avoids long chains of
				1491	errors.</li><br>
				1492	<p>
				1493
<li>When values are loaded from memory, Valgrind checks the A bits
				1495	for that location and issues an illegal-address warning if
				1496	needed. In that case, the V bits loaded are forced to indicate
				1497	Valid, despite the location being invalid.
is both unaddressable and contains invalid values, and, as a
				1499	This apparently strange choice reduces the amount of confusing
				1500	information presented to the user. It avoids the
				1501	unpleasant phenomenon in which memory is read from a place which
				1502	is both unaddressible and contains invalid values, and, as a
				1503	result, you get not only an invalid-address (read/write) error,
				1504	but also a potentially large set of uninitialised-value errors,
				1505	one for every time the value is used.
				1506	<p>
There is a hazy boundary case to do with multi-byte loads from
addresses which are partially valid and partially invalid. See the
description of the <code>--partial-loads-ok</code> flag for details.
				1510	</li><br>
				1511	</ul>
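<p>The interaction between A and V bits on an integer load can be sketched as follows. This models single-byte loads from a 16-byte toy memory; <code>shadow_load8</code>, the structs, and the bit conventions (A bit 1 = addressable, V bit 1 = undefined) are assumptions made for the example, not Valgrind's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 16

/* Toy shadow memory: per byte, one A bit and eight V bits.
   Assumed convention: abit 1 = addressable, vbits 1 = undefined. */
struct shadow_mem {
    uint8_t data[MEM_SIZE];
    uint8_t abit[MEM_SIZE];
    uint8_t vbits[MEM_SIZE];
};

struct load_result {
    uint8_t value;
    uint8_t vbits;        /* V bits which travel with the loaded value */
    int address_error;    /* nonzero if an invalid read should be reported */
};

/* Load one byte into an (integer) register. */
struct load_result shadow_load8(const struct shadow_mem *m, size_t addr)
{
    struct load_result r = { 0, 0, 0 };
    assert(m != NULL);
    if (addr >= MEM_SIZE || !m->abit[addr]) {
        /* Invalid address: report it, and force the V bits to
           "defined" so no follow-on uninitialised-value errors occur. */
        r.address_error = 1;
        if (addr < MEM_SIZE)
            r.value = m->data[addr];
        return r;
    }
    r.value = m->data[addr];
    r.vbits = m->vbits[addr];   /* fetched, but not checked */
    return r;
}
```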
				1512
				1513	Valgrind intercepts calls to malloc, calloc, realloc, valloc,
				1514	memalign, free, new and delete. The behaviour you get is:
				1515
				1516	<ul>
				1517
<li>malloc/new: the returned memory is marked as addressable but not
				1519	having valid values. This means you have to write on it before
				1520	you can read it.</li><br>
				1521	<p>
				1522
<li>calloc: returned memory is marked both addressable and valid,
				1524	since calloc() clears the area to zero.</li><br>
				1525	<p>
				1526
<li>realloc: if the new size is larger than the old, the new section
is addressable but invalid, as with malloc. If the new size is
smaller, the dropped-off section is marked as unaddressable. You
may only pass to realloc a pointer previously issued to you by
malloc/calloc/new/realloc.</li><br>
				1534	<p>
				1535
				1536	<li>free/delete: you may only pass to free a pointer previously
				1537	issued to you by malloc/calloc/new/realloc, or the value
				1538	NULL. Otherwise, Valgrind complains. If the pointer is indeed
				1539	valid, Valgrind marks the entire area it points at as
unaddressable, and places the block in the freed-blocks-queue.
				1541	The aim is to defer as long as possible reallocation of this
				1542	block. Until that happens, all attempts to access it will
				1543	elicit an invalid-address error, as you would hope.</li><br>
				1544	</ul>
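<p>The state changes listed above can be modelled with three per-byte states. This is a simplified sketch: the names are invented, the block is assumed to stay in place (a real realloc may move it), and the freed-blocks queue is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Per-byte state: not addressable, addressable but not yet written,
   or addressable and holding a valid value. */
enum byte_state { NOACCESS, UNDEFINED, DEFINED };

#define BLOCK_MAX 32   /* largest block this toy model can describe */

static void fill(enum byte_state *st, size_t size, enum byte_state inside)
{
    assert(size <= BLOCK_MAX);
    for (size_t i = 0; i < BLOCK_MAX; i++)
        st[i] = (i < size) ? inside : NOACCESS;
}

/* malloc/new: addressable, but must be written before being read. */
void model_malloc(enum byte_state *st, size_t size) { fill(st, size, UNDEFINED); }

/* calloc: addressable and valid, since the area is zeroed. */
void model_calloc(enum byte_state *st, size_t size) { fill(st, size, DEFINED); }

/* realloc: the grown part is addressable but undefined, the dropped
   part becomes unaddressable, and the kept part is left alone. */
void model_realloc(enum byte_state *st, size_t old_size, size_t new_size)
{
    assert(old_size <= BLOCK_MAX && new_size <= BLOCK_MAX);
    for (size_t i = old_size; i < new_size; i++)
        st[i] = UNDEFINED;
    for (size_t i = new_size; i < old_size; i++)
        st[i] = NOACCESS;
}

/* free/delete: the whole area becomes unaddressable. */
void model_free(enum byte_state *st) { fill(st, 0, NOACCESS); }
```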
				1545
				1546
				1547
				1548	<a name="signals"></a>
				1549	<h3>3.4  Signals</h3>
				1550
				1551	Valgrind provides suitable handling of signals, so, provided you stick
				1552	to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
				1553	are handled. Signal handlers may return in the normal way or do
				1554	longjmp(); both should work ok. As specified by POSIX, a signal is
				1555	blocked in its own handler. Default actions for signals should work
				1556	as before. Etc, etc.
				1557
				1558	<p>Under the hood, dealing with signals is a real pain, and Valgrind's
				1559	simulation leaves much to be desired. If your program does
				1560	way-strange stuff with signals, bad things may happen. If so, let me
				1561	know. I don't promise to fix it, but I'd at least like to be aware of
				1562	it.
				1563
				1564
<a name="leaks"></a>
				1566	<h3>3.5  Memory leak detection</h3>
				1567
				1568	Valgrind keeps track of all memory blocks issued in response to calls
				1569	to malloc/calloc/realloc/new. So when the program exits, it knows
				1570	which blocks are still outstanding -- have not been returned, in other
				1571	words. Ideally, you want your program to have no blocks still in use
				1572	at exit. But many programs do.
				1573
				1574	<p>For each such block, Valgrind scans the entire address space of the
				1575	process, looking for pointers to the block. One of three situations
				1576	may result:
				1577
				1578	<ul>
				1579	<li>A pointer to the start of the block is found. This usually
				1580	indicates programming sloppiness; since the block is still
pointed at, the programmer could, at least in principle, have freed
it before program exit.</li><br>
				1583	<p>
				1584
				1585	<li>A pointer to the interior of the block is found. The pointer
				1586	might originally have pointed to the start and have been moved
				1587	along, or it might be entirely unrelated. Valgrind deems such a
				1588	block as "dubious", that is, possibly leaked,
				1589	because it's unclear whether or
				1590	not a pointer to it still exists.</li><br>
				1591	<p>
				1592
				1593	<li>The worst outcome is that no pointer to the block can be found.
				1594	The block is classified as "leaked", because the
programmer could not possibly have freed it at program exit,
				1596	since no pointer to it exists. This might be a symptom of
				1597	having lost the pointer at some earlier point in the
				1598	program.</li>
				1599	</ul>
				1600
				1601	Valgrind reports summaries about leaked and dubious blocks.
				1602	For each such block, it will also tell you where the block was
				1603	allocated. This should help you figure out why the pointer to it has
				1604	been lost. In general, you should attempt to ensure your programs do
				1605	not have any leaked or dubious blocks at exit.
				1606
				1607	<p>The precise area of memory in which Valgrind searches for pointers
				1608	is: all naturally-aligned 4-byte words for which all A bits indicate
addressability and all V bits indicate that the stored value is
				1610	actually valid.
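<p>The three-way classification can be sketched like this. It is an illustration, not the leak detector's code: <code>classify_block</code> and the enum names are hypothetical, and the <code>words</code> array is assumed to hold only the candidate pointers that survived the filter just described (aligned words with valid A and V bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names invented for this sketch; the text calls the last two
   categories "dubious" and "leaked". */
enum leak_kind { REACHABLE, DUBIOUS, LEAKED };

/* Classify one heap block [start, start+size) given the candidate
   pointer values found by scanning memory. */
enum leak_kind classify_block(uintptr_t start, size_t size,
                              const uintptr_t *words, size_t n_words)
{
    enum leak_kind kind = LEAKED;
    assert(words != NULL || n_words == 0);
    for (size_t i = 0; i < n_words; i++) {
        if (words[i] == start)
            return REACHABLE;       /* pointer to the start of the block */
        if (words[i] > start && words[i] - start < size)
            kind = DUBIOUS;         /* pointer into the interior */
    }
    return kind;
}
```

Note that a start pointer always wins over an interior pointer, so the scan cannot stop early on finding an interior one.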
				1611
				1612	<p><hr width="100%">
				1613
				1614
				1615	<a name="limits"></a>
				1616	<h2>4  Limitations</h2>
				1617
				1618	The following list of limitations seems depressingly long. However,
				1619	most programs actually work fine.
				1620
				1621	<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1622	a kernel 2.2.X or 2.4.X system, subject to the following constraints:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1623
				1624	<ul>
				1625	<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
				1626	encounters these, Valgrind will simply give up. It may be
				1627	possible to add support for them at a later time. Intel added a
				1628	few instructions such as "cmov" to the integer instruction set
				1629	on Pentium and later processors, and these are supported.
				1630	Nevertheless it's safest to think of Valgrind as implementing
				1631	the 486 instruction set.</li><br>
				1632	<p>
				1633
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1634	<li>Pthreads support is improving, but there are still significant
				1635	limitations in that department. See the section above on
				1636	Pthreads. Note that your program must be dynamically linked
				1637	against <code>libpthread.so</code>, so that Valgrind can
				1638	substitute its own implementation at program startup time. If
				1639	you're statically linked against it, things will fail
				1640	badly.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1641	<p>
				1642
				1643	<li>Valgrind assumes that the floating point registers are not used
				1644	as intermediaries in memory-to-memory copies, so it immediately
				1645	checks V bits in floating-point loads/stores. If you want to
				1646	write code which copies around possibly-uninitialised values,
				1647	you must ensure these travel through the integer registers, not
				1648	the FPU.</li><br>
				1649	<p>
				1650
				1651	<li>If your program does its own memory management, rather than
				1652	using malloc/new/free/delete, it should still work, but
				1653	Valgrind's error checking won't be so effective.</li><br>
				1654	<p>
				1655
				1656	<li>Valgrind's signal simulation is not as robust as it could be.
				1657	Basic POSIX-compliant sigaction and sigprocmask functionality is
				1658	supplied, but it's conceivable that things could go badly awry
if you do weird things with signals. Workaround: don't.
				1660	Programs that do non-POSIX signal tricks are in any case
				1661	inherently unportable, so should be avoided if
				1662	possible.</li><br>
				1663	<p>
				1664
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1665	<li>Programs which try to handle signals on
				1666	an alternate stack (sigaltstack) are not supported, although
				1667	they could be, with a bit of effort.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1668	<p>
				1669
				1670	<li>Programs which switch stacks are not well handled. Valgrind
				1671	does have support for this, but I don't have great faith in it.
				1672	It's difficult -- there's no cast-iron way to decide whether a
				1673	large change in %esp is as a result of the program switching
				1674	stacks, or merely allocating a large object temporarily on the
				1675	current stack -- yet Valgrind needs to handle the two situations
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1676	differently. 1 May 02: this probably interacts badly with the
				1677	new pthread support. I haven't checked properly.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1678	<p>
				1679
				1680	<li>x86 instructions, and system calls, have been implemented on
				1681	demand. So it's possible, although unlikely, that a program
				1682	will fall over with a message to that effect. If this happens,
				1683	please mail me ALL the details printed out, so I can try and
				1684	implement the missing feature.</li><br>
				1685	<p>
				1686
				1687	<li>x86 floating point works correctly, but floating-point code may
				1688	run even more slowly than integer code, due to my simplistic
				1689	approach to FPU emulation.</li><br>
				1690	<p>
				1691
				1692	<li>You can't Valgrind-ize statically linked binaries. Valgrind
				1693	relies on the dynamic-link mechanism to gain control at
				1694	startup.</li><br>
				1695	<p>
				1696
				1697	<li>Memory consumption of your program is majorly increased whilst
				1698	running under Valgrind. This is due to the large amount of
administrative information maintained behind the scenes. Another
				1700	cause is that Valgrind dynamically translates the original
				1701	executable and never throws any translation away, except in
				1702	those rare cases where self-modifying code is detected.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1703	Translated, instrumented code is 12-14 times larger than the
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1704	original (!) so you can easily end up with 15+ MB of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1705	translations when running (eg) a web browser.
				1706	</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1707	</ul>
				1708
				1709
				1710	Programs which are known not to work are:
				1711
				1712	<ul>
<li>emacs starts up but immediately concludes it is out of memory
and aborts. Emacs has its own memory-management scheme, but I
				1715	don't understand why this should interact so badly with
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1716	Valgrind. Emacs works fine if you build it to use the standard
				1717	malloc/free routines.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1718	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1719	</ul>
				1720
				1721
				1722	<p><hr width="100%">
				1723
				1724
				1725	<a name="howitworks"></a>
				1726	<h2>5  How it works -- a rough overview</h2>
				1727	Some gory details, for those with a passion for gory details. You
				1728	don't need to read this section if all you want to do is use Valgrind.
				1729
				1730	<a name="startb"></a>
				1731	<h3>5.1  Getting started</h3>
				1732
				1733	Valgrind is compiled into a shared object, valgrind.so. The shell
				1734	script valgrind sets the LD_PRELOAD environment variable to point to
				1735	valgrind.so. This causes the .so to be loaded as an extra library to
				1736	any subsequently executed dynamically-linked ELF binary, viz, the
				1737	program you want to debug.
				1738
				1739	<p>The dynamic linker allows each .so in the process image to have an
				1740	initialisation function which is run before main(). It also allows
				1741	each .so to have a finalisation function run after main() exits.
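<p>For readers who have not met this mechanism, here is one common way to get such functions with GCC or Clang. It is a sketch using the compilers' <code>constructor</code> and <code>destructor</code> attributes; I am not claiming this is how valgrind.so itself registers its functions, and the names <code>tool_init</code>, <code>tool_fini</code> and <code>tool_started</code> are made up.

```c
#include <stdio.h>

/* Set by the initialisation function so its effect can be observed. */
int tool_started = 0;

/* GCC/Clang extension: run when the object is loaded, before main(). */
__attribute__((constructor))
static void tool_init(void)
{
    tool_started = 1;
}

/* Run after main() returns or exit() is called. */
__attribute__((destructor))
static void tool_fini(void)
{
    fprintf(stderr, "tool: finalisation function ran\n");
}
```

Built as a shared object (for instance with <code>gcc -shared -fPIC</code>) and named in <code>LD_PRELOAD</code>, code like this gets control in every dynamically linked program started from that environment.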
				1742
				1743	<p>When valgrind.so's initialisation function is called by the dynamic
linker, the synthetic CPU starts up. The real CPU remains locked
				1745	in valgrind.so for the entire rest of the program, but the synthetic
				1746	CPU returns from the initialisation function. Startup of the program
				1747	now continues as usual -- the dynamic linker calls all the other .so's
				1748	initialisation routines, and eventually runs main(). This all runs on
				1749	the synthetic CPU, not the real one, but the client program cannot
				1750	tell the difference.
				1751
				1752	<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
				1753	finalisation function. Valgrind detects this, and uses it as its cue
				1754	to exit. It prints summaries of all errors detected, possibly checks
				1755	for memory leaks, and then exits the finalisation routine, but now on
				1756	the real CPU. The synthetic CPU has now lost control -- permanently
				1757	-- so the program exits back to the OS on the real CPU, just as it
				1758	would have done anyway.
				1759
				1760	<p>On entry, Valgrind switches stacks, so it runs on its own stack.
				1761	On exit, it switches back. This means that the client program
				1762	continues to run on its own stack, so we can switch back and forth
				1763	between running it on the simulated and real CPUs without difficulty.
				1764	This was an important design decision, because it makes it easy (well,
				1765	significantly less difficult) to debug the synthetic CPU.
				1766
				1767
				1768	<a name="engine"></a>
				1769	<h3>5.2  The translation/instrumentation engine</h3>
				1770
				1771	Valgrind does not directly run any of the original program's code. Only
				1772	instrumented translations are run. Valgrind maintains a translation
				1773	table, which allows it to find the translation quickly for any branch
				1774	target (code address). If no translation has yet been made, the
				1775	translator - a just-in-time translator - is summoned. This makes an
				1776	instrumented translation, which is added to the collection of
				1777	translations. Subsequent jumps to that address will use this
				1778	translation.
				1779
				1780	<p>Valgrind can optionally check writes made by the application, to
				1781	see if they are writing an address contained within code which has
				1782	been translated. Such a write invalidates translations of code
				1783	bracketing the written address. Valgrind will discard the relevant
				1784	translations, which causes them to be re-made, if they are needed
				1785	again, reflecting the new updated data stored there. In this way,
				1786	self modifying code is supported. In practice I have not found any
				1787	Linux applications which use self-modifying-code.
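<p>The lookup-or-translate cycle and the invalidation on writes can be sketched as below. This is a toy under loud assumptions: a small direct-mapped table stands in for the real translation table, an integer id stands in for the generated code, and all names (<code>tt_init</code>, <code>tt_lookup_or_translate</code>, <code>tt_invalidate_write</code>) are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TT_SIZE 256   /* toy table size; must be a power of two */

struct tt_entry {
    uint32_t orig_addr;       /* start of the original basic block */
    uint32_t orig_len;        /* bytes of original code it covers */
    int in_use;
    int translation_id;       /* stand-in for a pointer to generated code */
};

struct trans_table {
    struct tt_entry slot[TT_SIZE];
    int next_id;
    unsigned translations_made;
};

void tt_init(struct trans_table *t)
{
    memset(t, 0, sizeof *t);
}

/* Find the translation for a branch target, making one if needed. */
int tt_lookup_or_translate(struct trans_table *t, uint32_t addr, uint32_t len)
{
    struct tt_entry *e = &t->slot[addr & (TT_SIZE - 1)];
    if (!e->in_use || e->orig_addr != addr) {
        /* Miss: this is where the translator would be summoned. */
        e->orig_addr = addr;
        e->orig_len = len;
        e->in_use = 1;
        e->translation_id = t->next_id++;
        t->translations_made++;
    }
    return e->translation_id;
}

/* A write landed on `written`: discard every translation whose
   original code covers that address. Returns how many were dropped. */
unsigned tt_invalidate_write(struct trans_table *t, uint32_t written)
{
    unsigned discarded = 0;
    for (size_t i = 0; i < TT_SIZE; i++) {
        struct tt_entry *e = &t->slot[i];
        if (e->in_use && written >= e->orig_addr &&
            written - e->orig_addr < e->orig_len) {
            e->in_use = 0;
            discarded++;
        }
    }
    return discarded;
}
```

After an invalidating write, the next jump to that address simply misses and triggers a fresh translation of whatever bytes are now there.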
				1788
				1789	<p>The JITter translates basic blocks -- blocks of straight-line-code
				1790	-- as single entities. To minimise the considerable difficulties of
				1791	dealing with the x86 instruction set, x86 instructions are first
				1792	translated to a RISC-like intermediate code, similar to sparc code,
				1793	but with an infinite number of virtual integer registers. Initially
each instruction is translated separately, and there is no attempt at
				1795	instrumentation.
				1796
				1797	<p>The intermediate code is improved, mostly so as to try and cache
				1798	the simulated machine's registers in the real machine's registers over
				1799	several simulated instructions. This is often very effective. Also,
we try to remove redundant updates of the simulated machine's
				1801	condition-code register.
				1802
				1803	<p>The intermediate code is then instrumented, giving more
				1804	intermediate code. There are a few extra intermediate-code operations
				1805	to support instrumentation; it is all refreshingly simple. After
				1806	instrumentation there is a cleanup pass to remove redundant value
				1807	checks.
				1808
				1809	<p>This gives instrumented intermediate code which mentions arbitrary
				1810	numbers of virtual registers. A linear-scan register allocator is
				1811	used to assign real registers and possibly generate spill code. All
				1812	of this is still phrased in terms of the intermediate code. This
				1813	machinery is inspired by the work of Reuben Thomas (MITE).
				1814
				1815	<p>Then, and only then, is the final x86 code emitted. The
				1816	intermediate code is carefully designed so that x86 code can be
				1817	generated from it without need for spare registers or other
				1818	inconveniences.
				1819
				1820	<p>The translations are managed using a traditional LRU-based caching
				1821	scheme. The translation cache has a default size of about 14MB.
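<p>For readers unfamiliar with LRU eviction, the following is a generic
sketch of a size-bounded LRU cache. It is not taken from Valgrind's sources;
the class name, the byte-string representation of translations and the
eviction loop are all assumptions made for illustration.

```python
from collections import OrderedDict

# Generic LRU sketch; not Valgrind's actual translation cache.

class LRUTranslationCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self._entries = OrderedDict()   # addr -> code bytes, least recent first

    def get(self, addr):
        code = self._entries.get(addr)
        if code is not None:
            self._entries.move_to_end(addr)   # mark as most recently used
        return code

    def put(self, addr, code):
        if addr in self._entries:
            self.used -= len(self._entries.pop(addr))
        self._entries[addr] = code
        self.used += len(code)
        # Evict least recently used entries until we fit again.
        while self.used > self.capacity and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used -= len(evicted)


cache = LRUTranslationCache(capacity_bytes=10)
cache.put(1, b"aaaa")
cache.put(2, b"bbbb")
cache.get(1)              # touch entry 1, so entry 2 becomes the oldest
cache.put(3, b"cccc")     # exceeds capacity, entry 2 is evicted
```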
				1822
				1823	<a name="track"></a>
				1824
				1825	<h3>5.3  Tracking the status of memory</h3> Each byte in the
				1826	process' address space has nine bits associated with it: one A bit and
				1827	eight V bits. The A and V bits for each byte are stored using a
				1828	sparse array, which flexibly and efficiently covers arbitrary parts of
				1829	the 32-bit address space without imposing significant space or
				1830	performance overheads for the parts of the address space never
				1831	visited. The scheme used, and speedup hacks, are described in detail
				1832	at the top of the source file vg_memory.c, so you should read that for
				1833	the gory details.
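<p>The idea of a sparse map holding one A bit and eight V bits per byte can
be sketched with a two-level table. The chunk size, the bit encodings and all
names below are assumptions chosen for the example; the real scheme is the one
documented in <code>vg_memory.c</code>.

```python
# Hypothetical two-level shadow map; sizes and encodings are assumed.

CHUNK_BITS = 16                      # assumed: each secondary map covers 64 KB
CHUNK_SIZE = 1 << CHUNK_BITS


class ShadowMemory:
    def __init__(self):
        self._chunks = {}            # primary map: chunk number -> secondary map

    def _chunk(self, addr, create):
        key = addr >> CHUNK_BITS
        chunk = self._chunks.get(key)
        if chunk is None and create:
            # Assumed encoding: A = 0 means not addressable,
            # V = 0xFF means all eight bits undefined.
            chunk = {"a": bytearray(CHUNK_SIZE),
                     "v": bytearray(b"\xff" * CHUNK_SIZE)}
            self._chunks[key] = chunk
        return chunk

    def set_byte(self, addr, addressable, vbits):
        chunk = self._chunk(addr, create=True)
        offset = addr & (CHUNK_SIZE - 1)
        chunk["a"][offset] = 1 if addressable else 0
        chunk["v"][offset] = vbits & 0xFF

    def get_byte(self, addr):
        chunk = self._chunk(addr, create=False)
        if chunk is None:
            return (False, 0xFF)     # never-visited memory needs no storage
        offset = addr & (CHUNK_SIZE - 1)
        return (bool(chunk["a"][offset]), chunk["v"][offset])

    def chunks_allocated(self):
        return len(self._chunks)


shadow = ShadowMemory()
shadow.set_byte(0xBFFFF74C, addressable=True, vbits=0x00)
```

Only the chunk that was actually touched is allocated, which is the property
the paragraph above relies on for the unvisited parts of the address space.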
				1834
				1835	<a name="sys_calls"></a>
				1836
				1837	<h3>5.4 System calls</h3>
				1838	All system calls are intercepted. The memory status map is consulted
				1839	before and updated after each call. It's all rather tiresome. See
				1840	vg_syscall_mem.c for details.
				1841
				1842	<a name="sys_signals"></a>
				1843
				1844	<h3>5.5  Signals</h3>
				1845	All system calls to sigaction() and sigprocmask() are intercepted. If
				1846	the client program is trying to set a signal handler, Valgrind makes a
				1847	note of the handler address and which signal it is for. Valgrind then
				1848	arranges for the same signal to be delivered to its own handler.
				1849
				1850	<p>When such a signal arrives, Valgrind's own handler catches it, and
				1851	notes the fact. At a convenient safe point in execution, Valgrind
				1852	builds a signal delivery frame on the client's stack and runs its
				1853	handler. If the handler longjmp()s, there is nothing more to be said.
				1854	If the handler returns, Valgrind notices this, zaps the delivery
				1855	frame, and carries on where it left off before delivering the signal.
				1856
				1857	<p>The purpose of this nonsense is that setting signal handlers
				1858	essentially amounts to giving callback addresses to the Linux kernel.
				1859	We can't allow this to happen, because if it did, signal handlers
				1860	would run on the real CPU, not the simulated one. This means the
				1861	checking machinery would not operate during the handler run, and,
				1862	worse, memory permissions maps would not be updated, which could cause
				1863	spurious error reports once the handler had returned.
				1864
				1865	<p>An even worse thing would happen if the signal handler longjmp'd
				1866	rather than returned: Valgrind would completely lose control of the
				1867	client program.
				1868
				1869	<p>Upshot: we can't allow the client to install signal handlers
				1870	directly. Instead, Valgrind must catch, on behalf of the client, any
signal the client asks to catch, and must deliver it to the client on
				1872	the simulated CPU, not the real one. This involves considerable
				1873	gruesome fakery; see vg_signals.c for details.
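<p>The catch-now, deliver-later arrangement can be sketched as follows. This
is a simplified model with hypothetical names; it ignores delivery frames,
signal masks and <code>longjmp()</code>, all of which
<code>vg_signals.c</code> has to deal with.

```python
# Simplified, hypothetical model of deferred signal delivery.

class SignalProxy:
    def __init__(self):
        self._client_handlers = {}   # signal number -> client's handler
        self._pending = []           # signals caught but not yet delivered

    def client_sigaction(self, signo, handler):
        """Intercepted sigaction(): remember the handler rather than
        handing its address to the kernel."""
        self._client_handlers[signo] = handler

    def host_handler(self, signo):
        """Runs on the real CPU: only notes that the signal arrived."""
        if signo in self._client_handlers:
            self._pending.append(signo)

    def safe_point(self, run_on_simulated_cpu):
        """Called at a convenient point: deliver everything pending."""
        delivered = 0
        while self._pending:
            signo = self._pending.pop(0)
            run_on_simulated_cpu(self._client_handlers[signo], signo)
            delivered += 1
        return delivered


log = []
proxy = SignalProxy()
proxy.client_sigaction(2, lambda signo: log.append(signo))
proxy.host_handler(2)
seen_before_safe_point = list(log)               # handler has not run yet
delivered = proxy.safe_point(lambda handler, signo: handler(signo))
```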
				1874	<p>
				1875
				1876	<hr width="100%">
				1877
				1878	<a name="example"></a>
				1879	<h2>6  Example</h2>
				1880	This is the log for a run of a small program. The program is in fact
correct, and the reported error is the result of a potentially serious
				1882	code generation bug in GNU g++ (snapshot 20010527).
				1883	<pre>
				1884	sewardj@phoenix:~/newmat10$
				1885	~/Valgrind-6/valgrind -v ./bogon
				1886	==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
				1887	==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
				1888	==25832== Startup, with flags:
				1889	==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
				1890	==25832== reading syms from /lib/ld-linux.so.2
				1891	==25832== reading syms from /lib/libc.so.6
				1892	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
				1893	==25832== reading syms from /lib/libm.so.6
				1894	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
				1895	==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
				1896	==25832== reading syms from /proc/self/exe
				1897	==25832== loaded 5950 symbols, 142333 line number locations
				1898	==25832==
				1899	==25832== Invalid read of size 4
				1900	==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
				1901	==25832== by 0x80487AF: main (bogon.cpp:66)
				1902	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				1903	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				1904	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				1905	==25832==
				1906	==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
				1907	==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
				1908	==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
				1909	==25832== For a detailed leak analysis, rerun with: --leak-check=yes
				1910	==25832==
				1911	==25832== exiting, did 1881 basic blocks, 0 misses.
				1912	==25832== 223 translations, 3626 bytes in, 56801 bytes out.
				1913	</pre>
				1914	<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
				1915	<hr width="100%">
				1916	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1917
				1918
				1919
				1920	<a name="cache"></a>
				1921	<h2>7  Cache profiling</h2>
				1922	As well as memory debugging, Valgrind also allows you to do cache simulations
				1923	and annotate your source line-by-line with the number of cache misses. In
				1924	particular, it records:
				1925	<ul>
				1926	<li>L1 instruction cache reads and misses;
				1927	<li>L1 data cache reads and read misses, writes and write misses;
<li>L2 unified cache reads and read misses, writes and write misses.
				1929	</ul>
				1930	On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
				1931	and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
njn	7cfd572	2002-05-03 17:51:10 +0000	[diff] [blame^]	1932	very useful for improving the performance of your program.<p>
				1933
				1934	Also, since one instruction cache read is performed per instruction executed,
				1935	you can find out how many instructions are executed per line, which can be
				1936	useful for optimisation and test coverage.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1937
				1938	Please note that this is an experimental feature. Any feedback, bug-fixes,
				1939	suggestions, etc, welcome.
				1940
				1941
				1942	<h3>7.1  Overview</h3>
				1943	First off, as for normal Valgrind use, you probably want to turn on debugging
				1944	info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
				1945	probably <b>do</b> want to turn optimisation on, since you should profile your
program as it will normally be run.<p>
				1947
				1948	The three steps are:
				1949	<ol>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1950	<li>Generate a cache simulator for your machine's cache
				1951	configuration with the supplied <code>vg_cachegen</code>
				1952	program, and recompile Valgrind with <code>make install</code>.
				1953	<p>
				1954	The default settings are for an AMD Athlon, and you will get
				1955	useful information with the defaults, so you can skip this step
				1956	if you want. Nevertheless, for accurate cache profiles you will
need to use <code>vg_cachegen</code> to customise
				1958	<code>cachegrind</code> for your system.
				1959	<p>
				1960	This step only needs to be done once, unless you are interested
				1961	in simulating different cache configurations (eg. first
				1962	concentrating on instruction cache misses, then on data cache
				1963	misses).
				1964	</li>
				1965	<p>
				1966	<li>Run your program with <code>cachegrind</code> in front of the
				1967	normal command line invocation. When the program finishes,
				1968	Valgrind will print summary cache statistics. It also collects
				1969	line-by-line information in a file <code>cachegrind.out</code>.
				1970	<p>
				1971	This step should be done every time you want to collect
				1972	information about a new program, a changed program, or about the
				1973	same program with different input.
				1974	</li>
				1975	<p>
<li>Generate a function-by-function summary, and possibly annotate
source files, with <code>vg_annotate</code>. Source files to annotate can be
specified manually on the command line, or
"interesting" source files can be annotated automatically with
the <code>--auto=yes</code> option. You can annotate C/C++
files or assembly language files equally easily.
<p>
This step can be performed as many times as you like for each
run of Step 2. You may want to do multiple annotations showing
different information each time.
</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1986	</ol>
				1987
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1988	The steps are described in detail in the following sections.<p>
				1989
				1990
				1991	<a name="generate"></a>
				1992	<h3>7.3  Generating a cache simulator</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1993
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1994	Although Valgrind comes with a pre-generated cache simulator, it most
				1995	likely won't match the cache configuration of your machine, so you
				1996	should generate a new simulator.<p>
				1997
				1998	You need to generate three files, one for each of the I1, D1 and L2
				1999	caches. For each cache, you need to know the:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2000	<ul>
				2001	<li>Cache size (bytes);
				2002	<li>Line size (bytes);
				2003	<li>Associativity.
				2004	</ul>
				2005
				2006	vg_cachegen takes three options:
				2007	<ul>
				2008	<li><code>--I1=size,line_size,associativity</code>
				2009	<li><code>--D1=size,line_size,associativity</code>
				2010	<li><code>--L2=size,line_size,associativity</code>
				2011	</ul>
				2012
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2013	You can specify one, two or all three caches per invocation of
				2014	vg_cachegen. It checks that the configuration is sensible before
				2015	generating the simulators; to see the allowed values, run
				2016	<code>vg_cachegen -h</code>.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2017
				2018	An example invocation would be:
				2019
				2020	<blockquote><code>
				2021	vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
				2022	</code></blockquote>
				2023
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2024	This simulates a machine with a 128KB split L1 2-way associative
				2025	cache, and a 256KB unified 8-way associative L2 cache. Both caches
				2026	have 64B lines.<p>
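<p>To check the arithmetic behind that description, the option strings can
be split into their three fields and the sizes added up. The helper
<code>parse_cache_option</code> is a hypothetical name used only for this
sketch; it is not part of <code>vg_cachegen</code>.

```python
# Hypothetical helper; it only mirrors the size,line_size,associativity syntax.

def parse_cache_option(option):
    """Split e.g. '--I1=65536,64,2' into (name, size, line_size, associativity)."""
    name, _, values = option.lstrip("-").partition("=")
    size, line_size, associativity = (int(v) for v in values.split(","))
    return name, size, line_size, associativity


options = ["--I1=65536,64,2", "--D1=65536,64,2", "--L2=262144,64,8"]
caches = {name: (size, line_size, ways)
          for name, size, line_size, ways in map(parse_cache_option, options)}

split_l1_kb = (caches["I1"][0] + caches["D1"][0]) // 1024   # I1 plus D1
l2_kb = caches["L2"][0] // 1024
```

The 128KB figure is therefore the sum of the 64KB instruction cache and the
64KB data cache.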
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2027
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2028	If you don't know your cache configuration, you'll have to find it
				2029	out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
				2030	configuration using the CPUID instruction, which could be done
				2031	automatically during installation, and this whole step could be
				2032	skipped.)<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2033
				2034
				2035	<h3>7.4  Cache simulation specifics</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2036
				2037	<code>vg_cachegen</code> only generates simulations for a machine with
				2038	a split L1 cache and a unified L2 cache. This configuration is used
				2039	for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
				2040	had a unified I and D L1 cache, but they are ancient history now.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2041
				2042	The more specific characteristics of the simulation are as follows.
				2043
				2044	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2045	<li>Write-allocate: when a write miss occurs, the block written to
				2046	is brought into the D1 cache. Most modern caches have this
				2047	property.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2048
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2049	<li>Bit-selection hash function: the line(s) in the cache to which a
				2050	memory block maps is chosen by the middle bits M--(M+N-1) of the
				2051	byte address, where:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2052	<ul>
				2053	<li> line size = 2^M bytes </li>
<li>(cache size / line size) = 2^N lines</li>
				2055	</ul> </li><p>
				2056
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2057	<li>Inclusive L2 cache: the L2 cache replicates all the entries of
				2058	the L1 cache. This is standard on Pentium chips, but AMD
				2059	Athlons use an exclusive L2 cache that only holds blocks evicted
				2060	from L1. Ditto AMD Durons and most modern VIAs.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2061	</ul>
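<p>The bit-selection hash above amounts to a shift and a mask. A minimal
sketch, with M and N passed in directly (the function name and the sample
values M = 6, N = 9 are chosen only for illustration):

```python
def select_bits(addr, m, n):
    """Return bits M..(M+N-1) of a byte address."""
    return (addr >> m) & ((1 << n) - 1)


# With 64-byte lines M is 6; N = 9 is just a sample value.
index = select_bits(0xBFFFF74C, 6, 9)
same_line = select_bits(0xBFFFF740, 6, 9) == select_bits(0xBFFFF77F, 6, 9)
```

Addresses that differ only in their low M bits lie in the same line and so
always select the same place in the cache.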
				2062
				2063	Other noteworthy behaviour:
				2064
				2065	<ul>
				2066	<li>References that straddle two cache lines are treated as follows:</li>
				2067	<ul>
				2068	<li>If both blocks hit --> counted as one hit</li>
				2069	<li>If one block hits, the other misses --> counted as one miss</li>
				2070	<li>If both blocks miss --> counted as one miss (not two)</li>
				2071	</ul><p>
				2072
				2073	<li>Instructions that modify a memory location (eg. <code>inc</code> and
				2074	<code>dec</code>) are counted as doing just a read, ie. a single data
				2075	reference. This may seem strange, but since the write can never cause a
				2076	miss (the read guarantees the block is in the cache) it's not very
				2077	interesting.<p>
				2078
				2079	Thus it measures not the number of times the data cache is accessed, but
				2080	the number of times a data cache miss could occur.<p>
				2081	</li>
				2082	</ul>
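<p>The straddling rule can be sketched as follows: a reference may touch two
lines, but it contributes at most one miss. The model below is deliberately
simplified (an unbounded set of resident lines with no eviction) and its names
are hypothetical.

```python
# Simplified model: resident lines are kept in a set and never evicted.

def lines_touched(addr, size, line_size):
    first = addr // line_size
    last = (addr + size - 1) // line_size
    return list(range(first, last + 1))


def count_reference(resident_lines, addr, size, line_size):
    """Return 1 if the reference counts as a miss, else 0 (never more than 1)."""
    missed = False
    for line in lines_touched(addr, size, line_size):
        if line not in resident_lines:
            missed = True
            resident_lines.add(line)   # bring the block into the cache
    return 1 if missed else 0


resident = set()
both_miss = count_reference(resident, 62, 4, 64)    # bytes 62..65 span two lines
both_hit = count_reference(resident, 62, 4, 64)     # both lines now resident

half_resident = {0}
one_miss = count_reference(half_resident, 62, 4, 64)
```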
				2083
				2084	If you are interested in simulating a cache with different properties, it is
				2085	not particularly hard to write your own cache simulator, or to modify existing
ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
				2088	does.
				2089
				2090
				2091	<a name="profile"></a>
				2092	<h3>7.5  Profiling programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2093
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2094	Cache profiling is enabled by using the <code>--cachesim=yes</code>
				2095	option to the <code>valgrind</code> shell script. Alternatively, it
				2096	is probably more convenient to use the <code>cachegrind</code> script.
				2097	This automatically turns off Valgrind's memory checking functions,
				2098	since the cache simulation is slow enough already, and you probably
				2099	don't want to do both at once.
				2100	<p>
To gather cache profiling information about the program <code>ls
-l</code>, type:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2103
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2104	<blockquote><code>cachegrind ls -l</code></blockquote>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2105
				2106	The program will execute (slowly). Upon completion, summary statistics
				2107	that look like this will be printed:
				2108
				2109	<pre>
				2110	==31751== I refs: 27,742,716
				2111	==31751== I1 misses: 276
				2112	==31751== L2 misses: 275
				2113	==31751== I1 miss rate: 0.0%
				2114	==31751== L2i miss rate: 0.0%
				2115	==31751==
				2116	==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
				2117	==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
				2118	==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
				2119	==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
				2120	==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
				2121	==31751==
				2122	==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
				2123	==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
				2124	</pre>
				2125
				2126	Cache accesses for instruction fetches are summarised first, giving the
				2127	number of fetches made (this is the number of instructions executed, which
				2128	can be useful to know in its own right), the number of I1 misses, and the
				2129	number of L2 instruction (<code>L2i</code>) misses.<p>
				2130
				2131	Cache accesses for data follow. The information is similar to that of the
				2132	instruction fetches, except that the values are also shown split between reads
				2133	and writes (note each row's <code>rd</code> and <code>wr</code> values add up
				2134	to the row's total).<p>
				2135
				2136	Combined instruction and data figures for the L2 cache follow that.<p>
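<p>The relationships between the figures in the sample summary can be
checked with a few lines of arithmetic. The counts are copied from the output
above; treating a miss rate as misses divided by references is an assumption,
although it appears consistent with the percentages shown.

```python
# Counts copied from the sample summary above.
i_refs = 27_742_716
d_reads, d_writes = 10_955_517, 4_474_773
d1_read_misses, d1_write_misses = 21_905, 19_280
l2i_misses = 275
l2d_read_misses, l2d_write_misses = 3_987, 19_098

d_refs = d_reads + d_writes                       # the "D refs" row
d1_misses = d1_read_misses + d1_write_misses      # the "D1 misses" row
l2d_misses = l2d_read_misses + l2d_write_misses   # data part of "L2 misses"
l2_misses = l2i_misses + l2d_misses               # combined L2 misses


def percent(misses, refs):
    # Assumed definition of a miss rate.
    return 100.0 * misses / refs


d1_miss_rate = percent(d1_misses, d_refs)
```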
				2137
				2138
				2139	<h3>7.6  Output file</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2140
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2141	As well as printing summary information, Cachegrind also writes
				2142	line-by-line cache profiling information to a file named
				2143	<code>cachegrind.out</code>. This file is human-readable, but is best
				2144	interpreted by the accompanying program <code>vg_annotate</code>,
				2145	described in the next section.
				2146	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2147	Things to note about the <code>cachegrind.out</code> file:
				2148	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2149	<li>It is written every time <code>valgrind --cachesim=yes</code> or
				2150	<code>cachegrind</code> is run, and will overwrite any existing
				2151	<code>cachegrind.out</code> in the current directory.</li>
				2152	<p>
				2153	<li>It can be huge: <code>ls -l</code> generates a file of about
				2154	350KB. Browsing a few files and web pages with a Konqueror
				2155	built with full debugging information generates a file
				2156	of around 15 MB.</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2157	</ul>
				2158
				2159
				2160	<a name="annotate"></a>
				2161	<h3>7.7  Annotating C/C++ programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2162
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2163	Before using <code>vg_annotate</code>, it is worth widening your
window to be at least 120 characters wide if possible, as the output
				2165	lines can be quite long.
				2166	<p>
To get a function-by-function summary, run <code>vg_annotate</code> in
a directory containing a <code>cachegrind.out</code> file. The output
				2169	looks like this:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2170
				2171	<pre>
				2172	--------------------------------------------------------------------------------
				2173	I1 cache: 65536 B, 64 B, 2-way associative
				2174	D1 cache: 65536 B, 64 B, 2-way associative
				2175	L2 cache: 262144 B, 64 B, 8-way associative
				2176	Command: concord vg_to_ucode.c
				2177	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2178	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2179	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2180	Threshold: 99%
				2181	Chosen for annotation:
				2182	Auto-annotation: on
				2183
				2184	--------------------------------------------------------------------------------
				2185	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2186	--------------------------------------------------------------------------------
				2187	27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
				2188
				2189	--------------------------------------------------------------------------------
				2190	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
				2191	--------------------------------------------------------------------------------
				2192	8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
				2193	5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
				2194	2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
				2195	2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
				2196	2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
				2197	1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
				2198	897,991 51 51 897,831 95 30 62 1 1 ???:???
				2199	598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
				2200	598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
				2201	598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
				2202	446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
				2203	341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
				2204	320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
				2205	298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
				2206	149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
				2207	149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
				2208	95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
				2209	85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
				2210	</pre>
				2211
				2212	First up is a summary of the annotation options:
				2213
				2214	<ul>
				2215	<li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the
				2216	configuration with which these results were obtained.</li><p>
				2217
				2218	<li>Command: the command line invocation of the program under
				2219	examination.</li><p>
				2220
				2221	<li>Events recorded: event abbreviations are:<p>
				2222	<ul>
				2223	<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
				2224	<li><code>I1mr</code>: I1 cache read misses</li>
				2225	<li><code>I2mr</code>: L2 cache instruction read misses</li>
				2226	<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
				2227	<li><code>D1mr</code>: D1 cache read misses</li>
				2228	<li><code>D2mr</code>: L2 cache data read misses</li>
				2229	<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
				2230	<li><code>D1mw</code>: D1 cache write misses</li>
				2231	<li><code>D2mw</code>: L2 cache data write misses</li>
				2232	</ul><p>
Note that the total number of D1 misses is given by <code>D1mr</code> +
<code>D1mw</code>, and that the total number of L2 misses is given by
<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
				2236
				2237	<li>Events shown: the events shown (a subset of events gathered). This can
				2238	be adjusted with the <code>--show</code> option.</li><p>
				2239
				2240	<li>Event sort order: the sort order in which functions are shown. For
				2241	example, in this case the functions are sorted from highest
				2242	<code>Ir</code> counts to lowest. If two functions have identical
				2243	<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
				2244	counts, and so on. This order can be adjusted with the
				2245	<code>--sort</code> option.<p>
				2246
				2247	Note that this dictates the order the functions appear. It is <b>not</b>
				2248	the order in which the columns appear; that is dictated by the "events
shown" line (and can be changed with the <code>--show</code> option).
				2250	</li><p>
				2251
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2252	<li>Threshold: <code>vg_annotate</code> by default omits functions
				2253	that cause very low numbers of misses to avoid drowning you in
information. In this case, vg_annotate shows summaries for the
				2255	functions that account for 99% of the <code>Ir</code> counts;
				2256	<code>Ir</code> is chosen as the threshold event since it is the
				2257	primary sort event. The threshold can be adjusted with the
				2258	<code>--threshold</code> option.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2259
				2260	<li>Chosen for annotation: names of files specified manually for annotation;
				2261	in this case none.</li><p>
				2262
				2263	<li>Auto-annotation: whether auto-annotation was requested via the
				2264	<code>--auto=yes</code> option. In this case no.</li><p>
				2265	</ul>
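<p>The interaction between the sort order and the threshold can be sketched
like this. How <code>vg_annotate</code> applies the threshold internally is
not specified here, so the cut-off rule below (stop once the running total of
the primary sort event reaches the threshold) is an assumption, and
<code>select_functions</code> is a hypothetical name. The counts are taken
from the sample output.

```python
# Hypothetical sketch; the exact cut-off rule is an assumption.

def select_functions(rows, sort_events, threshold_pct):
    """rows maps 'file:function' to a dict of event counts."""
    ordered = sorted(
        rows.items(),
        key=lambda item: tuple(item[1].get(e, 0) for e in sort_events),
        reverse=True)
    primary = sort_events[0]
    total = sum(counts.get(primary, 0) for counts in rows.values())
    shown, running = [], 0
    for name, counts in ordered:
        if running * 100 >= threshold_pct * total:
            break                      # threshold already covered
        shown.append(name)
        running += counts.get(primary, 0)
    return shown


rows = {
    "../sysdeps/generic/lockfile.c:__funlockfile": {"Ir": 598_068, "I1mr": 0},
    "concord.c:get_word": {"Ir": 5_222_023, "I1mr": 4},
    "../sysdeps/generic/lockfile.c:__flockfile": {"Ir": 598_068, "I1mr": 1},
    "getc.c:_IO_getc": {"Ir": 8_821_482, "I1mr": 5},
}
everything = select_functions(rows, ["Ir", "I1mr"], 100)
top = select_functions(rows, ["Ir", "I1mr"], 90)
```

The two lockfile functions have identical <code>Ir</code> counts, so their
relative order is decided by <code>I1mr</code>, as in the sample output.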
				2266
Then follow summary statistics for the whole program. These are similar
				2268	to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
				2269
Then follow function-by-function statistics. Each function is
				2271	identified by a <code>file_name:function_name</code> pair. If a column
				2272	contains only a dot it means the function never performs
				2273	that event (eg. the third row shows that <code>strcmp()</code>
				2274	contains no instructions that write to memory). The name
<code>???</code> is used if the file name and/or function name
				2276	could not be determined from debugging information. If most of the
				2277	entries have the form <code>???:???</code> the program probably wasn't
				2278	compiled with <code>-g</code>. <p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2279
				2280	It is worth noting that functions will come from three types of source files:
				2281	<ol>
				2282	<li> From the profiled program (<code>concord.c</code> in this example).</li>
				2283	<li>From libraries (eg. <code>getc.c</code>)</li>
				2284	<li>From Valgrind's implementation of some libc functions (eg.
				2285	<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
				2286	the filename begins with <code>vg_</code>, and is probably one of
				2287	<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
				2288	<code>vg_mylibc.c</code>.
				2289	</li>
				2290	</ol>
				2291
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2292	There are two ways to annotate source files -- by choosing them
				2293	manually, or with the <code>--auto=yes</code> option. To do it
				2294	manually, just specify the filenames as arguments to
				2295	<code>vg_annotate</code>. For example, the output from running
				2296	<code>vg_annotate concord.c</code> for our example produces the same
				2297	output as above followed by an annotated version of
				2298	<code>concord.c</code>, a section of which looks like:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2299
				2300	<pre>
				2301	--------------------------------------------------------------------------------
				2302	-- User-annotated source: concord.c
				2303	--------------------------------------------------------------------------------
				2304	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2305
				2306	[snip]
				2307
				2308	. . . . . . . . . void init_hash_table(char file_name, Word_Node table[])
				2309	3 1 1 . . . 1 0 0 {
				2310	. . . . . . . . . FILE *file_ptr;
				2311	. . . . . . . . . Word_Info *data;
				2312	1 0 0 . . . 1 1 1 int line = 1, i;
				2313	. . . . . . . . .
				2314	5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
				2315	. . . . . . . . .
				2316	4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
				2317	3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
				2318	. . . . . . . . .
				2319	. . . . . . . . . /* Open file, check it. */
				2320	6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
				2321	2 0 0 1 0 0 . . . if (!(file_ptr)) {
				2322	. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
				2323	1 1 1 . . . . . . exit(EXIT_FAILURE);
				2324	. . . . . . . . . }
				2325	. . . . . . . . .
				2326	165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
				2328	. . . . . . . . .
				2329	4 0 0 1 0 0 2 0 0 free(data);
				2330	4 0 0 1 0 0 2 0 0 fclose(file_ptr);
				2331	3 0 0 2 0 0 . . . }
				2332	</pre>
				2333
				2334	(Although column widths are automatically minimised, a wide terminal is clearly
				2335	useful.)<p>
				2336
				2337	Each source file is clearly marked (<code>User-annotated source</code>) as
				2338	having been chosen manually for annotation. If the file was found in one of
				2339	the directories specified with the <code>-I</code>/<code>--include</code>
				2340	option, the directory and file are both given.<p>
				2341
				2342	Each line is annotated with its event counts. Events not applicable for a line
				2343	are represented by a `.'; this is useful for distinguishing between an event
				2344	which cannot happen, and one which can but did not.<p>
				2345
				2346	Sometimes only a small section of a source file is executed. To minimise
				2347	uninteresting output, Valgrind only shows annotated lines and lines within a
				2348	small distance of annotated lines. Gaps are marked with the line numbers so
				2349	you know which part of a file the shown code comes from, eg:
				2350
				2351	<pre>
				2352	(figures and code for line 704)
				2353	-- line 704 ----------------------------------------
				2354	-- line 878 ----------------------------------------
				2355	(figures and code for line 878)
				2356	</pre>
				2357
				2358	The amount of context to show around annotated lines is controlled by the
				2359	<code>--context</code> option.<p>
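<p>One way to picture the effect of the context setting is as a merge of
line ranges around each annotated line. This is a sketch of the idea only; the
function name is hypothetical and the real selection logic in
<code>vg_annotate</code> may differ.

```python
# Hypothetical sketch of choosing which source lines to print.

def ranges_to_show(annotated_lines, context, last_line):
    ranges = []
    for n in sorted(annotated_lines):
        lo = max(1, n - context)
        hi = min(last_line, n + context)
        if ranges and lo <= ranges[-1][1] + 1:
            ranges[-1][1] = max(ranges[-1][1], hi)   # merge with previous range
        else:
            ranges.append([lo, hi])
    return [tuple(r) for r in ranges]


shown = ranges_to_show({700, 702, 880}, context=2, last_line=1000)
```

With these sample inputs the first range ends at line 704 and the next one
starts at line 878, which is where gap markers like those above would appear.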
				2360
				2361	To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
				2362	vg_annotate will automatically annotate every source file it can find that is
				2363	mentioned in the function-by-function summary. Therefore, the files chosen for
				2364	auto-annotation are affected by the <code>--sort</code> and
				2365	<code>--threshold</code> options. Each source file is clearly marked
				2366	(<code>Auto-annotated source</code>) as being chosen automatically. Any files
				2367	that could not be found are mentioned at the end of the output, eg:
				2368
				2369	<pre>
				2370	--------------------------------------------------------------------------------
				2371	The following files chosen for auto-annotation could not be found:
				2372	--------------------------------------------------------------------------------
				2373	getc.c
				2374	ctype.c
				2375	../sysdeps/generic/lockfile.c
				2376	</pre>
				2377
				2378	This is quite common for library files, since libraries are usually compiled
				2379	with debugging information, but the source files are often not present on a
				2380	system. If a file is chosen for annotation <b>both</b> manually and
automatically, it is marked as <code>User-annotated source</code>.<p>

Use the <code>-I</code>/<code>--include</code> option to tell Valgrind where to look for
source files if the filenames found from the debugging information aren't
specific enough.<p>
				2386
				2387	Beware that vg_annotate can take some time to digest large
				2388	<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
				2389	auto-annotation can produce a lot of output if your program is large!
				2390
				2391
				2392	<h3>7.8  Annotating assembler programs</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2393
				2394	Valgrind can annotate assembler programs too, or annotate the
				2395	assembler generated for your C program. Sometimes this is useful for
				2396	understanding what is really happening when an interesting line of C
				2397	code is translated into multiple instructions.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2398
				2399	To do this, you just need to assemble your <code>.s</code> files with
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2400	assembler-level debug information. gcc doesn't do this, but you can
				2401	use the GNU assembler with the <code>--gstabs</code> option to
				2402	generate object files with this information, eg:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2403
				2404	<blockquote><code>as --gstabs foo.s</code></blockquote>
				2405
				2406	You can then profile and annotate source files in the same way as for C/C++
				2407	programs.
				2408
				2409
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2410	<h3>7.9  <code>vg_annotate</code> options</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2411	<ul>
				2412	<li><code>-h, --help</code></li><p>
				2413	<li><code>-v, --version</code><p>
				2414
				2415	Help and version, as usual.</li>
				2416
				2417	<li><code>--sort=A,B,C</code> [default: order in
				2418	<code>cachegrind.out</code>]<p>
				2419	Specifies the events upon which the sorting of the function-by-function
				2420	entries will be based. Useful if you want to concentrate on eg. I cache
				2421	misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
				2422	(<code>--sort=D1mr,D2mr</code>), or L2 misses
				2423	(<code>--sort=D2mr,I2mr</code>).</li><p>
				2424
				2425	<li><code>--show=A,B,C</code> [default: all, using order in
				2426	<code>cachegrind.out</code>]<p>
				2427	Specifies which events to show (and the column order). Default is to use
				2428	all present in the <code>cachegrind.out</code> file (and use the order in
				2429	the file).</li><p>
				2430
				2431	<li><code>--threshold=X</code> [default: 99%] <p>
				2432	Sets the threshold for the function-by-function summary. Functions are
				2433	shown that account for more than X% of all the primary sort events. If
				2434	auto-annotating, also affects which files are annotated.</li><p>
				2435
				2436	<li><code>--auto=no</code> [default]<br>
				2437	<code>--auto=yes</code> <p>
				2438	When enabled, automatically annotates every file that is mentioned in the
				2439	function-by-function summary that can be found. Also gives a list of
				2440	those that couldn't be found.
				2441
				2442	<li><code>--context=N</code> [default: 8]<p>
				2443	Print N lines of context before and after each annotated line. Avoids
				2444	printing large sections of source files that were not executed. Use a
				2445	large number (eg. 10,000) to show all source lines.
				2446	</li><p>
				2447
				2448	<li><code>-I=&lt;dir&gt;, --include=&lt;dir&gt;</code>
				2449	[default: empty string]<p>
				2450	Adds a directory to the list in which to search for files. Multiple
				2451	-I/--include options can be given to add multiple directories.
				2452	</ul>
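As a rough illustration of what <code>--context=N</code> means, the sketch below computes which source lines would be printed for a given set of annotated lines. The function <code>lines_to_show</code> is hypothetical and is not taken from vg_annotate; it only mirrors the description "N lines of context before and after each annotated line".

```python
def lines_to_show(annotated, context, total_lines):
    """Return the sorted 1-based line numbers to print.

    annotated   -- line numbers that have counts attached
    context     -- the N of --context=N
    total_lines -- number of lines in the source file
    """
    shown = set()
    for line in annotated:
        # Clamp the window to the bounds of the file.
        first = max(1, line - context)
        last = min(total_lines, line + context)
        shown.update(range(first, last + 1))
    return sorted(shown)

# Two annotated lines far apart: the unexecuted lines between them are skipped.
print(lines_to_show([10, 40], 2, 50))
# prints [8, 9, 10, 11, 12, 38, 39, 40, 41, 42]
```

With a very large context value every window covers the whole file, which is why a large number shows all source lines.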
				2453
				2454
				2455	<h3>7.10  Warnings</h3>
				2456	There are a couple of situations in which vg_annotate issues warnings.
				2457
				2458	<ul>
				2459	<li>If a source file is more recent than the <code>cachegrind.out</code>
				2460	file. This is because the information in <code>cachegrind.out</code> is
				2461	only recorded with line numbers, so if the line numbers change at all in
				2462	the source (eg. lines added, deleted, swapped), any annotations will be
				2463	incorrect.<p>
				2464
				2465	<li>If information is recorded about line numbers past the end of a file.
				2466	This can be caused by the above problem, ie. shortening the source file
				2467	while using an old <code>cachegrind.out</code> file. If this happens,
				2468	the figures for the bogus lines are printed anyway (clearly marked as
				2469	bogus) in case they are important.</li><p>
				2470	</ul>
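The two checks behind these warnings can be sketched as follows. This is an assumption-laden illustration rather than vg_annotate's implementation: the function name, the use of file modification times, and the warning strings are all invented for the example.

```python
import os

def annotation_warnings(source_path, profile_path, recorded_lines):
    """Return a list of warning strings for one source file.

    recorded_lines -- line numbers that the profile has counts for
    """
    warnings = []
    # Check 1: the source was modified after the profile was written,
    # so recorded line numbers may no longer match the source.
    if os.path.getmtime(source_path) > os.path.getmtime(profile_path):
        warnings.append("source file is more recent than the profile")
    # Check 2: counts recorded for lines beyond the end of the file.
    with open(source_path) as f:
        line_count = sum(1 for _ in f)
    bogus = sorted(n for n in recorded_lines if n > line_count)
    if bogus:
        warnings.append("line numbers past end of file: %s" % bogus)
    return warnings
```

Both conditions usually share one cause, an edited source file paired with an old <code>cachegrind.out</code>, so re-running the profile is the simplest remedy.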
				2471
				2472
				2473	<h3>7.10  Things to watch out for</h3>
				2474	Some odd things that can occur during annotation:
				2475
				2476	<ul>
				2477	<li>If annotating at the assembler level, you might see something like this:
				2478
				2479	<pre>
				2480	1 0 0 . . . . . . leal -12(%ebp),%eax
				2481	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				2482	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				2483	. . . . . . . . . .align 4,0x90
				2484	1 0 0 . . . . . . movl $.LnrB,%eax
				2485	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
				2486	</pre>
				2487
				2488	How can the third instruction be executed twice when the others are
				2489	executed only once? As it turns out, it isn't. Here's a dump of the
				2490	executable, from objdump:
				2491
				2492	<pre>
				2493	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				2494	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				2495	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				2496	8048f32: 89 f6 mov %esi,%esi
				2497	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				2498	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
				2499	</pre>
				2500
				2501	Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
				2502	come from? The GNU assembler inserted it to serve as the two bytes of
				2503	padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
				2504	a four-byte boundary, but pretended it didn't exist when adding debug
				2505	information. Thus when Valgrind reads the debug info it thinks that the
				2506	<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
				2507	range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
				2508	<code>mov %esi,%esi</code> to it.<p>
				2509	</li>
				2510
				2511	<li>
				2512	Inlined functions can cause strange results in the function-by-function
				2513	summary. If a function <code>inline_me()</code> is defined in
				2514	<code>foo.h</code> and inlined in the functions <code>f1()</code>,
				2515	<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
				2516	not be a <code>foo.h:inline_me()</code> function entry. Instead, there
				2517	will be separate function entries for each inlining site, ie.
				2518	<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
				2519	<code>foo.h:f3()</code>. To find the total counts for
				2520	<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
				2521
				2522	The reason for this is that although the debug info output by gcc
				2523	indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
				2524	doesn't indicate the name of the function in <code>foo.h</code>, so
				2525	Valgrind keeps using the old one.<p>
				2526
				2527	<li>
				2528	Sometimes, the same filename might be represented with a relative name
				2529	and with an absolute name in different parts of the debug info, eg:
				2530	<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
				2531	case, if you use auto-annotation, the file will be annotated twice with
				2532	the counts split between the two.<p>
				2533	</li>
				2534	</ul>
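The padding-instruction case in the first item above can be reproduced with a small sketch. The addresses come from the objdump listing; the source line numbers (1 to 6) and the lookup scheme (each executed address is charged to the last debug-info entry that starts at or before it) are assumptions made for illustration, not a description of Valgrind's stabs reader.

```python
import bisect

# (start address, source line) pairs as the debug info might record them.
# There is no entry for the padding instruction at 0x8048f32.
line_starts = [
    (0x8048f25, 1),  # leal -12(%ebp),%eax
    (0x8048f28, 2),  # movl %eax,84(%ebx)
    (0x8048f2b, 3),  # movl $1,-20(%ebp)
    (0x8048f34, 5),  # movl $.LnrB,%eax  (line 4 is the .align directive)
    (0x8048f39, 6),  # movl %eax,-16(%ebp)
]

# Addresses of the instructions actually executed, one time each.
executed = [0x8048f25, 0x8048f28, 0x8048f2b, 0x8048f32, 0x8048f34, 0x8048f39]

def count_by_line(line_starts, executed):
    """Charge each executed address to the nearest preceding line entry."""
    addrs = [addr for addr, _ in line_starts]
    counts = {}
    for pc in executed:
        i = bisect.bisect_right(addrs, pc) - 1
        line = line_starts[i][1]
        counts[line] = counts.get(line, 0) + 1
    return counts

print(count_by_line(line_starts, executed))
# prints {1: 1, 2: 1, 3: 2, 5: 1, 6: 1}
```

Line 3 ends up with a count of 2 because the unlisted <code>mov %esi,%esi</code> at 0x8048f32 falls inside the range the debug info assigns to it.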
				2535
				2536	Note: stabs is not an easy format to read. If you come across bizarre
				2537	annotations that look like they might be caused by a bug in the stabs reader,
				2538	please let us know.
				2539
				2540
				2541	<h3>7.11  Accuracy</h3>
				2542	Valgrind's cache profiling has a number of shortcomings:
				2543
				2544	<ul>
				2545	<li>It doesn't account for kernel activity -- the effect of system calls on
				2546	the cache contents is ignored.</li><p>
				2547
				2548	<li>It doesn't account for other process activity (although this is probably
				2549	desirable when considering a single program).</li><p>
				2550
				2551	<li>It doesn't account for virtual-to-physical address mappings; hence the
				2552	entire simulation is not a true representation of what's happening in the
				2553	cache.</li><p>
				2554
				2555	<li>It doesn't account for cache misses not visible at the instruction level,
				2556	eg. those arising from TLB misses, or speculative execution.</li><p>
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2557
				2558	<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
				2559	will incorrectly be counted as doing a data read if both the arguments
				2560	are registers, eg:
				2561
				2562	<blockquote><code>btsl %eax, %edx</code></blockquote>
				2563
				2564	This should only happen rarely.
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2565	</ul>
				2566
				2567	Another thing worth noting is that results are very sensitive. Changing the
				2568	size of the <code>valgrind.so</code> file, the size of the program being
				2569	profiled, or even the length of its name can perturb the results. Variations
				2570	will be small, but don't expect perfectly repeatable results if your program
				2571	changes at all.<p>
				2572
				2573	While these factors mean you shouldn't trust the results to be super-accurate,
				2574	hopefully they should be close enough to be useful.<p>
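One plausible reason for this sensitivity is that small layout changes shift the addresses at which data and code live, and hence how they fall across cache lines. The sketch below is a toy calculation, not part of Valgrind's cache simulator; the 64-byte line size and the example addresses are assumptions.

```python
def lines_touched(base, size, line_size=64):
    """Number of distinct cache lines covered by size bytes starting at base."""
    first = base // line_size
    last = (base + size - 1) // line_size
    return last - first + 1

# The same 32-byte object, placed at two different addresses.
print(lines_touched(0x1000, 32))  # prints 1: fits within one line
print(lines_touched(0x1030, 32))  # prints 2: straddles a line boundary
```

Moving the object by 48 bytes doubles the number of lines it occupies, which is the kind of shift that a change in program size or name length might cause.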
				2575
				2576
				2577	<h3>7.12  Todo</h3>
				2578	<ul>
				2579	<li>Use CPUID instruction to auto-identify cache configuration during
				2580	installation. This would save the user from having to know their cache
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2581	configuration and use vg_cachegen.</li>
				2582	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2583	<li>Program start-up/shut-down calls a lot of functions that aren't
				2584	interesting and just complicate the output. Would be nice to exclude
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2585	these somehow.</li>
				2586	<p>
				2587	<li>Handle files with more than 65535 lines.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2588	</ul>
				2589	<hr width="100%">
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	2590	</body>
				2591	</html>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2592