sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1	<html>
				2	<head>
				3	<style type="text/css">
				4	body { background-color: #ffffff;
				5	color: #000000;
				6	font-family: Times, Helvetica, Arial;
				7	font-size: 14pt}
				8	h4 { margin-bottom: 0.3em}
				9	code { color: #000000;
				10	font-family: Courier;
				11	font-size: 13pt }
				12	pre { color: #000000;
				13	font-family: Courier;
				14	font-size: 13pt }
				15	a:link { color: #0000C0;
				16	text-decoration: none; }
				17	a:visited { color: #0000C0;
				18	text-decoration: none; }
				19	a:active { color: #0000C0;
				20	text-decoration: none; }
				21	</style>
				22	</head>
				23
				24	<body bgcolor="#ffffff">
				25
				26	<a name="title"> </a>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	27	<h1 align=center>Valgrind, snapshot 20020501</h1>
<center>This manual was substantially updated on 20020501</center>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	29	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	30
				31	<center>
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	33	Copyright © 2000-2002 Julian Seward
				34	<p>
				35	Valgrind is licensed under the GNU General Public License,
				36	version 2<br>
				37	An open-source tool for finding memory-management problems in
				38	Linux-x86 executables.
				39	</center>
				40
				41	<p>
				42
				43	<hr width="100%">
				44	<a name="contents"></a>
				45	<h2>Contents of this manual</h2>
				46
				47	<h4>1  <a href="#intro">Introduction</a></h4>
				48	1.1  <a href="#whatfor">What Valgrind is for</a><br>
				49	1.2  <a href="#whatdoes">What it does with your program</a>
				50
				51	<h4>2  <a href="#howtouse">How to use it, and how to make sense
				52	of the results</a></h4>
				53	2.1  <a href="#starta">Getting started</a><br>
				54	2.2  <a href="#comment">The commentary</a><br>
				55	2.3  <a href="#report">Reporting of errors</a><br>
				56	2.4  <a href="#suppress">Suppressing errors</a><br>
				57	2.5  <a href="#flags">Command-line flags</a><br>
2.6  <a href="#errormsgs">Explanation of error messages</a><br>
				59	2.7  <a href="#suppfiles">Writing suppressions files</a><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	60	2.8  <a href="#clientreq">The Client Request mechanism</a><br>
				61	2.9  <a href="#pthreads">Support for POSIX pthreads</a><br>
				62	2.10  <a href="#install">Building and installing</a><br>
				63	2.11  <a href="#problems">If you have problems</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	64
				65	<h4>3  <a href="#machine">Details of the checking machinery</a></h4>
				66	3.1  <a href="#vvalue">Valid-value (V) bits</a><br>
				67	3.2  <a href="#vaddress">Valid-address (A) bits</a><br>
				68	3.3  <a href="#together">Putting it all together</a><br>
				69	3.4  <a href="#signals">Signals</a><br>
				70	3.5  <a href="#leaks">Memory leak detection</a><br>
				71
				72	<h4>4  <a href="#limits">Limitations</a></h4>
				73
				74	<h4>5  <a href="#howitworks">How it works -- a rough overview</a></h4>
				75	5.1  <a href="#startb">Getting started</a><br>
				76	5.2  <a href="#engine">The translation/instrumentation engine</a><br>
				77	5.3  <a href="#track">Tracking the status of memory</a><br>
				78	5.4  <a href="#sys_calls">System calls</a><br>
				79	5.5  <a href="#sys_signals">Signals</a><br>
				80
				81	<h4>6  <a href="#example">An example</a></h4>
				82
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	83	<h4>7  <a href="#cache">Cache profiling</a></h4>
				84
				85	<h4>8  <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	86
				87	<hr width="100%">
				88
				89	<a name="intro"></a>
				90	<h2>1  Introduction</h2>
				91
				92	<a name="whatfor"></a>
				93	<h3>1.1  What Valgrind is for</h3>
				94
				95	Valgrind is a tool to help you find memory-management problems in your
				96	programs. When a program is run under Valgrind's supervision, all
				97	reads and writes of memory are checked, and calls to
				98	malloc/new/free/delete are intercepted. As a result, Valgrind can
				99	detect problems such as:
				100	<ul>
				101	<li>Use of uninitialised memory</li>
				102	<li>Reading/writing memory after it has been free'd</li>
				103	<li>Reading/writing off the end of malloc'd blocks</li>
				104	<li>Reading/writing inappropriate areas on the stack</li>
				105	<li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li>
				106	</ul>
				107
				108	Problems like these can be difficult to find by other means, often
				109	lying undetected for long periods, then causing occasional,
				110	difficult-to-diagnose crashes.
				111
				112	<p>
Valgrind is closely tied to details of the CPU, operating system and,
to a lesser extent, the compiler and basic C libraries. This makes it
difficult to port, so I have chosen at the outset to
concentrate on what I believe to be a widely used platform: Red Hat
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	117	Linux 7.2, on x86s. Valgrind uses the standard Unix
				118	<code>./configure</code>, <code>make</code>, <code>make install</code>
				119	mechanism, and I have attempted to ensure that it works on machines
				120	with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover
				121	the vast majority of modern Linux installations.
				122
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	123
				124	<p>
				125	Valgrind is licensed under the GNU General Public License, version
				126	2. Read the file LICENSE in the source distribution for details.
				127
<a name="whatdoes"></a>
				129	<h3>1.2  What it does with your program</h3>
				130
				131	Valgrind is designed to be as non-intrusive as possible. It works
				132	directly with existing executables. You don't need to recompile,
				133	relink, or otherwise modify, the program to be checked. Simply place
				134	the word <code>valgrind</code> at the start of the command line
				135	normally used to run the program. So, for example, if you want to run
				136	the command <code>ls -l</code> on Valgrind, simply issue the
				137	command: <code>valgrind ls -l</code>.
				138
				139	<p>Valgrind takes control of your program before it starts. Debugging
				140	information is read from the executable and associated libraries, so
				141	that error messages can be phrased in terms of source code
				142	locations. Your program is then run on a synthetic x86 CPU which
				143	checks every memory access. All detected errors are written to a
				144	log. When the program finishes, Valgrind searches for and reports on
				145	leaked memory.
				146
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	147	<p>You can run pretty much any dynamically linked ELF x86 executable
				148	using Valgrind. Programs run 25 to 50 times slower, and take a lot
				149	more memory, than they usually would. It works well enough to run
				150	large programs. For example, the Konqueror web browser from the KDE
				151	Desktop Environment, version 3.0, runs slowly but usably on Valgrind.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	152
				153	<p>Valgrind simulates every single instruction your program executes.
				154	Because of this, it finds errors not only in your application but also
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	155	in all supporting dynamically-linked (<code>.so</code>-format)
				156	libraries, including the GNU C library, the X client libraries, Qt, if
				157	you work with KDE, and so on. That often includes libraries, for
				158	example the GNU C library, which contain memory access violations, but
				159	which you cannot or do not want to fix.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	160
				161	<p>Rather than swamping you with errors in which you are not
				162	interested, Valgrind allows you to selectively suppress errors, by
				163	recording them in a suppressions file which is read when Valgrind
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	164	starts up. The build mechanism attempts to select suppressions which
				165	give reasonable behaviour for the libc and XFree86 versions detected
				166	on your machine.
				167
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	168
				169	<p><a href="#example">Section 6</a> shows an example of use.
				170	<p>
				171	<hr width="100%">
				172
				173	<a name="howtouse"></a>
				174	<h2>2  How to use it, and how to make sense of the results</h2>
				175
				176	<a name="starta"></a>
				177	<h3>2.1  Getting started</h3>
				178
				179	First off, consider whether it might be beneficial to recompile your
				180	application and supporting libraries with optimisation disabled and
				181	debugging info enabled (the <code>-g</code> flag). You don't have to
				182	do this, but doing so helps Valgrind produce more accurate and less
				183	confusing error reports. Chances are you're set up like this already,
				184	if you intended to debug your program with GNU gdb, or some other
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	185	debugger.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	186
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	187	<p>
				188	A plausible compromise is to use <code>-g -O</code>.
				189	Optimisation levels above <code>-O</code> have been observed, on very
				190	rare occasions, to cause gcc to generate code which fools Valgrind's
				191	error tracking machinery into wrongly reporting uninitialised value
				192	errors. <code>-O</code> gets you the vast majority of the benefits of
				193	higher optimisation levels anyway, so you don't lose much there.
				194
				195	<p>
				196	Note that as of 1 May 2002 Valgrind does not understand the DWARF
				197	debugging format, which is unfortunate since the upcoming gcc-3.1 uses
				198	it by default. Valgrind only knows about the older "stabs" format.
				199	If you use gcc-3.1 or above, you can still ask for stabs-format debug
				200	info by passing <code>-gstabs</code> to gcc.
				201
				202	<p>
				203	Then just run your application, but place the word
<code>valgrind</code> in front of your usual command-line invocation.
				205	Note that you should run the real (machine-code) executable here. If
				206	your application is started by, for example, a shell or perl script,
				207	you'll need to modify it to invoke Valgrind on the real executables.
				208	Running such scripts directly under Valgrind will result in you
				209	getting error reports pertaining to <code>/bin/sh</code>,
				210	<code>/usr/bin/perl</code>, or whatever interpreter you're using.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	211	This almost certainly isn't what you want and can be confusing.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	212
				213	<a name="comment"></a>
				214	<h3>2.2  The commentary</h3>
				215
Valgrind writes a commentary, detailing error reports and other
significant events. The commentary goes to standard error (file
descriptor 2) by default. This may interfere with your program's own
use of stderr, so you can ask for it to be directed elsewhere with
the <code>--logfile-fd</code> flag.
				220
				221	<p>All lines in the commentary are of the following form:<br>
				222	<pre>
				223	==12345== some-message-from-Valgrind
				224	</pre>
				225	<p>The <code>12345</code> is the process ID. This scheme makes it easy
				226	to distinguish program output from Valgrind commentary, and also easy
				227	to differentiate commentaries from different processes which have
				228	become merged together, for whatever reason.
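<p>As a small illustration of how the <code>==pid==</code> prefix can be
used to separate commentary from program output, here is a sketch in C.
The helper name <code>commentary_pid</code> is invented for this example
and is not part of Valgrind; it only recognises the <code>==</code> form
shown above.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Valgrind: if `line` looks like
   "==<pid>== message", return the pid; otherwise return -1. */
long commentary_pid(const char *line)
{
    long pid = 0;
    int consumed = 0;

    /* %n stores the number of characters matched so far.  It is only
       reached if the closing "==" was present, so consumed stays 0
       for lines such as "==123 oops". */
    if (sscanf(line, "==%ld==%n", &pid, &consumed) == 1
        && consumed > 0 && pid >= 0)
        return pid;
    return -1;
}
```

<p>A log filter could call this on each line and treat a return value of
<code>-1</code> as ordinary program output.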
				229
				230	<p>By default, Valgrind writes only essential messages to the commentary,
				231	so as to avoid flooding you with information of secondary importance.
				232	If you want more information about what is happening, re-run, passing
				233	the <code>-v</code> flag to Valgrind.
				234
				235
				236	<a name="report"></a>
				237	<h3>2.3  Reporting of errors</h3>
				238
				239	When Valgrind detects something bad happening in the program, an error
				240	message is written to the commentary. For example:<br>
				241	<pre>
				242	==25832== Invalid read of size 4
				243	==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
				244	==25832== by 0x80487AF: main (bogon.cpp:66)
				245	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				246	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				247	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				248	</pre>
				249
				250	<p>This message says that the program did an illegal 4-byte read of
				251	address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
				252	address, nor corresponds to any currently malloc'd or free'd blocks.
				253	The read is happening at line 45 of <code>bogon.cpp</code>, called
				254	from line 66 of the same file, etc. For errors associated with an
				255	identified malloc'd/free'd block, for example reading free'd memory,
				256	Valgrind reports not only the location where the error happened, but
				257	also where the associated block was malloc'd/free'd.
				258
				259	<p>Valgrind remembers all error reports. When an error is detected,
				260	it is compared against old reports, to see if it is a duplicate. If
				261	so, the error is noted, but no further commentary is emitted. This
				262	avoids you being swamped with bazillions of duplicate error reports.
				263
				264	<p>If you want to know how many times each error occurred, run with
				265	the <code>-v</code> option. When execution finishes, all the reports
				266	are printed out, along with, and sorted by, their occurrence counts.
				267	This makes it easy to see which errors have occurred most frequently.
				268
				269	<p>Errors are reported before the associated operation actually
happens. For example, if your program decides to read from address
				271	zero, Valgrind will emit a message to this effect, and the program
				272	will then duly die with a segmentation fault.
				273
				274	<p>In general, you should try and fix errors in the order that they
				275	are reported. Not doing so can be confusing. For example, a program
				276	which copies uninitialised values to several memory locations, and
				277	later uses them, will generate several error messages. The first such
				278	error message may well give the most direct clue to the root cause of
				279	the problem.
				280
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	281	<p>The process of detecting duplicate errors is quite an expensive
				282	one and can become a significant performance overhead if your program
				283	generates huge quantities of errors. To avoid serious problems here,
				284	Valgrind will simply stop collecting errors after 300 different errors
				285	have been seen, or 30000 errors in total have been seen. In this
				286	situation you might as well stop your program and fix it, because
				287	Valgrind won't tell you anything else useful after this. Note that
				288	the 300/30000 limits apply after suppressed errors are removed. These
				289	limits are defined in <code>vg_include.h</code> and can be increased
				290	if necessary.
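<p>The bookkeeping described above can be pictured roughly as follows.
This is a simplified sketch, not Valgrind's actual code: each error is
reduced to a single integer <code>id</code> (a stand-in for the error
kind and its call stack), suppressions are not modelled, and the names
<code>error_log</code> and <code>record_error</code> are made up for
illustration. Only the 300/30000 limits come from the text.

```c
#include <stdbool.h>

#define MAX_DISTINCT 300     /* limits quoted in the text above */
#define MAX_TOTAL    30000

struct error_log {
    int  ids[MAX_DISTINCT];     /* one entry per distinct error seen */
    int  counts[MAX_DISTINCT];  /* how often each one occurred */
    int  n_distinct;
    long n_total;
};

/* Record one error.  Returns true only for the first sighting of an
   error, i.e. when a report should be written to the commentary. */
bool record_error(struct error_log *log, int id)
{
    /* Once either limit is hit, stop collecting altogether. */
    if (log->n_distinct >= MAX_DISTINCT || log->n_total >= MAX_TOTAL)
        return false;

    log->n_total++;

    /* Linear search for a duplicate: this is the expensive part. */
    for (int i = 0; i < log->n_distinct; i++) {
        if (log->ids[i] == id) {
            log->counts[i]++;   /* noted, but no further commentary */
            return false;
        }
    }

    log->ids[log->n_distinct] = id;
    log->counts[log->n_distinct] = 1;
    log->n_distinct++;
    return true;
}
```

<p>The per-error counts kept here correspond to the occurrence counts
that <code>-v</code> prints at the end of a run.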
				291
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	292	<a name="suppress"></a>
				293	<h3>2.4  Suppressing errors</h3>
				294
				295	Valgrind detects numerous problems in the base libraries, such as the
				296	GNU C library, and the XFree86 client libraries, which come
				297	pre-installed on your GNU/Linux system. You can't easily fix these,
				298	but you don't want to see these errors (and yes, there are many!) So
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	299	Valgrind reads a list of errors to suppress at startup.
				300	A default suppression file is cooked up by the
				301	<code>./configure</code> script.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	302
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	303	<p>You can modify and add to the suppressions file at your leisure,
				304	or, better, write your own. Multiple suppression files are allowed.
				305	This is useful if part of your project contains errors you can't or
				306	don't want to fix, yet you don't want to continuously be reminded of
				307	them.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	308
				309	<p>Each error to be suppressed is described very specifically, to
minimise the possibility that a suppression-directive inadvertently
				311	suppresses a bunch of similar errors which you did want to see. The
				312	suppression mechanism is designed to allow precise yet flexible
				313	specification of errors to suppress.
				314
				315	<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
				316	prints out one line for each used suppression, giving its name and the
number of times it got used. Here are the suppressions used by a run of
				318	<code>ls -l</code>:
				319	<pre>
				320	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
				321	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
				322	--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
				323	</pre>
				324
				325	<a name="flags"></a>
				326	<h3>2.5  Command-line flags</h3>
				327
				328	You invoke Valgrind like this:
				329	<pre>
				330	valgrind [options-for-Valgrind] your-prog [options for your-prog]
				331	</pre>
				332
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	333	<p>Note that Valgrind also reads options from the environment variable
				334	<code>$VALGRIND</code>, and processes them before the command-line
				335	options.
				336
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	337	<p>Valgrind's default settings succeed in giving reasonable behaviour
				338	in most cases. Available options, in no particular order, are as
				339	follows:
				340	<ul>
				341	<li><code>--help</code></li><br>
				342
				343	<li><code>--version</code><br>
				344	<p>The usual deal.</li><br><p>
				345
				346	<li><code>-v --verbose</code><br>
				347	<p>Be more verbose. Gives extra information on various aspects
				348	of your program, such as: the shared objects loaded, the
				349	suppressions used, the progress of the instrumentation engine,
				350	and warnings about unusual behaviour.
				351	</li><br><p>
				352
				353	<li><code>-q --quiet</code><br>
				354	<p>Run silently, and only print error messages. Useful if you
				355	are running regression tests or have some other automated test
				356	machinery.
				357	</li><br><p>
				358
				359	<li><code>--demangle=no</code><br>
				360	<code>--demangle=yes</code> [the default]
				361	<p>Disable/enable automatic demangling (decoding) of C++ names.
				362	Enabled by default. When enabled, Valgrind will attempt to
				363	translate encoded C++ procedure names back to something
				364	approaching the original. The demangler handles symbols mangled
				365	by g++ versions 2.X and 3.X.
				366
				367	<p>An important fact about demangling is that function
				368	names mentioned in suppressions files should be in their mangled
				369	form. Valgrind does not demangle function names when searching
				370	for applicable suppressions, because to do otherwise would make
				371	suppressions file contents dependent on the state of Valgrind's
				372	demangling machinery, and would also be slow and pointless.
				373	</li><br><p>
				374
<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
				376	<p>By default, Valgrind shows four levels of function call names
				377	to help you identify program locations. You can change that
				378	number with this option. This can help in determining the
				379	program's location in deeply-nested call chains. Note that errors
are treated as duplicates based only on the top three function locations (the
				381	place in the current function, and that of its two immediate
				382	callers). So this doesn't affect the total number of errors
				383	reported.
				384	<p>
				385	The maximum value for this is 50. Note that higher settings
				386	will make Valgrind run a bit more slowly and take a bit more
				387	memory, but can be useful when working with programs with
				388	deeply-nested call chains.
				389	</li><br><p>
				390
				391	<li><code>--gdb-attach=no</code> [the default]<br>
				392	<code>--gdb-attach=yes</code>
				393	<p>When enabled, Valgrind will pause after every error shown,
				394	and print the line
				395	<br>
				396	<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
				397	<p>
				398	Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
				399	or <code>n</code> <code>Ret</code>, causes Valgrind not to
				400	start GDB for this error.
				401	<p>
				402	<code>Y</code> <code>Ret</code>
				403	or <code>y</code> <code>Ret</code> causes Valgrind to
				404	start GDB, for the program at this point. When you have
				405	finished with GDB, quit from it, and the program will continue.
				406	Trying to continue from inside GDB doesn't work.
				407	<p>
				408	<code>C</code> <code>Ret</code>
				409	or <code>c</code> <code>Ret</code> causes Valgrind not to
				410	start GDB, and not to ask again.
				411	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	412	<code>--gdb-attach=yes</code> conflicts with
				413	<code>--trace-children=yes</code>. You can't use them together.
				414	Valgrind refuses to start up in this situation. 1 May 2002:
				415	this is a historical relic which could be easily fixed if it
				416	gets in your way. Mail me and complain if this is a problem for
				417	you. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	418
				419	<li><code>--partial-loads-ok=yes</code> [the default]<br>
				420	<code>--partial-loads-ok=no</code>
				421	<p>Controls how Valgrind handles word (4-byte) loads from
addresses for which some bytes are addressable and others
are not. When <code>yes</code> (the default), such loads
do not elicit an address error. Instead, the loaded V bytes
corresponding to the illegal addresses are marked as undefined, and
				426	those corresponding to legal addresses are loaded from shadow
				427	memory, as usual.
				428	<p>
				429	When <code>no</code>, loads from partially
				430	invalid addresses are treated the same as loads from completely
				431	invalid addresses: an illegal-address error is issued,
				432	and the resulting V bytes indicate valid data.
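<p>The two behaviours can be summarised by the following C sketch. It is
an illustration only: the function name, the one-flag-per-byte view of
addressability, and the encoding of V bytes (<code>0x00</code> for valid,
<code>0xFF</code> for undefined) are assumptions made for this example.

```c
#include <stdbool.h>

#define V_VALID     0x00   /* assumed encoding, for this sketch only */
#define V_UNDEFINED 0xFF

/* Model a 4-byte load.  addressable[i] says whether byte i may be
   accessed; shadow_v[i] is its V byte in shadow memory.  The V bytes
   of the loaded word are written to out_v.  Returns true if an
   address error should be reported. */
bool load_word_vbytes(bool partial_loads_ok,
                      const bool addressable[4],
                      const unsigned char shadow_v[4],
                      unsigned char out_v[4])
{
    int n_ok = 0;
    for (int i = 0; i < 4; i++)
        n_ok += addressable[i] ? 1 : 0;

    /* Fully valid load, or a partially valid one that we tolerate. */
    if (n_ok == 4 || (partial_loads_ok && n_ok > 0)) {
        for (int i = 0; i < 4; i++)
            out_v[i] = addressable[i] ? shadow_v[i] : V_UNDEFINED;
        return false;
    }

    /* Address error: the result is marked valid, so the same problem
       is not reported a second time as an undefined-value error. */
    for (int i = 0; i < 4; i++)
        out_v[i] = V_VALID;
    return true;
}
```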
				433	</li><br><p>
				434
				435	<li><code>--sloppy-malloc=no</code> [the default]<br>
				436	<code>--sloppy-malloc=yes</code>
				437	<p>When enabled, all requests for malloc/calloc are rounded up
				438	to a whole number of machine words -- in other words, made
				439	divisible by 4. For example, a request for 17 bytes of space
				440	would result in a 20-byte area being made available. This works
				441	around bugs in sloppy libraries which assume that they can
				442	safely rely on malloc/calloc requests being rounded up in this
				443	fashion. Without the workaround, these libraries tend to
				444	generate large numbers of errors when they access the ends of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	445	these areas.
				446	<p>
				447	Valgrind snapshots dated 17 Feb 2002 and later are
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	448	cleverer about this problem, and you should no longer need to
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	449	use this flag. To put it bluntly, if you do need to use this
				450	flag, your program violates the ANSI C semantics defined for
				451	<code>malloc</code> and <code>free</code>, even if it appears to
				452	work correctly, and you should fix it, at least if you hope for
				453	maximum portability.
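<p>The rounding itself is simple arithmetic. A minimal sketch, assuming
4-byte machine words as on x86 (the function name is made up for this
example, and overflow for sizes close to the maximum is ignored):

```c
#include <stddef.h>

/* Round a request up to a whole number of 4-byte words,
   e.g. 17 becomes 20, while 20 stays 20. */
size_t round_up_to_word(size_t nbytes)
{
    return (nbytes + 3) & ~(size_t)3;
}
```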
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	454	</li><br><p>
				455
<li><code>--trace-children=no</code> [the default]<br>
				457	<code>--trace-children=yes</code>
				458	<p>When enabled, Valgrind will trace into child processes. This
				459	is confusing and usually not what you want, so is disabled by
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	460	default. As of 1 May 2002, tracing into a child process from a
				461	parent which uses <code>libpthread.so</code> is probably broken
				462	and is likely to cause breakage. Please report any such
				463	problems to me. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	464
<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
				466	<p>When the client program releases memory using free (in C) or
				467	delete (C++), that memory is not immediately made available for
				468	re-allocation. Instead it is marked inaccessible and placed in
				469	a queue of freed blocks. The purpose is to delay the point at
				470	which freed-up memory comes back into circulation. This
				471	increases the chance that Valgrind will be able to detect
				472	invalid accesses to blocks for some significant period of time
				473	after they have been freed.
				474	<p>
				475	This flag specifies the maximum total size, in bytes, of the
				476	blocks in the queue. The default value is one million bytes.
				477	Increasing this increases the total amount of memory used by
				478	Valgrind but may detect invalid uses of freed blocks which would
				479	otherwise go undetected.</li><br><p>
				480
<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
				482	<p>Specifies the file descriptor on which Valgrind communicates
				483	all of its messages. The default, 2, is the standard error
				484	channel. This may interfere with the client's own use of
				485	stderr. To dump Valgrind's commentary in a file without using
				486	stderr, something like the following works well (sh/bash
				487	syntax):<br>
				488	<code>
valgrind --logfile-fd=9 my_prog 9&gt; logfile</code><br>
				490	That is: tell Valgrind to send all output to file descriptor 9,
				491	and ask the shell to route file descriptor 9 to "logfile".
				492	</li><br><p>
				493
<li><code>--suppressions=&lt;filename&gt;</code>
				495	[default: $PREFIX/lib/valgrind/default.supp]
				496	<p>Specifies an extra
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	497	file from which to read descriptions of errors to suppress. You
				498	may use as many extra suppressions files as you
				499	like.</li><br><p>
				500
				501	<li><code>--leak-check=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	502	<code>--leak-check=yes</code>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	503	<p>When enabled, search for memory leaks when the client program
				504	finishes. A memory leak means a malloc'd block, which has not
				505	yet been free'd, but to which no pointer can be found. Such a
				506	block can never be free'd by the program, since no pointer to it
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	507	exists. Leak checking is disabled by default because it tends
				508	to generate dozens of error messages. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	509
				510	<li><code>--show-reachable=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	511	<code>--show-reachable=yes</code>
<p>When disabled, the memory leak detector only shows blocks to
which it cannot find any pointer at all, or to which it can only find
a pointer into the middle. These blocks are prime candidates for
				515	memory leaks. When enabled, the leak detector also reports on
				516	blocks which it could find a pointer to. Your program could, at
				517	least in principle, have freed such blocks before exit.
				518	Contrast this to blocks for which no pointer, or only an
				519	interior pointer could be found: they are more likely to
				520	indicate memory leaks, because you do not actually have a
				521	pointer to the start of the block which you can hand to
				522	<code>free</code>, even if you wanted to. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	523
				524	<li><code>--leak-resolution=low</code> [default]<br>
				525	<code>--leak-resolution=med</code> <br>
				526	<code>--leak-resolution=high</code>
				527	<p>When doing leak checking, determines how willing Valgrind is
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	528	to consider different backtraces to be the same. When set to
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	529	<code>low</code>, the default, only the first two entries need
				530	match. When <code>med</code>, four entries have to match. When
				531	<code>high</code>, all entries need to match.
				532	<p>
				533	For hardcore leak debugging, you probably want to use
				534	<code>--leak-resolution=high</code> together with
				535	<code>--num-callers=40</code> or some such large number. Note
				536	however that this can give an overwhelming amount of
				537	information, which is why the defaults are 4 callers and
				538	low-resolution matching.
				539	<p>
				540	Note that the <code>--leak-resolution=</code> setting does not
				541	affect Valgrind's ability to find leaks. It only changes how
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	542	the results are presented.
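<p>To make the three settings concrete, here is a rough C sketch of the
comparison. It assumes a backtrace is an array of code addresses with the
innermost frame first; the type and function names are invented for this
example and are not Valgrind's own.

```c
#include <stdbool.h>
#include <stddef.h>

enum leak_resolution { RES_LOW, RES_MED, RES_HIGH };

/* Decide whether two allocation backtraces count as "the same" at the
   given resolution: low compares 2 entries, med 4, high all of them. */
bool same_backtrace(enum leak_resolution res,
                    const unsigned long *a, size_t a_len,
                    const unsigned long *b, size_t b_len)
{
    size_t longest = a_len > b_len ? a_len : b_len;
    size_t need = longest;              /* RES_HIGH: every entry */

    if (res == RES_LOW) need = 2;
    if (res == RES_MED) need = 4;
    if (need > longest) need = longest; /* both traces are short */

    for (size_t i = 0; i < need; i++) {
        if (i >= a_len || i >= b_len)
            return false;               /* one trace ends early */
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```

<p>Leaks whose backtraces compare equal are reported together, which is
why a lower resolution gives fewer, larger entries in the leak report.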
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	543	</li><br><p>
				544
				545	<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
<code>--workaround-gcc296-bugs=yes</code> <p>When enabled, Valgrind
assumes that reads and writes some small distance below the stack
pointer <code>%esp</code> are due to bugs in gcc 2.96, and does
				549	not report them. The "small distance" is 256 bytes by default.
				550	Note that gcc 2.96 is the default compiler on some popular Linux
				551	distributions (RedHat 7.X, Mandrake) and so you may well need to
				552	use this flag. Do not use it if you do not have to, as it can
				553	cause real errors to be overlooked. A better option is to use a
				554	gcc/g++ which works properly; 2.95.3 seems to be a good choice.
				555	<p>
				556	Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	557	buggy, so you may need to issue this flag if you use 3.0.4. A
				558	while later (early Apr 02) this is confirmed as a scheduling bug
				559	in g++-3.0.4.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	560	</li><br><p>
				561
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	562	<li><code>--cachesim=no</code> [default]<br>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	563	<code>--cachesim=yes</code> <p>When enabled, turns off memory
				564	checking, and turns on cache profiling. Cache profiling is
sewardj	3984b85	2002-05-12 03:00:17 +0000	[diff] [blame]	565	described in detail in <a href="#cache">Section 7</a>.
				566	</li><br><p>
				567
sewardj	8d365b5	2002-05-12 10:52:16 +0000	[diff] [blame]	568	<li><code>--weird-hacks=hack1,hack2,...</code>
<p>Pass miscellaneous hints to Valgrind which slightly modify the
				570	simulated behaviour in nonstandard or dangerous ways, possibly
				571	to help the simulation of strange features. By default no hacks
				572	are enabled. Use with caution! Currently known hacks are:
				573	<p>
				574	<ul>
				575	<li><code>ioctl-VTIME</code> Use this if you have a program
				576	which sets readable file descriptors to have a timeout by
				577	doing <code>ioctl</code> on them with a
				578	<code>TCSETA</code>-style command <b>and</b> a non-zero
				579	<code>VTIME</code> timeout value. This is considered
				580	potentially dangerous and therefore is not engaged by
				581	default, because it is (remotely) conceivable that it could
				582	cause threads doing <code>read</code> to incorrectly block
				583	the entire process.
				584	<p>
				585	You probably want to try this one if you have a program
				586	which unexpectedly blocks in a <code>read</code> from a file
				587	descriptor which you know to have been messed with by
				588	<code>ioctl</code>. This could happen, for example, if the
				589	descriptor is used to read input from some kind of screen
				590	handling library.
				591	<p>
				592	To find out if your program is blocking unexpectedly in the
				593	<code>read</code> system call, run with the
				594	<code>--trace-syscalls=yes</code> flag.
				595	</ul>
				596
				597	</li><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	598	</ul>
				599
				600	There are also some options for debugging Valgrind itself. You
				601	shouldn't need to use them in the normal run of things. Nevertheless:
				602
				603	<ul>
				604
				605	<li><code>--single-step=no</code> [default]<br>
				606	<code>--single-step=yes</code>
				607	<p>When enabled, each x86 instruction is translated separately into
				608	instrumented code. When disabled, translation is done on a
				609	per-basic-block basis, giving much better translations.</li><br>
				610	<p>
				611
				612	<li><code>--optimise=no</code><br>
				613	<code>--optimise=yes</code> [default]
				614	<p>When enabled, various improvements are applied to the
				615	intermediate code, mainly aimed at allowing the simulated CPU's
				616	registers to be cached in the real CPU's registers over several
				617	simulated instructions.</li><br>
				618	<p>
				619
				620	<li><code>--instrument=no</code><br>
				621	<code>--instrument=yes</code> [default]
				622	<p>When disabled, the translations don't actually contain any
				623	instrumentation.</li><br>
				624	<p>
				625
				626	<li><code>--cleanup=no</code><br>
				627	<code>--cleanup=yes</code> [default]
				628	<p>When enabled, various improvements are applied to the
				629	post-instrumented intermediate code, aimed at removing redundant
				630	value checks.</li><br>
				631	<p>
				632
				633	<li><code>--trace-syscalls=no</code> [default]<br>
				634	<code>--trace-syscalls=yes</code>
				635	<p>Enable/disable tracing of system call intercepts.</li><br>
				636	<p>
				637
				638	<li><code>--trace-signals=no</code> [default]<br>
				639	<code>--trace-signals=yes</code>
				640	<p>Enable/disable tracing of signal handling.</li><br>
				641	<p>
				642
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	643	<li><code>--trace-sched=no</code> [default]<br>
				644	<code>--trace-sched=yes</code>
				645	<p>Enable/disable tracing of thread scheduling events.</li><br>
				646	<p>
				647
sewardj	45b4b37	2002-04-16 22:50:32 +0000	[diff] [blame]	648	<li><code>--trace-pthread=none</code> [default]<br>
				649	<code>--trace-pthread=some</code> <br>
				650	<code>--trace-pthread=all</code>
				651	<p>Specifies amount of trace detail for pthread-related events.</li><br>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	652	<p>
				653
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	654	<li><code>--trace-symtab=no</code> [default]<br>
				655	<code>--trace-symtab=yes</code>
				656	<p>Enable/disable tracing of symbol table reading.</li><br>
				657	<p>
				658
				659	<li><code>--trace-malloc=no</code> [default]<br>
				660	<code>--trace-malloc=yes</code>
				661	<p>Enable/disable tracing of malloc/free (et al) intercepts.
				662	</li><br>
				663	<p>
				664
				665	<li><code>--stop-after=<number></code>
				666	[default: infinity, more or less]
				667	<p>After <number> basic blocks have been executed, shut down
				668	Valgrind and switch back to running the client on the real CPU.
				669	</li><br>
				670	<p>
				671
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	672	<li><code>--dump-error=<number></code> [default: inactive]
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	673	<p>After the program has exited, show gory details of the
				674	translation of the basic block containing the <number>'th
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	675	error context. When used with <code>--single-step=yes</code>,
				676	can show the exact x86 instruction causing an error. This is
				677	all fairly dodgy and doesn't work at all if threads are
				678	involved.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	679	<p>
				680
				681	<li><code>--smc-check=none</code><br>
				682	<code>--smc-check=some</code> [default]<br>
				683	<code>--smc-check=all</code>
				684	<p>How carefully should Valgrind check for self-modifying code
				685	writes, so that translations can be discarded?  When
				686	"none", no writes are checked. When "some", only writes
				687	resulting from moves from integer registers to memory are
				688	checked. When "all", all memory writes are checked, even those
				689	with which no sane program would generate code -- for
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	690	example, floating-point writes.
				691	<p>
				692	NOTE that this is all a bit bogus. This mechanism has never
				693	been enabled in any snapshot of Valgrind which was made
				694	available to the general public, because the extra checks reduce
				695	performance, increase complexity, and I have yet to come across
				696	any programs which actually use self-modifying code. I think
				697	the flag is ignored.
				698	</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	699	</ul>
				700
				701
				702	<a name="errormsgs"></a>
				703	<h3>2.6  Explanation of error messages</h3>
				704
				705	Despite considerable sophistication under the hood, Valgrind can only
				706	really detect two kinds of errors: use of illegal addresses and use
				707	of undefined values. Nevertheless, this is enough to help you
				708	discover all sorts of memory-management nasties in your code. This
				709	section presents a quick summary of what error messages mean. The
				710	precise behaviour of the error-checking machinery is described in
				711	<a href="#machine">Section 4</a>.
				712
				713
				714	<h4>2.6.1  Illegal read / Illegal write errors</h4>
				715	For example:
				716	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	717	Invalid read of size 4
				718	at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
				719	by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
				720	by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
				721	by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
				722	Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	723	</pre>
				724
				725	<p>This happens when your program reads or writes memory at a place
				726	which Valgrind reckons it shouldn't. In this example, the program did
				727	a 4-byte read at address 0xBFFFF0E0, somewhere within the
				728	system-supplied library libpng.so.2.1.0.9, which was called from
				729	somewhere else in the same library, called from line 326 of
				730	qpngio.cpp, and so on.
				731
				732	<p>Valgrind tries to establish what the illegal address might relate
				733	to, since that's often useful. So, if it points into a block of
				734	memory which has already been freed, you'll be informed of this, and
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	735	also where the block was free'd at. Likewise, if it should turn out
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	736	to be just off the end of a malloc'd block, a common result of
				737	off-by-one-errors in array subscripting, you'll be informed of this
				738	fact, and also where the block was malloc'd.
				739
				740	<p>In this example, Valgrind can't identify the address. Actually the
				741	address is on the stack, but, for some reason, this is not a valid
				742	stack address -- it is below the stack pointer, %esp, and that isn't
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	743	allowed. In this particular case it's probably caused by gcc
				744	generating invalid code, a known bug in various flavours of gcc.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	745
				746	<p>Note that Valgrind only tells you that your program is about to
				747	access memory at an illegal address. It can't stop the access from
				748	happening. So, if your program makes an access which normally would
				749	result in a segmentation fault, your program will still suffer the same
				750	fate -- but you will get a message from Valgrind immediately prior to
				751	this. In this particular example, reading junk on the stack is
				752	non-fatal, and the program stays alive.
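<p>As a rough illustration of the "just off the end of a malloc'd block" case described above, here is a minimal sketch. The function names <code>sum_ints</code> and <code>sum_one_to_n</code> are made up for this example. The code as written stays inside the block; the comment marks the one-character change that would produce an invalid read.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sum the first n elements of a. The loop bound is i < n; writing
   i <= n instead would read a[n], one element past the end of the
   block, which is the classic off-by-one subscripting error. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hypothetical helper: fill a malloc'd array with 1..n and sum it.
   Returns -1 if the allocation fails. */
long sum_one_to_n(size_t n)
{
    int *a = malloc(n * sizeof *a);
    long total;
    size_t i;
    if (a == NULL)
        return -1;
    for (i = 0; i < n; i++)
        a[i] = (int)(i + 1);
    total = sum_ints(a, n);
    free(a);
    return total;
}
```

With the <code>i &lt;= n</code> variant, one would expect a report of an invalid read of size 4 at an address just past the block, along with where the block was malloc'd.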
				753
				754
				755	<h4>2.6.2  Use of uninitialised values</h4>
				756	For example:
				757	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	758	Conditional jump or move depends on uninitialised value(s)
				759	at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
				760	by 0x402E8476: _IO_printf (printf.c:36)
				761	by 0x8048472: main (tests/manuel1.c:8)
				762	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	763	</pre>
				764
				765	<p>An uninitialised-value use error is reported when your program uses
				766	a value which hasn't been initialised -- in other words, is undefined.
				767	Here, the undefined value is used somewhere inside the printf()
				768	machinery of the C library. This error was reported when running the
				769	following small program:
				770	<pre>
				771	int main()
				772	{
				773	int x;
				774	printf ("x = %d\n", x);
				775	}
				776	</pre>
				777
				778	<p>It is important to understand that your program can copy around
				779	junk (uninitialised) data to its heart's content. Valgrind observes
				780	this and keeps track of the data, but does not complain. A complaint
				781	is issued only when your program attempts to make use of uninitialised
				782	data. In this example, x is uninitialised. Valgrind observes the
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	783	value being passed to _IO_printf and thence to _IO_vfprintf, but makes
				784	no comment. However, _IO_vfprintf has to examine the value of x so it
				785	can turn it into the corresponding ASCII string, and it is at this
				786	point that Valgrind complains.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	787
				788	<p>Sources of uninitialised data tend to be:
				789	<ul>
				790	<li>Local variables in procedures which have not been initialised,
				791	as in the example above.</li><br><p>
				792
				793	<li>The contents of malloc'd blocks, before you write something
				794	there. In C++, the new operator is a wrapper round malloc, so
				795	if you create an object with new, its fields will be
				796	uninitialised until you fill them in, which is only Right and
				797	Proper.</li>
				798	</ul>
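<p>A small sketch of the usual ways to avoid the second source listed above: either ask for zero-filled memory, or assign every field straight after allocation. The <code>struct point</code> type and both function names are invented for illustration.

```c
#include <stdlib.h>

struct point {
    int x;
    int y;
};

/* calloc returns zero-filled memory, so both fields are defined
   as soon as the block exists. Returns NULL on failure. */
struct point *new_point_zeroed(void)
{
    return calloc(1, sizeof(struct point));
}

/* malloc leaves the contents undefined, so assign every field
   before anything gets a chance to read them. */
struct point *new_point(int x, int y)
{
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```

Which of the two is preferable depends on the program: zero-filling can hide a forgotten assignment that an undefined-value report would otherwise have pointed out.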
				799
				800
				801
				802	<h4>2.6.3  Illegal frees</h4>
				803	For example:
				804	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	805	Invalid free()
				806	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				807	by 0x80484C7: main (tests/doublefree.c:10)
				808	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				809	by 0x80483B1: (within tests/doublefree)
				810	Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
				811	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				812	by 0x80484C7: main (tests/doublefree.c:10)
				813	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				814	by 0x80483B1: (within tests/doublefree)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	815	</pre>
				816	<p>Valgrind keeps track of the blocks allocated by your program with
				817	malloc/new, so it knows exactly whether the argument to
				818	free/delete is legitimate. Here, this test program has
				819	freed the same block twice. As with the illegal read/write errors,
				820	Valgrind attempts to make sense of the address free'd. If, as
				821	here, the address is one which has previously been freed, you will
				822	be told that -- making duplicate frees of the same block easy to spot.
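<p>One common defensive idiom against duplicate frees, sketched here under the assumption that every free goes through the same pointer variable. The helper name <code>release_block</code> is made up.

```c
#include <stdlib.h>

/* Free the block *pp points to, then reset *pp to NULL.
   free(NULL) is defined to do nothing, so calling this twice on
   the same pointer variable does not free the block twice. */
void release_block(char **pp)
{
    free(*pp);
    *pp = NULL;
}
```

This does not help if a second copy of the pointer exists elsewhere: freeing through that alias is still a double free, and is the kind of case the report above helps you track down.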
				823
				824
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	825	<h4>2.6.4  When a block is freed with an inappropriate
				826	deallocation function</h4>
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame]	827	In the following example, a block allocated with <code>new []</code>
				828	has wrongly been deallocated with <code>free</code>:
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	829	<pre>
				830	Mismatched free() / delete / delete []
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame]	831	at 0x40043249: free (vg_clientfuncs.c:171)
				832	by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
				833	by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
				834	by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
				835	Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
				836	at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
				837	by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
				838	by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
				839	by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	840	</pre>
				841	The following was told to me by the KDE 3 developers. I didn't know
				842	any of it myself. They also implemented the check itself.
				843	<p>
				844	In C++ it's important to deallocate memory in a way compatible with
				845	how it was allocated. The deal is:
				846	<ul>
				847	<li>If allocated with <code>malloc</code>, <code>calloc</code>,
				848	<code>realloc</code>, <code>valloc</code> or
				849	<code>memalign</code>, you must deallocate with <code>free</code>.
				850	<li>If allocated with <code>new []</code>, you must deallocate with
				851	<code>delete []</code>.
				852	<li>If allocated with <code>new</code>, you must deallocate with
				853	<code>delete</code>.
				854	</ul>
				855	The worst thing is that on Linux apparently it doesn't matter if you
				856	do muddle these up, and it all seems to work ok, but the same program
				857	may then crash on a different platform, Solaris for example. So it's
				858	best to fix it properly. According to the KDE folks "it's amazing how
				859	many C++ programmers don't know this".
				860
				861
				862
				863	<h4>2.6.5  Passing system call parameters with inadequate
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	864	read/write permissions</h4>
				865
				866	Valgrind checks all parameters to system calls. If a system call
				867	needs to read from a buffer provided by your program, Valgrind checks
				868	that the entire buffer is addressable and has valid data, i.e., it is
				869	readable. And if the system call needs to write to a user-supplied
				870	buffer, Valgrind checks that the buffer is addressable. After the
				871	system call, Valgrind updates its administrative information to
				872	precisely reflect any changes in memory permissions caused by the
				873	system call.
				874
				875	<p>Here's an example of a system call with an invalid parameter:
				876	<pre>
				877	#include <stdlib.h>
				878	#include <unistd.h>
				879	int main( void )
				880	{
				881	char* arr = malloc(10);
				882	(void) write( 1 /* stdout */, arr, 10 );
				883	return 0;
				884	}
				885	</pre>
				886
				887	<p>You get this complaint ...
				888	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	889	Syscall param write(buf) contains uninitialised or unaddressable byte(s)
				890	at 0x4035E072: __libc_write
				891	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				892	by 0x80483B1: (within tests/badwrite)
				893	by <bogus frame pointer> ???
				894	Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
				895	at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
				896	by 0x80484A0: main (tests/badwrite.c:6)
				897	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				898	by 0x80483B1: (within tests/badwrite)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	899	</pre>
				900
				901	<p>... because the program has tried to write uninitialised junk from
				902	the malloc'd block to the standard output.
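<p>For comparison, a sketch of a variant that should not trigger the complaint, because every byte handed to <code>write</code> is initialised first. The function name <code>write_filled</code> and its parameters are invented for this example.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write n copies of the byte fill to fd. Returns whatever write()
   returned, or -1 if the buffer could not be allocated. */
long write_filled(int fd, char fill, size_t n)
{
    char *arr = malloc(n);
    long written;
    if (arr == NULL)
        return -1;
    memset(arr, fill, n);   /* every byte is now defined */
    written = (long) write(fd, arr, n);
    free(arr);
    return written;
}
```

The only substantive difference from the test program above is the <code>memset</code> call, which marks the whole buffer as holding valid data before the system call reads it.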
				903
				904
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	905	<h4>2.6.6  Warning messages you might see</h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	906
				907	Most of these only appear if you run in verbose mode (enabled by
				908	<code>-v</code>):
				909	<ul>
				910	<li> <code>More than 50 errors detected. Subsequent errors
				911	will still be recorded, but in less detail than before.</code>
				912	<br>
				913	After 50 different errors have been shown, Valgrind becomes
				914	more conservative about collecting them. It then requires only
				915	the program counters in the top two stack frames to match when
				916	deciding whether or not two errors are really the same one.
				917	Prior to this point, the PCs in the top four frames are required
				918	to match. This hack has the effect of slowing down the
				919	appearance of new errors after the first 50. The 50 constant can
				920	be changed by recompiling Valgrind.
				921	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	922	<li> <code>More than 300 errors detected. I'm not reporting any more.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	923	Final error counts may be inaccurate. Go fix your
				924	program!</code>
				925	<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	926	After 300 different errors have been detected, Valgrind ignores
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	927	any more. It seems unlikely that collecting even more different
				928	ones would be of practical help to anybody, and it avoids the
				929	danger that Valgrind spends more and more of its time comparing
				930	new errors against an ever-growing collection. As above, the 300
				931	number is a compile-time constant.
				932	<p>
				933	<li> <code>Warning: client exiting by calling exit(<number>).
				934	Bye!</code>
				935	<br>
				936	Your program has called the <code>exit</code> system call, which
				937	will immediately terminate the process. You'll get no exit-time
				938	error summaries or leak checks. Note that this is not the same
				939	as your program calling the ANSI C function <code>exit()</code>
				940	-- that causes a normal, controlled shutdown of Valgrind.
				941	<p>
				942	<li> <code>Warning: client switching stacks?</code>
				943	<br>
				944	Valgrind spotted such a large change in the stack pointer, %esp,
				945	that it guesses the client is switching to a different stack.
				946	At this point it makes a kludgey guess where the base of the new
				947	stack is, and sets memory permissions accordingly. You may get
				948	many bogus error messages following this, if Valgrind guesses
				949	wrong. At the moment "large change" is defined as a change of
				950	more than 2000000 in the value of the %esp (stack pointer)
				951	register.
				952	<p>
				953	<li> <code>Warning: client attempted to close Valgrind's logfile fd <number>
				954	</code>
				955	<br>
				956	Valgrind doesn't allow the client
				957	to close the logfile, because you'd never see any diagnostic
				958	information after that point. If you see this message,
				959	you may want to use the <code>--logfile-fd=<number></code>
				960	option to specify a different logfile file-descriptor number.
				961	<p>
				962	<li> <code>Warning: noted but unhandled ioctl <number></code>
				963	<br>
				964	Valgrind observed a call to one of the vast family of
				965	<code>ioctl</code> system calls, but did not modify its
				966	memory status info (because I have not yet got round to it).
				967	The call will still have gone through, but you may get spurious
				968	errors after this as a result of the non-update of the memory info.
				969	<p>
				970	<li> <code>Warning: unblocking signal <number> due to
				971	sigprocmask</code>
				972	<br>
				973	Really just a diagnostic from the signal simulation machinery.
				974	This message will appear if your program handles a signal by
				975	first <code>longjmp</code>ing out of the signal handler,
				976	and then unblocking the signal with <code>sigprocmask</code>
				977	-- a standard signal-handling idiom.
				978	<p>
				979	<li> <code>Warning: bad signal number <number> in __NR_sigaction.</code>
				980	<br>
				981	Probably indicates a bug in the signal simulation machinery.
				982	<p>
				983	<li> <code>Warning: set address range perms: large range <number></code>
				984	<br>
				985	Diagnostic message, mostly for my benefit, to do with memory
				986	permissions.
				987	</ul>
				988
				989
				990	<a name="suppfiles"></a>
				991	<h3>2.7  Writing suppressions files</h3>
				992
				993	A suppression file describes a bunch of errors which, for one reason
				994	or another, you don't want Valgrind to tell you about. Usually the
				995	reason is that the system libraries are buggy but unfixable, at least
				996	within the scope of the current debugging session. Multiple
				997	suppressions files are allowed. By default, Valgrind uses
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	998	<code>$PREFIX/lib/valgrind/default.supp</code>.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	999
				1000	<p>
				1001	You can ask to add suppressions from another file, by specifying
				1002	<code>--suppressions=/path/to/file.supp</code>.
				1003
				1004	<p>Each suppression has the following components:<br>
				1005	<ul>
				1006
				1007	<li>Its name. This merely gives a handy name to the suppression, by
				1008	which it is referred to in the summary of used suppressions
				1009	printed out when a program finishes. It's not important what
				1010	the name is; any identifying string will do.
				1011	<p>
				1012
				1013	<li>The nature of the error to suppress. Either:
				1014	<code>Value1</code>,
				1015	<code>Value2</code>,
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	1016	<code>Value4</code> or
				1017	<code>Value8</code>,
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1018	meaning an uninitialised-value error when
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	1019	using a value of 1, 2, 4 or 8 bytes.
				1020	Or
				1021	<code>Cond</code> (or its old name, <code>Value0</code>),
				1022	meaning use of an uninitialised CPU condition code. Or:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1023	<code>Addr1</code>,
				1024	<code>Addr2</code>,
				1025	<code>Addr4</code> or
				1026	<code>Addr8</code>, meaning an invalid address during a
				1027	memory access of 1, 2, 4 or 8 bytes respectively. Or
				1028	<code>Param</code>,
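				1029	<li>The "immediate location" specification. For Value and Addr
				1030	errors, this is either the name of the function in which the error
				1031	occurred, or, failing that, the full path to the .so file
				1032	containing the error location. For Param errors, it is the name of
				1033	the offending system call parameter. For Free errors, it is the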
				1034	errors, is either the name of the function in which the error
				1035	occurred, or, failing that, the full path the the .so file
				1036	containing the error location. For Param errors, is the name of
				1037	the offending system call parameter. For Free errors, is the
				1038	name of the function doing the freeing (eg, <code>free</code>,
				1039	<code>__builtin_vec_delete</code>, etc)</li><br>
				1040	<p>
				1041
				1042	<li>The caller of the above "immediate location". Again, either a
				1043	function or shared-object name.</li><br>
				1044	<p>
				1045
				1046	<li>Optionally, one or two extra calling-function or object names,
				1047	for greater precision.</li>
				1048	</ul>
				1049
				1050	<p>
				1051	Locations may be either names of shared objects or wildcards matching
				1052	function names. They begin <code>obj:</code> and <code>fun:</code>
				1053	respectively. Function and object names to match against may use the
				1054	wildcard characters <code>*</code> and <code>?</code>.
				1055
				1056	A suppression only suppresses an error when the error matches all the
				1057	details in the suppression. Here's an example:
				1058	<pre>
				1059	{
				1060	__gconv_transform_ascii_internal/__mbrtowc/mbtowc
				1061	Value4
				1062	fun:__gconv_transform_ascii_internal
				1063	fun:__mbr*toc
				1064	fun:mbtowc
				1065	}
				1066	</pre>
				1067
				1068	<p>What it means is: suppress a use-of-uninitialised-value error, when
				1069	the data size is 4, when it occurs in the function
				1070	<code>__gconv_transform_ascii_internal</code>, when that is called
				1071	from any function of name matching <code>__mbr*toc</code>,
				1072	when that is called from
				1073	<code>mbtowc</code>. It doesn't apply under any other circumstances.
				1074	The string by which this suppression is identified to the user is
				1075	__gconv_transform_ascii_internal/__mbrtowc/mbtowc.
				1076
				1077	<p>Another example:
				1078	<pre>
				1079	{
				1080	libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
				1081	Value4
				1082	obj:/usr/X11R6/lib/libX11.so.6.2
				1083	obj:/usr/X11R6/lib/libX11.so.6.2
				1084	obj:/usr/X11R6/lib/libXaw.so.7.0
				1085	}
				1086	</pre>
				1087
				1088	<p>Suppress any size 4 uninitialised-value error which occurs anywhere
				1089	in <code>libX11.so.6.2</code>, when called from anywhere in the same
				1090	library, when called from anywhere in <code>libXaw.so.7.0</code>. The
				1091	inexact specification of locations is regrettable, but is about all
				1092	you can hope for, given that the X11 libraries shipped with Red Hat
				1093	7.2 have had their symbol tables removed.
				1094
				1095	<p>Note -- since the above two examples did not make it clear -- that
				1096	you can freely mix the <code>obj:</code> and <code>fun:</code>
				1097	styles of description within a single suppression record.
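<p>As an illustration of mixing the two styles, here is a sketch of a suppression for the invalid 4-byte read shown in Section 2.6.1, built from the locations in that error's stack trace. The suppression name <code>libpng-read-below-esp</code> is made up, and whether such an error ought to be suppressed at all is a judgement call for your own setup.

```
{
   libpng-read-below-esp
   Addr4
   obj:/usr/lib/libpng.so.2.1.0.9
   obj:/usr/lib/libpng.so.2.1.0.9
   fun:read_png_image__FP8QImageIO
}
```

The first two locations are given as objects because the trace only reports "within" the library, and the third uses the function name the trace does supply.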
				1098
				1099
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1100	<a name="clientreq"></a>
				1101	<h3>2.8  The Client Request mechanism</h3>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1102
				1103	Valgrind has a trapdoor mechanism via which the client program can
				1104	pass all manner of requests and queries to Valgrind. Internally, this
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1105	is used extensively to make malloc, free, signals, threads, etc, work,
				1106	although you don't see that.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1107	<p>
				1108	For your convenience, a subset of these so-called client requests is
				1109	provided to allow you to tell Valgrind facts about the behaviour of
				1110	your program, and conversely to make queries. In particular, your
				1111	program can tell Valgrind about changes in memory range permissions
				1112	that Valgrind would not otherwise know about, and so allows clients to
				1113	get Valgrind to do arbitrary custom checks.
				1114	<p>
				1115	Clients need to include the header file <code>valgrind.h</code> to
				1116	make this work. The macros therein have the magical property that
				1117	they generate code in-line which Valgrind can spot. However, the code
				1118	does nothing when not run on Valgrind, so you are not forced to run
				1119	your program on Valgrind just because you use the macros in this file.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1120	Also, you are not required to link your program with any extra
				1121	supporting libraries.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1122	<p>
				1123	A brief description of the available macros:
				1124	<ul>
				1125	<li><code>VALGRIND_MAKE_NOACCESS</code>,
				1126	<code>VALGRIND_MAKE_WRITABLE</code> and
				1127	<code>VALGRIND_MAKE_READABLE</code>. These mark address
				1128	ranges as completely inaccessible, accessible but containing
				1129	undefined data, and accessible and containing defined data,
				1130	respectively. Subsequent errors may have their faulting
				1131	addresses described in terms of these blocks. Returns a
				1132	"block handle". Returns zero when not run on Valgrind.
				1133	<p>
				1134	<li><code>VALGRIND_DISCARD</code>: At some point you may want
				1135	Valgrind to stop reporting errors in terms of the blocks
				1136	defined by the previous three macros. To do this, the above
				1137	macros return a small-integer "block handle". You can pass
				1138	this block handle to <code>VALGRIND_DISCARD</code>. After
				1139	doing so, Valgrind will no longer be able to relate
				1140	addressing errors to the user-defined block associated with
				1141	the handle. The permissions settings associated with the
				1142	handle remain in place; this just affects how errors are
				1143	reported, not whether they are reported. Returns 1 for an
				1144	invalid handle and 0 for a valid handle (although passing
				1145	invalid handles is harmless). Always returns 0 when not run
				1146	on Valgrind.
				1147	<p>
				1148	<li><code>VALGRIND_CHECK_NOACCESS</code>,
				1149	<code>VALGRIND_CHECK_WRITABLE</code> and
				1150	<code>VALGRIND_CHECK_READABLE</code>: check immediately
				1151	whether or not the given address range has the relevant
				1152	property, and if not, print an error message. Also, for the
				1153	convenience of the client, returns zero if the relevant
				1154	property holds; otherwise, the returned value is the address
				1155	of the first byte for which the property is not true.
				1156	Always returns 0 when not run on Valgrind.
				1157	<p>
				1158	<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
				1159	to find out whether Valgrind thinks a particular variable
				1160	(lvalue, to be precise) is addressable and defined. Prints
				1161	an error message if not. Returns no value.
				1162	<p>
				1163	<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
				1164	experimental feature. Similarly to
				1165	<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
				1166	range as inaccessible, so that subsequent accesses to an
				1167	address in the range give an error. However, this macro
				1168	does not return a block handle. Instead, all annotations
				1169	created like this are reviewed at each client
				1170	<code>ret</code> (subroutine return) instruction, and those
				1171	which now define an address range below the client's stack
				1172	pointer register (<code>%esp</code>) are automatically
				1173	deleted.
				1174	<p>
				1175	In other words, this macro allows the client to tell
				1176	Valgrind about red-zones on its own stack. Valgrind
				1177	automatically discards this information when the stack
				1178	retreats past such blocks. Beware: hacky and flaky, and
				1179	probably interacts badly with the new pthread support.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1180	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1181	<li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on
				1182	Valgrind, 0 if running on the real CPU.
				1183	<p>
				1184	<li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector
				1185	right now. Returns no value. I guess this could be used to
				1186	incrementally check for leaks between arbitrary places in the
				1187	program's execution. Warning: not properly tested!
				1188	</ul>
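<p>To give a feel for how the permission-changing macros might be used, here is a sketch of a tiny bump allocator that tells Valgrind which parts of its private arena are in use. Several things here are assumptions rather than facts from this manual: that the macros take an address and a length in that order, the build flag <code>HAVE_VALGRIND_H</code>, and the names <code>arena_alloc</code> and <code>arena_reset</code>. The stand-in macros exist only so that the sketch builds when the header is not to hand.

```c
#include <stddef.h>

#ifdef HAVE_VALGRIND_H            /* hypothetical build flag */
#include <valgrind.h>
#else
/* Stand-ins so the sketch builds without the header. They mimic the
   documented behaviour of returning zero when not run on Valgrind.
   The (addr, len) argument order is an assumption. */
#define VALGRIND_MAKE_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#endif

#define ARENA_SIZE 1024

static char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hand out len bytes from the arena, or NULL if it is full. The
   range is marked accessible but undefined, like a fresh malloc
   block. No alignment is applied, so this suits byte data only. */
void *arena_alloc(size_t len)
{
    void *p;
    if (len > ARENA_SIZE - arena_used)
        return NULL;
    p = arena + arena_used;
    arena_used += len;
    (void) VALGRIND_MAKE_WRITABLE(p, len);
    return p;
}

/* Release everything and mark the whole arena as off limits, so
   that stale pointers into it get reported when used. */
void arena_reset(void)
{
    arena_used = 0;
    (void) VALGRIND_MAKE_NOACCESS(arena, ARENA_SIZE);
}
```

The sketch throws away the returned block handles for brevity. A longer-lived program would probably keep them and pass them to <code>VALGRIND_DISCARD</code> once the descriptions are no longer wanted.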
				1189	<p>
				1190
				1191
				1192	<a name="pthreads"></a>
				1193	<h3>2.9  Support for POSIX Pthreads</h3>
				1194
				1195	As of late April 02, Valgrind supports programs which use POSIX
				1196	pthreads. Doing this has proved technically challenging and is still
				1197	in progress, but it works well enough, as of 1 May 02, for significant
				1198	threaded applications to work.
				1199	<p>
				1200	It works as follows: threaded apps are (dynamically) linked against
				1201	<code>libpthread.so</code>. Usually this is the one installed with
				1202	your Linux distribution. Valgrind, however, supplies its own
				1203	<code>libpthread.so</code> and automatically connects your program to
				1204	it instead.
				1205	<p>
				1206	The fake <code>libpthread.so</code> and Valgrind cooperate to
				1207	implement a user-space pthreads package. This approach avoids the
				1208	horrible implementation problems of implementing a truly
				1209	multiprocessor version of Valgrind, but it does mean that threaded
				1210	apps run only on one CPU, even if you have a multiprocessor machine.
				1211	<p>
				1212	Valgrind schedules your threads in a round-robin fashion, with all
				1213	threads having equal priority. It switches threads every 20000 basic
				1214	blocks (typically around 120000 x86 instructions), which means you'll
				1215	get a much finer interleaving of thread executions than when run
				1216	natively. This in itself may cause your program to behave differently
				1217	if it has concurrency bugs, such as critical races or locking
				1218	errors.
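<p>
As a rough illustration of the policy just described, here is a toy
round-robin scheduler. It is only a sketch, not Valgrind's actual
scheduler: the names <code>toy_sched</code>, <code>toy_sched_tick</code>
and <code>TOY_QUANTUM_BBS</code> are made up for this example, and the
only thing taken from the text is the timeslice of 20000 basic blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: basic blocks run before switching threads. */
#define TOY_QUANTUM_BBS 20000

/* Toy round-robin scheduler state; all threads have equal priority. */
struct toy_sched {
    size_t n_threads;        /* number of runnable threads */
    size_t current;          /* index of the thread now running */
    unsigned long bbs_left;  /* basic blocks left in this timeslice */
};

static void toy_sched_init(struct toy_sched *s, size_t n_threads)
{
    assert(n_threads > 0);
    s->n_threads = n_threads;
    s->current = 0;
    s->bbs_left = TOY_QUANTUM_BBS;
}

/* Account for one executed basic block; returns the thread to run next. */
static size_t toy_sched_tick(struct toy_sched *s)
{
    if (--s->bbs_left == 0) {
        /* Timeslice used up: move on to the next thread in the ring. */
        s->current = (s->current + 1) % s->n_threads;
        s->bbs_left = TOY_QUANTUM_BBS;
    }
    return s->current;
}
```

With three threads, thread 0 runs for 20000 basic blocks, then thread 1,
then thread 2, and then it is thread 0's turn again.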
				1219	<p>
				1220	The current (1 May 02) state of pthread support is as follows. Please
				1221	note that things are advancing rapidly, so the situation may have
				1222	improved by the time you read this -- check the web site for further
				1223	updates.
				1224	<ul>
				1225	<li>Mutexes, condition variables, thread-specific data and
				1226	<code>pthread_once</code> currently work.
				1227	<p>
				1228	<li>Various attribute-like calls are handled but ignored.
				1229	You get a warning message.
				1230	<p>
				1231	<li>The main big omission is proper cleanup support for cancellation.
				1232	<code>pthread_cancel</code> works, but instantly nukes the target
				1233	thread without giving it any chance to clean up. Also, when a
				1234	thread exits, it does not run any cleanup handlers.
				1235	<p>
				1236	<li>Currently the following syscalls are thread-safe (nonblocking):
				1237	<code>write</code> <code>read</code> <code>nanosleep</code>
				1238	<code>sleep</code> <code>select</code> and <code>poll</code>.
				1239	<p>
				1240	<li>The POSIX requirement that each thread have its own
				1241	signal-blocking mask is not done; the signal handling mechanism is
				1242	thread-unaware and all signals are delivered to the main thread,
				1243	regardless.
				1244	</ul>
				1245
				1246
				1247	As of 1 May 02, the following programs now work fine on my RedHat 7.2
				1248	box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and
				1249	Galeon-0.11.3, both as supplied with RedHat 7.2.
				1250	<p>
sewardj	1f13ab1	2002-05-02 03:57:00 +0000	[diff] [blame]	1251	Mozilla 1.0RC1 works fine too, provided that you patch it as described
				1252	here: <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=124335">
				1253	http://bugzilla.mozilla.org/show_bug.cgi?id=124335</a>. This fixes a
				1254	bug in Mozilla which assumes that memory returned from
				1255	<code>malloc</code> is 8-aligned. Valgrind's allocator only
				1256	guarantees 4-alignment, so without the patch Mozilla makes an illegal
				1257	memory access, which Valgrind of course spots, and then bombs.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1258
				1259
				1260
				1261	<a name="install"></a>
				1262	<h3>2.10  Building and installing</h3>
				1263
				1264	We now use the standard Unix <code>./configure</code>,
				1265	<code>make</code>, <code>make install</code> mechanism, and I have
				1266	attempted to ensure that it works on machines with kernel 2.2 or 2.4
				1267	and glibc 2.1.X or 2.2.X. I don't think there is much else to say.
				1268	There are no options apart from the usual <code>--prefix</code> that
				1269	you should give to <code>./configure</code>.
				1270	<p>
				1271	Let me know if you have build problems.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1272
				1273
				1274
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1275	<a name="problems"></a>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1276	<h3>2.11  If you have problems</h3>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1277	Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
				1278
				1279	<p>See <a href="#limits">Section 4</a> for the known limitations of
				1280	Valgrind, and for a list of programs which are known not to work on
				1281	it.
				1282
				1283	<p>The translator/instrumentor has a lot of assertions in it. They
				1284	are permanently enabled, and I have no plans to disable them. If one
				1285	of these breaks, please mail me!
				1286
				1287	<p>If you get an assertion failure on the expression
				1288	<code>chunkSane(ch)</code> in <code>vg_free()</code> in
				1289	<code>vg_malloc.c</code>, this may have happened because your program
				1290	wrote off the end of a malloc'd block, or before its beginning.
				1291	Valgrind should have emitted a proper message to that effect before
				1292	dying in this way. This is a known problem which I should fix.
				1293	<p>
				1294
				1295	<hr width="100%">
				1296
				1297	<a name="machine"></a>
				1298	<h2>3  Details of the checking machinery</h2>
				1299
				1300	Read this section if you want to know, in detail, exactly what and how
				1301	Valgrind is checking.
				1302
				1303	<a name="vvalue"></a>
				1304	<h3>3.1  Valid-value (V) bits</h3>
				1305
				1306	It is simplest to think of Valgrind implementing a synthetic Intel x86
				1307	CPU which is identical to a real CPU, except for one crucial detail.
				1308	Every bit (literally) of data processed, stored and handled by the
				1309	real CPU has, in the synthetic CPU, an associated "valid-value" bit,
				1310	which says whether or not the accompanying bit has a legitimate value.
				1311	In the discussions which follow, this bit is referred to as the V
				1312	(valid-value) bit.
				1313
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1314	<p>Each byte in the system therefore has 8 V bits which follow
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1315	it wherever it goes. For example, when the CPU loads a word-size item
				1316	(4 bytes) from memory, it also loads the corresponding 32 V bits from
				1317	a bitmap which stores the V bits for the process' entire address
				1318	space. If the CPU should later write the whole or some part of that
				1319	value to memory at a different address, the relevant V bits will be
				1320	stored back in the V-bit bitmap.
				1321
				1322	<p>In short, each bit in the system has an associated V bit, which
				1323	follows it around everywhere, even inside the CPU. Yes, the CPU's
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1324	(integer and <code>%eflags</code>) registers have their own V bit
				1325	vectors.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1326
				1327	<p>Copying values around does not cause Valgrind to check for, or
				1328	report on, errors. However, when a value is used in a way which might
				1329	conceivably affect the outcome of your program's computation, the
				1330	associated V bits are immediately checked. If any of these indicate
				1331	that the value is undefined, an error is reported.
				1332
				1333	<p>Here's an (admittedly nonsensical) example:
				1334	<pre>
				1335	int i, j;
				1336	int a[10], b[10];
				1337	for (i = 0; i < 10; i++) {
				1338	    j = a[i];
				1339	    b[i] = j;
				1340	}
				1341	</pre>
				1342
				1343	<p>Valgrind emits no complaints about this, since it merely copies
				1344	uninitialised values from <code>a[]</code> into <code>b[]</code>, and
				1345	doesn't use them in any way. However, if the loop is changed to
				1346	<pre>
				1347	for (i = 0; i < 10; i++) {
				1348	    j += a[i];
				1349	}
				1350	if (j == 77)
				1351	    printf("hello there\n");
				1352	</pre>
				1353	then Valgrind will complain, at the <code>if</code>, that the
				1354	condition depends on uninitialised values.
				1355
				1356	<p>Most low level operations, such as adds, cause Valgrind to
				1357	use the V bits for the operands to calculate the V bits for the
				1358	result. Even if the result is partially or wholly undefined,
				1359	it does not complain.
				1360
				1361	<p>Checks on definedness only occur in two places: when a value is
				1362	used to generate a memory address, and when a control flow decision
				1363	needs to be made. Also, when a system call is detected, Valgrind
				1364	checks definedness of parameters as required.
				1365
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1366	<p>If a check should detect undefinedness, an error message is
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1367	issued. The resulting value is subsequently regarded as well-defined.
				1368	To do otherwise would give long chains of error messages. In effect,
				1369	we say that undefined values are non-infectious.
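<p>
The rules so far (copies are silent, arithmetic propagates V bits,
branches check them, and a reported value is then treated as defined)
can be modelled in a few lines of C. This is a toy model, not Valgrind's
implementation. The names are invented, the convention that a set bit
means "undefined" is an assumption of the sketch, and the add rule is
deliberately crude.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A 32-bit value paired with its shadow V bits.
   Convention chosen for this sketch: a set bit means "undefined". */
struct toy_shadow32 {
    uint32_t value;
    uint32_t undef;
};

static int toy_errors_reported = 0;

/* Copying never checks anything: the V bits just travel with the value. */
static struct toy_shadow32 toy_copy(struct toy_shadow32 src)
{
    return src;
}

/* Crude approximation for add: if any input bit is undefined, treat the
   whole result as undefined. No complaint is made here. */
static struct toy_shadow32 toy_add(struct toy_shadow32 a, struct toy_shadow32 b)
{
    struct toy_shadow32 r;
    r.value = a.value + b.value;
    r.undef = (a.undef | b.undef) ? 0xFFFFFFFFu : 0u;
    return r;
}

/* Using a value as a branch condition is where a check happens. After
   reporting, the value is marked defined so the error does not cascade. */
static int toy_branch_on(struct toy_shadow32 *v)
{
    assert(v != NULL);
    if (v->undef != 0) {
        toy_errors_reported++;
        fprintf(stderr, "conditional depends on uninitialised value\n");
        v->undef = 0;
    }
    return v->value != 0;
}
```

Branching twice on the same undefined value reports only one error,
which is the "non-infectious" behaviour described above.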
				1370
				1371	<p>This sounds overcomplicated. Why not just check all reads from
				1372	memory, and complain if an undefined value is loaded into a CPU register?
				1373	Well, that doesn't work well, because perfectly legitimate C programs routinely
				1374	copy uninitialised values around in memory, and we don't want endless complaints
				1375	about that. Here's the canonical example. Consider a struct
				1376	like this:
				1377	<pre>
				1378	struct S { int x; char c; };
				1379	struct S s1, s2;
				1380	s1.x = 42;
				1381	s1.c = 'z';
				1382	s2 = s1;
				1383	</pre>
				1384
				1385	<p>The question to ask is: how large is <code>struct S</code>, in
				1386	bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
				1387	occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
				1388	round the size of <code>struct S</code> up to a whole number of words,
				1389	in this case 8 bytes. Not doing this forces compilers to generate
				1390	truly appalling code for subscripting arrays of <code>struct
				1391	S</code>'s.
				1392
				1393	<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
				1394	For the assignment <code>s2 = s1</code>, gcc generates code to copy
				1395	all 8 bytes wholesale into <code>s2</code> without regard for their
				1396	meaning. If Valgrind simply checked values as they came out of
				1397	memory, it would yelp every time a structure assignment like this
				1398	happened. So the more complicated semantics described above is
				1399	necessary. This allows gcc to copy <code>s1</code> into
				1400	<code>s2</code> any way it likes, and a warning will only be emitted
				1401	if the uninitialised values are later used.
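<p>
If you want to see the padding in <code>struct S</code> for yourself,
something like the following should do. It is a small sketch; the
helper names are invented, and the exact numbers depend on your
compiler and ABI (on x86 with gcc I would expect a size of 8 and
3 padding bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct S { int x; char c; };

/* Number of bytes in struct S that belong to no member (padding). */
static size_t toy_padding_bytes(void)
{
    size_t used = sizeof(int) + sizeof(char);
    assert(sizeof(struct S) >= used);
    return sizeof(struct S) - used;
}

/* Print the layout facts the text talks about. */
static void toy_print_layout(void)
{
    printf("sizeof(struct S) = %zu\n", sizeof(struct S));
    printf("offsetof(c)      = %zu\n", offsetof(struct S, c));
    printf("padding bytes    = %zu\n", toy_padding_bytes());
}
```

The padding bytes are exactly the ones that stay uninitialised after
<code>s1.x</code> and <code>s1.c</code> have been assigned, yet they are
still copied by <code>s2 = s1</code>.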
				1402
				1403	<p>One final twist to this story. The above scheme allows garbage to
				1404	pass through the CPU's integer registers without complaint. It does
				1405	this by giving the integer registers V tags, passing these around in
				1406	the expected way. This is complicated and computationally expensive to
				1407	do, but is necessary. Valgrind is more simplistic about
				1408	floating-point loads and stores. In particular, V bits for data read
				1409	as a result of floating-point loads are checked at the load
				1410	instruction. So if your program uses the floating-point registers to
				1411	do memory-to-memory copies, you will get complaints about
				1412	uninitialised values. Fortunately, I have not yet encountered a
				1413	program which (ab)uses the floating-point registers in this way.
				1414
				1415	<a name="vaddress"></a>
				1416	<h3>3.2  Valid-address (A) bits</h3>
				1417
				1418	Notice that the previous section describes how the validity of values
				1419	is established and maintained without having to say whether the
				1420	program does or does not have the right to access any particular
				1421	memory location. We now consider the latter issue.
				1422
				1423	<p>As described above, every bit in memory or in the CPU has an
				1424	associated valid-value (V) bit. In addition, all bytes in memory, but
				1425	not in the CPU, have an associated valid-address (A) bit. This
				1426	indicates whether or not the program can legitimately read or write
				1427	that location. It does not give any indication of the validity of the
				1428	data at that location -- that's the job of the V bits -- only whether
				1429	or not the location may be accessed.
				1430
				1431	<p>Every time your program reads or writes memory, Valgrind checks the
				1432	A bits associated with the address. If any of them indicate an
				1433	invalid address, an error is emitted. Note that the reads and writes
				1434	themselves do not change the A bits, only consult them.
				1435
				1436	<p>So how do the A bits get set/cleared? Like this:
				1437
				1438	<ul>
				1439	<li>When the program starts, all the global data areas are marked as
				1440	accessible.</li><br>
				1441	<p>
				1442
				1443	<li>When the program does malloc/new, the A bits for exactly the
				1444	area allocated, and not a byte more, are marked as accessible.
				1445	Upon freeing the area the A bits are changed to indicate
				1446	inaccessibility.</li><br>
				1447	<p>
				1448
				1449	<li>When the stack pointer register (%esp) moves up or down, A bits
				1450	are set. The rule is that the area from %esp up to the base of
				1451	the stack is marked as accessible, and below %esp is
				1452	inaccessible. (If that sounds illogical, bear in mind that the
				1453	stack grows down, not up, on almost all Unix systems, including
				1454	GNU/Linux.) Tracking %esp like this has the useful side-effect
				1455	that the section of stack used by a function for local variables
				1456	etc is automatically marked accessible on function entry and
				1457	inaccessible on exit.</li><br>
				1458	<p>
				1459
				1460	<li>When doing system calls, A bits are changed appropriately. For
				1461	example, mmap() magically makes files appear in the process's
				1462	address space, so the A bits must be updated if mmap()
				1463	succeeds.</li><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1464	<p>
				1465
				1466	<li>Optionally, your program can tell Valgrind about such changes
				1467	explicitly, using the client request mechanism described above.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1468	</ul>
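<p>
The bookkeeping described in this list boils down to a bitmap with one
bit per byte of address space. Here is a minimal sketch over a
1024-byte toy address space. The names and the size are assumptions
made for the example, and Valgrind's real data structure is more
elaborate than a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy address space of 1024 bytes, with one A bit per byte. */
#define TOY_SPACE_BYTES 1024

static uint8_t toy_abits[TOY_SPACE_BYTES / 8];   /* bit set = accessible */

/* Mark [addr, addr+len) as accessible or inaccessible, for example on
   malloc/free or when the stack pointer moves. */
static void toy_set_abits(size_t addr, size_t len, int accessible)
{
    assert(addr <= TOY_SPACE_BYTES && len <= TOY_SPACE_BYTES - addr);
    for (size_t i = addr; i < addr + len; i++) {
        if (accessible)
            toy_abits[i / 8] |= (uint8_t)(1u << (i % 8));
        else
            toy_abits[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

/* Returns 1 if every byte in [addr, addr+len) is accessible. This only
   consults the A bits and never changes them. */
static int toy_access_ok(size_t addr, size_t len)
{
    if (addr > TOY_SPACE_BYTES || len > TOY_SPACE_BYTES - addr)
        return 0;
    for (size_t i = addr; i < addr + len; i++)
        if (!(toy_abits[i / 8] & (1u << (i % 8))))
            return 0;
    return 1;
}
```

A pretend <code>malloc(10)</code> at toy address 100 would call
<code>toy_set_abits(100, 10, 1)</code>. After that, an access to byte
110 fails the check, because not a byte more than the allocated area is
marked accessible.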
				1469
				1470
				1471	<a name="together"></a>
				1472	<h3>3.3  Putting it all together</h3>
				1473	Valgrind's checking machinery can be summarised as follows:
				1474
				1475	<ul>
				1476	<li>Each byte in memory has 8 associated V (valid-value) bits,
				1477	saying whether or not the byte has a defined value, and a single
				1478	A (valid-address) bit, saying whether or not the program
				1479	currently has the right to read/write that address.</li><br>
				1480	<p>
				1481
				1482	<li>When memory is read or written, the relevant A bits are
				1483	consulted. If they indicate an invalid address, Valgrind emits
				1484	an Invalid read or Invalid write error.</li><br>
				1485	<p>
				1486
				1487	<li>When memory is read into the CPU's integer registers, the
				1488	relevant V bits are fetched from memory and stored in the
				1489	simulated CPU. They are not consulted.</li><br>
				1490	<p>
				1491
				1492	<li>When an integer register is written out to memory, the V bits
				1493	for that register are written back to memory too.</li><br>
				1494	<p>
				1495
				1496	<li>When memory is read into the CPU's floating point registers, the
				1497	relevant V bits are read from memory and they are immediately
				1498	checked. If any are invalid, an uninitialised value error is
				1499	emitted. This precludes using the floating-point registers to
				1500	copy possibly-uninitialised memory, but simplifies Valgrind in
				1501	that it does not have to track the validity status of the
				1502	floating-point registers.</li><br>
				1503	<p>
				1504
				1505	<li>As a result, when a floating-point register is written to
				1506	memory, the associated V bits are set to indicate a valid
				1507	value.</li><br>
				1508	<p>
				1509
				1510	<li>When values in integer CPU registers are used to generate a
				1511	memory address, or to determine the outcome of a conditional
				1512	branch, the V bits for those values are checked, and an error
				1513	emitted if any of them are undefined.</li><br>
				1514	<p>
				1515
				1516	<li>When values in integer CPU registers are used for any other
				1517	purpose, Valgrind computes the V bits for the result, but does
				1518	not check them.</li><br>
				1519	<p>
				1520
				1521	<li>Once the V bits for a value in the CPU have been checked, they
				1522	are then set to indicate validity. This avoids long chains of
				1523	errors.</li><br>
				1524	<p>
				1525
				1526	<li>When values are loaded from memory, Valgrind checks the A bits
				1527	for that location and issues an illegal-address warning if
				1528	needed. In that case, the V bits loaded are forced to indicate
				1529	Valid, despite the location being invalid.
				1530	<p>
				1531	This apparently strange choice reduces the amount of confusing
				1532	information presented to the user. It avoids the
				1533	unpleasant phenomenon in which memory is read from a place which
				1534	is both unaddressable and contains invalid values, and, as a
				1535	result, you get not only an invalid-address (read/write) error,
				1536	but also a potentially large set of uninitialised-value errors,
				1537	one for every time the value is used.
				1538	<p>
				1539	There is a hazy boundary case to do with multi-byte loads from
				1540	addresses which are partially valid and partially invalid. See
				1541	the description of the flag <code>--partial-loads-ok</code> for details.
				1542	</li><br>
				1543	</ul>
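<p>
The integer load and store rules in the list above can be sketched as
follows. This is a toy model, one byte at a time, with invented names.
It assumes a tiny 64-byte memory and the convention that a set V bit
means "undefined". It is not how Valgrind is actually coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_BYTES 64

/* Per-byte shadow state for a tiny toy memory. */
static uint8_t toy_mem[TOY_MEM_BYTES];
static uint8_t toy_a[TOY_MEM_BYTES];   /* 1 = addressable */
static uint8_t toy_v[TOY_MEM_BYTES];   /* bit set = that data bit is undefined */

static int toy_addr_errors = 0;

/* Integer load of one byte: consult A, fetch V without checking it. */
static uint8_t toy_load8(size_t addr, uint8_t *vbits_out)
{
    assert(addr < TOY_MEM_BYTES && vbits_out != NULL);
    if (!toy_a[addr]) {
        toy_addr_errors++;         /* invalid-read error */
        *vbits_out = 0;            /* force "valid" to avoid follow-on errors */
    } else {
        *vbits_out = toy_v[addr];  /* V bits travel with the value */
    }
    return toy_mem[addr];
}

/* Integer store of one byte: consult A, write the V bits back too.
   The store still goes ahead after an error is reported. */
static void toy_store8(size_t addr, uint8_t value, uint8_t vbits)
{
    assert(addr < TOY_MEM_BYTES);
    if (!toy_a[addr])
        toy_addr_errors++;         /* invalid-write error */
    toy_mem[addr] = value;
    toy_v[addr] = vbits;
}
```

Note that loading an undefined byte from an addressable location
reports nothing. Only the A bits are checked at this point.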
				1544
				1545	Valgrind intercepts calls to malloc, calloc, realloc, valloc,
				1546	memalign, free, new and delete. The behaviour you get is:
				1547
				1548	<ul>
				1549
				1550	<li>malloc/new: the returned memory is marked as addressable but not
				1551	having valid values. This means you have to write on it before
				1552	you can read it.</li><br>
				1553	<p>
				1554
				1555	<li>calloc: returned memory is marked both addressable and valid,
				1556	since calloc() clears the area to zero.</li><br>
				1557	<p>
				1558
				1559	<li>realloc: if the new size is larger than the old, the new section
				1560	is addressable but invalid, as with malloc.</li><br>
				1561	<p>
				1562
				1563	<li>If the new size is smaller, the dropped-off section is marked as
				1564	unaddressable. You may only pass to realloc a pointer
				1565	previously issued to you by malloc/calloc/new/realloc.</li><br>
				1566	<p>
				1567
				1568	<li>free/delete: you may only pass to free a pointer previously
				1569	issued to you by malloc/calloc/new/realloc, or the value
				1570	NULL. Otherwise, Valgrind complains. If the pointer is indeed
				1571	valid, Valgrind marks the entire area it points at as
				1572	unaddressable, and places the block in the freed-blocks-queue.
				1573	The aim is to defer as long as possible reallocation of this
				1574	block. Until that happens, all attempts to access it will
				1575	elicit an invalid-address error, as you would hope.</li><br>
				1576	</ul>
				1577
				1578
				1579
				1580	<a name="signals"></a>
				1581	<h3>3.4  Signals</h3>
				1582
				1583	Valgrind provides suitable handling of signals, so, provided you stick
				1584	to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
				1585	are handled. Signal handlers may return in the normal way or do
				1586	longjmp(); both should work ok. As specified by POSIX, a signal is
				1587	blocked in its own handler. Default actions for signals should work
				1588	as before. Etc, etc.
				1589
				1590	<p>Under the hood, dealing with signals is a real pain, and Valgrind's
				1591	simulation leaves much to be desired. If your program does
				1592	way-strange stuff with signals, bad things may happen. If so, let me
				1593	know. I don't promise to fix it, but I'd at least like to be aware of
				1594	it.
				1595
				1596
				1597	<a name="leaks"></a>
				1598	<h3>3.5  Memory leak detection</h3>
				1599
				1600	Valgrind keeps track of all memory blocks issued in response to calls
				1601	to malloc/calloc/realloc/new. So when the program exits, it knows
				1602	which blocks are still outstanding -- have not been returned, in other
				1603	words. Ideally, you want your program to have no blocks still in use
				1604	at exit. But many programs do.
				1605
				1606	<p>For each such block, Valgrind scans the entire address space of the
				1607	process, looking for pointers to the block. One of three situations
				1608	may result:
				1609
				1610	<ul>
				1611	<li>A pointer to the start of the block is found. This usually
				1612	indicates programming sloppiness; since the block is still
				1613	pointed at, the programmer could, at least in principle, have freed
				1614	it before program exit.</li><br>
				1615	<p>
				1616
				1617	<li>A pointer to the interior of the block is found. The pointer
				1618	might originally have pointed to the start and have been moved
				1619	along, or it might be entirely unrelated. Valgrind deems such a
				1620	block as "dubious", that is, possibly leaked,
				1621	because it's unclear whether or
				1622	not a pointer to it still exists.</li><br>
				1623	<p>
				1624
				1625	<li>The worst outcome is that no pointer to the block can be found.
				1626	The block is classified as "leaked", because the
				1627	programmer could not possibly have freed it at program exit,
				1628	since no pointer to it exists. This might be a symptom of
				1629	having lost the pointer at some earlier point in the
				1630	program.</li>
				1631	</ul>
				1632
				1633	Valgrind reports summaries about leaked and dubious blocks.
				1634	For each such block, it will also tell you where the block was
				1635	allocated. This should help you figure out why the pointer to it has
				1636	been lost. In general, you should attempt to ensure your programs do
				1637	not have any leaked or dubious blocks at exit.
				1638
				1639	<p>The precise area of memory in which Valgrind searches for pointers
				1640	is: all naturally-aligned 4-byte words for which all A bits indicate
				1641	addressability and all V bits indicate that the stored value is
				1642	actually valid.
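<p>
The three-way classification can be sketched like this, given the list
of candidate pointer words produced by such a scan. The function and
enum names are invented for the example (Valgrind's own names differ),
and the scan itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three outcomes described in the text. */
enum toy_leak_kind { TOY_REACHABLE, TOY_DUBIOUS, TOY_LEAKED };

/* Classify one block [start, start+size), given the candidate pointer
   words found by scanning the address space. */
static enum toy_leak_kind
toy_classify(uintptr_t start, size_t size,
             const uintptr_t *words, size_t n_words)
{
    assert(size > 0);
    int interior = 0;
    for (size_t i = 0; i < n_words; i++) {
        if (words[i] == start)
            return TOY_REACHABLE;     /* pointer to the start of the block */
        if (words[i] > start && words[i] - start < size)
            interior = 1;             /* pointer into the middle */
    }
    return interior ? TOY_DUBIOUS : TOY_LEAKED;
}
```

A word equal to <code>start + size</code> points one past the end of
the block, so the sketch does not count it as an interior pointer.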
				1643
				1644	<p><hr width="100%">
				1645
				1646
				1647	<a name="limits"></a>
				1648	<h2>4  Limitations</h2>
				1649
				1650	The following list of limitations seems depressingly long. However,
				1651	most programs actually work fine.
				1652
				1653	<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1654	a kernel 2.2.X or 2.4.X system, subject to the following constraints:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1655
				1656	<ul>
				1657	<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
				1658	encounters these, Valgrind will simply give up. It may be
				1659	possible to add support for them at a later time. Intel added a
				1660	few instructions such as "cmov" to the integer instruction set
				1661	on Pentium Pro and later processors, and these are supported.
				1662	Nevertheless it's safest to think of Valgrind as implementing
				1663	the 486 instruction set.</li><br>
				1664	<p>
				1665
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1666	<li>Pthreads support is improving, but there are still significant
				1667	limitations in that department. See the section above on
				1668	Pthreads. Note that your program must be dynamically linked
				1669	against <code>libpthread.so</code>, so that Valgrind can
				1670	substitute its own implementation at program startup time. If
				1671	you're statically linked against it, things will fail
				1672	badly.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1673	<p>
				1674
				1675	<li>Valgrind assumes that the floating point registers are not used
				1676	as intermediaries in memory-to-memory copies, so it immediately
				1677	checks V bits in floating-point loads/stores. If you want to
				1678	write code which copies around possibly-uninitialised values,
				1679	you must ensure these travel through the integer registers, not
				1680	the FPU.</li><br>
				1681	<p>
				1682
				1683	<li>If your program does its own memory management, rather than
				1684	using malloc/new/free/delete, it should still work, but
				1685	Valgrind's error checking won't be so effective.</li><br>
				1686	<p>
				1687
				1688	<li>Valgrind's signal simulation is not as robust as it could be.
				1689	Basic POSIX-compliant sigaction and sigprocmask functionality is
				1690	supplied, but it's conceivable that things could go badly awry
				1691	if you do weird things with signals. Workaround: don't.
				1692	Programs that do non-POSIX signal tricks are in any case
				1693	inherently unportable, so should be avoided if
				1694	possible.</li><br>
				1695	<p>
				1696
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1697	<li>Programs which try to handle signals on
				1698	an alternate stack (sigaltstack) are not supported, although
				1699	they could be, with a bit of effort.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1700	<p>
				1701
				1702	<li>Programs which switch stacks are not well handled. Valgrind
				1703	does have support for this, but I don't have great faith in it.
				1704	It's difficult -- there's no cast-iron way to decide whether a
				1705	large change in %esp is as a result of the program switching
				1706	stacks, or merely allocating a large object temporarily on the
				1707	current stack -- yet Valgrind needs to handle the two situations
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1708	differently. 1 May 02: this probably interacts badly with the
				1709	new pthread support. I haven't checked properly.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1710	<p>
				1711
				1712	<li>x86 instructions, and system calls, have been implemented on
				1713	demand. So it's possible, although unlikely, that a program
				1714	will fall over with a message to that effect. If this happens,
				1715	please mail me ALL the details printed out, so I can try and
				1716	implement the missing feature.</li><br>
				1717	<p>
				1718
				1719	<li>x86 floating point works correctly, but floating-point code may
				1720	run even more slowly than integer code, due to my simplistic
				1721	approach to FPU emulation.</li><br>
				1722	<p>
				1723
				1724	<li>You can't Valgrind-ize statically linked binaries. Valgrind
				1725	relies on the dynamic-link mechanism to gain control at
				1726	startup.</li><br>
				1727	<p>
				1728
				1729	<li>Memory consumption of your program is majorly increased whilst
				1730	running under Valgrind. This is due to the large amount of
				1731	administrative information maintained behind the scenes. Another
				1732	cause is that Valgrind dynamically translates the original
				1733	executable and never throws any translation away, except in
				1734	those rare cases where self-modifying code is detected.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1735	Translated, instrumented code is 12-14 times larger than the
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1736	original (!) so you can easily end up with 15+ MB of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1737	translations when running (eg) a web browser.
				1738	</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1739	</ul>
				1740
				1741
				1742	Programs which are known not to work are:
				1743
				1744	<ul>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1745	<li>emacs starts up but immediately concludes it is out of memory
				1746	and aborts. Emacs has its own memory-management scheme, but I
				1747	don't understand why this should interact so badly with
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1748	Valgrind. Emacs works fine if you build it to use the standard
				1749	malloc/free routines.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1750	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1751	</ul>
				1752
				1753
				1754	<p><hr width="100%">
				1755
				1756
				1757	<a name="howitworks"></a>
				1758	<h2>5  How it works -- a rough overview</h2>
				1759	Some gory details, for those with a passion for gory details. You
				1760	don't need to read this section if all you want to do is use Valgrind.
				1761
				1762	<a name="startb"></a>
				1763	<h3>5.1  Getting started</h3>
				1764
				1765	Valgrind is compiled into a shared object, valgrind.so. The shell
				1766	script valgrind sets the LD_PRELOAD environment variable to point to
				1767	valgrind.so. This causes the .so to be loaded as an extra library to
				1768	any subsequently executed dynamically-linked ELF binary, viz, the
				1769	program you want to debug.
				1770
				1771	<p>The dynamic linker allows each .so in the process image to have an
				1772	initialisation function which is run before main(). It also allows
				1773	each .so to have a finalisation function run after main() exits.
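<p>
To get a feel for initialisation and finalisation functions, here is a
small sketch using the gcc <code>constructor</code> and
<code>destructor</code> function attributes, which are one way to ask
for such functions. The function names are invented, and I am not
claiming this is how valgrind.so itself registers its hooks.

```c
#include <assert.h>
#include <stdio.h>

static int toy_init_ran = 0;

/* gcc extension: run before main(), much like a .so initialisation
   function. */
__attribute__((constructor))
static void toy_on_load(void)
{
    toy_init_ran = 1;
}

/* Run after main() returns or exit() is called, like a finalisation
   function. */
__attribute__((destructor))
static void toy_on_unload(void)
{
    assert(toy_init_ran);
    fprintf(stderr, "toy finalisation function running\n");
}

/* Lets the rest of the program ask whether the hook has fired. */
static int toy_init_has_run(void)
{
    return toy_init_ran;
}
```

If this is compiled into a shared object and named in
<code>LD_PRELOAD</code>, <code>toy_on_load</code> should run in every
dynamically-linked program you start, before that program's
<code>main()</code>.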
				1774
				1775	<p>When valgrind.so's initialisation function is called by the dynamic
				1776	linker, the synthetic CPU starts up. The real CPU remains locked
				1777	in valgrind.so for the entire rest of the program, but the synthetic
				1778	CPU returns from the initialisation function. Startup of the program
				1779	now continues as usual -- the dynamic linker calls all the other .so's
				1780	initialisation routines, and eventually runs main(). This all runs on
				1781	the synthetic CPU, not the real one, but the client program cannot
				1782	tell the difference.
				1783
				1784	<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
				1785	finalisation function. Valgrind detects this, and uses it as its cue
				1786	to exit. It prints summaries of all errors detected, possibly checks
				1787	for memory leaks, and then exits the finalisation routine, but now on
				1788	the real CPU. The synthetic CPU has now lost control -- permanently
				1789	-- so the program exits back to the OS on the real CPU, just as it
				1790	would have done anyway.
				1791
				1792	<p>On entry, Valgrind switches stacks, so it runs on its own stack.
				1793	On exit, it switches back. This means that the client program
				1794	continues to run on its own stack, so we can switch back and forth
				1795	between running it on the simulated and real CPUs without difficulty.
				1796	This was an important design decision, because it makes it easy (well,
				1797	significantly less difficult) to debug the synthetic CPU.
				1798
				1799
				1800	<a name="engine"></a>
				1801	<h3>5.2  The translation/instrumentation engine</h3>
				1802
				1803	Valgrind does not directly run any of the original program's code. Only
				1804	instrumented translations are run. Valgrind maintains a translation
				1805	table, which allows it to find the translation quickly for any branch
				1806	target (code address). If no translation has yet been made, the
				1807	translator - a just-in-time translator - is summoned. This makes an
				1808	instrumented translation, which is added to the collection of
				1809	translations. Subsequent jumps to that address will use this
				1810	translation.
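<p>A minimal sketch of the lookup-then-translate step described above, written in Python rather than Valgrind's own C. The <code>TranslationTable</code> class and the <code>translate</code> callback are hypothetical names invented for illustration; this is not how the real table is laid out.

```python
from typing import Callable, Dict


class TranslationTable:
    """Maps a code address to its instrumented translation, translating on demand."""

    def __init__(self, translate: Callable[[int], str]) -> None:
        self._translate = translate        # hypothetical stand-in for the JIT
        self._table: Dict[int, str] = {}   # branch target -> translation
        self.translations_made = 0

    def lookup(self, addr: int) -> str:
        translation = self._table.get(addr)
        if translation is None:
            # First jump to this address: summon the translator, keep the result.
            translation = self._translate(addr)
            self._table[addr] = translation
            self.translations_made += 1
        return translation


table = TranslationTable(lambda addr: f"instrumented code for {addr:#x}")
first = table.lookup(0x8048724)
second = table.lookup(0x8048724)  # found in the table, so nothing is re-translated
```

<p>The second lookup returns the stored translation, which is the property the text relies on: each branch target is translated at most once while its translation stays in the collection.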
				1811
				1812	<p>Valgrind can optionally check writes made by the application, to
				1813	see if they are writing an address contained within code which has
				1814	been translated. Such a write invalidates translations of code
				1815	bracketing the written address. Valgrind will discard the relevant
				1816	translations, which causes them to be re-made, if they are needed
				1817	again, reflecting the new updated data stored there. In this way,
				1818	self-modifying code is supported. In practice I have not found any
				1819	Linux applications which use self-modifying code.
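<p>The invalidation rule can be sketched as an overlap test between the written bytes and the address range each translation was made from. The <code>TranslationStore</code> class and its method names are assumptions made for this example, not Valgrind's data structures.

```python
class TranslationStore:
    """Remembers which range of original code each translation was made from."""

    def __init__(self) -> None:
        self._by_start = {}   # start address -> (end address, translation)

    def add(self, start: int, length: int, translation: str) -> None:
        self._by_start[start] = (start + length, translation)

    def has(self, start: int) -> bool:
        return start in self._by_start

    def notify_write(self, addr: int, size: int) -> int:
        """Discard every translation whose original code overlaps the written bytes."""
        stale = [
            start
            for start, (end, _) in self._by_start.items()
            if start < addr + size and addr < end
        ]
        for start in stale:
            del self._by_start[start]
        return len(stale)


store = TranslationStore()
store.add(0x1000, 16, "block A")
store.add(0x2000, 8, "block B")
discarded = store.notify_write(0x100C, 4)   # lands inside block A only
```

<p>A discarded block is simply absent on the next lookup, so it is re-made from the modified code if it is ever needed again.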
				1820
				1821	<p>The JITter translates basic blocks -- blocks of straight-line code
				1822	-- as single entities. To minimise the considerable difficulties of
				1823	dealing with the x86 instruction set, x86 instructions are first
				1824	translated to a RISC-like intermediate code, similar to sparc code,
				1825	but with an infinite number of virtual integer registers. Initially
				1826	each insn is translated separately, and there is no attempt at
				1827	instrumentation.
				1828
				1829	<p>The intermediate code is improved, mostly so as to try and cache
				1830	the simulated machine's registers in the real machine's registers over
				1831	several simulated instructions. This is often very effective. Also,
				1832	we try to remove redundant updates of the simulated machine's
				1833	condition-code register.
				1834
				1835	<p>The intermediate code is then instrumented, giving more
				1836	intermediate code. There are a few extra intermediate-code operations
				1837	to support instrumentation; it is all refreshingly simple. After
				1838	instrumentation there is a cleanup pass to remove redundant value
				1839	checks.
				1840
				1841	<p>This gives instrumented intermediate code which mentions arbitrary
				1842	numbers of virtual registers. A linear-scan register allocator is
				1843	used to assign real registers and possibly generate spill code. All
				1844	of this is still phrased in terms of the intermediate code. This
				1845	machinery is inspired by the work of Reuben Thomas (MITE).
				1846
				1847	<p>Then, and only then, is the final x86 code emitted. The
				1848	intermediate code is carefully designed so that x86 code can be
				1849	generated from it without need for spare registers or other
				1850	inconveniences.
				1851
				1852	<p>The translations are managed using a traditional LRU-based caching
				1853	scheme. The translation cache has a default size of about 14MB.
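<p>For readers unfamiliar with LRU caching, here is a small sketch of a size-bounded cache that evicts the least recently used entry first. It illustrates the general technique only; the class name, the byte-budget policy and the tiny capacity are assumptions, and Valgrind's actual cache management may differ.

```python
from collections import OrderedDict


class LruTranslationCache:
    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()   # address -> code bytes, least recent first

    def get(self, addr: int):
        code = self._entries.get(addr)
        if code is not None:
            self._entries.move_to_end(addr)   # mark as most recently used
        return code

    def put(self, addr: int, code: bytes) -> None:
        if addr in self._entries:
            self.used_bytes -= len(self._entries.pop(addr))
        self._entries[addr] = code
        self.used_bytes += len(code)
        # Evict from the least recently used end until we fit again.
        while self.used_bytes > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = LruTranslationCache(capacity_bytes=10)
cache.put(1, b"aaaa")
cache.put(2, b"bbbb")
cache.get(1)            # touch entry 1, so entry 2 becomes the eviction candidate
cache.put(3, b"cccc")   # 12 bytes would exceed the budget: entry 2 is evicted
```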
				1854
				1855	<a name="track"></a>
				1856
				1857	<h3>5.3  Tracking the status of memory</h3> Each byte in the
				1858	process' address space has nine bits associated with it: one A bit and
				1859	eight V bits. The A and V bits for each byte are stored using a
				1860	sparse array, which flexibly and efficiently covers arbitrary parts of
				1861	the 32-bit address space without imposing significant space or
				1862	performance overheads for the parts of the address space never
				1863	visited. The scheme used, and speedup hacks, are described in detail
				1864	at the top of the source file vg_memory.c, so you should read that for
				1865	the gory details.
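<p>One common way to build such a sparse array is a two-level table whose second-level chunks are only allocated when first written. The sketch below assumes 64 KB chunks, stores each A bit in a whole byte for simplicity, and assumes that a set V bit means "undefined"; none of these details are taken from <code>vg_memory.c</code>, which remains the authority.

```python
CHUNK_BITS = 16                 # assumed: one second-level chunk covers 64 KB
CHUNK_SIZE = 1 << CHUNK_BITS
NO_ACCESS = (0, 0xFF)           # assumed encoding: A=0 unaddressable, V all undefined


class ShadowMemory:
    def __init__(self) -> None:
        self._chunks = {}       # chunk index -> (A map, V map), created lazily

    def set(self, addr: int, a_bit: int, v_bits: int) -> None:
        index, offset = addr >> CHUNK_BITS, addr & (CHUNK_SIZE - 1)
        if index not in self._chunks:
            self._chunks[index] = (
                bytearray(CHUNK_SIZE),             # every byte starts with A=0
                bytearray(b"\xff" * CHUNK_SIZE),   # every byte starts undefined
            )
        a_map, v_map = self._chunks[index]
        a_map[offset] = a_bit
        v_map[offset] = v_bits

    def get(self, addr: int):
        chunk = self._chunks.get(addr >> CHUNK_BITS)
        if chunk is None:
            return NO_ACCESS    # never-visited memory costs no storage
        offset = addr & (CHUNK_SIZE - 1)
        return (chunk[0][offset], chunk[1][offset])

    def chunks_allocated(self) -> int:
        return len(self._chunks)


shadow = ShadowMemory()
shadow.set(0x08048000, 1, 0x00)   # one addressable, fully defined byte
```

<p>Only the chunk containing the written byte is allocated; queries anywhere else in the 32-bit space answer from the shared default.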
				1866
				1867	<a name="sys_calls"></a>
				1868
				1869	<h3>5.4 System calls</h3>
				1870	All system calls are intercepted. The memory status map is consulted
				1871	before and updated after each call. It's all rather tiresome. See
				1872	vg_syscall_mem.c for details.
				1873
				1874	<a name="sys_signals"></a>
				1875
				1876	<h3>5.5  Signals</h3>
				1877	All system calls to sigaction() and sigprocmask() are intercepted. If
				1878	the client program is trying to set a signal handler, Valgrind makes a
				1879	note of the handler address and which signal it is for. Valgrind then
				1880	arranges for the same signal to be delivered to its own handler.
				1881
				1882	<p>When such a signal arrives, Valgrind's own handler catches it, and
				1883	notes the fact. At a convenient safe point in execution, Valgrind
				1884	builds a signal delivery frame on the client's stack and runs its
				1885	handler. If the handler longjmp()s, there is nothing more to be said.
				1886	If the handler returns, Valgrind notices this, zaps the delivery
				1887	frame, and carries on where it left off before delivering the signal.
				1888
				1889	<p>The purpose of this nonsense is that setting signal handlers
				1890	essentially amounts to giving callback addresses to the Linux kernel.
				1891	We can't allow this to happen, because if it did, signal handlers
				1892	would run on the real CPU, not the simulated one. This means the
				1893	checking machinery would not operate during the handler run, and,
				1894	worse, memory permissions maps would not be updated, which could cause
				1895	spurious error reports once the handler had returned.
				1896
				1897	<p>An even worse thing would happen if the signal handler longjmp'd
				1898	rather than returned: Valgrind would completely lose control of the
				1899	client program.
				1900
				1901	<p>Upshot: we can't allow the client to install signal handlers
				1902	directly. Instead, Valgrind must catch, on behalf of the client, any
				1903	signal the client asks to catch, and must deliver it to the client on
				1904	the simulated CPU, not the real one. This involves considerable
				1905	gruesome fakery; see vg_signals.c for details.
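<p>The catch-and-redeliver scheme described in this section can be modelled in a few lines. This is a toy sketch: <code>SignalShim</code> and its methods are hypothetical names, and building a real delivery frame on the client's stack is far more involved than calling a Python function.

```python
class SignalShim:
    def __init__(self) -> None:
        self.client_handlers = {}   # signal number -> handler the client asked for
        self.pending = []           # signals caught but not yet delivered

    def sigaction(self, signum: int, handler) -> None:
        # Intercepted call: record the handler instead of giving it to the kernel.
        self.client_handlers[signum] = handler

    def on_real_signal(self, signum: int) -> None:
        # Runs in the tool's own handler: only note that the signal arrived.
        if signum in self.client_handlers:
            self.pending.append(signum)

    def deliver_at_safe_point(self) -> int:
        # Run the client's handlers under the tool's control, oldest signal first.
        delivered = 0
        while self.pending:
            signum = self.pending.pop(0)
            self.client_handlers[signum](signum)
            delivered += 1
        return delivered


log = []
shim = SignalShim()
shim.sigaction(2, log.append)
shim.on_real_signal(2)
seen_before_safe_point = list(log)        # handler has not run yet
delivered = shim.deliver_at_safe_point()
```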
				1906	<p>
				1907
				1908	<hr width="100%">
				1909
				1910	<a name="example"></a>
				1911	<h2>6  Example</h2>
				1912	This is the log for a run of a small program. The program is in fact
				1913	correct, and the reported error is the result of a potentially serious
				1914	code generation bug in GNU g++ (snapshot 20010527).
				1915	<pre>
				1916	sewardj@phoenix:~/newmat10$
				1917	~/Valgrind-6/valgrind -v ./bogon
				1918	==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
				1919	==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
				1920	==25832== Startup, with flags:
				1921	==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
				1922	==25832== reading syms from /lib/ld-linux.so.2
				1923	==25832== reading syms from /lib/libc.so.6
				1924	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
				1925	==25832== reading syms from /lib/libm.so.6
				1926	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
				1927	==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
				1928	==25832== reading syms from /proc/self/exe
				1929	==25832== loaded 5950 symbols, 142333 line number locations
				1930	==25832==
				1931	==25832== Invalid read of size 4
				1932	==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
				1933	==25832== by 0x80487AF: main (bogon.cpp:66)
				1934	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				1935	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				1936	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				1937	==25832==
				1938	==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
				1939	==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
				1940	==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
				1941	==25832== For a detailed leak analysis, rerun with: --leak-check=yes
				1942	==25832==
				1943	==25832== exiting, did 1881 basic blocks, 0 misses.
				1944	==25832== 223 translations, 3626 bytes in, 56801 bytes out.
				1945	</pre>
				1946	<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
				1947	<hr width="100%">
				1948	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1949
				1950
				1951
				1952	<a name="cache"></a>
				1953	<h2>7  Cache profiling</h2>
				1954	As well as memory debugging, Valgrind also allows you to do cache simulations
				1955	and annotate your source line-by-line with the number of cache misses. In
				1956	particular, it records:
				1957	<ul>
				1958	<li>L1 instruction cache reads and misses;
				1959	<li>L1 data cache reads and read misses, writes and write misses;
				1960	<li>L2 unified cache reads and read misses, writes and write misses.
				1961	</ul>
				1962	On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
				1963	and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
njn	7cfd572	2002-05-03 17:51:10 +0000	[diff] [blame]	1964	very useful for improving the performance of your program.<p>
				1965
				1966	Also, since one instruction cache read is performed per instruction executed,
				1967	you can find out how many instructions are executed per line, which can be
				1968	useful for optimisation and test coverage.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1969
				1970	Please note that this is an experimental feature. Any feedback, bug-fixes,
				1971	suggestions, etc, welcome.
				1972
				1973
				1974	<h3>7.1  Overview</h3>
				1975	First off, as for normal Valgrind use, you probably want to turn on debugging
				1976	info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
				1977	probably <b>do</b> want to turn optimisation on, since you should profile your
				1978	program as it will normally be run.<p>
				1979
				1980	The three steps are:
				1981	<ol>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1982	<li>Generate a cache simulator for your machine's cache
				1983	configuration with the supplied <code>vg_cachegen</code>
				1984	program, and recompile Valgrind with <code>make install</code>.
				1985	<p>
				1986	The default settings are for an AMD Athlon, and you will get
				1987	useful information with the defaults, so you can skip this step
				1988	if you want. Nevertheless, for accurate cache profiles you will
				1989	need to use <code>vg_cachegen</code> to customise
				1990	<code>cachegrind</code> for your system.
				1991	<p>
				1992	This step only needs to be done once, unless you are interested
				1993	in simulating different cache configurations (eg. first
				1994	concentrating on instruction cache misses, then on data cache
				1995	misses).
				1996	</li>
				1997	<p>
				1998	<li>Run your program with <code>cachegrind</code> in front of the
				1999	normal command line invocation. When the program finishes,
				2000	Valgrind will print summary cache statistics. It also collects
				2001	line-by-line information in a file <code>cachegrind.out</code>.
				2002	<p>
				2003	This step should be done every time you want to collect
				2004	information about a new program, a changed program, or about the
				2005	same program with different input.
				2006	</li>
				2007	<p>
				2008	<li>Generate a function-by-function summary, and possibly annotate
				2009	source files with <code>vg_annotate</code>. Source files to annotate can be
				2010	specified manually on the command line, or
				2011	"interesting" source files can be annotated automatically with
				2012	the <code>--auto=yes</code> option. You can annotate C/C++
				2013	files or assembly language files equally easily.</li>
				2014	<p>
				2015	This step can be performed as many times as you like for each
				2016	Step 2. You may want to do multiple annotations showing
				2017	different information each time.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2018	</ol>
				2019
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2020	The steps are described in detail in the following sections.<p>
				2021
				2022
				2023	<a name="generate"></a>
				2024	<h3>7.3  Generating a cache simulator</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2025
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2026	Although Valgrind comes with a pre-generated cache simulator, it most
				2027	likely won't match the cache configuration of your machine, so you
				2028	should generate a new simulator.<p>
				2029
				2030	You need to generate three files, one for each of the I1, D1 and L2
				2031	caches. For each cache, you need to know the:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2032	<ul>
				2033	<li>Cache size (bytes);
				2034	<li>Line size (bytes);
				2035	<li>Associativity.
				2036	</ul>
				2037
				2038	vg_cachegen takes three options:
				2039	<ul>
				2040	<li><code>--I1=size,line_size,associativity</code>
				2041	<li><code>--D1=size,line_size,associativity</code>
				2042	<li><code>--L2=size,line_size,associativity</code>
				2043	</ul>
				2044
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2045	You can specify one, two or all three caches per invocation of
				2046	vg_cachegen. It checks that the configuration is sensible before
				2047	generating the simulators; to see the allowed values, run
				2048	<code>vg_cachegen -h</code>.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2049
				2050	An example invocation would be:
				2051
				2052	<blockquote><code>
				2053	vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
				2054	</code></blockquote>
				2055
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2056	This simulates a machine with a 128KB split L1 cache (64KB I1 plus
				2057	64KB D1, each 2-way associative), and a 256KB unified 8-way
				2058	associative L2 cache. All three caches have 64B lines.<p>
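<p>The figures in the example invocation can be checked with a little arithmetic. The helper below is only an illustration (its name and the returned fields are made up); it derives the number of lines and sets from the <code>size,line_size,associativity</code> triples.

```python
def describe(size: int, line_size: int, associativity: int) -> dict:
    lines = size // line_size          # how many cache lines fit
    sets = lines // associativity      # lines are grouped into sets of this many ways
    return {"size_kb": size // 1024, "lines": lines, "sets": sets}


i1 = describe(65536, 64, 2)
d1 = describe(65536, 64, 2)
l2 = describe(262144, 64, 8)
l1_total_kb = i1["size_kb"] + d1["size_kb"]   # the two halves of the split L1
```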
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2059
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2060	If you don't know your cache configuration, you'll have to find it
				2061	out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
				2062	configuration using the CPUID instruction, which could be done
				2063	automatically during installation, and this whole step could be
				2064	skipped.)<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2065
				2066
				2067	<h3>7.4  Cache simulation specifics</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2068
				2069	<code>vg_cachegen</code> only generates simulations for a machine with
				2070	a split L1 cache and a unified L2 cache. This configuration is used
				2071	for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
				2072	had a unified I and D L1 cache, but they are ancient history now.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2073
				2074	The more specific characteristics of the simulation are as follows.
				2075
				2076	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2077	<li>Write-allocate: when a write miss occurs, the block written to
				2078	is brought into the D1 cache. Most modern caches have this
				2079	property.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2080
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2081	<li>Bit-selection hash function: the line(s) in the cache to which a
				2082	memory block maps is chosen by the middle bits M--(M+N-1) of the
				2083	byte address, where:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2084	<ul>
				2085	<li> line size = 2^M bytes </li>
				2086	<li>(cache size / line size) = 2^N lines</li>
				2087	</ul> </li><p>
				2088
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2089	<li>Inclusive L2 cache: the L2 cache replicates all the entries of
				2090	the L1 cache. This is standard on Pentium chips, but AMD
				2091	Athlons use an exclusive L2 cache that only holds blocks evicted
				2092	from L1. Ditto AMD Durons and most modern VIAs.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2093	</ul>
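<p>The bit-selection rule above amounts to a shift and a mask. The sketch below follows the formula exactly as stated (line size = 2^M, cache size / line size = 2^N), which corresponds to the direct-mapped case; how the generated simulators narrow the index for a set-associative cache is not described here, so that part is left out. The function name is hypothetical.

```python
def line_index(addr: int, cache_size: int, line_size: int) -> int:
    m = line_size.bit_length() - 1                   # line size = 2^M bytes
    n = (cache_size // line_size).bit_length() - 1   # cache size / line size = 2^N
    return (addr >> m) & ((1 << n) - 1)              # bits M..(M+N-1) of the address


index = line_index(0xBFFFF74C, 65536, 64)
```

<p>Addresses within the same 64-byte block share an index, and two addresses exactly one cache size apart collide on the same index.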
				2094
				2095	Other noteworthy behaviour:
				2096
				2097	<ul>
				2098	<li>References that straddle two cache lines are treated as follows:</li>
				2099	<ul>
				2100	<li>If both blocks hit --> counted as one hit</li>
				2101	<li>If one block hits, the other misses --> counted as one miss</li>
				2102	<li>If both blocks miss --> counted as one miss (not two)</li>
				2103	</ul><p>
				2104
				2105	<li>Instructions that modify a memory location (eg. <code>inc</code> and
				2106	<code>dec</code>) are counted as doing just a read, ie. a single data
				2107	reference. This may seem strange, but since the write can never cause a
				2108	miss (the read guarantees the block is in the cache) it's not very
				2109	interesting.<p>
				2110
				2111	Thus it measures not the number of times the data cache is accessed, but
				2112	the number of times a data cache miss could occur.<p>
				2113	</li>
				2114	</ul>
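<p>The counting rule for references that straddle two cache lines can be expressed as "one outcome per reference". The sketch below is an illustration under that reading; <code>classify_reference</code> and the set of cached line numbers are invented for the example.

```python
def classify_reference(addr: int, size: int, line_size: int, cached_lines: set) -> str:
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size
    touched = range(first_line, last_line + 1)
    # One outcome per reference, however many lines it touches.
    return "hit" if all(line in cached_lines for line in touched) else "miss"


LINE = 64
# A 4-byte access at address 1022 covers bytes 1022..1025, i.e. lines 15 and 16.
both_hit = classify_reference(1022, 4, LINE, {15, 16})
one_miss = classify_reference(1022, 4, LINE, {16})
both_miss = classify_reference(1022, 4, LINE, set())
misses = [both_hit, one_miss, both_miss].count("miss")
```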
				2115
				2116	If you are interested in simulating a cache with different properties, it is
				2117	not particularly hard to write your own cache simulator, or to modify existing
				2118	ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
				2119	<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
				2120	does.
				2121
				2122
				2123	<a name="profile"></a>
				2124	<h3>7.5  Profiling programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2125
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2126	Cache profiling is enabled by using the <code>--cachesim=yes</code>
				2127	option to the <code>valgrind</code> shell script. Alternatively, it
				2128	is probably more convenient to use the <code>cachegrind</code> script.
				2129	This automatically turns off Valgrind's memory checking functions,
				2130	since the cache simulation is slow enough already, and you probably
				2131	don't want to do both at once.
				2132	<p>
				2133	To gather cache profiling information about the program <code>ls
				2134	-l</code>, type:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2135
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2136	<blockquote><code>cachegrind ls -l</code></blockquote>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2137
				2138	The program will execute (slowly). Upon completion, summary statistics
				2139	that look like this will be printed:
				2140
				2141	<pre>
				2142	==31751== I refs: 27,742,716
				2143	==31751== I1 misses: 276
				2144	==31751== L2 misses: 275
				2145	==31751== I1 miss rate: 0.0%
				2146	==31751== L2i miss rate: 0.0%
				2147	==31751==
				2148	==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
				2149	==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
				2150	==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
				2151	==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
				2152	==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
				2153	==31751==
				2154	==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
				2155	==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
				2156	</pre>
				2157
				2158	Cache accesses for instruction fetches are summarised first, giving the
				2159	number of fetches made (this is the number of instructions executed, which
				2160	can be useful to know in its own right), the number of I1 misses, and the
				2161	number of L2 instruction (<code>L2i</code>) misses.<p>
				2162
				2163	Cache accesses for data follow. The information is similar to that of the
				2164	instruction fetches, except that the values are also shown split between reads
				2165	and writes (note each row's <code>rd</code> and <code>wr</code> values add up
				2166	to the row's total).<p>
				2167
				2168	Combined instruction and data figures for the L2 cache follow that.<p>
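<p>As a worked check of how the rows in the sample summary relate to each other, the snippet below re-adds the figures printed above. The miss-rate line assumes the rate is simply misses divided by references; the exact rounding used in the printed percentages is not specified here.

```python
i_refs = 27_742_716
l2i_misses = 275

d_refs = {"rd": 10_955_517, "wr": 4_474_773}
d1_misses = {"rd": 21_905, "wr": 19_280}
l2d_misses = {"rd": 3_987, "wr": 19_098}

d_refs_total = sum(d_refs.values())          # the "D refs" row
d1_misses_total = sum(d1_misses.values())    # the "D1 misses" row
l2d_misses_total = sum(l2d_misses.values())  # the data "L2 misses" row

# The final block combines instruction and data misses in L2.
l2_misses_total = l2i_misses + l2d_misses_total
l2_read_misses = l2i_misses + l2d_misses["rd"]

d1_miss_rate_percent = 100 * d1_misses_total / d_refs_total
```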
				2169
				2170
				2171	<h3>7.6  Output file</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2172
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2173	As well as printing summary information, Cachegrind also writes
				2174	line-by-line cache profiling information to a file named
				2175	<code>cachegrind.out</code>. This file is human-readable, but is best
				2176	interpreted by the accompanying program <code>vg_annotate</code>,
				2177	described in the next section.
				2178	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2179	Things to note about the <code>cachegrind.out</code> file:
				2180	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2181	<li>It is written every time <code>valgrind --cachesim=yes</code> or
				2182	<code>cachegrind</code> is run, and will overwrite any existing
				2183	<code>cachegrind.out</code> in the current directory.</li>
				2184	<p>
				2185	<li>It can be huge: <code>ls -l</code> generates a file of about
				2186	350KB. Browsing a few files and web pages with a Konqueror
				2187	built with full debugging information generates a file
				2188	of around 15 MB.</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2189	</ul>
				2190
				2191
				2192	<a name="annotate"></a>
				2193	<h3>7.7  Annotating C/C++ programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2194
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2195	Before using <code>vg_annotate</code>, it is worth widening your
				2196	window to be at least 120-characters wide if possible, as the output
				2197	lines can be quite long.
				2198	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2199	To get a function-by-function summary, run <code>vg_annotate</code> in
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2200	directory containing a <code>cachegrind.out</code> file. The output
				2201	looks like this:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2202
				2203	<pre>
				2204	--------------------------------------------------------------------------------
				2205	I1 cache: 65536 B, 64 B, 2-way associative
				2206	D1 cache: 65536 B, 64 B, 2-way associative
				2207	L2 cache: 262144 B, 64 B, 8-way associative
				2208	Command: concord vg_to_ucode.c
				2209	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2210	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2211	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2212	Threshold: 99%
				2213	Chosen for annotation:
				2214	Auto-annotation: on
				2215
				2216	--------------------------------------------------------------------------------
				2217	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2218	--------------------------------------------------------------------------------
				2219	27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
				2220
				2221	--------------------------------------------------------------------------------
				2222	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
				2223	--------------------------------------------------------------------------------
				2224	8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
				2225	5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
				2226	2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
				2227	2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
				2228	2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
				2229	1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
				2230	897,991 51 51 897,831 95 30 62 1 1 ???:???
				2231	598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
				2232	598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
				2233	598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
				2234	446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
				2235	341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
				2236	320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
				2237	298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
				2238	149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
				2239	149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
				2240	95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
				2241	85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
				2242	</pre>
				2243
				2244	First up is a summary of the annotation options:
				2245
				2246	<ul>
				2247	<li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the
				2248	configuration with which these results were obtained.</li><p>
				2249
				2250	<li>Command: the command line invocation of the program under
				2251	examination.</li><p>
				2252
				2253	<li>Events recorded: event abbreviations are:<p>
				2254	<ul>
				2255	<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
				2256	<li><code>I1mr</code>: I1 cache read misses</li>
				2257	<li><code>I2mr</code>: L2 cache instruction read misses</li>
				2258	<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
				2259	<li><code>D1mr</code>: D1 cache read misses</li>
				2260	<li><code>D2mr</code>: L2 cache data read misses</li>
				2261	<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
				2262	<li><code>D1mw</code>: D1 cache write misses</li>
				2263	<li><code>D2mw</code>: L2 cache data write misses</li>
				2264	</ul><p>
				2265	Note that D1 total misses is given by <code>D1mr</code> +
				2266	<code>D1mw</code>, and that L2 total misses is given by
				2267	<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
				2268
				2269	<li>Events shown: the events shown (a subset of events gathered). This can
				2270	be adjusted with the <code>--show</code> option.</li><p>
				2271
				2272	<li>Event sort order: the sort order in which functions are shown. For
				2273	example, in this case the functions are sorted from highest
				2274	<code>Ir</code> counts to lowest. If two functions have identical
				2275	<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
				2276	counts, and so on. This order can be adjusted with the
				2277	<code>--sort</code> option.<p>
				2278
				2279	Note that this dictates the order the functions appear. It is <b>not</b>
				2280	the order in which the columns appear; that is dictated by the "events
				2281	shown" line (and can be changed with the <code>--show</code> option).
				2282	</li><p>
				2283
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2284	<li>Threshold: <code>vg_annotate</code> by default omits functions
				2285	that cause very low numbers of misses to avoid drowning you in
				2286	information. In this case, vg_annotate shows summaries of the
				2287	functions that account for 99% of the <code>Ir</code> counts;
				2288	<code>Ir</code> is chosen as the threshold event since it is the
				2289	primary sort event. The threshold can be adjusted with the
				2290	<code>--threshold</code> option.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2291
				2292	<li>Chosen for annotation: names of files specified manually for annotation;
				2293	in this case none.</li><p>
				2294
				2295	<li>Auto-annotation: whether auto-annotation was requested via the
				2296	<code>--auto=yes</code> option. In this case no.</li><p>
				2297	</ul>
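<p>The sort order and threshold described above can be mimicked with tuple comparison and a running total. This is a sketch of the idea, not of <code>vg_annotate</code>'s implementation: the cut-off rule in <code>above_threshold</code> is an assumption, only three of the nine events are kept, and only four rows of the sample output are used.

```python
# (Ir, I1mr, I2mr) for a few rows taken from the sample output.
rows = {
    "../sysdeps/generic/lockfile.c:__funlockfile": (598_068, 0, 0),
    "concord.c:get_word": (5_222_023, 4, 4),
    "../sysdeps/generic/lockfile.c:__flockfile": (598_068, 1, 1),
    "getc.c:_IO_getc": (8_821_482, 5, 5),
}

# Tuples compare element by element, so ties on Ir fall through to I1mr, and so on.
ordered = sorted(rows, key=lambda name: rows[name], reverse=True)


def above_threshold(names, counts, percent):
    """Keep the leading functions until they cover `percent` of the primary event."""
    total = sum(counts[name][0] for name in names)
    shown, running = [], 0
    for name in names:
        if running * 100 >= percent * total:
            break
        shown.append(name)
        running += counts[name][0]
    return shown


shown = above_threshold(ordered, rows, percent=90)
```

<p>With these four rows, <code>__flockfile</code> sorts ahead of <code>__funlockfile</code> because its <code>I1mr</code> count breaks the tie on <code>Ir</code>, matching their order in the sample output.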
				2298
				2299	Then follow summary statistics for the whole program. These are similar
				2300	to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
				2301
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2302	Then follow function-by-function statistics. Each function is
				2303	identified by a <code>file_name:function_name</code> pair. If a column
				2304	contains only a dot it means the function never performs
				2305	that event (eg. the third row shows that <code>strcmp()</code>
				2306	contains no instructions that write to memory). The name
				2307	<code>???</code> is used if the file name and/or function name
				2308	could not be determined from debugging information. If most of the
				2309	entries have the form <code>???:???</code> the program probably wasn't
				2310	compiled with <code>-g</code>. <p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2311
				2312	It is worth noting that functions will come from three types of source files:
				2313	<ol>
				2314	<li> From the profiled program (<code>concord.c</code> in this example).</li>
				2315	<li>From libraries (eg. <code>getc.c</code>)</li>
				2316	<li>From Valgrind's implementation of some libc functions (eg.
				2317	<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
				2318	the filename begins with <code>vg_</code>, and is probably one of
				2319	<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
				2320	<code>vg_mylibc.c</code>.
				2321	</li>
				2322	</ol>
				2323
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2324	There are two ways to annotate source files -- by choosing them
				2325	manually, or with the <code>--auto=yes</code> option. To do it
				2326	manually, just specify the filenames as arguments to
				2327	<code>vg_annotate</code>. For example, the output from running
				2328	<code>vg_annotate concord.c</code> for our example produces the same
				2329	output as above followed by an annotated version of
				2330	<code>concord.c</code>, a section of which looks like:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2331
				2332	<pre>
				2333	--------------------------------------------------------------------------------
				2334	-- User-annotated source: concord.c
				2335	--------------------------------------------------------------------------------
				2336	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2337
				2338	[snip]
				2339
				2340	. . . . . . . . . void init_hash_table(char *file_name, Word_Node table[])
				2341	3 1 1 . . . 1 0 0 {
				2342	. . . . . . . . . FILE *file_ptr;
				2343	. . . . . . . . . Word_Info *data;
				2344	1 0 0 . . . 1 1 1 int line = 1, i;
				2345	. . . . . . . . .
				2346	5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
				2347	. . . . . . . . .
				2348	4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
				2349	3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
				2350	. . . . . . . . .
				2351	. . . . . . . . . /* Open file, check it. */
				2352	6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
				2353	2 0 0 1 0 0 . . . if (!(file_ptr)) {
				2354	. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
				2355	1 1 1 . . . . . . exit(EXIT_FAILURE);
				2356	. . . . . . . . . }
				2357	. . . . . . . . .
				2358	165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
				2359	146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
				2360	. . . . . . . . .
				2361	4 0 0 1 0 0 2 0 0 free(data);
				2362	4 0 0 1 0 0 2 0 0 fclose(file_ptr);
				2363	3 0 0 2 0 0 . . . }
				2364	</pre>
				2365
				2366	(Although column widths are automatically minimised, a wide terminal is clearly
				2367	useful.)<p>
				2368
				2369	Each source file is clearly marked (<code>User-annotated source</code>) as
				2370	having been chosen manually for annotation. If the file was found in one of
				2371	the directories specified with the <code>-I</code>/<code>--include</code>
				2372	option, the directory and file are both given.<p>
				2373
				2374	Each line is annotated with its event counts. Events not applicable for a line
				2375	are represented by a `.'; this is useful for distinguishing between an event
				2376	which cannot happen, and one which can but did not.<p>
				2377
				2378	Sometimes only a small section of a source file is executed. To minimise
				2379	uninteresting output, Valgrind only shows annotated lines and lines within a
				2380	small distance of annotated lines. Gaps are marked with the line numbers so
				2381	you know which part of a file the shown code comes from, eg:
				2382
				2383	<pre>
				2384	(figures and code for line 704)
				2385	-- line 704 ----------------------------------------
				2386	-- line 878 ----------------------------------------
				2387	(figures and code for line 878)
				2388	</pre>
				2389
				2390	The amount of context to show around annotated lines is controlled by the
				2391	<code>--context</code> option.<p>
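<p>One plausible way to decide which lines are shown is to expand each annotated line by the context amount and merge ranges that touch. The function below is a guess at that logic for illustration only; its name, and the exact treatment of adjacent ranges, are assumptions.

```python
def shown_ranges(annotated_lines, context: int, last_line: int):
    """Return merged (first, last) line ranges to print, in file order."""
    ranges = []
    for line in sorted(annotated_lines):
        lo = max(1, line - context)
        hi = min(last_line, line + context)
        if ranges and lo <= ranges[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            ranges[-1][1] = max(ranges[-1][1], hi)
        else:
            ranges.append([lo, hi])
    return [tuple(r) for r in ranges]


ranges = shown_ranges([700, 702, 880], context=2, last_line=1000)
```

<p>Here the output would break between lines 704 and 878, which is where gap markers like those in the example above would appear.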
				2392
				2393	To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
				2394	vg_annotate will automatically annotate every source file it can find that is
				2395	mentioned in the function-by-function summary. Therefore, the files chosen for
				2396	auto-annotation are affected by the <code>--sort</code> and
				2397	<code>--threshold</code> options. Each source file is clearly marked
				2398	(<code>Auto-annotated source</code>) as being chosen automatically. Any files
				2399	that could not be found are mentioned at the end of the output, eg:
				2400
				2401	<pre>
				2402	--------------------------------------------------------------------------------
				2403	The following files chosen for auto-annotation could not be found:
				2404	--------------------------------------------------------------------------------
				2405	getc.c
				2406	ctype.c
				2407	../sysdeps/generic/lockfile.c
				2408	</pre>
				2409
				2410	This is quite common for library files, since libraries are usually compiled
				2411	with debugging information, but the source files are often not present on a
				2412	system. If a file is chosen for annotation <b>both</b> manually and
				2413	automatically, it is marked as <code>User-annotated source</code>.
				2414
				2415	Use the <code>-I/--include</code> option to tell Valgrind where to look for
				2416	source files if the filenames found from the debugging information aren't
				2417	specific enough.
				2418
				2419	Beware that vg_annotate can take some time to digest large
				2420	<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
				2421	auto-annotation can produce a lot of output if your program is large!
				2422
				2423
				2424	<h3>7.8  Annotating assembler programs</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2425
				2426	Valgrind can annotate assembler programs too, or annotate the
				2427	assembler generated for your C program. Sometimes this is useful for
				2428	understanding what is really happening when an interesting line of C
				2429	code is translated into multiple instructions.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2430
				2431	To do this, you just need to assemble your <code>.s</code> files with
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2432	assembler-level debug information. gcc doesn't do this, but you can
				2433	use the GNU assembler with the <code>--gstabs</code> option to
				2434	generate object files with this information, eg:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2435
				2436	<blockquote><code>as --gstabs foo.s</code></blockquote>
				2437
				2438	You can then profile and annotate source files in the same way as for C/C++
				2439	programs.
				2440
				2441
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2442	<h3>7.9  <code>vg_annotate</code> options</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2443	<ul>
				2444	<li><code>-h, --help</code></li><p>
				2445	<li><code>-v, --version</code><p>
				2446
				2447	Help and version, as usual.</li>
				2448
				2449	<li><code>--sort=A,B,C</code> [default: order in
				2450	<code>cachegrind.out</code>]<p>
				2451	Specifies the events upon which the sorting of the function-by-function
				2452	entries will be based. Useful if you want to concentrate on eg. I cache
				2453	misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
				2454	(<code>--sort=D1mr,D2mr</code>), or L2 misses
				2455	(<code>--sort=D2mr,I2mr</code>).</li><p>
				2456
				2457	<li><code>--show=A,B,C</code> [default: all, using order in
				2458	<code>cachegrind.out</code>]<p>
				2459	Specifies which events to show (and the column order). Default is to use
				2460	all present in the <code>cachegrind.out</code> file (and use the order in
				2461	the file).</li><p>
				2462
				2463	<li><code>--threshold=X</code> [default: 99%] <p>
				2464	Sets the threshold for the function-by-function summary. Functions are
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame^]	2465	shown that account for more than X% of the primary sort event. If
				2466	auto-annotating, also affects which files are annotated.
				2467
				2468	Note: thresholds can be set for more than one of the events by following
				2469	any event given to the <code>--sort</code> option with a colon and a number
				2470	(no spaces, though). E.g. if you want to see the functions that cover
				2471	99% of L2 read misses and 99% of L2 write misses, use this option:
				2472
				2473	<blockquote><code>--sort=D2mr:99,D2mw:99</code></blockquote>
				2474	</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2475
				2476	<li><code>--auto=no</code> [default]<br>
				2477	<code>--auto=yes</code> <p>
				2478	When enabled, automatically annotates every file that is mentioned in the
				2479	function-by-function summary that can be found. Also gives a list of
				2480	those that couldn't be found.
				2481
				2482	<li><code>--context=N</code> [default: 8]<p>
				2483	Print N lines of context before and after each annotated line. Avoids
				2484	printing large sections of source files that were not executed. Use a
				2485	large number (eg. 10,000) to show all source lines.
				2486	</li><p>
				2487
				2488	<li><code>-I=<dir>, --include=<dir></code>
				2489	[default: empty string]<p>
				2490	Adds a directory to the list in which to search for files. Multiple
				2491	-I/--include options can be given to add multiple directories.
				2492	</ul>
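As a rough illustration of how a <code>--sort</code> argument with per-event thresholds might be interpreted, here is a small Python sketch. It is not taken from vg_annotate; the names <code>parse_sort</code> and <code>functions_to_show</code> are hypothetical, and reading the threshold as a cumulative percentage is an assumption based on the "functions that cover 99%" wording above.

```python
def parse_sort(spec, default_threshold=99.0):
    """Split e.g. 'D2mr:99,D2mw:99' into [(event, threshold), ...].
    Events without ':N' get the default threshold."""
    result = []
    for item in spec.split(","):
        event, sep, number = item.partition(":")
        result.append((event, float(number) if sep else default_threshold))
    return result


def functions_to_show(counts, threshold):
    """counts maps function name -> count for one event.
    Return names, largest count first, until the functions listed
    so far cover `threshold` percent of the total (assumed meaning)."""
    total = sum(counts.values())
    shown, running = [], 0
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ordered:
        if total == 0 or running * 100 >= threshold * total:
            break
        shown.append(name)
        running += count
    return shown


if __name__ == "__main__":
    print(parse_sort("D2mr:99,D2mw:99"))
    # Made-up counts for three functions.
    print(functions_to_show({"main": 90, "insert": 9, "hash": 1}, 99))
```

In the made-up example, <code>main</code> and <code>insert</code> together cover 99% of the event, so <code>hash</code> would be left out of the summary.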
				2493
				2494
				2495	<h3>7.10  Warnings</h3>
				2496	There are a couple of situations in which vg_annotate issues warnings.
				2497
				2498	<ul>
				2499	<li>If a source file is more recent than the <code>cachegrind.out</code>
				2500	file. This is because the information in <code>cachegrind.out</code> is
				2501	only recorded with line numbers, so if the line numbers change at all in
				2502	the source (eg. lines added, deleted, swapped), any annotations will be
				2503	incorrect.<p>
				2504
				2505	<li>If information is recorded about line numbers past the end of a file.
				2506	This can be caused by the above problem, ie. shortening the source file
				2507	while using an old <code>cachegrind.out</code> file. If this happens,
				2508	the figures for the bogus lines are printed anyway (clearly marked as
				2509	bogus) in case they are important.</li><p>
				2510	</ul>
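Both checks are simple to express. The following Python sketch shows one way they could be done; it is an illustration only, and the helper names <code>source_is_newer</code> and <code>lines_past_end</code> are invented for the example.

```python
import os
import tempfile


def source_is_newer(source_path, profile_path):
    """True if the source file was modified after the profile was
    written, in which case recorded line numbers may be stale."""
    return os.path.getmtime(source_path) > os.path.getmtime(profile_path)


def lines_past_end(recorded_lines, source_line_count):
    """Return recorded line numbers that lie beyond the end of the file."""
    return sorted(n for n in recorded_lines if n > source_line_count)


if __name__ == "__main__":
    # Demo with two empty temporary files and explicit timestamps.
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "prog.c")
    prof = os.path.join(tmp, "cachegrind.out")
    open(src, "w").close()
    open(prof, "w").close()
    os.utime(prof, (1000, 1000))   # profile written first
    os.utime(src, (2000, 2000))    # source edited afterwards
    if source_is_newer(src, prof):
        print("warning: source file is more recent than the profile")
    print(lines_past_end({10, 250, 300}, source_line_count=200))
```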
				2511
				2512
				2513	<h3>7.10  Things to watch out for</h3>
				2514	Some odd things that can occur during annotation:
				2515
				2516	<ul>
				2517	<li>If annotating at the assembler level, you might see something like this:
				2518
				2519	<pre>
				2520	1 0 0 . . . . . . leal -12(%ebp),%eax
				2521	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				2522	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				2523	. . . . . . . . . .align 4,0x90
				2524	1 0 0 . . . . . . movl $.LnrB,%eax
				2525	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
				2526	</pre>
				2527
				2528	How can the third instruction be executed twice when the others are
				2529	executed only once? As it turns out, it isn't. Here's a dump of the
				2530	executable, from objdump:
				2531
				2532	<pre>
				2533	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				2534	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				2535	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				2536	8048f32: 89 f6 mov %esi,%esi
				2537	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				2538	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
				2539	</pre>
				2540
				2541	Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
				2542	come from? The GNU assembler inserted it to serve as the two bytes of
				2543	padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
				2544	a four-byte boundary, but pretended it didn't exist when adding debug
				2545	information. Thus when Valgrind reads the debug info it thinks that the
				2546	<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
				2547	range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
				2548	<code>mov %esi,%esi</code> to it.<p>
				2549	</li>
				2550
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2551	<li>Inlined functions can cause strange results in the function-by-function
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2552	summary. If a function <code>inline_me()</code> is defined in
				2553	<code>foo.h</code> and inlined in the functions <code>f1()</code>,
				2554	<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
				2555	not be a <code>foo.h:inline_me()</code> function entry. Instead, there
				2556	will be separate function entries for each inlining site, ie.
				2557	<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
				2558	<code>foo.h:f3()</code>. To find the total counts for
				2559	<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
				2560
				2561	The reason for this is that although the debug info output by gcc
				2562	indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
				2563	doesn't indicate the name of the function in <code>foo.h</code>, so
				2564	Valgrind keeps using the old one.<p>
				2565
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2566	<li>Sometimes, the same filename might be represented with a relative name
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2567	and with an absolute name in different parts of the debug info, eg:
				2568	<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
				2569	case, if you use auto-annotation, the file will be annotated twice with
				2570	the counts split between the two.<p>
				2571	</li>
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2572
				2573	<li>Files with more than 65,535 lines cause difficulties for the stabs debug
				2574	info reader. This is because the line number in the <code>struct
				2575	nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit
				2576	number. Valgrind can handle some files with more than 65,535 lines
				2577	correctly by making some guesses to identify line number overflows. But
				2578	some cases are beyond it, in which case you'll get a warning message
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame^]	2579	explaining that annotations for the file might be incorrect.<p>
				2580	</li>
				2581
				2582	<li>If you compile some files with <code>-g</code> and some without, some
				2583	events that take place in a file without debug info could be attributed
				2584	to the last line of a file with debug info (whichever one gets placed
				2585	before the non-debug-info file in the executable).<p>
njn	7efaa11	2002-05-07 10:26:57 +0000	[diff] [blame]	2586	</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2587	</ul>
				2588
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame^]	2589	This list looks long, but these cases should be fairly rare.<p>
				2590
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2591	Note: stabs is not an easy format to read. If you come across bizarre
				2592	annotations that look like they might be caused by a bug in the stabs reader,
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame^]	2593	please let us know.<p>
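The assembler-padding case in the list above comes down to how addresses are mapped back to source lines. The Python sketch below models that under simple assumptions: the debug info is treated as a sorted table of (start address, line) entries, and each address is attributed to the nearest entry at or below it. The addresses are those from the objdump listing; the line numbers 1 to 6 and the name <code>line_for_address</code> are made up for illustration, and this is not Valgrind's actual reader.

```python
import bisect
from collections import Counter

# (start address, source line) pairs, sorted by address.
# There is no entry for the padding instruction at 0x8048f32.
LINE_TABLE = [
    (0x8048f25, 1),  # leal -12(%ebp),%eax
    (0x8048f28, 2),  # movl %eax,84(%ebx)
    (0x8048f2b, 3),  # movl $1,-20(%ebp)
    (0x8048f34, 5),  # movl $.LnrB,%eax  (line 4 is the .align directive)
    (0x8048f39, 6),  # movl %eax,-16(%ebp)
]


def line_for_address(table, addr):
    """Attribute addr to the entry with the largest start <= addr."""
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address precedes the first entry")
    return table[i][1]


if __name__ == "__main__":
    # Every instruction, including the padding one, executes once.
    executed = [0x8048f25, 0x8048f28, 0x8048f2b,
                0x8048f32, 0x8048f34, 0x8048f39]
    counts = Counter(line_for_address(LINE_TABLE, a) for a in executed)
    for line in range(1, 7):
        print(line, counts.get(line, "."))
```

Because the padding instruction at 0x8048f32 has no entry of its own, its count lands on line 3, which therefore appears to execute twice while the <code>.align</code> line shows no counts.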
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2594
				2595
				2596	<h3>7.11  Accuracy</h3>
				2597	Valgrind's cache profiling has a number of shortcomings:
				2598
				2599	<ul>
				2600	<li>It doesn't account for kernel activity -- the effect of system calls on
				2601	the cache contents is ignored.</li><p>
				2602
				2603	<li>It doesn't account for other process activity (although this is probably
				2604	desirable when considering a single program).</li><p>
				2605
				2606	<li>It doesn't account for virtual-to-physical address mappings; hence the
				2607	entire simulation is not a true representation of what's happening in the
				2608	cache.</li><p>
				2609
				2610	<li>It doesn't account for cache misses not visible at the instruction level,
				2611	eg. those arising from TLB misses, or speculative execution.</li><p>
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2612
njn	bff8876	2002-05-13 20:27:54 +0000	[diff] [blame^]	2613	<li>Valgrind's custom <code>malloc()</code> will allocate memory in different
				2614	ways to the standard <code>malloc()</code>, which could warp the results.
				2615	</li><p>
				2616
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2617	<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
				2618	will incorrectly be counted as doing a data read if both the arguments
				2619	are registers, eg:
				2620
				2621	<blockquote><code>btsl %eax, %edx</code></blockquote>
				2622
				2623	This should only happen rarely.
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2624	</ul>
				2625
				2626	Another thing worth noting is that results are very sensitive. Changing the
				2627	size of the <code>valgrind.so</code> file, the size of the program being
				2628	profiled, or even the length of its name can perturb the results. Variations
				2629	will be small, but don't expect perfectly repeatable results if your program
				2630	changes at all.<p>
				2631
				2632	While these factors mean you shouldn't trust the results to be super-accurate,
				2633	hopefully they should be close enough to be useful.<p>
				2634
				2635
				2636	<h3>7.12  Todo</h3>
				2637	<ul>
				2638	<li>Use CPUID instruction to auto-identify cache configuration during
				2639	installation. This would save the user from having to know their cache
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2640	configuration and use vg_cachegen.</li>
				2641	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2642	<li>Program start-up/shut-down calls a lot of functions that aren't
				2643	interesting and just complicate the output. Would be nice to exclude
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2644	these somehow.</li>
				2645	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2646	</ul>
				2647	<hr width="100%">
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	2648	</body>
				2649	</html>