Blame - memcheck/docs/manual.html - platform/external/valgrind


sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1	<html>
				2	<head>
				3	<style type="text/css">
				4	body { background-color: #ffffff;
				5	color: #000000;
				6	font-family: Times, Helvetica, Arial;
				7	font-size: 14pt}
				8	h4 { margin-bottom: 0.3em}
				9	code { color: #000000;
				10	font-family: Courier;
				11	font-size: 13pt }
				12	pre { color: #000000;
				13	font-family: Courier;
				14	font-size: 13pt }
				15	a:link { color: #0000C0;
				16	text-decoration: none; }
				17	a:visited { color: #0000C0;
				18	text-decoration: none; }
				19	a:active { color: #0000C0;
				20	text-decoration: none; }
				21	</style>
				22	</head>
				23
				24	<body bgcolor="#ffffff">
				25
				26	<a name="title"> </a>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	27	<h1 align=center>Valgrind, snapshot 20020501</h1>
				28	<center>This manual was substantially updated on 20020501</center>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	29	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	30
				31	<center>
				32	<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	33	Copyright © 2000-2002 Julian Seward
				34	<p>
				35	Valgrind is licensed under the GNU General Public License,
				36	version 2<br>
				37	An open-source tool for finding memory-management problems in
				38	Linux-x86 executables.
				39	</center>
				40
				41	<p>
				42
				43	<hr width="100%">
				44	<a name="contents"></a>
				45	<h2>Contents of this manual</h2>
				46
				47	<h4>1  <a href="#intro">Introduction</a></h4>
				48	1.1  <a href="#whatfor">What Valgrind is for</a><br>
				49	1.2  <a href="#whatdoes">What it does with your program</a>
				50
				51	<h4>2  <a href="#howtouse">How to use it, and how to make sense
				52	of the results</a></h4>
				53	2.1  <a href="#starta">Getting started</a><br>
				54	2.2  <a href="#comment">The commentary</a><br>
				55	2.3  <a href="#report">Reporting of errors</a><br>
				56	2.4  <a href="#suppress">Suppressing errors</a><br>
				57	2.5  <a href="#flags">Command-line flags</a><br>
				58	2.6  <a href="#errormsgs">Explanation of error messages</a><br>
				59	2.7  <a href="#suppfiles">Writing suppressions files</a><br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	60	2.8  <a href="#clientreq">The Client Request mechanism</a><br>
				61	2.9  <a href="#pthreads">Support for POSIX pthreads</a><br>
				62	2.10  <a href="#install">Building and installing</a><br>
				63	2.11  <a href="#problems">If you have problems</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	64
				65	<h4>3  <a href="#machine">Details of the checking machinery</a></h4>
				66	3.1  <a href="#vvalue">Valid-value (V) bits</a><br>
				67	3.2  <a href="#vaddress">Valid-address (A) bits</a><br>
				68	3.3  <a href="#together">Putting it all together</a><br>
				69	3.4  <a href="#signals">Signals</a><br>
				70	3.5  <a href="#leaks">Memory leak detection</a><br>
				71
				72	<h4>4  <a href="#limits">Limitations</a></h4>
				73
				74	<h4>5  <a href="#howitworks">How it works -- a rough overview</a></h4>
				75	5.1  <a href="#startb">Getting started</a><br>
				76	5.2  <a href="#engine">The translation/instrumentation engine</a><br>
				77	5.3  <a href="#track">Tracking the status of memory</a><br>
				78	5.4  <a href="#sys_calls">System calls</a><br>
				79	5.5  <a href="#sys_signals">Signals</a><br>
				80
				81	<h4>6  <a href="#example">An example</a></h4>
				82
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	83	<h4>7  <a href="#cache">Cache profiling</a></h4>
				84
				85	<h4>8  <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	86
				87	<hr width="100%">
				88
				89	<a name="intro"></a>
				90	<h2>1  Introduction</h2>
				91
				92	<a name="whatfor"></a>
				93	<h3>1.1  What Valgrind is for</h3>
				94
				95	Valgrind is a tool to help you find memory-management problems in your
				96	programs. When a program is run under Valgrind's supervision, all
				97	reads and writes of memory are checked, and calls to
				98	malloc/new/free/delete are intercepted. As a result, Valgrind can
				99	detect problems such as:
				100	<ul>
				101	<li>Use of uninitialised memory</li>
				102	<li>Reading/writing memory after it has been free'd</li>
				103	<li>Reading/writing off the end of malloc'd blocks</li>
				104	<li>Reading/writing inappropriate areas on the stack</li>
				105	<li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li>
				106	</ul>
				107
				108	Problems like these can be difficult to find by other means, often
				109	lying undetected for long periods, then causing occasional,
				110	difficult-to-diagnose crashes.
				111
				112	<p>
				113	Valgrind is closely tied to details of the CPU, operating system and,
				114	to a lesser extent, the compiler and basic C libraries. This makes it
				115	difficult to make it portable, so I have chosen at the outset to
				116	concentrate on what I believe to be a widely used platform: Red Hat
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	117	Linux 7.2, on x86s. Valgrind uses the standard Unix
				118	<code>./configure</code>, <code>make</code>, <code>make install</code>
				119	mechanism, and I have attempted to ensure that it works on machines
				120	with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover
				121	the vast majority of modern Linux installations.
				122
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	123
				124	<p>
				125	Valgrind is licensed under the GNU General Public License, version
				126	2. Read the file LICENSE in the source distribution for details.
				127
				128	<a name="whatdoes"></a>
				129	<h3>1.2  What it does with your program</h3>
				130
				131	Valgrind is designed to be as non-intrusive as possible. It works
				132	directly with existing executables. You don't need to recompile,
				133	relink, or otherwise modify, the program to be checked. Simply place
				134	the word <code>valgrind</code> at the start of the command line
				135	normally used to run the program. So, for example, if you want to run
				136	the command <code>ls -l</code> on Valgrind, simply issue the
				137	command: <code>valgrind ls -l</code>.
				138
				139	<p>Valgrind takes control of your program before it starts. Debugging
				140	information is read from the executable and associated libraries, so
				141	that error messages can be phrased in terms of source code
				142	locations. Your program is then run on a synthetic x86 CPU which
				143	checks every memory access. All detected errors are written to a
				144	log. When the program finishes, Valgrind searches for and reports on
				145	leaked memory.
				146
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	147	<p>You can run pretty much any dynamically linked ELF x86 executable
				148	using Valgrind. Programs run 25 to 50 times slower, and take a lot
				149	more memory, than they usually would. It works well enough to run
				150	large programs. For example, the Konqueror web browser from the KDE
				151	Desktop Environment, version 3.0, runs slowly but usably on Valgrind.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	152
				153	<p>Valgrind simulates every single instruction your program executes.
				154	Because of this, it finds errors not only in your application but also
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	155	in all supporting dynamically-linked (<code>.so</code>-format)
				156	libraries, including the GNU C library, the X client libraries, Qt, if
				157	you work with KDE, and so on. That often includes libraries, for
				158	example the GNU C library, which contain memory access violations, but
				159	which you cannot or do not want to fix.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	160
				161	<p>Rather than swamping you with errors in which you are not
				162	interested, Valgrind allows you to selectively suppress errors, by
				163	recording them in a suppressions file which is read when Valgrind
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	164	starts up. The build mechanism attempts to select suppressions which
				165	give reasonable behaviour for the libc and XFree86 versions detected
				166	on your machine.
				167
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	168
				169	<p><a href="#example">Section 6</a> shows an example of use.
				170	<p>
				171	<hr width="100%">
				172
				173	<a name="howtouse"></a>
				174	<h2>2  How to use it, and how to make sense of the results</h2>
				175
				176	<a name="starta"></a>
				177	<h3>2.1  Getting started</h3>
				178
				179	First off, consider whether it might be beneficial to recompile your
				180	application and supporting libraries with optimisation disabled and
				181	debugging info enabled (the <code>-g</code> flag). You don't have to
				182	do this, but doing so helps Valgrind produce more accurate and less
				183	confusing error reports. Chances are you're set up like this already,
				184	if you intended to debug your program with GNU gdb, or some other
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	185	debugger.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	186
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	187	<p>
				188	A plausible compromise is to use <code>-g -O</code>.
				189	Optimisation levels above <code>-O</code> have been observed, on very
				190	rare occasions, to cause gcc to generate code which fools Valgrind's
				191	error tracking machinery into wrongly reporting uninitialised value
				192	errors. <code>-O</code> gets you the vast majority of the benefits of
				193	higher optimisation levels anyway, so you don't lose much there.
				194
				195	<p>
				196	Note that as of 1 May 2002 Valgrind does not understand the DWARF
				197	debugging format, which is unfortunate since the upcoming gcc-3.1 uses
				198	it by default. Valgrind only knows about the older "stabs" format.
				199	If you use gcc-3.1 or above, you can still ask for stabs-format debug
				200	info by passing <code>-gstabs</code> to gcc.
				201
				202	<p>
				203	Then just run your application, but place the word
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	204	<code>valgrind</code> in front of your usual command-line invocation.
				205	Note that you should run the real (machine-code) executable here. If
				206	your application is started by, for example, a shell or perl script,
				207	you'll need to modify it to invoke Valgrind on the real executables.
				208	Running such scripts directly under Valgrind will result in you
				209	getting error reports pertaining to <code>/bin/sh</code>,
				210	<code>/usr/bin/perl</code>, or whatever interpreter you're using.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	211	This almost certainly isn't what you want and can be confusing.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	212
				213	<a name="comment"></a>
				214	<h3>2.2  The commentary</h3>
				215
				216	Valgrind writes a commentary, detailing error reports and other
				217	significant events. By default the commentary goes to standard error
				218	(file descriptor 2). This may interfere with your program, so you can
				219	ask for it to be directed elsewhere.
				220
				221	<p>All lines in the commentary are of the following form:<br>
				222	<pre>
				223	==12345== some-message-from-Valgrind
				224	</pre>
				225	<p>The <code>12345</code> is the process ID. This scheme makes it easy
				226	to distinguish program output from Valgrind commentary, and also easy
				227	to differentiate commentaries from different processes which have
				228	become merged together, for whatever reason.
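<p>Because every commentary line carries the <code>==PID==</code> prefix,
separating Valgrind's output from the program's own is mechanical. The
following is a minimal sketch in C, assuming the prefix is exactly two
equals signs, a run of decimal digits, two more equals signs and a space.
The function name <code>parse_commentary_pid</code> is hypothetical and is
not part of Valgrind.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of Valgrind): if `line` starts with a
   "==PID== " prefix, return the PID and, when `msg` is non-NULL, store a
   pointer to the message text in *msg.  Otherwise return -1 and leave
   *msg untouched. */
long parse_commentary_pid(const char *line, const char **msg)
{
    const char *p = line;
    long pid = 0;

    if (p[0] != '=' || p[1] != '=')
        return -1;
    p += 2;

    if (!isdigit((unsigned char)*p))
        return -1;                      /* need at least one digit */
    while (isdigit((unsigned char)*p)) {
        pid = pid * 10 + (*p - '0');
        p++;
    }

    if (p[0] != '=' || p[1] != '=' || p[2] != ' ')
        return -1;
    if (msg != NULL)
        *msg = p + 3;                   /* text after "==PID== " */
    return pid;
}
```

<p>A log filter could call this on each line and group messages by the
returned PID, which is one way to untangle commentaries from several
processes that have been merged into one stream.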
				229
				230	<p>By default, Valgrind writes only essential messages to the commentary,
				231	so as to avoid flooding you with information of secondary importance.
				232	If you want more information about what is happening, re-run, passing
				233	the <code>-v</code> flag to Valgrind.
				234
				235
				236	<a name="report"></a>
				237	<h3>2.3  Reporting of errors</h3>
				238
				239	When Valgrind detects something bad happening in the program, an error
				240	message is written to the commentary. For example:<br>
				241	<pre>
				242	==25832== Invalid read of size 4
				243	==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
				244	==25832== by 0x80487AF: main (bogon.cpp:66)
				245	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				246	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				247	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				248	</pre>
				249
				250	<p>This message says that the program did an illegal 4-byte read of
				251	address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
				252	address, nor corresponds to any currently malloc'd or free'd blocks.
				253	The read is happening at line 45 of <code>bogon.cpp</code>, called
				254	from line 66 of the same file, etc. For errors associated with an
				255	identified malloc'd/free'd block, for example reading free'd memory,
				256	Valgrind reports not only the location where the error happened, but
				257	also where the associated block was malloc'd/free'd.
				258
				259	<p>Valgrind remembers all error reports. When an error is detected,
				260	it is compared against old reports, to see if it is a duplicate. If
				261	so, the error is noted, but no further commentary is emitted. This
				262	avoids you being swamped with bazillions of duplicate error reports.
				263
				264	<p>If you want to know how many times each error occurred, run with
				265	the <code>-v</code> option. When execution finishes, all the reports
				266	are printed out, along with, and sorted by, their occurrence counts.
				267	This makes it easy to see which errors have occurred most frequently.
				268
				269	<p>Errors are reported before the associated operation actually
				270	happens. For example, if your program decides to read from address
				271	zero, Valgrind will emit a message to this effect, and the program
				272	will then duly die with a segmentation fault.
				273
				274	<p>In general, you should try to fix errors in the order that they
				275	are reported. Not doing so can be confusing. For example, a program
				276	which copies uninitialised values to several memory locations, and
				277	later uses them, will generate several error messages. The first such
				278	error message may well give the most direct clue to the root cause of
				279	the problem.
				280
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	281	<p>The process of detecting duplicate errors is quite an expensive
				282	one and can become a significant performance overhead if your program
				283	generates huge quantities of errors. To avoid serious problems here,
				284	Valgrind will simply stop collecting errors after 300 different errors
				285	have been seen, or 30000 errors in total have been seen. In this
				286	situation you might as well stop your program and fix it, because
				287	Valgrind won't tell you anything else useful after this. Note that
				288	the 300/30000 limits apply after suppressed errors are removed. These
				289	limits are defined in <code>vg_include.h</code> and can be increased
				290	if necessary.
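<p>The interplay between duplicate detection and the two limits can be
sketched as follows. This is an illustration only, not the code in
Valgrind: the type and function names are hypothetical, and a single
string key stands in for whatever Valgrind really compares (error kind
plus call stack). The linear search hints at why duplicate detection
becomes costly when many distinct errors accumulate.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DISTINCT_ERRORS 300    /* limit quoted in the text */
#define MAX_TOTAL_ERRORS    30000  /* limit quoted in the text */
#define KEY_LEN             64     /* assumption: keys are truncated */

/* Hypothetical record of one distinct error and how often it occurred. */
typedef struct {
    char key[KEY_LEN];
    unsigned count;
} ErrorRecord;

typedef struct {
    ErrorRecord records[MAX_DISTINCT_ERRORS];
    unsigned n_distinct;
    unsigned n_total;
} ErrorLog;

/* Returns true if the error should be printed (first sighting), false
   if it is a duplicate or if collection has already stopped. */
bool record_error(ErrorLog *log, const char *key)
{
    if (log->n_distinct >= MAX_DISTINCT_ERRORS ||
        log->n_total >= MAX_TOTAL_ERRORS)
        return false;                   /* limits hit: stop collecting */

    log->n_total++;
    for (unsigned i = 0; i < log->n_distinct; i++) {
        if (strncmp(log->records[i].key, key, KEY_LEN - 1) == 0) {
            log->records[i].count++;    /* duplicate: count it, stay quiet */
            return false;
        }
    }

    ErrorRecord *r = &log->records[log->n_distinct++];
    strncpy(r->key, key, KEY_LEN - 1);
    r->key[KEY_LEN - 1] = '\0';
    r->count = 1;
    return true;
}
```

<p>The per-record <code>count</code> field corresponds to the occurrence
counts that are printed at the end of a run with <code>-v</code>.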
				291
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	292	<a name="suppress"></a>
				293	<h3>2.4  Suppressing errors</h3>
				294
				295	Valgrind detects numerous problems in the base libraries, such as the
				296	GNU C library, and the XFree86 client libraries, which come
				297	pre-installed on your GNU/Linux system. You can't easily fix these,
				298	but you don't want to see these errors (and yes, there are many!) So
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	299	Valgrind reads a list of errors to suppress at startup.
				300	A default suppression file is cooked up by the
				301	<code>./configure</code> script.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	302
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	303	<p>You can modify and add to the suppressions file at your leisure,
				304	or, better, write your own. Multiple suppression files are allowed.
				305	This is useful if part of your project contains errors you can't or
				306	don't want to fix, yet you don't want to continuously be reminded of
				307	them.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	308
				309	<p>Each error to be suppressed is described very specifically, to
				310	minimise the possibility that a suppression-directive inadvertently
				311	suppresses a bunch of similar errors which you did want to see. The
				312	suppression mechanism is designed to allow precise yet flexible
				313	specification of errors to suppress.
				314
				315	<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
				316	prints out one line for each used suppression, giving its name and the
				317	number of times it got used. Here's the suppressions used by a run of
				318	<code>ls -l</code>:
				319	<pre>
				320	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
				321	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
				322	--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
				323	</pre>
				324
				325	<a name="flags"></a>
				326	<h3>2.5  Command-line flags</h3>
				327
				328	You invoke Valgrind like this:
				329	<pre>
				330	valgrind [options-for-Valgrind] your-prog [options for your-prog]
				331	</pre>
				332
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	333	<p>Note that Valgrind also reads options from the environment variable
				334	<code>$VALGRIND</code>, and processes them before the command-line
				335	options.
				336
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	337	<p>Valgrind's default settings succeed in giving reasonable behaviour
				338	in most cases. Available options, in no particular order, are as
				339	follows:
				340	<ul>
				341	<li><code>--help</code></li><br>
				342
				343	<li><code>--version</code><br>
				344	<p>The usual deal.</li><br><p>
				345
				346	<li><code>-v --verbose</code><br>
				347	<p>Be more verbose. Gives extra information on various aspects
				348	of your program, such as: the shared objects loaded, the
				349	suppressions used, the progress of the instrumentation engine,
				350	and warnings about unusual behaviour.
				351	</li><br><p>
				352
				353	<li><code>-q --quiet</code><br>
				354	<p>Run silently, and only print error messages. Useful if you
				355	are running regression tests or have some other automated test
				356	machinery.
				357	</li><br><p>
				358
				359	<li><code>--demangle=no</code><br>
				360	<code>--demangle=yes</code> [the default]
				361	<p>Disable/enable automatic demangling (decoding) of C++ names.
				362	Enabled by default. When enabled, Valgrind will attempt to
				363	translate encoded C++ procedure names back to something
				364	approaching the original. The demangler handles symbols mangled
				365	by g++ versions 2.X and 3.X.
				366
				367	<p>An important fact about demangling is that function
				368	names mentioned in suppressions files should be in their mangled
				369	form. Valgrind does not demangle function names when searching
				370	for applicable suppressions, because to do otherwise would make
				371	suppressions file contents dependent on the state of Valgrind's
				372	demangling machinery, and would also be slow and pointless.
				373	</li><br><p>
				374
				375	<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
				376	<p>By default, Valgrind shows four levels of function call names
				377	to help you identify program locations. You can change that
				378	number with this option. This can help in determining the
				379	program's location in deeply-nested call chains. Note that errors
				380	are commoned up using only the top three function locations (the
				381	place in the current function, and that of its two immediate
				382	callers). So this doesn't affect the total number of errors
				383	reported.
				384	<p>
				385	The maximum value for this is 50. Note that higher settings
				386	will make Valgrind run a bit more slowly and take a bit more
				387	memory, but can be useful when working with programs with
				388	deeply-nested call chains.
				389	</li><br><p>
				390
				391	<li><code>--gdb-attach=no</code> [the default]<br>
				392	<code>--gdb-attach=yes</code>
				393	<p>When enabled, Valgrind will pause after every error shown,
				394	and print the line
				395	<br>
				396	<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
				397	<p>
				398	Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
				399	or <code>n</code> <code>Ret</code>, causes Valgrind not to
				400	start GDB for this error.
				401	<p>
				402	<code>Y</code> <code>Ret</code>
				403	or <code>y</code> <code>Ret</code> causes Valgrind to
				404	start GDB, for the program at this point. When you have
				405	finished with GDB, quit from it, and the program will continue.
				406	Trying to continue from inside GDB doesn't work.
				407	<p>
				408	<code>C</code> <code>Ret</code>
				409	or <code>c</code> <code>Ret</code> causes Valgrind not to
				410	start GDB, and not to ask again.
				411	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	412	<code>--gdb-attach=yes</code> conflicts with
				413	<code>--trace-children=yes</code>. You can't use them together.
				414	Valgrind refuses to start up in this situation. 1 May 2002:
				415	this is a historical relic which could be easily fixed if it
				416	gets in your way. Mail me and complain if this is a problem for
				417	you. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	418
				419	<li><code>--partial-loads-ok=yes</code> [the default]<br>
				420	<code>--partial-loads-ok=no</code>
				421	<p>Controls how Valgrind handles word (4-byte) loads from
				422	addresses for which some bytes are addressable and others
				423	are not. When <code>yes</code> (the default), such loads
				424	do not elicit an address error. Instead, the loaded V bytes
				425	corresponding to the illegal addresses indicate undefined, and
				426	those corresponding to legal addresses are loaded from shadow
				427	memory, as usual.
				428	<p>
				429	When <code>no</code>, loads from partially
				430	invalid addresses are treated the same as loads from completely
				431	invalid addresses: an illegal-address error is issued,
				432	and the resulting V bytes indicate valid data.
				433	</li><br><p>
				434
				435	<li><code>--sloppy-malloc=no</code> [the default]<br>
				436	<code>--sloppy-malloc=yes</code>
				437	<p>When enabled, all requests for malloc/calloc are rounded up
				438	to a whole number of machine words -- in other words, made
				439	divisible by 4. For example, a request for 17 bytes of space
				440	would result in a 20-byte area being made available. This works
				441	around bugs in sloppy libraries which assume that they can
				442	safely rely on malloc/calloc requests being rounded up in this
				443	fashion. Without the workaround, these libraries tend to
				444	generate large numbers of errors when they access the ends of
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	445	these areas.
				446	<p>
				447	Valgrind snapshots dated 17 Feb 2002 and later are
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	448	cleverer about this problem, and you should no longer need to
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	449	use this flag. To put it bluntly, if you do need to use this
				450	flag, your program violates the ANSI C semantics defined for
				451	<code>malloc</code> and <code>free</code>, even if it appears to
				452	work correctly, and you should fix it, at least if you hope for
				453	maximum portability.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	454	</li><br><p>
				455
				456	<li><code>--trace-children=no</code> [the default]<br>
				457	<code>--trace-children=yes</code>
				458	<p>When enabled, Valgrind will trace into child processes. This
				459	is confusing and usually not what you want, so is disabled by
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	460	default. As of 1 May 2002, tracing into a child process from a
				461	parent which uses <code>libpthread.so</code> is probably broken
				462	and likely to misbehave. Please report any such
				463	problems to me. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	464
				465	<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
				466	<p>When the client program releases memory using free (in C) or
				467	delete (C++), that memory is not immediately made available for
				468	re-allocation. Instead it is marked inaccessible and placed in
				469	a queue of freed blocks. The purpose is to delay the point at
				470	which freed-up memory comes back into circulation. This
				471	increases the chance that Valgrind will be able to detect
				472	invalid accesses to blocks for some significant period of time
				473	after they have been freed.
				474	<p>
				475	This flag specifies the maximum total size, in bytes, of the
				476	blocks in the queue. The default value is one million bytes.
				477	Increasing this increases the total amount of memory used by
				478	Valgrind but may detect invalid uses of freed blocks which would
				479	otherwise go undetected.</li><br><p>
				480
				481	<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
				482	<p>Specifies the file descriptor on which Valgrind communicates
				483	all of its messages. The default, 2, is the standard error
				484	channel. This may interfere with the client's own use of
				485	stderr. To dump Valgrind's commentary in a file without using
				486	stderr, something like the following works well (sh/bash
				487	syntax):<br>
				488	<code>
				489	valgrind --logfile-fd=9 my_prog 9&gt; logfile</code><br>
				490	That is: tell Valgrind to send all output to file descriptor 9,
				491	and ask the shell to route file descriptor 9 to "logfile".
				492	</li><br><p>
				493
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	494	<li><code>--suppressions=&lt;filename&gt;</code>
				495	[default: $PREFIX/lib/valgrind/default.supp]
				496	<p>Specifies an extra
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	497	file from which to read descriptions of errors to suppress. You
				498	may use as many extra suppressions files as you
				499	like.</li><br><p>
				500
				501	<li><code>--leak-check=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	502	<code>--leak-check=yes</code>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	503	<p>When enabled, search for memory leaks when the client program
				504	finishes. A memory leak means a malloc'd block, which has not
				505	yet been free'd, but to which no pointer can be found. Such a
				506	block can never be free'd by the program, since no pointer to it
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	507	exists. Leak checking is disabled by default because it tends
				508	to generate dozens of error messages. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	509
				510	<li><code>--show-reachable=no</code> [default]<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	511	<code>--show-reachable=yes</code>
				512	<p>When disabled, the memory leak detector only shows blocks to
				513	which it cannot find any pointer at all, or to which it can only find
				514	a pointer into the middle. These blocks are prime candidates for
				515	memory leaks. When enabled, the leak detector also reports on
				516	blocks which it could find a pointer to. Your program could, at
				517	least in principle, have freed such blocks before exit.
				518	Contrast this to blocks for which no pointer, or only an
				519	interior pointer could be found: they are more likely to
				520	indicate memory leaks, because you do not actually have a
				521	pointer to the start of the block which you can hand to
				522	<code>free</code>, even if you wanted to. </li><br><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	523
				524	<li><code>--leak-resolution=low</code> [default]<br>
				525	<code>--leak-resolution=med</code> <br>
				526	<code>--leak-resolution=high</code>
				527	<p>When doing leak checking, determines how willing Valgrind is
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	528	to consider different backtraces to be the same. When set to
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	529	<code>low</code>, the default, only the first two entries need
				530	match. When <code>med</code>, four entries have to match. When
				531	<code>high</code>, all entries need to match.
				532	<p>
				533	For hardcore leak debugging, you probably want to use
				534	<code>--leak-resolution=high</code> together with
				535	<code>--num-callers=40</code> or some such large number. Note
				536	however that this can give an overwhelming amount of
				537	information, which is why the defaults are 4 callers and
				538	low-resolution matching.
				539	<p>
				540	Note that the <code>--leak-resolution=</code> setting does not
				541	affect Valgrind's ability to find leaks. It only changes how
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	542	the results are presented.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	543	</li><br><p>
				544
				545	<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
				546	<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
				547	assume that reads and writes some small distance below the stack
				548	pointer <code>%esp</code> are due to bugs in gcc 2.96, and do
				549	not report them. The "small distance" is 256 bytes by default.
				550	Note that gcc 2.96 is the default compiler on some popular Linux
				551	distributions (RedHat 7.X, Mandrake) and so you may well need to
				552	use this flag. Do not use it if you do not have to, as it can
				553	cause real errors to be overlooked. A better option is to use a
				554	gcc/g++ which works properly; 2.95.3 seems to be a good choice.
				555	<p>
				556	Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	557	buggy, so you may need to issue this flag if you use 3.0.4. A
				558	while later (early Apr 02) this is confirmed as a scheduling bug
				559	in g++-3.0.4.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	560	</li><br><p>
				561
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	562	<li><code>--cachesim=no</code> [default]<br>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	563	<code>--cachesim=yes</code> <p>When enabled, turns off memory
				564	checking, and turns on cache profiling. Cache profiling is
				565	described in detail in <a href="#cache">Section 7</a>. </li><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	566	</ul>
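<p>The deferred-release scheme described under
<code>--freelist-vol</code> can be pictured as a FIFO queue with a cap on
its total byte count. The sketch below is not Valgrind's implementation.
All names are hypothetical, and it assumes that the oldest blocks are the
first to be released once the cap is exceeded.

```c
#include <stdlib.h>

/* Hypothetical queue entry for one block the client has freed. */
typedef struct FreedBlock {
    void *payload;              /* the block the client freed */
    size_t size;                /* its size in bytes */
    struct FreedBlock *next;
} FreedBlock;

typedef struct {
    FreedBlock *head;           /* oldest freed block */
    FreedBlock *tail;           /* most recently freed block */
    size_t volume;              /* total bytes currently queued */
    size_t max_volume;          /* cap, cf. --freelist-vol */
} FreeQueue;

/* Called in place of free(): queue the block instead of releasing it.
   Returns how many old blocks were really released to respect the cap. */
int quarantine_free(FreeQueue *q, void *payload, size_t size)
{
    int released = 0;
    FreedBlock *b = malloc(sizeof *b);
    if (b == NULL) {            /* no room for bookkeeping: free at once */
        free(payload);
        return 0;
    }
    b->payload = payload;
    b->size = size;
    b->next = NULL;
    if (q->tail != NULL)
        q->tail->next = b;
    else
        q->head = b;
    q->tail = b;
    q->volume += size;

    /* Release the oldest blocks until the queue fits under the cap. */
    while (q->volume > q->max_volume && q->head != NULL) {
        FreedBlock *old = q->head;
        q->head = old->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->volume -= old->size;
        free(old->payload);     /* memory finally returns to circulation */
        free(old);
        released++;
    }
    return released;
}
```

<p>A larger cap keeps freed blocks out of circulation for longer, which is
the trade-off the flag description mentions: more memory used, but a longer
window in which accesses to freed blocks can be caught.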
				567
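<p>The three <code>--leak-resolution</code> settings amount to comparing a
different number of leading backtrace entries. A minimal sketch, assuming
a backtrace is an array of return addresses with the innermost frame
first, and that traces shorter than the required depth match only if they
end at the same point. The names are hypothetical and the real comparison
may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { RES_LOW, RES_MED, RES_HIGH } LeakResolution;

/* Hypothetical comparison: do two backtraces count as "the same"
   at the given resolution? */
bool same_backtrace(const unsigned long *a, size_t a_len,
                    const unsigned long *b, size_t b_len,
                    LeakResolution res)
{
    /* Number of leading entries that must agree: 2, 4, or all. */
    size_t need;
    switch (res) {
    case RES_LOW: need = 2; break;
    case RES_MED: need = 4; break;
    default:      need = (a_len > b_len) ? a_len : b_len; break;
    }

    for (size_t i = 0; i < need; i++) {
        bool a_has = i < a_len;
        bool b_has = i < b_len;
        if (!a_has && !b_has)
            return true;        /* both traces ended: everything matched */
        if (a_has != b_has || a[i] != b[i])
            return false;       /* one trace ended early, or entries differ */
    }
    return true;
}
```

<p>Two leaks whose backtraces differ only in the third frame would be
merged into one report at <code>low</code> but listed separately at
<code>med</code> and <code>high</code>.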
				568	There are also some options for debugging Valgrind itself. You
				569	shouldn't need to use them in the normal run of things. Nevertheless:
				570
				571	<ul>
				572
				573	<li><code>--single-step=no</code> [default]<br>
				574	<code>--single-step=yes</code>
				575	<p>When enabled, each x86 insn is translated separately into
				576	instrumented code. When disabled, translation is done on a
				577	per-basic-block basis, giving much better translations.</li><br>
				578	<p>
				579
				580	<li><code>--optimise=no</code><br>
				581	<code>--optimise=yes</code> [default]
				582	<p>When enabled, various improvements are applied to the
				583	intermediate code, mainly aimed at allowing the simulated CPU's
				584	registers to be cached in the real CPU's registers over several
				585	simulated instructions.</li><br>
				586	<p>
				587
				588	<li><code>--instrument=no</code><br>
				589	<code>--instrument=yes</code> [default]
				590	<p>When disabled, the translations don't actually contain any
				591	instrumentation.</li><br>
				592	<p>
				593
				594	<li><code>--cleanup=no</code><br>
				595	<code>--cleanup=yes</code> [default]
<p>When enabled, various improvements are applied to the
				597	post-instrumented intermediate code, aimed at removing redundant
				598	value checks.</li><br>
				599	<p>
				600
				601	<li><code>--trace-syscalls=no</code> [default]<br>
				602	<code>--trace-syscalls=yes</code>
				603	<p>Enable/disable tracing of system call intercepts.</li><br>
				604	<p>
				605
				606	<li><code>--trace-signals=no</code> [default]<br>
				607	<code>--trace-signals=yes</code>
				608	<p>Enable/disable tracing of signal handling.</li><br>
				609	<p>
				610
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	611	<li><code>--trace-sched=no</code> [default]<br>
				612	<code>--trace-sched=yes</code>
				613	<p>Enable/disable tracing of thread scheduling events.</li><br>
				614	<p>
				615
sewardj	45b4b37	2002-04-16 22:50:32 +0000	[diff] [blame]	616	<li><code>--trace-pthread=none</code> [default]<br>
				617	<code>--trace-pthread=some</code> <br>
				618	<code>--trace-pthread=all</code>
				619	<p>Specifies amount of trace detail for pthread-related events.</li><br>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	620	<p>
				621
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	622	<li><code>--trace-symtab=no</code> [default]<br>
				623	<code>--trace-symtab=yes</code>
				624	<p>Enable/disable tracing of symbol table reading.</li><br>
				625	<p>
				626
				627	<li><code>--trace-malloc=no</code> [default]<br>
				628	<code>--trace-malloc=yes</code>
				629	<p>Enable/disable tracing of malloc/free (et al) intercepts.
				630	</li><br>
				631	<p>
				632
				633	<li><code>--stop-after=<number></code>
				634	[default: infinity, more or less]
				635	<p>After <number> basic blocks have been executed, shut down
				636	Valgrind and switch back to running the client on the real CPU.
				637	</li><br>
				638	<p>
				639
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	640	<li><code>--dump-error=<number></code> [default: inactive]
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	641	<p>After the program has exited, show gory details of the
				642	translation of the basic block containing the <number>'th
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	643	error context. When used with <code>--single-step=yes</code>,
				644	can show the exact x86 instruction causing an error. This is
				645	all fairly dodgy and doesn't work at all if threads are
				646	involved.</li><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	647	<p>
				648
				649	<li><code>--smc-check=none</code><br>
				650	<code>--smc-check=some</code> [default]<br>
				651	<code>--smc-check=all</code>
				652	<p>How carefully should Valgrind check for self-modifying code
				653	writes, so that translations can be discarded?  When
				654	"none", no writes are checked. When "some", only writes
				655	resulting from moves from integer registers to memory are
checked.  When "all", all memory writes are checked, even those
with which no sane program would generate code -- for
example, floating-point writes.
				659	<p>
				660	NOTE that this is all a bit bogus. This mechanism has never
				661	been enabled in any snapshot of Valgrind which was made
				662	available to the general public, because the extra checks reduce
				663	performance, increase complexity, and I have yet to come across
				664	any programs which actually use self-modifying code. I think
				665	the flag is ignored.
				666	</li>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	667	</ul>
				668
				669
<a name="errormsgs"></a>
<h3>2.6  Explanation of error messages</h3>

Despite considerable sophistication under the hood, Valgrind can only
really detect two kinds of errors: use of illegal addresses, and use
of undefined values.  Nevertheless, this is enough to help you
				676	discover all sorts of memory-management nasties in your code. This
				677	section presents a quick summary of what error messages mean. The
				678	precise behaviour of the error-checking machinery is described in
				679	<a href="#machine">Section 4</a>.
				680
				681
				682	<h4>2.6.1  Illegal read / Illegal write errors</h4>
				683	For example:
				684	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	685	Invalid read of size 4
				686	at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
				687	by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
				688	by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
				689	by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
				690	Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	691	</pre>
				692
				693	<p>This happens when your program reads or writes memory at a place
				694	which Valgrind reckons it shouldn't. In this example, the program did
				695	a 4-byte read at address 0xBFFFF0E0, somewhere within the
				696	system-supplied library libpng.so.2.1.0.9, which was called from
				697	somewhere else in the same library, called from line 326 of
				698	qpngio.cpp, and so on.
				699
<p>Valgrind tries to establish what the illegal address might relate
to, since that's often useful.  So, if it points into a block of
memory which has already been freed, you'll be informed of this, and
also of where the block was freed.  Likewise, if it should turn out
to be just off the end of a malloc'd block, a common result of
off-by-one errors in array subscripting, you'll be informed of this
fact, and also of where the block was malloc'd.
				707
				708	<p>In this example, Valgrind can't identify the address. Actually the
				709	address is on the stack, but, for some reason, this is not a valid
				710	stack address -- it is below the stack pointer, %esp, and that isn't
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	711	allowed. In this particular case it's probably caused by gcc
				712	generating invalid code, a known bug in various flavours of gcc.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	713
				714	<p>Note that Valgrind only tells you that your program is about to
				715	access memory at an illegal address. It can't stop the access from
				716	happening. So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
				718	fate -- but you will get a message from Valgrind immediately prior to
				719	this. In this particular example, reading junk on the stack is
				720	non-fatal, and the program stays alive.
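<p>
The "just off the end of a malloc'd block" case mentioned above is
easy to reproduce.  The following is a minimal sketch, not taken from
Valgrind's test suite; the function name and the
<code>DEMONSTRATE_BUG</code> macro are made up for illustration.
Built normally it is correct; built with
<code>-DDEMONSTRATE_BUG</code> it reads one <code>int</code> past the
end of the block, which is the kind of access an invalid-read report
would be expected for.

```c
#include <stdlib.h>

/* Fills a heap block of n ints with 1s and returns their sum.
   Build with -DDEMONSTRATE_BUG to get the off-by-one loop bound,
   which reads a[n], one element past the end of the block. */
int sum_block(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        a[i] = 1;

    int total = 0;
#ifdef DEMONSTRATE_BUG
    for (size_t i = 0; i <= n; i++)   /* BUG: a[n] is outside the block */
#else
    for (size_t i = 0; i < n; i++)    /* correct bound */
#endif
        total += a[i];

    free(a);
    return total;
}
```

<p>
Run natively, the buggy build will most likely appear to work, which
is precisely what makes this class of error hard to find without a
tool.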
				721
				722
				723	<h4>2.6.2  Use of uninitialised values</h4>
				724	For example:
				725	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	726	Conditional jump or move depends on uninitialised value(s)
				727	at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
				728	by 0x402E8476: _IO_printf (printf.c:36)
				729	by 0x8048472: main (tests/manuel1.c:8)
				730	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	731	</pre>
				732
				733	<p>An uninitialised-value use error is reported when your program uses
				734	a value which hasn't been initialised -- in other words, is undefined.
				735	Here, the undefined value is used somewhere inside the printf()
				736	machinery of the C library. This error was reported when running the
				737	following small program:
<pre>
#include <stdio.h>

int main(void)
{
  int x;   /* deliberately left uninitialised */
  printf ("x = %d\n", x);
  return 0;
}
</pre>
				745
				746	<p>It is important to understand that your program can copy around
				747	junk (uninitialised) data to its heart's content. Valgrind observes
				748	this and keeps track of the data, but does not complain. A complaint
				749	is issued only when your program attempts to make use of uninitialised
				750	data. In this example, x is uninitialised. Valgrind observes the
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	751	value being passed to _IO_printf and thence to _IO_vfprintf, but makes
				752	no comment. However, _IO_vfprintf has to examine the value of x so it
				753	can turn it into the corresponding ASCII string, and it is at this
				754	point that Valgrind complains.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	755
				756	<p>Sources of uninitialised data tend to be:
				757	<ul>
				758	<li>Local variables in procedures which have not been initialised,
				759	as in the example above.</li><br><p>
				760
				761	<li>The contents of malloc'd blocks, before you write something
				762	there. In C++, the new operator is a wrapper round malloc, so
				763	if you create an object with new, its fields will be
				764	uninitialised until you fill them in, which is only Right and
				765	Proper.</li>
				766	</ul>
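<p>
The difference between copying undefined data and actually using it
can be sketched as follows.  This is an illustrative example only;
the function name and the <code>DEMONSTRATE_BUG</code> macro are
hypothetical.  Built normally, the source block is zeroed first and
nothing is wrong.  Built with <code>-DDEMONSTRATE_BUG</code>, the
<code>memcpy</code> merely moves junk around, and it is the later
test on <code>dst[0]</code> where a complaint would be expected
(assuming the compiler really does emit a conditional jump for it).

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the first byte of a copied 16-byte block is non-zero,
   0 if it is zero, and -1 if allocation fails. */
int first_byte_set(void)
{
    unsigned char *src = malloc(16);
    unsigned char *dst = malloc(16);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

#ifndef DEMONSTRATE_BUG
    memset(src, 0, 16);      /* give the block defined contents */
#endif
    memcpy(dst, src, 16);    /* copying: undefinedness travels along */

    int result = 0;
    if (dst[0] != 0)         /* using: the decision depends on the value */
        result = 1;

    free(src);
    free(dst);
    return result;
}
```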
				767
				768
				769
				770	<h4>2.6.3  Illegal frees</h4>
				771	For example:
				772	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	773	Invalid free()
				774	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				775	by 0x80484C7: main (tests/doublefree.c:10)
				776	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				777	by 0x80483B1: (within tests/doublefree)
				778	Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
				779	at 0x4004FFDF: free (ut_clientmalloc.c:577)
				780	by 0x80484C7: main (tests/doublefree.c:10)
				781	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				782	by 0x80483B1: (within tests/doublefree)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	783	</pre>
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether or not the argument to
free/delete is legitimate.  Here, this test program has
freed the same block twice.  As with the illegal read/write errors,
Valgrind attempts to make sense of the address free'd.  If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
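<p>
A double free of the kind reported above takes only a few lines to
write.  The sketch below is not the actual
<code>tests/doublefree.c</code>; the function name and the
<code>DEMONSTRATE_BUG</code> macro are invented for illustration, and
the block size of 177 is borrowed from the report.  The second
<code>free</code> is only compiled in when
<code>-DDEMONSTRATE_BUG</code> is given.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates a 177-byte block, fills it, and releases it.
   Returns the last byte written, or -1 if allocation fails. */
int use_and_release(void)
{
    char *p = malloc(177);
    if (p == NULL)
        return -1;
    memset(p, 'x', 177);
    int last = p[176];

    free(p);
#ifdef DEMONSTRATE_BUG
    free(p);     /* BUG: the block has already been freed */
#endif
    p = NULL;    /* free(NULL) is a no-op, so a stray later free(p)
                    would be harmless */
    return last;
}
```

<p>
Nulling the pointer after freeing it is a cheap defensive habit, but
it only protects that one pointer, not any copies of it held
elsewhere.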
				791
				792
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	793	<h4>2.6.4  When a block is freed with an inappropriate
				794	deallocation function</h4>
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame^]	795	In the following example, a block allocated with <code>new []</code>
				796	has wrongly been deallocated with <code>free</code>:
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	797	<pre>
				798	Mismatched free() / delete / delete []
sewardj	7c062c9	2002-05-01 21:46:38 +0000	[diff] [blame^]	799	at 0x40043249: free (vg_clientfuncs.c:171)
				800	by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
				801	by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
				802	by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
				803	Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
				804	at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
				805	by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
				806	by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
				807	by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	808	</pre>
The following was told to me by the KDE 3 developers.  I didn't know
				810	any of it myself. They also implemented the check itself.
				811	<p>
				812	In C++ it's important to deallocate memory in a way compatible with
				813	how it was allocated. The deal is:
				814	<ul>
				815	<li>If allocated with <code>malloc</code>, <code>calloc</code>,
				816	<code>realloc</code>, <code>valloc</code> or
				817	<code>memalign</code>, you must deallocate with <code>free</code>.
				818	<li>If allocated with <code>new []</code>, you must deallocate with
				819	<code>delete []</code>.
				820	<li>If allocated with <code>new</code>, you must deallocate with
				821	<code>delete</code>.
				822	</ul>
				823	The worst thing is that on Linux apparently it doesn't matter if you
				824	do muddle these up, and it all seems to work ok, but the same program
				825	may then crash on a different platform, Solaris for example. So it's
				826	best to fix it properly. According to the KDE folks "it's amazing how
				827	many C++ programmers don't know this".
				828
				829
				830
				831	<h4>2.6.5  Passing system call parameters with inadequate
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	832	read/write permissions</h4>
				833
Valgrind checks all parameters to system calls.  If a system call
needs to read from a buffer provided by your program, Valgrind checks
that the entire buffer is addressable and has valid data, i.e., it is
readable.  And if the system call needs to write to a user-supplied
buffer, Valgrind checks that the buffer is addressable.  After the
				839	system call, Valgrind updates its administrative information to
				840	precisely reflect any changes in memory permissions caused by the
				841	system call.
				842
				843	<p>Here's an example of a system call with an invalid parameter:
				844	<pre>
				845	#include <stdlib.h>
				846	#include <unistd.h>
				847	int main( void )
				848	{
				849	char* arr = malloc(10);
				850	(void) write( 1 /* stdout */, arr, 10 );
				851	return 0;
				852	}
				853	</pre>
				854
				855	<p>You get this complaint ...
				856	<pre>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	857	Syscall param write(buf) contains uninitialised or unaddressable byte(s)
				858	at 0x4035E072: __libc_write
				859	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				860	by 0x80483B1: (within tests/badwrite)
				861	by <bogus frame pointer> ???
				862	Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
				863	at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
				864	by 0x80484A0: main (tests/badwrite.c:6)
				865	by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				866	by 0x80483B1: (within tests/badwrite)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	867	</pre>
				868
				869	<p>... because the program has tried to write uninitialised junk from
				870	the malloc'd block to the standard output.
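<p>
The cure is to give the buffer defined contents before handing it to
the system call.  A minimal sketch, assuming a helper function
(<code>write_block</code> is a made-up name) that takes the file
descriptor as a parameter rather than hard-wiring standard output:

```c
#include <stdlib.h>
#include <unistd.h>

/* Writes a 10-byte heap block to fd and returns whatever write()
   returned, or -1 if allocation fails. */
ssize_t write_block(int fd)
{
    char *arr = calloc(10, 1);   /* calloc zero-fills the block, so
                                    every byte is defined */
    if (arr == NULL)
        return -1;
    ssize_t n = write(fd, arr, 10);
    free(arr);
    return n;
}
```

<p>
Filling a <code>malloc</code>'d block with <code>memset</code> or
with real data before the <code>write</code> would do just as well;
the point is only that all 10 bytes have been written to at least
once.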
				871
				872
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	873	<h4>2.6.6  Warning messages you might see</h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	874
				875	Most of these only appear if you run in verbose mode (enabled by
				876	<code>-v</code>):
				877	<ul>
				878	<li> <code>More than 50 errors detected. Subsequent errors
				879	will still be recorded, but in less detail than before.</code>
				880	<br>
				881	After 50 different errors have been shown, Valgrind becomes
				882	more conservative about collecting them. It then requires only
				883	the program counters in the top two stack frames to match when
				884	deciding whether or not two errors are really the same one.
				885	Prior to this point, the PCs in the top four frames are required
				886	to match. This hack has the effect of slowing down the
				887	appearance of new errors after the first 50. The 50 constant can
				888	be changed by recompiling Valgrind.
				889	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	890	<li> <code>More than 300 errors detected. I'm not reporting any more.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	891	Final error counts may be inaccurate. Go fix your
				892	program!</code>
				893	<br>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	894	After 300 different errors have been detected, Valgrind ignores
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	895	any more. It seems unlikely that collecting even more different
				896	ones would be of practical help to anybody, and it avoids the
				897	danger that Valgrind spends more and more of its time comparing
new errors against an ever-growing collection.  As above, the 300
				899	number is a compile-time constant.
				900	<p>
				901	<li> <code>Warning: client exiting by calling exit(<number>).
				902	Bye!</code>
				903	<br>
				904	Your program has called the <code>exit</code> system call, which
				905	will immediately terminate the process. You'll get no exit-time
				906	error summaries or leak checks. Note that this is not the same
				907	as your program calling the ANSI C function <code>exit()</code>
				908	-- that causes a normal, controlled shutdown of Valgrind.
				909	<p>
				910	<li> <code>Warning: client switching stacks?</code>
				911	<br>
				912	Valgrind spotted such a large change in the stack pointer, %esp,
				913	that it guesses the client is switching to a different stack.
				914	At this point it makes a kludgey guess where the base of the new
				915	stack is, and sets memory permissions accordingly. You may get
				916	many bogus error messages following this, if Valgrind guesses
				917	wrong. At the moment "large change" is defined as a change of
more than 2000000 in the value of the %esp (stack pointer)
				919	register.
				920	<p>
				921	<li> <code>Warning: client attempted to close Valgrind's logfile fd <number>
				922	</code>
				923	<br>
				924	Valgrind doesn't allow the client
				925	to close the logfile, because you'd never see any diagnostic
				926	information after that point. If you see this message,
				927	you may want to use the <code>--logfile-fd=<number></code>
				928	option to specify a different logfile file-descriptor number.
				929	<p>
				930	<li> <code>Warning: noted but unhandled ioctl <number></code>
				931	<br>
				932	Valgrind observed a call to one of the vast family of
				933	<code>ioctl</code> system calls, but did not modify its
				934	memory status info (because I have not yet got round to it).
				935	The call will still have gone through, but you may get spurious
				936	errors after this as a result of the non-update of the memory info.
				937	<p>
				938	<li> <code>Warning: unblocking signal <number> due to
				939	sigprocmask</code>
				940	<br>
				941	Really just a diagnostic from the signal simulation machinery.
				942	This message will appear if your program handles a signal by
				943	first <code>longjmp</code>ing out of the signal handler,
				944	and then unblocking the signal with <code>sigprocmask</code>
				945	-- a standard signal-handling idiom.
				946	<p>
				947	<li> <code>Warning: bad signal number <number> in __NR_sigaction.</code>
				948	<br>
				949	Probably indicates a bug in the signal simulation machinery.
				950	<p>
				951	<li> <code>Warning: set address range perms: large range <number></code>
				952	<br>
				953	Diagnostic message, mostly for my benefit, to do with memory
				954	permissions.
				955	</ul>
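<p>
The error-matching rule behind the "More than 50 errors" message can
be sketched roughly as follows.  This is an illustration only, not
Valgrind's source: the type and function names are hypothetical, and
a real comparison would presumably look at the kind of error as well
as the program counters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAMES_RECORDED 4    /* PCs kept per error in this sketch */

typedef struct {
    uint32_t pc[FRAMES_RECORDED];   /* pc[0] is the innermost frame */
} ErrorContext;

/* How many of the top frames must match: four until 50 different
   errors have been shown, two from then on. */
size_t frames_to_compare(unsigned errors_shown)
{
    return errors_shown < 50 ? 4 : 2;
}

/* True if a and b count as "the same error" under the current rule. */
bool same_error(const ErrorContext *a, const ErrorContext *b,
                unsigned errors_shown)
{
    size_t depth = frames_to_compare(errors_shown);
    for (size_t i = 0; i < depth; i++)
        if (a->pc[i] != b->pc[i])
            return false;
    return true;
}
```

<p>
Comparing fewer frames makes more errors look alike, which is why new
errors appear more slowly after the first 50.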
				956
				957
				958	<a name="suppfiles"></a>
				959	<h3>2.7  Writing suppressions files</h3>
				960
				961	A suppression file describes a bunch of errors which, for one reason
				962	or another, you don't want Valgrind to tell you about. Usually the
				963	reason is that the system libraries are buggy but unfixable, at least
				964	within the scope of the current debugging session. Multiple
suppressions files are allowed.  By default, Valgrind uses
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	966	<code>$PREFIX/lib/valgrind/default.supp</code>.
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	967
				968	<p>
				969	You can ask to add suppressions from another file, by specifying
				970	<code>--suppressions=/path/to/file.supp</code>.
				971
				972	<p>Each suppression has the following components:<br>
				973	<ul>
				974
				975	<li>Its name. This merely gives a handy name to the suppression, by
				976	which it is referred to in the summary of used suppressions
				977	printed out when a program finishes. It's not important what
				978	the name is; any identifying string will do.
				979	<p>
				980
				981	<li>The nature of the error to suppress. Either:
				982	<code>Value1</code>,
				983	<code>Value2</code>,
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	984	<code>Value4</code> or
				985	<code>Value8</code>,
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	986	meaning an uninitialised-value error when
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	987	using a value of 1, 2, 4 or 8 bytes.
				988	Or
				989	<code>Cond</code> (or its old name, <code>Value0</code>),
				990	meaning use of an uninitialised CPU condition code. Or:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	991	<code>Addr1</code>,
				992	<code>Addr2</code>,
				993	<code>Addr4</code> or
				994	<code>Addr8</code>, meaning an invalid address during a
				995	memory access of 1, 2, 4 or 8 bytes respectively. Or
				996	<code>Param</code>,
				997	meaning an invalid system call parameter error. Or
				998	<code>Free</code>, meaning an invalid or mismatching free.</li><br>
				999	<p>
				1000
<li>The "immediate location" specification.  For Value and Addr
errors, this is either the name of the function in which the error
occurred, or, failing that, the full path to the .so file
containing the error location.  For Param errors, it is the name of
the offending system call parameter.  For Free errors, it is the
name of the function doing the freeing (eg, <code>free</code>,
<code>__builtin_vec_delete</code>, etc).</li><br>
				1008	<p>
				1009
				1010	<li>The caller of the above "immediate location". Again, either a
				1011	function or shared-object name.</li><br>
				1012	<p>
				1013
				1014	<li>Optionally, one or two extra calling-function or object names,
				1015	for greater precision.</li>
				1016	</ul>
				1017
				1018	<p>
				1019	Locations may be either names of shared objects or wildcards matching
				1020	function names. They begin <code>obj:</code> and <code>fun:</code>
				1021	respectively. Function and object names to match against may use the
				1022	wildcard characters <code>*</code> and <code>?</code>.
				1023
				1024	A suppression only suppresses an error when the error matches all the
				1025	details in the suppression. Here's an example:
<pre>
{
__gconv_transform_ascii_internal/__mbrtowc/mbtowc
Value4
fun:__gconv_transform_ascii_internal
fun:__mbr*towc
fun:mbtowc
}
</pre>

<p>What this means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function whose name matches <code>__mbr*towc</code>,
when that is called from
<code>mbtowc</code>.  It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
<code>__gconv_transform_ascii_internal/__mbrtowc/mbtowc</code>.
				1044
				1045	<p>Another example:
				1046	<pre>
				1047	{
				1048	libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
				1049	Value4
				1050	obj:/usr/X11R6/lib/libX11.so.6.2
				1051	obj:/usr/X11R6/lib/libX11.so.6.2
				1052	obj:/usr/X11R6/lib/libXaw.so.7.0
				1053	}
				1054	</pre>
				1055
				1056	<p>Suppress any size 4 uninitialised-value error which occurs anywhere
				1057	in <code>libX11.so.6.2</code>, when called from anywhere in the same
				1058	library, when called from anywhere in <code>libXaw.so.7.0</code>. The
				1059	inexact specification of locations is regrettable, but is about all
				1060	you can hope for, given that the X11 libraries shipped with Red Hat
				1061	7.2 have had their symbol tables removed.
				1062
				1063	<p>Note -- since the above two examples did not make it clear -- that
				1064	you can freely mix the <code>obj:</code> and <code>fun:</code>
				1065	styles of description within a single suppression record.
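<p>
For readers unsure how the <code>*</code> and <code>?</code>
wildcards behave, here is a minimal matcher sketch.  It assumes the
usual shell-style meaning (<code>*</code> matches any run of
characters, including none; <code>?</code> matches exactly one
character).  That meaning is an assumption, as is the function name;
this is not Valgrind's own matching code.

```c
#include <stdbool.h>

/* Hypothetical matcher: '*' matches any run of characters (possibly
   empty), '?' matches exactly one character, anything else must
   match literally.  The whole of str must be consumed. */
bool wildcard_match(const char *pat, const char *str)
{
    if (*pat == '\0')
        return *str == '\0';
    if (*pat == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of str in turn. */
        for (;;) {
            if (wildcard_match(pat + 1, str))
                return true;
            if (*str == '\0')
                return false;
            str++;
        }
    }
    if (*str != '\0' && (*pat == '?' || *pat == *str))
        return wildcard_match(pat + 1, str + 1);
    return false;
}
```

<p>
Under these rules a pattern such as <code>__gconv_*</code> would
match <code>__gconv_transform_ascii_internal</code>, whereas
<code>libX11.so.6.?</code> would match <code>libX11.so.6.2</code>
but not <code>libX11.so.6.10</code>.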
				1066
				1067
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1068	<a name="clientreq"></a>
				1069	<h3>2.8  The Client Request mechanism</h3>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1070
				1071	Valgrind has a trapdoor mechanism via which the client program can
				1072	pass all manner of requests and queries to Valgrind. Internally, this
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1073	is used extensively to make malloc, free, signals, threads, etc, work,
				1074	although you don't see that.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1075	<p>
				1076	For your convenience, a subset of these so-called client requests is
				1077	provided to allow you to tell Valgrind facts about the behaviour of
your program, and conversely to make queries.  In particular, your
program can tell Valgrind about changes in memory range permissions
that Valgrind would not otherwise know about, which lets clients
get Valgrind to do arbitrary custom checks.
				1082	<p>
				1083	Clients need to include the header file <code>valgrind.h</code> to
				1084	make this work. The macros therein have the magical property that
				1085	they generate code in-line which Valgrind can spot. However, the code
				1086	does nothing when not run on Valgrind, so you are not forced to run
				1087	your program on Valgrind just because you use the macros in this file.
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1088	Also, you are not required to link your program with any extra
				1089	supporting libraries.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1090	<p>
				1091	A brief description of the available macros:
				1092	<ul>
				1093	<li><code>VALGRIND_MAKE_NOACCESS</code>,
				1094	<code>VALGRIND_MAKE_WRITABLE</code> and
				1095	<code>VALGRIND_MAKE_READABLE</code>. These mark address
				1096	ranges as completely inaccessible, accessible but containing
				1097	undefined data, and accessible and containing defined data,
				1098	respectively. Subsequent errors may have their faulting
addresses described in terms of these blocks.  Each macro returns
a "block handle", or zero when not run on Valgrind.
				1101	<p>
				1102	<li><code>VALGRIND_DISCARD</code>: At some point you may want
				1103	Valgrind to stop reporting errors in terms of the blocks
				1104	defined by the previous three macros. To do this, the above
				1105	macros return a small-integer "block handle". You can pass
				1106	this block handle to <code>VALGRIND_DISCARD</code>. After
				1107	doing so, Valgrind will no longer be able to relate
				1108	addressing errors to the user-defined block associated with
				1109	the handle. The permissions settings associated with the
				1110	handle remain in place; this just affects how errors are
				1111	reported, not whether they are reported. Returns 1 for an
				1112	invalid handle and 0 for a valid handle (although passing
				1113	invalid handles is harmless). Always returns 0 when not run
				1114	on Valgrind.
				1115	<p>
				1116	<li><code>VALGRIND_CHECK_NOACCESS</code>,
				1117	<code>VALGRIND_CHECK_WRITABLE</code> and
				1118	<code>VALGRIND_CHECK_READABLE</code>: check immediately
				1119	whether or not the given address range has the relevant
				1120	property, and if not, print an error message. Also, for the
				1121	convenience of the client, returns zero if the relevant
				1122	property holds; otherwise, the returned value is the address
				1123	of the first byte for which the property is not true.
				1124	Always returns 0 when not run on Valgrind.
				1125	<p>
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
to find out whether Valgrind thinks a particular variable
(lvalue, to be precise) is addressable and defined.  Prints
an error message if not.  Returns no value.
				1130	<p>
<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
experimental feature.  Similarly to
<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
range as inaccessible, so that subsequent accesses to an
address in the range give an error.  However, this macro
does not return a block handle.  Instead, all annotations
created like this are reviewed at each client
<code>ret</code> (subroutine return) instruction, and those
which now define an address range below the client's stack
pointer register (<code>%esp</code>) are automatically
deleted.
				1142	<p>
				1143	In other words, this macro allows the client to tell
				1144	Valgrind about red-zones on its own stack. Valgrind
				1145	automatically discards this information when the stack
				1146	retreats past such blocks. Beware: hacky and flaky, and
				1147	probably interacts badly with the new pthread support.
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1148	<p>
sewardj	ab1d9d1	2002-05-01 12:38:06 +0000	[diff] [blame]	1149	<li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on
				1150	Valgrind, 0 if running on the real CPU.
				1151	<p>
				1152	<li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector
				1153	right now. Returns no value. I guess this could be used to
				1154	incrementally check for leaks between arbitrary places in the
				1155	program's execution. Warning: not properly tested!
				1156	</ul>
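<p>
As an illustration of the "custom checks" idea, here is a sketch of
a tiny fixed-size pool allocator which marks free slots as
inaccessible and handed-out slots as writable.  Several things here
are assumptions rather than facts from this manual: that the
<code>VALGRIND_MAKE_*</code> macros take an address and a length,
that <code>VALGRIND_DISCARD</code> takes a block handle, the
<code>HAVE_VALGRIND_H</code> build flag, and all the
<code>pool_*</code> names.  When the flag is not defined, stand-in
macros are used which, like the real ones when not run on Valgrind,
do nothing and yield zero.

```c
#include <stddef.h>

#ifdef HAVE_VALGRIND_H    /* hypothetical build flag */
#include "valgrind.h"
#else
/* Stand-ins so the sketch builds without the header: do nothing and
   yield 0.  The (addr, len) argument lists are an assumption. */
#define VALGRIND_MAKE_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_DISCARD(h)               ((void)(h), 0)
#endif

#define SLOT_SIZE  32
#define SLOT_COUNT 8

static unsigned char pool[SLOT_COUNT][SLOT_SIZE];
static int in_use[SLOT_COUNT];
static int handle[SLOT_COUNT];   /* current block handle of each slot */

/* Call once before use: every slot starts out off-limits. */
void pool_init(void)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        in_use[i] = 0;
        handle[i] = VALGRIND_MAKE_NOACCESS(pool[i], SLOT_SIZE);
    }
}

/* Hands out the first free slot, marked accessible but undefined. */
void *pool_alloc(void)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!in_use[i]) {
            (void) VALGRIND_DISCARD(handle[i]);
            handle[i] = VALGRIND_MAKE_WRITABLE(pool[i], SLOT_SIZE);
            in_use[i] = 1;
            return pool[i];
        }
    }
    return NULL;                 /* pool exhausted */
}

/* Returns a slot to the pool and makes it off-limits again. */
void pool_free(void *p)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (p == (void *) pool[i] && in_use[i]) {
            (void) VALGRIND_DISCARD(handle[i]);
            handle[i] = VALGRIND_MAKE_NOACCESS(pool[i], SLOT_SIZE);
            in_use[i] = 0;
            return;
        }
    }
}
```

<p>
The old handle is discarded before each change of state so that
stale block descriptions do not pile up; as described above,
discarding a handle leaves the permissions themselves alone.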
				1157	<p>
				1158
				1159
				1160	<a name="pthreads"></a>
				1161	<h3>2.9  Support for POSIX Pthreads</h3>
				1162
				1163	As of late April 02, Valgrind supports programs which use POSIX
				1164	pthreads. Doing this has proved technically challenging and is still
				1165	in progress, but it works well enough, as of 1 May 02, for significant
				1166	threaded applications to work.
				1167	<p>
				1168	It works as follows: threaded apps are (dynamically) linked against
				1169	<code>libpthread.so</code>. Usually this is the one installed with
				1170	your Linux distribution. Valgrind, however, supplies its own
				1171	<code>libpthread.so</code> and automatically connects your program to
				1172	it instead.
				1173	<p>
				1174	The fake <code>libpthread.so</code> and Valgrind cooperate to
				1175	implement a user-space pthreads package. This approach avoids the
				1176	horrible implementation problems of implementing a truly
				1177	multiprocessor version of Valgrind, but it does mean that threaded
				1178	apps run only on one CPU, even if you have a multiprocessor machine.
				1179	<p>
				1180	Valgrind schedules your threads in a round-robin fashion, with all
				1181	threads having equal priority. It switches threads every 20000 basic
				1182	blocks (typically around 120000 x86 instructions), which means you'll
				1183	get a much finer interleaving of thread executions than when run
				1184	natively. This in itself may cause your program to behave differently
				1185	if you have some kind of concurrency, critical race, locking, or
				1186	similar, bugs.
				1187	<p>
				1188	The current (1 May 02) state of pthread support is as follows. Please
				1189	note that things are advancing rapidly, so the situation may have
				1190	improved by the time you read this -- check the web site for further
				1191	updates.
				1192	<ul>
				1193	<li>Mutexes, condition variables, thread-specific data and
				1194	<code>pthread_once</code> currently work.
				1195	<p>
				1196	<li>Various attribute-like calls are handled but ignored.
				1197	You get a warning message.
				1198	<p>
				1199	<li>The main big omission is proper cleanup support for cancellation.
				1200	<code>pthread_cancel</code> works, but instantly nukes the target
				1201	thread without giving it any chance to clean up. Also, when a
				1202	thread exits, it does not run any cleanup handlers.
				1203	<p>
				1204	<li>Currently the following syscalls are thread-safe (nonblocking):
				1205	<code>write</code> <code>read</code> <code>nanosleep</code>
				1206	<code>sleep</code> <code>select</code> and <code>poll</code>.
				1207	<p>
				1208	<li>The POSIX requirement that each thread have its own
				1209	signal-blocking mask is not done; the signal handling mechanism is
				1210	thread-unaware and all signals are delivered to the main thread,
				1211	antidisirregardless.
				1212	</ul>
				1213
				1214
				1215	As of 1 May 02, the following programs now work fine on my RedHat 7.2
				1216	box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and
				1217	Galeon-0.11.3, both as supplied with RedHat 7.2.
				1218	<p>
				1219	Mozilla 1.0RC1 crashes because it jumps to location zero: <code>Jump
				1220	to the invalid address stated on the next line</code>. Other people
				1221	have reported the same thing. Despite considerable effort in tracking
				1222	this down, I cannot figure out what's going on. If you have a program
				1223	which does this, is small enough that I have half a hope of making
				1224	sense of it, and is open-source (or at least you'd be happy for me to
				1225	look at), I'd be very grateful to have it.
				1226	<p>
				1227	On the other hand, I have received mail from at least one person
				1228	who appears to be successful in running CVS builds of Mozilla on
				1229	Valgrind.
				1230
				1231
				1232
				1233	<a name="install"></a>
				1234	<h3>2.10  Building and installing</h3>
				1235
				1236	We now use the standard Unix <code>./configure</code>,
				1237	<code>make</code>, <code>make install</code> mechanism, and I have
				1238	attempted to ensure that it works on machines with kernel 2.2 or 2.4
				1239	and glibc 2.1.X or 2.2.X. I don't think there is much else to say.
				1240	There are no options apart from the usual <code>--prefix</code> that
				1241	you should give to <code>./configure</code>.
				1242	<p>
				1243	Let me know if you have build problems.
				1245
				1246
<a name="problems"></a>
<h3>2.11  If you have problems</h3>
Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
				1250
				1251	<p>See <a href="#limits">Section 4</a> for the known limitations of
				1252	Valgrind, and for a list of programs which are known not to work on
				1253	it.
				1254
				1255	<p>The translator/instrumentor has a lot of assertions in it. They
				1256	are permanently enabled, and I have no plans to disable them. If one
				1257	of these breaks, please mail me!
				1258
				1259	<p>If you get an assertion failure on the expression
				1260	<code>chunkSane(ch)</code> in <code>vg_free()</code> in
				1261	<code>vg_malloc.c</code>, this may have happened because your program
				1262	wrote off the end of a malloc'd block, or before its beginning.
				1263	Valgrind should have emitted a proper message to that effect before
				1264	dying in this way. This is a known problem which I should fix.
				1265	<p>
				1266
				1267	<hr width="100%">
				1268
				1269	<a name="machine"></a>
				1270	<h2>3  Details of the checking machinery</h2>
				1271
				1272	Read this section if you want to know, in detail, exactly what and how
				1273	Valgrind is checking.
				1274
				1275	<a name="vvalue"></a>
				1276	<h3>3.1  Valid-value (V) bits</h3>
				1277
				1278	It is simplest to think of Valgrind implementing a synthetic Intel x86
				1279	CPU which is identical to a real CPU, except for one crucial detail.
				1280	Every bit (literally) of data processed, stored and handled by the
				1281	real CPU has, in the synthetic CPU, an associated "valid-value" bit,
				1282	which says whether or not the accompanying bit has a legitimate value.
				1283	In the discussions which follow, this bit is referred to as the V
				1284	(valid-value) bit.
				1285
<p>Each byte in the system therefore has 8 V bits which follow
it wherever it goes. For example, when the CPU loads a word-size item
				1288	(4 bytes) from memory, it also loads the corresponding 32 V bits from
				1289	a bitmap which stores the V bits for the process' entire address
				1290	space. If the CPU should later write the whole or some part of that
				1291	value to memory at a different address, the relevant V bits will be
				1292	stored back in the V-bit bitmap.
				1293
				1294	<p>In short, each bit in the system has an associated V bit, which
				1295	follows it around everywhere, even inside the CPU. Yes, the CPU's
(integer and <code>%eflags</code>) registers have their own V bit
vectors.
				1299	<p>Copying values around does not cause Valgrind to check for, or
				1300	report on, errors. However, when a value is used in a way which might
				1301	conceivably affect the outcome of your program's computation, the
				1302	associated V bits are immediately checked. If any of these indicate
				1303	that the value is undefined, an error is reported.
				1304
				1305	<p>Here's an (admittedly nonsensical) example:
				1306	<pre>
int i, j;
int a[10], b[10];
for (i = 0; i &lt; 10; i++) {
   j = a[i];
   b[i] = j;
}
				1313	</pre>
				1314
				1315	<p>Valgrind emits no complaints about this, since it merely copies
				1316	uninitialised values from <code>a[]</code> into <code>b[]</code>, and
				1317	doesn't use them in any way. However, if the loop is changed to
				1318	<pre>
for (i = 0; i &lt; 10; i++) {
   j += a[i];
}
if (j == 77)
   printf("hello there\n");
				1324	</pre>
				1325	then Valgrind will complain, at the <code>if</code>, that the
				1326	condition depends on uninitialised values.
				1327
				1328	<p>Most low level operations, such as adds, cause Valgrind to
				1329	use the V bits for the operands to calculate the V bits for the
				1330	result. Even if the result is partially or wholly undefined,
				1331	it does not complain.
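
<p>As a rough illustration of how V bits can be computed for a result
without any checking, here is a small C sketch. The encoding (a V bit
of 1 means "undefined") and the rule used for addition are
assumptions made for this example; this section does not say which
rule Valgrind actually applies. <code>Shadowed</code>,
<code>shadow_copy</code> and <code>shadow_add</code> are hypothetical
names.

```c
#include <stdint.h>

/* A 32-bit value paired with its V bits. Assumed encoding for this
   sketch: a V bit of 1 means "undefined", 0 means "defined". */
typedef struct {
    uint32_t value;
    uint32_t vbits;
} Shadowed;

/* Plain copy: the V bits travel with the value; nothing is checked. */
Shadowed shadow_copy(Shadowed src)
{
    return src;
}

/* Addition: an undefined input bit can disturb that bit of the result
   and, through carries, every more significant bit. So the result is
   marked undefined from the lowest undefined input bit upwards.
   No error is reported here, however undefined the result is. */
Shadowed shadow_add(Shadowed a, Shadowed b)
{
    uint32_t undef = a.vbits | b.vbits;
    Shadowed r;
    r.value = a.value + b.value;
    r.vbits = undef | (0u - undef);  /* smear upwards from lowest set bit */
    return r;
}
```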
				1332
<p>Checks on definedness only occur in two places: when a value is
used to generate a memory address, and where a control flow decision
needs to be made. Also, when a system call is detected, Valgrind
checks definedness of parameters as required.
				1337
<p>If a check should detect undefinedness, an error message is
issued. The resulting value is subsequently regarded as well-defined.
				1340	To do otherwise would give long chains of error messages. In effect,
				1341	we say that undefined values are non-infectious.
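
<p>The "non-infectious" rule can be sketched in a few lines of C.
This is only a model under the assumption that a V bit of 1 means
"undefined"; <code>check_definedness</code> and
<code>errors_so_far</code> are hypothetical names, not Valgrind
functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Value plus V bits; assumed encoding: 1 = undefined. */
typedef struct {
    uint32_t value;
    uint32_t vbits;
} Shadowed;

static int errors_reported = 0;

/* Called where a value forms an address or decides a branch.
   Reports at most one error per undefined value, then marks the
   value as well-defined so later uses stay quiet. */
void check_definedness(Shadowed *s, const char *what)
{
    if (s->vbits != 0) {
        errors_reported++;
        fprintf(stderr, "use of uninitialised value in %s\n", what);
        s->vbits = 0;  /* regard as well-defined from now on */
    }
}

int errors_so_far(void)
{
    return errors_reported;
}
```

Checking the same value twice therefore yields a single report,
which is the behaviour that avoids long chains of messages.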
				1342
				1343	<p>This sounds overcomplicated. Why not just check all reads from
				1344	memory, and complain if an undefined value is loaded into a CPU register?
				1345	Well, that doesn't work well, because perfectly legitimate C programs routinely
				1346	copy uninitialised values around in memory, and we don't want endless complaints
				1347	about that. Here's the canonical example. Consider a struct
				1348	like this:
				1349	<pre>
				1350	struct S { int x; char c; };
				1351	struct S s1, s2;
				1352	s1.x = 42;
				1353	s1.c = 'z';
				1354	s2 = s1;
				1355	</pre>
				1356
				1357	<p>The question to ask is: how large is <code>struct S</code>, in
				1358	bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
				1359	occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
				1360	round the size of <code>struct S</code> up to a whole number of words,
				1361	in this case 8 bytes. Not doing this forces compilers to generate
				1362	truly appalling code for subscripting arrays of <code>struct
				1363	S</code>'s.
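
<p>You can check the padding for yourself with <code>sizeof</code>
and <code>offsetof</code>. The helper name
<code>padding_bytes</code> is made up for this example, and the
figures in the comments assume a typical x86 Linux compiler with a
4-byte <code>int</code>; other platforms may differ.

```c
#include <stddef.h>

struct S { int x; char c; };

/* Bytes at the end of struct S that belong to no member.
   With a 4-byte int: sizeof(struct S) is 8, c sits at offset 4,
   so 8 - (4 + 1) = 3 bytes are padding. */
size_t padding_bytes(void)
{
    return sizeof(struct S) - (offsetof(struct S, c) + sizeof(char));
}
```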
				1364
				1365	<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
				1366	For the assignment <code>s2 = s1</code>, gcc generates code to copy
				1367	all 8 bytes wholesale into <code>s2</code> without regard for their
				1368	meaning. If Valgrind simply checked values as they came out of
				1369	memory, it would yelp every time a structure assignment like this
				1370	happened. So the more complicated semantics described above is
				1371	necessary. This allows gcc to copy <code>s1</code> into
				1372	<code>s2</code> any way it likes, and a warning will only be emitted
				1373	if the uninitialised values are later used.
				1374
				1375	<p>One final twist to this story. The above scheme allows garbage to
				1376	pass through the CPU's integer registers without complaint. It does
				1377	this by giving the integer registers V tags, passing these around in
the expected way. This is complicated and computationally expensive to
do, but is necessary. Valgrind is more simplistic about
				1380	floating-point loads and stores. In particular, V bits for data read
				1381	as a result of floating-point loads are checked at the load
				1382	instruction. So if your program uses the floating-point registers to
				1383	do memory-to-memory copies, you will get complaints about
				1384	uninitialised values. Fortunately, I have not yet encountered a
				1385	program which (ab)uses the floating-point registers in this way.
				1386
				1387	<a name="vaddress"></a>
				1388	<h3>3.2  Valid-address (A) bits</h3>
				1389
				1390	Notice that the previous section describes how the validity of values
				1391	is established and maintained without having to say whether the
				1392	program does or does not have the right to access any particular
				1393	memory location. We now consider the latter issue.
				1394
				1395	<p>As described above, every bit in memory or in the CPU has an
				1396	associated valid-value (V) bit. In addition, all bytes in memory, but
				1397	not in the CPU, have an associated valid-address (A) bit. This
				1398	indicates whether or not the program can legitimately read or write
that location. It does not give any indication of the validity of the
				1400	data at that location -- that's the job of the V bits -- only whether
				1401	or not the location may be accessed.
				1402
				1403	<p>Every time your program reads or writes memory, Valgrind checks the
				1404	A bits associated with the address. If any of them indicate an
				1405	invalid address, an error is emitted. Note that the reads and writes
				1406	themselves do not change the A bits, only consult them.
				1407
				1408	<p>So how do the A bits get set/cleared? Like this:
				1409
				1410	<ul>
				1411	<li>When the program starts, all the global data areas are marked as
				1412	accessible.</li><br>
				1413	<p>
				1414
<li>When the program does malloc/new, the A bits for exactly the
area allocated, and not a byte more, are marked as accessible.
				1417	Upon freeing the area the A bits are changed to indicate
				1418	inaccessibility.</li><br>
				1419	<p>
				1420
				1421	<li>When the stack pointer register (%esp) moves up or down, A bits
				1422	are set. The rule is that the area from %esp up to the base of
				1423	the stack is marked as accessible, and below %esp is
				1424	inaccessible. (If that sounds illogical, bear in mind that the
				1425	stack grows down, not up, on almost all Unix systems, including
				1426	GNU/Linux.) Tracking %esp like this has the useful side-effect
				1427	that the section of stack used by a function for local variables
				1428	etc is automatically marked accessible on function entry and
				1429	inaccessible on exit.</li><br>
				1430	<p>
				1431
				1432	<li>When doing system calls, A bits are changed appropriately. For
				1433	example, mmap() magically makes files appear in the process's
				1434	address space, so the A bits must be updated if mmap()
				1435	succeeds.</li><br>
<p>

<li>Optionally, your program can tell Valgrind about such changes
explicitly, using the client request mechanism described above.
</ul>
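
<p>A toy model of the A bits, as a sketch only: one bit per byte of
a small pretend address space, set on allocation, cleared on free,
and merely consulted on access. The 4096-byte arena and the names
<code>set_accessible</code> and <code>access_ok</code> are
assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_BYTES 4096  /* toy address space for this sketch */

/* One A bit per byte of the arena; 1 = accessible. */
static uint8_t abits[ARENA_BYTES / 8];

/* Mark [addr, addr+len) as accessible or inaccessible, for example
   after malloc, free, a move of %esp, or a successful mmap. */
void set_accessible(size_t addr, size_t len, int accessible)
{
    for (size_t a = addr; a < addr + len && a < ARENA_BYTES; a++) {
        if (accessible)
            abits[a / 8] |= (uint8_t)(1u << (a % 8));
        else
            abits[a / 8] &= (uint8_t)~(1u << (a % 8));
    }
}

/* Return 1 if every byte of the access is accessible. A read or
   write only consults the A bits; it never changes them. */
int access_ok(size_t addr, size_t len)
{
    for (size_t a = addr; a < addr + len; a++) {
        if (a >= ARENA_BYTES || !(abits[a / 8] & (1u << (a % 8))))
            return 0;
    }
    return 1;
}
```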
				1441
				1442
				1443	<a name="together"></a>
				1444	<h3>3.3  Putting it all together</h3>
				1445	Valgrind's checking machinery can be summarised as follows:
				1446
				1447	<ul>
				1448	<li>Each byte in memory has 8 associated V (valid-value) bits,
				1449	saying whether or not the byte has a defined value, and a single
				1450	A (valid-address) bit, saying whether or not the program
				1451	currently has the right to read/write that address.</li><br>
				1452	<p>
				1453
				1454	<li>When memory is read or written, the relevant A bits are
				1455	consulted. If they indicate an invalid address, Valgrind emits
				1456	an Invalid read or Invalid write error.</li><br>
				1457	<p>
				1458
				1459	<li>When memory is read into the CPU's integer registers, the
				1460	relevant V bits are fetched from memory and stored in the
				1461	simulated CPU. They are not consulted.</li><br>
				1462	<p>
				1463
				1464	<li>When an integer register is written out to memory, the V bits
				1465	for that register are written back to memory too.</li><br>
				1466	<p>
				1467
				1468	<li>When memory is read into the CPU's floating point registers, the
				1469	relevant V bits are read from memory and they are immediately
				1470	checked. If any are invalid, an uninitialised value error is
				1471	emitted. This precludes using the floating-point registers to
				1472	copy possibly-uninitialised memory, but simplifies Valgrind in
				1473	that it does not have to track the validity status of the
				1474	floating-point registers.</li><br>
				1475	<p>
				1476
				1477	<li>As a result, when a floating-point register is written to
				1478	memory, the associated V bits are set to indicate a valid
				1479	value.</li><br>
				1480	<p>
				1481
				1482	<li>When values in integer CPU registers are used to generate a
				1483	memory address, or to determine the outcome of a conditional
				1484	branch, the V bits for those values are checked, and an error
				1485	emitted if any of them are undefined.</li><br>
				1486	<p>
				1487
				1488	<li>When values in integer CPU registers are used for any other
				1489	purpose, Valgrind computes the V bits for the result, but does
				1490	not check them.</li><br>
				1491	<p>
				1492
<li>Once the V bits for a value in the CPU have been checked, they
				1494	are then set to indicate validity. This avoids long chains of
				1495	errors.</li><br>
				1496	<p>
				1497
<li>When values are loaded from memory, Valgrind checks the A bits
				1499	for that location and issues an illegal-address warning if
				1500	needed. In that case, the V bits loaded are forced to indicate
				1501	Valid, despite the location being invalid.
				1502	<p>
				1503	This apparently strange choice reduces the amount of confusing
				1504	information presented to the user. It avoids the
				1505	unpleasant phenomenon in which memory is read from a place which
				1506	is both unaddressible and contains invalid values, and, as a
				1507	result, you get not only an invalid-address (read/write) error,
				1508	but also a potentially large set of uninitialised-value errors,
				1509	one for every time the value is used.
				1510	<p>
There is a hazy boundary case to do with multi-byte loads from
addresses which are partially valid and partially invalid. See the
description of the flag <code>--partial-loads-ok</code> for details.
				1514	</li><br>
				1515	</ul>
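
<p>The rule for loads from invalid addresses can be sketched as
follows. This is a single-byte model under assumed conventions (a V
bit of 1 means "undefined"); <code>ShadowByte</code>,
<code>load_vbits</code> and <code>address_errors_so_far</code> are
hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint8_t data;
    uint8_t vbits;        /* assumed encoding: 1 = undefined */
    int     addressable;  /* the A bit for this byte */
} ShadowByte;

static int address_errors = 0;

/* Load one byte into a simulated integer register and return the
   V bits that accompany the loaded value. */
uint8_t load_vbits(const ShadowByte *b)
{
    if (!b->addressable) {
        address_errors++;  /* report the invalid read once ... */
        return 0;          /* ... and force the V bits to "valid" */
    }
    return b->vbits;       /* normal case: copied, not checked */
}

int address_errors_so_far(void)
{
    return address_errors;
}
```

Forcing the V bits to valid means one bad read yields one
invalid-address error and no follow-on uninitialised-value errors.
<p>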
				1516
				1517	Valgrind intercepts calls to malloc, calloc, realloc, valloc,
				1518	memalign, free, new and delete. The behaviour you get is:
				1519
				1520	<ul>
				1521
				1522	<li>malloc/new: the returned memory is marked as addressible but not
				1523	having valid values. This means you have to write on it before
				1524	you can read it.</li><br>
				1525	<p>
				1526
				1527	<li>calloc: returned memory is marked both addressible and valid,
				1528	since calloc() clears the area to zero.</li><br>
				1529	<p>
				1530
<li>realloc: if the new size is larger than the old, the new section
is addressible but invalid, as with malloc. If the new size is
smaller, the dropped-off section is marked as unaddressible. You
may only pass to realloc a pointer previously issued to you by
malloc/calloc/new/realloc.</li><br>
				1538	<p>
				1539
				1540	<li>free/delete: you may only pass to free a pointer previously
				1541	issued to you by malloc/calloc/new/realloc, or the value
				1542	NULL. Otherwise, Valgrind complains. If the pointer is indeed
				1543	valid, Valgrind marks the entire area it points at as
				1544	unaddressible, and places the block in the freed-blocks-queue.
				1545	The aim is to defer as long as possible reallocation of this
				1546	block. Until that happens, all attempts to access it will
				1547	elicit an invalid-address error, as you would hope.</li><br>
				1548	</ul>
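
<p>The allocator rules above can be modelled with one state per
byte. This sketch collapses the eight V bits of a byte into a single
defined/undefined flag for brevity, treats realloc as resizing in
place, and leaves out the freed-blocks queue; all names
(<code>on_malloc</code>, <code>on_calloc</code>,
<code>on_realloc</code>, <code>on_free</code>,
<code>state_of</code>) are hypothetical.

```c
#include <stddef.h>

enum ByteState { NOACCESS, ADDR_UNDEFINED, ADDR_DEFINED };

#define ARENA_BYTES 1024  /* toy address space for this sketch */

/* Zero-initialised, so every byte starts as NOACCESS. */
static enum ByteState shadow[ARENA_BYTES];

static void mark(size_t addr, size_t len, enum ByteState s)
{
    for (size_t i = 0; i < len && addr + i < ARENA_BYTES; i++)
        shadow[addr + i] = s;
}

/* malloc/new: addressible, but must be written before being read. */
void on_malloc(size_t addr, size_t len) { mark(addr, len, ADDR_UNDEFINED); }

/* calloc: addressible and valid, since the area is zeroed. */
void on_calloc(size_t addr, size_t len) { mark(addr, len, ADDR_DEFINED); }

/* free/delete: the whole block becomes unaddressible. */
void on_free(size_t addr, size_t len) { mark(addr, len, NOACCESS); }

/* realloc (modelled in place): growth is addressible but invalid,
   the dropped-off tail of a shrink is unaddressible. */
void on_realloc(size_t addr, size_t old_len, size_t new_len)
{
    if (new_len > old_len)
        mark(addr + old_len, new_len - old_len, ADDR_UNDEFINED);
    else
        mark(addr + new_len, old_len - new_len, NOACCESS);
}

enum ByteState state_of(size_t addr)
{
    return addr < ARENA_BYTES ? shadow[addr] : NOACCESS;
}
```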
				1549
				1550
				1551
				1552	<a name="signals"></a>
				1553	<h3>3.4  Signals</h3>
				1554
				1555	Valgrind provides suitable handling of signals, so, provided you stick
				1556	to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
				1557	are handled. Signal handlers may return in the normal way or do
				1558	longjmp(); both should work ok. As specified by POSIX, a signal is
				1559	blocked in its own handler. Default actions for signals should work
				1560	as before. Etc, etc.
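
<p>For reference, this is the kind of plain POSIX signal usage meant
above: a handler installed with <code>sigaction()</code> that returns
normally. It is an ordinary client program fragment, not Valgrind
code, and the function name <code>run_signal_demo</code> is made up
for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;

/* Handler that returns in the normal way. While it runs, SIGUSR1
   itself is blocked, as POSIX specifies. */
static void on_usr1(int signo)
{
    (void)signo;
    handled++;
}

/* Install the handler, raise SIGUSR1 once, and return how many
   times the handler has run so far (or -1 if sigaction failed). */
int run_signal_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    raise(SIGUSR1);
    return handled;
}
```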
				1561
				1562	<p>Under the hood, dealing with signals is a real pain, and Valgrind's
				1563	simulation leaves much to be desired. If your program does
				1564	way-strange stuff with signals, bad things may happen. If so, let me
				1565	know. I don't promise to fix it, but I'd at least like to be aware of
				1566	it.
				1567
				1568
<a name="leaks"></a>
				1570	<h3>3.5  Memory leak detection</h3>
				1571
				1572	Valgrind keeps track of all memory blocks issued in response to calls
				1573	to malloc/calloc/realloc/new. So when the program exits, it knows
				1574	which blocks are still outstanding -- have not been returned, in other
				1575	words. Ideally, you want your program to have no blocks still in use
				1576	at exit. But many programs do.
				1577
				1578	<p>For each such block, Valgrind scans the entire address space of the
				1579	process, looking for pointers to the block. One of three situations
				1580	may result:
				1581
				1582	<ul>
				1583	<li>A pointer to the start of the block is found. This usually
				1584	indicates programming sloppiness; since the block is still
pointed at, the programmer could, at least in principle, have freed
it before program exit.</li><br>
				1587	<p>
				1588
				1589	<li>A pointer to the interior of the block is found. The pointer
				1590	might originally have pointed to the start and have been moved
				1591	along, or it might be entirely unrelated. Valgrind deems such a
				1592	block as "dubious", that is, possibly leaked,
				1593	because it's unclear whether or
				1594	not a pointer to it still exists.</li><br>
				1595	<p>
				1596
				1597	<li>The worst outcome is that no pointer to the block can be found.
				1598	The block is classified as "leaked", because the
programmer could not possibly have freed it at program exit,
				1600	since no pointer to it exists. This might be a symptom of
				1601	having lost the pointer at some earlier point in the
				1602	program.</li>
				1603	</ul>
				1604
				1605	Valgrind reports summaries about leaked and dubious blocks.
				1606	For each such block, it will also tell you where the block was
				1607	allocated. This should help you figure out why the pointer to it has
				1608	been lost. In general, you should attempt to ensure your programs do
				1609	not have any leaked or dubious blocks at exit.
				1610
<p>The precise area of memory in which Valgrind searches for pointers
is: all naturally-aligned 4-byte words for which all A bits indicate
addressibility and all V bits indicate that the stored value is
actually valid.
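
<p>The three-way classification can be sketched in C as below. The
sketch assumes the candidate words (the aligned, addressible, valid
words described above) have already been gathered into an array. The
names <code>classify_block</code>, <code>STILL_REACHABLE</code>,
<code>DUBIOUS</code> and <code>LEAKED</code> are chosen for this
example only.

```c
#include <stddef.h>
#include <stdint.h>

enum LeakKind { STILL_REACHABLE, DUBIOUS, LEAKED };

/* Classify one outstanding block [start, start+size) by looking
   through every candidate word for a pointer into it. */
enum LeakKind classify_block(uintptr_t start, size_t size,
                             const uintptr_t *words, size_t nwords)
{
    enum LeakKind kind = LEAKED;  /* until some pointer is found */
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == start)
            return STILL_REACHABLE;  /* pointer to the block's start */
        if (words[i] > start && words[i] < start + size)
            kind = DUBIOUS;          /* pointer to the interior only */
    }
    return kind;
}
```

A start pointer wins over any interior pointer, so the scan returns
as soon as one is seen.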
				1615
				1616	<p><hr width="100%">
				1617
				1618
				1619	<a name="limits"></a>
				1620	<h2>4  Limitations</h2>
				1621
				1622	The following list of limitations seems depressingly long. However,
				1623	most programs actually work fine.
				1624
				1625	<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
a kernel 2.2.X or 2.4.X system, subject to the following constraints:

				1628	<ul>
				1629	<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
				1630	encounters these, Valgrind will simply give up. It may be
				1631	possible to add support for them at a later time. Intel added a
				1632	few instructions such as "cmov" to the integer instruction set
				1633	on Pentium and later processors, and these are supported.
				1634	Nevertheless it's safest to think of Valgrind as implementing
				1635	the 486 instruction set.</li><br>
				1636	<p>
				1637
<li>Pthreads support is improving, but there are still significant
limitations in that department. See the section above on
Pthreads. Note that your program must be dynamically linked
against <code>libpthread.so</code>, so that Valgrind can
substitute its own implementation at program startup time. If
you're statically linked against it, things will fail
badly.</li><br>
<p>
				1646
				1647	<li>Valgrind assumes that the floating point registers are not used
				1648	as intermediaries in memory-to-memory copies, so it immediately
				1649	checks V bits in floating-point loads/stores. If you want to
				1650	write code which copies around possibly-uninitialised values,
				1651	you must ensure these travel through the integer registers, not
				1652	the FPU.</li><br>
				1653	<p>
				1654
				1655	<li>If your program does its own memory management, rather than
				1656	using malloc/new/free/delete, it should still work, but
				1657	Valgrind's error checking won't be so effective.</li><br>
				1658	<p>
				1659
				1660	<li>Valgrind's signal simulation is not as robust as it could be.
				1661	Basic POSIX-compliant sigaction and sigprocmask functionality is
				1662	supplied, but it's conceivable that things could go badly awry
if you do weird things with signals. Workaround: don't.
				1664	Programs that do non-POSIX signal tricks are in any case
				1665	inherently unportable, so should be avoided if
				1666	possible.</li><br>
				1667	<p>
				1668
<li>Programs which try to handle signals on
an alternate stack (sigaltstack) are not supported, although
they could be, with a bit of effort.</li><br>
<p>
				1673
				1674	<li>Programs which switch stacks are not well handled. Valgrind
				1675	does have support for this, but I don't have great faith in it.
				1676	It's difficult -- there's no cast-iron way to decide whether a
				1677	large change in %esp is as a result of the program switching
				1678	stacks, or merely allocating a large object temporarily on the
				1679	current stack -- yet Valgrind needs to handle the two situations
differently. 1 May 02: this probably interacts badly with the
new pthread support. I haven't checked properly.</li><br>
<p>
				1683
				1684	<li>x86 instructions, and system calls, have been implemented on
				1685	demand. So it's possible, although unlikely, that a program
				1686	will fall over with a message to that effect. If this happens,
				1687	please mail me ALL the details printed out, so I can try and
				1688	implement the missing feature.</li><br>
				1689	<p>
				1690
				1691	<li>x86 floating point works correctly, but floating-point code may
				1692	run even more slowly than integer code, due to my simplistic
				1693	approach to FPU emulation.</li><br>
				1694	<p>
				1695
				1696	<li>You can't Valgrind-ize statically linked binaries. Valgrind
				1697	relies on the dynamic-link mechanism to gain control at
				1698	startup.</li><br>
				1699	<p>
				1700
<li>Memory consumption of your program is majorly increased whilst
running under Valgrind. This is due to the large amount of
administrative information maintained behind the scenes. Another
cause is that Valgrind dynamically translates the original
executable and never throws any translation away, except in
those rare cases where self-modifying code is detected.
Translated, instrumented code is 12-14 times larger than the
original (!) so you can easily end up with 15+ MB of
translations when running (eg) a web browser.
</li>
</ul>
				1712
				1713
				1714	Programs which are known not to work are:
				1715
				1716	<ul>
<li>emacs starts up but immediately concludes it is out of memory
and aborts. Emacs has its own memory-management scheme, but I
don't understand why this should interact so badly with
Valgrind. Emacs works fine if you build it to use the standard
malloc/free routines.</li><br>
<p>
<li>Mozilla 1.0RC1 crashes because it jumps to location zero:
				1724	<code>Jump to the invalid address stated on the next
				1725	line</code>. Other people have reported the same thing.
				1726	Despite considerable effort in tracking this down, I cannot
				1727	figure out what's going on. If you have a program which does
				1728	this, is small enough that I have half a hope of making sense of
				1729	it, and is open-source (or at least you'd be happy for me to
				1730	look at), I'd be very grateful to have it.
</ul>
				1732
				1733
				1734	<p><hr width="100%">
				1735
				1736
				1737	<a name="howitworks"></a>
				1738	<h2>5  How it works -- a rough overview</h2>
				1739	Some gory details, for those with a passion for gory details. You
				1740	don't need to read this section if all you want to do is use Valgrind.
				1741
				1742	<a name="startb"></a>
				1743	<h3>5.1  Getting started</h3>
				1744
				1745	Valgrind is compiled into a shared object, valgrind.so. The shell
				1746	script valgrind sets the LD_PRELOAD environment variable to point to
				1747	valgrind.so. This causes the .so to be loaded as an extra library to
				1748	any subsequently executed dynamically-linked ELF binary, viz, the
				1749	program you want to debug.
				1750
				1751	<p>The dynamic linker allows each .so in the process image to have an
				1752	initialisation function which is run before main(). It also allows
				1753	each .so to have a finalisation function run after main() exits.
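
<p>With GCC or Clang, one way to write such functions is the
<code>constructor</code> and <code>destructor</code> function
attributes, shown in the sketch below. Whether valgrind.so uses
these attributes or some other mechanism is not stated here, so
treat this purely as an illustration of the idea. The function names
are hypothetical.

```c
#include <stdio.h>

static int init_ran = 0;

/* Run by the startup machinery before main() is entered. */
__attribute__((constructor))
static void on_load(void)
{
    init_ran = 1;
}

/* Run after main() returns or exit() is called. */
__attribute__((destructor))
static void on_unload(void)
{
    fputs("finalisation function called\n", stderr);
}

/* Lets ordinary code observe that initialisation already happened. */
int initialised_before_main(void)
{
    return init_ran;
}
```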
				1754
				1755	<p>When valgrind.so's initialisation function is called by the dynamic
linker, the synthetic CPU starts up. The real CPU remains locked
				1757	in valgrind.so for the entire rest of the program, but the synthetic
				1758	CPU returns from the initialisation function. Startup of the program
				1759	now continues as usual -- the dynamic linker calls all the other .so's
				1760	initialisation routines, and eventually runs main(). This all runs on
				1761	the synthetic CPU, not the real one, but the client program cannot
				1762	tell the difference.
				1763
				1764	<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
				1765	finalisation function. Valgrind detects this, and uses it as its cue
				1766	to exit. It prints summaries of all errors detected, possibly checks
				1767	for memory leaks, and then exits the finalisation routine, but now on
				1768	the real CPU. The synthetic CPU has now lost control -- permanently
				1769	-- so the program exits back to the OS on the real CPU, just as it
				1770	would have done anyway.
				1771
				1772	<p>On entry, Valgrind switches stacks, so it runs on its own stack.
				1773	On exit, it switches back. This means that the client program
				1774	continues to run on its own stack, so we can switch back and forth
				1775	between running it on the simulated and real CPUs without difficulty.
				1776	This was an important design decision, because it makes it easy (well,
				1777	significantly less difficult) to debug the synthetic CPU.
				1778
				1779
				1780	<a name="engine"></a>
				1781	<h3>5.2  The translation/instrumentation engine</h3>
				1782
				1783	Valgrind does not directly run any of the original program's code. Only
				1784	instrumented translations are run. Valgrind maintains a translation
				1785	table, which allows it to find the translation quickly for any branch
				1786	target (code address). If no translation has yet been made, the
				1787	translator - a just-in-time translator - is summoned. This makes an
				1788	instrumented translation, which is added to the collection of
				1789	translations. Subsequent jumps to that address will use this
				1790	translation.
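
<p>The lookup-or-translate step can be sketched as a small hash
table keyed by the original code address. Everything here is an
assumption made for illustration: the table size, the use of linear
probing, and the names <code>find_or_translate</code> and
<code>translate</code>. The "translation" is just an integer
standing in for a pointer to generated code.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024  /* hypothetical; must be a power of two */

typedef struct {
    uint32_t orig_addr;    /* address of the original basic block */
    int      translation;  /* stand-in for a pointer to translated code */
    int      in_use;
} Entry;

static Entry table[TABLE_SIZE];
static int translations_made = 0;

/* Stand-in for the just-in-time translator. */
static int translate(uint32_t orig_addr)
{
    (void)orig_addr;
    return ++translations_made;
}

/* Find the translation for a branch target, making one on a miss. */
int find_or_translate(uint32_t orig_addr)
{
    size_t slot = orig_addr & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        Entry *e = &table[slot];
        if (e->in_use && e->orig_addr == orig_addr)
            return e->translation;          /* hit: reuse it */
        if (!e->in_use) {                   /* miss: translate, remember */
            e->orig_addr = orig_addr;
            e->translation = translate(orig_addr);
            e->in_use = 1;
            return e->translation;
        }
        slot = (slot + 1) & (TABLE_SIZE - 1);  /* linear probing */
    }
    return -1;  /* table full; a real system would evict something */
}

int translations_so_far(void)
{
    return translations_made;
}
```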
				1791
				1792	<p>Valgrind can optionally check writes made by the application, to
				1793	see if they are writing an address contained within code which has
				1794	been translated. Such a write invalidates translations of code
				1795	bracketing the written address. Valgrind will discard the relevant
				1796	translations, which causes them to be re-made, if they are needed
				1797	again, reflecting the new updated data stored there. In this way,
				1798	self modifying code is supported. In practice I have not found any
				1799	Linux applications which use self-modifying-code.
				1800
				1801	<p>The JITter translates basic blocks -- blocks of straight-line-code
				1802	-- as single entities. To minimise the considerable difficulties of
				1803	dealing with the x86 instruction set, x86 instructions are first
				1804	translated to a RISC-like intermediate code, similar to sparc code,
				1805	but with an infinite number of virtual integer registers. Initially
each insn is translated separately, and there is no attempt at
				1807	instrumentation.
				1808
				1809	<p>The intermediate code is improved, mostly so as to try and cache
				1810	the simulated machine's registers in the real machine's registers over
				1811	several simulated instructions. This is often very effective. Also,
we try to remove redundant updates of the simulated machine's
				1813	condition-code register.
				1814
				1815	<p>The intermediate code is then instrumented, giving more
				1816	intermediate code. There are a few extra intermediate-code operations
				1817	to support instrumentation; it is all refreshingly simple. After
				1818	instrumentation there is a cleanup pass to remove redundant value
				1819	checks.
				1820
				1821	<p>This gives instrumented intermediate code which mentions arbitrary
				1822	numbers of virtual registers. A linear-scan register allocator is
				1823	used to assign real registers and possibly generate spill code. All
				1824	of this is still phrased in terms of the intermediate code. This
				1825	machinery is inspired by the work of Reuben Thomas (MITE).
				1826
				1827	<p>Then, and only then, is the final x86 code emitted. The
				1828	intermediate code is carefully designed so that x86 code can be
				1829	generated from it without need for spare registers or other
				1830	inconveniences.
				1831
				1832	<p>The translations are managed using a traditional LRU-based caching
				1833	scheme. The translation cache has a default size of about 14MB.
				1834
				1835	<a name="track"></a>
				1836
				1837	<h3>5.3  Tracking the status of memory</h3> Each byte in the
				1838	process' address space has nine bits associated with it: one A bit and
				1839	eight V bits. The A and V bits for each byte are stored using a
				1840	sparse array, which flexibly and efficiently covers arbitrary parts of
				1841	the 32-bit address space without imposing significant space or
				1842	performance overheads for the parts of the address space never
				1843	visited. The scheme used, and speedup hacks, are described in detail
				1844	at the top of the source file vg_memory.c, so you should read that for
				1845	the gory details.
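<p>As a rough illustration of how a sparse array can cover a 32-bit address space cheaply, here is a minimal two-level sketch. It is an assumption-laden toy, not the scheme in vg_memory.c: the 64KB chunk size, the names <code>SecMap</code>, <code>set_byte_status</code>, <code>is_addressable</code> and <code>get_vbits</code>, and the bit polarity (1 = addressable, 1 = valid) are all chosen here for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SEC_BYTES 65536u   /* bytes of address space per secondary map */

/* Hypothetical layout: one A bit and eight V bits per client byte. */
typedef struct {
    uint8_t abits[SEC_BYTES / 8];   /* 1 bit per byte: 1 = addressable */
    uint8_t vbits[SEC_BYTES];       /* 8 bits per byte: 1 = valid bit  */
} SecMap;

/* Indexed by the top 16 address bits; NULL = chunk never visited. */
static SecMap *primary[65536];

/* Mark one byte addressable and record its V bits, allocating the
   secondary map only when a chunk is first touched. */
void set_byte_status(uint32_t addr, uint8_t vbits)
{
    SecMap **slot = &primary[addr >> 16];
    uint32_t off = addr & 0xFFFFu;
    if (*slot == NULL) {
        *slot = calloc(1, sizeof **slot);
        assert(*slot != NULL);
    }
    (*slot)->abits[off >> 3] |= (uint8_t)(1u << (off & 7u));
    (*slot)->vbits[off] = vbits;
}

/* Unvisited chunks cost nothing and read as "not addressable". */
int is_addressable(uint32_t addr)
{
    const SecMap *sm = primary[addr >> 16];
    uint32_t off = addr & 0xFFFFu;
    return sm != NULL && ((sm->abits[off >> 3] >> (off & 7u)) & 1u);
}

uint8_t get_vbits(uint32_t addr)
{
    const SecMap *sm = primary[addr >> 16];
    return sm ? sm->vbits[addr & 0xFFFFu] : 0;
}
```

Only chunks that are actually written to get a secondary map, which is the property the text above relies on.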
				1846
				1847	<a name="sys_calls"></a>
				1848
				1849	<h3>5.4 System calls</h3>
				1850	All system calls are intercepted. The memory status map is consulted
				1851	before and updated after each call. It's all rather tiresome. See
				1852	vg_syscall_mem.c for details.
				1853
				1854	<a name="sys_signals"></a>
				1855
				1856	<h3>5.5  Signals</h3>
				1857	All system calls to sigaction() and sigprocmask() are intercepted. If
				1858	the client program is trying to set a signal handler, Valgrind makes a
				1859	note of the handler address and which signal it is for. Valgrind then
				1860	arranges for the same signal to be delivered to its own handler.
				1861
				1862	<p>When such a signal arrives, Valgrind's own handler catches it, and
				1863	notes the fact. At a convenient safe point in execution, Valgrind
				1864	builds a signal delivery frame on the client's stack and runs its
				1865	handler. If the handler longjmp()s, there is nothing more to be said.
				1866	If the handler returns, Valgrind notices this, zaps the delivery
				1867	frame, and carries on where it left off before delivering the signal.
				1868
				1869	<p>The purpose of this nonsense is that setting signal handlers
				1870	essentially amounts to giving callback addresses to the Linux kernel.
				1871	We can't allow this to happen, because if it did, signal handlers
				1872	would run on the real CPU, not the simulated one. This means the
				1873	checking machinery would not operate during the handler run, and,
				1874	worse, memory permissions maps would not be updated, which could cause
				1875	spurious error reports once the handler had returned.
				1876
				1877	<p>An even worse thing would happen if the signal handler longjmp'd
				1878	rather than returned: Valgrind would completely lose control of the
				1879	client program.
				1880
				1881	<p>Upshot: we can't allow the client to install signal handlers
				1882	directly. Instead, Valgrind must catch, on behalf of the client, any
signal the client asks to catch, and must deliver it to the client on
				1884	the simulated CPU, not the real one. This involves considerable
				1885	gruesome fakery; see vg_signals.c for details.
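<p>The catch-now, deliver-later idea can be sketched in a few lines of C. This is only a toy under stated assumptions: the names <code>intercept_handler</code>, <code>deliver_pending</code> and <code>demo_handler</code> are hypothetical, the client handler is called directly rather than on a simulated CPU, and no delivery frame is built. The real thing in vg_signals.c is considerably more involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MAX_SIG 64   /* assumption: covers the signals of interest */

typedef void (*client_handler_t)(int);

static client_handler_t      client_handlers[MAX_SIG];
static volatile sig_atomic_t pending[MAX_SIG];

/* Our own handler, run by the kernel: it only notes the arrival. */
static void note_signal(int sig)
{
    if (sig > 0 && sig < MAX_SIG)
        pending[sig] = 1;
}

/* Used in place of letting the client register its handler directly:
   remember the client's handler, install ours with the kernel. */
int intercept_handler(int sig, client_handler_t handler)
{
    struct sigaction sa;
    if (sig <= 0 || sig >= MAX_SIG)
        return -1;
    client_handlers[sig] = handler;
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

/* Called at a safe point: run client handlers for noted signals.
   Returns how many were delivered. */
int deliver_pending(void)
{
    int delivered = 0;
    for (int sig = 1; sig < MAX_SIG; sig++) {
        if (pending[sig] && client_handlers[sig] != NULL) {
            pending[sig] = 0;
            client_handlers[sig](sig);
            delivered++;
        }
    }
    return delivered;
}

/* Example client handler, used only to demonstrate the flow. */
int demo_handler_runs;
void demo_handler(int sig) { (void)sig; demo_handler_runs++; }
```

After <code>intercept_handler(SIGUSR1, demo_handler)</code>, a <code>raise(SIGUSR1)</code> merely sets the pending flag; <code>demo_handler</code> runs only when <code>deliver_pending()</code> is next called.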
				1886	<p>
				1887
				1888	<hr width="100%">
				1889
				1890	<a name="example"></a>
				1891	<h2>6  Example</h2>
				1892	This is the log for a run of a small program. The program is in fact
correct, and the reported error is the result of a potentially serious
				1894	code generation bug in GNU g++ (snapshot 20010527).
				1895	<pre>
				1896	sewardj@phoenix:~/newmat10$
				1897	~/Valgrind-6/valgrind -v ./bogon
				1898	==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
				1899	==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
				1900	==25832== Startup, with flags:
				1901	==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
				1902	==25832== reading syms from /lib/ld-linux.so.2
				1903	==25832== reading syms from /lib/libc.so.6
				1904	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
				1905	==25832== reading syms from /lib/libm.so.6
				1906	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
				1907	==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
				1908	==25832== reading syms from /proc/self/exe
				1909	==25832== loaded 5950 symbols, 142333 line number locations
				1910	==25832==
				1911	==25832== Invalid read of size 4
				1912	==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
				1913	==25832== by 0x80487AF: main (bogon.cpp:66)
				1914	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				1915	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				1916	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				1917	==25832==
				1918	==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
				1919	==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
				1920	==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
				1921	==25832== For a detailed leak analysis, rerun with: --leak-check=yes
				1922	==25832==
				1923	==25832== exiting, did 1881 basic blocks, 0 misses.
				1924	==25832== 223 translations, 3626 bytes in, 56801 bytes out.
				1925	</pre>
				1926	<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
				1927	<hr width="100%">
				1928	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1929
				1930
				1931
				1932	<a name="cache"></a>
				1933	<h2>7  Cache profiling</h2>
				1934	As well as memory debugging, Valgrind also allows you to do cache simulations
				1935	and annotate your source line-by-line with the number of cache misses. In
				1936	particular, it records:
				1937	<ul>
				1938	<li>L1 instruction cache reads and misses;
				1939	<li>L1 data cache reads and read misses, writes and write misses;
<li>L2 unified cache reads and read misses, writes and write misses.
				1941	</ul>
On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
very useful for improving the performance of your program.<p>

Please note that this is an experimental feature. Any feedback, bug-fixes,
suggestions, etc, are welcome.
				1948
				1949
				1950	<h3>7.1  Overview</h3>
				1951	First off, as for normal Valgrind use, you probably want to turn on debugging
				1952	info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
probably <b>do</b> want to turn optimisation on, since you should profile your
program as it will normally be run.<p>
				1955
				1956	The three steps are:
				1957	<ol>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1958	<li>Generate a cache simulator for your machine's cache
				1959	configuration with the supplied <code>vg_cachegen</code>
				1960	program, and recompile Valgrind with <code>make install</code>.
				1961	<p>
				1962	The default settings are for an AMD Athlon, and you will get
				1963	useful information with the defaults, so you can skip this step
				1964	if you want. Nevertheless, for accurate cache profiles you will
need to use <code>vg_cachegen</code> to customise
				1966	<code>cachegrind</code> for your system.
				1967	<p>
				1968	This step only needs to be done once, unless you are interested
				1969	in simulating different cache configurations (eg. first
				1970	concentrating on instruction cache misses, then on data cache
				1971	misses).
				1972	</li>
				1973	<p>
				1974	<li>Run your program with <code>cachegrind</code> in front of the
				1975	normal command line invocation. When the program finishes,
				1976	Valgrind will print summary cache statistics. It also collects
				1977	line-by-line information in a file <code>cachegrind.out</code>.
				1978	<p>
				1979	This step should be done every time you want to collect
				1980	information about a new program, a changed program, or about the
				1981	same program with different input.
				1982	</li>
				1983	<p>
<li>Generate a function-by-function summary, and possibly annotate
source files with <code>vg_annotate</code>. Source files to annotate can be
specified manually on the command line, or
"interesting" source files can be annotated automatically with
the <code>--auto=yes</code> option. You can annotate C/C++
files or assembly language files equally easily.
<p>
This step can be performed as many times as you like for each
Step 2. You may want to do multiple annotations showing
different information each time.
</li>
<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1994	</ol>
				1995
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1996	The steps are described in detail in the following sections.<p>
				1997
				1998
				1999	<a name="generate"></a>
				2000	<h3>7.3  Generating a cache simulator</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2001
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2002	Although Valgrind comes with a pre-generated cache simulator, it most
				2003	likely won't match the cache configuration of your machine, so you
				2004	should generate a new simulator.<p>
				2005
				2006	You need to generate three files, one for each of the I1, D1 and L2
				2007	caches. For each cache, you need to know the:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2008	<ul>
				2009	<li>Cache size (bytes);
				2010	<li>Line size (bytes);
				2011	<li>Associativity.
				2012	</ul>
				2013
				2014	vg_cachegen takes three options:
				2015	<ul>
				2016	<li><code>--I1=size,line_size,associativity</code>
				2017	<li><code>--D1=size,line_size,associativity</code>
				2018	<li><code>--L2=size,line_size,associativity</code>
				2019	</ul>
				2020
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2021	You can specify one, two or all three caches per invocation of
				2022	vg_cachegen. It checks that the configuration is sensible before
				2023	generating the simulators; to see the allowed values, run
				2024	<code>vg_cachegen -h</code>.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2025
				2026	An example invocation would be:
				2027
				2028	<blockquote><code>
				2029	vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
				2030	</code></blockquote>
				2031
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2032	This simulates a machine with a 128KB split L1 2-way associative
				2033	cache, and a 256KB unified 8-way associative L2 cache. Both caches
				2034	have 64B lines.<p>
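<p>To make the <code>size,line_size,associativity</code> triple concrete, here is a small sketch that parses such a string and derives the number of sets. It is not <code>vg_cachegen</code>'s code: <code>CacheSpec</code>, <code>parse_cache_spec</code> and <code>num_sets</code> are hypothetical names, and the notion of "sensible" used here (all three values are powers of two, and one set fits in the cache) is an assumption; run <code>vg_cachegen -h</code> for the values it really accepts.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    unsigned long size;       /* total cache size in bytes */
    unsigned long line_size;  /* line size in bytes */
    unsigned long assoc;      /* associativity (number of ways) */
} CacheSpec;

static int is_pow2(unsigned long x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Parse "size,line_size,associativity". Returns 0 on success and -1
   if the string is malformed or fails the (assumed) sanity checks. */
int parse_cache_spec(const char *s, CacheSpec *out)
{
    char trailing;
    assert(s != NULL && out != NULL);
    /* Exactly three numbers must be read; a fourth conversion
       succeeding means there was trailing junk. */
    if (sscanf(s, "%lu,%lu,%lu%c",
               &out->size, &out->line_size, &out->assoc, &trailing) != 3)
        return -1;
    if (!is_pow2(out->size) || !is_pow2(out->line_size) || !is_pow2(out->assoc))
        return -1;
    if (out->line_size * out->assoc > out->size)
        return -1;
    return 0;
}

/* Number of sets: total lines divided by the ways per set. */
unsigned long num_sets(const CacheSpec *c)
{
    return c->size / (c->line_size * c->assoc);
}
```

For the example invocation above, both <code>65536,64,2</code> and <code>262144,64,8</code> work out to 512 sets.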
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2035
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2036	If you don't know your cache configuration, you'll have to find it
				2037	out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
				2038	configuration using the CPUID instruction, which could be done
				2039	automatically during installation, and this whole step could be
				2040	skipped.)<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2041
				2042
				2043	<h3>7.4  Cache simulation specifics</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2044
				2045	<code>vg_cachegen</code> only generates simulations for a machine with
				2046	a split L1 cache and a unified L2 cache. This configuration is used
				2047	for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
				2048	had a unified I and D L1 cache, but they are ancient history now.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2049
				2050	The more specific characteristics of the simulation are as follows.
				2051
				2052	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2053	<li>Write-allocate: when a write miss occurs, the block written to
				2054	is brought into the D1 cache. Most modern caches have this
				2055	property.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2056
<li>Bit-selection hash function: the set of line(s) in the cache to which a
memory block maps is chosen by the middle bits M--(M+N-1) of the
byte address, where:
<ul>
<li>line size = 2^M bytes</li>
<li>(cache size / line size / associativity) = 2^N sets</li>
</ul> </li><p>
				2064
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2065	<li>Inclusive L2 cache: the L2 cache replicates all the entries of
				2066	the L1 cache. This is standard on Pentium chips, but AMD
				2067	Athlons use an exclusive L2 cache that only holds blocks evicted
				2068	from L1. Ditto AMD Durons and most modern VIAs.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2069	</ul>
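<p>The bit-selection rule can be written out as a short function. This is a sketch, not the generated simulator code: <code>set_index</code> and <code>log2_exact</code> are hypothetical names, and it assumes that N counts sets, ie. that the number of lines is divided by the associativity, and that all three parameters are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power of two; asserts if x is not one. */
static unsigned log2_exact(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1u)) == 0);
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Select bits M..(M+N-1) of the byte address, where
   line size = 2^M and number of sets = 2^N. */
uint32_t set_index(uint32_t addr, uint32_t cache_size,
                   uint32_t line_size, uint32_t assoc)
{
    unsigned m = log2_exact(line_size);
    unsigned n = log2_exact(cache_size / line_size / assoc);
    return (addr >> m) & ((1u << n) - 1u);
}
```

For a 65536-byte, 2-way cache with 64-byte lines, M is 6 and N is 9, so addresses 0 to 63 map to set 0, address 64 maps to set 1, and address 64*512 wraps round to set 0 again.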
				2070
				2071	Other noteworthy behaviour:
				2072
				2073	<ul>
				2074	<li>References that straddle two cache lines are treated as follows:</li>
				2075	<ul>
				2076	<li>If both blocks hit --> counted as one hit</li>
				2077	<li>If one block hits, the other misses --> counted as one miss</li>
				2078	<li>If both blocks miss --> counted as one miss (not two)</li>
				2079	</ul><p>
				2080
				2081	<li>Instructions that modify a memory location (eg. <code>inc</code> and
				2082	<code>dec</code>) are counted as doing just a read, ie. a single data
				2083	reference. This may seem strange, but since the write can never cause a
				2084	miss (the read guarantees the block is in the cache) it's not very
				2085	interesting.<p>
				2086
				2087	Thus it measures not the number of times the data cache is accessed, but
				2088	the number of times a data cache miss could occur.<p>
				2089	</li>
				2090	</ul>
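<p>The straddling rule above can be illustrated with a toy cache. This sketch makes several assumptions that are not taken from the simulator source: the cache is tiny (4 sets, 2 ways, 64-byte lines), replacement is LRU, and the names <code>ToyCache</code>, <code>touch_line</code> and <code>reference_misses</code> are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64u
#define NUM_SETS  4u
#define ASSOC     2u

typedef struct {
    uint32_t tags[NUM_SETS][ASSOC];   /* way 0 is the most recently used */
    int      valid[NUM_SETS][ASSOC];
} ToyCache;

void cache_init(ToyCache *c) { memset(c, 0, sizeof *c); }

/* Touch the line holding addr; returns 1 on a miss, 0 on a hit.
   Either way the line ends up most recently used (assumed LRU). */
int touch_line(ToyCache *c, uint32_t addr)
{
    uint32_t block = addr / LINE_SIZE;
    uint32_t set   = block % NUM_SETS;
    uint32_t tag   = block / NUM_SETS;
    unsigned way, hit = ASSOC;

    for (way = 0; way < ASSOC; way++) {
        if (c->valid[set][way] && c->tags[set][way] == tag) {
            hit = way;
            break;
        }
    }
    int miss = (hit == ASSOC);
    /* On a miss the least recently used way is overwritten. */
    unsigned last = miss ? ASSOC - 1u : hit;
    for (way = last; way > 0; way--) {
        c->tags[set][way]  = c->tags[set][way - 1];
        c->valid[set][way] = c->valid[set][way - 1];
    }
    c->tags[set][0]  = tag;
    c->valid[set][0] = 1;
    return miss;
}

/* One reference of `size` bytes: counted as at most one miss, even if
   it straddles two lines and both of them miss. */
int reference_misses(ToyCache *c, uint32_t addr, uint32_t size)
{
    assert(size >= 1 && size <= LINE_SIZE);
    int miss = touch_line(c, addr);
    uint32_t last_byte = addr + size - 1u;
    if (last_byte / LINE_SIZE != addr / LINE_SIZE)
        miss |= touch_line(c, last_byte);
    return miss;
}
```

A 4-byte reference at address 62 touches two lines but contributes a single miss on a cold cache, and a single hit when repeated.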
				2091
				2092	If you are interested in simulating a cache with different properties, it is
				2093	not particularly hard to write your own cache simulator, or to modify existing
ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
				2096	does.
				2097
				2098
				2099	<a name="profile"></a>
				2100	<h3>7.5  Profiling programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2101
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2102	Cache profiling is enabled by using the <code>--cachesim=yes</code>
				2103	option to the <code>valgrind</code> shell script. Alternatively, it
				2104	is probably more convenient to use the <code>cachegrind</code> script.
				2105	This automatically turns off Valgrind's memory checking functions,
				2106	since the cache simulation is slow enough already, and you probably
				2107	don't want to do both at once.
				2108	<p>
To gather cache profiling information about the program <code>ls
-l</code>, type:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2111
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2112	<blockquote><code>cachegrind ls -l</code></blockquote>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2113
				2114	The program will execute (slowly). Upon completion, summary statistics
				2115	that look like this will be printed:
				2116
				2117	<pre>
				2118	==31751== I refs: 27,742,716
				2119	==31751== I1 misses: 276
				2120	==31751== L2 misses: 275
				2121	==31751== I1 miss rate: 0.0%
				2122	==31751== L2i miss rate: 0.0%
				2123	==31751==
				2124	==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
				2125	==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
				2126	==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
				2127	==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
				2128	==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
				2129	==31751==
				2130	==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
				2131	==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
				2132	</pre>
				2133
				2134	Cache accesses for instruction fetches are summarised first, giving the
				2135	number of fetches made (this is the number of instructions executed, which
				2136	can be useful to know in its own right), the number of I1 misses, and the
				2137	number of L2 instruction (<code>L2i</code>) misses.<p>
				2138
				2139	Cache accesses for data follow. The information is similar to that of the
				2140	instruction fetches, except that the values are also shown split between reads
				2141	and writes (note each row's <code>rd</code> and <code>wr</code> values add up
				2142	to the row's total).<p>
				2143
				2144	Combined instruction and data figures for the L2 cache follow that.<p>
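<p>The miss rates in the summary are simply misses divided by references. The sketch below reproduces the sample figures; the names <code>miss_rate_tenths</code> and <code>format_miss_rate</code> are hypothetical, and truncating (rather than rounding) to one decimal place is an inference from the sample output above, where 21,905 read misses in 10,955,517 reads is shown as 0.1%.

```c
#include <assert.h>
#include <stdio.h>

/* Miss rate in tenths of a percent, truncated: 2 means 0.2%.
   Truncation is an assumption inferred from the sample output. */
unsigned long long miss_rate_tenths(unsigned long long misses,
                                    unsigned long long refs)
{
    if (refs == 0)
        return 0;
    return misses * 1000ull / refs;
}

/* Format a miss rate as eg. "0.2%" into buf. */
void format_miss_rate(char *buf, size_t len,
                      unsigned long long misses, unsigned long long refs)
{
    unsigned long long t = miss_rate_tenths(misses, refs);
    snprintf(buf, len, "%llu.%llu%%", t / 10, t % 10);
}
```

With the figures above, the D1 line works out as 41,185 / 15,430,290, which formats as <code>0.2%</code>, split into 0.1% for reads and 0.4% for writes.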
				2145
				2146
				2147	<h3>7.6  Output file</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2148
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2149	As well as printing summary information, Cachegrind also writes
				2150	line-by-line cache profiling information to a file named
				2151	<code>cachegrind.out</code>. This file is human-readable, but is best
				2152	interpreted by the accompanying program <code>vg_annotate</code>,
				2153	described in the next section.
				2154	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2155	Things to note about the <code>cachegrind.out</code> file:
				2156	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2157	<li>It is written every time <code>valgrind --cachesim=yes</code> or
				2158	<code>cachegrind</code> is run, and will overwrite any existing
				2159	<code>cachegrind.out</code> in the current directory.</li>
				2160	<p>
				2161	<li>It can be huge: <code>ls -l</code> generates a file of about
				2162	350KB. Browsing a few files and web pages with a Konqueror
				2163	built with full debugging information generates a file
				2164	of around 15 MB.</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2165	</ul>
				2166
				2167
				2168	<a name="annotate"></a>
				2169	<h3>7.7  Annotating C/C++ programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2170
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2171	Before using <code>vg_annotate</code>, it is worth widening your
				2172	window to be at least 120-characters wide if possible, as the output
				2173	lines can be quite long.
				2174	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2175	To get a function-by-function summary, run <code>vg_annotate</code> in
a directory containing a <code>cachegrind.out</code> file. The output
				2177	looks like this:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2178
				2179	<pre>
				2180	--------------------------------------------------------------------------------
				2181	I1 cache: 65536 B, 64 B, 2-way associative
				2182	D1 cache: 65536 B, 64 B, 2-way associative
				2183	L2 cache: 262144 B, 64 B, 8-way associative
				2184	Command: concord vg_to_ucode.c
				2185	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2186	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2187	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2188	Threshold: 99%
				2189	Chosen for annotation:
				2190	Auto-annotation: on
				2191
				2192	--------------------------------------------------------------------------------
				2193	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2194	--------------------------------------------------------------------------------
				2195	27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
				2196
				2197	--------------------------------------------------------------------------------
				2198	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
				2199	--------------------------------------------------------------------------------
				2200	8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
				2201	5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
				2202	2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
				2203	2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
				2204	2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
				2205	1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
				2206	897,991 51 51 897,831 95 30 62 1 1 ???:???
				2207	598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
				2208	598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
				2209	598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
				2210	446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
				2211	341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
				2212	320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
				2213	298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
				2214	149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
				2215	149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
				2216	95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
				2217	85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
				2218	</pre>
				2219
				2220	First up is a summary of the annotation options:
				2221
				2222	<ul>
<li>I1 cache, D1 cache, L2 cache: the cache configuration, so you know the
configuration with which these results were obtained.</li><p>
				2225
				2226	<li>Command: the command line invocation of the program under
				2227	examination.</li><p>
				2228
				2229	<li>Events recorded: event abbreviations are:<p>
				2230	<ul>
				2231	<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
				2232	<li><code>I1mr</code>: I1 cache read misses</li>
				2233	<li><code>I2mr</code>: L2 cache instruction read misses</li>
				2234	<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
				2235	<li><code>D1mr</code>: D1 cache read misses</li>
				2236	<li><code>D2mr</code>: L2 cache data read misses</li>
				2237	<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
				2238	<li><code>D1mw</code>: D1 cache write misses</li>
				2239	<li><code>D2mw</code>: L2 cache data write misses</li>
				2240	</ul><p>
Note that D1 total misses is given by <code>D1mr</code> +
<code>D1mw</code>, and that L2 total misses is given by
<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
				2244
				2245	<li>Events shown: the events shown (a subset of events gathered). This can
				2246	be adjusted with the <code>--show</code> option.</li><p>
				2247
				2248	<li>Event sort order: the sort order in which functions are shown. For
				2249	example, in this case the functions are sorted from highest
				2250	<code>Ir</code> counts to lowest. If two functions have identical
				2251	<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
				2252	counts, and so on. This order can be adjusted with the
				2253	<code>--sort</code> option.<p>
				2254
				2255	Note that this dictates the order the functions appear. It is <b>not</b>
				2256	the order in which the columns appear; that is dictated by the "events
shown" line (and can be changed with the <code>--show</code> option).
				2258	</li><p>
				2259
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2260	<li>Threshold: <code>vg_annotate</code> by default omits functions
				2261	that cause very low numbers of misses to avoid drowning you in
information. In this case, vg_annotate shows summaries for the
				2263	functions that account for 99% of the <code>Ir</code> counts;
				2264	<code>Ir</code> is chosen as the threshold event since it is the
				2265	primary sort event. The threshold can be adjusted with the
				2266	<code>--threshold</code> option.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2267
				2268	<li>Chosen for annotation: names of files specified manually for annotation;
				2269	in this case none.</li><p>
				2270
				2271	<li>Auto-annotation: whether auto-annotation was requested via the
				2272	<code>--auto=yes</code> option. In this case no.</li><p>
				2273	</ul>
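<p>The sort order and threshold described above can be sketched with <code>qsort</code>. This is an illustration only, not <code>vg_annotate</code>'s code: <code>FuncRow</code>, <code>cmp_rows</code> and <code>rows_to_show</code> are hypothetical names, only three events are modelled, and the threshold is assumed to be cumulative, ie. rows are shown until they account for the given percentage of the primary event's total.

```c
#include <assert.h>
#include <stdlib.h>

#define N_EVENTS 3   /* toy subset of events: Ir, I1mr, I2mr */

typedef struct {
    const char   *name;               /* file:function */
    unsigned long counts[N_EVENTS];   /* in sort-key order */
} FuncRow;

/* qsort comparator: highest count first on the first event; ties are
   broken by the second event, then the third. */
int cmp_rows(const void *a, const void *b)
{
    const FuncRow *x = a, *y = b;
    for (int e = 0; e < N_EVENTS; e++) {
        if (x->counts[e] != y->counts[e])
            return x->counts[e] > y->counts[e] ? -1 : 1;
    }
    return 0;
}

/* Sort the rows, then return how many leading rows are needed to
   account for `percent` of `total` on the primary sort event. */
size_t rows_to_show(FuncRow *rows, size_t n,
                    unsigned long total, unsigned percent)
{
    unsigned long long running = 0;
    size_t shown = 0;
    qsort(rows, n, sizeof rows[0], cmp_rows);
    while (shown < n &&
           running * 100ull < (unsigned long long)total * percent) {
        running += rows[shown].counts[0];
        shown++;
    }
    return shown;
}
```

In the listing above, <code>__flockfile</code> and <code>__funlockfile</code> have identical <code>Ir</code> counts, so the comparator falls through to <code>I1mr</code>, which places <code>__flockfile</code> first.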
				2274
Then follow summary statistics for the whole program. These are similar
				2276	to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
				2277
Then follow function-by-function statistics. Each function is
identified by a <code>file_name:function_name</code> pair. If a column
contains only a dot, it means the function never performs
that event (eg. the third row shows that <code>strcmp()</code>
contains no instructions that write to memory). The name
<code>???</code> is used if the file name and/or function name
could not be determined from debugging information. If most of the
entries have the form <code>???:???</code>, the program probably wasn't
compiled with <code>-g</code>. <p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2287
				2288	It is worth noting that functions will come from three types of source files:
				2289	<ol>
				2290	<li> From the profiled program (<code>concord.c</code> in this example).</li>
				2291	<li>From libraries (eg. <code>getc.c</code>)</li>
				2292	<li>From Valgrind's implementation of some libc functions (eg.
				2293	<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
				2294	the filename begins with <code>vg_</code>, and is probably one of
				2295	<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
				2296	<code>vg_mylibc.c</code>.
				2297	</li>
				2298	</ol>
				2299
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2300	There are two ways to annotate source files -- by choosing them
				2301	manually, or with the <code>--auto=yes</code> option. To do it
				2302	manually, just specify the filenames as arguments to
				2303	<code>vg_annotate</code>. For example, the output from running
				2304	<code>vg_annotate concord.c</code> for our example produces the same
				2305	output as above followed by an annotated version of
				2306	<code>concord.c</code>, a section of which looks like:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2307
				2308	<pre>
				2309	--------------------------------------------------------------------------------
				2310	-- User-annotated source: concord.c
				2311	--------------------------------------------------------------------------------
				2312	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2313
				2314	[snip]
				2315
. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
				2317	3 1 1 . . . 1 0 0 {
				2318	. . . . . . . . . FILE *file_ptr;
				2319	. . . . . . . . . Word_Info *data;
				2320	1 0 0 . . . 1 1 1 int line = 1, i;
				2321	. . . . . . . . .
				2322	5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
				2323	. . . . . . . . .
				2324	4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
				2325	3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
				2326	. . . . . . . . .
				2327	. . . . . . . . . /* Open file, check it. */
				2328	6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
				2329	2 0 0 1 0 0 . . . if (!(file_ptr)) {
				2330	. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
				2331	1 1 1 . . . . . . exit(EXIT_FAILURE);
				2332	. . . . . . . . . }
				2333	. . . . . . . . .
				2334	165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
				2336	. . . . . . . . .
				2337	4 0 0 1 0 0 2 0 0 free(data);
				2338	4 0 0 1 0 0 2 0 0 fclose(file_ptr);
				2339	3 0 0 2 0 0 . . . }
				2340	</pre>
				2341
				2342	(Although column widths are automatically minimised, a wide terminal is clearly
				2343	useful.)<p>
				2344
				2345	Each source file is clearly marked (<code>User-annotated source</code>) as
				2346	having been chosen manually for annotation. If the file was found in one of
				2347	the directories specified with the <code>-I</code>/<code>--include</code>
				2348	option, the directory and file are both given.<p>
				2349
				2350	Each line is annotated with its event counts. Events not applicable for a line
				2351	are represented by a `.'; this is useful for distinguishing between an event
				2352	which cannot happen, and one which can but did not.<p>
				2353
				2354	Sometimes only a small section of a source file is executed. To minimise
				2355	uninteresting output, Valgrind only shows annotated lines and lines within a
				2356	small distance of annotated lines. Gaps are marked with the line numbers so
				2357	you know which part of a file the shown code comes from, eg:
				2358
				2359	<pre>
				2360	(figures and code for line 704)
				2361	-- line 704 ----------------------------------------
				2362	-- line 878 ----------------------------------------
				2363	(figures and code for line 878)
				2364	</pre>
				2365
				2366	The amount of context to show around annotated lines is controlled by the
				2367	<code>--context</code> option.<p>
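<p>The decision about which lines to print can be sketched as follows. This is a toy model, not <code>vg_annotate</code>'s implementation: <code>line_is_shown</code> and <code>count_gaps</code> are hypothetical names, and the context is assumed to be a simple count of lines on either side of each annotated line.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `line` lies within `context` lines of any annotated
   line, 0 otherwise. Line numbers start at 1. */
int line_is_shown(unsigned line, const unsigned *annotated, size_t n,
                  unsigned context)
{
    for (size_t i = 0; i < n; i++) {
        unsigned a  = annotated[i];
        unsigned lo = a > context ? a - context : 1u;
        if (line >= lo && line <= a + context)
            return 1;
    }
    return 0;
}

/* Counts the runs of hidden lines between `first` and `last`; each
   run is where a pair of "-- line N --" markers would be printed. */
size_t count_gaps(unsigned first, unsigned last,
                  const unsigned *annotated, size_t n, unsigned context)
{
    size_t gaps = 0;
    int in_gap = 0;
    for (unsigned line = first; line <= last; line++) {
        int shown = line_is_shown(line, annotated, n, context);
        if (!shown && !in_gap)
            gaps++;
        in_gap = !shown;
    }
    return gaps;
}
```

With annotated lines 704 and 878 and no context, everything from 705 to 877 forms a single gap, matching the example above.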
				2368
				2369	To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
				2370	vg_annotate will automatically annotate every source file it can find that is
				2371	mentioned in the function-by-function summary. Therefore, the files chosen for
				2372	auto-annotation are affected by the <code>--sort</code> and
				2373	<code>--threshold</code> options. Each source file is clearly marked
				2374	(<code>Auto-annotated source</code>) as being chosen automatically. Any files
				2375	that could not be found are mentioned at the end of the output, eg:
				2376
				2377	<pre>
				2378	--------------------------------------------------------------------------------
				2379	The following files chosen for auto-annotation could not be found:
				2380	--------------------------------------------------------------------------------
				2381	getc.c
				2382	ctype.c
				2383	../sysdeps/generic/lockfile.c
				2384	</pre>
				2385
This is quite common for library files, since libraries are usually compiled
with debugging information, but the source files are often not present on a
system. If a file is chosen for annotation <b>both</b> manually and
automatically, it is marked as <code>User-annotated source</code>.<p>

Use the <code>-I/--include</code> option to tell Valgrind where to look for
source files if the filenames found from the debugging information aren't
specific enough.<p>

Beware that vg_annotate can take some time to digest large
<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
auto-annotation can produce a lot of output if your program is large!
				2398
				2399
				2400	<h3>7.8  Annotating assembler programs</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2401
				2402	Valgrind can annotate assembler programs too, or annotate the
				2403	assembler generated for your C program. Sometimes this is useful for
				2404	understanding what is really happening when an interesting line of C
				2405	code is translated into multiple instructions.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2406
				2407	To do this, you just need to assemble your <code>.s</code> files with
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2408	assembler-level debug information. gcc doesn't do this, but you can
				2409	use the GNU assembler with the <code>--gstabs</code> option to
				2410	generate object files with this information, eg:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2411
				2412	<blockquote><code>as --gstabs foo.s</code></blockquote>
				2413
				2414	You can then profile and annotate source files in the same way as for C/C++
				2415	programs.
				2416
				2417
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2418	<h3>7.9  <code>vg_annotate</code> options</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2419	<ul>
				2420	<li><code>-h, --help</code></li><p>
				2421	<li><code>-v, --version</code><p>
				2422
				2423	Help and version, as usual.</li>
				2424
				2425	<li><code>--sort=A,B,C</code> [default: order in
				2426	<code>cachegrind.out</code>]<p>
				2427	Specifies the events upon which the sorting of the function-by-function
				2428	entries will be based. Useful if you want to concentrate on eg. I cache
				2429	misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
				2430	(<code>--sort=D1mr,D2mr</code>), or L2 misses
				2431	(<code>--sort=D2mr,I2mr</code>).</li><p>
				2432
				2433	<li><code>--show=A,B,C</code> [default: all, using order in
				2434	<code>cachegrind.out</code>]<p>
				2435	Specifies which events to show (and the column order). Default is to use
				2436	all present in the <code>cachegrind.out</code> file (and use the order in
				2437	the file).</li><p>
				2438
				2439	<li><code>--threshold=X</code> [default: 99%] <p>
				2440	Sets the threshold for the function-by-function summary. Functions are
				2441	shown that account for more than X% of all the primary sort events. If
				2442	auto-annotating, also affects which files are annotated.</li><p>
				2443
				2444	<li><code>--auto=no</code> [default]<br>
				2445	<code>--auto=yes</code> <p>
				2446	When enabled, automatically annotates every file that is mentioned in the
				2447	function-by-function summary that can be found. Also gives a list of
				2448	those that couldn't be found.</li><p>
				2449
				2450	<li><code>--context=N</code> [default: 8]<p>
				2451	Print N lines of context before and after each annotated line. Avoids
				2452	printing large sections of source files that were not executed. Use a
				2453	large number (eg. 10,000) to show all source lines.
				2454	</li><p>
				2455
				2456	<li><code>-I=&lt;dir&gt;, --include=&lt;dir&gt;</code>
				2457	[default: empty string]<p>
				2458	Adds a directory to the list in which to search for files. Multiple
				2459	-I/--include options can be given to add multiple directories.</li>
				2460	</ul>
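As a rough picture of what the <code>-I/--include</code> directories are for, the sketch below tries a filename as given and then under each include directory. The search order and the helper name <code>find_source</code> are assumptions made for illustration; the manual does not specify how vg_annotate orders its search.<p>

```python
import os


def find_source(filename, include_dirs):
    """Return the first existing path for filename, or None if not found.

    Hypothetical search order: the name as recorded in the debug info
    first, then the name joined onto each include directory in turn.
    """
    candidates = [filename] + [os.path.join(d, filename) for d in include_dirs]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

A file for which this returns <code>None</code> would be one of those listed as not found at the end of the output.<p>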
				2461
				2462
				2463	<h3>7.10  Warnings</h3>
				2464	There are a couple of situations in which vg_annotate issues warnings.
				2465
				2466	<ul>
				2467	<li>If a source file is more recent than the <code>cachegrind.out</code>
				2468	file. This is because the information in <code>cachegrind.out</code> is
				2469	only recorded with line numbers, so if the line numbers change at all in
				2470	the source (eg. lines added, deleted, swapped), any annotations will be
				2471	incorrect.<p>
				2472
				2473	<li>If information is recorded about line numbers past the end of a file.
				2474	This can be caused by the above problem, ie. shortening the source file
				2475	while using an old <code>cachegrind.out</code> file. If this happens,
				2476	the figures for the bogus lines are printed anyway (clearly marked as
				2477	bogus) in case they are important.</li><p>
				2478	</ul>
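The two conditions above amount to simple checks on a source file against the profile data. The sketch below shows one way to express them; it is an assumption-laden illustration (the function <code>check_source</code>, its parameters and its message strings are invented here), not the checks vg_annotate really performs.<p>

```python
import os


def check_source(source_path, profile_path, max_recorded_line):
    """Return a list of warnings for a source file that may not match
    the profile data recorded for it."""
    warnings = []
    # A source file modified after the profile was written may have
    # shifted line numbers, so annotations could land on the wrong lines.
    if os.path.getmtime(source_path) > os.path.getmtime(profile_path):
        warnings.append("source is newer than profile")
    # Counts recorded for lines beyond the end of the file are bogus.
    with open(source_path) as f:
        n_lines = sum(1 for _ in f)
    if max_recorded_line > n_lines:
        warnings.append("profile mentions lines past end of file")
    return warnings
```

Here <code>max_recorded_line</code> stands for the highest line number that the profile mentions for that file.<p>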
				2479
				2480
				2481	<h3>7.10  Things to watch out for</h3>
				2482	Some odd things that can occur during annotation:
				2483
				2484	<ul>
				2485	<li>If annotating at the assembler level, you might see something like this:
				2486
				2487	<pre>
				2488	1 0 0 . . . . . . leal -12(%ebp),%eax
				2489	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				2490	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				2491	. . . . . . . . . .align 4,0x90
				2492	1 0 0 . . . . . . movl $.LnrB,%eax
				2493	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
				2494	</pre>
				2495
				2496	How can the third instruction be executed twice when the others are
				2497	executed only once? As it turns out, it isn't. Here's a dump of the
				2498	executable, from objdump:
				2499
				2500	<pre>
				2501	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				2502	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				2503	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				2504	8048f32: 89 f6 mov %esi,%esi
				2505	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				2506	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
				2507	</pre>
				2508
				2509	Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
				2510	come from? The GNU assembler inserted it to serve as the two bytes of
				2511	padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
				2512	a four-byte boundary, but pretended it didn't exist when adding debug
				2513	information. Thus when Valgrind reads the debug info it thinks that the
				2514	<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
				2515	range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
				2516	<code>mov %esi,%esi</code> to it.<p>
				2517	</li>
				2518
				2519	<li>
				2520	Inlined functions can cause strange results in the function-by-function
				2521	summary. If a function <code>inline_me()</code> is defined in
				2522	<code>foo.h</code> and inlined in the functions <code>f1()</code>,
				2523	<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
				2524	not be a <code>foo.h:inline_me()</code> function entry. Instead, there
				2525	will be separate function entries for each inlining site, ie.
				2526	<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
				2527	<code>foo.h:f3()</code>. To find the total counts for
				2528	<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
				2529
				2530	The reason for this is that although the debug info output by gcc
				2531	indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
				2532	doesn't indicate the name of the function in <code>foo.h</code>, so
				2533	Valgrind keeps using the old one.<p>
				2534
				2535	<li>
				2536	Sometimes, the same filename might be represented with a relative name
				2537	and with an absolute name in different parts of the debug info, eg:
				2538	<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
				2539	case, if you use auto-annotation, the file will be annotated twice with
				2540	the counts split between the two.<p>
				2541	</li>
				2542	</ul>
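The address arithmetic behind the assembler-level example above can be checked by hand. The sketch below uses only the addresses and instruction bytes from the objdump listing; the helper name <code>padding_to_align</code> is made up for this illustration.<p>

```python
def padding_to_align(addr, alignment):
    """Number of padding bytes needed to round addr up to alignment."""
    return (-addr) % alignment


movl_start = 0x8048f2b   # address of movl $0x1,0xffffffec(%ebp)
movl_len = 7             # its encoding: c7 45 ec 01 00 00 00

next_addr = movl_start + movl_len        # first byte after the movl
pad = padding_to_align(next_addr, 4)     # bytes needed for .align 4
aligned = next_addr + pad                # where the next instruction lands

print(hex(next_addr), pad, hex(aligned))
```

The two padding bytes are exactly the size of <code>mov %esi,%esi</code> (<code>89 f6</code>), which is why its execution count is added to the preceding instruction.<p>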
				2543
				2544	Note: stabs is not an easy format to read. If you come across bizarre
				2545	annotations that look like they might be caused by a bug in the stabs reader,
				2546	please let us know.
				2547
				2548
				2549	<h3>7.11  Accuracy</h3>
				2550	Valgrind's cache profiling has a number of shortcomings:
				2551
				2552	<ul>
				2553	<li>It doesn't account for kernel activity -- the effect of system calls on
				2554	the cache contents is ignored.</li><p>
				2555
				2556	<li>It doesn't account for other process activity (although this is probably
				2557	desirable when considering a single program).</li><p>
				2558
				2559	<li>It doesn't account for virtual-to-physical address mappings; hence the
				2560	entire simulation is not a true representation of what's happening in the
				2561	cache.</li><p>
				2562
				2563	<li>It doesn't account for cache misses not visible at the instruction level,
				2564	eg. those arising from TLB misses, or speculative execution.</li><p>
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2565
				2566	<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
				2567	will incorrectly be counted as doing a data read if both the arguments
				2568	are registers, eg:
				2569
				2570	<blockquote><code>btsl %eax, %edx</code></blockquote>
				2571
				2572	This should only happen rarely.
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2573	</ul>
				2574
				2575	Another thing worth noting is that results are very sensitive. Changing the
				2576	size of the <code>valgrind.so</code> file, the size of the program being
				2577	profiled, or even the length of its name can perturb the results. Variations
				2578	will be small, but don't expect perfectly repeatable results if your program
				2579	changes at all.<p>
				2580
				2581	While these factors mean you shouldn't trust the results to be super-accurate,
				2582	hopefully they should be close enough to be useful.<p>
				2583
				2584
				2585	<h3>7.12  Todo</h3>
				2586	<ul>
				2587	<li>Use CPUID instruction to auto-identify cache configuration during
				2588	installation. This would save the user from having to know their cache
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2589	configuration and using vg_cachegen.</li>
				2590	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2591	<li>Program start-up/shut-down calls a lot of functions that aren't
				2592	interesting and just complicate the output. Would be nice to exclude
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2593	these somehow.</li>
				2594	<p>
				2595	<li>Handle files with more than 65535 lines.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2596	</ul>
				2597	<hr width="100%">
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	2598	</body>
				2599	</html>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2600