Blame - memcheck/docs/manual.html - platform/external/valgrind

sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1	<html>
				2	<head>
				3	<style type="text/css">
				4	body { background-color: #ffffff;
				5	color: #000000;
				6	font-family: Times, Helvetica, Arial;
				7	font-size: 14pt}
				8	h4 { margin-bottom: 0.3em}
				9	code { color: #000000;
				10	font-family: Courier;
				11	font-size: 13pt }
				12	pre { color: #000000;
				13	font-family: Courier;
				14	font-size: 13pt }
				15	a:link { color: #0000C0;
				16	text-decoration: none; }
				17	a:visited { color: #0000C0;
				18	text-decoration: none; }
				19	a:active { color: #0000C0;
				20	text-decoration: none; }
				21	</style>
				22	</head>
				23
				24	<body bgcolor="#ffffff">
				25
				26	<a name="title"> </a>
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	27	<h1 align=center>Valgrind, snapshot 20020324</h1>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	28	<center>This manual was minimally updated on 20020415</center>
				29	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	30
				31	<center>
				32	<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	33	Copyright © 2000-2002 Julian Seward
				34	<p>
				35	Valgrind is licensed under the GNU General Public License,
				36	version 2<br>
				37	An open-source tool for finding memory-management problems in
				38	Linux-x86 executables.
				39	</center>
				40
				41	<p>
				42
				43	<hr width="100%">
				44	<a name="contents"></a>
				45	<h2>Contents of this manual</h2>
				46
				47	<h4>1  <a href="#intro">Introduction</a></h4>
				48	1.1  <a href="#whatfor">What Valgrind is for</a><br>
				49	1.2  <a href="#whatdoes">What it does with your program</a>
				50
				51	<h4>2  <a href="#howtouse">How to use it, and how to make sense
				52	of the results</a></h4>
				53	2.1  <a href="#starta">Getting started</a><br>
				54	2.2  <a href="#comment">The commentary</a><br>
				55	2.3  <a href="#report">Reporting of errors</a><br>
				56	2.4  <a href="#suppress">Suppressing errors</a><br>
				57	2.5  <a href="#flags">Command-line flags</a><br>
				58	2.6  <a href="#errormsgs">Explanation of error messages</a><br>
				59	2.7  <a href="#suppfiles">Writing suppressions files</a><br>
				60	2.8  <a href="#install">Building and installing</a><br>
				61	2.9  <a href="#problems">If you have problems</a><br>
				62
				63	<h4>3  <a href="#machine">Details of the checking machinery</a></h4>
				64	3.1  <a href="#vvalue">Valid-value (V) bits</a><br>
				65	3.2  <a href="#vaddress">Valid-address (A) bits</a><br>
				66	3.3  <a href="#together">Putting it all together</a><br>
				67	3.4  <a href="#signals">Signals</a><br>
				68	3.5  <a href="#leaks">Memory leak detection</a><br>
				69
				70	<h4>4  <a href="#limits">Limitations</a></h4>
				71
				72	<h4>5  <a href="#howitworks">How it works -- a rough overview</a></h4>
				73	5.1  <a href="#startb">Getting started</a><br>
				74	5.2  <a href="#engine">The translation/instrumentation engine</a><br>
				75	5.3  <a href="#track">Tracking the status of memory</a><br>
				76	5.4  <a href="#sys_calls">System calls</a><br>
				77	5.5  <a href="#sys_signals">Signals</a><br>
				78
				79	<h4>6  <a href="#example">An example</a></h4>
				80
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	81	<h4>7  <a href="#cache">Cache profiling</a></h4>
				82
				83	<h4>8  <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	84
				85	<hr width="100%">
				86
				87	<a name="intro"></a>
				88	<h2>1  Introduction</h2>
				89
				90	<a name="whatfor"></a>
				91	<h3>1.1  What Valgrind is for</h3>
				92
				93	Valgrind is a tool to help you find memory-management problems in your
				94	programs. When a program is run under Valgrind's supervision, all
				95	reads and writes of memory are checked, and calls to
				96	malloc/new/free/delete are intercepted. As a result, Valgrind can
				97	detect problems such as:
				98	<ul>
				99	<li>Use of uninitialised memory</li>
				100	<li>Reading/writing memory after it has been free'd</li>
				101	<li>Reading/writing off the end of malloc'd blocks</li>
				102	<li>Reading/writing inappropriate areas on the stack</li>
				103	<li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li>
				104	</ul>
				105
				106	Problems like these can be difficult to find by other means, often
				107	lying undetected for long periods, then causing occasional,
				108	difficult-to-diagnose crashes.
				109
				110	<p>
				111	Valgrind is closely tied to details of the CPU, operating system and,
				112	to a lesser extent, the compiler and basic C libraries. This makes it
				113	difficult to make it portable, so I have chosen at the outset to
				114	concentrate on what I believe to be a widely used platform: Red Hat
				115	Linux 7.2, on x86s. I believe that it will work without significant
				116	difficulty on other x86 GNU/Linux systems which use the 2.4 kernel and
				117	GNU libc 2.2.X, for example SuSE 7.1 and Mandrake 8.0. It has worked
				118	in the past, and probably still does, on Red Hat 7.1 and 6.2. Note
				119	that I haven't compiled it on Red Hat 7.1 and 6.2 for a while, so it
				120	may no longer work there.
				121	<p>
				122	(Early Feb 02: after feedback from the KDE people it also works better
				123	on other Linuxes).
				124	<p>
				125	At some point in the past, Valgrind has also worked on Red Hat 6.2
				126	(x86), thanks to the efforts of Rob Noble.
				127
				128	<p>
				129	Valgrind is licensed under the GNU General Public License, version
				130	2. Read the file LICENSE in the source distribution for details.
				131
				132	<a name="whatdoes"></a>
				133	<h3>1.2  What it does with your program</h3>
				134
				135	Valgrind is designed to be as non-intrusive as possible. It works
				136	directly with existing executables. You don't need to recompile,
				137	relink, or otherwise modify, the program to be checked. Simply place
				138	the word <code>valgrind</code> at the start of the command line
				139	normally used to run the program. So, for example, if you want to run
				140	the command <code>ls -l</code> on Valgrind, simply issue the
				141	command: <code>valgrind ls -l</code>.
				142
				143	<p>Valgrind takes control of your program before it starts. Debugging
				144	information is read from the executable and associated libraries, so
				145	that error messages can be phrased in terms of source code
				146	locations. Your program is then run on a synthetic x86 CPU which
				147	checks every memory access. All detected errors are written to a
				148	log. When the program finishes, Valgrind searches for and reports on
				149	leaked memory.
				150
				151	<p>You can run pretty much any dynamically linked ELF x86 executable using
				152	Valgrind. Programs run 25 to 50 times slower, and take a lot more
				153	memory, than they usually would. It works well enough to run large
				154	programs. For example, the Konqueror web browser from the KDE Desktop
				155	Environment, version 2.1.1, runs slowly but usably on Valgrind.
				156
				157	<p>Valgrind simulates every single instruction your program executes.
				158	Because of this, it finds errors not only in your application but also
				159	in all supporting dynamically-linked (.so-format) libraries, including
				160	the GNU C library, the X client libraries, Qt, if you work with KDE, and
				161	so on. That often includes libraries, for example the GNU C library,
				162	which contain memory access violations, but which you cannot or do not
				163	want to fix.
				164
				165	<p>Rather than swamping you with errors in which you are not
				166	interested, Valgrind allows you to selectively suppress errors, by
				167	recording them in a suppressions file which is read when Valgrind
				168	starts up. As supplied, Valgrind comes with a suppressions file
				169	designed to give reasonable behaviour on Red Hat 7.2 (also 7.1 and
				170	6.2) when running text-only and simple X applications.
				171
				172	<p><a href="#example">Section 6</a> shows an example of use.
				173	<p>
				174	<hr width="100%">
				175
				176	<a name="howtouse"></a>
				177	<h2>2  How to use it, and how to make sense of the results</h2>
				178
				179	<a name="starta"></a>
				180	<h3>2.1  Getting started</h3>
				181
				182	First off, consider whether it might be beneficial to recompile your
				183	application and supporting libraries with optimisation disabled and
				184	debugging info enabled (the <code>-g</code> flag). You don't have to
				185	do this, but doing so helps Valgrind produce more accurate and less
				186	confusing error reports. Chances are you're set up like this already,
				187	if you intended to debug your program with GNU gdb, or some other
				188	debugger.
				189
				190	<p>Then just run your application, but place the word
				191	<code>valgrind</code> in front of your usual command-line invocation.
				192	Note that you should run the real (machine-code) executable here. If
				193	your application is started by, for example, a shell or perl script,
				194	you'll need to modify it to invoke Valgrind on the real executables.
				195	Running such scripts directly under Valgrind will result in you
				196	getting error reports pertaining to <code>/bin/sh</code>,
				197	<code>/usr/bin/perl</code>, or whatever interpreter you're using.
				198	This almost certainly isn't what you want and can be hugely confusing.
				199
				200	<a name="comment"></a>
				201	<h3>2.2  The commentary</h3>
				202
				203	Valgrind writes a commentary, detailing error reports and other
				204	significant events. The commentary goes to standard error by
				205	default. This may interfere with your program's own use of stderr, so
				206	you can ask for it to be directed elsewhere (see <code>--logfile-fd</code>).
				207
				208	<p>All lines in the commentary are of the following form:<br>
				209	<pre>
				210	==12345== some-message-from-Valgrind
				211	</pre>
				212	<p>The <code>12345</code> is the process ID. This scheme makes it easy
				213	to distinguish program output from Valgrind commentary, and also easy
				214	to differentiate commentaries from different processes which have
				215	become merged together, for whatever reason.
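
<p>Because every commentary line carries the <code>==PID==</code>
prefix, a log that mixes program output with commentary can be
separated mechanically. The following is a minimal sketch in C and is
not part of Valgrind. The helper name
<code>parse_commentary_line</code> is hypothetical, and the sketch
assumes that exactly the marker <code>==</code>, a decimal process ID,
<code>==</code> and one space start the line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Valgrind: splits a commentary line
   of the form "==PID== message". Returns 1 and fills *pid and *msg on
   success, or 0 if the line looks like ordinary program output. */
int parse_commentary_line(const char *line, long *pid, const char **msg)
{
    char *end;
    long value;

    assert(line != NULL && pid != NULL && msg != NULL);

    /* The line must open with the "==" marker. */
    if (strncmp(line, "==", 2) != 0)
        return 0;

    /* The process ID must start with a decimal digit. */
    if (line[2] < '0' || line[2] > '9')
        return 0;
    value = strtol(line + 2, &end, 10);

    /* The ID must be followed by the closing "== " marker. */
    if (strncmp(end, "== ", 3) != 0)
        return 0;

    *pid = value;
    *msg = end + 3;   /* points into the caller's buffer, not a copy */
    return 1;
}
```

<p>A filter built on this could route lines for which the function
returns 1 to one file, keyed by process ID, and pass everything else
through as program output.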
				216
				217	<p>By default, Valgrind writes only essential messages to the commentary,
				218	so as to avoid flooding you with information of secondary importance.
				219	If you want more information about what is happening, re-run, passing
				220	the <code>-v</code> flag to Valgrind.
				221
				222
				223	<a name="report"></a>
				224	<h3>2.3  Reporting of errors</h3>
				225
				226	When Valgrind detects something bad happening in the program, an error
				227	message is written to the commentary. For example:<br>
				228	<pre>
				229	==25832== Invalid read of size 4
				230	==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
				231	==25832== by 0x80487AF: main (bogon.cpp:66)
				232	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				233	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				234	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				235	</pre>
				236
				237	<p>This message says that the program did an illegal 4-byte read of
				238	address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
				239	address, nor corresponds to any currently malloc'd or free'd blocks.
				240	The read is happening at line 45 of <code>bogon.cpp</code>, called
				241	from line 66 of the same file, etc. For errors associated with an
				242	identified malloc'd/free'd block, for example reading free'd memory,
				243	Valgrind reports not only the location where the error happened, but
				244	also where the associated block was malloc'd/free'd.
				245
				246	<p>Valgrind remembers all error reports. When an error is detected,
				247	it is compared against old reports, to see if it is a duplicate. If
				248	so, the error is noted, but no further commentary is emitted. This
				249	avoids you being swamped with bazillions of duplicate error reports.
				250
				251	<p>If you want to know how many times each error occurred, run with
				252	the <code>-v</code> option. When execution finishes, all the reports
				253	are printed out, along with, and sorted by, their occurrence counts.
				254	This makes it easy to see which errors have occurred most frequently.
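
<p>The duplicate detection described above can be pictured roughly as
follows. This is a sketch under stated assumptions, not Valgrind's
actual code: the names <code>error_context</code> and
<code>record_error</code> are hypothetical, the fixed table size is
arbitrary, and comparing the error kind as well as the frames is an
assumption. The depth of three frames is taken from the description of
<code>--num-callers</code> in section 2.5.

```c
#include <assert.h>
#include <string.h>

#define MATCH_DEPTH  3    /* frames compared when commoning up errors */
#define MAX_CONTEXTS 64   /* arbitrary capacity for this sketch */

/* Hypothetical record of one distinct error. Strings are stored by
   pointer, so the caller must keep them alive. */
struct error_context {
    const char *kind;                 /* e.g. "Invalid read of size 4" */
    const char *frames[MATCH_DEPTH];  /* innermost first; NULL if absent */
    unsigned    count;                /* number of occurrences so far */
};

static struct error_context contexts[MAX_CONTEXTS];
static int n_contexts = 0;

/* Compares two possibly-NULL strings for equality. */
static int same_str(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Records an error. Returns 1 if it is new (and so would be shown in
   the commentary), or 0 if it duplicates an earlier report, in which
   case only the occurrence count is increased. */
int record_error(const char *kind, const char *const *frames)
{
    int i, j;

    assert(kind != NULL && frames != NULL);

    for (i = 0; i < n_contexts; i++) {
        int same = same_str(contexts[i].kind, kind);
        for (j = 0; same && j < MATCH_DEPTH; j++)
            same = same_str(contexts[i].frames[j], frames[j]);
        if (same) {
            contexts[i].count++;
            return 0;
        }
    }

    assert(n_contexts < MAX_CONTEXTS);
    contexts[n_contexts].kind = kind;
    for (j = 0; j < MATCH_DEPTH; j++)
        contexts[n_contexts].frames[j] = frames[j];
    contexts[n_contexts].count = 1;
    n_contexts++;
    return 1;
}
```

<p>Sorting the table by <code>count</code> at exit would give the kind
of summary that <code>-v</code> prints.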
				255
				256	<p>Errors are reported before the associated operation actually
				257	happens. For example, if your program decides to read from address
				258	zero, Valgrind will emit a message to this effect, and the program
				259	will then duly die with a segmentation fault.
				260
				261	<p>In general, you should try to fix errors in the order that they
				262	are reported. Not doing so can be confusing. For example, a program
				263	which copies uninitialised values to several memory locations, and
				264	later uses them, will generate several error messages. The first such
				265	error message may well give the most direct clue to the root cause of
				266	the problem.
				267
				268	<a name="suppress"></a>
				269	<h3>2.4  Suppressing errors</h3>
				270
				271	Valgrind detects numerous problems in the base libraries, such as the
				272	GNU C library, and the XFree86 client libraries, which come
				273	pre-installed on your GNU/Linux system. You can't easily fix these,
				274	but you don't want to see these errors (and yes, there are many!). So
				275	at startup Valgrind reads a list of errors to suppress from a suppressions file. By default
				276	this file is <code>redhat72.supp</code>, located in the Valgrind
				277	installation directory.
				278
				279	<p>You can modify and add to the suppressions file at your leisure, or
				280	write your own. Multiple suppression files are allowed. This is
				281	useful if part of your project contains errors you can't or don't want
				282	to fix, yet you don't want to continuously be reminded of them.
				283
				284	<p>Each error to be suppressed is described very specifically, to
				285	minimise the possibility that a suppression-directive inadvertently
				286	suppresses a bunch of similar errors which you did want to see. The
				287	suppression mechanism is designed to allow precise yet flexible
				288	specification of errors to suppress.
				289
				290	<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
				291	prints out one line for each used suppression, giving its name and the
				292	number of times it got used. Here's the suppressions used by a run of
				293	<code>ls -l</code>:
				294	<pre>
				295	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
				296	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
				297	--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
				298	</pre>
				299
				300	<a name="flags"></a>
				301	<h3>2.5  Command-line flags</h3>
				302
				303	You invoke Valgrind like this:
				304	<pre>
				305	valgrind [options-for-Valgrind] your-prog [options for your-prog]
				306	</pre>
				307
				308	<p>Valgrind's default settings succeed in giving reasonable behaviour
				309	in most cases. Available options, in no particular order, are as
				310	follows:
				311	<ul>
				312	<li><code>--help</code></li><br>
				313
				314	<li><code>--version</code><br>
				315	<p>The usual deal.</li><br><p>
				316
				317	<li><code>-v --verbose</code><br>
				318	<p>Be more verbose. Gives extra information on various aspects
				319	of your program, such as: the shared objects loaded, the
				320	suppressions used, the progress of the instrumentation engine,
				321	and warnings about unusual behaviour.
				322	</li><br><p>
				323
				324	<li><code>-q --quiet</code><br>
				325	<p>Run silently, and only print error messages. Useful if you
				326	are running regression tests or have some other automated test
				327	machinery.
				328	</li><br><p>
				329
				330	<li><code>--demangle=no</code><br>
				331	<code>--demangle=yes</code> [the default]
				332	<p>Disable/enable automatic demangling (decoding) of C++ names.
				333	Enabled by default. When enabled, Valgrind will attempt to
				334	translate encoded C++ procedure names back to something
				335	approaching the original. The demangler handles symbols mangled
				336	by g++ versions 2.X and 3.X.
				337
				338	<p>An important fact about demangling is that function
				339	names mentioned in suppressions files should be in their mangled
				340	form. Valgrind does not demangle function names when searching
				341	for applicable suppressions, because to do otherwise would make
				342	suppressions file contents dependent on the state of Valgrind's
				343	demangling machinery, and would also be slow and pointless.
				344	</li><br><p>
				345
				346	<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
				347	<p>By default, Valgrind shows four levels of function call names
				348	to help you identify program locations. You can change that
				349	number with this option. This can help in determining the
				350	program's location in deeply-nested call chains. Note that errors
				351	are commoned up using only the top three function locations (the
				352	place in the current function, and that of its two immediate
				353	callers). So this doesn't affect the total number of errors
				354	reported.
				355	<p>
				356	The maximum value for this is 50. Note that higher settings
				357	will make Valgrind run a bit more slowly and take a bit more
				358	memory, but can be useful when working with programs with
				359	deeply-nested call chains.
				360	</li><br><p>
				361
				362	<li><code>--gdb-attach=no</code> [the default]<br>
				363	<code>--gdb-attach=yes</code>
				364	<p>When enabled, Valgrind will pause after every error shown,
				365	and print the line
				366	<br>
				367	<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
				368	<p>
				369	Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
				370	or <code>n</code> <code>Ret</code>, causes Valgrind not to
				371	start GDB for this error.
				372	<p>
				373	<code>Y</code> <code>Ret</code>
				374	or <code>y</code> <code>Ret</code> causes Valgrind to
				375	start GDB, for the program at this point. When you have
				376	finished with GDB, quit from it, and the program will continue.
				377	Trying to continue from inside GDB doesn't work.
				378	<p>
				379	<code>C</code> <code>Ret</code>
				380	or <code>c</code> <code>Ret</code> causes Valgrind not to
				381	start GDB, and not to ask again.
				382	<p>
				383	<code>--gdb-attach=yes</code> conflicts with
				384	<code>--trace-children=yes</code>. You can't use them
				385	together. Valgrind refuses to start up in this situation.
				386	</li><br><p>
				387
				388	<li><code>--partial-loads-ok=yes</code> [the default]<br>
				389	<code>--partial-loads-ok=no</code>
				390	<p>Controls how Valgrind handles word (4-byte) loads from
				391	addresses for which some bytes are addressable and others
				392	are not. When <code>yes</code> (the default), such loads
				393	do not elicit an address error. Instead, the loaded V bytes
				394	corresponding to the illegal addresses indicate undefined, and
				395	those corresponding to legal addresses are loaded from shadow
				396	memory, as usual.
				397	<p>
				398	When <code>no</code>, loads from partially
				399	invalid addresses are treated the same as loads from completely
				400	invalid addresses: an illegal-address error is issued,
				401	and the resulting V bytes indicate valid data.
				402	</li><br><p>
				403
				404	<li><code>--sloppy-malloc=no</code> [the default]<br>
				405	<code>--sloppy-malloc=yes</code>
				406	<p>When enabled, all requests for malloc/calloc are rounded up
				407	to a whole number of machine words -- in other words, made
				408	divisible by 4. For example, a request for 17 bytes of space
				409	would result in a 20-byte area being made available. This works
				410	around bugs in sloppy libraries which assume that they can
				411	safely rely on malloc/calloc requests being rounded up in this
				412	fashion. Without the workaround, these libraries tend to
				413	generate large numbers of errors when they access the ends of
				414	these areas. Valgrind snapshots dated 17 Feb 2002 and later are
				415	cleverer about this problem, and you should no longer need to
				416	use this flag.
				417	</li><br><p>
				418
				419	<li><code>--trace-children=no</code> [the default]<br>
				420	<code>--trace-children=yes</code>
				421	<p>When enabled, Valgrind will trace into child processes. This
				422	is confusing and usually not what you want, so is disabled by
				423	default.</li><br><p>
				424
				425	<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
				426	<p>When the client program releases memory using free (in C) or
				427	delete (C++), that memory is not immediately made available for
				428	re-allocation. Instead it is marked inaccessible and placed in
				429	a queue of freed blocks. The purpose is to delay the point at
				430	which freed-up memory comes back into circulation. This
				431	increases the chance that Valgrind will be able to detect
				432	invalid accesses to blocks for some significant period of time
				433	after they have been freed.
				434	<p>
				435	This flag specifies the maximum total size, in bytes, of the
				436	blocks in the queue. The default value is one million bytes.
				437	Increasing this increases the total amount of memory used by
				438	Valgrind but may detect invalid uses of freed blocks which would
				439	otherwise go undetected.</li><br><p>
				440
				441	<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
				442	<p>Specifies the file descriptor on which Valgrind communicates
				443	all of its messages. The default, 2, is the standard error
				444	channel. This may interfere with the client's own use of
				445	stderr. To dump Valgrind's commentary in a file without using
				446	stderr, something like the following works well (sh/bash
				447	syntax):<br>
				448	<code>
				449	valgrind --logfile-fd=9 my_prog 9> logfile</code><br>
				450	That is: tell Valgrind to send all output to file descriptor 9,
				451	and ask the shell to route file descriptor 9 to "logfile".
				452	</li><br><p>
				453
				454	<li><code>--suppressions=&lt;filename&gt;</code> [default:
				455	/installation/directory/redhat72.supp] <p>Specifies an extra
				456	file from which to read descriptions of errors to suppress. You
				457	may use as many extra suppressions files as you
				458	like.</li><br><p>
				459
				460	<li><code>--leak-check=no</code> [default]<br>
				461	<code>--leak-check=yes</code>
				462	<p>When enabled, search for memory leaks when the client program
				463	finishes. A memory leak means a malloc'd block, which has not
				464	yet been free'd, but to which no pointer can be found. Such a
				465	block can never be free'd by the program, since no pointer to it
				466	exists. Leak checking is disabled by default
				467	because it tends to generate dozens of error messages.
				468	</li><br><p>
				469
				470	<li><code>--show-reachable=no</code> [default]<br>
				471	<code>--show-reachable=yes</code> <p>When disabled, the memory
				472	leak detector only shows blocks to which it cannot find any
				473	pointer at all, or to which it can only find a pointer into the
				474	middle. These blocks are prime candidates for memory leaks. When
				475	enabled, the leak detector also reports on blocks which it could
				476	find a pointer to. Your program could, at least in principle,
				477	have freed such blocks before exit. Contrast this to blocks for
				478	which no pointer, or only an interior pointer could be found:
				479	they are more likely to indicate memory leaks, because
				480	you do not actually have a pointer to the start of the block
				481	which you can hand to free(), even if you wanted to.
				482	</li><br><p>
				483
				484	<li><code>--leak-resolution=low</code> [default]<br>
				485	<code>--leak-resolution=med</code> <br>
				486	<code>--leak-resolution=high</code>
				487	<p>When doing leak checking, determines how willing Valgrind is
				488	to consider different backtraces the same. When set to
				489	<code>low</code>, the default, only the first two entries need
				490	match. When <code>med</code>, four entries have to match. When
				491	<code>high</code>, all entries need to match.
				492	<p>
				493	For hardcore leak debugging, you probably want to use
				494	<code>--leak-resolution=high</code> together with
				495	<code>--num-callers=40</code> or some such large number. Note
				496	however that this can give an overwhelming amount of
				497	information, which is why the defaults are 4 callers and
				498	low-resolution matching.
				499	<p>
				500	Note that the <code>--leak-resolution=</code> setting does not
				501	affect Valgrind's ability to find leaks. It only changes how
				502	the results are presented to you.
				503	</li><br><p>
				504
				505	<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
				506	<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
				507	assume that reads and writes some small distance below the stack
				508	pointer <code>%esp</code> are due to bugs in gcc 2.96, and do
				509	not report them. The "small distance" is 256 bytes by default.
				510	Note that gcc 2.96 is the default compiler on some popular Linux
				511	distributions (RedHat 7.X, Mandrake) and so you may well need to
				512	use this flag. Do not use it if you do not have to, as it can
				513	cause real errors to be overlooked. A better option is to use a
				514	gcc/g++ which works properly; 2.95.3 seems to be a good choice.
				515	<p>
				516	Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
				517	buggy, so you may need to issue this flag if you use 3.0.4.
				518	</li><br><p>
				519
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	520	<li><code>--cachesim=no</code> [default]<br>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	521	<code>--cachesim=yes</code> <p>When enabled, turns off memory
				522	checking, and turns on cache profiling. Cache profiling is
				523	described in detail in <a href="#cache">Section 7</a>. </li><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	524	</ul>
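
<p>The rounding performed by <code>--sloppy-malloc=yes</code> amounts
to simple arithmetic on the requested size. The following is a minimal
sketch, assuming 4-byte machine words as stated above; the helper name
<code>round_up_to_word</code> is hypothetical and is not Valgrind's
own code.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_SIZE 4u   /* bytes per machine word on x86 */

/* Hypothetical helper: rounds a request size up to a whole number of
   machine words, so that the result is divisible by WORD_SIZE. */
size_t round_up_to_word(size_t nbytes)
{
    /* Guard against wrap-around for requests near the maximum size. */
    assert(nbytes <= (size_t)-1 - (WORD_SIZE - 1));

    /* Add WORD_SIZE - 1, then clear the low bits. This relies on
       WORD_SIZE being a power of two. */
    return (nbytes + (WORD_SIZE - 1)) & ~(size_t)(WORD_SIZE - 1);
}
```

<p>With this helper, a request for 17 bytes becomes 20 bytes, matching
the example given for the flag, while a request that is already a
multiple of 4 is left unchanged.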
				525
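
<p>The effect of <code>--leak-resolution=</code> on backtrace matching
can be sketched in the same spirit. The code below is illustrative
only: the names <code>leak_resolution</code> and
<code>backtraces_match</code> are hypothetical, and the treatment of
backtraces shorter than the required depth (they match only if they
have the same length) is an assumption, since the text above does not
specify it.

```c
#include <assert.h>
#include <string.h>

enum leak_resolution { RES_LOW, RES_MED, RES_HIGH };

/* Returns 1 if two backtraces count as "the same" at the given
   resolution: the first two entries for low, the first four for med,
   and every entry for high. Entries are function names, innermost
   first. */
int backtraces_match(const char *const *a, int na,
                     const char *const *b, int nb,
                     enum leak_resolution res)
{
    int need, la, lb, i;

    assert(a != NULL && b != NULL && na >= 0 && nb >= 0);

    if (res == RES_HIGH)
        need = (na > nb) ? na : nb;       /* compare everything */
    else
        need = (res == RES_LOW) ? 2 : 4;

    /* Assumption: a trace shorter than 'need' is compared in full. */
    la = (na < need) ? na : need;
    lb = (nb < need) ? nb : need;
    if (la != lb)
        return 0;

    for (i = 0; i < la; i++)
        if (strcmp(a[i], b[i]) != 0)
            return 0;
    return 1;
}
```

<p>Two allocations that differ only in their third caller are merged
into one report at low resolution but kept apart at med and high,
which is why the setting changes the presentation and not the set of
leaks found.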
				526	There are also some options for debugging Valgrind itself. You
				527	shouldn't need to use them in the normal run of things. Nevertheless:
				528
				529	<ul>
				530
				531	<li><code>--single-step=no</code> [default]<br>
				532	<code>--single-step=yes</code>
				533	<p>When enabled, each x86 instruction is translated separately into
				534	instrumented code. When disabled, translation is done on a
				535	per-basic-block basis, giving much better translations.</li><br>
				536	<p>
				537
				538	<li><code>--optimise=no</code><br>
				539	<code>--optimise=yes</code> [default]
				540	<p>When enabled, various improvements are applied to the
				541	intermediate code, mainly aimed at allowing the simulated CPU's
				542	registers to be cached in the real CPU's registers over several
				543	simulated instructions.</li><br>
				544	<p>
				545
				546	<li><code>--instrument=no</code><br>
				547	<code>--instrument=yes</code> [default]
				548	<p>When disabled, the translations don't actually contain any
				549	instrumentation.</li><br>
				550	<p>
				551
				552	<li><code>--cleanup=no</code><br>
				553	<code>--cleanup=yes</code> [default]
				554	<p>When enabled, various improvements are applied to the
				555	post-instrumented intermediate code, aimed at removing redundant
				556	value checks.</li><br>
				557	<p>
				558
				559	<li><code>--trace-syscalls=no</code> [default]<br>
				560	<code>--trace-syscalls=yes</code>
				561	<p>Enable/disable tracing of system call intercepts.</li><br>
				562	<p>
				563
				564	<li><code>--trace-signals=no</code> [default]<br>
				565	<code>--trace-signals=yes</code>
				566	<p>Enable/disable tracing of signal handling.</li><br>
				567	<p>
				568
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	569	<li><code>--trace-sched=no</code> [default]<br>
				570	<code>--trace-sched=yes</code>
				571	<p>Enable/disable tracing of thread scheduling events.</li><br>
				572	<p>
				573
sewardj	45b4b37	2002-04-16 22:50:32 +0000	[diff] [blame]	574	<li><code>--trace-pthread=none</code> [default]<br>
				575	<code>--trace-pthread=some</code> <br>
				576	<code>--trace-pthread=all</code>
				577	<p>Specifies amount of trace detail for pthread-related events.</li><br>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	578	<p>
				579
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	580	<li><code>--trace-symtab=no</code> [default]<br>
				581	<code>--trace-symtab=yes</code>
				582	<p>Enable/disable tracing of symbol table reading.</li><br>
				583	<p>
				584
				585	<li><code>--trace-malloc=no</code> [default]<br>
				586	<code>--trace-malloc=yes</code>
				587	<p>Enable/disable tracing of malloc/free (et al) intercepts.
				588	</li><br>
				589	<p>
				590
				591	<li><code>--stop-after=&lt;number&gt;</code>
				592	[default: infinity, more or less]
				593	<p>After &lt;number&gt; basic blocks have been executed, shut down
				594	Valgrind and switch back to running the client on the real CPU.
				595	</li><br>
				596	<p>
				597
				598	<li><code>--dump-error=&lt;number&gt;</code>
				599	[default: inactive]
				600	<p>After the program has exited, show gory details of the
				601	translation of the basic block containing the &lt;number&gt;'th
				602	error context. When used with <code>--single-step=yes</code>,
				603	can show the
				604	exact x86 instruction causing an error.</li><br>
				605	<p>
				606
				607	<li><code>--smc-check=none</code><br>
				608	<code>--smc-check=some</code> [default]<br>
				609	<code>--smc-check=all</code>
				610	<p>How carefully should Valgrind check for self-modifying code
				611	writes, so that translations can be discarded?  When
				612	"none", no writes are checked. When "some", only writes
				613	resulting from moves from integer registers to memory are
				614	checked. When "all", all memory writes are checked, even those
				615	with which no sane program would generate code -- for
				616	example, floating-point writes.</li>
				617	</ul>
				618
				619
				620	<a name="errormsgs"></a>
				621	<h3>2.6  Explanation of error messages</h3>
				622
				623	Despite considerable sophistication under the hood, Valgrind can only
				624	really detect two kinds of errors: use of illegal addresses, and use
				625	of undefined values. Nevertheless, this is enough to help you
				626	discover all sorts of memory-management nasties in your code. This
				627	section presents a quick summary of what error messages mean. The
				628	precise behaviour of the error-checking machinery is described in
				629	<a href="#machine">Section 3</a>.
				630
				631
				632	<h4>2.6.1  Illegal read / Illegal write errors</h4>
				633	For example:
				634	<pre>
				635	==30975== Invalid read of size 4
				636	==30975== at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
				637	==30975== by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
				638	==30975== by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
				639	==30975== by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
				640	==30975== Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
				641	</pre>
				642
				643	<p>This happens when your program reads or writes memory at a place
				644	which Valgrind reckons it shouldn't. In this example, the program did
				645	a 4-byte read at address 0xBFFFF0E0, somewhere within the
				646	system-supplied library libpng.so.2.1.0.9, which was called from
				647	somewhere else in the same library, called from line 326 of
				648	qpngio.cpp, and so on.
				649
				650	<p>Valgrind tries to establish what the illegal address might relate
				651	to, since that's often useful. So, if it points into a block of
memory which has already been freed, you'll be informed of this, and
also where the block was freed. Likewise, if it should turn out
to be just off the end of a malloc'd block, a common result of
off-by-one errors in array subscripting, you'll be informed of this
fact, and also where the block was malloc'd.
				657
				658	<p>In this example, Valgrind can't identify the address. Actually the
				659	address is on the stack, but, for some reason, this is not a valid
				660	stack address -- it is below the stack pointer, %esp, and that isn't
				661	allowed.
				662
				663	<p>Note that Valgrind only tells you that your program is about to
				664	access memory at an illegal address. It can't stop the access from
				665	happening. So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
				667	fate -- but you will get a message from Valgrind immediately prior to
				668	this. In this particular example, reading junk on the stack is
				669	non-fatal, and the program stays alive.
				670
				671
				672	<h4>2.6.2  Use of uninitialised values</h4>
				673	For example:
				674	<pre>
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	675	==19146== Conditional jump or move depends on uninitialised value(s)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	676	==19146== at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
				677	==19146== by 0x402E8476: _IO_printf (printf.c:36)
				678	==19146== by 0x8048472: main (tests/manuel1.c:8)
				679	==19146== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				680	</pre>
				681
				682	<p>An uninitialised-value use error is reported when your program uses
				683	a value which hasn't been initialised -- in other words, is undefined.
				684	Here, the undefined value is used somewhere inside the printf()
				685	machinery of the C library. This error was reported when running the
				686	following small program:
<pre>
#include <stdio.h>
int main(void)
{
    int x;   /* deliberately left uninitialised */
    printf ("x = %d\n", x);
    return 0;
}
</pre>
				694
				695	<p>It is important to understand that your program can copy around
				696	junk (uninitialised) data to its heart's content. Valgrind observes
				697	this and keeps track of the data, but does not complain. A complaint
				698	is issued only when your program attempts to make use of uninitialised
				699	data. In this example, x is uninitialised. Valgrind observes the
				700	value being passed to _IO_printf and thence to
				701	_IO_vfprintf, but makes no comment. However,
				702	_IO_vfprintf has to examine the value of x
				703	so it can turn it into the corresponding ASCII string, and it is at
				704	this point that Valgrind complains.
				705
				706	<p>Sources of uninitialised data tend to be:
				707	<ul>
				708	<li>Local variables in procedures which have not been initialised,
				709	as in the example above.</li><br><p>
				710
				711	<li>The contents of malloc'd blocks, before you write something
				712	there. In C++, the new operator is a wrapper round malloc, so
				713	if you create an object with new, its fields will be
				714	uninitialised until you fill them in, which is only Right and
				715	Proper.</li>
				716	</ul>
				717
				718
				719
				720	<h4>2.6.3  Illegal frees</h4>
				721	For example:
				722	<pre>
				723	==7593== Invalid free()
				724	==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577)
				725	==7593== by 0x80484C7: main (tests/doublefree.c:10)
				726	==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				727	==7593== by 0x80483B1: (within tests/doublefree)
				728	==7593== Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
				729	==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577)
				730	==7593== by 0x80484C7: main (tests/doublefree.c:10)
				731	==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				732	==7593== by 0x80483B1: (within tests/doublefree)
				733	</pre>
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether the argument to
free/delete is legitimate. Here, this test program has
freed the same block twice. As with the illegal read/write errors,
Valgrind attempts to make sense of the address freed. If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
				741
				742
				743	<h4>2.6.4  Passing system call parameters with inadequate
				744	read/write permissions</h4>
				745
Valgrind checks all parameters to system calls. If a system call
needs to read from a buffer provided by your program, Valgrind checks
that the entire buffer is addressable and has valid data, i.e., it is
readable. And if the system call needs to write to a user-supplied
buffer, Valgrind checks that the buffer is addressable. After the
				751	system call, Valgrind updates its administrative information to
				752	precisely reflect any changes in memory permissions caused by the
				753	system call.
				754
				755	<p>Here's an example of a system call with an invalid parameter:
				756	<pre>
#include <stdlib.h>
#include <unistd.h>
int main( void )
{
    char* arr = malloc(10);
    (void) write( 1 /* stdout */, arr, 10 );
    return 0;
}
				765	</pre>
				766
				767	<p>You get this complaint ...
				768	<pre>
				769	==8230== Syscall param write(buf) lacks read permissions
				770	==8230== at 0x4035E072: __libc_write
				771	==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				772	==8230== by 0x80483B1: (within tests/badwrite)
				773	==8230== by <bogus frame pointer> ???
				774	==8230== Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
				775	==8230== at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
				776	==8230== by 0x80484A0: main (tests/badwrite.c:6)
				777	==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				778	==8230== by 0x80483B1: (within tests/badwrite)
				779	</pre>
				780
				781	<p>... because the program has tried to write uninitialised junk from
				782	the malloc'd block to the standard output.
				783
				784
				785	<h4>2.6.5  Warning messages you might see</h4>
				786
				787	Most of these only appear if you run in verbose mode (enabled by
				788	<code>-v</code>):
				789	<ul>
				790	<li> <code>More than 50 errors detected. Subsequent errors
				791	will still be recorded, but in less detail than before.</code>
				792	<br>
				793	After 50 different errors have been shown, Valgrind becomes
				794	more conservative about collecting them. It then requires only
				795	the program counters in the top two stack frames to match when
				796	deciding whether or not two errors are really the same one.
				797	Prior to this point, the PCs in the top four frames are required
				798	to match. This hack has the effect of slowing down the
				799	appearance of new errors after the first 50. The 50 constant can
				800	be changed by recompiling Valgrind.
				801	<p>
				802	<li> <code>More than 500 errors detected. I'm not reporting any more.
				803	Final error counts may be inaccurate. Go fix your
				804	program!</code>
				805	<br>
				806	After 500 different errors have been detected, Valgrind ignores
				807	any more. It seems unlikely that collecting even more different
				808	ones would be of practical help to anybody, and it avoids the
				809	danger that Valgrind spends more and more of its time comparing
				810	new errors against an ever-growing collection. As above, the 500
				811	number is a compile-time constant.
				812	<p>
				813	<li> <code>Warning: client exiting by calling exit(<number>).
				814	Bye!</code>
				815	<br>
				816	Your program has called the <code>exit</code> system call, which
				817	will immediately terminate the process. You'll get no exit-time
				818	error summaries or leak checks. Note that this is not the same
				819	as your program calling the ANSI C function <code>exit()</code>
				820	-- that causes a normal, controlled shutdown of Valgrind.
				821	<p>
				822	<li> <code>Warning: client switching stacks?</code>
				823	<br>
				824	Valgrind spotted such a large change in the stack pointer, %esp,
				825	that it guesses the client is switching to a different stack.
				826	At this point it makes a kludgey guess where the base of the new
				827	stack is, and sets memory permissions accordingly. You may get
				828	many bogus error messages following this, if Valgrind guesses
				829	wrong. At the moment "large change" is defined as a change of
more than 2000000 in the value of the %esp (stack pointer)
				831	register.
				832	<p>
				833	<li> <code>Warning: client attempted to close Valgrind's logfile fd <number>
				834	</code>
				835	<br>
				836	Valgrind doesn't allow the client
				837	to close the logfile, because you'd never see any diagnostic
				838	information after that point. If you see this message,
				839	you may want to use the <code>--logfile-fd=<number></code>
				840	option to specify a different logfile file-descriptor number.
				841	<p>
				842	<li> <code>Warning: noted but unhandled ioctl <number></code>
				843	<br>
				844	Valgrind observed a call to one of the vast family of
				845	<code>ioctl</code> system calls, but did not modify its
				846	memory status info (because I have not yet got round to it).
				847	The call will still have gone through, but you may get spurious
				848	errors after this as a result of the non-update of the memory info.
				849	<p>
				850	<li> <code>Warning: unblocking signal <number> due to
				851	sigprocmask</code>
				852	<br>
				853	Really just a diagnostic from the signal simulation machinery.
				854	This message will appear if your program handles a signal by
				855	first <code>longjmp</code>ing out of the signal handler,
				856	and then unblocking the signal with <code>sigprocmask</code>
				857	-- a standard signal-handling idiom.
				858	<p>
				859	<li> <code>Warning: bad signal number <number> in __NR_sigaction.</code>
				860	<br>
				861	Probably indicates a bug in the signal simulation machinery.
				862	<p>
				863	<li> <code>Warning: set address range perms: large range <number></code>
				864	<br>
				865	Diagnostic message, mostly for my benefit, to do with memory
				866	permissions.
				867	</ul>
				868
				869
				870	<a name="suppfiles"></a>
				871	<h3>2.7  Writing suppressions files</h3>
				872
				873	A suppression file describes a bunch of errors which, for one reason
				874	or another, you don't want Valgrind to tell you about. Usually the
				875	reason is that the system libraries are buggy but unfixable, at least
				876	within the scope of the current debugging session. Multiple
suppressions files are allowed. By default, Valgrind uses
				878	<code>linux24.supp</code> in the directory where it is installed.
				879
				880	<p>
				881	You can ask to add suppressions from another file, by specifying
				882	<code>--suppressions=/path/to/file.supp</code>.
				883
				884	<p>Each suppression has the following components:<br>
				885	<ul>
				886
				887	<li>Its name. This merely gives a handy name to the suppression, by
				888	which it is referred to in the summary of used suppressions
				889	printed out when a program finishes. It's not important what
				890	the name is; any identifying string will do.
				891	<p>
				892
				893	<li>The nature of the error to suppress. Either:
				894	<code>Value1</code>,
				895	<code>Value2</code>,
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	896	<code>Value4</code> or
				897	<code>Value8</code>,
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	898	meaning an uninitialised-value error when
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	899	using a value of 1, 2, 4 or 8 bytes.
				900	Or
				901	<code>Cond</code> (or its old name, <code>Value0</code>),
				902	meaning use of an uninitialised CPU condition code. Or:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	903	<code>Addr1</code>,
				904	<code>Addr2</code>,
				905	<code>Addr4</code> or
				906	<code>Addr8</code>, meaning an invalid address during a
				907	memory access of 1, 2, 4 or 8 bytes respectively. Or
				908	<code>Param</code>,
				909	meaning an invalid system call parameter error. Or
				910	<code>Free</code>, meaning an invalid or mismatching free.</li><br>
				911	<p>
				912
<li>The "immediate location" specification. For Value and Addr
errors, this is either the name of the function in which the error
occurred, or, failing that, the full path to the .so file
containing the error location. For Param errors, it is the name of
the offending system call parameter. For Free errors, it is the
name of the function doing the freeing (e.g., <code>free</code>,
<code>__builtin_vec_delete</code>, etc.)</li><br>
				920	<p>
				921
				922	<li>The caller of the above "immediate location". Again, either a
				923	function or shared-object name.</li><br>
				924	<p>
				925
				926	<li>Optionally, one or two extra calling-function or object names,
				927	for greater precision.</li>
				928	</ul>
				929
				930	<p>
				931	Locations may be either names of shared objects or wildcards matching
				932	function names. They begin <code>obj:</code> and <code>fun:</code>
				933	respectively. Function and object names to match against may use the
				934	wildcard characters <code>*</code> and <code>?</code>.
				935
				936	A suppression only suppresses an error when the error matches all the
				937	details in the suppression. Here's an example:
				938	<pre>
				939	{
				940	__gconv_transform_ascii_internal/__mbrtowc/mbtowc
				941	Value4
				942	fun:__gconv_transform_ascii_internal
fun:__mbr*towc
fun:mbtowc
}
</pre>

<p>What it means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function of name matching <code>__mbr*towc</code>,
when that is called from
<code>mbtowc</code>. It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
<code>__gconv_transform_ascii_internal/__mbrtowc/mbtowc</code>.
				956
				957	<p>Another example:
				958	<pre>
				959	{
				960	libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
				961	Value4
				962	obj:/usr/X11R6/lib/libX11.so.6.2
				963	obj:/usr/X11R6/lib/libX11.so.6.2
				964	obj:/usr/X11R6/lib/libXaw.so.7.0
				965	}
				966	</pre>
				967
				968	<p>Suppress any size 4 uninitialised-value error which occurs anywhere
				969	in <code>libX11.so.6.2</code>, when called from anywhere in the same
				970	library, when called from anywhere in <code>libXaw.so.7.0</code>. The
				971	inexact specification of locations is regrettable, but is about all
				972	you can hope for, given that the X11 libraries shipped with Red Hat
				973	7.2 have had their symbol tables removed.
				974
				975	<p>Note -- since the above two examples did not make it clear -- that
				976	you can freely mix the <code>obj:</code> and <code>fun:</code>
				977	styles of description within a single suppression record.
				978
				979
				980	<a name="install"></a>
				981	<h3>2.8  Building and installing</h3>
				982	At the moment, very rudimentary.
				983
				984	<p>The tarball is set up for a standard Red Hat 7.1 (6.2) machine. To
				985	build, just do "make". No configure script, no autoconf, no nothing.
				986
<p>The files needed for installation are: valgrind.so, valgrinq.so,
				988	valgrind, VERSION, redhat72.supp (or redhat62.supp). You can copy
				989	these to any directory you like. However, you then need to edit the
				990	shell script "valgrind". On line 4, set the environment variable
				991	<code>VALGRIND</code> to point to the directory you have copied the
				992	installation into.
				993
				994
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	995	<a name="install"></a>
				996	<h3>2.9  The Client Request mechanism</h3>
				997
				998	Valgrind has a trapdoor mechanism via which the client program can
				999	pass all manner of requests and queries to Valgrind. Internally, this
				1000	is used extensively to make malloc, free, signals, etc, work, although
				1001	you don't see that.
				1002	<p>
				1003	For your convenience, a subset of these so-called client requests is
				1004	provided to allow you to tell Valgrind facts about the behaviour of
your program, and conversely to make queries. In particular, your
program can tell Valgrind about changes in memory range permissions
that Valgrind would not otherwise know about, which allows clients to
get Valgrind to do arbitrary custom checks.
				1009	<p>
				1010	Clients need to include the header file <code>valgrind.h</code> to
				1011	make this work. The macros therein have the magical property that
				1012	they generate code in-line which Valgrind can spot. However, the code
				1013	does nothing when not run on Valgrind, so you are not forced to run
				1014	your program on Valgrind just because you use the macros in this file.
				1015	<p>
				1016	A brief description of the available macros:
				1017	<ul>
				1018	<li><code>VALGRIND_MAKE_NOACCESS</code>,
				1019	<code>VALGRIND_MAKE_WRITABLE</code> and
				1020	<code>VALGRIND_MAKE_READABLE</code>. These mark address
				1021	ranges as completely inaccessible, accessible but containing
				1022	undefined data, and accessible and containing defined data,
respectively. Subsequent errors may have their faulting
addresses described in terms of these blocks. Each macro returns a
"block handle", or zero when not run on Valgrind.
				1026	<p>
				1027	<li><code>VALGRIND_DISCARD</code>: At some point you may want
				1028	Valgrind to stop reporting errors in terms of the blocks
				1029	defined by the previous three macros. To do this, the above
				1030	macros return a small-integer "block handle". You can pass
				1031	this block handle to <code>VALGRIND_DISCARD</code>. After
				1032	doing so, Valgrind will no longer be able to relate
				1033	addressing errors to the user-defined block associated with
				1034	the handle. The permissions settings associated with the
				1035	handle remain in place; this just affects how errors are
				1036	reported, not whether they are reported. Returns 1 for an
				1037	invalid handle and 0 for a valid handle (although passing
				1038	invalid handles is harmless). Always returns 0 when not run
				1039	on Valgrind.
				1040	<p>
				1041	<li><code>VALGRIND_CHECK_NOACCESS</code>,
				1042	<code>VALGRIND_CHECK_WRITABLE</code> and
				1043	<code>VALGRIND_CHECK_READABLE</code>: check immediately
				1044	whether or not the given address range has the relevant
				1045	property, and if not, print an error message. Also, for the
				1046	convenience of the client, returns zero if the relevant
				1047	property holds; otherwise, the returned value is the address
				1048	of the first byte for which the property is not true.
				1049	Always returns 0 when not run on Valgrind.
				1050	<p>
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
to find out whether Valgrind thinks a particular variable
(lvalue, to be precise) is addressable and defined. Prints
an error message if not. Returns no value.
				1055	<p>
				1056	<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
				1057	experimental feature. Similarly to
				1058	<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
range as inaccessible, so that subsequent accesses to an
address in the range give an error. However, this macro
does not return a block handle. Instead, all annotations
created like this are reviewed at each client
<code>ret</code> (subroutine return) instruction, and those
which now define an address range below the client's stack
				1065	pointer register (<code>%esp</code>) are automatically
				1066	deleted.
				1067	<p>
				1068	In other words, this macro allows the client to tell
				1069	Valgrind about red-zones on its own stack. Valgrind
				1070	automatically discards this information when the stack
				1071	retreats past such blocks. Beware: hacky and flaky, and
				1072	probably interacts badly with the new pthread support.
				1073	</ul>
				1074	</li>
				1075	<p>
				1076
				1077
				1078
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1079	<a name="problems"></a>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1080	<h3>2.10  If you have problems</h3>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1081	Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
				1082
				1083	<p>See <a href="#limits">Section 4</a> for the known limitations of
				1084	Valgrind, and for a list of programs which are known not to work on
				1085	it.
				1086
				1087	<p>The translator/instrumentor has a lot of assertions in it. They
				1088	are permanently enabled, and I have no plans to disable them. If one
				1089	of these breaks, please mail me!
				1090
				1091	<p>If you get an assertion failure on the expression
				1092	<code>chunkSane(ch)</code> in <code>vg_free()</code> in
				1093	<code>vg_malloc.c</code>, this may have happened because your program
				1094	wrote off the end of a malloc'd block, or before its beginning.
				1095	Valgrind should have emitted a proper message to that effect before
				1096	dying in this way. This is a known problem which I should fix.
				1097	<p>
				1098
				1099	<hr width="100%">
				1100
				1101	<a name="machine"></a>
				1102	<h2>3  Details of the checking machinery</h2>
				1103
				1104	Read this section if you want to know, in detail, exactly what and how
				1105	Valgrind is checking.
				1106
				1107	<a name="vvalue"></a>
				1108	<h3>3.1  Valid-value (V) bits</h3>
				1109
				1110	It is simplest to think of Valgrind implementing a synthetic Intel x86
				1111	CPU which is identical to a real CPU, except for one crucial detail.
				1112	Every bit (literally) of data processed, stored and handled by the
				1113	real CPU has, in the synthetic CPU, an associated "valid-value" bit,
				1114	which says whether or not the accompanying bit has a legitimate value.
				1115	In the discussions which follow, this bit is referred to as the V
				1116	(valid-value) bit.
				1117
<p>Each byte in the system therefore has 8 V bits which accompany
it wherever it goes. For example, when the CPU loads a word-size item
				1120	(4 bytes) from memory, it also loads the corresponding 32 V bits from
				1121	a bitmap which stores the V bits for the process' entire address
				1122	space. If the CPU should later write the whole or some part of that
				1123	value to memory at a different address, the relevant V bits will be
				1124	stored back in the V-bit bitmap.
				1125
				1126	<p>In short, each bit in the system has an associated V bit, which
				1127	follows it around everywhere, even inside the CPU. Yes, the CPU's
				1128	(integer) registers have their own V bit vectors.
				1129
				1130	<p>Copying values around does not cause Valgrind to check for, or
				1131	report on, errors. However, when a value is used in a way which might
				1132	conceivably affect the outcome of your program's computation, the
				1133	associated V bits are immediately checked. If any of these indicate
				1134	that the value is undefined, an error is reported.
				1135
				1136	<p>Here's an (admittedly nonsensical) example:
				1137	<pre>
int i, j;
int a[10], b[10];
for (i = 0; i < 10; i++) {
    j = a[i];
    b[i] = j;
}
				1144	</pre>
				1145
				1146	<p>Valgrind emits no complaints about this, since it merely copies
				1147	uninitialised values from <code>a[]</code> into <code>b[]</code>, and
				1148	doesn't use them in any way. However, if the loop is changed to
				1149	<pre>
for (i = 0; i < 10; i++) {
    j += a[i];
}
if (j == 77)
    printf("hello there\n");
				1155	</pre>
				1156	then Valgrind will complain, at the <code>if</code>, that the
				1157	condition depends on uninitialised values.
				1158
				1159	<p>Most low level operations, such as adds, cause Valgrind to
				1160	use the V bits for the operands to calculate the V bits for the
				1161	result. Even if the result is partially or wholly undefined,
				1162	it does not complain.
				1163
<p>Checks on definedness only occur in two places: when a value is
used to generate a memory address, and where a control flow decision
needs to be made. Also, when a system call is detected, Valgrind
checks definedness of parameters as required.

<p>If a check should detect undefinedness, an error message is
issued. The resulting value is subsequently regarded as well-defined.
				1171	To do otherwise would give long chains of error messages. In effect,
				1172	we say that undefined values are non-infectious.
				1173
				1174	<p>This sounds overcomplicated. Why not just check all reads from
				1175	memory, and complain if an undefined value is loaded into a CPU register?
				1176	Well, that doesn't work well, because perfectly legitimate C programs routinely
				1177	copy uninitialised values around in memory, and we don't want endless complaints
				1178	about that. Here's the canonical example. Consider a struct
				1179	like this:
				1180	<pre>
				1181	struct S { int x; char c; };
				1182	struct S s1, s2;
				1183	s1.x = 42;
				1184	s1.c = 'z';
				1185	s2 = s1;
				1186	</pre>
				1187
				1188	<p>The question to ask is: how large is <code>struct S</code>, in
				1189	bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
				1190	occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
				1191	round the size of <code>struct S</code> up to a whole number of words,
				1192	in this case 8 bytes. Not doing this forces compilers to generate
				1193	truly appalling code for subscripting arrays of <code>struct
				1194	S</code>'s.
				1195
				1196	<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
				1197	For the assignment <code>s2 = s1</code>, gcc generates code to copy
				1198	all 8 bytes wholesale into <code>s2</code> without regard for their
				1199	meaning. If Valgrind simply checked values as they came out of
				1200	memory, it would yelp every time a structure assignment like this
				1201	happened. So the more complicated semantics described above is
				1202	necessary. This allows gcc to copy <code>s1</code> into
				1203	<code>s2</code> any way it likes, and a warning will only be emitted
				1204	if the uninitialised values are later used.
				1205
				1206	<p>One final twist to this story. The above scheme allows garbage to
				1207	pass through the CPU's integer registers without complaint. It does
				1208	this by giving the integer registers V tags, passing these around in
the expected way. This is complicated and computationally expensive to
				1210	do, but is necessary. Valgrind is more simplistic about
				1211	floating-point loads and stores. In particular, V bits for data read
				1212	as a result of floating-point loads are checked at the load
				1213	instruction. So if your program uses the floating-point registers to
				1214	do memory-to-memory copies, you will get complaints about
				1215	uninitialised values. Fortunately, I have not yet encountered a
				1216	program which (ab)uses the floating-point registers in this way.
				1217
				1218	<a name="vaddress"></a>
				1219	<h3>3.2  Valid-address (A) bits</h3>
				1220
				1221	Notice that the previous section describes how the validity of values
				1222	is established and maintained without having to say whether the
				1223	program does or does not have the right to access any particular
				1224	memory location. We now consider the latter issue.
				1225
				1226	<p>As described above, every bit in memory or in the CPU has an
				1227	associated valid-value (V) bit. In addition, all bytes in memory, but
				1228	not in the CPU, have an associated valid-address (A) bit. This
				1229	indicates whether or not the program can legitimately read or write
that location. It does not give any indication of the validity of the
				1231	data at that location -- that's the job of the V bits -- only whether
				1232	or not the location may be accessed.
				1233
				1234	<p>Every time your program reads or writes memory, Valgrind checks the
				1235	A bits associated with the address. If any of them indicate an
				1236	invalid address, an error is emitted. Note that the reads and writes
				1237	themselves do not change the A bits, only consult them.
				1238
				1239	<p>So how do the A bits get set/cleared? Like this:
				1240
				1241	<ul>
				1242	<li>When the program starts, all the global data areas are marked as
				1243	accessible.</li><br>
				1244	<p>
				1245
<li>When the program does malloc/new, the A bits for exactly the
				1247	area allocated, and not a byte more, are marked as accessible.
				1248	Upon freeing the area the A bits are changed to indicate
				1249	inaccessibility.</li><br>
				1250	<p>
				1251
				1252	<li>When the stack pointer register (%esp) moves up or down, A bits
				1253	are set. The rule is that the area from %esp up to the base of
				1254	the stack is marked as accessible, and below %esp is
				1255	inaccessible. (If that sounds illogical, bear in mind that the
				1256	stack grows down, not up, on almost all Unix systems, including
				1257	GNU/Linux.) Tracking %esp like this has the useful side-effect
				1258	that the section of stack used by a function for local variables
				1259	etc is automatically marked accessible on function entry and
				1260	inaccessible on exit.</li><br>
				1261	<p>
				1262
				1263	<li>When doing system calls, A bits are changed appropriately. For
				1264	example, mmap() magically makes files appear in the process's
				1265	address space, so the A bits must be updated if mmap()
				1266	succeeds.</li><br>
				1267	</ul>
				1268
				1269
				1270	<a name="together"></a>
				1271	<h3>3.3  Putting it all together</h3>
				1272	Valgrind's checking machinery can be summarised as follows:
				1273
				1274	<ul>
				1275	<li>Each byte in memory has 8 associated V (valid-value) bits,
				1276	saying whether or not the byte has a defined value, and a single
				1277	A (valid-address) bit, saying whether or not the program
				1278	currently has the right to read/write that address.</li><br>
				1279	<p>
				1280
				1281	<li>When memory is read or written, the relevant A bits are
				1282	consulted. If they indicate an invalid address, Valgrind emits
				1283	an Invalid read or Invalid write error.</li><br>
				1284	<p>
				1285
				1286	<li>When memory is read into the CPU's integer registers, the
				1287	relevant V bits are fetched from memory and stored in the
				1288	simulated CPU. They are not consulted.</li><br>
				1289	<p>
				1290
				1291	<li>When an integer register is written out to memory, the V bits
				1292	for that register are written back to memory too.</li><br>
				1293	<p>
				1294
				1295	<li>When memory is read into the CPU's floating point registers, the
				1296	relevant V bits are read from memory and they are immediately
				1297	checked. If any are invalid, an uninitialised value error is
				1298	emitted. This precludes using the floating-point registers to
				1299	copy possibly-uninitialised memory, but simplifies Valgrind in
				1300	that it does not have to track the validity status of the
				1301	floating-point registers.</li><br>
				1302	<p>
				1303
				1304	<li>As a result, when a floating-point register is written to
				1305	memory, the associated V bits are set to indicate a valid
				1306	value.</li><br>
				1307	<p>
				1308
				1309	<li>When values in integer CPU registers are used to generate a
				1310	memory address, or to determine the outcome of a conditional
				1311	branch, the V bits for those values are checked, and an error
				1312	emitted if any of them are undefined.</li><br>
				1313	<p>
				1314
				1315	<li>When values in integer CPU registers are used for any other
				1316	purpose, Valgrind computes the V bits for the result, but does
				1317	not check them.</li><br>
				1318	<p>
				1319
				1320	<li>Once the V bits for a value in the CPU have been checked, they
				1321	are then set to indicate validity. This avoids long chains of
				1322	errors.</li><br>
				1323	<p>
				1324
				1325	<li>When values are loaded from memory, Valgrind checks the A bits
				1326	for that location and issues an illegal-address warning if
				1327	needed. In that case, the V bits loaded are forced to indicate
				1328	Valid, despite the location being invalid.
				1329	<p>
				1330	This apparently strange choice reduces the amount of confusing
				1331	information presented to the user. It avoids the
				1332	unpleasant phenomenon in which memory is read from a place which
				1333	is both unaddressible and contains invalid values, and, as a
				1334	result, you get not only an invalid-address (read/write) error,
				1335	but also a potentially large set of uninitialised-value errors,
				1336	one for every time the value is used.
				1337	<p>
				1338	There is a hazy boundary case to do with multi-byte loads from
				1339	addresses which are partially valid and partially invalid. See
				1340	the description of the flag <code>--partial-loads-ok</code> for details.
				1341	</li><br>
				1342	</ul>
				1343
				1344	Valgrind intercepts calls to malloc, calloc, realloc, valloc,
				1345	memalign, free, new and delete. The behaviour you get is:
				1346
				1347	<ul>
				1348
				1349	<li>malloc/new: the returned memory is marked as addressible but not
				1350	having valid values. This means you have to write on it before
				1351	you can read it.</li><br>
				1352	<p>
				1353
				1354	<li>calloc: returned memory is marked both addressible and valid,
				1355	since calloc() clears the area to zero.</li><br>
				1356	<p>
				1357
				1358	<li>realloc: if the new size is larger than the old, the new section
				1359	is addressible but invalid, as with malloc.</li><br>
				1360	<p>
				1361
				1362	<li>If the new size is smaller, the dropped-off section is marked as
				1363	unaddressible. You may only pass to realloc a pointer
				1364	previously issued to you by malloc/calloc/new/realloc.</li><br>
				1365	<p>
				1366
				1367	<li>free/delete: you may only pass to free a pointer previously
				1368	issued to you by malloc/calloc/new/realloc, or the value
				1369	NULL. Otherwise, Valgrind complains. If the pointer is indeed
				1370	valid, Valgrind marks the entire area it points at as
				1371	unaddressible, and places the block in the freed-blocks-queue.
				1372	The aim is to defer as long as possible reallocation of this
				1373	block. Until that happens, all attempts to access it will
				1374	elicit an invalid-address error, as you would hope.</li><br>
				1375	</ul>
				1376
				1377
				1378
				1379	<a name="signals"></a>
				1380	<h3>3.4  Signals</h3>
				1381
				1382	Valgrind provides suitable handling of signals, so, provided you stick
				1383	to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
				1384	are handled. Signal handlers may return in the normal way or do
				1385	longjmp(); both should work ok. As specified by POSIX, a signal is
				1386	blocked in its own handler. Default actions for signals should work
				1387	as before. Etc, etc.
				1388
				1389	<p>Under the hood, dealing with signals is a real pain, and Valgrind's
				1390	simulation leaves much to be desired. If your program does
				1391	way-strange stuff with signals, bad things may happen. If so, let me
				1392	know. I don't promise to fix it, but I'd at least like to be aware of
				1393	it.
				1394
				1395
				1396	<a name="leaks"></a>
				1397	<h3>3.5  Memory leak detection</h3>
				1398
				1399	Valgrind keeps track of all memory blocks issued in response to calls
				1400	to malloc/calloc/realloc/new. So when the program exits, it knows
				1401	which blocks are still outstanding -- have not been returned, in other
				1402	words. Ideally, you want your program to have no blocks still in use
				1403	at exit. But many programs do.
				1404
				1405	<p>For each such block, Valgrind scans the entire address space of the
				1406	process, looking for pointers to the block. One of three situations
				1407	may result:
				1408
				1409	<ul>
				1410	<li>A pointer to the start of the block is found. This usually
				1411	indicates programming sloppiness; since the block is still
				1412	pointed at, the programmer could, at least in principle, have free'd
				1413	it before program exit.</li><br>
				1414	<p>
				1415
				1416	<li>A pointer to the interior of the block is found. The pointer
				1417	might originally have pointed to the start and have been moved
				1418	along, or it might be entirely unrelated. Valgrind deems such a
				1419	block as "dubious", that is, possibly leaked,
				1420	because it's unclear whether or
				1421	not a pointer to it still exists.</li><br>
				1422	<p>
				1423
				1424	<li>The worst outcome is that no pointer to the block can be found.
				1425	The block is classified as "leaked", because the
				1426	programmer could not possibly have free'd it at program exit,
				1427	since no pointer to it exists. This might be a symptom of
				1428	having lost the pointer at some earlier point in the
				1429	program.</li>
				1430	</ul>
				1431
				1432	Valgrind reports summaries about leaked and dubious blocks.
				1433	For each such block, it will also tell you where the block was
				1434	allocated. This should help you figure out why the pointer to it has
				1435	been lost. In general, you should attempt to ensure your programs do
				1436	not have any leaked or dubious blocks at exit.
				1437
				1438	<p>The precise area of memory in which Valgrind searches for pointers
				1439	is: all naturally-aligned 4-byte words for which all A bits indicate
				1440	addressibility and all V bits indicate that the stored value is
				1441	actually valid.
				1442
				1443	<p><hr width="100%">
				1444
				1445
				1446	<a name="limits"></a>
				1447	<h2>4  Limitations</h2>
				1448
				1449	The following list of limitations seems depressingly long. However,
				1450	most programs actually work fine.
				1451
				1452	<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
				1453	a kernel 2.4.X system, subject to the following constraints:
				1454
				1455	<ul>
				1456	<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
				1457	encounters these, Valgrind will simply give up. It may be
				1458	possible to add support for them at a later time. Intel added a
				1459	few instructions such as "cmov" to the integer instruction set
				1460	on Pentium and later processors, and these are supported.
				1461	Nevertheless it's safest to think of Valgrind as implementing
				1462	the 486 instruction set.</li><br>
				1463	<p>
				1464
				1465	<li>Multithreaded programs are not supported, since I haven't yet
				1466	figured out how to do this. To be more specific, it is the
				1467	"clone" system call which is not supported. A program calls
				1468	"clone" to create threads. Valgrind will abort if this
				1469	happens.</li><br>
				1470	<p>
				1471
				1472	<li>Valgrind assumes that the floating point registers are not used
				1473	as intermediaries in memory-to-memory copies, so it immediately
				1474	checks V bits in floating-point loads/stores. If you want to
				1475	write code which copies around possibly-uninitialised values,
				1476	you must ensure these travel through the integer registers, not
				1477	the FPU.</li><br>
				1478	<p>
				1479
				1480	<li>If your program does its own memory management, rather than
				1481	using malloc/new/free/delete, it should still work, but
				1482	Valgrind's error checking won't be so effective.</li><br>
				1483	<p>
				1484
				1485	<li>Valgrind's signal simulation is not as robust as it could be.
				1486	Basic POSIX-compliant sigaction and sigprocmask functionality is
				1487	supplied, but it's conceivable that things could go badly awry
				1488	if you do weird things with signals. Workaround: don't.
				1489	Programs that do non-POSIX signal tricks are in any case
				1490	inherently unportable, so should be avoided if
				1491	possible.</li><br>
				1492	<p>
				1493
				1494	<li>I have no idea what happens if programs try to handle signals on
				1495	an alternate stack (sigaltstack). YMMV.</li><br>
				1496	<p>
				1497
				1498	<li>Programs which switch stacks are not well handled. Valgrind
				1499	does have support for this, but I don't have great faith in it.
				1500	It's difficult -- there's no cast-iron way to decide whether a
				1501	large change in %esp is as a result of the program switching
				1502	stacks, or merely allocating a large object temporarily on the
				1503	current stack -- yet Valgrind needs to handle the two situations
				1504	differently.</li><br>
				1505	<p>
				1506
				1507	<li>x86 instructions, and system calls, have been implemented on
				1508	demand. So it's possible, although unlikely, that a program
				1509	will fall over with a message to that effect. If this happens,
				1510	please mail me ALL the details printed out, so I can try and
				1511	implement the missing feature.</li><br>
				1512	<p>
				1513
				1514	<li>x86 floating point works correctly, but floating-point code may
				1515	run even more slowly than integer code, due to my simplistic
				1516	approach to FPU emulation.</li><br>
				1517	<p>
				1518
				1519	<li>You can't Valgrind-ize statically linked binaries. Valgrind
				1520	relies on the dynamic-link mechanism to gain control at
				1521	startup.</li><br>
				1522	<p>
				1523
				1524	<li>Memory consumption of your program is majorly increased whilst
				1525	running under Valgrind. This is due to the large amount of
				1526	administrative information maintained behind the scenes. Another
				1527	cause is that Valgrind dynamically translates the original
				1528	executable and never throws any translation away, except in
				1529	those rare cases where self-modifying code is detected.
				1530	Translated, instrumented code is 8-12 times larger than the
				1531	original (!) so you can easily end up with 15+ MB of
				1532	translations when running (eg) a web browser. There's not a lot
				1533	you can do about this -- use Valgrind on a fast machine with a lot
				1534	of memory and swap space. At some point I may implement an LRU
				1535	caching scheme for translations, so as to bound the maximum
				1536	amount of memory devoted to them, to say 8 or 16 MB.</li>
				1537	</ul>
				1538
				1539
				1540	Programs which are known not to work are:
				1541
				1542	<ul>
				1543	<li>Netscape 4.76 works pretty well on some platforms -- quite
				1544	nicely on my AMD K6-III (400 MHz). I can surf, do mail, etc, no
				1545	problem. On other platforms it has been observed to crash
				1546	during startup. Despite much investigation I can't figure out
				1547	why.</li><br>
				1548	<p>
				1549
				1550	<li>kpackage (a KDE front end to rpm) dies because the CPUID
				1551	instruction is unimplemented. Easy to fix.</li><br>
				1552	<p>
				1553
				1554	<li>knode (a KDE newsreader) tries to do multithreaded things, and
				1555	fails.</li><br>
				1556	<p>
				1557
				1558	<li>emacs starts up but immediately concludes it is out of memory
				1559	and aborts. Emacs has its own memory-management scheme, but I
				1560	don't understand why this should interact so badly with
				1561	Valgrind.</li><br>
				1562	<p>
				1563
				1564	<li>Gimp and Gnome and GTK-based apps die early on because
				1565	of unimplemented system call wrappers. (I'm a KDE user :)
				1566	This wouldn't be hard to fix.
				1567	</li><br>
				1568	<p>
				1569
				1570	<li>As a consequence of me being a KDE user, almost all KDE apps
				1571	work ok -- except those which are multithreaded.
				1572	</li><br>
				1573	<p>
				1574	</ul>
				1575
				1576
				1577	<p><hr width="100%">
				1578
				1579
				1580	<a name="howitworks"></a>
				1581	<h2>5  How it works -- a rough overview</h2>
				1582	Some gory details, for those with a passion for gory details. You
				1583	don't need to read this section if all you want to do is use Valgrind.
				1584
				1585	<a name="startb"></a>
				1586	<h3>5.1  Getting started</h3>
				1587
				1588	Valgrind is compiled into a shared object, valgrind.so. The shell
				1589	script valgrind sets the LD_PRELOAD environment variable to point to
				1590	valgrind.so. This causes the .so to be loaded as an extra library to
				1591	any subsequently executed dynamically-linked ELF binary, viz, the
				1592	program you want to debug.
				1593
				1594	<p>The dynamic linker allows each .so in the process image to have an
				1595	initialisation function which is run before main(). It also allows
				1596	each .so to have a finalisation function run after main() exits.
				1597
				1598	<p>When valgrind.so's initialisation function is called by the dynamic
				1599	linker, the synthetic CPU starts up. The real CPU remains locked
				1600	in valgrind.so for the entire rest of the program, but the synthetic
				1601	CPU returns from the initialisation function. Startup of the program
				1602	now continues as usual -- the dynamic linker calls all the other .so's
				1603	initialisation routines, and eventually runs main(). This all runs on
				1604	the synthetic CPU, not the real one, but the client program cannot
				1605	tell the difference.
				1606
				1607	<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
				1608	finalisation function. Valgrind detects this, and uses it as its cue
				1609	to exit. It prints summaries of all errors detected, possibly checks
				1610	for memory leaks, and then exits the finalisation routine, but now on
				1611	the real CPU. The synthetic CPU has now lost control -- permanently
				1612	-- so the program exits back to the OS on the real CPU, just as it
				1613	would have done anyway.
				1614
				1615	<p>On entry, Valgrind switches stacks, so it runs on its own stack.
				1616	On exit, it switches back. This means that the client program
				1617	continues to run on its own stack, so we can switch back and forth
				1618	between running it on the simulated and real CPUs without difficulty.
				1619	This was an important design decision, because it makes it easy (well,
				1620	significantly less difficult) to debug the synthetic CPU.
				1621
				1622
				1623	<a name="engine"></a>
				1624	<h3>5.2  The translation/instrumentation engine</h3>
				1625
				1626	Valgrind does not directly run any of the original program's code. Only
				1627	instrumented translations are run. Valgrind maintains a translation
				1628	table, which allows it to find the translation quickly for any branch
				1629	target (code address). If no translation has yet been made, the
				1630	translator - a just-in-time translator - is summoned. This makes an
				1631	instrumented translation, which is added to the collection of
				1632	translations. Subsequent jumps to that address will use this
				1633	translation.
				1634
				1635	<p>Valgrind can optionally check writes made by the application, to
				1636	see if they are writing an address contained within code which has
				1637	been translated. Such a write invalidates translations of code
				1638	bracketing the written address. Valgrind will discard the relevant
				1639	translations, which causes them to be re-made, if they are needed
				1640	again, reflecting the new updated data stored there. In this way,
				1641	self-modifying code is supported. In practice I have not found any
				1642	Linux applications which use self-modifying code.
				1643
				1644	<p>The JITter translates basic blocks -- blocks of straight-line-code
				1645	-- as single entities. To minimise the considerable difficulties of
				1646	dealing with the x86 instruction set, x86 instructions are first
				1647	translated to a RISC-like intermediate code, similar to sparc code,
				1648	but with an infinite number of virtual integer registers. Initially
				1649	each insn is translated separately, and there is no attempt at
				1650	instrumentation.
				1651
				1652	<p>The intermediate code is improved, mostly so as to try and cache
				1653	the simulated machine's registers in the real machine's registers over
				1654	several simulated instructions. This is often very effective. Also,
				1655	we try to remove redundant updates of the simulated machine's
				1656	condition-code register.
				1657
				1658	<p>The intermediate code is then instrumented, giving more
				1659	intermediate code. There are a few extra intermediate-code operations
				1660	to support instrumentation; it is all refreshingly simple. After
				1661	instrumentation there is a cleanup pass to remove redundant value
				1662	checks.
				1663
				1664	<p>This gives instrumented intermediate code which mentions arbitrary
				1665	numbers of virtual registers. A linear-scan register allocator is
				1666	used to assign real registers and possibly generate spill code. All
				1667	of this is still phrased in terms of the intermediate code. This
				1668	machinery is inspired by the work of Reuben Thomas (MITE).
				1669
				1670	<p>Then, and only then, is the final x86 code emitted. The
				1671	intermediate code is carefully designed so that x86 code can be
				1672	generated from it without need for spare registers or other
				1673	inconveniences.
				1674
				1675	<p>The translations are managed using a traditional LRU-based caching
				1676	scheme. The translation cache has a default size of about 14MB.
				1677
				1678	<a name="track"></a>
				1679
				1680	<h3>5.3  Tracking the status of memory</h3> Each byte in the
				1681	process' address space has nine bits associated with it: one A bit and
				1682	eight V bits. The A and V bits for each byte are stored using a
				1683	sparse array, which flexibly and efficiently covers arbitrary parts of
				1684	the 32-bit address space without imposing significant space or
				1685	performance overheads for the parts of the address space never
				1686	visited. The scheme used, and speedup hacks, are described in detail
				1687	at the top of the source file vg_memory.c, so you should read that for
				1688	the gory details.
				1689
				1690	<a name="sys_calls"></a>
				1691
				1692	<h3>5.4 System calls</h3>
				1693	All system calls are intercepted. The memory status map is consulted
				1694	before and updated after each call. It's all rather tiresome. See
				1695	vg_syscall_mem.c for details.
				1696
				1697	<a name="sys_signals"></a>
				1698
				1699	<h3>5.5  Signals</h3>
				1700	All system calls to sigaction() and sigprocmask() are intercepted. If
				1701	the client program is trying to set a signal handler, Valgrind makes a
				1702	note of the handler address and which signal it is for. Valgrind then
				1703	arranges for the same signal to be delivered to its own handler.
				1704
				1705	<p>When such a signal arrives, Valgrind's own handler catches it, and
				1706	notes the fact. At a convenient safe point in execution, Valgrind
				1707	builds a signal delivery frame on the client's stack and runs its
				1708	handler. If the handler longjmp()s, there is nothing more to be said.
				1709	If the handler returns, Valgrind notices this, zaps the delivery
				1710	frame, and carries on where it left off before delivering the signal.
				1711
				1712	<p>The purpose of this nonsense is that setting signal handlers
				1713	essentially amounts to giving callback addresses to the Linux kernel.
				1714	We can't allow this to happen, because if it did, signal handlers
				1715	would run on the real CPU, not the simulated one. This means the
				1716	checking machinery would not operate during the handler run, and,
				1717	worse, memory permissions maps would not be updated, which could cause
				1718	spurious error reports once the handler had returned.
				1719
				1720	<p>An even worse thing would happen if the signal handler longjmp'd
				1721	rather than returned: Valgrind would completely lose control of the
				1722	client program.
				1723
				1724	<p>Upshot: we can't allow the client to install signal handlers
				1725	directly. Instead, Valgrind must catch, on behalf of the client, any
				1726	signal the client asks to catch, and must deliver it to the client on
				1727	the simulated CPU, not the real one. This involves considerable
				1728	gruesome fakery; see vg_signals.c for details.
				1729	<p>
				1730
				1731	<hr width="100%">
				1732
				1733	<a name="example"></a>
				1734	<h2>6  Example</h2>
				1735	This is the log for a run of a small program. The program is in fact
				1736	correct, and the reported error is the result of a potentially serious
				1737	code generation bug in GNU g++ (snapshot 20010527).
				1738	<pre>
				1739	sewardj@phoenix:~/newmat10$
				1740	~/Valgrind-6/valgrind -v ./bogon
				1741	==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
				1742	==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
				1743	==25832== Startup, with flags:
				1744	==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
				1745	==25832== reading syms from /lib/ld-linux.so.2
				1746	==25832== reading syms from /lib/libc.so.6
				1747	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
				1748	==25832== reading syms from /lib/libm.so.6
				1749	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
				1750	==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
				1751	==25832== reading syms from /proc/self/exe
				1752	==25832== loaded 5950 symbols, 142333 line number locations
				1753	==25832==
				1754	==25832== Invalid read of size 4
				1755	==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
				1756	==25832== by 0x80487AF: main (bogon.cpp:66)
				1757	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				1758	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				1759	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				1760	==25832==
				1761	==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
				1762	==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
				1763	==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
				1764	==25832== For a detailed leak analysis, rerun with: --leak-check=yes
				1765	==25832==
				1766	==25832== exiting, did 1881 basic blocks, 0 misses.
				1767	==25832== 223 translations, 3626 bytes in, 56801 bytes out.
				1768	</pre>
				1769	<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
				1770	<hr width="100%">
				1771	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1772
				1773
				1774
				1775	<a name="cache"></a>
				1776	<h2>7  Cache profiling</h2>
				1777	As well as memory debugging, Valgrind also allows you to do cache simulations
				1778	and annotate your source line-by-line with the number of cache misses. In
				1779	particular, it records:
				1780	<ul>
				1781	<li>L1 instruction cache reads and misses;
				1782	<li>L1 data cache reads and read misses, writes and write misses;
				1783	<li>L2 unified cache reads and read misses, writes and write misses.
				1784	</ul>
				1785	On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
				1786	and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
				1787	very useful for improving the performance of your program.
				1788
				1789	Please note that this is an experimental feature. Any feedback, bug-fixes,
				1790	suggestions, etc, welcome.
				1791
				1792
				1793	<h3>7.1  Overview</h3>
				1794	First off, as for normal Valgrind use, you probably want to turn on debugging
				1795	info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
				1796	probably <b>do</b> want to turn optimisation on, since you should profile your
				1797	program as it will be normally run.
				1798
				1799	The three steps are:
				1800	<ol>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1801	<li>Generate a cache simulator for your machine's cache
				1802	configuration with the supplied <code>vg_cachegen</code>
				1803	program, and recompile Valgrind with <code>make install</code>.
				1804	<p>
				1805	The default settings are for an AMD Athlon, and you will get
				1806	useful information with the defaults, so you can skip this step
				1807	if you want. Nevertheless, for accurate cache profiles you will
				1808	need to use <code>vg_cachegen</code> to customise
				1809	<code>cachegrind</code> for your system.
				1810	<p>
				1811	This step only needs to be done once, unless you are interested
				1812	in simulating different cache configurations (eg. first
				1813	concentrating on instruction cache misses, then on data cache
				1814	misses).
				1815	</li>
				1816	<p>
				1817	<li>Run your program with <code>cachegrind</code> in front of the
				1818	normal command line invocation. When the program finishes,
				1819	Valgrind will print summary cache statistics. It also collects
				1820	line-by-line information in a file <code>cachegrind.out</code>.
				1821	<p>
				1822	This step should be done every time you want to collect
				1823	information about a new program, a changed program, or about the
				1824	same program with different input.
				1825	</li>
				1826	<p>
				1827	<li>Generate a function-by-function summary, and possibly annotate
				1828	source files with 'vg_annotate'. Source files to annotate can be
				1829	specified manually on the command line, or
				1830	"interesting" source files can be annotated automatically with
				1831	the <code>--auto=yes</code> option. You can annotate C/C++
				1832	files or assembly language files equally easily.</li>
				1833	<p>
				1834	This step can be performed as many times as you like for each
				1835	Step 2. You may want to do multiple annotations showing
				1836	different information each time.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1837	</ol>
				1838
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1839	The steps are described in detail in the following sections.<p>
				1840
				1841
				1842	<a name="generate"></a>
				1843	<h3>7.3  Generating a cache simulator</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1844
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1845	Although Valgrind comes with a pre-generated cache simulator, it most
				1846	likely won't match the cache configuration of your machine, so you
				1847	should generate a new simulator.<p>
				1848
				1849	You need to generate three files, one for each of the I1, D1 and L2
				1850	caches. For each cache, you need to know the:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1851	<ul>
				1852	<li>Cache size (bytes);
				1853	<li>Line size (bytes);
				1854	<li>Associativity.
				1855	</ul>
				1856
				1857	vg_cachegen takes three options:
				1858	<ul>
				1859	<li><code>--I1=size,line_size,associativity</code>
				1860	<li><code>--D1=size,line_size,associativity</code>
				1861	<li><code>--L2=size,line_size,associativity</code>
				1862	</ul>
				1863
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1864	You can specify one, two or all three caches per invocation of
				1865	vg_cachegen. It checks that the configuration is sensible before
				1866	generating the simulators; to see the allowed values, run
				1867	<code>vg_cachegen -h</code>.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1868
				1869	An example invocation would be:
				1870
				1871	<blockquote><code>
				1872	vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
				1873	</code></blockquote>
				1874
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1875	This simulates a machine with a 128KB split L1 2-way associative
				1876	cache, and a 256KB unified 8-way associative L2 cache. Both caches
				1877	have 64B lines.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1878
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1879	If you don't know your cache configuration, you'll have to find it
				1880	out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
				1881	configuration using the CPUID instruction, which could be done
				1882	automatically during installation, and this whole step could be
				1883	skipped.)<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1884
				1885
				1886	<h3>7.4  Cache simulation specifics</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1887
				1888	<code>vg_cachegen</code> only generates simulations for a machine with
				1889	a split L1 cache and a unified L2 cache. This configuration is used
				1890	for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
				1891	had a unified I and D L1 cache, but they are ancient history now.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1892
				1893	The more specific characteristics of the simulation are as follows.
				1894
				1895	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1896	<li>Write-allocate: when a write miss occurs, the block written to
				1897	is brought into the D1 cache. Most modern caches have this
				1898	property.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1899
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1900	<li>Bit-selection hash function: the line(s) in the cache to which a
				1901	memory block maps is chosen by the middle bits M--(M+N-1) of the
				1902	byte address, where:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1903	<ul>
				1904	<li> line size = 2^M bytes </li>
				1905	<li>(cache size / line size / associativity) = 2^N</li>
				1906	</ul> </li><p>
				1907
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1908	<li>Inclusive L2 cache: the L2 cache replicates all the entries of
				1909	the L1 cache. This is standard on Pentium chips, but AMD
				1910	Athlons use an exclusive L2 cache that only holds blocks evicted
				1911	from L1. Ditto AMD Durons and most modern VIAs.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1912	</ul>
				1913
				1914	Other noteworthy behaviour:
				1915
				1916	<ul>
<li>References that straddle two cache lines are treated as follows:
<ul>
<li>If both blocks hit --> counted as one hit</li>
<li>If one block hits, the other misses --> counted as one miss</li>
<li>If both blocks miss --> counted as one miss (not two)</li>
</ul></li><p>
				1923
				1924	<li>Instructions that modify a memory location (eg. <code>inc</code> and
				1925	<code>dec</code>) are counted as doing just a read, ie. a single data
				1926	reference. This may seem strange, but since the write can never cause a
				1927	miss (the read guarantees the block is in the cache) it's not very
				1928	interesting.<p>
				1929
				1930	Thus it measures not the number of times the data cache is accessed, but
				1931	the number of times a data cache miss could occur.<p>
				1932	</li>
				1933	</ul>
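<p>
The counting rule for straddling references can be sketched with a toy cache. This is only an illustration under simplifying assumptions: <code>ToyCache</code> is a hypothetical direct-mapped cache, not Valgrind's simulator, and it makes no distinction between reads and writes (every miss loads the block, in the spirit of write-allocate).

```python
class ToyCache:
    """Direct-mapped toy cache; a hypothetical stand-in, not Valgrind's code."""

    def __init__(self, cache_size=1024, line_size=64):
        self.line_size = line_size
        self.num_lines = cache_size // line_size
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def _touch(self, block):
        """Return True on a hit; on a miss, load the block into the cache."""
        index = block % self.num_lines
        hit = self.tags[index] == block
        self.tags[index] = block
        return hit

    def reference(self, addr, size):
        """Count one data reference of `size` bytes starting at `addr`."""
        first = addr // self.line_size
        last = (addr + size - 1) // self.line_size
        # Touch every block first so each one ends up cached, then count once.
        results = [self._touch(block) for block in range(first, last + 1)]
        if all(results):
            self.hits += 1
        else:
            self.misses += 1    # a single miss, even if both blocks missed
```

An 8-byte reference at address 60 spans two 64-byte blocks, yet it adds only one to either <code>hits</code> or <code>misses</code>.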
				1934
				1935	If you are interested in simulating a cache with different properties, it is
				1936	not particularly hard to write your own cache simulator, or to modify existing
ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
does.
				1940
				1941
				1942	<a name="profile"></a>
				1943	<h3>7.5  Profiling programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1944
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1945	Cache profiling is enabled by using the <code>--cachesim=yes</code>
				1946	option to the <code>valgrind</code> shell script. Alternatively, it
				1947	is probably more convenient to use the <code>cachegrind</code> script.
				1948	This automatically turns off Valgrind's memory checking functions,
				1949	since the cache simulation is slow enough already, and you probably
				1950	don't want to do both at once.
				1951	<p>
To gather cache profiling information about the program
<code>ls -l</code>, type:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1954
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1955	<blockquote><code>cachegrind ls -l</code></blockquote>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1956
				1957	The program will execute (slowly). Upon completion, summary statistics
				1958	that look like this will be printed:
				1959
				1960	<pre>
				1961	==31751== I refs: 27,742,716
				1962	==31751== I1 misses: 276
				1963	==31751== L2 misses: 275
				1964	==31751== I1 miss rate: 0.0%
				1965	==31751== L2i miss rate: 0.0%
				1966	==31751==
				1967	==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
				1968	==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
				1969	==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
				1970	==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
				1971	==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
				1972	==31751==
				1973	==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
				1974	==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
				1975	</pre>
				1976
				1977	Cache accesses for instruction fetches are summarised first, giving the
				1978	number of fetches made (this is the number of instructions executed, which
				1979	can be useful to know in its own right), the number of I1 misses, and the
				1980	number of L2 instruction (<code>L2i</code>) misses.<p>
				1981
				1982	Cache accesses for data follow. The information is similar to that of the
				1983	instruction fetches, except that the values are also shown split between reads
				1984	and writes (note each row's <code>rd</code> and <code>wr</code> values add up
				1985	to the row's total).<p>
				1986
				1987	Combined instruction and data figures for the L2 cache follow that.<p>
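<p>
The miss rates in the summary can be recomputed from the counts. The sketch below is an assumption based on the sample figures, not a description of the implementation: the printed rates are consistent with truncation to one decimal place (21,905 read misses out of 10,955,517 reads is just under 0.2% and is shown as 0.1%), so the hypothetical helper <code>miss_rate</code> truncates rather than rounds.

```python
def miss_rate(misses, refs):
    """Miss rate as a percentage, truncated (not rounded) to one decimal place."""
    if refs == 0:
        return 0.0
    # Work in tenths of a percent with integer arithmetic, then scale back.
    return (misses * 1000 // refs) / 10


# Counts copied from the sample summary above.
d1_rate = miss_rate(41185, 15430290)            # D1 miss rate, reads and writes
l2_rate = miss_rate(23360, 27742716 + 15430290) # L2 misses over all I and D refs
```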
				1988
				1989
				1990	<h3>7.6  Output file</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1991
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	1992	As well as printing summary information, Cachegrind also writes
				1993	line-by-line cache profiling information to a file named
				1994	<code>cachegrind.out</code>. This file is human-readable, but is best
				1995	interpreted by the accompanying program <code>vg_annotate</code>,
				1996	described in the next section.
				1997	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	1998	Things to note about the <code>cachegrind.out</code> file:
				1999	<ul>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2000	<li>It is written every time <code>valgrind --cachesim=yes</code> or
				2001	<code>cachegrind</code> is run, and will overwrite any existing
				2002	<code>cachegrind.out</code> in the current directory.</li>
				2003	<p>
				2004	<li>It can be huge: <code>ls -l</code> generates a file of about
				2005	350KB. Browsing a few files and web pages with a Konqueror
				2006	built with full debugging information generates a file
				2007	of around 15 MB.</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2008	</ul>
				2009
				2010
				2011	<a name="annotate"></a>
				2012	<h3>7.7  Annotating C/C++ programs</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2013
Before using <code>vg_annotate</code>, it is worth widening your
window to at least 120 characters if possible, as the output
lines can be quite long.
<p>
To get a function-by-function summary, run <code>vg_annotate</code> in a
directory containing a <code>cachegrind.out</code> file. The output
looks like this:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2021
				2022	<pre>
				2023	--------------------------------------------------------------------------------
				2024	I1 cache: 65536 B, 64 B, 2-way associative
				2025	D1 cache: 65536 B, 64 B, 2-way associative
				2026	L2 cache: 262144 B, 64 B, 8-way associative
				2027	Command: concord vg_to_ucode.c
				2028	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2029	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2030	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2031	Threshold: 99%
				2032	Chosen for annotation:
				2033	Auto-annotation: on
				2034
--------------------------------------------------------------------------------
        Ir I1mr I2mr         Dr   D1mr  D2mr        Dw   D1mw   D2mw
--------------------------------------------------------------------------------
27,742,716  276  275 10,955,517 21,905 3,987 4,474,773 19,280 19,098  PROGRAM TOTALS

--------------------------------------------------------------------------------
        Ir I1mr I2mr         Dr   D1mr  D2mr        Dw   D1mw   D2mw  file:function
--------------------------------------------------------------------------------
 8,821,482    5    5  2,242,702  1,621    73 1,794,230      0      0  getc.c:_IO_getc
 5,222,023    4    4  2,276,334     16    12   875,959      1      1  concord.c:get_word
 2,649,248    2    2  1,344,810  7,326 1,385         .      .      .  vg_main.c:strcmp
 2,521,927    2    2    591,215      0     0   179,398      0      0  concord.c:hash
 2,242,740    2    2  1,046,612    568    22   448,548      0      0  ctype.c:tolower
 1,496,937    4    4    630,874  9,000 1,400   279,388      0      0  concord.c:insert
   897,991   51   51    897,831     95    30        62      1      1  ???:???
   598,068    1    1    299,034      0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__flockfile
   598,068    0    0    299,034      0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__funlockfile
   598,024    4    4    213,580     35    16   149,506      0      0  vg_clientmalloc.c:malloc
   446,587    1    1    215,973  2,167   430   129,948 14,057 13,957  concord.c:add_existing
   341,760    2    2    128,160      0     0   128,160      0      0  vg_clientmalloc.c:vg_trap_here_WRAPPER
   320,782    4    4    150,711    276     0    56,027     53     53  concord.c:init_hash_table
   298,998    1    1    106,785      0     0    64,071      1      1  concord.c:create
   149,518    0    0    149,516      0     0         1      0      0  ???:tolower@@GLIBC_2.0
   149,518    0    0    149,516      0     0         1      0      0  ???:fgetc@@GLIBC_2.0
    95,983    4    4     38,031      0     0    34,409  3,152  3,150  concord.c:new_word_node
    85,440    0    0     42,720      0     0    21,360      0      0  vg_clientmalloc.c:vg_bogus_epilogue
				2061	</pre>
				2062
				2063	First up is a summary of the annotation options:
				2064
				2065	<ul>
<li>I1 cache, D1 cache, L2 cache: the cache configuration with which these
results were obtained.</li><p>
				2068
				2069	<li>Command: the command line invocation of the program under
				2070	examination.</li><p>
				2071
				2072	<li>Events recorded: event abbreviations are:<p>
				2073	<ul>
				2074	<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
				2075	<li><code>I1mr</code>: I1 cache read misses</li>
				2076	<li><code>I2mr</code>: L2 cache instruction read misses</li>
				2077	<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
				2078	<li><code>D1mr</code>: D1 cache read misses</li>
				2079	<li><code>D2mr</code>: L2 cache data read misses</li>
				2080	<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
				2081	<li><code>D1mw</code>: D1 cache write misses</li>
				2082	<li><code>D2mw</code>: L2 cache data write misses</li>
				2083	</ul><p>
Note that the total number of D1 misses is given by <code>D1mr</code> +
<code>D1mw</code>, and that the total number of L2 misses is given by
<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
				2087
				2088	<li>Events shown: the events shown (a subset of events gathered). This can
				2089	be adjusted with the <code>--show</code> option.</li><p>
				2090
				2091	<li>Event sort order: the sort order in which functions are shown. For
				2092	example, in this case the functions are sorted from highest
				2093	<code>Ir</code> counts to lowest. If two functions have identical
				2094	<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
				2095	counts, and so on. This order can be adjusted with the
				2096	<code>--sort</code> option.<p>
				2097
Note that this dictates the order in which the functions appear. It is
<b>not</b> the order in which the columns appear; that is dictated by the
"events shown" line (and can be changed with the <code>--show</code> option).
</li><p>
				2102
<li>Threshold: <code>vg_annotate</code> by default omits functions
that account for very few events, to avoid drowning you in
information. In this case, vg_annotate shows summaries of the
functions that account for 99% of the <code>Ir</code> counts;
<code>Ir</code> is chosen as the threshold event since it is the
primary sort event. The threshold can be adjusted with the
<code>--threshold</code> option.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2110
				2111	<li>Chosen for annotation: names of files specified manually for annotation;
				2112	in this case none.</li><p>
				2113
				2114	<li>Auto-annotation: whether auto-annotation was requested via the
				2115	<code>--auto=yes</code> option. In this case no.</li><p>
				2116	</ul>
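<p>
The relationship between the per-event columns and the run summary can be checked with a few additions. The sketch below uses the counts from the PROGRAM TOTALS row of the sample; the names <code>PROGRAM_TOTALS</code> and <code>derived_totals</code> are ours, chosen for illustration.

```python
# Event counts copied from the PROGRAM TOTALS row of the sample output.
PROGRAM_TOTALS = {
    "Ir": 27_742_716, "I1mr": 276, "I2mr": 275,
    "Dr": 10_955_517, "D1mr": 21_905, "D2mr": 3_987,
    "Dw": 4_474_773, "D1mw": 19_280, "D2mw": 19_098,
}


def derived_totals(ev):
    """Combine per-event counts into the totals printed in the run summary."""
    return {
        "D refs": ev["Dr"] + ev["Dw"],
        "D1 misses": ev["D1mr"] + ev["D1mw"],
        "L2 misses": ev["I2mr"] + ev["D2mr"] + ev["D2mw"],
    }
```

The results match the <code>D refs</code>, <code>D1 misses</code> and combined <code>L2 misses</code> lines of the summary printed at the end of the profiled run.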
				2117
Then follow summary statistics for the whole program. These are similar
to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>

Then follow function-by-function statistics. Each function is
identified by a <code>file_name:function_name</code> pair. If a column
contains only a dot, it means the function never performs
that event (eg. the third row shows that <code>strcmp()</code>
contains no instructions that write to memory). The name
<code>???</code> is used if the file name and/or function name
could not be determined from debugging information. If most of the
entries have the form <code>???:???</code>, the program probably wasn't
compiled with <code>-g</code>. <p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2130
				2131	It is worth noting that functions will come from three types of source files:
				2132	<ol>
				2133	<li> From the profiled program (<code>concord.c</code> in this example).</li>
				2134	<li>From libraries (eg. <code>getc.c</code>)</li>
				2135	<li>From Valgrind's implementation of some libc functions (eg.
				2136	<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
				2137	the filename begins with <code>vg_</code>, and is probably one of
				2138	<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
				2139	<code>vg_mylibc.c</code>.
				2140	</li>
				2141	</ol>
				2142
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2143	There are two ways to annotate source files -- by choosing them
				2144	manually, or with the <code>--auto=yes</code> option. To do it
				2145	manually, just specify the filenames as arguments to
				2146	<code>vg_annotate</code>. For example, the output from running
				2147	<code>vg_annotate concord.c</code> for our example produces the same
				2148	output as above followed by an annotated version of
				2149	<code>concord.c</code>, a section of which looks like:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2150
				2151	<pre>
				2152	--------------------------------------------------------------------------------
				2153	-- User-annotated source: concord.c
				2154	--------------------------------------------------------------------------------
     Ir I1mr I2mr     Dr D1mr D2mr     Dw D1mw D2mw

[snip]

      .    .    .      .    .    .      .    .    .  void init_hash_table(char *file_name, Word_Node *table[])
      3    1    1      .    .    .      1    0    0  {
      .    .    .      .    .    .      .    .    .      FILE *file_ptr;
      .    .    .      .    .    .      .    .    .      Word_Info *data;
      1    0    0      .    .    .      1    1    1      int line = 1, i;
      .    .    .      .    .    .      .    .    .
      5    0    0      .    .    .      3    0    0      data = (Word_Info *) create(sizeof(Word_Info));
      .    .    .      .    .    .      .    .    .
  4,991    0    0  1,995    0    0    998    0    0      for (i = 0; i < TABLE_SIZE; i++)
  3,988    1    1  1,994    0    0    997   53   52          table[i] = NULL;
      .    .    .      .    .    .      .    .    .
      .    .    .      .    .    .      .    .    .      /* Open file, check it. */
      6    0    0      1    0    0      4    0    0      file_ptr = fopen(file_name, "r");
      2    0    0      1    0    0      .    .    .      if (!(file_ptr)) {
      .    .    .      .    .    .      .    .    .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
      1    1    1      .    .    .      .    .    .          exit(EXIT_FAILURE);
      .    .    .      .    .    .      .    .    .      }
      .    .    .      .    .    .      .    .    .
165,062    1    1 73,360    0    0 91,700    0    0      while ((line = get_word(data, line, file_ptr)) != EOF)
146,712    0    0 73,356    0    0 73,356    0    0          insert(data->word, data->line, table);
      .    .    .      .    .    .      .    .    .
      4    0    0      1    0    0      2    0    0      free(data);
      4    0    0      1    0    0      2    0    0      fclose(file_ptr);
      3    0    0      2    0    0      .    .    .  }
				2183	</pre>
				2184
				2185	(Although column widths are automatically minimised, a wide terminal is clearly
				2186	useful.)<p>
				2187
				2188	Each source file is clearly marked (<code>User-annotated source</code>) as
				2189	having been chosen manually for annotation. If the file was found in one of
				2190	the directories specified with the <code>-I</code>/<code>--include</code>
				2191	option, the directory and file are both given.<p>
				2192
				2193	Each line is annotated with its event counts. Events not applicable for a line
				2194	are represented by a `.'; this is useful for distinguishing between an event
				2195	which cannot happen, and one which can but did not.<p>
				2196
				2197	Sometimes only a small section of a source file is executed. To minimise
				2198	uninteresting output, Valgrind only shows annotated lines and lines within a
				2199	small distance of annotated lines. Gaps are marked with the line numbers so
				2200	you know which part of a file the shown code comes from, eg:
				2201
				2202	<pre>
				2203	(figures and code for line 704)
				2204	-- line 704 ----------------------------------------
				2205	-- line 878 ----------------------------------------
				2206	(figures and code for line 878)
				2207	</pre>
				2208
				2209	The amount of context to show around annotated lines is controlled by the
				2210	<code>--context</code> option.<p>
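<p>
The effect of the context setting can be sketched as merging a window around each annotated line. This is a guess at the behaviour described above, not vg_annotate's actual code; the helper <code>shown_ranges</code> and its merging rule (adjacent or overlapping windows are joined) are assumptions.

```python
def shown_ranges(annotated_lines, context, last_line):
    """Merge each annotated line +/- `context` lines into inclusive ranges."""
    ranges = []
    for line in sorted(annotated_lines):
        start = max(1, line - context)
        end = min(last_line, line + context)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous window: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```

Each gap between two consecutive ranges would correspond to one pair of <code>-- line N --</code> markers in the output.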
				2211
				2212	To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
				2213	vg_annotate will automatically annotate every source file it can find that is
				2214	mentioned in the function-by-function summary. Therefore, the files chosen for
				2215	auto-annotation are affected by the <code>--sort</code> and
				2216	<code>--threshold</code> options. Each source file is clearly marked
				2217	(<code>Auto-annotated source</code>) as being chosen automatically. Any files
				2218	that could not be found are mentioned at the end of the output, eg:
				2219
				2220	<pre>
				2221	--------------------------------------------------------------------------------
				2222	The following files chosen for auto-annotation could not be found:
				2223	--------------------------------------------------------------------------------
				2224	getc.c
				2225	ctype.c
				2226	../sysdeps/generic/lockfile.c
				2227	</pre>
				2228
This is quite common for library files, since libraries are usually compiled
with debugging information, but the source files are often not present on a
system. If a file is chosen for annotation <b>both</b> manually and
automatically, it is marked as <code>User-annotated source</code>.<p>

Use the <code>-I/--include</code> option to tell Valgrind where to look for
source files if the filenames found from the debugging information aren't
specific enough.<p>

Beware that vg_annotate can take some time to digest large
<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
auto-annotation can produce a lot of output if your program is large!
				2241
				2242
				2243	<h3>7.8  Annotating assembler programs</h3>
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2244
				2245	Valgrind can annotate assembler programs too, or annotate the
				2246	assembler generated for your C program. Sometimes this is useful for
				2247	understanding what is really happening when an interesting line of C
				2248	code is translated into multiple instructions.<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2249
				2250	To do this, you just need to assemble your <code>.s</code> files with
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2251	assembler-level debug information. gcc doesn't do this, but you can
				2252	use the GNU assembler with the <code>--gstabs</code> option to
				2253	generate object files with this information, eg:
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2254
				2255	<blockquote><code>as --gstabs foo.s</code></blockquote>
				2256
				2257	You can then profile and annotate source files in the same way as for C/C++
				2258	programs.
				2259
				2260
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2261	<h3>7.9  <code>vg_annotate</code> options</h3>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2262	<ul>
				2263	<li><code>-h, --help</code></li><p>
				2264	<li><code>-v, --version</code><p>
				2265
				2266	Help and version, as usual.</li>
				2267
				2268	<li><code>--sort=A,B,C</code> [default: order in
				2269	<code>cachegrind.out</code>]<p>
				2270	Specifies the events upon which the sorting of the function-by-function
				2271	entries will be based. Useful if you want to concentrate on eg. I cache
				2272	misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
				2273	(<code>--sort=D1mr,D2mr</code>), or L2 misses
				2274	(<code>--sort=D2mr,I2mr</code>).</li><p>
				2275
				2276	<li><code>--show=A,B,C</code> [default: all, using order in
				2277	<code>cachegrind.out</code>]<p>
				2278	Specifies which events to show (and the column order). Default is to use
				2279	all present in the <code>cachegrind.out</code> file (and use the order in
				2280	the file).</li><p>
				2281
				2282	<li><code>--threshold=X</code> [default: 99%] <p>
				2283	Sets the threshold for the function-by-function summary. Functions are
				2284	shown that account for more than X% of all the primary sort events. If
				2285	auto-annotating, also affects which files are annotated.</li><p>
				2286
<li><code>--auto=no</code> [default]<br>
<code>--auto=yes</code> <p>
When enabled, automatically annotates every file mentioned in the
function-by-function summary that can be found. Also gives a list of
those that couldn't be found.</li><p>
				2292
				2293	<li><code>--context=N</code> [default: 8]<p>
				2294	Print N lines of context before and after each annotated line. Avoids
				2295	printing large sections of source files that were not executed. Use a
				2296	large number (eg. 10,000) to show all source lines.
				2297	</li><p>
				2298
				2299	<li><code>-I=<dir>, --include=<dir></code>
				2300	[default: empty string]<p>
Adds a directory to the list in which to search for files. Multiple
-I/--include options can be given to add multiple directories.</li>
				2303	</ul>
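<p>
How the sort and threshold options interact can be sketched as follows. This is one reading of the descriptions above, not vg_annotate's implementation: the helper <code>select_functions</code> is hypothetical, and it assumes the threshold is applied cumulatively, ie. functions are listed in sort order until together they account for X% of the primary sort event.

```python
def select_functions(rows, sort_events, threshold):
    """rows maps 'file:function' to {event: count}; returns names to display.

    Functions are ordered by the sort events (highest counts first) and kept
    until their cumulative share of the primary event reaches `threshold` %.
    """
    primary = sort_events[0]
    total = sum(counts.get(primary, 0) for counts in rows.values())
    ordered = sorted(
        rows,
        key=lambda name: tuple(rows[name].get(e, 0) for e in sort_events),
        reverse=True,
    )
    shown, running = [], 0
    for name in ordered:
        if total and running * 100 >= threshold * total:
            break   # the functions already listed cover the threshold
        shown.append(name)
        running += rows[name].get(primary, 0)
    return shown
```

Under this reading, changing the first event given to <code>--sort</code> changes both the order of the rows and the event against which the threshold is measured.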
				2304
				2305
				2306	<h3>7.10  Warnings</h3>
				2307	There are a couple of situations in which vg_annotate issues warnings.
				2308
				2309	<ul>
<li>If a source file is more recent than the <code>cachegrind.out</code>
file. This is because the information in <code>cachegrind.out</code> is
only recorded with line numbers, so if the line numbers change at all in
the source (eg. lines added, deleted, swapped), any annotations will be
incorrect.</li><p>
				2315
				2316	<li>If information is recorded about line numbers past the end of a file.
				2317	This can be caused by the above problem, ie. shortening the source file
				2318	while using an old <code>cachegrind.out</code> file. If this happens,
				2319	the figures for the bogus lines are printed anyway (clearly marked as
				2320	bogus) in case they are important.</li><p>
				2321	</ul>
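<p>
The two checks behind these warnings can be sketched as below. The helper names <code>source_is_newer</code> and <code>lines_past_end</code> are hypothetical, and comparing modification times is an assumption about how "more recent" is decided.

```python
import os


def source_is_newer(source_path, profile_path="cachegrind.out"):
    """True if the source file was modified after the profile was written."""
    return os.path.getmtime(source_path) > os.path.getmtime(profile_path)


def lines_past_end(recorded_lines, source_path):
    """Return recorded line numbers that lie beyond the end of the source file."""
    with open(source_path) as f:
        length = sum(1 for _ in f)
    return sorted(n for n in recorded_lines if n > length)
```

Either condition suggests re-running the profile before trusting the annotations.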
				2322
				2323
				2324	<h3>7.10  Things to watch out for</h3>
				2325	Some odd things that can occur during annotation:
				2326
				2327	<ul>
				2328	<li>If annotating at the assembler level, you might see something like this:
				2329
				2330	<pre>
				2331	1 0 0 . . . . . . leal -12(%ebp),%eax
				2332	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				2333	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				2334	. . . . . . . . . .align 4,0x90
				2335	1 0 0 . . . . . . movl $.LnrB,%eax
				2336	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
				2337	</pre>
				2338
				2339	How can the third instruction be executed twice when the others are
				2340	executed only once? As it turns out, it isn't. Here's a dump of the
				2341	executable, from objdump:
				2342
				2343	<pre>
				2344	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				2345	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				2346	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				2347	8048f32: 89 f6 mov %esi,%esi
				2348	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				2349	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
				2350	</pre>
				2351
				2352	Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
				2353	come from? The GNU assembler inserted it to serve as the two bytes of
				2354	padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
				2355	a four-byte boundary, but pretended it didn't exist when adding debug
				2356	information. Thus when Valgrind reads the debug info it thinks that the
<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
<code>mov %esi,%esi</code> to it.<p>
				2360	</li>
				2361
				2362	<li>
				2363	Inlined functions can cause strange results in the function-by-function
				2364	summary. If a function <code>inline_me()</code> is defined in
				2365	<code>foo.h</code> and inlined in the functions <code>f1()</code>,
				2366	<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
				2367	not be a <code>foo.h:inline_me()</code> function entry. Instead, there
				2368	will be separate function entries for each inlining site, ie.
				2369	<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
				2370	<code>foo.h:f3()</code>. To find the total counts for
				2371	<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
				2372
The reason for this is that although the debug info output by gcc
indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
doesn't indicate the name of the function in <code>foo.h</code>, so
Valgrind keeps using the old one.<p>
</li>
				2377
				2378	<li>
				2379	Sometimes, the same filename might be represented with a relative name
				2380	and with an absolute name in different parts of the debug info, eg:
				2381	<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
				2382	case, if you use auto-annotation, the file will be annotated twice with
				2383	the counts split between the two.<p>
				2384	</li>
				2385	</ul>
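<p>
Adding up the per-site entries for an inlined function, as suggested in the list above, might look like this. The helper <code>inlined_total</code> and the counts in the usage example are invented for illustration; only the <code>file:function</code> key format comes from the summary output.

```python
def inlined_total(rows, header, callers):
    """Sum event counts over the entries header:caller for each inlining site."""
    total = {}
    for caller in callers:
        for event, count in rows.get(f"{header}:{caller}", {}).items():
            total[event] = total.get(event, 0) + count
    return total


# Made-up counts: inline_me() from foo.h was inlined into f1, f2 and f3.
rows = {
    "foo.h:f1": {"Ir": 10, "Dr": 2},
    "foo.h:f2": {"Ir": 5},
    "bar.c:f1": {"Ir": 100},
}
inline_me_counts = inlined_total(rows, "foo.h", ["f1", "f2", "f3"])
```

Entries such as <code>bar.c:f1</code> are left out because they belong to the calling function's own code, not to the inlined body.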
				2386
Note: stabs is not an easy format to read. If you come across bizarre
annotations that look like they might be caused by a bug in the stabs reader,
please let us know.
				2390
				2391
				2392	<h3>7.11  Accuracy</h3>
				2393	Valgrind's cache profiling has a number of shortcomings:
				2394
				2395	<ul>
				2396	<li>It doesn't account for kernel activity -- the effect of system calls on
				2397	the cache contents is ignored.</li><p>
				2398
				2399	<li>It doesn't account for other process activity (although this is probably
				2400	desirable when considering a single program).</li><p>
				2401
				2402	<li>It doesn't account for virtual-to-physical address mappings; hence the
				2403	entire simulation is not a true representation of what's happening in the
				2404	cache.</li><p>
				2405
				2406	<li>It doesn't account for cache misses not visible at the instruction level,
				2407	eg. those arising from TLB misses, or speculative execution.</li><p>
njn	db75e4d	2002-04-30 12:46:22 +0000	[diff] [blame]	2408
				2409	<li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code>
				2410	will incorrectly be counted as doing a data read if both the arguments
				2411	are registers, eg:
				2412
				2413	<blockquote><code>btsl %eax, %edx</code></blockquote>
				2414
This should only happen rarely.</li>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2416	</ul>
				2417
Another thing worth noting is that results are very sensitive to small
changes. Changing the size of the <code>valgrind.so</code> file, the size of
the program being profiled, or even the length of its name can perturb the
results. Variations will be small, but don't expect perfectly repeatable
results if your program changes at all.<p>
				2423
				2424	While these factors mean you shouldn't trust the results to be super-accurate,
				2425	hopefully they should be close enough to be useful.<p>
				2426
				2427
				2428	<h3>7.12  Todo</h3>
				2429	<ul>
<li>Use CPUID instruction to auto-identify cache configuration during
installation. This would save the user from having to know their cache
configuration and use vg_cachegen.</li>
				2433	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2434	<li>Program start-up/shut-down calls a lot of functions that aren't
				2435	interesting and just complicate the output. Would be nice to exclude
sewardj	434f57f	2002-05-01 01:24:52 +0000	[diff] [blame]	2436	these somehow.</li>
				2437	<p>
				2438	<li>Handle files with more than 65535 lines.</li><p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame]	2439	</ul>
				2440	<hr width="100%">
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	2441	</body>
				2442	</html>