Blame - memcheck/docs/manual.html - platform/external/valgrind

blob: a97c2f9fec2d9ebed2a6f959b80e33f91db6680b

sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1	<html>
				2	<head>
				3	<style type="text/css">
				4	body { background-color: #ffffff;
				5	color: #000000;
				6	font-family: Times, Helvetica, Arial;
				7	font-size: 14pt}
				8	h4 { margin-bottom: 0.3em}
				9	code { color: #000000;
				10	font-family: Courier;
				11	font-size: 13pt }
				12	pre { color: #000000;
				13	font-family: Courier;
				14	font-size: 13pt }
				15	a:link { color: #0000C0;
				16	text-decoration: none; }
				17	a:visited { color: #0000C0;
				18	text-decoration: none; }
				19	a:active { color: #0000C0;
				20	text-decoration: none; }
				21	</style>
				22	</head>
				23
				24	<body bgcolor="#ffffff">
				25
				26	<a name="title"> </a>
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	27	<h1 align=center>Valgrind, snapshot 20020324</h1>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	28	<center>This manual was minimally updated on 20020415</center>
				29	<p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	30
				31	<center>
				32	<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	33	Copyright © 2000-2002 Julian Seward
				34	<p>
				35	Valgrind is licensed under the GNU General Public License,
				36	version 2<br>
				37	An open-source tool for finding memory-management problems in
				38	Linux-x86 executables.
				39	</center>
				40
				41	<p>
				42
				43	<hr width="100%">
				44	<a name="contents"></a>
				45	<h2>Contents of this manual</h2>
				46
				47	<h4>1  <a href="#intro">Introduction</a></h4>
				48	1.1  <a href="#whatfor">What Valgrind is for</a><br>
				49	1.2  <a href="#whatdoes">What it does with your program</a>
				50
				51	<h4>2  <a href="#howtouse">How to use it, and how to make sense
				52	of the results</a></h4>
				53	2.1  <a href="#starta">Getting started</a><br>
				54	2.2  <a href="#comment">The commentary</a><br>
				55	2.3  <a href="#report">Reporting of errors</a><br>
				56	2.4  <a href="#suppress">Suppressing errors</a><br>
				57	2.5  <a href="#flags">Command-line flags</a><br>
				58	2.6  <a href="#errormsgs">Explanation of error messages</a><br>
				59	2.7  <a href="#suppfiles">Writing suppressions files</a><br>
				60	2.8  <a href="#install">Building and installing</a><br>
				61	2.9  <a href="#problems">If you have problems</a><br>
				62
				63	<h4>3  <a href="#machine">Details of the checking machinery</a></h4>
				64	3.1  <a href="#vvalue">Valid-value (V) bits</a><br>
				65	3.2  <a href="#vaddress">Valid-address (A) bits</a><br>
				66	3.3  <a href="#together">Putting it all together</a><br>
				67	3.4  <a href="#signals">Signals</a><br>
				68	3.5  <a href="#leaks">Memory leak detection</a><br>
				69
				70	<h4>4  <a href="#limits">Limitations</a></h4>
				71
				72	<h4>5  <a href="#howitworks">How it works -- a rough overview</a></h4>
				73	5.1  <a href="#startb">Getting started</a><br>
				74	5.2  <a href="#engine">The translation/instrumentation engine</a><br>
				75	5.3  <a href="#track">Tracking the status of memory</a><br>
				76	5.4  <a href="#sys_calls">System calls</a><br>
				77	5.5  <a href="#sys_signals">Signals</a><br>
				78
				79	<h4>6  <a href="#example">An example</a></h4>
				80
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame^]	81	<h4>7  <a href="#cache">Cache profiling</a></h4>
				82
				83	<h4>8  <a href="techdocs.html">The design and implementation of Valgrind</a></h4>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	84
				85	<hr width="100%">
				86
				87	<a name="intro"></a>
				88	<h2>1  Introduction</h2>
				89
				90	<a name="whatfor"></a>
				91	<h3>1.1  What Valgrind is for</h3>
				92
				93	Valgrind is a tool to help you find memory-management problems in your
				94	programs. When a program is run under Valgrind's supervision, all
				95	reads and writes of memory are checked, and calls to
				96	malloc/new/free/delete are intercepted. As a result, Valgrind can
				97	detect problems such as:
				98	<ul>
				99	<li>Use of uninitialised memory</li>
				100	<li>Reading/writing memory after it has been free'd</li>
				101	<li>Reading/writing off the end of malloc'd blocks</li>
				102	<li>Reading/writing inappropriate areas on the stack</li>
				103	<li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li>
				104	</ul>
				105
				106	Problems like these can be difficult to find by other means, often
				107	lying undetected for long periods, then causing occasional,
				108	difficult-to-diagnose crashes.
				109
				110	<p>
				111	Valgrind is closely tied to details of the CPU, operating system and
				112	to a lesser extent, compiler and basic C libraries. This makes it
				113	difficult to make it portable, so I have chosen at the outset to
				114	concentrate on what I believe to be a widely used platform: Red Hat
				115	Linux 7.2, on x86s. I believe that it will work without significant
				116	difficulty on other x86 GNU/Linux systems which use the 2.4 kernel and
				117	GNU libc 2.2.X, for example SuSE 7.1 and Mandrake 8.0. Red Hat 6.2 is
				118	also supported. It has worked in the past, and probably still does,
				119	on Red Hat 7.1 and 6.2. Note that I haven't compiled it on either
				120	for a while, so they may no longer work.
				121	<p>
				122	(Early Feb 02: after feedback from the KDE people it also works better
				123	on other Linuxes).
				124	<p>
				125	At some point in the past, Valgrind has also worked on Red Hat 6.2
				126	(x86), thanks to the efforts of Rob Noble.
				127
				128	<p>
				129	Valgrind is licensed under the GNU General Public License, version
				130	2. Read the file LICENSE in the source distribution for details.
				131
				132	<a name="whatdoes"></a>
				133	<h3>1.2  What it does with your program</h3>
				134
				135	Valgrind is designed to be as non-intrusive as possible. It works
				136	directly with existing executables. You don't need to recompile,
				137	relink, or otherwise modify, the program to be checked. Simply place
				138	the word <code>valgrind</code> at the start of the command line
				139	normally used to run the program. So, for example, if you want to run
				140	the command <code>ls -l</code> on Valgrind, simply issue the
				141	command: <code>valgrind ls -l</code>.
				142
				143	<p>Valgrind takes control of your program before it starts. Debugging
				144	information is read from the executable and associated libraries, so
				145	that error messages can be phrased in terms of source code
				146	locations. Your program is then run on a synthetic x86 CPU which
				147	checks every memory access. All detected errors are written to a
				148	log. When the program finishes, Valgrind searches for and reports on
				149	leaked memory.
				150
				151	<p>You can run pretty much any dynamically linked ELF x86 executable using
				152	Valgrind. Programs run 25 to 50 times slower, and take a lot more
				153	memory, than they usually would. It works well enough to run large
				154	programs. For example, the Konqueror web browser from the KDE Desktop
				155	Environment, version 2.1.1, runs slowly but usably on Valgrind.
				156
				157	<p>Valgrind simulates every single instruction your program executes.
				158	Because of this, it finds errors not only in your application but also
				159	in all supporting dynamically-linked (.so-format) libraries, including
				160	the GNU C library, the X client libraries, Qt, if you work with KDE, and
				161	so on. That often includes libraries, for example the GNU C library,
				162	which contain memory access violations, but which you cannot or do not
				163	want to fix.
				164
				165	<p>Rather than swamping you with errors in which you are not
				166	interested, Valgrind allows you to selectively suppress errors, by
				167	recording them in a suppressions file which is read when Valgrind
				168	starts up. As supplied, Valgrind comes with a suppressions file
				169	designed to give reasonable behaviour on Red Hat 7.2 (also 7.1 and
				170	6.2) when running text-only and simple X applications.
				171
				172	<p><a href="#example">Section 6</a> shows an example of use.
				173	<p>
				174	<hr width="100%">
				175
				176	<a name="howtouse"></a>
				177	<h2>2  How to use it, and how to make sense of the results</h2>
				178
				179	<a name="starta"></a>
				180	<h3>2.1  Getting started</h3>
				181
				182	First off, consider whether it might be beneficial to recompile your
				183	application and supporting libraries with optimisation disabled and
				184	debugging info enabled (the <code>-g</code> flag). You don't have to
				185	do this, but doing so helps Valgrind produce more accurate and less
				186	confusing error reports. Chances are you're set up like this already,
				187	if you intended to debug your program with GNU gdb, or some other
				188	debugger.
				189
				190	<p>Then just run your application, but place the word
				191	<code>valgrind</code> in front of your usual command-line invocation.
				192	Note that you should run the real (machine-code) executable here. If
				193	your application is started by, for example, a shell or perl script,
				194	you'll need to modify it to invoke Valgrind on the real executables.
				195	Running such scripts directly under Valgrind will result in you
				196	getting error reports pertaining to <code>/bin/sh</code>,
				197	<code>/usr/bin/perl</code>, or whatever interpreter you're using.
				198	This almost certainly isn't what you want and can be hugely confusing.
				199
				200	<a name="comment"></a>
				201	<h3>2.2  The commentary</h3>
				202
				203	Valgrind writes a commentary, detailing error reports and other
				204	significant events. The commentary goes to standard error by
				205	default. This may interfere with your program, so you can ask for it
				206	to be directed elsewhere.
				207
				208	<p>All lines in the commentary are of the following form:<br>
				209	<pre>
				210	==12345== some-message-from-Valgrind
				211	</pre>
				212	<p>The <code>12345</code> is the process ID. This scheme makes it easy
				213	to distinguish program output from Valgrind commentary, and also easy
				214	to differentiate commentaries from different processes which have
				215	become merged together, for whatever reason.
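<p>Because every commentary line carries the <code>==pid==</code> prefix,
separating commentary from program output is mechanical. The following
Python sketch is not part of Valgrind: the function name
<code>split_output</code> and the sample input are assumptions made for
illustration. It groups commentary messages by process ID and collects
everything else as program output.

```python
import re

# Matches a commentary line of the form "==<pid>== <message>".
COMMENTARY = re.compile(r"^==(\d+)== ?(.*)$")


def split_output(lines):
    """Separate Valgrind commentary (grouped by PID) from program output."""
    commentary = {}  # pid -> list of messages, in order of appearance
    program = []     # every line that is not commentary
    for line in lines:
        match = COMMENTARY.match(line)
        if match:
            pid = int(match.group(1))
            commentary.setdefault(pid, []).append(match.group(2))
        else:
            program.append(line)
    return commentary, program


# Hypothetical merged output from two processes plus the program itself.
mixed = [
    "==25832== Invalid read of size 4",
    "hello from the program",
    "==25833== Invalid write of size 1",
    "==25832==    by 0x80487AF: main (bogon.cpp:66)",
]
commentary, program = split_output(mixed)
```

<p>Here <code>program</code> holds only the program's own line, and
<code>commentary</code> has one entry per process ID.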
				216
				217	<p>By default, Valgrind writes only essential messages to the commentary,
				218	so as to avoid flooding you with information of secondary importance.
				219	If you want more information about what is happening, re-run, passing
				220	the <code>-v</code> flag to Valgrind.
				221
				222
				223	<a name="report"></a>
				224	<h3>2.3  Reporting of errors</h3>
				225
				226	When Valgrind detects something bad happening in the program, an error
				227	message is written to the commentary. For example:<br>
				228	<pre>
				229	==25832== Invalid read of size 4
				230	==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
				231	==25832== by 0x80487AF: main (bogon.cpp:66)
				232	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				233	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				234	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				235	</pre>
				236
				237	<p>This message says that the program did an illegal 4-byte read of
				238	address 0xBFFFF74C, which, as far as it can tell, is not a valid stack
				239	address, nor corresponds to any currently malloc'd or free'd blocks.
				240	The read is happening at line 45 of <code>bogon.cpp</code>, called
				241	from line 66 of the same file, etc. For errors associated with an
				242	identified malloc'd/free'd block, for example reading free'd memory,
				243	Valgrind reports not only the location where the error happened, but
				244	also where the associated block was malloc'd/free'd.
				245
				246	<p>Valgrind remembers all error reports. When an error is detected,
				247	it is compared against old reports, to see if it is a duplicate. If
				248	so, the error is noted, but no further commentary is emitted. This
				249	avoids you being swamped with bazillions of duplicate error reports.
				250
				251	<p>If you want to know how many times each error occurred, run with
				252	the <code>-v</code> option. When execution finishes, all the reports
				253	are printed out, along with, and sorted by, their occurrence counts.
				254	This makes it easy to see which errors have occurred most frequently.
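<p>The duplicate handling described above can be modelled in a few lines.
This is only a sketch, not Valgrind's implementation: the names
<code>error_key</code> and <code>record</code> are hypothetical. It
assumes that an error is identified by its kind plus the top three
call-stack entries (the description of <code>--num-callers</code> in
section 2.5 says errors are commoned up on the top three locations), and
that the final summary lists the most frequent error first.

```python
from collections import Counter


def error_key(kind, stack, depth=3):
    """Identify an error by its kind and its top `depth` stack entries."""
    return (kind, tuple(stack[:depth]))


def record(counts, kind, stack):
    """Count the error; return True only the first time it is seen."""
    key = error_key(kind, stack)
    counts[key] += 1
    return counts[key] == 1


counts = Counter()
stack = [
    "BandMatrix::ReSize(int, int, int) (bogon.cpp:45)",
    "main (bogon.cpp:66)",
    "__libc_start_main (libc-start.c:129)",
]

# The same error three times, then a different error from the same place.
shown = [record(counts, "Invalid read of size 4", stack) for _ in range(3)]
shown.append(record(counts, "Invalid write of size 1", stack))

# (key, count) pairs, highest count first, as a -v style summary might use.
summary = counts.most_common()
```

<p>Only the first occurrence of each distinct error returns
<code>True</code> (that is, would be printed); later occurrences are
merely counted.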
				255
				256	<p>Errors are reported before the associated operation actually
				257	happens. For example, if your program decides to read from address
				258	zero, Valgrind will emit a message to this effect, and the program
				259	will then duly die with a segmentation fault.
				260
				261	<p>In general, you should try to fix errors in the order that they
				262	are reported. Not doing so can be confusing. For example, a program
				263	which copies uninitialised values to several memory locations, and
				264	later uses them, will generate several error messages. The first such
				265	error message may well give the most direct clue to the root cause of
				266	the problem.
				267
				268	<a name="suppress"></a>
				269	<h3>2.4  Suppressing errors</h3>
				270
				271	Valgrind detects numerous problems in the base libraries, such as the
				272	GNU C library, and the XFree86 client libraries, which come
				273	pre-installed on your GNU/Linux system. You can't easily fix these,
				274	but you don't want to see these errors (and yes, there are many!). So
				275	Valgrind reads a list of errors to suppress at startup. By default
				276	this file is <code>redhat72.supp</code>, located in the Valgrind
				277	installation directory.
				278
				279	<p>You can modify and add to the suppressions file at your leisure, or
				280	write your own. Multiple suppression files are allowed. This is
				281	useful if part of your project contains errors you can't or don't want
				282	to fix, yet you don't want to continuously be reminded of them.
				283
				284	<p>Each error to be suppressed is described very specifically, to
				285	minimise the possibility that a suppression-directive inadvertently
				286	suppresses a bunch of similar errors which you did want to see. The
				287	suppression mechanism is designed to allow precise yet flexible
				288	specification of errors to suppress.
				289
				290	<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
				291	prints out one line for each used suppression, giving its name and the
				292	number of times it got used. Here's the suppressions used by a run of
				293	<code>ls -l</code>:
				294	<pre>
				295	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
				296	--27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
				297	--27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
				298	</pre>
				299
				300	<a name="flags"></a>
				301	<h3>2.5  Command-line flags</h3>
				302
				303	You invoke Valgrind like this:
				304	<pre>
				305	valgrind [options-for-Valgrind] your-prog [options for your-prog]
				306	</pre>
				307
				308	<p>Valgrind's default settings succeed in giving reasonable behaviour
				309	in most cases. Available options, in no particular order, are as
				310	follows:
				311	<ul>
				312	<li><code>--help</code></li><br>
				313
				314	<li><code>--version</code><br>
				315	<p>The usual deal.</li><br><p>
				316
				317	<li><code>-v --verbose</code><br>
				318	<p>Be more verbose. Gives extra information on various aspects
				319	of your program, such as: the shared objects loaded, the
				320	suppressions used, the progress of the instrumentation engine,
				321	and warnings about unusual behaviour.
				322	</li><br><p>
				323
				324	<li><code>-q --quiet</code><br>
				325	<p>Run silently, and only print error messages. Useful if you
				326	are running regression tests or have some other automated test
				327	machinery.
				328	</li><br><p>
				329
				330	<li><code>--demangle=no</code><br>
				331	<code>--demangle=yes</code> [the default]
				332	<p>Disable/enable automatic demangling (decoding) of C++ names.
				333	Enabled by default. When enabled, Valgrind will attempt to
				334	translate encoded C++ procedure names back to something
				335	approaching the original. The demangler handles symbols mangled
				336	by g++ versions 2.X and 3.X.
				337
				338	<p>An important fact about demangling is that function
				339	names mentioned in suppressions files should be in their mangled
				340	form. Valgrind does not demangle function names when searching
				341	for applicable suppressions, because to do otherwise would make
				342	suppressions file contents dependent on the state of Valgrind's
				343	demangling machinery, and would also be slow and pointless.
				344	</li><br><p>
				345
				346	<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
				347	<p>By default, Valgrind shows four levels of function call names
				348	to help you identify program locations. You can change that
				349	number with this option. This can help in determining the
				350	program's location in deeply-nested call chains. Note that errors
				351	are commoned up using only the top three function locations (the
				352	place in the current function, and that of its two immediate
				353	callers). So this doesn't affect the total number of errors
				354	reported.
				355	<p>
				356	The maximum value for this is 50. Note that higher settings
				357	will make Valgrind run a bit more slowly and take a bit more
				358	memory, but can be useful when working with programs with
				359	deeply-nested call chains.
				360	</li><br><p>
				361
				362	<li><code>--gdb-attach=no</code> [the default]<br>
				363	<code>--gdb-attach=yes</code>
				364	<p>When enabled, Valgrind will pause after every error shown,
				365	and print the line
				366	<br>
				367	<code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
				368	<p>
				369	Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
				370	or <code>n</code> <code>Ret</code>, causes Valgrind not to
				371	start GDB for this error.
				372	<p>
				373	<code>Y</code> <code>Ret</code>
				374	or <code>y</code> <code>Ret</code> causes Valgrind to
				375	start GDB, for the program at this point. When you have
				376	finished with GDB, quit from it, and the program will continue.
				377	Trying to continue from inside GDB doesn't work.
				378	<p>
				379	<code>C</code> <code>Ret</code>
				380	or <code>c</code> <code>Ret</code> causes Valgrind not to
				381	start GDB, and not to ask again.
				382	<p>
				383	<code>--gdb-attach=yes</code> conflicts with
				384	<code>--trace-children=yes</code>. You can't use them
				385	together. Valgrind refuses to start up in this situation.
				386	</li><br><p>
				387
				388	<li><code>--partial-loads-ok=yes</code> [the default]<br>
				389	<code>--partial-loads-ok=no</code>
				390	<p>Controls how Valgrind handles word (4-byte) loads from
				391	addresses for which some bytes are addressable and others
				392	are not. When <code>yes</code> (the default), such loads
				393	do not elicit an address error. Instead, the loaded V bytes
				394	corresponding to the illegal addresses indicate undefined, and
				395	those corresponding to legal addresses are loaded from shadow
				396	memory, as usual.
				397	<p>
				398	When <code>no</code>, loads from partially
				399	invalid addresses are treated the same as loads from completely
				400	invalid addresses: an illegal-address error is issued,
				401	and the resulting V bytes indicate valid data.
				402	</li><br><p>
				403
				404	<li><code>--sloppy-malloc=no</code> [the default]<br>
				405	<code>--sloppy-malloc=yes</code>
				406	<p>When enabled, all requests for malloc/calloc are rounded up
				407	to a whole number of machine words -- in other words, made
				408	divisible by 4. For example, a request for 17 bytes of space
				409	would result in a 20-byte area being made available. This works
				410	around bugs in sloppy libraries which assume that they can
				411	safely rely on malloc/calloc requests being rounded up in this
				412	fashion. Without the workaround, these libraries tend to
				413	generate large numbers of errors when they access the ends of
				414	these areas. Valgrind snapshots dated 17 Feb 2002 and later are
				415	cleverer about this problem, and you should no longer need to
				416	use this flag.
				417	</li><br><p>
				418
				419	<li><code>--trace-children=no</code> [the default]<br>
				420	<code>--trace-children=yes</code>
				421	<p>When enabled, Valgrind will trace into child processes. This
				422	is confusing and usually not what you want, so is disabled by
				423	default.</li><br><p>
				424
				425	<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
				426	<p>When the client program releases memory using free (in C) or
				427	delete (C++), that memory is not immediately made available for
				428	re-allocation. Instead it is marked inaccessible and placed in
				429	a queue of freed blocks. The purpose is to delay the point at
				430	which freed-up memory comes back into circulation. This
				431	increases the chance that Valgrind will be able to detect
				432	invalid accesses to blocks for some significant period of time
				433	after they have been freed.
				434	<p>
				435	This flag specifies the maximum total size, in bytes, of the
				436	blocks in the queue. The default value is one million bytes.
				437	Increasing this increases the total amount of memory used by
				438	Valgrind but may detect invalid uses of freed blocks which would
				439	otherwise go undetected.</li><br><p>
				440
				441	<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
				442	<p>Specifies the file descriptor on which Valgrind communicates
				443	all of its messages. The default, 2, is the standard error
				444	channel. This may interfere with the client's own use of
				445	stderr. To dump Valgrind's commentary in a file without using
				446	stderr, something like the following works well (sh/bash
				447	syntax):<br>
				448	<code>
				449	valgrind --logfile-fd=9 my_prog 9> logfile</code><br>
				450	That is: tell Valgrind to send all output to file descriptor 9,
				451	and ask the shell to route file descriptor 9 to "logfile".
				452	</li><br><p>
				453
				454	<li><code>--suppressions=&lt;filename&gt;</code> [default:
				455	/installation/directory/redhat72.supp] <p>Specifies an extra
				456	file from which to read descriptions of errors to suppress. You
				457	may use as many extra suppressions files as you
				458	like.</li><br><p>
				459
				460	<li><code>--leak-check=no</code> [default]<br>
				461	<code>--leak-check=yes</code>
				462	<p>When enabled, search for memory leaks when the client program
				463	finishes. A memory leak means a malloc'd block, which has not
				464	yet been free'd, but to which no pointer can be found. Such a
				465	block can never be free'd by the program, since no pointer to it
				466	exists. Leak checking is disabled by default
				467	because it tends to generate dozens of error messages.
				468	</li><br><p>
				469
				470	<li><code>--show-reachable=no</code> [default]<br>
				471	<code>--show-reachable=yes</code> <p>When disabled, the memory
				472	leak detector only shows blocks to which it cannot find a
				473	pointer at all, or to which it can only find a pointer into the
				474	middle. These blocks are prime candidates for memory leaks. When
				475	enabled, the leak detector also reports on blocks which it could
				476	find a pointer to. Your program could, at least in principle,
				477	have freed such blocks before exit. Contrast this to blocks for
				478	which no pointer, or only an interior pointer could be found:
				479	they are more likely to indicate memory leaks, because
				480	you do not actually have a pointer to the start of the block
				481	which you can hand to free(), even if you wanted to.
				482	</li><br><p>
				483
				484	<li><code>--leak-resolution=low</code> [default]<br>
				485	<code>--leak-resolution=med</code> <br>
				486	<code>--leak-resolution=high</code>
				487	<p>When doing leak checking, determines how willing Valgrind is
				488	to consider different backtraces the same. When set to
				489	<code>low</code>, the default, only the first two entries need
				490	match. When <code>med</code>, four entries have to match. When
				491	<code>high</code>, all entries need to match.
				492	<p>
				493	For hardcore leak debugging, you probably want to use
				494	<code>--leak-resolution=high</code> together with
				495	<code>--num-callers=40</code> or some such large number. Note
				496	however that this can give an overwhelming amount of
				497	information, which is why the defaults are 4 callers and
				498	low-resolution matching.
				499	<p>
				500	Note that the <code>--leak-resolution=</code> setting does not
				501	affect Valgrind's ability to find leaks. It only changes how
				502	the results are presented to you.
				503	</li><br><p>
				504
				505	<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
				506	<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
				507	assume that reads and writes some small distance below the stack
				508	pointer <code>%esp</code> are due to bugs in gcc 2.96, and do
				509	not report them. The "small distance" is 256 bytes by default.
				510	Note that gcc 2.96 is the default compiler on some popular Linux
				511	distributions (RedHat 7.X, Mandrake) and so you may well need to
				512	use this flag. Do not use it if you do not have to, as it can
				513	cause real errors to be overlooked. A better option is to use a
				514	gcc/g++ which works properly; 2.95.3 seems to be a good choice.
				515	<p>
				516	Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
				517	buggy, so you may need to issue this flag if you use 3.0.4.
				518	</li><br><p>
				519
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame^]	520	<li><code>--cachesim=no</code> [default]<br>
				521	<code>--cachesim=yes</code>
				522	<p>When enabled, turns off memory checking, and turns on cache profiling.
				523	Cache profiling is described in detail in <a href="#cache">Section 7</a>.
				524	</li><p>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	525	</ul>
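
<p>The queue of freed blocks described under <code>--freelist-vol</code>
above can be pictured with a small model. This is a sketch under stated
assumptions, not Valgrind's code: the class <code>FreedQueue</code> and
its methods are hypothetical, and it assumes that the oldest blocks are
released first once the total size exceeds the limit.

```python
from collections import deque


class FreedQueue:
    """FIFO queue of freed blocks, bounded by their total size in bytes."""

    def __init__(self, max_bytes=1000000):
        self.max_bytes = max_bytes
        self.blocks = deque()  # (address, size) pairs, oldest first
        self.total = 0         # sum of the sizes currently queued

    def free(self, address, size):
        """Queue a freed block; return the blocks released for reuse."""
        self.blocks.append((address, size))
        self.total += size
        released = []
        # Assumed policy: evict oldest blocks until back under the limit.
        while self.total > self.max_bytes:
            old = self.blocks.popleft()
            self.total -= old[1]
            released.append(old)
        return released

    def is_quarantined(self, address):
        """True if the address lies inside a block still in the queue."""
        return any(start <= address < start + size
                   for start, size in self.blocks)


queue = FreedQueue(max_bytes=100)
first = queue.free(0x1000, 60)   # fits: nothing is released
second = queue.free(0x2000, 60)  # 120 > 100: the oldest block is released
```

<p>While a block is still queued, an access to it can be recognised as a
use of freed memory. A larger limit keeps blocks queued for longer, at
the cost of more memory.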
				526
				527	There are also some options for debugging Valgrind itself. You
				528	shouldn't need to use them in the normal run of things. Nevertheless:
				529
				530	<ul>
				531
				532	<li><code>--single-step=no</code> [default]<br>
				533	<code>--single-step=yes</code>
				534	<p>When enabled, each x86 instruction is translated separately into
				535	instrumented code. When disabled, translation is done on a
				536	per-basic-block basis, giving much better translations.</li><br>
				537	<p>
				538
				539	<li><code>--optimise=no</code><br>
				540	<code>--optimise=yes</code> [default]
				541	<p>When enabled, various improvements are applied to the
				542	intermediate code, mainly aimed at allowing the simulated CPU's
				543	registers to be cached in the real CPU's registers over several
				544	simulated instructions.</li><br>
				545	<p>
				546
				547	<li><code>--instrument=no</code><br>
				548	<code>--instrument=yes</code> [default]
				549	<p>When disabled, the translations don't actually contain any
				550	instrumentation.</li><br>
				551	<p>
				552
				553	<li><code>--cleanup=no</code><br>
				554	<code>--cleanup=yes</code> [default]
				555	<p>When enabled, various improvements are applied to the
				556	post-instrumented intermediate code, aimed at removing redundant
				557	value checks.</li><br>
				558	<p>
				559
				560	<li><code>--trace-syscalls=no</code> [default]<br>
				561	<code>--trace-syscalls=yes</code>
				562	<p>Enable/disable tracing of system call intercepts.</li><br>
				563	<p>
				564
				565	<li><code>--trace-signals=no</code> [default]<br>
				566	<code>--trace-signals=yes</code>
				567	<p>Enable/disable tracing of signal handling.</li><br>
				568	<p>
				569
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	570	<li><code>--trace-sched=no</code> [default]<br>
				571	<code>--trace-sched=yes</code>
				572	<p>Enable/disable tracing of thread scheduling events.</li><br>
				573	<p>
				574
sewardj	45b4b37	2002-04-16 22:50:32 +0000	[diff] [blame]	575	<li><code>--trace-pthread=none</code> [default]<br>
				576	<code>--trace-pthread=some</code> <br>
				577	<code>--trace-pthread=all</code>
				578	<p>Specifies amount of trace detail for pthread-related events.</li><br>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	579	<p>
				580
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	581	<li><code>--trace-symtab=no</code> [default]<br>
				582	<code>--trace-symtab=yes</code>
				583	<p>Enable/disable tracing of symbol table reading.</li><br>
				584	<p>
				585
				586	<li><code>--trace-malloc=no</code> [default]<br>
				587	<code>--trace-malloc=yes</code>
				588	<p>Enable/disable tracing of malloc/free (et al) intercepts.
				589	</li><br>
				590	<p>
				591
				592	<li><code>--stop-after=&lt;number&gt;</code>
				593	[default: infinity, more or less]
				594	<p>After &lt;number&gt; basic blocks have been executed, shut down
				595	Valgrind and switch back to running the client on the real CPU.
				596	</li><br>
				597	<p>
				598
				599	<li><code>--dump-error=&lt;number&gt;</code>
				600	[default: inactive]
				601	<p>After the program has exited, show gory details of the
				602	translation of the basic block containing the &lt;number&gt;'th
				603	error context. When used with <code>--single-step=yes</code>,
				604	can show the
				605	exact x86 instruction causing an error.</li><br>
				606	<p>
				607
				608	<li><code>--smc-check=none</code><br>
				609	<code>--smc-check=some</code> [default]<br>
				610	<code>--smc-check=all</code>
				611	<p>How carefully should Valgrind check for self-modifying code
				612	writes, so that translations can be discarded?  When
				613	"none", no writes are checked. When "some", only writes
				614	resulting from moves from integer registers to memory are
				615	checked. When "all", all memory writes are checked, even those
				616	with which no sane program would generate code -- for
				617	example, floating-point writes.</li>
				618	</ul>
				619
				620
				621	<a name="errormsgs"></a>
				622	<h3>2.6  Explanation of error messages</h3>
				623
				624	Despite considerable sophistication under the hood, Valgrind can only
				625	really detect two kinds of errors: use of illegal addresses, and use
				626	of undefined values. Nevertheless, this is enough to help you
				627	discover all sorts of memory-management nasties in your code. This
				628	section presents a quick summary of what error messages mean. The
				629	precise behaviour of the error-checking machinery is described in
				630	<a href="#machine">Section 3</a>.
				631
				632
				633	<h4>2.6.1  Illegal read / Illegal write errors</h4>
				634	For example:
				635	<pre>
				636	==30975== Invalid read of size 4
				637	==30975== at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
				638	==30975== by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
				639	==30975== by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
				640	==30975== by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
				641	==30975== Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
				642	</pre>
				643
				644	<p>This happens when your program reads or writes memory at a place
				645	which Valgrind reckons it shouldn't. In this example, the program did
				646	a 4-byte read at address 0xBFFFF0E0, somewhere within the
				647	system-supplied library libpng.so.2.1.0.9, which was called from
				648	somewhere else in the same library, called from line 326 of
				649	qpngio.cpp, and so on.
				650
<p>Valgrind tries to establish what the illegal address might relate
to, since that's often useful. So, if it points into a block of
memory which has already been freed, you'll be informed of this, and
also of where the block was freed. Likewise, if it should turn out
to be just off the end of a malloc'd block, a common result of
off-by-one errors in array subscripting, you'll be informed of this
fact, and also of where the block was malloc'd.
				658
				659	<p>In this example, Valgrind can't identify the address. Actually the
				660	address is on the stack, but, for some reason, this is not a valid
				661	stack address -- it is below the stack pointer, %esp, and that isn't
				662	allowed.
				663
<p>Note that Valgrind only tells you that your program is about to
access memory at an illegal address. It can't stop the access from
happening. So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
fate -- but you will get a message from Valgrind immediately prior to
this. In this particular example, reading junk on the stack is
non-fatal, and the program stays alive.
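<p>As a hedged illustration (not taken from the Valgrind sources or
test suite), the sketch below shows the kind of off-by-one subscripting
bug that typically leads to an invalid read just past the end of a
malloc'd block, next to a corrected version. Both function names are
hypothetical.
<pre>
#include <stddef.h>

/* Buggy: the loop condition should be i < n. When a points to a
   malloc'd block of n ints, the final iteration reads a[n], which
   lies just past the end of the block. */
int sum_off_by_one(const int *a, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i <= n; i++)
        total += a[i];
    return total;
}

/* Fixed: reads exactly the n elements that belong to the block. */
int sum_exact(const int *a, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
</pre>

<p>A program that calls <code>sum_off_by_one</code> on a heap block
should, when run on Valgrind, be told about the invalid read and about
where the block was malloc'd; <code>sum_exact</code> should run
without complaint.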
				671
				672
				673	<h4>2.6.2  Use of uninitialised values</h4>
				674	For example:
				675	<pre>
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	676	==19146== Conditional jump or move depends on uninitialised value(s)
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	677	==19146== at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
				678	==19146== by 0x402E8476: _IO_printf (printf.c:36)
				679	==19146== by 0x8048472: main (tests/manuel1.c:8)
				680	==19146== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				681	</pre>
				682
				683	<p>An uninitialised-value use error is reported when your program uses
				684	a value which hasn't been initialised -- in other words, is undefined.
				685	Here, the undefined value is used somewhere inside the printf()
				686	machinery of the C library. This error was reported when running the
				687	following small program:
				688	<pre>
#include <stdio.h>

int main(void)
{
  int x;                  /* never initialised */
  printf("x = %d\n", x);  /* x is examined inside the printf machinery */
  return 0;
}
				694	</pre>
				695
				696	<p>It is important to understand that your program can copy around
				697	junk (uninitialised) data to its heart's content. Valgrind observes
				698	this and keeps track of the data, but does not complain. A complaint
				699	is issued only when your program attempts to make use of uninitialised
				700	data. In this example, x is uninitialised. Valgrind observes the
				701	value being passed to _IO_printf and thence to
				702	_IO_vfprintf, but makes no comment. However,
				703	_IO_vfprintf has to examine the value of x
				704	so it can turn it into the corresponding ASCII string, and it is at
				705	this point that Valgrind complains.
				706
				707	<p>Sources of uninitialised data tend to be:
				708	<ul>
				709	<li>Local variables in procedures which have not been initialised,
				710	as in the example above.</li><br><p>
				711
				712	<li>The contents of malloc'd blocks, before you write something
				713	there. In C++, the new operator is a wrapper round malloc, so
				714	if you create an object with new, its fields will be
				715	uninitialised until you fill them in, which is only Right and
				716	Proper.</li>
				717	</ul>
				718
				719
				720
				721	<h4>2.6.3  Illegal frees</h4>
				722	For example:
				723	<pre>
				724	==7593== Invalid free()
				725	==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577)
				726	==7593== by 0x80484C7: main (tests/doublefree.c:10)
				727	==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				728	==7593== by 0x80483B1: (within tests/doublefree)
				729	==7593== Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
				730	==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577)
				731	==7593== by 0x80484C7: main (tests/doublefree.c:10)
				732	==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				733	==7593== by 0x80483B1: (within tests/doublefree)
				734	</pre>
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether or not the argument to
free/delete is legitimate. Here, this test program has
freed the same block twice. As with the illegal read/write errors,
Valgrind attempts to make sense of the address freed. If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
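<p>The source of <code>tests/doublefree.c</code> is not reproduced
here. As a sketch only, one common way to make a duplicate free
harmless is to release blocks through a helper that clears the
caller's pointer afterwards. The helper names below are hypothetical.
<pre>
#include <stdlib.h>

/* Frees *pp and then clears it. free(NULL) is defined to do nothing,
   so calling this twice on the same variable cannot free the block
   twice. */
void release(char **pp)
{
    free(*pp);
    *pp = NULL;
}

/* Allocates a block, releases it twice, and reports whether the
   pointer ended up NULL. Returns 1 on success, 0 otherwise. */
int release_twice_is_safe(size_t size)
{
    char *block = malloc(size);
    if (block == NULL)
        return 0;
    release(&block);
    release(&block);   /* second call passes NULL to free() */
    return block == NULL;
}
</pre>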
				742
				743
				744	<h4>2.6.4  Passing system call parameters with inadequate
				745	read/write permissions</h4>
				746
Valgrind checks all parameters to system calls. If a system call
needs to read from a buffer provided by your program, Valgrind checks
that the entire buffer is addressable and has valid data, i.e., it is
readable. And if the system call needs to write to a user-supplied
buffer, Valgrind checks that the buffer is addressable. After the
system call, Valgrind updates its administrative information to
precisely reflect any changes in memory permissions caused by the
system call.
				755
				756	<p>Here's an example of a system call with an invalid parameter:
				757	<pre>
				758	#include <stdlib.h>
				759	#include <unistd.h>
				760	int main( void )
				761	{
				762	char* arr = malloc(10);
				763	(void) write( 1 /* stdout */, arr, 10 );
				764	return 0;
				765	}
				766	</pre>
				767
				768	<p>You get this complaint ...
				769	<pre>
				770	==8230== Syscall param write(buf) lacks read permissions
				771	==8230== at 0x4035E072: __libc_write
				772	==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				773	==8230== by 0x80483B1: (within tests/badwrite)
				774	==8230== by <bogus frame pointer> ???
				775	==8230== Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
				776	==8230== at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
				777	==8230== by 0x80484A0: main (tests/badwrite.c:6)
				778	==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
				779	==8230== by 0x80483B1: (within tests/badwrite)
				780	</pre>
				781
				782	<p>... because the program has tried to write uninitialised junk from
				783	the malloc'd block to the standard output.
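<p>One way to avoid this particular complaint, sketched under the
assumption that the program really does intend to write the whole
block, is to give every byte a defined value before the call. The
helper name <code>write_filled</code> is hypothetical.
<pre>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocates len bytes, fills them with fill so that every byte is
   defined, and writes them to fd. Returns the result of write(),
   or -1 if the allocation fails. */
ssize_t write_filled(int fd, char fill, size_t len)
{
    ssize_t written;
    char *arr = malloc(len);
    if (arr == NULL)
        return -1;
    memset(arr, fill, len);          /* the block is now fully initialised */
    written = write(fd, arr, len);
    free(arr);
    return written;
}
</pre>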
				784
				785
				786	<h4>2.6.5  Warning messages you might see</h4>
				787
				788	Most of these only appear if you run in verbose mode (enabled by
				789	<code>-v</code>):
				790	<ul>
				791	<li> <code>More than 50 errors detected. Subsequent errors
				792	will still be recorded, but in less detail than before.</code>
				793	<br>
				794	After 50 different errors have been shown, Valgrind becomes
				795	more conservative about collecting them. It then requires only
				796	the program counters in the top two stack frames to match when
				797	deciding whether or not two errors are really the same one.
				798	Prior to this point, the PCs in the top four frames are required
				799	to match. This hack has the effect of slowing down the
				800	appearance of new errors after the first 50. The 50 constant can
				801	be changed by recompiling Valgrind.
				802	<p>
				803	<li> <code>More than 500 errors detected. I'm not reporting any more.
				804	Final error counts may be inaccurate. Go fix your
				805	program!</code>
				806	<br>
				807	After 500 different errors have been detected, Valgrind ignores
				808	any more. It seems unlikely that collecting even more different
				809	ones would be of practical help to anybody, and it avoids the
				810	danger that Valgrind spends more and more of its time comparing
				811	new errors against an ever-growing collection. As above, the 500
				812	number is a compile-time constant.
				813	<p>
				814	<li> <code>Warning: client exiting by calling exit(<number>).
				815	Bye!</code>
				816	<br>
				817	Your program has called the <code>exit</code> system call, which
				818	will immediately terminate the process. You'll get no exit-time
				819	error summaries or leak checks. Note that this is not the same
				820	as your program calling the ANSI C function <code>exit()</code>
				821	-- that causes a normal, controlled shutdown of Valgrind.
				822	<p>
				823	<li> <code>Warning: client switching stacks?</code>
				824	<br>
				825	Valgrind spotted such a large change in the stack pointer, %esp,
				826	that it guesses the client is switching to a different stack.
				827	At this point it makes a kludgey guess where the base of the new
				828	stack is, and sets memory permissions accordingly. You may get
				829	many bogus error messages following this, if Valgrind guesses
wrong. At the moment "large change" is defined as a change of
more than 2000000 in the value of the %esp (stack pointer)
register.
				833	<p>
				834	<li> <code>Warning: client attempted to close Valgrind's logfile fd <number>
				835	</code>
				836	<br>
				837	Valgrind doesn't allow the client
				838	to close the logfile, because you'd never see any diagnostic
				839	information after that point. If you see this message,
				840	you may want to use the <code>--logfile-fd=<number></code>
				841	option to specify a different logfile file-descriptor number.
				842	<p>
				843	<li> <code>Warning: noted but unhandled ioctl <number></code>
				844	<br>
				845	Valgrind observed a call to one of the vast family of
				846	<code>ioctl</code> system calls, but did not modify its
				847	memory status info (because I have not yet got round to it).
				848	The call will still have gone through, but you may get spurious
				849	errors after this as a result of the non-update of the memory info.
				850	<p>
				851	<li> <code>Warning: unblocking signal <number> due to
				852	sigprocmask</code>
				853	<br>
				854	Really just a diagnostic from the signal simulation machinery.
				855	This message will appear if your program handles a signal by
				856	first <code>longjmp</code>ing out of the signal handler,
				857	and then unblocking the signal with <code>sigprocmask</code>
				858	-- a standard signal-handling idiom.
				859	<p>
				860	<li> <code>Warning: bad signal number <number> in __NR_sigaction.</code>
				861	<br>
				862	Probably indicates a bug in the signal simulation machinery.
				863	<p>
				864	<li> <code>Warning: set address range perms: large range <number></code>
				865	<br>
				866	Diagnostic message, mostly for my benefit, to do with memory
				867	permissions.
				868	</ul>
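<p>The duplicate-detection rule described under the "More than 50
errors" message can be sketched as follows. This is only an
illustration: the type and function names are hypothetical, the
thresholds are the ones quoted in the text, and the real comparison
inside Valgrind may take more than program counters into account.
<pre>
#include <stddef.h>

#define N_FRAMES       4   /* frames recorded per error in this sketch */
#define STRICT_LIMIT  50   /* after this many errors, compare less strictly */

typedef struct {
    unsigned long pc[N_FRAMES];   /* pc[0] is the innermost frame */
} ErrorContext;

/* Returns 1 if the two contexts count as the same error. Until
   STRICT_LIMIT errors have been shown, the top four PCs must match;
   after that, only the top two. */
int same_error(const ErrorContext *a, const ErrorContext *b,
               int n_errors_shown)
{
    size_t depth = (n_errors_shown < STRICT_LIMIT) ? N_FRAMES : 2;
    size_t i;
    for (i = 0; i < depth; i++) {
        if (a->pc[i] != b->pc[i])
            return 0;
    }
    return 1;
}
</pre>

<p>Comparing fewer frames makes more errors look alike, which is why
new errors appear more slowly after the first 50.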
				869
				870
				871	<a name="suppfiles"></a>
				872	<h3>2.7  Writing suppressions files</h3>
				873
A suppression file describes a bunch of errors which, for one reason
or another, you don't want Valgrind to tell you about. Usually the
reason is that the system libraries are buggy but unfixable, at least
within the scope of the current debugging session. Multiple
suppressions files are allowed. By default, Valgrind uses
<code>linux24.supp</code> in the directory where it is installed.
				880
				881	<p>
				882	You can ask to add suppressions from another file, by specifying
				883	<code>--suppressions=/path/to/file.supp</code>.
				884
				885	<p>Each suppression has the following components:<br>
				886	<ul>
				887
				888	<li>Its name. This merely gives a handy name to the suppression, by
				889	which it is referred to in the summary of used suppressions
				890	printed out when a program finishes. It's not important what
				891	the name is; any identifying string will do.
				892	<p>
				893
				894	<li>The nature of the error to suppress. Either:
				895	<code>Value1</code>,
				896	<code>Value2</code>,
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	897	<code>Value4</code> or
				898	<code>Value8</code>,
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	899	meaning an uninitialised-value error when
sewardj	a7dc795	2002-03-24 11:29:13 +0000	[diff] [blame]	900	using a value of 1, 2, 4 or 8 bytes.
				901	Or
				902	<code>Cond</code> (or its old name, <code>Value0</code>),
				903	meaning use of an uninitialised CPU condition code. Or:
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	904	<code>Addr1</code>,
				905	<code>Addr2</code>,
				906	<code>Addr4</code> or
				907	<code>Addr8</code>, meaning an invalid address during a
				908	memory access of 1, 2, 4 or 8 bytes respectively. Or
				909	<code>Param</code>,
				910	meaning an invalid system call parameter error. Or
				911	<code>Free</code>, meaning an invalid or mismatching free.</li><br>
				912	<p>
				913
<li>The "immediate location" specification. For Value and Addr
errors, this is either the name of the function in which the error
occurred, or, failing that, the full path to the .so file
containing the error location. For Param errors, it is the name of
the offending system call parameter. For Free errors, it is the
name of the function doing the freeing (e.g., <code>free</code>,
<code>__builtin_vec_delete</code>, etc.).</li><br>
				921	<p>
				922
				923	<li>The caller of the above "immediate location". Again, either a
				924	function or shared-object name.</li><br>
				925	<p>
				926
				927	<li>Optionally, one or two extra calling-function or object names,
				928	for greater precision.</li>
				929	</ul>
				930
				931	<p>
				932	Locations may be either names of shared objects or wildcards matching
				933	function names. They begin <code>obj:</code> and <code>fun:</code>
				934	respectively. Function and object names to match against may use the
				935	wildcard characters <code>*</code> and <code>?</code>.
				936
				937	A suppression only suppresses an error when the error matches all the
				938	details in the suppression. Here's an example:
				939	<pre>
				940	{
				941	__gconv_transform_ascii_internal/__mbrtowc/mbtowc
				942	Value4
				943	fun:__gconv_transform_ascii_internal
				944	fun:__mbr*toc
				945	fun:mbtowc
				946	}
				947	</pre>
				948
<p>What this means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function of name matching <code>__mbr*toc</code>,
when that is called from
<code>mbtowc</code>. It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
<code>__gconv_transform_ascii_internal/__mbrtowc/mbtowc</code>.
				957
				958	<p>Another example:
				959	<pre>
				960	{
				961	libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
				962	Value4
				963	obj:/usr/X11R6/lib/libX11.so.6.2
				964	obj:/usr/X11R6/lib/libX11.so.6.2
				965	obj:/usr/X11R6/lib/libXaw.so.7.0
				966	}
				967	</pre>
				968
				969	<p>Suppress any size 4 uninitialised-value error which occurs anywhere
				970	in <code>libX11.so.6.2</code>, when called from anywhere in the same
				971	library, when called from anywhere in <code>libXaw.so.7.0</code>. The
				972	inexact specification of locations is regrettable, but is about all
				973	you can hope for, given that the X11 libraries shipped with Red Hat
				974	7.2 have had their symbol tables removed.
				975
				976	<p>Note -- since the above two examples did not make it clear -- that
				977	you can freely mix the <code>obj:</code> and <code>fun:</code>
				978	styles of description within a single suppression record.
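<p>To make the matching rules concrete, here is a minimal sketch of how
location lines could be compared against a call chain. It is not
Valgrind's actual matcher: the <code>Frame</code> type and all function
names are hypothetical, and the error kind (<code>Value4</code> and so
on) is left out so that only the <code>fun:</code>/<code>obj:</code>
and wildcard handling is shown.
<pre>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name matches pat, where '*' in pat matches any run of
   characters (including none) and '?' matches exactly one character. */
int wild_match(const char *pat, const char *name)
{
    if (*pat == '\0')
        return *name == '\0';
    if (*pat == '*')
        return wild_match(pat + 1, name) ||
               (*name != '\0' && wild_match(pat, name + 1));
    if (*name != '\0' && (*pat == '?' || *pat == *name))
        return wild_match(pat + 1, name + 1);
    return 0;
}

/* A stack frame as this sketch sees it: the function name (NULL when
   there is no symbol) and the path of the containing object. */
typedef struct {
    const char *fun;
    const char *obj;
} Frame;

/* Matches one "fun:..." or "obj:..." location line against a frame. */
int location_matches(const char *loc, const Frame *f)
{
    if (strncmp(loc, "fun:", 4) == 0)
        return f->fun != NULL && wild_match(loc + 4, f->fun);
    if (strncmp(loc, "obj:", 4) == 0)
        return f->obj != NULL && wild_match(loc + 4, f->obj);
    return 0;
}

/* A suppression applies only if every location line matches the
   corresponding frame, innermost first. */
int suppression_matches(const char *const locs[], size_t n_locs,
                        const Frame frames[], size_t n_frames)
{
    size_t i;
    if (n_frames < n_locs)
        return 0;
    for (i = 0; i < n_locs; i++) {
        if (!location_matches(locs[i], &frames[i]))
            return 0;
    }
    return 1;
}
</pre>

<p>Because each line is dispatched on its own prefix, mixing
<code>obj:</code> and <code>fun:</code> lines in one record needs no
special handling in this scheme.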
				979
				980
				981	<a name="install"></a>
				982	<h3>2.8  Building and installing</h3>
				983	At the moment, very rudimentary.
				984
				985	<p>The tarball is set up for a standard Red Hat 7.1 (6.2) machine. To
				986	build, just do "make". No configure script, no autoconf, no nothing.
				987
				988	<p>The files needed for installation are: valgrind.so, valgring.so,
				989	valgrind, VERSION, redhat72.supp (or redhat62.supp). You can copy
				990	these to any directory you like. However, you then need to edit the
				991	shell script "valgrind". On line 4, set the environment variable
				992	<code>VALGRIND</code> to point to the directory you have copied the
				993	installation into.
				994
				995
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	996	<a name="install"></a>
				997	<h3>2.9  The Client Request mechanism</h3>
				998
				999	Valgrind has a trapdoor mechanism via which the client program can
				1000	pass all manner of requests and queries to Valgrind. Internally, this
				1001	is used extensively to make malloc, free, signals, etc, work, although
				1002	you don't see that.
				1003	<p>
				1004	For your convenience, a subset of these so-called client requests is
				1005	provided to allow you to tell Valgrind facts about the behaviour of
				1006	your program, and conversely to make queries. In particular, your
				1007	program can tell Valgrind about changes in memory range permissions
				1008	that Valgrind would not otherwise know about, and so allows clients to
				1009	get Valgrind to do arbitrary custom checks.
				1010	<p>
				1011	Clients need to include the header file <code>valgrind.h</code> to
				1012	make this work. The macros therein have the magical property that
				1013	they generate code in-line which Valgrind can spot. However, the code
				1014	does nothing when not run on Valgrind, so you are not forced to run
				1015	your program on Valgrind just because you use the macros in this file.
				1016	<p>
				1017	A brief description of the available macros:
				1018	<ul>
				1019	<li><code>VALGRIND_MAKE_NOACCESS</code>,
				1020	<code>VALGRIND_MAKE_WRITABLE</code> and
				1021	<code>VALGRIND_MAKE_READABLE</code>. These mark address
				1022	ranges as completely inaccessible, accessible but containing
				1023	undefined data, and accessible and containing defined data,
				1024	respectively. Subsequent errors may have their faulting
				1025	addresses described in terms of these blocks. Returns a
				1026	"block handle". Returns zero when not run on Valgrind.
				1027	<p>
				1028	<li><code>VALGRIND_DISCARD</code>: At some point you may want
				1029	Valgrind to stop reporting errors in terms of the blocks
				1030	defined by the previous three macros. To do this, the above
				1031	macros return a small-integer "block handle". You can pass
				1032	this block handle to <code>VALGRIND_DISCARD</code>. After
				1033	doing so, Valgrind will no longer be able to relate
				1034	addressing errors to the user-defined block associated with
				1035	the handle. The permissions settings associated with the
				1036	handle remain in place; this just affects how errors are
				1037	reported, not whether they are reported. Returns 1 for an
				1038	invalid handle and 0 for a valid handle (although passing
				1039	invalid handles is harmless). Always returns 0 when not run
				1040	on Valgrind.
				1041	<p>
				1042	<li><code>VALGRIND_CHECK_NOACCESS</code>,
				1043	<code>VALGRIND_CHECK_WRITABLE</code> and
				1044	<code>VALGRIND_CHECK_READABLE</code>: check immediately
				1045	whether or not the given address range has the relevant
				1046	property, and if not, print an error message. Also, for the
				1047	convenience of the client, returns zero if the relevant
				1048	property holds; otherwise, the returned value is the address
				1049	of the first byte for which the property is not true.
				1050	Always returns 0 when not run on Valgrind.
				1051	<p>
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
to find out whether Valgrind thinks a particular variable
(lvalue, to be precise) is addressable and defined. Prints
an error message if not. Returns no value.
				1056	<p>
<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
experimental feature. Similarly to
<code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
range as inaccessible, so that subsequent accesses to an
address in the range give an error. However, this macro
does not return a block handle. Instead, all annotations
created like this are reviewed at each client
<code>ret</code> (subroutine return) instruction, and those
which now define an address range below the client's stack
pointer register (<code>%esp</code>) are automatically
deleted.
				1068	<p>
				1069	In other words, this macro allows the client to tell
				1070	Valgrind about red-zones on its own stack. Valgrind
				1071	automatically discards this information when the stack
				1072	retreats past such blocks. Beware: hacky and flaky, and
				1073	probably interacts badly with the new pthread support.
				1074	</ul>
				1075	</li>
				1076	<p>
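<p>As a rough sketch of how a client might use these requests, consider
a tiny pool allocator that keeps an inaccessible red zone after every
block it hands out. Several things here are assumptions rather than
facts from this manual: the pool functions are hypothetical, the macros
are assumed to take an address and a length in that order, and the
stand-in definitions exist only so that the sketch compiles without
<code>valgrind.h</code>. The stand-ins return 0, which is what the real
macros are described as returning when not run on Valgrind.
<pre>
#include <stddef.h>

/* Stand-ins for illustration only. A real client would delete this
   block and #include "valgrind.h" instead. The (address, length)
   argument order is an assumption. */
#ifndef VALGRIND_MAKE_NOACCESS
#define VALGRIND_MAKE_NOACCESS(addr, len)  ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len)  ((void)(addr), (void)(len), 0)
#endif

#define POOL_SIZE 256
#define RED_ZONE    8

static char   pool[POOL_SIZE];
static size_t pool_used = 0;

/* Empties the pool and tells Valgrind that none of it may be touched. */
void pool_reset(void)
{
    pool_used = 0;
    (void) VALGRIND_MAKE_NOACCESS(pool, POOL_SIZE);
}

/* Hands out n bytes, leaving RED_ZONE inaccessible bytes after the
   block so that overruns are reported. Returns NULL when full. */
void *pool_alloc(size_t n)
{
    size_t available = POOL_SIZE - pool_used;
    char *p;
    if (n > available || available - n < RED_ZONE)
        return NULL;
    p = pool + pool_used;
    pool_used += n + RED_ZONE;
    /* Accessible, but undefined until the client writes to it. */
    (void) VALGRIND_MAKE_WRITABLE(p, n);
    return p;
}
</pre>

<p>The block handles returned by the macros are ignored in this sketch;
a longer-lived allocator would presumably keep them and pass them to
<code>VALGRIND_DISCARD</code> when blocks are recycled.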
				1077
				1078
				1079
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1080	<a name="problems"></a>
sewardj	c7529c3	2002-04-16 01:55:18 +0000	[diff] [blame]	1081	<h3>2.10  If you have problems</h3>
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	1082	Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
				1083
				1084	<p>See <a href="#limits">Section 4</a> for the known limitations of
				1085	Valgrind, and for a list of programs which are known not to work on
				1086	it.
				1087
				1088	<p>The translator/instrumentor has a lot of assertions in it. They
				1089	are permanently enabled, and I have no plans to disable them. If one
				1090	of these breaks, please mail me!
				1091
				1092	<p>If you get an assertion failure on the expression
				1093	<code>chunkSane(ch)</code> in <code>vg_free()</code> in
				1094	<code>vg_malloc.c</code>, this may have happened because your program
				1095	wrote off the end of a malloc'd block, or before its beginning.
				1096	Valgrind should have emitted a proper message to that effect before
				1097	dying in this way. This is a known problem which I should fix.
				1098	<p>
				1099
				1100	<hr width="100%">
				1101
				1102	<a name="machine"></a>
				1103	<h2>3  Details of the checking machinery</h2>
				1104
				1105	Read this section if you want to know, in detail, exactly what and how
				1106	Valgrind is checking.
				1107
				1108	<a name="vvalue"></a>
				1109	<h3>3.1  Valid-value (V) bits</h3>
				1110
				1111	It is simplest to think of Valgrind implementing a synthetic Intel x86
				1112	CPU which is identical to a real CPU, except for one crucial detail.
				1113	Every bit (literally) of data processed, stored and handled by the
				1114	real CPU has, in the synthetic CPU, an associated "valid-value" bit,
				1115	which says whether or not the accompanying bit has a legitimate value.
				1116	In the discussions which follow, this bit is referred to as the V
				1117	(valid-value) bit.
				1118
<p>Each byte in the system therefore has 8 V bits which accompany
it wherever it goes. For example, when the CPU loads a word-size item
(4 bytes) from memory, it also loads the corresponding 32 V bits from
a bitmap which stores the V bits for the process' entire address
space. If the CPU should later write the whole or some part of that
value to memory at a different address, the relevant V bits will be
stored back in the V-bit bitmap.
				1126
				1127	<p>In short, each bit in the system has an associated V bit, which
				1128	follows it around everywhere, even inside the CPU. Yes, the CPU's
				1129	(integer) registers have their own V bit vectors.
				1130
				1131	<p>Copying values around does not cause Valgrind to check for, or
				1132	report on, errors. However, when a value is used in a way which might
				1133	conceivably affect the outcome of your program's computation, the
				1134	associated V bits are immediately checked. If any of these indicate
				1135	that the value is undefined, an error is reported.
				1136
				1137	<p>Here's an (admittedly nonsensical) example:
				1138	<pre>
				1139	int i, j;
				1140	int a[10], b[10];
				1141	for (i = 0; i < 10; i++) {
				1142	j = a[i];
				1143	b[i] = j;
				1144	}
				1145	</pre>
				1146
				1147	<p>Valgrind emits no complaints about this, since it merely copies
				1148	uninitialised values from <code>a[]</code> into <code>b[]</code>, and
				1149	doesn't use them in any way. However, if the loop is changed to
				1150	<pre>
				1151	for (i = 0; i < 10; i++) {
				1152	j += a[i];
				1153	}
				1154	if (j == 77)
				1155	printf("hello there\n");
				1156	</pre>
				1157	then Valgrind will complain, at the <code>if</code>, that the
				1158	condition depends on uninitialised values.
				1159
				1160	<p>Most low level operations, such as adds, cause Valgrind to
				1161	use the V bits for the operands to calculate the V bits for the
				1162	result. Even if the result is partially or wholly undefined,
				1163	it does not complain.
				1164
<p>Checks on definedness only occur in two places: when a value is
used to generate a memory address, and when a control flow decision
needs to be made. Also, when a system call is detected, Valgrind
checks definedness of parameters as required.
				1169
<p>If a check should detect undefinedness, an error message is
issued. The resulting value is subsequently regarded as well-defined.
To do otherwise would give long chains of error messages. In effect,
we say that undefined values are non-infectious.
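<p>The rules above (copy silently, propagate through arithmetic, check
only at a point of use, then treat the value as defined) can be
modelled in a few lines. This is a toy model, not Valgrind's
implementation. All names are hypothetical, the choice that a V bit of
1 means "defined" is an assumption of the sketch, and the rule used for
addition is deliberately crude: any undefined input bit makes the whole
result undefined.
<pre>
#include <stdint.h>

/* A 32-bit value together with its 32 V bits (1 = defined here). */
typedef struct {
    uint32_t value;
    uint32_t vbits;
} Shadowed;

#define ALL_DEFINED 0xFFFFFFFFu

static int n_errors = 0;   /* complaints issued so far */

int errors_reported(void)
{
    return n_errors;
}

/* Copying moves the V bits along with the data and never complains. */
Shadowed copy(Shadowed x)
{
    return x;
}

/* Addition computes V bits for the result without checking them. */
Shadowed add(Shadowed a, Shadowed b)
{
    Shadowed r;
    r.value = a.value + b.value;
    r.vbits = (a.vbits == ALL_DEFINED && b.vbits == ALL_DEFINED)
                  ? ALL_DEFINED : 0;
    return r;
}

/* A conditional branch is a point of use: the V bits are checked, an
   error is counted if any is undefined, and the value is then marked
   defined so the same junk does not trigger a chain of errors. */
int branch_if_equal(Shadowed *x, uint32_t k)
{
    if (x->vbits != ALL_DEFINED) {
        n_errors++;
        x->vbits = ALL_DEFINED;
    }
    return x->value == k;
}
</pre>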
				1174
				1175	<p>This sounds overcomplicated. Why not just check all reads from
				1176	memory, and complain if an undefined value is loaded into a CPU register?
				1177	Well, that doesn't work well, because perfectly legitimate C programs routinely
				1178	copy uninitialised values around in memory, and we don't want endless complaints
				1179	about that. Here's the canonical example. Consider a struct
				1180	like this:
				1181	<pre>
				1182	struct S { int x; char c; };
				1183	struct S s1, s2;
				1184	s1.x = 42;
				1185	s1.c = 'z';
				1186	s2 = s1;
				1187	</pre>
				1188
				1189	<p>The question to ask is: how large is <code>struct S</code>, in
				1190	bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
				1191	occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
				1192	round the size of <code>struct S</code> up to a whole number of words,
				1193	in this case 8 bytes. Not doing this forces compilers to generate
				1194	truly appalling code for subscripting arrays of <code>struct
				1195	S</code>'s.
				1196
				1197	<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
				1198	For the assignment <code>s2 = s1</code>, gcc generates code to copy
				1199	all 8 bytes wholesale into <code>s2</code> without regard for their
				1200	meaning. If Valgrind simply checked values as they came out of
				1201	memory, it would yelp every time a structure assignment like this
				1202	happened. So the more complicated semantics described above is
				1203	necessary. This allows gcc to copy <code>s1</code> into
				1204	<code>s2</code> any way it likes, and a warning will only be emitted
				1205	if the uninitialised values are later used.
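<p>The amount of padding can be measured directly. The two helper
functions below are hypothetical names used only for this
illustration; the results depend on the compiler and platform.
<pre>
#include <stddef.h>

struct S { int x; char c; };

/* Bytes occupied by the two named fields. */
size_t bytes_in_fields(void)
{
    return sizeof(int) + sizeof(char);
}

/* Bytes of padding after c. Nothing in the program ever writes
   these, so they stay uninitialised after s1.x and s1.c are set. */
size_t tail_padding(void)
{
    return sizeof(struct S) - (offsetof(struct S, c) + sizeof(char));
}
</pre>

<p>On a typical x86 system with a 4-byte int, this should give 5 bytes
of fields and 3 bytes of tail padding, which is where the 8-byte figure
above comes from.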
				1206
				1207	<p>One final twist to this story. The above scheme allows garbage to
				1208	pass through the CPU's integer registers without complaint. It does
this by giving the integer registers V tags, passing these around in
the expected way. This is complicated and computationally expensive to
do, but is necessary. Valgrind is more simplistic about
				1212	floating-point loads and stores. In particular, V bits for data read
				1213	as a result of floating-point loads are checked at the load
				1214	instruction. So if your program uses the floating-point registers to
				1215	do memory-to-memory copies, you will get complaints about
				1216	uninitialised values. Fortunately, I have not yet encountered a
				1217	program which (ab)uses the floating-point registers in this way.
				1218
				1219	<a name="vaddress"></a>
				1220	<h3>3.2  Valid-address (A) bits</h3>
				1221
				1222	Notice that the previous section describes how the validity of values
				1223	is established and maintained without having to say whether the
				1224	program does or does not have the right to access any particular
				1225	memory location. We now consider the latter issue.
				1226
				1227	<p>As described above, every bit in memory or in the CPU has an
				1228	associated valid-value (V) bit. In addition, all bytes in memory, but
				1229	not in the CPU, have an associated valid-address (A) bit. This
				1230	indicates whether or not the program can legitimately read or write
that location. It does not give any indication of the validity of the
data at that location -- that's the job of the V bits -- only whether
or not the location may be accessed.
				1234
				1235	<p>Every time your program reads or writes memory, Valgrind checks the
				1236	A bits associated with the address. If any of them indicate an
				1237	invalid address, an error is emitted. Note that the reads and writes
				1238	themselves do not change the A bits, only consult them.
				1239
				1240	<p>So how do the A bits get set/cleared? Like this:
				1241
				1242	<ul>
				1243	<li>When the program starts, all the global data areas are marked as
				1244	accessible.</li><br>
				1245	<p>
				1246
<li>When the program does malloc/new, the A bits for exactly the
area allocated, and not a byte more, are marked as accessible.
Upon freeing the area the A bits are changed to indicate
inaccessibility.</li><br>
				1251	<p>
				1252
				1253	<li>When the stack pointer register (%esp) moves up or down, A bits
				1254	are set. The rule is that the area from %esp up to the base of
				1255	the stack is marked as accessible, and below %esp is
				1256	inaccessible. (If that sounds illogical, bear in mind that the
				1257	stack grows down, not up, on almost all Unix systems, including
				1258	GNU/Linux.) Tracking %esp like this has the useful side-effect
				1259	that the section of stack used by a function for local variables
				1260	etc is automatically marked accessible on function entry and
				1261	inaccessible on exit.</li><br>
				1262	<p>
				1263
				1264	<li>When doing system calls, A bits are changed appropriately. For
				1265	example, mmap() magically makes files appear in the process's
				1266	address space, so the A bits must be updated if mmap()
				1267	succeeds.</li><br>
				1268	</ul>
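<p>A toy model of the A-bit map may help fix ideas. It is a sketch
under stated assumptions, not Valgrind's data structure: the address
space is a hypothetical 1024-byte range, addresses are plain offsets
into it, and the function names are invented for the example.
<pre>
#include <stddef.h>

#define SPACE_SIZE 1024              /* size of the toy address space */

static unsigned char abits[SPACE_SIZE / 8];   /* one A bit per byte */

/* Sets or clears the A bits for the len bytes starting at addr, as
   would happen on malloc/free or when the stack pointer moves. */
void set_range(size_t addr, size_t len, int accessible)
{
    size_t i;
    for (i = addr; i < addr + len && i < SPACE_SIZE; i++) {
        if (accessible)
            abits[i / 8] |= (unsigned char)(1u << (i % 8));
        else
            abits[i / 8] &= (unsigned char)~(1u << (i % 8));
    }
}

/* Returns 1 if every byte of the access is addressable, else 0.
   Consults the A bits only; it never changes them. */
int access_ok(size_t addr, size_t len)
{
    size_t i;
    if (addr > SPACE_SIZE || len > SPACE_SIZE - addr)
        return 0;
    for (i = addr; i < addr + len; i++) {
        if (!(abits[i / 8] & (1u << (i % 8))))
            return 0;
    }
    return 1;
}
</pre>

<p>Marking exactly the allocated bytes, and not a byte more, is what
lets a 4-byte access that straddles the end of a 10-byte block be
caught in this model.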
				1269
				1270
				1271	<a name="together"></a>
				1272	<h3>3.3  Putting it all together</h3>
				1273	Valgrind's checking machinery can be summarised as follows:
				1274
				1275	<ul>
				1276	<li>Each byte in memory has 8 associated V (valid-value) bits,
				1277	saying whether or not the byte has a defined value, and a single
				1278	A (valid-address) bit, saying whether or not the program
				1279	currently has the right to read/write that address.</li><br>
				1280	<p>
				1281
				1282	<li>When memory is read or written, the relevant A bits are
				1283	consulted. If they indicate an invalid address, Valgrind emits
				1284	an Invalid read or Invalid write error.</li><br>
				1285	<p>
				1286
				1287	<li>When memory is read into the CPU's integer registers, the
				1288	relevant V bits are fetched from memory and stored in the
				1289	simulated CPU. They are not consulted.</li><br>
				1290	<p>
				1291
				1292	<li>When an integer register is written out to memory, the V bits
				1293	for that register are written back to memory too.</li><br>
				1294	<p>
				1295
				1296	<li>When memory is read into the CPU's floating point registers, the
				1297	relevant V bits are read from memory and they are immediately
				1298	checked. If any are invalid, an uninitialised value error is
				1299	emitted. This precludes using the floating-point registers to
				1300	copy possibly-uninitialised memory, but simplifies Valgrind in
				1301	that it does not have to track the validity status of the
				1302	floating-point registers.</li><br>
				1303	<p>
				1304
				1305	<li>As a result, when a floating-point register is written to
				1306	memory, the associated V bits are set to indicate a valid
				1307	value.</li><br>
				1308	<p>
				1309
				1310	<li>When values in integer CPU registers are used to generate a
				1311	memory address, or to determine the outcome of a conditional
				1312	branch, the V bits for those values are checked, and an error
				1313	emitted if any of them are undefined.</li><br>
				1314	<p>
				1315
				1316	<li>When values in integer CPU registers are used for any other
				1317	purpose, Valgrind computes the V bits for the result, but does
				1318	not check them.</li><br>
				1319	<p>
				1320
				1321	<li>Once the V bits for a value in the CPU have been checked, they
				1322	are then set to indicate validity. This avoids long chains of
				1323	errors.</li><br>
				1324	<p>
				1325
				1326	<li>When values are loaded from memory, Valgrind checks the A bits
				1327	for that location and issues an illegal-address warning if
				1328	needed. In that case, the V bits loaded are forced to indicate
				1329	Valid, despite the location being invalid.
				1330	<p>
				1331	This apparently strange choice reduces the amount of confusing
				1332	information presented to the user. It avoids the
				1333	unpleasant phenomenon in which memory is read from a place which
				1334	is both unaddressible and contains invalid values, and, as a
				1335	result, you get not only an invalid-address (read/write) error,
				1336	but also a potentially large set of uninitialised-value errors,
				1337	one for every time the value is used.
				1338	<p>
				1339	There is a hazy boundary case to do with multi-byte loads from
				1340	addresses which are partially valid and partially invalid. See the
				1341	description of the flag <code>--partial-loads-ok</code> for details.
				1342	</li><br>
				1343	</ul>
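<p>The integer-register rules above (propagate V bits on ordinary use, check them only for addresses and conditional branches, then mark the value valid) can be sketched as follows. This is a simplified illustration, not the real machinery: it keeps one "defined" flag per value instead of one V bit per bit, and the names <code>Value</code>, <code>add</code>, <code>check</code> and <code>branch_if_zero</code> are hypothetical.</p>

```python
class Value:
    """An integer register value with a single 'defined' flag.

    Assumption: the real tool tracks validity per bit; one flag per
    value is enough to show the check-versus-propagate distinction.
    """

    def __init__(self, bits, defined):
        self.bits = bits
        self.defined = defined


errors = []  # error messages emitted so far


def add(a, b):
    # Ordinary arithmetic: compute validity of the result, do not check it.
    return Value((a.bits + b.bits) & 0xFFFFFFFF, a.defined and b.defined)


def check(v, what):
    # Addresses and branch conditions are checked. After reporting,
    # the value is marked valid to avoid long chains of errors.
    if not v.defined:
        errors.append("use of uninitialised value in " + what)
        v.defined = True


def branch_if_zero(v):
    check(v, "conditional branch")
    return v.bits == 0
```

<p>Adding an undefined value to a defined one yields an undefined result but no error; the error appears only when the result decides a branch, and it appears once.</p>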
				1344
				1345	Valgrind intercepts calls to malloc, calloc, realloc, valloc,
				1346	memalign, free, new and delete. The behaviour you get is:
				1347
				1348	<ul>
				1349
				1350	<li>malloc/new: the returned memory is marked as addressible but not
				1351	having valid values. This means you have to write on it before
				1352	you can read it.</li><br>
				1353	<p>
				1354
				1355	<li>calloc: returned memory is marked both addressible and valid,
				1356	since calloc() clears the area to zero.</li><br>
				1357	<p>
				1358
				1359	<li>realloc: if the new size is larger than the old, the new section
				1360	is addressible but invalid, as with malloc.</li><br>
				1361	<p>
				1362
				1363	<li>realloc: if the new size is smaller, the dropped-off section is marked as
				1364	unaddressible. You may only pass to realloc a pointer
				1365	previously issued to you by malloc/calloc/new/realloc.</li><br>
				1366	<p>
				1367
				1368	<li>free/delete: you may only pass to free a pointer previously
				1369	issued to you by malloc/calloc/new/realloc, or the value
				1370	NULL. Otherwise, Valgrind complains. If the pointer is indeed
				1371	valid, Valgrind marks the entire area it points at as
				1372	unaddressible, and places the block in the freed-blocks-queue.
				1373	The aim is to defer as long as possible reallocation of this
				1374	block. Until that happens, all attempts to access it will
				1375	elicit an invalid-address error, as you would hope.</li><br>
				1376	</ul>
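<p>A minimal sketch of the allocator behaviour listed above, assuming a three-state shadow per byte. Everything here is illustrative: <code>HeapShadow</code>, the state names, the 16-byte gap between blocks and the bump-pointer addresses are assumptions, not Valgrind's implementation.</p>

```python
from collections import deque

NOACCESS, UNDEFINED, DEFINED = "noaccess", "undefined", "defined"


class HeapShadow:
    def __init__(self):
        self.state = {}             # byte address -> one of the three states
        self.blocks = {}            # start address -> size, for live blocks
        self.freed_queue = deque()  # freed blocks, oldest first
        self.next_addr = 0x1000

    def _alloc(self, size, state):
        addr = self.next_addr
        self.next_addr += size + 16  # gap so neighbouring blocks never touch
        for a in range(addr, addr + size):
            self.state[a] = state
        self.blocks[addr] = size
        return addr

    def malloc(self, size):
        return self._alloc(size, UNDEFINED)    # addressible, values not valid

    def calloc(self, n, size):
        return self._alloc(n * size, DEFINED)  # zeroed, so values are valid

    def free(self, addr):
        if addr is None:
            return None                        # free(NULL) is allowed
        if addr not in self.blocks:
            return "invalid free"
        size = self.blocks.pop(addr)
        for a in range(addr, addr + size):
            self.state[a] = NOACCESS
        self.freed_queue.append((addr, size))  # defer reuse of the block
        return None

    def get(self, addr):
        return self.state.get(addr, NOACCESS)
```

<p>Because a freed block sits in the queue instead of being handed out again at once, a later access through a stale pointer still lands on bytes marked as unaddressible.</p>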
				1377
				1378
				1379
				1380	<a name="signals"></a>
				1381	<h3>3.4  Signals</h3>
				1382
				1383	Valgrind provides suitable handling of signals, so, provided you stick
				1384	to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
				1385	are handled. Signal handlers may return in the normal way or do
				1386	longjmp(); both should work ok. As specified by POSIX, a signal is
				1387	blocked in its own handler. Default actions for signals should work
				1388	as before. Etc, etc.
				1389
				1390	<p>Under the hood, dealing with signals is a real pain, and Valgrind's
				1391	simulation leaves much to be desired. If your program does
				1392	way-strange stuff with signals, bad things may happen. If so, let me
				1393	know. I don't promise to fix it, but I'd at least like to be aware of
				1394	it.
				1395
				1396
				1397	<a name="leaks"></a>
				1398	<h3>3.5  Memory leak detection</h3>
				1399
				1400	Valgrind keeps track of all memory blocks issued in response to calls
				1401	to malloc/calloc/realloc/new. So when the program exits, it knows
				1402	which blocks are still outstanding -- have not been returned, in other
				1403	words. Ideally, you want your program to have no blocks still in use
				1404	at exit. But many programs do.
				1405
				1406	<p>For each such block, Valgrind scans the entire address space of the
				1407	process, looking for pointers to the block. One of three situations
				1408	may result:
				1409
				1410	<ul>
				1411	<li>A pointer to the start of the block is found. This usually
				1412	indicates programming sloppiness; since the block is still
				1413	pointed at, the programmer could, at least in principle, have free'd
				1414	it before program exit.</li><br>
				1415	<p>
				1416
				1417	<li>A pointer to the interior of the block is found. The pointer
				1418	might originally have pointed to the start and have been moved
				1419	along, or it might be entirely unrelated. Valgrind deems such a
				1420	block as "dubious", that is, possibly leaked,
				1421	because it's unclear whether or
				1422	not a pointer to it still exists.</li><br>
				1423	<p>
				1424
				1425	<li>The worst outcome is that no pointer to the block can be found.
				1426	The block is classified as "leaked", because the
				1427	programmer could not possibly have free'd it at program exit,
				1428	since no pointer to it exists. This might be a symptom of
				1429	having lost the pointer at some earlier point in the
				1430	program.</li>
				1431	</ul>
				1432
				1433	Valgrind reports summaries about leaked and dubious blocks.
				1434	For each such block, it will also tell you where the block was
				1435	allocated. This should help you figure out why the pointer to it has
				1436	been lost. In general, you should attempt to ensure your programs do
				1437	not have any leaked or dubious blocks at exit.
				1438
				1439	<p>The precise area of memory in which Valgrind searches for pointers
				1440	is: all naturally-aligned 4-byte words for which all A bits indicate
				1441	addressibility and all V bits indicate that the stored value is
				1442	actually valid.
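<p>The three-way classification described in this section can be sketched as below. This is a hedged illustration only: the function name <code>classify_blocks</code> and the label "reachable" for the first case are hypothetical, and <code>words</code> stands for the list of aligned 4-byte values that pass the A and V bit test described above.</p>

```python
def classify_blocks(blocks, words):
    """blocks: {start: size}; words: list of candidate pointer values.

    Returns {start: "reachable" | "dubious" | "leaked"}.
    """
    result = {}
    for start, size in blocks.items():
        category = "leaked"              # assume the worst until a pointer turns up
        for w in words:
            if w == start:
                category = "reachable"   # pointer to the start of the block
                break
            if start < w < start + size:
                category = "dubious"     # interior pointer; keep looking
        result[start] = category
    return result
```

<p>A start pointer overrides an interior pointer, so a block is only "dubious" when interior pointers are all that can be found.</p>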
				1443
				1444	<p><hr width="100%">
				1445
				1446
				1447	<a name="limits"></a>
				1448	<h2>4  Limitations</h2>
				1449
				1450	The following list of limitations seems depressingly long. However,
				1451	most programs actually work fine.
				1452
				1453	<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
				1454	a kernel 2.4.X system, subject to the following constraints:
				1455
				1456	<ul>
				1457	<li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
				1458	encounters these, Valgrind will simply give up. It may be
				1459	possible to add support for them at a later time. Intel added a
				1460	few instructions such as "cmov" to the integer instruction set
				1461	on Pentium and later processors, and these are supported.
				1462	Nevertheless it's safest to think of Valgrind as implementing
				1463	the 486 instruction set.</li><br>
				1464	<p>
				1465
				1466	<li>Multithreaded programs are not supported, since I haven't yet
				1467	figured out how to do this. To be more specific, it is the
				1468	"clone" system call which is not supported. A program calls
				1469	"clone" to create threads. Valgrind will abort if this
				1470	happens.</li><br>
				1471	<p>
				1472
				1473	<li>Valgrind assumes that the floating point registers are not used
				1474	as intermediaries in memory-to-memory copies, so it immediately
				1475	checks V bits in floating-point loads/stores. If you want to
				1476	write code which copies around possibly-uninitialised values,
				1477	you must ensure these travel through the integer registers, not
				1478	the FPU.</li><br>
				1479	<p>
				1480
				1481	<li>If your program does its own memory management, rather than
				1482	using malloc/new/free/delete, it should still work, but
				1483	Valgrind's error checking won't be so effective.</li><br>
				1484	<p>
				1485
				1486	<li>Valgrind's signal simulation is not as robust as it could be.
				1487	Basic POSIX-compliant sigaction and sigprocmask functionality is
				1488	supplied, but it's conceivable that things could go badly awry
				1489	if you do weird things with signals. Workaround: don't.
				1490	Programs that do non-POSIX signal tricks are in any case
				1491	inherently unportable, so should be avoided if
				1492	possible.</li><br>
				1493	<p>
				1494
				1495	<li>I have no idea what happens if programs try to handle signals on
				1496	an alternate stack (sigaltstack). YMMV.</li><br>
				1497	<p>
				1498
				1499	<li>Programs which switch stacks are not well handled. Valgrind
				1500	does have support for this, but I don't have great faith in it.
				1501	It's difficult -- there's no cast-iron way to decide whether a
				1502	large change in %esp is as a result of the program switching
				1503	stacks, or merely allocating a large object temporarily on the
				1504	current stack -- yet Valgrind needs to handle the two situations
				1505	differently.</li><br>
				1506	<p>
				1507
				1508	<li>x86 instructions, and system calls, have been implemented on
				1509	demand. So it's possible, although unlikely, that a program
				1510	will fall over with a message to that effect. If this happens,
				1511	please mail me ALL the details printed out, so I can try and
				1512	implement the missing feature.</li><br>
				1513	<p>
				1514
				1515	<li>x86 floating point works correctly, but floating-point code may
				1516	run even more slowly than integer code, due to my simplistic
				1517	approach to FPU emulation.</li><br>
				1518	<p>
				1519
				1520	<li>You can't Valgrind-ize statically linked binaries. Valgrind
				1521	relies on the dynamic-link mechanism to gain control at
				1522	startup.</li><br>
				1523	<p>
				1524
				1525	<li>Memory consumption of your program is majorly increased whilst
				1526	running under Valgrind. This is due to the large amount of
				1527	administrative information maintained behind the scenes. Another
				1528	cause is that Valgrind dynamically translates the original
				1529	executable and never throws any translation away, except in
				1530	those rare cases where self-modifying code is detected.
				1531	Translated, instrumented code is 8-12 times larger than the
				1532	original (!) so you can easily end up with 15+ MB of
				1533	translations when running (eg) a web browser. There's not a lot
				1534	you can do about this -- use Valgrind on a fast machine with a lot
				1535	of memory and swap space. At some point I may implement a LRU
				1536	caching scheme for translations, so as to bound the maximum
				1537	amount of memory devoted to them, to say 8 or 16 MB.</li>
				1538	</ul>
				1539
				1540
				1541	Programs which are known not to work are:
				1542
				1543	<ul>
				1544	<li>Netscape 4.76 works pretty well on some platforms -- quite
				1545	nicely on my AMD K6-III (400 MHz). I can surf, do mail, etc, no
				1546	problem. On other platforms it has been observed to crash
				1547	during startup. Despite much investigation I can't figure out
				1548	why.</li><br>
				1549	<p>
				1550
				1551	<li>kpackage (a KDE front end to rpm) dies because the CPUID
				1552	instruction is unimplemented. Easy to fix.</li><br>
				1553	<p>
				1554
				1555	<li>knode (a KDE newsreader) tries to do multithreaded things, and
				1556	fails.</li><br>
				1557	<p>
				1558
				1559	<li>emacs starts up but immediately concludes it is out of memory
				1560	and aborts. Emacs has its own memory-management scheme, but I
				1561	don't understand why this should interact so badly with
				1562	Valgrind.</li><br>
				1563	<p>
				1564
				1565	<li>Gimp and Gnome and GTK-based apps die early on because
				1566	of unimplemented system call wrappers. (I'm a KDE user :)
				1567	This wouldn't be hard to fix.
				1568	</li><br>
				1569	<p>
				1570
				1571	<li>As a consequence of me being a KDE user, almost all KDE apps
				1572	work ok -- except those which are multithreaded.
				1573	</li><br>
				1574	<p>
				1575	</ul>
				1576
				1577
				1578	<p><hr width="100%">
				1579
				1580
				1581	<a name="howitworks"></a>
				1582	<h2>5  How it works -- a rough overview</h2>
				1583	Some gory details, for those with a passion for gory details. You
				1584	don't need to read this section if all you want to do is use Valgrind.
				1585
				1586	<a name="startb"></a>
				1587	<h3>5.1  Getting started</h3>
				1588
				1589	Valgrind is compiled into a shared object, valgrind.so. The shell
				1590	script valgrind sets the LD_PRELOAD environment variable to point to
				1591	valgrind.so. This causes the .so to be loaded as an extra library to
				1592	any subsequently executed dynamically-linked ELF binary, viz, the
				1593	program you want to debug.
				1594
				1595	<p>The dynamic linker allows each .so in the process image to have an
				1596	initialisation function which is run before main(). It also allows
				1597	each .so to have a finalisation function run after main() exits.
				1598
				1599	<p>When valgrind.so's initialisation function is called by the dynamic
				1600	linker, the synthetic CPU starts up. The real CPU remains locked
				1601	in valgrind.so for the entire rest of the program, but the synthetic
				1602	CPU returns from the initialisation function. Startup of the program
				1603	now continues as usual -- the dynamic linker calls all the other .so's
				1604	initialisation routines, and eventually runs main(). This all runs on
				1605	the synthetic CPU, not the real one, but the client program cannot
				1606	tell the difference.
				1607
				1608	<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
				1609	finalisation function. Valgrind detects this, and uses it as its cue
				1610	to exit. It prints summaries of all errors detected, possibly checks
				1611	for memory leaks, and then exits the finalisation routine, but now on
				1612	the real CPU. The synthetic CPU has now lost control -- permanently
				1613	-- so the program exits back to the OS on the real CPU, just as it
				1614	would have done anyway.
				1615
				1616	<p>On entry, Valgrind switches stacks, so it runs on its own stack.
				1617	On exit, it switches back. This means that the client program
				1618	continues to run on its own stack, so we can switch back and forth
				1619	between running it on the simulated and real CPUs without difficulty.
				1620	This was an important design decision, because it makes it easy (well,
				1621	significantly less difficult) to debug the synthetic CPU.
				1622
				1623
				1624	<a name="engine"></a>
				1625	<h3>5.2  The translation/instrumentation engine</h3>
				1626
				1627	Valgrind does not directly run any of the original program's code. Only
				1628	instrumented translations are run. Valgrind maintains a translation
				1629	table, which allows it to find the translation quickly for any branch
				1630	target (code address). If no translation has yet been made, the
				1631	translator - a just-in-time translator - is summoned. This makes an
				1632	instrumented translation, which is added to the collection of
				1633	translations. Subsequent jumps to that address will use this
				1634	translation.
				1635
				1636	<p>Valgrind can optionally check writes made by the application, to
				1637	see if they are writing an address contained within code which has
				1638	been translated. Such a write invalidates translations of code
				1639	bracketing the written address. Valgrind will discard the relevant
				1640	translations, which causes them to be re-made, if they are needed
				1641	again, reflecting the new updated data stored there. In this way,
				1642	self-modifying code is supported. In practice I have not found any
				1643	Linux applications which use self-modifying code.
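<p>A minimal sketch of a translation table with translate-on-miss and invalidation on writes, as described in the two paragraphs above. The names are hypothetical, and the <code>translate</code> callback (returning the length of the original code and its translation) is an assumption made for illustration; the real table is not a Python dictionary.</p>

```python
class TranslationTable:
    def __init__(self, translate):
        # translate: hypothetical callback, addr -> (code_length, translation)
        self.translate = translate
        self.table = {}   # original code address -> (length, translation)
        self.misses = 0

    def lookup(self, addr):
        # Summon the translator only when no translation exists yet.
        if addr not in self.table:
            self.misses += 1
            self.table[addr] = self.translate(addr)
        return self.table[addr][1]

    def note_write(self, addr):
        # Discard every translation whose original code covers addr,
        # so that it is re-made from the new bytes if needed again.
        stale = [a for a, (length, _) in self.table.items()
                 if a <= addr < a + length]
        for a in stale:
            del self.table[a]
        return len(stale)
```

<p>The second jump to an address costs only a lookup; a write into translated code forces one fresh translation the next time that code runs.</p>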
				1644
				1645	<p>The JITter translates basic blocks -- blocks of straight-line code
				1646	-- as single entities. To minimise the considerable difficulties of
				1647	dealing with the x86 instruction set, x86 instructions are first
				1648	translated to a RISC-like intermediate code, similar to SPARC code,
				1649	but with an infinite number of virtual integer registers. Initially
				1650	each instruction is translated separately, and there is no attempt at
				1651	instrumentation.
				1652
				1653	<p>The intermediate code is improved, mostly so as to try and cache
				1654	the simulated machine's registers in the real machine's registers over
				1655	several simulated instructions. This is often very effective. Also,
				1656	we try to remove redundant updates of the simulated machine's
				1657	condition-code register.
				1658
				1659	<p>The intermediate code is then instrumented, giving more
				1660	intermediate code. There are a few extra intermediate-code operations
				1661	to support instrumentation; it is all refreshingly simple. After
				1662	instrumentation there is a cleanup pass to remove redundant value
				1663	checks.
				1664
				1665	<p>This gives instrumented intermediate code which mentions arbitrary
				1666	numbers of virtual registers. A linear-scan register allocator is
				1667	used to assign real registers and possibly generate spill code. All
				1668	of this is still phrased in terms of the intermediate code. This
				1669	machinery is inspired by the work of Reuben Thomas (MITE).
				1670
				1671	<p>Then, and only then, is the final x86 code emitted. The
				1672	intermediate code is carefully designed so that x86 code can be
				1673	generated from it without need for spare registers or other
				1674	inconveniences.
				1675
				1676	<p>The translations are managed using a traditional LRU-based caching
				1677	scheme. The translation cache has a default size of about 14MB.
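<p>For readers unfamiliar with the idea, a size-bounded LRU cache can be sketched in a few lines. This is a generic illustration of LRU eviction, assuming sizes are tracked in bytes; the class <code>LRUCache</code> is hypothetical and nothing here is taken from Valgrind's translation cache.</p>

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # key -> size, least recently used first

    def touch(self, key, size):
        """Record a use of key; return True on a hit, False on a miss."""
        if key in self.items:
            self.items.move_to_end(key)  # now the most recently used
            return True
        # Evict least recently used entries until the new one fits.
        while self.items and self.used + size > self.capacity:
            _, old_size = self.items.popitem(last=False)
            self.used -= old_size
        self.items[key] = size
        self.used += size
        return False
```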
				1678
				1679	<a name="track"></a>
				1680
				1681	<h3>5.3  Tracking the status of memory</h3> Each byte in the
				1682	process' address space has nine bits associated with it: one A bit and
				1683	eight V bits. The A and V bits for each byte are stored using a
				1684	sparse array, which flexibly and efficiently covers arbitrary parts of
				1685	the 32-bit address space without imposing significant space or
				1686	performance overheads for the parts of the address space never
				1687	visited. The scheme used, and speedup hacks, are described in detail
				1688	at the top of the source file vg_memory.c, so you should read that for
				1689	the gory details.
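<p>One common way to build such a sparse array is a two-level table, sketched below. This is an assumption for illustration, not a description of what vg_memory.c does: the 16/16 bit split, the class name <code>SparseShadow</code> and the default entry are all hypothetical, and the A bit and V byte are stored as opaque values.</p>

```python
class SparseShadow:
    """Hypothetical two-level shadow map for a 32-bit address space."""

    CHUNK_BITS = 16  # low 16 bits index within a chunk: an assumption

    def __init__(self, default=(0, 0xFF)):
        self.default = default  # entry reported for never-touched bytes
        self.chunks = {}        # high 16 address bits -> {low bits: (A, V)}

    def set(self, addr, a_bit, v_byte):
        hi, lo = addr >> self.CHUNK_BITS, addr & 0xFFFF
        self.chunks.setdefault(hi, {})[lo] = (a_bit, v_byte)

    def get(self, addr):
        hi, lo = addr >> self.CHUNK_BITS, addr & 0xFFFF
        return self.chunks.get(hi, {}).get(lo, self.default)
```

<p>Only chunks that have actually been written occupy space, so widely separated regions such as the stack and the heap cost two chunks rather than a table covering everything in between.</p>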
				1690
				1691	<a name="sys_calls"></a>
				1692
				1693	<h3>5.4 System calls</h3>
				1694	All system calls are intercepted. The memory status map is consulted
				1695	before and updated after each call. It's all rather tiresome. See
				1696	vg_syscall_mem.c for details.
				1697
				1698	<a name="sys_signals"></a>
				1699
				1700	<h3>5.5  Signals</h3>
				1701	All system calls to sigaction() and sigprocmask() are intercepted. If
				1702	the client program is trying to set a signal handler, Valgrind makes a
				1703	note of the handler address and which signal it is for. Valgrind then
				1704	arranges for the same signal to be delivered to its own handler.
				1705
				1706	<p>When such a signal arrives, Valgrind's own handler catches it, and
				1707	notes the fact. At a convenient safe point in execution, Valgrind
				1708	builds a signal delivery frame on the client's stack and runs its
				1709	handler. If the handler longjmp()s, there is nothing more to be said.
				1710	If the handler returns, Valgrind notices this, zaps the delivery
				1711	frame, and carries on where it left off before delivering the signal.
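<p>The catch-now, deliver-later scheme above can be sketched as a small relay. This is a toy model with hypothetical names (<code>SignalRelay</code>, <code>host_handler</code>, <code>safe_point</code>); it ignores signal masks, delivery frames and longjmp(), and uses plain Python callables in place of client handlers.</p>

```python
class SignalRelay:
    def __init__(self):
        self.client_handlers = {}  # signal number -> client's handler
        self.pending = []          # signals caught but not yet delivered

    def sigaction(self, signo, handler):
        # Intercepted: note the client's handler instead of installing it.
        self.client_handlers[signo] = handler

    def host_handler(self, signo):
        # Runs when the real signal arrives: just note the fact.
        if signo in self.client_handlers:
            self.pending.append(signo)

    def safe_point(self):
        # Called at a convenient safe point: run the queued client handlers.
        delivered = 0
        while self.pending:
            signo = self.pending.pop(0)
            self.client_handlers[signo](signo)
            delivered += 1
        return delivered
```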
				1712
				1713	<p>The purpose of this nonsense is that setting signal handlers
				1714	essentially amounts to giving callback addresses to the Linux kernel.
				1715	We can't allow this to happen, because if it did, signal handlers
				1716	would run on the real CPU, not the simulated one. This means the
				1717	checking machinery would not operate during the handler run, and,
				1718	worse, memory permissions maps would not be updated, which could cause
				1719	spurious error reports once the handler had returned.
				1720
				1721	<p>An even worse thing would happen if the signal handler longjmp'd
				1722	rather than returned: Valgrind would completely lose control of the
				1723	client program.
				1724
				1725	<p>Upshot: we can't allow the client to install signal handlers
				1726	directly. Instead, Valgrind must catch, on behalf of the client, any
				1727	signal the client asks to catch, and must deliver it to the client on
				1728	the simulated CPU, not the real one. This involves considerable
				1729	gruesome fakery; see vg_signals.c for details.
				1730	<p>
				1731
				1732	<hr width="100%">
				1733
				1734	<a name="example"></a>
				1735	<h2>6  Example</h2>
				1736	This is the log for a run of a small program. The program is in fact
				1737	correct, and the reported error is the result of a potentially serious
				1738	code generation bug in GNU g++ (snapshot 20010527).
				1739	<pre>
				1740	sewardj@phoenix:~/newmat10$
				1741	~/Valgrind-6/valgrind -v ./bogon
				1742	==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
				1743	==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
				1744	==25832== Startup, with flags:
				1745	==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
				1746	==25832== reading syms from /lib/ld-linux.so.2
				1747	==25832== reading syms from /lib/libc.so.6
				1748	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
				1749	==25832== reading syms from /lib/libm.so.6
				1750	==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
				1751	==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
				1752	==25832== reading syms from /proc/self/exe
				1753	==25832== loaded 5950 symbols, 142333 line number locations
				1754	==25832==
				1755	==25832== Invalid read of size 4
				1756	==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
				1757	==25832== by 0x80487AF: main (bogon.cpp:66)
				1758	==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
				1759	==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
				1760	==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
				1761	==25832==
				1762	==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
				1763	==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
				1764	==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
				1765	==25832== For a detailed leak analysis, rerun with: --leak-check=yes
				1766	==25832==
				1767	==25832== exiting, did 1881 basic blocks, 0 misses.
				1768	==25832== 223 translations, 3626 bytes in, 56801 bytes out.
				1769	</pre>
				1770	<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
				1771	<hr width="100%">
				1772	<p>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame^]	1773
				1774
				1775
				1776	<a name="cache"></a>
				1777	<h2>7  Cache profiling</h2>
				1778	As well as memory debugging, Valgrind also allows you to do cache simulations
				1779	and annotate your source line-by-line with the number of cache misses. In
				1780	particular, it records:
				1781	<ul>
				1782	<li>L1 instruction cache reads and misses;
				1783	<li>L1 data cache reads and read misses, writes and write misses;
				1784	<li>L2 unified cache reads and read misses, writes and write misses.
				1785	</ul>
				1786	On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
				1787	and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
				1788	very useful for improving the performance of your program.
				1789
				1790	Please note that this is an experimental feature. Any feedback, bug-fixes,
				1791	suggestions, etc, welcome.
				1792
				1793
				1794	<h3>7.1  Overview</h3>
				1795	First off, as for normal Valgrind use, you probably want to turn on debugging
				1796	info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
				1797	probably <b>do</b> want to turn optimisation on, since you should profile your
				1798	program as it will be normally run.
				1799
				1800	The three steps are:
				1801	<ol>
				1802	<li>Generate a cache simulator for your machine's cache configuration with
				1803	`vg_cachegen' and recompile Valgrind with <code>make install</code>.
				1804	Valgrind comes with a default simulator, but it is unlikely to be correct
				1805	for your system, so you should generate a simulator yourself.</li>
				1806	<li>Run your program with <code>valgrind --cachesim=yes</code> in front of
				1807	the normal command line invocation. When the program finishes, Valgrind
				1808	will print summary cache statistics. It also collects line-by-line
				1809	information in a file <code>cachegrind.out</code>.</li>
				1810	<li>Generate a function-by-function summary, and possibly annotate source
				1811	files with 'vg_annotate'. Source files to annotate can be specified
				1812	manually on the command line, or "interesting" source files
				1813	can be annotated automatically with the <code>--auto=yes</code> option.
				1814	You can annotate C/C++ files or assembly language files equally
				1815	easily.</li>
				1816	</ol>
				1817
				1818	<a href="#generate">Step 1</a> only needs to be done once, unless you are
				1819	interested in simulating different cache configurations (eg. first
				1820	concentrating on instruction cache misses, then on data cache misses).<p>
				1821
				1822	<a href="#profile">Step 2</a> should be done every time you want to collect
				1823	information about a new program, a changed program, or about the same program
				1824	with different input.<p>
				1825
				1826	<a href="#annotate">Step 3</a> can be performed as many times as you like for
				1827	each Step 2; you may want to do multiple annotations showing different
				1828	information each time.<p>
				1829
				1830	The steps are described in detail in the following sections.<p>
				1831
				1832
				1833	<a name="generate"></a>
				1834	<h3>7.3  Generating a cache simulator</h3>
				1835	Although Valgrind comes with a pre-generated cache simulator, it most likely
				1836	won't match the cache configuration of your machine, so you should generate
				1837	a new simulator.<p>
				1838
				1839	You need to generate three files, one for each of the I1, D1 and L2 caches.
				1840	For each cache, you need to know the:
				1841	<ul>
				1842	<li>Cache size (bytes);
				1843	<li>Line size (bytes);
				1844	<li>Associativity.
				1845	</ul>
				1846
				1847	vg_cachegen takes three options:
				1848	<ul>
				1849	<li><code>--I1=size,line_size,associativity</code>
				1850	<li><code>--D1=size,line_size,associativity</code>
				1851	<li><code>--L2=size,line_size,associativity</code>
				1852	</ul>
				1853
				1854	You can specify one, two or all three caches per invocation of vg_cachegen. It
				1855	checks that the configuration is sensible before generating the simulators; to
				1856	see the allowed values, run <code>vg_cachegen -h</code>.<p>
				1857
				1858	An example invocation would be:
				1859
				1860	<blockquote><code>
				1861	vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
				1862	</code></blockquote>
				1863
				1864	This simulates a machine with a 128KB split L1 2-way associative cache, and a
				1865	256KB unified 8-way associative L2 cache. Both caches have 64B lines.<p>
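<p>To make the arithmetic behind the example explicit, the following sketch splits a <code>size,line_size,associativity</code> triple and derives the number of lines and sets. The function <code>parse_cache_spec</code> is hypothetical and is not vg_cachegen's parser; in particular it performs none of the sanity checks mentioned above.</p>

```python
def parse_cache_spec(spec):
    """Split 'size,line_size,associativity' and derive lines and sets."""
    size, line_size, assoc = (int(x) for x in spec.split(","))
    lines = size // line_size  # total number of cache lines
    return {
        "size": size,
        "line_size": line_size,
        "assoc": assoc,
        "lines": lines,
        "sets": lines // assoc,  # lines grouped 'assoc' to a set
    }
```

<p>For the invocation above, each 64KB L1 cache has 1024 lines in 512 sets, and the two together give the 128KB split L1.</p>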
				1866
				1867	If you don't know your cache configuration, you'll have to find it out.
				1868	(Ideally vg_cachegen could auto-identify your cache configuration using the
				1869	CPUID instruction, which could be done automatically during installation, and
				1870	this whole step could be skipped...)<p>
				1871
				1872
				1873	<h3>7.4  Cache simulation specifics</h3>
				1874	vg_cachegen only generates simulations for a machine with a split L1 cache and
				1875	a unified L2 cache. This configuration is used for all x86-based machines we
				1876	are aware of.<p>
				1877
				1878	The more specific characteristics of the simulation are as follows.
				1879
				1880	<ul>
				1881	<li>Write-allocate: when a write miss occurs, the block written to is brought
				1882	into the D1 cache. Most modern caches have this property.</li><p>
				1883
				1884	<li>Bit-selection hash function: the line(s) in the cache to which a memory
				1885	block maps is chosen by the middle bits M--(M+N-1) of the byte address,
				1886	where:
				1887	<ul>
				1888	<li> line size = 2^M bytes </li>
				1889	<li>(cache size / line size) = 2^N lines</li>
				1890	</ul> </li><p>
				1891
				1892	<li>Inclusive L2 cache: the L2 cache replicates all the entries of the L1
				1893	cache. This is standard on Pentium chips, but AMD Athlons use an
				1894	exclusive L2 cache that only holds blocks evicted from L1.</li><p>
				1895	</ul>
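<p>The bit-selection rule can be written out as a short function. This is a sketch under assumptions: sizes are powers of two, and the function <code>set_index</code> is hypothetical. With <code>assoc</code> left at 1 it follows the formula as stated above; dividing by the associativity is the usual set-associative refinement and is an assumption here.</p>

```python
def set_index(addr, size, line_size, assoc=1):
    """Select bits M .. M+N-1 of a byte address."""
    m = line_size.bit_length() - 1       # line size = 2^M bytes
    sets = size // (line_size * assoc)   # number of sets = 2^N
    n = sets.bit_length() - 1
    return (addr >> m) & ((1 << n) - 1)
```

<p>Addresses that differ only in their low M bits fall in the same line, and addresses that are a multiple of (number of sets x line size) apart compete for the same set.</p>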
				1896
				1897	Other noteworthy behaviour:
				1898
				1899	<ul>
				1900	<li>References that straddle two cache lines are treated as follows:</li>
				1901	<ul>
				1902	<li>If both blocks hit --> counted as one hit</li>
				1903	<li>If one block hits, the other misses --> counted as one miss</li>
				1904	<li>If both blocks miss --> counted as one miss (not two)</li>
				1905	</ul><p>
				1906
				1907	<li>Instructions that modify a memory location (eg. <code>inc</code> and
				1908	<code>dec</code>) are counted as doing just a read, ie. a single data
				1909	reference. This may seem strange, but since the write can never cause a
				1910	miss (the read guarantees the block is in the cache) it's not very
				1911	interesting.<p>
				1912
				1913	Thus it measures not the number of times the data cache is accessed, but
				1914	the number of times a data cache miss could occur.<p>
				1915	</li>
				1916	</ul>
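<p>The counting rule for references that straddle two lines can be sketched as follows. The function <code>count_reference</code> and the <code>is_cached</code> probe are hypothetical; a real simulator would also update the cache contents on each probe, which this sketch leaves out.</p>

```python
def count_reference(addr, size, line_size, is_cached):
    """Return 'hit' or 'miss' for one reference, counted exactly once.

    is_cached(line_number) -> bool is a hypothetical cache probe.
    """
    first = addr // line_size
    last = (addr + size - 1) // line_size
    lines = [first] if first == last else [first, last]
    # Probe every line touched, then fold the results into one outcome.
    results = [is_cached(line) for line in lines]
    return "hit" if all(results) else "miss"
```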
				1917
				1918	If you are interested in simulating a cache with different properties, it is
				1919	not particularly hard to write your own cache simulator, or to modify existing
				1920	ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
				1921	<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
				1922	does.
				1923
				1924
				1925	<a name="profile"></a>
				1926	<h3>7.5  Profiling programs</h3>
				1927	Cache profiling is enabled by using the <code>--cachesim=yes</code> option to
				1928	Valgrind. This automatically turns off Valgrind's memory checking functions,
				1929	since the cache simulation is slow enough already, and you probably don't want
				1930	to do both at once.<p>
				1931
				1932	To gather cache profiling information about the program <code>ls -l</code>, type:
				1933
				1934	<blockquote><code>valgrind --cachesim=yes ls -l</code></blockquote>
				1935
				1936	The program will execute (slowly). Upon completion, summary statistics
				1937	that look like this will be printed:
				1938
				1939	<pre>
				1940	==31751== I refs: 27,742,716
				1941	==31751== I1 misses: 276
				1942	==31751== L2 misses: 275
				1943	==31751== I1 miss rate: 0.0%
				1944	==31751== L2i miss rate: 0.0%
				1945	==31751==
				1946	==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
				1947	==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
				1948	==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr)
				1949	==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
				1950	==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%)
				1951	==31751==
				1952	==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr)
				1953	==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%)
				1954	</pre>
				1955
				1956	Cache accesses for instruction fetches are summarised first, giving the
				1957	number of fetches made (this is the number of instructions executed, which
				1958	can be useful to know in its own right), the number of I1 misses, and the
				1959	number of L2 instruction (<code>L2i</code>) misses.<p>
				1960
				1961	Cache accesses for data follow. The information is similar to that of the
				1962	instruction fetches, except that the values are also shown split between reads
				1963	and writes (note each row's <code>rd</code> and <code>wr</code> values add up
				1964	to the row's total).<p>
				1965
				1966	Combined instruction and data figures for the L2 cache follow that.<p>
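The arithmetic behind the summary can be checked by hand. The Python sketch
below recomputes the totals and two of the miss rates from the counts in the
sample output. The choice of denominators (D refs for the D1 rate, I refs plus
D refs for the combined L2 rate) is an assumption that is consistent with the
sample figures, not a statement about Valgrind's source.

```python
# Counts copied from the sample summary above.
i_refs = 27_742_716
d_reads, d_writes = 10_955_517, 4_474_773
d1_read_misses, d1_write_misses = 21_905, 19_280
l2i_misses = 275
l2d_read_misses, l2d_write_misses = 3_987, 19_098


def percent(misses, refs):
    """Miss rate as a percentage of references."""
    return 100.0 * misses / refs


d_refs = d_reads + d_writes
d1_misses = d1_read_misses + d1_write_misses
l2_misses = l2i_misses + l2d_read_misses + l2d_write_misses

print(f"D refs:       {d_refs:,}")
print(f"D1 misses:    {d1_misses:,}")
print(f"L2 misses:    {l2_misses:,}")
print(f"D1 miss rate: {percent(d1_misses, d_refs):.3f}%")
print(f"L2 miss rate: {percent(l2_misses, i_refs + d_refs):.3f}%")
```

The three totals match the <code>D refs</code>, <code>D1 misses</code> and
final <code>L2 misses</code> rows of the sample.<p>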
				1967
				1968
				1969	<h3>7.6  Output file</h3>
				1970	As well as printing summary information, Valgrind also writes line-by-line
				1971	cache profiling information to a file named <code>cachegrind.out</code>. This
				1972	file is human-readable, but is best interpreted by the accompanying program
				1973	vg_annotate, described in the next section.<p>
				1974
				1975	Things to note about the <code>cachegrind.out</code> file:
				1976	<ul>
				1977	<li>It is written every time <code>valgrind --cachesim=yes</code> is run; it
				1978	will automatically overwrite any existing <code>cachegrind.out</code> in
				1979	the current directory.</li>
				1980	<li>It can be quite large: <code>ls -l</code> generates a file of about
				1981	350KB; browsing a few files and web pages with Konqueror generates a file
				1982	of around 10MB.</li>
				1983	</ul>
				1984
				1985
				1986	<a name="annotate"></a>
				1987	<h3>7.7  Annotating C/C++ programs</h3>
				1988	Before using vg_annotate, it is worth widening your window to be at least
				1989	120 characters wide if possible, as the output lines can be quite long.<p>
				1990
				1991	To get a function-by-function summary, run <code>vg_annotate</code> in a
				1992	directory containing a <code>cachegrind.out</code> file. The output looks like
				1993	this:
				1994
				1995	<pre>
				1996	--------------------------------------------------------------------------------
				1997	I1 cache: 65536 B, 64 B, 2-way associative
				1998	D1 cache: 65536 B, 64 B, 2-way associative
				1999	L2 cache: 262144 B, 64 B, 8-way associative
				2000	Command: concord vg_to_ucode.c
				2001	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2002	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2003	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2004	Threshold: 99%
				2005	Chosen for annotation:
				2006	Auto-annotation: off
				2007
				2008	--------------------------------------------------------------------------------
				2009	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2010	--------------------------------------------------------------------------------
				2011	27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
				2012
				2013	--------------------------------------------------------------------------------
				2014	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
				2015	--------------------------------------------------------------------------------
				2016	8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
				2017	5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
				2018	2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
				2019	2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
				2020	2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
				2021	1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
				2022	897,991 51 51 897,831 95 30 62 1 1 ???:???
				2023	598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
				2024	598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
				2025	598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
				2026	446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
				2027	341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
				2028	320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
				2029	298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
				2030	149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
				2031	149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
				2032	95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
				2033	85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
				2034	</pre>
				2035
				2036	First up is a summary of the annotation options:
				2037
				2038	<ul>
				2039	<li>I1 cache, D1 cache, L2 cache: the cache configuration, so you know the
				2040	configuration with which these results were obtained.</li><p>
				2041
				2042	<li>Command: the command line invocation of the program under
				2043	examination.</li><p>
				2044
				2045	<li>Events recorded: event abbreviations are:<p>
				2046	<ul>
				2047	<li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
				2048	<li><code>I1mr</code>: I1 cache read misses</li>
				2049	<li><code>I2mr</code>: L2 cache instruction read misses</li>
				2050	<li><code>Dr </code>: D cache reads (ie. memory reads)</li>
				2051	<li><code>D1mr</code>: D1 cache read misses</li>
				2052	<li><code>D2mr</code>: L2 cache data read misses</li>
				2053	<li><code>Dw </code>: D cache writes (ie. memory writes)</li>
				2054	<li><code>D1mw</code>: D1 cache write misses</li>
				2055	<li><code>D2mw</code>: L2 cache data write misses</li>
				2056	</ul><p>
				2057	Note that D1 total misses is given by <code>D1mr</code> +
				2058	<code>D1mw</code>, and that L2 total misses is given by
				2059	<code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
				2060
				2061	<li>Events shown: the events shown (a subset of events gathered). This can
				2062	be adjusted with the <code>--show</code> option.</li><p>
				2063
				2064	<li>Event sort order: the sort order in which functions are shown. For
				2065	example, in this case the functions are sorted from highest
				2066	<code>Ir</code> counts to lowest. If two functions have identical
				2067	<code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
				2068	counts, and so on. This order can be adjusted with the
				2069	<code>--sort</code> option.<p>
				2070
				2071	Note that this dictates the order the functions appear. It is <b>not</b>
				2072	the order in which the columns appear; that is dictated by the "events
				2073	shown" line (and can be changed with the <code>--show</code> option).
				2074	</li><p>
				2075
				2076	<li>Threshold: vg_annotate by default omits functions that cause very low
				2077	numbers of misses to avoid drowning you in information. In this case,
				2078	vg_annotate shows summaries of the functions that account for 99% of the
				2079	<code>Ir</code> counts; <code>Ir</code> is chosen as the threshold event
				2080	since it is the primary sort event. The threshold can be adjusted with
				2081	the <code>--threshold</code> option.</li><p>
				2082
				2083	<li>Chosen for annotation: names of files specified manually for annotation;
				2084	in this case none.</li><p>
				2085
				2086	<li>Auto-annotation: whether auto-annotation was requested via the
				2087	<code>--auto=yes</code> option. In this case no.</li><p>
				2088	</ul>
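The sort order and threshold described above can be sketched in Python. This
is a rough model under stated assumptions, not vg_annotate's implementation:
the function names are hypothetical, only two sort events are used, and the
threshold is assumed to be cumulative over the sorted list.

```python
# (file:function, Ir, I1mr) taken from a few rows of the sample output above.
rows = [
    ("concord.c:hash", 2_521_927, 2),
    ("getc.c:_IO_getc", 8_821_482, 5),
    ("../sysdeps/generic/lockfile.c:__funlockfile", 598_068, 0),
    ("../sysdeps/generic/lockfile.c:__flockfile", 598_068, 1),
    ("concord.c:get_word", 5_222_023, 4),
]


def sort_rows(rows):
    """Sort by Ir, highest first; ties are broken by I1mr, highest first."""
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


def apply_threshold(sorted_rows, percent):
    """Keep leading rows until they cover `percent` of the primary event.

    Assumption: the threshold is cumulative over the sorted list.
    """
    total = sum(r[1] for r in sorted_rows)
    shown, covered = [], 0
    for row in sorted_rows:
        if covered * 100 >= percent * total:
            break
        shown.append(row)
        covered += row[1]
    return shown


for name, ir, i1mr in apply_threshold(sort_rows(rows), 90):
    print(f"{ir:>10,} {i1mr:>3} {name}")
```

With these five rows and a 90% threshold, only the three largest functions are
printed; the two <code>lockfile.c</code> entries, which tie on
<code>Ir</code>, would be ordered by their <code>I1mr</code> counts.<p>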
				2089
				2090	Then follow summary statistics for the whole program. These are similar
				2091	to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
				2092
				2093	Then follow function-by-function statistics. Each function is identified by a
				2094	<code>file_name:function_name</code> pair. If a column contains only a
				2095	`.' it means the function never performs that event (eg. the third row shows
				2096	that <code>strcmp()</code> contains no instructions that write to memory). The
				2097	name <code>???</code> is used if the file name and/or function name could
				2098	not be determined from debugging information. (If most of the entries have the
				2099	form <code>???:???</code> the program probably wasn't compiled with
				2100	<code>-g</code>.)<p>
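If you want to post-process the function-by-function rows yourself, something
like the following Python sketch may help. It assumes the whitespace-separated
layout of the sample output and the nine events listed there; the name
<code>parse_row</code> is hypothetical.

```python
EVENTS = ["Ir", "I1mr", "I2mr", "Dr", "D1mr", "D2mr", "Dw", "D1mw", "D2mw"]


def parse_row(line, events=EVENTS):
    """Split one summary row into (file, function, {event: count or None}).

    A '.' column becomes None (the event cannot occur), which keeps it
    distinct from a count of 0 (the event can occur but did not).
    """
    fields = line.split()
    counts = {}
    for event, text in zip(events, fields):
        counts[event] = None if text == "." else int(text.replace(",", ""))
    location = " ".join(fields[len(events):])
    # Split at the first ':' so that C++ names containing '::' stay intact.
    file_name, _, function = location.partition(":")
    return file_name, function, counts


row = "2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp"
file_name, function, counts = parse_row(row)
print(file_name, function, counts["Ir"], counts["Dw"])  # vg_main.c strcmp 2649248 None
```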
				2101
				2102	It is worth noting that functions will come from three types of source files:
				2103	<ol>
				2104	<li>From the profiled program (<code>concord.c</code> in this example).</li>
				2105	<li>From libraries (eg. <code>getc.c</code>).</li>
				2106	<li>From Valgrind's implementation of some libc functions (eg.
				2107	<code>vg_clientmalloc.c:malloc</code>). These are recognisable because
				2108	the filename begins with <code>vg_</code>, and is probably one of
				2109	<code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
				2110	<code>vg_mylibc.c</code>.
				2111	</li>
				2112	</ol>
				2113
				2114	There are two ways to annotate source files -- by choosing them manually, or
				2115	with the <code>--auto=yes</code> option. To do it manually, just
				2116	specify the filenames as arguments to vg_annotate. For example,
				2117	running <code>vg_annotate concord.c</code> for our example produces the same
				2118	output as above, followed by an annotated version of <code>concord.c</code>, a
				2119	section of which looks like:
				2120
				2121	<pre>
				2122	--------------------------------------------------------------------------------
				2123	-- User-annotated source: concord.c
				2124	--------------------------------------------------------------------------------
				2125	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				2126
				2127	[snip]
				2128
				2129	. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
				2130	3 1 1 . . . 1 0 0 {
				2131	. . . . . . . . . FILE *file_ptr;
				2132	. . . . . . . . . Word_Info *data;
				2133	1 0 0 . . . 1 1 1 int line = 1, i;
				2134	. . . . . . . . .
				2135	5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
				2136	. . . . . . . . .
				2137	4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
				2138	3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
				2139	. . . . . . . . .
				2140	. . . . . . . . . /* Open file, check it. */
				2141	6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
				2142	2 0 0 1 0 0 . . . if (!(file_ptr)) {
				2143	. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
				2144	1 1 1 . . . . . . exit(EXIT_FAILURE);
				2145	. . . . . . . . . }
				2146	. . . . . . . . .
				2147	165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
				2148	146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
				2149	. . . . . . . . .
				2150	4 0 0 1 0 0 2 0 0 free(data);
				2151	4 0 0 1 0 0 2 0 0 fclose(file_ptr);
				2152	3 0 0 2 0 0 . . . }
				2153	</pre>
				2154
				2155	(Although column widths are automatically minimised, a wide terminal is clearly
				2156	useful.)<p>
				2157
				2158	Each source file is clearly marked (<code>User-annotated source</code>) as
				2159	having been chosen manually for annotation. If the file was found in one of
				2160	the directories specified with the <code>-I</code>/<code>--include</code>
				2161	option, the directory and file are both given.<p>
				2162
				2163	Each line is annotated with its event counts. Events not applicable for a line
				2164	are represented by a `.'; this is useful for distinguishing between an event
				2165	which cannot happen, and one which can but did not.<p>
				2166
				2167	Sometimes only a small section of a source file is executed. To minimise
				2168	uninteresting output, Valgrind only shows annotated lines and lines within a
				2169	small distance of annotated lines. Gaps are marked with the line numbers so
				2170	you know which part of a file the shown code comes from, eg:
				2171
				2172	<pre>
				2173	(figures and code for line 704)
				2174	-- line 704 ----------------------------------------
				2175	-- line 878 ----------------------------------------
				2176	(figures and code for line 878)
				2177	</pre>
				2178
				2179	The amount of context to show around annotated lines is controlled by the
				2180	<code>--context</code> option.<p>
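The effect of the context setting can be modelled in a few lines of Python.
This is a sketch of the idea only; the function names are hypothetical and the
annotated line numbers are made up so that the gap matches the example above.

```python
def lines_to_show(annotated, context, last_line):
    """Return sorted line numbers within `context` lines of an annotated line."""
    shown = set()
    for n in annotated:
        shown.update(range(max(1, n - context), min(last_line, n + context) + 1))
    return sorted(shown)


def find_gaps(shown):
    """Yield (end_of_previous_run, start_of_next_run) for each omitted stretch."""
    for prev, cur in zip(shown, shown[1:]):
        if cur != prev + 1:
            yield prev, cur


shown = lines_to_show(annotated=[700, 882], context=4, last_line=1000)
print(list(find_gaps(shown)))  # [(704, 878)]
```

A very large context value makes every run merge into one, so no gaps remain
and the whole file is shown.<p>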
				2181
				2182	To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
				2183	vg_annotate will automatically annotate every source file it can find that is
				2184	mentioned in the function-by-function summary. Therefore, the files chosen for
				2185	auto-annotation are affected by the <code>--sort</code> and
				2186	<code>--threshold</code> options. Each source file is clearly marked
				2187	(<code>Auto-annotated source</code>) as being chosen automatically. Any files
				2188	that could not be found are mentioned at the end of the output, eg:
				2189
				2190	<pre>
				2191	--------------------------------------------------------------------------------
				2192	The following files chosen for auto-annotation could not be found:
				2193	--------------------------------------------------------------------------------
				2194	getc.c
				2195	ctype.c
				2196	../sysdeps/generic/lockfile.c
				2197	</pre>
				2198
				2199	This is quite common for library files, since libraries are usually compiled
				2200	with debugging information, but the source files are often not present on a
				2201	system. If a file is chosen for annotation <b>both</b> manually and
				2202	automatically, it is marked as <code>User-annotated source</code>.
				2203
				2204	Use the <code>-I/--include</code> option to tell Valgrind where to look for
				2205	source files if the filenames found from the debugging information aren't
				2206	specific enough.
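<p>
The search for source files can be pictured with the Python sketch below. The
search order (the recorded name first, then each include directory in the
order given) is an assumption made for illustration, and the function names
are hypothetical.

```python
import os


def find_source(file_name, include_dirs):
    """Return the first existing path for `file_name`, or None.

    Assumed search order: the name as recorded, then each include
    directory in the order given.
    """
    candidates = [file_name] + [os.path.join(d, file_name) for d in include_dirs]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def split_found(file_names, include_dirs):
    """Partition names into (found paths, names that could not be found)."""
    found, missing = [], []
    for name in file_names:
        path = find_source(name, include_dirs)
        if path is None:
            missing.append(name)
        else:
            found.append(path)
    return found, missing


# The result depends on which files exist on your system.
print(split_found(["concord.c", "getc.c"], ["src"]))
```

The second list corresponds to the "could not be found" report shown
above.<p>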
				2207
				2208	Beware that vg_annotate can take some time to digest large
				2209	<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
				2210	auto-annotation can produce a lot of output if your program is large!
				2211
				2212
				2213	<h3>7.8  Annotating assembler programs</h3>
				2214	Valgrind can annotate assembler programs too, or annotate the assembler
				2215	generated for your C program. Sometimes this is useful for understanding what
				2216	is really happening when an interesting line of C code is translated into
				2217	multiple instructions.<p>
				2218
				2219	To do this, you just need to assemble your <code>.s</code> files with
				2220	assembler-level debug information. gcc doesn't do this, but you can use GNU as
				2221	with the <code>--gstabs</code> option to generate object files with this
				2222	information, eg:
				2223
				2224	<blockquote><code>as --gstabs foo.s</code></blockquote>
				2225
				2226	You can then profile and annotate source files in the same way as for C/C++
				2227	programs.
				2228
				2229
				2230	<h3>7.9  vg_annotate options</h3>
				2231	<ul>
				2232	<li><code>-h, --help</code></li><p>
				2233	<li><code>-v, --version</code><p>
				2234
				2235	Help and version, as usual.</li>
				2236
				2237	<li><code>--sort=A,B,C</code> [default: order in
				2238	<code>cachegrind.out</code>]<p>
				2239	Specifies the events upon which the sorting of the function-by-function
				2240	entries will be based. Useful if you want to concentrate on eg. I cache
				2241	misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
				2242	(<code>--sort=D1mr,D2mr</code>), or L2 misses
				2243	(<code>--sort=D2mr,I2mr</code>).</li><p>
				2244
				2245	<li><code>--show=A,B,C</code> [default: all, using order in
				2246	<code>cachegrind.out</code>]<p>
				2247	Specifies which events to show (and the column order). Default is to use
				2248	all present in the <code>cachegrind.out</code> file (and use the order in
				2249	the file).</li><p>
				2250
				2251	<li><code>--threshold=X</code> [default: 99%] <p>
				2252	Sets the threshold for the function-by-function summary. Functions are
				2253	shown that account for more than X% of all the primary sort events. If
				2254	auto-annotating, also affects which files are annotated.</li><p>
				2255
				2256	<li><code>--auto=no</code> [default]<br>
				2257	<code>--auto=yes</code> <p>
				2258	When enabled, automatically annotates every file that is mentioned in the
				2259	function-by-function summary that can be found. Also gives a list of
				2260	those that couldn't be found.</li><p>
				2261
				2262	<li><code>--context=N</code> [default: 8]<p>
				2263	Print N lines of context before and after each annotated line. Avoids
				2264	printing large sections of source files that were not executed. Use a
				2265	large number (eg. 10,000) to show all source lines.
				2266	</li><p>
				2267
				2268	<li><code>-I=<dir>, --include=<dir></code>
				2269	[default: empty string]<p>
				2270	Adds a directory to the list in which to search for files. Multiple
				2271	-I/--include options can be given to add multiple directories.</li><p>
				2272	</ul>
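The <code>A,B,C</code> event lists taken by <code>--sort</code> and
<code>--show</code> are simple comma-separated names. The Python sketch below
shows how such a list selects and orders columns. Rejecting unknown event
names is an assumption, and the function names are hypothetical.

```python
RECORDED = ["Ir", "I1mr", "I2mr", "Dr", "D1mr", "D2mr", "Dw", "D1mw", "D2mw"]


def parse_event_list(value, recorded=RECORDED):
    """Turn 'A,B,C' into a list of event names, rejecting unknown ones."""
    events = value.split(",")
    unknown = [e for e in events if e not in recorded]
    if unknown:
        raise ValueError(f"unknown events: {unknown}")
    return events


def select_columns(counts, shown):
    """Return the counts for the shown events, in the shown order."""
    return [counts[e] for e in shown]


# PROGRAM TOTALS from the sample output, keyed by event name.
totals = dict(zip(RECORDED, [27742716, 276, 275, 10955517, 21905, 3987,
                             4474773, 19280, 19098]))
shown = parse_event_list("D1mr,D2mr,Ir")
print(shown, select_columns(totals, shown))
```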
				2273
				2274
				2275	<h3>7.10  Warnings</h3>
				2276	There are a couple of situations in which vg_annotate issues warnings.
				2277
				2278	<ul>
				2279	<li>If a source file is more recent than the <code>cachegrind.out</code>
				2280	file. This is because the information in <code>cachegrind.out</code> is
				2281	only recorded with line numbers, so if the line numbers change at all in
				2282	the source (eg. lines added, deleted, swapped), any annotations will be
				2283	incorrect.<p>
				2284
				2285	<li>If information is recorded about line numbers past the end of a file.
				2286	This can be caused by the above problem, ie. shortening the source file
				2287	while using an old <code>cachegrind.out</code> file. If this happens,
				2288	the figures for the bogus lines are printed anyway (clearly marked as
				2289	bogus) in case they are important.</li><p>
				2290	</ul>
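Both checks are easy to reproduce. The Python sketch below compares file
modification times and looks for recorded line numbers beyond the end of a
file. It is an illustration under the assumption that modification times are
what is compared; the function names are hypothetical.

```python
import os


def stale_sources(source_paths, profile_path="cachegrind.out"):
    """Return the sources modified more recently than the profile file."""
    profile_time = os.path.getmtime(profile_path)
    return [p for p in source_paths if os.path.getmtime(p) > profile_time]


def lines_past_end(recorded_line_numbers, source_path):
    """Return recorded line numbers that lie beyond the end of the file."""
    with open(source_path) as f:
        line_count = sum(1 for _ in f)
    return sorted(n for n in recorded_line_numbers if n > line_count)
```

For example, <code>stale_sources(["concord.c"])</code> would return a non-empty
list if <code>concord.c</code> had been edited after the profiling run.<p>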
				2291
				2292
				2293	<h3>7.10  Things to watch out for</h3>
				2294	Some odd things that can occur during annotation:
				2295
				2296	<ul>
				2297	<li>If annotating at the assembler level, you might see something like this:
				2298
				2299	<pre>
				2300	1 0 0 . . . . . . leal -12(%ebp),%eax
				2301	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				2302	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				2303	. . . . . . . . . .align 4,0x90
				2304	1 0 0 . . . . . . movl $.LnrB,%eax
				2305	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)
				2306	</pre>
				2307
				2308	How can the third instruction be executed twice when the others are
				2309	executed only once? As it turns out, it isn't. Here's a dump of the
				2310	executable, from objdump:
				2311
				2312	<pre>
				2313	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				2314	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				2315	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				2316	8048f32: 89 f6 mov %esi,%esi
				2317	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				2318	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)
				2319	</pre>
				2320
				2321	Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
				2322	come from? The GNU assembler inserted it to serve as the two bytes of
				2323	padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
				2324	a four-byte boundary, but pretended it didn't exist when adding debug
				2325	information. Thus when Valgrind reads the debug info it thinks that the
				2326	<code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
				2327	range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
				2328	<code>mov %esi,%esi</code> to it.<p>
				2329	</li>
				2330
				2331	<li>
				2332	Inlined functions can cause strange results in the function-by-function
				2333	summary. If a function <code>inline_me()</code> is defined in
				2334	<code>foo.h</code> and inlined in the functions <code>f1()</code>,
				2335	<code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
				2336	not be a <code>foo.h:inline_me()</code> function entry. Instead, there
				2337	will be separate function entries for each inlining site, ie.
				2338	<code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
				2339	<code>foo.h:f3()</code>. To find the total counts for
				2340	<code>foo.h:inline_me()</code>, add up the counts from each entry.<p>
				2341
				2342	The reason for this is that although the debug info output by gcc
				2343	indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
				2344	doesn't indicate the name of the function in <code>foo.h</code>, so
				2345	Valgrind keeps using the old one.<p>
				2346
				2347	<li>
				2348	Sometimes, the same filename might be represented with a relative name
				2349	and with an absolute name in different parts of the debug info, eg:
				2350	<code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
				2351	case, if you use auto-annotation, the file will be annotated twice with
				2352	the counts split between the two.<p>
				2353	</li>
				2354	</ul>
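The first oddity above, the padding instruction, can be reproduced with a small
model of how addresses are attributed to source lines. The Python sketch below
assumes that debug info records only the start address of each source line and
that an address belongs to the closest preceding entry; the table and function
names are hypothetical.

```python
import bisect

# (start address, source line text) pairs, as debug info might record them.
# The padding instruction at 0x8048f32 has no entry of its own.
line_table = [
    (0x8048F25, "leal -12(%ebp),%eax"),
    (0x8048F28, "movl %eax,84(%ebx)"),
    (0x8048F2B, "movl $1,-20(%ebp)"),
    (0x8048F34, "movl $.LnrB,%eax"),
    (0x8048F39, "movl %eax,-16(%ebp)"),
]
starts = [addr for addr, _ in line_table]


def line_for(addr):
    """Attribute `addr` to the last entry that starts at or before it."""
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address precedes the first entry")
    return line_table[i][1]


# Instruction addresses actually executed, including the padding mov.
executed = [0x8048F25, 0x8048F28, 0x8048F2B, 0x8048F32, 0x8048F34, 0x8048F39]
counts = {}
for addr in executed:
    text = line_for(addr)
    counts[text] = counts.get(text, 0) + 1
for _, text in line_table:
    print(counts[text], text)
```

The third line ends up with a count of 2, matching the <code>Ir</code> column
in the annotated listing.<p>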
				2355
				2356	Note: stabs is not an easy format to read. If you come across bizarre
				2357	annotations that look like they might be caused by a bug in the stabs reader,
				2358	please let us know.
				2359
				2360
				2361	<h3>7.11  Accuracy</h3>
				2362	Valgrind's cache profiling has a number of shortcomings:
				2363
				2364	<ul>
				2365	<li>It doesn't account for kernel activity -- the effect of system calls on
				2366	the cache contents is ignored.</li><p>
				2367
				2368	<li>It doesn't account for other process activity (although this is probably
				2369	desirable when considering a single program).</li><p>
				2370
				2371	<li>It doesn't account for virtual-to-physical address mappings; hence the
				2372	entire simulation is not a true representation of what's happening in the
				2373	cache.</li><p>
				2374
				2375	<li>It doesn't account for cache misses not visible at the instruction level,
				2376	eg. those arising from TLB misses, or speculative execution.</li><p>
				2377	</ul>
				2378
				2379	Another thing worth noting is that results are very sensitive. Changing the
				2380	size of the <code>valgrind.so</code> file, the size of the program being
				2381	profiled, or even the length of its name can perturb the results. Variations
				2382	will be small, but don't expect perfectly repeatable results if your program
				2383	changes at all.<p>
				2384
				2385	While these factors mean you shouldn't trust the results to be super-accurate,
				2386	hopefully they should be close enough to be useful.<p>
				2387
				2388
				2389	<h3>7.12  Todo</h3>
				2390	<ul>
				2391	<li>Use CPUID instruction to auto-identify cache configuration during
				2392	installation. This would save the user from having to know their cache
				2393	configuration and using vg_cachegen.</li><p>
				2394	<li>Program start-up/shut-down calls a lot of functions that aren't
				2395	interesting and just complicate the output. Would be nice to exclude
				2396	these somehow.</li><p>
				2397	</ul>
				2398	<hr width="100%">
sewardj	de4a1d0	2002-03-22 01:27:54 +0000	[diff] [blame]	2399	</body>
				2400	</html>
njn	4f9c934	2002-04-29 16:03:24 +0000	[diff] [blame^]	2401