<html>
  <head>
    <style type="text/css">
      body      { background-color: #ffffff;
                  color:            #000000;
                  font-family:      Times, Helvetica, Arial;
                  font-size:        14pt}
      h4        { margin-bottom:    0.3em}
      code      { color:            #000000;
                  font-family:      Courier;
                  font-size:        13pt }
      pre       { color:            #000000;
                  font-family:      Courier;
                  font-size:        13pt }
      a:link    { color:            #0000C0;
                  text-decoration:  none; }
      a:visited { color:            #0000C0;
                  text-decoration:  none; }
      a:active  { color:            #0000C0;
                  text-decoration:  none; }
    </style>
  </head>

<body bgcolor="#ffffff">

<a name="title">&nbsp;</a>
<h1 align=center>Valgrind, snapshot 20020324</h1>
<center>This manual was minimally updated on 20020415</center>
<p>

<center>
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
Copyright &copy; 2000-2002 Julian Seward
<p>
Valgrind is licensed under the GNU General Public License,
version 2<br>
An open-source tool for finding memory-management problems in
Linux-x86 executables.
</center>

<p>

<hr width="100%">
<a name="contents"></a>
<h2>Contents of this manual</h2>

<h4>1&nbsp; <a href="#intro">Introduction</a></h4>
    1.1&nbsp; <a href="#whatfor">What Valgrind is for</a><br>
    1.2&nbsp; <a href="#whatdoes">What it does with your program</a>

<h4>2&nbsp; <a href="#howtouse">How to use it, and how to make sense
    of the results</a></h4>
    2.1&nbsp; <a href="#starta">Getting started</a><br>
    2.2&nbsp; <a href="#comment">The commentary</a><br>
    2.3&nbsp; <a href="#report">Reporting of errors</a><br>
    2.4&nbsp; <a href="#suppress">Suppressing errors</a><br>
    2.5&nbsp; <a href="#flags">Command-line flags</a><br>
    2.6&nbsp; <a href="#errormsgs">Explanation of error messages</a><br>
    2.7&nbsp; <a href="#suppfiles">Writing suppressions files</a><br>
    2.8&nbsp; <a href="#install">Building and installing</a><br>
    2.9&nbsp; <a href="#problems">If you have problems</a><br>

<h4>3&nbsp; <a href="#machine">Details of the checking machinery</a></h4>
    3.1&nbsp; <a href="#vvalue">Valid-value (V) bits</a><br>
    3.2&nbsp; <a href="#vaddress">Valid-address (A)&nbsp;bits</a><br>
    3.3&nbsp; <a href="#together">Putting it all together</a><br>
    3.4&nbsp; <a href="#signals">Signals</a><br>
    3.5&nbsp; <a href="#leaks">Memory leak detection</a><br>

<h4>4&nbsp; <a href="#limits">Limitations</a></h4>

<h4>5&nbsp; <a href="#howitworks">How it works -- a rough overview</a></h4>
    5.1&nbsp; <a href="#startb">Getting started</a><br>
    5.2&nbsp; <a href="#engine">The translation/instrumentation engine</a><br>
    5.3&nbsp; <a href="#track">Tracking the status of memory</a><br>
    5.4&nbsp; <a href="#sys_calls">System calls</a><br>
    5.5&nbsp; <a href="#sys_signals">Signals</a><br>

<h4>6&nbsp; <a href="#example">An example</a></h4>

<h4>7&nbsp; <a href="#cache">Cache profiling</a></h4>

<h4>8&nbsp; <a href="techdocs.html">The design and implementation of Valgrind</a></h4>

<hr width="100%">

<a name="intro"></a>
<h2>1&nbsp; Introduction</h2>

<a name="whatfor"></a>
<h3>1.1&nbsp; What Valgrind is for</h3>

Valgrind is a tool to help you find memory-management problems in your
programs.  When a program is run under Valgrind's supervision, all
reads and writes of memory are checked, and calls to
malloc/new/free/delete are intercepted.  As a result, Valgrind can
detect problems such as:
<ul>
  <li>Use of uninitialised memory</li>
  <li>Reading/writing memory after it has been free'd</li>
  <li>Reading/writing off the end of malloc'd blocks</li>
  <li>Reading/writing inappropriate areas on the stack</li>
  <li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li>
</ul>

Problems like these can be difficult to find by other means, often
lying undetected for long periods, then causing occasional,
difficult-to-diagnose crashes.

<p>
Valgrind is closely tied to details of the CPU and operating system
and, to a lesser extent, the compiler and basic C libraries.  This
makes it difficult to port, so I have chosen at the outset to
concentrate on what I believe to be a widely used platform: Red Hat
Linux 7.2, on x86s.  I believe that it will work without significant
difficulty on other x86 GNU/Linux systems which use the 2.4 kernel and
GNU libc 2.2.X, for example SuSE 7.1 and Mandrake 8.0.  It has also
worked in the past on Red Hat 7.1 and 6.2 (the latter thanks to the
efforts of Rob Noble), and probably still does.  Note, however, that I
haven't compiled it on either of those for a while, so they may no
longer work.
<p>
(Early Feb 02: after feedback from the KDE people it also works better
on other Linuxes).

<p>
Valgrind is licensed under the GNU General Public License, version
2.  Read the file LICENSE in the source distribution for details.

<a name="whatdoes"></a>
<h3>1.2&nbsp; What it does with your program</h3>

Valgrind is designed to be as non-intrusive as possible.  It works
directly with existing executables.  You don't need to recompile,
relink, or otherwise modify the program to be checked.  Simply place
the word <code>valgrind</code> at the start of the command line
normally used to run the program.  So, for example, if you want to run
the command <code>ls -l</code> under Valgrind, simply issue the
command: <code>valgrind ls -l</code>.

<p>Valgrind takes control of your program before it starts.  Debugging
information is read from the executable and associated libraries, so
that error messages can be phrased in terms of source code
locations.  Your program is then run on a synthetic x86 CPU which
checks every memory access.  All detected errors are written to a
log.  When the program finishes, Valgrind searches for and reports on
leaked memory.

<p>You can run pretty much any dynamically linked ELF x86 executable
under Valgrind.  Programs run 25 to 50 times slower, and take a lot
more memory, than they usually would.  It works well enough to run
large programs.  For example, the Konqueror web browser from the KDE
Desktop Environment, version 2.1.1, runs slowly but usably under
Valgrind.

<p>Valgrind simulates every single instruction your program executes.
Because of this, it finds errors not only in your application but also
in all supporting dynamically-linked (.so-format) libraries, including
the GNU C library, the X client libraries, Qt (if you work with KDE),
and so on.  That often includes libraries, for example the GNU C
library, which contain memory access violations, but which you cannot
or do not want to fix.

<p>Rather than swamping you with errors in which you are not
interested, Valgrind allows you to selectively suppress errors, by
recording them in a suppressions file which is read when Valgrind
starts up.  As supplied, Valgrind comes with a suppressions file
designed to give reasonable behaviour on Red Hat 7.2 (also 7.1 and
6.2) when running text-only and simple X applications.

<p><a href="#example">Section 6</a> shows an example of use.
<p>
<hr width="100%">

<a name="howtouse"></a>
<h2>2&nbsp; How to use it, and how to make sense of the results</h2>

<a name="starta"></a>
<h3>2.1&nbsp; Getting started</h3>

First off, consider whether it might be beneficial to recompile your
application and supporting libraries with optimisation disabled and
debugging info enabled (the <code>-g</code> flag).  You don't have to
do this, but doing so helps Valgrind produce more accurate and less
confusing error reports.  Chances are you're set up like this already,
if you intended to debug your program with GNU gdb, or some other
debugger.

<p>Then just run your application, but place the word
<code>valgrind</code> in front of your usual command-line invocation.
Note that you should run the real (machine-code) executable here.  If
your application is started by, for example, a shell or perl script,
you'll need to modify it to invoke Valgrind on the real executables.
Running such scripts directly under Valgrind will result in you
getting error reports pertaining to <code>/bin/sh</code>,
<code>/usr/bin/perl</code>, or whatever interpreter you're using.
This almost certainly isn't what you want and can be hugely confusing.

<a name="comment"></a>
<h3>2.2&nbsp; The commentary</h3>

Valgrind writes a commentary, detailing error reports and other
significant events.  The commentary goes to standard error (file
descriptor 2) by default.  This may interfere with your program's own
use of that channel, so you can ask for it to be directed elsewhere
with the <code>--logfile-fd</code> flag described in
<a href="#flags">Section 2.5</a>.

<p>All lines in the commentary are of the following form:<br>
<pre>
  ==12345== some-message-from-Valgrind
</pre>
<p>The <code>12345</code> is the process ID.  This scheme makes it easy
to distinguish program output from Valgrind commentary, and also easy
to differentiate commentaries from different processes which have
become merged together, for whatever reason.
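<p>As a rough illustration of how the <code>==pid==</code> prefix can
be used to separate commentary from program output, here is a small C
sketch.  The function name <code>commentary_pid</code> is hypothetical
and nothing like it ships with Valgrind; it simply assumes the line
format shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the process ID if `line` looks like Valgrind commentary
   ("==<pid>== message"), or -1 if it looks like ordinary program
   output.  Leading spaces are skipped.  Hypothetical helper written
   for this manual, not part of Valgrind. */
long commentary_pid(const char *line)
{
    assert(line != NULL);

    while (*line == ' ')
        line++;

    /* The prefix must open with "==" followed by at least one digit. */
    if (strncmp(line, "==", 2) != 0)
        return -1;
    const char *digits = line + 2;
    if (*digits < '0' || *digits > '9')
        return -1;

    /* Parse the process ID and insist on the closing "==". */
    char *end;
    long pid = strtol(digits, &end, 10);
    if (strncmp(end, "==", 2) != 0)
        return -1;
    return pid;
}
```

A log post-processor could call this on each line and group the lines
by the returned process ID, which is one way to untangle commentaries
from several processes that have been merged into one file.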

<p>By default, Valgrind writes only essential messages to the commentary,
so as to avoid flooding you with information of secondary importance.
If you want more information about what is happening, re-run, passing
the <code>-v</code> flag to Valgrind.


<a name="report"></a>
<h3>2.3&nbsp; Reporting of errors</h3>

When Valgrind detects something bad happening in the program, an error
message is written to the commentary.  For example:<br>
<pre>
  ==25832== Invalid read of size 4
  ==25832==    at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45)
  ==25832==    by 0x80487AF: main (bogon.cpp:66)
  ==25832==    by 0x40371E5E: __libc_start_main (libc-start.c:129)
  ==25832==    by 0x80485D1: (within /home/sewardj/newmat10/bogon)
  ==25832==    Address 0xBFFFF74C is not stack'd, malloc'd or free'd
</pre>

<p>This message says that the program did an illegal 4-byte read of
address 0xBFFFF74C, which, as far as Valgrind can tell, is not a valid
stack address, nor corresponds to any currently malloc'd or free'd
block.  The read is happening at line 45 of <code>bogon.cpp</code>,
called from line 66 of the same file, etc.  For errors associated with
an identified malloc'd/free'd block, for example reading free'd memory,
Valgrind reports not only the location where the error happened, but
also where the associated block was malloc'd/free'd.

<p>Valgrind remembers all error reports.  When an error is detected,
it is compared against old reports, to see if it is a duplicate.  If
so, the error is noted, but no further commentary is emitted.  This
saves you from being swamped with bazillions of duplicate error reports.
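<p>The duplicate check can be pictured with the following C sketch.  It
relies on the statement in <a href="#flags">Section 2.5</a> that errors
are matched using only the top three function locations.  Everything
else here is an assumption made for illustration: the
<code>ErrorContext</code> type, the function names, and the choice to
compare the error kind as a string are hypothetical and are not taken
from Valgrind's sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FRAMES   4   /* default for --num-callers, per Section 2.5 */
#define MATCH_FRAMES 3   /* only the top three locations are compared  */

/* Hypothetical record of one reported error. */
typedef struct {
    const char   *kind;                /* e.g. "Invalid read of size 4" */
    unsigned long frames[MAX_FRAMES];  /* code addresses, innermost first */
    unsigned      count;               /* how many times it was seen */
} ErrorContext;

/* Two errors count as duplicates when they have the same kind and the
   same top MATCH_FRAMES code addresses (an assumed matching rule). */
bool same_error(const ErrorContext *a, const ErrorContext *b)
{
    assert(a != NULL && b != NULL);
    if (strcmp(a->kind, b->kind) != 0)
        return false;
    for (int i = 0; i < MATCH_FRAMES; i++)
        if (a->frames[i] != b->frames[i])
            return false;
    return true;
}

/* Records `e` in `seen` (capacity `cap`, `*n` entries in use).
   Returns true if the error is new and should be printed, false if it
   only bumped the occurrence count of an earlier report. */
bool record_error(ErrorContext *seen, unsigned *n, unsigned cap,
                  const ErrorContext *e)
{
    assert(seen != NULL && n != NULL && e != NULL);
    for (unsigned i = 0; i < *n; i++) {
        if (same_error(&seen[i], e)) {
            seen[i].count++;
            return false;
        }
    }
    if (*n < cap) {
        seen[*n] = *e;
        seen[*n].count = 1;
        (*n)++;
    }
    return true;
}
```

In this model the per-error <code>count</code> field is what an
end-of-run summary sorted by occurrence count would be built from.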

<p>If you want to know how many times each error occurred, run with
the <code>-v</code> option.  When execution finishes, all the reports
are printed out, along with, and sorted by, their occurrence counts.
This makes it easy to see which errors have occurred most frequently.

<p>Errors are reported before the associated operation actually
happens.  For example, if your program decides to read from address
zero, Valgrind will emit a message to this effect, and the program
will then duly die with a segmentation fault.

<p>In general, you should try to fix errors in the order that they
are reported.  Not doing so can be confusing.  For example, a program
which copies uninitialised values to several memory locations, and
later uses them, will generate several error messages.  The first such
error message may well give the most direct clue to the root cause of
the problem.

<a name="suppress"></a>
<h3>2.4&nbsp; Suppressing errors</h3>

Valgrind detects numerous problems in the base libraries, such as the
GNU C library and the XFree86 client libraries, which come
pre-installed on your GNU/Linux system.  You can't easily fix these,
but you don't want to see these errors (and yes, there are many!).  So
Valgrind reads a list of errors to suppress at startup.  By default
this file is <code>redhat72.supp</code>, located in the Valgrind
installation directory.

<p>You can modify and add to the suppressions file at your leisure, or
write your own.  Multiple suppression files are allowed.  This is
useful if part of your project contains errors you can't or don't want
to fix, yet you don't want to be continually reminded of them.

<p>Each error to be suppressed is described very specifically, to
minimise the possibility that a suppression-directive inadvertently
suppresses a bunch of similar errors which you did want to see.  The
suppression mechanism is designed to allow precise yet flexible
specification of errors to suppress.

<p>If you use the <code>-v</code> flag, at the end of execution, Valgrind
prints out one line for each used suppression, giving its name and the
number of times it got used.  Here are the suppressions used by a run of
<code>ls -l</code>:
<pre>
  --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r
  --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r
  --27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object
</pre>

<a name="flags"></a>
<h3>2.5&nbsp; Command-line flags</h3>

You invoke Valgrind like this:
<pre>
  valgrind [options-for-Valgrind] your-prog [options for your-prog]
</pre>

<p>Valgrind's default settings succeed in giving reasonable behaviour
in most cases.  Available options, in no particular order, are as
follows:
<ul>
  <li><code>--help</code></li><br>

  <li><code>--version</code><br>
      <p>The usual deal.</li><br><p>

  <li><code>-v --verbose</code><br>
      <p>Be more verbose.  Gives extra information on various aspects
      of your program, such as: the shared objects loaded, the
      suppressions used, the progress of the instrumentation engine,
      and warnings about unusual behaviour.
      </li><br><p>

  <li><code>-q --quiet</code><br>
      <p>Run silently, and only print error messages.  Useful if you
      are running regression tests or have some other automated test
      machinery.
      </li><br><p>

  <li><code>--demangle=no</code><br>
      <code>--demangle=yes</code> [the default]
      <p>Disable/enable automatic demangling (decoding) of C++ names.
      Enabled by default.  When enabled, Valgrind will attempt to
      translate encoded C++ procedure names back to something
      approaching the original.  The demangler handles symbols mangled
      by g++ versions 2.X and 3.X.

      <p>An important fact about demangling is that function
      names mentioned in suppressions files should be in their mangled
      form.  Valgrind does not demangle function names when searching
      for applicable suppressions, because to do otherwise would make
      suppressions file contents dependent on the state of Valgrind's
      demangling machinery, and would also be slow and pointless.
      </li><br><p>

  <li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
      <p>By default, Valgrind shows four levels of function call names
      to help you identify program locations.  You can change that
      number with this option.  This can help in determining the
      program's location in deeply-nested call chains.  Note that
      duplicate errors are identified using only the top three
      function locations (the place in the current function, and that
      of its two immediate callers), so this setting doesn't affect
      the total number of errors reported.
      <p>
      The maximum value for this is 50.  Note that higher settings
      will make Valgrind run a bit more slowly and take a bit more
      memory, but can be useful when working with programs with
      deeply-nested call chains.
      </li><br><p>

  <li><code>--gdb-attach=no</code> [the default]<br>
      <code>--gdb-attach=yes</code>
      <p>When enabled, Valgrind will pause after every error shown,
      and print the line
      <br>
      <code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code>
      <p>
      Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code>
      or <code>n</code> <code>Ret</code>, causes Valgrind not to
      start GDB for this error.
      <p>
      <code>Y</code> <code>Ret</code>
      or <code>y</code> <code>Ret</code> causes Valgrind to
      start GDB, for the program at this point.  When you have
      finished with GDB, quit from it, and the program will continue.
      Trying to continue from inside GDB doesn't work.
      <p>
      <code>C</code> <code>Ret</code>
      or <code>c</code> <code>Ret</code> causes Valgrind not to
      start GDB, and not to ask again.
      <p>
      <code>--gdb-attach=yes</code> conflicts with
      <code>--trace-children=yes</code>.  You can't use them
      together.  Valgrind refuses to start up in this situation.
      </li><br><p>

  <li><code>--partial-loads-ok=yes</code> [the default]<br>
      <code>--partial-loads-ok=no</code>
      <p>Controls how Valgrind handles word (4-byte) loads from
      addresses for which some bytes are addressable and others
      are not.  When <code>yes</code> (the default), such loads
      do not elicit an address error.  Instead, the loaded V bytes
      corresponding to the illegal addresses are marked as undefined,
      and those corresponding to legal addresses are loaded from
      shadow memory, as usual.
      <p>
      When <code>no</code>, loads from partially
      invalid addresses are treated the same as loads from completely
      invalid addresses: an illegal-address error is issued,
      and the resulting V bytes indicate valid data.
      </li><br><p>

  <li><code>--sloppy-malloc=no</code> [the default]<br>
      <code>--sloppy-malloc=yes</code>
      <p>When enabled, all requests for malloc/calloc are rounded up
      to a whole number of machine words -- in other words, made
      divisible by 4.  For example, a request for 17 bytes of space
      would result in a 20-byte area being made available.  This works
      around bugs in sloppy libraries which assume that they can
      safely rely on malloc/calloc requests being rounded up in this
      fashion.  Without the workaround, these libraries tend to
      generate large numbers of errors when they access the ends of
      these areas.  Valgrind snapshots dated 17 Feb 2002 and later are
      cleverer about this problem, and you should no longer need to
      use this flag.
      </li><br><p>

  <li><code>--trace-children=no</code> [the default]<br>
      <code>--trace-children=yes</code>
      <p>When enabled, Valgrind will trace into child processes.  This
      is confusing and usually not what you want, so is disabled by
      default.</li><br><p>

  <li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
      <p>When the client program releases memory using free (in C) or
      delete (C++), that memory is not immediately made available for
      re-allocation.  Instead it is marked inaccessible and placed in
      a queue of freed blocks.  The purpose is to delay the point at
      which freed-up memory comes back into circulation.  This
      increases the chance that Valgrind will be able to detect
      invalid accesses to blocks for some significant period of time
      after they have been freed.
      <p>
      This flag specifies the maximum total size, in bytes, of the
      blocks in the queue.  The default value is one million bytes.
      Increasing this increases the total amount of memory used by
      Valgrind but may detect invalid uses of freed blocks which would
      otherwise go undetected.</li><br><p>

  <li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
      <p>Specifies the file descriptor on which Valgrind communicates
      all of its messages.  The default, 2, is the standard error
      channel.  This may interfere with the client's own use of
      stderr.  To dump Valgrind's commentary in a file without using
      stderr, something like the following works well (sh/bash
      syntax):<br>
      <code>&nbsp;&nbsp;
      valgrind --logfile-fd=9 my_prog 9&gt; logfile</code><br>
      That is: tell Valgrind to send all output to file descriptor 9,
      and ask the shell to route file descriptor 9 to "logfile".
      </li><br><p>

  <li><code>--suppressions=&lt;filename&gt;</code> [default:
      /installation/directory/redhat72.supp] <p>Specifies an extra
      file from which to read descriptions of errors to suppress.  You
      may use as many extra suppressions files as you
      like.</li><br><p>

  <li><code>--leak-check=no</code> [default]<br>
      <code>--leak-check=yes</code>
      <p>When enabled, search for memory leaks when the client program
      finishes.  A memory leak means a malloc'd block which has not
      yet been free'd, but to which no pointer can be found.  Such a
      block can never be free'd by the program, since no pointer to it
      exists.  Leak checking is disabled by default
      because it tends to generate dozens of error messages.
      </li><br><p>

  <li><code>--show-reachable=no</code> [default]<br>
      <code>--show-reachable=yes</code> <p>When disabled, the memory
      leak detector only shows blocks to which it cannot find any
      pointer at all, or to which it can only find a pointer into the
      middle.  These blocks are prime candidates for memory leaks.
      When enabled, the leak detector also reports on blocks to which
      it could find a pointer.  Your program could, at least in
      principle, have freed such blocks before exit.  Contrast this
      with blocks for which no pointer, or only an interior pointer,
      could be found: they are more likely to indicate memory leaks,
      because you do not actually have a pointer to the start of the
      block which you can hand to free(), even if you wanted to.
      </li><br><p>

  <li><code>--leak-resolution=low</code> [default]<br>
      <code>--leak-resolution=med</code> <br>
      <code>--leak-resolution=high</code>
      <p>When doing leak checking, determines how willing Valgrind is
      to consider different backtraces the same.  When set to
      <code>low</code>, the default, only the first two entries need
      match.  When <code>med</code>, four entries have to match.  When
      <code>high</code>, all entries need to match.
      <p>
      For hardcore leak debugging, you probably want to use
      <code>--leak-resolution=high</code> together with
      <code>--num-callers=40</code> or some such large number.  Note
      however that this can give an overwhelming amount of
      information, which is why the defaults are 4 callers and
      low-resolution matching.
      <p>
      Note that the <code>--leak-resolution=</code> setting does not
      affect Valgrind's ability to find leaks.  It only changes how
      the results are presented to you.
      </li><br><p>

  <li><code>--workaround-gcc296-bugs=no</code> [default]<br>
      <code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
      Valgrind assumes that reads and writes some small distance below
      the stack pointer <code>%esp</code> are due to bugs in gcc 2.96,
      and does not report them.  The "small distance" is 256 bytes by
      default.  Note that gcc 2.96 is the default compiler on some
      popular Linux distributions (RedHat 7.X, Mandrake) and so you
      may well need to use this flag.  Do not use it if you do not
      have to, as it can cause real errors to be overlooked.  A better
      option is to use a gcc/g++ which works properly; 2.95.3 seems to
      be a good choice.
      <p>
      Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
      buggy, so you may need to issue this flag if you use 3.0.4.
      </li><br><p>

  <li><code>--cachesim=no</code> [default]<br>
      <code>--cachesim=yes</code>
      <p>When enabled, turns off memory checking, and turns on cache profiling.
      Cache profiling is described in detail in <a href="#cache">Section 7</a>.
      </li><p>
</ul>
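<p>To make the <code>--freelist-vol</code> description above more
concrete, here is a toy C model of a queue of freed blocks with a cap
on its total volume.  It is only a sketch of the idea as described in
this section.  The type <code>FreedQueue</code>, the function names and
the fixed slot count are all hypothetical, and the first-in-first-out
release order is an assumption rather than something this manual
states.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SLOTS 64          /* arbitrary capacity for this sketch */

/* Hypothetical model of the freed-block queue: a FIFO of block sizes. */
typedef struct {
    size_t sizes[QUEUE_SLOTS];  /* ring buffer of queued block sizes */
    size_t head;                /* index of the oldest queued block  */
    size_t len;                 /* number of queued blocks           */
    size_t volume;              /* total bytes currently queued      */
    size_t max_volume;          /* plays the role of --freelist-vol  */
} FreedQueue;

void queue_init(FreedQueue *q, size_t max_volume)
{
    assert(q != NULL);
    q->head = 0;
    q->len = 0;
    q->volume = 0;
    q->max_volume = max_volume;
}

/* Removes the oldest block from the queue and returns its size. */
static size_t release_oldest(FreedQueue *q)
{
    size_t size = q->sizes[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->len--;
    q->volume -= size;
    return size;
}

/* Queues a newly freed block of `size` bytes, then releases the oldest
   blocks until the queue is within its volume limit again.  Returns
   the number of bytes handed back for re-allocation. */
size_t queue_freed_block(FreedQueue *q, size_t size)
{
    assert(q != NULL);
    size_t released = 0;

    if (q->len == QUEUE_SLOTS)          /* make room for the new entry */
        released += release_oldest(q);

    q->sizes[(q->head + q->len) % QUEUE_SLOTS] = size;
    q->len++;
    q->volume += size;

    while (q->len > 0 && q->volume > q->max_volume)
        released += release_oldest(q);
    return released;
}
```

With a limit of 1000 bytes, freeing blocks of 400, 400 and 300 bytes in
that order pushes the first 400-byte block back into circulation on the
third call.  A larger limit keeps blocks quarantined for longer, at the
cost of more memory.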

There are also some options for debugging Valgrind itself.  You
shouldn't need to use them in the normal run of things.  Nevertheless:

<ul>

  <li><code>--single-step=no</code> [default]<br>
      <code>--single-step=yes</code>
      <p>When enabled, each x86 instruction is translated separately
      into instrumented code.  When disabled, translation is done on a
      per-basic-block basis, giving much better translations.</li><br>
      <p>

  <li><code>--optimise=no</code><br>
      <code>--optimise=yes</code> [default]
      <p>When enabled, various improvements are applied to the
      intermediate code, mainly aimed at allowing the simulated CPU's
      registers to be cached in the real CPU's registers over several
      simulated instructions.</li><br>
      <p>

  <li><code>--instrument=no</code><br>
      <code>--instrument=yes</code> [default]
      <p>When disabled, the translations don't actually contain any
      instrumentation.</li><br>
      <p>

  <li><code>--cleanup=no</code><br>
      <code>--cleanup=yes</code> [default]
      <p>When enabled, various improvements are applied to the
      post-instrumented intermediate code, aimed at removing redundant
      value checks.</li><br>
      <p>

  <li><code>--trace-syscalls=no</code> [default]<br>
      <code>--trace-syscalls=yes</code>
      <p>Enable/disable tracing of system call intercepts.</li><br>
      <p>

  <li><code>--trace-signals=no</code> [default]<br>
      <code>--trace-signals=yes</code>
      <p>Enable/disable tracing of signal handling.</li><br>
      <p>

  <li><code>--trace-sched=no</code> [default]<br>
      <code>--trace-sched=yes</code>
      <p>Enable/disable tracing of thread scheduling events.</li><br>
      <p>

  <li><code>--trace-pthread=none</code> [default]<br>
      <code>--trace-pthread=some</code> <br>
      <code>--trace-pthread=all</code>
      <p>Specifies amount of trace detail for pthread-related events.</li><br>
      <p>

  <li><code>--trace-symtab=no</code> [default]<br>
      <code>--trace-symtab=yes</code>
      <p>Enable/disable tracing of symbol table reading.</li><br>
      <p>

  <li><code>--trace-malloc=no</code> [default]<br>
      <code>--trace-malloc=yes</code>
      <p>Enable/disable tracing of malloc/free (et al) intercepts.
      </li><br>
      <p>

  <li><code>--stop-after=&lt;number&gt;</code>
      [default: infinity, more or less]
      <p>After &lt;number&gt; basic blocks have been executed, shut down
      Valgrind and switch back to running the client on the real CPU.
      </li><br>
      <p>

  <li><code>--dump-error=&lt;number&gt;</code>
      [default: inactive]
      <p>After the program has exited, show gory details of the
      translation of the basic block containing the &lt;number&gt;'th
      error context.  When used with <code>--single-step=yes</code>,
      can show the
      exact x86 instruction causing an error.</li><br>
      <p>

  <li><code>--smc-check=none</code><br>
      <code>--smc-check=some</code> [default]<br>
      <code>--smc-check=all</code>
      <p>How carefully should Valgrind check for self-modifying code
      writes, so that translations can be discarded?&nbsp; When
      "none", no writes are checked.  When "some", only writes
      resulting from moves from integer registers to memory are
      checked.  When "all", all memory writes are checked, even those
      with which no sane program would generate code -- for
      example, floating-point writes.</li>
</ul>


<a name="errormsgs"></a>
<h3>2.6&nbsp; Explanation of error messages</h3>

Despite considerable sophistication under the hood, Valgrind can only
really detect two kinds of errors: use of illegal addresses, and use
of undefined values.  Nevertheless, this is enough to help you
discover all sorts of memory-management nasties in your code.  This
section presents a quick summary of what error messages mean.  The
precise behaviour of the error-checking machinery is described in
<a href="#machine">Section 3</a>.


<h4>2.6.1&nbsp; Illegal read / Illegal write errors</h4>
For example:
<pre>
  ==30975== Invalid read of size 4
  ==30975==    at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
  ==30975==    by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
  ==30975==    by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
  ==30975==    by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
  ==30975==    Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
</pre>

<p>This happens when your program reads or writes memory at a place
which Valgrind reckons it shouldn't.  In this example, the program did
a 4-byte read at address 0xBFFFF0E0, somewhere within the
system-supplied library libpng.so.2.1.0.9, which was called from
somewhere else in the same library, called from line 326 of
qpngio.cpp, and so on.

<p>Valgrind tries to establish what the illegal address might relate
to, since that's often useful.  So, if it points into a block of
memory which has already been freed, you'll be informed of this, and
also where the block was free'd.  Likewise, if it should turn out
to be just off the end of a malloc'd block, a common result of
off-by-one errors in array subscripting, you'll be informed of this
fact, and also where the block was malloc'd.
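<p>The kind of address classification described above might be
sketched in C as follows.  This is an illustrative model only.  The
<code>Block</code> type, the <code>describe_address</code> function and
its <code>slop</code> parameter are hypothetical, and apart from the
"inside a block of size N free'd" and "not stack'd, malloc'd or free'd"
phrases, which are copied from examples in this manual, the message
wording is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical description of one heap block known to the checker. */
typedef struct {
    unsigned long start;   /* address of the first byte of the block */
    unsigned long size;    /* size of the block in bytes             */
    int           freed;   /* nonzero once the block has been free'd */
} Block;

/* Writes a description of `addr` relative to block `b` into `buf`.
   Returns 1 if the address is inside the block or within `slop` bytes
   of either end, and 0 otherwise (buf then holds a generic message). */
int describe_address(unsigned long addr, const Block *b,
                     unsigned long slop, char *buf, size_t buflen)
{
    assert(b != NULL && buf != NULL && buflen > 0);
    const char *state = b->freed ? "free'd" : "malloc'd";
    unsigned long end = b->start + b->size;   /* one past the last byte */

    if (addr >= b->start && addr < end) {
        snprintf(buf, buflen, "%lu bytes inside a block of size %lu %s",
                 addr - b->start, b->size, state);
        return 1;
    }
    if (addr >= end && addr - end < slop) {
        snprintf(buf, buflen, "%lu bytes after a block of size %lu %s",
                 addr - end, b->size, state);
        return 1;
    }
    if (addr < b->start && b->start - addr <= slop) {
        snprintf(buf, buflen, "%lu bytes before a block of size %lu %s",
                 b->start - addr, b->size, state);
        return 1;
    }
    snprintf(buf, buflen, "not stack'd, malloc'd or free'd");
    return 0;
}
```

For a 40-byte block holding ten 4-byte ints, an access to element 10
lands at offset 40, which this model reports as 0 bytes after the
block: the classic off-by-one case.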

<p>In this example, Valgrind can't identify the address.  Actually the
address is on the stack, but, for some reason, this is not a valid
stack address -- it is below the stack pointer, %esp, and that isn't
allowed.

<p>Note that Valgrind only tells you that your program is about to
access memory at an illegal address.  It can't stop the access from
happening.  So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
fate -- but you will get a message from Valgrind immediately prior to
this.  In this particular example, reading junk on the stack is
non-fatal, and the program stays alive.


<h4>2.6.2&nbsp; Use of uninitialised values</h4>
For example:
<pre>
  ==19146== Conditional jump or move depends on uninitialised value(s)
  ==19146==    at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
  ==19146==    by 0x402E8476: _IO_printf (printf.c:36)
  ==19146==    by 0x8048472: main (tests/manuel1.c:8)
  ==19146==    by 0x402A6E5E: __libc_start_main (libc-start.c:129)
</pre>

<p>An uninitialised-value use error is reported when your program uses
a value which hasn't been initialised -- in other words, is undefined.
Here, the undefined value is used somewhere inside the printf()
machinery of the C library.  This error was reported when running the
following small program:
<pre>
  #include &lt;stdio.h&gt;

  int main()
  {
    int x;   /* deliberately left uninitialised */
    printf ("x = %d\n", x);
  }
</pre>

<p>It is important to understand that your program can copy around
junk (uninitialised) data to its heart's content.  Valgrind observes
this and keeps track of the data, but does not complain.  A complaint
is issued only when your program attempts to make use of uninitialised
data.  In this example, x is uninitialised.  Valgrind observes the
value being passed to _IO_printf and thence to
_IO_vfprintf, but makes no comment.  However,
_IO_vfprintf has to examine the value of x
so it can turn it into the corresponding ASCII string, and it is at
this point that Valgrind complains.
706
707<p>Sources of uninitialised data tend to be:
708<ul>
709 <li>Local variables in procedures which have not been initialised,
710 as in the example above.</li><br><p>
711
712 <li>The contents of malloc'd blocks, before you write something
713 there. In C++, the new operator is a wrapper round malloc, so
714 if you create an object with new, its fields will be
715 uninitialised until you fill them in, which is only Right and
716 Proper.</li>
717</ul>
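
<p>As a hedged illustration of the two sources listed above, here is a
minimal sketch of code that avoids both: the malloc'd block is written
before it is read, and the local accumulator is initialised before use.
The helper names <code>make_defined_ints</code> and
<code>sum_ints</code> are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n ints and give every element a
   defined value, so later reads of the block are not reads of junk. */
int *make_defined_ints(size_t n, int fill)
{
    int *p = malloc(n * sizeof *p);   /* contents undefined at this point */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = fill;                  /* now every element is defined */
    return p;
}

/* Hypothetical helper: sum the block. The result is safe to branch on
   only because every element was written before being read. */
int sum_ints(const int *p, size_t n)
{
    assert(p != NULL);
    int total = 0;                    /* local initialised before use */
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

<p>Dropping the fill loop, or the <code>= 0</code> on
<code>total</code>, would leave junk data that is copied around quietly
and only complained about once something depends on it.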
718
719
720
721<h4>2.6.3&nbsp; Illegal frees</h4>
722For example:
723<pre>
724 ==7593== Invalid free()
725 ==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577)
726 ==7593== by 0x80484C7: main (tests/doublefree.c:10)
727 ==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
728 ==7593== by 0x80483B1: (within tests/doublefree)
729 ==7593== Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
730 ==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577)
731 ==7593== by 0x80484C7: main (tests/doublefree.c:10)
732 ==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
733 ==7593== by 0x80483B1: (within tests/doublefree)
734</pre>
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether or not the argument to
free/delete is legitimate. Here, this test program has
freed the same block twice. As with the illegal read/write errors,
Valgrind attempts to make sense of the address free'd. If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
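
<p>One common defensive idiom against this kind of duplicate free is to
reset the pointer once the block has been released. The sketch below is
only an illustration; <code>free_and_clear</code> is a hypothetical
helper, not something Valgrind provides.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free the block *pp points at and reset the
   pointer to NULL. Returns 1 if a block was freed, 0 if *pp was
   already NULL, so a repeated call through the same pointer variable
   cannot free a block twice. */
int free_and_clear(void **pp)
{
    assert(pp != NULL);
    if (*pp == NULL)
        return 0;
    free(*pp);
    *pp = NULL;
    return 1;
}
```

<p>This only protects against a second free through the same variable.
If another copy of the pointer exists elsewhere, a duplicate free is
still possible, and that is the case Valgrind's report helps with.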
742
743
744<h4>2.6.4&nbsp; Passing system call parameters with inadequate
745read/write permissions</h4>
746
Valgrind checks all parameters to system calls. If a system call
needs to read from a buffer provided by your program, Valgrind checks
that the entire buffer is addressable and has valid data, i.e., it is
readable. And if the system call needs to write to a user-supplied
buffer, Valgrind checks that the buffer is addressable. After the
system call, Valgrind updates its administrative information to
precisely reflect any changes in memory permissions caused by the
system call.
755
756<p>Here's an example of a system call with an invalid parameter:
757<pre>
758 #include &lt;stdlib.h>
759 #include &lt;unistd.h>
760 int main( void )
761 {
762 char* arr = malloc(10);
763 (void) write( 1 /* stdout */, arr, 10 );
764 return 0;
765 }
766</pre>
767
768<p>You get this complaint ...
769<pre>
770 ==8230== Syscall param write(buf) lacks read permissions
771 ==8230== at 0x4035E072: __libc_write
772 ==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
773 ==8230== by 0x80483B1: (within tests/badwrite)
774 ==8230== by &lt;bogus frame pointer> ???
775 ==8230== Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd
776 ==8230== at 0x4004FEE6: malloc (ut_clientmalloc.c:539)
777 ==8230== by 0x80484A0: main (tests/badwrite.c:6)
778 ==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129)
779 ==8230== by 0x80483B1: (within tests/badwrite)
780</pre>
781
782<p>... because the program has tried to write uninitialised junk from
783the malloc'd block to the standard output.
784
785
786<h4>2.6.5&nbsp; Warning messages you might see</h4>
787
788Most of these only appear if you run in verbose mode (enabled by
789<code>-v</code>):
790<ul>
791<li> <code>More than 50 errors detected. Subsequent errors
792 will still be recorded, but in less detail than before.</code>
793 <br>
794 After 50 different errors have been shown, Valgrind becomes
795 more conservative about collecting them. It then requires only
796 the program counters in the top two stack frames to match when
797 deciding whether or not two errors are really the same one.
798 Prior to this point, the PCs in the top four frames are required
799 to match. This hack has the effect of slowing down the
800 appearance of new errors after the first 50. The 50 constant can
801 be changed by recompiling Valgrind.
802<p>
803<li> <code>More than 500 errors detected. I'm not reporting any more.
804 Final error counts may be inaccurate. Go fix your
805 program!</code>
806 <br>
807 After 500 different errors have been detected, Valgrind ignores
808 any more. It seems unlikely that collecting even more different
809 ones would be of practical help to anybody, and it avoids the
810 danger that Valgrind spends more and more of its time comparing
811 new errors against an ever-growing collection. As above, the 500
812 number is a compile-time constant.
813<p>
814<li> <code>Warning: client exiting by calling exit(&lt;number>).
815 Bye!</code>
816 <br>
817 Your program has called the <code>exit</code> system call, which
818 will immediately terminate the process. You'll get no exit-time
819 error summaries or leak checks. Note that this is not the same
820 as your program calling the ANSI C function <code>exit()</code>
821 -- that causes a normal, controlled shutdown of Valgrind.
822<p>
823<li> <code>Warning: client switching stacks?</code>
824 <br>
825 Valgrind spotted such a large change in the stack pointer, %esp,
826 that it guesses the client is switching to a different stack.
827 At this point it makes a kludgey guess where the base of the new
828 stack is, and sets memory permissions accordingly. You may get
829 many bogus error messages following this, if Valgrind guesses
     wrong. At the moment "large change" is defined as a change of
     more than 2000000 in the value of the %esp (stack pointer)
     register.
833<p>
834<li> <code>Warning: client attempted to close Valgrind's logfile fd &lt;number>
835 </code>
836 <br>
837 Valgrind doesn't allow the client
838 to close the logfile, because you'd never see any diagnostic
839 information after that point. If you see this message,
840 you may want to use the <code>--logfile-fd=&lt;number></code>
841 option to specify a different logfile file-descriptor number.
842<p>
843<li> <code>Warning: noted but unhandled ioctl &lt;number></code>
844 <br>
845 Valgrind observed a call to one of the vast family of
846 <code>ioctl</code> system calls, but did not modify its
847 memory status info (because I have not yet got round to it).
848 The call will still have gone through, but you may get spurious
849 errors after this as a result of the non-update of the memory info.
850<p>
851<li> <code>Warning: unblocking signal &lt;number> due to
852 sigprocmask</code>
853 <br>
854 Really just a diagnostic from the signal simulation machinery.
855 This message will appear if your program handles a signal by
856 first <code>longjmp</code>ing out of the signal handler,
857 and then unblocking the signal with <code>sigprocmask</code>
858 -- a standard signal-handling idiom.
859<p>
860<li> <code>Warning: bad signal number &lt;number> in __NR_sigaction.</code>
861 <br>
862 Probably indicates a bug in the signal simulation machinery.
863<p>
864<li> <code>Warning: set address range perms: large range &lt;number></code>
865 <br>
866 Diagnostic message, mostly for my benefit, to do with memory
867 permissions.
868</ul>
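
<p>The matching rule behind the first two messages in the list above
might be sketched as follows. This is an assumption-laden illustration,
not Valgrind's actual code: the type <code>ErrorContext</code> and the
functions <code>same_error</code> and <code>frames_to_compare</code>
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 4   /* frames recorded per error in this sketch */

/* Hypothetical record: the program counters of the innermost frames. */
typedef struct {
    unsigned long pc[MAX_FRAMES];
} ErrorContext;

/* Two errors count as "the same" if their top n_frames PCs match. */
int same_error(const ErrorContext *a, const ErrorContext *b, size_t n_frames)
{
    assert(a != NULL && b != NULL && n_frames <= MAX_FRAMES);
    for (size_t i = 0; i < n_frames; i++)
        if (a->pc[i] != b->pc[i])
            return 0;
    return 1;
}

/* Number of frames to compare, given how many different errors have
   already been shown: four up to the threshold of 50, two afterwards. */
size_t frames_to_compare(size_t errors_shown)
{
    return errors_shown < 50 ? 4 : 2;
}
```

<p>Comparing fewer frames makes more errors look alike, which is why
new errors appear more slowly once the threshold has been passed.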
869
870
871<a name="suppfiles"></a>
872<h3>2.7&nbsp; Writing suppressions files</h3>
873
A suppression file describes a bunch of errors which, for one reason
or another, you don't want Valgrind to tell you about. Usually the
reason is that the system libraries are buggy but unfixable, at least
within the scope of the current debugging session. Multiple
suppressions files are allowed. By default, Valgrind uses
<code>linux24.supp</code> in the directory where it is installed.
880
881<p>
882You can ask to add suppressions from another file, by specifying
883<code>--suppressions=/path/to/file.supp</code>.
884
885<p>Each suppression has the following components:<br>
886<ul>
887
888 <li>Its name. This merely gives a handy name to the suppression, by
889 which it is referred to in the summary of used suppressions
890 printed out when a program finishes. It's not important what
891 the name is; any identifying string will do.
892 <p>
893
  <li>The nature of the error to suppress. Either:
      <code>Value1</code>,
      <code>Value2</code>,
      <code>Value4</code> or
      <code>Value8</code>,
      meaning an uninitialised-value error when
      using a value of 1, 2, 4 or 8 bytes respectively.
      Or
      <code>Cond</code> (or its old name, <code>Value0</code>),
      meaning use of an uninitialised CPU condition code. Or:
      <code>Addr1</code>,
      <code>Addr2</code>,
      <code>Addr4</code> or
      <code>Addr8</code>, meaning an invalid address during a
      memory access of 1, 2, 4 or 8 bytes respectively. Or
      <code>Param</code>,
      meaning an invalid system call parameter error. Or
      <code>Free</code>, meaning an invalid or mismatching free.</li><br>
912 <p>
913
  <li>The "immediate location" specification. For Value and Addr
      errors, this is either the name of the function in which the error
      occurred, or, failing that, the full path of the .so file
      containing the error location. For Param errors, it is the name of
      the offending system call parameter. For Free errors, it is the
      name of the function doing the freeing (e.g. <code>free</code>,
      <code>__builtin_vec_delete</code>, etc.)</li><br>
921 <p>
922
923 <li>The caller of the above "immediate location". Again, either a
924 function or shared-object name.</li><br>
925 <p>
926
927 <li>Optionally, one or two extra calling-function or object names,
928 for greater precision.</li>
929</ul>
930
931<p>
932Locations may be either names of shared objects or wildcards matching
933function names. They begin <code>obj:</code> and <code>fun:</code>
934respectively. Function and object names to match against may use the
935wildcard characters <code>*</code> and <code>?</code>.
936
<p>A suppression only suppresses an error when the error matches all the
details in the suppression. Here's an example:
<pre>
  {
    __gconv_transform_ascii_internal/__mbrtowc/mbtowc
    Value4
    fun:__gconv_transform_ascii_internal
    fun:__mbr*wc
    fun:mbtowc
  }
</pre>

<p>What this means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function of name matching <code>__mbr*wc</code>,
when that is called from
<code>mbtowc</code>. It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
<code>__gconv_transform_ascii_internal/__mbrtowc/mbtowc</code>.
957
958<p>Another example:
959<pre>
960 {
961 libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0
962 Value4
963 obj:/usr/X11R6/lib/libX11.so.6.2
964 obj:/usr/X11R6/lib/libX11.so.6.2
965 obj:/usr/X11R6/lib/libXaw.so.7.0
966 }
967</pre>
968
969<p>Suppress any size 4 uninitialised-value error which occurs anywhere
970in <code>libX11.so.6.2</code>, when called from anywhere in the same
971library, when called from anywhere in <code>libXaw.so.7.0</code>. The
972inexact specification of locations is regrettable, but is about all
973you can hope for, given that the X11 libraries shipped with Red Hat
9747.2 have had their symbol tables removed.
975
976<p>Note -- since the above two examples did not make it clear -- that
977you can freely mix the <code>obj:</code> and <code>fun:</code>
978styles of description within a single suppression record.
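
<p>To make the wildcard rules concrete, here is a minimal sketch of a
matcher in which <code>*</code> matches any run of characters
(including none) and <code>?</code> matches exactly one. The function
name <code>wildcard_match</code> is hypothetical, and it is an
assumption that Valgrind's own matcher treats the two characters in
exactly this way.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if name matches pattern, where '*' matches any run of
   characters (including none) and '?' matches exactly one character. */
int wildcard_match(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    if (*pattern == '\0')
        return *name == '\0';
    if (*pattern == '*') {
        /* Let '*' absorb zero, one, two, ... characters in turn. */
        for (const char *rest = name; ; rest++) {
            if (wildcard_match(pattern + 1, rest))
                return 1;
            if (*rest == '\0')
                return 0;
        }
    }
    if (*name == '\0')
        return 0;
    if (*pattern == '?' || *pattern == *name)
        return wildcard_match(pattern + 1, name + 1);
    return 0;
}
```

<p>The pattern must account for the whole name: a pattern that matches
only a prefix of the function or object name does not count as a match
in this sketch.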
979
980
981<a name="install"></a>
982<h3>2.8&nbsp; Building and installing</h3>
983At the moment, very rudimentary.
984
985<p>The tarball is set up for a standard Red Hat 7.1 (6.2) machine. To
986build, just do "make". No configure script, no autoconf, no nothing.
987
988<p>The files needed for installation are: valgrind.so, valgring.so,
989valgrind, VERSION, redhat72.supp (or redhat62.supp). You can copy
990these to any directory you like. However, you then need to edit the
991shell script "valgrind". On line 4, set the environment variable
992<code>VALGRIND</code> to point to the directory you have copied the
993installation into.
994
995
<a name="install"></a>
997<h3>2.9&nbsp; The Client Request mechanism</h3>
998
999Valgrind has a trapdoor mechanism via which the client program can
1000pass all manner of requests and queries to Valgrind. Internally, this
1001is used extensively to make malloc, free, signals, etc, work, although
1002you don't see that.
1003<p>
1004For your convenience, a subset of these so-called client requests is
1005provided to allow you to tell Valgrind facts about the behaviour of
1006your program, and conversely to make queries. In particular, your
1007program can tell Valgrind about changes in memory range permissions
1008that Valgrind would not otherwise know about, and so allows clients to
1009get Valgrind to do arbitrary custom checks.
1010<p>
1011Clients need to include the header file <code>valgrind.h</code> to
1012make this work. The macros therein have the magical property that
1013they generate code in-line which Valgrind can spot. However, the code
1014does nothing when not run on Valgrind, so you are not forced to run
1015your program on Valgrind just because you use the macros in this file.
1016<p>
1017A brief description of the available macros:
1018<ul>
1019<li><code>VALGRIND_MAKE_NOACCESS</code>,
1020 <code>VALGRIND_MAKE_WRITABLE</code> and
1021 <code>VALGRIND_MAKE_READABLE</code>. These mark address
1022 ranges as completely inaccessible, accessible but containing
1023 undefined data, and accessible and containing defined data,
1024 respectively. Subsequent errors may have their faulting
1025 addresses described in terms of these blocks. Returns a
1026 "block handle". Returns zero when not run on Valgrind.
1027<p>
1028<li><code>VALGRIND_DISCARD</code>: At some point you may want
1029 Valgrind to stop reporting errors in terms of the blocks
1030 defined by the previous three macros. To do this, the above
1031 macros return a small-integer "block handle". You can pass
1032 this block handle to <code>VALGRIND_DISCARD</code>. After
1033 doing so, Valgrind will no longer be able to relate
1034 addressing errors to the user-defined block associated with
1035 the handle. The permissions settings associated with the
1036 handle remain in place; this just affects how errors are
1037 reported, not whether they are reported. Returns 1 for an
1038 invalid handle and 0 for a valid handle (although passing
1039 invalid handles is harmless). Always returns 0 when not run
1040 on Valgrind.
1041<p>
1042<li><code>VALGRIND_CHECK_NOACCESS</code>,
1043 <code>VALGRIND_CHECK_WRITABLE</code> and
1044 <code>VALGRIND_CHECK_READABLE</code>: check immediately
1045 whether or not the given address range has the relevant
1046 property, and if not, print an error message. Also, for the
1047 convenience of the client, returns zero if the relevant
1048 property holds; otherwise, the returned value is the address
1049 of the first byte for which the property is not true.
1050 Always returns 0 when not run on Valgrind.
1051<p>
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
    to find out whether Valgrind thinks a particular variable
    (lvalue, to be precise) is addressable and defined. Prints
    an error message if not. Returns no value.
1056<p>
<li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly
    experimental feature. Similarly to
    <code>VALGRIND_MAKE_NOACCESS</code>, this marks an address
    range as inaccessible, so that subsequent accesses to an
    address in the range give an error. However, this macro
    does not return a block handle. Instead, all annotations
    created like this are reviewed at each client
    <code>ret</code> (subroutine return) instruction, and those
    which now define an address range below the client's stack
    pointer register (<code>%esp</code>) are automatically
    deleted.
1068 <p>
1069 In other words, this macro allows the client to tell
1070 Valgrind about red-zones on its own stack. Valgrind
1071 automatically discards this information when the stack
1072 retreats past such blocks. Beware: hacky and flaky, and
1073 probably interacts badly with the new pthread support.
</li>
</ul>
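
<p>As a rough sketch of how a client might use these macros, consider a
trivial pool allocator that tells Valgrind which parts of its buffer
are currently handed out. Several things here are assumptions: the
<code>(address, length)</code> argument form of the macros, the build
switch <code>USE_VALGRIND_H</code>, and all of the <code>pool_*</code>
names. The stand-in definitions let the example build without the
header, and they return 0, which is what the descriptions above say the
real macros return when not run on Valgrind.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_VALGRIND_H            /* hypothetical build switch */
#include <valgrind.h>
#else
/* Stand-ins so the sketch builds without the header. The
   (address, length) argument form is an assumption. */
#define VALGRIND_MAKE_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_DISCARD(handle)          ((void)(handle), 0)
#endif

#define POOL_SIZE 1024

/* A trivial bump allocator over a static buffer. */
static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;
static int pool_handle = 0;

/* Mark the whole pool inaccessible until pieces are handed out. */
void pool_init(void)
{
    pool_used = 0;
    pool_handle = VALGRIND_MAKE_NOACCESS(pool, POOL_SIZE);
}

/* Hand out n bytes, marked accessible but undefined, like malloc. */
void *pool_alloc(size_t n)
{
    assert(n > 0);
    if (n > POOL_SIZE - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += n;
    (void) VALGRIND_MAKE_WRITABLE(p, n);
    return p;
}

/* Stop describing errors in terms of the pool block. */
int pool_forget(void)
{
    return VALGRIND_DISCARD(pool_handle);
}
```

<p>With this in place, a read of pool memory that has not been handed
out would be reported like any other access to an inaccessible address.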
1076<p>
1077
1078
1079
<a name="problems"></a>
<h3>2.10&nbsp; If you have problems</h3>
Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
1083
1084<p>See <a href="#limits">Section 4</a> for the known limitations of
1085Valgrind, and for a list of programs which are known not to work on
1086it.
1087
1088<p>The translator/instrumentor has a lot of assertions in it. They
1089are permanently enabled, and I have no plans to disable them. If one
1090of these breaks, please mail me!
1091
1092<p>If you get an assertion failure on the expression
1093<code>chunkSane(ch)</code> in <code>vg_free()</code> in
1094<code>vg_malloc.c</code>, this may have happened because your program
1095wrote off the end of a malloc'd block, or before its beginning.
1096Valgrind should have emitted a proper message to that effect before
1097dying in this way. This is a known problem which I should fix.
1098<p>
1099
1100<hr width="100%">
1101
1102<a name="machine"></a>
1103<h2>3&nbsp; Details of the checking machinery</h2>
1104
1105Read this section if you want to know, in detail, exactly what and how
1106Valgrind is checking.
1107
1108<a name="vvalue"></a>
1109<h3>3.1&nbsp; Valid-value (V) bits</h3>
1110
1111It is simplest to think of Valgrind implementing a synthetic Intel x86
1112CPU which is identical to a real CPU, except for one crucial detail.
1113Every bit (literally) of data processed, stored and handled by the
1114real CPU has, in the synthetic CPU, an associated "valid-value" bit,
1115which says whether or not the accompanying bit has a legitimate value.
1116In the discussions which follow, this bit is referred to as the V
1117(valid-value) bit.
1118
<p>Each byte in the system therefore has 8 V bits which accompany
it wherever it goes. For example, when the CPU loads a word-size item
(4 bytes) from memory, it also loads the corresponding 32 V bits from
a bitmap which stores the V bits for the process's entire address
space. If the CPU should later write the whole or some part of that
value to memory at a different address, the relevant V bits will be
stored back in the V-bit bitmap.
1126
1127<p>In short, each bit in the system has an associated V bit, which
1128follows it around everywhere, even inside the CPU. Yes, the CPU's
1129(integer) registers have their own V bit vectors.
1130
1131<p>Copying values around does not cause Valgrind to check for, or
1132report on, errors. However, when a value is used in a way which might
1133conceivably affect the outcome of your program's computation, the
1134associated V bits are immediately checked. If any of these indicate
1135that the value is undefined, an error is reported.
1136
1137<p>Here's an (admittedly nonsensical) example:
1138<pre>
1139 int i, j;
1140 int a[10], b[10];
1141 for (i = 0; i &lt; 10; i++) {
1142 j = a[i];
1143 b[i] = j;
1144 }
1145</pre>
1146
1147<p>Valgrind emits no complaints about this, since it merely copies
1148uninitialised values from <code>a[]</code> into <code>b[]</code>, and
1149doesn't use them in any way. However, if the loop is changed to
1150<pre>
1151 for (i = 0; i &lt; 10; i++) {
1152 j += a[i];
1153 }
1154 if (j == 77)
1155 printf("hello there\n");
1156</pre>
1157then Valgrind will complain, at the <code>if</code>, that the
1158condition depends on uninitialised values.
1159
1160<p>Most low level operations, such as adds, cause Valgrind to
1161use the V bits for the operands to calculate the V bits for the
1162result. Even if the result is partially or wholly undefined,
1163it does not complain.
1164
<p>Checks on definedness only occur in two places: when a value is
used to generate a memory address, and when a control flow decision
needs to be made. Also, when a system call is detected, Valgrind
checks definedness of parameters as required.
1169
<p>If a check should detect undefinedness, an error message is
issued. The resulting value is subsequently regarded as well-defined.
To do otherwise would give long chains of error messages. In effect,
we say that undefined values are non-infectious.
1174
1175<p>This sounds overcomplicated. Why not just check all reads from
1176memory, and complain if an undefined value is loaded into a CPU register?
1177Well, that doesn't work well, because perfectly legitimate C programs routinely
1178copy uninitialised values around in memory, and we don't want endless complaints
1179about that. Here's the canonical example. Consider a struct
1180like this:
1181<pre>
1182 struct S { int x; char c; };
1183 struct S s1, s2;
1184 s1.x = 42;
1185 s1.c = 'z';
1186 s2 = s1;
1187</pre>
1188
1189<p>The question to ask is: how large is <code>struct S</code>, in
1190bytes? An int is 4 bytes and a char one byte, so perhaps a struct S
1191occupies 5 bytes? Wrong. All (non-toy) compilers I know of will
1192round the size of <code>struct S</code> up to a whole number of words,
1193in this case 8 bytes. Not doing this forces compilers to generate
1194truly appalling code for subscripting arrays of <code>struct
1195S</code>'s.
1196
1197<p>So s1 occupies 8 bytes, yet only 5 of them will be initialised.
1198For the assignment <code>s2 = s1</code>, gcc generates code to copy
1199all 8 bytes wholesale into <code>s2</code> without regard for their
1200meaning. If Valgrind simply checked values as they came out of
1201memory, it would yelp every time a structure assignment like this
1202happened. So the more complicated semantics described above is
1203necessary. This allows gcc to copy <code>s1</code> into
1204<code>s2</code> any way it likes, and a warning will only be emitted
1205if the uninitialised values are later used.
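
<p>The padding argument can be checked directly on your own compiler.
The sketch below is a small illustration; <code>padding_bytes</code>
and <code>copy_struct</code> are hypothetical helper names, and the
exact amount of padding depends on the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>

struct S { int x; char c; };

/* Bytes of struct S that hold no member, and so stay uninitialised
   even after both fields have been assigned. */
size_t padding_bytes(void)
{
    assert(sizeof(struct S) >= sizeof(int) + sizeof(char));
    return sizeof(struct S) - (sizeof(int) + sizeof(char));
}

/* Structure assignment copies the whole object, padding included. */
struct S copy_struct(struct S src)
{
    struct S dst = src;
    return dst;
}
```

<p>On the 32-bit x86 systems discussed here one would expect
<code>padding_bytes()</code> to report 3, matching the 8-byte figure
given above, but other platforms may differ.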
1206
1207<p>One final twist to this story. The above scheme allows garbage to
1208pass through the CPU's integer registers without complaint. It does
1209this by giving the integer registers V tags, passing these around in
the expected way. This is complicated and computationally expensive to
do, but is necessary. Valgrind is more simplistic about
1212floating-point loads and stores. In particular, V bits for data read
1213as a result of floating-point loads are checked at the load
1214instruction. So if your program uses the floating-point registers to
1215do memory-to-memory copies, you will get complaints about
1216uninitialised values. Fortunately, I have not yet encountered a
1217program which (ab)uses the floating-point registers in this way.
1218
1219<a name="vaddress"></a>
1220<h3>3.2&nbsp; Valid-address (A) bits</h3>
1221
1222Notice that the previous section describes how the validity of values
1223is established and maintained without having to say whether the
1224program does or does not have the right to access any particular
1225memory location. We now consider the latter issue.
1226
1227<p>As described above, every bit in memory or in the CPU has an
1228associated valid-value (V) bit. In addition, all bytes in memory, but
1229not in the CPU, have an associated valid-address (A) bit. This
1230indicates whether or not the program can legitimately read or write
that location. It does not give any indication of the validity of the
1232data at that location -- that's the job of the V bits -- only whether
1233or not the location may be accessed.
1234
1235<p>Every time your program reads or writes memory, Valgrind checks the
1236A bits associated with the address. If any of them indicate an
1237invalid address, an error is emitted. Note that the reads and writes
1238themselves do not change the A bits, only consult them.
1239
1240<p>So how do the A bits get set/cleared? Like this:
1241
1242<ul>
1243 <li>When the program starts, all the global data areas are marked as
1244 accessible.</li><br>
1245 <p>
1246
  <li>When the program does malloc/new, the A bits for exactly the
      area allocated, and not a byte more, are marked as accessible.
      Upon freeing the area the A bits are changed to indicate
      inaccessibility.</li><br>
1251 <p>
1252
1253 <li>When the stack pointer register (%esp) moves up or down, A bits
1254 are set. The rule is that the area from %esp up to the base of
1255 the stack is marked as accessible, and below %esp is
1256 inaccessible. (If that sounds illogical, bear in mind that the
1257 stack grows down, not up, on almost all Unix systems, including
1258 GNU/Linux.) Tracking %esp like this has the useful side-effect
1259 that the section of stack used by a function for local variables
1260 etc is automatically marked accessible on function entry and
1261 inaccessible on exit.</li><br>
1262 <p>
1263
1264 <li>When doing system calls, A bits are changed appropriately. For
1265 example, mmap() magically makes files appear in the process's
1266 address space, so the A bits must be updated if mmap()
1267 succeeds.</li><br>
1268</ul>
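
<p>A toy model of the A-bit map may help here. The sketch below keeps
one bit per byte of a tiny, made-up address space. The names
<code>mark_allocated</code>, <code>mark_freed</code> and
<code>access_ok</code>, and the 256-byte arena, are assumptions for the
example and say nothing about how Valgrind actually stores its bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_BYTES 256   /* size of the toy address space */

/* One A bit per byte of the toy address space, packed 8 to a char. */
static unsigned char abits[ARENA_BYTES / 8];

static void set_range(size_t start, size_t len, int accessible)
{
    assert(start <= ARENA_BYTES && len <= ARENA_BYTES - start);
    for (size_t a = start; a < start + len; a++) {
        if (accessible)
            abits[a / 8] |= (unsigned char)(1u << (a % 8));
        else
            abits[a / 8] &= (unsigned char)~(1u << (a % 8));
    }
}

/* Called on allocation: exactly the requested bytes become accessible. */
void mark_allocated(size_t start, size_t len) { set_range(start, len, 1); }

/* Called on free: the same bytes become inaccessible again. */
void mark_freed(size_t start, size_t len) { set_range(start, len, 0); }

/* Consulted on every read or write; never changes the A bits.
   Returns 1 if all len bytes from start may be accessed. */
int access_ok(size_t start, size_t len)
{
    if (start > ARENA_BYTES || len > ARENA_BYTES - start)
        return 0;
    for (size_t a = start; a < start + len; a++)
        if (!(abits[a / 8] & (1u << (a % 8))))
            return 0;
    return 1;
}
```

<p>Because only the bytes actually allocated are marked, an access one
byte past the end of a block fails the check, which is how overruns of
malloc'd blocks get caught.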
1269
1270
1271<a name="together"></a>
1272<h3>3.3&nbsp; Putting it all together</h3>
1273Valgrind's checking machinery can be summarised as follows:
1274
1275<ul>
1276 <li>Each byte in memory has 8 associated V (valid-value) bits,
1277 saying whether or not the byte has a defined value, and a single
1278 A (valid-address) bit, saying whether or not the program
1279 currently has the right to read/write that address.</li><br>
1280 <p>
1281
1282 <li>When memory is read or written, the relevant A bits are
1283 consulted. If they indicate an invalid address, Valgrind emits
1284 an Invalid read or Invalid write error.</li><br>
1285 <p>
1286
1287 <li>When memory is read into the CPU's integer registers, the
1288 relevant V bits are fetched from memory and stored in the
1289 simulated CPU. They are not consulted.</li><br>
1290 <p>
1291
1292 <li>When an integer register is written out to memory, the V bits
1293 for that register are written back to memory too.</li><br>
1294 <p>
1295
1296 <li>When memory is read into the CPU's floating point registers, the
1297 relevant V bits are read from memory and they are immediately
1298 checked. If any are invalid, an uninitialised value error is
1299 emitted. This precludes using the floating-point registers to
1300 copy possibly-uninitialised memory, but simplifies Valgrind in
1301 that it does not have to track the validity status of the
1302 floating-point registers.</li><br>
1303 <p>
1304
1305 <li>As a result, when a floating-point register is written to
1306 memory, the associated V bits are set to indicate a valid
1307 value.</li><br>
1308 <p>
1309
1310 <li>When values in integer CPU registers are used to generate a
1311 memory address, or to determine the outcome of a conditional
1312 branch, the V bits for those values are checked, and an error
1313 emitted if any of them are undefined.</li><br>
1314 <p>
1315
1316 <li>When values in integer CPU registers are used for any other
1317 purpose, Valgrind computes the V bits for the result, but does
1318 not check them.</li><br>
1319 <p>
1320
  <li>Once the V bits for a value in the CPU have been checked, they
      are then set to indicate validity. This avoids long chains of
      errors.</li><br>
1324 <p>
1325
  <li>When values are loaded from memory, Valgrind checks the A bits
      for that location and issues an illegal-address warning if
      needed. In that case, the V bits loaded are forced to indicate
      Valid, despite the location being invalid.
      <p>
      This apparently strange choice reduces the amount of confusing
      information presented to the user. It avoids the
      unpleasant phenomenon in which memory is read from a place which
      is both unaddressable and contains invalid values, and, as a
      result, you get not only an invalid-address (read/write) error,
      but also a potentially large set of uninitialised-value errors,
      one for every time the value is used.
      <p>
      There is a hazy boundary case to do with multi-byte loads from
      addresses which are partially valid and partially invalid. See
      the description of the flag <code>--partial-loads-ok</code> for
      details.
      </li><br>
1343</ul>
1344
1345Valgrind intercepts calls to malloc, calloc, realloc, valloc,
1346memalign, free, new and delete. The behaviour you get is:
1347
1348<ul>
1349
  <li>malloc/new: the returned memory is marked as addressable but not
      having valid values. This means you have to write on it before
      you can read it.</li><br>
      <p>

  <li>calloc: returned memory is marked both addressable and valid,
      since calloc() clears the area to zero.</li><br>
      <p>

  <li>realloc: if the new size is larger than the old, the new section
      is addressable but invalid, as with malloc. If the new size is
      smaller, the dropped-off section is marked as unaddressable.
      You may only pass to realloc a pointer previously issued to you
      by malloc/calloc/new/realloc.</li><br>
      <p>

  <li>free/delete: you may only pass to free a pointer previously
      issued to you by malloc/calloc/new/realloc, or the value
      NULL. Otherwise, Valgrind complains. If the pointer is indeed
      valid, Valgrind marks the entire area it points at as
      unaddressable, and places the block in the freed-blocks-queue.
      The aim is to defer as long as possible reallocation of this
      block. Until that happens, all attempts to access it will
      elicit an invalid-address error, as you would hope.</li><br>
1376</ul>
1377
1378
1379
1380<a name="signals"></a>
1381<h3>3.4&nbsp; Signals</h3>
1382
1383Valgrind provides suitable handling of signals, so, provided you stick
1384to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask()
1385are handled. Signal handlers may return in the normal way or do
1386longjmp(); both should work ok. As specified by POSIX, a signal is
1387blocked in its own handler. Default actions for signals should work
1388as before. Etc, etc.
1389
1390<p>Under the hood, dealing with signals is a real pain, and Valgrind's
1391simulation leaves much to be desired. If your program does
1392way-strange stuff with signals, bad things may happen. If so, let me
1393know. I don't promise to fix it, but I'd at least like to be aware of
1394it.
1395
1396
<a name="leaks"></a>
1398<h3>3.5&nbsp; Memory leak detection</h3>
1399
1400Valgrind keeps track of all memory blocks issued in response to calls
1401to malloc/calloc/realloc/new. So when the program exits, it knows
1402which blocks are still outstanding -- have not been returned, in other
1403words. Ideally, you want your program to have no blocks still in use
1404at exit. But many programs do.
1405
1406<p>For each such block, Valgrind scans the entire address space of the
1407process, looking for pointers to the block. One of three situations
1408may result:
1409
1410<ul>
  <li>A pointer to the start of the block is found. This usually
      indicates programming sloppiness; since the block is still
      pointed at, the programmer could, at least in principle, have
      freed it before program exit.</li><br>
1415 <p>
1416
1417 <li>A pointer to the interior of the block is found. The pointer
1418 might originally have pointed to the start and have been moved
1419 along, or it might be entirely unrelated. Valgrind deems such a
1420 block as "dubious", that is, possibly leaked,
1421 because it's unclear whether or
1422 not a pointer to it still exists.</li><br>
1423 <p>
1424
1425 <li>The worst outcome is that no pointer to the block can be found.
1426 The block is classified as "leaked", because the
1427 programmer could not possibly have free'd it at program exit,
1428 since no pointer to it exists. This might be a symptom of
1429 having lost the pointer at some earlier point in the
1430 program.</li>
1431</ul>
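
<p>The three outcomes above can be sketched as a classification over the
words turned up by the scan. This is an illustration only, not Valgrind's
code: the enum names, <code>classify_block</code>, and the idea of passing the
scanned words in as an array are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three situations in the list above. */
typedef enum {
    BLOCK_START_POINTED,   /* a pointer to the start of the block exists */
    BLOCK_DUBIOUS,         /* only interior pointers were found */
    BLOCK_LEAKED           /* no pointer into the block was found */
} BlockStatus;

/* Classifies one block of `size` bytes starting at `start`, given the
   `n` candidate word values gathered while scanning memory. */
BlockStatus classify_block(uintptr_t start, size_t size,
                           const uintptr_t *words, size_t n)
{
    BlockStatus status = BLOCK_LEAKED;
    size_t i;

    assert(size > 0);
    for (i = 0; i < n; i++) {
        if (words[i] == start)
            return BLOCK_START_POINTED;   /* a start pointer settles it */
        if (words[i] > start && words[i] - start < size)
            status = BLOCK_DUBIOUS;       /* interior pointer; keep looking */
    }
    return status;
}
```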
1432
1433Valgrind reports summaries about leaked and dubious blocks.
1434For each such block, it will also tell you where the block was
1435allocated. This should help you figure out why the pointer to it has
1436been lost. In general, you should attempt to ensure your programs do
1437not have any leaked or dubious blocks at exit.
1438
1439<p>The precise area of memory in which Valgrind searches for pointers
1440is: all naturally-aligned 4-byte words for which all A bits indicate
addressability and all V bits indicate that the stored value is
1442actually valid.
1443
1444<p><hr width="100%">
1445
1446
1447<a name="limits"></a>
1448<h2>4&nbsp; Limitations</h2>
1449
1450The following list of limitations seems depressingly long. However,
1451most programs actually work fine.
1452
1453<p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on
1454a kernel 2.4.X system, subject to the following constraints:
1455
1456<ul>
1457 <li>No MMX, SSE, SSE2, 3DNow instructions. If the translator
1458 encounters these, Valgrind will simply give up. It may be
1459 possible to add support for them at a later time. Intel added a
1460 few instructions such as "cmov" to the integer instruction set
1461 on Pentium and later processors, and these are supported.
1462 Nevertheless it's safest to think of Valgrind as implementing
1463 the 486 instruction set.</li><br>
1464 <p>
1465
1466 <li>Multithreaded programs are not supported, since I haven't yet
1467 figured out how to do this. To be more specific, it is the
1468 "clone" system call which is not supported. A program calls
1469 "clone" to create threads. Valgrind will abort if this
      happens.</li><br>
1471 <p>
1472
1473 <li>Valgrind assumes that the floating point registers are not used
1474 as intermediaries in memory-to-memory copies, so it immediately
1475 checks V bits in floating-point loads/stores. If you want to
1476 write code which copies around possibly-uninitialised values,
1477 you must ensure these travel through the integer registers, not
1478 the FPU.</li><br>
1479 <p>
1480
1481 <li>If your program does its own memory management, rather than
1482 using malloc/new/free/delete, it should still work, but
1483 Valgrind's error checking won't be so effective.</li><br>
1484 <p>
1485
1486 <li>Valgrind's signal simulation is not as robust as it could be.
1487 Basic POSIX-compliant sigaction and sigprocmask functionality is
1488 supplied, but it's conceivable that things could go badly awry
      if you do weird things with signals. Workaround: don't.
1490 Programs that do non-POSIX signal tricks are in any case
1491 inherently unportable, so should be avoided if
1492 possible.</li><br>
1493 <p>
1494
1495 <li>I have no idea what happens if programs try to handle signals on
1496 an alternate stack (sigaltstack). YMMV.</li><br>
1497 <p>
1498
1499 <li>Programs which switch stacks are not well handled. Valgrind
1500 does have support for this, but I don't have great faith in it.
1501 It's difficult -- there's no cast-iron way to decide whether a
1502 large change in %esp is as a result of the program switching
1503 stacks, or merely allocating a large object temporarily on the
1504 current stack -- yet Valgrind needs to handle the two situations
1505 differently.</li><br>
1506 <p>
1507
1508 <li>x86 instructions, and system calls, have been implemented on
1509 demand. So it's possible, although unlikely, that a program
1510 will fall over with a message to that effect. If this happens,
1511 please mail me ALL the details printed out, so I can try and
1512 implement the missing feature.</li><br>
1513 <p>
1514
1515 <li>x86 floating point works correctly, but floating-point code may
1516 run even more slowly than integer code, due to my simplistic
1517 approach to FPU emulation.</li><br>
1518 <p>
1519
1520 <li>You can't Valgrind-ize statically linked binaries. Valgrind
1521 relies on the dynamic-link mechanism to gain control at
1522 startup.</li><br>
1523 <p>
1524
1525 <li>Memory consumption of your program is majorly increased whilst
1526 running under Valgrind. This is due to the large amount of
      administrative information maintained behind the scenes. Another
1528 cause is that Valgrind dynamically translates the original
1529 executable and never throws any translation away, except in
1530 those rare cases where self-modifying code is detected.
1531 Translated, instrumented code is 8-12 times larger than the
1532 original (!) so you can easily end up with 15+ MB of
1533 translations when running (eg) a web browser. There's not a lot
1534 you can do about this -- use Valgrind on a fast machine with a lot
1535 of memory and swap space. At some point I may implement a LRU
1536 caching scheme for translations, so as to bound the maximum
1537 amount of memory devoted to them, to say 8 or 16 MB.</li>
1538</ul>
1539
1540
1541Programs which are known not to work are:
1542
1543<ul>
1544 <li>Netscape 4.76 works pretty well on some platforms -- quite
1545 nicely on my AMD K6-III (400 MHz). I can surf, do mail, etc, no
      problem. On other platforms it has been observed to crash
1547 during startup. Despite much investigation I can't figure out
1548 why.</li><br>
1549 <p>
1550
1551 <li>kpackage (a KDE front end to rpm) dies because the CPUID
1552 instruction is unimplemented. Easy to fix.</li><br>
1553 <p>
1554
1555 <li>knode (a KDE newsreader) tries to do multithreaded things, and
1556 fails.</li><br>
1557 <p>
1558
1559 <li>emacs starts up but immediately concludes it is out of memory
      and aborts. Emacs has its own memory-management scheme, but I
1561 don't understand why this should interact so badly with
1562 Valgrind.</li><br>
1563 <p>
1564
1565 <li>Gimp and Gnome and GTK-based apps die early on because
1566 of unimplemented system call wrappers. (I'm a KDE user :)
1567 This wouldn't be hard to fix.
1568 </li><br>
1569 <p>
1570
1571 <li>As a consequence of me being a KDE user, almost all KDE apps
1572 work ok -- except those which are multithreaded.
1573 </li><br>
1574 <p>
1575</ul>
1576
1577
1578<p><hr width="100%">
1579
1580
1581<a name="howitworks"></a>
1582<h2>5&nbsp; How it works -- a rough overview</h2>
1583Some gory details, for those with a passion for gory details. You
1584don't need to read this section if all you want to do is use Valgrind.
1585
1586<a name="startb"></a>
1587<h3>5.1&nbsp; Getting started</h3>
1588
1589Valgrind is compiled into a shared object, valgrind.so. The shell
1590script valgrind sets the LD_PRELOAD environment variable to point to
1591valgrind.so. This causes the .so to be loaded as an extra library to
1592any subsequently executed dynamically-linked ELF binary, viz, the
1593program you want to debug.
1594
1595<p>The dynamic linker allows each .so in the process image to have an
1596initialisation function which is run before main(). It also allows
1597each .so to have a finalisation function run after main() exits.
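
<p>With GCC, one way to obtain such initialisation and finalisation functions
is the <code>constructor</code> and <code>destructor</code> function
attributes. The sketch below shows that mechanism in isolation. It is an
assumption about how a preloaded library could hook in; this manual does not
say which mechanism valgrind.so itself uses, and the <code>demo_</code> names
are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

static int demo_init_ran = 0;

/* Runs before main(), when the object containing it is initialised. */
__attribute__((constructor))
static void demo_init(void)
{
    demo_init_ran = 1;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void demo_fini(void)
{
    assert(demo_init_ran);   /* initialisation must already have happened */
    fprintf(stderr, "demo_fini: finalisation function called\n");
}

/* Lets the rest of the program observe that demo_init() has run. */
int demo_init_has_run(void)
{
    return demo_init_ran;
}
```

<p>Built into a shared object (for example with
<code>gcc -shared -fPIC</code>) and named in LD_PRELOAD, code like this would
run its constructor before the main() of any dynamically-linked program
started afterwards.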
1598
1599<p>When valgrind.so's initialisation function is called by the dynamic
linker, the synthetic CPU starts up. The real CPU remains locked
1601in valgrind.so for the entire rest of the program, but the synthetic
1602CPU returns from the initialisation function. Startup of the program
1603now continues as usual -- the dynamic linker calls all the other .so's
1604initialisation routines, and eventually runs main(). This all runs on
1605the synthetic CPU, not the real one, but the client program cannot
1606tell the difference.
1607
1608<p>Eventually main() exits, so the synthetic CPU calls valgrind.so's
1609finalisation function. Valgrind detects this, and uses it as its cue
1610to exit. It prints summaries of all errors detected, possibly checks
1611for memory leaks, and then exits the finalisation routine, but now on
1612the real CPU. The synthetic CPU has now lost control -- permanently
1613-- so the program exits back to the OS on the real CPU, just as it
1614would have done anyway.
1615
1616<p>On entry, Valgrind switches stacks, so it runs on its own stack.
1617On exit, it switches back. This means that the client program
1618continues to run on its own stack, so we can switch back and forth
1619between running it on the simulated and real CPUs without difficulty.
1620This was an important design decision, because it makes it easy (well,
1621significantly less difficult) to debug the synthetic CPU.
1622
1623
1624<a name="engine"></a>
1625<h3>5.2&nbsp; The translation/instrumentation engine</h3>
1626
1627Valgrind does not directly run any of the original program's code. Only
1628instrumented translations are run. Valgrind maintains a translation
1629table, which allows it to find the translation quickly for any branch
1630target (code address). If no translation has yet been made, the
1631translator - a just-in-time translator - is summoned. This makes an
1632instrumented translation, which is added to the collection of
1633translations. Subsequent jumps to that address will use this
1634translation.
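
<p>The lookup-or-translate cycle described above can be sketched as a small
hash table keyed by the original code address. Everything here is
illustrative: the table size, the linear probing, and the
<code>translate</code> stand-in are assumptions, not a description of
Valgrind's actual translation table.

```c
#include <assert.h>
#include <stdint.h>

#define TT_SIZE 1024u   /* hypothetical table size; must be a power of two */

typedef struct {
    uint32_t orig_addr;        /* branch target in the original program */
    int      valid;            /* nonzero once the slot is in use */
    int      translation_id;   /* stands in for a pointer to generated code */
} TTEntry;

static TTEntry tt[TT_SIZE];
static int translations_made = 0;

/* Stand-in for the just-in-time translator: hands out a new id. */
static int translate(uint32_t orig_addr)
{
    (void)orig_addr;
    return ++translations_made;
}

/* Returns the translation for orig_addr, making one on first use. */
int find_or_translate(uint32_t orig_addr)
{
    uint32_t i = orig_addr & (TT_SIZE - 1u);
    uint32_t probes;

    for (probes = 0; probes < TT_SIZE; probes++) {
        TTEntry *e = &tt[i];
        if (e->valid && e->orig_addr == orig_addr)
            return e->translation_id;          /* already translated */
        if (!e->valid) {                       /* not seen before */
            e->orig_addr = orig_addr;
            e->translation_id = translate(orig_addr);
            e->valid = 1;
            return e->translation_id;
        }
        i = (i + 1u) & (TT_SIZE - 1u);         /* linear probing */
    }
    assert(!"translation table full");
    return -1;
}

/* Number of translations made so far. */
int translations_so_far(void)
{
    return translations_made;
}
```

<p>Jumping to the same address twice calls <code>translate</code> only once;
the second jump finds the existing entry.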
1635
1636<p>Valgrind can optionally check writes made by the application, to
1637see if they are writing an address contained within code which has
1638been translated. Such a write invalidates translations of code
1639bracketing the written address. Valgrind will discard the relevant
1640translations, which causes them to be re-made, if they are needed
1641again, reflecting the new updated data stored there. In this way,
1642self modifying code is supported. In practice I have not found any
1643Linux applications which use self-modifying-code.
1644
1645<p>The JITter translates basic blocks -- blocks of straight-line-code
1646-- as single entities. To minimise the considerable difficulties of
1647dealing with the x86 instruction set, x86 instructions are first
1648translated to a RISC-like intermediate code, similar to sparc code,
1649but with an infinite number of virtual integer registers. Initially
each insn is translated separately, and there is no attempt at
1651instrumentation.
1652
1653<p>The intermediate code is improved, mostly so as to try and cache
1654the simulated machine's registers in the real machine's registers over
1655several simulated instructions. This is often very effective. Also,
we try to remove redundant updates of the simulated machine's
1657condition-code register.
1658
1659<p>The intermediate code is then instrumented, giving more
1660intermediate code. There are a few extra intermediate-code operations
1661to support instrumentation; it is all refreshingly simple. After
1662instrumentation there is a cleanup pass to remove redundant value
1663checks.
1664
1665<p>This gives instrumented intermediate code which mentions arbitrary
1666numbers of virtual registers. A linear-scan register allocator is
1667used to assign real registers and possibly generate spill code. All
1668of this is still phrased in terms of the intermediate code. This
1669machinery is inspired by the work of Reuben Thomas (MITE).
1670
1671<p>Then, and only then, is the final x86 code emitted. The
1672intermediate code is carefully designed so that x86 code can be
1673generated from it without need for spare registers or other
1674inconveniences.
1675
1676<p>The translations are managed using a traditional LRU-based caching
1677scheme. The translation cache has a default size of about 14MB.
1678
1679<a name="track"></a>
1680
1681<h3>5.3&nbsp; Tracking the status of memory</h3> Each byte in the
1682process' address space has nine bits associated with it: one A bit and
1683eight V bits. The A and V bits for each byte are stored using a
1684sparse array, which flexibly and efficiently covers arbitrary parts of
1685the 32-bit address space without imposing significant space or
1686performance overheads for the parts of the address space never
1687visited. The scheme used, and speedup hacks, are described in detail
1688at the top of the source file vg_memory.c, so you should read that for
1689the gory details.
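
<p>One common way to build a sparse array of this kind is a two-level table
in which second-level maps are allocated only for the parts of the address
space actually touched. The sketch below shows that general idea. Whether
vg_memory.c is organised this way, the 64KB chunk size, and the polarity of
the bits (1 meaning addressable or valid) are all assumptions; the source
file remains the reference for the real scheme.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_BYTES 65536u            /* bytes covered by one secondary map */

typedef struct {
    uint8_t abits[CHUNK_BYTES / 8];   /* one A bit per byte; 1 = addressable */
    uint8_t vbits[CHUNK_BYTES];       /* eight V bits per byte; 1 = valid */
} SecMap;

static SecMap *primary_map[65536];    /* indexed by the top 16 address bits */
static int chunks_in_use = 0;

/* Finds the secondary map for addr, optionally allocating it. */
static SecMap *find_secmap(uint32_t addr, int create)
{
    uint32_t hi = addr >> 16;
    if (primary_map[hi] == NULL && create) {
        primary_map[hi] = calloc(1, sizeof(SecMap));
        assert(primary_map[hi] != NULL);
        chunks_in_use++;
    }
    return primary_map[hi];
}

/* Records the A bit and the eight V bits for one byte of memory. */
void set_status(uint32_t addr, int addressable, uint8_t vbits)
{
    SecMap *sm = find_secmap(addr, 1);
    uint32_t lo = addr & 0xFFFFu;
    uint8_t mask = (uint8_t)(1u << (lo & 7u));

    if (addressable)
        sm->abits[lo >> 3] |= mask;
    else
        sm->abits[lo >> 3] &= (uint8_t)~mask;
    sm->vbits[lo] = vbits;
}

/* Bytes in chunks never visited are reported as not addressable. */
int is_addressable(uint32_t addr)
{
    SecMap *sm = find_secmap(addr, 0);
    uint32_t lo = addr & 0xFFFFu;
    return sm != NULL && ((sm->abits[lo >> 3] >> (lo & 7u)) & 1u) != 0;
}

/* Returns the V bits for a byte, or 0 if its chunk was never visited. */
uint8_t get_vbits(uint32_t addr)
{
    SecMap *sm = find_secmap(addr, 0);
    return sm != NULL ? sm->vbits[addr & 0xFFFFu] : 0;
}

/* Number of secondary maps allocated so far. */
int chunks_allocated(void)
{
    return chunks_in_use;
}
```

<p>Querying an address never allocates anything, so the parts of the address
space that are never written to cost only one null pointer each in the
primary map.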
1690
1691<a name="sys_calls"></a>
1692
<h3>5.4&nbsp; System calls</h3>
1694All system calls are intercepted. The memory status map is consulted
1695before and updated after each call. It's all rather tiresome. See
1696vg_syscall_mem.c for details.
1697
1698<a name="sys_signals"></a>
1699
1700<h3>5.5&nbsp; Signals</h3>
1701All system calls to sigaction() and sigprocmask() are intercepted. If
1702the client program is trying to set a signal handler, Valgrind makes a
1703note of the handler address and which signal it is for. Valgrind then
1704arranges for the same signal to be delivered to its own handler.
1705
1706<p>When such a signal arrives, Valgrind's own handler catches it, and
1707notes the fact. At a convenient safe point in execution, Valgrind
1708builds a signal delivery frame on the client's stack and runs its
1709handler. If the handler longjmp()s, there is nothing more to be said.
1710If the handler returns, Valgrind notices this, zaps the delivery
1711frame, and carries on where it left off before delivering the signal.
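
<p>The note-now, deliver-later scheme described above can be modelled without
any real signals. In this sketch the client's handler is remembered rather
than handed to the kernel, an arriving signal only sets a flag, and the
handler is run later from a safe point. All names and the fixed table size
are hypothetical, and the real delivery (building a frame on the client's
stack and running the handler on the simulated CPU) is reduced to a plain
function call.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIGNALS 64   /* hypothetical bound on signal numbers */

typedef void (*ClientHandler)(int);

static ClientHandler client_handlers[MAX_SIGNALS]; /* noted at sigaction() */
static volatile int  pending[MAX_SIGNALS];         /* set when one arrives */

/* The client asked to install a handler: remember it. */
void note_client_handler(int signo, ClientHandler h)
{
    assert(signo > 0 && signo < MAX_SIGNALS);
    client_handlers[signo] = h;
}

/* The tool's own handler caught a signal: just note the fact. */
void note_arrival(int signo)
{
    if (signo > 0 && signo < MAX_SIGNALS)
        pending[signo] = 1;
}

/* Called at a safe point: runs the client's handler for each pending
   signal that has one, and returns how many were delivered. */
int deliver_pending(void)
{
    int delivered = 0;
    int s;

    for (s = 1; s < MAX_SIGNALS; s++) {
        if (pending[s] && client_handlers[s] != NULL) {
            pending[s] = 0;
            client_handlers[s](s);
            delivered++;
        }
    }
    return delivered;
}

/* Usage example with a made-up client handler for signal 10. */
static int demo_calls = 0;

static void demo_client_handler(int signo)
{
    (void)signo;
    demo_calls++;
}

/* Returns the number of handler calls seen, or -1 if the handler ran
   at arrival time instead of at the safe point. */
int demo_deferred_delivery(void)
{
    int before;

    note_client_handler(10, demo_client_handler);
    note_arrival(10);
    before = demo_calls;                /* nothing has run yet */
    if (deliver_pending() != 1 || before != 0)
        return -1;
    return demo_calls;
}
```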
1712
1713<p>The purpose of this nonsense is that setting signal handlers
1714essentially amounts to giving callback addresses to the Linux kernel.
1715We can't allow this to happen, because if it did, signal handlers
1716would run on the real CPU, not the simulated one. This means the
1717checking machinery would not operate during the handler run, and,
1718worse, memory permissions maps would not be updated, which could cause
1719spurious error reports once the handler had returned.
1720
1721<p>An even worse thing would happen if the signal handler longjmp'd
1722rather than returned: Valgrind would completely lose control of the
1723client program.
1724
1725<p>Upshot: we can't allow the client to install signal handlers
1726directly. Instead, Valgrind must catch, on behalf of the client, any
signal the client asks to catch, and must deliver it to the client on
1728the simulated CPU, not the real one. This involves considerable
1729gruesome fakery; see vg_signals.c for details.
1730<p>
1731
1732<hr width="100%">
1733
1734<a name="example"></a>
1735<h2>6&nbsp; Example</h2>
1736This is the log for a run of a small program. The program is in fact
1737correct, and the reported error is as the result of a potentially serious
1738code generation bug in GNU g++ (snapshot 20010527).
1739<pre>
1740sewardj@phoenix:~/newmat10$
1741~/Valgrind-6/valgrind -v ./bogon
1742==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
1743==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
1744==25832== Startup, with flags:
1745==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
1746==25832== reading syms from /lib/ld-linux.so.2
1747==25832== reading syms from /lib/libc.so.6
1748==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
1749==25832== reading syms from /lib/libm.so.6
1750==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
1751==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
1752==25832== reading syms from /proc/self/exe
1753==25832== loaded 5950 symbols, 142333 line number locations
1754==25832==
1755==25832== Invalid read of size 4
1756==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
1757==25832== by 0x80487AF: main (bogon.cpp:66)
1758==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129)
1759==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon)
1760==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
1761==25832==
1762==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
1763==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
1764==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
1765==25832== For a detailed leak analysis, rerun with: --leak-check=yes
1766==25832==
1767==25832== exiting, did 1881 basic blocks, 0 misses.
1768==25832== 223 translations, 3626 bytes in, 56801 bytes out.
1769</pre>
1770<p>The GCC folks fixed this about a week before gcc-3.0 shipped.
1771<hr width="100%">
1772<p>
1774
1775
1776<a name="cache"></a>
1777<h2>7&nbsp; Cache profiling</h2>
1778As well as memory debugging, Valgrind also allows you to do cache simulations
1779and annotate your source line-by-line with the number of cache misses. In
1780particular, it records:
1781<ul>
1782 <li>L1 instruction cache reads and misses;
1783 <li>L1 data cache reads and read misses, writes and write misses;
  <li>L2 unified cache reads and read misses, writes and write misses.
1785</ul>
1786On a modern x86 machine, an L1 miss will typically cost around 10 cycles,
1787and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be
1788very useful for improving the performance of your program.
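
<p>As a rough illustration of why the miss counts matter, the figures above
can be turned into a back-of-envelope cycle estimate. The cost model is an
assumption: an L1 miss that hits in L2 is charged 10 cycles and an L2 miss is
charged 200, the upper figure quoted above. Real machines vary, and
<code>estimate_miss_cycles</code> is a made-up helper, not part of Valgrind.

```c
#include <assert.h>

#define L1_MISS_CYCLES 10ULL    /* "around 10 cycles", per the text above */
#define L2_MISS_CYCLES 200ULL   /* "as much as 200 cycles": an upper bound */

/* Estimates cycles lost to cache misses. Every L2 miss is also an L1
   miss, so only the L1 misses that hit in L2 get the cheaper charge. */
unsigned long long estimate_miss_cycles(unsigned long long l1_misses,
                                        unsigned long long l2_misses)
{
    assert(l2_misses <= l1_misses);
    return (l1_misses - l2_misses) * L1_MISS_CYCLES
         + l2_misses * L2_MISS_CYCLES;
}
```

<p>Under this model, 1,000 L1 misses of which 100 also miss in L2 cost about
29,000 cycles, most of it from the L2 misses.

<p>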
1789
1790Please note that this is an experimental feature. Any feedback, bug-fixes,
1791suggestions, etc, welcome.
1792
1793
1794<h3>7.1&nbsp; Overview</h3>
1795First off, as for normal Valgrind use, you probably want to turn on debugging
1796info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you
1797probably <b>do</b> want to turn optimisation on, since you should profile your
1798program as it will be normally run.
1799
1800The three steps are:
1801<ol>
1802 <li>Generate a cache simulator for your machine's cache configuration with
1803 `vg_cachegen' and recompile Valgrind with <code>make install</code>.
1804 Valgrind comes with a default simulator, but it is unlikely to be correct
1805 for your system, so you should generate a simulator yourself.</li>
1806 <li>Run your program with <code>valgrind --cachesim=yes</code> in front of
1807 the normal command line invocation. When the program finishes, Valgrind
1808 will print summary cache statistics. It also collects line-by-line
1809 information in a file <code>cachegrind.out</code>.</li>
1810 <li>Generate a function-by-function summary, and possibly annotate source
      files with 'vg_annotate'. Source files to annotate can be specified
      manually on the command line, or "interesting" source files
1813 can be annotated automatically with the <code>--auto=yes</code> option.
1814 You can annotate C/C++ files or assembly language files equally
1815 easily.</li>
1816</ol>
1817
1818<a href="#generate">Step 1</a> only needs to be done once, unless you are
1819interested in simulating different cache configurations (eg. first
1820concentrating on instruction cache misses, then on data cache misses).<p>
1821
1822<a href="#profile">Step 2</a> should be done every time you want to collect
1823information about a new program, a changed program, or about the same program
1824with different input.<p>
1825
1826<a href="#annotate">Step 3</a> can be performed as many times as you like for
1827each Step 2; you may want to do multiple annotations showing different
1828information each time.<p>
1829
1830The steps are described in detail in the following sections.<p>
1831
1832
1833<a name="generate"></a>
1834<h3>7.3&nbsp; Generating a cache simulator</h3>
1835Although Valgrind comes with a pre-generated cache simulator, it most likely
1836won't match the cache configuration of your machine, so you should generate
1837a new simulator.<p>
1838
1839You need to generate three files, one for each of the I1, D1 and L2 caches.
1840For each cache, you need to know the:
1841<ul>
1842 <li>Cache size (bytes);
1843 <li>Line size (bytes);
1844 <li>Associativity.
1845</ul>
1846
1847vg_cachegen takes three options:
1848<ul>
1849 <li><code>--I1=size,line_size,associativity</code>
1850 <li><code>--D1=size,line_size,associativity</code>
1851 <li><code>--L2=size,line_size,associativity</code>
1852</ul>
1853
1854You can specify one, two or all three caches per invocation of vg_cachegen. It
1855checks that the configuration is sensible before generating the simulators; to
1856see the allowed values, run <code>vg_cachegen -h</code>.<p>
1857
1858An example invocation would be:
1859
1860<blockquote><code>
1861 vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
1862</code></blockquote>
1863
1864This simulates a machine with a 128KB split L1 2-way associative cache, and a
1865256KB unified 8-way associative L2 cache. Both caches have 64B lines.<p>
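
A <code>size,line_size,associativity</code> triple fixes the number of sets in
the cache. The sketch below works that out and rejects configurations that
are not powers of two. The checks are a guess at what "sensible" might mean;
the values vg_cachegen really accepts are those listed by
<code>vg_cachegen -h</code>, and <code>CacheConfig</code> and
<code>cache_num_sets</code> are hypothetical names.<p>

```c
#include <assert.h>

typedef struct {
    unsigned long size;        /* total cache size in bytes */
    unsigned long line_size;   /* bytes per line */
    unsigned long assoc;       /* lines per set */
} CacheConfig;

static int is_power_of_two(unsigned long x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Returns the number of sets, or 0 if this sketch rejects the config. */
unsigned long cache_num_sets(CacheConfig c)
{
    unsigned long sets;

    if (!is_power_of_two(c.size) || !is_power_of_two(c.line_size) ||
        !is_power_of_two(c.assoc))
        return 0;
    if (c.line_size * c.assoc > c.size)
        return 0;                 /* not even one complete set */

    sets = c.size / (c.line_size * c.assoc);
    assert(sets * c.line_size * c.assoc == c.size);
    return sets;
}
```

For the example invocation above, each L1 cache (65536,64,2) and the L2 cache
(262144,64,8) both come out at 512 sets.<p>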
1866
1867If you don't know your cache configuration, you'll have to find it out.
1868(Ideally vg_cachegen could auto-identify your cache configuration using the
1869CPUID instruction, which could be done automatically during installation, and
1870this whole step could be skipped...)<p>
1871
1872
1873<h3>7.4&nbsp; Cache simulation specifics</h3>
1874vg_cachegen only generates simulations for a machine with a split L1 cache and
1875a unified L2 cache. This configuration is used for all x86-based machines we
1876are aware of.<p>
1877
1878The more specific characteristics of the simulation are as follows.
1879
1880<ul>
1881 <li>Write-allocate: when a write miss occurs, the block written to is brought
1882 into the D1 cache. Most modern caches have this property.</li><p>
1883
1884 <li>Bit-selection hash function: the line(s) in the cache to which a memory
1885 block maps is chosen by the middle bits M--(M+N-1) of the byte address,
1886 where:
1887 <ul>
1888 <li>&nbsp;line size = 2^M bytes&nbsp;</li>
1889 <li>(cache size / line size) = 2^N bytes</li>
1890 </ul> </li><p>
1891
1892 <li>Inclusive L2 cache: the L2 cache replicates all the entries of the L1
1893 cache. This is standard on Pentium chips, but AMD Athlons use an
1894 exclusive L2 cache that only holds blocks evicted from L1.</li><p>
1895</ul>
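
The bit-selection rule in the list above can be written out directly. This
sketch assumes that 2^N is the number of sets, that is, cache size divided by
line size and by associativity; <code>cache_set_index</code> and
<code>log2_exact</code> are illustrative names, not functions from the
generated simulators.<p>

```c
#include <assert.h>
#include <stdint.h>

/* Returns log2(x) for x an exact power of two. */
static unsigned log2_exact(uint32_t x)
{
    unsigned n = 0;

    assert(x != 0 && (x & (x - 1u)) == 0);
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Picks bits M..(M+N-1) of the byte address, where the line size is
   2^M bytes and the cache has 2^N sets. */
uint32_t cache_set_index(uint32_t addr, uint32_t cache_size,
                         uint32_t line_size, uint32_t assoc)
{
    unsigned m = log2_exact(line_size);
    unsigned n = log2_exact(cache_size / line_size / assoc);

    return (addr >> m) & ((1u << n) - 1u);
}
```

For a 65536-byte, 2-way cache with 64-byte lines, M is 6 and N is 9, so
addresses 32768 bytes apart map to the same set.<p>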
1896
1897Other noteworthy behaviour:
1898
1899<ul>
1900 <li>References that straddle two cache lines are treated as follows:</li>
1901 <ul>
1902 <li>If both blocks hit --&gt; counted as one hit</li>
1903 <li>If one block hits, the other misses --&gt; counted as one miss</li>
1904 <li>If both blocks miss --&gt; counted as one miss (not two)</li>
1905 </ul><p>
1906
1907 <li>Instructions that modify a memory location (eg. <code>inc</code> and
1908 <code>dec</code>) are counted as doing just a read, ie. a single data
1909 reference. This may seem strange, but since the write can never cause a
1910 miss (the read guarantees the block is in the cache) it's not very
1911 interesting.<p>
1912
1913 Thus it measures not the number of times the data cache is accessed, but
1914 the number of times a data cache miss could occur.<p>
1915 </li>
1916</ul>
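
The straddling rule above can be seen in a toy simulator. The sketch below
models a deliberately tiny cache (4 sets, 2 ways, 16-byte lines) and counts a
reference that touches two lines as at most one miss. The LRU replacement
policy is an assumption, since the text above does not name one, and none of
the names come from the generated simulator files.<p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SETS 4u    /* hypothetical toy geometry */
#define WAYS 2u
#define LINE 16u   /* bytes per line */

typedef struct {
    uint32_t tag[SETS][WAYS];     /* way 0 is the most recently used */
    int      valid[SETS][WAYS];
} Cache;

void cache_init(Cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Touches the line holding addr. Returns 1 on a miss, 0 on a hit. On a
   miss the line is brought in, replacing the least recently used way. */
static int touch_line(Cache *c, uint32_t addr)
{
    uint32_t block = addr / LINE;
    uint32_t set = block % SETS;
    uint32_t tag = block / SETS;
    unsigned w, slot = WAYS - 1u;   /* on a miss, reuse the LRU way */
    int miss = 1;

    for (w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            slot = w;
            miss = 0;
            break;
        }
    }
    /* Shift the more recent entries down and put this line in front. */
    for (w = slot; w > 0; w--) {
        c->tag[set][w] = c->tag[set][w - 1u];
        c->valid[set][w] = c->valid[set][w - 1u];
    }
    c->tag[set][0] = tag;
    c->valid[set][0] = 1;
    return miss;
}

/* One reference of `size` bytes at addr; returns 1 if counted as a miss. */
int cache_ref(Cache *c, uint32_t addr, uint32_t size)
{
    uint32_t last = addr + size - 1u;
    int miss;

    assert(size >= 1u && size <= LINE);
    miss = touch_line(c, addr);
    if (last / LINE != addr / LINE)
        miss |= touch_line(c, last);   /* both lines touched, one count */
    return miss;
}

/* Usage example: returns the misses counted for four references. */
int demo_miss_count(void)
{
    Cache c;
    int misses = 0;

    cache_init(&c);
    misses += cache_ref(&c, 0u, 4u);    /* cold miss on line 0 */
    misses += cache_ref(&c, 4u, 4u);    /* hit: still line 0 */
    misses += cache_ref(&c, 14u, 4u);   /* lines 0 (hit) and 1 (miss) */
    misses += cache_ref(&c, 62u, 4u);   /* lines 3 and 4 both miss: one */
    return misses;
}
```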
1917
1918If you are interested in simulating a cache with different properties, it is
1919not particularly hard to write your own cache simulator, or to modify existing
ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and
<code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who
1922does.
1923
1924
1925<a name="profile"></a>
1926<h3>7.5&nbsp; Profiling programs</h3>
1927Cache profiling is enabled by using the <code>--cachesim=yes</code> option to
1928Valgrind. This automatically turns off Valgrind's memory checking functions,
1929since the cache simulation is slow enough already, and you probably don't want
1930to do both at once.<p>
1931
1932To gather cache profiling information about the program <code>ls -l<code, type:
1933
1934<blockquote><code>valgrind --cachesim=yes ls -l</code></blockquote>
1935
1936The program will execute (slowly). Upon completion, summary statistics
1937that look like this will be printed:
1938
1939<pre>
==31751== I   refs:        27,742,716
==31751== I1  misses:             276
==31751== L2  misses:             275
==31751== I1  miss rate:          0.0%
==31751== L2i miss rate:          0.0%
==31751==
==31751== D   refs:        15,430,290  (10,955,517 rd + 4,474,773 wr)
==31751== D1  misses:          41,185  (    21,905 rd +    19,280 wr)
==31751== L2  misses:          23,085  (     3,987 rd +    19,098 wr)
==31751== D1  miss rate:          0.2% (       0.1%   +       0.4%  )
==31751== L2d miss rate:          0.1% (       0.0%   +       0.4%  )
==31751==
==31751== L2  misses:          23,360  (     4,262 rd +    19,098 wr)
==31751== L2  miss rate:          0.0% (       0.0%   +       0.4%  )
1954</pre>
1955
1956Cache accesses for instruction fetches are summarised first, giving the
1957number of fetches made (this is the number of instructions executed, which
1958can be useful to know in its own right), the number of I1 misses, and the
1959number of L2 instruction (<code>L2i</code>) misses.<p>
1960
1961Cache accesses for data follow. The information is similar to that of the
1962instruction fetches, except that the values are also shown split between reads
1963and writes (note each row's <code>rd</code> and <code>wr</code> values add up
1964to the row's total).<p>
1965
1966Combined instruction and data figures for the L2 cache follow that.<p>
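
The miss rates in the sample are simply misses divided by references. The
sample figures are consistent with the percentage being truncated to one
decimal place rather than rounded (21,905 read misses out of 10,955,517 reads
is just under 0.2%, yet is printed as 0.1%). That is an inference from the
sample output, not a statement about the implementation, and
<code>miss_rate_tenths</code> is a made-up helper.<p>

```c
#include <assert.h>

/* Returns the miss rate in tenths of a percent, truncated. A result
   of 2 corresponds to a printed rate of 0.2%. */
unsigned long long miss_rate_tenths(unsigned long long misses,
                                    unsigned long long refs)
{
    assert(misses <= refs);
    if (refs == 0)
        return 0;
    return misses * 1000ULL / refs;
}
```

Applied to the D1 row of the sample, 41,185 misses in 15,430,290 references
gives 2, matching the 0.2% shown.<p>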
1967
1968
1969<h3>7.6&nbsp; Output file</h3>
1970As well as printing summary information, Valgrind also writes line-by-line
1971cache profiling information to a file named <code>cachegrind.out</code> . This
1972file is human-readable, but is best interpreted by the accompanying program
1973vg_annotate, described in the next section.<p>
1974
1975Things to note about the <code>cachegrind.out</code> file:
1976<ul>
1977 <li>It is written every time <code>valgrind --cachesim=yes</code> is run; it
1978 will automatically overwrite any existing <code>cachegrind.out<code/> in
1979 the current directory.</li>
1980 <li>It can be quite large: <code>ls -l</code> generates a file of about
1981 350KB; browsing a few files and web pages with Konqueror generates a file
1982 of around 10MB.</li>
1983</ul>
1984
1985
1986<a name="annotate"></a>
1987<h3>7.7&nbsp; Annotating C/C++ programs</h3>
1988Before using vg_annotate, it is worth widening your window to be at least
1989120-characters wide if possible, as the output lines can be quite long.<p>
1990
1991To get a function-by-function summary, run <code>vg_annotate</code> in
1992directory containing a <code>cachegrind.out</code> file. The output looks like
1993this:
1994
1995<pre>
1996--------------------------------------------------------------------------------
1997I1 cache: 65536 B, 64 B, 2-way associative
1998D1 cache: 65536 B, 64 B, 2-way associative
1999L2 cache: 262144 B, 64 B, 8-way associative
2000Command: concord vg_to_ucode.c
2001Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
2002Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
2003Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
2004Threshold: 99%
2005Chosen for annotation:
2006Auto-annotation: on
2007
2008--------------------------------------------------------------------------------
2009Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
2010--------------------------------------------------------------------------------
201127,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
2012
2013--------------------------------------------------------------------------------
2014Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
2015--------------------------------------------------------------------------------
20168,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
20175,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
20182,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
20192,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
20202,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
20211,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
2022 897,991 51 51 897,831 95 30 62 1 1 ???:???
2023 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
2024 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
2025 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
2026 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
2027 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
2028 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
2029 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
2030 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
2031 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
2032 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
2033 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue
2034</pre>
2035
2036First up is a summary of the annotation options:
2037
2038<ul>
2039 <li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the
2040 configuration with which these results were obtained.</li><p>
2041
2042 <li>Command: the command line invocation of the program under
2043 examination.</li><p>
2044
2045 <li>Events recorded: event abbreviations are:<p>
2046 <ul>
2047 <li><code>Ir </code>: I cache reads (ie. instructions executed)</li>
2048 <li><code>I1mr</code>: I1 cache read misses</li>
2049 <li><code>I2mr</code>: L2 cache instruction read misses</li>
2050 <li><code>Dr </code>: D cache reads (ie. memory reads)</li>
2051 <li><code>D1mr</code>: D1 cache read misses</li>
2052 <li><code>D2mr</code>: L2 cache data read misses</li>
2053 <li><code>Dw </code>: D cache writes (ie. memory writes)</li>
2054 <li><code>D1mw</code>: D1 cache write misses</li>
2055 <li><code>D2mw</code>: L2 cache data write misses</li>
2056 </ul><p>
      Note that the total number of D1 misses is given by <code>D1mr</code> +
      <code>D1mw</code>, and that the total number of L2 misses is given by
      <code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p>
2060
2061 <li>Events shown: the events shown (a subset of events gathered). This can
2062 be adjusted with the <code>--show</code> option.</li><p>
2063
2064 <li>Event sort order: the sort order in which functions are shown. For
2065 example, in this case the functions are sorted from highest
2066 <code>Ir</code> counts to lowest. If two functions have identical
2067 <code>Ir</code> counts, they will then be sorted by <code>I1mr</code>
2068 counts, and so on. This order can be adjusted with the
2069 <code>--sort</code> option.<p>
2070
2071 Note that this dictates the order the functions appear. It is <b>not</b>
2072 the order in which the columns appear; that is dictated by the "events
      shown" line (and can be changed with the <code>--show</code> option).
2074 </li><p>
2075
2076 <li>Threshold: vg_annotate by default omits functions that cause very low
      numbers of misses to avoid drowning you in information. In this case,
      vg_annotate shows summaries for the functions that account for 99% of the
      <code>Ir</code> counts; <code>Ir</code> is chosen as the threshold event
2080 since it is the primary sort event. The threshold can be adjusted with
2081 the <code>--threshold</code> option.</li><p>
2082
2083 <li>Chosen for annotation: names of files specified manually for annotation;
2084 in this case none.</li><p>
2085
2086 <li>Auto-annotation: whether auto-annotation was requested via the
2087 <code>--auto=yes</code> option. In this case no.</li><p>
2088</ul>
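
The sort order and threshold described in the list above can be sketched as
follows: sort the functions by the primary event with ties broken by the next
one, then keep the leading functions until they account for the threshold
percentage of the total. This is a guess at the general approach for
illustration; how vg_annotate actually does it is not described here, and
<code>FuncCost</code>, <code>select_by_threshold</code> and the sample data
are invented.<p>

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char        *name;   /* file:function label */
    unsigned long long ir;     /* primary sort event */
    unsigned long long i1mr;   /* secondary sort event */
} FuncCost;

/* qsort comparator: highest Ir first, ties broken by highest I1mr. */
static int by_ir_then_i1mr(const void *a, const void *b)
{
    const FuncCost *x = a;
    const FuncCost *y = b;

    if (x->ir != y->ir)
        return x->ir < y->ir ? 1 : -1;
    if (x->i1mr != y->i1mr)
        return x->i1mr < y->i1mr ? 1 : -1;
    return 0;
}

/* Sorts funcs in place and returns how many leading entries are needed
   to cover at least threshold_percent of the total Ir count. */
size_t select_by_threshold(FuncCost *funcs, size_t n,
                           unsigned threshold_percent)
{
    unsigned long long total = 0, running = 0;
    size_t i;

    assert(threshold_percent <= 100);
    qsort(funcs, n, sizeof funcs[0], by_ir_then_i1mr);
    for (i = 0; i < n; i++)
        total += funcs[i].ir;
    for (i = 0; i < n; i++) {
        if (running * 100 >= total * threshold_percent)
            break;                 /* threshold already covered */
        running += funcs[i].ir;
    }
    return i;
}

/* Usage example on four invented functions totalling 1,000 Ir. */
size_t demo_threshold(unsigned percent)
{
    FuncCost f[] = {
        { "c.c:c", 9, 0 }, { "a.c:a", 900, 5 },
        { "d.c:d", 1, 0 }, { "b.c:b", 90, 2 }
    };
    return select_by_threshold(f, sizeof f / sizeof f[0], percent);
}
```

With a 99% threshold only the two largest functions (900 and 90 out of 1,000)
are kept; the remaining two are omitted from the summary.<p>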
2089
Then follow summary statistics for the whole program. These are similar
2091to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
2092
Then follow function-by-function statistics. Each function is identified by a
2094<code>file_name:function_name</code> pair. If a column contains only a
2095`.' it means the function never performs that event (eg. the third row shows
2096that <code>strcmp()</code> contains no instructions that write to memory). The
name <code>???</code> is used if the file name and/or function name could
2098not be determined from debugging information. (If most of the entries have the
2099form <code>???:???</code> the program probably wasn't compiled with
2100<code>-g</code>.)<p>
2101
It is worth noting that functions will come from three types of source files:
<ol>
  <li>From the profiled program (<code>concord.c</code> in this example).</li>
  <li>From libraries (eg. <code>getc.c</code>).</li>
  <li>From Valgrind's implementation of some libc functions (eg.
      <code>vg_clientmalloc.c:malloc</code>). These are recognisable because
      the filename begins with <code>vg_</code>, and is probably one of
      <code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or
      <code>vg_mylibc.c</code>.
  </li>
</ol>

There are two ways to annotate source files -- by choosing them manually, or
with the <code>--auto=yes</code> option. To do it manually, just specify the
filenames as arguments to vg_annotate. For example, running
<code>vg_annotate concord.c</code> for our example produces the same output as
above, followed by an annotated version of <code>concord.c</code>, a section
of which looks like:

<pre>
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
     Ir I1mr I2mr     Dr D1mr D2mr     Dw D1mw D2mw

[snip]

      .    .    .      .    .    .      .    .    .  void init_hash_table(char *file_name, Word_Node *table[])
      3    1    1      .    .    .      1    0    0  {
      .    .    .      .    .    .      .    .    .      FILE *file_ptr;
      .    .    .      .    .    .      .    .    .      Word_Info *data;
      1    0    0      .    .    .      1    1    1      int line = 1, i;
      .    .    .      .    .    .      .    .    .
      5    0    0      .    .    .      3    0    0      data = (Word_Info *) create(sizeof(Word_Info));
      .    .    .      .    .    .      .    .    .
  4,991    0    0  1,995    0    0    998    0    0      for (i = 0; i < TABLE_SIZE; i++)
  3,988    1    1  1,994    0    0    997   53   52          table[i] = NULL;
      .    .    .      .    .    .      .    .    .
      .    .    .      .    .    .      .    .    .      /* Open file, check it. */
      6    0    0      1    0    0      4    0    0      file_ptr = fopen(file_name, "r");
      2    0    0      1    0    0      .    .    .      if (!(file_ptr)) {
      .    .    .      .    .    .      .    .    .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
      1    1    1      .    .    .      .    .    .          exit(EXIT_FAILURE);
      .    .    .      .    .    .      .    .    .      }
      .    .    .      .    .    .      .    .    .
165,062    1    1 73,360    0    0 91,700    0    0      while ((line = get_word(data, line, file_ptr)) != EOF)
146,712    0    0 73,356    0    0 73,356    0    0          insert(data->word, data->line, table);
      .    .    .      .    .    .      .    .    .
      4    0    0      1    0    0      2    0    0      free(data);
      4    0    0      1    0    0      2    0    0      fclose(file_ptr);
      3    0    0      2    0    0      .    .    .  }
</pre>

(Although column widths are automatically minimised, a wide terminal is clearly
useful.)<p>

Each source file is clearly marked (<code>User-annotated source</code>) as
having been chosen manually for annotation. If the file was found in one of
the directories specified with the <code>-I</code>/<code>--include</code>
option, the directory and file are both given.<p>

Each line is annotated with its event counts. Events not applicable for a line
are represented by a `.'; this is useful for distinguishing between an event
which cannot happen, and one which can but did not.<p>
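If you post-process annotated output with your own scripts, the distinction
between `.' and 0 is worth preserving. The Python sketch below shows one way
to do that. <code>parse_count</code> is a hypothetical helper, not part of
Valgrind, and it assumes counts use commas as thousands separators as in the
listing above.<p>

```python
def parse_count(cell):
    """Convert one column of an annotated line to an int, or None for '.'.

    None means the event cannot happen on that line; 0 means it could
    have happened but did not.
    """
    if cell == ".":
        return None
    return int(cell.replace(",", ""))


# Counts copied from the "for (i = 0; ...)" and "if (!(file_ptr))" lines above.
print([parse_count(c) for c in "4,991 0 0 1,995 0 0 998 0 0".split()])
print([parse_count(c) for c in "2 0 0 1 0 0 . . .".split()])
```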

Sometimes only a small section of a source file is executed. To minimise
uninteresting output, Valgrind only shows annotated lines and lines within a
small distance of annotated lines. Gaps are marked with the line numbers so
you know which part of a file the shown code comes from, eg:

<pre>
(figures and code for line 704)
-- line 704 ----------------------------------------
-- line 878 ----------------------------------------
(figures and code for line 878)
</pre>

The amount of context to show around annotated lines is controlled by the
<code>--context</code> option.<p>
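The effect of the context setting can be sketched in Python as follows. This
is only an illustration of the idea, not vg_annotate's implementation:
<code>lines_to_show</code> is a hypothetical name, and the way neighbouring
ranges are merged is an assumption. The annotated line numbers in the example
are made up so that the gap falls between lines 704 and 878, as in the output
above, using the default context of 8 lines.<p>

```python
def lines_to_show(annotated, context=8, last_line=None):
    """Return sorted (start, end) line ranges to print, both ends inclusive.

    annotated is an iterable of line numbers that have event counts.
    Each annotated line is widened by `context` lines on both sides, and
    overlapping or adjacent ranges are merged. Whatever lies between two
    ranges is a gap.
    """
    ranges = []
    for line in sorted(set(annotated)):
        start = max(1, line - context)
        end = line + context
        if last_line is not None:
            end = min(end, last_line)
        if ranges and start <= ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


# Hypothetical annotated lines 696 and 886, default context of 8.
print(lines_to_show([696, 886]))  # [(688, 704), (878, 894)]
```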

To get automatic annotation, run <code>vg_annotate --auto=yes</code>.
vg_annotate will automatically annotate every source file it can find that is
mentioned in the function-by-function summary. Therefore, the files chosen for
auto-annotation are affected by the <code>--sort</code> and
<code>--threshold</code> options. Each source file is clearly marked
(<code>Auto-annotated source</code>) as being chosen automatically. Any files
that could not be found are mentioned at the end of the output, eg:

<pre>
--------------------------------------------------------------------------------
The following files chosen for auto-annotation could not be found:
--------------------------------------------------------------------------------
  getc.c
  ctype.c
  ../sysdeps/generic/lockfile.c
</pre>

This is quite common for library files, since libraries are usually compiled
with debugging information, but the source files are often not present on a
system. If a file is chosen for annotation <b>both</b> manually and
automatically, it is marked as <code>User-annotated source</code>.<p>

Use the <code>-I/--include</code> option to tell Valgrind where to look for
source files if the filenames found from the debugging information aren't
specific enough.<p>
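Conceptually, the include directories act as a search path. A minimal Python
sketch of such a lookup is given below. <code>find_source</code> is a
hypothetical helper, and the search order (the name as recorded in the debug
info first, then each directory in the order given) is an assumption rather
than documented vg_annotate behaviour.<p>

```python
import os


def find_source(filename, include_dirs=()):
    """Return the first existing path for filename, or None if not found.

    The name is tried as given, then relative to each include directory in
    the order supplied, mirroring repeated -I/--include options.
    """
    candidates = [filename]
    candidates += [os.path.join(d, filename) for d in include_dirs]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# The directories here are hypothetical; adjust them to your own source tree.
print(find_source("getc.c", ["/usr/src/glibc/libio", "."]))
```

A result of <code>None</code> corresponds to a file that would be listed
under "could not be found" at the end of the output.<p>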

Beware that vg_annotate can take some time to digest large
<code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that
auto-annotation can produce a lot of output if your program is large!


<h3>7.8&nbsp; Annotating assembler programs</h3>
Valgrind can annotate assembler programs too, or annotate the assembler
generated for your C program. Sometimes this is useful for understanding what
is really happening when an interesting line of C code is translated into
multiple instructions.<p>

To do this, you just need to assemble your <code>.s</code> files with
assembler-level debug information. gcc doesn't do this, but you can use GNU as
with the <code>--gstabs</code> option to generate object files with this
information, eg:

<blockquote><code>as --gstabs foo.s</code></blockquote>

You can then profile and annotate source files in the same way as for C/C++
programs.


<h3>7.9&nbsp; vg_annotate options</h3>
<ul>
  <li><code>-h, --help</code></li><p>
  <li><code>-v, --version</code><p>

      Help and version, as usual.</li>

  <li><code>--sort=A,B,C</code> [default: order in
      <code>cachegrind.out</code>]<p>
      Specifies the events upon which the sorting of the function-by-function
      entries will be based. Useful if you want to concentrate on eg. I cache
      misses (<code>--sort=I1mr,I2mr</code>), or D cache misses
      (<code>--sort=D1mr,D2mr</code>), or L2 misses
      (<code>--sort=D2mr,I2mr</code>).</li><p>

  <li><code>--show=A,B,C</code> [default: all, using order in
      <code>cachegrind.out</code>]<p>
      Specifies which events to show (and the column order). Default is to use
      all present in the <code>cachegrind.out</code> file (and use the order in
      the file).</li><p>

  <li><code>--threshold=X</code> [default: 99%] <p>
      Sets the threshold for the function-by-function summary. Functions are
      shown that account for more than X% of all the primary sort events. If
      auto-annotating, also affects which files are annotated.</li><p>

  <li><code>--auto=no</code> [default]<br>
      <code>--auto=yes</code> <p>
      When enabled, automatically annotates every file that is mentioned in the
      function-by-function summary and that can be found. Also gives a list of
      those that couldn't be found.</li><p>

  <li><code>--context=N</code> [default: 8]<p>
      Print N lines of context before and after each annotated line. Avoids
      printing large sections of source files that were not executed. Use a
      large number (eg. 10,000) to show all source lines.
  </li><p>

  <li><code>-I=&lt;dir&gt;, --include=&lt;dir&gt;</code>
      [default: empty string]<p>
      Adds a directory to the list in which to search for files. Multiple
      -I/--include options can be given to add multiple directories.</li><p>
</ul>
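To make the effect of <code>--sort=A,B,C</code> in the list above more
concrete, here is a small Python sketch of a multi-key sort over function
entries. It is an illustration only: <code>sort_functions</code> and the
counts are hypothetical, and treating the second and later events purely as
tie-breakers is an assumption about how the option behaves.<p>

```python
def sort_functions(rows, sort_events):
    """Order function entries by the given events, largest counts first.

    rows maps "file:function" to a dict of event counts. Later events in
    sort_events only break ties between entries that are equal on the
    earlier ones (an assumption about how --sort=A,B,C treats extra keys).
    """
    def key(item):
        _, counts = item
        return tuple(-counts.get(event, 0) for event in sort_events)

    return [name for name, _ in sorted(rows.items(), key=key)]


# Hypothetical counts for three functions.
rows = {
    "concord.c:insert": {"Ir": 500, "D1mr": 40, "D2mr": 2},
    "concord.c:get_word": {"Ir": 900, "D1mr": 40, "D2mr": 7},
    "getc.c:getc": {"Ir": 100, "D1mr": 55, "D2mr": 0},
}
print(sort_functions(rows, ["D1mr", "D2mr"]))
```

This mirrors what you would ask for with
<code>vg_annotate --sort=D1mr,D2mr</code>: the entry with the most D1 read
misses comes first, and <code>D2mr</code> decides between the two entries that
tie on <code>D1mr</code>.<p>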


<h3>7.10&nbsp; Warnings</h3>
There are a couple of situations in which vg_annotate issues warnings.

<ul>
  <li>If a source file is more recent than the <code>cachegrind.out</code>
      file. The warning is issued because the information in
      <code>cachegrind.out</code> is only recorded with line numbers, so if
      the line numbers change at all in the source (eg. lines added, deleted,
      swapped), any annotations will be incorrect.</li><p>

  <li>If information is recorded about line numbers past the end of a file.
      This can be caused by the above problem, ie. shortening the source file
      while using an old <code>cachegrind.out</code> file. If this happens,
      the figures for the bogus lines are printed anyway (clearly marked as
      bogus) in case they are important.</li><p>
</ul>
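You can check for the first situation yourself before running vg_annotate.
The Python sketch below compares file modification times, which is an
assumption about what "more recent" means here. <code>stale_sources</code> is
a hypothetical helper and not part of Valgrind.<p>

```python
import os


def stale_sources(profile_path, source_paths):
    """Return the source files modified more recently than the profile."""
    profile_time = os.path.getmtime(profile_path)
    return [p for p in source_paths
            if os.path.exists(p) and os.path.getmtime(p) > profile_time]


if __name__ == "__main__":
    # File names follow the examples in this manual; the check is skipped
    # if there is no cachegrind.out in the current directory.
    if os.path.exists("cachegrind.out"):
        for path in stale_sources("cachegrind.out", ["concord.c"]):
            print("warning: %s is newer than cachegrind.out" % path)
```

If a file is reported, re-run the profile before trusting the per-line
figures.<p>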


<h3>7.10&nbsp; Things to watch out for</h3>
Some odd things that can occur during annotation:

<ul>
  <li>If annotating at the assembler level, you might see something like this:

  <pre>
      1  0  0  .  .  .  .  .  .         leal -12(%ebp),%eax
      1  0  0  .  .  .  1  0  0         movl %eax,84(%ebx)
      2  0  0  0  0  0  1  0  0         movl $1,-20(%ebp)
      .  .  .  .  .  .  .  .  .         .align 4,0x90
      1  0  0  .  .  .  .  .  .         movl $.LnrB,%eax
      1  0  0  .  .  .  1  0  0         movl %eax,-16(%ebp)
  </pre>

  How can the third instruction be executed twice when the others are
  executed only once? As it turns out, it isn't. Here's a dump of the
  executable, from objdump:

  <pre>
      8048f25:    8d 45 f4                lea    0xfffffff4(%ebp),%eax
      8048f28:    89 43 54                mov    %eax,0x54(%ebx)
      8048f2b:    c7 45 ec 01 00 00 00    movl   $0x1,0xffffffec(%ebp)
      8048f32:    89 f6                   mov    %esi,%esi
      8048f34:    b8 08 8b 07 08          mov    $0x8078b08,%eax
      8048f39:    89 45 f0                mov    %eax,0xfffffff0(%ebp)
  </pre>

  Notice the extra <code>mov %esi,%esi</code> instruction. Where did this
  come from? The GNU assembler inserted it to serve as the two bytes of
  padding needed to align the <code>movl $.LnrB,%eax</code> instruction on
  a four-byte boundary, but pretended it didn't exist when adding debug
  information. Thus when Valgrind reads the debug info it thinks that the
  <code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address
  range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the
  <code>mov %esi,%esi</code> to it.<p>
  </li>

  <li>
  Inlined functions can cause strange results in the function-by-function
  summary. If a function <code>inline_me()</code> is defined in
  <code>foo.h</code> and inlined in the functions <code>f1()</code>,
  <code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will
  not be a <code>foo.h:inline_me()</code> function entry. Instead, there
  will be separate function entries for each inlining site, ie.
  <code>foo.h:f1()</code>, <code>foo.h:f2()</code> and
  <code>foo.h:f3()</code>. To find the total counts for
  <code>foo.h:inline_me()</code>, add up the counts from each entry.<p>

  The reason for this is that although the debug info output by gcc
  indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it
  doesn't indicate the name of the function in <code>foo.h</code>, so
  Valgrind keeps using the old one.<p>
  </li>

  <li>
  Sometimes, the same filename might be represented with a relative name
  and with an absolute name in different parts of the debug info, eg:
  <code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this
  case, if you use auto-annotation, the file will be annotated twice with
  the counts split between the two.<p>
  </li>
</ul>
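For the first item in the list above, the address arithmetic can be checked
with a few lines of Python. This is only a worked calculation using the
addresses and instruction bytes from the objdump listing;
<code>padding_bytes</code> is a hypothetical helper name.<p>

```python
def padding_bytes(end_address, alignment=4):
    """Number of bytes needed to round end_address up to the alignment."""
    return (-end_address) % alignment


movl_start = 0x8048F2B   # address of movl $0x1,0xffffffec(%ebp)
movl_length = 7          # its bytes: c7 45 ec 01 00 00 00
movl_end = movl_start + movl_length   # first byte after the movl

pad = padding_bytes(movl_end)         # filled by mov %esi,%esi (89 f6)
next_instr = movl_end + pad           # the aligned movl $.LnrB,%eax

print(hex(movl_end), pad, hex(next_instr))  # 0x8048f32 2 0x8048f34
# Last byte that the debug info attributes to the movl source line:
print(hex(next_instr - 1))                  # 0x8048f33
```

The two padding bytes at 0x8048f32 and 0x8048f33 are exactly the
<code>mov %esi,%esi</code> instruction, which is why its execution is counted
against the preceding source line.<p>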

Note: stabs is not an easy format to read. If you come across bizarre
annotations that look like they might be caused by a bug in the stabs reader,
please let us know.


<h3>7.11&nbsp; Accuracy</h3>
Valgrind's cache profiling has a number of shortcomings:

<ul>
  <li>It doesn't account for kernel activity -- the effect of system calls on
      the cache contents is ignored.</li><p>

  <li>It doesn't account for other process activity (although this is probably
      desirable when considering a single program).</li><p>

  <li>It doesn't account for virtual-to-physical address mappings; hence the
      entire simulation is not a true representation of what's happening in the
      cache.</li><p>

  <li>It doesn't account for cache misses not visible at the instruction level,
      eg. those arising from TLB misses, or speculative execution.</li><p>
</ul>
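To see why the virtual-to-physical mapping matters, consider how an address
selects a cache set. The Python sketch below assumes the simulation indexes
the cache with the virtual addresses the program uses. The cache geometry,
the 4KB page size and the page mapping are all hypothetical values chosen for
illustration.<p>

```python
def set_index(address, cache_size, line_size, associativity):
    """Index of the cache set that an address maps to."""
    num_sets = cache_size // (line_size * associativity)
    return (address // line_size) % num_sets


# Hypothetical 256KB, 8-way cache with 64-byte lines, and 4KB pages.
L2 = dict(cache_size=256 * 1024, line_size=64, associativity=8)

virtual = 0x00005040    # virtual page 0x5, offset 0x040
physical = 0x00013040   # the same offset in a made-up physical page 0x13

print(set_index(virtual, **L2), set_index(physical, **L2))  # 321 193
```

Because this cache has more sets than fit within one page's worth of
addresses, the same byte lands in a different set depending on which address
is used, so conflict misses in a physically-indexed cache can differ from the
simulated ones.<p>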

Another thing worth noting is that results are very sensitive to small
changes. Changing the size of the <code>valgrind.so</code> file, the size of
the program being profiled, or even the length of its name can perturb the
results. Variations will be small, but don't expect perfectly repeatable
results if your program changes at all.<p>

While these factors mean you shouldn't trust the results to be super-accurate,
hopefully they should be close enough to be useful.<p>


<h3>7.12&nbsp; Todo</h3>
<ul>
  <li>Use the CPUID instruction to auto-identify the cache configuration
      during installation. This would save the user from having to know their
      cache configuration and from having to use vg_cachegen.</li><p>
  <li>Program start-up/shut-down calls a lot of functions that aren't
      interesting and just complicate the output. It would be nice to exclude
      these somehow.</li><p>
</ul>
<hr width="100%">
</body>
</html>