| <html> |
| <head> |
| <style type="text/css"> |
| body { background-color: #ffffff; |
| color: #000000; |
| font-family: Times, Helvetica, Arial; |
| font-size: 14pt} |
| h4 { margin-bottom: 0.3em} |
| code { color: #000000; |
| font-family: Courier; |
| font-size: 13pt } |
| pre { color: #000000; |
| font-family: Courier; |
| font-size: 13pt } |
| a:link { color: #0000C0; |
| text-decoration: none; } |
| a:visited { color: #0000C0; |
| text-decoration: none; } |
| a:active { color: #0000C0; |
| text-decoration: none; } |
| </style> |
| </head> |
| |
| <body bgcolor="#ffffff"> |
| |
| <a name="title"> </a> |
| <h1 align=center>Valgrind, snapshot 20020501</h1> |
| <center>This manual received a major update on 20020501</center> |
| <p> |
| |
| <center> |
| <a href="mailto:jseward@acm.org">jseward@acm.org</a><br> |
| Copyright © 2000-2002 Julian Seward |
| <p> |
| Valgrind is licensed under the GNU General Public License, |
| version 2<br> |
| An open-source tool for finding memory-management problems in |
| Linux-x86 executables. |
| </center> |
| |
| <p> |
| |
| <hr width="100%"> |
| <a name="contents"></a> |
| <h2>Contents of this manual</h2> |
| |
| <h4>1 <a href="#intro">Introduction</a></h4> |
| 1.1 <a href="#whatfor">What Valgrind is for</a><br> |
| 1.2 <a href="#whatdoes">What it does with your program</a> |
| |
| <h4>2 <a href="#howtouse">How to use it, and how to make sense |
| of the results</a></h4> |
| 2.1 <a href="#starta">Getting started</a><br> |
| 2.2 <a href="#comment">The commentary</a><br> |
| 2.3 <a href="#report">Reporting of errors</a><br> |
| 2.4 <a href="#suppress">Suppressing errors</a><br> |
| 2.5 <a href="#flags">Command-line flags</a><br> |
| 2.6 <a href="#errormsgs">Explanation of error messages</a><br> |
| 2.7 <a href="#suppfiles">Writing suppressions files</a><br> |
| 2.8 <a href="#clientreq">The Client Request mechanism</a><br> |
| 2.9 <a href="#pthreads">Support for POSIX pthreads</a><br> |
| 2.10 <a href="#install">Building and installing</a><br> |
| 2.11 <a href="#problems">If you have problems</a><br> |
| |
| <h4>3 <a href="#machine">Details of the checking machinery</a></h4> |
| 3.1 <a href="#vvalue">Valid-value (V) bits</a><br> |
| 3.2 <a href="#vaddress">Valid-address (A) bits</a><br> |
| 3.3 <a href="#together">Putting it all together</a><br> |
| 3.4 <a href="#signals">Signals</a><br> |
| 3.5 <a href="#leaks">Memory leak detection</a><br> |
| |
| <h4>4 <a href="#limits">Limitations</a></h4> |
| |
| <h4>5 <a href="#howitworks">How it works -- a rough overview</a></h4> |
| 5.1 <a href="#startb">Getting started</a><br> |
| 5.2 <a href="#engine">The translation/instrumentation engine</a><br> |
| 5.3 <a href="#track">Tracking the status of memory</a><br> |
| 5.4 <a href="#sys_calls">System calls</a><br> |
| 5.5 <a href="#sys_signals">Signals</a><br> |
| |
| <h4>6 <a href="#example">An example</a></h4> |
| |
| <h4>7 <a href="#cache">Cache profiling</a></h4> |
| |
| <h4>8 <a href="techdocs.html">The design and implementation of Valgrind</a></h4> |
| |
| <hr width="100%"> |
| |
| <a name="intro"></a> |
| <h2>1 Introduction</h2> |
| |
| <a name="whatfor"></a> |
| <h3>1.1 What Valgrind is for</h3> |
| |
| Valgrind is a tool to help you find memory-management problems in your |
| programs. When a program is run under Valgrind's supervision, all |
| reads and writes of memory are checked, and calls to |
| malloc/new/free/delete are intercepted. As a result, Valgrind can |
| detect problems such as: |
| <ul> |
| <li>Use of uninitialised memory</li> |
| <li>Reading/writing memory after it has been free'd</li> |
| <li>Reading/writing off the end of malloc'd blocks</li> |
| <li>Reading/writing inappropriate areas on the stack</li> |
| <li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li> |
| </ul> |
| |
| Problems like these can be difficult to find by other means, often |
| lying undetected for long periods, then causing occasional, |
| difficult-to-diagnose crashes. |
| |
| <p> |
| Valgrind is closely tied to details of the CPU, operating system and |
| to a lesser extent, compiler and basic C libraries. This makes it |
| difficult to make it portable, so I have chosen at the outset to |
| concentrate on what I believe to be a widely used platform: Red Hat |
| Linux 7.2, on x86s. Valgrind uses the standard Unix |
| <code>./configure</code>, <code>make</code>, <code>make install</code> |
| mechanism, and I have attempted to ensure that it works on machines |
| with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover |
| the vast majority of modern Linux installations. |
| |
| |
| <p> |
| Valgrind is licensed under the GNU General Public License, version |
| 2. Read the file LICENSE in the source distribution for details. |
| |
| <a name="whatdoes"></a> |
| <h3>1.2 What it does with your program</h3> |
| |
| Valgrind is designed to be as non-intrusive as possible. It works |
| directly with existing executables. You don't need to recompile, |
| relink, or otherwise modify, the program to be checked. Simply place |
| the word <code>valgrind</code> at the start of the command line |
| normally used to run the program. So, for example, if you want to run |
| the command <code>ls -l</code> on Valgrind, simply issue the |
| command: <code>valgrind ls -l</code>. |
| |
| <p>Valgrind takes control of your program before it starts. Debugging |
| information is read from the executable and associated libraries, so |
| that error messages can be phrased in terms of source code |
| locations. Your program is then run on a synthetic x86 CPU which |
| checks every memory access. All detected errors are written to a |
| log. When the program finishes, Valgrind searches for and reports on |
| leaked memory. |
| |
| <p>You can run pretty much any dynamically linked ELF x86 executable |
| using Valgrind. Programs run 25 to 50 times slower, and take a lot |
| more memory, than they usually would. It works well enough to run |
| large programs. For example, the Konqueror web browser from the KDE |
| Desktop Environment, version 3.0, runs slowly but usably on Valgrind. |
| |
| <p>Valgrind simulates every single instruction your program executes. |
| Because of this, it finds errors not only in your application but also |
| in all supporting dynamically-linked (<code>.so</code>-format) |
| libraries, including the GNU C library, the X client libraries, Qt, if |
| you work with KDE, and so on. That often includes libraries, for |
| example the GNU C library, which contain memory access violations, but |
| which you cannot or do not want to fix. |
| |
| <p>Rather than swamping you with errors in which you are not |
| interested, Valgrind allows you to selectively suppress errors, by |
| recording them in a suppressions file which is read when Valgrind |
| starts up. The build mechanism attempts to select suppressions which |
| give reasonable behaviour for the libc and XFree86 versions detected |
| on your machine. |
| |
| |
| <p><a href="#example">Section 6</a> shows an example of use. |
| <p> |
| <hr width="100%"> |
| |
| <a name="howtouse"></a> |
| <h2>2 How to use it, and how to make sense of the results</h2> |
| |
| <a name="starta"></a> |
| <h3>2.1 Getting started</h3> |
| |
| First off, consider whether it might be beneficial to recompile your |
| application and supporting libraries with optimisation disabled and |
| debugging info enabled (the <code>-g</code> flag). You don't have to |
| do this, but doing so helps Valgrind produce more accurate and less |
| confusing error reports. Chances are you're set up like this already, |
| if you intended to debug your program with GNU gdb, or some other |
| debugger. |
| |
| <p> |
| A plausible compromise is to use <code>-g -O</code>. |
| Optimisation levels above <code>-O</code> have been observed, on very |
| rare occasions, to cause gcc to generate code which fools Valgrind's |
| error tracking machinery into wrongly reporting uninitialised value |
| errors. <code>-O</code> gets you the vast majority of the benefits of |
| higher optimisation levels anyway, so you don't lose much there. |
| |
| <p> |
| Note that as of 1 May 2002 Valgrind does not understand the DWARF |
| debugging format, which is unfortunate since the upcoming gcc-3.1 uses |
| it by default. Valgrind only knows about the older "stabs" format. |
| If you use gcc-3.1 or above, you can still ask for stabs-format debug |
| info by passing <code>-gstabs</code> to gcc. |
| |
| <p> |
| Then just run your application, but place the word |
| <code>valgrind</code> in front of your usual command-line invocation. |
| Note that you should run the real (machine-code) executable here. If |
| your application is started by, for example, a shell or perl script, |
| you'll need to modify it to invoke Valgrind on the real executables. |
| Running such scripts directly under Valgrind will result in you |
| getting error reports pertaining to <code>/bin/sh</code>, |
| <code>/usr/bin/perl</code>, or whatever interpreter you're using. |
| This almost certainly isn't what you want and can be confusing. |
| |
| <a name="comment"></a> |
| <h3>2.2 The commentary</h3> |
| |
| Valgrind writes a commentary, detailing error reports and other |
| significant events. The commentary goes to standard error by |
| default. This may interfere with your program, so you can ask for it |
| to be directed elsewhere. |
| |
| <p>All lines in the commentary are of the following form:<br> |
| <pre> |
| ==12345== some-message-from-Valgrind |
| </pre> |
| <p>The <code>12345</code> is the process ID. This scheme makes it easy |
| to distinguish program output from Valgrind commentary, and also easy |
| to differentiate commentaries from different processes which have |
| become merged together, for whatever reason. |
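| |
| <p>As a rough sketch of how this prefix can be exploited, the following |
| Python fragment separates a merged log into per-process commentary and |
| ordinary program output. The name <code>split_commentary</code> and the |
| sample lines are illustrative assumptions, not part of Valgrind. The |
| pattern also accepts the <code>--PID--</code> prefix that appears on some |
| verbose-mode lines. |

```python
import re
from collections import defaultdict

# One commentary line: "==<pid>== <message>" or "--<pid>-- <message>".
COMMENTARY_LINE = re.compile(r"^(==|--)(\d+)\1 ?(.*)$")


def split_commentary(lines):
    """Separate commentary (grouped by process ID) from program output."""
    by_pid = defaultdict(list)
    program_output = []
    for line in lines:
        match = COMMENTARY_LINE.match(line)
        if match:
            by_pid[int(match.group(2))].append(match.group(3))
        else:
            program_output.append(line)
    return dict(by_pid), program_output


if __name__ == "__main__":
    # Hypothetical merged log from two processes plus program output.
    merged = [
        "==25832== Invalid read of size 4",
        "hello from the program",
        "==25840== Invalid write of size 1",
        "==25832==    at 0x8048724: main (bogon.cpp:66)",
    ]
    commentary, output = split_commentary(merged)
    for pid, messages in commentary.items():
        print(pid, messages)
    print(output)
```

| <p>Grouping on the process ID keeps each process's reports together even |
| when the lines arrive interleaved. |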
| |
| <p>By default, Valgrind writes only essential messages to the commentary, |
| so as to avoid flooding you with information of secondary importance. |
| If you want more information about what is happening, re-run, passing |
| the <code>-v</code> flag to Valgrind. |
| |
| |
| <a name="report"></a> |
| <h3>2.3 Reporting of errors</h3> |
| |
| When Valgrind detects something bad happening in the program, an error |
| message is written to the commentary. For example:<br> |
| <pre> |
| ==25832== Invalid read of size 4 |
| ==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45) |
| ==25832== by 0x80487AF: main (bogon.cpp:66) |
| ==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129) |
| ==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon) |
| ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd |
| </pre> |
| |
| <p>This message says that the program did an illegal 4-byte read of |
| address 0xBFFFF74C, which, as far as it can tell, is not a valid stack |
| address, nor corresponds to any currently malloc'd or free'd blocks. |
| The read is happening at line 45 of <code>bogon.cpp</code>, called |
| from line 66 of the same file, etc. For errors associated with an |
| identified malloc'd/free'd block, for example reading free'd memory, |
| Valgrind reports not only the location where the error happened, but |
| also where the associated block was malloc'd/free'd. |
| |
| <p>Valgrind remembers all error reports. When an error is detected, |
| it is compared against old reports, to see if it is a duplicate. If |
| so, the error is noted, but no further commentary is emitted. This |
| avoids you being swamped with bazillions of duplicate error reports. |
| |
| <p>If you want to know how many times each error occurred, run with |
| the <code>-v</code> option. When execution finishes, all the reports |
| are printed out, along with, and sorted by, their occurrence counts. |
| This makes it easy to see which errors have occurred most frequently. |
| |
| <p>Errors are reported before the associated operation actually |
| happens. For example, if your program decides to read from address |
| zero, Valgrind will emit a message to this effect, and the program |
| will then duly die with a segmentation fault. |
| |
| <p>In general, you should try and fix errors in the order that they |
| are reported. Not doing so can be confusing. For example, a program |
| which copies uninitialised values to several memory locations, and |
| later uses them, will generate several error messages. The first such |
| error message may well give the most direct clue to the root cause of |
| the problem. |
| |
| <p>The process of detecting duplicate errors is quite an expensive |
| one and can become a significant performance overhead if your program |
| generates huge quantities of errors. To avoid serious problems here, |
| Valgrind will simply stop collecting errors after 300 different errors |
| have been seen, or 30000 errors in total have been seen. In this |
| situation you might as well stop your program and fix it, because |
| Valgrind won't tell you anything else useful after this. Note that |
| the 300/30000 limits apply after suppressed errors are removed. These |
| limits are defined in <code>vg_include.h</code> and can be increased |
| if necessary. |
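| |
| <p>The bookkeeping described in this section can be modelled roughly as |
| follows. This is a toy sketch, not Valgrind's actual data structures: the |
| class <code>ErrorCollector</code> and its methods are hypothetical, and |
| the sort direction of the summary is an assumption. The limits of 300 and |
| 30000 are the ones quoted above, and duplicates are matched on the top |
| three function locations, as the description of |
| <code>--num-callers</code> states. |

```python
MAX_DIFFERENT_ERRORS = 300   # distinct-error limit quoted in the text
MAX_TOTAL_ERRORS = 30000     # total-error limit quoted in the text
FRAMES_COMPARED = 3          # duplicates are matched on the top three locations


class ErrorCollector:
    """Toy model of duplicate detection and the collection limits."""

    def __init__(self, is_suppressed=lambda kind, frames: False):
        self.counts = {}      # (kind, top frames) -> occurrence count
        self.total = 0        # unsuppressed errors seen so far
        self.stopped = False  # set once either limit is reached
        self.is_suppressed = is_suppressed

    def record(self, kind, frames):
        """Return True if this error would be shown as a new report."""
        # Suppressed errors are dropped first, so they never count
        # towards either limit.
        if self.stopped or self.is_suppressed(kind, frames):
            return False
        key = (kind, tuple(frames[:FRAMES_COMPARED]))
        is_new = key not in self.counts
        self.counts[key] = self.counts.get(key, 0) + 1
        self.total += 1
        if (len(self.counts) >= MAX_DIFFERENT_ERRORS
                or self.total >= MAX_TOTAL_ERRORS):
            self.stopped = True
        return is_new

    def summary(self):
        """Errors sorted by occurrence count (ascending order is assumed)."""
        return sorted(self.counts.items(), key=lambda item: item[1])
```

| <p>In this model a duplicate still increments its count, which is what |
| lets a <code>-v</code>-style summary report how often each error |
| occurred. |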
| |
| <a name="suppress"></a> |
| <h3>2.4 Suppressing errors</h3> |
| |
| Valgrind detects numerous problems in the base libraries, such as the |
| GNU C library, and the XFree86 client libraries, which come |
| pre-installed on your GNU/Linux system. You can't easily fix these, |
| but you don't want to see these errors (and yes, there are many!) So |
| Valgrind reads a list of errors to suppress at startup. |
| A default suppression file is cooked up by the |
| <code>./configure</code> script. |
| |
| <p>You can modify and add to the suppressions file at your leisure, |
| or, better, write your own. Multiple suppression files are allowed. |
| This is useful if part of your project contains errors you can't or |
| don't want to fix, yet you don't want to continuously be reminded of |
| them. |
| |
| <p>Each error to be suppressed is described very specifically, to |
| minimise the possibility that a suppression-directive inadvertently |
| suppresses a bunch of similar errors which you did want to see. The |
| suppression mechanism is designed to allow precise yet flexible |
| specification of errors to suppress. |
| |
| <p>If you use the <code>-v</code> flag, at the end of execution, Valgrind |
| prints out one line for each used suppression, giving its name and the |
| number of times it got used. Here's the suppressions used by a run of |
| <code>ls -l</code>: |
| <pre> |
| --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r |
| --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r |
| --27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object |
| </pre> |
| |
| <a name="flags"></a> |
| <h3>2.5 Command-line flags</h3> |
| |
| You invoke Valgrind like this: |
| <pre> |
| valgrind [options-for-Valgrind] your-prog [options for your-prog] |
| </pre> |
| |
| <p>Note that Valgrind also reads options from the environment variable |
| <code>$VALGRIND</code>, and processes them before the command-line |
| options. |
| |
| <p>Valgrind's default settings succeed in giving reasonable behaviour |
| in most cases. Available options, in no particular order, are as |
| follows: |
| <ul> |
| <li><code>--help</code></li><br> |
| |
| <li><code>--version</code><br> |
| <p>The usual deal.</li><br><p> |
| |
| <li><code>-v --verbose</code><br> |
| <p>Be more verbose. Gives extra information on various aspects |
| of your program, such as: the shared objects loaded, the |
| suppressions used, the progress of the instrumentation engine, |
| and warnings about unusual behaviour. |
| </li><br><p> |
| |
| <li><code>-q --quiet</code><br> |
| <p>Run silently, and only print error messages. Useful if you |
| are running regression tests or have some other automated test |
| machinery. |
| </li><br><p> |
| |
| <li><code>--demangle=no</code><br> |
| <code>--demangle=yes</code> [the default] |
| <p>Disable/enable automatic demangling (decoding) of C++ names. |
| Enabled by default. When enabled, Valgrind will attempt to |
| translate encoded C++ procedure names back to something |
| approaching the original. The demangler handles symbols mangled |
| by g++ versions 2.X and 3.X. |
| |
| <p>An important fact about demangling is that function |
| names mentioned in suppressions files should be in their mangled |
| form. Valgrind does not demangle function names when searching |
| for applicable suppressions, because to do otherwise would make |
| suppressions file contents dependent on the state of Valgrind's |
| demangling machinery, and would also be slow and pointless. |
| </li><br><p> |
| |
| <li><code>--num-callers=&lt;number&gt;</code> [default=4]<br> |
| <p>By default, Valgrind shows four levels of function call names |
| to help you identify program locations. You can change that |
| number with this option. This can help in determining the |
| program's location in deeply-nested call chains. Note that errors |
| are commoned up using only the top three function locations (the |
| place in the current function, and that of its two immediate |
| callers). So this doesn't affect the total number of errors |
| reported. |
| <p> |
| The maximum value for this is 50. Note that higher settings |
| will make Valgrind run a bit more slowly and take a bit more |
| memory, but can be useful when working with programs with |
| deeply-nested call chains. |
| </li><br><p> |
| |
| <li><code>--gdb-attach=no</code> [the default]<br> |
| <code>--gdb-attach=yes</code> |
| <p>When enabled, Valgrind will pause after every error shown, |
| and print the line |
| <br> |
| <code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code> |
| <p> |
| Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code> |
| or <code>n</code> <code>Ret</code>, causes Valgrind not to |
| start GDB for this error. |
| <p> |
| <code>Y</code> <code>Ret</code> |
| or <code>y</code> <code>Ret</code> causes Valgrind to |
| start GDB, for the program at this point. When you have |
| finished with GDB, quit from it, and the program will continue. |
| Trying to continue from inside GDB doesn't work. |
| <p> |
| <code>C</code> <code>Ret</code> |
| or <code>c</code> <code>Ret</code> causes Valgrind not to |
| start GDB, and not to ask again. |
| <p> |
| <code>--gdb-attach=yes</code> conflicts with |
| <code>--trace-children=yes</code>. You can't use them together. |
| Valgrind refuses to start up in this situation. 1 May 2002: |
| this is a historical relic which could be easily fixed if it |
| gets in your way. Mail me and complain if this is a problem for |
| you. </li><br><p> |
| |
| <li><code>--partial-loads-ok=yes</code> [the default]<br> |
| <code>--partial-loads-ok=no</code> |
| <p>Controls how Valgrind handles word (4-byte) loads from |
| addresses for which some bytes are addressable and others |
| are not. When <code>yes</code> (the default), such loads |
| do not elicit an address error. Instead, the loaded V bytes |
| corresponding to the illegal addresses indicate undefined, and |
| those corresponding to legal addresses are loaded from shadow |
| memory, as usual. |
| <p> |
| When <code>no</code>, loads from partially |
| invalid addresses are treated the same as loads from completely |
| invalid addresses: an illegal-address error is issued, |
| and the resulting V bytes indicate valid data. |
| </li><br><p> |
| |
| <li><code>--sloppy-malloc=no</code> [the default]<br> |
| <code>--sloppy-malloc=yes</code> |
| <p>When enabled, all requests for malloc/calloc are rounded up |
| to a whole number of machine words -- in other words, made |
| divisible by 4. For example, a request for 17 bytes of space |
| would result in a 20-byte area being made available. This works |
| around bugs in sloppy libraries which assume that they can |
| safely rely on malloc/calloc requests being rounded up in this |
| fashion. Without the workaround, these libraries tend to |
| generate large numbers of errors when they access the ends of |
| these areas. |
| <p> |
| Valgrind snapshots dated 17 Feb 2002 and later are |
| cleverer about this problem, and you should no longer need to |
| use this flag. To put it bluntly, if you do need to use this |
| flag, your program violates the ANSI C semantics defined for |
| <code>malloc</code> and <code>free</code>, even if it appears to |
| work correctly, and you should fix it, at least if you hope for |
| maximum portability. |
| </li><br><p> |
| |
| <li><code>--trace-children=no</code> [the default]<br> |
| <code>--trace-children=yes</code> |
| <p>When enabled, Valgrind will trace into child processes. This |
| is confusing and usually not what you want, so is disabled by |
| default. As of 1 May 2002, tracing into a child process from a |
| parent which uses <code>libpthread.so</code> is probably broken |
| and is likely to cause breakage. Please report any such |
| problems to me. </li><br><p> |
| |
| <li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000] |
| <p>When the client program releases memory using free (in C) or |
| delete (C++), that memory is not immediately made available for |
| re-allocation. Instead it is marked inaccessible and placed in |
| a queue of freed blocks. The purpose is to delay the point at |
| which freed-up memory comes back into circulation. This |
| increases the chance that Valgrind will be able to detect |
| invalid accesses to blocks for some significant period of time |
| after they have been freed. |
| <p> |
| This flag specifies the maximum total size, in bytes, of the |
| blocks in the queue. The default value is one million bytes. |
| Increasing this increases the total amount of memory used by |
| Valgrind but may detect invalid uses of freed blocks which would |
| otherwise go undetected.</li><br><p> |
| |
| <li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr] |
| <p>Specifies the file descriptor on which Valgrind communicates |
| all of its messages. The default, 2, is the standard error |
| channel. This may interfere with the client's own use of |
| stderr. To dump Valgrind's commentary in a file without using |
| stderr, something like the following works well (sh/bash |
| syntax):<br> |
| <code> |
| valgrind --logfile-fd=9 my_prog 9> logfile</code><br> |
| That is: tell Valgrind to send all output to file descriptor 9, |
| and ask the shell to route file descriptor 9 to "logfile". |
| </li><br><p> |
| |
| <li><code>--suppressions=&lt;filename&gt;</code> |
| [default: $PREFIX/lib/valgrind/default.supp] |
| <p>Specifies an extra |
| file from which to read descriptions of errors to suppress. You |
| may use as many extra suppressions files as you |
| like.</li><br><p> |
| |
| <li><code>--leak-check=no</code> [default]<br> |
| <code>--leak-check=yes</code> |
| <p>When enabled, search for memory leaks when the client program |
| finishes. A memory leak means a malloc'd block, which has not |
| yet been free'd, but to which no pointer can be found. Such a |
| block can never be free'd by the program, since no pointer to it |
| exists. Leak checking is disabled by default because it tends |
| to generate dozens of error messages. </li><br><p> |
| |
| <li><code>--show-reachable=no</code> [default]<br> |
| <code>--show-reachable=yes</code> |
| <p>When disabled, the memory leak detector only shows blocks to |
| which it can find no pointer at all, or only a pointer into the |
| middle. These blocks are prime candidates for |
| memory leaks. When enabled, the leak detector also reports on |
| blocks which it could find a pointer to. Your program could, at |
| least in principle, have freed such blocks before exit. |
| Contrast this to blocks for which no pointer, or only an |
| interior pointer could be found: they are more likely to |
| indicate memory leaks, because you do not actually have a |
| pointer to the start of the block which you can hand to |
| <code>free</code>, even if you wanted to. </li><br><p> |
| |
| <li><code>--leak-resolution=low</code> [default]<br> |
| <code>--leak-resolution=med</code> <br> |
| <code>--leak-resolution=high</code> |
| <p>When doing leak checking, determines how willing Valgrind is |
| to consider different backtraces to be the same. When set to |
| <code>low</code>, the default, only the first two entries need |
| match. When <code>med</code>, four entries have to match. When |
| <code>high</code>, all entries need to match. |
| <p> |
| For hardcore leak debugging, you probably want to use |
| <code>--leak-resolution=high</code> together with |
| <code>--num-callers=40</code> or some such large number. Note |
| however that this can give an overwhelming amount of |
| information, which is why the defaults are 4 callers and |
| low-resolution matching. |
| <p> |
| Note that the <code>--leak-resolution=</code> setting does not |
| affect Valgrind's ability to find leaks. It only changes how |
| the results are presented. |
| </li><br><p> |
| |
| <li><code>--workaround-gcc296-bugs=no</code> [default]<br> |
| <code>--workaround-gcc296-bugs=yes</code> <p>When enabled, |
| assume that reads and writes some small distance below the stack |
| pointer <code>%esp</code> are due to bugs in gcc 2.96, and do |
| not report them. The "small distance" is 256 bytes by default. |
| Note that gcc 2.96 is the default compiler on some popular Linux |
| distributions (RedHat 7.X, Mandrake) and so you may well need to |
| use this flag. Do not use it if you do not have to, as it can |
| cause real errors to be overlooked. A better option is to use a |
| gcc/g++ which works properly; 2.95.3 seems to be a good choice. |
| <p> |
| Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly |
| buggy, so you may need to issue this flag if you use 3.0.4. A |
| while later (early Apr 02) this is confirmed as a scheduling bug |
| in g++-3.0.4. |
| </li><br><p> |
| |
| <li><code>--cachesim=no</code> [default]<br> |
| <code>--cachesim=yes</code> <p>When enabled, turns off memory |
| checking, and turns on cache profiling. Cache profiling is |
| described in detail in <a href="#cache">Section 7</a>. |
| </li><br><p> |
| |
| <li><code>--weird-hacks=hack1,hack2,...</code> |
| Pass miscellaneous hints to Valgrind which slightly modify the |
| simulated behaviour in nonstandard or dangerous ways, possibly |
| to help the simulation of strange features. By default no hacks |
| are enabled. Use with caution! Currently known hacks are: |
| <p> |
| <ul> |
| <li><code>ioctl-VTIME</code> Use this if you have a program |
| which sets readable file descriptors to have a timeout by |
| doing <code>ioctl</code> on them with a |
| <code>TCSETA</code>-style command <b>and</b> a non-zero |
| <code>VTIME</code> timeout value. This is considered |
| potentially dangerous and therefore is not engaged by |
| default, because it is (remotely) conceivable that it could |
| cause threads doing <code>read</code> to incorrectly block |
| the entire process. |
| <p> |
| You probably want to try this one if you have a program |
| which unexpectedly blocks in a <code>read</code> from a file |
| descriptor which you know to have been messed with by |
| <code>ioctl</code>. This could happen, for example, if the |
| descriptor is used to read input from some kind of screen |
| handling library. |
| <p> |
| To find out if your program is blocking unexpectedly in the |
| <code>read</code> system call, run with the |
| <code>--trace-syscalls=yes</code> flag. |
| </ul> |
| |
| </li><p> |
| </ul> |
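| |
| <p>To make the <code>--freelist-vol</code> behaviour described above more |
| concrete, here is a minimal Python model of a queue of freed blocks with a |
| cap on its total size. It is only a sketch: the class |
| <code>FreedBlockQueue</code> is hypothetical, and the oldest-first |
| eviction policy is an assumption, since the text says only that blocks |
| wait in a queue. |

```python
from collections import deque


class FreedBlockQueue:
    """Toy model of a size-capped queue of freed blocks."""

    def __init__(self, max_volume=1000000):
        self.max_volume = max_volume  # byte cap, as set by --freelist-vol
        self.volume = 0               # bytes currently held in the queue
        self.blocks = deque()         # (address, size) pairs, oldest first

    def free(self, address, size):
        """Queue a freed block; return blocks released for re-allocation."""
        self.blocks.append((address, size))
        self.volume += size
        released = []
        # Assumed policy: release the oldest blocks until under the cap.
        while self.volume > self.max_volume and self.blocks:
            old_address, old_size = self.blocks.popleft()
            self.volume -= old_size
            released.append((old_address, old_size))
        return released

    def is_quarantined(self, address):
        """True if the address lies inside a block still in the queue."""
        return any(start <= address < start + size
                   for start, size in self.blocks)
```

| <p>While a block sits in the queue, an access to it can still be |
| recognised as a use of freed memory. A larger cap keeps blocks queued for |
| longer, at the cost of more memory. |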
| |
| There are also some options for debugging Valgrind itself. You |
| shouldn't need to use them in the normal run of things. Nevertheless: |
| |
| <ul> |
| |
| <li><code>--single-step=no</code> [default]<br> |
| <code>--single-step=yes</code> |
| <p>When enabled, each x86 instruction is translated separately into |
| instrumented code. When disabled, translation is done on a |
| per-basic-block basis, giving much better translations.</li><br> |
| <p> |
| |
| <li><code>--optimise=no</code><br> |
| <code>--optimise=yes</code> [default] |
| <p>When enabled, various improvements are applied to the |
| intermediate code, mainly aimed at allowing the simulated CPU's |
| registers to be cached in the real CPU's registers over several |
| simulated instructions.</li><br> |
| <p> |
| |
| <li><code>--instrument=no</code><br> |
| <code>--instrument=yes</code> [default] |
| <p>When disabled, the translations don't actually contain any |
| instrumentation.</li><br> |
| <p> |
| |
| <li><code>--cleanup=no</code><br> |
| <code>--cleanup=yes</code> [default] |
| <p>When enabled, various improvements are applied to the |
| post-instrumented intermediate code, aimed at removing redundant |
| value checks.</li><br> |
| <p> |
| |
| <li><code>--trace-syscalls=no</code> [default]<br> |
| <code>--trace-syscalls=yes</code> |
| <p>Enable/disable tracing of system call intercepts.</li><br> |
| <p> |
| |
| <li><code>--trace-signals=no</code> [default]<br> |
| <code>--trace-signals=yes</code> |
| <p>Enable/disable tracing of signal handling.</li><br> |
| <p> |
| |
| <li><code>--trace-sched=no</code> [default]<br> |
| <code>--trace-sched=yes</code> |
| <p>Enable/disable tracing of thread scheduling events.</li><br> |
| <p> |
| |
| <li><code>--trace-pthread=none</code> [default]<br> |
| <code>--trace-pthread=some</code> <br> |
| <code>--trace-pthread=all</code> |
| <p>Specifies amount of trace detail for pthread-related events.</li><br> |
| <p> |
| |
| <li><code>--trace-symtab=no</code> [default]<br> |
| <code>--trace-symtab=yes</code> |
| <p>Enable/disable tracing of symbol table reading.</li><br> |
| <p> |
| |
| <li><code>--trace-malloc=no</code> [default]<br> |
| <code>--trace-malloc=yes</code> |
| <p>Enable/disable tracing of malloc/free (et al) intercepts. |
| </li><br> |
| <p> |
| |
| <li><code>--stop-after=&lt;number&gt;</code> |
| [default: infinity, more or less] |
| <p>After &lt;number&gt; basic blocks have been executed, shut down |
| Valgrind and switch back to running the client on the real CPU. |
| </li><br> |
| <p> |
| |
| <li><code>--dump-error=&lt;number&gt;</code> [default: inactive] |
| <p>After the program has exited, show gory details of the |
| translation of the basic block containing the &lt;number&gt;'th |
| error context. When used with <code>--single-step=yes</code>, |
| can show the exact x86 instruction causing an error. This is |
| all fairly dodgy and doesn't work at all if threads are |
| involved.</li><br> |
| <p> |
| |
| <li><code>--smc-check=none</code><br> |
| <code>--smc-check=some</code> [default]<br> |
| <code>--smc-check=all</code> |
| <p>How carefully should Valgrind check for self-modifying code |
| writes, so that translations can be discarded? When |
| "none", no writes are checked. When "some", only writes |
| resulting from moves from integer registers to memory are |
| checked. When "all", all memory writes are checked, even those |
| with which no sane program would generate code -- for |
| example, floating-point writes. |
| <p> |
| NOTE that this is all a bit bogus. This mechanism has never |
| been enabled in any snapshot of Valgrind which was made |
| available to the general public, because the extra checks reduce |
| performance, increase complexity, and I have yet to come across |
| any programs which actually use self-modifying code. I think |
| the flag is ignored. |
| </li> |
| </ul> |
| |
| |
| <a name="errormsgs"></a> |
| <h3>2.6 Explanation of error messages</h3> |
| |
| Despite considerable sophistication under the hood, Valgrind can only |
| really detect two kinds of errors: use of illegal addresses, and use |
| of undefined values. Nevertheless, this is enough to help you |
| discover all sorts of memory-management nasties in your code. This |
| section presents a quick summary of what error messages mean. The |
| precise behaviour of the error-checking machinery is described in |
| <a href="#machine">Section 3</a>. |
| |
| |
| <h4>2.6.1 Illegal read / Illegal write errors</h4> |
| For example: |
| <pre> |
| Invalid read of size 4 |
| at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9) |
| by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9) |
| by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326) |
| by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621) |
| Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd |
| </pre> |
| |
| <p>This happens when your program reads or writes memory at a place |
| which Valgrind reckons it shouldn't. In this example, the program did |
| a 4-byte read at address 0xBFFFF0E0, somewhere within the |
| system-supplied library libpng.so.2.1.0.9, which was called from |
| somewhere else in the same library, called from line 326 of |
| qpngio.cpp, and so on. |
| |
| <p>Valgrind tries to establish what the illegal address might relate |
| to, since that's often useful. So, if it points into a block of |
| memory which has already been freed, you'll be informed of this, and |
| also where the block was free'd at. Likewise, if it should turn out |
| to be just off the end of a malloc'd block, a common result of |
| off-by-one-errors in array subscripting, you'll be informed of this |
| fact, and also where the block was malloc'd. |
| |
| <p>In this example, Valgrind can't identify the address. Actually the |
| address is on the stack, but, for some reason, this is not a valid |
| stack address -- it is below the stack pointer, %esp, and that isn't |
| allowed. In this particular case it's probably caused by gcc |
| generating invalid code, a known bug in various flavours of gcc. |
| |
| <p>Note that Valgrind only tells you that your program is about to |
| access memory at an illegal address. It can't stop the access from |
| happening. So, if your program makes an access which normally would |
| result in a segmentation fault, your program will still suffer the same |
| fate -- but you will get a message from Valgrind immediately prior to |
| this. In this particular example, reading junk on the stack is |
| non-fatal, and the program stays alive. |
| |
| |
| <h4>2.6.2 Use of uninitialised values</h4> |
| For example: |
| <pre> |
| Conditional jump or move depends on uninitialised value(s) |
| at 0x402DFA94: _IO_vfprintf (_itoa.h:49) |
| by 0x402E8476: _IO_printf (printf.c:36) |
| by 0x8048472: main (tests/manuel1.c:8) |
| by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| </pre> |
| |
| <p>An uninitialised-value use error is reported when your program uses |
| a value which hasn't been initialised -- in other words, is undefined. |
| Here, the undefined value is used somewhere inside the printf() |
| machinery of the C library. This error was reported when running the |
| following small program: |
| <pre> |
| #include <stdio.h> |
| |
| int main(void) |
| { |
| int x; /* deliberately left uninitialised */ |
| printf ("x = %d\n", x); |
| return 0; |
| } |
| </pre> |
| |
| <p>It is important to understand that your program can copy around |
| junk (uninitialised) data to its heart's content. Valgrind observes |
| this and keeps track of the data, but does not complain. A complaint |
| is issued only when your program attempts to make use of uninitialised |
| data. In this example, x is uninitialised. Valgrind observes the |
| value being passed to _IO_printf and thence to _IO_vfprintf, but makes |
| no comment. However, _IO_vfprintf has to examine the value of x so it |
| can turn it into the corresponding ASCII string, and it is at this |
| point that Valgrind complains. |
| |
| <p>Sources of uninitialised data tend to be: |
| <ul> |
| <li>Local variables in procedures which have not been initialised, |
| as in the example above.</li><br><p> |
| |
| <li>The contents of malloc'd blocks, before you write something |
| there. In C++, the new operator is a wrapper round malloc, so |
| if you create an object with new, its fields will be |
| uninitialised until you fill them in, which is only Right and |
| Proper.</li> |
| </ul> |
| |
| |
| |
| <h4>2.6.3 Illegal frees</h4> |
| For example: |
| <pre> |
| Invalid free() |
| at 0x4004FFDF: free (ut_clientmalloc.c:577) |
| by 0x80484C7: main (tests/doublefree.c:10) |
| by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| by 0x80483B1: (within tests/doublefree) |
| Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd |
| at 0x4004FFDF: free (ut_clientmalloc.c:577) |
| by 0x80484C7: main (tests/doublefree.c:10) |
| by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| by 0x80483B1: (within tests/doublefree) |
| </pre> |
| <p>Valgrind keeps track of the blocks allocated by your program with |
| malloc/new, so it knows exactly whether or not the argument to |
| free/delete is legitimate. Here, the test program has |
| freed the same block twice. As with the illegal read/write errors, |
| Valgrind attempts to make sense of the address free'd. If, as |
| here, the address is one which has previously been freed, you will |
| be told that -- making duplicate frees of the same block easy to spot. |
| |
| |
| <h4>2.6.4 When a block is freed with an inappropriate |
| deallocation function</h4> |
| In the following example, a block allocated with <code>new []</code> |
| has wrongly been deallocated with <code>free</code>: |
| <pre> |
| Mismatched free() / delete / delete [] |
| at 0x40043249: free (vg_clientfuncs.c:171) |
| by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149) |
| by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60) |
| by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44) |
| Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd |
| at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152) |
| by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314) |
| by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416) |
| by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272) |
| </pre> |
| The following was told to me by the KDE 3 developers. I didn't know |
| any of it myself. They also implemented the check itself. |
| <p> |
| In C++ it's important to deallocate memory in a way compatible with |
| how it was allocated. The deal is: |
| <ul> |
| <li>If allocated with <code>malloc</code>, <code>calloc</code>, |
| <code>realloc</code>, <code>valloc</code> or |
| <code>memalign</code>, you must deallocate with <code>free</code>. |
| <li>If allocated with <code>new []</code>, you must deallocate with |
| <code>delete []</code>. |
| <li>If allocated with <code>new</code>, you must deallocate with |
| <code>delete</code>. |
| </ul> |
| The worst thing is that on Linux apparently it doesn't matter if you |
| do muddle these up, and it all seems to work ok, but the same program |
| may then crash on a different platform, Solaris for example. So it's |
| best to fix it properly. According to the KDE folks "it's amazing how |
| many C++ programmers don't know this". |
| |
| |
| |
| <h4>2.6.5 Passing system call parameters with inadequate |
| read/write permissions</h4> |
| |
| Valgrind checks all parameters to system calls. If a system call |
| needs to read from a buffer provided by your program, Valgrind checks |
| that the entire buffer is addressable and has valid data, i.e., it is |
| readable. And if the system call needs to write to a user-supplied |
| buffer, Valgrind checks that the buffer is addressable. After the |
| system call, Valgrind updates its administrative information to |
| precisely reflect any changes in memory permissions caused by the |
| system call. |
| |
| <p>Here's an example of a system call with an invalid parameter: |
| <pre> |
| #include <stdlib.h> |
| #include <unistd.h> |
| int main( void ) |
| { |
| char* arr = malloc(10); |
| (void) write( 1 /* stdout */, arr, 10 ); |
| return 0; |
| } |
| </pre> |
| |
| <p>You get this complaint ... |
| <pre> |
| Syscall param write(buf) contains uninitialised or unaddressable byte(s) |
| at 0x4035E072: __libc_write |
| by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| by 0x80483B1: (within tests/badwrite) |
| by <bogus frame pointer> ??? |
| Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd |
| at 0x4004FEE6: malloc (ut_clientmalloc.c:539) |
| by 0x80484A0: main (tests/badwrite.c:6) |
| by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| by 0x80483B1: (within tests/badwrite) |
| </pre> |
| |
| <p>... because the program has tried to write uninitialised junk from |
| the malloc'd block to the standard output. |
| |
| |
| <h4>2.6.6 Warning messages you might see</h4> |
| |
| Most of these only appear if you run in verbose mode (enabled by |
| <code>-v</code>): |
| <ul> |
| <li> <code>More than 50 errors detected. Subsequent errors |
| will still be recorded, but in less detail than before.</code> |
| <br> |
| After 50 different errors have been shown, Valgrind becomes |
| more conservative about collecting them. It then requires only |
| the program counters in the top two stack frames to match when |
| deciding whether or not two errors are really the same one. |
| Prior to this point, the PCs in the top four frames are required |
| to match. This hack has the effect of slowing down the |
| appearance of new errors after the first 50. The 50 constant can |
| be changed by recompiling Valgrind. |
| <p> |
| <li> <code>More than 300 errors detected. I'm not reporting any more. |
| Final error counts may be inaccurate. Go fix your |
| program!</code> |
| <br> |
| After 300 different errors have been detected, Valgrind ignores |
| any more. It seems unlikely that collecting even more different |
| ones would be of practical help to anybody, and it avoids the |
| danger that Valgrind spends more and more of its time comparing |
| new errors against an ever-growing collection. As above, the 300 |
| number is a compile-time constant. |
| <p> |
| <li> <code>Warning: client exiting by calling exit(<number>). |
| Bye!</code> |
| <br> |
| Your program has called the <code>exit</code> system call, which |
| will immediately terminate the process. You'll get no exit-time |
| error summaries or leak checks. Note that this is not the same |
| as your program calling the ANSI C function <code>exit()</code> |
| -- that causes a normal, controlled shutdown of Valgrind. |
| <p> |
| <li> <code>Warning: client switching stacks?</code> |
| <br> |
| Valgrind spotted such a large change in the stack pointer, %esp, |
| that it guesses the client is switching to a different stack. |
| At this point it makes a kludgey guess where the base of the new |
| stack is, and sets memory permissions accordingly. You may get |
| many bogus error messages following this, if Valgrind guesses |
| wrong. At the moment "large change" is defined as a change of |
| more than 2000000 in the value of the %esp (stack pointer) |
| register. |
| <p> |
| <li> <code>Warning: client attempted to close Valgrind's logfile fd <number> |
| </code> |
| <br> |
| Valgrind doesn't allow the client |
| to close the logfile, because you'd never see any diagnostic |
| information after that point. If you see this message, |
| you may want to use the <code>--logfile-fd=<number></code> |
| option to specify a different logfile file-descriptor number. |
| <p> |
| <li> <code>Warning: noted but unhandled ioctl <number></code> |
| <br> |
| Valgrind observed a call to one of the vast family of |
| <code>ioctl</code> system calls, but did not modify its |
| memory status info (because I have not yet got round to it). |
| The call will still have gone through, but you may get spurious |
| errors after this as a result of the non-update of the memory info. |
| <p> |
| <li> <code>Warning: unblocking signal <number> due to |
| sigprocmask</code> |
| <br> |
| Really just a diagnostic from the signal simulation machinery. |
| This message will appear if your program handles a signal by |
| first <code>longjmp</code>ing out of the signal handler, |
| and then unblocking the signal with <code>sigprocmask</code> |
| -- a standard signal-handling idiom. |
| <p> |
| <li> <code>Warning: bad signal number <number> in __NR_sigaction.</code> |
| <br> |
| Probably indicates a bug in the signal simulation machinery. |
| <p> |
| <li> <code>Warning: set address range perms: large range <number></code> |
| <br> |
| Diagnostic message, mostly for my benefit, to do with memory |
| permissions. |
| </ul> |
| |
| |
| <a name="suppfiles"></a> |
| <h3>2.7 Writing suppressions files</h3> |
| |
| A suppression file describes a bunch of errors which, for one reason |
| or another, you don't want Valgrind to tell you about. Usually the |
| reason is that the system libraries are buggy but unfixable, at least |
| within the scope of the current debugging session. Multiple |
| suppressions files are allowed. By default, Valgrind uses |
| <code>$PREFIX/lib/valgrind/default.supp</code>. |
| |
| <p> |
| You can ask to add suppressions from another file, by specifying |
| <code>--suppressions=/path/to/file.supp</code>. |
| |
| <p>Each suppression has the following components:<br> |
| <ul> |
| |
| <li>Its name. This merely gives a handy name to the suppression, by |
| which it is referred to in the summary of used suppressions |
| printed out when a program finishes. It's not important what |
| the name is; any identifying string will do. |
| <p> |
| |
| <li>The nature of the error to suppress. Either: |
| <code>Value1</code>, |
| <code>Value2</code>, |
| <code>Value4</code> or |
| <code>Value8</code>, |
| meaning an uninitialised-value error when |
| using a value of 1, 2, 4 or 8 bytes. |
| Or |
| <code>Cond</code> (or its old name, <code>Value0</code>), |
| meaning use of an uninitialised CPU condition code. Or: |
| <code>Addr1</code>, |
| <code>Addr2</code>, |
| <code>Addr4</code> or |
| <code>Addr8</code>, meaning an invalid address during a |
| memory access of 1, 2, 4 or 8 bytes respectively. Or |
| <code>Param</code>, |
| meaning an invalid system call parameter error. Or |
| <code>Free</code>, meaning an invalid or mismatching free.</li><br> |
| <p> |
| |
| <li>The "immediate location" specification. For Value and Addr |
| errors, this is either the name of the function in which the error |
| occurred, or, failing that, the full path to the .so file |
| containing the error location. For Param errors, it is the name of |
| the offending system call parameter. For Free errors, it is the |
| name of the function doing the freeing (e.g., <code>free</code>, |
| <code>__builtin_vec_delete</code>, etc.)</li><br> |
| <p> |
| |
| <li>The caller of the above "immediate location". Again, either a |
| function or shared-object name.</li><br> |
| <p> |
| |
| <li>Optionally, one or two extra calling-function or object names, |
| for greater precision.</li> |
| </ul> |
| |
| <p> |
| Locations may be either names of shared objects or wildcards matching |
| function names. They begin <code>obj:</code> and <code>fun:</code> |
| respectively. Function and object names to match against may use the |
| wildcard characters <code>*</code> and <code>?</code>. |
| |
| <p>A suppression only suppresses an error when the error matches all the |
| details in the suppression. Here's an example: |
| <pre> |
| { |
| __gconv_transform_ascii_internal/__mbrtowc/mbtowc |
| Value4 |
| fun:__gconv_transform_ascii_internal |
| fun:__mbr*towc |
| fun:mbtowc |
| } |
| </pre> |
| |
| <p>What this means is: suppress a use-of-uninitialised-value error, when |
| the data size is 4, when it occurs in the function |
| <code>__gconv_transform_ascii_internal</code>, when that is called |
| from any function of name matching <code>__mbr*towc</code>, |
| when that is called from |
| <code>mbtowc</code>. It doesn't apply under any other circumstances. |
| The string by which this suppression is identified to the user is |
| __gconv_transform_ascii_internal/__mbrtowc/mbtowc. |
| |
| <p>Another example: |
| <pre> |
| { |
| libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0 |
| Value4 |
| obj:/usr/X11R6/lib/libX11.so.6.2 |
| obj:/usr/X11R6/lib/libX11.so.6.2 |
| obj:/usr/X11R6/lib/libXaw.so.7.0 |
| } |
| </pre> |
| |
| <p>Suppress any size 4 uninitialised-value error which occurs anywhere |
| in <code>libX11.so.6.2</code>, when called from anywhere in the same |
| library, when called from anywhere in <code>libXaw.so.7.0</code>. The |
| inexact specification of locations is regrettable, but is about all |
| you can hope for, given that the X11 libraries shipped with Red Hat |
| 7.2 have had their symbol tables removed. |
| |
| <p>Note -- since the above two examples did not make it clear -- that |
| you can freely mix the <code>obj:</code> and <code>fun:</code> |
| styles of description within a single suppression record. |
| |
| |
| <a name="clientreq"></a> |
| <h3>2.8 The Client Request mechanism</h3> |
| |
| Valgrind has a trapdoor mechanism via which the client program can |
| pass all manner of requests and queries to Valgrind. Internally, this |
| is used extensively to make malloc, free, signals, threads, etc, work, |
| although you don't see that. |
| <p> |
| For your convenience, a subset of these so-called client requests is |
| provided to allow you to tell Valgrind facts about the behaviour of |
| your program, and conversely to make queries. In particular, your |
| program can tell Valgrind about changes in memory range permissions |
| that Valgrind would not otherwise know about, and so allows clients to |
| get Valgrind to do arbitrary custom checks. |
| <p> |
| Clients need to include the header file <code>valgrind.h</code> to |
| make this work. The macros therein have the magical property that |
| they generate code in-line which Valgrind can spot. However, the code |
| does nothing when not run on Valgrind, so you are not forced to run |
| your program on Valgrind just because you use the macros in this file. |
| Also, you are not required to link your program with any extra |
| supporting libraries. |
| <p> |
| A brief description of the available macros: |
| <ul> |
| <li><code>VALGRIND_MAKE_NOACCESS</code>, |
| <code>VALGRIND_MAKE_WRITABLE</code> and |
| <code>VALGRIND_MAKE_READABLE</code>. These mark address |
| ranges as completely inaccessible, accessible but containing |
| undefined data, and accessible and containing defined data, |
| respectively. Subsequent errors may have their faulting |
| addresses described in terms of these blocks. Returns a |
| "block handle". Returns zero when not run on Valgrind. |
| <p> |
| <li><code>VALGRIND_DISCARD</code>: At some point you may want |
| Valgrind to stop reporting errors in terms of the blocks |
| defined by the previous three macros. To do this, the above |
| macros return a small-integer "block handle". You can pass |
| this block handle to <code>VALGRIND_DISCARD</code>. After |
| doing so, Valgrind will no longer be able to relate |
| addressing errors to the user-defined block associated with |
| the handle. The permissions settings associated with the |
| handle remain in place; this just affects how errors are |
| reported, not whether they are reported. Returns 1 for an |
| invalid handle and 0 for a valid handle (although passing |
| invalid handles is harmless). Always returns 0 when not run |
| on Valgrind. |
| <p> |
| <li><code>VALGRIND_CHECK_NOACCESS</code>, |
| <code>VALGRIND_CHECK_WRITABLE</code> and |
| <code>VALGRIND_CHECK_READABLE</code>: check immediately |
| whether or not the given address range has the relevant |
| property, and if not, print an error message. Also, for the |
| convenience of the client, returns zero if the relevant |
| property holds; otherwise, the returned value is the address |
| of the first byte for which the property is not true. |
| Always returns 0 when not run on Valgrind. |
| <p> |
| <li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way |
| to find out whether Valgrind thinks a particular variable |
| (lvalue, to be precise) is addressable and defined. Prints |
| an error message if not. Returns no value. |
| <p> |
| <li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly |
| experimental feature. Similarly to |
| <code>VALGRIND_MAKE_NOACCESS</code>, this marks an address |
| range as inaccessible, so that subsequent accesses to an |
| address in the range give an error. However, this macro |
| does not return a block handle. Instead, all annotations |
| created like this are reviewed at each client |
| <code>ret</code> (subroutine return) instruction, and those |
| which now define an address range below the client's stack |
| pointer register (<code>%esp</code>) are automatically |
| deleted. |
| <p> |
| In other words, this macro allows the client to tell |
| Valgrind about red-zones on its own stack. Valgrind |
| automatically discards this information when the stack |
| retreats past such blocks. Beware: hacky and flaky, and |
| probably interacts badly with the new pthread support. |
| <p> |
| <li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on |
| Valgrind, 0 if running on the real CPU. |
| <p> |
| <li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector |
| right now. Returns no value. I guess this could be used to |
| incrementally check for leaks between arbitrary places in the |
| program's execution. Warning: not properly tested! |
| </ul> |
| <p> |
| |
| |
| <a name="pthreads"></a> |
| <h3>2.9 Support for POSIX Pthreads</h3> |
| |
| As of late April 02, Valgrind supports programs which use POSIX |
| pthreads. Doing this has proved technically challenging and is still |
| in progress, but it works well enough, as of 1 May 02, for significant |
| threaded applications to work. |
| <p> |
| It works as follows: threaded apps are (dynamically) linked against |
| <code>libpthread.so</code>. Usually this is the one installed with |
| your Linux distribution. Valgrind, however, supplies its own |
| <code>libpthread.so</code> and automatically connects your program to |
| it instead. |
| <p> |
| The fake <code>libpthread.so</code> and Valgrind cooperate to |
| implement a user-space pthreads package. This approach avoids the |
| horrible implementation problems of implementing a truly |
| multiprocessor version of Valgrind, but it does mean that threaded |
| apps run only on one CPU, even if you have a multiprocessor machine. |
| <p> |
| Valgrind schedules your threads in a round-robin fashion, with all |
| threads having equal priority. It switches threads every 20000 basic |
| blocks (typically around 120000 x86 instructions), which means you'll |
| get a much finer interleaving of thread executions than when run |
| natively. This in itself may cause your program to behave differently |
| if you have some kind of concurrency, critical race, locking, or |
| similar, bugs. |
| <p> |
| The current (1 May 02) state of pthread support is as follows. Please |
| note that things are advancing rapidly, so the situation may have |
| improved by the time you read this -- check the web site for further |
| updates. |
| <ul> |
| <li>Mutexes, condition variables, thread-specific data and |
| <code>pthread_once</code> currently work. |
| <p> |
| <li>Various attribute-like calls are handled but ignored. |
| You get a warning message. |
| <p> |
| <li>The main big omission is proper cleanup support for cancellation. |
| <code>pthread_cancel</code> works, but instantly nukes the target |
| thread without giving it any chance to clean up. Also, when a |
| thread exits, it does not run any cleanup handlers. |
| <p> |
| <li>Currently the following syscalls are thread-safe (nonblocking): |
| <code>write</code> <code>read</code> <code>nanosleep</code> |
| <code>sleep</code> <code>select</code> and <code>poll</code>. |
| <p> |
| <li>The POSIX requirement that each thread have its own |
| signal-blocking mask is not done; the signal handling mechanism is |
| thread-unaware and all signals are delivered to the main thread, |
| antidisirregardless. |
| </ul> |
| |
| |
| As of 1 May 02, the following programs now work fine on my RedHat 7.2 |
| box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and |
| Galeon-0.11.3, both as supplied with RedHat 7.2. |
| <p> |
| Mozilla 1.0RC1 works fine too, provided that you patch it as described |
| here: <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=124335"> |
| http://bugzilla.mozilla.org/show_bug.cgi?id=124335</a>. This fixes a |
| bug in Mozilla which assumes that memory returned from |
| <code>malloc</code> is 8-aligned. Valgrind's allocator only |
| guarantees 4-alignment, so without the patch Mozilla makes an illegal |
| memory access, which Valgrind of course spots, and then bombs. |
| |
| |
| |
| <a name="install"></a> |
| <h3>2.10 Building and installing</h3> |
| |
| We now use the standard Unix <code>./configure</code>, |
| <code>make</code>, <code>make install</code> mechanism, and I have |
| attempted to ensure that it works on machines with kernel 2.2 or 2.4 |
| and glibc 2.1.X or 2.2.X. I don't think there is much else to say. |
| There are no options apart from the usual <code>--prefix</code> that |
| you should give to <code>./configure</code>. |
| <p> |
| Let me know if you have build problems. |
| |
| |
| |
| <a name="problems"></a> |
| <h3>2.11 If you have problems</h3> |
| Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>). |
| |
| <p>See <a href="#limits">Section 4</a> for the known limitations of |
| Valgrind, and for a list of programs which are known not to work on |
| it. |
| |
| <p>The translator/instrumentor has a lot of assertions in it. They |
| are permanently enabled, and I have no plans to disable them. If one |
| of these breaks, please mail me! |
| |
| <p>If you get an assertion failure on the expression |
| <code>chunkSane(ch)</code> in <code>vg_free()</code> in |
| <code>vg_malloc.c</code>, this may have happened because your program |
| wrote off the end of a malloc'd block, or before its beginning. |
| Valgrind should have emitted a proper message to that effect before |
| dying in this way. This is a known problem which I should fix. |
| <p> |
| |
| <hr width="100%"> |
| |
| <a name="machine"></a> |
| <h2>3 Details of the checking machinery</h2> |
| |
| Read this section if you want to know, in detail, exactly what and how |
| Valgrind is checking. |
| |
| <a name="vvalue"></a> |
| <h3>3.1 Valid-value (V) bits</h3> |
| |
| It is simplest to think of Valgrind implementing a synthetic Intel x86 |
| CPU which is identical to a real CPU, except for one crucial detail. |
| Every bit (literally) of data processed, stored and handled by the |
| real CPU has, in the synthetic CPU, an associated "valid-value" bit, |
| which says whether or not the accompanying bit has a legitimate value. |
| In the discussions which follow, this bit is referred to as the V |
| (valid-value) bit. |
| |
| <p>Each byte in the system therefore has 8 V bits which follow |
| it wherever it goes. For example, when the CPU loads a word-size item |
| (4 bytes) from memory, it also loads the corresponding 32 V bits from |
| a bitmap which stores the V bits for the process' entire address |
| space. If the CPU should later write the whole or some part of that |
| value to memory at a different address, the relevant V bits will be |
| stored back in the V-bit bitmap. |
| |
| <p>In short, each bit in the system has an associated V bit, which |
| follows it around everywhere, even inside the CPU. Yes, the CPU's |
| (integer and <code>%eflags</code>) registers have their own V bit |
| vectors. |
| |
| <p>Copying values around does not cause Valgrind to check for, or |
| report on, errors. However, when a value is used in a way which might |
| conceivably affect the outcome of your program's computation, the |
| associated V bits are immediately checked. If any of these indicate |
| that the value is undefined, an error is reported. |
| |
| <p>Here's an (admittedly nonsensical) example: |
| <pre> |
| int i, j; |
| int a[10], b[10]; |
| for (i = 0; i < 10; i++) { |
| j = a[i]; |
| b[i] = j; |
| } |
| </pre> |
| |
| <p>Valgrind emits no complaints about this, since it merely copies |
| uninitialised values from <code>a[]</code> into <code>b[]</code>, and |
| doesn't use them in any way. However, if the loop is changed to |
| <pre> |
| for (i = 0; i < 10; i++) { |
| j += a[i]; |
| } |
| if (j == 77) |
| printf("hello there\n"); |
| </pre> |
| then Valgrind will complain, at the <code>if</code>, that the |
| condition depends on uninitialised values. |
| |
| <p>Most low level operations, such as adds, cause Valgrind to |
| use the V bits for the operands to calculate the V bits for the |
| result. Even if the result is partially or wholly undefined, |
| it does not complain. |
| |
| <p>Checks on definedness only occur in two places: when a value is |
| used to generate a memory address, and when a control flow decision |
| needs to be made. Also, when a system call is detected, Valgrind |
| checks definedness of parameters as required. |
| |
| <p>If a check should detect undefinedness, an error message is |
| issued. The resulting value is subsequently regarded as well-defined. |
| To do otherwise would give long chains of error messages. In effect, |
| we say that undefined values are non-infectious. |
| |
| <p>This sounds overcomplicated. Why not just check all reads from |
| memory, and complain if an undefined value is loaded into a CPU register? |
| Well, that doesn't work well, because perfectly legitimate C programs routinely |
| copy uninitialised values around in memory, and we don't want endless complaints |
| about that. Here's the canonical example. Consider a struct |
| like this: |
| <pre> |
| struct S { int x; char c; }; |
| struct S s1, s2; |
| s1.x = 42; |
| s1.c = 'z'; |
| s2 = s1; |
| </pre> |
| |
| <p>The question to ask is: how large is <code>struct S</code>, in |
| bytes? An int is 4 bytes and a char one byte, so perhaps a struct S |
| occupies 5 bytes? Wrong. All (non-toy) compilers I know of will |
| round the size of <code>struct S</code> up to a whole number of words, |
| in this case 8 bytes. Not doing this forces compilers to generate |
| truly appalling code for subscripting arrays of <code>struct |
| S</code>'s. |
| |
| <p>So s1 occupies 8 bytes, yet only 5 of them will be initialised. |
| For the assignment <code>s2 = s1</code>, gcc generates code to copy |
| all 8 bytes wholesale into <code>s2</code> without regard for their |
| meaning. If Valgrind simply checked values as they came out of |
| memory, it would yelp every time a structure assignment like this |
| happened. So the more complicated semantics described above is |
| necessary. This allows gcc to copy <code>s1</code> into |
| <code>s2</code> any way it likes, and a warning will only be emitted |
| if the uninitialised values are later used. |
| |
| <p>One final twist to this story. The above scheme allows garbage to |
| pass through the CPU's integer registers without complaint. It does |
| this by giving the integer registers V tags, passing these around in |
| the expected way. This is complicated and computationally expensive to |
| do, but is necessary. Valgrind is more simplistic about |
| floating-point loads and stores. In particular, V bits for data read |
| as a result of floating-point loads are checked at the load |
| instruction. So if your program uses the floating-point registers to |
| do memory-to-memory copies, you will get complaints about |
| uninitialised values. Fortunately, I have not yet encountered a |
| program which (ab)uses the floating-point registers in this way. |
| |
| <a name="vaddress"></a> |
| <h3>3.2 Valid-address (A) bits</h3> |
| |
| Notice that the previous section describes how the validity of values |
| is established and maintained without having to say whether the |
| program does or does not have the right to access any particular |
| memory location. We now consider the latter issue. |
| |
| <p>As described above, every bit in memory or in the CPU has an |
| associated valid-value (V) bit. In addition, all bytes in memory, but |
| not in the CPU, have an associated valid-address (A) bit. This |
| indicates whether or not the program can legitimately read or write |
| that location. It does not give any indication of the validity of the |
| data at that location -- that's the job of the V bits -- only whether |
| or not the location may be accessed. |
| |
| <p>Every time your program reads or writes memory, Valgrind checks the |
| A bits associated with the address. If any of them indicate an |
| invalid address, an error is emitted. Note that the reads and writes |
| themselves do not change the A bits, only consult them. |
| |
| <p>So how do the A bits get set/cleared? Like this: |
| |
| <ul> |
| <li>When the program starts, all the global data areas are marked as |
| accessible.</li><br> |
| <p> |
| |
| <li>When the program does malloc/new, the A bits for exactly the |
| area allocated, and not a byte more, are marked as accessible. |
| Upon freeing the area the A bits are changed to indicate |
| inaccessibility.</li><br> |
| <p> |
| |
| <li>When the stack pointer register (%esp) moves up or down, A bits |
| are set. The rule is that the area from %esp up to the base of |
| the stack is marked as accessible, and below %esp is |
| inaccessible. (If that sounds illogical, bear in mind that the |
| stack grows down, not up, on almost all Unix systems, including |
| GNU/Linux.) Tracking %esp like this has the useful side-effect |
| that the section of stack used by a function for local variables |
| etc is automatically marked accessible on function entry and |
| inaccessible on exit.</li><br> |
| <p> |
| |
| <li>When doing system calls, A bits are changed appropriately. For |
| example, mmap() magically makes files appear in the process's |
| address space, so the A bits must be updated if mmap() |
| succeeds.</li><br> |
| <p> |
| |
| <li>Optionally, your program can tell Valgrind about such changes |
| explicitly, using the client request mechanism described above. |
| </ul> |
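| |
| <p>The rules above can be modelled with a toy A-bit map. This is a sketch |
| and not Valgrind's real data structure: the 256-byte address space and the |
| names <code>toy_set_a</code>, <code>toy_access_ok</code> and |
| <code>toy_move_sp</code> are invented for illustration. |

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SPACE 256                   /* hypothetical tiny address space */

static unsigned char a_bit[TOY_SPACE];  /* one A bit per byte, 1 = accessible */

/* Set or clear the A bits for the range [addr, addr+len). */
static void toy_set_a(size_t addr, size_t len, int accessible)
{
    assert(addr <= TOY_SPACE && len <= TOY_SPACE - addr);
    memset(&a_bit[addr], accessible ? 1 : 0, len);
}

/* Consult (never modify) the A bits for an access of len bytes.
   Returns 0 where an invalid read/write would be reported. */
static int toy_access_ok(size_t addr, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (addr + i >= TOY_SPACE || !a_bit[addr + i])
            return 0;
    return 1;
}

/* Model a stack pointer move: [new_sp, stack_base) becomes accessible,
   [stack_limit, new_sp) becomes inaccessible. */
static void toy_move_sp(size_t stack_limit, size_t new_sp, size_t stack_base)
{
    assert(stack_limit <= new_sp && new_sp <= stack_base);
    toy_set_a(stack_limit, new_sp - stack_limit, 0);
    toy_set_a(new_sp, stack_base - new_sp, 1);
}
```

| <p>For example, a <code>malloc(10)</code> that returned toy address 16 would |
| be modelled as <code>toy_set_a(16, 10, 1)</code>; afterwards |
| <code>toy_access_ok(16, 10)</code> holds but <code>toy_access_ok(16, 11)</code> |
| does not. |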
| |
| |
| <a name="together"></a> |
| <h3>3.3 Putting it all together</h3> |
| Valgrind's checking machinery can be summarised as follows: |
| |
| <ul> |
| <li>Each byte in memory has 8 associated V (valid-value) bits, |
| saying whether or not the byte has a defined value, and a single |
| A (valid-address) bit, saying whether or not the program |
| currently has the right to read/write that address.</li><br> |
| <p> |
| |
| <li>When memory is read or written, the relevant A bits are |
| consulted. If they indicate an invalid address, Valgrind emits |
| an Invalid read or Invalid write error.</li><br> |
| <p> |
| |
| <li>When memory is read into the CPU's integer registers, the |
| relevant V bits are fetched from memory and stored in the |
| simulated CPU. They are not consulted.</li><br> |
| <p> |
| |
| <li>When an integer register is written out to memory, the V bits |
| for that register are written back to memory too.</li><br> |
| <p> |
| |
| <li>When memory is read into the CPU's floating point registers, the |
| relevant V bits are read from memory and they are immediately |
| checked. If any are invalid, an uninitialised value error is |
| emitted. This precludes using the floating-point registers to |
| copy possibly-uninitialised memory, but simplifies Valgrind in |
| that it does not have to track the validity status of the |
| floating-point registers.</li><br> |
| <p> |
| |
| <li>As a result, when a floating-point register is written to |
| memory, the associated V bits are set to indicate a valid |
| value.</li><br> |
| <p> |
| |
| <li>When values in integer CPU registers are used to generate a |
| memory address, or to determine the outcome of a conditional |
| branch, the V bits for those values are checked, and an error |
| emitted if any of them are undefined.</li><br> |
| <p> |
| |
| <li>When values in integer CPU registers are used for any other |
| purpose, Valgrind computes the V bits for the result, but does |
| not check them.</li><br> |
| <p> |
| |
| <li>Once the V bits for a value in the CPU have been checked, they |
| are then set to indicate validity. This avoids long chains of |
| errors.</li><br> |
| <p> |
| |
| <li>When values are loaded from memory, Valgrind checks the A bits |
| for that location and issues an illegal-address warning if |
| needed. In that case, the V bits loaded are forced to indicate |
| Valid, despite the location being invalid. |
| <p> |
| This apparently strange choice reduces the amount of confusing |
| information presented to the user. It avoids the |
| unpleasant phenomenon in which memory is read from a place which |
| is both unaddressable and contains invalid values, and, as a |
| result, you get not only an invalid-address (read/write) error, |
| but also a potentially large set of uninitialised-value errors, |
| one for every time the value is used. |
| <p> |
| There is a hazy boundary case to do with multi-byte loads from |
| addresses which are partially valid and partially invalid. See |
| the description of the flag <code>--partial-loads-ok</code> for details. |
| </li><br> |
| </ul> |
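| |
| <p>The integer-register rules in the list above can be sketched as follows. |
| Everything here is an assumption made for illustration: the |
| <code>Shadowed</code> type, the encoding (a set bit in <code>undef</code> |
| means "undefined"), and the deliberately pessimistic rule used for addition. |
| The rule Valgrind really uses to compute result V bits is not spelled out in |
| this section. |

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* A 32-bit value plus its shadow bits (assumed encoding: 1 = undefined). */
typedef struct { uint32_t val; uint32_t undef; } Shadowed;

static int toy_errors;   /* uninitialised-value errors reported so far */

/* Integer load, store or register copy: V bits travel with the data
   and are not checked. */
static Shadowed toy_copy(Shadowed src) { return src; }

/* "Any other purpose": compute V bits for the result without checking.
   Pessimistic stand-in: any undefined input bit taints the whole result. */
static Shadowed toy_add(Shadowed a, Shadowed b)
{
    Shadowed r;
    r.val = a.val + b.val;
    r.undef = (a.undef | b.undef) ? 0xFFFFFFFFu : 0u;
    return r;
}

/* Use as an address or branch condition: check, report once, then mark
   the value valid so one bad value does not cause a chain of errors. */
static uint32_t toy_check_use(Shadowed *x)
{
    assert(x != NULL);
    if (x->undef) {
        toy_errors++;
        x->undef = 0;
    }
    return x->val;
}
```

| <p>Copying and adding garbage raises no error in this model; the first |
| <code>toy_check_use</code> on the result reports one error, and a second check |
| of the same value reports none. |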
| |
| Valgrind intercepts calls to malloc, calloc, realloc, valloc, |
| memalign, free, new and delete. The behaviour you get is: |
| |
| <ul> |
| |
| <li>malloc/new: the returned memory is marked as addressable but not |
| having valid values. This means you have to write on it before |
| you can read it.</li><br> |
| <p> |
| |
| <li>calloc: returned memory is marked both addressable and valid, |
| since calloc() clears the area to zero.</li><br> |
| <p> |
| |
| <li>realloc: if the new size is larger than the old, the new section |
| is addressable but invalid, as with malloc. If the new size is |
| smaller, the dropped-off section is marked as unaddressable. You |
| may only pass to realloc a pointer previously issued to you by |
| malloc/calloc/new/realloc.</li><br> |
| <p> |
| |
| <li>free/delete: you may only pass to free a pointer previously |
| issued to you by malloc/calloc/new/realloc, or the value |
| NULL. Otherwise, Valgrind complains. If the pointer is indeed |
| valid, Valgrind marks the entire area it points at as |
| unaddressable, and places the block in the freed-blocks-queue. |
| The aim is to defer as long as possible reallocation of this |
| block. Until that happens, all attempts to access it will |
| elicit an invalid-address error, as you would hope.</li><br> |
| </ul> |
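| |
| <p>As a sketch of code that stays within these rules, and so should, by the |
| description above, run without complaint, consider the following fragment. |
| The function name <code>heap_rules_demo</code> is hypothetical. |

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 on allocation failure. Every read below is of
   memory that has been both allocated and written. */
int heap_rules_demo(void)
{
    int *a = malloc(4 * sizeof *a);         /* addressable, values invalid */
    if (!a) return -1;
    for (int i = 0; i < 4; i++) a[i] = i;   /* write before reading */

    int *z = calloc(4, sizeof *z);          /* addressable and valid (zeroed) */
    if (!z) { free(a); return -1; }
    int sum = z[0] + a[3];                  /* both operands are defined */

    int *b = realloc(a, 8 * sizeof *b);     /* old contents are preserved */
    if (!b) { free(a); free(z); return -1; }
    memset(b + 4, 0, 4 * sizeof *b);        /* new tail: write it first */
    sum += b[7];

    free(b);                                /* block becomes unaddressable */
    free(z);
    free(NULL);                             /* NULL is explicitly permitted */
    return sum == 3 ? 0 : -1;
}
```

| <p>Reading <code>b[7]</code> before the <code>memset</code>, or touching |
| <code>b</code> after <code>free(b)</code>, would be the kinds of mistake the |
| list above describes. |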
| |
| |
| |
| <a name="signals"></a> |
| <h3>3.4 Signals</h3> |
| |
| Valgrind provides suitable handling of signals, so, provided you stick |
| to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask() |
| are handled. Signal handlers may return in the normal way or do |
| longjmp(); both should work ok. As specified by POSIX, a signal is |
| blocked in its own handler. Default actions for signals should work |
| as before. Etc, etc. |
| |
| <p>Under the hood, dealing with signals is a real pain, and Valgrind's |
| simulation leaves much to be desired. If your program does |
| way-strange stuff with signals, bad things may happen. If so, let me |
| know. I don't promise to fix it, but I'd at least like to be aware of |
| it. |
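| |
| <p>For reference, the kind of plain POSIX usage meant here looks roughly like |
| this. It is a minimal sketch; <code>on_usr1</code> and |
| <code>signal_demo</code> are names invented for the example. |

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

/* Handler that simply records the event and returns in the normal way. */
static void on_usr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install a handler with sigaction(), deliver the signal to ourselves,
   and return 1 if the handler ran and returned, 0 otherwise. */
int signal_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);       /* no extra signals blocked in handler */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;
    got_signal = 0;
    if (raise(SIGUSR1) != 0)
        return 0;
    return got_signal == 1;
}
```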
| |
| |
| <a name="leaks"></a> |
| <h3>3.5 Memory leak detection</h3> |
| |
| Valgrind keeps track of all memory blocks issued in response to calls |
| to malloc/calloc/realloc/new. So when the program exits, it knows |
| which blocks are still outstanding -- have not been returned, in other |
| words. Ideally, you want your program to have no blocks still in use |
| at exit. But many programs do. |
| |
| <p>For each such block, Valgrind scans the entire address space of the |
| process, looking for pointers to the block. One of three situations |
| may result: |
| |
| <ul> |
| <li>A pointer to the start of the block is found. This usually |
| indicates programming sloppiness; since the block is still |
| pointed at, the programmer could, at least in principle, have free'd |
| it before program exit.</li><br> |
| <p> |
| |
| <li>A pointer to the interior of the block is found. The pointer |
| might originally have pointed to the start and have been moved |
| along, or it might be entirely unrelated. Valgrind deems such a |
| block as "dubious", that is, possibly leaked, |
| because it's unclear whether or |
| not a pointer to it still exists.</li><br> |
| <p> |
| |
| <li>The worst outcome is that no pointer to the block can be found. |
| The block is classified as "leaked", because the |
| programmer could not possibly have free'd it at program exit, |
| since no pointer to it exists. This might be a symptom of |
| having lost the pointer at some earlier point in the |
| program.</li> |
| </ul> |
| |
| Valgrind reports summaries about leaked and dubious blocks. |
| For each such block, it will also tell you where the block was |
| allocated. This should help you figure out why the pointer to it has |
| been lost. In general, you should attempt to ensure your programs do |
| not have any leaked or dubious blocks at exit. |
| |
| <p>The precise area of memory in which Valgrind searches for pointers |
| is: all naturally-aligned 4-byte words for which all A bits indicate |
| addressability and all V bits indicate that the stored value is |
| actually valid. |
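| |
| <p>The three-way classification can be sketched as a scan over candidate |
| words. This is an illustration under stated assumptions, not Valgrind's code: |
| the enum names (in particular <code>STILL_REACHABLE</code>) and the function |
| <code>classify_block</code> are invented here, and the caller is assumed to |
| have already selected the addressable, valid, aligned words. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { STILL_REACHABLE, DUBIOUS, LEAKED } LeakKind;

/* Scan nwords candidate words for a pointer into [start, start+size). */
LeakKind classify_block(const uintptr_t *words, size_t nwords,
                        uintptr_t start, size_t size)
{
    assert(words != NULL || nwords == 0);
    LeakKind kind = LEAKED;                 /* until a pointer is found */
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == start)
            return STILL_REACHABLE;         /* pointer to the block start */
        if (words[i] > start && words[i] < start + size)
            kind = DUBIOUS;                 /* interior pointer only */
    }
    return kind;
}
```

| <p>A pointer to the start always wins over an interior pointer, which is why |
| the scan continues after finding an interior one. |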
| |
| <p><hr width="100%"> |
| |
| |
| <a name="limits"></a> |
| <h2>4 Limitations</h2> |
| |
| The following list of limitations seems depressingly long. However, |
| most programs actually work fine. |
| |
| <p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on |
| a kernel 2.2.X or 2.4.X system, subject to the following constraints: |
| |
| <ul> |
| <li>No MMX, SSE, SSE2, 3DNow instructions. If the translator |
| encounters these, Valgrind will simply give up. It may be |
| possible to add support for them at a later time. Intel added a |
| few instructions such as "cmov" to the integer instruction set |
| on Pentium and later processors, and these are supported. |
| Nevertheless it's safest to think of Valgrind as implementing |
| the 486 instruction set.</li><br> |
| <p> |
| |
| <li>Pthreads support is improving, but there are still significant |
| limitations in that department. See the section above on |
| Pthreads. Note that your program must be dynamically linked |
| against <code>libpthread.so</code>, so that Valgrind can |
| substitute its own implementation at program startup time. If |
| you're statically linked against it, things will fail |
| badly.</li><br> |
| <p> |
| |
| <li>Valgrind assumes that the floating point registers are not used |
| as intermediaries in memory-to-memory copies, so it immediately |
| checks V bits in floating-point loads/stores. If you want to |
| write code which copies around possibly-uninitialised values, |
| you must ensure these travel through the integer registers, not |
| the FPU.</li><br> |
| <p> |
| |
| <li>If your program does its own memory management, rather than |
| using malloc/new/free/delete, it should still work, but |
| Valgrind's error checking won't be so effective.</li><br> |
| <p> |
| |
| <li>Valgrind's signal simulation is not as robust as it could be. |
| Basic POSIX-compliant sigaction and sigprocmask functionality is |
| supplied, but it's conceivable that things could go badly awry |
| if you do weird things with signals. Workaround: don't. |
| Programs that do non-POSIX signal tricks are in any case |
| inherently unportable, so should be avoided if |
| possible.</li><br> |
| <p> |
| |
| <li>Programs which try to handle signals on |
| an alternate stack (sigaltstack) are not supported, although |
| they could be, with a bit of effort.</li><br> |
| <p> |
| |
| <li>Programs which switch stacks are not well handled. Valgrind |
| does have support for this, but I don't have great faith in it. |
| It's difficult -- there's no cast-iron way to decide whether a |
| large change in %esp is as a result of the program switching |
| stacks, or merely allocating a large object temporarily on the |
| current stack -- yet Valgrind needs to handle the two situations |
| differently. 1 May 02: this probably interacts badly with the |
| new pthread support. I haven't checked properly.</li><br> |
| <p> |
| |
| <li>x86 instructions, and system calls, have been implemented on |
| demand. So it's possible, although unlikely, that a program |
| will fall over with a message to that effect. If this happens, |
| please mail me ALL the details printed out, so I can try and |
| implement the missing feature.</li><br> |
| <p> |
| |
| <li>x86 floating point works correctly, but floating-point code may |
| run even more slowly than integer code, due to my simplistic |
| approach to FPU emulation.</li><br> |
| <p> |
| |
| <li>You can't Valgrind-ize statically linked binaries. Valgrind |
| relies on the dynamic-link mechanism to gain control at |
| startup.</li><br> |
| <p> |
| |
| <li>Memory consumption of your program is majorly increased whilst |
| running under Valgrind. This is due to the large amount of |
| administrative information maintained behind the scenes. Another |
| cause is that Valgrind dynamically translates the original |
| executable and never throws any translation away, except in |
| those rare cases where self-modifying code is detected. |
| Translated, instrumented code is 12-14 times larger than the |
| original (!) so you can easily end up with 15+ MB of |
| translations when running (eg) a web browser. |
| </li> |
| </ul> |
| |
| |
| Programs which are known not to work are: |
| |
| <ul> |
| <li>emacs starts up but immediately concludes it is out of memory |
| and aborts. Emacs has its own memory-management scheme, but I |
| don't understand why this should interact so badly with |
| Valgrind. Emacs works fine if you build it to use the standard |
| malloc/free routines.</li><br> |
| <p> |
| </ul> |
| |
| |
| <p><hr width="100%"> |
| |
| |
| <a name="howitworks"></a> |
| <h2>5 How it works -- a rough overview</h2> |
| Some gory details, for those with a passion for gory details. You |
| don't need to read this section if all you want to do is use Valgrind. |
| |
| <a name="startb"></a> |
| <h3>5.1 Getting started</h3> |
| |
| Valgrind is compiled into a shared object, valgrind.so. The shell |
| script valgrind sets the LD_PRELOAD environment variable to point to |
| valgrind.so. This causes the .so to be loaded as an extra library to |
| any subsequently executed dynamically-linked ELF binary, viz, the |
| program you want to debug. |
| |
| <p>The dynamic linker allows each .so in the process image to have an |
| initialisation function which is run before main(). It also allows |
| each .so to have a finalisation function run after main() exits. |
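| |
| <p>One way to get such functions with GCC is the <code>constructor</code> and |
| <code>destructor</code> function attributes, sketched below. Whether |
| valgrind.so uses these attributes or some other mechanism is not stated here, |
| so treat this purely as an illustration; all the <code>toy_</code> names are |
| invented. |

```c
#include <stdio.h>

static int init_ran;   /* set by the initialisation function */

/* Run by the dynamic linker/startup code before main(). */
__attribute__((constructor))
static void toy_init(void)
{
    init_ran = 1;
}

/* Run after main() returns or exit() is called. */
__attribute__((destructor))
static void toy_fini(void)
{
    fputs("toy_fini: finalisation function ran\n", stderr);
}

/* Lets ordinary code observe that the initialiser has already run. */
int toy_init_has_run(void)
{
    return init_ran;
}
```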
| |
| <p>When valgrind.so's initialisation function is called by the dynamic |
| linker, the synthetic CPU starts up. The real CPU remains locked |
| in valgrind.so for the entire rest of the program, but the synthetic |
| CPU returns from the initialisation function. Startup of the program |
| now continues as usual -- the dynamic linker calls all the other .so's |
| initialisation routines, and eventually runs main(). This all runs on |
| the synthetic CPU, not the real one, but the client program cannot |
| tell the difference. |
| |
| <p>Eventually main() exits, so the synthetic CPU calls valgrind.so's |
| finalisation function. Valgrind detects this, and uses it as its cue |
| to exit. It prints summaries of all errors detected, possibly checks |
| for memory leaks, and then exits the finalisation routine, but now on |
| the real CPU. The synthetic CPU has now lost control -- permanently |
| -- so the program exits back to the OS on the real CPU, just as it |
| would have done anyway. |
| |
| <p>On entry, Valgrind switches stacks, so it runs on its own stack. |
| On exit, it switches back. This means that the client program |
| continues to run on its own stack, so we can switch back and forth |
| between running it on the simulated and real CPUs without difficulty. |
| This was an important design decision, because it makes it easy (well, |
| significantly less difficult) to debug the synthetic CPU. |
| |
| |
| <a name="engine"></a> |
| <h3>5.2 The translation/instrumentation engine</h3> |
| |
| Valgrind does not directly run any of the original program's code. Only |
| instrumented translations are run. Valgrind maintains a translation |
| table, which allows it to find the translation quickly for any branch |
| target (code address). If no translation has yet been made, the |
| translator - a just-in-time translator - is summoned. This makes an |
| instrumented translation, which is added to the collection of |
| translations. Subsequent jumps to that address will use this |
| translation. |
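| |
| <p>The lookup-or-translate step can be sketched as a small hash table keyed |
| by the original code address. This is a toy: the table size, the linear |
| probing, and the names <code>tt_find_or_translate</code> and |
| <code>toy_translate</code> are assumptions for illustration, and the real |
| table may be organised quite differently. |

```c
#include <stddef.h>
#include <stdint.h>

#define TT_SLOTS 1024   /* hypothetical table size; must be a power of two */

typedef struct {
    uint32_t orig_addr;        /* branch target in the original code */
    int      in_use;
    int      translation_id;   /* stand-in for a pointer to translated code */
} TTEntry;

static TTEntry tt[TT_SLOTS];
static int n_translations_made;

/* Stand-in for the JIT: hands out a fresh id per translated block. */
static int toy_translate(uint32_t orig_addr)
{
    (void)orig_addr;
    return ++n_translations_made;
}

/* Find the translation for a branch target, making one on a miss.
   Uses linear probing and assumes the table never fills up. */
int tt_find_or_translate(uint32_t orig_addr)
{
    size_t i = orig_addr & (TT_SLOTS - 1);
    while (tt[i].in_use && tt[i].orig_addr != orig_addr)
        i = (i + 1) & (TT_SLOTS - 1);
    if (!tt[i].in_use) {
        tt[i].orig_addr = orig_addr;
        tt[i].translation_id = toy_translate(orig_addr);
        tt[i].in_use = 1;
    }
    return tt[i].translation_id;
}
```

| <p>A second jump to the same address returns the existing entry without |
| calling the translator again. |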
| |
| <p>Valgrind can optionally check writes made by the application, to |
| see if they are writing an address contained within code which has |
| been translated. Such a write invalidates translations of code |
| bracketing the written address. Valgrind will discard the relevant |
| translations, which causes them to be re-made, if they are needed |
| again, reflecting the new updated data stored there. In this way, |
| self modifying code is supported. In practice I have not found any |
| Linux applications which use self-modifying-code. |
| |
| <p>The JITter translates basic blocks -- blocks of straight-line-code |
| -- as single entities. To minimise the considerable difficulties of |
| dealing with the x86 instruction set, x86 instructions are first |
| translated to a RISC-like intermediate code, similar to sparc code, |
| but with an infinite number of virtual integer registers. Initially |
| each insn is translated separately, and there is no attempt at |
| instrumentation. |
| |
| <p>The intermediate code is improved, mostly so as to try and cache |
| the simulated machine's registers in the real machine's registers over |
| several simulated instructions. This is often very effective. Also, |
| we try to remove redundant updates of the simulated machine's |
| condition-code register. |
| |
| <p>The intermediate code is then instrumented, giving more |
| intermediate code. There are a few extra intermediate-code operations |
| to support instrumentation; it is all refreshingly simple. After |
| instrumentation there is a cleanup pass to remove redundant value |
| checks. |
| |
| <p>This gives instrumented intermediate code which mentions arbitrary |
| numbers of virtual registers. A linear-scan register allocator is |
| used to assign real registers and possibly generate spill code. All |
| of this is still phrased in terms of the intermediate code. This |
| machinery is inspired by the work of Reuben Thomas (MITE). |
| |
| <p>Then, and only then, is the final x86 code emitted. The |
| intermediate code is carefully designed so that x86 code can be |
| generated from it without need for spare registers or other |
| inconveniences. |
| |
| <p>The translations are managed using a traditional LRU-based caching |
| scheme. The translation cache has a default size of about 14MB. |
| |
| <a name="track"></a> |
| |
| <h3>5.3 Tracking the status of memory</h3> Each byte in the |
| process' address space has nine bits associated with it: one A bit and |
| eight V bits. The A and V bits for each byte are stored using a |
| sparse array, which flexibly and efficiently covers arbitrary parts of |
| the 32-bit address space without imposing significant space or |
| performance overheads for the parts of the address space never |
| visited. The scheme used, and speedup hacks, are described in detail |
| at the top of the source file vg_memory.c, so you should read that for |
| the gory details. |
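| |
| <p>To give a feel for what a sparse array over a 32-bit address space can |
| look like, here is a two-level sketch. It is an assumption-laden illustration |
| and not a transcription of vg_memory.c: the 64KB chunk size, the |
| <code>SecMap</code> layout, and the function names are all invented. |

```c
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_BITS 16
#define CHUNK_SIZE (1u << CHUNK_BITS)   /* bytes covered per secondary map */

typedef struct {
    uint8_t abit[CHUNK_SIZE / 8];   /* one A bit per byte of client memory */
    uint8_t vbyte[CHUNK_SIZE];      /* eight V bits per byte (unused below) */
} SecMap;

/* Primary map: one slot per chunk; NULL means "never visited". */
static SecMap *primary[1u << (32 - CHUNK_BITS)];

/* Return the secondary map covering addr, allocating it on first touch. */
static SecMap *secmap_for(uint32_t addr)
{
    SecMap **slot = &primary[addr >> CHUNK_BITS];
    if (*slot == NULL)
        *slot = calloc(1, sizeof **slot);   /* all bits start as 0 */
    return *slot;
}

/* Returns 0 on success, -1 if a new secondary map could not be allocated. */
int shadow_set_a(uint32_t addr, int accessible)
{
    SecMap *sm = secmap_for(addr);
    if (sm == NULL)
        return -1;
    uint32_t off = addr & (CHUNK_SIZE - 1);
    if (accessible)
        sm->abit[off >> 3] |= (uint8_t)(1u << (off & 7));
    else
        sm->abit[off >> 3] &= (uint8_t)~(1u << (off & 7));
    return 0;
}

/* Reading never allocates: unvisited chunks read as inaccessible. */
int shadow_get_a(uint32_t addr)
{
    const SecMap *sm = primary[addr >> CHUNK_BITS];
    uint32_t off = addr & (CHUNK_SIZE - 1);
    return sm ? (sm->abit[off >> 3] >> (off & 7)) & 1 : 0;
}
```

| <p>Only chunks that are actually written to cost any memory, which is the |
| property the paragraph above is after. |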
| |
| <a name="sys_calls"></a> |
| |
| <h3>5.4 System calls</h3> |
| All system calls are intercepted. The memory status map is consulted |
| before and updated after each call. It's all rather tiresome. See |
| vg_syscall_mem.c for details. |
| |
| <a name="sys_signals"></a> |
| |
| <h3>5.5 Signals</h3> |
| All system calls to sigaction() and sigprocmask() are intercepted. If |
| the client program is trying to set a signal handler, Valgrind makes a |
| note of the handler address and which signal it is for. Valgrind then |
| arranges for the same signal to be delivered to its own handler. |
| |
| <p>When such a signal arrives, Valgrind's own handler catches it, and |
| notes the fact. At a convenient safe point in execution, Valgrind |
| builds a signal delivery frame on the client's stack and runs its |
| handler. If the handler longjmp()s, there is nothing more to be said. |
| If the handler returns, Valgrind notices this, zaps the delivery |
| frame, and carries on where it left off before delivering the signal. |
| |
| <p>The purpose of this nonsense is that setting signal handlers |
| essentially amounts to giving callback addresses to the Linux kernel. |
| We can't allow this to happen, because if it did, signal handlers |
| would run on the real CPU, not the simulated one. This means the |
| checking machinery would not operate during the handler run, and, |
| worse, memory permissions maps would not be updated, which could cause |
| spurious error reports once the handler had returned. |
| |
| <p>An even worse thing would happen if the signal handler longjmp'd |
| rather than returned: Valgrind would completely lose control of the |
| client program. |
| |
| <p>Upshot: we can't allow the client to install signal handlers |
| directly. Instead, Valgrind must catch, on behalf of the client, any |
| signal the client asks to catch, and must deliver it to the client on |
| the simulated CPU, not the real one. This involves considerable |
| gruesome fakery; see vg_signals.c for details. |
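| |
| <p>Stripped of all the hard parts, the note-now, deliver-later idea looks |
| something like the sketch below. It is not taken from vg_signals.c; the |
| 32-signal limit and every <code>toy_</code> name are assumptions, and the |
| "delivery" here is just a function call rather than a frame built on the |
| client's stack. |

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define TOY_NSIG 32   /* hypothetical: only signals 1..31 are modelled */

typedef void (*Handler)(int);

static Handler client_handler[TOY_NSIG];         /* what the client asked for */
static volatile sig_atomic_t pending[TOY_NSIG];  /* noted, not yet delivered */
static int toy_last_seen;                        /* set by the example client */

/* Example client handler, used only to observe delivery. */
void toy_example_client(int signo) { toy_last_seen = signo; }

/* Recorded instead of handing the client's handler to the kernel. */
void toy_note_client_handler(int signo, Handler h)
{
    assert(signo > 0 && signo < TOY_NSIG);
    client_handler[signo] = h;
}

/* The only handler the kernel would see: it just notes the arrival. */
void toy_host_handler(int signo)
{
    if (signo > 0 && signo < TOY_NSIG)
        pending[signo] = 1;
}

/* Called at a safe point; returns how many signals were delivered. */
int toy_deliver_pending(void)
{
    int delivered = 0;
    for (int s = 1; s < TOY_NSIG; s++) {
        if (pending[s] && client_handler[s] != NULL) {
            pending[s] = 0;
            client_handler[s](s);   /* would run on the simulated CPU */
            delivered++;
        }
    }
    return delivered;
}
```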
| <p> |
| |
| <hr width="100%"> |
| |
| <a name="example"></a> |
| <h2>6 Example</h2> |
| This is the log for a run of a small program. The program is in fact |
| correct, and the reported error is the result of a potentially serious |
| code generation bug in GNU g++ (snapshot 20010527). |
| <pre> |
| sewardj@phoenix:~/newmat10$ |
| ~/Valgrind-6/valgrind -v ./bogon |
| ==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1. |
| ==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward. |
| ==25832== Startup, with flags: |
| ==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp |
| ==25832== reading syms from /lib/ld-linux.so.2 |
| ==25832== reading syms from /lib/libc.so.6 |
| ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0 |
| ==25832== reading syms from /lib/libm.so.6 |
| ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3 |
| ==25832== reading syms from /home/sewardj/Valgrind/valgrind.so |
| ==25832== reading syms from /proc/self/exe |
| ==25832== loaded 5950 symbols, 142333 line number locations |
| ==25832== |
| ==25832== Invalid read of size 4 |
| ==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45) |
| ==25832== by 0x80487AF: main (bogon.cpp:66) |
| ==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129) |
| ==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon) |
| ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd |
| ==25832== |
| ==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) |
| ==25832== malloc/free: in use at exit: 0 bytes in 0 blocks. |
| ==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated. |
| ==25832== For a detailed leak analysis, rerun with: --leak-check=yes |
| ==25832== |
| ==25832== exiting, did 1881 basic blocks, 0 misses. |
| ==25832== 223 translations, 3626 bytes in, 56801 bytes out. |
| </pre> |
| <p>The GCC folks fixed this about a week before gcc-3.0 shipped. |
| <hr width="100%"> |
| <p> |
| |
| |
| |
| <a name="cache"></a> |
| <h2>7 Cache profiling</h2> |
| As well as memory debugging, Valgrind also allows you to do cache simulations |
| and annotate your source line-by-line with the number of cache misses. In |
| particular, it records: |
| <ul> |
| <li>L1 instruction cache reads and misses; |
| <li>L1 data cache reads and read misses, writes and write misses; |
| <li>L2 unified cache reads and read misses, writes and write misses. |
| </ul> |
| On a modern x86 machine, an L1 miss will typically cost around 10 cycles, |
| and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be |
| very useful for improving the performance of your program.<p> |
| |
| Also, since one instruction cache read is performed per instruction executed, |
| you can find out how many instructions are executed per line, which can be |
| useful for optimisation and test coverage.<p> |
| |
| Please note that this is an experimental feature. Any feedback, bug-fixes, |
| suggestions, etc, welcome. |
| |
| |
| <h3>7.1 Overview</h3> |
| First off, as for normal Valgrind use, you probably want to turn on debugging |
| info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you |
| probably <b>do</b> want to turn optimisation on, since you should profile your |
| program as it will be normally run. |
| |
| <p>The three steps are: |
| <ol> |
| <li>Generate a cache simulator for your machine's cache |
| configuration with the supplied <code>vg_cachegen</code> |
| program, and recompile Valgrind with <code>make install</code>. |
| <p> |
| The default settings are for an AMD Athlon, and you will get |
| useful information with the defaults, so you can skip this step |
| if you want. Nevertheless, for accurate cache profiles you will |
| need to use <code>vg_cachegen</code> to customise |
| <code>cachegrind</code> for your system. |
| <p> |
| This step only needs to be done once, unless you are interested |
| in simulating different cache configurations (eg. first |
| concentrating on instruction cache misses, then on data cache |
| misses). |
| </li> |
| <p> |
| <li>Run your program with <code>cachegrind</code> in front of the |
| normal command line invocation. When the program finishes, |
| Valgrind will print summary cache statistics. It also collects |
| line-by-line information in a file <code>cachegrind.out</code>. |
| <p> |
| This step should be done every time you want to collect |
| information about a new program, a changed program, or about the |
| same program with different input. |
| </li> |
| <p> |
| <li>Generate a function-by-function summary, and possibly annotate |
| source files, with <code>vg_annotate</code>. Source files to |
| annotate can be specified manually on the command line, or |
| "interesting" source files can be annotated automatically with |
| the <code>--auto=yes</code> option. You can annotate C/C++ |
| files or assembly language files equally easily. |
| <p> |
| This step can be performed as many times as you like for each |
| Step 2. You may want to do multiple annotations showing |
| different information each time.</li> |
| <p> |
| </ol> |
| |
| The steps are described in detail in the following sections.<p> |
| |
| |
| <a name="generate"></a> |
| <h3>7.3 Generating a cache simulator</h3> |
| |
| Although Valgrind comes with a pre-generated cache simulator, it most |
| likely won't match the cache configuration of your machine, so you |
| should generate a new simulator.<p> |
| |
| You need to generate three files, one for each of the I1, D1 and L2 |
| caches. For each cache, you need to know the: |
| <ul> |
| <li>Cache size (bytes); |
| <li>Line size (bytes); |
| <li>Associativity. |
| </ul> |
| |
| vg_cachegen takes three options: |
| <ul> |
| <li><code>--I1=size,line_size,associativity</code> |
| <li><code>--D1=size,line_size,associativity</code> |
| <li><code>--L2=size,line_size,associativity</code> |
| </ul> |
| |
| You can specify one, two or all three caches per invocation of |
| vg_cachegen. It checks that the configuration is sensible before |
| generating the simulators; to see the allowed values, run |
| <code>vg_cachegen -h</code>.<p> |
| |
| An example invocation would be: |
| |
| <blockquote><code> |
| vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 |
| </code></blockquote> |
| |
| This simulates a machine with a 128KB split L1 2-way associative |
| cache, and a 256KB unified 8-way associative L2 cache. Both caches |
| have 64B lines.<p> |
| |
| If you don't know your cache configuration, you'll have to find it |
| out. (Ideally <code>vg_cachegen</code> could auto-identify your cache |
| configuration using the CPUID instruction, which could be done |
| automatically during installation, and this whole step could be |
| skipped.)<p> |
| |
| |
| <h3>7.4 Cache simulation specifics</h3> |
| |
| <code>vg_cachegen</code> only generates simulations for a machine with |
| a split L1 cache and a unified L2 cache. This configuration is used |
| for all (modern) x86-based machines we are aware of. Old Cyrix CPUs |
| had a unified I and D L1 cache, but they are ancient history now.<p> |
| |
| The more specific characteristics of the simulation are as follows. |
| |
| <ul> |
| <li>Write-allocate: when a write miss occurs, the block written to |
| is brought into the D1 cache. Most modern caches have this |
| property.</li><p> |
| |
| <li>Bit-selection hash function: the line(s) in the cache to which a |
| memory block maps is chosen by the middle bits M--(M+N-1) of the |
| byte address, where: |
| <ul> |
| <li> line size = 2^M bytes </li> |
| <li>(cache size / line size) = 2^N lines</li> |
| </ul> </li><p> |
| |
| <li>Inclusive L2 cache: the L2 cache replicates all the entries of |
| the L1 cache. This is standard on Pentium chips, but AMD |
| Athlons use an exclusive L2 cache that only holds blocks evicted |
| from L1. Ditto AMD Durons and most modern VIAs.</li><p> |
| </ul> |
| |
| Other noteworthy behaviour: |
| |
| <ul> |
| <li>References that straddle two cache lines are treated as follows: |
| <ul> |
| <li>If both blocks hit --&gt; counted as one hit</li> |
| <li>If one block hits, the other misses --&gt; counted as one miss</li> |
| <li>If both blocks miss --&gt; counted as one miss (not two)</li> |
| </ul></li><p> |
| |
| <li>Instructions that modify a memory location (eg. <code>inc</code> and |
| <code>dec</code>) are counted as doing just a read, ie. a single data |
| reference. This may seem strange, but since the write can never cause a |
| miss (the read guarantees the block is in the cache) it's not very |
| interesting.<p> |
| |
| Thus it measures not the number of times the data cache is accessed, but |
| the number of times a data cache miss could occur.<p> |
| </li> |
| </ul> |
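| |
| <p>The bit-selection and line-straddling rules above can be written down in |
| a few lines of C. This is a sketch, not an excerpt from the generated |
| simulators: the function names are invented, and M and N are passed in |
| directly rather than derived from a cache configuration. |

```c
#include <assert.h>
#include <stdint.h>

/* Select bits M..(M+N-1) of the byte address. Requires n < 32. */
uint32_t cache_set_index(uint32_t addr, unsigned m, unsigned n)
{
    assert(n < 32);
    return (addr >> m) & ((1u << n) - 1u);
}

/* 1 if an access of size bytes (size >= 1) at addr touches two lines
   of 2^m bytes each, 0 if it fits within one line. */
int straddles_lines(uint32_t addr, uint32_t size, unsigned m)
{
    assert(size >= 1);
    return (addr >> m) != ((addr + size - 1u) >> m);
}

/* Misses counted for a straddling reference, given whether each of the
   two blocks hit (1) or missed (0): never more than one. */
int misses_counted(int first_hit, int second_hit)
{
    return (first_hit && second_hit) ? 0 : 1;
}
```

| <p>With 64-byte lines (M = 6), a 4-byte reference at address 62 straddles |
| two lines, whereas one at address 60 does not. |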
| |
| If you are interested in simulating a cache with different properties, it is |
| not particularly hard to write your own cache simulator, or to modify existing |
| ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and |
| <code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who |
| does. |
| |
| |
| <a name="profile"></a> |
| <h3>7.5 Profiling programs</h3> |
| |
| Cache profiling is enabled by using the <code>--cachesim=yes</code> |
| option to the <code>valgrind</code> shell script. Alternatively, it |
| is probably more convenient to use the <code>cachegrind</code> script. |
| This automatically turns off Valgrind's memory checking functions, |
| since the cache simulation is slow enough already, and you probably |
| don't want to do both at once. |
| <p> |
| To gather cache profiling information about the program <code>ls |
| -l</code>, type: |
| |
| <blockquote><code>cachegrind ls -l</code></blockquote> |
| |
| The program will execute (slowly). Upon completion, summary statistics |
| that look like this will be printed: |
| |
| <pre> |
| ==31751== I refs: 27,742,716 |
| ==31751== I1 misses: 276 |
| ==31751== L2 misses: 275 |
| ==31751== I1 miss rate: 0.0% |
| ==31751== L2i miss rate: 0.0% |
| ==31751== |
| ==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr) |
| ==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr) |
| ==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr) |
| ==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%) |
| ==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%) |
| ==31751== |
| ==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr) |
| ==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%) |
| </pre> |
| |
| Cache accesses for instruction fetches are summarised first, giving the |
| number of fetches made (this is the number of instructions executed, which |
| can be useful to know in its own right), the number of I1 misses, and the |
| number of L2 instruction (<code>L2i</code>) misses.<p> |
| |
| Cache accesses for data follow. The information is similar to that of the |
| instruction fetches, except that the values are also shown split between reads |
| and writes (note each row's <code>rd</code> and <code>wr</code> values add up |
| to the row's total).<p> |
| |
| Combined instruction and data figures for the L2 cache follow that.<p> |
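| The relationships between the rows can be checked by hand. The sketch below |
| copies the figures from the sample summary above and recomputes the totals. |
| The variable names are chosen for this example only, and the miss rate is |
| assumed to be misses divided by references, expressed as a percentage. |
| <p> |

```python
# Figures copied from the sample summary above.
i_refs = 27_742_716
d_rd, d_wr = 10_955_517, 4_474_773
d1_miss_rd, d1_miss_wr = 21_905, 19_280
l2i_misses = 275
l2d_miss_rd, l2d_miss_wr = 3_987, 19_098

# Each row's rd and wr values add up to the row's total.
d_refs = d_rd + d_wr                       # D refs
d1_misses = d1_miss_rd + d1_miss_wr        # D1 misses
l2d_misses = l2d_miss_rd + l2d_miss_wr     # L2 data misses

# The combined L2 figure is the instruction part plus the data part.
l2_misses = l2i_misses + l2d_misses


def rate(misses, refs):
    """Miss rate as a percentage (assumed definition: misses / refs)."""
    return 100.0 * misses / refs


if __name__ == "__main__":
    print("D refs:", d_refs)
    print("D1 misses:", d1_misses)
    print("L2 misses:", l2_misses)
    print("D1 write miss rate: %.2f%%" % rate(d1_miss_wr, d_wr))
```

| Judging by these figures, the percentages in the summary appear to be cut to |
| one decimal place rather than rounded: the D1 read miss rate works out to |
| just under 0.2% but is shown as 0.1%. |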
| |
| |
| <h3>7.6 Output file</h3> |
| |
| As well as printing summary information, Cachegrind also writes |
| line-by-line cache profiling information to a file named |
| <code>cachegrind.out</code>. This file is human-readable, but is best |
| interpreted by the accompanying program <code>vg_annotate</code>, |
| described in the next section. |
| <p> |
| Things to note about the <code>cachegrind.out</code> file: |
| <ul> |
| <li>It is written every time <code>valgrind --cachesim=yes</code> or |
| <code>cachegrind</code> is run, and will overwrite any existing |
| <code>cachegrind.out</code> in the current directory.</li> |
| <p> |
| <li>It can be huge: <code>ls -l</code> generates a file of about |
| 350 KB. Browsing a few files and web pages with a Konqueror |
| built with full debugging information generates a file |
| of around 15 MB.</li> |
| </ul> |
| |
| |
| <a name="annotate"></a> |
| <h3>7.7 Annotating C/C++ programs</h3> |
| |
| Before using <code>vg_annotate</code>, it is worth widening your |
| window to be at least 120 characters wide if possible, as the output |
| lines can be quite long. |
| <p> |
| To get a function-by-function summary, run <code>vg_annotate</code> in |
| a directory containing a <code>cachegrind.out</code> file. The output |
| looks like this: |
| |
| <pre> |
| -------------------------------------------------------------------------------- |
| I1 cache: 65536 B, 64 B, 2-way associative |
| D1 cache: 65536 B, 64 B, 2-way associative |
| L2 cache: 262144 B, 64 B, 8-way associative |
| Command: concord vg_to_ucode.c |
| Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| Threshold: 99% |
| Chosen for annotation: |
| Auto-annotation: on |
| |
| -------------------------------------------------------------------------------- |
| Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| -------------------------------------------------------------------------------- |
| 27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS |
| |
| -------------------------------------------------------------------------------- |
| Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function |
| -------------------------------------------------------------------------------- |
| 8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc |
| 5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word |
| 2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp |
| 2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash |
| 2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower |
| 1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert |
| 897,991 51 51 897,831 95 30 62 1 1 ???:??? |
| 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile |
| 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile |
| 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc |
| 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing |
| 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER |
| 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table |
| 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create |
| 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0 |
| 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0 |
| 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node |
| 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue |
| </pre> |
| |
| First up is a summary of the annotation options: |
| |
| <ul> |
| <li>I1 cache, D1 cache, L2 cache: the cache configuration with which these |
| results were obtained.</li><p> |
| |
| <li>Command: the command line invocation of the program under |
| examination.</li><p> |
| |
| <li>Events recorded: event abbreviations are:<p> |
| <ul> |
| <li><code>Ir </code>: I cache reads (ie. instructions executed)</li> |
| <li><code>I1mr</code>: I1 cache read misses</li> |
| <li><code>I2mr</code>: L2 cache instruction read misses</li> |
| <li><code>Dr </code>: D cache reads (ie. memory reads)</li> |
| <li><code>D1mr</code>: D1 cache read misses</li> |
| <li><code>D2mr</code>: L2 cache data read misses</li> |
| <li><code>Dw </code>: D cache writes (ie. memory writes)</li> |
| <li><code>D1mw</code>: D1 cache write misses</li> |
| <li><code>D2mw</code>: L2 cache data write misses</li> |
| </ul><p> |
| Note that D1 total misses is given by <code>D1mr</code> + |
| <code>D1mw</code>, and that L2 total misses is given by |
| <code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p> |
| |
| <li>Events shown: the events shown (a subset of events gathered). This can |
| be adjusted with the <code>--show</code> option.</li><p> |
| |
| <li>Event sort order: the sort order in which functions are shown. For |
| example, in this case the functions are sorted from highest |
| <code>Ir</code> counts to lowest. If two functions have identical |
| <code>Ir</code> counts, they will then be sorted by <code>I1mr</code> |
| counts, and so on. This order can be adjusted with the |
| <code>--sort</code> option.<p> |
| |
| Note that this dictates the order in which the functions appear. It is <b>not</b> |
| the order in which the columns appear; that is dictated by the "events |
| shown" line (and can be changed with the <code>--show</code> option). |
| </li><p> |
| |
| <li>Threshold: <code>vg_annotate</code> by default omits functions |
| that account for very low event counts, to avoid drowning you in |
| information. In this case, vg_annotate shows summaries for the |
| functions that account for 99% of the <code>Ir</code> counts; |
| <code>Ir</code> is chosen as the threshold event since it is the |
| primary sort event. The threshold can be adjusted with the |
| <code>--threshold</code> option.</li><p> |
| |
| <li>Chosen for annotation: names of files specified manually for annotation; |
| in this case none.</li><p> |
| |
| <li>Auto-annotation: whether auto-annotation was requested via the |
| <code>--auto=yes</code> option. In this case no.</li><p> |
| </ul> |
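| The "Event sort order" and "Threshold" entries interact, so a small sketch may |
| help. The function below sorts entries by a tuple of event counts and then |
| keeps them until the chosen percentage of the primary sort event is covered. |
| The function name, the data layout and the exact cut-off rule are assumptions |
| made for illustration, and the counts are invented; vg_annotate's own |
| implementation may differ in detail. |
| <p> |

```python
def summarise(functions, sort_events, threshold_pct):
    """Return the function names to show, in display order.

    functions maps a name to a dict of {event: count}.
    sort_events lists events from primary to least significant sort key.
    """
    primary = sort_events[0]
    total = sum(counts[primary] for counts in functions.values())
    # Highest counts first; ties on one event fall through to the next event.
    ordered = sorted(
        functions,
        key=lambda name: tuple(functions[name][e] for e in sort_events),
        reverse=True,
    )
    shown, running = [], 0
    for name in ordered:
        # Stop once the entries shown so far cover the threshold.
        if running * 100 >= threshold_pct * total:
            break
        shown.append(name)
        running += functions[name][primary]
    return shown


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = {
        "a.c:f1": {"Ir": 600, "I1mr": 1},
        "a.c:f2": {"Ir": 190, "I1mr": 2},
        "a.c:f3": {"Ir": 190, "I1mr": 7},
        "a.c:f4": {"Ir": 15, "I1mr": 0},
        "a.c:f5": {"Ir": 5, "I1mr": 0},
    }
    print(summarise(example, ["Ir", "I1mr"], 99))
```

| Here <code>f3</code> is listed before <code>f2</code> because their |
| <code>Ir</code> counts tie and <code>f3</code> has more <code>I1mr</code> |
| events, and <code>f5</code> is omitted because the first four entries already |
| cover 99% of the <code>Ir</code> total. |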
| |
| Then follow summary statistics for the whole program. These are similar |
| to the summary provided when running <code>valgrind --cachesim=yes</code>.<p> |
| |
| Then follow function-by-function statistics. Each function is |
| identified by a <code>file_name:function_name</code> pair. If a column |
| contains only a dot, it means the function never performs |
| that event (eg. the third row shows that <code>strcmp()</code> |
| contains no instructions that write to memory). The name |
| <code>???</code> is used if the file name and/or function name |
| could not be determined from debugging information. If most of the |
| entries have the form <code>???:???</code>, the program probably wasn't |
| compiled with <code>-g</code>. <p> |
| |
| It is worth noting that functions will come from three types of source files: |
| <ol> |
| <li>From the profiled program (<code>concord.c</code> in this example).</li> |
| <li>From libraries (eg. <code>getc.c</code>).</li> |
| <li>From Valgrind's implementation of some libc functions (eg. |
| <code>vg_clientmalloc.c:malloc</code>). These are recognisable because |
| the filename begins with <code>vg_</code>, and is probably one of |
| <code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or |
| <code>vg_mylibc.c</code>. |
| </li> |
| </ol> |
| |
| There are two ways to annotate source files -- by choosing them |
| manually, or with the <code>--auto=yes</code> option. To do it |
| manually, just specify the filenames as arguments to |
| <code>vg_annotate</code>. For example, the output from running |
| <code>vg_annotate concord.c</code> for our example produces the same |
| output as above followed by an annotated version of |
| <code>concord.c</code>, a section of which looks like: |
| |
| <pre> |
| -------------------------------------------------------------------------------- |
| -- User-annotated source: concord.c |
| -------------------------------------------------------------------------------- |
| Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| |
| [snip] |
| |
| . . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[]) |
| 3 1 1 . . . 1 0 0 { |
| . . . . . . . . . FILE *file_ptr; |
| . . . . . . . . . Word_Info *data; |
| 1 0 0 . . . 1 1 1 int line = 1, i; |
| . . . . . . . . . |
| 5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info)); |
| . . . . . . . . . |
| 4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++) |
| 3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL; |
| . . . . . . . . . |
| . . . . . . . . . /* Open file, check it. */ |
| 6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r"); |
| 2 0 0 1 0 0 . . . if (!(file_ptr)) { |
| . . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name); |
| 1 1 1 . . . . . . exit(EXIT_FAILURE); |
| . . . . . . . . . } |
| . . . . . . . . . |
| 165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF) |
| 146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table); |
| . . . . . . . . . |
| 4 0 0 1 0 0 2 0 0 free(data); |
| 4 0 0 1 0 0 2 0 0 fclose(file_ptr); |
| 3 0 0 2 0 0 . . . } |
| </pre> |
| |
| (Although column widths are automatically minimised, a wide terminal is clearly |
| useful.)<p> |
| |
| Each source file is clearly marked (<code>User-annotated source</code>) as |
| having been chosen manually for annotation. If the file was found in one of |
| the directories specified with the <code>-I</code>/<code>--include</code> |
| option, the directory and file are both given.<p> |
| |
| Each line is annotated with its event counts. Events not applicable for a line |
| are represented by a `.'; this is useful for distinguishing between an event |
| which cannot happen, and one which can but did not.<p> |
| |
| Sometimes only a small section of a source file is executed. To minimise |
| uninteresting output, Valgrind only shows annotated lines and lines within a |
| small distance of annotated lines. Gaps are marked with the line numbers so |
| you know which part of a file the shown code comes from, eg: |
| |
| <pre> |
| (figures and code for line 704) |
| -- line 704 ---------------------------------------- |
| -- line 878 ---------------------------------------- |
| (figures and code for line 878) |
| </pre> |
| |
| The amount of context to show around annotated lines is controlled by the |
| <code>--context</code> option.<p> |
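| The selection of lines to print can be pictured as merging windows around each |
| annotated line. The following is a minimal sketch of that idea; the function |
| name and its parameters are hypothetical and are not taken from vg_annotate |
| itself. |
| <p> |

```python
def lines_to_show(annotated, context, last_line):
    """Return merged (first, last) line ranges to print.

    Each annotated line contributes itself plus `context` lines on either
    side, clipped to the file; touching or overlapping ranges are merged.
    A gap marker would be printed between consecutive ranges.
    """
    ranges = []
    for line in sorted(annotated):
        first = max(1, line - context)
        last = min(last_line, line + context)
        if ranges and first <= ranges[-1][1] + 1:
            # Extends the previous range: merge instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], last))
        else:
            ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    # Lines 10 and 12 share one window; line 40 gets its own.
    print(lines_to_show({10, 12, 40}, context=2, last_line=50))
```

| A very large <code>context</code> value makes every window cover the whole |
| file, so the result collapses to a single range. |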
| |
| To get automatic annotation, run <code>vg_annotate --auto=yes</code>. |
| vg_annotate will automatically annotate every source file it can find that is |
| mentioned in the function-by-function summary. Therefore, the files chosen for |
| auto-annotation are affected by the <code>--sort</code> and |
| <code>--threshold</code> options. Each source file is clearly marked |
| (<code>Auto-annotated source</code>) as being chosen automatically. Any files |
| that could not be found are mentioned at the end of the output, eg: |
| |
| <pre> |
| -------------------------------------------------------------------------------- |
| The following files chosen for auto-annotation could not be found: |
| -------------------------------------------------------------------------------- |
| getc.c |
| ctype.c |
| ../sysdeps/generic/lockfile.c |
| </pre> |
| |
| This is quite common for library files, since libraries are usually compiled |
| with debugging information, but the source files are often not present on a |
| system. If a file is chosen for annotation <b>both</b> manually and |
| automatically, it is marked as <code>User-annotated source</code>.<p> |
| |
| Use the <code>-I/--include</code> option to tell Valgrind where to look for |
| source files if the filenames found from the debugging information aren't |
| specific enough.<p> |
| |
| Beware that vg_annotate can take some time to digest large |
| <code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that |
| auto-annotation can produce a lot of output if your program is large! |
| |
| |
| <h3>7.8 Annotating assembler programs</h3> |
| |
| Valgrind can annotate assembler programs too, or annotate the |
| assembler generated for your C program. Sometimes this is useful for |
| understanding what is really happening when an interesting line of C |
| code is translated into multiple instructions.<p> |
| |
| To do this, you just need to assemble your <code>.s</code> files with |
| assembler-level debug information. gcc doesn't do this, but you can |
| use the GNU assembler with the <code>--gstabs</code> option to |
| generate object files with this information, eg: |
| |
| <blockquote><code>as --gstabs foo.s</code></blockquote> |
| |
| You can then profile and annotate source files in the same way as for C/C++ |
| programs. |
| |
| |
| <h3>7.9 <code>vg_annotate</code> options</h3> |
| <ul> |
| <li><code>-h, --help</code></li><p> |
| <li><code>-v, --version</code><p> |
| |
| Help and version, as usual.</li> |
| |
| <li><code>--sort=A,B,C</code> [default: order in |
| <code>cachegrind.out</code>]<p> |
| Specifies the events upon which the sorting of the function-by-function |
| entries will be based. Useful if you want to concentrate on eg. I cache |
| misses (<code>--sort=I1mr,I2mr</code>), or D cache misses |
| (<code>--sort=D1mr,D2mr</code>), or L2 misses |
| (<code>--sort=D2mr,I2mr</code>).</li><p> |
| |
| <li><code>--show=A,B,C</code> [default: all, using order in |
| <code>cachegrind.out</code>]<p> |
| Specifies which events to show (and the column order). Default is to use |
| all present in the <code>cachegrind.out</code> file (and use the order in |
| the file).</li><p> |
| |
| <li><code>--threshold=X</code> [default: 99%] <p> |
| Sets the threshold for the function-by-function summary. Functions are |
| shown that account for more than X% of all the primary sort events. If |
| auto-annotating, also affects which files are annotated.</li><p> |
| |
| <li><code>--auto=no</code> [default]<br> |
| <code>--auto=yes</code> <p> |
| When enabled, automatically annotates every file that is mentioned in the |
| function-by-function summary that can be found. Also gives a list of |
| those that couldn't be found.</li><p> |
| |
| <li><code>--context=N</code> [default: 8]<p> |
| Print N lines of context before and after each annotated line. Avoids |
| printing large sections of source files that were not executed. Use a |
| large number (eg. 10,000) to show all source lines. |
| </li><p> |
| |
| <li><code>-I=<dir>, --include=<dir></code> |
| [default: empty string]<p> |
| Adds a directory to the list in which to search for files. Multiple |
| -I/--include options can be given to add multiple directories.</li> |
| </ul> |
| |
| |
| <h3>7.10 Warnings</h3> |
| There are a couple of situations in which vg_annotate issues warnings. |
| |
| <ul> |
| <li>If a source file is more recent than the <code>cachegrind.out</code> |
| file. This is because the information in <code>cachegrind.out</code> is |
| only recorded with line numbers, so if the line numbers change at all in |
| the source (eg. lines added, deleted, swapped), any annotations will be |
| incorrect.</li><p> |
| |
| <li>If information is recorded about line numbers past the end of a file. |
| This can be caused by the above problem, ie. shortening the source file |
| while using an old <code>cachegrind.out</code> file. If this happens, |
| the figures for the bogus lines are printed anyway (clearly marked as |
| bogus) in case they are important.</li><p> |
| </ul> |
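| Both conditions are easy to check for yourself before trusting an annotation. |
| The sketch below shows one way to do so; the function names are hypothetical |
| and the checks are only an approximation of what vg_annotate does. |
| <p> |

```python
import os


def is_stale(source_path, profile_path="cachegrind.out"):
    """True if the source file was modified after the profile was written."""
    return os.path.getmtime(source_path) > os.path.getmtime(profile_path)


def lines_past_end(recorded_lines, source_line_count):
    """Return recorded line numbers that lie beyond the end of the source file."""
    return sorted(n for n in recorded_lines if n > source_line_count)


if __name__ == "__main__":
    # A 120-line file with counts recorded for lines 118, 125 and 130.
    print(lines_past_end({118, 125, 130}, 120))
```

| If either check fires, the safest course is to re-run the profile so that the |
| line numbers in <code>cachegrind.out</code> match the current source. |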
| |
| |
| <h3>7.10 Things to watch out for</h3> |
| Some odd things that can occur during annotation: |
| |
| <ul> |
| <li>If annotating at the assembler level, you might see something like this: |
| |
| <pre> |
| 1 0 0 . . . . . . leal -12(%ebp),%eax |
| 1 0 0 . . . 1 0 0 movl %eax,84(%ebx) |
| 2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp) |
| . . . . . . . . . .align 4,0x90 |
| 1 0 0 . . . . . . movl $.LnrB,%eax |
| 1 0 0 . . . 1 0 0 movl %eax,-16(%ebp) |
| </pre> |
| |
| How can the third instruction be executed twice when the others are |
| executed only once? As it turns out, it isn't. Here's a dump of the |
| executable, from objdump: |
| |
| <pre> |
| 8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax |
| 8048f28: 89 43 54 mov %eax,0x54(%ebx) |
| 8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp) |
| 8048f32: 89 f6 mov %esi,%esi |
| 8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax |
| 8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp) |
| </pre> |
| |
| Notice the extra <code>mov %esi,%esi</code> instruction. Where did this |
| come from? The GNU assembler inserted it to serve as the two bytes of |
| padding needed to align the <code>movl $.LnrB,%eax</code> instruction on |
| a four-byte boundary, but pretended it didn't exist when adding debug |
| information. Thus when Valgrind reads the debug info it thinks that the |
| <code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address |
| range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the |
| <code>mov %esi,%esi</code> to it.<p> |
| </li> |
| |
| <li>Inlined functions can cause strange results in the function-by-function |
| summary. If a function <code>inline_me()</code> is defined in |
| <code>foo.h</code> and inlined in the functions <code>f1()</code>, |
| <code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will |
| not be a <code>foo.h:inline_me()</code> function entry. Instead, there |
| will be separate function entries for each inlining site, ie. |
| <code>foo.h:f1()</code>, <code>foo.h:f2()</code> and |
| <code>foo.h:f3()</code>. To find the total counts for |
| <code>foo.h:inline_me()</code>, add up the counts from each entry.<p> |
| |
| The reason for this is that although the debug info output by gcc |
| indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it |
| doesn't indicate the name of the function in <code>foo.h</code>, so |
| Valgrind keeps using the old one.</li><p> |
| |
| <li>Sometimes, the same filename might be represented with a relative name |
| and with an absolute name in different parts of the debug info, eg: |
| <code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this |
| case, if you use auto-annotation, the file will be annotated twice with |
| the counts split between the two.<p> |
| </li> |
| |
| <li>Files with more than 65,535 lines cause difficulties for the stabs debug |
| info reader. This is because the line number in the <code>struct |
| nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit |
| number. Valgrind can handle some files with more than 65,535 lines |
| correctly by making some guesses to identify line number overflows. But |
| some cases are beyond it, in which case you'll get a warning message |
| explaining that annotations for the file might be incorrect. |
| </li> |
| </ul> |
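| The padding-instruction case above can be reproduced with a small model of the |
| address-to-line lookup. The sketch assumes that debug info supplies only the |
| start address of each source line, and that an address belongs to the nearest |
| preceding start. The addresses come from the objdump listing above; the line |
| numbers and names are invented for the example. |
| <p> |

```python
import bisect

# (start address, source line) pairs, as the debug info would describe them.
# Line numbers are hypothetical; line 4 is the .align directive and has no entry.
line_table = [
    (0x8048F25, 1),  # leal -12(%ebp),%eax
    (0x8048F28, 2),  # movl %eax,84(%ebx)
    (0x8048F2B, 3),  # movl $1,-20(%ebp)
    (0x8048F34, 5),  # movl $.LnrB,%eax
    (0x8048F39, 6),  # movl %eax,-16(%ebp)
]
starts = [addr for addr, _ in line_table]


def line_for(addr):
    """Map an instruction address to the line whose range contains it."""
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address precedes the first line entry")
    return line_table[i][1]


# Addresses actually executed, including the padding mov %esi,%esi at 0x8048f32.
executed = [0x8048F25, 0x8048F28, 0x8048F2B, 0x8048F32, 0x8048F34, 0x8048F39]

counts = {}
for addr in executed:
    line = line_for(addr)
    counts[line] = counts.get(line, 0) + 1

if __name__ == "__main__":
    print(counts)
```

| The padding instruction has no entry of its own, so its execution is added to |
| line 3, which gives the count of 2 seen in the annotated listing. |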
| |
| Note: stabs is not an easy format to read. If you come across bizarre |
| annotations that look like they might be caused by a bug in the stabs reader, |
| please let us know. |
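| <p> |
| To see why long files are troublesome, consider what a 16-bit field does to a |
| line number. The sketch below shows the wraparound, together with one |
| possible guess for undoing it. The guess (treat any drop in the stored value |
| as an overflow) is an assumption made for illustration and is not claimed to |
| be the heuristic Valgrind uses. |
| <p> |

```python
def stored_line(actual_line):
    """Line number as it would appear in a 16-bit field."""
    return actual_line & 0xFFFF


def unwrap(stored_lines):
    """Guess the real line numbers from a sequence of 16-bit values.

    Assumes line numbers never decrease within the sequence, so any drop
    is taken to mean the counter wrapped past 65,535.
    """
    result, offset, prev = [], 0, None
    for n in stored_lines:
        if prev is not None and n < prev:
            offset += 0x10000
        result.append(n + offset)
        prev = n
    return result


if __name__ == "__main__":
    print(stored_line(70000))
    print(unwrap([65000, 65535, 0, 4464]))
```

| A guess of this kind fails whenever line numbers legitimately go backwards, |
| which is one reason some files cannot be handled correctly. |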
| |
| |
| <h3>7.11 Accuracy</h3> |
| Valgrind's cache profiling has a number of shortcomings: |
| |
| <ul> |
| <li>It doesn't account for kernel activity -- the effect of system calls on |
| the cache contents is ignored.</li><p> |
| |
| <li>It doesn't account for other process activity (although this is probably |
| desirable when considering a single program).</li><p> |
| |
| <li>It doesn't account for virtual-to-physical address mappings; hence the |
| entire simulation is not a true representation of what's happening in the |
| cache.</li><p> |
| |
| <li>It doesn't account for cache misses not visible at the instruction level, |
| eg. those arising from TLB misses, or speculative execution.</li><p> |
| |
| <li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code> |
| will incorrectly be counted as doing a data read if both the arguments |
| are registers, eg: |
| |
| <blockquote><code>btsl %eax, %edx</code></blockquote> |
| |
| This should only happen rarely.</li> |
| </ul> |
| |
| Another thing worth noting is that results are very sensitive. Changing the |
| size of the <code>valgrind.so</code> file, the size of the program being |
| profiled, or even the length of its name can perturb the results. Variations |
| will be small, but don't expect perfectly repeatable results if your program |
| changes at all.<p> |
| |
| While these factors mean you shouldn't trust the results to be super-accurate, |
| hopefully they should be close enough to be useful.<p> |
| |
| |
| <h3>7.12 Todo</h3> |
| <ul> |
| <li>Use CPUID instruction to auto-identify cache configuration during |
| installation. This would save the user from having to know their cache |
| configuration and using vg_cachegen.</li> |
| <p> |
| <li>Program start-up/shut-down calls a lot of functions that aren't |
| interesting and just complicate the output. Would be nice to exclude |
| these somehow.</li> |
| <p> |
| </ul> |
| <hr width="100%"> |
| </body> |
| </html> |
| |