| 1 | <html> |
| 2 | <head> |
| 3 | <style type="text/css"> |
| 4 | body { background-color: #ffffff; |
| 5 | color: #000000; |
| 6 | font-family: Times, Helvetica, Arial; |
| 7 | font-size: 14pt} |
| 8 | h4 { margin-bottom: 0.3em} |
| 9 | code { color: #000000; |
| 10 | font-family: Courier; |
| 11 | font-size: 13pt } |
| 12 | pre { color: #000000; |
| 13 | font-family: Courier; |
| 14 | font-size: 13pt } |
| 15 | a:link { color: #0000C0; |
| 16 | text-decoration: none; } |
| 17 | a:visited { color: #0000C0; |
| 18 | text-decoration: none; } |
| 19 | a:active { color: #0000C0; |
| 20 | text-decoration: none; } |
| 21 | </style> |
| 22 | </head> |
| 23 | |
| 24 | <body bgcolor="#ffffff"> |
| 25 | |
| 26 | <a name="title"> </a> |
| 27 | <h1 align=center>Valgrind, snapshot 20020324</h1> |
| 28 | <center>This manual was minimally updated on 20020415</center> |
| 29 | <p> |
| 30 | |
| 31 | <center> |
| 32 | <a href="mailto:jseward@acm.org">jseward@acm.org</a><br> |
| 33 | Copyright © 2000-2002 Julian Seward |
| 34 | <p> |
| 35 | Valgrind is licensed under the GNU General Public License, |
| 36 | version 2<br> |
| 37 | An open-source tool for finding memory-management problems in |
| 38 | Linux-x86 executables. |
| 39 | </center> |
| 40 | |
| 41 | <p> |
| 42 | |
| 43 | <hr width="100%"> |
| 44 | <a name="contents"></a> |
| 45 | <h2>Contents of this manual</h2> |
| 46 | |
| 47 | <h4>1 <a href="#intro">Introduction</a></h4> |
| 48 | 1.1 <a href="#whatfor">What Valgrind is for</a><br> |
| 49 | 1.2 <a href="#whatdoes">What it does with your program</a> |
| 50 | |
| 51 | <h4>2 <a href="#howtouse">How to use it, and how to make sense |
| 52 | of the results</a></h4> |
| 53 | 2.1 <a href="#starta">Getting started</a><br> |
| 54 | 2.2 <a href="#comment">The commentary</a><br> |
| 55 | 2.3 <a href="#report">Reporting of errors</a><br> |
| 56 | 2.4 <a href="#suppress">Suppressing errors</a><br> |
| 57 | 2.5 <a href="#flags">Command-line flags</a><br> |
| 58 | 2.6 <a href="#errormsgs">Explanation of error messages</a><br> |
| 59 | 2.7 <a href="#suppfiles">Writing suppressions files</a><br> |
| 60 | 2.8 <a href="#install">Building and installing</a><br> |
| 61 | 2.9 <a href="#problems">If you have problems</a><br> |
| 62 | |
| 63 | <h4>3 <a href="#machine">Details of the checking machinery</a></h4> |
| 64 | 3.1 <a href="#vvalue">Valid-value (V) bits</a><br> |
| 65 | 3.2 <a href="#vaddress">Valid-address (A) bits</a><br> |
| 66 | 3.3 <a href="#together">Putting it all together</a><br> |
| 67 | 3.4 <a href="#signals">Signals</a><br> |
| 68 | 3.5 <a href="#leaks">Memory leak detection</a><br> |
| 69 | |
| 70 | <h4>4 <a href="#limits">Limitations</a></h4> |
| 71 | |
| 72 | <h4>5 <a href="#howitworks">How it works -- a rough overview</a></h4> |
| 73 | 5.1 <a href="#startb">Getting started</a><br> |
| 74 | 5.2 <a href="#engine">The translation/instrumentation engine</a><br> |
| 75 | 5.3 <a href="#track">Tracking the status of memory</a><br> |
| 76 | 5.4 <a href="#sys_calls">System calls</a><br> |
| 77 | 5.5 <a href="#sys_signals">Signals</a><br> |
| 78 | |
| 79 | <h4>6 <a href="#example">An example</a></h4> |
| 80 | |
| 81 | <h4>7 <a href="#cache">Cache profiling</a></h4> |
| 82 | |
| 83 | <h4>8 <a href="techdocs.html">The design and implementation of Valgrind</a></h4> |
| 84 | |
| 85 | <hr width="100%"> |
| 86 | |
| 87 | <a name="intro"></a> |
| 88 | <h2>1 Introduction</h2> |
| 89 | |
| 90 | <a name="whatfor"></a> |
| 91 | <h3>1.1 What Valgrind is for</h3> |
| 92 | |
| 93 | Valgrind is a tool to help you find memory-management problems in your |
| 94 | programs. When a program is run under Valgrind's supervision, all |
| 95 | reads and writes of memory are checked, and calls to |
| 96 | malloc/new/free/delete are intercepted. As a result, Valgrind can |
| 97 | detect problems such as: |
| 98 | <ul> |
| 99 | <li>Use of uninitialised memory</li> |
| 100 | <li>Reading/writing memory after it has been free'd</li> |
| 101 | <li>Reading/writing off the end of malloc'd blocks</li> |
| 102 | <li>Reading/writing inappropriate areas on the stack</li> |
| 103 | <li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li> |
| 104 | </ul> |
| 105 | |
| 106 | Problems like these can be difficult to find by other means, often |
| 107 | lying undetected for long periods, then causing occasional, |
| 108 | difficult-to-diagnose crashes. |
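<p>As a rough illustration of the last item in the list above, here is a minimal C sketch of a leak. The function name <code>leak_block</code> is hypothetical and is not part of Valgrind. The block is allocated and used, then its only pointer goes out of scope, so the program can never free it. Note that leak checking is off by default and is enabled with <code>--leak-check=yes</code> (see Section 2.5).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: allocate n bytes, use them, then return without
   freeing the block or saving the pointer anywhere.  Returns the number
   of bytes leaked, or 0 if the allocation failed. */
size_t leak_block(size_t n)
{
    char *p = malloc(n);
    if (p == NULL)
        return 0;
    memset(p, 0, n);   /* the block is initialised and in bounds */
    return n;          /* p is lost here, so the block can never be freed */
}
```

<p>A test program that calls <code>leak_block(16)</code> once should leave a single lost 16-byte block at exit, which is the kind of block the leak detector is meant to look for.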
| 109 | |
| 110 | <p> |
| 111 | Valgrind is closely tied to details of the CPU and operating system and, |
| 112 | to a lesser extent, the compiler and basic C libraries. This makes it |
| 113 | difficult to port, so I have chosen at the outset to |
| 114 | concentrate on what I believe to be a widely used platform: Red Hat |
| 115 | Linux 7.2 on x86. I believe that it will work without significant |
| 116 | difficulty on other x86 GNU/Linux systems which use the 2.4 kernel and |
| 117 | GNU libc 2.2.X, for example SuSE 7.1 and Mandrake 8.0. It has also |
| 118 | worked in the past, and probably still does, on Red Hat 7.1 and 6.2. |
| 119 | Note, however, that I haven't compiled it on Red Hat 7.1 or 6.2 for a |
| 120 | while, so it may no longer work on them. |
| 121 | <p> |
| 122 | (Early Feb 02: after feedback from the KDE people it also works better |
| 123 | on other Linuxes). |
| 124 | <p> |
| 125 | At some point in the past, Valgrind has also worked on Red Hat 6.2 |
| 126 | (x86), thanks to the efforts of Rob Noble. |
| 127 | |
| 128 | <p> |
| 129 | Valgrind is licensed under the GNU General Public License, version |
| 130 | 2. Read the file LICENSE in the source distribution for details. |
| 131 | |
| 132 | <a name="whatdoes"></a> |
| 133 | <h3>1.2 What it does with your program</h3> |
| 134 | |
| 135 | Valgrind is designed to be as non-intrusive as possible. It works |
| 136 | directly with existing executables. You don't need to recompile, |
| 137 | relink, or otherwise modify, the program to be checked. Simply place |
| 138 | the word <code>valgrind</code> at the start of the command line |
| 139 | normally used to run the program. So, for example, if you want to run |
| 140 | the command <code>ls -l</code> on Valgrind, simply issue the |
| 141 | command: <code>valgrind ls -l</code>. |
| 142 | |
| 143 | <p>Valgrind takes control of your program before it starts. Debugging |
| 144 | information is read from the executable and associated libraries, so |
| 145 | that error messages can be phrased in terms of source code |
| 146 | locations. Your program is then run on a synthetic x86 CPU which |
| 147 | checks every memory access. All detected errors are written to a |
| 148 | log. When the program finishes, Valgrind searches for and reports on |
| 149 | leaked memory. |
| 150 | |
| 151 | <p>You can run pretty much any dynamically linked ELF x86 executable using |
| 152 | Valgrind. Programs run 25 to 50 times slower, and take a lot more |
| 153 | memory, than they usually would. It works well enough to run large |
| 154 | programs. For example, the Konqueror web browser from the KDE Desktop |
| 155 | Environment, version 2.1.1, runs slowly but usably on Valgrind. |
| 156 | |
| 157 | <p>Valgrind simulates every single instruction your program executes. |
| 158 | Because of this, it finds errors not only in your application but also |
| 159 | in all supporting dynamically-linked (.so-format) libraries, including |
| 160 | the GNU C library, the X client libraries, Qt, if you work with KDE, and |
| 161 | so on. That often includes libraries, for example the GNU C library, |
| 162 | which contain memory access violations, but which you cannot or do not |
| 163 | want to fix. |
| 164 | |
| 165 | <p>Rather than swamping you with errors in which you are not |
| 166 | interested, Valgrind allows you to selectively suppress errors, by |
| 167 | recording them in a suppressions file which is read when Valgrind |
| 168 | starts up. As supplied, Valgrind comes with a suppressions file |
| 169 | designed to give reasonable behaviour on Red Hat 7.2 (also 7.1 and |
| 170 | 6.2) when running text-only and simple X applications. |
| 171 | |
| 172 | <p><a href="#example">Section 6</a> shows an example of use. |
| 173 | <p> |
| 174 | <hr width="100%"> |
| 175 | |
| 176 | <a name="howtouse"></a> |
| 177 | <h2>2 How to use it, and how to make sense of the results</h2> |
| 178 | |
| 179 | <a name="starta"></a> |
| 180 | <h3>2.1 Getting started</h3> |
| 181 | |
| 182 | First off, consider whether it might be beneficial to recompile your |
| 183 | application and supporting libraries with optimisation disabled and |
| 184 | debugging info enabled (the <code>-g</code> flag). You don't have to |
| 185 | do this, but doing so helps Valgrind produce more accurate and less |
| 186 | confusing error reports. Chances are you're set up like this already, |
| 187 | if you intended to debug your program with GNU gdb, or some other |
| 188 | debugger. |
| 189 | |
| 190 | <p>Then just run your application, but place the word |
| 191 | <code>valgrind</code> in front of your usual command-line invocation. |
| 192 | Note that you should run the real (machine-code) executable here. If |
| 193 | your application is started by, for example, a shell or perl script, |
| 194 | you'll need to modify it to invoke Valgrind on the real executables. |
| 195 | Running such scripts directly under Valgrind will result in you |
| 196 | getting error reports pertaining to <code>/bin/sh</code>, |
| 197 | <code>/usr/bin/perl</code>, or whatever interpreter you're using. |
| 198 | This almost certainly isn't what you want and can be hugely confusing. |
| 199 | |
| 200 | <a name="comment"></a> |
| 201 | <h3>2.2 The commentary</h3> |
| 202 | |
| 203 | Valgrind writes a commentary, detailing error reports and other |
| 204 | significant events. The commentary goes to standard error (file |
| 205 | descriptor 2) by default. This may interfere with your program, so you can |
| 206 | ask for it to be directed elsewhere with the <code>--logfile-fd</code> flag. |
| 207 | |
| 208 | <p>All lines in the commentary are of the following form:<br> |
| 209 | <pre> |
| 210 | ==12345== some-message-from-Valgrind |
| 211 | </pre> |
| 212 | <p>The <code>12345</code> is the process ID. This scheme makes it easy |
| 213 | to distinguish program output from Valgrind commentary, and also easy |
| 214 | to differentiate commentaries from different processes which have |
| 215 | become merged together, for whatever reason. |
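<p>The line format above can be sketched in C as follows. This is only an illustration of the <code>==pid==</code> prefix, not Valgrind's own code; the helper names <code>format_commentary</code> and <code>commentary_pid</code> are hypothetical. The second helper shows how a log filter might separate commentary lines from program output and recover the process ID.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "==<pid>== <message>" into buf.
   Returns the number of characters written, or -1 if buf is too small. */
int format_commentary(char *buf, size_t size, long pid, const char *message)
{
    int n = snprintf(buf, size, "==%ld== %s", pid, message);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: return 1 and store the process ID in *pid_out if
   line starts with "==<digits>== ", otherwise return 0. */
int commentary_pid(const char *line, long *pid_out)
{
    const char *p = line;
    long pid = 0;

    if (strncmp(p, "==", 2) != 0)
        return 0;
    p += 2;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p)) {
        pid = pid * 10 + (*p - '0');
        p++;
    }
    if (strncmp(p, "== ", 3) != 0)
        return 0;
    *pid_out = pid;
    return 1;
}
```

<p>Grouping lines by the recovered process ID is one way to untangle commentaries from several processes that ended up in the same log.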
| 216 | |
| 217 | <p>By default, Valgrind writes only essential messages to the commentary, |
| 218 | so as to avoid flooding you with information of secondary importance. |
| 219 | If you want more information about what is happening, re-run, passing |
| 220 | the <code>-v</code> flag to Valgrind. |
| 221 | |
| 222 | |
| 223 | <a name="report"></a> |
| 224 | <h3>2.3 Reporting of errors</h3> |
| 225 | |
| 226 | When Valgrind detects something bad happening in the program, an error |
| 227 | message is written to the commentary. For example:<br> |
| 228 | <pre> |
| 229 | ==25832== Invalid read of size 4 |
| 230 | ==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45) |
| 231 | ==25832== by 0x80487AF: main (bogon.cpp:66) |
| 232 | ==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129) |
| 233 | ==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon) |
| 234 | ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd |
| 235 | </pre> |
| 236 | |
| 237 | <p>This message says that the program did an illegal 4-byte read of |
| 238 | address 0xBFFFF74C, which, as far as it can tell, is not a valid stack |
| 239 | address, nor corresponds to any currently malloc'd or free'd blocks. |
| 240 | The read is happening at line 45 of <code>bogon.cpp</code>, called |
| 241 | from line 66 of the same file, etc. For errors associated with an |
| 242 | identified malloc'd/free'd block, for example reading free'd memory, |
| 243 | Valgrind reports not only the location where the error happened, but |
| 244 | also where the associated block was malloc'd/free'd. |
| 245 | |
| 246 | <p>Valgrind remembers all error reports. When an error is detected, |
| 247 | it is compared against old reports, to see if it is a duplicate. If |
| 248 | so, the error is noted, but no further commentary is emitted. This |
| 249 | avoids you being swamped with bazillions of duplicate error reports. |
| 250 | |
| 251 | <p>If you want to know how many times each error occurred, run with |
| 252 | the <code>-v</code> option. When execution finishes, all the reports |
| 253 | are printed out, along with, and sorted by, their occurrence counts. |
| 254 | This makes it easy to see which errors have occurred most frequently. |
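<p>The duplicate check and occurrence counts described above might be modelled like this. It is a sketch under stated assumptions, not Valgrind's implementation: the type and function names are hypothetical, the table size is arbitrary, and comparing the error kind as well as the locations is an assumption. Comparing only the top three locations follows the description of <code>--num-callers</code> in Section 2.5.

```c
#include <string.h>

#define MAX_REPORTS 64   /* arbitrary table size for this sketch */
#define KEY_FRAMES  3    /* top three function locations are compared */

typedef struct {
    const char   *kind;                 /* e.g. "Invalid read of size 4" */
    unsigned long frames[KEY_FRAMES];   /* code addresses, innermost first */
    unsigned      count;                /* how many times this error occurred */
} ErrorReport;

typedef struct {
    ErrorReport reports[MAX_REPORTS];
    int         n;
} ErrorLog;

/* Returns 1 if the error is new (commentary should be emitted), 0 if it
   duplicates an earlier report (only its count is bumped), and -1 if the
   table is full. */
int record_error(ErrorLog *log, const char *kind,
                 const unsigned long frames[KEY_FRAMES])
{
    for (int i = 0; i < log->n; i++) {
        ErrorReport *r = &log->reports[i];
        if (strcmp(r->kind, kind) == 0 &&
            memcmp(r->frames, frames, sizeof r->frames) == 0) {
            r->count++;
            return 0;
        }
    }
    if (log->n == MAX_REPORTS)
        return -1;
    ErrorReport *r = &log->reports[log->n++];
    r->kind = kind;
    memcpy(r->frames, frames, sizeof r->frames);
    r->count = 1;
    return 1;
}
```

<p>At exit, a <code>-v</code> style summary would then amount to sorting <code>reports</code> by <code>count</code> and printing each entry once.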
| 255 | |
| 256 | <p>Errors are reported before the associated operation actually |
| 257 | happens. For example, if your program decides to read from address |
| 258 | zero, Valgrind will emit a message to this effect, and the program |
| 259 | will then duly die with a segmentation fault. |
| 260 | |
| 261 | <p>In general, you should try to fix errors in the order that they |
| 262 | are reported. Not doing so can be confusing. For example, a program |
| 263 | which copies uninitialised values to several memory locations, and |
| 264 | later uses them, will generate several error messages. The first such |
| 265 | error message may well give the most direct clue to the root cause of |
| 266 | the problem. |
| 267 | |
| 268 | <a name="suppress"></a> |
| 269 | <h3>2.4 Suppressing errors</h3> |
| 270 | |
| 271 | Valgrind detects numerous problems in the base libraries, such as the |
| 272 | GNU C library, and the XFree86 client libraries, which come |
| 273 | pre-installed on your GNU/Linux system. You can't easily fix these, |
| 274 | but you don't want to see these errors (and yes, there are many!). So |
| 275 | Valgrind reads a list of errors to suppress from a file at startup. By default |
| 276 | this file is <code>redhat72.supp</code>, located in the Valgrind |
| 277 | installation directory. |
| 278 | |
| 279 | <p>You can modify and add to the suppressions file at your leisure, or |
| 280 | write your own. Multiple suppression files are allowed. This is |
| 281 | useful if part of your project contains errors you can't or don't want |
| 282 | to fix, yet you don't want to continuously be reminded of them. |
| 283 | |
| 284 | <p>Each error to be suppressed is described very specifically, to |
| 285 | minimise the possibility that a suppression-directive inadvertently |
| 286 | suppresses a bunch of similar errors which you did want to see. The |
| 287 | suppression mechanism is designed to allow precise yet flexible |
| 288 | specification of errors to suppress. |
| 289 | |
| 290 | <p>If you use the <code>-v</code> flag, at the end of execution, Valgrind |
| 291 | prints out one line for each used suppression, giving its name and the |
| 292 | number of times it got used. Here's the suppressions used by a run of |
| 293 | <code>ls -l</code>: |
| 294 | <pre> |
| 295 | --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r |
| 296 | --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r |
| 297 | --27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object |
| 298 | </pre> |
| 299 | |
| 300 | <a name="flags"></a> |
| 301 | <h3>2.5 Command-line flags</h3> |
| 302 | |
| 303 | You invoke Valgrind like this: |
| 304 | <pre> |
| 305 | valgrind [options-for-Valgrind] your-prog [options for your-prog] |
| 306 | </pre> |
| 307 | |
| 308 | <p>Valgrind's default settings succeed in giving reasonable behaviour |
| 309 | in most cases. Available options, in no particular order, are as |
| 310 | follows: |
| 311 | <ul> |
| 312 | <li><code>--help</code></li><br> |
| 313 | |
| 314 | <li><code>--version</code><br> |
| 315 | <p>The usual deal.</li><br><p> |
| 316 | |
| 317 | <li><code>-v --verbose</code><br> |
| 318 | <p>Be more verbose. Gives extra information on various aspects |
| 319 | of your program, such as: the shared objects loaded, the |
| 320 | suppressions used, the progress of the instrumentation engine, |
| 321 | and warnings about unusual behaviour. |
| 322 | </li><br><p> |
| 323 | |
| 324 | <li><code>-q --quiet</code><br> |
| 325 | <p>Run silently, and only print error messages. Useful if you |
| 326 | are running regression tests or have some other automated test |
| 327 | machinery. |
| 328 | </li><br><p> |
| 329 | |
| 330 | <li><code>--demangle=no</code><br> |
| 331 | <code>--demangle=yes</code> [the default] |
| 332 | <p>Disable/enable automatic demangling (decoding) of C++ names. |
| 333 | Enabled by default. When enabled, Valgrind will attempt to |
| 334 | translate encoded C++ procedure names back to something |
| 335 | approaching the original. The demangler handles symbols mangled |
| 336 | by g++ versions 2.X and 3.X. |
| 337 | |
| 338 | <p>An important fact about demangling is that function |
| 339 | names mentioned in suppressions files should be in their mangled |
| 340 | form. Valgrind does not demangle function names when searching |
| 341 | for applicable suppressions, because to do otherwise would make |
| 342 | suppressions file contents dependent on the state of Valgrind's |
| 343 | demangling machinery, and would also be slow and pointless. |
| 344 | </li><br><p> |
| 345 | |
| 346 | <li><code>--num-callers=<number></code> [default=4]<br> |
| 347 | <p>By default, Valgrind shows four levels of function call names |
| 348 | to help you identify program locations. You can change that |
| 349 | number with this option. This can help in determining the |
| 350 | program's location in deeply-nested call chains. Note that errors |
| 351 | are commoned up (merged as duplicates) using only the top three function locations (the |
| 352 | place in the current function, and that of its two immediate |
| 353 | callers). So this doesn't affect the total number of errors |
| 354 | reported. |
| 355 | <p> |
| 356 | The maximum value for this is 50. Note that higher settings |
| 357 | will make Valgrind run a bit more slowly and take a bit more |
| 358 | memory, but can be useful when working with programs with |
| 359 | deeply-nested call chains. |
| 360 | </li><br><p> |
| 361 | |
| 362 | <li><code>--gdb-attach=no</code> [the default]<br> |
| 363 | <code>--gdb-attach=yes</code> |
| 364 | <p>When enabled, Valgrind will pause after every error shown, |
| 365 | and print the line |
| 366 | <br> |
| 367 | <code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code> |
| 368 | <p> |
| 369 | Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code> |
| 370 | or <code>n</code> <code>Ret</code>, causes Valgrind not to |
| 371 | start GDB for this error. |
| 372 | <p> |
| 373 | <code>Y</code> <code>Ret</code> |
| 374 | or <code>y</code> <code>Ret</code> causes Valgrind to |
| 375 | start GDB, for the program at this point. When you have |
| 376 | finished with GDB, quit from it, and the program will continue. |
| 377 | Trying to continue from inside GDB doesn't work. |
| 378 | <p> |
| 379 | <code>C</code> <code>Ret</code> |
| 380 | or <code>c</code> <code>Ret</code> causes Valgrind not to |
| 381 | start GDB, and not to ask again. |
| 382 | <p> |
| 383 | <code>--gdb-attach=yes</code> conflicts with |
| 384 | <code>--trace-children=yes</code>. You can't use them |
| 385 | together. Valgrind refuses to start up in this situation. |
| 386 | </li><br><p> |
| 387 | |
| 388 | <li><code>--partial-loads-ok=yes</code> [the default]<br> |
| 389 | <code>--partial-loads-ok=no</code> |
| 390 | <p>Controls how Valgrind handles word (4-byte) loads from |
| 391 | addresses for which some bytes are addressable and others |
| 392 | are not. When <code>yes</code> (the default), such loads |
| 393 | do not elicit an address error. Instead, the loaded V bytes |
| 394 | corresponding to the illegal addresses indicate undefined, and |
| 395 | those corresponding to legal addresses are loaded from shadow |
| 396 | memory, as usual. |
| 397 | <p> |
| 398 | When <code>no</code>, loads from partially |
| 399 | invalid addresses are treated the same as loads from completely |
| 400 | invalid addresses: an illegal-address error is issued, |
| 401 | and the resulting V bytes indicate valid data. |
| 402 | </li><br><p> |
| 403 | |
| 404 | <li><code>--sloppy-malloc=no</code> [the default]<br> |
| 405 | <code>--sloppy-malloc=yes</code> |
| 406 | <p>When enabled, all requests for malloc/calloc are rounded up |
| 407 | to a whole number of machine words -- in other words, made |
| 408 | divisible by 4. For example, a request for 17 bytes of space |
| 409 | would result in a 20-byte area being made available. This works |
| 410 | around bugs in sloppy libraries which assume that they can |
| 411 | safely rely on malloc/calloc requests being rounded up in this |
| 412 | fashion. Without the workaround, these libraries tend to |
| 413 | generate large numbers of errors when they access the ends of |
| 414 | these areas. Valgrind snapshots dated 17 Feb 2002 and later are |
| 415 | cleverer about this problem, and you should no longer need to |
| 416 | use this flag. |
| 417 | </li><br><p> |
| 418 | |
| 419 | <li><code>--trace-children=no</code> [the default]<br> |
| 420 | <code>--trace-children=yes</code> |
| 421 | <p>When enabled, Valgrind will trace into child processes. This |
| 422 | is confusing and usually not what you want, so is disabled by |
| 423 | default.</li><br><p> |
| 424 | |
| 425 | <li><code>--freelist-vol=<number></code> [default: 1000000] |
| 426 | <p>When the client program releases memory using free (in C) or |
| 427 | delete (C++), that memory is not immediately made available for |
| 428 | re-allocation. Instead it is marked inaccessible and placed in |
| 429 | a queue of freed blocks. The purpose is to delay the point at |
| 430 | which freed-up memory comes back into circulation. This |
| 431 | increases the chance that Valgrind will be able to detect |
| 432 | invalid accesses to blocks for some significant period of time |
| 433 | after they have been freed. |
| 434 | <p> |
| 435 | This flag specifies the maximum total size, in bytes, of the |
| 436 | blocks in the queue. The default value is one million bytes. |
| 437 | Increasing this increases the total amount of memory used by |
| 438 | Valgrind but may detect invalid uses of freed blocks which would |
| 439 | otherwise go undetected.</li><br><p> |
| 440 | |
| 441 | <li><code>--logfile-fd=<number></code> [default: 2, stderr] |
| 442 | <p>Specifies the file descriptor on which Valgrind communicates |
| 443 | all of its messages. The default, 2, is the standard error |
| 444 | channel. This may interfere with the client's own use of |
| 445 | stderr. To dump Valgrind's commentary in a file without using |
| 446 | stderr, something like the following works well (sh/bash |
| 447 | syntax):<br> |
| 448 | <code> |
| 449 | valgrind --logfile-fd=9 my_prog 9> logfile</code><br> |
| 450 | That is: tell Valgrind to send all output to file descriptor 9, |
| 451 | and ask the shell to route file descriptor 9 to "logfile". |
| 452 | </li><br><p> |
| 453 | |
| 454 | <li><code>--suppressions=<filename></code> [default: |
| 455 | /installation/directory/redhat72.supp] <p>Specifies an extra |
| 456 | file from which to read descriptions of errors to suppress. You |
| 457 | may use as many extra suppressions files as you |
| 458 | like.</li><br><p> |
| 459 | |
| 460 | <li><code>--leak-check=no</code> [default]<br> |
| 461 | <code>--leak-check=yes</code> |
| 462 | <p>When enabled, search for memory leaks when the client program |
| 463 | finishes. A memory leak means a malloc'd block, which has not |
| 464 | yet been free'd, but to which no pointer can be found. Such a |
| 465 | block can never be free'd by the program, since no pointer to it |
| 466 | exists. Leak checking is disabled by default |
| 467 | because it tends to generate dozens of error messages. |
| 468 | </li><br><p> |
| 469 | |
| 470 | <li><code>--show-reachable=no</code> [default]<br> |
| 471 | <code>--show-reachable=yes</code> <p>When disabled, the memory |
| 472 | leak detector only shows blocks to which it cannot find a |
| 473 | pointer at all, or to which it can only find a pointer into the |
| 474 | middle. These blocks are prime candidates for memory leaks. When |
| 475 | enabled, the leak detector also reports on blocks which it could |
| 476 | find a pointer to. Your program could, at least in principle, |
| 477 | have freed such blocks before exit. Contrast this to blocks for |
| 478 | which no pointer, or only an interior pointer could be found: |
| 479 | they are more likely to indicate memory leaks, because |
| 480 | you do not actually have a pointer to the start of the block |
| 481 | which you can hand to free(), even if you wanted to. |
| 482 | </li><br><p> |
| 483 | |
| 484 | <li><code>--leak-resolution=low</code> [default]<br> |
| 485 | <code>--leak-resolution=med</code> <br> |
| 486 | <code>--leak-resolution=high</code> |
| 487 | <p>When doing leak checking, determines how willing Valgrind is |
| 488 | to consider different backtraces the same. When set to |
| 489 | <code>low</code>, the default, only the first two entries need |
| 490 | match. When <code>med</code>, four entries have to match. When |
| 491 | <code>high</code>, all entries need to match. |
| 492 | <p> |
| 493 | For hardcore leak debugging, you probably want to use |
| 494 | <code>--leak-resolution=high</code> together with |
| 495 | <code>--num-callers=40</code> or some such large number. Note |
| 496 | however that this can give an overwhelming amount of |
| 497 | information, which is why the defaults are 4 callers and |
| 498 | low-resolution matching. |
| 499 | <p> |
| 500 | Note that the <code>--leak-resolution=</code> setting does not |
| 501 | affect Valgrind's ability to find leaks. It only changes how |
| 502 | the results are presented to you. |
| 503 | </li><br><p> |
| 504 | |
| 505 | <li><code>--workaround-gcc296-bugs=no</code> [default]<br> |
| 506 | <code>--workaround-gcc296-bugs=yes</code> <p>When enabled, |
| 507 | assume that reads and writes some small distance below the stack |
| 508 | pointer <code>%esp</code> are due to bugs in gcc 2.96, and do |
| 509 | not report them. The "small distance" is 256 bytes by default. |
| 510 | Note that gcc 2.96 is the default compiler on some popular Linux |
| 511 | distributions (RedHat 7.X, Mandrake) and so you may well need to |
| 512 | use this flag. Do not use it if you do not have to, as it can |
| 513 | cause real errors to be overlooked. A better option is to use a |
| 514 | gcc/g++ which works properly; 2.95.3 seems to be a good choice. |
| 515 | <p> |
| 516 | Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly |
| 517 | buggy, so you may need to issue this flag if you use 3.0.4. |
| 518 | </li><br><p> |
| 519 | |
| 520 | <li><code>--cachesim=no</code> [default]<br> |
| 521 | <code>--cachesim=yes</code> |
| 522 | <p>When enabled, turns off memory checking, and turns on cache profiling. |
| 523 | Cache profiling is described in detail in <a href="#cache">Section 7</a>. |
| 524 | </li><p> |
| 525 | </ul> |
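<p>The rounding described under <code>--sloppy-malloc=yes</code> is simple arithmetic, sketched below. The function name <code>sloppy_round_up</code> is hypothetical and the code is an illustration only. It assumes a 4-byte machine word, as stated in the flag description, and ignores overflow for requests close to the maximum <code>size_t</code> value.

```c
#include <stddef.h>

#define WORD_SIZE 4u   /* x86 machine word size in bytes */

/* Hypothetical helper: round a malloc/calloc request up to a whole
   number of machine words, so 17 bytes becomes 20. */
size_t sloppy_round_up(size_t request)
{
    return (request + (WORD_SIZE - 1)) / WORD_SIZE * WORD_SIZE;
}
```

<p>With this rounding, a library that reads a whole word at offset 16 of a 17-byte request stays inside the 20-byte area, which is why the flag silences such reports.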
| 526 | |
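<p>The three <code>--leak-resolution</code> settings can be read as a rule for comparing two backtraces, sketched here. All names are hypothetical and this is not Valgrind's code. How traces shorter than the required depth are handled is an assumption: in this sketch they match only if both traces have the same length.

```c
#include <stddef.h>

typedef enum { RES_LOW, RES_MED, RES_HIGH } LeakResolution;

/* Hypothetical helper: return 1 if backtraces a (na entries) and b (nb
   entries) count as the same allocation point at the given resolution. */
int same_backtrace(const unsigned long *a, size_t na,
                   const unsigned long *b, size_t nb,
                   LeakResolution res)
{
    size_t need;

    switch (res) {
    case RES_LOW: need = 2; break;   /* first two entries must match */
    case RES_MED: need = 4; break;   /* first four entries must match */
    default:                         /* RES_HIGH: all entries must match */
        if (na != nb)
            return 0;
        need = na;
        break;
    }
    for (size_t i = 0; i < need; i++) {
        if (i >= na || i >= nb)
            return na == nb;         /* assumption: short traces need equal length */
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```

<p>Two leaks that share their innermost two callers but differ further up are therefore merged at <code>low</code> and reported separately at <code>med</code> or <code>high</code>. The same leaks are found in every case.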
| 527 | There are also some options for debugging Valgrind itself. You |
| 528 | shouldn't need to use them in the normal run of things. Nevertheless: |
| 529 | |
| 530 | <ul> |
| 531 | |
| 532 | <li><code>--single-step=no</code> [default]<br> |
| 533 | <code>--single-step=yes</code> |
| 534 | <p>When enabled, each x86 instruction is translated separately into |
| 535 | instrumented code. When disabled, translation is done on a |
| 536 | per-basic-block basis, giving much better translations.</li><br> |
| 537 | <p> |
| 538 | |
| 539 | <li><code>--optimise=no</code><br> |
| 540 | <code>--optimise=yes</code> [default] |
| 541 | <p>When enabled, various improvements are applied to the |
| 542 | intermediate code, mainly aimed at allowing the simulated CPU's |
| 543 | registers to be cached in the real CPU's registers over several |
| 544 | simulated instructions.</li><br> |
| 545 | <p> |
| 546 | |
| 547 | <li><code>--instrument=no</code><br> |
| 548 | <code>--instrument=yes</code> [default] |
| 549 | <p>When disabled, the translations don't actually contain any |
| 550 | instrumentation.</li><br> |
| 551 | <p> |
| 552 | |
| 553 | <li><code>--cleanup=no</code><br> |
| 554 | <code>--cleanup=yes</code> [default] |
| 555 | <p>When enabled, various improvements are applied to the |
| 556 | post-instrumented intermediate code, aimed at removing redundant |
| 557 | value checks.</li><br> |
| 558 | <p> |
| 559 | |
| 560 | <li><code>--trace-syscalls=no</code> [default]<br> |
| 561 | <code>--trace-syscalls=yes</code> |
| 562 | <p>Enable/disable tracing of system call intercepts.</li><br> |
| 563 | <p> |
| 564 | |
| 565 | <li><code>--trace-signals=no</code> [default]<br> |
| 566 | <code>--trace-signals=yes</code> |
| 567 | <p>Enable/disable tracing of signal handling.</li><br> |
| 568 | <p> |
| 569 | |
| 570 | <li><code>--trace-sched=no</code> [default]<br> |
| 571 | <code>--trace-sched=yes</code> |
| 572 | <p>Enable/disable tracing of thread scheduling events.</li><br> |
| 573 | <p> |
| 574 | |
| 575 | <li><code>--trace-pthread=none</code> [default]<br> |
| 576 | <code>--trace-pthread=some</code> <br> |
| 577 | <code>--trace-pthread=all</code> |
| 578 | <p>Specifies amount of trace detail for pthread-related events.</li><br> |
| 579 | <p> |
| 580 | |
| 581 | <li><code>--trace-symtab=no</code> [default]<br> |
| 582 | <code>--trace-symtab=yes</code> |
| 583 | <p>Enable/disable tracing of symbol table reading.</li><br> |
| 584 | <p> |
| 585 | |
| 586 | <li><code>--trace-malloc=no</code> [default]<br> |
| 587 | <code>--trace-malloc=yes</code> |
| 588 | <p>Enable/disable tracing of malloc/free (et al) intercepts. |
| 589 | </li><br> |
| 590 | <p> |
| 591 | |
| 592 | <li><code>--stop-after=<number></code> |
| 593 | [default: infinity, more or less] |
| 594 | <p>After <number> basic blocks have been executed, shut down |
| 595 | Valgrind and switch back to running the client on the real CPU. |
| 596 | </li><br> |
| 597 | <p> |
| 598 | |
| 599 | <li><code>--dump-error=<number></code> |
| 600 | [default: inactive] |
| 601 | <p>After the program has exited, show gory details of the |
| 602 | translation of the basic block containing the <number>'th |
| 603 | error context. When used with <code>--single-step=yes</code>, |
| 604 | can show the |
| 605 | exact x86 instruction causing an error.</li><br> |
| 606 | <p> |
| 607 | |
| 608 | <li><code>--smc-check=none</code><br> |
| 609 | <code>--smc-check=some</code> [default]<br> |
| 610 | <code>--smc-check=all</code> |
| 611 | <p>How carefully should Valgrind check for self-modifying code |
| 612 | writes, so that translations can be discarded? When |
| 613 | "none", no writes are checked. When "some", only writes |
| 614 | resulting from moves from integer registers to memory are |
| 615 | checked. When "all", all memory writes are checked, even those |
| 616 | with which no sane program would generate code -- for |
| 617 | example, floating-point writes.</li> |
| 618 | </ul> |
| 619 | |
| 620 | |
| 621 | <a name="errormsgs"></a> |
| 622 | <h3>2.6 Explanation of error messages</h3> |
| 623 | |
| 624 | Despite considerable sophistication under the hood, Valgrind can only |
| 625 | really detect two kinds of errors, use of illegal addresses, and use |
| 626 | of undefined values. Nevertheless, this is enough to help you |
| 627 | discover all sorts of memory-management nasties in your code. This |
| 628 | section presents a quick summary of what error messages mean. The |
| 629 | precise behaviour of the error-checking machinery is described in |
| 630 | <a href="#machine">Section 4</a>. |
| 631 | |
| 632 | |
| 633 | <h4>2.6.1 Illegal read / Illegal write errors</h4> |
| 634 | For example: |
| 635 | <pre> |
| 636 | ==30975== Invalid read of size 4 |
| 637 | ==30975== at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9) |
| 638 | ==30975== by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9) |
| 639 | ==30975== by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326) |
| 640 | ==30975== by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621) |
| 641 | ==30975== Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd |
| 642 | </pre> |
| 643 | |
| 644 | <p>This happens when your program reads or writes memory at a place |
| 645 | which Valgrind reckons it shouldn't. In this example, the program did |
| 646 | a 4-byte read at address 0xBFFFF0E0, somewhere within the |
| 647 | system-supplied library libpng.so.2.1.0.9, which was called from |
| 648 | somewhere else in the same library, called from line 326 of |
| 649 | qpngio.cpp, and so on. |
| 650 | |
<p>Valgrind tries to establish what the illegal address might relate
to, since that's often useful. So, if it points into a block of
memory which has already been freed, you'll be informed of this, and
also where the block was freed. Likewise, if it should turn out
to be just off the end of a malloc'd block, a common result of
off-by-one errors in array subscripting, you'll be informed of this
fact, and also where the block was malloc'd.
| 658 | |
| 659 | <p>In this example, Valgrind can't identify the address. Actually the |
| 660 | address is on the stack, but, for some reason, this is not a valid |
| 661 | stack address -- it is below the stack pointer, %esp, and that isn't |
| 662 | allowed. |
| 663 | |
| 664 | <p>Note that Valgrind only tells you that your program is about to |
| 665 | access memory at an illegal address. It can't stop the access from |
| 666 | happening. So, if your program makes an access which normally would |
result in a segmentation fault, your program will still suffer the same
| 668 | fate -- but you will get a message from Valgrind immediately prior to |
| 669 | this. In this particular example, reading junk on the stack is |
| 670 | non-fatal, and the program stays alive. |
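<p>The off-by-one case mentioned above can be sketched as follows. This
is an illustration only: the function names are hypothetical and are
not part of Valgrind or its test suite. The buggy variant reads one
element past the end of a malloc'd array, which is the kind of access
one would expect to be reported as an invalid read located just off
the end of the block.

```c
#include <stdlib.h>

/* Buggy on purpose: the second loop runs up to i == n, so its last
   iteration reads a[n], one element past the end of the malloc'd
   block.  n is assumed to be positive. */
int sum_buggy(int n)
{
   int *a = malloc((size_t) n * sizeof(int));
   int i, sum = 0;
   if (a == NULL) return -1;
   for (i = 0; i < n; i++) a[i] = i;
   for (i = 0; i <= n; i++) sum += a[i];   /* off by one */
   free(a);
   return sum;
}

/* Fixed: the loop stops at the last valid element, a[n-1]. */
int sum_fixed(int n)
{
   int *a = malloc((size_t) n * sizeof(int));
   int i, sum = 0;
   if (a == NULL) return -1;
   for (i = 0; i < n; i++) a[i] = i;
   for (i = 0; i < n; i++) sum += a[i];
   free(a);
   return sum;
}
```

<p>Run natively, <code>sum_buggy</code> may well appear to work, which
is what makes this class of error hard to find without a checker.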
| 671 | |
| 672 | |
| 673 | <h4>2.6.2 Use of uninitialised values</h4> |
| 674 | For example: |
| 675 | <pre> |
==19146== Conditional jump or move depends on uninitialised value(s)
==19146== at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
| 678 | ==19146== by 0x402E8476: _IO_printf (printf.c:36) |
| 679 | ==19146== by 0x8048472: main (tests/manuel1.c:8) |
| 680 | ==19146== by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 681 | </pre> |
| 682 | |
| 683 | <p>An uninitialised-value use error is reported when your program uses |
| 684 | a value which hasn't been initialised -- in other words, is undefined. |
| 685 | Here, the undefined value is used somewhere inside the printf() |
| 686 | machinery of the C library. This error was reported when running the |
| 687 | following small program: |
| 688 | <pre> |
#include <stdio.h>

int main(void)
{
   int x;                     /* never initialised */
   printf ("x = %d\n", x);    /* x is examined inside the printf machinery */
   return 0;
}
| 694 | </pre> |
| 695 | |
| 696 | <p>It is important to understand that your program can copy around |
| 697 | junk (uninitialised) data to its heart's content. Valgrind observes |
| 698 | this and keeps track of the data, but does not complain. A complaint |
| 699 | is issued only when your program attempts to make use of uninitialised |
| 700 | data. In this example, x is uninitialised. Valgrind observes the |
| 701 | value being passed to _IO_printf and thence to |
| 702 | _IO_vfprintf, but makes no comment. However, |
| 703 | _IO_vfprintf has to examine the value of x |
| 704 | so it can turn it into the corresponding ASCII string, and it is at |
| 705 | this point that Valgrind complains. |
| 706 | |
| 707 | <p>Sources of uninitialised data tend to be: |
| 708 | <ul> |
| 709 | <li>Local variables in procedures which have not been initialised, |
| 710 | as in the example above.</li><br><p> |
| 711 | |
| 712 | <li>The contents of malloc'd blocks, before you write something |
| 713 | there. In C++, the new operator is a wrapper round malloc, so |
| 714 | if you create an object with new, its fields will be |
| 715 | uninitialised until you fill them in, which is only Right and |
| 716 | Proper.</li> |
| 717 | </ul> |
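<p>The second source, malloc'd blocks, can be sketched like this. The
type and function names are hypothetical, chosen for illustration.
The first allocator hands back a block with undefined fields; the
second uses <code>calloc</code>, which zero-fills the block, so the
fields are defined from the start.

```c
#include <stdlib.h>

struct counter { int hits; int misses; };

/* The fields of the returned block are undefined until the caller
   writes to them; using them before that is an uninitialised-value
   error waiting to happen. */
struct counter *counter_new_uninit(void)
{
   return malloc(sizeof(struct counter));
}

/* calloc zero-fills the block, so both fields start out defined. */
struct counter *counter_new(void)
{
   return calloc(1, sizeof(struct counter));
}
```

<p>Merely holding or copying the result of
<code>counter_new_uninit</code> is harmless; a complaint would only be
expected once a field is actually used, for instance in a comparison.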
| 718 | |
| 719 | |
| 720 | |
| 721 | <h4>2.6.3 Illegal frees</h4> |
| 722 | For example: |
| 723 | <pre> |
| 724 | ==7593== Invalid free() |
| 725 | ==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577) |
| 726 | ==7593== by 0x80484C7: main (tests/doublefree.c:10) |
| 727 | ==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 728 | ==7593== by 0x80483B1: (within tests/doublefree) |
| 729 | ==7593== Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd |
| 730 | ==7593== at 0x4004FFDF: free (ut_clientmalloc.c:577) |
| 731 | ==7593== by 0x80484C7: main (tests/doublefree.c:10) |
| 732 | ==7593== by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 733 | ==7593== by 0x80483B1: (within tests/doublefree) |
| 734 | </pre> |
<p>Valgrind keeps track of the blocks allocated by your program with
malloc/new, so it knows exactly whether the argument to
free/delete is legitimate. Here, the test program has
freed the same block twice. As with the illegal read/write errors,
Valgrind attempts to make sense of the address free'd. If, as
here, the address is one which has previously been freed, you will
be told that -- making duplicate frees of the same block easy to spot.
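<p>One common defensive idiom against duplicate frees is to clear the
pointer as soon as the block is released. The helper below is a
hypothetical sketch, not something Valgrind provides. It relies on the
C standard's guarantee that <code>free(NULL)</code> does nothing.

```c
#include <stdlib.h>

/* Frees the block *pp points to and clears the caller's pointer.
   A second call then passes NULL to free(), which is a no-op. */
void release(char **pp)
{
   free(*pp);
   *pp = NULL;
}
```

<p>This only protects the one pointer variable that is cleared. If
other copies of the address exist elsewhere in the program, a double
free through one of them is still possible, and that is where the
report above earns its keep.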
| 742 | |
| 743 | |
| 744 | <h4>2.6.4 Passing system call parameters with inadequate |
| 745 | read/write permissions</h4> |
| 746 | |
| 747 | Valgrind checks all parameters to system calls. If a system call |
| 748 | needs to read from a buffer provided by your program, Valgrind checks |
that the entire buffer is addressable and has valid data, i.e., it is
readable. And if the system call needs to write to a user-supplied
buffer, Valgrind checks that the buffer is addressable. After the
| 752 | system call, Valgrind updates its administrative information to |
| 753 | precisely reflect any changes in memory permissions caused by the |
| 754 | system call. |
| 755 | |
| 756 | <p>Here's an example of a system call with an invalid parameter: |
| 757 | <pre> |
#include <stdlib.h>
#include <unistd.h>
int main( void )
{
   char* arr = malloc(10);
   (void) write( 1 /* stdout */, arr, 10 );
   return 0;
}
| 766 | </pre> |
| 767 | |
| 768 | <p>You get this complaint ... |
| 769 | <pre> |
| 770 | ==8230== Syscall param write(buf) lacks read permissions |
| 771 | ==8230== at 0x4035E072: __libc_write |
| 772 | ==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 773 | ==8230== by 0x80483B1: (within tests/badwrite) |
| 774 | ==8230== by <bogus frame pointer> ??? |
| 775 | ==8230== Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd |
| 776 | ==8230== at 0x4004FEE6: malloc (ut_clientmalloc.c:539) |
| 777 | ==8230== by 0x80484A0: main (tests/badwrite.c:6) |
| 778 | ==8230== by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 779 | ==8230== by 0x80483B1: (within tests/badwrite) |
| 780 | </pre> |
| 781 | |
| 782 | <p>... because the program has tried to write uninitialised junk from |
| 783 | the malloc'd block to the standard output. |
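<p>For comparison, here is a sketch of a variant that should not draw
this complaint, because every byte of the buffer is given a defined
value before it is handed to <code>write</code>. The function name and
the choice of fill character are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes 10 defined bytes to fd.  Returns the result of write(),
   or -1 if the allocation fails. */
long write_defined(int fd)
{
   char *arr = malloc(10);
   ssize_t n;
   if (arr == NULL) return -1;
   memset(arr, '.', 10);     /* every byte now holds a defined value */
   n = write(fd, arr, 10);
   free(arr);
   return (long) n;
}
```

<p>Calling <code>write_defined(1)</code> sends ten dots to the standard
output. The buffer is both addressable and defined at the time of the
system call, which is the "readable" condition described above.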
| 784 | |
| 785 | |
| 786 | <h4>2.6.5 Warning messages you might see</h4> |
| 787 | |
| 788 | Most of these only appear if you run in verbose mode (enabled by |
| 789 | <code>-v</code>): |
| 790 | <ul> |
| 791 | <li> <code>More than 50 errors detected. Subsequent errors |
| 792 | will still be recorded, but in less detail than before.</code> |
| 793 | <br> |
| 794 | After 50 different errors have been shown, Valgrind becomes |
| 795 | more conservative about collecting them. It then requires only |
| 796 | the program counters in the top two stack frames to match when |
| 797 | deciding whether or not two errors are really the same one. |
| 798 | Prior to this point, the PCs in the top four frames are required |
| 799 | to match. This hack has the effect of slowing down the |
| 800 | appearance of new errors after the first 50. The 50 constant can |
| 801 | be changed by recompiling Valgrind. |
| 802 | <p> |
| 803 | <li> <code>More than 500 errors detected. I'm not reporting any more. |
| 804 | Final error counts may be inaccurate. Go fix your |
| 805 | program!</code> |
| 806 | <br> |
| 807 | After 500 different errors have been detected, Valgrind ignores |
| 808 | any more. It seems unlikely that collecting even more different |
| 809 | ones would be of practical help to anybody, and it avoids the |
| 810 | danger that Valgrind spends more and more of its time comparing |
| 811 | new errors against an ever-growing collection. As above, the 500 |
| 812 | number is a compile-time constant. |
| 813 | <p> |
| 814 | <li> <code>Warning: client exiting by calling exit(<number>). |
| 815 | Bye!</code> |
| 816 | <br> |
| 817 | Your program has called the <code>exit</code> system call, which |
| 818 | will immediately terminate the process. You'll get no exit-time |
| 819 | error summaries or leak checks. Note that this is not the same |
| 820 | as your program calling the ANSI C function <code>exit()</code> |
| 821 | -- that causes a normal, controlled shutdown of Valgrind. |
| 822 | <p> |
| 823 | <li> <code>Warning: client switching stacks?</code> |
| 824 | <br> |
| 825 | Valgrind spotted such a large change in the stack pointer, %esp, |
| 826 | that it guesses the client is switching to a different stack. |
| 827 | At this point it makes a kludgey guess where the base of the new |
| 828 | stack is, and sets memory permissions accordingly. You may get |
| 829 | many bogus error messages following this, if Valgrind guesses |
| 830 | wrong. At the moment "large change" is defined as a change of |
more than 2000000 in the value of the %esp (stack pointer)
| 832 | register. |
| 833 | <p> |
| 834 | <li> <code>Warning: client attempted to close Valgrind's logfile fd <number> |
| 835 | </code> |
| 836 | <br> |
| 837 | Valgrind doesn't allow the client |
| 838 | to close the logfile, because you'd never see any diagnostic |
| 839 | information after that point. If you see this message, |
| 840 | you may want to use the <code>--logfile-fd=<number></code> |
| 841 | option to specify a different logfile file-descriptor number. |
| 842 | <p> |
| 843 | <li> <code>Warning: noted but unhandled ioctl <number></code> |
| 844 | <br> |
| 845 | Valgrind observed a call to one of the vast family of |
| 846 | <code>ioctl</code> system calls, but did not modify its |
| 847 | memory status info (because I have not yet got round to it). |
| 848 | The call will still have gone through, but you may get spurious |
| 849 | errors after this as a result of the non-update of the memory info. |
| 850 | <p> |
| 851 | <li> <code>Warning: unblocking signal <number> due to |
| 852 | sigprocmask</code> |
| 853 | <br> |
| 854 | Really just a diagnostic from the signal simulation machinery. |
| 855 | This message will appear if your program handles a signal by |
| 856 | first <code>longjmp</code>ing out of the signal handler, |
| 857 | and then unblocking the signal with <code>sigprocmask</code> |
| 858 | -- a standard signal-handling idiom. |
| 859 | <p> |
| 860 | <li> <code>Warning: bad signal number <number> in __NR_sigaction.</code> |
| 861 | <br> |
| 862 | Probably indicates a bug in the signal simulation machinery. |
| 863 | <p> |
| 864 | <li> <code>Warning: set address range perms: large range <number></code> |
| 865 | <br> |
| 866 | Diagnostic message, mostly for my benefit, to do with memory |
| 867 | permissions. |
| 868 | </ul> |
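<p>The error-matching rule behind the first warning (top four program
counters before the threshold, top two afterwards) can be sketched
roughly as below. This is a toy model written for this manual, not
Valgrind's actual code; the structure, the function and the constant
names are all hypothetical.

```c
#define MAX_FRAMES     4     /* frames recorded per error context */
#define LOW_RES_AFTER 50     /* errors shown before matching is relaxed */

struct err_ctx {
   unsigned long pc[MAX_FRAMES];   /* pc[0] is the innermost frame */
};

/* Returns 1 if a and b count as the same error.  Before LOW_RES_AFTER
   errors have been shown, the PCs in the top four frames must match;
   afterwards only the top two are compared. */
int same_error(const struct err_ctx *a, const struct err_ctx *b,
               int n_shown)
{
   int depth = (n_shown >= LOW_RES_AFTER) ? 2 : 4;
   int i;
   for (i = 0; i < depth; i++)
      if (a->pc[i] != b->pc[i]) return 0;
   return 1;
}
```

<p>Comparing fewer frames makes more errors look alike, so fewer of
them are recorded as new, which is the slowing-down effect described
above.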
| 869 | |
| 870 | |
| 871 | <a name="suppfiles"></a> |
| 872 | <h3>2.7 Writing suppressions files</h3> |
| 873 | |
| 874 | A suppression file describes a bunch of errors which, for one reason |
| 875 | or another, you don't want Valgrind to tell you about. Usually the |
| 876 | reason is that the system libraries are buggy but unfixable, at least |
| 877 | within the scope of the current debugging session. Multiple |
suppressions files are allowed. By default, Valgrind uses
| 879 | <code>linux24.supp</code> in the directory where it is installed. |
| 880 | |
| 881 | <p> |
| 882 | You can ask to add suppressions from another file, by specifying |
| 883 | <code>--suppressions=/path/to/file.supp</code>. |
| 884 | |
| 885 | <p>Each suppression has the following components:<br> |
| 886 | <ul> |
| 887 | |
| 888 | <li>Its name. This merely gives a handy name to the suppression, by |
| 889 | which it is referred to in the summary of used suppressions |
| 890 | printed out when a program finishes. It's not important what |
| 891 | the name is; any identifying string will do. |
| 892 | <p> |
| 893 | |
<li>The nature of the error to suppress. Either:
<code>Value1</code>,
<code>Value2</code>,
<code>Value4</code> or
<code>Value8</code>,
meaning an uninitialised-value error when
using a value of 1, 2, 4 or 8 bytes.
Or
<code>Cond</code> (or its old name, <code>Value0</code>),
meaning use of an uninitialised CPU condition code. Or
<code>Addr1</code>,
<code>Addr2</code>,
<code>Addr4</code> or
<code>Addr8</code>, meaning an invalid address during a
memory access of 1, 2, 4 or 8 bytes respectively. Or
<code>Param</code>,
meaning an invalid system call parameter error. Or
<code>Free</code>, meaning an invalid or mismatching free.</li><br>
| 912 | <p> |
| 913 | |
<li>The "immediate location" specification. For Value and Addr
errors, this is either the name of the function in which the error
occurred, or, failing that, the full path to the .so file
containing the error location. For Param errors, it is the name of
the offending system call parameter. For Free errors, it is the
name of the function doing the freeing (e.g., <code>free</code>,
<code>__builtin_vec_delete</code>, etc.)</li><br>
| 921 | <p> |
| 922 | |
| 923 | <li>The caller of the above "immediate location". Again, either a |
| 924 | function or shared-object name.</li><br> |
| 925 | <p> |
| 926 | |
| 927 | <li>Optionally, one or two extra calling-function or object names, |
| 928 | for greater precision.</li> |
| 929 | </ul> |
| 930 | |
| 931 | <p> |
| 932 | Locations may be either names of shared objects or wildcards matching |
| 933 | function names. They begin <code>obj:</code> and <code>fun:</code> |
| 934 | respectively. Function and object names to match against may use the |
| 935 | wildcard characters <code>*</code> and <code>?</code>. |
| 936 | |
<p>A suppression only suppresses an error when the error matches all the
| 938 | details in the suppression. Here's an example: |
| 939 | <pre> |
| 940 | { |
| 941 | __gconv_transform_ascii_internal/__mbrtowc/mbtowc |
| 942 | Value4 |
| 943 | fun:__gconv_transform_ascii_internal |
fun:__mbr*towc
fun:mbtowc
}
</pre>

<p>What this means is: suppress a use-of-uninitialised-value error, when
the data size is 4, when it occurs in the function
<code>__gconv_transform_ascii_internal</code>, when that is called
from any function of name matching <code>__mbr*towc</code>
(for example, <code>__mbrtowc</code>),
when that is called from
<code>mbtowc</code>. It doesn't apply under any other circumstances.
The string by which this suppression is identified to the user is
<code>__gconv_transform_ascii_internal/__mbrtowc/mbtowc</code>.
| 957 | |
| 958 | <p>Another example: |
| 959 | <pre> |
| 960 | { |
| 961 | libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0 |
| 962 | Value4 |
| 963 | obj:/usr/X11R6/lib/libX11.so.6.2 |
| 964 | obj:/usr/X11R6/lib/libX11.so.6.2 |
| 965 | obj:/usr/X11R6/lib/libXaw.so.7.0 |
| 966 | } |
| 967 | </pre> |
| 968 | |
| 969 | <p>Suppress any size 4 uninitialised-value error which occurs anywhere |
| 970 | in <code>libX11.so.6.2</code>, when called from anywhere in the same |
| 971 | library, when called from anywhere in <code>libXaw.so.7.0</code>. The |
| 972 | inexact specification of locations is regrettable, but is about all |
| 973 | you can hope for, given that the X11 libraries shipped with Red Hat |
| 974 | 7.2 have had their symbol tables removed. |
| 975 | |
| 976 | <p>Note -- since the above two examples did not make it clear -- that |
| 977 | you can freely mix the <code>obj:</code> and <code>fun:</code> |
| 978 | styles of description within a single suppression record. |
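<p>To make the wildcard rules concrete, here is a small matcher using
the conventional meanings of the two characters: <code>*</code>
matches any run of characters, including an empty one, and
<code>?</code> matches exactly one character. It is a sketch under
that assumption. Valgrind's own matcher may differ in detail, and the
function name is hypothetical.

```c
/* Returns 1 if name matches pat, 0 otherwise.
   '*' matches any run of characters (possibly empty);
   '?' matches exactly one character. */
int wild_match(const char *pat, const char *name)
{
   if (*pat == '\0') return *name == '\0';
   if (*pat == '*') {
      /* Let '*' absorb 0, 1, 2, ... characters of name in turn. */
      for (;;) {
         if (wild_match(pat + 1, name)) return 1;
         if (*name == '\0') return 0;
         name++;
      }
   }
   if (*name == '\0') return 0;
   if (*pat == '?' || *pat == *name)
      return wild_match(pat + 1, name + 1);
   return 0;
}
```

<p>Under these rules a pattern must account for the whole name, so it
is worth checking a new pattern against the exact function or object
name printed in the error message before relying on it.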
| 979 | |
| 980 | |
| 981 | <a name="install"></a> |
| 982 | <h3>2.8 Building and installing</h3> |
| 983 | At the moment, very rudimentary. |
| 984 | |
| 985 | <p>The tarball is set up for a standard Red Hat 7.1 (6.2) machine. To |
| 986 | build, just do "make". No configure script, no autoconf, no nothing. |
| 987 | |
<p>The files needed for installation are: valgrind.so, valgrinq.so,
| 989 | valgrind, VERSION, redhat72.supp (or redhat62.supp). You can copy |
| 990 | these to any directory you like. However, you then need to edit the |
| 991 | shell script "valgrind". On line 4, set the environment variable |
| 992 | <code>VALGRIND</code> to point to the directory you have copied the |
| 993 | installation into. |
| 994 | |
| 995 | |
<a name="install"></a>
| 997 | <h3>2.9 The Client Request mechanism</h3> |
| 998 | |
| 999 | Valgrind has a trapdoor mechanism via which the client program can |
| 1000 | pass all manner of requests and queries to Valgrind. Internally, this |
| 1001 | is used extensively to make malloc, free, signals, etc, work, although |
| 1002 | you don't see that. |
| 1003 | <p> |
| 1004 | For your convenience, a subset of these so-called client requests is |
| 1005 | provided to allow you to tell Valgrind facts about the behaviour of |
| 1006 | your program, and conversely to make queries. In particular, your |
| 1007 | program can tell Valgrind about changes in memory range permissions |
| 1008 | that Valgrind would not otherwise know about, and so allows clients to |
| 1009 | get Valgrind to do arbitrary custom checks. |
| 1010 | <p> |
| 1011 | Clients need to include the header file <code>valgrind.h</code> to |
| 1012 | make this work. The macros therein have the magical property that |
| 1013 | they generate code in-line which Valgrind can spot. However, the code |
| 1014 | does nothing when not run on Valgrind, so you are not forced to run |
| 1015 | your program on Valgrind just because you use the macros in this file. |
| 1016 | <p> |
| 1017 | A brief description of the available macros: |
| 1018 | <ul> |
| 1019 | <li><code>VALGRIND_MAKE_NOACCESS</code>, |
| 1020 | <code>VALGRIND_MAKE_WRITABLE</code> and |
| 1021 | <code>VALGRIND_MAKE_READABLE</code>. These mark address |
| 1022 | ranges as completely inaccessible, accessible but containing |
| 1023 | undefined data, and accessible and containing defined data, |
| 1024 | respectively. Subsequent errors may have their faulting |
| 1025 | addresses described in terms of these blocks. Returns a |
| 1026 | "block handle". Returns zero when not run on Valgrind. |
| 1027 | <p> |
| 1028 | <li><code>VALGRIND_DISCARD</code>: At some point you may want |
| 1029 | Valgrind to stop reporting errors in terms of the blocks |
| 1030 | defined by the previous three macros. To do this, the above |
| 1031 | macros return a small-integer "block handle". You can pass |
| 1032 | this block handle to <code>VALGRIND_DISCARD</code>. After |
| 1033 | doing so, Valgrind will no longer be able to relate |
| 1034 | addressing errors to the user-defined block associated with |
| 1035 | the handle. The permissions settings associated with the |
| 1036 | handle remain in place; this just affects how errors are |
| 1037 | reported, not whether they are reported. Returns 1 for an |
| 1038 | invalid handle and 0 for a valid handle (although passing |
| 1039 | invalid handles is harmless). Always returns 0 when not run |
| 1040 | on Valgrind. |
| 1041 | <p> |
| 1042 | <li><code>VALGRIND_CHECK_NOACCESS</code>, |
| 1043 | <code>VALGRIND_CHECK_WRITABLE</code> and |
| 1044 | <code>VALGRIND_CHECK_READABLE</code>: check immediately |
| 1045 | whether or not the given address range has the relevant |
| 1046 | property, and if not, print an error message. Also, for the |
| 1047 | convenience of the client, returns zero if the relevant |
| 1048 | property holds; otherwise, the returned value is the address |
| 1049 | of the first byte for which the property is not true. |
| 1050 | Always returns 0 when not run on Valgrind. |
| 1051 | <p> |
<li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way
to find out whether Valgrind thinks a particular variable
(lvalue, to be precise) is addressable and defined. Prints
an error message if not. Returns no value.
| 1056 | <p> |
| 1057 | <li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly |
| 1058 | experimental feature. Similarly to |
| 1059 | <code>VALGRIND_MAKE_NOACCESS</code>, this marks an address |
range as inaccessible, so that subsequent accesses to an
address in the range give an error. However, this macro
does not return a block handle. Instead, all annotations
created like this are reviewed at each client
<code>ret</code> (subroutine return) instruction, and those
which now define an address range below the client's stack
pointer register (<code>%esp</code>) are automatically
| 1067 | deleted. |
| 1068 | <p> |
| 1069 | In other words, this macro allows the client to tell |
| 1070 | Valgrind about red-zones on its own stack. Valgrind |
| 1071 | automatically discards this information when the stack |
| 1072 | retreats past such blocks. Beware: hacky and flaky, and |
| 1073 | probably interacts badly with the new pthread support. |
</li>
</ul>
| 1076 | <p> |
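<p>As an illustration of how the permission-setting macros might be
used, here is a sketch of a trivial pool allocator that tells Valgrind
which parts of its pool are currently handed out. Several things here
are assumptions rather than facts from this manual: the
<code>(addr, len)</code> argument shape of the macros, and all the
<code>pool_</code> names. So that the sketch compiles without
<code>valgrind.h</code>, it defines stand-ins which always return 0,
mimicking the documented behaviour when not running on Valgrind.

```c
#include <stddef.h>

/* Stand-ins, used only if the real macros are not already defined.
   With the real header, include valgrind.h before this point.
   ASSUMPTION: the macros take (address, length). */
#ifndef VALGRIND_MAKE_NOACCESS
#define VALGRIND_MAKE_NOACCESS(addr, len)  ((void)(addr), (void)(len), 0)
#endif
#ifndef VALGRIND_MAKE_WRITABLE
#define VALGRIND_MAKE_WRITABLE(addr, len)  ((void)(addr), (void)(len), 0)
#endif

#define POOL_SIZE 1024

static char   pool[POOL_SIZE];
static size_t pool_used = 0;

/* Hands out len bytes from the pool, or NULL if it is full.  The
   bytes are marked accessible but undefined, like a fresh malloc.
   (Alignment is ignored to keep the sketch short.) */
void *pool_alloc(size_t len)
{
   void *p;
   if (len > POOL_SIZE - pool_used) return NULL;
   p = pool + pool_used;
   pool_used += len;
   (void) VALGRIND_MAKE_WRITABLE(p, len);
   return p;
}

/* Empties the pool and marks all of it inaccessible, so that any
   use of a stale pointer into it can be reported. */
void pool_reset(void)
{
   pool_used = 0;
   (void) VALGRIND_MAKE_NOACCESS(pool, POOL_SIZE);
}
```

<p>The sketch throws away the block handles the macros return. A
fuller version would keep them and pass them to
<code>VALGRIND_DISCARD</code> once the corresponding descriptions are
no longer wanted.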
| 1077 | |
| 1078 | |
| 1079 | |
<a name="problems"></a>
<h3>2.10 If you have problems</h3>
Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>).
| 1083 | |
| 1084 | <p>See <a href="#limits">Section 4</a> for the known limitations of |
| 1085 | Valgrind, and for a list of programs which are known not to work on |
| 1086 | it. |
| 1087 | |
| 1088 | <p>The translator/instrumentor has a lot of assertions in it. They |
| 1089 | are permanently enabled, and I have no plans to disable them. If one |
| 1090 | of these breaks, please mail me! |
| 1091 | |
| 1092 | <p>If you get an assertion failure on the expression |
| 1093 | <code>chunkSane(ch)</code> in <code>vg_free()</code> in |
| 1094 | <code>vg_malloc.c</code>, this may have happened because your program |
| 1095 | wrote off the end of a malloc'd block, or before its beginning. |
| 1096 | Valgrind should have emitted a proper message to that effect before |
| 1097 | dying in this way. This is a known problem which I should fix. |
| 1098 | <p> |
| 1099 | |
| 1100 | <hr width="100%"> |
| 1101 | |
| 1102 | <a name="machine"></a> |
| 1103 | <h2>3 Details of the checking machinery</h2> |
| 1104 | |
| 1105 | Read this section if you want to know, in detail, exactly what and how |
| 1106 | Valgrind is checking. |
| 1107 | |
| 1108 | <a name="vvalue"></a> |
| 1109 | <h3>3.1 Valid-value (V) bits</h3> |
| 1110 | |
| 1111 | It is simplest to think of Valgrind implementing a synthetic Intel x86 |
| 1112 | CPU which is identical to a real CPU, except for one crucial detail. |
| 1113 | Every bit (literally) of data processed, stored and handled by the |
| 1114 | real CPU has, in the synthetic CPU, an associated "valid-value" bit, |
| 1115 | which says whether or not the accompanying bit has a legitimate value. |
| 1116 | In the discussions which follow, this bit is referred to as the V |
| 1117 | (valid-value) bit. |
| 1118 | |
<p>Each byte in the system therefore has 8 V bits which accompany
it wherever it goes. For example, when the CPU loads a word-size item
| 1121 | (4 bytes) from memory, it also loads the corresponding 32 V bits from |
| 1122 | a bitmap which stores the V bits for the process' entire address |
| 1123 | space. If the CPU should later write the whole or some part of that |
| 1124 | value to memory at a different address, the relevant V bits will be |
| 1125 | stored back in the V-bit bitmap. |
| 1126 | |
| 1127 | <p>In short, each bit in the system has an associated V bit, which |
| 1128 | follows it around everywhere, even inside the CPU. Yes, the CPU's |
| 1129 | (integer) registers have their own V bit vectors. |
| 1130 | |
| 1131 | <p>Copying values around does not cause Valgrind to check for, or |
| 1132 | report on, errors. However, when a value is used in a way which might |
| 1133 | conceivably affect the outcome of your program's computation, the |
| 1134 | associated V bits are immediately checked. If any of these indicate |
| 1135 | that the value is undefined, an error is reported. |
| 1136 | |
| 1137 | <p>Here's an (admittedly nonsensical) example: |
| 1138 | <pre> |
int i, j;
int a[10], b[10];
for (i = 0; i < 10; i++) {
   j = a[i];
   b[i] = j;
}
| 1145 | </pre> |
| 1146 | |
| 1147 | <p>Valgrind emits no complaints about this, since it merely copies |
| 1148 | uninitialised values from <code>a[]</code> into <code>b[]</code>, and |
| 1149 | doesn't use them in any way. However, if the loop is changed to |
| 1150 | <pre> |
for (i = 0; i < 10; i++) {
   j += a[i];
}
if (j == 77)
   printf("hello there\n");
| 1156 | </pre> |
| 1157 | then Valgrind will complain, at the <code>if</code>, that the |
| 1158 | condition depends on uninitialised values. |
| 1159 | |
| 1160 | <p>Most low level operations, such as adds, cause Valgrind to |
| 1161 | use the V bits for the operands to calculate the V bits for the |
| 1162 | result. Even if the result is partially or wholly undefined, |
| 1163 | it does not complain. |
| 1164 | |
<p>Checks on definedness only occur in two places: when a value is
used to generate a memory address, and when a control flow decision
needs to be made. Also, when a system call is detected, Valgrind
checks definedness of parameters as required.

<p>If a check should detect undefinedness, an error message is
issued. The resulting value is subsequently regarded as well-defined.
| 1172 | To do otherwise would give long chains of error messages. In effect, |
| 1173 | we say that undefined values are non-infectious. |
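<p>The rules so far (copies never check, arithmetic propagates, a
branch checks and then regards the value as defined) can be modelled
in a few lines. This is a toy model for illustration only, not
Valgrind's implementation. The names are hypothetical, the encoding
(a 1 bit means "undefined") is an assumption, and the addition rule is
deliberately cruder than a real implementation need be.

```c
#include <stdint.h>

/* A 32-bit value together with its shadow word.
   ASSUMED encoding: a 1 bit in undef marks an undefined bit of val. */
struct shadow32 {
   uint32_t val;
   uint32_t undef;
};

static int errors_reported = 0;

/* Copying never checks; the shadow bits just travel with the data. */
struct shadow32 sv_copy(struct shadow32 a)
{
   return a;
}

/* Addition derives the result's shadow bits from the operands' and
   does not complain.  Simplification: if any input bit is undefined,
   the whole result is treated as undefined. */
struct shadow32 sv_add(struct shadow32 a, struct shadow32 b)
{
   struct shadow32 r;
   r.val   = a.val + b.val;
   r.undef = (a.undef | b.undef) ? 0xFFFFFFFFu : 0u;
   return r;
}

/* A conditional branch is a checking point.  An undefined operand is
   reported once and from then on regarded as defined, so one bad
   value does not cause a chain of further reports. */
int sv_branch_if_equal(struct shadow32 *a, uint32_t k)
{
   if (a->undef != 0) {
      errors_reported++;
      a->undef = 0;
   }
   return a->val == k;
}
```

<p>Passing a junk value through <code>sv_copy</code> and
<code>sv_add</code> leaves <code>errors_reported</code> at zero; only
the first <code>sv_branch_if_equal</code> on the result increments it.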
| 1174 | |
| 1175 | <p>This sounds overcomplicated. Why not just check all reads from |
| 1176 | memory, and complain if an undefined value is loaded into a CPU register? |
| 1177 | Well, that doesn't work well, because perfectly legitimate C programs routinely |
| 1178 | copy uninitialised values around in memory, and we don't want endless complaints |
| 1179 | about that. Here's the canonical example. Consider a struct |
| 1180 | like this: |
| 1181 | <pre> |
| 1182 | struct S { int x; char c; }; |
| 1183 | struct S s1, s2; |
| 1184 | s1.x = 42; |
| 1185 | s1.c = 'z'; |
| 1186 | s2 = s1; |
| 1187 | </pre> |
| 1188 | |
| 1189 | <p>The question to ask is: how large is <code>struct S</code>, in |
| 1190 | bytes? An int is 4 bytes and a char one byte, so perhaps a struct S |
| 1191 | occupies 5 bytes? Wrong. All (non-toy) compilers I know of will |
| 1192 | round the size of <code>struct S</code> up to a whole number of words, |
| 1193 | in this case 8 bytes. Not doing this forces compilers to generate |
| 1194 | truly appalling code for subscripting arrays of <code>struct |
| 1195 | S</code>'s. |
| 1196 | |
| 1197 | <p>So s1 occupies 8 bytes, yet only 5 of them will be initialised. |
| 1198 | For the assignment <code>s2 = s1</code>, gcc generates code to copy |
| 1199 | all 8 bytes wholesale into <code>s2</code> without regard for their |
| 1200 | meaning. If Valgrind simply checked values as they came out of |
| 1201 | memory, it would yelp every time a structure assignment like this |
| 1202 | happened. So the more complicated semantics described above is |
| 1203 | necessary. This allows gcc to copy <code>s1</code> into |
| 1204 | <code>s2</code> any way it likes, and a warning will only be emitted |
| 1205 | if the uninitialised values are later used. |
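<p>The padding can be measured directly. The sketch below uses only
standard C (<code>sizeof</code> and <code>offsetof</code>); the
function names are hypothetical.

```c
#include <stddef.h>

struct S { int x; char c; };

/* Bytes of struct S that lie after its last member.  Nothing ever
   assigns these, so they stay uninitialised. */
size_t trailing_padding(void)
{
   return sizeof(struct S) - (offsetof(struct S, c) + sizeof(char));
}

/* Initialises the two members, then copies the structure as a whole.
   The compiler is free to copy the padding bytes as well. */
struct S copy_demo(void)
{
   struct S s1, s2;
   s1.x = 42;
   s1.c = 'z';
   s2 = s1;
   return s2;
}
```

<p>With a 4-byte int and the 8-byte structure size described above,
<code>trailing_padding()</code> would come to 3. Those are the bytes
that get copied around undefined without any complaint.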
| 1206 | |
| 1207 | <p>One final twist to this story. The above scheme allows garbage to |
| 1208 | pass through the CPU's integer registers without complaint. It does |
| 1209 | this by giving the integer registers V tags, passing these around in |
the expected way. This is complicated and computationally expensive to
| 1211 | do, but is necessary. Valgrind is more simplistic about |
| 1212 | floating-point loads and stores. In particular, V bits for data read |
| 1213 | as a result of floating-point loads are checked at the load |
| 1214 | instruction. So if your program uses the floating-point registers to |
| 1215 | do memory-to-memory copies, you will get complaints about |
| 1216 | uninitialised values. Fortunately, I have not yet encountered a |
| 1217 | program which (ab)uses the floating-point registers in this way. |
| 1218 | |
| 1219 | <a name="vaddress"></a> |
| 1220 | <h3>3.2 Valid-address (A) bits</h3> |
| 1221 | |
| 1222 | Notice that the previous section describes how the validity of values |
| 1223 | is established and maintained without having to say whether the |
| 1224 | program does or does not have the right to access any particular |
| 1225 | memory location. We now consider the latter issue. |
| 1226 | |
| 1227 | <p>As described above, every bit in memory or in the CPU has an |
| 1228 | associated valid-value (V) bit. In addition, all bytes in memory, but |
| 1229 | not in the CPU, have an associated valid-address (A) bit. This |
| 1230 | indicates whether or not the program can legitimately read or write |
that location. It does not give any indication of the validity of the
| 1232 | data at that location -- that's the job of the V bits -- only whether |
| 1233 | or not the location may be accessed. |
| 1234 | |
| 1235 | <p>Every time your program reads or writes memory, Valgrind checks the |
| 1236 | A bits associated with the address. If any of them indicate an |
| 1237 | invalid address, an error is emitted. Note that the reads and writes |
| 1238 | themselves do not change the A bits, only consult them. |
| 1239 | |
| 1240 | <p>So how do the A bits get set/cleared? Like this: |
| 1241 | |
| 1242 | <ul> |
| 1243 | <li>When the program starts, all the global data areas are marked as |
| 1244 | accessible.</li><br> |
| 1245 | <p> |
| 1246 | |
| 1247 | <li>When the program does malloc/new, the A bits for exactly the |
| 1248 | area allocated, and not a byte more, are marked as accessible. |
| 1249 | Upon freeing the area the A bits are changed to indicate |
| 1250 | inaccessibility.</li><br> |
| 1251 | <p> |
| 1252 | |
| 1253 | <li>When the stack pointer register (%esp) moves up or down, A bits |
| 1254 | are set. The rule is that the area from %esp up to the base of |
| 1255 | the stack is marked as accessible, and below %esp is |
| 1256 | inaccessible. (If that sounds illogical, bear in mind that the |
| 1257 | stack grows down, not up, on almost all Unix systems, including |
| 1258 | GNU/Linux.) Tracking %esp like this has the useful side-effect |
| 1259 | that the section of stack used by a function for local variables |
| 1260 | etc is automatically marked accessible on function entry and |
| 1261 | inaccessible on exit.</li><br> |
| 1262 | <p> |
| 1263 | |
| 1264 | <li>When doing system calls, A bits are changed appropriately. For |
| 1265 | example, mmap() magically makes files appear in the process's |
| 1266 | address space, so the A bits must be updated if mmap() |
| 1267 | succeeds.</li><br> |
| 1268 | </ul> |
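<p>The stack-pointer rule above can be illustrated with a toy model. This is
a sketch in Python, not Valgrind's implementation: the names
<code>StackABits</code>, <code>set_esp</code> and <code>is_accessible</code>
are invented for illustration, and one A bit per byte is kept in a plain set.

```python
class StackABits:
    """Toy model: one A bit per byte, for a stack that grows downwards."""

    def __init__(self, stack_base, esp):
        self.stack_base = stack_base      # first address above the stack
        self.esp = esp
        # Everything from %esp up to the base starts out accessible.
        self.accessible = set(range(esp, stack_base))

    def set_esp(self, new_esp):
        if new_esp < self.esp:
            # Stack grew (e.g. function entry): exposed bytes become accessible.
            self.accessible.update(range(new_esp, self.esp))
        else:
            # Stack shrank (e.g. function exit): bytes below %esp become inaccessible.
            self.accessible.difference_update(range(self.esp, new_esp))
        self.esp = new_esp

    def is_accessible(self, addr, size=1):
        # An access is valid only if every byte it touches has its A bit set.
        return all(a in self.accessible for a in range(addr, addr + size))


stack = StackABits(stack_base=0x1000, esp=0x0FF0)
stack.set_esp(0x0FE0)                       # callee reserves 16 bytes of locals
locals_ok_inside = stack.is_accessible(0x0FE4, 4)
stack.set_esp(0x0FF0)                       # callee returns
locals_ok_after = stack.is_accessible(0x0FE4, 4)
print(locals_ok_inside, locals_ok_after)    # True False
```

<p>The same four bytes are accessible while the callee's frame exists and
inaccessible once %esp has moved back up, which is the side-effect described
in the stack-pointer item.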
| 1269 | |
| 1270 | |
| 1271 | <a name="together"></a> |
| 1272 | <h3>3.3 Putting it all together</h3> |
| 1273 | Valgrind's checking machinery can be summarised as follows: |
| 1274 | |
| 1275 | <ul> |
| 1276 | <li>Each byte in memory has 8 associated V (valid-value) bits, |
| 1277 | saying whether or not the byte has a defined value, and a single |
| 1278 | A (valid-address) bit, saying whether or not the program |
| 1279 | currently has the right to read/write that address.</li><br> |
| 1280 | <p> |
| 1281 | |
| 1282 | <li>When memory is read or written, the relevant A bits are |
| 1283 | consulted. If they indicate an invalid address, Valgrind emits |
| 1284 | an Invalid read or Invalid write error.</li><br> |
| 1285 | <p> |
| 1286 | |
| 1287 | <li>When memory is read into the CPU's integer registers, the |
| 1288 | relevant V bits are fetched from memory and stored in the |
| 1289 | simulated CPU. They are not consulted.</li><br> |
| 1290 | <p> |
| 1291 | |
| 1292 | <li>When an integer register is written out to memory, the V bits |
| 1293 | for that register are written back to memory too.</li><br> |
| 1294 | <p> |
| 1295 | |
| 1296 | <li>When memory is read into the CPU's floating point registers, the |
| 1297 | relevant V bits are read from memory and they are immediately |
| 1298 | checked. If any are invalid, an uninitialised value error is |
| 1299 | emitted. This precludes using the floating-point registers to |
| 1300 | copy possibly-uninitialised memory, but simplifies Valgrind in |
| 1301 | that it does not have to track the validity status of the |
| 1302 | floating-point registers.</li><br> |
| 1303 | <p> |
| 1304 | |
| 1305 | <li>As a result, when a floating-point register is written to |
| 1306 | memory, the associated V bits are set to indicate a valid |
| 1307 | value.</li><br> |
| 1308 | <p> |
| 1309 | |
| 1310 | <li>When values in integer CPU registers are used to generate a |
| 1311 | memory address, or to determine the outcome of a conditional |
| 1312 | branch, the V bits for those values are checked, and an error |
| 1313 | emitted if any of them are undefined.</li><br> |
| 1314 | <p> |
| 1315 | |
| 1316 | <li>When values in integer CPU registers are used for any other |
| 1317 | purpose, Valgrind computes the V bits for the result, but does |
| 1318 | not check them.</li><br> |
| 1319 | <p> |
| 1320 | |
| 1321 | <li>Once the V bits for a value in the CPU have been checked, they |
| 1322 | are then set to indicate validity. This avoids long chains of |
| 1323 | errors.</li><br> |
| 1324 | <p> |
| 1325 | |
| 1326 | <li>When values are loaded from memory, Valgrind checks the A bits |
| 1327 | for that location and issues an illegal-address warning if |
| 1328 | needed. In that case, the V bits loaded are forced to indicate |
| 1329 | Valid, despite the location being invalid. |
| 1330 | <p> |
| 1331 | This apparently strange choice reduces the amount of confusing |
| 1332 | information presented to the user. It avoids the |
| 1333 | unpleasant phenomenon in which memory is read from a place which |
| 1334 | is both unaddressable and contains invalid values, and, as a |
| 1335 | result, you get not only an invalid-address (read/write) error, |
| 1336 | but also a potentially large set of uninitialised-value errors, |
| 1337 | one for every time the value is used. |
| 1338 | <p> |
| 1339 | There is a hazy boundary case to do with multi-byte loads from |
| 1340 | addresses which are partially valid and partially invalid. See |
| 1341 | the description of the flag <code>--partial-loads-ok</code> for details. |
| 1342 | </li><br> |
| 1343 | </ul> |
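<p>The V-bit rules in the list above can be sketched as a small Python model.
It is only an illustration under stated assumptions: the class
<code>ToyChecker</code> and its method names are invented, values are 32-bit
words, a set bit in a mask means "valid" (a choice made for readability that
may not match Valgrind's internal encoding), and the rule used in
<code>add</code> is a deliberate simplification.

```python
FULLY_VALID = 0xFFFFFFFF   # toy convention: a 1 bit means "this value bit is valid"


class ToyChecker:
    """Toy model of V-bit handling for 32-bit words (not Valgrind's real code)."""

    def __init__(self):
        self.mem_v = {}      # word address -> mask of valid bits (missing = none valid)
        self.reg_v = {}      # register name -> mask of valid bits
        self.errors = []

    def integer_load(self, reg, addr):
        # V bits are copied along with the data; nothing is checked here.
        self.reg_v[reg] = self.mem_v.get(addr, 0)

    def integer_store(self, reg, addr):
        # V bits travel back to memory with the value.
        self.mem_v[addr] = self.reg_v[reg]

    def add(self, dst, a, b):
        # Simplification: a result bit is valid only if the same bit is
        # valid in both operands (carries are ignored in this sketch).
        self.reg_v[dst] = self.reg_v[a] & self.reg_v[b]

    def conditional_branch(self, reg):
        if self.reg_v[reg] != FULLY_VALID:
            self.errors.append("branch depends on uninitialised value")
        # Once checked, mark as valid so the same garbage is not reported again.
        self.reg_v[reg] = FULLY_VALID

    def fp_load(self, addr):
        # Floating-point loads are checked immediately.
        if self.mem_v.get(addr, 0) != FULLY_VALID:
            self.errors.append("floating-point load of uninitialised value")


cpu = ToyChecker()
cpu.mem_v[0x100] = FULLY_VALID          # an initialised word
cpu.integer_load("eax", 0x100)
cpu.integer_load("ebx", 0x200)          # never written: invalid, but no error yet
cpu.integer_store("ebx", 0x300)         # copying garbage is still silent
errors_after_copy = len(cpu.errors)
cpu.add("ecx", "eax", "ebx")
cpu.conditional_branch("ecx")           # first real use: one error
cpu.conditional_branch("ecx")           # already checked: no second error
print(errors_after_copy, len(cpu.errors))   # 0 1
```

<p>Copying the uninitialised word through integer registers produces no
report; only the branch that depends on it does, and only once.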
| 1344 | |
| 1345 | Valgrind intercepts calls to malloc, calloc, realloc, valloc, |
| 1346 | memalign, free, new and delete. The behaviour you get is: |
| 1347 | |
| 1348 | <ul> |
| 1349 | |
| 1350 | <li>malloc/new: the returned memory is marked as addressable but not |
| 1351 | having valid values. This means you have to write on it before |
| 1352 | you can read it.</li><br> |
| 1353 | <p> |
| 1354 | |
| 1355 | <li>calloc: returned memory is marked both addressable and valid, |
| 1356 | since calloc() clears the area to zero.</li><br> |
| 1357 | <p> |
| 1358 | |
| 1359 | <li>realloc: if the new size is larger than the old, the new section |
| 1360 | is addressable but invalid, as with malloc.</li><br> |
| 1361 | <p> |
| 1362 | |
| 1363 | <li>realloc: if the new size is smaller, the dropped-off section is |
| 1364 | marked as unaddressable. You may only pass to realloc a pointer |
| 1365 | previously issued to you by malloc/calloc/new/realloc.</li><br> |
| 1366 | <p> |
| 1367 | |
| 1368 | <li>free/delete: you may only pass to free a pointer previously |
| 1369 | issued to you by malloc/calloc/new/realloc, or the value |
| 1370 | NULL. Otherwise, Valgrind complains. If the pointer is indeed |
| 1371 | valid, Valgrind marks the entire area it points at as |
| 1372 | unaddressable, and places the block in the freed-blocks-queue. |
| 1373 | The aim is to defer as long as possible reallocation of this |
| 1374 | block. Until that happens, all attempts to access it will |
| 1375 | elicit an invalid-address error, as you would hope.</li><br> |
| 1376 | </ul> |
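<p>As a rough illustration of the heap rules above, here is a toy Python
model. Everything in it is an assumption made for the sketch: the class
<code>ToyHeap</code>, the three state names, the sequential address
allocation and the 16-byte gap between blocks are invented, and a second
free of the same pointer is simply treated as an invalid free.

```python
from collections import deque

NOACCESS, UNDEFINED, DEFINED = "noaccess", "undefined", "defined"


class ToyHeap:
    """Toy model of heap-block states; addresses are handed out sequentially."""

    def __init__(self):
        self.state = {}              # byte address -> UNDEFINED or DEFINED (absent = NOACCESS)
        self.blocks = {}             # start address -> size, for live blocks
        self.freed_queue = deque()   # freed blocks whose reuse is being deferred
        self.next_addr = 0x1000
        self.errors = []

    def _alloc(self, size, initial):
        start = self.next_addr
        self.next_addr += size + 16  # gap so neighbouring blocks never touch
        self.blocks[start] = size
        for a in range(start, start + size):
            self.state[a] = initial
        return start

    def malloc(self, size):
        return self._alloc(size, UNDEFINED)      # addressable, values not yet valid

    def calloc(self, count, size):
        return self._alloc(count * size, DEFINED)  # addressable and zero-filled

    def free(self, ptr):
        if ptr is None:
            return                       # free(NULL) is allowed
        if ptr not in self.blocks:
            self.errors.append("invalid free")
            return
        size = self.blocks.pop(ptr)
        for a in range(ptr, ptr + size):
            del self.state[a]            # the whole block becomes unaddressable
        self.freed_queue.append((ptr, size))

    def status(self, addr):
        return self.state.get(addr, NOACCESS)


heap = ToyHeap()
p = heap.malloc(8)
q = heap.calloc(2, 4)
before = (heap.status(p), heap.status(q), heap.status(p + 8))
heap.free(p)
heap.free(p)                 # the pointer no longer names a live block
after = (heap.status(p), heap.errors)
print(before, after)
```

<p>The byte just past the malloc'd block (<code>p + 8</code>) is never
addressable, and after the free every byte of the block reports
<code>noaccess</code> while the block waits in the queue.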
| 1377 | |
| 1378 | |
| 1379 | |
| 1380 | <a name="signals"></a> |
| 1381 | <h3>3.4 Signals</h3> |
| 1382 | |
| 1383 | Valgrind provides suitable handling of signals, so, provided you stick |
| 1384 | to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask() |
| 1385 | are handled. Signal handlers may return in the normal way or do |
| 1386 | longjmp(); both should work ok. As specified by POSIX, a signal is |
| 1387 | blocked in its own handler. Default actions for signals should work |
| 1388 | as before. Etc, etc. |
| 1389 | |
| 1390 | <p>Under the hood, dealing with signals is a real pain, and Valgrind's |
| 1391 | simulation leaves much to be desired. If your program does |
| 1392 | way-strange stuff with signals, bad things may happen. If so, let me |
| 1393 | know. I don't promise to fix it, but I'd at least like to be aware of |
| 1394 | it. |
| 1395 | |
| 1396 | |
| 1397 | <a name="leaks"></a> |
| 1398 | <h3>3.5 Memory leak detection</h3> |
| 1399 | |
| 1400 | Valgrind keeps track of all memory blocks issued in response to calls |
| 1401 | to malloc/calloc/realloc/new. So when the program exits, it knows |
| 1402 | which blocks are still outstanding -- have not been returned, in other |
| 1403 | words. Ideally, you want your program to have no blocks still in use |
| 1404 | at exit. But many programs do. |
| 1405 | |
| 1406 | <p>For each such block, Valgrind scans the entire address space of the |
| 1407 | process, looking for pointers to the block. One of three situations |
| 1408 | may result: |
| 1409 | |
| 1410 | <ul> |
| 1411 | <li>A pointer to the start of the block is found. This usually |
| 1412 | indicates programming sloppiness; since the block is still |
| 1413 | pointed at, the programmer could, at least in principle, have free'd |
| 1414 | it before program exit.</li><br> |
| 1415 | <p> |
| 1416 | |
| 1417 | <li>A pointer to the interior of the block is found. The pointer |
| 1418 | might originally have pointed to the start and have been moved |
| 1419 | along, or it might be entirely unrelated. Valgrind deems such a |
| 1420 | block as "dubious", that is, possibly leaked, |
| 1421 | because it's unclear whether or |
| 1422 | not a pointer to it still exists.</li><br> |
| 1423 | <p> |
| 1424 | |
| 1425 | <li>The worst outcome is that no pointer to the block can be found. |
| 1426 | The block is classified as "leaked", because the |
| 1427 | programmer could not possibly have free'd it at program exit, |
| 1428 | since no pointer to it exists. This might be a symptom of |
| 1429 | having lost the pointer at some earlier point in the |
| 1430 | program.</li> |
| 1431 | </ul> |
| 1432 | |
| 1433 | Valgrind reports summaries about leaked and dubious blocks. |
| 1434 | For each such block, it will also tell you where the block was |
| 1435 | allocated. This should help you figure out why the pointer to it has |
| 1436 | been lost. In general, you should attempt to ensure your programs do |
| 1437 | not have any leaked or dubious blocks at exit. |
| 1438 | |
| 1439 | <p>The precise area of memory in which Valgrind searches for pointers |
| 1440 | is: all naturally-aligned 4-byte words for which all A bits indicate |
| 1441 | addressability and all V bits indicate that the stored value is |
| 1442 | actually valid. |
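<p>The three-way classification described in this section can be sketched in
Python as follows. This is an illustration, not Valgrind's scanner: the
function name <code>classify_blocks</code> and the label "reachable" for the
first case are invented here, and the scan itself is replaced by a ready-made
list of candidate word values.

```python
def classify_blocks(blocks, candidate_words):
    """blocks: {start: size} of blocks still allocated at exit.
    candidate_words: values of the aligned, addressable, valid words
    found by the scan.
    Returns {start: "reachable" | "dubious" | "leaked"}."""
    result = {}
    for start, size in blocks.items():
        verdict = "leaked"                    # no pointer found at all
        for word in candidate_words:
            if word == start:
                verdict = "reachable"         # pointer to the start of the block
                break
            if start < word < start + size:
                verdict = "dubious"           # only an interior pointer so far
        result[start] = verdict
    return result


blocks = {0x5000: 32, 0x6000: 16, 0x7000: 64}
words = [0x5000, 0x6008, 0x1234]
report = classify_blocks(blocks, words)
print(report[0x5000], report[0x6000], report[0x7000])   # reachable dubious leaked
```

<p>A start pointer always wins over an interior pointer to the same block,
so the order in which candidate words are examined does not change the
verdict.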
| 1443 | |
| 1444 | <p><hr width="100%"> |
| 1445 | |
| 1446 | |
| 1447 | <a name="limits"></a> |
| 1448 | <h2>4 Limitations</h2> |
| 1449 | |
| 1450 | The following list of limitations seems depressingly long. However, |
| 1451 | most programs actually work fine. |
| 1452 | |
| 1453 | <p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on |
| 1454 | a kernel 2.4.X system, subject to the following constraints: |
| 1455 | |
| 1456 | <ul> |
| 1457 | <li>No MMX, SSE, SSE2, 3DNow instructions. If the translator |
| 1458 | encounters these, Valgrind will simply give up. It may be |
| 1459 | possible to add support for them at a later time. Intel added a |
| 1460 | few instructions such as "cmov" to the integer instruction set |
| 1461 | on Pentium and later processors, and these are supported. |
| 1462 | Nevertheless it's safest to think of Valgrind as implementing |
| 1463 | the 486 instruction set.</li><br> |
| 1464 | <p> |
| 1465 | |
| 1466 | <li>Multithreaded programs are not supported, since I haven't yet |
| 1467 | figured out how to do this. To be more specific, it is the |
| 1468 | "clone" system call which is not supported. A program calls |
| 1469 | "clone" to create threads. Valgrind will abort if this |
| 1470 | happens.</li><br> |
| 1471 | <p> |
| 1472 | |
| 1473 | <li>Valgrind assumes that the floating point registers are not used |
| 1474 | as intermediaries in memory-to-memory copies, so it immediately |
| 1475 | checks V bits in floating-point loads/stores. If you want to |
| 1476 | write code which copies around possibly-uninitialised values, |
| 1477 | you must ensure these travel through the integer registers, not |
| 1478 | the FPU.</li><br> |
| 1479 | <p> |
| 1480 | |
| 1481 | <li>If your program does its own memory management, rather than |
| 1482 | using malloc/new/free/delete, it should still work, but |
| 1483 | Valgrind's error checking won't be so effective.</li><br> |
| 1484 | <p> |
| 1485 | |
| 1486 | <li>Valgrind's signal simulation is not as robust as it could be. |
| 1487 | Basic POSIX-compliant sigaction and sigprocmask functionality is |
| 1488 | supplied, but it's conceivable that things could go badly awry |
| 1489 | if you do weird things with signals. Workaround: don't. |
| 1490 | Programs that do non-POSIX signal tricks are in any case |
| 1491 | inherently unportable, so should be avoided if |
| 1492 | possible.</li><br> |
| 1493 | <p> |
| 1494 | |
| 1495 | <li>I have no idea what happens if programs try to handle signals on |
| 1496 | an alternate stack (sigaltstack). YMMV.</li><br> |
| 1497 | <p> |
| 1498 | |
| 1499 | <li>Programs which switch stacks are not well handled. Valgrind |
| 1500 | does have support for this, but I don't have great faith in it. |
| 1501 | It's difficult -- there's no cast-iron way to decide whether a |
| 1502 | large change in %esp is as a result of the program switching |
| 1503 | stacks, or merely allocating a large object temporarily on the |
| 1504 | current stack -- yet Valgrind needs to handle the two situations |
| 1505 | differently.</li><br> |
| 1506 | <p> |
| 1507 | |
| 1508 | <li>x86 instructions, and system calls, have been implemented on |
| 1509 | demand. So it's possible, although unlikely, that a program |
| 1510 | will fall over with a message to that effect. If this happens, |
| 1511 | please mail me ALL the details printed out, so I can try and |
| 1512 | implement the missing feature.</li><br> |
| 1513 | <p> |
| 1514 | |
| 1515 | <li>x86 floating point works correctly, but floating-point code may |
| 1516 | run even more slowly than integer code, due to my simplistic |
| 1517 | approach to FPU emulation.</li><br> |
| 1518 | <p> |
| 1519 | |
| 1520 | <li>You can't Valgrind-ize statically linked binaries. Valgrind |
| 1521 | relies on the dynamic-link mechanism to gain control at |
| 1522 | startup.</li><br> |
| 1523 | <p> |
| 1524 | |
| 1525 | <li>Memory consumption of your program is majorly increased whilst |
| 1526 | running under Valgrind. This is due to the large amount of |
| 1527 | administrative information maintained behind the scenes. Another |
| 1528 | cause is that Valgrind dynamically translates the original |
| 1529 | executable and never throws any translation away, except in |
| 1530 | those rare cases where self-modifying code is detected. |
| 1531 | Translated, instrumented code is 8-12 times larger than the |
| 1532 | original (!) so you can easily end up with 15+ MB of |
| 1533 | translations when running (eg) a web browser. There's not a lot |
| 1534 | you can do about this -- use Valgrind on a fast machine with a lot |
| 1535 | of memory and swap space. At some point I may implement a LRU |
| 1536 | caching scheme for translations, so as to bound the maximum |
| 1537 | amount of memory devoted to them, to say 8 or 16 MB.</li> |
| 1538 | </ul> |
| 1539 | |
| 1540 | |
| 1541 | Programs which are known not to work are: |
| 1542 | |
| 1543 | <ul> |
| 1544 | <li>Netscape 4.76 works pretty well on some platforms -- quite |
| 1545 | nicely on my AMD K6-III (400 MHz). I can surf, do mail, etc, no |
| 1546 | problem. On other platforms it has been observed to crash |
| 1547 | during startup. Despite much investigation I can't figure out |
| 1548 | why.</li><br> |
| 1549 | <p> |
| 1550 | |
| 1551 | <li>kpackage (a KDE front end to rpm) dies because the CPUID |
| 1552 | instruction is unimplemented. Easy to fix.</li><br> |
| 1553 | <p> |
| 1554 | |
| 1555 | <li>knode (a KDE newsreader) tries to do multithreaded things, and |
| 1556 | fails.</li><br> |
| 1557 | <p> |
| 1558 | |
| 1559 | <li>emacs starts up but immediately concludes it is out of memory |
| 1560 | and aborts. Emacs has its own memory-management scheme, but I |
| 1561 | don't understand why this should interact so badly with |
| 1562 | Valgrind.</li><br> |
| 1563 | <p> |
| 1564 | |
| 1565 | <li>Gimp and Gnome and GTK-based apps die early on because |
| 1566 | of unimplemented system call wrappers. (I'm a KDE user :) |
| 1567 | This wouldn't be hard to fix. |
| 1568 | </li><br> |
| 1569 | <p> |
| 1570 | |
| 1571 | <li>As a consequence of me being a KDE user, almost all KDE apps |
| 1572 | work ok -- except those which are multithreaded. |
| 1573 | </li><br> |
| 1574 | <p> |
| 1575 | </ul> |
| 1576 | |
| 1577 | |
| 1578 | <p><hr width="100%"> |
| 1579 | |
| 1580 | |
| 1581 | <a name="howitworks"></a> |
| 1582 | <h2>5 How it works -- a rough overview</h2> |
| 1583 | Some gory details, for those with a passion for gory details. You |
| 1584 | don't need to read this section if all you want to do is use Valgrind. |
| 1585 | |
| 1586 | <a name="startb"></a> |
| 1587 | <h3>5.1 Getting started</h3> |
| 1588 | |
| 1589 | Valgrind is compiled into a shared object, valgrind.so. The shell |
| 1590 | script valgrind sets the LD_PRELOAD environment variable to point to |
| 1591 | valgrind.so. This causes the .so to be loaded as an extra library to |
| 1592 | any subsequently executed dynamically-linked ELF binary, viz, the |
| 1593 | program you want to debug. |
| 1594 | |
| 1595 | <p>The dynamic linker allows each .so in the process image to have an |
| 1596 | initialisation function which is run before main(). It also allows |
| 1597 | each .so to have a finalisation function run after main() exits. |
| 1598 | |
| 1599 | <p>When valgrind.so's initialisation function is called by the dynamic |
| 1600 | linker, the synthetic CPU starts up. The real CPU remains locked |
| 1601 | in valgrind.so for the entire rest of the program, but the synthetic |
| 1602 | CPU returns from the initialisation function. Startup of the program |
| 1603 | now continues as usual -- the dynamic linker calls all the other .so's |
| 1604 | initialisation routines, and eventually runs main(). This all runs on |
| 1605 | the synthetic CPU, not the real one, but the client program cannot |
| 1606 | tell the difference. |
| 1607 | |
| 1608 | <p>Eventually main() exits, so the synthetic CPU calls valgrind.so's |
| 1609 | finalisation function. Valgrind detects this, and uses it as its cue |
| 1610 | to exit. It prints summaries of all errors detected, possibly checks |
| 1611 | for memory leaks, and then exits the finalisation routine, but now on |
| 1612 | the real CPU. The synthetic CPU has now lost control -- permanently |
| 1613 | -- so the program exits back to the OS on the real CPU, just as it |
| 1614 | would have done anyway. |
| 1615 | |
| 1616 | <p>On entry, Valgrind switches stacks, so it runs on its own stack. |
| 1617 | On exit, it switches back. This means that the client program |
| 1618 | continues to run on its own stack, so we can switch back and forth |
| 1619 | between running it on the simulated and real CPUs without difficulty. |
| 1620 | This was an important design decision, because it makes it easy (well, |
| 1621 | significantly less difficult) to debug the synthetic CPU. |
| 1622 | |
| 1623 | |
| 1624 | <a name="engine"></a> |
| 1625 | <h3>5.2 The translation/instrumentation engine</h3> |
| 1626 | |
| 1627 | Valgrind does not directly run any of the original program's code. Only |
| 1628 | instrumented translations are run. Valgrind maintains a translation |
| 1629 | table, which allows it to find the translation quickly for any branch |
| 1630 | target (code address). If no translation has yet been made, the |
| 1631 | translator - a just-in-time translator - is summoned. This makes an |
| 1632 | instrumented translation, which is added to the collection of |
| 1633 | translations. Subsequent jumps to that address will use this |
| 1634 | translation. |
| 1635 | |
| 1636 | <p>Valgrind can optionally check writes made by the application, to |
| 1637 | see if they are writing an address contained within code which has |
| 1638 | been translated. Such a write invalidates translations of code |
| 1639 | bracketing the written address. Valgrind will discard the relevant |
| 1640 | translations, which causes them to be re-made, if they are needed |
| 1641 | again, reflecting the new updated data stored there. In this way, |
| 1642 | self-modifying code is supported. In practice I have not found any |
| 1643 | Linux applications which use self-modifying code. |
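<p>The lookup-translate-invalidate cycle described above can be sketched in
Python. This is a toy model, not the real translation table: the class
<code>TranslationTable</code>, the method names and the stand-in
<code>fake_translate</code> function (which pretends every basic block is 10
bytes long) are all invented for illustration.

```python
class TranslationTable:
    """Toy model: maps a code address to its instrumented translation."""

    def __init__(self, translate):
        self.translate = translate   # function: address -> (original length, translation)
        self.table = {}              # address -> (original length, translation)
        self.misses = 0

    def lookup(self, addr):
        if addr not in self.table:
            self.misses += 1                     # no translation yet: make one
            self.table[addr] = self.translate(addr)
        return self.table[addr][1]

    def notify_write(self, addr):
        # Discard every translation whose original code covers the written address.
        stale = [start for start, (length, _) in self.table.items()
                 if start <= addr < start + length]
        for start in stale:
            del self.table[start]
        return len(stale)


def fake_translate(addr):
    # Stand-in for the real JIT: pretend each basic block is 10 bytes long.
    return (10, "translation of block at 0x%x" % addr)


tt = TranslationTable(fake_translate)
tt.lookup(0x8048000)                     # miss: translated
tt.lookup(0x8048000)                     # hit: reused
tt.lookup(0x8048100)                     # miss
discarded = tt.notify_write(0x8048004)   # write lands inside the first block
tt.lookup(0x8048000)                     # must be re-made
print(tt.misses, discarded)              # 3 1
```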
| 1644 | |
| 1645 | <p>The JITter translates basic blocks -- blocks of straight-line-code |
| 1646 | -- as single entities. To minimise the considerable difficulties of |
| 1647 | dealing with the x86 instruction set, x86 instructions are first |
| 1648 | translated to a RISC-like intermediate code, similar to SPARC code, |
| 1649 | but with an infinite number of virtual integer registers. Initially |
| 1650 | each instruction is translated separately, and there is no attempt at |
| 1651 | instrumentation. |
| 1652 | |
| 1653 | <p>The intermediate code is improved, mostly so as to try and cache |
| 1654 | the simulated machine's registers in the real machine's registers over |
| 1655 | several simulated instructions. This is often very effective. Also, |
| 1656 | we try to remove redundant updates of the simulated machine's |
| 1657 | condition-code register. |
| 1658 | |
| 1659 | <p>The intermediate code is then instrumented, giving more |
| 1660 | intermediate code. There are a few extra intermediate-code operations |
| 1661 | to support instrumentation; it is all refreshingly simple. After |
| 1662 | instrumentation there is a cleanup pass to remove redundant value |
| 1663 | checks. |
| 1664 | |
| 1665 | <p>This gives instrumented intermediate code which mentions arbitrary |
| 1666 | numbers of virtual registers. A linear-scan register allocator is |
| 1667 | used to assign real registers and possibly generate spill code. All |
| 1668 | of this is still phrased in terms of the intermediate code. This |
| 1669 | machinery is inspired by the work of Reuben Thomas (MITE). |
| 1670 | |
| 1671 | <p>Then, and only then, is the final x86 code emitted. The |
| 1672 | intermediate code is carefully designed so that x86 code can be |
| 1673 | generated from it without need for spare registers or other |
| 1674 | inconveniences. |
| 1675 | |
| 1676 | <p>The translations are managed using a traditional LRU-based caching |
| 1677 | scheme. The translation cache has a default size of about 14MB. |
| 1678 | |
| 1679 | <a name="track"></a> |
| 1680 | |
| 1681 | <h3>5.3 Tracking the status of memory</h3> Each byte in the |
| 1682 | process' address space has nine bits associated with it: one A bit and |
| 1683 | eight V bits. The A and V bits for each byte are stored using a |
| 1684 | sparse array, which flexibly and efficiently covers arbitrary parts of |
| 1685 | the 32-bit address space without imposing significant space or |
| 1686 | performance overheads for the parts of the address space never |
| 1687 | visited. The scheme used, and speedup hacks, are described in detail |
| 1688 | at the top of the source file vg_memory.c, so you should read that for |
| 1689 | the gory details. |
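<p>One way to picture a sparse map of this kind is a two-level table that
only allocates storage for the parts of the address space actually touched.
The Python sketch below is an assumption about the general shape, not a
description of vg_memory.c: the 64KB chunk size, the class
<code>SparseShadowMap</code> and its methods are invented, and a set bit
means "addressable" or "valid" purely for readability.

```python
CHUNK_BITS = 16                      # assumed chunk size: 64KB of address space
CHUNK_SIZE = 1 << CHUNK_BITS


class SparseShadowMap:
    """Toy two-level map holding one A bit and eight V bits per byte."""

    def __init__(self):
        # chunk number -> (bytearray of A bits, bytearray of V bytes)
        self.chunks = {}

    def mark(self, addr, a_bit, v_bits):
        number = addr >> CHUNK_BITS
        if number not in self.chunks:
            # First touch of this region: allocate its shadow storage now.
            self.chunks[number] = (bytearray(CHUNK_SIZE), bytearray(CHUNK_SIZE))
        a, v = self.chunks[number]
        offset = addr & (CHUNK_SIZE - 1)
        a[offset] = a_bit
        v[offset] = v_bits

    def get(self, addr):
        chunk = self.chunks.get(addr >> CHUNK_BITS)
        if chunk is None:
            return (0, 0)            # never visited: inaccessible, no valid bits
        offset = addr & (CHUNK_SIZE - 1)
        return (chunk[0][offset], chunk[1][offset])


shadow = SparseShadowMap()
shadow.mark(0xBFFFF74C, a_bit=1, v_bits=0xFF)
print(shadow.get(0xBFFFF74C), shadow.get(0x08048000), len(shadow.chunks))
```

<p>Only one chunk is allocated although the two addresses queried are far
apart, which is the property the sparse array is there to provide.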
| 1690 | |
| 1691 | <a name="sys_calls"></a> |
| 1692 | |
| 1693 | <h3>5.4 System calls</h3> |
| 1694 | All system calls are intercepted. The memory status map is consulted |
| 1695 | before and updated after each call. It's all rather tiresome. See |
| 1696 | vg_syscall_mem.c for details. |
| 1697 | |
| 1698 | <a name="sys_signals"></a> |
| 1699 | |
| 1700 | <h3>5.5 Signals</h3> |
| 1701 | All system calls to sigaction() and sigprocmask() are intercepted. If |
| 1702 | the client program is trying to set a signal handler, Valgrind makes a |
| 1703 | note of the handler address and which signal it is for. Valgrind then |
| 1704 | arranges for the same signal to be delivered to its own handler. |
| 1705 | |
| 1706 | <p>When such a signal arrives, Valgrind's own handler catches it, and |
| 1707 | notes the fact. At a convenient safe point in execution, Valgrind |
| 1708 | builds a signal delivery frame on the client's stack and runs its |
| 1709 | handler. If the handler longjmp()s, there is nothing more to be said. |
| 1710 | If the handler returns, Valgrind notices this, zaps the delivery |
| 1711 | frame, and carries on where it left off before delivering the signal. |
| 1712 | |
| 1713 | <p>The purpose of this nonsense is that setting signal handlers |
| 1714 | essentially amounts to giving callback addresses to the Linux kernel. |
| 1715 | We can't allow this to happen, because if it did, signal handlers |
| 1716 | would run on the real CPU, not the simulated one. This means the |
| 1717 | checking machinery would not operate during the handler run, and, |
| 1718 | worse, memory permissions maps would not be updated, which could cause |
| 1719 | spurious error reports once the handler had returned. |
| 1720 | |
| 1721 | <p>An even worse thing would happen if the signal handler longjmp'd |
| 1722 | rather than returned: Valgrind would completely lose control of the |
| 1723 | client program. |
| 1724 | |
| 1725 | <p>Upshot: we can't allow the client to install signal handlers |
| 1726 | directly. Instead, Valgrind must catch, on behalf of the client, any |
| 1727 | signal the client asks to catch, and must deliver it to the client on |
| 1728 | the simulated CPU, not the real one. This involves considerable |
| 1729 | gruesome fakery; see vg_signals.c for details. |
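<p>The note-now, deliver-later scheme can be sketched as a toy Python model.
It is only an illustration of the idea: the class
<code>ToySignalLayer</code> and its method names are invented, the kernel is
replaced by a direct call to <code>real_handler</code>, and details such as
signal masks and delivery frames are left out.

```python
class ToySignalLayer:
    """Toy model of deferring client signal handlers to a safe point."""

    def __init__(self):
        self.client_handlers = {}    # signal number -> client's handler
        self.pending = []            # signals caught but not yet delivered

    def sigaction(self, signo, handler):
        # Intercepted: remember the client's handler instead of
        # handing its address to the kernel.
        self.client_handlers[signo] = handler

    def real_handler(self, signo):
        # Runs when the signal really arrives: just take a note of it.
        if signo in self.client_handlers:
            self.pending.append(signo)

    def safe_point(self):
        # Called at a convenient point between basic blocks: run the
        # client's handlers now, under the control of the simulated CPU.
        delivered, self.pending = self.pending, []
        for signo in delivered:
            self.client_handlers[signo](signo)
        return delivered


log = []
layer = ToySignalLayer()
layer.sigaction(2, lambda signo: log.append("client handler for %d" % signo))
layer.real_handler(2)              # signal arrives in the middle of a block
ran_immediately = len(log)         # nothing has run yet
layer.safe_point()
print(ran_immediately, log)        # 0 ['client handler for 2']
```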
| 1730 | <p> |
| 1731 | |
| 1732 | <hr width="100%"> |
| 1733 | |
| 1734 | <a name="example"></a> |
| 1735 | <h2>6 Example</h2> |
| 1736 | This is the log for a run of a small program. The program is in fact |
| 1737 | correct, and the reported error is the result of a potentially serious |
| 1738 | code generation bug in GNU g++ (snapshot 20010527). |
| 1739 | <pre> |
| 1740 | sewardj@phoenix:~/newmat10$ |
| 1741 | ~/Valgrind-6/valgrind -v ./bogon |
| 1742 | ==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1. |
| 1743 | ==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward. |
| 1744 | ==25832== Startup, with flags: |
| 1745 | ==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp |
| 1746 | ==25832== reading syms from /lib/ld-linux.so.2 |
| 1747 | ==25832== reading syms from /lib/libc.so.6 |
| 1748 | ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0 |
| 1749 | ==25832== reading syms from /lib/libm.so.6 |
| 1750 | ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3 |
| 1751 | ==25832== reading syms from /home/sewardj/Valgrind/valgrind.so |
| 1752 | ==25832== reading syms from /proc/self/exe |
| 1753 | ==25832== loaded 5950 symbols, 142333 line number locations |
| 1754 | ==25832== |
| 1755 | ==25832== Invalid read of size 4 |
| 1756 | ==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45) |
| 1757 | ==25832== by 0x80487AF: main (bogon.cpp:66) |
| 1758 | ==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129) |
| 1759 | ==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon) |
| 1760 | ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd |
| 1761 | ==25832== |
| 1762 | ==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) |
| 1763 | ==25832== malloc/free: in use at exit: 0 bytes in 0 blocks. |
| 1764 | ==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated. |
| 1765 | ==25832== For a detailed leak analysis, rerun with: --leak-check=yes |
| 1766 | ==25832== |
| 1767 | ==25832== exiting, did 1881 basic blocks, 0 misses. |
| 1768 | ==25832== 223 translations, 3626 bytes in, 56801 bytes out. |
| 1769 | </pre> |
| 1770 | <p>The GCC folks fixed this about a week before gcc-3.0 shipped. |
| 1771 | <hr width="100%"> |
| 1772 | <p> |
| 1773 | |
| 1774 | |
| 1775 | |
| 1776 | <a name="cache"></a> |
| 1777 | <h2>7 Cache profiling</h2> |
| 1778 | As well as memory debugging, Valgrind also allows you to do cache simulations |
| 1779 | and annotate your source line-by-line with the number of cache misses. In |
| 1780 | particular, it records: |
| 1781 | <ul> |
| 1782 | <li>L1 instruction cache reads and misses; |
| 1783 | <li>L1 data cache reads and read misses, writes and write misses; |
| 1784 | <li>L2 unified cache reads and read misses, writes and write misses. |
| 1785 | </ul> |
| 1786 | On a modern x86 machine, an L1 miss will typically cost around 10 cycles, |
| 1787 | and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be |
| 1788 | very useful for improving the performance of your program. |
| 1789 | |
| 1790 | Please note that this is an experimental feature. Any feedback, bug-fixes, |
| 1791 | suggestions, etc, welcome. |
| 1792 | |
| 1793 | |
| 1794 | <h3>7.1 Overview</h3> |
| 1795 | First off, as for normal Valgrind use, you probably want to turn on debugging |
| 1796 | info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you |
| 1797 | probably <b>do</b> want to turn optimisation on, since you should profile your |
| 1798 | program as it will be normally run. |
| 1799 | |
| 1800 | The three steps are: |
| 1801 | <ol> |
| 1802 | <li>Generate a cache simulator for your machine's cache configuration with |
| 1803 | <code>vg_cachegen</code> and recompile Valgrind with <code>make install</code>. |
| 1804 | Valgrind comes with a default simulator, but it is unlikely to be correct |
| 1805 | for your system, so you should generate a simulator yourself.</li> |
| 1806 | <li>Run your program with <code>valgrind --cachesim=yes</code> in front of |
| 1807 | the normal command line invocation. When the program finishes, Valgrind |
| 1808 | will print summary cache statistics. It also collects line-by-line |
| 1809 | information in a file <code>cachegrind.out</code>.</li> |
| 1810 | <li>Generate a function-by-function summary, and possibly annotate source |
| 1811 | files with <code>vg_annotate</code>. Source files to annotate can be specified |
| 1812 | manually on the command line, or "interesting" source files |
| 1813 | can be annotated automatically with the <code>--auto=yes</code> option. |
| 1814 | You can annotate C/C++ files or assembly language files equally |
| 1815 | easily.</li> |
| 1816 | </ol> |
| 1817 | |
| 1818 | <a href="#generate">Step 1</a> only needs to be done once, unless you are |
| 1819 | interested in simulating different cache configurations (eg. first |
| 1820 | concentrating on instruction cache misses, then on data cache misses).<p> |
| 1821 | |
| 1822 | <a href="#profile">Step 2</a> should be done every time you want to collect |
| 1823 | information about a new program, a changed program, or about the same program |
| 1824 | with different input.<p> |
| 1825 | |
| 1826 | <a href="#annotate">Step 3</a> can be performed as many times as you like for |
| 1827 | each Step 2; you may want to do multiple annotations showing different |
| 1828 | information each time.<p> |
| 1829 | |
| 1830 | The steps are described in detail in the following sections.<p> |
| 1831 | |
| 1832 | |
| 1833 | <a name="generate"></a> |
| 1834 | <h3>7.3 Generating a cache simulator</h3> |
| 1835 | Although Valgrind comes with a pre-generated cache simulator, it most likely |
| 1836 | won't match the cache configuration of your machine, so you should generate |
| 1837 | a new simulator.<p> |
| 1838 | |
| 1839 | You need to generate three files, one for each of the I1, D1 and L2 caches. |
| 1840 | For each cache, you need to know the: |
| 1841 | <ul> |
| 1842 | <li>Cache size (bytes); |
| 1843 | <li>Line size (bytes); |
| 1844 | <li>Associativity. |
| 1845 | </ul> |
| 1846 | |
| 1847 | vg_cachegen takes three options: |
| 1848 | <ul> |
| 1849 | <li><code>--I1=size,line_size,associativity</code> |
| 1850 | <li><code>--D1=size,line_size,associativity</code> |
| 1851 | <li><code>--L2=size,line_size,associativity</code> |
| 1852 | </ul> |
| 1853 | |
| 1854 | You can specify one, two or all three caches per invocation of vg_cachegen. It |
| 1855 | checks that the configuration is sensible before generating the simulators; to |
| 1856 | see the allowed values, run <code>vg_cachegen -h</code>.<p> |
| 1857 | |
| 1858 | An example invocation would be: |
| 1859 | |
| 1860 | <blockquote><code> |
| 1861 | vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 |
| 1862 | </code></blockquote> |
| 1863 | |
| 1864 | This simulates a machine with a 128KB split L1 cache (64KB I1 plus 64KB D1, each |
| 1865 | 2-way associative) and a 256KB unified 8-way associative L2 cache. All three caches have 64B lines.<p> |
| 1866 | |
| 1867 | If you don't know your cache configuration, you'll have to find it out. |
| 1868 | (Ideally vg_cachegen could auto-identify your cache configuration using the |
| 1869 | CPUID instruction, which could be done automatically during installation, and |
| 1870 | this whole step could be skipped...)<p> |
| 1871 | |
| 1872 | |
| 1873 | <h3>7.4 Cache simulation specifics</h3> |
| 1874 | vg_cachegen only generates simulations for a machine with a split L1 cache and |
| 1875 | a unified L2 cache. This configuration is used for all x86-based machines we |
| 1876 | are aware of.<p> |
| 1877 | |
| 1878 | The more specific characteristics of the simulation are as follows. |
| 1879 | |
| 1880 | <ul> |
| 1881 | <li>Write-allocate: when a write miss occurs, the block written to is brought |
| 1882 | into the D1 cache. Most modern caches have this property.</li><p> |
| 1883 | |
| 1884 | <li>Bit-selection hash function: the line(s) in the cache to which a memory |
| 1885 | block maps is chosen by the middle bits M--(M+N-1) of the byte address, |
| 1886 | where: |
| 1887 | <ul> |
| 1888 | <li> line size = 2^M bytes </li> |
| 1889 | <li>(cache size / line size) = 2^N lines</li> |
| 1890 | </ul> </li><p> |
| 1891 | |
| 1892 | <li>Inclusive L2 cache: the L2 cache replicates all the entries of the L1 |
| 1893 | cache. This is standard on Pentium chips, but AMD Athlons use an |
| 1894 | exclusive L2 cache that only holds blocks evicted from L1.</li><p> |
| 1895 | </ul> |
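As a rough illustration of the bit-selection rule above, the following Python sketch extracts bits M..(M+N-1) of a byte address, taking M and N exactly as defined in the list. The names <code>log2_exact</code> and <code>bit_select</code> are made up for this example and are not part of Valgrind; the simulators generated by vg_cachegen may differ in detail (for instance in how associativity enters the calculation).<p>

```python
def log2_exact(n):
    """Return k such that 2**k == n; reject values that are not powers of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def bit_select(addr, cache_size, line_size):
    """Split a byte address into (index, tag) by plain bit selection.

    Hypothetical helper: follows the definitions in the text,
    line size = 2^M and cache size / line size = 2^N.
    """
    m = log2_exact(line_size)                # low M bits: offset within the line
    n = log2_exact(cache_size // line_size)  # next N bits: the selected index
    index = (addr >> m) & ((1 << n) - 1)     # bits M..(M+N-1)
    tag = addr >> (m + n)                    # everything above the index bits
    return index, tag
```

For a 65536-byte cache with 64-byte lines (M = 6, N = 10), <code>bit_select(0x8048f2b, 65536, 64)</code> returns <code>(572, 2052)</code>, and any other address in the same 64-byte line, such as <code>0x8048f00</code>, gives the same pair.<p>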
| 1896 | |
| 1897 | Other noteworthy behaviour: |
| 1898 | |
| 1899 | <ul> |
| 1900 | <li>References that straddle two cache lines are treated as follows:</li> |
| 1901 | <ul> |
| 1902 | <li>If both blocks hit --> counted as one hit</li> |
| 1903 | <li>If one block hits, the other misses --> counted as one miss</li> |
| 1904 | <li>If both blocks miss --> counted as one miss (not two)</li> |
| 1905 | </ul><p> |
| 1906 | |
| 1907 | <li>Instructions that modify a memory location (eg. <code>inc</code> and |
| 1908 | <code>dec</code>) are counted as doing just a read, ie. a single data |
| 1909 | reference. This may seem strange, but since the write can never cause a |
| 1910 | miss (the read guarantees the block is in the cache) it's not very |
| 1911 | interesting.<p> |
| 1912 | |
| 1913 | Thus it measures not the number of times the data cache is accessed, but |
| 1914 | the number of times a data cache miss could occur.<p> |
| 1915 | </li> |
| 1916 | </ul> |
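The counting rules above can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the class <code>TinyCache</code> is hypothetical, it tracks which lines are resident but never evicts anything, and it is not how Valgrind's simulators are written. It shows two things: a reference that straddles two lines counts as at most one miss, and a missed block is brought into the cache whether the reference was a read or a write.<p>

```python
class TinyCache:
    """Toy model: remembers resident line numbers; eviction is not modelled."""

    def __init__(self, line_size=64):
        self.line_size = line_size
        self.resident = set()   # line numbers currently held
        self.refs = 0
        self.misses = 0

    def reference(self, addr, size):
        """Record one data reference of `size` bytes; return True on a miss."""
        first = addr // self.line_size
        last = (addr + size - 1) // self.line_size
        blocks = range(first, last + 1)
        # One miss at most, even if several touched blocks are absent.
        missed = any(b not in self.resident for b in blocks)
        # Allocate every touched block (reads and writes alike).
        self.resident.update(blocks)
        self.refs += 1
        if missed:
            self.misses += 1
        return missed
```

With 64-byte lines, an 8-byte reference at address 60 straddles lines 0 and 1 and is counted as a single miss; repeating it is a hit. A later 8-byte reference at address 124 touches line 1 (resident) and line 2 (absent), which again counts as one miss.<p>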
| 1917 | |
| 1918 | If you are interested in simulating a cache with different properties, it is |
| 1919 | not particularly hard to write your own cache simulator, or to modify existing |
| 1920 | ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and |
| 1921 | <code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who |
| 1922 | does. |
| 1923 | |
| 1924 | |
| 1925 | <a name="profile"></a> |
| 1926 | <h3>7.5 Profiling programs</h3> |
| 1927 | Cache profiling is enabled by using the <code>--cachesim=yes</code> option to |
| 1928 | Valgrind. This automatically turns off Valgrind's memory checking functions, |
| 1929 | since the cache simulation is slow enough already, and you probably don't want |
| 1930 | to do both at once.<p> |
| 1931 | |
| 1932 | To gather cache profiling information about the program <code>ls -l</code>, type: |
| 1933 | |
| 1934 | <blockquote><code>valgrind --cachesim=yes ls -l</code></blockquote> |
| 1935 | |
| 1936 | The program will execute (slowly). Upon completion, summary statistics |
| 1937 | that look like this will be printed: |
| 1938 | |
| 1939 | <pre> |
| 1940 | ==31751== I refs: 27,742,716 |
| 1941 | ==31751== I1 misses: 276 |
| 1942 | ==31751== L2 misses: 275 |
| 1943 | ==31751== I1 miss rate: 0.0% |
| 1944 | ==31751== L2i miss rate: 0.0% |
| 1945 | ==31751== |
| 1946 | ==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr) |
| 1947 | ==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr) |
| 1948 | ==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr) |
| 1949 | ==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%) |
| 1950 | ==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%) |
| 1951 | ==31751== |
| 1952 | ==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr) |
| 1953 | ==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%) |
| 1954 | </pre> |
| 1955 | |
| 1956 | Cache accesses for instruction fetches are summarised first, giving the |
| 1957 | number of fetches made (this is the number of instructions executed, which |
| 1958 | can be useful to know in its own right), the number of I1 misses, and the |
| 1959 | number of L2 instruction (<code>L2i</code>) misses.<p> |
| 1960 | |
| 1961 | Cache accesses for data follow. The information is similar to that of the |
| 1962 | instruction fetches, except that the values are also shown split between reads |
| 1963 | and writes (note each row's <code>rd</code> and <code>wr</code> values add up |
| 1964 | to the row's total).<p> |
| 1965 | |
| 1966 | Combined instruction and data figures for the L2 cache follow that.<p> |
| 1967 | |
| 1968 | |
| 1969 | <h3>7.6 Output file</h3> |
| 1970 | As well as printing summary information, Valgrind also writes line-by-line |
| 1971 | cache profiling information to a file named <code>cachegrind.out</code>. This |
| 1972 | file is human-readable, but is best interpreted by the accompanying program |
| 1973 | vg_annotate, described in the next section.<p> |
| 1974 | |
| 1975 | Things to note about the <code>cachegrind.out</code> file: |
| 1976 | <ul> |
| 1977 | <li>It is written every time <code>valgrind --cachesim=yes</code> is run; it |
| 1978 | will automatically overwrite any existing <code>cachegrind.out</code> in |
| 1979 | the current directory.</li> |
| 1980 | <li>It can be quite large: <code>ls -l</code> generates a file of about |
| 1981 | 350KB; browsing a few files and web pages with Konqueror generates a file |
| 1982 | of around 10MB.</li> |
| 1983 | </ul> |
| 1984 | |
| 1985 | |
| 1986 | <a name="annotate"></a> |
| 1987 | <h3>7.7 Annotating C/C++ programs</h3> |
| 1988 | Before using vg_annotate, it is worth widening your window to be at least |
| 1989 | 120-characters wide if possible, as the output lines can be quite long.<p> |
| 1990 | |
| 1991 | To get a function-by-function summary, run <code>vg_annotate</code> in a |
| 1992 | directory containing a <code>cachegrind.out</code> file. The output looks like |
| 1993 | this: |
| 1994 | |
| 1995 | <pre> |
| 1996 | -------------------------------------------------------------------------------- |
| 1997 | I1 cache: 65536 B, 64 B, 2-way associative |
| 1998 | D1 cache: 65536 B, 64 B, 2-way associative |
| 1999 | L2 cache: 262144 B, 64 B, 8-way associative |
| 2000 | Command: concord vg_to_ucode.c |
| 2001 | Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2002 | Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2003 | Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2004 | Threshold: 99% |
| 2005 | Chosen for annotation: |
| 2006 | Auto-annotation: on |
| 2007 | |
| 2008 | -------------------------------------------------------------------------------- |
| 2009 | Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2010 | -------------------------------------------------------------------------------- |
| 2011 | 27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS |
| 2012 | |
| 2013 | -------------------------------------------------------------------------------- |
| 2014 | Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function |
| 2015 | -------------------------------------------------------------------------------- |
| 2016 | 8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc |
| 2017 | 5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word |
| 2018 | 2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp |
| 2019 | 2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash |
| 2020 | 2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower |
| 2021 | 1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert |
| 2022 | 897,991 51 51 897,831 95 30 62 1 1 ???:??? |
| 2023 | 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile |
| 2024 | 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile |
| 2025 | 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc |
| 2026 | 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing |
| 2027 | 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER |
| 2028 | 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table |
| 2029 | 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create |
| 2030 | 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0 |
| 2031 | 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0 |
| 2032 | 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node |
| 2033 | 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue |
| 2034 | </pre> |
| 2035 | |
| 2036 | First up is a summary of the annotation options: |
| 2037 | |
| 2038 | <ul> |
| 2039 | <li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the |
| 2040 | configuration with which these results were obtained.</li><p> |
| 2041 | |
| 2042 | <li>Command: the command line invocation of the program under |
| 2043 | examination.</li><p> |
| 2044 | |
| 2045 | <li>Events recorded: event abbreviations are:<p> |
| 2046 | <ul> |
| 2047 | <li><code>Ir </code>: I cache reads (ie. instructions executed)</li> |
| 2048 | <li><code>I1mr</code>: I1 cache read misses</li> |
| 2049 | <li><code>I2mr</code>: L2 cache instruction read misses</li> |
| 2050 | <li><code>Dr </code>: D cache reads (ie. memory reads)</li> |
| 2051 | <li><code>D1mr</code>: D1 cache read misses</li> |
| 2052 | <li><code>D2mr</code>: L2 cache data read misses</li> |
| 2053 | <li><code>Dw </code>: D cache writes (ie. memory writes)</li> |
| 2054 | <li><code>D1mw</code>: D1 cache write misses</li> |
| 2055 | <li><code>D2mw</code>: L2 cache data write misses</li> |
| 2056 | </ul><p> |
| 2057 | Note that the total number of D1 misses is given by <code>D1mr</code> + |
| 2058 | <code>D1mw</code>, and that the total number of L2 misses is given by |
| 2059 | <code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p> |
| 2060 | |
| 2061 | <li>Events shown: the events shown (a subset of events gathered). This can |
| 2062 | be adjusted with the <code>--show</code> option.</li><p> |
| 2063 | |
| 2064 | <li>Event sort order: the sort order in which functions are shown. For |
| 2065 | example, in this case the functions are sorted from highest |
| 2066 | <code>Ir</code> counts to lowest. If two functions have identical |
| 2067 | <code>Ir</code> counts, they will then be sorted by <code>I1mr</code> |
| 2068 | counts, and so on. This order can be adjusted with the |
| 2069 | <code>--sort</code> option.<p> |
| 2070 | |
| 2071 | Note that this dictates the order the functions appear. It is <b>not</b> |
| 2072 | the order in which the columns appear; that is dictated by the "events |
| 2073 | shown" line (and can be changed with the <code>--show</code> option). |
| 2074 | </li><p> |
| 2075 | |
| 2076 | <li>Threshold: vg_annotate by default omits functions that cause very low |
| 2077 | numbers of misses to avoid drowning you in information. In this case, |
| 2078 | vg_annotate shows summaries for the functions that account for 99% of the |
| 2079 | <code>Ir</code> counts; <code>Ir</code> is chosen as the threshold event |
| 2080 | since it is the primary sort event. The threshold can be adjusted with |
| 2081 | the <code>--threshold</code> option.</li><p> |
| 2082 | |
| 2083 | <li>Chosen for annotation: names of files specified manually for annotation; |
| 2084 | in this case none.</li><p> |
| 2085 | |
| 2086 | <li>Auto-annotation: whether auto-annotation was requested via the |
| 2087 | <code>--auto=yes</code> option. In this case no.</li><p> |
| 2088 | </ul> |
| 2089 | |
| 2090 | Then follow summary statistics for the whole program. These are similar |
| 2091 | to the summary provided when running <code>valgrind --cachesim=yes</code>.<p> |
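To make the relationship between the two summaries concrete, the Python sketch below recomputes some of the figures printed by <code>valgrind --cachesim=yes</code> from the event counts in the PROGRAM TOTALS row of the example output. The dictionary and the function name <code>summarise</code> are illustrative only; the numbers are copied from the example.<p>

```python
# Event counts from the PROGRAM TOTALS row of the example output.
totals = {
    "Ir": 27_742_716, "I1mr": 276, "I2mr": 275,
    "Dr": 10_955_517, "D1mr": 21_905, "D2mr": 3_987,
    "Dw": 4_474_773, "D1mw": 19_280, "D2mw": 19_098,
}


def summarise(t):
    """Derive the combined figures from the per-event counts."""
    return {
        "D refs": t["Dr"] + t["Dw"],                      # reads + writes
        "D1 misses": t["D1mr"] + t["D1mw"],               # D1 read + write misses
        "L2d misses": t["D2mr"] + t["D2mw"],              # L2 misses caused by data
        "L2 misses": t["I2mr"] + t["D2mr"] + t["D2mw"],   # instructions + data
    }
```

Calling <code>summarise(totals)</code> gives 15,430,290 D refs, 41,185 D1 misses, 23,085 L2 data misses and 23,360 L2 misses overall, matching the summary printed at the end of the profiling run.<p>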
| 2092 | |
| 2093 | Then follow function-by-function statistics. Each function is identified by a |
| 2094 | <code>file_name:function_name</code> pair. If a column contains only a |
| 2095 | `.' it means the function never performs that event (eg. the third row shows |
| 2096 | that <code>strcmp()</code> contains no instructions that write to memory). The |
| 2097 | name <code>???</code> is used if the file name and/or function name could |
| 2098 | not be determined from debugging information. (If most of the entries have the |
| 2099 | form <code>???:???</code> the program probably wasn't compiled with |
| 2100 | <code>-g</code>.)<p> |
| 2101 | |
| 2102 | It is worth noting that functions will come from three types of source files: |
| 2103 | <ol> |
| 2104 | <li> From the profiled program (<code>concord.c</code> in this example).</li> |
| 2105 | <li>From libraries (eg. <code>getc.c</code>)</li> |
| 2106 | <li>From Valgrind's implementation of some libc functions (eg. |
| 2107 | <code>vg_clientmalloc.c:malloc</code>). These are recognisable because |
| 2108 | the filename begins with <code>vg_</code>, and is probably one of |
| 2109 | <code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or |
| 2110 | <code>vg_mylibc.c</code>. |
| 2111 | </li> |
| 2112 | </ol> |
| 2113 | |
| 2114 | There are two ways to annotate source files -- by choosing them manually, or |
| 2115 | with the <code>--auto=yes</code> option. To do it manually, just |
| 2116 | specify the filenames as arguments to vg_annotate. For example, the output from |
| 2117 | running <code>vg_annotate concord.c</code> for our example produces the same |
| 2118 | output as above followed by an annotated version of <code>concord.c</code>, a |
| 2119 | section of which looks like: |
| 2120 | |
| 2121 | <pre> |
| 2122 | -------------------------------------------------------------------------------- |
| 2123 | -- User-annotated source: concord.c |
| 2124 | -------------------------------------------------------------------------------- |
| 2125 | Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2126 | |
| 2127 | [snip] |
| 2128 | |
| 2129 | . . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[]) |
| 2130 | 3 1 1 . . . 1 0 0 { |
| 2131 | . . . . . . . . . FILE *file_ptr; |
| 2132 | . . . . . . . . . Word_Info *data; |
| 2133 | 1 0 0 . . . 1 1 1 int line = 1, i; |
| 2134 | . . . . . . . . . |
| 2135 | 5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info)); |
| 2136 | . . . . . . . . . |
| 2137 | 4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++) |
| 2138 | 3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL; |
| 2139 | . . . . . . . . . |
| 2140 | . . . . . . . . . /* Open file, check it. */ |
| 2141 | 6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r"); |
| 2142 | 2 0 0 1 0 0 . . . if (!(file_ptr)) { |
| 2143 | . . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name); |
| 2144 | 1 1 1 . . . . . . exit(EXIT_FAILURE); |
| 2145 | . . . . . . . . . } |
| 2146 | . . . . . . . . . |
| 2147 | 165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF) |
| 2148 | 146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table); |
| 2149 | . . . . . . . . . |
| 2150 | 4 0 0 1 0 0 2 0 0 free(data); |
| 2151 | 4 0 0 1 0 0 2 0 0 fclose(file_ptr); |
| 2152 | 3 0 0 2 0 0 . . . } |
| 2153 | </pre> |
| 2154 | |
| 2155 | (Although column widths are automatically minimised, a wide terminal is clearly |
| 2156 | useful.)<p> |
| 2157 | |
| 2158 | Each source file is clearly marked (<code>User-annotated source</code>) as |
| 2159 | having been chosen manually for annotation. If the file was found in one of |
| 2160 | the directories specified with the <code>-I</code>/<code>--include</code> |
| 2161 | option, the directory and file are both given.<p> |
| 2162 | |
| 2163 | Each line is annotated with its event counts. Events not applicable for a line |
| 2164 | are represented by a `.'; this is useful for distinguishing between an event |
| 2165 | which cannot happen, and one which can but did not.<p> |
| 2166 | |
| 2167 | Sometimes only a small section of a source file is executed. To minimise |
| 2168 | uninteresting output, Valgrind only shows annotated lines and lines within a |
| 2169 | small distance of annotated lines. Gaps are marked with the line numbers so |
| 2170 | you know which part of a file the shown code comes from, eg: |
| 2171 | |
| 2172 | <pre> |
| 2173 | (figures and code for line 704) |
| 2174 | -- line 704 ---------------------------------------- |
| 2175 | -- line 878 ---------------------------------------- |
| 2176 | (figures and code for line 878) |
| 2177 | </pre> |
| 2178 | |
| 2179 | The amount of context to show around annotated lines is controlled by the |
| 2180 | <code>--context</code> option.<p> |
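One way to picture the effect of the context setting is the following Python sketch, which works out which line ranges would be shown for a given set of annotated lines. It is an assumption about the general idea, not vg_annotate's actual code; the function name <code>visible_ranges</code> and its parameters are invented for this example.<p>

```python
def visible_ranges(annotated, context, last_line):
    """Return sorted, disjoint (first, last) line ranges to display.

    Each annotated line contributes a window of `context` lines on either
    side; windows that overlap or touch are merged into one range.
    """
    ranges = []
    for line in sorted(annotated):
        lo = max(1, line - context)          # clamp to the start of the file
        hi = min(last_line, line + context)  # clamp to the end of the file
        if ranges and lo <= ranges[-1][1] + 1:
            ranges[-1][1] = max(ranges[-1][1], hi)  # extend the previous range
        else:
            ranges.append([lo, hi])                 # start a new range
    return [tuple(r) for r in ranges]
```

For example, <code>visible_ranges({700, 704, 878}, 2, 1000)</code> yields <code>[(698, 706), (876, 880)]</code>: the windows around lines 700 and 704 merge, and everything between lines 706 and 876 would be skipped with a gap marker.<p>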
| 2181 | |
| 2182 | To get automatic annotation, run <code>vg_annotate --auto=yes</code>. |
| 2183 | vg_annotate will automatically annotate every source file it can find that is |
| 2184 | mentioned in the function-by-function summary. Therefore, the files chosen for |
| 2185 | auto-annotation are affected by the <code>--sort</code> and |
| 2186 | <code>--threshold</code> options. Each source file is clearly marked |
| 2187 | (<code>Auto-annotated source</code>) as being chosen automatically. Any files |
| 2188 | that could not be found are mentioned at the end of the output, eg: |
| 2189 | |
| 2190 | <pre> |
| 2191 | -------------------------------------------------------------------------------- |
| 2192 | The following files chosen for auto-annotation could not be found: |
| 2193 | -------------------------------------------------------------------------------- |
| 2194 | getc.c |
| 2195 | ctype.c |
| 2196 | ../sysdeps/generic/lockfile.c |
| 2197 | </pre> |
| 2198 | |
| 2199 | This is quite common for library files, since libraries are usually compiled |
| 2200 | with debugging information, but the source files are often not present on a |
| 2201 | system. If a file is chosen for annotation <b>both</b> manually and |
| 2202 | automatically, it is marked as <code>User-annotated source</code>.<p> |
| 2203 | |
| 2204 | Use the <code>-I/--include</code> option to tell Valgrind where to look for |
| 2205 | source files if the filenames found from the debugging information aren't |
| 2206 | specific enough.<p> |
| 2207 | |
| 2208 | Beware that vg_annotate can take some time to digest large |
| 2209 | <code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that |
| 2210 | auto-annotation can produce a lot of output if your program is large! |
| 2211 | |
| 2212 | |
| 2213 | <h3>7.8 Annotating assembler programs</h3> |
| 2214 | Valgrind can annotate assembler programs too, or annotate the assembler |
| 2215 | generated for your C program. Sometimes this is useful for understanding what |
| 2216 | is really happening when an interesting line of C code is translated into |
| 2217 | multiple instructions.<p> |
| 2218 | |
| 2219 | To do this, you just need to assemble your <code>.s</code> files with |
| 2220 | assembler-level debug information. gcc doesn't do this, but you can use GNU as |
| 2221 | with the <code>--gstabs</code> option to generate object files with this |
| 2222 | information, eg: |
| 2223 | |
| 2224 | <blockquote><code>as --gstabs foo.s</code></blockquote> |
| 2225 | |
| 2226 | You can then profile and annotate source files in the same way as for C/C++ |
| 2227 | programs. |
| 2228 | |
| 2229 | |
| 2230 | <h3>7.9 vg_annotate options</h3> |
| 2231 | <ul> |
| 2232 | <li><code>-h, --help</code></li><p> |
| 2233 | <li><code>-v, --version</code><p> |
| 2234 | |
| 2235 | Help and version, as usual.</li> |
| 2236 | |
| 2237 | <li><code>--sort=A,B,C</code> [default: order in |
| 2238 | <code>cachegrind.out</code>]<p> |
| 2239 | Specifies the events upon which the sorting of the function-by-function |
| 2240 | entries will be based. Useful if you want to concentrate on eg. I cache |
| 2241 | misses (<code>--sort=I1mr,I2mr</code>), or D cache misses |
| 2242 | (<code>--sort=D1mr,D2mr</code>), or L2 misses |
| 2243 | (<code>--sort=D2mr,I2mr</code>).</li><p> |
| 2244 | |
| 2245 | <li><code>--show=A,B,C</code> [default: all, using order in |
| 2246 | <code>cachegrind.out</code>]<p> |
| 2247 | Specifies which events to show (and the column order). Default is to use |
| 2248 | all present in the <code>cachegrind.out</code> file (and use the order in |
| 2249 | the file).</li><p> |
| 2250 | |
| 2251 | <li><code>--threshold=X</code> [default: 99%] <p> |
| 2252 | Sets the threshold for the function-by-function summary. Functions are |
| 2253 | shown that account for more than X% of all the primary sort events. If |
| 2254 | auto-annotating, also affects which files are annotated.</li><p> |
| 2255 | |
| 2256 | <li><code>--auto=no</code> [default]<br> |
| 2257 | <code>--auto=yes</code> <p> |
| 2258 | When enabled, automatically annotates every file that is mentioned in the |
| 2259 | function-by-function summary that can be found. Also gives a list of |
| 2260 | those that couldn't be found.</li><p> |
| 2261 | |
| 2262 | <li><code>--context=N</code> [default: 8]<p> |
| 2263 | Print N lines of context before and after each annotated line. Avoids |
| 2264 | printing large sections of source files that were not executed. Use a |
| 2265 | large number (eg. 10,000) to show all source lines. |
| 2266 | </li><p> |
| 2267 | |
| 2268 | <li><code>-I=<dir>, --include=<dir></code> |
| 2269 | [default: empty string]<p> |
| 2270 | Adds a directory to the list in which to search for files. Multiple |
| 2271 | -I/--include options can be given to add multiple directories.</li> |
| 2272 | </ul> |
| 2273 | |
| 2274 | |
| 2275 | <h3>7.10 Warnings</h3> |
| 2276 | There are a couple of situations in which vg_annotate issues warnings. |
| 2277 | |
| 2278 | <ul> |
| 2279 | <li>If a source file is more recent than the <code>cachegrind.out</code> |
| 2280 | file. This is because the information in <code>cachegrind.out</code> is |
| 2281 | only recorded with line numbers, so if the line numbers change at all in |
| 2282 | the source (eg. lines added, deleted, swapped), any annotations will be |
| 2283 | incorrect.</li><p> |
| 2284 | |
| 2285 | <li>If information is recorded about line numbers past the end of a file. |
| 2286 | This can be caused by the above problem, ie. shortening the source file |
| 2287 | while using an old <code>cachegrind.out</code> file. If this happens, |
| 2288 | the figures for the bogus lines are printed anyway (clearly marked as |
| 2289 | bogus) in case they are important.</li><p> |
| 2290 | </ul> |
| 2291 | |
| 2292 | |
| 2293 | <h3>7.10 Things to watch out for</h3> |
| 2294 | Some odd things that can occur during annotation: |
| 2295 | |
| 2296 | <ul> |
| 2297 | <li>If annotating at the assembler level, you might see something like this: |
| 2298 | |
| 2299 | <pre> |
| 2300 | 1 0 0 . . . . . . leal -12(%ebp),%eax |
| 2301 | 1 0 0 . . . 1 0 0 movl %eax,84(%ebx) |
| 2302 | 2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp) |
| 2303 | . . . . . . . . . .align 4,0x90 |
| 2304 | 1 0 0 . . . . . . movl $.LnrB,%eax |
| 2305 | 1 0 0 . . . 1 0 0 movl %eax,-16(%ebp) |
| 2306 | </pre> |
| 2307 | |
| 2308 | How can the third instruction be executed twice when the others are |
| 2309 | executed only once? As it turns out, it isn't. Here's a dump of the |
| 2310 | executable, from objdump: |
| 2311 | |
| 2312 | <pre> |
| 2313 | 8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax |
| 2314 | 8048f28: 89 43 54 mov %eax,0x54(%ebx) |
| 2315 | 8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp) |
| 2316 | 8048f32: 89 f6 mov %esi,%esi |
| 2317 | 8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax |
| 2318 | 8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp) |
| 2319 | </pre> |
| 2320 | |
| 2321 | Notice the extra <code>mov %esi,%esi</code> instruction. Where did this |
| 2322 | come from? The GNU assembler inserted it to serve as the two bytes of |
| 2323 | padding needed to align the <code>movl $.LnrB,%eax</code> instruction on |
| 2324 | a four-byte boundary, but pretended it didn't exist when adding debug |
| 2325 | information. Thus when Valgrind reads the debug info it thinks that the |
| 2326 | <code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address |
| 2327 | range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the |
| 2328 | <code>mov %esi,%esi</code> to it.<p> |
| 2329 | </li> |
| 2330 | |
| 2331 | <li> |
| 2332 | Inlined functions can cause strange results in the function-by-function |
| 2333 | summary. If a function <code>inline_me()</code> is defined in |
| 2334 | <code>foo.h</code> and inlined in the functions <code>f1()</code>, |
| 2335 | <code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will |
| 2336 | not be a <code>foo.h:inline_me()</code> function entry. Instead, there |
| 2337 | will be separate function entries for each inlining site, ie. |
| 2338 | <code>foo.h:f1()</code>, <code>foo.h:f2()</code> and |
| 2339 | <code>foo.h:f3()</code>. To find the total counts for |
| 2340 | <code>foo.h:inline_me()</code>, add up the counts from each entry.<p> |
| 2341 | |
| 2342 | The reason for this is that although the debug info output by gcc |
| 2343 | indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it |
| 2344 | doesn't indicate the name of the function in <code>foo.h</code>, so |
| 2345 | Valgrind keeps using the old one.</li><p> |
| 2346 | |
| 2347 | <li> |
| 2348 | Sometimes, the same filename might be represented with a relative name |
| 2349 | and with an absolute name in different parts of the debug info, eg: |
| 2350 | <code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this |
| 2351 | case, if you use auto-annotation, the file will be annotated twice with |
| 2352 | the counts split between the two.<p> |
| 2353 | </li> |
| 2354 | </ul> |
| 2355 | |
| 2356 | Note: stabs is not an easy format to read. If you come across bizarre |
| 2357 | annotations that look like they might be caused by a bug in the stabs reader, |
| 2358 | please let us know. |
| 2359 | |
| 2360 | |
| 2361 | <h3>7.11 Accuracy</h3> |
| 2362 | Valgrind's cache profiling has a number of shortcomings: |
| 2363 | |
| 2364 | <ul> |
| 2365 | <li>It doesn't account for kernel activity -- the effect of system calls on |
| 2366 | the cache contents is ignored.</li><p> |
| 2367 | |
| 2368 | <li>It doesn't account for other process activity (although this is probably |
| 2369 | desirable when considering a single program).</li><p> |
| 2370 | |
| 2371 | <li>It doesn't account for virtual-to-physical address mappings; hence the |
| 2372 | entire simulation is not a true representation of what's happening in the |
| 2373 | cache.</li><p> |
| 2374 | |
| 2375 | <li>It doesn't account for cache misses not visible at the instruction level, |
| 2376 | eg. those arising from TLB misses, or speculative execution.</li><p> |
| 2377 | </ul> |
| 2378 | |
| 2379 | Another thing worth noting is that results are very sensitive. Changing the |
| 2380 | size of the <code>valgrind.so</code> file, the size of the program being |
| 2381 | profiled, or even the length of its name can perturb the results. Variations |
| 2382 | will be small, but don't expect perfectly repeatable results if your program |
| 2383 | changes at all.<p> |
| 2384 | |
| 2385 | While these factors mean you shouldn't trust the results to be super-accurate, |
| 2386 | hopefully they should be close enough to be useful.<p> |
| 2387 | |
| 2388 | |
| 2389 | <h3>7.12 Todo</h3> |
| 2390 | <ul> |
| 2391 | <li>Use CPUID instruction to auto-identify cache configuration during |
| 2392 | installation. This would save the user from having to know their cache |
| 2393 | configuration and using vg_cachegen.</li><p> |
| 2394 | <li>Program start-up/shut-down calls a lot of functions that aren't |
| 2395 | interesting and just complicate the output. Would be nice to exclude |
| 2396 | these somehow.</li><p> |
| 2397 | </ul> |
| 2398 | <hr width="100%"> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 2399 | </body> |
| 2400 | </html> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame^] | 2401 | |