<html>
<head>
<style type="text/css">
body { background-color: #ffffff;
color: #000000;
font-family: Times, Helvetica, Arial;
font-size: 14pt}
h4 { margin-bottom: 0.3em}
code { color: #000000;
font-family: Courier;
font-size: 13pt }
pre { color: #000000;
font-family: Courier;
font-size: 13pt }
a:link { color: #0000C0;
text-decoration: none; }
a:visited { color: #0000C0;
text-decoration: none; }
a:active { color: #0000C0;
text-decoration: none; }
</style>
</head>

<body bgcolor="#ffffff">

<a name="title"> </a>
<h1 align=center>Valgrind, snapshot 20020501</h1>
<center>This manual was substantially updated on 20020501</center>
<p>

<center>
<a href="mailto:jseward@acm.org">jseward@acm.org</a><br>
Copyright © 2000-2002 Julian Seward
<p>
Valgrind is licensed under the GNU General Public License,
version 2<br>
An open-source tool for finding memory-management problems in
Linux-x86 executables.
</center>

<p>
| 42 | |
| 43 | <hr width="100%"> |
| 44 | <a name="contents"></a> |
| 45 | <h2>Contents of this manual</h2> |
| 46 | |
| 47 | <h4>1 <a href="#intro">Introduction</a></h4> |
| 48 | 1.1 <a href="#whatfor">What Valgrind is for</a><br> |
| 49 | 1.2 <a href="#whatdoes">What it does with your program</a> |
| 50 | |
| 51 | <h4>2 <a href="#howtouse">How to use it, and how to make sense |
| 52 | of the results</a></h4> |
| 53 | 2.1 <a href="#starta">Getting started</a><br> |
| 54 | 2.2 <a href="#comment">The commentary</a><br> |
| 55 | 2.3 <a href="#report">Reporting of errors</a><br> |
| 56 | 2.4 <a href="#suppress">Suppressing errors</a><br> |
| 57 | 2.5 <a href="#flags">Command-line flags</a><br> |
2.6 <a href="#errormsgs">Explanation of error messages</a><br>
| 59 | 2.7 <a href="#suppfiles">Writing suppressions files</a><br> |
2.8 <a href="#clientreq">The Client Request mechanism</a><br>
2.9 <a href="#pthreads">Support for POSIX pthreads</a><br>
2.10 <a href="#install">Building and installing</a><br>
2.11 <a href="#problems">If you have problems</a><br>

| 65 | <h4>3 <a href="#machine">Details of the checking machinery</a></h4> |
| 66 | 3.1 <a href="#vvalue">Valid-value (V) bits</a><br> |
| 67 | 3.2 <a href="#vaddress">Valid-address (A) bits</a><br> |
| 68 | 3.3 <a href="#together">Putting it all together</a><br> |
| 69 | 3.4 <a href="#signals">Signals</a><br> |
| 70 | 3.5 <a href="#leaks">Memory leak detection</a><br> |
| 71 | |
| 72 | <h4>4 <a href="#limits">Limitations</a></h4> |
| 73 | |
| 74 | <h4>5 <a href="#howitworks">How it works -- a rough overview</a></h4> |
| 75 | 5.1 <a href="#startb">Getting started</a><br> |
| 76 | 5.2 <a href="#engine">The translation/instrumentation engine</a><br> |
| 77 | 5.3 <a href="#track">Tracking the status of memory</a><br> |
| 78 | 5.4 <a href="#sys_calls">System calls</a><br> |
| 79 | 5.5 <a href="#sys_signals">Signals</a><br> |
| 80 | |
| 81 | <h4>6 <a href="#example">An example</a></h4> |
| 82 | |
<h4>7 <a href="#cache">Cache profiling</a></h4>

<h4>8 <a href="techdocs.html">The design and implementation of Valgrind</a></h4>

| 87 | <hr width="100%"> |
| 88 | |
| 89 | <a name="intro"></a> |
| 90 | <h2>1 Introduction</h2> |
| 91 | |
| 92 | <a name="whatfor"></a> |
| 93 | <h3>1.1 What Valgrind is for</h3> |
| 94 | |
| 95 | Valgrind is a tool to help you find memory-management problems in your |
| 96 | programs. When a program is run under Valgrind's supervision, all |
| 97 | reads and writes of memory are checked, and calls to |
| 98 | malloc/new/free/delete are intercepted. As a result, Valgrind can |
| 99 | detect problems such as: |
| 100 | <ul> |
| 101 | <li>Use of uninitialised memory</li> |
| 102 | <li>Reading/writing memory after it has been free'd</li> |
| 103 | <li>Reading/writing off the end of malloc'd blocks</li> |
| 104 | <li>Reading/writing inappropriate areas on the stack</li> |
| 105 | <li>Memory leaks -- where pointers to malloc'd blocks are lost forever</li> |
| 106 | </ul> |
| 107 | |
| 108 | Problems like these can be difficult to find by other means, often |
| 109 | lying undetected for long periods, then causing occasional, |
| 110 | difficult-to-diagnose crashes. |
| 111 | |
<p>
Valgrind is closely tied to details of the CPU and operating system
and, to a lesser extent, the compiler and basic C libraries. This makes
it difficult to port, so I have chosen at the outset to
concentrate on what I believe to be a widely used platform: Red Hat
Linux 7.2, on x86s. Valgrind uses the standard Unix
<code>./configure</code>, <code>make</code>, <code>make install</code>
mechanism, and I have attempted to ensure that it works on machines
with kernel 2.2 or 2.4 and glibc 2.1.X or 2.2.X. This should cover
the vast majority of modern Linux installations.


| 124 | <p> |
| 125 | Valgrind is licensed under the GNU General Public License, version |
| 126 | 2. Read the file LICENSE in the source distribution for details. |
| 127 | |
<a name="whatdoes"></a>
| 129 | <h3>1.2 What it does with your program</h3> |
| 130 | |
| 131 | Valgrind is designed to be as non-intrusive as possible. It works |
| 132 | directly with existing executables. You don't need to recompile, |
| 133 | relink, or otherwise modify, the program to be checked. Simply place |
| 134 | the word <code>valgrind</code> at the start of the command line |
| 135 | normally used to run the program. So, for example, if you want to run |
| 136 | the command <code>ls -l</code> on Valgrind, simply issue the |
| 137 | command: <code>valgrind ls -l</code>. |
| 138 | |
| 139 | <p>Valgrind takes control of your program before it starts. Debugging |
| 140 | information is read from the executable and associated libraries, so |
| 141 | that error messages can be phrased in terms of source code |
| 142 | locations. Your program is then run on a synthetic x86 CPU which |
| 143 | checks every memory access. All detected errors are written to a |
| 144 | log. When the program finishes, Valgrind searches for and reports on |
| 145 | leaked memory. |
| 146 | |
<p>You can run pretty much any dynamically linked ELF x86 executable
using Valgrind. Programs run 25 to 50 times slower, and take a lot
more memory, than they usually would. It works well enough to run
large programs. For example, the Konqueror web browser from the KDE
Desktop Environment, version 3.0, runs slowly but usably on Valgrind.

<p>Valgrind simulates every single instruction your program executes.
Because of this, it finds errors not only in your application but also
in all supporting dynamically-linked (<code>.so</code>-format)
libraries, including the GNU C library, the X client libraries, Qt (if
you work with KDE), and so on. That often includes libraries, for
example the GNU C library, which contain memory access violations, but
which you cannot or do not want to fix.

<p>Rather than swamping you with errors in which you are not
interested, Valgrind allows you to selectively suppress errors, by
recording them in a suppressions file which is read when Valgrind
starts up. The build mechanism attempts to select suppressions which
give reasonable behaviour for the libc and XFree86 versions detected
on your machine.


| 169 | <p><a href="#example">Section 6</a> shows an example of use. |
| 170 | <p> |
| 171 | <hr width="100%"> |
| 172 | |
| 173 | <a name="howtouse"></a> |
| 174 | <h2>2 How to use it, and how to make sense of the results</h2> |
| 175 | |
| 176 | <a name="starta"></a> |
| 177 | <h3>2.1 Getting started</h3> |
| 178 | |
| 179 | First off, consider whether it might be beneficial to recompile your |
| 180 | application and supporting libraries with optimisation disabled and |
| 181 | debugging info enabled (the <code>-g</code> flag). You don't have to |
| 182 | do this, but doing so helps Valgrind produce more accurate and less |
| 183 | confusing error reports. Chances are you're set up like this already, |
| 184 | if you intended to debug your program with GNU gdb, or some other |
debugger.

<p>
| 188 | A plausible compromise is to use <code>-g -O</code>. |
| 189 | Optimisation levels above <code>-O</code> have been observed, on very |
| 190 | rare occasions, to cause gcc to generate code which fools Valgrind's |
| 191 | error tracking machinery into wrongly reporting uninitialised value |
| 192 | errors. <code>-O</code> gets you the vast majority of the benefits of |
| 193 | higher optimisation levels anyway, so you don't lose much there. |
| 194 | |
| 195 | <p> |
| 196 | Note that as of 1 May 2002 Valgrind does not understand the DWARF |
| 197 | debugging format, which is unfortunate since the upcoming gcc-3.1 uses |
| 198 | it by default. Valgrind only knows about the older "stabs" format. |
| 199 | If you use gcc-3.1 or above, you can still ask for stabs-format debug |
| 200 | info by passing <code>-gstabs</code> to gcc. |
| 201 | |
<p>
Then just run your application, but place the word
<code>valgrind</code> in front of your usual command-line invocation.
Note that you should run the real (machine-code) executable here. If
your application is started by, for example, a shell or perl script,
you'll need to modify it to invoke Valgrind on the real executables.
Running such scripts directly under Valgrind will result in you
getting error reports pertaining to <code>/bin/sh</code>,
<code>/usr/bin/perl</code>, or whatever interpreter you're using.
This almost certainly isn't what you want and can be confusing.

| 213 | <a name="comment"></a> |
| 214 | <h3>2.2 The commentary</h3> |
| 215 | |
Valgrind writes a commentary, detailing error reports and other
significant events. The commentary goes to standard error (file
descriptor 2) by default. This may interfere with your program's own
use of that channel, so you can ask for it to be directed elsewhere
with the <code>--logfile-fd</code> flag.
| 220 | |
| 221 | <p>All lines in the commentary are of the following form:<br> |
| 222 | <pre> |
| 223 | ==12345== some-message-from-Valgrind |
| 224 | </pre> |
| 225 | <p>The <code>12345</code> is the process ID. This scheme makes it easy |
| 226 | to distinguish program output from Valgrind commentary, and also easy |
| 227 | to differentiate commentaries from different processes which have |
| 228 | become merged together, for whatever reason. |
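<p>As an illustration of why the <code>==pid==</code> prefix is convenient, here is a minimal Python sketch that separates merged output into per-process commentary and ordinary program output. The function name <code>split_commentary</code> and the regular expression are assumptions made for this example; they are not part of Valgrind.

```python
import re
from collections import defaultdict

# Commentary lines look like "==12345== some message"; anything else is
# treated as the program's own output.
COMMENTARY = re.compile(r"^==(\d+)== ?(.*)$")


def split_commentary(lines):
    """Return (commentary_by_pid, program_output) for an iterable of lines."""
    by_pid = defaultdict(list)
    program_output = []
    for line in lines:
        match = COMMENTARY.match(line)
        if match:
            # Group the message under the process ID taken from the prefix.
            by_pid[int(match.group(1))].append(match.group(2))
        else:
            program_output.append(line)
    return dict(by_pid), program_output
```

Lines from several processes can be fed in any order; each message ends up under the process ID found in its prefix.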
| 229 | |
| 230 | <p>By default, Valgrind writes only essential messages to the commentary, |
| 231 | so as to avoid flooding you with information of secondary importance. |
| 232 | If you want more information about what is happening, re-run, passing |
| 233 | the <code>-v</code> flag to Valgrind. |
| 234 | |
| 235 | |
| 236 | <a name="report"></a> |
| 237 | <h3>2.3 Reporting of errors</h3> |
| 238 | |
| 239 | When Valgrind detects something bad happening in the program, an error |
| 240 | message is written to the commentary. For example:<br> |
| 241 | <pre> |
| 242 | ==25832== Invalid read of size 4 |
| 243 | ==25832== at 0x8048724: BandMatrix::ReSize(int, int, int) (bogon.cpp:45) |
| 244 | ==25832== by 0x80487AF: main (bogon.cpp:66) |
| 245 | ==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129) |
| 246 | ==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon) |
| 247 | ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd |
| 248 | </pre> |
| 249 | |
| 250 | <p>This message says that the program did an illegal 4-byte read of |
| 251 | address 0xBFFFF74C, which, as far as it can tell, is not a valid stack |
| 252 | address, nor corresponds to any currently malloc'd or free'd blocks. |
| 253 | The read is happening at line 45 of <code>bogon.cpp</code>, called |
| 254 | from line 66 of the same file, etc. For errors associated with an |
| 255 | identified malloc'd/free'd block, for example reading free'd memory, |
| 256 | Valgrind reports not only the location where the error happened, but |
| 257 | also where the associated block was malloc'd/free'd. |
| 258 | |
| 259 | <p>Valgrind remembers all error reports. When an error is detected, |
| 260 | it is compared against old reports, to see if it is a duplicate. If |
| 261 | so, the error is noted, but no further commentary is emitted. This |
| 262 | avoids you being swamped with bazillions of duplicate error reports. |
| 263 | |
| 264 | <p>If you want to know how many times each error occurred, run with |
| 265 | the <code>-v</code> option. When execution finishes, all the reports |
| 266 | are printed out, along with, and sorted by, their occurrence counts. |
| 267 | This makes it easy to see which errors have occurred most frequently. |
| 268 | |
| 269 | <p>Errors are reported before the associated operation actually |
happens. For example, if your program decides to read from address
| 271 | zero, Valgrind will emit a message to this effect, and the program |
| 272 | will then duly die with a segmentation fault. |
| 273 | |
<p>In general, you should try to fix errors in the order that they
| 275 | are reported. Not doing so can be confusing. For example, a program |
| 276 | which copies uninitialised values to several memory locations, and |
| 277 | later uses them, will generate several error messages. The first such |
| 278 | error message may well give the most direct clue to the root cause of |
| 279 | the problem. |
| 280 | |
<p>The process of detecting duplicate errors is quite an expensive
one and can become a significant performance overhead if your program
generates huge quantities of errors. To avoid serious problems here,
Valgrind will simply stop collecting errors after 300 different errors
have been seen, or 30000 errors in total have been seen. In this
situation you might as well stop your program and fix it, because
Valgrind won't tell you anything else useful after this. Note that
the 300/30000 limits apply after suppressed errors are removed. These
limits are defined in <code>vg_include.h</code> and can be increased
| 290 | if necessary. |
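<p>The bookkeeping described in this section can be modelled roughly as follows. This is a sketch in Python under stated assumptions, not Valgrind's actual code: the class <code>ErrorLog</code> and its method names are hypothetical, errors are treated as duplicates when their kind and top three backtrace entries match (as noted for <code>--num-callers</code>), and the summary is sorted most-frequent-first, which is an arbitrary choice here.

```python
from collections import Counter

MAX_DISTINCT = 300    # limits quoted in the text above
MAX_TOTAL = 30000
FRAMES_COMPARED = 3   # duplicates are judged on the top three locations


class ErrorLog:
    """Hypothetical model of duplicate detection and the collection limits."""

    def __init__(self):
        self.counts = Counter()  # error key -> occurrence count
        self.total = 0
        self.collecting = True

    def record(self, kind, backtrace):
        """Return True if this error is new and should be printed."""
        if not self.collecting:
            return False
        key = (kind, tuple(backtrace[:FRAMES_COMPARED]))
        is_new = key not in self.counts
        self.counts[key] += 1
        self.total += 1
        # Stop collecting once either limit has been reached.
        if len(self.counts) >= MAX_DISTINCT or self.total >= MAX_TOTAL:
            self.collecting = False
        return is_new

    def summary(self):
        """Errors with their counts, sorted by count (assumed descending)."""
        return self.counts.most_common()
```

Only a new error is printed when it occurs; repeats just bump the count, which is what the <code>-v</code> summary reports at exit.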
| 291 | |
<a name="suppress"></a>
| 293 | <h3>2.4 Suppressing errors</h3> |
| 294 | |
Valgrind detects numerous problems in the base libraries, such as the
GNU C library, and the XFree86 client libraries, which come
pre-installed on your GNU/Linux system. You can't easily fix these,
but you don't want to see these errors (and yes, there are many!). So
Valgrind reads a list of errors to suppress at startup.
A default suppression file is cooked up by the
<code>./configure</code> script.

<p>You can modify and add to the suppressions file at your leisure,
or, better, write your own. Multiple suppression files are allowed.
This is useful if part of your project contains errors you can't or
don't want to fix, yet you don't want to continuously be reminded of
them.

| 309 | <p>Each error to be suppressed is described very specifically, to |
minimise the possibility that a suppression-directive inadvertently
| 311 | suppresses a bunch of similar errors which you did want to see. The |
| 312 | suppression mechanism is designed to allow precise yet flexible |
| 313 | specification of errors to suppress. |
| 314 | |
| 315 | <p>If you use the <code>-v</code> flag, at the end of execution, Valgrind |
| 316 | prints out one line for each used suppression, giving its name and the |
| 317 | number of times it got used. Here's the suppressions used by a run of |
| 318 | <code>ls -l</code>: |
| 319 | <pre> |
| 320 | --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getgrgid_r |
| 321 | --27579-- supp: 1 socketcall.connect(serv_addr)/__libc_connect/__nscd_getpwuid_r |
| 322 | --27579-- supp: 6 strrchr/_dl_map_object_from_fd/_dl_map_object |
| 323 | </pre> |
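<p>If you want to tally these lines from a saved log, something like the following Python sketch may help. It assumes the line layout shown in the sample above (a <code>--pid--</code> prefix, the word <code>supp:</code>, a count, then the suppression name); the function name <code>used_suppressions</code> is made up for this example.

```python
import re

# Matches lines such as "--27579-- supp: 6 strrchr/_dl_map_object_from_fd/..."
SUPP_LINE = re.compile(r"^--(\d+)-- supp:\s+(\d+)\s+(\S.*)$")


def used_suppressions(lines):
    """Map each suppression name to the number of times it was used."""
    counts = {}
    for line in lines:
        match = SUPP_LINE.match(line)
        if match:
            name = match.group(3)
            counts[name] = counts.get(name, 0) + int(match.group(2))
    return counts
```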
| 324 | |
| 325 | <a name="flags"></a> |
| 326 | <h3>2.5 Command-line flags</h3> |
| 327 | |
| 328 | You invoke Valgrind like this: |
| 329 | <pre> |
| 330 | valgrind [options-for-Valgrind] your-prog [options for your-prog] |
| 331 | </pre> |
| 332 | |
<p>Note that Valgrind also reads options from the environment variable
<code>$VALGRIND</code>, and processes them before the command-line
options.

<p>Valgrind's default settings succeed in giving reasonable behaviour
| 338 | in most cases. Available options, in no particular order, are as |
| 339 | follows: |
| 340 | <ul> |
| 341 | <li><code>--help</code></li><br> |
| 342 | |
| 343 | <li><code>--version</code><br> |
| 344 | <p>The usual deal.</li><br><p> |
| 345 | |
| 346 | <li><code>-v --verbose</code><br> |
| 347 | <p>Be more verbose. Gives extra information on various aspects |
| 348 | of your program, such as: the shared objects loaded, the |
| 349 | suppressions used, the progress of the instrumentation engine, |
| 350 | and warnings about unusual behaviour. |
| 351 | </li><br><p> |
| 352 | |
| 353 | <li><code>-q --quiet</code><br> |
| 354 | <p>Run silently, and only print error messages. Useful if you |
| 355 | are running regression tests or have some other automated test |
| 356 | machinery. |
| 357 | </li><br><p> |
| 358 | |
| 359 | <li><code>--demangle=no</code><br> |
| 360 | <code>--demangle=yes</code> [the default] |
| 361 | <p>Disable/enable automatic demangling (decoding) of C++ names. |
| 362 | Enabled by default. When enabled, Valgrind will attempt to |
| 363 | translate encoded C++ procedure names back to something |
| 364 | approaching the original. The demangler handles symbols mangled |
| 365 | by g++ versions 2.X and 3.X. |
| 366 | |
| 367 | <p>An important fact about demangling is that function |
| 368 | names mentioned in suppressions files should be in their mangled |
| 369 | form. Valgrind does not demangle function names when searching |
| 370 | for applicable suppressions, because to do otherwise would make |
| 371 | suppressions file contents dependent on the state of Valgrind's |
| 372 | demangling machinery, and would also be slow and pointless. |
| 373 | </li><br><p> |
| 374 | |
<li><code>--num-callers=&lt;number&gt;</code> [default=4]<br>
| 376 | <p>By default, Valgrind shows four levels of function call names |
| 377 | to help you identify program locations. You can change that |
| 378 | number with this option. This can help in determining the |
program's location in deeply-nested call chains. Note that errors
are grouped as duplicates using only the top three function locations (the
place in the current function, and that of its two immediate
callers). So this doesn't affect the total number of errors
reported.
| 384 | <p> |
| 385 | The maximum value for this is 50. Note that higher settings |
| 386 | will make Valgrind run a bit more slowly and take a bit more |
| 387 | memory, but can be useful when working with programs with |
| 388 | deeply-nested call chains. |
| 389 | </li><br><p> |
| 390 | |
| 391 | <li><code>--gdb-attach=no</code> [the default]<br> |
| 392 | <code>--gdb-attach=yes</code> |
| 393 | <p>When enabled, Valgrind will pause after every error shown, |
| 394 | and print the line |
| 395 | <br> |
| 396 | <code>---- Attach to GDB ? --- [Return/N/n/Y/y/C/c] ----</code> |
| 397 | <p> |
| 398 | Pressing <code>Ret</code>, or <code>N</code> <code>Ret</code> |
| 399 | or <code>n</code> <code>Ret</code>, causes Valgrind not to |
| 400 | start GDB for this error. |
| 401 | <p> |
| 402 | <code>Y</code> <code>Ret</code> |
| 403 | or <code>y</code> <code>Ret</code> causes Valgrind to |
| 404 | start GDB, for the program at this point. When you have |
| 405 | finished with GDB, quit from it, and the program will continue. |
| 406 | Trying to continue from inside GDB doesn't work. |
| 407 | <p> |
| 408 | <code>C</code> <code>Ret</code> |
| 409 | or <code>c</code> <code>Ret</code> causes Valgrind not to |
| 410 | start GDB, and not to ask again. |
| 411 | <p> |
<code>--gdb-attach=yes</code> conflicts with
<code>--trace-children=yes</code>. You can't use them together.
Valgrind refuses to start up in this situation. 1 May 2002:
this is a historical relic which could be easily fixed if it
gets in your way. Mail me and complain if this is a problem for
you. </li><br><p>

| 419 | <li><code>--partial-loads-ok=yes</code> [the default]<br> |
| 420 | <code>--partial-loads-ok=no</code> |
| 421 | <p>Controls how Valgrind handles word (4-byte) loads from |
addresses for which some bytes are addressable and others
| 423 | are not. When <code>yes</code> (the default), such loads |
| 424 | do not elicit an address error. Instead, the loaded V bytes |
| 425 | corresponding to the illegal addresses indicate undefined, and |
| 426 | those corresponding to legal addresses are loaded from shadow |
| 427 | memory, as usual. |
| 428 | <p> |
| 429 | When <code>no</code>, loads from partially |
| 430 | invalid addresses are treated the same as loads from completely |
| 431 | invalid addresses: an illegal-address error is issued, |
| 432 | and the resulting V bytes indicate valid data. |
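<p>The two behaviours can be summarised with a small model. This is only a Python sketch of the rules stated above: the function <code>load_word</code>, its parameters, and the string labels <code>"defined"</code> and <code>"undefined"</code> for V byte states are invented for illustration and say nothing about how Valgrind represents V bits internally.

```python
def load_word(addressable, shadow_v, partial_loads_ok=True):
    """Model a 4-byte load.

    addressable: four booleans, True where the byte may legally be accessed.
    shadow_v:    four V byte states ("defined"/"undefined") from shadow memory.
    Returns (address_error, v_bytes).
    """
    if all(addressable):
        # Fully legal load: no error, V bytes come straight from shadow memory.
        return False, list(shadow_v)
    if partial_loads_ok and any(addressable):
        # Partially legal load: no error, illegal bytes read as undefined.
        v_bytes = [state if ok else "undefined"
                   for ok, state in zip(addressable, shadow_v)]
        return False, v_bytes
    # Completely invalid, or partially invalid with partial_loads_ok=False:
    # report an address error and mark the result as valid data.
    return True, ["defined"] * 4
```

For example, a load whose last two bytes are unaddressable yields no error and two undefined bytes under <code>yes</code>, but an address error and four valid bytes under <code>no</code>.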
| 433 | </li><br><p> |
| 434 | |
| 435 | <li><code>--sloppy-malloc=no</code> [the default]<br> |
| 436 | <code>--sloppy-malloc=yes</code> |
| 437 | <p>When enabled, all requests for malloc/calloc are rounded up |
| 438 | to a whole number of machine words -- in other words, made |
| 439 | divisible by 4. For example, a request for 17 bytes of space |
| 440 | would result in a 20-byte area being made available. This works |
| 441 | around bugs in sloppy libraries which assume that they can |
| 442 | safely rely on malloc/calloc requests being rounded up in this |
| 443 | fashion. Without the workaround, these libraries tend to |
| 444 | generate large numbers of errors when they access the ends of |
these areas.
<p>
Valgrind snapshots dated 17 Feb 2002 and later are
cleverer about this problem, and you should no longer need to
use this flag. To put it bluntly, if you do need to use this
| 450 | flag, your program violates the ANSI C semantics defined for |
| 451 | <code>malloc</code> and <code>free</code>, even if it appears to |
| 452 | work correctly, and you should fix it, at least if you hope for |
| 453 | maximum portability. |
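<p>The rounding performed under <code>--sloppy-malloc=yes</code> is ordinary round-up-to-a-multiple arithmetic. A tiny Python sketch, with the hypothetical helper name <code>sloppy_size</code>:

```python
WORD_SIZE = 4  # bytes per machine word on x86


def sloppy_size(requested):
    """Round a malloc/calloc request up to a whole number of machine words."""
    return (requested + WORD_SIZE - 1) // WORD_SIZE * WORD_SIZE
```

With this, a request for 17 bytes becomes 20, while a request for 16 bytes is left unchanged.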
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 454 | </li><br><p> |
| 455 | |
<li><code>--trace-children=no</code> [the default]<br>
<code>--trace-children=yes</code>
<p>When enabled, Valgrind will trace into child processes. This
is confusing and usually not what you want, so is disabled by
default. As of 1 May 2002, tracing into a child process from a
parent which uses <code>libpthread.so</code> is probably broken
and is likely to cause problems. Please report any such
problems to me. </li><br><p>

<li><code>--freelist-vol=&lt;number&gt;</code> [default: 1000000]
| 466 | <p>When the client program releases memory using free (in C) or |
| 467 | delete (C++), that memory is not immediately made available for |
| 468 | re-allocation. Instead it is marked inaccessible and placed in |
| 469 | a queue of freed blocks. The purpose is to delay the point at |
| 470 | which freed-up memory comes back into circulation. This |
| 471 | increases the chance that Valgrind will be able to detect |
| 472 | invalid accesses to blocks for some significant period of time |
| 473 | after they have been freed. |
| 474 | <p> |
| 475 | This flag specifies the maximum total size, in bytes, of the |
| 476 | blocks in the queue. The default value is one million bytes. |
| 477 | Increasing this increases the total amount of memory used by |
| 478 | Valgrind but may detect invalid uses of freed blocks which would |
| 479 | otherwise go undetected.</li><br><p> |
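<p>The effect of <code>--freelist-vol</code> can be pictured with a small queue model. The Python sketch below is an illustration only: the class <code>FreedQueue</code> is hypothetical, and releasing the oldest blocks first once the volume limit is exceeded is an assumption suggested by the word "queue", not a statement about Valgrind's implementation.

```python
from collections import deque


class FreedQueue:
    """Hypothetical model of the queue of freed, inaccessible blocks."""

    def __init__(self, max_volume=1000000):
        self.max_volume = max_volume  # corresponds to --freelist-vol
        self.volume = 0               # total bytes currently held
        self.blocks = deque()         # (address, size), oldest first

    def free(self, address, size):
        """Hold a freed block; return blocks released for re-allocation."""
        self.blocks.append((address, size))
        self.volume += size
        released = []
        # Assumed policy: release the oldest blocks until under the limit.
        while self.volume > self.max_volume and self.blocks:
            old = self.blocks.popleft()
            self.volume -= old[1]
            released.append(old)
        return released

    def is_held(self, address):
        """True if an access to this address would hit a held freed block."""
        return any(start <= address < start + size
                   for start, size in self.blocks)
```

A larger limit keeps blocks in the queue for longer, so a late access to freed memory is more likely to be caught, at the cost of more memory.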
| 480 | |
<li><code>--logfile-fd=&lt;number&gt;</code> [default: 2, stderr]
| 482 | <p>Specifies the file descriptor on which Valgrind communicates |
| 483 | all of its messages. The default, 2, is the standard error |
| 484 | channel. This may interfere with the client's own use of |
| 485 | stderr. To dump Valgrind's commentary in a file without using |
| 486 | stderr, something like the following works well (sh/bash |
| 487 | syntax):<br> |
| 488 | <code> |
| 489 | valgrind --logfile-fd=9 my_prog 9> logfile</code><br> |
| 490 | That is: tell Valgrind to send all output to file descriptor 9, |
| 491 | and ask the shell to route file descriptor 9 to "logfile". |
| 492 | </li><br><p> |
| 493 | |
<li><code>--suppressions=&lt;filename&gt;</code>
[default: $PREFIX/lib/valgrind/default.supp]
<p>Specifies an extra
file from which to read descriptions of errors to suppress. You
may use as many extra suppressions files as you
like.</li><br><p>
| 500 | |
<li><code>--leak-check=no</code> [default]<br>
<code>--leak-check=yes</code>
<p>When enabled, search for memory leaks when the client program
finishes. A memory leak means a malloc'd block, which has not
yet been free'd, but to which no pointer can be found. Such a
block can never be free'd by the program, since no pointer to it
exists. Leak checking is disabled by default because it tends
to generate dozens of error messages. </li><br><p>

| 510 | <li><code>--show-reachable=no</code> [default]<br> |
<code>--show-reachable=yes</code>
<p>When disabled, the memory leak detector only shows blocks to
which it cannot find any pointer at all, or to which it can only
find a pointer into the middle. These blocks are prime candidates for
| 515 | memory leaks. When enabled, the leak detector also reports on |
| 516 | blocks which it could find a pointer to. Your program could, at |
| 517 | least in principle, have freed such blocks before exit. |
| 518 | Contrast this to blocks for which no pointer, or only an |
| 519 | interior pointer could be found: they are more likely to |
| 520 | indicate memory leaks, because you do not actually have a |
| 521 | pointer to the start of the block which you can hand to |
| 522 | <code>free</code>, even if you wanted to. </li><br><p> |
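<p>The distinction drawn by <code>--show-reachable</code> can be sketched as a classification over the blocks still allocated at exit. In the Python sketch below the set of pointer values is simply assumed to be given, and the names <code>classify_blocks</code>, <code>blocks_to_report</code> and the labels <code>"reachable"</code>, <code>"interior"</code> and <code>"lost"</code> are invented for illustration.

```python
def classify_blocks(blocks, pointers):
    """Classify each unfreed block by the kind of pointer found to it.

    blocks:   {start_address: size} for malloc'd, not-yet-free'd blocks.
    pointers: set of word values found in the program's memory.
    """
    result = {}
    for start, size in blocks.items():
        if start in pointers:
            result[start] = "reachable"   # pointer to the start of the block
        elif any(start < p < start + size for p in pointers):
            result[start] = "interior"    # only a pointer into the middle
        else:
            result[start] = "lost"        # no pointer at all
    return result


def blocks_to_report(classified, show_reachable=False):
    """Block addresses the leak report would list, in address order."""
    return sorted(address for address, kind in classified.items()
                  if show_reachable or kind != "reachable")
```

With <code>show_reachable=False</code> only the "interior" and "lost" blocks are listed, matching the default behaviour described above.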
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 523 | |
| 524 | <li><code>--leak-resolution=low</code> [default]<br> |
| 525 | <code>--leak-resolution=med</code> <br> |
| 526 | <code>--leak-resolution=high</code> |
| 527 | <p>When doing leak checking, determines how willing Valgrind is |
to consider different backtraces to be the same. When set to
<code>low</code>, the default, only the first two entries need
| 530 | match. When <code>med</code>, four entries have to match. When |
| 531 | <code>high</code>, all entries need to match. |
| 532 | <p> |
| 533 | For hardcore leak debugging, you probably want to use |
| 534 | <code>--leak-resolution=high</code> together with |
| 535 | <code>--num-callers=40</code> or some such large number. Note |
| 536 | however that this can give an overwhelming amount of |
| 537 | information, which is why the defaults are 4 callers and |
| 538 | low-resolution matching. |
| 539 | <p> |
| 540 | Note that the <code>--leak-resolution=</code> setting does not |
| 541 | affect Valgrind's ability to find leaks. It only changes how |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 542 | the results are presented. |
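<p>The matching rule for the three settings can be written down in a few lines. This Python sketch only restates the rule given above; the names <code>ENTRIES_COMPARED</code> and <code>same_backtrace</code> are hypothetical.

```python
# Number of leading backtrace entries that must match; None means all of them.
ENTRIES_COMPARED = {"low": 2, "med": 4, "high": None}


def same_backtrace(a, b, resolution="low"):
    """Decide whether two allocation backtraces count as the same."""
    n = ENTRIES_COMPARED[resolution]
    if n is None:
        return list(a) == list(b)
    return list(a[:n]) == list(b[:n])
```

Two backtraces that differ only in their third entry are merged at <code>low</code> but kept apart at <code>med</code> and <code>high</code>.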
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 543 | </li><br><p> |
| 544 | |
<li><code>--workaround-gcc296-bugs=no</code> [default]<br>
<code>--workaround-gcc296-bugs=yes</code> <p>When enabled,
Valgrind assumes that reads and writes some small distance below the
stack pointer <code>%esp</code> are due to bugs in gcc 2.96, and does
not report them. The "small distance" is 256 bytes by default.
Note that gcc 2.96 is the default compiler on some popular Linux
distributions (RedHat 7.X, Mandrake) and so you may well need to
use this flag. Do not use it if you do not have to, as it can
cause real errors to be overlooked. A better option is to use a
gcc/g++ which works properly; 2.95.3 seems to be a good choice.
<p>
Unfortunately (27 Feb 02) it looks like g++ 3.0.4 is similarly
buggy, so you may need to issue this flag if you use 3.0.4.
This was later (early Apr 02) confirmed as a scheduling bug
in g++-3.0.4.
</li><br><p>
| 561 | |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 562 | <li><code>--cachesim=no</code> [default]<br> |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 563 | <code>--cachesim=yes</code> <p>When enabled, turns off memory |
| 564 | checking, and turns on cache profiling. Cache profiling is |
sewardj | 3984b85 | 2002-05-12 03:00:17 +0000 | [diff] [blame] | 565 | described in detail in <a href="#cache">Section 7</a>. |
| 566 | </li><br><p> |
| 567 | |
sewardj | 8d365b5 | 2002-05-12 10:52:16 +0000 | [diff] [blame] | 568 | <li><code>--weird-hacks=hack1,hack2,...</code> |
sewardj | 3984b85 | 2002-05-12 03:00:17 +0000 | [diff] [blame] | 569 | <p>Pass miscellaneous hints to Valgrind which slightly modify the |
| 570 | simulated behaviour in nonstandard or dangerous ways, possibly |
| 571 | to help the simulation of strange features. By default no hacks |
| 572 | are enabled. Use with caution! Currently known hacks are: |
| 573 | <p> |
| 574 | <ul> |
| 575 | <li><code>ioctl-VTIME</code> Use this if you have a program |
| 576 | which sets readable file descriptors to have a timeout by |
| 577 | doing <code>ioctl</code> on them with a |
| 578 | <code>TCSETA</code>-style command <b>and</b> a non-zero |
| 579 | <code>VTIME</code> timeout value. This is considered |
| 580 | potentially dangerous and therefore is not engaged by |
| 581 | default, because it is (remotely) conceivable that it could |
| 582 | cause threads doing <code>read</code> to incorrectly block |
| 583 | the entire process. |
| 584 | <p> |
| 585 | You probably want to try this one if you have a program |
| 586 | which unexpectedly blocks in a <code>read</code> from a file |
| 587 | descriptor which you know to have been messed with by |
| 588 | <code>ioctl</code>. This could happen, for example, if the |
| 589 | descriptor is used to read input from some kind of screen |
| 590 | handling library. |
| 591 | <p> |
| 592 | To find out if your program is blocking unexpectedly in the |
| 593 | <code>read</code> system call, run with the |
| 594 | <code>--trace-syscalls=yes</code> flag. |
| 595 | </ul> |
| 596 | |
| 597 | </li><p> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 598 | </ul> |
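<p>For orientation, the kind of client code that the
<code>ioctl-VTIME</code> hack is aimed at might look like the sketch
below. <code>set_read_timeout</code> is a hypothetical helper name, and
the sketch goes through the POSIX <code>tcsetattr</code> call on the
assumption that the C library implements it with a
<code>TCSETA</code>-style <code>ioctl</code>; a program could equally
issue the <code>ioctl</code> itself.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put the terminal on fd into non-canonical mode so
   that read() gives up after `tenths` tenths of a second without input.
   Returns 0 on success, -1 if fd is not a terminal or the call fails. */
int set_read_timeout(int fd, unsigned char tenths)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;                   /* e.g. fd is a pipe or a plain file */

    t.c_lflag &= ~(tcflag_t)ICANON;  /* VMIN/VTIME only apply without ICANON */
    t.c_cc[VMIN]  = 0;               /* do not wait for a minimum byte count */
    t.c_cc[VTIME] = tenths;          /* non-zero timeout, in 0.1 second units */

    return tcsetattr(fd, TCSANOW, &t) == 0 ? 0 : -1;
}
```

A later <code>read</code> on such a descriptor is meant to return once
the timeout expires. If it blocks indefinitely when run on Valgrind,
that is the symptom described above.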
| 599 | |
| 600 | There are also some options for debugging Valgrind itself. You |
| 601 | shouldn't need to use them in the normal run of things. Nevertheless: |
| 602 | |
| 603 | <ul> |
| 604 | |
| 605 | <li><code>--single-step=no</code> [default]<br> |
| 606 | <code>--single-step=yes</code> |
| 607 | <p>When enabled, each x86 insn is translated separately into |
| 608 | instrumented code. When disabled, translation is done on a |
| 609 | per-basic-block basis, giving much better translations.</li><br> |
| 610 | <p> |
| 611 | |
| 612 | <li><code>--optimise=no</code><br> |
| 613 | <code>--optimise=yes</code> [default] |
| 614 | <p>When enabled, various improvements are applied to the |
| 615 | intermediate code, mainly aimed at allowing the simulated CPU's |
| 616 | registers to be cached in the real CPU's registers over several |
| 617 | simulated instructions.</li><br> |
| 618 | <p> |
| 619 | |
| 620 | <li><code>--instrument=no</code><br> |
| 621 | <code>--instrument=yes</code> [default] |
| 622 | <p>When disabled, the translations don't actually contain any |
| 623 | instrumentation.</li><br> |
| 624 | <p> |
| 625 | |
| 626 | <li><code>--cleanup=no</code><br> |
| 627 | <code>--cleanup=yes</code> [default] |
| 628 | <p>When enabled, various improvements are applied to the |
| 629 | post-instrumented intermediate code, aimed at removing redundant |
| 630 | value checks.</li><br> |
| 631 | <p> |
| 632 | |
| 633 | <li><code>--trace-syscalls=no</code> [default]<br> |
| 634 | <code>--trace-syscalls=yes</code> |
| 635 | <p>Enable/disable tracing of system call intercepts.</li><br> |
| 636 | <p> |
| 637 | |
| 638 | <li><code>--trace-signals=no</code> [default]<br> |
| 639 | <code>--trace-signals=yes</code> |
| 640 | <p>Enable/disable tracing of signal handling.</li><br> |
| 641 | <p> |
| 642 | |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 643 | <li><code>--trace-sched=no</code> [default]<br> |
| 644 | <code>--trace-sched=yes</code> |
| 645 | <p>Enable/disable tracing of thread scheduling events.</li><br> |
| 646 | <p> |
| 647 | |
sewardj | 45b4b37 | 2002-04-16 22:50:32 +0000 | [diff] [blame] | 648 | <li><code>--trace-pthread=none</code> [default]<br> |
| 649 | <code>--trace-pthread=some</code> <br> |
| 650 | <code>--trace-pthread=all</code> |
| 651 | <p>Specifies amount of trace detail for pthread-related events.</li><br> |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 652 | <p> |
| 653 | |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 654 | <li><code>--trace-symtab=no</code> [default]<br> |
| 655 | <code>--trace-symtab=yes</code> |
| 656 | <p>Enable/disable tracing of symbol table reading.</li><br> |
| 657 | <p> |
| 658 | |
| 659 | <li><code>--trace-malloc=no</code> [default]<br> |
| 660 | <code>--trace-malloc=yes</code> |
| 661 | <p>Enable/disable tracing of malloc/free (et al) intercepts. |
| 662 | </li><br> |
| 663 | <p> |
| 664 | |
| 665 | <li><code>--stop-after=<number></code> |
| 666 | [default: infinity, more or less] |
| 667 | <p>After <number> basic blocks have been executed, shut down |
| 668 | Valgrind and switch back to running the client on the real CPU. |
| 669 | </li><br> |
| 670 | <p> |
| 671 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 672 | <li><code>--dump-error=<number></code> [default: inactive] |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 673 | <p>After the program has exited, show gory details of the |
| 674 | translation of the basic block containing the <number>'th |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 675 | error context. When used with <code>--single-step=yes</code>, |
| 676 | can show the exact x86 instruction causing an error. This is |
| 677 | all fairly dodgy and doesn't work at all if threads are |
| 678 | involved.</li><br> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 679 | <p> |
| 680 | |
| 681 | <li><code>--smc-check=none</code><br> |
| 682 | <code>--smc-check=some</code> [default]<br> |
| 683 | <code>--smc-check=all</code> |
| 684 | <p>How carefully should Valgrind check for self-modifying code |
| 685 | writes, so that translations can be discarded? When |
| 686 | "none", no writes are checked. When "some", only writes |
| 687 | resulting from moves from integer registers to memory are |
| 688 | checked. When "all", all memory writes are checked, even those |
| 689 | with which no sane program would generate code -- for |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 690 | example, floating-point writes. |
| 691 | <p> |
| 692 | NOTE that this is all a bit bogus. This mechanism has never |
| 693 | been enabled in any snapshot of Valgrind which was made |
| 694 | available to the general public, because the extra checks reduce |
| 695 | performance, increase complexity, and I have yet to come across |
| 696 | any programs which actually use self-modifying code. I think |
| 697 | the flag is ignored. |
| 698 | </li> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 699 | </ul> |
| 700 | |
| 701 | |
| 702 | <a name="errormsgs"></a> |
| 703 | <h3>2.6 Explanation of error messages</h3> |
| 704 | |
| 705 | Despite considerable sophistication under the hood, Valgrind can only |
| 706 | really detect two kinds of errors: use of illegal addresses, and use |
| 707 | of undefined values. Nevertheless, this is enough to help you |
| 708 | discover all sorts of memory-management nasties in your code. This |
| 709 | section presents a quick summary of what error messages mean. The |
| 710 | precise behaviour of the error-checking machinery is described in |
| 711 | <a href="#machine">Section 4</a>. |
| 712 | |
| 713 | |
| 714 | <h4>2.6.1 Illegal read / Illegal write errors</h4> |
| 715 | For example: |
| 716 | <pre> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 717 | Invalid read of size 4 |
| 718 | at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9) |
| 719 | by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9) |
| 720 | by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326) |
| 721 | by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621) |
| 722 | Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 723 | </pre> |
| 724 | |
| 725 | <p>This happens when your program reads or writes memory at a place |
| 726 | which Valgrind reckons it shouldn't. In this example, the program did |
| 727 | a 4-byte read at address 0xBFFFF0E0, somewhere within the |
| 728 | system-supplied library libpng.so.2.1.0.9, which was called from |
| 729 | somewhere else in the same library, called from line 326 of |
| 730 | qpngio.cpp, and so on. |
| 731 | |
| 732 | <p>Valgrind tries to establish what the illegal address might relate |
| 733 | to, since that's often useful. So, if it points into a block of |
| 734 | memory which has already been freed, you'll be informed of this, and |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 735 | also where the block was freed. Likewise, if it should turn out |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 736 | to be just off the end of a malloc'd block, a common result of |
| 737 | off-by-one errors in array subscripting, you'll be informed of this |
| 738 | fact, and also where the block was malloc'd. |
| 739 | |
| 740 | <p>In this example, Valgrind can't identify the address. Actually the |
| 741 | address is on the stack, but, for some reason, this is not a valid |
| 742 | stack address -- it is below the stack pointer, %esp, and that isn't |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 743 | allowed. In this particular case it's probably caused by gcc |
| 744 | generating invalid code, a known bug in various flavours of gcc. |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 745 | |
| 746 | <p>Note that Valgrind only tells you that your program is about to |
| 747 | access memory at an illegal address. It can't stop the access from |
| 748 | happening. So, if your program makes an access which normally would |
| 749 | result in a segmentation fault, your program will still suffer the same |
| 750 | fate -- but you will get a message from Valgrind immediately prior to |
| 751 | this. In this particular example, reading junk on the stack is |
| 752 | non-fatal, and the program stays alive. |
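<p>As a minimal sketch of the off-by-one case mentioned above, compare a
loop that reads one element past the end of a malloc'd block with the
corrected loop. The function names are made up for illustration and do
not come from Valgrind or its test suite.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate a block of n ints holding 1, 2, ..., n.
   Returns NULL on failure; the caller frees the block. */
int *make_block(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a != NULL)
        for (size_t i = 0; i < n; i++)
            a[i] = (int)(i + 1);
    return a;
}

/* BUG (kept for illustration): the bound is i <= n, so the final
   iteration reads a[n], one element past the end of the block. */
long sum_buggy(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i <= n; i++)
        total += a[i];
    return total;
}

/* Fixed: i < n keeps every read inside the block. */
long sum_fixed(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Calling <code>sum_buggy(a, n)</code> on a block obtained from
<code>make_block(n)</code> performs one read just past the end of the
block, which is the kind of access reported as an invalid read;
<code>sum_fixed</code> stays inside the block.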
| 753 | |
| 754 | |
| 755 | <h4>2.6.2 Use of uninitialised values</h4> |
| 756 | For example: |
| 757 | <pre> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 758 | Conditional jump or move depends on uninitialised value(s) |
| 759 | at 0x402DFA94: _IO_vfprintf (_itoa.h:49) |
| 760 | by 0x402E8476: _IO_printf (printf.c:36) |
| 761 | by 0x8048472: main (tests/manuel1.c:8) |
| 762 | by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 763 | </pre> |
| 764 | |
| 765 | <p>An uninitialised-value use error is reported when your program uses |
| 766 | a value which hasn't been initialised -- in other words, is undefined. |
| 767 | Here, the undefined value is used somewhere inside the printf() |
| 768 | machinery of the C library. This error was reported when running the |
| 769 | following small program: |
| 770 | <pre> |
| 771 | int main() |
| 772 | { |
| 773 | int x; |
| 774 | printf ("x = %d\n", x); |
| 775 | } |
| 776 | </pre> |
| 777 | |
| 778 | <p>It is important to understand that your program can copy around |
| 779 | junk (uninitialised) data to its heart's content. Valgrind observes |
| 780 | this and keeps track of the data, but does not complain. A complaint |
| 781 | is issued only when your program attempts to make use of uninitialised |
| 782 | data. In this example, x is uninitialised. Valgrind observes the |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 783 | value being passed to _IO_printf and thence to _IO_vfprintf, but makes |
| 784 | no comment. However, _IO_vfprintf has to examine the value of x so it |
| 785 | can turn it into the corresponding ASCII string, and it is at this |
| 786 | point that Valgrind complains. |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 787 | |
| 788 | <p>Sources of uninitialised data tend to be: |
| 789 | <ul> |
| 790 | <li>Local variables in procedures which have not been initialised, |
| 791 | as in the example above.</li><br><p> |
| 792 | |
| 793 | <li>The contents of malloc'd blocks, before you write something |
| 794 | there. In C++, the new operator is a wrapper round malloc, so |
| 795 | if you create an object with new, its fields will be |
| 796 | uninitialised until you fill them in, which is only Right and |
| 797 | Proper.</li> |
| 798 | </ul> |
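<p>The malloc case can be sketched as follows. The helper names are
hypothetical. The point is only that a block from <code>malloc</code>
holds undefined data until written, whereas <code>calloc</code>
zero-fills it, and that a branch on an element is a "use" in the sense
described above.

```c
#include <stddef.h>
#include <stdlib.h>

/* The contents of this block are undefined until the caller writes them. */
int *alloc_uninitialised(size_t n)
{
    return malloc(n * sizeof(int));
}

/* calloc zero-fills the block, so every element is defined. */
int *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Each comparison makes a branch depend on a[i]; this is the kind of
   use that draws a complaint if a[i] was never written. */
size_t count_nonzero(const int *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0)
            count++;
    return count;
}
```

Passing a block from <code>alloc_uninitialised</code> straight to
<code>count_nonzero</code> would be expected to produce a "Conditional
jump or move depends on uninitialised value(s)" report, while a block
from <code>alloc_zeroed</code> would not.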
| 799 | |
| 800 | |
| 801 | |
| 802 | <h4>2.6.3 Illegal frees</h4> |
| 803 | For example: |
| 804 | <pre> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 805 | Invalid free() |
| 806 | at 0x4004FFDF: free (ut_clientmalloc.c:577) |
| 807 | by 0x80484C7: main (tests/doublefree.c:10) |
| 808 | by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 809 | by 0x80483B1: (within tests/doublefree) |
| 810 | Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd |
| 811 | at 0x4004FFDF: free (ut_clientmalloc.c:577) |
| 812 | by 0x80484C7: main (tests/doublefree.c:10) |
| 813 | by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 814 | by 0x80483B1: (within tests/doublefree) |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 815 | </pre> |
| 816 | <p>Valgrind keeps track of the blocks allocated by your program with |
| 817 | malloc/new, so it knows exactly whether or not the argument to |
| 818 | free/delete is legitimate. Here, this test program has |
| 819 | freed the same block twice. As with the illegal read/write errors, |
| 820 | Valgrind attempts to make sense of the address free'd. If, as |
| 821 | here, the address is one which has previously been freed, you will |
| 822 | be told that -- making duplicate frees of the same block easy to spot. |
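<p>One common defensive idiom, shown here as a sketch rather than as
anything Valgrind requires, is to clear the pointer at the point of
freeing, so that an accidental second free becomes a harmless
<code>free(NULL)</code>. <code>release_block</code> is a hypothetical
helper name.

```c
#include <stdlib.h>

/* The bug pattern reported above, reduced to its essentials:
       free(p);
       free(p);     <- second free of the same block
*/

/* Free *p and reset it to NULL. A repeated call then amounts to
   free(NULL), which the C standard defines as a no-op. */
void release_block(char **p)
{
    free(*p);
    *p = NULL;
}
```

This only helps when every free goes through the same pointer variable;
a second copy of the pointer held elsewhere can still be freed twice.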
| 823 | |
| 824 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 825 | <h4>2.6.4 When a block is freed with an inappropriate |
| 826 | deallocation function</h4> |
sewardj | 7c062c9 | 2002-05-01 21:46:38 +0000 | [diff] [blame] | 827 | In the following example, a block allocated with <code>new []</code> |
| 828 | has wrongly been deallocated with <code>free</code>: |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 829 | <pre> |
| 830 | Mismatched free() / delete / delete [] |
sewardj | 7c062c9 | 2002-05-01 21:46:38 +0000 | [diff] [blame] | 831 | at 0x40043249: free (vg_clientfuncs.c:171) |
| 832 | by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149) |
| 833 | by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60) |
| 834 | by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44) |
| 835 | Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd |
| 836 | at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152) |
| 837 | by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314) |
| 838 | by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416) |
| 839 | by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272) |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 840 | </pre> |
| 841 | The following was told to me by the KDE 3 developers. I didn't know |
| 842 | any of it myself. They also implemented the check itself. |
| 843 | <p> |
| 844 | In C++ it's important to deallocate memory in a way compatible with |
| 845 | how it was allocated. The deal is: |
| 846 | <ul> |
| 847 | <li>If allocated with <code>malloc</code>, <code>calloc</code>, |
| 848 | <code>realloc</code>, <code>valloc</code> or |
| 849 | <code>memalign</code>, you must deallocate with <code>free</code>. |
| 850 | <li>If allocated with <code>new []</code>, you must deallocate with |
| 851 | <code>delete []</code>. |
| 852 | <li>If allocated with <code>new</code>, you must deallocate with |
| 853 | <code>delete</code>. |
| 854 | </ul> |
| 855 | The worst thing is that on Linux apparently it doesn't matter if you |
| 856 | do muddle these up, and it all seems to work ok, but the same program |
| 857 | may then crash on a different platform, Solaris for example. So it's |
| 858 | best to fix it properly. According to the KDE folks "it's amazing how |
| 859 | many C++ programmers don't know this". |
| 860 | |
| 861 | |
| 862 | |
| 863 | <h4>2.6.5 Passing system call parameters with inadequate |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 864 | read/write permissions</h4> |
| 865 | |
| 866 | Valgrind checks all parameters to system calls. If a system call |
| 867 | needs to read from a buffer provided by your program, Valgrind checks |
| 868 | that the entire buffer is addressable and has valid data, i.e., it is |
| 869 | readable. And if the system call needs to write to a user-supplied |
| 870 | buffer, Valgrind checks that the buffer is addressable. After the |
| 871 | system call, Valgrind updates its administrative information to |
| 872 | precisely reflect any changes in memory permissions caused by the |
| 873 | system call. |
| 874 | |
| 875 | <p>Here's an example of a system call with an invalid parameter: |
| 876 | <pre> |
| 877 | #include <stdlib.h> |
| 878 | #include <unistd.h> |
| 879 | int main( void ) |
| 880 | { |
| 881 | char* arr = malloc(10); |
| 882 | (void) write( 1 /* stdout */, arr, 10 ); |
| 883 | return 0; |
| 884 | } |
| 885 | </pre> |
| 886 | |
| 887 | <p>You get this complaint ... |
| 888 | <pre> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 889 | Syscall param write(buf) contains uninitialised or unaddressable byte(s) |
| 890 | at 0x4035E072: __libc_write |
| 891 | by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 892 | by 0x80483B1: (within tests/badwrite) |
| 893 | by <bogus frame pointer> ??? |
| 894 | Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd |
| 895 | at 0x4004FEE6: malloc (ut_clientmalloc.c:539) |
| 896 | by 0x80484A0: main (tests/badwrite.c:6) |
| 897 | by 0x402A6E5E: __libc_start_main (libc-start.c:129) |
| 898 | by 0x80483B1: (within tests/badwrite) |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 899 | </pre> |
| 900 | |
| 901 | <p>... because the program has tried to write uninitialised junk from |
| 902 | the malloc'd block to the standard output. |
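<p>A sketch of a corrected version fills the buffer before handing it to
<code>write</code>, so that every byte passed to the system call is
defined. <code>write_filled</code> is a hypothetical helper, not part of
the test program above.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write n copies of `fill` to fd from a heap buffer.
   Returns what write() returned, or -1 if the allocation fails. */
ssize_t write_filled(int fd, size_t n, char fill)
{
    char *buf = malloc(n);
    ssize_t written;

    if (buf == NULL)
        return -1;
    memset(buf, fill, n);          /* every byte is now defined */
    written = write(fd, buf, n);
    free(buf);
    return written;
}
```

For example, <code>write_filled(1, 10, 'x')</code> sends ten defined
bytes to the standard output.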
| 903 | |
| 904 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 905 | <h4>2.6.6 Warning messages you might see</h4> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 906 | |
| 907 | Most of these only appear if you run in verbose mode (enabled by |
| 908 | <code>-v</code>): |
| 909 | <ul> |
| 910 | <li> <code>More than 50 errors detected. Subsequent errors |
| 911 | will still be recorded, but in less detail than before.</code> |
| 912 | <br> |
| 913 | After 50 different errors have been shown, Valgrind becomes |
| 914 | more conservative about collecting them. It then requires only |
| 915 | the program counters in the top two stack frames to match when |
| 916 | deciding whether or not two errors are really the same one. |
| 917 | Prior to this point, the PCs in the top four frames are required |
| 918 | to match. This hack has the effect of slowing down the |
| 919 | appearance of new errors after the first 50. The 50 constant can |
| 920 | be changed by recompiling Valgrind. |
| 921 | <p> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 922 | <li> <code>More than 300 errors detected. I'm not reporting any more. |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 923 | Final error counts may be inaccurate. Go fix your |
| 924 | program!</code> |
| 925 | <br> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 926 | After 300 different errors have been detected, Valgrind ignores |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 927 | any more. It seems unlikely that collecting even more different |
| 928 | ones would be of practical help to anybody, and it avoids the |
| 929 | danger that Valgrind spends more and more of its time comparing |
| 930 | new errors against an ever-growing collection. As above, the 300 |
| 931 | number is a compile-time constant. |
| 932 | <p> |
| 933 | <li> <code>Warning: client exiting by calling exit(<number>). |
| 934 | Bye!</code> |
| 935 | <br> |
| 936 | Your program has called the <code>exit</code> system call, which |
| 937 | will immediately terminate the process. You'll get no exit-time |
| 938 | error summaries or leak checks. Note that this is not the same |
| 939 | as your program calling the ANSI C function <code>exit()</code> |
| 940 | -- that causes a normal, controlled shutdown of Valgrind. |
| 941 | <p> |
| 942 | <li> <code>Warning: client switching stacks?</code> |
| 943 | <br> |
| 944 | Valgrind spotted such a large change in the stack pointer, %esp, |
| 945 | that it guesses the client is switching to a different stack. |
| 946 | At this point it makes a kludgey guess where the base of the new |
| 947 | stack is, and sets memory permissions accordingly. You may get |
| 948 | many bogus error messages following this, if Valgrind guesses |
| 949 | wrong. At the moment "large change" is defined as a change of |
| 950 | more than 2000000 in the value of the %esp (stack pointer) |
| 951 | register. |
| 952 | <p> |
| 953 | <li> <code>Warning: client attempted to close Valgrind's logfile fd <number> |
| 954 | </code> |
| 955 | <br> |
| 956 | Valgrind doesn't allow the client |
| 957 | to close the logfile, because you'd never see any diagnostic |
| 958 | information after that point. If you see this message, |
| 959 | you may want to use the <code>--logfile-fd=<number></code> |
| 960 | option to specify a different logfile file-descriptor number. |
| 961 | <p> |
| 962 | <li> <code>Warning: noted but unhandled ioctl <number></code> |
| 963 | <br> |
| 964 | Valgrind observed a call to one of the vast family of |
| 965 | <code>ioctl</code> system calls, but did not modify its |
| 966 | memory status info (because I have not yet got round to it). |
| 967 | The call will still have gone through, but you may get spurious |
| 968 | errors after this as a result of the non-update of the memory info. |
| 969 | <p> |
| 970 | <li> <code>Warning: unblocking signal <number> due to |
| 971 | sigprocmask</code> |
| 972 | <br> |
| 973 | Really just a diagnostic from the signal simulation machinery. |
| 974 | This message will appear if your program handles a signal by |
| 975 | first <code>longjmp</code>ing out of the signal handler, |
| 976 | and then unblocking the signal with <code>sigprocmask</code> |
| 977 | -- a standard signal-handling idiom. |
| 978 | <p> |
| 979 | <li> <code>Warning: bad signal number <number> in __NR_sigaction.</code> |
| 980 | <br> |
| 981 | Probably indicates a bug in the signal simulation machinery. |
| 982 | <p> |
| 983 | <li> <code>Warning: set address range perms: large range <number></code> |
| 984 | <br> |
| 985 | Diagnostic message, mostly for my benefit, to do with memory |
| 986 | permissions. |
| 987 | </ul> |
| 988 | |
| 989 | |
| 990 | <a name="suppfiles"></a> |
| 991 | <h3>2.7 Writing suppressions files</h3> |
| 992 | |
| 993 | A suppression file describes a bunch of errors which, for one reason |
| 994 | or another, you don't want Valgrind to tell you about. Usually the |
| 995 | reason is that the system libraries are buggy but unfixable, at least |
| 996 | within the scope of the current debugging session. Multiple |
| 997 | suppression files are allowed. By default, Valgrind uses |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 998 | <code>$PREFIX/lib/valgrind/default.supp</code>. |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 999 | |
| 1000 | <p> |
| 1001 | You can ask to add suppressions from another file, by specifying |
| 1002 | <code>--suppressions=/path/to/file.supp</code>. |
| 1003 | |
| 1004 | <p>Each suppression has the following components:<br> |
| 1005 | <ul> |
| 1006 | |
| 1007 | <li>Its name. This merely gives a handy name to the suppression, by |
| 1008 | which it is referred to in the summary of used suppressions |
| 1009 | printed out when a program finishes. It's not important what |
| 1010 | the name is; any identifying string will do. |
| 1011 | <p> |
| 1012 | |
| 1013 | <li>The nature of the error to suppress. Either: |
| 1014 | <code>Value1</code>, |
| 1015 | <code>Value2</code>, |
sewardj | a7dc795 | 2002-03-24 11:29:13 +0000 | [diff] [blame] | 1016 | <code>Value4</code> or |
| 1017 | <code>Value8</code>, |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1018 | meaning an uninitialised-value error when |
sewardj | a7dc795 | 2002-03-24 11:29:13 +0000 | [diff] [blame] | 1019 | using a value of 1, 2, 4 or 8 bytes. |
| 1020 | Or |
| 1021 | <code>Cond</code> (or its old name, <code>Value0</code>), |
| 1022 | meaning use of an uninitialised CPU condition code. Or: |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1023 | <code>Addr1</code>, |
| 1024 | <code>Addr2</code>, |
| 1025 | <code>Addr4</code> or |
| 1026 | <code>Addr8</code>, meaning an invalid address during a |
| 1027 | memory access of 1, 2, 4 or 8 bytes respectively. Or |
| 1028 | <code>Param</code>, |
| 1029 | meaning an invalid system call parameter error. Or |
| 1030 | <code>Free</code>, meaning an invalid or mismatching free.</li><br> |
| 1031 | <p> |
| 1032 | |
| 1033 | <li>The "immediate location" specification. For Value and Addr |
| 1034 | errors, this is either the name of the function in which the error |
| 1035 | occurred, or, failing that, the full path to the .so file |
| 1036 | containing the error location. For Param errors, it is the name of |
| 1037 | the offending system call parameter. For Free errors, it is the |
| 1038 | name of the function doing the freeing (eg, <code>free</code>, |
| 1039 | <code>__builtin_vec_delete</code>, etc)</li><br> |
| 1040 | <p> |
| 1041 | |
| 1042 | <li>The caller of the above "immediate location". Again, either a |
| 1043 | function or shared-object name.</li><br> |
| 1044 | <p> |
| 1045 | |
| 1046 | <li>Optionally, one or two extra calling-function or object names, |
| 1047 | for greater precision.</li> |
| 1048 | </ul> |
| 1049 | |
| 1050 | <p> |
| 1051 | Locations may be either names of shared objects or wildcards matching |
| 1052 | function names. They begin <code>obj:</code> and <code>fun:</code> |
| 1053 | respectively. Function and object names to match against may use the |
| 1054 | wildcard characters <code>*</code> and <code>?</code>. |
| 1055 | |
| 1056 | A suppression only suppresses an error when the error matches all the |
| 1057 | details in the suppression. Here's an example: |
| 1058 | <pre> |
| 1059 | { |
| 1060 | __gconv_transform_ascii_internal/__mbrtowc/mbtowc |
| 1061 | Value4 |
| 1062 | fun:__gconv_transform_ascii_internal |
| 1063 | fun:__mbr*toc |
| 1064 | fun:mbtowc |
| 1065 | } |
| 1066 | </pre> |
| 1067 | |
| 1068 | <p>What this means is: suppress a use-of-uninitialised-value error, when |
| 1069 | the data size is 4, when it occurs in the function |
| 1070 | <code>__gconv_transform_ascii_internal</code>, when that is called |
| 1071 | from any function of name matching <code>__mbr*toc</code>, |
| 1072 | when that is called from |
| 1073 | <code>mbtowc</code>. It doesn't apply under any other circumstances. |
| 1074 | The string by which this suppression is identified to the user is |
| 1075 | __gconv_transform_ascii_internal/__mbrtowc/mbtowc. |
| 1076 | |
| 1077 | <p>Another example: |
| 1078 | <pre> |
| 1079 | { |
| 1080 | libX11.so.6.2/libX11.so.6.2/libXaw.so.7.0 |
| 1081 | Value4 |
| 1082 | obj:/usr/X11R6/lib/libX11.so.6.2 |
| 1083 | obj:/usr/X11R6/lib/libX11.so.6.2 |
| 1084 | obj:/usr/X11R6/lib/libXaw.so.7.0 |
| 1085 | } |
| 1086 | </pre> |
| 1087 | |
| 1088 | <p>Suppress any size 4 uninitialised-value error which occurs anywhere |
| 1089 | in <code>libX11.so.6.2</code>, when called from anywhere in the same |
| 1090 | library, when called from anywhere in <code>libXaw.so.7.0</code>. The |
| 1091 | inexact specification of locations is regrettable, but is about all |
| 1092 | you can hope for, given that the X11 libraries shipped with Red Hat |
| 1093 | 7.2 have had their symbol tables removed. |
| 1094 | |
| 1095 | <p>Note -- since the above two examples did not make it clear -- that |
| 1096 | you can freely mix the <code>obj:</code> and <code>fun:</code> |
| 1097 | styles of description within a single suppression record. |
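<p>To make the matching rules concrete, here is a small sketch of how a
single location line could be compared against a stack frame. It is not
Valgrind's own matcher: <code>wild_match</code> and
<code>location_matches</code> are hypothetical names, and the sketch
assumes that <code>*</code> matches any run of characters (including
none) and <code>?</code> matches exactly one.

```c
#include <string.h>

/* Return 1 if name matches pat, where '*' matches any run of characters
   (including none) and '?' matches exactly one character; otherwise 0. */
int wild_match(const char *pat, const char *name)
{
    if (*pat == '\0')
        return *name == '\0';
    if (*pat == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of name in turn. */
        for (;;) {
            if (wild_match(pat + 1, name))
                return 1;
            if (*name == '\0')
                return 0;
            name++;
        }
    }
    if (*name != '\0' && (*pat == '?' || *pat == *name))
        return wild_match(pat + 1, name + 1);
    return 0;
}

/* Compare one "fun:..." or "obj:..." line against a frame's function
   name and shared-object path. Either may be NULL if unknown. */
int location_matches(const char *spec, const char *fun, const char *obj)
{
    if (strncmp(spec, "fun:", 4) == 0)
        return fun != NULL && wild_match(spec + 4, fun);
    if (strncmp(spec, "obj:", 4) == 0)
        return obj != NULL && wild_match(spec + 4, obj);
    return 0;
}
```

Under these assumptions a suppression applies only when each of its
location lines, taken in order, matches the corresponding frame of the
error's backtrace, and <code>fun:</code> and <code>obj:</code> lines can
be mixed freely because each line is checked on its own.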
| 1098 | |
| 1099 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1100 | <a name="clientreq"></a> |
| 1101 | <h3>2.8 The Client Request mechanism</h3> |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 1102 | |
| 1103 | Valgrind has a trapdoor mechanism via which the client program can |
| 1104 | pass all manner of requests and queries to Valgrind. Internally, this |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1105 | is used extensively to make malloc, free, signals, threads, etc, work, |
| 1106 | although you don't see that. |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 1107 | <p> |
| 1108 | For your convenience, a subset of these so-called client requests is |
| 1109 | provided to allow you to tell Valgrind facts about the behaviour of |
| 1110 | your program, and conversely to make queries. In particular, your |
| 1111 | program can tell Valgrind about changes in memory range permissions |
| 1112 | that Valgrind would not otherwise know about, which allows clients to |
| 1113 | get Valgrind to do arbitrary custom checks. |
| 1114 | <p> |
| 1115 | Clients need to include the header file <code>valgrind.h</code> to |
| 1116 | make this work. The macros therein have the magical property that |
| 1117 | they generate code in-line which Valgrind can spot. However, the code |
| 1118 | does nothing when not run on Valgrind, so you are not forced to run |
| 1119 | your program on Valgrind just because you use the macros in this file. |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1120 | Also, you are not required to link your program with any extra |
| 1121 | supporting libraries. |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 1122 | <p> |
| 1123 | A brief description of the available macros: |
| 1124 | <ul> |
| 1125 | <li><code>VALGRIND_MAKE_NOACCESS</code>, |
| 1126 | <code>VALGRIND_MAKE_WRITABLE</code> and |
| 1127 | <code>VALGRIND_MAKE_READABLE</code>. These mark address |
| 1128 | ranges as completely inaccessible, accessible but containing |
| 1129 | undefined data, and accessible and containing defined data, |
| 1130 | respectively. Subsequent errors may have their faulting |
| 1131 | addresses described in terms of these blocks. Returns a |
| 1132 | "block handle". Returns zero when not run on Valgrind. |
| 1133 | <p> |
| 1134 | <li><code>VALGRIND_DISCARD</code>: At some point you may want |
| 1135 | Valgrind to stop reporting errors in terms of the blocks |
| 1136 | defined by the previous three macros. To do this, the above |
| 1137 | macros return a small-integer "block handle". You can pass |
| 1138 | this block handle to <code>VALGRIND_DISCARD</code>. After |
| 1139 | doing so, Valgrind will no longer be able to relate |
| 1140 | addressing errors to the user-defined block associated with |
| 1141 | the handle. The permissions settings associated with the |
| 1142 | handle remain in place; this just affects how errors are |
| 1143 | reported, not whether they are reported. Returns 1 for an |
| 1144 | invalid handle and 0 for a valid handle (although passing |
| 1145 | invalid handles is harmless). Always returns 0 when not run |
| 1146 | on Valgrind. |
| 1147 | <p> |
| 1148 | <li><code>VALGRIND_CHECK_NOACCESS</code>, |
| 1149 | <code>VALGRIND_CHECK_WRITABLE</code> and |
| 1150 | <code>VALGRIND_CHECK_READABLE</code>: check immediately |
| 1151 | whether or not the given address range has the relevant |
| 1152 | property, and if not, print an error message. Also, for the |
| 1153 | convenience of the client, returns zero if the relevant |
| 1154 | property holds; otherwise, the returned value is the address |
| 1155 | of the first byte for which the property is not true. |
| 1156 | Always returns 0 when not run on Valgrind. |
| 1157 | <p> |
| 1158 | <li><code>VALGRIND_CHECK_DEFINED</code>: a quick and easy way |
| 1159 | to find out whether Valgrind thinks a particular variable |
| 1160 | (lvalue, to be precise) is addressable and defined. Prints |
| 1161 | an error message if not. Returns no value. |
| 1162 | <p> |
| 1163 | <li><code>VALGRIND_MAKE_NOACCESS_STACK</code>: a highly |
| 1164 | experimental feature. Similarly to |
| 1165 | <code>VALGRIND_MAKE_NOACCESS</code>, this marks an address |
| 1166 | range as inaccessible, so that subsequent accesses to an |
| 1167 | address in the range give an error. However, this macro |
| 1168 | does not return a block handle. Instead, all annotations |
| 1169 | created like this are reviewed at each client |
| 1170 | <code>ret</code> (subroutine return) instruction, and those |
| 1171 | which now define an address range below the client's stack |
| 1172 | pointer register (<code>%esp</code>) are automatically |
| 1173 | deleted. |
| 1174 | <p> |
| 1175 | In other words, this macro allows the client to tell |
| 1176 | Valgrind about red-zones on its own stack. Valgrind |
| 1177 | automatically discards this information when the stack |
| 1178 | retreats past such blocks. Beware: hacky and flaky, and |
| 1179 | probably interacts badly with the new pthread support. |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 1180 | <p> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1181 | <li><code>RUNNING_ON_VALGRIND</code>: returns 1 if running on |
| 1182 | Valgrind, 0 if running on the real CPU. |
| 1183 | <p> |
| 1184 | <li><code>VALGRIND_DO_LEAK_CHECK</code>: run the memory leak detector |
| 1185 | right now. Returns no value. I guess this could be used to |
| 1186 | incrementally check for leaks between arbitrary places in the |
| 1187 | program's execution. Warning: not properly tested! |
| 1188 | </ul> |
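<p>
As a rough illustration of how these requests fit together, here is a
minimal sketch of a private pool allocator that describes its memory to
Valgrind. It rests on several assumptions: that the macros take an
address and a length in bytes, that <code>HAVE_VALGRIND_H</code> is a
build flag you define yourself (it is not part of Valgrind), and that
the <code>pool_*</code> names are purely hypothetical. Without the flag,
stand-in macros model the documented behaviour when not run on Valgrind:
they do nothing and yield zero.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_VALGRIND_H   /* hypothetical build flag, not part of Valgrind */
#include "valgrind.h"
#else
/* Stand-ins modelling the documented behaviour when NOT run on Valgrind:
   each request does nothing and yields zero. The (address, length)
   argument shape is an assumption of this sketch. */
#define VALGRIND_MAKE_NOACCESS(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#define VALGRIND_DISCARD(handle)          ((void)(handle), 0)
#define RUNNING_ON_VALGRIND               0
#endif

#define POOL_SIZE 256

static char   pool[POOL_SIZE];
static size_t pool_used = 0;
static int    pool_handle = 0;   /* block handle for the whole pool */

/* Mark the entire pool as inaccessible until pieces are handed out. */
void pool_init(void) {
    pool_used = 0;
    pool_handle = VALGRIND_MAKE_NOACCESS(pool, POOL_SIZE);
}

/* Hand out n bytes: accessible, but containing undefined data.
   The handle returned for the piece is ignored in this sketch. */
void *pool_alloc(size_t n) {
    if (n > POOL_SIZE - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += n;
    (void)VALGRIND_MAKE_WRITABLE(p, n);
    return p;
}

/* Drop every allocation: discard the old block description, then mark
   the pool inaccessible again under a fresh handle. */
void pool_reset(void) {
    (void)VALGRIND_DISCARD(pool_handle);
    pool_init();
}
```

Because the stand-ins expand to no-ops, the same source builds and runs
unchanged on the real CPU; only when the real header is used and the
program runs on Valgrind do the requests have any effect.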
| 1189 | <p> |
| 1190 | |
| 1191 | |
| 1192 | <a name="pthreads"></a> |
| 1193 | <h3>2.9 Support for POSIX Pthreads</h3> |
| 1194 | |
| 1195 | As of late April 02, Valgrind supports programs which use POSIX |
| 1196 | pthreads. Doing this has proved technically challenging and is still |
| 1197 | in progress, but it works well enough, as of 1 May 02, for significant |
| 1198 | threaded applications to work. |
| 1199 | <p> |
| 1200 | It works as follows: threaded apps are (dynamically) linked against |
| 1201 | <code>libpthread.so</code>. Usually this is the one installed with |
| 1202 | your Linux distribution. Valgrind, however, supplies its own |
| 1203 | <code>libpthread.so</code> and automatically connects your program to |
| 1204 | it instead. |
| 1205 | <p> |
| 1206 | The fake <code>libpthread.so</code> and Valgrind cooperate to |
| 1207 | implement a user-space pthreads package. This approach avoids the |
| 1208 | horrible implementation problems of implementing a truly |
| 1209 | multiprocessor version of Valgrind, but it does mean that threaded |
| 1210 | apps run only on one CPU, even if you have a multiprocessor machine. |
| 1211 | <p> |
| 1212 | Valgrind schedules your threads in a round-robin fashion, with all |
| 1213 | threads having equal priority. It switches threads every 20000 basic |
| 1214 | blocks (typically around 120000 x86 instructions), which means you'll |
| 1215 | get a much finer interleaving of thread executions than when run |
| 1216 | natively. This in itself may cause your program to behave differently |
| 1217 | if it has concurrency, race-condition, locking, or similar |
| 1218 | bugs. |
| 1219 | <p> |
| 1220 | The current (1 May 02) state of pthread support is as follows. Please |
| 1221 | note that things are advancing rapidly, so the situation may have |
| 1222 | improved by the time you read this -- check the web site for further |
| 1223 | updates. |
| 1224 | <ul> |
| 1225 | <li>Mutexes, condition variables, thread-specific data and |
| 1226 | <code>pthread_once</code> currently work. |
| 1227 | <p> |
| 1228 | <li>Various attribute-like calls are handled but ignored. |
| 1229 | You get a warning message. |
| 1230 | <p> |
| 1231 | <li>The main big omission is proper cleanup support for cancellation. |
| 1232 | <code>pthread_cancel</code> works, but instantly nukes the target |
| 1233 | thread without giving it any chance to clean up. Also, when a |
| 1234 | thread exits, it does not run any cleanup handlers. |
| 1235 | <p> |
| 1236 | <li>Currently the following syscalls are thread-safe (nonblocking): |
| 1237 | <code>write</code> <code>read</code> <code>nanosleep</code> |
| 1238 | <code>sleep</code> <code>select</code> and <code>poll</code>. |
| 1239 | <p> |
| 1240 | <li>The POSIX requirement that each thread have its own |
| 1241 | signal-blocking mask is not implemented; the signal handling mechanism is |
| 1242 | thread-unaware and all signals are delivered to the main thread, |
| 1243 | regardless. |
| 1244 | </ul> |
| 1245 | |
| 1246 | |
| 1247 | As of 1 May 02, the following programs now work fine on my RedHat 7.2 |
| 1248 | box: Opera 6.0Beta2, KNode in KDE 3.0, Mozilla-0.9.2.1 and |
| 1249 | Galeon-0.11.3, both as supplied with RedHat 7.2. |
| 1250 | <p> |
sewardj | 1f13ab1 | 2002-05-02 03:57:00 +0000 | [diff] [blame] | 1251 | Mozilla 1.0RC1 works fine too, provided that you patch it as described |
| 1252 | here: <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=124335"> |
| 1253 | http://bugzilla.mozilla.org/show_bug.cgi?id=124335</a>. This fixes a |
| 1254 | bug in Mozilla which assumes that memory returned from |
| 1255 | <code>malloc</code> is 8-byte aligned. Valgrind's allocator only |
| 1256 | guarantees 4-byte alignment, so without the patch Mozilla makes an illegal |
| 1257 | memory access, which Valgrind of course spots, and then bombs. |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1258 | |
| 1259 | |
| 1260 | |
| 1261 | <a name="install"></a> |
| 1262 | <h3>2.10 Building and installing</h3> |
| 1263 | |
| 1264 | We now use the standard Unix <code>./configure</code>, |
| 1265 | <code>make</code>, <code>make install</code> mechanism, and I have |
| 1266 | attempted to ensure that it works on machines with kernel 2.2 or 2.4 |
| 1267 | and glibc 2.1.X or 2.2.X. I don't think there is much else to say. |
| 1268 | There are no options apart from the usual <code>--prefix</code> that |
| 1269 | you should give to <code>./configure</code>. |
| 1270 | <p> |
| 1271 | Let me know if you have build problems. |
sewardj | c7529c3 | 2002-04-16 01:55:18 +0000 | [diff] [blame] | 1272 | |
| 1273 | |
| 1274 | |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1275 | <a name="problems"></a> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1276 | <h3>2.11 If you have problems</h3> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1277 | Mail me (<a href="mailto:jseward@acm.org">jseward@acm.org</a>). |
| 1278 | |
| 1279 | <p>See <a href="#limits">Section 4</a> for the known limitations of |
| 1280 | Valgrind, and for a list of programs which are known not to work on |
| 1281 | it. |
| 1282 | |
| 1283 | <p>The translator/instrumentor has a lot of assertions in it. They |
| 1284 | are permanently enabled, and I have no plans to disable them. If one |
| 1285 | of these fails, please mail me! |
| 1286 | |
| 1287 | <p>If you get an assertion failure on the expression |
| 1288 | <code>chunkSane(ch)</code> in <code>vg_free()</code> in |
| 1289 | <code>vg_malloc.c</code>, this may have happened because your program |
| 1290 | wrote off the end of a malloc'd block, or before its beginning. |
| 1291 | Valgrind should have emitted a proper message to that effect before |
| 1292 | dying in this way. This is a known problem which I should fix. |
| 1293 | <p> |
| 1294 | |
| 1295 | <hr width="100%"> |
| 1296 | |
| 1297 | <a name="machine"></a> |
| 1298 | <h2>3 Details of the checking machinery</h2> |
| 1299 | |
| 1300 | Read this section if you want to know, in detail, exactly what and how |
| 1301 | Valgrind is checking. |
| 1302 | |
| 1303 | <a name="vvalue"></a> |
| 1304 | <h3>3.1 Valid-value (V) bits</h3> |
| 1305 | |
| 1306 | It is simplest to think of Valgrind as implementing a synthetic Intel x86 |
| 1307 | CPU which is identical to a real CPU, except for one crucial detail. |
| 1308 | Every bit (literally) of data processed, stored and handled by the |
| 1309 | real CPU has, in the synthetic CPU, an associated "valid-value" bit, |
| 1310 | which says whether or not the accompanying bit has a legitimate value. |
| 1311 | In the discussions which follow, this bit is referred to as the V |
| 1312 | (valid-value) bit. |
| 1313 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1314 | <p>Each byte in the system therefore has 8 V bits which follow |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1315 | it wherever it goes. For example, when the CPU loads a word-size item |
| 1316 | (4 bytes) from memory, it also loads the corresponding 32 V bits from |
| 1317 | a bitmap which stores the V bits for the process' entire address |
| 1318 | space. If the CPU should later write the whole or some part of that |
| 1319 | value to memory at a different address, the relevant V bits will be |
| 1320 | stored back in the V-bit bitmap. |
| 1321 | |
| 1322 | <p>In short, each bit in the system has an associated V bit, which |
| 1323 | follows it around everywhere, even inside the CPU. Yes, the CPU's |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1324 | (integer and <code>%eflags</code>) registers have their own V bit |
| 1325 | vectors. |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1326 | |
| 1327 | <p>Copying values around does not cause Valgrind to check for, or |
| 1328 | report on, errors. However, when a value is used in a way which might |
| 1329 | conceivably affect the outcome of your program's computation, the |
| 1330 | associated V bits are immediately checked. If any of these indicate |
| 1331 | that the value is undefined, an error is reported. |
| 1332 | |
| 1333 | <p>Here's an (admittedly nonsensical) example: |
| 1334 | <pre> |
| 1335 | int i, j; |
| 1336 | int a[10], b[10]; |
| 1337 | for (i = 0; i < 10; i++) { |
| 1338 | j = a[i]; |
| 1339 | b[i] = j; |
| 1340 | } |
| 1341 | </pre> |
| 1342 | |
| 1343 | <p>Valgrind emits no complaints about this, since it merely copies |
| 1344 | uninitialised values from <code>a[]</code> into <code>b[]</code>, and |
| 1345 | doesn't use them in any way. However, if the loop is changed to |
| 1346 | <pre> |
| 1347 | for (i = 0; i < 10; i++) { |
| 1348 | j += a[i]; |
| 1349 | } |
| 1350 | if (j == 77) |
| 1351 | printf("hello there\n"); |
| 1352 | </pre> |
| 1353 | then Valgrind will complain, at the <code>if</code>, that the |
| 1354 | condition depends on uninitialised values. |
| 1355 | |
| 1356 | <p>Most low level operations, such as adds, cause Valgrind to |
| 1357 | use the V bits for the operands to calculate the V bits for the |
| 1358 | result. Even if the result is partially or wholly undefined, |
| 1359 | it does not complain. |
| 1360 | |
| 1361 | <p>Checks on definedness only occur in two places: when a value is |
| 1362 | used to generate a memory address, and when a control flow decision |
| 1363 | needs to be made. Also, when a system call is detected, Valgrind |
| 1364 | checks definedness of parameters as required. |
| 1365 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1366 | <p>If a check should detect undefinedness, an error message is |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1367 | issued. The resulting value is subsequently regarded as well-defined. |
| 1368 | To do otherwise would give long chains of error messages. In effect, |
| 1369 | we say that undefined values are non-infectious. |
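<p>
The rules above can be sketched as a toy model in C. This is an
illustration only, not Valgrind's implementation: the type
<code>shadow32</code>, the <code>sh_*</code> functions, the convention
that a 1 bit means "undefined", and the rule used for addition are all
assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value plus a shadow mask. Convention chosen for this sketch
   only: a 1 bit in `undef` means the matching bit of `val` is undefined. */
typedef struct {
    uint32_t val;
    uint32_t undef;
} shadow32;

static int errors_reported = 0;

shadow32 sh_defined(uint32_t v) { shadow32 r = { v, 0u };           return r; }
shadow32 sh_undefined(void)     { shadow32 r = { 0u, 0xFFFFFFFFu }; return r; }

/* Copying moves the shadow bits along with the value; nothing is checked. */
shadow32 sh_copy(shadow32 a) { return a; }

/* Addition computes shadow bits for the result without complaining.
   Crude approximation: carries only travel upwards, so every bit at or
   above the lowest undefined input bit is treated as undefined. */
shadow32 sh_add(shadow32 a, shadow32 b) {
    uint32_t u = a.undef | b.undef;
    shadow32 r = { a.val + b.val, u | (0u - u) };
    return r;
}

/* A use that decides control flow: check, report once, then mark the
   value as defined so the same garbage does not cause a chain of reports. */
int sh_branch_if_equal(shadow32 *a, uint32_t k) {
    if (a->undef != 0u) {
        errors_reported++;
        a->undef = 0u;
    }
    return a->val == k;
}
```

In this model, copying or adding an undefined value never increments
<code>errors_reported</code>; only the branch does, and only the first
time it sees a given undefined value.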
| 1370 | |
| 1371 | <p>This sounds overcomplicated. Why not just check all reads from |
| 1372 | memory, and complain if an undefined value is loaded into a CPU register? |
| 1373 | Well, that doesn't work well, because perfectly legitimate C programs routinely |
| 1374 | copy uninitialised values around in memory, and we don't want endless complaints |
| 1375 | about that. Here's the canonical example. Consider a struct |
| 1376 | like this: |
| 1377 | <pre> |
| 1378 | struct S { int x; char c; }; |
| 1379 | struct S s1, s2; |
| 1380 | s1.x = 42; |
| 1381 | s1.c = 'z'; |
| 1382 | s2 = s1; |
| 1383 | </pre> |
| 1384 | |
| 1385 | <p>The question to ask is: how large is <code>struct S</code>, in |
| 1386 | bytes? An int is 4 bytes and a char one byte, so perhaps a struct S |
| 1387 | occupies 5 bytes? Wrong. All (non-toy) compilers I know of will |
| 1388 | round the size of <code>struct S</code> up to a whole number of words, |
| 1389 | in this case 8 bytes. Not doing this forces compilers to generate |
| 1390 | truly appalling code for subscripting arrays of <code>struct |
| 1391 | S</code>'s. |
| 1392 | |
| 1393 | <p>So s1 occupies 8 bytes, yet only 5 of them will be initialised. |
| 1394 | For the assignment <code>s2 = s1</code>, gcc generates code to copy |
| 1395 | all 8 bytes wholesale into <code>s2</code> without regard for their |
| 1396 | meaning. If Valgrind simply checked values as they came out of |
| 1397 | memory, it would yelp every time a structure assignment like this |
| 1398 | happened. So the more complicated semantics described above is |
| 1399 | necessary. This allows gcc to copy <code>s1</code> into |
| 1400 | <code>s2</code> any way it likes, and a warning will only be emitted |
| 1401 | if the uninitialised values are later used. |
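<p>
The padding argument can be checked directly. The sketch below uses the
same <code>struct S</code>; the helper names
<code>s_padding_bytes</code> and <code>s_copy</code> are made up for
this example. On an x86 system with 4-byte <code>int</code> one would
expect a size of 8 and 3 bytes of padding, but the exact figures depend
on the compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>

struct S { int x; char c; };

/* Bytes of struct S that no member occupies (trailing padding here,
   since c sits immediately after x). */
size_t s_padding_bytes(void) {
    return sizeof(struct S) - (sizeof(int) + sizeof(char));
}

/* Structure assignment: the compiler is free to copy all
   sizeof(struct S) bytes, padding included, even though only x and c
   were ever written. */
struct S s_copy(void) {
    struct S s1, s2;
    s1.x = 42;
    s1.c = 'z';
    s2 = s1;
    return s2;
}
```

The padding bytes inside <code>s1</code> are never initialised, yet the
assignment is perfectly legitimate C, which is why a check at the point
of copying would produce spurious complaints.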
| 1402 | |
| 1403 | <p>One final twist to this story. The above scheme allows garbage to |
| 1404 | pass through the CPU's integer registers without complaint. It does |
| 1405 | this by giving the integer registers V tags, passing these around in |
| 1406 | the expected way. This is complicated and computationally expensive to |
| 1407 | do, but is necessary. Valgrind is more simplistic about |
| 1408 | floating-point loads and stores. In particular, V bits for data read |
| 1409 | as a result of floating-point loads are checked at the load |
| 1410 | instruction. So if your program uses the floating-point registers to |
| 1411 | do memory-to-memory copies, you will get complaints about |
| 1412 | uninitialised values. Fortunately, I have not yet encountered a |
| 1413 | program which (ab)uses the floating-point registers in this way. |
| 1414 | |
| 1415 | <a name="vaddress"></a> |
| 1416 | <h3>3.2 Valid-address (A) bits</h3> |
| 1417 | |
| 1418 | Notice that the previous section describes how the validity of values |
| 1419 | is established and maintained without having to say whether the |
| 1420 | program does or does not have the right to access any particular |
| 1421 | memory location. We now consider the latter issue. |
| 1422 | |
| 1423 | <p>As described above, every bit in memory or in the CPU has an |
| 1424 | associated valid-value (V) bit. In addition, all bytes in memory, but |
| 1425 | not in the CPU, have an associated valid-address (A) bit. This |
| 1426 | indicates whether or not the program can legitimately read or write |
| 1427 | that location. It does not give any indication of the validity of the |
| 1428 | data at that location -- that's the job of the V bits -- only whether |
| 1429 | or not the location may be accessed. |
| 1430 | |
| 1431 | <p>Every time your program reads or writes memory, Valgrind checks the |
| 1432 | A bits associated with the address. If any of them indicate an |
| 1433 | invalid address, an error is emitted. Note that the reads and writes |
| 1434 | themselves do not change the A bits, only consult them. |
| 1435 | |
| 1436 | <p>So how do the A bits get set/cleared? Like this: |
| 1437 | |
| 1438 | <ul> |
| 1439 | <li>When the program starts, all the global data areas are marked as |
| 1440 | accessible.</li><br> |
| 1441 | <p> |
| 1442 | |
| 1443 | <li>When the program does malloc/new, the A bits for exactly the |
| 1444 | area allocated, and not a byte more, are marked as accessible. |
| 1445 | Upon freeing the area the A bits are changed to indicate |
| 1446 | inaccessibility.</li><br> |
| 1447 | <p> |
| 1448 | |
| 1449 | <li>When the stack pointer register (%esp) moves up or down, A bits |
| 1450 | are set. The rule is that the area from %esp up to the base of |
| 1451 | the stack is marked as accessible, and below %esp is |
| 1452 | inaccessible. (If that sounds illogical, bear in mind that the |
| 1453 | stack grows down, not up, on almost all Unix systems, including |
| 1454 | GNU/Linux.) Tracking %esp like this has the useful side-effect |
| 1455 | that the section of stack used by a function for local variables |
| 1456 | etc is automatically marked accessible on function entry and |
| 1457 | inaccessible on exit.</li><br> |
| 1458 | <p> |
| 1459 | |
| 1460 | <li>When doing system calls, A bits are changed appropriately. For |
| 1461 | example, mmap() magically makes files appear in the process's |
| 1462 | address space, so the A bits must be updated if mmap() |
| 1463 | succeeds.</li><br> |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1464 | <p> |
| 1465 | |
| 1466 | <li>Optionally, your program can tell Valgrind about such changes |
| 1467 | explicitly, using the client request mechanism described above. |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1468 | </ul> |
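<p>
The division of labour described above (allocation events set A bits,
ordinary accesses only consult them) can be sketched with a toy shadow
map. Everything here is hypothetical: the tiny <code>arena</code>, the
one-flag-per-byte layout and the function names are chosen for the
example and say nothing about how Valgrind actually stores A bits.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 64

static unsigned char arena[ARENA_SIZE];
static unsigned char a_bit[ARENA_SIZE];   /* one flag per byte: 1 = accessible */
static int invalid_accesses = 0;

/* Allocation-like events are what change A bits. */
void set_range(size_t start, size_t len, unsigned char accessible) {
    for (size_t i = start; i < start + len && i < ARENA_SIZE; i++)
        a_bit[i] = accessible;
}

/* Reads and writes only consult the A bits; they never change them.
   This toy skips a faulting access; it merely counts the error. */
unsigned char checked_read(size_t addr) {
    if (addr >= ARENA_SIZE || !a_bit[addr]) {
        invalid_accesses++;
        return 0;
    }
    return arena[addr];
}

void checked_write(size_t addr, unsigned char v) {
    if (addr >= ARENA_SIZE || !a_bit[addr]) {
        invalid_accesses++;
        return;
    }
    arena[addr] = v;
}
```

Marking bytes 8 to 11 accessible models a 4-byte malloc: a write to
byte 12 is then counted as invalid, and after the range is marked
inaccessible again (a free), so is any access to byte 8.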
| 1469 | |
| 1470 | |
| 1471 | <a name="together"></a> |
| 1472 | <h3>3.3 Putting it all together</h3> |
| 1473 | Valgrind's checking machinery can be summarised as follows: |
| 1474 | |
| 1475 | <ul> |
| 1476 | <li>Each byte in memory has 8 associated V (valid-value) bits, |
| 1477 | saying whether or not the byte has a defined value, and a single |
| 1478 | A (valid-address) bit, saying whether or not the program |
| 1479 | currently has the right to read/write that address.</li><br> |
| 1480 | <p> |
| 1481 | |
| 1482 | <li>When memory is read or written, the relevant A bits are |
| 1483 | consulted. If they indicate an invalid address, Valgrind emits |
| 1484 | an Invalid read or Invalid write error.</li><br> |
| 1485 | <p> |
| 1486 | |
| 1487 | <li>When memory is read into the CPU's integer registers, the |
| 1488 | relevant V bits are fetched from memory and stored in the |
| 1489 | simulated CPU. They are not consulted.</li><br> |
| 1490 | <p> |
| 1491 | |
| 1492 | <li>When an integer register is written out to memory, the V bits |
| 1493 | for that register are written back to memory too.</li><br> |
| 1494 | <p> |
| 1495 | |
| 1496 | <li>When memory is read into the CPU's floating point registers, the |
| 1497 | relevant V bits are read from memory and they are immediately |
| 1498 | checked. If any are invalid, an uninitialised value error is |
| 1499 | emitted. This precludes using the floating-point registers to |
| 1500 | copy possibly-uninitialised memory, but simplifies Valgrind in |
| 1501 | that it does not have to track the validity status of the |
| 1502 | floating-point registers.</li><br> |
| 1503 | <p> |
| 1504 | |
| 1505 | <li>As a result, when a floating-point register is written to |
| 1506 | memory, the associated V bits are set to indicate a valid |
| 1507 | value.</li><br> |
| 1508 | <p> |
| 1509 | |
| 1510 | <li>When values in integer CPU registers are used to generate a |
| 1511 | memory address, or to determine the outcome of a conditional |
| 1512 | branch, the V bits for those values are checked, and an error |
| 1513 | emitted if any of them are undefined.</li><br> |
| 1514 | <p> |
| 1515 | |
| 1516 | <li>When values in integer CPU registers are used for any other |
| 1517 | purpose, Valgrind computes the V bits for the result, but does |
| 1518 | not check them.</li><br> |
| 1519 | <p> |
| 1520 | |
| 1521 | <li>Once the V bits for a value in the CPU have been checked, they |
| 1522 | are then set to indicate validity. This avoids long chains of |
| 1523 | errors.</li><br> |
| 1524 | <p> |
| 1525 | |
| 1526 | <li>When values are loaded from memory, Valgrind checks the A bits |
| 1527 | for that location and issues an illegal-address warning if |
| 1528 | needed. In that case, the V bits loaded are forced to indicate |
| 1529 | Valid, despite the location being invalid. |
| 1530 | <p> |
| 1531 | This apparently strange choice reduces the amount of confusing |
| 1532 | information presented to the user. It avoids the |
| 1533 | unpleasant phenomenon in which memory is read from a place which |
| 1534 | is both unaddressable and contains invalid values, and, as a |
| 1535 | result, you get not only an invalid-address (read/write) error, |
| 1536 | but also a potentially large set of uninitialised-value errors, |
| 1537 | one for every time the value is used. |
| 1538 | <p> |
| 1539 | There is a hazy boundary case to do with multi-byte loads from |
| 1540 | addresses which are partially valid and partially invalid. See |
| 1541 | the description of the flag <code>--partial-loads-ok</code> for details. |
| 1542 | </li><br> |
| 1543 | </ul> |
| 1544 | |
| 1545 | Valgrind intercepts calls to malloc, calloc, realloc, valloc, |
| 1546 | memalign, free, new and delete. The behaviour you get is: |
| 1547 | |
| 1548 | <ul> |
| 1549 | |
| 1550 | <li>malloc/new: the returned memory is marked as addressable but not |
| 1551 | having valid values. This means you have to write on it before |
| 1552 | you can read it.</li><br> |
| 1553 | <p> |
| 1554 | |
| 1555 | <li>calloc: returned memory is marked both addressable and valid, |
| 1556 | since calloc() clears the area to zero.</li><br> |
| 1557 | <p> |
| 1558 | |
| 1559 | <li>realloc: if the new size is larger than the old, the new section |
| 1560 | is addressable but invalid, as with malloc.</li><br> |
| 1561 | <p> |
| 1562 | |
| 1563 | <li>If the new size is smaller, the dropped-off section is marked as |
| 1564 | unaddressable. You may only pass to realloc a pointer |
| 1565 | previously issued to you by malloc/calloc/new/realloc.</li><br> |
| 1566 | <p> |
| 1567 | |
| 1568 | <li>free/delete: you may only pass to free a pointer previously |
| 1569 | issued to you by malloc/calloc/new/realloc, or the value |
| 1570 | NULL. Otherwise, Valgrind complains. If the pointer is indeed |
| 1571 | valid, Valgrind marks the entire area it points at as |
| 1572 | unaddressable, and places the block in the freed-blocks-queue. |
| 1573 | The aim is to defer as long as possible reallocation of this |
| 1574 | block. Until that happens, all attempts to access it will |
| 1575 | elicit an invalid-address error, as you would hope.</li><br> |
| 1576 | </ul> |
| 1577 | |
| 1578 | |
| 1579 | |
| 1580 | <a name="signals"></a> |
| 1581 | <h3>3.4 Signals</h3> |
| 1582 | |
| 1583 | Valgrind provides suitable handling of signals, so, provided you stick |
| 1584 | to POSIX stuff, you should be ok. Basic sigaction() and sigprocmask() |
| 1585 | are handled. Signal handlers may return in the normal way or do |
| 1586 | longjmp(); both should work ok. As specified by POSIX, a signal is |
| 1587 | blocked in its own handler. Default actions for signals should work |
| 1588 | as before. Etc, etc. |
| 1589 | |
| 1590 | <p>Under the hood, dealing with signals is a real pain, and Valgrind's |
| 1591 | simulation leaves much to be desired. If your program does |
| 1592 | way-strange stuff with signals, bad things may happen. If so, let me |
| 1593 | know. I don't promise to fix it, but I'd at least like to be aware of |
| 1594 | it. |
| 1595 | |
| 1596 | |
| 1597 | <a name="leaks"></a> |
| 1598 | <h3>3.5 Memory leak detection</h3> |
| 1599 | |
| 1600 | Valgrind keeps track of all memory blocks issued in response to calls |
| 1601 | to malloc/calloc/realloc/new. So when the program exits, it knows |
| 1602 | which blocks are still outstanding -- have not been returned, in other |
| 1603 | words. Ideally, you want your program to have no blocks still in use |
| 1604 | at exit. But many programs do. |
| 1605 | |
| 1606 | <p>For each such block, Valgrind scans the entire address space of the |
| 1607 | process, looking for pointers to the block. One of three situations |
| 1608 | may result: |
| 1609 | |
| 1610 | <ul> |
| 1611 | <li>A pointer to the start of the block is found. This usually |
| 1612 | indicates programming sloppiness; since the block is still |
| 1613 | pointed at, the programmer could, at least in principle, have freed |
| 1614 | it before program exit.</li><br> |
| 1615 | <p> |
| 1616 | |
| 1617 | <li>A pointer to the interior of the block is found. The pointer |
| 1618 | might originally have pointed to the start and have been moved |
| 1619 | along, or it might be entirely unrelated. Valgrind deems such a |
| 1620 | block as "dubious", that is, possibly leaked, |
| 1621 | because it's unclear whether or |
| 1622 | not a pointer to it still exists.</li><br> |
| 1623 | <p> |
| 1624 | |
| 1625 | <li>The worst outcome is that no pointer to the block can be found. |
| 1626 | The block is classified as "leaked", because the |
| 1627 | programmer could not possibly have freed it at program exit, |
| 1628 | since no pointer to it exists. This might be a symptom of |
| 1629 | having lost the pointer at some earlier point in the |
| 1630 | program.</li> |
| 1631 | </ul> |
| 1632 | |
| 1633 | Valgrind reports summaries about leaked and dubious blocks. |
| 1634 | For each such block, it will also tell you where the block was |
| 1635 | allocated. This should help you figure out why the pointer to it has |
| 1636 | been lost. In general, you should attempt to ensure your programs do |
| 1637 | not have any leaked or dubious blocks at exit. |
| 1638 | |
| 1639 | <p>The precise area of memory in which Valgrind searches for pointers |
| 1640 | is: all naturally-aligned 4-byte words for which all A bits indicate |
| 1641 | addressability and all V bits indicate that the stored value is |
| 1642 | actually valid. |
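<p>
The three-way classification can be sketched as follows. This is a
simplified illustration, not the detector's real code: the
<code>words</code> array stands in for the aligned, addressable, valid
words found in the address space, and the enum and function names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BLOCK_REACHABLE, BLOCK_DUBIOUS, BLOCK_LEAKED } block_status;

/* Scan nwords candidate words for pointers into the block
   [start, start + size). A pointer to the start wins outright; an
   interior pointer alone makes the block dubious; no pointer at all
   means the block is leaked. */
block_status classify_block(const uintptr_t *words, size_t nwords,
                            uintptr_t start, size_t size) {
    block_status status = BLOCK_LEAKED;
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == start)
            return BLOCK_REACHABLE;
        if (words[i] > start && words[i] < start + size)
            status = BLOCK_DUBIOUS;
    }
    return status;
}
```

Note that the scan cannot tell a genuine pointer from an integer that
happens to hold the same bit pattern, which is one reason the interior
case is reported only as "dubious".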
| 1643 | |
| 1644 | <p><hr width="100%"> |
| 1645 | |
| 1646 | |
| 1647 | <a name="limits"></a> |
| 1648 | <h2>4 Limitations</h2> |
| 1649 | |
| 1650 | The following list of limitations seems depressingly long. However, |
| 1651 | most programs actually work fine. |
| 1652 | |
| 1653 | <p>Valgrind will run x86-GNU/Linux ELF dynamically linked binaries, on |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1654 | a kernel 2.2.X or 2.4.X system, subject to the following constraints: |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1655 | |
| 1656 | <ul> |
| 1657 | <li>No MMX, SSE, SSE2, 3DNow instructions. If the translator |
| 1658 | encounters these, Valgrind will simply give up. It may be |
| 1659 | possible to add support for them at a later time. Intel added a |
| 1660 | few instructions such as "cmov" to the integer instruction set |
| 1661 | on Pentium and later processors, and these are supported. |
| 1662 | Nevertheless it's safest to think of Valgrind as implementing |
| 1663 | the 486 instruction set.</li><br> |
| 1664 | <p> |
| 1665 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1666 | <li>Pthreads support is improving, but there are still significant |
| 1667 | limitations in that department. See the section above on |
| 1668 | Pthreads. Note that your program must be dynamically linked |
| 1669 | against <code>libpthread.so</code>, so that Valgrind can |
| 1670 | substitute its own implementation at program startup time. If |
| 1671 | you're statically linked against it, things will fail |
| 1672 | badly.</li><br> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1673 | <p> |
| 1674 | |
| 1675 | <li>Valgrind assumes that the floating point registers are not used |
| 1676 | as intermediaries in memory-to-memory copies, so it immediately |
| 1677 | checks V bits in floating-point loads/stores. If you want to |
| 1678 | write code which copies around possibly-uninitialised values, |
| 1679 | you must ensure these travel through the integer registers, not |
| 1680 | the FPU.</li><br> |
| 1681 | <p> |
| 1682 | |
| 1683 | <li>If your program does its own memory management, rather than |
| 1684 | using malloc/new/free/delete, it should still work, but |
| 1685 | Valgrind's error checking won't be so effective.</li><br> |
| 1686 | <p> |
| 1687 | |
| 1688 | <li>Valgrind's signal simulation is not as robust as it could be. |
| 1689 | Basic POSIX-compliant sigaction and sigprocmask functionality is |
| 1690 | supplied, but it's conceivable that things could go badly awry |
| 1691 | if you do weird things with signals. Workaround: don't. |
| 1692 | Programs that do non-POSIX signal tricks are in any case |
| 1693 | inherently unportable, so should be avoided if |
| 1694 | possible.</li><br> |
| 1695 | <p> |
| 1696 | |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1697 | <li>Programs which try to handle signals on |
| 1698 | an alternate stack (sigaltstack) are not supported, although |
| 1699 | they could be, with a bit of effort.</li><br> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1700 | <p> |
| 1701 | |
| 1702 | <li>Programs which switch stacks are not well handled. Valgrind |
| 1703 | does have support for this, but I don't have great faith in it. |
| 1704 | It's difficult -- there's no cast-iron way to decide whether a |
| 1705 | large change in %esp is as a result of the program switching |
| 1706 | stacks, or merely allocating a large object temporarily on the |
| 1707 | current stack -- yet Valgrind needs to handle the two situations |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1708 | differently. 1 May 02: this probably interacts badly with the |
| 1709 | new pthread support. I haven't checked properly.</li><br> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1710 | <p> |
| 1711 | |
| 1712 | <li>x86 instructions, and system calls, have been implemented on |
| 1713 | demand. So it's possible, although unlikely, that a program |
| 1714 | will fall over with a message to that effect. If this happens, |
| 1715 | please mail me ALL the details printed out, so I can try and |
| 1716 | implement the missing feature.</li><br> |
| 1717 | <p> |
| 1718 | |
| 1719 | <li>x86 floating point works correctly, but floating-point code may |
| 1720 | run even more slowly than integer code, due to my simplistic |
| 1721 | approach to FPU emulation.</li><br> |
| 1722 | <p> |
| 1723 | |
| 1724 | <li>You can't Valgrind-ize statically linked binaries. Valgrind |
| 1725 | relies on the dynamic-link mechanism to gain control at |
| 1726 | startup.</li><br> |
| 1727 | <p> |
| 1728 | |
| 1729 | <li>Memory consumption of your program is majorly increased whilst |
| 1730 | running under Valgrind. This is due to the large amount of |
| 1731 | administrative information maintained behind the scenes. Another |
| 1732 | cause is that Valgrind dynamically translates the original |
| 1733 | executable and never throws any translation away, except in |
| 1734 | those rare cases where self-modifying code is detected. |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1735 | Translated, instrumented code is 12-14 times larger than the |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1736 | original (!) so you can easily end up with 15+ MB of |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1737 | translations when running (eg) a web browser. |
| 1738 | </li> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1739 | </ul> |
| 1740 | |
| 1741 | |
| 1742 | Programs which are known not to work are: |
| 1743 | |
| 1744 | <ul> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1745 | <li>emacs starts up but immediately concludes it is out of memory |
| 1746 | and aborts. Emacs has its own memory-management scheme, but I |
| 1747 | don't understand why this should interact so badly with |
sewardj | ab1d9d1 | 2002-05-01 12:38:06 +0000 | [diff] [blame] | 1748 | Valgrind. Emacs works fine if you build it to use the standard |
| 1749 | malloc/free routines.</li><br> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1750 | <p> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 1751 | </ul> |
| 1752 | |
| 1753 | |
| 1754 | <p><hr width="100%"> |
| 1755 | |
| 1756 | |
| 1757 | <a name="howitworks"></a> |
| 1758 | <h2>5 How it works -- a rough overview</h2> |
| 1759 | Some gory details, for those with a passion for gory details. You |
| 1760 | don't need to read this section if all you want to do is use Valgrind. |
| 1761 | |
| 1762 | <a name="startb"></a> |
| 1763 | <h3>5.1 Getting started</h3> |
| 1764 | |
| 1765 | Valgrind is compiled into a shared object, valgrind.so. The shell |
| 1766 | script valgrind sets the LD_PRELOAD environment variable to point to |
| 1767 | valgrind.so. This causes the .so to be loaded as an extra library to |
| 1768 | any subsequently executed dynamically-linked ELF binary, viz, the |
| 1769 | program you want to debug. |
| 1770 | |
| 1771 | <p>The dynamic linker allows each .so in the process image to have an |
| 1772 | initialisation function which is run before main(). It also allows |
| 1773 | each .so to have a finalisation function run after main() exits. |
| 1774 | |
| 1775 | <p>When valgrind.so's initialisation function is called by the dynamic |
| 1776 | linker, the synthetic CPU starts up. The real CPU remains locked |
| 1777 | in valgrind.so for the entire rest of the program, but the synthetic |
| 1778 | CPU returns from the initialisation function. Startup of the program |
| 1779 | now continues as usual -- the dynamic linker calls all the other .so's |
| 1780 | initialisation routines, and eventually runs main(). This all runs on |
| 1781 | the synthetic CPU, not the real one, but the client program cannot |
| 1782 | tell the difference. |
| 1783 | |
| 1784 | <p>Eventually main() exits, so the synthetic CPU calls valgrind.so's |
| 1785 | finalisation function. Valgrind detects this, and uses it as its cue |
| 1786 | to exit. It prints summaries of all errors detected, possibly checks |
| 1787 | for memory leaks, and then exits the finalisation routine, but now on |
| 1788 | the real CPU. The synthetic CPU has now lost control -- permanently |
| 1789 | -- so the program exits back to the OS on the real CPU, just as it |
| 1790 | would have done anyway. |
| 1791 | |
| 1792 | <p>On entry, Valgrind switches stacks, so it runs on its own stack. |
| 1793 | On exit, it switches back. This means that the client program |
| 1794 | continues to run on its own stack, so we can switch back and forth |
| 1795 | between running it on the simulated and real CPUs without difficulty. |
| 1796 | This was an important design decision, because it makes it easy (well, |
| 1797 | significantly less difficult) to debug the synthetic CPU. |
| 1798 | |
| 1799 | |
| 1800 | <a name="engine"></a> |
| 1801 | <h3>5.2 The translation/instrumentation engine</h3> |
| 1802 | |
| 1803 | Valgrind does not directly run any of the original program's code. Only |
| 1804 | instrumented translations are run. Valgrind maintains a translation |
| 1805 | table, which allows it to find the translation quickly for any branch |
| 1806 | target (code address). If no translation has yet been made, the |
| 1807 | translator - a just-in-time translator - is summoned. This makes an |
| 1808 | instrumented translation, which is added to the collection of |
| 1809 | translations. Subsequent jumps to that address will use this |
| 1810 | translation. |
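<p>The lookup-or-translate behaviour described above can be sketched roughly as follows. This is only an illustrative model: the class name <code>TranslationTable</code>, the <code>misses</code> counter and the string standing in for translated code are assumptions made for the example, not names or structures taken from Valgrind's sources.

```python
# Hypothetical model of a translation table: it maps a code address to its
# instrumented translation, and translates lazily on the first jump there.

class TranslationTable:
    def __init__(self, translate):
        self._translate = translate      # stand-in for the JIT translator
        self._table = {}                 # original address -> translation
        self.misses = 0                  # how often the translator was summoned

    def lookup(self, addr):
        """Return the translation for addr, making one on a miss."""
        translation = self._table.get(addr)
        if translation is None:
            self.misses += 1
            translation = self._translate(addr)
            self._table[addr] = translation
        return translation


table = TranslationTable(lambda addr: "translation of block at 0x%x" % addr)
first = table.lookup(0x8048724)    # miss: the translator is summoned
second = table.lookup(0x8048724)   # hit: the stored translation is reused
```

<p>The second lookup returns the very same object as the first, which is the point: each branch target is translated at most once while its translation is kept.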
| 1811 | |
| 1812 | <p>Valgrind can optionally check writes made by the application, to |
| 1813 | see if they are writing an address contained within code which has |
| 1814 | been translated. Such a write invalidates translations of code |
| 1815 | bracketing the written address. Valgrind will discard the relevant |
| 1816 | translations, which causes them to be re-made, if they are needed |
| 1817 | again, reflecting the new updated data stored there. In this way, |
| 1818 | self-modifying code is supported. In practice I have not found any |
| 1819 | Linux applications which use self-modifying code. |
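<p>A minimal sketch of the invalidation idea, assuming each translation records the address range of the original code it was made from. The dictionary layout and the function name <code>invalidate_on_write</code> are hypothetical, chosen only for illustration.

```python
# Hypothetical model: each translation remembers the address range of the
# original code it was made from; a write into that range discards it.

translations = {
    0x1000: (0x1000, 0x1010),   # start address -> (start, end) of original code
    0x1010: (0x1010, 0x1024),
    0x2000: (0x2000, 0x2008),
}

def invalidate_on_write(translations, write_addr, size):
    """Discard translations whose source bytes overlap the written bytes."""
    write_end = write_addr + size
    stale = [start for start, (lo, hi) in translations.items()
             if lo < write_end and write_addr < hi]
    for start in stale:
        del translations[start]
    return stale

# A 4-byte write at 0x100e straddles the first two blocks, so both are dropped.
discarded = invalidate_on_write(translations, 0x100e, 4)
```

<p>The discarded blocks would simply be re-translated the next time control reaches them, picking up the newly written bytes.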
| 1820 | |
| 1821 | <p>The JITter translates basic blocks -- blocks of straight-line code |
| 1822 | -- as single entities. To minimise the considerable difficulties of |
| 1823 | dealing with the x86 instruction set, x86 instructions are first |
| 1824 | translated to a RISC-like intermediate code, similar to sparc code, |
| 1825 | but with an infinite number of virtual integer registers. Initially |
| 1826 | each insn is translated separately, and there is no attempt at |
| 1827 | instrumentation. |
| 1828 | |
| 1829 | <p>The intermediate code is improved, mostly so as to try and cache |
| 1830 | the simulated machine's registers in the real machine's registers over |
| 1831 | several simulated instructions. This is often very effective. Also, |
| 1832 | we try to remove redundant updates of the simulated machine's |
| 1833 | condition-code register. |
| 1834 | |
| 1835 | <p>The intermediate code is then instrumented, giving more |
| 1836 | intermediate code. There are a few extra intermediate-code operations |
| 1837 | to support instrumentation; it is all refreshingly simple. After |
| 1838 | instrumentation there is a cleanup pass to remove redundant value |
| 1839 | checks. |
| 1840 | |
| 1841 | <p>This gives instrumented intermediate code which mentions arbitrary |
| 1842 | numbers of virtual registers. A linear-scan register allocator is |
| 1843 | used to assign real registers and possibly generate spill code. All |
| 1844 | of this is still phrased in terms of the intermediate code. This |
| 1845 | machinery is inspired by the work of Reuben Thomas (MITE). |
| 1846 | |
| 1847 | <p>Then, and only then, is the final x86 code emitted. The |
| 1848 | intermediate code is carefully designed so that x86 code can be |
| 1849 | generated from it without need for spare registers or other |
| 1850 | inconveniences. |
| 1851 | |
| 1852 | <p>The translations are managed using a traditional LRU-based caching |
| 1853 | scheme. The translation cache has a default size of about 14MB. |
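<p>For readers unfamiliar with LRU caching, here is a generic sketch of a size-bounded LRU cache. It is not Valgrind's implementation: the class name <code>LruTranslationCache</code>, the byte-count bookkeeping and the tiny capacity are all assumptions made to keep the example short.

```python
from collections import OrderedDict

class LruTranslationCache:
    """Hypothetical size-bounded cache; evicts least recently used entries."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()    # address -> translated code bytes

    def get(self, addr):
        code = self._entries.get(addr)
        if code is not None:
            self._entries.move_to_end(addr)   # mark as most recently used
        return code

    def put(self, addr, code):
        if addr in self._entries:
            self.used_bytes -= len(self._entries.pop(addr))
        self._entries[addr] = code
        self.used_bytes += len(code)
        # Evict from the least recently used end, but never the new entry.
        while self.used_bytes > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = LruTranslationCache(capacity_bytes=100)
cache.put(0x1000, b"\x90" * 40)
cache.put(0x2000, b"\x90" * 40)
cache.get(0x1000)                  # 0x1000 is now more recent than 0x2000
cache.put(0x3000, b"\x90" * 40)    # over capacity: 0x2000 is evicted
```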
| 1854 | |
| 1855 | <a name="track"></a> |
| 1856 | |
| 1857 | <h3>5.3 Tracking the status of memory</h3> Each byte in the |
| 1858 | process' address space has nine bits associated with it: one A bit and |
| 1859 | eight V bits. The A and V bits for each byte are stored using a |
| 1860 | sparse array, which flexibly and efficiently covers arbitrary parts of |
| 1861 | the 32-bit address space without imposing significant space or |
| 1862 | performance overheads for the parts of the address space never |
| 1863 | visited. The scheme used, and speedup hacks, are described in detail |
| 1864 | at the top of the source file vg_memory.c, so you should read that for |
| 1865 | the gory details. |
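<p>To give a feel for what a sparse map of A and V bits might look like, here is a simplified two-level sketch. It is explicitly <b>not</b> the scheme in vg_memory.c: the 64KB chunk size, the class name <code>ShadowMemory</code>, the use of a whole byte to hold each A bit, and the convention that a set V bit means "invalid" are all assumptions of the example.

```python
# Hypothetical two-level map: the 32-bit address space is split into
# 64KB chunks, and a chunk's tables exist only once it has been touched.
# This is NOT claimed to be the layout used in vg_memory.c.

CHUNK_BITS = 16
CHUNK_SIZE = 1 << CHUNK_BITS

class ShadowMemory:
    def __init__(self):
        self._chunks = {}    # chunk number -> (A table, V table)

    def _chunk(self, addr):
        number = addr >> CHUNK_BITS
        if number not in self._chunks:
            # Assumed default: not addressable (A=0), all V bits invalid (0xFF).
            self._chunks[number] = (bytearray(CHUNK_SIZE),
                                    bytearray(b"\xff" * CHUNK_SIZE))
        return self._chunks[number]

    def set_byte(self, addr, addressable, vbits):
        a_table, v_table = self._chunk(addr)
        offset = addr & (CHUNK_SIZE - 1)
        a_table[offset] = 1 if addressable else 0
        v_table[offset] = vbits & 0xFF

    def get_byte(self, addr):
        number = addr >> CHUNK_BITS
        if number not in self._chunks:
            return (False, 0xFF)         # never visited: no storage spent
        a_table, v_table = self._chunks[number]
        offset = addr & (CHUNK_SIZE - 1)
        return (bool(a_table[offset]), v_table[offset])


shadow = ShadowMemory()
shadow.set_byte(0xBFFFF74C, True, 0x00)     # one stack byte becomes valid
stack_state = shadow.get_byte(0xBFFFF74C)
code_state = shadow.get_byte(0x08048724)    # untouched chunk, default answer
```

<p>Only one chunk is ever allocated here, however far apart the two addresses are; that is the property the text means by covering the address space without paying for the parts never visited.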
| 1866 | |
| 1867 | <a name="sys_calls"></a> |
| 1868 | |
| 1869 | <h3>5.4 System calls</h3> |
| 1870 | All system calls are intercepted. The memory status map is consulted |
| 1871 | before and updated after each call. It's all rather tiresome. See |
| 1872 | vg_syscall_mem.c for details. |
| 1873 | |
| 1874 | <a name="sys_signals"></a> |
| 1875 | |
| 1876 | <h3>5.5 Signals</h3> |
| 1877 | All system calls to sigaction() and sigprocmask() are intercepted. If |
| 1878 | the client program is trying to set a signal handler, Valgrind makes a |
| 1879 | note of the handler address and which signal it is for. Valgrind then |
| 1880 | arranges for the same signal to be delivered to its own handler. |
| 1881 | |
| 1882 | <p>When such a signal arrives, Valgrind's own handler catches it, and |
| 1883 | notes the fact. At a convenient safe point in execution, Valgrind |
| 1884 | builds a signal delivery frame on the client's stack and runs its |
| 1885 | handler. If the handler longjmp()s, there is nothing more to be said. |
| 1886 | If the handler returns, Valgrind notices this, zaps the delivery |
| 1887 | frame, and carries on where it left off before delivering the signal. |
| 1888 | |
| 1889 | <p>The purpose of this nonsense is that setting signal handlers |
| 1890 | essentially amounts to giving callback addresses to the Linux kernel. |
| 1891 | We can't allow this to happen, because if it did, signal handlers |
| 1892 | would run on the real CPU, not the simulated one. This means the |
| 1893 | checking machinery would not operate during the handler run, and, |
| 1894 | worse, memory permissions maps would not be updated, which could cause |
| 1895 | spurious error reports once the handler had returned. |
| 1896 | |
| 1897 | <p>An even worse thing would happen if the signal handler longjmp'd |
| 1898 | rather than returned: Valgrind would completely lose control of the |
| 1899 | client program. |
| 1900 | |
| 1901 | <p>Upshot: we can't allow the client to install signal handlers |
| 1902 | directly. Instead, Valgrind must catch, on behalf of the client, any |
| 1903 | signal the client asks to catch, and must deliver it to the client on |
| 1904 | the simulated CPU, not the real one. This involves considerable |
| 1905 | gruesome fakery; see vg_signals.c for details. |
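<p>The catch-now, deliver-later arrangement can be modelled in a few lines. This is a toy sketch, not the code in vg_signals.c: the class <code>SignalRouter</code> and its method names are hypothetical, and the notion that a "safe point" falls between basic blocks is an assumption of the example.

```python
# Hypothetical model of deferred signal delivery on a simulated CPU.

class SignalRouter:
    def __init__(self):
        self.client_handlers = {}   # signal number -> client's handler
        self.pending = []           # signals caught but not yet delivered

    def client_sigaction(self, signum, handler):
        """Intercepted sigaction(): record the handler instead of
        passing its address to the kernel."""
        self.client_handlers[signum] = handler

    def real_handler(self, signum):
        """Runs on the real CPU when the kernel delivers a signal;
        it only notes the fact."""
        if signum in self.client_handlers:
            self.pending.append(signum)

    def deliver_at_safe_point(self):
        """Called at a safe point (assumed: between basic blocks) to run
        the client's handlers under simulation, in arrival order."""
        while self.pending:
            signum = self.pending.pop(0)
            self.client_handlers[signum](signum)


log = []
router = SignalRouter()
router.client_sigaction(10, lambda signum: log.append(("handled", signum)))
router.real_handler(10)            # signal arrives; nothing runs yet
log.append("still in client code")
router.deliver_at_safe_point()     # now the client's handler runs
```

<p>The log shows the client's handler running only after the safe point is reached, never at the moment the kernel delivered the signal.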
| 1906 | <p> |
| 1907 | |
| 1908 | <hr width="100%"> |
| 1909 | |
| 1910 | <a name="example"></a> |
| 1911 | <h2>6 Example</h2> |
| 1912 | This is the log for a run of a small program. The program is in fact |
| 1913 | correct, and the reported error is the result of a potentially serious |
| 1914 | code generation bug in GNU g++ (snapshot 20010527). |
| 1915 | <pre> |
| 1916 | sewardj@phoenix:~/newmat10$ |
| 1917 | ~/Valgrind-6/valgrind -v ./bogon |
| 1918 | ==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1. |
| 1919 | ==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward. |
| 1920 | ==25832== Startup, with flags: |
| 1921 | ==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp |
| 1922 | ==25832== reading syms from /lib/ld-linux.so.2 |
| 1923 | ==25832== reading syms from /lib/libc.so.6 |
| 1924 | ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0 |
| 1925 | ==25832== reading syms from /lib/libm.so.6 |
| 1926 | ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3 |
| 1927 | ==25832== reading syms from /home/sewardj/Valgrind/valgrind.so |
| 1928 | ==25832== reading syms from /proc/self/exe |
| 1929 | ==25832== loaded 5950 symbols, 142333 line number locations |
| 1930 | ==25832== |
| 1931 | ==25832== Invalid read of size 4 |
| 1932 | ==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45) |
| 1933 | ==25832== by 0x80487AF: main (bogon.cpp:66) |
| 1934 | ==25832== by 0x40371E5E: __libc_start_main (libc-start.c:129) |
| 1935 | ==25832== by 0x80485D1: (within /home/sewardj/newmat10/bogon) |
| 1936 | ==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd |
| 1937 | ==25832== |
| 1938 | ==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) |
| 1939 | ==25832== malloc/free: in use at exit: 0 bytes in 0 blocks. |
| 1940 | ==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated. |
| 1941 | ==25832== For a detailed leak analysis, rerun with: --leak-check=yes |
| 1942 | ==25832== |
| 1943 | ==25832== exiting, did 1881 basic blocks, 0 misses. |
| 1944 | ==25832== 223 translations, 3626 bytes in, 56801 bytes out. |
| 1945 | </pre> |
| 1946 | <p>The GCC folks fixed this about a week before gcc-3.0 shipped. |
| 1947 | <hr width="100%"> |
| 1948 | <p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 1949 | |
| 1950 | |
| 1951 | |
| 1952 | <a name="cache"></a> |
| 1953 | <h2>7 Cache profiling</h2> |
| 1954 | As well as memory debugging, Valgrind also allows you to do cache simulations |
| 1955 | and annotate your source line-by-line with the number of cache misses. In |
| 1956 | particular, it records: |
| 1957 | <ul> |
| 1958 | <li>L1 instruction cache reads and misses; |
| 1959 | <li>L1 data cache reads and read misses, writes and write misses; |
| 1960 | <li>L2 unified cache reads and read misses, writes and write misses. |
| 1961 | </ul> |
| 1962 | On a modern x86 machine, an L1 miss will typically cost around 10 cycles, |
| 1963 | and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be |
njn | 7cfd572 | 2002-05-03 17:51:10 +0000 | [diff] [blame] | 1964 | very useful for improving the performance of your program.<p> |
| 1965 | |
| 1966 | Also, since one instruction cache read is performed per instruction executed, |
| 1967 | you can find out how many instructions are executed per line, which can be |
| 1968 | useful for optimisation and test coverage.<p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 1969 | |
| 1970 | Please note that this is an experimental feature. Any feedback, bug-fixes, |
| 1971 | suggestions, etc, welcome. |
| 1972 | |
| 1973 | |
| 1974 | <h3>7.1 Overview</h3> |
| 1975 | First off, as for normal Valgrind use, you probably want to turn on debugging |
| 1976 | info (the <code>-g</code> flag). But by contrast with normal Valgrind use, you |
| 1977 | probably <b>do</b> want to turn optimisation on, since you should profile your |
| 1978 | program as it will be normally run. |
| 1979 | |
| 1980 | The three steps are: |
| 1981 | <ol> |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 1982 | <li>Generate a cache simulator for your machine's cache |
| 1983 | configuration with the supplied <code>vg_cachegen</code> |
| 1984 | program, and recompile Valgrind with <code>make install</code>. |
| 1985 | <p> |
| 1986 | The default settings are for an AMD Athlon, and you will get |
| 1987 | useful information with the defaults, so you can skip this step |
| 1988 | if you want. Nevertheless, for accurate cache profiles you will |
| 1989 | need to use <code>vg_cachegen</code> to customise |
| 1990 | <code>cachegrind</code> for your system. |
| 1991 | <p> |
| 1992 | This step only needs to be done once, unless you are interested |
| 1993 | in simulating different cache configurations (eg. first |
| 1994 | concentrating on instruction cache misses, then on data cache |
| 1995 | misses). |
| 1996 | </li> |
| 1997 | <p> |
| 1998 | <li>Run your program with <code>cachegrind</code> in front of the |
| 1999 | normal command line invocation. When the program finishes, |
| 2000 | Valgrind will print summary cache statistics. It also collects |
| 2001 | line-by-line information in a file <code>cachegrind.out</code>. |
| 2002 | <p> |
| 2003 | This step should be done every time you want to collect |
| 2004 | information about a new program, a changed program, or about the |
| 2005 | same program with different input. |
| 2006 | </li> |
| 2007 | <p> |
| 2008 | <li>Generate a function-by-function summary, and possibly annotate |
| 2009 | source files with <code>vg_annotate</code>. Source files to annotate can be |
| 2010 | specified manually on the command line, or |
| 2011 | "interesting" source files can be annotated automatically with |
| 2012 | the <code>--auto=yes</code> option. You can annotate C/C++ |
| 2013 | files or assembly language files equally easily. |
| 2014 | <p> |
| 2015 | This step can be performed as many times as you like for each |
| 2016 | Step 2. You may want to do multiple annotations showing |
| 2017 | different information each time.</li><p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2018 | </ol> |
| 2019 | |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2020 | The steps are described in detail in the following sections.<p> |
| 2021 | |
| 2022 | |
| 2023 | <a name="generate"></a> |
| 2024 | <h3>7.3 Generating a cache simulator</h3> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2025 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2026 | Although Valgrind comes with a pre-generated cache simulator, it most |
| 2027 | likely won't match the cache configuration of your machine, so you |
| 2028 | should generate a new simulator.<p> |
| 2029 | |
| 2030 | You need to generate three files, one for each of the I1, D1 and L2 |
| 2031 | caches. For each cache, you need to know the: |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2032 | <ul> |
| 2033 | <li>Cache size (bytes); |
| 2034 | <li>Line size (bytes); |
| 2035 | <li>Associativity. |
| 2036 | </ul> |
| 2037 | |
| 2038 | vg_cachegen takes three options: |
| 2039 | <ul> |
| 2040 | <li><code>--I1=size,line_size,associativity</code> |
| 2041 | <li><code>--D1=size,line_size,associativity</code> |
| 2042 | <li><code>--L2=size,line_size,associativity</code> |
| 2043 | </ul> |
| 2044 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2045 | You can specify one, two or all three caches per invocation of |
| 2046 | vg_cachegen. It checks that the configuration is sensible before |
| 2047 | generating the simulators; to see the allowed values, run |
| 2048 | <code>vg_cachegen -h</code>.<p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2049 | |
| 2050 | An example invocation would be: |
| 2051 | |
| 2052 | <blockquote><code> |
| 2053 | vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8 |
| 2054 | </code></blockquote> |
| 2055 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2056 | This simulates a machine with a 128KB split L1 2-way associative |
| 2057 | cache, and a 256KB unified 8-way associative L2 cache. Both caches |
| 2058 | have 64B lines.<p> |
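<p>The arithmetic behind that description can be checked with a short script. The dictionary <code>config</code> and the helper <code>describe</code> are made up for this example; only the three triples come from the invocation above.

```python
# Cache parameters from the example invocation: (size, line size, associativity).
config = {
    "I1": (65536, 64, 2),
    "D1": (65536, 64, 2),
    "L2": (262144, 64, 8),
}

def describe(name):
    """Summarise one cache: size in KB, number of lines, line size, ways."""
    size, line_size, assoc = config[name]
    return "%s: %dKB, %d lines of %dB, %d-way" % (
        name, size // 1024, size // line_size, line_size, assoc)

# The "128KB split L1" is the I1 and D1 sizes added together.
l1_total_kb = (config["I1"][0] + config["D1"][0]) // 1024

for name in ("I1", "D1", "L2"):
    print(describe(name))
```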
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2059 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2060 | If you don't know your cache configuration, you'll have to find it |
| 2061 | out. (Ideally <code>vg_cachegen</code> could auto-identify your cache |
| 2062 | configuration using the CPUID instruction, which could be done |
| 2063 | automatically during installation, and this whole step could be |
| 2064 | skipped.)<p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2065 | |
| 2066 | |
| 2067 | <h3>7.4 Cache simulation specifics</h3> |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2068 | |
| 2069 | <code>vg_cachegen</code> only generates simulations for a machine with |
| 2070 | a split L1 cache and a unified L2 cache. This configuration is used |
| 2071 | for all (modern) x86-based machines we are aware of. Old Cyrix CPUs |
| 2072 | had a unified I and D L1 cache, but they are ancient history now.<p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2073 | |
| 2074 | The more specific characteristics of the simulation are as follows. |
| 2075 | |
| 2076 | <ul> |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2077 | <li>Write-allocate: when a write miss occurs, the block written to |
| 2078 | is brought into the D1 cache. Most modern caches have this |
| 2079 | property.</li><p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2080 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2081 | <li>Bit-selection hash function: the line(s) in the cache to which a |
| 2082 | memory block maps is chosen by the middle bits M--(M+N-1) of the |
| 2083 | byte address, where: |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2084 | <ul> |
| 2085 | <li> line size = 2^M bytes </li> |
| 2086 | <li>(cache size / line size) = 2^N lines</li> |
| 2087 | </ul> </li><p> |
| 2088 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2089 | <li>Inclusive L2 cache: the L2 cache replicates all the entries of |
| 2090 | the L1 cache. This is standard on Pentium chips, but AMD |
| 2091 | Athlons use an exclusive L2 cache that only holds blocks evicted |
| 2092 | from L1. Ditto AMD Durons and most modern VIAs.</li><p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2093 | </ul> |
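<p>The bit-selection rule in the list above amounts to a shift and a mask. The sketch below follows the definitions of M and N exactly as given there; the function name <code>line_index</code> is hypothetical, and the code assumes both sizes are powers of two.

```python
def line_index(addr, cache_size, line_size):
    """Select bits M..(M+N-1) of addr, where line size = 2^M and
    cache size / line size = 2^N, as defined in the list above.
    Both sizes are assumed to be powers of two."""
    m = line_size.bit_length() - 1                     # log2(line size)
    n = (cache_size // line_size).bit_length() - 1     # log2(number of lines)
    return (addr >> m) & ((1 << n) - 1)

# For a 65536-byte cache with 64-byte lines, M = 6 and N = 10.
index = line_index(0x12345678, cache_size=65536, line_size=64)
```

<p>Two consequences follow directly: all bytes within one 64-byte line share an index, and so do any two addresses exactly one cache size apart, which is what makes them compete for the same place in the cache.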
| 2094 | |
| 2095 | Other noteworthy behaviour: |
| 2096 | |
| 2097 | <ul> |
| 2098 | <li>References that straddle two cache lines are treated as follows:</li> |
| 2099 | <ul> |
| 2100 | <li>If both blocks hit --> counted as one hit</li> |
| 2101 | <li>If one block hits, the other misses --> counted as one miss</li> |
| 2102 | <li>If both blocks miss --> counted as one miss (not two)</li> |
| 2103 | </ul><p> |
| 2104 | |
| 2105 | <li>Instructions that modify a memory location (eg. <code>inc</code> and |
| 2106 | <code>dec</code>) are counted as doing just a read, ie. a single data |
| 2107 | reference. This may seem strange, but since the write can never cause a |
| 2108 | miss (the read guarantees the block is in the cache) it's not very |
| 2109 | interesting.<p> |
| 2110 | |
| 2111 | Thus it measures not the number of times the data cache is accessed, but |
| 2112 | the number of times a data cache miss could occur.<p> |
| 2113 | </li> |
| 2114 | </ul> |
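<p>The rule for references that straddle two lines can be expressed compactly. This is a toy model with no eviction at all, intended only to show the counting; the function name <code>count_reference</code> and the use of a plain set for the cache contents are assumptions of the example.

```python
def count_reference(addr, size, line_size, cached_lines):
    """Classify one data reference as a single 'hit' or a single 'miss'.
    cached_lines is a set of line numbers currently held in the cache."""
    first = addr // line_size
    last = (addr + size - 1) // line_size
    touched = {first, last}            # one line, or two if it straddles
    # If any touched line is absent, the whole reference is one miss.
    result = "hit" if touched <= cached_lines else "miss"
    cached_lines |= touched            # missing lines are brought in
    return result


cached = {0}
first_try = count_reference(60, 8, 64, cached)    # bytes 60..67 span lines 0 and 1
second_try = count_reference(60, 8, 64, cached)   # both lines now present
```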
| 2115 | |
| 2116 | If you are interested in simulating a cache with different properties, it is |
| 2117 | not particularly hard to write your own cache simulator, or to modify existing |
| 2118 | ones in <code>vg_cachesim_I1.c</code>, <code>vg_cachesim_D1.c</code> and |
| 2119 | <code>vg_cachesim_L2.c</code>. We'd be interested to hear from anyone who |
| 2120 | does. |
| 2121 | |
| 2122 | |
| 2123 | <a name="profile"></a> |
| 2124 | <h3>7.5 Profiling programs</h3> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2125 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2126 | Cache profiling is enabled by using the <code>--cachesim=yes</code> |
| 2127 | option to the <code>valgrind</code> shell script. Alternatively, it |
| 2128 | is probably more convenient to use the <code>cachegrind</code> script. |
| 2129 | This automatically turns off Valgrind's memory checking functions, |
| 2130 | since the cache simulation is slow enough already, and you probably |
| 2131 | don't want to do both at once. |
| 2132 | <p> |
| 2133 | To gather cache profiling information about the program <code>ls |
| 2134 | -l</code>, type: |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2135 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2136 | <blockquote><code>cachegrind ls -l</code></blockquote> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2137 | |
| 2138 | The program will execute (slowly). Upon completion, summary statistics |
| 2139 | that look like this will be printed: |
| 2140 | |
| 2141 | <pre> |
| 2142 | ==31751== I refs: 27,742,716 |
| 2143 | ==31751== I1 misses: 276 |
| 2144 | ==31751== L2 misses: 275 |
| 2145 | ==31751== I1 miss rate: 0.0% |
| 2146 | ==31751== L2i miss rate: 0.0% |
| 2147 | ==31751== |
| 2148 | ==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr) |
| 2149 | ==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr) |
| 2150 | ==31751== L2 misses: 23,085 ( 3,987 rd + 19,098 wr) |
| 2151 | ==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%) |
| 2152 | ==31751== L2d miss rate: 0.1% ( 0.0% + 0.4%) |
| 2153 | ==31751== |
| 2154 | ==31751== L2 misses: 23,360 ( 4,262 rd + 19,098 wr) |
| 2155 | ==31751== L2 miss rate: 0.0% ( 0.0% + 0.4%) |
| 2156 | </pre> |
| 2157 | |
| 2158 | Cache accesses for instruction fetches are summarised first, giving the |
| 2159 | number of fetches made (this is the number of instructions executed, which |
| 2160 | can be useful to know in its own right), the number of I1 misses, and the |
| 2161 | number of L2 instruction (<code>L2i</code>) misses.<p> |
| 2162 | |
| 2163 | Cache accesses for data follow. The information is similar to that of the |
| 2164 | instruction fetches, except that the values are also shown split between reads |
| 2165 | and writes (note each row's <code>rd</code> and <code>wr</code> values add up |
| 2166 | to the row's total).<p> |
| 2167 | |
| 2168 | Combined instruction and data figures for the L2 cache follow that.<p> |
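<p>The figures in the sample summary can be reproduced from the raw counts. Two things in this sketch are inferred from the sample numbers rather than stated anywhere in this manual: that percentages are truncated to one decimal place instead of rounded, and that the combined L2 miss rate is taken relative to all instruction and data references. The helper name <code>miss_rate</code> is hypothetical.

```python
def miss_rate(misses, refs):
    """Percentage with one decimal place, truncated rather than rounded
    (an assumption that happens to match the sample output)."""
    tenths = misses * 1000 // refs
    return "%d.%d%%" % (tenths // 10, tenths % 10)

# Counts copied from the sample summary above.
i_refs, i_l2_misses = 27742716, 275
d_refs, d1_misses, d_l2_misses = 15430290, 41185, 23085

d1_rate = miss_rate(d1_misses, d_refs)
l2d_rate = miss_rate(d_l2_misses, d_refs)
l2_misses = i_l2_misses + d_l2_misses          # combined instruction + data
l2_rate = miss_rate(l2_misses, i_refs + d_refs)
```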
| 2169 | |
| 2170 | |
| 2171 | <h3>7.6 Output file</h3> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2172 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2173 | As well as printing summary information, Cachegrind also writes |
| 2174 | line-by-line cache profiling information to a file named |
| 2175 | <code>cachegrind.out</code>. This file is human-readable, but is best |
| 2176 | interpreted by the accompanying program <code>vg_annotate</code>, |
| 2177 | described in the next section. |
| 2178 | <p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2179 | Things to note about the <code>cachegrind.out</code> file: |
| 2180 | <ul> |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2181 | <li>It is written every time <code>valgrind --cachesim=yes</code> or |
| 2182 | <code>cachegrind</code> is run, and will overwrite any existing |
| 2183 | <code>cachegrind.out</code> in the current directory.</li> |
| 2184 | <p> |
| 2185 | <li>It can be huge: <code>ls -l</code> generates a file of about |
| 2186 | 350KB. Browsing a few files and web pages with a Konqueror |
| 2187 | built with full debugging information generates a file |
| 2188 | of around 15 MB.</li> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2189 | </ul> |
| 2190 | |
| 2191 | |
| 2192 | <a name="annotate"></a> |
| 2193 | <h3>7.7 Annotating C/C++ programs</h3> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2194 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2195 | Before using <code>vg_annotate</code>, it is worth widening your |
| 2196 | window to be at least 120-characters wide if possible, as the output |
| 2197 | lines can be quite long. |
| 2198 | <p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2199 | To get a function-by-function summary, run <code>vg_annotate</code> in a |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2200 | directory containing a <code>cachegrind.out</code> file. The output |
| 2201 | looks like this: |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2202 | |
| 2203 | <pre> |
| 2204 | -------------------------------------------------------------------------------- |
| 2205 | I1 cache: 65536 B, 64 B, 2-way associative |
| 2206 | D1 cache: 65536 B, 64 B, 2-way associative |
| 2207 | L2 cache: 262144 B, 64 B, 8-way associative |
| 2208 | Command: concord vg_to_ucode.c |
| 2209 | Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2210 | Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2211 | Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2212 | Threshold: 99% |
| 2213 | Chosen for annotation: |
| 2214 | Auto-annotation: on |
| 2215 | |
| 2216 | -------------------------------------------------------------------------------- |
| 2217 | Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2218 | -------------------------------------------------------------------------------- |
| 2219 | 27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS |
| 2220 | |
| 2221 | -------------------------------------------------------------------------------- |
| 2222 | Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function |
| 2223 | -------------------------------------------------------------------------------- |
| 2224 | 8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc |
| 2225 | 5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word |
| 2226 | 2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp |
| 2227 | 2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash |
| 2228 | 2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower |
| 2229 | 1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert |
| 2230 | 897,991 51 51 897,831 95 30 62 1 1 ???:??? |
| 2231 | 598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile |
| 2232 | 598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile |
| 2233 | 598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc |
| 2234 | 446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing |
| 2235 | 341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER |
| 2236 | 320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table |
| 2237 | 298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create |
| 2238 | 149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0 |
| 2239 | 149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0 |
| 2240 | 95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node |
| 2241 | 85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue |
| 2242 | </pre> |
| 2243 | |
| 2244 | First up is a summary of the annotation options: |
| 2245 | |
| 2246 | <ul> |
| 2247 | <li>I1 cache, D1 cache, L2 cache: cache configuration. So you know the |
| 2248 | configuration with which these results were obtained.</li><p> |
| 2249 | |
| 2250 | <li>Command: the command line invocation of the program under |
| 2251 | examination.</li><p> |
| 2252 | |
| 2253 | <li>Events recorded: event abbreviations are:<p> |
| 2254 | <ul> |
| 2255 | <li><code>Ir </code>: I cache reads (ie. instructions executed)</li> |
| 2256 | <li><code>I1mr</code>: I1 cache read misses</li> |
| 2257 | <li><code>I2mr</code>: L2 cache instruction read misses</li> |
| 2258 | <li><code>Dr </code>: D cache reads (ie. memory reads)</li> |
| 2259 | <li><code>D1mr</code>: D1 cache read misses</li> |
| 2260 | <li><code>D2mr</code>: L2 cache data read misses</li> |
| 2261 | <li><code>Dw </code>: D cache writes (ie. memory writes)</li> |
| 2262 | <li><code>D1mw</code>: D1 cache write misses</li> |
| 2263 | <li><code>D2mw</code>: L2 cache data write misses</li> |
| 2264 | </ul><p> |
| 2265 | Note that D1 total misses is given by <code>D1mr</code> + |
| 2266 | <code>D1mw</code>, and that L2 total misses is given by |
| 2267 | <code>I2mr</code> + <code>D2mr</code> + <code>D2mw</code>.</li><p> |
| 2268 | |
| 2269 | <li>Events shown: the events shown (a subset of events gathered). This can |
| 2270 | be adjusted with the <code>--show</code> option.</li><p> |
| 2271 | |
| 2272 | <li>Event sort order: the sort order in which functions are shown. For |
| 2273 | example, in this case the functions are sorted from highest |
| 2274 | <code>Ir</code> counts to lowest. If two functions have identical |
| 2275 | <code>Ir</code> counts, they will then be sorted by <code>I1mr</code> |
| 2276 | counts, and so on. This order can be adjusted with the |
| 2277 | <code>--sort</code> option.<p> |
| 2278 | |
| 2279 | Note that this dictates the order the functions appear. It is <b>not</b> |
| 2280 | the order in which the columns appear; that is dictated by the "events |
| 2281 | shown" line (and can be changed with the <code>--show</code> option). |
| 2282 | </li><p> |
| 2283 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2284 | <li>Threshold: <code>vg_annotate</code> by default omits functions |
| 2285 | that account for very low event counts, to avoid drowning you in |
| 2286 | information. In this case, vg_annotate shows summaries for the |
| 2287 | functions that account for 99% of the <code>Ir</code> counts; |
| 2288 | <code>Ir</code> is chosen as the threshold event since it is the |
| 2289 | primary sort event. The threshold can be adjusted with the |
| 2290 | <code>--threshold</code> option.</li><p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2291 | |
| 2292 | <li>Chosen for annotation: names of files specified manually for annotation; |
| 2293 | in this case none.</li><p> |
| 2294 | |
| 2295 | <li>Auto-annotation: whether auto-annotation was requested via the |
| 2296 | <code>--auto=yes</code> option. In this case no.</li><p> |
| 2297 | </ul> |
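The event abbreviations listed above can be combined into totals and miss rates. The following Python sketch shows one way to do that; the function name and the sample counts are invented for illustration and are not vg_annotate output. It assumes that <code>Dr</code> + <code>Dw</code> is the number of D1 accesses and that <code>Ir</code> is the number of I1 accesses.

```python
def cache_totals(counts):
    """Combine per-event counts, keyed by event abbreviation, into totals.

    Assumption: Ir is the number of I1 accesses and Dr + Dw is the number
    of D1 accesses; the miss events are simply added together.
    """
    d1_accesses = counts["Dr"] + counts["Dw"]
    d1_misses = counts["D1mr"] + counts["D1mw"]
    l2_misses = counts["I2mr"] + counts["D2mr"] + counts["D2mw"]
    return {
        "d1_accesses": d1_accesses,
        "d1_misses": d1_misses,
        "l2_misses": l2_misses,
        # Guard against division by zero for programs with no accesses.
        "i1_miss_rate": counts["I1mr"] / counts["Ir"] if counts["Ir"] else 0.0,
        "d1_miss_rate": d1_misses / d1_accesses if d1_accesses else 0.0,
    }


# Made-up counts, purely to exercise the function.
sample = {"Ir": 1000, "I1mr": 10, "I2mr": 5,
          "Dr": 400, "D1mr": 8, "D2mr": 4,
          "Dw": 100, "D1mw": 2, "D2mw": 1}
totals = cache_totals(sample)
```

The same arithmetic applies to the whole-program summary or to a single function's row.<p>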
| 2298 | |
| 2299 | Then follow summary statistics for the whole program. These are similar |
| 2300 | to the summary provided when running <code>valgrind --cachesim=yes</code>.<p> |
| 2301 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2302 | Then follow function-by-function statistics. Each function is |
| 2303 | identified by a <code>file_name:function_name</code> pair. If a column |
| 2304 | contains only a dot it means the function never performs |
| 2305 | that event (eg. the third row shows that <code>strcmp()</code> |
| 2306 | contains no instructions that write to memory). The name |
| 2307 | <code>???</code> is used if the file name and/or function name |
| 2308 | could not be determined from debugging information. If most of the |
| 2309 | entries have the form <code>???:???</code> the program probably wasn't |
| 2310 | compiled with <code>-g</code>. <p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2311 | |
| 2312 | It is worth noting that functions will come from three types of source files: |
| 2313 | <ol> |
| 2314 | <li> From the profiled program (<code>concord.c</code> in this example).</li> |
| 2315 | <li>From libraries (eg. <code>getc.c</code>)</li> |
| 2316 | <li>From Valgrind's implementation of some libc functions (eg. |
| 2317 | <code>vg_clientmalloc.c:malloc</code>). These are recognisable because |
| 2318 | the filename begins with <code>vg_</code>, and is probably one of |
| 2319 | <code>vg_main.c</code>, <code>vg_clientmalloc.c</code> or |
| 2320 | <code>vg_mylibc.c</code>. |
| 2321 | </li> |
| 2322 | </ol> |
| 2323 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2324 | There are two ways to annotate source files -- by choosing them |
| 2325 | manually, or with the <code>--auto=yes</code> option. To do it |
| 2326 | manually, just specify the filenames as arguments to |
| 2327 | <code>vg_annotate</code>. For example, the output from running |
| 2328 | <code>vg_annotate concord.c</code> for our example produces the same |
| 2329 | output as above followed by an annotated version of |
| 2330 | <code>concord.c</code>, a section of which looks like: |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2331 | |
| 2332 | <pre> |
| 2333 | -------------------------------------------------------------------------------- |
| 2334 | -- User-annotated source: concord.c |
| 2335 | -------------------------------------------------------------------------------- |
| 2336 | Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw |
| 2337 | |
| 2338 | [snip] |
| 2339 | |
| 2340 | . . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[]) |
| 2341 | 3 1 1 . . . 1 0 0 { |
| 2342 | . . . . . . . . . FILE *file_ptr; |
| 2343 | . . . . . . . . . Word_Info *data; |
| 2344 | 1 0 0 . . . 1 1 1 int line = 1, i; |
| 2345 | . . . . . . . . . |
| 2346 | 5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info)); |
| 2347 | . . . . . . . . . |
| 2348 | 4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++) |
| 2349 | 3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL; |
| 2350 | . . . . . . . . . |
| 2351 | . . . . . . . . . /* Open file, check it. */ |
| 2352 | 6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r"); |
| 2353 | 2 0 0 1 0 0 . . . if (!(file_ptr)) { |
| 2354 | . . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name); |
| 2355 | 1 1 1 . . . . . . exit(EXIT_FAILURE); |
| 2356 | . . . . . . . . . } |
| 2357 | . . . . . . . . . |
| 2358 | 165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF) |
| 2359 | 146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table); |
| 2360 | . . . . . . . . . |
| 2361 | 4 0 0 1 0 0 2 0 0 free(data); |
| 2362 | 4 0 0 1 0 0 2 0 0 fclose(file_ptr); |
| 2363 | 3 0 0 2 0 0 . . . } |
| 2364 | </pre> |
| 2365 | |
| 2366 | (Although column widths are automatically minimised, a wide terminal is clearly |
| 2367 | useful.)<p> |
| 2368 | |
| 2369 | Each source file is clearly marked (<code>User-annotated source</code>) as |
| 2370 | having been chosen manually for annotation. If the file was found in one of |
| 2371 | the directories specified with the <code>-I</code>/<code>--include</code> |
| 2372 | option, the directory and file are both given.<p> |
| 2373 | |
| 2374 | Each line is annotated with its event counts. Events not applicable for a line |
| 2375 | are represented by a `.'; this is useful for distinguishing between an event |
| 2376 | which cannot happen, and one which can but did not.<p> |
| 2377 | |
| 2378 | Sometimes only a small section of a source file is executed. To minimise |
| 2379 | uninteresting output, Valgrind only shows annotated lines and lines within a |
| 2380 | small distance of annotated lines. Gaps are marked with the line numbers so |
| 2381 | you know which part of a file the shown code comes from, eg: |
| 2382 | |
| 2383 | <pre> |
| 2384 | (figures and code for line 704) |
| 2385 | -- line 704 ---------------------------------------- |
| 2386 | -- line 878 ---------------------------------------- |
| 2387 | (figures and code for line 878) |
| 2388 | </pre> |
| 2389 | |
| 2390 | The amount of context to show around annotated lines is controlled by the |
| 2391 | <code>--context</code> option.<p> |
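The effect of the context setting can be sketched in Python as follows. This is only an illustration of the idea, not vg_annotate's actual code; the function names are hypothetical, and the line numbers are chosen so that the gap matches the example above.

```python
def lines_to_show(annotated, total_lines, context=8):
    """Return the sorted line numbers within `context` lines of any
    annotated line, clipped to the range 1..total_lines."""
    shown = set()
    for line in annotated:
        first = max(1, line - context)
        last = min(total_lines, line + context)
        shown.update(range(first, last + 1))
    return sorted(shown)


def split_into_runs(shown):
    """Group sorted line numbers into (first, last) runs; the space
    between two consecutive runs is where a gap marker would go."""
    runs = []
    for line in shown:
        if runs and line == runs[-1][1] + 1:
            runs[-1][1] = line
        else:
            runs.append([line, line])
    return [tuple(run) for run in runs]


# Lines 700 and 882 carry counts; with 4 lines of context the shown
# code stops at line 704 and resumes at line 878.
runs = split_into_runs(lines_to_show({700, 882}, total_lines=1000, context=4))
```

Here <code>runs</code> holds two ranges, 696 to 704 and 878 to 886, with everything in between omitted.<p>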
| 2392 | |
| 2393 | To get automatic annotation, run <code>vg_annotate --auto=yes</code>. |
| 2394 | vg_annotate will automatically annotate every source file it can find that is |
| 2395 | mentioned in the function-by-function summary. Therefore, the files chosen for |
| 2396 | auto-annotation are affected by the <code>--sort</code> and |
| 2397 | <code>--threshold</code> options. Each source file is clearly marked |
| 2398 | (<code>Auto-annotated source</code>) as being chosen automatically. Any files |
| 2399 | that could not be found are mentioned at the end of the output, eg: |
| 2400 | |
| 2401 | <pre> |
| 2402 | -------------------------------------------------------------------------------- |
| 2403 | The following files chosen for auto-annotation could not be found: |
| 2404 | -------------------------------------------------------------------------------- |
| 2405 | getc.c |
| 2406 | ctype.c |
| 2407 | ../sysdeps/generic/lockfile.c |
| 2408 | </pre> |
| 2409 | |
| 2410 | This is quite common for library files, since libraries are usually compiled |
| 2411 | with debugging information, but the source files are often not present on a |
| 2412 | system. If a file is chosen for annotation <b>both</b> manually and |
| 2413 | automatically, it is marked as <code>User-annotated source</code>.<p> |
| 2414 | |
| 2415 | Use the <code>-I/--include</code> option to tell Valgrind where to look for |
| 2416 | source files if the filenames found from the debugging information aren't |
| 2417 | specific enough.<p> |
| 2418 | |
| 2419 | Beware that vg_annotate can take some time to digest large |
| 2420 | <code>cachegrind.out</code> files, eg. 30 seconds or more. Also beware that |
| 2421 | auto-annotation can produce a lot of output if your program is large! |
| 2422 | |
| 2423 | |
| 2424 | <h3>7.8 Annotating assembler programs</h3> |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2425 | |
| 2426 | Valgrind can annotate assembler programs too, or annotate the |
| 2427 | assembler generated for your C program. Sometimes this is useful for |
| 2428 | understanding what is really happening when an interesting line of C |
| 2429 | code is translated into multiple instructions.<p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2430 | |
| 2431 | To do this, you just need to assemble your <code>.s</code> files with |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2432 | assembler-level debug information. gcc doesn't do this, but you can |
| 2433 | use the GNU assembler with the <code>--gstabs</code> option to |
| 2434 | generate object files with this information, eg: |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2435 | |
| 2436 | <blockquote><code>as --gstabs foo.s</code></blockquote> |
| 2437 | |
| 2438 | You can then profile and annotate source files in the same way as for C/C++ |
| 2439 | programs. |
| 2440 | |
| 2441 | |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2442 | <h3>7.9 <code>vg_annotate</code> options</h3> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2443 | <ul> |
| 2444 | <li><code>-h, --help</code></li><p> |
| 2445 | <li><code>-v, --version</code><p> |
| 2446 | |
| 2447 | Help and version, as usual.</li> |
| 2448 | |
| 2449 | <li><code>--sort=A,B,C</code> [default: order in |
| 2450 | <code>cachegrind.out</code>]<p> |
| 2451 | Specifies the events upon which the sorting of the function-by-function |
| 2452 | entries will be based. Useful if you want to concentrate on eg. I cache |
| 2453 | misses (<code>--sort=I1mr,I2mr</code>), or D cache misses |
| 2454 | (<code>--sort=D1mr,D2mr</code>), or L2 misses |
| 2455 | (<code>--sort=D2mr,I2mr</code>).</li><p> |
| 2456 | |
| 2457 | <li><code>--show=A,B,C</code> [default: all, using order in |
| 2458 | <code>cachegrind.out</code>]<p> |
| 2459 | Specifies which events to show (and the column order). Default is to use |
| 2460 | all present in the <code>cachegrind.out</code> file (and use the order in |
| 2461 | the file).</li><p> |
| 2462 | |
| 2463 | <li><code>--threshold=X</code> [default: 99%] <p> |
| 2464 | Sets the threshold for the function-by-function summary. Functions are |
njn | bff8876 | 2002-05-13 20:27:54 +0000 | [diff] [blame^] | 2465 | shown that account for more than X% of the primary sort event. If |
| 2466 | auto-annotating, also affects which files are annotated.<p> |
| 2467 | |
| 2468 | Note: thresholds can be set for more than one event by following any |
| 2469 | event named in the <code>--sort</code> option with a colon and a number |
| 2470 | (no spaces, though). E.g. if you want to see the functions that cover |
| 2471 | 99% of L2 read misses and 99% of L2 write misses, use this option: |
| 2472 | |
| 2473 | <blockquote><code>--sort=D2mr:99,D2mw:99</code></blockquote> |
| 2474 | </li><p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2475 | |
| 2476 | <li><code>--auto=no</code> [default]<br> |
| 2477 | <code>--auto=yes</code> <p> |
| 2478 | When enabled, automatically annotates every file that is mentioned in the |
| 2479 | function-by-function summary that can be found. Also gives a list of |
| 2480 | those that couldn't be found.</li><p> |
| 2481 | |
| 2482 | <li><code>--context=N</code> [default: 8]<p> |
| 2483 | Print N lines of context before and after each annotated line. Avoids |
| 2484 | printing large sections of source files that were not executed. Use a |
| 2485 | large number (eg. 10,000) to show all source lines. |
| 2486 | </li><p> |
| 2487 | |
| 2488 | <li><code>-I=<dir>, --include=<dir></code> |
| 2489 | [default: empty string]<p> |
| 2490 | Adds a directory to the list in which to search for files. Multiple |
| 2491 | -I/--include options can be given to add multiple directories.</li><p> |
| 2492 | </ul> |
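To make the interaction between <code>--sort</code> and <code>--threshold</code> concrete, here is a small Python sketch of one plausible reading of the descriptions above: functions are ranked by the sort events in order, and the listing stops once the functions already shown cover the threshold percentage of the primary sort event. The function names and the data are hypothetical, and the exact cutoff rule used by vg_annotate may differ.

```python
def rank_functions(rows, sort_events):
    """Order function names from highest to lowest count, comparing the
    sort events in the order given (later events break ties)."""
    def key(name):
        return tuple(rows[name].get(event, 0) for event in sort_events)
    return sorted(rows, key=key, reverse=True)


def apply_threshold(rows, sort_events, threshold_pct=99.0):
    """Keep ranked functions until those kept account for at least
    threshold_pct percent of the primary (first) sort event."""
    primary = sort_events[0]
    total = sum(counts.get(primary, 0) for counts in rows.values())
    shown, running = [], 0
    for name in rank_functions(rows, sort_events):
        if total == 0 or running * 100 >= threshold_pct * total:
            break
        shown.append(name)
        running += rows[name].get(primary, 0)
    return shown


# Invented per-function counts; b.c:g and c.c:h tie on Ir, so I1mr decides.
rows = {
    "a.c:f": {"Ir": 900, "I1mr": 1},
    "b.c:g": {"Ir": 45, "I1mr": 3},
    "c.c:h": {"Ir": 45, "I1mr": 7},
    "d.c:k": {"Ir": 10, "I1mr": 0},
}
shown = apply_threshold(rows, ["Ir", "I1mr"], threshold_pct=99.0)
```

With these figures the first three functions cover 990 of 1,000 <code>Ir</code> counts, so <code>d.c:k</code> is left out. Under auto-annotation, a file mentioned only by omitted functions would not be annotated either.<p>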
| 2493 | |
| 2494 | |
| 2495 | <h3>7.10 Warnings</h3> |
| 2496 | There are a couple of situations in which vg_annotate issues warnings. |
| 2497 | |
| 2498 | <ul> |
| 2499 | <li>If a source file is more recent than the <code>cachegrind.out</code> |
| 2500 | file. This is because the information in <code>cachegrind.out</code> is |
| 2501 | only recorded with line numbers, so if the line numbers change at all in |
| 2502 | the source (eg. lines added, deleted, swapped), any annotations will be |
| 2503 | incorrect.</li><p> |
| 2504 | |
| 2505 | <li>If information is recorded about line numbers past the end of a file. |
| 2506 | This can be caused by the above problem, ie. shortening the source file |
| 2507 | while using an old <code>cachegrind.out</code> file. If this happens, |
| 2508 | the figures for the bogus lines are printed anyway (clearly marked as |
| 2509 | bogus) in case they are important.</li><p> |
| 2510 | </ul> |
| 2511 | |
| 2512 | |
| 2513 | <h3>7.10 Things to watch out for</h3> |
| 2514 | Some odd things that can occur during annotation: |
| 2515 | |
| 2516 | <ul> |
| 2517 | <li>If annotating at the assembler level, you might see something like this: |
| 2518 | |
| 2519 | <pre> |
| 2520 | 1 0 0 . . . . . . leal -12(%ebp),%eax |
| 2521 | 1 0 0 . . . 1 0 0 movl %eax,84(%ebx) |
| 2522 | 2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp) |
| 2523 | . . . . . . . . . .align 4,0x90 |
| 2524 | 1 0 0 . . . . . . movl $.LnrB,%eax |
| 2525 | 1 0 0 . . . 1 0 0 movl %eax,-16(%ebp) |
| 2526 | </pre> |
| 2527 | |
| 2528 | How can the third instruction be executed twice when the others are |
| 2529 | executed only once? As it turns out, it isn't. Here's a dump of the |
| 2530 | executable, from objdump: |
| 2531 | |
| 2532 | <pre> |
| 2533 | 8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax |
| 2534 | 8048f28: 89 43 54 mov %eax,0x54(%ebx) |
| 2535 | 8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp) |
| 2536 | 8048f32: 89 f6 mov %esi,%esi |
| 2537 | 8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax |
| 2538 | 8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp) |
| 2539 | </pre> |
| 2540 | |
| 2541 | Notice the extra <code>mov %esi,%esi</code> instruction. Where did this |
| 2542 | come from? The GNU assembler inserted it to serve as the two bytes of |
| 2543 | padding needed to align the <code>movl $.LnrB,%eax</code> instruction on |
| 2544 | a four-byte boundary, but pretended it didn't exist when adding debug |
| 2545 | information. Thus when Valgrind reads the debug info it thinks that the |
| 2546 | <code>movl $0x1,0xffffffec(%ebp)</code> instruction covers the address |
| 2547 | range 0x8048f2b--0x8048f33 by itself, and attributes the counts for the |
| 2548 | <code>mov %esi,%esi</code> to it.<p> |
| 2549 | </li> |
| 2550 | |
njn | 7efaa11 | 2002-05-07 10:26:57 +0000 | [diff] [blame] | 2551 | <li>Inlined functions can cause strange results in the function-by-function |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2552 | summary. If a function <code>inline_me()</code> is defined in |
| 2553 | <code>foo.h</code> and inlined in the functions <code>f1()</code>, |
| 2554 | <code>f2()</code> and <code>f3()</code> in <code>bar.c</code>, there will |
| 2555 | not be a <code>foo.h:inline_me()</code> function entry. Instead, there |
| 2556 | will be separate function entries for each inlining site, ie. |
| 2557 | <code>foo.h:f1()</code>, <code>foo.h:f2()</code> and |
| 2558 | <code>foo.h:f3()</code>. To find the total counts for |
| 2559 | <code>foo.h:inline_me()</code>, add up the counts from each entry.<p> |
| 2560 | |
| 2561 | The reason for this is that although the debug info output by gcc |
| 2562 | indicates the switch from <code>bar.c</code> to <code>foo.h</code>, it |
| 2563 | doesn't indicate the name of the function in <code>foo.h</code>, so |
| 2564 | Valgrind keeps using the old one.</li><p> |
| 2565 | |
njn | 7efaa11 | 2002-05-07 10:26:57 +0000 | [diff] [blame] | 2566 | <li>Sometimes, the same filename might be represented with a relative name |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2567 | and with an absolute name in different parts of the debug info, eg: |
| 2568 | <code>/home/user/proj/proj.h</code> and <code>../proj.h</code>. In this |
| 2569 | case, if you use auto-annotation, the file will be annotated twice with |
| 2570 | the counts split between the two.<p> |
| 2571 | </li> |
njn | 7efaa11 | 2002-05-07 10:26:57 +0000 | [diff] [blame] | 2572 | |
| 2573 | <li>Files with more than 65,535 lines cause difficulties for the stabs debug |
| 2574 | info reader. This is because the line number in the <code>struct |
| 2575 | nlist</code> defined in <code>a.out.h</code> under Linux is only a 16-bit |
| 2576 | number. Valgrind can handle some files with more than 65,535 lines |
| 2577 | correctly by making some guesses to identify line number overflows. But |
| 2578 | some cases are beyond it, in which case you'll get a warning message |
njn | bff8876 | 2002-05-13 20:27:54 +0000 | [diff] [blame^] | 2579 | explaining that annotations for the file might be incorrect.<p> |
| 2580 | </li> |
| 2581 | |
| 2582 | <li>If you compile some files with <code>-g</code> and some without, some |
| 2583 | events that take place in a file without debug info could be attributed |
| 2584 | to the last line of a file with debug info (whichever one gets placed |
| 2585 | before the non-debug-info file in the executable).<p> |
njn | 7efaa11 | 2002-05-07 10:26:57 +0000 | [diff] [blame] | 2586 | </li> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2587 | </ul> |
| 2588 | |
njn | bff8876 | 2002-05-13 20:27:54 +0000 | [diff] [blame^] | 2589 | This list looks long, but these cases should be fairly rare.<p> |
| 2590 | |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2591 | Note: stabs is not an easy format to read. If you come across bizarre |
| 2592 | annotations that look like they might be caused by a bug in the stabs reader, |
njn | bff8876 | 2002-05-13 20:27:54 +0000 | [diff] [blame^] | 2593 | please let us know.<p> |
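For the inlined-function case described above, the per-site entries can be added up by hand or with a few lines of script. The Python sketch below assumes the function-by-function entries have already been collected into a dictionary keyed by <code>file_name:function_name</code>; that data layout, the helper name and the counts are all made up for illustration.

```python
def total_for_file(entries, file_name):
    """Sum the event counts of every entry whose file part is file_name.

    Assumption: entries maps "file_name:function_name" strings to
    dictionaries of event counts.
    """
    totals = {}
    for name, counts in entries.items():
        entry_file, _, _function = name.partition(":")
        if entry_file != file_name:
            continue
        for event, count in counts.items():
            totals[event] = totals.get(event, 0) + count
    return totals


# Invented counts for the foo.h / bar.c example.
entries = {
    "foo.h:f1()": {"Ir": 10, "Dr": 2},
    "foo.h:f2()": {"Ir": 20, "Dr": 3},
    "foo.h:f3()": {"Ir": 5, "Dr": 0},
    "bar.c:f1()": {"Ir": 100, "Dr": 40},
}
inline_me_totals = total_for_file(entries, "foo.h")
```

Note that if <code>foo.h</code> defines more than one inlined function, this sum covers all of them together, since the entries do not say which inlined function the counts belong to.<p>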
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2594 | |
| 2595 | |
| 2596 | <h3>7.11 Accuracy</h3> |
| 2597 | Valgrind's cache profiling has a number of shortcomings: |
| 2598 | |
| 2599 | <ul> |
| 2600 | <li>It doesn't account for kernel activity -- the effect of system calls on |
| 2601 | the cache contents is ignored.</li><p> |
| 2602 | |
| 2603 | <li>It doesn't account for other process activity (although this is probably |
| 2604 | desirable when considering a single program).</li><p> |
| 2605 | |
| 2606 | <li>It doesn't account for virtual-to-physical address mappings; hence the |
| 2607 | entire simulation is not a true representation of what's happening in the |
| 2608 | cache.</li><p> |
| 2609 | |
| 2610 | <li>It doesn't account for cache misses not visible at the instruction level, |
| 2611 | eg. those arising from TLB misses, or speculative execution.</li><p> |
njn | db75e4d | 2002-04-30 12:46:22 +0000 | [diff] [blame] | 2612 | |
njn | bff8876 | 2002-05-13 20:27:54 +0000 | [diff] [blame^] | 2613 | <li>Valgrind's custom <code>malloc()</code> will allocate memory in different |
| 2614 | ways to the standard <code>malloc()</code>, which could warp the results. |
| 2615 | </li><p> |
| 2616 | |
njn | db75e4d | 2002-04-30 12:46:22 +0000 | [diff] [blame] | 2617 | <li>The instructions <code>bts</code>, <code>btr</code> and <code>btc</code> |
| 2618 | will incorrectly be counted as doing a data read if both the arguments |
| 2619 | are registers, eg: |
| 2620 | |
| 2621 | <blockquote><code>btsl %eax, %edx</code></blockquote> |
| 2622 | |
| 2623 | This should only happen rarely.</li><p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2624 | </ul> |
| 2625 | |
| 2626 | Another thing worth noting is that results are very sensitive. Changing the |
| 2627 | size of the <code>valgrind.so</code> file, the size of the program being |
| 2628 | profiled, or even the length of its name can perturb the results. Variations |
| 2629 | will be small, but don't expect perfectly repeatable results if your program |
| 2630 | changes at all.<p> |
| 2631 | |
| 2632 | While these factors mean you shouldn't trust the results to be super-accurate, |
| 2633 | hopefully they should be close enough to be useful.<p> |
| 2634 | |
| 2635 | |
| 2636 | <h3>7.12 Todo</h3> |
| 2637 | <ul> |
| 2638 | <li>Use CPUID instruction to auto-identify cache configuration during |
| 2639 | installation. This would save the user from having to know their cache |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2640 | configuration and using vg_cachegen.</li> |
| 2641 | <p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2642 | <li>Program start-up/shut-down calls a lot of functions that aren't |
| 2643 | interesting and just complicate the output. Would be nice to exclude |
sewardj | 434f57f | 2002-05-01 01:24:52 +0000 | [diff] [blame] | 2644 | these somehow.</li> |
| 2645 | <p> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2646 | </ul> |
| 2647 | <hr width="100%"> |
sewardj | de4a1d0 | 2002-03-22 01:27:54 +0000 | [diff] [blame] | 2648 | </body> |
| 2649 | </html> |
njn | 4f9c934 | 2002-04-29 16:03:24 +0000 | [diff] [blame] | 2650 | |