| 1 | |
| 2 | |
| 3 | Valgrind FAQ |
| 4 | Release 3.6.0 21 October 2010 |
| 5 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 6 | |
| 7 | Table of Contents |
| 8 | 1. Background |
| 9 | 2. Compiling, installing and configuring |
| 10 | 3. Valgrind aborts unexpectedly |
| 11 | 4. Valgrind behaves unexpectedly |
| 12 | 5. Miscellaneous |
| 13 | 6. How To Get Further Assistance |
| 14 | |
| 15 | ------------------------------------------------------------------------ |
| 16 | 1. Background |
| 17 | ------------------------------------------------------------------------ |
| 18 | |
| 19 | 1.1. How do you pronounce "Valgrind"? |
| 20 | The "Val" as in the word "value". The "grind" is pronounced with a |
| 21 | short 'i' -- i.e. "grinned" (rhymes with "tinned") rather than "grined" |
| 22 | (rhymes with "find"). |
| 23 | |
| 24 | Don't feel bad: almost everyone gets it wrong at first. |
| 25 | ------------------------------------------------------------------------ |
| 26 | |
| 27 | 1.2. Where does the name "Valgrind" come from? |
| 28 | From Nordic mythology. Originally (before release) the project was named |
| 29 | Heimdall, after the watchman of the Nordic gods. He could "see a hundred |
| 30 | miles by day or night, hear the grass growing, see the wool growing on a |
| 31 | sheep's back", etc. This would have been a great name, but it was |
| 32 | already taken by a security package "Heimdal". |
| 33 | |
| 34 | Keeping with the Nordic theme, Valgrind was chosen. Valgrind is the name |
| 35 | of the main entrance to Valhalla (the Hall of the Chosen Slain in |
| 36 | Asgard). Over this entrance there resides a wolf and over it there is |
| 37 | the head of a boar and on it perches a huge eagle, whose eyes can see to |
| 38 | the far regions of the nine worlds. Only those judged worthy by the |
| 39 | guardians are allowed to pass through Valgrind. All others are refused |
| 40 | entrance. |
| 41 | |
| 42 | It's not short for "value grinder", although that's not a bad guess. |
| 43 | |
| 44 | ------------------------------------------------------------------------ |
| 45 | 2. Compiling, installing and configuring |
| 46 | ------------------------------------------------------------------------ |
| 47 | |
| 48 | 2.1. When building Valgrind, 'make' dies partway with an assertion |
| 49 | failure, something like this: |
| 50 | |
| 51 | % make: expand.c:489: allocated_variable_append: |
| 52 | Assertion 'current_variable_set_list->next != 0' failed. |
| 53 | |
| 54 | It's probably a bug in 'make'. Some, but not all, instances of version |
| 55 | 3.79.1 have this bug; see |
| 56 | <http://www.mail-archive.com/bug-make@gnu.org/msg01658.html>. Try |
| 57 | upgrading to a more recent version of 'make'. Alternatively, we have |
| 58 | heard that unsetting the CFLAGS environment variable avoids the problem. |
| 59 | |
| 60 | ------------------------------------------------------------------------ |
| 61 | |
| 62 | 2.2. When building Valgrind, 'make' fails with this: |
| 63 | /usr/bin/ld: cannot find -lc |
| 64 | collect2: ld returned 1 exit status |
| 65 | |
| 66 | You need to install the glibc-static-devel package. |
| 67 | |
| 68 | ------------------------------------------------------------------------ |
| 69 | 3. Valgrind aborts unexpectedly |
| 70 | ------------------------------------------------------------------------ |
| 71 | |
| 72 | 3.1. Programs run OK on Valgrind, but at exit produce a bunch of errors |
| 73 | involving __libc_freeres and then die with a segmentation fault. |
| 74 | |
| 75 | When the program exits, Valgrind runs the procedure __libc_freeres in |
| 76 | glibc. This is a hook for memory debuggers, so they can ask glibc to |
| 77 | free up any memory it has used. Doing that is needed to ensure that |
| 78 | Valgrind doesn't incorrectly report space leaks in glibc. |
| 79 | |
| 80 | The problem is that running __libc_freeres in older glibc versions |
| 81 | causes this crash. |
| 82 | |
| 83 | Workaround for 1.1.X and later versions of Valgrind: use the |
| 84 | --run-libc-freeres=no option. You may then get space leak reports for |
| 85 | glibc allocations (please don't report these to the glibc people, since |
| 86 | they are not real leaks), but at least the program runs. |
| 87 | |
| 88 | ------------------------------------------------------------------------ |
| 89 | |
| 90 | 3.2. My (buggy) program dies like this: |
| 91 | valgrind: m_mallocfree.c:248 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed. |
| 92 | |
| 93 | or like this: |
| 94 | valgrind: m_mallocfree.c:442 (mk_inuse_bszB): Assertion 'bszB != 0' failed. |
| 95 | |
| 96 | or otherwise aborts or crashes in m_mallocfree.c. |
| 97 | If Memcheck (the memory checker) shows any invalid reads, invalid writes |
| 98 | or invalid frees in your program, the above may happen. The reason is that |
| 99 | your program may trash Valgrind's low-level memory manager, which then |
| 100 | dies with the above assertion, or something similar. The cure is to fix |
| 101 | your program so that it doesn't do any illegal memory accesses. The |
| 102 | above failure will hopefully go away after that. |
| 103 | |
| 104 | ------------------------------------------------------------------------ |
| 105 | |
| 106 | 3.3. My program dies, printing a message like this along the way: |
| 107 | vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x2E 0x5 |
| 108 | |
| 109 | One possibility is that your program has a bug and erroneously jumps to |
| 110 | a non-code address, in which case you'll get a SIGILL signal. Memcheck |
| 111 | may issue a warning just before this happens, but it might not if the |
| 112 | jump happens to land in addressable memory. |
| 113 | |
| 114 | Another possibility is that Valgrind does not handle the instruction. If |
| 115 | you are using an older Valgrind, a newer version might handle the |
| 116 | instruction. However, all instruction sets have some obscure, rarely |
| 117 | used instructions. Also, on amd64 there are an almost limitless number |
| 118 | of combinations of redundant instruction prefixes, many of them |
| 119 | undocumented but accepted by CPUs. So Valgrind will still have decoding |
| 120 | failures from time to time. If this happens, please file a bug report. |
| 121 | |
| 122 | ------------------------------------------------------------------------ |
| 123 | |
| 124 | 3.4. I tried running a Java program (or another program that uses a |
| 125 | just-in-time compiler) under Valgrind but something went wrong. Does |
| 126 | Valgrind handle such programs? |
| 127 | |
| 128 | Valgrind can handle dynamically generated code, so long as none of the |
| 129 | generated code is later overwritten by other generated code. If this |
| 130 | happens, though, things will go wrong as Valgrind will continue running |
| 131 | its translations of the old code (this is true on x86 and amd64; on |
| 132 | PowerPC there are explicit cache flush instructions which Valgrind |
| 133 | detects and honours). You should try running with --smc-check=all in |
| 134 | this case. Valgrind will run much more slowly, but should detect the use |
| 135 | of the out-of-date code. |
| 136 | |
| 137 | Alternatively, if you have the source code to the JIT compiler you can |
| 138 | insert calls to the VALGRIND_DISCARD_TRANSLATIONS client request to mark |
| 139 | out-of-date code, saving you from using --smc-check=all. |
| 140 | |
| 141 | Apart from this, in theory Valgrind can run any Java program just fine, |
| 142 | even those that use JNI and are partially implemented in other languages |
| 143 | like C and C++. In practice, Java implementations tend to do nasty |
| 144 | things that most programs do not, and Valgrind sometimes falls over |
| 145 | these corner cases. |
| 146 | |
| 147 | If your Java programs do not run under Valgrind, even with |
| 148 | --smc-check=all, please file a bug report and hopefully we'll be able to |
| 149 | fix the problem. |
| 150 | |
| 151 | |
| 152 | ------------------------------------------------------------------------ |
| 153 | 4. Valgrind behaves unexpectedly |
| 154 | ------------------------------------------------------------------------ |
| 155 | |
| 156 | 4.1. My program uses the C++ STL and string classes. Valgrind reports |
| 157 | 'still reachable' memory leaks involving these classes at the exit of |
| 158 | the program, but there should be none. |
| 159 | |
| 160 | First of all: relax, it's probably not a bug, but a feature. Many |
| 161 | implementations of the C++ standard libraries use their own memory pool |
| 162 | allocators. Memory for quite a number of destructed objects is not |
| 163 | immediately freed and given back to the OS, but kept in the pool(s) for |
| 164 | later re-use. The fact that the pools are not freed at the exit of the |
| 165 | program causes Valgrind to report this memory as still reachable. The |
| 166 | behaviour not to free pools at the exit could be called a bug of the |
| 167 | library though. |
| 168 | |
| 169 | Using GCC, you can force the STL to use malloc and to free memory as |
| 170 | soon as possible by globally disabling memory caching. Beware! Doing so |
| 171 | will probably slow down your program, sometimes drastically. |
| 172 | |
| 173 | * With GCC 2.91, 2.95, 3.0 and 3.1, compile all source using the STL |
| 174 | with -D__USE_MALLOC. Beware! This was removed from GCC starting with |
| 175 | version 3.3. |
| 176 | |
| 177 | * With GCC 3.2.2 and later, you should export the environment variable |
| 178 | GLIBCPP_FORCE_NEW before running your program. |
| 179 | |
| 180 | * With GCC 3.4 and later, that variable has changed name to |
| 181 | GLIBCXX_FORCE_NEW. |
| 182 | |
| 183 | There are other ways to disable memory pooling: using the malloc_alloc |
| 184 | template with your objects (not portable, but should work for GCC) or |
| 185 | even writing your own memory allocators. But all this goes beyond the |
| 186 | scope of this FAQ. Start by reading |
| 187 | the libstdc++ FAQ: |
| 188 | <http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#4_4_leak> if you |
| 189 | absolutely want to do that. But beware: allocators belong to the more |
| 190 | messy parts of the STL and people went to great lengths to make the STL |
| 191 | portable across platforms. Chances are good that your solution will work |
| 192 | on your platform, but not on others. |
| 193 | |
| 194 | ------------------------------------------------------------------------ |
| 195 | |
| 196 | 4.2. The stack traces given by Memcheck (or another tool) aren't |
| 197 | helpful. How can I improve them? |
| 198 | |
| 199 | If they're not long enough, use --num-callers to make them longer. |
| 200 | If they're not detailed enough, make sure you are compiling with -g to |
| 201 | add debug information. And don't strip symbol tables (programs should be |
| 202 | unstripped unless you run 'strip' on them; some libraries ship |
| 203 | stripped). |
| 204 | |
| 205 | Also, for leak reports involving shared objects, if the shared object is |
| 206 | unloaded before the program terminates, Valgrind will discard the debug |
| 207 | information and the error message will be full of ??? entries. The |
| 208 | workaround here is to avoid calling dlclose on these shared objects. |
| 209 | |
| 210 | Also, -fomit-frame-pointer and -fstack-check can make stack traces |
| 211 | worse. |
| 212 | |
| 213 | Some example sub-traces: |
| 214 | * With debug information and unstripped (best): |
| 215 | Invalid write of size 1 |
| 216 | at 0x80483BF: really (malloc1.c:20) |
| 217 | by 0x8048370: main (malloc1.c:9) |
| 218 | |
| 219 | * With no debug information, unstripped: |
| 220 | Invalid write of size 1 |
| 221 | at 0x80483BF: really (in /auto/homes/njn25/grind/head5/a.out) |
| 222 | by 0x8048370: main (in /auto/homes/njn25/grind/head5/a.out) |
| 223 | |
| 224 | * With no debug information, stripped: |
| 225 | Invalid write of size 1 |
| 226 | at 0x80483BF: (within /auto/homes/njn25/grind/head5/a.out) |
| 227 | by 0x8048370: (within /auto/homes/njn25/grind/head5/a.out) |
| 228 | by 0x42015703: __libc_start_main (in /lib/tls/libc-2.3.2.so) |
| 229 | by 0x80482CC: (within /auto/homes/njn25/grind/head5/a.out) |
| 230 | |
| 231 | * With debug information and -fomit-frame-pointer: |
| 232 | Invalid write of size 1 |
| 233 | at 0x80483C4: really (malloc1.c:20) |
| 234 | by 0x42015703: __libc_start_main (in /lib/tls/libc-2.3.2.so) |
| 235 | by 0x80482CC: ??? (start.S:81) |
| 236 | |
| 237 | * A leak error message involving an unloaded shared object: |
| 238 | 84 bytes in 1 blocks are possibly lost in loss record 488 of 713 |
| 239 | at 0x1B9036DA: operator new(unsigned) (vg_replace_malloc.c:132) |
| 240 | by 0x1DB63EEB: ??? |
| 241 | by 0x1DB4B800: ??? |
| 242 | by 0x1D65E007: ??? |
| 243 | by 0x8049EE6: main (main.cpp:24) |
| 244 | |
| 245 | ------------------------------------------------------------------------ |
| 246 | |
| 247 | 4.3. The stack traces given by Memcheck (or another tool) seem to have |
| 248 | the wrong function name in them. What's happening? |
| 249 | |
| 250 | Occasionally Valgrind stack traces get the wrong function names. This is |
| 251 | caused by glibc using aliases to effectively give one function two |
| 252 | names. Most of the time Valgrind chooses a suitable name, but very |
| 253 | occasionally it gets it wrong. Examples we know of are printing bcmp |
| 254 | instead of memcmp, index instead of strchr, and rindex instead of |
| 255 | strrchr. |
| 256 | |
| 257 | ------------------------------------------------------------------------ |
| 258 | |
| 259 | 4.4. My program crashes normally, but doesn't under Valgrind, or vice |
| 260 | versa. What's happening? |
| 261 | |
| 262 | When a program runs under Valgrind, its environment is slightly |
| 263 | different to when it runs natively. For example, the memory layout is |
| 264 | different, and the way that threads are scheduled is different. |
| 265 | |
| 266 | Most of the time this doesn't make any difference, but it can, |
| 267 | particularly if your program is buggy. For example, if your program |
| 268 | crashes because it erroneously accesses memory that is unaddressable, |
| 269 | it's possible that this memory will not be unaddressable when run under |
| 270 | Valgrind. Alternatively, if your program has data races, these may not |
| 271 | manifest under Valgrind. |
| 272 | |
| 273 | There isn't anything you can do to change this; it's just the nature of |
| 274 | the way Valgrind works that it cannot exactly replicate a native |
| 275 | execution environment. In the case where your program crashes due to a |
| 276 | memory error when run natively but not when run under Valgrind, in most |
| 277 | cases Memcheck should identify the bad memory operation. |
| 278 | |
| 279 | ------------------------------------------------------------------------ |
| 280 | |
| 281 | 4.5. Memcheck doesn't report any errors and I know my program has |
| 282 | errors. |
| 283 | |
| 284 | There are two possible causes of this. |
| 285 | First, by default, Valgrind only traces the top-level process. So if |
| 286 | your program spawns children, they won't be traced by Valgrind by |
| 287 | default. Also, if your program is started by a shell script, Perl |
| 288 | script, or something similar, Valgrind will trace the shell, or the Perl |
| 289 | interpreter, or equivalent. |
| 290 | |
| 291 | To trace child processes, use the --trace-children=yes option. |
| 292 | If you are tracing large trees of processes, it can be less disruptive |
| 293 | to have the output sent over the network. Give Valgrind the option |
| 294 | --log-socket=127.0.0.1:12345 (if you want logging output sent to port |
| 295 | 12345 on localhost). You can use the valgrind-listener program to listen |
| 296 | on that port: |
| 297 | |
| 298 | valgrind-listener 12345 |
| 299 | |
| 300 | Obviously you have to start the listener process first. See the manual |
| 301 | for more details. |
| 302 | |
| 303 | Second, if your program is statically linked, most Valgrind tools won't |
| 304 | work as well, because they won't be able to replace certain functions, |
| 305 | such as malloc, with their own versions. A key indicator of this is if |
| 306 | Memcheck says "All heap blocks were freed -- no leaks are possible" when |
| 307 | you know your program calls malloc. The workaround is to avoid |
| 308 | statically linking your program. |
| 309 | |
| 310 | ------------------------------------------------------------------------ |
| 311 | |
| 312 | 4.6. Why doesn't Memcheck find the array overruns in this program? |
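| 313 | int static_array[5]; |
| 314 | |
| 315 | int main(void) |
| 316 | { |
| 317 |     int stack_array[5]; |
| 318 | |
| 319 |     static_array[5] = 0;  /* one past the end of a static array */ |
| 320 |     stack_array[5]  = 0;  /* one past the end of a stack array */ |
| 321 | |
| 322 |     return 0; |
| 323 | } |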
| 324 | |
| 325 | Unfortunately, Memcheck doesn't do bounds checking on static or stack |
| 326 | arrays. We'd like to, but it's just not possible to do in a reasonable |
| 327 | way that fits with how Memcheck works. Sorry. |
| 328 | |
| 329 | However, the experimental tool Ptrcheck can detect errors like this. Run |
| 330 | Valgrind with the --tool=exp-ptrcheck option to try it, but beware that |
| 331 | it is not as robust as Memcheck. |
| 332 | |
| 333 | |
| 334 | ------------------------------------------------------------------------ |
| 335 | 5. Miscellaneous |
| 336 | ------------------------------------------------------------------------ |
| 337 | |
| 338 | 5.1. I tried writing a suppression but it didn't work. Can you write my |
| 339 | suppression for me? |
| 340 | |
| 341 | Yes! Use the --gen-suppressions=yes feature to spit out suppressions |
| 342 | automatically for you. You can then edit them if you like, e.g. combining |
| 343 | similar automatically generated suppressions using wildcards like '*'. |
| 344 | |
| 345 | If you really want to write suppressions by hand, read the manual |
| 346 | carefully. Note particularly that C++ function names must be mangled |
| 347 | (that is, not demangled). |
| 348 | |
| 349 | ------------------------------------------------------------------------ |
| 350 | |
| 351 | 5.2. With Memcheck's memory leak detector, what's the difference between |
| 352 | "definitely lost", "indirectly lost", "possibly lost", "still |
| 353 | reachable", and "suppressed"? |
| 354 | |
| 355 | The details are in the Memcheck section of the user manual. |
| 356 | In short: |
| 357 | * "definitely lost" means your program is leaking memory -- fix those |
| 358 | leaks! |
| 359 | |
| 360 | * "indirectly lost" means your program is leaking memory in a |
| 361 | pointer-based structure. (E.g. if the root node of a binary tree is |
| 362 | "definitely lost", all the children will be "indirectly lost".) If you |
| 363 | fix the "definitely lost" leaks, the "indirectly lost" leaks should go |
| 364 | away. |
| 365 | |
| 366 | * "possibly lost" means your program is leaking memory, unless you're |
| 367 | doing funny things with pointers. This is sometimes reasonable. Use |
| 368 | --show-possibly-lost=no if you don't want to see these reports. |
| 369 | |
| 370 | * "still reachable" means your program is probably ok -- it didn't free |
| 371 | some memory it could have. This is quite common and often reasonable. |
| 372 | Use --show-reachable=yes only if you want to see these reports. |
| 373 | |
| 374 | * "suppressed" means that a leak error has been suppressed. There are |
| 375 | some suppressions in the default suppression files. You can ignore |
| 376 | suppressed errors. |
| 377 | |
| 378 | ------------------------------------------------------------------------ |
| 379 | |
| 380 | 5.3. Memcheck's uninitialised value errors are hard to track down, |
| 381 | because they are often reported some time after they are caused. Could |
| 382 | Memcheck record a trail of operations to better link the cause to the |
| 383 | effect? Or maybe just eagerly report any copies of uninitialised memory |
| 384 | values? |
| 385 | |
| 386 | Prior to version 3.4.0, the answer was "we don't know how to do it |
| 387 | without huge performance penalties". As of 3.4.0, try using the |
| 388 | --track-origins=yes option. It will run slower than usual, but will give |
| 389 | you extra information about the origin of uninitialised values. |
| 390 | |
| 391 | Or if you want to do it the old fashioned way, you can use the client |
| 392 | request VALGRIND_CHECK_VALUE_IS_DEFINED to help track these errors down |
| 393 | -- work backwards from the point where the uninitialised error occurs, |
| 394 | checking suspect values until you find the cause. This requires editing, |
| 395 | compiling and re-running your program multiple times, which is a pain, |
| 396 | but still easier than debugging the problem without Memcheck's help. |
| 397 | |
| 398 | As for eager reporting of copies of uninitialised memory values, this |
| 399 | has been suggested multiple times. Unfortunately, almost all programs |
| 400 | legitimately copy uninitialised memory values around (because compilers |
| 401 | pad structs to preserve alignment) and eager checking leads to hundreds |
| 402 | of false positives. Therefore Memcheck does not support eager checking |
| 403 | at this time. |
| 404 | |
| 405 | ------------------------------------------------------------------------ |
| 406 | |
| 407 | 5.4. Is it possible to attach Valgrind to a program that is already |
| 408 | running? |
| 409 | |
| 410 | No. The environment that Valgrind provides for running programs is |
| 411 | significantly different to that for normal programs, e.g. due to |
| 412 | different layout of memory. Therefore Valgrind has to have full control |
| 413 | from the very start. |
| 414 | |
| 415 | It is possible to achieve something like this by running your program |
| 416 | without any instrumentation (which involves a slow-down of about 5x, |
| 417 | less than that of most tools), and then adding instrumentation once you |
| 418 | get to a point of interest. Support for this must be provided by the |
| 419 | tool, however, and Callgrind is the only tool that currently has such |
| 420 | support. See the instructions on the callgrind_control program for |
| 421 | details. |
| 422 | |
| 423 | |
| 424 | ------------------------------------------------------------------------ |
| 425 | 6. How To Get Further Assistance |
| 426 | ------------------------------------------------------------------------ |
| 427 | |
| 428 | Read the appropriate section(s) of the Valgrind Documentation: |
| 429 | <http://www.valgrind.org/docs/manual/index.html>. |
| 430 | |
| 431 | Search the valgrind-users mailing list archives: |
| 432 | <http://news.gmane.org/gmane.comp.debugging.valgrind>, via |
| 433 | <http://search.gmane.org>, using the group name gmane.comp.debugging.valgrind. |
| 434 | |
| 435 | If you think an answer in this FAQ is incomplete or inaccurate, please |
| 436 | e-mail <valgrind@valgrind.org>. |
| 437 | |
| 438 | If you have tried all of these things and are still stuck, you can try |
| 439 | mailing the valgrind-users mailing list: |
| 440 | <http://www.valgrind.org/support/mailing_lists.html>. Note that an email |
| 441 | has a better chance of being answered usefully if it is clearly written. |
| 442 | Also remember that, despite the fact that most of the community are very |
| 443 | helpful and responsive to emailed questions, you are probably requesting |
| 444 | help from unpaid volunteers, so you have no guarantee of receiving an |
| 445 | answer. |
| 446 | |