
A mini-FAQ for valgrind, version 1.9.6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last revised 5 May 2003
~~~~~~~~~~~~~~~~~~~~~~~

-----------------------------------------------------------------

Q1. Programs run OK on valgrind, but at exit produce a bunch
    of errors a bit like this

    ==20755== Invalid read of size 4
    ==20755== at 0x40281C8A: _nl_unload_locale (loadlocale.c:238)
    ==20755== by 0x4028179D: free_mem (findlocale.c:257)
    ==20755== by 0x402E0962: __libc_freeres (set-freeres.c:34)
    ==20755== by 0x40048DCC: vgPlain___libc_freeres_wrapper (vg_clientfuncs.c:585)
    ==20755== Address 0x40CC304C is 8 bytes inside a block of size 380 free'd
    ==20755== at 0x400484C9: free (vg_clientfuncs.c:180)
    ==20755== by 0x40281CBA: _nl_unload_locale (loadlocale.c:246)
    ==20755== by 0x40281218: free_mem (setlocale.c:461)
    ==20755== by 0x402E0962: __libc_freeres (set-freeres.c:34)

    and then die with a segmentation fault.

A1. When the program exits, valgrind runs the procedure
    __libc_freeres() in glibc.  This is a hook for memory debuggers,
    so they can ask glibc to free up any memory it has used.  Doing
    that is needed to ensure that valgrind doesn't incorrectly
    report space leaks in glibc.

    The problem is that running __libc_freeres() in older glibc
    versions causes this crash.

    WORKAROUND FOR 1.1.X and later versions of valgrind: use the
    --run-libc-freeres=no flag.  You may then get space leak
    reports for glibc-allocations (please _don't_ report these
    to the glibc people, since they are not real leaks), but at
    least the program runs.

-----------------------------------------------------------------

Q2. [Question erased, as it is no longer relevant]

-----------------------------------------------------------------

Q3. My (buggy) program dies like this:
      valgrind: vg_malloc2.c:442 (bszW_to_pszW):
                Assertion `pszW >= 0' failed.
    And/or my (buggy) program runs OK on valgrind, but dies like
    this on cachegrind.

A3. If valgrind shows any invalid reads, invalid writes or invalid
    frees in your program, the above may happen.  The reason is that
    your program may trash valgrind's low-level memory manager, which
    then dies with the above assertion, or something like it.  The
    cure is to fix your program so that it doesn't do any illegal
    memory accesses.  The above failure will hopefully go away after
    that.
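
    As an illustration of the kind of fix meant here, the sketch below
    shows a common off-by-one heap overrun and its repair.  The helper
    name dup_string is hypothetical and not part of valgrind; the buggy
    allocation appears only in a comment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of s, or NULL if
   the allocation fails.  The caller frees the result. */
char *dup_string(const char *s)
{
    assert(s != NULL);

    /* BUG, shown for contrast only:
           char *copy = malloc(strlen(s));
           strcpy(copy, s);
       strcpy() writes the terminating '\0' one byte past the end of
       the block, which is an invalid write. */

    size_t len = strlen(s) + 1;     /* +1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);       /* stays inside the block */
    return copy;
}
```

    Calling dup_string("abc") yields a 4-byte block holding "abc" and
    its terminator, so nothing is written outside the allocation.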

-----------------------------------------------------------------

Q4. I'm running Red Hat Advanced Server.  Valgrind always segfaults at
    startup.

A4. Known issue with RHAS 2.1, due to funny stack permissions at
    startup.  However, valgrind-1.9.4 and later automatically handle
    this correctly, and should not segfault.

-----------------------------------------------------------------

Q5. I try running "valgrind my_program", but my_program runs normally,
    and Valgrind doesn't emit any output at all.

A5. Valgrind doesn't work out-of-the-box with programs that are entirely
    statically linked.  It does a quick test at startup, and if it detects
    that the program is statically linked, it aborts with an explanation.

    This test may fail in some obscure cases, eg. if you run a script
    under Valgrind and the script interpreter is statically linked.

    If you still want static linking, you can ask gcc to link certain
    libraries statically.  Try the following options:

        -Wl,-Bstatic -lmyLibrary1 -lotherLibrary -Wl,-Bdynamic

    Just make sure you end with -Wl,-Bdynamic so that libc is dynamically
    linked.

    If you absolutely cannot use dynamic libraries, you can try statically
    linking together all the .o files in coregrind/, all the .o files of the
    tool of your choice (eg. those in memcheck/), and the .o files of your
    program.  You'll end up with a statically linked binary that runs
    permanently under Valgrind's control.  Note that we haven't tested this
    procedure thoroughly.

-----------------------------------------------------------------

Q6. I try running "valgrind my_program" and get Valgrind's startup message,
    but I don't get any errors and I know my program has errors.

A6. By default, Valgrind only traces the top-level process.  So if your
    program spawns children, they won't be traced by Valgrind by default.
    Also, if your program is started by a shell script, Perl script, or
    something similar, Valgrind will trace the shell, or the Perl
    interpreter, or equivalent.

    To trace child processes, use the --trace-children=yes option.

    If you are tracing large trees of processes, it can be less
    disruptive to have the output sent over the network.  Give
    valgrind the flag --logsocket=127.0.0.1:12345 (if you want
    logging output sent to port 12345 on localhost).  You can
    use the valgrind-listener program to listen on that port:
        valgrind-listener 12345
    Obviously you have to start the listener process first.
    See the documentation for more details.
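
    To make "spawns children" concrete, here is a minimal sketch of a
    program handing its real work to a child.  The helper name
    run_child is hypothetical.  Whatever argv[0] names runs as a
    separate program, which is the case --trace-children=yes is
    meant for.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up on PATH) in a child
   process and return its exit status, or -1 on failure. */
int run_child(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {
        /* Child: replace this process image with the new program.
           Errors in that program belong to the child, not to the
           top-level process. */
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    If the bugs you are looking for live in the program that
    run_child() starts, tracing only the parent will show nothing.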

-----------------------------------------------------------------

Q7. My threaded server process runs unbelievably slowly on
    valgrind.  So slowly, in fact, that at first I thought it
    had completely locked up.

A7. We are not completely sure about this, but one possibility
    is that laptops with power management fool valgrind's
    timekeeping mechanism, which is (somewhat in error) based
    on the x86 RDTSC instruction.  A "fix" which is claimed to
    work is to run some other cpu-intensive process at the same
    time, so that the laptop's power-management clock-slowing
    does not kick in.  We would be interested in hearing more
    feedback on this.

    Another possible cause is that versions prior to 1.9.6
    did not support threading on glibc 2.3.X systems well.
    Hopefully the situation is much improved with 1.9.6.

-----------------------------------------------------------------

Q8. My program dies, printing a message like this along the way:

      disInstr: unhandled instruction bytes: 0x66 0xF 0x2E 0x5

A8. Valgrind doesn't support the full x86 instruction set, although
    it now supports many SSE and SSE2 instructions.  If you know
    the failing instruction is an SSE/SSE2 instruction, you might
    be able to recompile your program without it by passing the
    -march flag to gcc.  Either way, let us know and we'll try to
    fix it.

-----------------------------------------------------------------

Q9. My program dies complaining that __libc_current_sigrtmin
    is unimplemented.

A9. Should be fixed in 1.9.6.  I would appreciate confirmation
    of that.

-----------------------------------------------------------------

Q10. I upgraded to Red Hat 9 and threaded programs now act
     strangely / deadlock when they didn't before.

A10. Thread support on glibc 2.3.2+ with NPTL is not as
     good as on older LinuxThreads-based systems.  We have
     this under consideration.  Avoid Red Hat >= 8.1 for
     the time being, if you can.

     5 May 03: 1.9.6 should be significantly improved on
     Red Hat 9, SuSE 8.2 and other glibc-2.3.2 systems.

-----------------------------------------------------------------

Q11. I really need to use the NVidia libGL.so in my app.
     Help!

A11. NVidia also noticed this it seems, and the "latest" drivers
     (version 4349, apparently) come with this text

        DISABLING CPU SPECIFIC FEATURES

        Setting the environment variable __GL_FORCE_GENERIC_CPU to a
        non-zero value will inhibit the use of CPU specific features
        such as MMX, SSE, or 3DNOW!.  Use of this option may result in
        performance loss.  This option may be useful in conjunction with
        software such as the Valgrind memory debugger.

     Set __GL_FORCE_GENERIC_CPU=1 and Valgrind should work.  This has
     been confirmed by various people.  Thanks NVidia!

-----------------------------------------------------------------

Q12. My program dies like this (often at exit):

        VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH): internal error:
        (loads of text)

A12. One possible cause is that your program modifies its
     environment variables, possibly including zeroing them
     all.  Valgrind relies on the LD_PRELOAD, LD_LIBRARY_PATH and
     VG_ARGS variables.  Zeroing them will break things.

     As of 1.9.6, Valgrind only uses these variables with
     --trace-children=no, when executing execve() or using the
     --stop-after=yes flag.  This should reduce the potential for
     problems.
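
     If your program really must clear its environment, one option
     is to carry those three variables across.  The sketch below is
     only an illustration: the helper name clear_env_except is
     hypothetical, and it assumes glibc, since clearenv() is not in
     ISO C.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes clearenv(), setenv(), strdup() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: empty the environment except for the n names
   in keep[].  Returns 0 on success, -1 on failure. */
int clear_env_except(const char *const keep[], size_t n)
{
    assert(keep != NULL || n == 0);

    char **saved = calloc(n ? n : 1, sizeof *saved);
    if (saved == NULL)
        return -1;

    int rc = 0;
    /* Copy the values first: pointers returned by getenv() are no
       longer usable once the environment has been cleared. */
    for (size_t i = 0; i < n; i++) {
        const char *value = getenv(keep[i]);
        if (value != NULL && (saved[i] = strdup(value)) == NULL)
            rc = -1;
    }

    if (rc == 0 && clearenv() != 0)
        rc = -1;

    /* Put the kept variables back and release the copies. */
    for (size_t i = 0; i < n; i++) {
        if (saved[i] != NULL) {
            if (rc == 0 && setenv(keep[i], saved[i], 1) != 0)
                rc = -1;
            free(saved[i]);
        }
    }
    free(saved);
    return rc;
}
```

     A caller would pass { "LD_PRELOAD", "LD_LIBRARY_PATH", "VG_ARGS" }
     as keep[].  Variables in that list that were unset beforehand
     stay unset.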

-----------------------------------------------------------------

Q13. My program dies like this:

        error: /lib/librt.so.1: symbol __pthread_clock_settime, version
        GLIBC_PRIVATE not defined in file libpthread.so.0 with link time
        reference

A13. This is a total swamp.  Nevertheless there is a way out.
     It's a problem which is not easy to fix.  Really the problem is
     that /lib/librt.so.1 refers to some symbols
     __pthread_clock_settime and __pthread_clock_gettime in
     /lib/libpthread.so which are not intended to be exported, ie
     they are private.

     Best solution is to ensure your program does not use
     /lib/librt.so.1.

     However .. since you're probably not using it directly, or even
     knowingly, that's hard to do.  You might instead be able to fix
     it by playing around with coregrind/vg_libpthread.vs.  Things to
     try:

     Remove this

        GLIBC_PRIVATE {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        };

     or maybe remove this

        GLIBC_2.2.3 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or maybe add this

        GLIBC_2.2.4 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

        GLIBC_2.2.5 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or some combination of the above.  After each change you need to
     delete coregrind/libpthread.so and do make && make install.

     I just don't know if any of the above will work.  If you can
     find a solution which works, I would be interested to hear it.

     To which someone replied:

        I deleted this:

        GLIBC_2.2.3 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

        and it worked.

-----------------------------------------------------------------

Q14. My program uses the C++ STL and string classes.  Valgrind
     reports 'still reachable' memory leaks involving these classes
     at the exit of the program, but there should be none.

A14. First of all: relax, it's probably not a bug, but a feature.
     Many implementations of the C++ standard libraries use their own
     memory pool allocators.  Memory for quite a number of destructed
     objects is not immediately freed and given back to the OS, but
     kept in the pool(s) for later re-use.  The fact that the pools
     are not freed at the exit() of the program causes valgrind to
     report this memory as still reachable.  The behaviour not to
     free pools at the exit() could be called a bug of the library
     though.

     Using gcc, you can force the STL to use malloc and to free
     memory as soon as possible by globally disabling memory caching.
     Beware!  Doing so will probably slow down your program,
     sometimes drastically.

     - With gcc 2.91, 2.95, 3.0 and 3.1, compile all source using the
       STL with -D__USE_MALLOC.  Beware!  This is removed from gcc
       starting with version 3.3.

     - With 3.2.2 and later, you should export the environment
       variable GLIBCPP_FORCE_NEW before running your program.

     There are other ways to disable memory pooling: using the
     malloc_alloc template with your objects (not portable, but
     should work for gcc) or even writing your own memory
     allocators.  But all this goes beyond the scope of this
     FAQ.  Start by reading
     http://gcc.gnu.org/onlinedocs/libstdc++/ext/howto.html#3
     if you absolutely want to do that.  But beware:

     1) there are currently changes underway for gcc which are not
        totally reflected in the docs right now
        ("now" == 26 Apr 03)

     2) allocators belong to the more messy parts of the STL and
        people went to great lengths to make it portable across
        platforms.  Chances are good that your solution will work
        on your platform, but not on others.

-----------------------------------------------------------------

Q15. My program dies with a segmentation fault, but Valgrind doesn't give
     any error messages before it, or none that look related.

A15. The one kind of segmentation fault that Valgrind won't give any
     warnings about is writes to read-only memory.  Maybe your program is
     writing to a static string like this:

        char* s = "hello";
        s[0] = 'j';

     or something similar.  Writing to read-only memory can also apparently
     make LinuxThreads behave strangely.
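
     A minimal sketch of one way to avoid that write: copy the
     literal into storage the program owns before modifying it.
     The helper name write_jello is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: store "jello" in buf, which must have room
   for at least sizeof "hello" (6) bytes. */
void write_jello(char *buf, size_t size)
{
    assert(buf != NULL && size >= sizeof "hello");

    /* The literal itself may live in read-only memory, so copy it
       into the caller's writable buffer before changing it. */
    memcpy(buf, "hello", sizeof "hello");   /* copies the '\0' too */
    buf[0] = 'j';
}
```

     For a local string the same effect comes from declaring an
     array, char s[] = "hello";, which makes s a writable copy
     rather than a pointer to the literal.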

-----------------------------------------------------------------

Q16. When I try building Valgrind, 'make' dies partway with an
     assertion failure, something like this:

        make: expand.c:489: allocated_variable_append: Assertion
        `current_variable_set_list->next != 0' failed.

A16. It's probably a bug in 'make'.  Some, but not all, instances of
     version 3.79.1 have this bug, see
     www.mail-archive.com/bug-make@gnu.org/msg01658.html.  Try upgrading
     to a more recent version of 'make'.

-----------------------------------------------------------------

Q17. I tried writing a suppression but it didn't work.  Can you
     write my suppression for me?

A17. Yes!  Use the --gen-suppressions=yes feature to spit out
     suppressions automatically for you.  You can then edit them
     if you like, eg. combining similar automatically generated
     suppressions using wildcards like '*'.

     If you really want to write suppressions by hand, read the
     manual carefully.  Note particularly that C++ function names
     must be _mangled_.

-----------------------------------------------------------------

(this is the end of the FAQ.)