A mini-FAQ for valgrind, version 1.9.6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last revised 5 May 2003
~~~~~~~~~~~~~~~~~~~~~~~

-----------------------------------------------------------------

Q1. Programs run OK on valgrind, but at exit produce a bunch
    of errors a bit like this:

    ==20755== Invalid read of size 4
    ==20755==    at 0x40281C8A: _nl_unload_locale (loadlocale.c:238)
    ==20755==    by 0x4028179D: free_mem (findlocale.c:257)
    ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)
    ==20755==    by 0x40048DCC: vgPlain___libc_freeres_wrapper
                                    (vg_clientfuncs.c:585)
    ==20755==    Address 0x40CC304C is 8 bytes inside a block of size 380 free'd
    ==20755==    at 0x400484C9: free (vg_clientfuncs.c:180)
    ==20755==    by 0x40281CBA: _nl_unload_locale (loadlocale.c:246)
    ==20755==    by 0x40281218: free_mem (setlocale.c:461)
    ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)

    and then die with a segmentation fault.

A1. When the program exits, valgrind runs the procedure
    __libc_freeres() in glibc.  This is a hook for memory debuggers,
    so they can ask glibc to free up any memory it has used.  Doing
    that is needed to ensure that valgrind doesn't incorrectly
    report space leaks in glibc.

    The problem is that running __libc_freeres() in older glibc
    versions causes this crash.

    WORKAROUND FOR 1.1.X and later versions of valgrind: use the
    --run-libc-freeres=no flag.  You may then get space leak
    reports for glibc allocations (please _don't_ report these
    to the glibc people, since they are not real leaks), but at
    least the program runs.
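
As a rough sketch of the workaround, a small shell wrapper can pass
the flag for you.  The function name run_without_freeres and the
path ./my_program are placeholders invented for illustration; only
the --run-libc-freeres=no flag comes from the answer above.

```shell
#!/bin/sh
# Hypothetical helper: run a program under valgrind with the
# __libc_freeres() call at exit disabled.
run_without_freeres() {
    valgrind --run-libc-freeres=no "$@"
}

# Example call (./my_program is a placeholder for your own binary):
#   run_without_freeres ./my_program arg1 arg2
```

Expect extra glibc "leak" reports when running this way; as noted
above, they are not real leaks.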

-----------------------------------------------------------------

Q2. My program dies complaining that syscall 197 is unimplemented.

A2. Syscall 197, which is fstat64, is supported by valgrind.  The
    problem is that the /usr/include/asm/unistd.h on the machine on
    which your valgrind was built doesn't match your kernel.  To be
    more specific, glibc is asking your kernel to do a syscall which
    is not listed in /usr/include/asm/unistd.h.

    The fix is simple.  Somewhere near the top of
    coregrind/vg_syscalls.c, add the following line:

       #define __NR_fstat64 197

    Rebuild and try again.  The above line should appear before any
    uses of the __NR_fstat64 symbol in that file.  If you look at the
    place where __NR_fstat64 is used in vg_syscalls.c, it will be
    obvious why this fix works.
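
Before editing the source, it may help to confirm that the header on
the build machine really lacks the definition.  The sketch below
assumes a POSIX shell and grep; has_syscall_def is a made-up helper
name.  A negative result is only a hint, since a header can pull in
definitions from other files.

```shell
#!/bin/sh
# Hypothetical helper: succeed if HEADER contains a "#define NAME ..." line.
#   $1 = macro name, $2 = header file
has_syscall_def() {
    grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+$1([[:space:]]|\$)" "$2"
}

# Example call:
#   if has_syscall_def __NR_fstat64 /usr/include/asm/unistd.h; then
#       echo "header already defines __NR_fstat64"
#   else
#       echo "header lacks __NR_fstat64; the #define above is needed"
#   fi
```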

-----------------------------------------------------------------

Q3. My (buggy) program dies like this:
       valgrind: vg_malloc2.c:442 (bszW_to_pszW):
                 Assertion `pszW >= 0' failed.
    And/or my (buggy) program runs OK on valgrind, but dies like
    this on cachegrind.

A3. If valgrind shows any invalid reads, invalid writes or invalid
    frees in your program, the above may happen.  The reason is that
    your program may trash valgrind's low-level memory manager, which
    then dies with the above assertion, or something like it.  The
    cure is to fix your program so that it doesn't do any illegal
    memory accesses.  The above failure will hopefully go away after
    that.

-----------------------------------------------------------------

Q4. I'm running Red Hat Advanced Server.  Valgrind always segfaults at
    startup.

A4. This is a known issue with RHAS 2.1, caused by unusual stack
    permissions at startup.  However, valgrind-1.9.4 and later
    automatically handle this correctly, and should not segfault.

-----------------------------------------------------------------

Q5. I try running "valgrind my_program", but my_program runs normally,
    and Valgrind doesn't emit any output at all.

A5. Is my_program statically linked?  Valgrind doesn't work with
    statically linked binaries.  my_program must rely on at least one
    shared object.  To determine if my_program is statically linked,
    run:

       ldd my_program

    It will show what shared objects my_program relies on, or say:

       not a dynamic executable

    if my_program is statically linked.
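
The check above can be scripted.  This is a minimal sketch, assuming
a POSIX shell and an ldd that prints the message quoted above;
linkage_of is a made-up helper name.  It reads ldd output on stdin
so that the decision logic stays separate from the ldd call itself.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print
# "static" or "dynamic".
linkage_of() {
    if grep -q 'not a dynamic executable'; then
        echo static
    else
        echo dynamic
    fi
}

# Example call (my_program is a placeholder for your own binary):
#   ldd ./my_program 2>&1 | linkage_of
```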

-----------------------------------------------------------------

Q6. I try running "valgrind my_program" and get Valgrind's startup
    message, but I don't get any errors and I know my program has
    errors.

A6. By default, Valgrind only traces the top-level process.  So if your
    program spawns children, they won't be traced by Valgrind by default.
    Also, if your program is started by a shell script, Perl script, or
    something similar, Valgrind will trace the shell, or the Perl
    interpreter, or equivalent.

    To trace child processes, use the --trace-children=yes option.

    If you are tracing large trees of processes, it can be less
    disruptive to have the output sent over the network.  Give
    valgrind the flag --logsocket=127.0.0.1:12345 (if you want
    logging output sent to port 12345 on localhost).  You can
    use the valgrind-listener program to listen on that port:

       valgrind-listener 12345

    Obviously you have to start the listener process first.
    See the documentation for more details.
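
The two options can be combined.  The wrapper below is only a
sketch: trace_tree and ./start_server.sh are invented names, and
the flags are used exactly as spelled in the answer above.

```shell
#!/bin/sh
# Hypothetical helper: trace a program and all of its children,
# sending the log to a listener on localhost.
#   $1 = port, remaining arguments = command to run
trace_tree() {
    port=$1
    shift
    valgrind --trace-children=yes --logsocket=127.0.0.1:"$port" "$@"
}

# Example session (start the listener first, in another terminal):
#   valgrind-listener 12345
#   trace_tree 12345 ./start_server.sh
```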

-----------------------------------------------------------------

Q7. My threaded server process runs unbelievably slowly on
    valgrind.  So slowly, in fact, that at first I thought it
    had completely locked up.

A7. We are not completely sure about this, but one possibility
    is that laptops with power management fool valgrind's
    timekeeping mechanism, which is (somewhat in error) based
    on the x86 RDTSC instruction.  A "fix" which is claimed to
    work is to run some other cpu-intensive process at the same
    time, so that the laptop's power-management clock-slowing
    does not kick in.  We would be interested in hearing more
    feedback on this.

    Another possible cause is that versions prior to 1.9.6
    did not support threading on glibc 2.3.X systems well.
    Hopefully the situation is much improved with 1.9.6.

-----------------------------------------------------------------

Q8. My program dies (exactly) like this:

       REPE then 0xF
       valgrind: the `impossible' happened:
          Unhandled REPE case

A8. Yeah ... that, I believe, is an SSE or SSE2 instruction.  Are you
    building your app with -march=pentium4 or -march=athlon or
    something like that?  If you can somehow dissuade gcc from
    producing SSE/SSE2 instructions, you may be able to avoid this.
    Some folks have reported that removing the flag -march=...
    works around this.

    I'd be interested to hear if you can get rid of it by changing
    your application build flags.

-----------------------------------------------------------------

Q9. My program dies complaining that __libc_current_sigrtmin
    is unimplemented.

A9. Should be fixed in 1.9.6.  I would appreciate confirmation
    of that.

-----------------------------------------------------------------

Q10. I upgraded to Red Hat 9 and threaded programs now act
     strangely or deadlock when they didn't before.

A10. Thread support on glibc 2.3.2+ with NPTL is not as
     good as on older LinuxThreads-based systems.  We have
     this under consideration.  Avoid Red Hat >= 8.1 for
     the time being, if you can.

     5 May 03: 1.9.6 should be significantly improved on
     Red Hat 9, SuSE 8.2 and other glibc-2.3.2 systems.

-----------------------------------------------------------------

Q11. I really need to use the NVidia libGL.so in my app.
     Help!

A11. NVidia also noticed this, it seems, and the "latest" drivers
     (version 4349, apparently) come with this text:

        DISABLING CPU SPECIFIC FEATURES

        Setting the environment variable __GL_FORCE_GENERIC_CPU to a
        non-zero value will inhibit the use of CPU specific features
        such as MMX, SSE, or 3DNOW!.  Use of this option may result in
        performance loss.  This option may be useful in conjunction with
        software such as the Valgrind memory debugger.

     Set __GL_FORCE_GENERIC_CPU=1 and Valgrind should work.  This has
     been confirmed by various people.  Thanks NVidia!
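
One way to set the variable for a single run, sketched as a shell
wrapper.  run_gl_generic and ./my_gl_app are invented names; the
variable name and the value 1 are taken from the answer above.  The
subshell keeps the setting from leaking into the rest of your
session.

```shell
#!/bin/sh
# Hypothetical helper: run a program under valgrind with the NVidia
# driver's CPU-specific code paths disabled for that run only.
run_gl_generic() {
    (
        __GL_FORCE_GENERIC_CPU=1
        export __GL_FORCE_GENERIC_CPU
        valgrind "$@"
    )
}

# Example call (./my_gl_app is a placeholder for your own binary):
#   run_gl_generic ./my_gl_app
```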

-----------------------------------------------------------------

Q12. My program dies like this (often at exit):

        VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH): internal error:
        (loads of text)

A12. We're not entirely sure about this, and would appreciate
     someone sending a simple test case for us to look at.
     One possible cause is that your program modifies its
     environment variables, possibly including zeroing them
     all.  Avoid this if you can.

     1.9.6 contains a fix which hopefully reduces the chances
     of your program bombing out like this.

-----------------------------------------------------------------

Q13. My program dies like this:

        error: /lib/librt.so.1: symbol __pthread_clock_settime, version
        GLIBC_PRIVATE not defined in file libpthread.so.0 with link time
        reference

A13. This is a total swamp.  Nevertheless there is a way out.
     It's a problem which is not easy to fix.  Really the problem is
     that /lib/librt.so.1 refers to some symbols,
     __pthread_clock_settime and __pthread_clock_gettime, in
     /lib/libpthread.so which are not intended to be exported, i.e.
     they are private.

     The best solution is to ensure your program does not use
     /lib/librt.so.1.

     However, since you're probably not using it directly, or even
     knowingly, that's hard to do.  You might instead be able to fix
     it by playing around with coregrind/vg_libpthread.vs.  Things to
     try:

     Remove this

        GLIBC_PRIVATE {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        };

     or maybe remove this

        GLIBC_2.2.3 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or maybe add this

        GLIBC_2.2.4 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

        GLIBC_2.2.5 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or some combination of the above.  After each change you need to
     delete coregrind/libpthread.so and do make && make install.

     I just don't know if any of the above will work.  If you can
     find a solution which works, I would be interested to hear it.

     To which someone replied:

        I deleted this:

           GLIBC_2.2.3 {
              __pthread_clock_gettime;
              __pthread_clock_settime;
           } GLIBC_2.2;

        and it worked.

-----------------------------------------------------------------

Q14. My program uses the C++ STL and string classes.  Valgrind
     reports 'still reachable' memory leaks involving these classes
     at the exit of the program, but there should be none.

A14. First of all: relax, it's probably not a bug, but a feature.
     Many implementations of the C++ standard libraries use their own
     memory pool allocators.  Memory for quite a number of destructed
     objects is not immediately freed and given back to the OS, but
     kept in the pool(s) for later re-use.  The fact that the pools
     are not freed at the exit() of the program causes valgrind to
     report this memory as still reachable.  The behaviour of not
     freeing pools at exit() could be called a bug in the library,
     though.

     Using gcc, you can force the STL to use malloc and to free
     memory as soon as possible by globally disabling memory caching.
     Beware!  Doing so will probably slow down your program,
     sometimes drastically.

     - With gcc 2.91, 2.95, 3.0 and 3.1, compile all source using the
       STL with -D__USE_MALLOC.  Beware!  This is removed from gcc
       starting with version 3.3.

     - With 3.2.2 and later, you should export the environment
       variable GLIBCPP_FORCE_NEW before running your program.
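
For the second case, a shell wrapper along these lines may be
convenient.  It is a sketch only: run_no_pool and ./my_program are
invented names, and the value 1 is an assumption, since the text
above only says the variable must be exported.

```shell
#!/bin/sh
# Hypothetical helper: run a program under valgrind with libstdc++
# memory pooling disabled for that run only.
run_no_pool() {
    (
        GLIBCPP_FORCE_NEW=1   # value is an assumption; see the note above
        export GLIBCPP_FORCE_NEW
        valgrind "$@"
    )
}

# Example call (./my_program is a placeholder for your own binary):
#   run_no_pool ./my_program
```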

     There are other ways to disable memory pooling: using the
     malloc_alloc template with your objects (not portable, but
     should work for gcc) or even writing your own memory
     allocators.  But all this goes beyond the scope of this
     FAQ.  Start by reading
     http://gcc.gnu.org/onlinedocs/libstdc++/ext/howto.html#3
     if you absolutely want to do that.  But beware:

     1) there are currently changes underway for gcc which are not
        totally reflected in the docs right now
        ("now" == 26 Apr 03)

     2) allocators belong to the messier parts of the STL, and
        people went to great lengths to make them portable across
        platforms.  Chances are good that your solution will work
        on your platform, but not on others.

-----------------------------------------------------------------

(this is the end of the FAQ.)