njn | 4e59bd9 | 2003-04-22 20:58:47 +0000 | [diff] [blame] | 1 | |
sewardj | 36a53ad | 2003-04-22 23:26:24 +0000 | [diff] [blame^] | 2 | A mini-FAQ for valgrind, version 1.9.6 |
njn | 4e59bd9 | 2003-04-22 20:58:47 +0000 | [diff] [blame] | 3 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 4 | Last revised 22 Apr 2003 |
| 5 | ~~~~~~~~~~~~~~~~~~~~~~~~ |
| 6 | |
| 7 | ----------------------------------------------------------------- |
| 8 | |
| 9 | Q1. Programs run OK on valgrind, but at exit produce a bunch |
| 10 | of errors a bit like this |
| 11 | |
| 12 | ==20755== Invalid read of size 4 |
| 13 | ==20755== at 0x40281C8A: _nl_unload_locale (loadlocale.c:238) |
| 14 | ==20755== by 0x4028179D: free_mem (findlocale.c:257) |
| 15 | ==20755== by 0x402E0962: __libc_freeres (set-freeres.c:34) |
| 16 | ==20755== by 0x40048DCC: vgPlain___libc_freeres_wrapper |
| 17 | (vg_clientfuncs.c:585) |
| 18 | ==20755== Address 0x40CC304C is 8 bytes inside a block of size 380 free'd |
| 19 | ==20755== at 0x400484C9: free (vg_clientfuncs.c:180) |
| 20 | ==20755== by 0x40281CBA: _nl_unload_locale (loadlocale.c:246) |
| 21 | ==20755== by 0x40281218: free_mem (setlocale.c:461) |
| 22 | ==20755== by 0x402E0962: __libc_freeres (set-freeres.c:34) |
| 23 | |
| 24 | and then die with a segmentation fault. |
| 25 | |
| 26 | A1. When the program exits, valgrind runs the procedure |
| 27 | __libc_freeres() in glibc. This is a hook for memory debuggers, |
| 28 | so they can ask glibc to free up any memory it has used. Doing |
| 29 | that is needed to ensure that valgrind doesn't incorrectly |
| 30 | report space leaks in glibc. |
| 31 | |
| 32 | The problem is that running __libc_freeres() in older glibc |
| 33 | versions causes this crash. |
| 34 | |
| 35 | WORKAROUND FOR 1.1.X and later versions of valgrind: use the |
| 36 | --run-libc-freeres=no flag. You may then get space leak |
| 37 | reports for glibc-allocations (please _don't_ report these |
| 38 | to the glibc people, since they are not real leaks), but at |
| 39 | least the program runs. |
| 40 | |
| 41 | ----------------------------------------------------------------- |
| 42 | |
| 43 | Q2. My program dies complaining that syscall 197 is unimplemented. |
| 44 | |
| 45 | A2. 197, which is fstat64, is supported by valgrind. The problem is |
| 46 | that the /usr/include/asm/unistd.h on the machine on which your |
| 47 | valgrind was built, doesn't match your kernel -- or, to be more |
| 48 | specific, glibc is asking your kernel to do a syscall which is |
| 49 | not listed in /usr/include/asm/unistd.h. |
| 50 | |
| 51 | The fix is simple. Somewhere near the top of |
| 52 | coregrind/vg_syscalls.c, add the following line: |
| 53 | |
| 54 | #define __NR_fstat64 197 |
| 55 | |
| 56 | Rebuild and try again. The above line should appear before any |
| 57 | uses of the __NR_fstat64 symbol in that file. If you look at the |
| 58 | place where __NR_fstat64 is used in vg_syscalls.c, it will be |
| 59 | obvious why this fix works. |
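The one-line fix can also be written with a preprocessor guard. This is only a minimal sketch, not the code in coregrind/vg_syscalls.c: the #ifndef guard and the helper fstat64_syscall_number() are assumptions added for illustration, and only the value 197 comes from the answer above.

```c
/* Assumption: the guard is not part of the original fix. It only
   avoids a redefinition when the system headers already provide
   the symbol. The value 197 is the one given in A2. */
#ifndef __NR_fstat64
#define __NR_fstat64 197
#endif

/* Hypothetical helper, for illustration only: reports which
   syscall number this build ended up with. */
static int fstat64_syscall_number(void)
{
    return __NR_fstat64;
}
```

With the guard in place, the definition is simply skipped on machines whose headers already define __NR_fstat64.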
| 60 | |
| 61 | ----------------------------------------------------------------- |
| 62 | |
| 63 | Q3. My (buggy) program dies like this: |
| 64 | valgrind: vg_malloc2.c:442 (bszW_to_pszW): |
| 65 | Assertion `pszW >= 0' failed. |
| 66 | And/or my (buggy) program runs OK on valgrind, but dies like |
| 67 | this on cachegrind. |
| 68 | |
| 69 | A3. If valgrind reports any invalid reads, invalid writes or invalid |
| 70 | frees in your program, the above may happen. The reason is that your |
| 71 | program may trash valgrind's low-level memory manager, which then |
| 72 | dies with the above assertion, or something like it. The cure |
| 73 | is to fix your program so that it doesn't do any illegal memory |
| 74 | accesses. The above failure will hopefully go away after that. |
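As an illustration of the kind of bug meant here, the sketch below shows a correct loop over a heap block, with a comment marking the off-by-one variant that would write past the end of the block. The function name fill_and_sum is hypothetical and is not taken from valgrind or from any real program.

```c
#include <stdlib.h>

/* Hypothetical example: fills a heap array with 0..n-1 and returns
   the sum of its elements, or -1 if the allocation fails. */
static long fill_and_sum(size_t n)
{
    int *a = malloc(n * sizeof *a);
    long sum = 0;
    size_t i;

    if (a == NULL)
        return -1;

    /* Correct bound. Writing "i <= n" here instead would store one
       int past the end of the block: an invalid write of size 4. */
    for (i = 0; i < n; i++)
        a[i] = (int)i;

    for (i = 0; i < n; i++)
        sum += a[i];

    free(a);
    return sum;
}
```

Fixing every such access that valgrind reports is the cure described above.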
| 75 | |
| 76 | ----------------------------------------------------------------- |
| 77 | |
| 78 | Q4. I'm running Red Hat Advanced Server. Valgrind always segfaults at |
| 79 | startup. |
| 80 | |
| 81 | A4. Known issue with RHAS 2.1, due to funny stack permissions at |
| 82 | startup. However, valgrind-1.9.4 and later automatically handle |
| 83 | this correctly, and should not segfault. |
| 84 | |
| 85 | ----------------------------------------------------------------- |
| 86 | |
| 87 | Q5. I try running "valgrind my_program", but my_program runs normally, |
| 88 | and Valgrind doesn't emit any output at all. |
| 89 | |
| 90 | A5. Is my_program statically linked? Valgrind doesn't work with |
| 91 | statically linked binaries. It must rely on at least one shared |
| 92 | object. To determine whether my_program is statically linked, run: |
| 93 | |
| 94 | ldd my_program |
| 95 | |
| 96 | It will show what shared objects my_program relies on, or say: |
| 97 | |
| 98 | not a dynamic executable |
| 99 | |
| 100 | if my_program is statically linked. |
| 101 | |
| 102 | ----------------------------------------------------------------- |
| 103 | |
| 104 | Q6. I try running "valgrind my_program" and get Valgrind's startup message, |
| 105 | but I don't get any errors and I know my program has errors. |
| 106 | |
| 107 | A6. By default, Valgrind only traces the top-level process. So if your |
| 108 | program spawns children, they won't be traced by Valgrind by default. |
| 109 | Also, if your program is started by a shell script, Perl script, or |
| 110 | something similar, Valgrind will trace the shell, or the Perl |
| 111 | interpreter, or equivalent. |
| 112 | |
| 113 | To trace child processes, use the --trace-children=yes option. |
| 114 | |
| 115 | If you are tracing large trees of processes, it can be less |
| 116 | disruptive to have the output sent over the network. Give |
| 117 | valgrind the flag --logsocket=127.0.0.1:12345 (if you want |
| 118 | logging output sent to port 12345 on localhost). You can |
| 119 | use the valgrind-listener program to listen on that port: |
| 120 | valgrind-listener 12345 |
| 121 | Obviously you have to start the listener process first. |
| 122 | See the documentation for more details. |
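To make the parent/child situation concrete, here is a minimal sketch of a program that spawns a child through a shell. The helper name run_shell_command is hypothetical. Per the answer above, if the parent is run under valgrind without --trace-children=yes, the shell started here is not traced.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `cmd` via /bin/sh in a child process and
   returns the child's exit status, or -1 on failure. */
static int run_shell_command(const char *cmd)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                       /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                      /* reached only if execl fails */
    }

    /* Parent: wait for the child to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Any errors in the command run by the child would show up only when child tracing is switched on.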
| 123 | |
| 124 | ----------------------------------------------------------------- |
| 125 | |
| 126 | Q7. My threaded server process runs unbelievably slowly on |
| 127 | valgrind. So slowly, in fact, that at first I thought it |
| 128 | had completely locked up. |
| 129 | |
| 130 | A7. We are not completely sure about this, but one possibility |
| 131 | is that laptops with power management fool valgrind's |
| 132 | timekeeping mechanism, which is (somewhat in error) based |
| 133 | on the x86 RDTSC instruction. A "fix" which is claimed to |
| 134 | work is to run some other cpu-intensive process at the same |
| 135 | time, so that the laptop's power-management clock-slowing |
| 136 | does not kick in. We would be interested in hearing more |
| 137 | feedback on this. |
| 138 | |
| 139 | ----------------------------------------------------------------- |
| 140 | |
| 141 | Q8. My program dies (exactly) like this: |
| 142 | |
| 143 | REPE then 0xF |
| 144 | valgrind: the `impossible' happened: |
| 145 | Unhandled REPE case |
| 146 | |
| 147 | A8. Yeah ... that I believe is a P4 specific instruction. Are you |
| 148 | building your app with -march=pentium4 or something like that? |
| 149 | Others have reported that removing the flag works around this. |
| 150 | In fact this is pretty easy to fix and I do have it on my |
| 151 | to-do-for-1.9.6 list. |
| 152 | |
| 153 | I'd be interested to hear if you can get rid of it by changing |
| 154 | your application build flags. |
| 155 | |
| 156 | ----------------------------------------------------------------- |
| 157 | |
| 158 | Q9. My program dies complaining that __libc_current_sigrtmin |
| 159 | is unimplemented. |
| 160 | |
| 161 | A9. Try the following. It is an experiment, but it might work. |
| 162 | We would very much appreciate you telling us if it does/ |
| 163 | does not work for you. |
| 164 | |
| 165 | In vg_libpthread.c, add the 3 functions below. |
| 166 | |
| 167 | In vg_libpthread_unimp.c, remove the stubs for the same 3 |
| 168 | functions. |
| 169 | |
| 170 | Let me know if it helps. Quite a lot of other valgrind users |
| 171 | complain about this, but I have never been able to reproduce it, |
| 172 | so fixing it isn't easy. So it's useful if you can try. |
| 173 | |
| 174 | int __libc_current_sigrtmin (void) |
| 175 | { |
| 176 | return -1; |
| 177 | } |
| 178 | |
| 179 | int __libc_current_sigrtmax (void) |
| 180 | { |
| 181 | return -1; |
| 182 | } |
| 183 | |
| 184 | int __libc_allocate_rtsig (int high) |
| 185 | { |
| 186 | return -1; |
| 187 | } |
| 188 | |
| 189 | ----------------------------------------------------------------- |
| 190 | |
| 191 | Q10. I upgraded to Red Hat 9 and threaded programs now act |
| 192 | strange / deadlock when they didn't before. |
| 193 | |
| 194 | A10. Thread support on glibc 2.3.2+ with NPTL is not as |
| 195 | good as on older LinuxThreads-based systems. We have |
| 196 | this under consideration. Avoid Red Hat >= 8.1 for |
| 197 | the time being, if you can. |
| 198 | |
| 199 | ----------------------------------------------------------------- |
| 200 | |
| 201 | Q11. I really need to use the NVidia libGL.so in my app. |
| 202 | Help! |
| 203 | |
| 204 | A11. NVidia also noticed this, it seems, and the "latest" drivers |
| 205 | (version 4349, apparently) come with this text |
| 206 | |
| 207 | DISABLING CPU SPECIFIC FEATURES |
| 208 | |
| 209 | Setting the environment variable __GL_FORCE_GENERIC_CPU to a |
| 210 | non-zero value will inhibit the use of CPU specific features |
| 211 | such as MMX, SSE, or 3DNOW!. Use of this option may result in |
| 212 | performance loss. This option may be useful in conjunction with |
| 213 | software such as the Valgrind memory debugger. |
| 214 | |
| 215 | Set __GL_FORCE_GENERIC_CPU=1 and Valgrind should work. This has |
| 216 | been confirmed by various people. Thanks NVidia! |
| 217 | |
| 218 | ----------------------------------------------------------------- |
| 219 | |
| 220 | Q12. My program dies like this (often at exit): |
| 221 | |
| 222 | VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH): internal error: |
| 223 | (loads of text) |
| 224 | |
| 225 | A12. We're not entirely sure about this, and would appreciate |
| 226 | someone sending a simple test case for us to look at. |
| 227 | One possible cause is that your program modifies its |
| 228 | environment variables, possibly including zeroing them |
| 229 | all. Avoid this if you can. |
| 230 | |
| 231 | In any case, you may be able to work around it like this: |
| 232 | comment out the call to VG_(core_panic) at |
| 233 | coregrind/vg_main.c:1647 and see if that helps. The text of |
| 234 | coregrind/vg_main.c:1647 is as follows: |
| 235 | |
| 236 | VG_(core_panic)("VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH) failed\n"); |
| 237 | |
| 238 | and so it's this call you want to comment out. |
| 239 | |
| 240 | ----------------------------------------------------------------- |
| 241 | |
| 242 | Q13. My program dies like this: |
| 243 | |
| 244 | error: /lib/librt.so.1: symbol __pthread_clock_settime, version |
| 245 | GLIBC_PRIVATE not defined in file libpthread.so.0 with link time |
| 246 | reference |
| 247 | |
| 248 | A13. This is a total swamp. Nevertheless there is a way out. |
| 249 | It's a problem which is not easy to fix. Really the problem is |
| 250 | that /lib/librt.so.1 refers to some symbols |
| 251 | __pthread_clock_settime and __pthread_clock_gettime in |
| 252 | /lib/libpthread.so which are not intended to be exported, ie |
| 253 | they are private. |
| 254 | |
| 255 | Best solution is to ensure your program does not use |
| 256 | /lib/librt.so.1. |
| 257 | |
| 258 | However, since you're probably not using it directly, or even |
| 259 | knowingly, that's hard to do. You might instead be able to fix |
| 260 | it by playing around with coregrind/vg_libpthread.vs. Things to |
| 261 | try: |
| 262 | |
| 263 | Remove this |
| 264 | |
| 265 | GLIBC_PRIVATE { |
| 266 | __pthread_clock_gettime; |
| 267 | __pthread_clock_settime; |
| 268 | }; |
| 269 | |
| 270 | or maybe remove this |
| 271 | |
| 272 | GLIBC_2.2.3 { |
| 273 | __pthread_clock_gettime; |
| 274 | __pthread_clock_settime; |
| 275 | } GLIBC_2.2; |
| 276 | |
| 277 | or maybe add this |
| 278 | |
| 279 | GLIBC_2.2.4 { |
| 280 | __pthread_clock_gettime; |
| 281 | __pthread_clock_settime; |
| 282 | } GLIBC_2.2; |
| 283 | |
| 284 | GLIBC_2.2.5 { |
| 285 | __pthread_clock_gettime; |
| 286 | __pthread_clock_settime; |
| 287 | } GLIBC_2.2; |
| 288 | |
| 289 | or some combination of the above. After each change you need to |
| 290 | delete coregrind/libpthread.so and do make && make install. |
| 291 | |
| 292 | I just don't know if any of the above will work. If you can |
| 293 | find a solution which works, I would be interested to hear it. |
| 294 | |
| 295 | To which someone replied: |
| 296 | |
| 297 | I deleted this: |
| 298 | |
| 299 | GLIBC_2.2.3 { |
| 300 | __pthread_clock_gettime; |
| 301 | __pthread_clock_settime; |
| 302 | } GLIBC_2.2; |
| 303 | |
| 304 | and it worked. |
| 305 | |
| 306 | ----------------------------------------------------------------- |
| 307 | |
| 308 | (this is the end of the FAQ.) |