A mini-FAQ for valgrind, version 1.9.6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last revised 5 May 2003
~~~~~~~~~~~~~~~~~~~~~~~

-----------------------------------------------------------------

Q1. Programs run OK on valgrind, but at exit produce a bunch
    of errors a bit like this

    ==20755== Invalid read of size 4
    ==20755==    at 0x40281C8A: _nl_unload_locale (loadlocale.c:238)
    ==20755==    by 0x4028179D: free_mem (findlocale.c:257)
    ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)
    ==20755==    by 0x40048DCC: vgPlain___libc_freeres_wrapper (vg_clientfuncs.c:585)
    ==20755==    Address 0x40CC304C is 8 bytes inside a block of size 380 free'd
    ==20755==    at 0x400484C9: free (vg_clientfuncs.c:180)
    ==20755==    by 0x40281CBA: _nl_unload_locale (loadlocale.c:246)
    ==20755==    by 0x40281218: free_mem (setlocale.c:461)
    ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)

    and then die with a segmentation fault.

A1. When the program exits, valgrind runs the procedure
    __libc_freeres() in glibc. This is a hook for memory debuggers,
    so they can ask glibc to free up any memory it has used. Doing
    that is needed to ensure that valgrind doesn't incorrectly
    report space leaks in glibc.

    Problem is that running __libc_freeres() in older glibc versions
    causes this crash.

    WORKAROUND FOR 1.1.X and later versions of valgrind: use the
    --run-libc-freeres=no flag. You may then get space leak
    reports for glibc-allocations (please _don't_ report these
    to the glibc people, since they are not real leaks), but at
    least the program runs.
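
    As a minimal sketch of the workaround, a small shell wrapper
    can pass the flag on every run. The name vg_no_freeres is made
    up for this example; it is not part of valgrind.

```shell
# Hypothetical helper: run a program under valgrind without the
# __libc_freeres() call at exit. All arguments are passed through.
vg_no_freeres() {
    valgrind --run-libc-freeres=no "$@"
}
```

    Usage: vg_no_freeres ./my_program arg1 arg2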

-----------------------------------------------------------------

Q2. My program dies complaining that syscall 197 is unimplemented.

A2. 197, which is fstat64, is supported by valgrind. The problem is
    that the /usr/include/asm/unistd.h on the machine on which your
    valgrind was built doesn't match your kernel -- or, to be more
    specific, glibc is asking your kernel to do a syscall which is
    not listed in /usr/include/asm/unistd.h.

    The fix is simple. Somewhere near the top of
    coregrind/vg_syscalls.c, add the following line:

       #define __NR_fstat64 197

    Rebuild and try again. The above line should appear before any
    uses of the __NR_fstat64 symbol in that file. If you look at the
    place where __NR_fstat64 is used in vg_syscalls.c, it will be
    obvious why this fix works.

-----------------------------------------------------------------

Q3. My (buggy) program dies like this:
       valgrind: vg_malloc2.c:442 (bszW_to_pszW):
                 Assertion `pszW >= 0' failed.
    And/or my (buggy) program runs OK on valgrind, but dies like
    this on cachegrind.

A3. If valgrind shows any invalid reads, invalid writes or invalid
    frees in your program, the above may happen. Reason is that your
    program may trash valgrind's low-level memory manager, which then
    dies with the above assertion, or something like it. The cure
    is to fix your program so that it doesn't do any illegal memory
    accesses. The above failure will hopefully go away after that.

-----------------------------------------------------------------

Q4. I'm running Red Hat Advanced Server. Valgrind always segfaults at
    startup.

A4. Known issue with RHAS 2.1, due to funny stack permissions at
    startup. However, valgrind-1.9.4 and later automatically handle
    this correctly, and should not segfault.

-----------------------------------------------------------------

Q5. I try running "valgrind my_program", but my_program runs normally,
    and Valgrind doesn't emit any output at all.

A5. Is my_program statically linked? Valgrind doesn't work with
    statically linked binaries. my_program must rely on at least one
    shared object. To determine if my_program is statically linked,
    run:

       ldd my_program

    It will show what shared objects my_program relies on, or say:

       not a dynamic executable

    if my_program is statically linked.
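
    The same check can be scripted. The sketch below assumes that
    ldd prints the message quoted above; the helper name is_static
    is invented for the example.

```shell
# Hypothetical helper: exit status 0 if ldd reports that the file
# is not a dynamic executable, non-zero otherwise.
is_static() {
    # 2>&1 because the message may appear on stdout or stderr,
    # depending on the ldd version.
    ldd "$1" 2>&1 | grep -q 'not a dynamic executable'
}
```

    Usage: is_static ./my_program && echo "statically linked"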

-----------------------------------------------------------------

Q6. I try running "valgrind my_program" and get Valgrind's startup message,
    but I don't get any errors and I know my program has errors.

A6. By default, Valgrind only traces the top-level process. So if your
    program spawns children, they won't be traced by Valgrind by default.
    Also, if your program is started by a shell script, Perl script, or
    something similar, Valgrind will trace the shell, or the Perl
    interpreter, or equivalent.

    To trace child processes, use the --trace-children=yes option.

    If you are tracing large trees of processes, it can be less
    disruptive to have the output sent over the network. Give
    valgrind the flag --logsocket=127.0.0.1:12345 (if you want
    logging output sent to port 12345 on localhost). You can
    use the valgrind-listener program to listen on that port:
       valgrind-listener 12345
    Obviously you have to start the listener process first.
    See the documentation for more details.
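
    Put together, the two flags might be wrapped like this. This
    is only a sketch: vg_trace_tree is a made-up name, and it
    assumes a valgrind-listener is already running on the given
    port on localhost.

```shell
# Hypothetical helper: trace a whole process tree and send the log
# to a listener on localhost.
# $1 is the port; the remaining arguments are the command to run.
vg_trace_tree() {
    port=$1
    shift
    valgrind --trace-children=yes --logsocket=127.0.0.1:"$port" "$@"
}
```

    Usage: run "valgrind-listener 12345" in one terminal, then
    "vg_trace_tree 12345 ./start_my_program.sh" in another.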

-----------------------------------------------------------------

Q7. My threaded server process runs unbelievably slowly on
    valgrind. So slowly, in fact, that at first I thought it
    had completely locked up.

A7. We are not completely sure about this, but one possibility
    is that laptops with power management fool valgrind's
    timekeeping mechanism, which is (somewhat in error) based
    on the x86 RDTSC instruction. A "fix" which is claimed to
    work is to run some other cpu-intensive process at the same
    time, so that the laptop's power-management clock-slowing
    does not kick in. We would be interested in hearing more
    feedback on this.

    Another possible cause is that versions prior to 1.9.6
    did not support threading on glibc 2.3.X systems well.
    Hopefully the situation is much improved with 1.9.6.

-----------------------------------------------------------------

Q8. My program dies (exactly) like this:

       REPE then 0xF
       valgrind: the `impossible' happened:
          Unhandled REPE case

A8. Yeah ... that, I believe, is an SSE or SSE2 instruction. Are you
    building your app with -march=pentium4 or -march=athlon or
    something like that? If you can somehow dissuade gcc from
    producing SSE/SSE2 instructions, you may be able to avoid this.
    Some folks have reported that removing the flag -march=...
    works around this.

    I'd be interested to hear if you can get rid of it by changing
    your application build flags.
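
    If the flags live in a shell variable, one way to drop the
    -march option is sketched below. The helper name strip_march
    and the example CFLAGS value are invented for illustration;
    whether this is enough to avoid SSE/SSE2 code depends on your
    compiler and the rest of your flags.

```shell
# Hypothetical helper: print a flag string with any -march=...
# option removed and the leftover spaces tidied up.
strip_march() {
    printf '%s\n' "$1" |
        sed -e 's/-march=[^[:space:]]*//g' \
            -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

CFLAGS="-O2 -g -march=pentium4"   # example flags, not from a real build
CFLAGS=$(strip_march "$CFLAGS")   # CFLAGS no longer contains -march=...
```

    Rebuild the application with the resulting CFLAGS and run it
    under valgrind again.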

-----------------------------------------------------------------

Q9. My program dies complaining that __libc_current_sigrtmin
    is unimplemented.

A9. Should be fixed in 1.9.6. I would appreciate confirmation
    of that.

-----------------------------------------------------------------

Q10. I upgraded to Red Hat 9 and threaded programs now act
     strange / deadlock when they didn't before.

A10. Thread support on glibc 2.3.2+ with NPTL is not as
     good as on older LinuxThreads-based systems. We have
     this under consideration. Avoid Red Hat >= 8.1 for
     the time being, if you can.

     5 May 03: 1.9.6 should be significantly improved on
     Red Hat 9, SuSE 8.2 and other glibc-2.3.2 systems.

-----------------------------------------------------------------

Q11. I really need to use the NVidia libGL.so in my app.
     Help!

A11. NVidia also noticed this, it seems, and the "latest" drivers
     (version 4349, apparently) come with this text

        DISABLING CPU SPECIFIC FEATURES

        Setting the environment variable __GL_FORCE_GENERIC_CPU to a
        non-zero value will inhibit the use of CPU specific features
        such as MMX, SSE, or 3DNOW!. Use of this option may result in
        performance loss. This option may be useful in conjunction with
        software such as the Valgrind memory debugger.

     Set __GL_FORCE_GENERIC_CPU=1 and Valgrind should work. This has
     been confirmed by various people. Thanks NVidia!
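
     A sketch of setting the variable for a single run only, so
     that other OpenGL programs started from the same shell keep
     their CPU-specific code paths. The helper name vg_generic_gl
     is made up for the example.

```shell
# Hypothetical helper: set NVidia's switch just for this one
# valgrind run; the calling shell's environment is not meant to
# be changed.
vg_generic_gl() {
    __GL_FORCE_GENERIC_CPU=1 valgrind "$@"
}
```

     Usage: vg_generic_gl ./my_gl_app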

-----------------------------------------------------------------

Q12. My program dies like this (often at exit):

        VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH): internal error:
        (loads of text)

A12. One possible cause is that your program modifies its
     environment variables, possibly including zeroing them
     all. Valgrind relies on the LD_PRELOAD, LD_LIBRARY_PATH and
     VG_ARGS variables. Zeroing them will break things.

     As of 1.9.6, Valgrind only uses these variables with
     --trace-children=no, when executing execve() or using the
     --stop-after=yes flag. This should reduce the potential for
     problems.

-----------------------------------------------------------------

Q13. My program dies like this:

        error: /lib/librt.so.1: symbol __pthread_clock_settime, version
        GLIBC_PRIVATE not defined in file libpthread.so.0 with link time
        reference

A13. This is a total swamp. Nevertheless there is a way out.
     It's a problem which is not easy to fix. Really the problem is
     that /lib/librt.so.1 refers to some symbols
     __pthread_clock_settime and __pthread_clock_gettime in
     /lib/libpthread.so which are not intended to be exported, ie
     they are private.

     Best solution is to ensure your program does not use
     /lib/librt.so.1.

     However .. since you're probably not using it directly, or even
     knowingly, that's hard to do. You might instead be able to fix
     it by playing around with coregrind/vg_libpthread.vs. Things to
     try:

     Remove this

        GLIBC_PRIVATE {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        };

     or maybe remove this

        GLIBC_2.2.3 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or maybe add this

        GLIBC_2.2.4 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

        GLIBC_2.2.5 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or some combination of the above. After each change you need to
     delete coregrind/libpthread.so and do make && make install.

     I just don't know if any of the above will work. If you can
     find a solution which works, I would be interested to hear it.

     To which someone replied:

        I deleted this:

           GLIBC_2.2.3 {
              __pthread_clock_gettime;
              __pthread_clock_settime;
           } GLIBC_2.2;

        and it worked.

-----------------------------------------------------------------

Q14. My program uses the C++ STL and string classes. Valgrind
     reports 'still reachable' memory leaks involving these classes
     at the exit of the program, but there should be none.

A14. First of all: relax, it's probably not a bug, but a feature.
     Many implementations of the C++ standard libraries use their own
     memory pool allocators. Memory for quite a number of destructed
     objects is not immediately freed and given back to the OS, but
     kept in the pool(s) for later re-use. The fact that the pools
     are not freed at the exit() of the program causes valgrind to
     report this memory as still reachable. The behaviour not to
     free pools at the exit() could be called a bug of the library
     though.

     Using gcc, you can force the STL to use malloc and to free
     memory as soon as possible by globally disabling memory caching.
     Beware! Doing so will probably slow down your program,
     sometimes drastically.

     - With gcc 2.91, 2.95, 3.0 and 3.1, compile all source using the
       STL with -D__USE_MALLOC. Beware! This is removed from gcc
       starting with version 3.3.

     - With 3.2.2 and later, you should export the environment
       variable GLIBCPP_FORCE_NEW before running your program.
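
     For the environment-variable route, a minimal sketch follows.
     The helper name vg_force_new is invented, and the value 1 is
     an assumption: the text above only says that the variable
     must be exported.

```shell
# Hypothetical helper: export GLIBCPP_FORCE_NEW for one valgrind run.
# The parentheses make the function body run in a subshell, so the
# export does not leak into the calling shell.
vg_force_new() (
    export GLIBCPP_FORCE_NEW=1
    valgrind "$@"
)
```

     Usage: vg_force_new ./my_program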

     There are other ways to disable memory pooling: using the
     malloc_alloc template with your objects (not portable, but
     should work for gcc) or even writing your own memory
     allocators. But all this goes beyond the scope of this
     FAQ. Start by reading
     http://gcc.gnu.org/onlinedocs/libstdc++/ext/howto.html#3
     if you absolutely want to do that. But beware:

     1) there are currently changes underway for gcc which are not
        totally reflected in the docs right now
        ("now" == 26 Apr 03)

     2) allocators belong to the more messy parts of the STL and
        people went to great lengths to make it portable across
        platforms. Chances are good that your solution will work
        on your platform, but not on others.

-----------------------------------------------------------------

Q15. My program dies with a segmentation fault, but Valgrind doesn't give
     any error messages before it, or none that look related.

A15. The one kind of segmentation fault that Valgrind won't give any
     warnings about is writes to read-only memory. Maybe your program is
     writing to a static string like this:

        char* s = "hello";
        s[0] = 'j';

     or something similar. Writing to read-only memory can also apparently
     make LinuxThreads behave strangely.

-----------------------------------------------------------------

Q16. When I try building Valgrind, 'make' dies partway with an
     assertion failure, something like this:

        make: expand.c:489: allocated_variable_append: Assertion
        `current_variable_set_list->next != 0' failed.

A16. It's probably a bug in 'make'. Some, but not all, instances of
     version 3.79.1 have this bug, see
     www.mail-archive.com/bug-make@gnu.org/msg01658.html. Try upgrading to a
     more recent version of 'make'.

-----------------------------------------------------------------

Q17. I tried writing a suppression but it didn't work. Can you
     write my suppression for me?

A17. Yes! Use the --gen-suppressions=yes feature to spit out
     suppressions automatically for you. You can then edit them
     if you like, eg. combining similar automatically generated
     suppressions using wildcards like '*'.

     If you really want to write suppressions by hand, read the
     manual carefully. Note particularly that C++ function names
     must be _mangled_.

-----------------------------------------------------------------

(this is the end of the FAQ.)