A mini-FAQ for valgrind, version 1.9.6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Last revised 5 May 2003
~~~~~~~~~~~~~~~~~~~~~~~

-----------------------------------------------------------------

Q1. Programs run OK on valgrind, but at exit produce a bunch
    of errors a bit like this

    ==20755== Invalid read of size 4
    ==20755==    at 0x40281C8A: _nl_unload_locale (loadlocale.c:238)
    ==20755==    by 0x4028179D: free_mem (findlocale.c:257)
    ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)
    ==20755==    by 0x40048DCC: vgPlain___libc_freeres_wrapper
                                                  (vg_clientfuncs.c:585)
    ==20755==    Address 0x40CC304C is 8 bytes inside a block of size 380 free'd
    ==20755==    at 0x400484C9: free (vg_clientfuncs.c:180)
    ==20755==    by 0x40281CBA: _nl_unload_locale (loadlocale.c:246)
    ==20755==    by 0x40281218: free_mem (setlocale.c:461)
    ==20755==    by 0x402E0962: __libc_freeres (set-freeres.c:34)

    and then die with a segmentation fault.

A1. When the program exits, valgrind runs the procedure
    __libc_freeres() in glibc. This is a hook for memory debuggers,
    so they can ask glibc to free up any memory it has used. Doing
    that is needed to ensure that valgrind doesn't incorrectly
    report space leaks in glibc.

    The problem is that running __libc_freeres() in older glibc
    versions causes this crash.

    WORKAROUND FOR 1.1.X and later versions of valgrind: use the
    --run-libc-freeres=no flag. You may then get space leak
    reports for glibc allocations (please _don't_ report these
    to the glibc people, since they are not real leaks), but at
    least the program runs.

-----------------------------------------------------------------

Q2. My program dies complaining that syscall 197 is unimplemented.

A2. 197, which is fstat64, is supported by valgrind. The problem is
    that the /usr/include/asm/unistd.h on the machine on which your
    valgrind was built doesn't match your kernel -- or, to be more
    specific, glibc is asking your kernel to do a syscall which is
    not listed in /usr/include/asm/unistd.h.

    The fix is simple. Somewhere near the top of
    coregrind/vg_syscalls.c, add the following line:

       #define __NR_fstat64 197

    Rebuild and try again. The above line should appear before any
    uses of the __NR_fstat64 symbol in that file. If you look at the
    place where __NR_fstat64 is used in vg_syscalls.c, it will be
    obvious why this fix works.
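
    As a rough sketch only (the #ifndef guard and the helper name
    fstat64_syscall_number are illustrative assumptions, not part of
    vg_syscalls.c), the define can be written so that it only takes
    effect when the system headers lack the symbol:

```c
#include <sys/syscall.h>   /* may or may not define __NR_fstat64 */

/* Fall back to the x86 value quoted in A2 only if the headers
   did not already provide the symbol. */
#ifndef __NR_fstat64
#define __NR_fstat64 197
#endif

/* Hypothetical helper, here only so the value can be inspected. */
int fstat64_syscall_number(void)
{
    return __NR_fstat64;
}
```

    The guard leaves the header's own definition in place on machines
    where there is one.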

-----------------------------------------------------------------

Q3. My (buggy) program dies like this:
       valgrind: vg_malloc2.c:442 (bszW_to_pszW):
                 Assertion `pszW >= 0' failed.
    And/or my (buggy) program runs OK on valgrind, but dies like
    this on cachegrind.

A3. If valgrind shows any invalid reads, invalid writes or invalid
    frees in your program, the above may happen. The reason is that
    your program may trash valgrind's low-level memory manager, which
    then dies with the above assertion, or something like it. The
    cure is to fix your program so that it doesn't do any illegal
    memory accesses. The above failure will hopefully go away after
    that.

-----------------------------------------------------------------

Q4. I'm running Red Hat Advanced Server. Valgrind always segfaults at
    startup.

A4. This is a known issue with RHAS 2.1, due to funny stack
    permissions at startup. However, valgrind-1.9.4 and later
    automatically handle this correctly, and should not segfault.

-----------------------------------------------------------------

Q5. I try running "valgrind my_program", but my_program runs normally,
    and Valgrind doesn't emit any output at all.

A5. Is my_program statically linked? Valgrind doesn't work with
    statically linked binaries. my_program must rely on at least one
    shared object. To determine if my_program is statically linked,
    run:

       ldd my_program

    It will show what shared objects my_program relies on, or say:

       not a dynamic executable

    if my_program is statically linked.
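
    For the curious, what ldd reports can be approximated by looking
    for a PT_INTERP entry (a request for a dynamic loader) in the
    program headers of the ELF file. The following is only a sketch:
    requests_dynamic_loader is a made-up name, it assumes a Linux
    system that provides <elf.h> and <link.h>, and it only handles
    binaries of the machine's own word size and byte order. ldd
    remains the thing to run.

```c
#include <elf.h>
#include <link.h>    /* ElfW(): picks the Elf32_ or Elf64_ types for this machine */
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file has a PT_INTERP program header, 0 if it has
   none, and -1 if it cannot be read or is not an ELF file of this
   machine's word size. Native byte order is assumed, not checked. */
int requests_dynamic_loader(const char *path)
{
    ElfW(Ehdr) eh;
    ElfW(Phdr) ph;
    int result = -1;
    unsigned i;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32)) {
        result = 0;
        /* Walk the program header table one entry at a time. */
        for (i = 0; i < eh.e_phnum; i++) {
            long off = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize);

            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;
                break;
            }
            if (ph.p_type == PT_INTERP) {
                result = 1;
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

    A return value of 0 for my_program would point the same way as
    ldd saying "not a dynamic executable".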

-----------------------------------------------------------------

Q6. I try running "valgrind my_program" and get Valgrind's startup message,
    but I don't get any errors and I know my program has errors.

A6. By default, Valgrind only traces the top-level process. So if your
    program spawns children, they won't be traced.
    Also, if your program is started by a shell script, Perl script, or
    something similar, Valgrind will trace the shell, or the Perl
    interpreter, or equivalent.

    To trace child processes, use the --trace-children=yes option.

    If you are tracing large trees of processes, it can be less
    disruptive to have the output sent over the network. Give
    valgrind the flag --logsocket=127.0.0.1:12345 (if you want
    logging output sent to port 12345 on localhost). You can
    use the valgrind-listener program to listen on that port:
       valgrind-listener 12345
    Obviously you have to start the listener process first.
    See the documentation for more details.
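
    To make "spawns children" concrete, here is a minimal C sketch
    of a program that starts a child with fork() and execl(). The
    function name run_child_and_wait and the command sh -c "exit 7"
    are made up for illustration. Going by the answer above, plain
    "valgrind my_program" would check only the parent; the shell
    started here is the kind of child that --trace-children=yes is
    meant for.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts `sh -c "exit 7"` as a child process and returns the child's
   exit status, or -1 if the child could not be started or waited for. */
int run_child_and_wait(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", "exit 7", (char *)NULL);
        _exit(127);   /* only reached if execl() failed */
    }

    /* Parent: wait for the child and report how it exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    If the bug you are chasing lives in the program run by the child,
    the parent's report can look clean even though something is wrong.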

-----------------------------------------------------------------

Q7. My threaded server process runs unbelievably slowly on
    valgrind. So slowly, in fact, that at first I thought it
    had completely locked up.

A7. We are not completely sure about this, but one possibility
    is that laptops with power management fool valgrind's
    timekeeping mechanism, which is (somewhat in error) based
    on the x86 RDTSC instruction. A "fix" which is claimed to
    work is to run some other cpu-intensive process at the same
    time, so that the laptop's power-management clock-slowing
    does not kick in. We would be interested in hearing more
    feedback on this.

    Another possible cause is that versions prior to 1.9.6
    did not support threading on glibc 2.3.X systems well.
    Hopefully the situation is much improved with 1.9.6.

-----------------------------------------------------------------

Q8. My program dies (exactly) like this:

       REPE then 0xF
       valgrind: the `impossible' happened:
          Unhandled REPE case

A8. Yeah ... that, I believe, is an SSE or SSE2 instruction. Are you
    building your app with -march=pentium4 or -march=athlon or
    something like that? If you can somehow dissuade gcc from
    producing SSE/SSE2 instructions, you may be able to avoid this.
    Some folks have reported that removing the flag -march=...
    works around this.

    I'd be interested to hear if you can get rid of it by changing
    your application build flags.

-----------------------------------------------------------------

Q9. My program dies complaining that __libc_current_sigrtmin
    is unimplemented.

A9. Should be fixed in 1.9.6. I would appreciate confirmation
    of that.

-----------------------------------------------------------------

Q10. I upgraded to Red Hat 9 and threaded programs now act
     strange / deadlock when they didn't before.

A10. Thread support on glibc 2.3.2+ with NPTL is not as
     good as on older LinuxThreads-based systems. We have
     this under consideration. Avoid Red Hat >= 8.1 for
     the time being, if you can.

     5 May 03: 1.9.6 should be significantly improved on
     Red Hat 9, SuSE 8.2 and other glibc-2.3.2 systems.

-----------------------------------------------------------------

Q11. I really need to use the NVidia libGL.so in my app.
     Help!

A11. NVidia also noticed this, it seems, and the "latest" drivers
     (version 4349, apparently) come with this text:

        DISABLING CPU SPECIFIC FEATURES

        Setting the environment variable __GL_FORCE_GENERIC_CPU to a
        non-zero value will inhibit the use of CPU specific features
        such as MMX, SSE, or 3DNOW!. Use of this option may result in
        performance loss. This option may be useful in conjunction with
        software such as the Valgrind memory debugger.

     Set __GL_FORCE_GENERIC_CPU=1 and Valgrind should work. This has
     been confirmed by various people. Thanks NVidia!

-----------------------------------------------------------------

Q12. My program dies like this (often at exit):

        VG_(mash_LD_PRELOAD_and_LD_LIBRARY_PATH): internal error:
        (loads of text)

A12. We're not entirely sure about this, and would appreciate
     someone sending a simple test case for us to look at.
     One possible cause is that your program modifies its
     environment variables, possibly including zeroing them
     all. Avoid this if you can.

     1.9.6 contains a fix which hopefully reduces the chances
     of your program bombing out like this.
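
     If the program only needs to hide a few variables, one
     alternative to zeroing the whole environment is to remove just
     those and leave the rest alone. A minimal sketch, assuming POSIX
     unsetenv(); the function name remove_selected_vars is made up,
     and whether this sidesteps the failure above is an assumption
     that this FAQ does not confirm.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Removes each variable named in the NULL-terminated list `names`
   from the environment and leaves every other variable untouched.
   Returns 0 on success, -1 if unsetenv() fails. */
int remove_selected_vars(const char *const *names)
{
    assert(names != NULL);

    for (; *names != NULL; names++) {
        if (unsetenv(*names) != 0)
            return -1;
    }
    return 0;
}
```

     Variables that are not in the list, such as the LD_PRELOAD and
     LD_LIBRARY_PATH named in the error message, are not touched by
     this function.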

-----------------------------------------------------------------

Q13. My program dies like this:

        error: /lib/librt.so.1: symbol __pthread_clock_settime, version
        GLIBC_PRIVATE not defined in file libpthread.so.0 with link time
        reference

A13. This is a total swamp. Nevertheless there is a way out.
     It's a problem which is not easy to fix. Really the problem is
     that /lib/librt.so.1 refers to some symbols,
     __pthread_clock_settime and __pthread_clock_gettime, in
     /lib/libpthread.so which are not intended to be exported, i.e.
     they are private.

     The best solution is to ensure your program does not use
     /lib/librt.so.1.

     However, since you're probably not using it directly, or even
     knowingly, that's hard to do. You might instead be able to fix
     it by playing around with coregrind/vg_libpthread.vs. Things to
     try:

     Remove this

        GLIBC_PRIVATE {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        };

     or maybe remove this

        GLIBC_2.2.3 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or maybe add this

        GLIBC_2.2.4 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

        GLIBC_2.2.5 {
           __pthread_clock_gettime;
           __pthread_clock_settime;
        } GLIBC_2.2;

     or some combination of the above. After each change you need to
     delete coregrind/libpthread.so and do make && make install.

     I just don't know if any of the above will work. If you can
     find a solution which works, I would be interested to hear it.

     To which someone replied:

        I deleted this:

           GLIBC_2.2.3 {
              __pthread_clock_gettime;
              __pthread_clock_settime;
           } GLIBC_2.2;

        and it worked.

-----------------------------------------------------------------

Q14. My program uses the C++ STL and string classes. Valgrind
     reports 'still reachable' memory leaks involving these classes
     at the exit of the program, but there should be none.

A14. First of all: relax, it's probably not a bug, but a feature.
     Many implementations of the C++ standard libraries use their own
     memory pool allocators. Memory for quite a number of destructed
     objects is not immediately freed and given back to the OS, but
     kept in the pool(s) for later re-use. The fact that the pools
     are not freed at the exit() of the program causes valgrind to
     report this memory as still reachable. The behaviour of not
     freeing pools at exit() could be called a bug of the library,
     though.

     Using gcc, you can force the STL to use malloc and to free
     memory as soon as possible by globally disabling memory caching.
     Beware! Doing so will probably slow down your program,
     sometimes drastically.

     - With gcc 2.91, 2.95, 3.0 and 3.1, compile all source using the
       STL with -D__USE_MALLOC. Beware! This is removed from gcc
       starting with version 3.3.

     - With 3.2.2 and later, you should export the environment
       variable GLIBCPP_FORCE_NEW before running your program.

     There are other ways to disable memory pooling: using the
     malloc_alloc template with your objects (not portable, but
     should work for gcc) or even writing your own memory
     allocators. But all this goes beyond the scope of this
     FAQ. Start by reading
     http://gcc.gnu.org/onlinedocs/libstdc++/ext/howto.html#3
     if you absolutely want to do that. But beware:

     1) there are currently changes underway for gcc which are not
        totally reflected in the docs right now
        ("now" == 26 Apr 03)

     2) allocators belong to the more messy parts of the STL and
        people went to great lengths to make it portable across
        platforms. Chances are good that your solution will work
        on your platform, but not on others.

-----------------------------------------------------------------

Q15. My program dies with a segmentation fault, but Valgrind doesn't give
     any error messages before it, or none that look related.

A15. The one kind of segmentation fault that Valgrind won't give any
     warnings about is writes to read-only memory. Maybe your program is
     writing to a static string like this:

        char* s = "hello";
        s[0] = 'j';

     or something similar. Writing to read-only memory can also apparently
     make LinuxThreads behave strangely.
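
     A common way out, sketched here with a made-up function name,
     is to copy the literal into an array first, so that the write
     goes to memory the program owns:

```c
#include <string.h>

/* Returns 1 if the in-place edit produced "jello". */
int edit_writable_copy(void)
{
    /* An array initialised from the literal is the program's own
       writable storage, unlike the literal that `char* s` points at. */
    char s[] = "hello";

    s[0] = 'j';
    return strcmp(s, "jello") == 0;
}
```

     Declaring the original pointer as const char* instead lets the
     compiler reject the bad write at build time.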

-----------------------------------------------------------------

(this is the end of the FAQ.)
njn4e59bd92003-04-22 20:58:47 +0000346(this is the end of the FAQ.)