<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
4
5<chapter id="mc-manual" xreflabel="Memcheck: a heavyweight memory checker">
6<title>Memcheck: a heavyweight memory checker</title>
7
8<para>To use this tool, you must specify
9<computeroutput>--tool=memcheck</computeroutput> on the Valgrind
10command line.</para>
11
12
13<sect1 id="mc-manual.bugs"
14 xreflabel="Kinds of bugs that Memcheck can find">
15<title>Kinds of bugs that Memcheck can find</title>
16
<para>Memcheck is Valgrind-1.0.X's checking mechanism bundled up
into a tool. All reads and writes of memory are checked, and
calls to malloc/new/free/delete are intercepted. As a result,
Memcheck can detect the following problems:</para>
21
22<itemizedlist>
23 <listitem>
24 <para>Use of uninitialised memory</para>
25 </listitem>
26 <listitem>
27 <para>Reading/writing memory after it has been free'd</para>
28 </listitem>
29 <listitem>
30 <para>Reading/writing off the end of malloc'd blocks</para>
31 </listitem>
32 <listitem>
33 <para>Reading/writing inappropriate areas on the stack</para>
34 </listitem>
35 <listitem>
36 <para>Memory leaks -- where pointers to malloc'd blocks are
37 lost forever</para>
38 </listitem>
39 <listitem>
40 <para>Mismatched use of malloc/new/new [] vs
41 free/delete/delete []</para>
42 </listitem>
43 <listitem>
44 <para>Overlapping <computeroutput>src</computeroutput> and
45 <computeroutput>dst</computeroutput> pointers in
46 <computeroutput>memcpy()</computeroutput> and related
47 functions</para>
48 </listitem>
49 <listitem>
50 <para>Some misuses of the POSIX pthreads API</para>
51 </listitem>
52</itemizedlist>
53
54</sect1>
55
56
57
58<sect1 id="mc-manual.flags"
59 xreflabel="Command-line flags specific to memcheck">
60<title>Command-line flags specific to memcheck</title>
61
<itemizedlist id="leakcheck">
 <listitem>
64 <para><computeroutput>--leak-check=no</computeroutput>
65 [default]</para>
66 <para><computeroutput>--leak-check=yes</computeroutput></para>
67 <para>When enabled, search for memory leaks when the client
68 program finishes. A memory leak means a malloc'd block,
69 which has not yet been free'd, but to which no pointer can be
70 found. Such a block can never be free'd by the program,
71 since no pointer to it exists. Leak checking is disabled by
72 default because it tends to generate dozens of error
73 messages.</para>
74 </listitem>
75
 <listitem id="showreach">
  <para><computeroutput>--show-reachable=no</computeroutput>
  [default]</para>
  <para><computeroutput>--show-reachable=yes</computeroutput></para>
  <para>When disabled, the memory leak detector only shows
  blocks to which it cannot find any pointer at all, or to which
  it can only find a pointer into the middle. These blocks are
  prime candidates for memory leaks. When enabled, the leak
  detector also reports on blocks to which it could find a
  pointer. Your program could, at least in principle, have freed
  such blocks before exit. Contrast this with blocks for which
  no pointer, or only an interior pointer, could be found: they
  are more likely to indicate memory leaks, because you do not
  actually have a pointer to the start of the block which you
  can hand to <computeroutput>free</computeroutput>, even if
  you wanted to.</para>
92 </listitem>
93
 <listitem id="leakres">
  <para><computeroutput>--leak-resolution=low</computeroutput>
96 [default]</para>
97 <para><computeroutput>--leak-resolution=med</computeroutput></para>
98 <para><computeroutput>--leak-resolution=high</computeroutput></para>
99 <para>When doing leak checking, determines how willing
100 Memcheck is to consider different backtraces to be the same.
101 When set to <computeroutput>low</computeroutput>, the
102 default, only the first two entries need match. When
103 <computeroutput>med</computeroutput>, four entries have to
104 match. When <computeroutput>high</computeroutput>, all
105 entries need to match.</para>
106 <para>For hardcore leak debugging, you probably want to use
107 <computeroutput>--leak-resolution=high</computeroutput>
108 together with
109 <computeroutput>--num-callers=40</computeroutput> or some
110 such large number. Note however that this can give an
111 overwhelming amount of information, which is why the defaults
112 are 4 callers and low-resolution matching.</para>
113 <para>Note that the
114 <computeroutput>--leak-resolution=</computeroutput> setting
115 does not affect Memcheck's ability to find leaks. It only
116 changes how the results are presented.</para>
117 </listitem>
118
 <listitem id="freelist">
  <para><computeroutput>--freelist-vol=&lt;number></computeroutput>
121 [default: 1000000]</para>
122 <para>When the client program releases memory using free (in
123 <literal>C</literal>) or delete (<literal>C++</literal>),
124 that memory is not immediately made available for
125 re-allocation. Instead it is marked inaccessible and placed
126 in a queue of freed blocks. The purpose is to delay the
127 point at which freed-up memory comes back into circulation.
128 This increases the chance that Memcheck will be able to
129 detect invalid accesses to blocks for some significant period
130 of time after they have been freed.</para>
131 <para>This flag specifies the maximum total size, in bytes,
132 of the blocks in the queue. The default value is one million
133 bytes. Increasing this increases the total amount of memory
134 used by Memcheck but may detect invalid uses of freed blocks
135 which would otherwise go undetected.</para>
136 </listitem>
137
 <listitem id="gcc296">
  <para><computeroutput>--workaround-gcc296-bugs=no</computeroutput>
  [default]</para>
  <para><computeroutput>--workaround-gcc296-bugs=yes</computeroutput></para>
  <para>When enabled, assume that reads and writes some small
  distance below the stack pointer
  <computeroutput>%esp</computeroutput> are due to bugs in gcc
  2.96, and do not report them. The "small distance" is 256
146 bytes by default. Note that gcc 2.96 is the default compiler
147 on some popular Linux distributions (RedHat 7.X, Mandrake)
148 and so you may well need to use this flag. Do not use it if
149 you do not have to, as it can cause real errors to be
150 overlooked. Another option is to use a gcc/g++ which does
151 not generate accesses below the stack pointer. 2.95.3 seems
152 to be a good choice in this respect.</para>
153 <para>Unfortunately (27 Feb 02) it looks like g++ 3.0.4 has a
154 similar bug, so you may need to issue this flag if you use
155 3.0.4. A while later (early Apr 02) this is confirmed as a
156 scheduling bug in g++-3.0.4.</para>
157 </listitem>
158
 <listitem id="partial">
  <para><computeroutput>--partial-loads-ok=yes</computeroutput>
  [default]</para>
  <para><computeroutput>--partial-loads-ok=no</computeroutput></para>
  <para>Controls how Memcheck handles word (4-byte) loads from
  addresses for which some bytes are addressable and others are
165 not. When <computeroutput>yes</computeroutput> (the
166 default), such loads do not elicit an address error.
167 Instead, the loaded V bytes corresponding to the illegal
168 addresses indicate undefined, and those corresponding to
169 legal addresses are loaded from shadow memory, as usual.</para>
170 <para>When <computeroutput>no</computeroutput>, loads from
171 partially invalid addresses are treated the same as loads
172 from completely invalid addresses: an illegal-address error
173 is issued, and the resulting V bytes indicate valid data.</para>
174 </listitem>
175
 <listitem id="strlen">
  <para><computeroutput>--avoid-strlen-errors=no</computeroutput></para>
  <para><computeroutput>--avoid-strlen-errors=yes</computeroutput> [default]</para>
  <para>Enable or disable a heuristic for dealing with highly optimized
  versions of strlen. These versions of strlen can cause Memcheck to
  report spurious errors, so it's usually a good idea to leave this
  enabled.</para>
183 </listitem>
184
185 <listitem id="cleanup">
  <para><computeroutput>--cleanup=no</computeroutput></para>
  <para><computeroutput>--cleanup=yes</computeroutput> [default]</para>
  <para><command>This is a flag to help debug Valgrind itself.
  It is of no use to end-users.</command> When enabled, various
  improvements are applied to the post-instrumented intermediate
  code, aimed at removing redundant value checks.</para>
192 </listitem>
193
194</itemizedlist>
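
<para>To make the three <computeroutput>--leak-resolution=</computeroutput>
settings concrete, here is a minimal sketch of the matching rule
described above. It assumes a backtrace is simply an array of return
addresses, innermost frame first. The names
<computeroutput>same_backtrace</computeroutput> and
<computeroutput>RES_LOW</computeroutput> etc. are made up for
illustration and are not taken from Memcheck's sources.</para>
<programlisting><![CDATA[
#include <stddef.h>

/* Hypothetical model: a backtrace is an array of return addresses. */
enum resolution { RES_LOW, RES_MED, RES_HIGH };

/* Returns 1 if two backtraces count as "the same" at resolution r:
   low compares the first 2 entries, med the first 4, high all. */
int same_backtrace(const unsigned long *a, size_t na,
                   const unsigned long *b, size_t nb, enum resolution r)
{
  size_t limit = (r == RES_LOW) ? 2 : (r == RES_MED) ? 4
                                    : (na > nb ? na : nb);
  for (size_t i = 0; i < limit; i++) {
    if (i >= na || i >= nb)
      return na == nb;   /* one trace ended: equal only if both did */
    if (a[i] != b[i])
      return 0;
  }
  return 1;
}
]]></programlisting>
<para>Two leaks whose backtraces compare equal would be merged into one
entry in the report. This only changes how results are grouped, not
which leaks are found.</para>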
195</sect1>
196
197
198<sect1 id="mc-manual.errormsgs"
199 xreflabel="Explanation of error messages from Memcheck">
200<title>Explanation of error messages from Memcheck</title>
201
202<para>Despite considerable sophistication under the hood,
203Memcheck can only really detect two kinds of errors, use of
204illegal addresses, and use of undefined values. Nevertheless,
205this is enough to help you discover all sorts of
206memory-management nasties in your code. This section presents a
207quick summary of what error messages mean. The precise behaviour
208of the error-checking machinery is described in <xref
209linkend="mc-manual.machine"/>.</para>
210
211
212<sect2 id="mc-manual.badrw"
213 xreflabel="Illegal read / Illegal write errors">
214<title>Illegal read / Illegal write errors</title>
215
216<para>For example:</para>
<programlisting><![CDATA[
Invalid read of size 4
   at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
   by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326)
   by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
 Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
]]></programlisting>
225
226<para>This happens when your program reads or writes memory at a
227place which Memcheck reckons it shouldn't. In this example, the
228program did a 4-byte read at address 0xBFFFF0E0, somewhere within
229the system-supplied library libpng.so.2.1.0.9, which was called
230from somewhere else in the same library, called from line 326 of
231<filename>qpngio.cpp</filename>, and so on.</para>
232
233<para>Memcheck tries to establish what the illegal address might
234relate to, since that's often useful. So, if it points into a
235block of memory which has already been freed, you'll be informed
236of this, and also where the block was free'd at. Likewise, if it
237should turn out to be just off the end of a malloc'd block, a
238common result of off-by-one-errors in array subscripting, you'll
239be informed of this fact, and also where the block was
240malloc'd.</para>
241
242<para>In this example, Memcheck can't identify the address.
243Actually the address is on the stack, but, for some reason, this
244is not a valid stack address -- it is below the stack pointer,
245<literal>%esp</literal>, and that isn't allowed. In this
246particular case it's probably caused by gcc generating invalid
247code, a known bug in various flavours of gcc.</para>
248
249<para>Note that Memcheck only tells you that your program is
250about to access memory at an illegal address. It can't stop the
access from happening. So, if your program makes an access which
normally would result in a segmentation fault, your program will
still suffer the same fate -- but you will get a message from
254Memcheck immediately prior to this. In this particular example,
255reading junk on the stack is non-fatal, and the program stays
256alive.</para>
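
<para>The off-by-one case mentioned above usually comes down to a
loop bound. The following sketch shows the correct bound and notes
in a comment where the faulty variant would go wrong. The function
name <computeroutput>make_sequence</computeroutput> is hypothetical.</para>
<programlisting><![CDATA[
#include <stdlib.h>

/* Allocates n ints and sets each to its index.  The loop bound must be
   i < n: with i <= n the last store would land just past the end of
   the malloc'd block, which is the kind of access reported as an
   invalid write. */
int *make_sequence(size_t n)
{
  int *a = malloc(n * sizeof *a);
  if (a == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    a[i] = (int)i;
  return a;
}
]]></programlisting>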
257
258</sect2>
259
260
261
262<sect2 id="mc-manual.uninitvals"
263 xreflabel="Use of uninitialised values">
264<title>Use of uninitialised values</title>
265
266<para>For example:</para>
<programlisting><![CDATA[
Conditional jump or move depends on uninitialised value(s)
   at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
   by 0x402E8476: _IO_printf (printf.c:36)
   by 0x8048472: main (tests/manuel1.c:8)
]]></programlisting>
273
274<para>An uninitialised-value use error is reported when your
275program uses a value which hasn't been initialised -- in other
276words, is undefined. Here, the undefined value is used somewhere
277inside the printf() machinery of the C library. This error was
278reported when running the following small program:</para>
<programlisting><![CDATA[
#include <stdio.h>

int main(void)
{
  int x;                    /* deliberately left uninitialised */
  printf ("x = %d\n", x);
  return 0;
}]]></programlisting>
285
286<para>It is important to understand that your program can copy
287around junk (uninitialised) data to its heart's content.
288Memcheck observes this and keeps track of the data, but does not
289complain. A complaint is issued only when your program attempts
290to make use of uninitialised data. In this example, x is
291uninitialised. Memcheck observes the value being passed to
292<literal>_IO_printf</literal> and thence to
293<literal>_IO_vfprintf</literal>, but makes no comment. However,
294_IO_vfprintf has to examine the value of x so it can turn it into
295the corresponding ASCII string, and it is at this point that
296Memcheck complains.</para>
297
298<para>Sources of uninitialised data tend to be:</para>
299<itemizedlist>
300 <listitem>
301 <para>Local variables in procedures which have not been
302 initialised, as in the example above.</para>
303 </listitem>
304 <listitem>
305 <para>The contents of malloc'd blocks, before you write
306 something there. In C++, the new operator is a wrapper round
307 malloc, so if you create an object with new, its fields will
308 be uninitialised until you (or the constructor) fill them in,
309 which is only Right and Proper.</para>
310 </listitem>
311</itemizedlist>
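
<para>For heap blocks, one way to avoid the second source is to
allocate with <computeroutput>calloc</computeroutput>, which zero-fills
the block. The sketch below is only an illustration; the function name
<computeroutput>counters_all_zero</computeroutput> is made up.</para>
<programlisting><![CDATA[
#include <stdlib.h>

/* Sums n counters held in a heap block.  calloc() zero-fills the
   block, so every byte is initialised before it is read.  Had the
   block come from malloc() with no stores into it, the final test
   on sum would be decided on uninitialised data. */
int counters_all_zero(size_t n)
{
  int *c = calloc(n, sizeof *c);
  int sum = 0;
  if (c == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    sum += c[i];
  free(c);
  return sum == 0;
}
]]></programlisting>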
312
313</sect2>
314
315
316
317<sect2 id="mc-manual.badfrees" xreflabel="Illegal frees">
318<title>Illegal frees</title>
319
320<para>For example:</para>
<programlisting><![CDATA[
Invalid free()
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
 Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
   at 0x4004FFDF: free (vg_clientmalloc.c:577)
   by 0x80484C7: main (tests/doublefree.c:10)
]]></programlisting>
329
<para>Memcheck keeps track of the blocks allocated by your
program with malloc/new, so it knows exactly whether or not
the argument to free/delete is legitimate. Here, this
test program has freed the same block twice. As with the illegal
read/write errors, Memcheck attempts to make sense of the address
free'd. If, as here, the address is one which has previously
been freed, you will be told that -- making duplicate frees of the
same block easy to spot.</para>
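
<para>A common defensive idiom against duplicate frees is to clear the
pointer as soon as the block is released. This is a sketch of that
idiom, not something Memcheck requires; the helper name
<computeroutput>release</computeroutput> is hypothetical.</para>
<programlisting><![CDATA[
#include <stdlib.h>

/* Frees *p and clears it, so that a second call becomes free(NULL),
   which the C standard defines as a no-op. */
void release(char **p)
{
  free(*p);
  *p = NULL;
}
]]></programlisting>
<para>This only protects the one pointer variable passed in. If other
copies of the pointer exist, freeing through them is still an invalid
free.</para>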
338
339</sect2>
340
341
342<sect2 id="mc-manual.rudefn"
343 xreflabel="When a block is freed with an inappropriate deallocation
344function">
345<title>When a block is freed with an inappropriate deallocation
346function</title>
347
348<para>In the following example, a block allocated with
349<computeroutput>new[]</computeroutput> has wrongly been
350deallocated with <computeroutput>free</computeroutput>:</para>
<programlisting><![CDATA[
Mismatched free() / delete / delete []
   at 0x40043249: free (vg_clientfuncs.c:171)
   by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
   by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
   by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
 Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
   at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152)
   by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
   by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
   by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
]]></programlisting>
363
<para>The following was told to me by the KDE 3 developers. I
didn't know any of it myself. They also implemented the check
itself.</para>
367
368<para>In <literal>C++</literal> it's important to deallocate
369memory in a way compatible with how it was allocated. The deal
370is:</para>
371<itemizedlist>
372 <listitem>
373 <para>If allocated with
374 <computeroutput>malloc</computeroutput>,
375 <computeroutput>calloc</computeroutput>,
376 <computeroutput>realloc</computeroutput>,
377 <computeroutput>valloc</computeroutput> or
378 <computeroutput>memalign</computeroutput>, you must
379 deallocate with <computeroutput>free</computeroutput>.</para>
380 </listitem>
381 <listitem>
382 <para>If allocated with
383 <computeroutput>new[]</computeroutput>, you must deallocate
384 with <computeroutput>delete[]</computeroutput>.</para>
385 </listitem>
386 <listitem>
387 <para>If allocated with <computeroutput>new</computeroutput>,
388 you must deallocate with
389 <computeroutput>delete</computeroutput>.</para>
390 </listitem>
391</itemizedlist>
392
393<para>The worst thing is that on Linux apparently it doesn't
394matter if you do muddle these up, and it all seems to work ok,
395but the same program may then crash on a different platform,
396Solaris for example. So it's best to fix it properly. According
397to the KDE folks "it's amazing how many C++ programmers don't
398know this".</para>
399
<para>Pascal Massimino adds the following clarification:
<computeroutput>delete[]</computeroutput> must be paired
with <computeroutput>new[]</computeroutput> because
the compiler stores the size of the array and the
pointer-to-member to the destructor of the array's content just
before the pointer actually returned. This implies a
variable-sized overhead in what's returned by
<computeroutput>new</computeroutput> or
<computeroutput>new[]</computeroutput>.</para>
</sect2>
410
411
412
413<sect2 id="mc-manual.badperm"
414 xreflabel="Passing system call parameters with
415 inadequate read/write permissions">
416<title>Passing system call parameters with inadequate read/write
417permissions</title>
418
<para>Memcheck checks all parameters to system calls, i.e.:
<itemizedlist>
  <listitem><para>It checks all the direct parameters
  themselves.</para></listitem>
  <listitem><para>Also, if a system call needs to read from a buffer provided
  by your program, Memcheck checks that the entire buffer is addressable and
  has valid data, i.e., it is readable.</para></listitem>
  <listitem><para>Also, if the system call needs to write to a user-supplied
  buffer, Memcheck checks that the buffer is addressable.</para></listitem>
</itemizedlist>
</para>

<para>After the system call, Memcheck updates its tracked information to
precisely reflect any changes in memory permissions caused by the system call.
</para>

<para>Here's an example of two system calls with invalid parameters:</para>
<programlisting><![CDATA[
  #include <stdlib.h>
  #include <unistd.h>
  int main( void )
  {
    char* arr  = malloc(10);
    int*  arr2 = malloc(sizeof(int));
    write( 1 /* stdout */, arr, 10 );
    exit(arr2[0]);
  }
]]></programlisting>
447
<para>You get these complaints ...</para>
<programlisting><![CDATA[
  Syscall param write(buf) points to uninitialised byte(s)
     at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
     by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
     by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
   Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
     at 0x259852B0: malloc (vg_replace_malloc.c:130)
     by 0x80483F1: main (a.c:5)

  Syscall param exit(error_code) contains uninitialised byte(s)
     at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
     by 0x8048426: main (a.c:8)
]]></programlisting>
462
463<para>... because the program has (a) tried to write uninitialised junk from
464the malloc'd block to the standard output, and (b) passed an uninitialised
465value to <computeroutput>exit</computeroutput>. Note that the first error
466refers to the memory pointed to by <computeroutput>buf</computeroutput> (not
467<computeroutput>buf</computeroutput> itself), but the second error refers to
468the argument <computeroutput>error_code</computeroutput> itself.</para>
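
<para>For comparison, here is a sketch of a
<computeroutput>write</computeroutput> call whose buffer is fully
initialised first, so the buffer check described above has nothing to
complain about. The function name
<computeroutput>send_dots</computeroutput> is made up for this
example.</para>
<programlisting><![CDATA[
#include <string.h>
#include <unistd.h>

/* Fills the whole buffer before handing it to write(), so every byte
   the kernel reads is initialised.  Returns the result of write(). */
long send_dots(int fd)
{
  char buf[10];
  memset(buf, '.', sizeof buf);
  return (long)write(fd, buf, sizeof buf);
}
]]></programlisting>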
470</sect2>
471
472
473<sect2 id="mc-manual.overlap"
474 xreflabel="Overlapping source and destination blocks">
475<title>Overlapping source and destination blocks</title>
476
477<para>The following C library functions copy some data from one
478memory block to another (or something similar):
479<computeroutput>memcpy()</computeroutput>,
480<computeroutput>strcpy()</computeroutput>,
481<computeroutput>strncpy()</computeroutput>,
482<computeroutput>strcat()</computeroutput>,
483<computeroutput>strncat()</computeroutput>.
484The blocks pointed to by their
485<computeroutput>src</computeroutput> and
486<computeroutput>dst</computeroutput> pointers aren't allowed to
487overlap. Memcheck checks for this.</para>
488
489<para>For example:</para>
<programlisting><![CDATA[
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492==    by 0x804865A: main (overlap.c:40)
==27492==
]]></programlisting>
496
497<para>You don't want the two blocks to overlap because one of
498them could get partially trashed by the copying.</para>
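
<para>When the two blocks genuinely have to overlap, standard C provides
<computeroutput>memmove()</computeroutput>, which is defined for
overlapping regions. It is not one of the functions listed above. A
minimal sketch, with a hypothetical helper name:</para>
<programlisting><![CDATA[
#include <string.h>

/* Drops the first `skip` characters of s in place.  Source and
   destination overlap, so memmove() is used instead of memcpy(). */
void drop_prefix(char *s, size_t skip)
{
  size_t len = strlen(s);
  if (skip > len)
    skip = len;
  memmove(s, s + skip, len - skip + 1);   /* + 1 copies the '\0' */
}
]]></programlisting>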
499
500</sect2>
501
502
503</sect1>
504
505
506
507<sect1 id="mc-manual.suppfiles" xreflabel="Writing suppressions files">
508<title>Writing suppressions files</title>
509
510<para>The basic suppression format is described in
511<xref linkend="manual-core.suppress"/>.</para>
512
513<para>The suppression (2nd) line should have the form:</para>
<programlisting><![CDATA[
Memcheck:suppression_type]]></programlisting>
516
517<para>Or, since some of the suppressions are shared with Addrcheck:</para>
<programlisting><![CDATA[
Memcheck,Addrcheck:suppression_type]]></programlisting>
520
521<para>The Memcheck suppression types are as follows:</para>
522
523<itemizedlist>
524 <listitem>
525 <para><computeroutput>Value1</computeroutput>,
526 <computeroutput>Value2</computeroutput>,
527 <computeroutput>Value4</computeroutput>,
528 <computeroutput>Value8</computeroutput>,
529 <computeroutput>Value16</computeroutput>,
530 meaning an uninitialised-value error when
531 using a value of 1, 2, 4, 8 or 16 bytes.</para>
532 </listitem>
533
534 <listitem>
535 <para>Or: <computeroutput>Cond</computeroutput> (or its old
536 name, <computeroutput>Value0</computeroutput>), meaning use
537 of an uninitialised CPU condition code.</para>
538 </listitem>
539
540 <listitem>
541 <para>Or: <computeroutput>Addr1</computeroutput>,
542 <computeroutput>Addr2</computeroutput>,
543 <computeroutput>Addr4</computeroutput>,
544 <computeroutput>Addr8</computeroutput>,
545 <computeroutput>Addr16</computeroutput>,
546 meaning an invalid address during a
547 memory access of 1, 2, 4, 8 or 16 bytes respectively.</para>
548 </listitem>
549
550 <listitem>
551 <para>Or: <computeroutput>Param</computeroutput>, meaning an
552 invalid system call parameter error.</para>
553 </listitem>
554
555 <listitem>
556 <para>Or: <computeroutput>Free</computeroutput>, meaning an
557 invalid or mismatching free.</para>
558 </listitem>
559
560 <listitem>
561 <para><computeroutput>Overlap</computeroutput>, meaning a
562 <computeroutput>src</computeroutput> /
563 <computeroutput>dst</computeroutput> overlap in
564 <computeroutput>memcpy() or a similar
565 function</computeroutput>.</para>
566 </listitem>
567
568 <listitem>
569 <para>Last but not least, you can suppress leak reports with
570 <computeroutput>Leak</computeroutput>. Leak suppression was
571 added in valgrind-1.9.3, I believe.</para>
572 </listitem>
573
574</itemizedlist>
575
<para>The extra information line: for Param errors, it is the name
of the offending system call parameter. No other error kinds
have this extra line.</para>

<para>The first line of the calling context: for Value and Addr
errors, it is either the name of the function in which the error
occurred, or, failing that, the full path of the .so file or
executable containing the error location. For Free errors, it is
the name of the function doing the freeing (e.g.
<computeroutput>free</computeroutput>,
<computeroutput>__builtin_vec_delete</computeroutput>, etc). For
Overlap errors, it is the name of the function with the overlapping
arguments (e.g. <computeroutput>memcpy()</computeroutput>,
<computeroutput>strcpy()</computeroutput>, etc).</para>
590
591<para>Lastly, there's the rest of the calling context.</para>
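
<para>Putting the pieces together, a suppression for the
<computeroutput>write(buf)</computeroutput> error shown earlier might
look like the sketch below. It assumes the usual layout from the core
manual: an opening brace, a name of your choosing, the suppression
line, the extra information line, the calling context, and a closing
brace. The name on the first line is invented, and the
<computeroutput>fun:</computeroutput> entries are taken from the stack
trace in that example.</para>
<programlisting><![CDATA[
{
   write-of-uninitialised-buffer
   Memcheck:Param
   write(buf)
   fun:__write_nocancel
   fun:__libc_start_main
}
]]></programlisting>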
592
593</sect1>
594
595
596
597<sect1 id="mc-manual.machine"
598 xreflabel="Details of Memcheck's checking machinery">
599<title>Details of Memcheck's checking machinery</title>
600
601<para>Read this section if you want to know, in detail, exactly
602what and how Memcheck is checking.</para>
603
604
605<sect2 id="mc-manual.value" xreflabel="Valid-value (V) bit">
606<title>Valid-value (V) bits</title>
607
608<para>It is simplest to think of Memcheck implementing a
609synthetic Intel x86 CPU which is identical to a real CPU, except
610for one crucial detail. Every bit (literally) of data processed,
611stored and handled by the real CPU has, in the synthetic CPU, an
612associated "valid-value" bit, which says whether or not the
613accompanying bit has a legitimate value. In the discussions
614which follow, this bit is referred to as the V (valid-value)
615bit.</para>
616
<para>Each byte in the system therefore has 8 V bits which
618follow it wherever it goes. For example, when the CPU loads a
619word-size item (4 bytes) from memory, it also loads the
620corresponding 32 V bits from a bitmap which stores the V bits for
621the process' entire address space. If the CPU should later write
622the whole or some part of that value to memory at a different
623address, the relevant V bits will be stored back in the V-bit
624bitmap.</para>
625
626<para>In short, each bit in the system has an associated V bit,
627which follows it around everywhere, even inside the CPU. Yes,
628the CPU's (integer and <computeroutput>%eflags</computeroutput>)
629registers have their own V bit vectors.</para>
630
631<para>Copying values around does not cause Memcheck to check for,
632or report on, errors. However, when a value is used in a way
633which might conceivably affect the outcome of your program's
634computation, the associated V bits are immediately checked. If
635any of these indicate that the value is undefined, an error is
636reported.</para>
637
638<para>Here's an (admittedly nonsensical) example:</para>
<programlisting><![CDATA[
int i, j;
int a[10], b[10];
for ( i = 0; i < 10; i++ ) {
  j = a[i];
  b[i] = j;
}]]></programlisting>
646
647<para>Memcheck emits no complaints about this, since it merely
648copies uninitialised values from
649<computeroutput>a[]</computeroutput> into
650<computeroutput>b[]</computeroutput>, and doesn't use them in any
651way. However, if the loop is changed to:</para>
<programlisting><![CDATA[
for ( i = 0; i < 10; i++ ) {
  j += a[i];
}
if ( j == 77 )
  printf("hello there\n");
]]></programlisting>
659
660<para>then Valgrind will complain, at the
661<computeroutput>if</computeroutput>, that the condition depends
662on uninitialised values. Note that it <command>doesn't</command>
663complain at the <computeroutput>j += a[i];</computeroutput>,
664since at that point the undefinedness is not "observable". It's
665only when a decision has to be made as to whether or not to do
666the <computeroutput>printf</computeroutput> -- an observable
667action of your program -- that Memcheck complains.</para>
668
669<para>Most low level operations, such as adds, cause Memcheck to
670use the <literal>V bits</literal> for the operands to calculate
671the V bits for the result. Even if the result is partially or
672wholly undefined, it does not complain.</para>
673
<para>Checks on definedness only occur in two places: when a
value is used to generate a memory address, and when a control
flow decision needs to be made. Also, when a system call is
detected, Memcheck checks the definedness of its parameters as
required.</para>
679
680<para>If a check should detect undefinedness, an error message is
681issued. The resulting value is subsequently regarded as
682well-defined. To do otherwise would give long chains of error
683messages. In effect, we say that undefined values are
684non-infectious.</para>
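
<para>The rules above can be pictured with a toy model. This is a
sketch under stated assumptions, not Memcheck's implementation: the
struct, the function names and the error counter are all invented, a
set mask bit is taken to mean "undefined", and a bitwise OR stands in
for the low level operations because its per-bit behaviour is easy to
state.</para>
<programlisting><![CDATA[
#include <stdint.h>

/* Toy model: a 32-bit value plus a mask in which a set bit means
   "this bit of val is undefined". */
struct shadowed { uint32_t val; uint32_t undef; };

static int errors_reported = 0;

/* Copying moves the mask along with the value and checks nothing. */
struct shadowed copy_value(struct shadowed s) { return s; }

/* Assumed, deliberately crude rule for a bitwise OR: a result bit is
   undefined if either input bit is.  No check is made here either. */
struct shadowed or_bits(struct shadowed a, struct shadowed b)
{
  struct shadowed r = { a.val | b.val, a.undef | b.undef };
  return r;
}

/* A control-flow decision: this is where the mask is examined.  After
   reporting, the value is treated as defined, so one bad value does
   not trigger a chain of further reports. */
int is_nonzero(struct shadowed *s)
{
  if (s->undef != 0) {
    errors_reported++;
    s->undef = 0;
  }
  return s->val != 0;
}
]]></programlisting>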
685
686<para>This sounds overcomplicated. Why not just check all reads
687from memory, and complain if an undefined value is loaded into a
688CPU register? Well, that doesn't work well, because perfectly
689legitimate C programs routinely copy uninitialised values around
690in memory, and we don't want endless complaints about that.
691Here's the canonical example. Consider a struct like
692this:</para>
<programlisting><![CDATA[
struct S { int x; char c; };
struct S s1, s2;
s1.x = 42;
s1.c = 'z';
s2 = s1;
]]></programlisting>
700
701<para>The question to ask is: how large is <computeroutput>struct
702S</computeroutput>, in bytes? An
703<computeroutput>int</computeroutput> is 4 bytes and a
704<computeroutput>char</computeroutput> one byte, so perhaps a
705<computeroutput>struct S</computeroutput> occupies 5 bytes?
706Wrong. All (non-toy) compilers we know of will round the size of
707<computeroutput>struct S</computeroutput> up to a whole number of
708words, in this case 8 bytes. Not doing this forces compilers to
709generate truly appalling code for subscripting arrays of
710<computeroutput>struct S</computeroutput>'s.</para>
711
712<para>So <computeroutput>s1</computeroutput> occupies 8 bytes,
713yet only 5 of them will be initialised. For the assignment
714<computeroutput>s2 = s1</computeroutput>, gcc generates code to
715copy all 8 bytes wholesale into
716<computeroutput>s2</computeroutput> without regard for their
717meaning. If Memcheck simply checked values as they came out of
718memory, it would yelp every time a structure assignment like this
719happened. So the more complicated semantics described above is
720necessary. This allows <literal>gcc</literal> to copy
721<computeroutput>s1</computeroutput> into
722<computeroutput>s2</computeroutput> any way it likes, and a
723warning will only be emitted if the uninitialised values are
724later used.</para>
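
<para>You can measure the padding for yourself. The two helper
functions below are hypothetical names for this sketch. On a platform
with 4-byte <computeroutput>int</computeroutput>s and the 8-byte
<computeroutput>struct S</computeroutput> described above, the first
would be expected to return 3.</para>
<programlisting><![CDATA[
#include <stddef.h>

struct S { int x; char c; };

/* Bytes in a struct S that belong to no member.  They are copied by
   s2 = s1 but never written by the stores to s1.x and s1.c. */
size_t padding_bytes(void)
{
  return sizeof(struct S) - sizeof(int) - sizeof(char);
}

/* Offset of the first byte after c; padding runs from here to the
   end of the struct. */
size_t padding_start(void)
{
  return offsetof(struct S, c) + sizeof(char);
}
]]></programlisting>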
725
<para>One final twist to this story. The above scheme allows
garbage to pass through the CPU's integer registers without
complaint. It does this by giving the integer registers
<literal>V</literal> tags, passing these around in the expected
way. This is complicated and computationally expensive to do, but
is necessary. Memcheck is more simplistic about floating-point
loads and stores. In particular, <literal>V</literal> bits for
data read as a result of floating-point loads are checked at the
load instruction. So if your program uses the floating-point
registers to do memory-to-memory copies, you will get complaints
about uninitialised values. Fortunately, I have not yet
encountered a program which (ab)uses the floating-point registers
in this way.</para>
739
740</sect2>
741
742
743<sect2 id="mc-manual.vaddress" xreflabel=" Valid-address (A) bits">
744<title>Valid-address (A) bits</title>
745
746<para>Notice that the previous subsection describes how the
747validity of values is established and maintained without having
748to say whether the program does or does not have the right to
749access any particular memory location. We now consider the
750latter issue.</para>
751
<para>As described above, every bit in memory or in the CPU has
an associated valid-value (<literal>V</literal>) bit. In
addition, all bytes in memory, but not in the CPU, have an
associated valid-address (<literal>A</literal>) bit. This
indicates whether or not the program can legitimately read or
write that location. It does not give any indication of the
validity of the data at that location -- that's the job of the
<literal>V</literal> bits -- only whether or not the location may
be accessed.</para>
761
762<para>Every time your program reads or writes memory, Memcheck
763checks the <literal>A</literal> bits associated with the address.
764If any of them indicate an invalid address, an error is emitted.
765Note that the reads and writes themselves do not change the A
766bits, only consult them.</para>
767
768<para>So how do the <literal>A</literal> bits get set/cleared?
769Like this:</para>
770
771<itemizedlist>
772 <listitem>
773 <para>When the program starts, all the global data areas are
774 marked as accessible.</para>
775 </listitem>
776
777 <listitem>
778 <para>When the program does malloc/new, the A bits for
779 exactly the area allocated, and not a byte more, are marked
780 as accessible. Upon freeing the area the A bits are changed
781 to indicate inaccessibility.</para>
782 </listitem>
783
784 <listitem>
785
786 <para>When the stack pointer register
787 (<literal>%esp</literal>) moves up or down,
788 <literal>A</literal> bits are set. The rule is that the area
789 from <literal>%esp</literal> up to the base of the stack is
790 marked as accessible, and below <literal>%esp</literal> is
791 inaccessible. (If that sounds illogical, bear in mind that
792 the stack grows down, not up, on almost all Unix systems,
793 including GNU/Linux.) Tracking <literal>%esp</literal> like
794 this has the useful side-effect that the section of stack
795 used by a function for local variables etc is automatically
796 marked accessible on function entry and inaccessible on
797 exit.</para>
798 </listitem>
799
800 <listitem>
801 <para>When doing system calls, A bits are changed
802 appropriately. For example, mmap() magically makes files
803 appear in the process's address space, so the A bits must be
804 updated if mmap() succeeds.</para>
805 </listitem>
806
807 <listitem>
808 <para>Optionally, your program can tell Valgrind about such
809 changes explicitly, using the client request mechanism
810 described above.</para>
811 </listitem>
812
813</itemizedlist>
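
<para>The rules above can be pictured with a small toy model.
This is only a sketch under stated assumptions: it models a
64-byte address space with one flag per byte, every name in it
(<computeroutput>toy_on_alloc</computeroutput>,
<computeroutput>toy_on_free</computeroutput>,
<computeroutput>toy_on_sp_move</computeroutput>,
<computeroutput>toy_access_ok</computeroutput>) is hypothetical,
and it says nothing about how Memcheck actually stores its
<literal>A</literal> bits.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MEM_SIZE 64   /* size of the modelled address space, in bytes */

/* One A bit per byte, held as one char per byte for clarity.
   0 = inaccessible, 1 = accessible. */
static unsigned char a_bit[TOY_MEM_SIZE];

/* Set the A bits for [addr, addr+len) to the given state. */
static void toy_set_a_bits(size_t addr, size_t len, int accessible)
{
    assert(addr <= TOY_MEM_SIZE && len <= TOY_MEM_SIZE - addr);
    memset(&a_bit[addr], accessible ? 1 : 0, len);
}

/* malloc-like event: exactly the allocated bytes become accessible. */
void toy_on_alloc(size_t addr, size_t len) { toy_set_a_bits(addr, len, 1); }

/* free-like event: the whole block becomes inaccessible again. */
void toy_on_free(size_t addr, size_t len) { toy_set_a_bits(addr, len, 0); }

/* Stack-pointer event: [new_sp, stack_base) is accessible and
   [stack_limit, new_sp) is not, since the stack grows down. */
void toy_on_sp_move(size_t stack_limit, size_t stack_base, size_t new_sp)
{
    assert(stack_limit <= new_sp && new_sp <= stack_base);
    toy_set_a_bits(stack_limit, new_sp - stack_limit, 0);
    toy_set_a_bits(new_sp, stack_base - new_sp, 1);
}

/* A read or write only consults the A bits; it never changes them.
   Returns 1 if every byte touched is accessible, 0 otherwise. */
int toy_access_ok(size_t addr, size_t len)
{
    size_t i;
    if (addr > TOY_MEM_SIZE || len > TOY_MEM_SIZE - addr)
        return 0;
    for (i = 0; i < len; i++)
        if (!a_bit[addr + i])
            return 0;
    return 1;
}
]]></programlisting>
<para>In this model an access that runs even one byte past an
allocated block fails the check, which corresponds to the "not a
byte more" rule for malloc/new given above.</para>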
814
815</sect2>
816
817
818<sect2 id="mc-manual.together" xreflabel="Putting it all together">
819<title>Putting it all together</title>
820
821<para>Memcheck's checking machinery can be summarised as
822follows:</para>
823
824<itemizedlist>
825 <listitem>
826 <para>Each byte in memory has 8 associated
827 <literal>V</literal> (valid-value) bits, saying whether or
828 not the byte has a defined value, and a single
829 <literal>A</literal> (valid-address) bit, saying whether or
830 not the program currently has the right to read/write that
831 address.</para>
832 </listitem>
833
834 <listitem>
835 <para>When memory is read or written, the relevant
836 <literal>A</literal> bits are consulted. If they indicate an
837 invalid address, Valgrind emits an Invalid read or Invalid
838 write error.</para>
839 </listitem>
840
841 <listitem>
842 <para>When memory is read into the CPU's integer registers,
843 the relevant <literal>V</literal> bits are fetched from
844 memory and stored in the simulated CPU. They are not
845 consulted.</para>
846 </listitem>
847
848 <listitem>
849 <para>When an integer register is written out to memory, the
850 <literal>V</literal> bits for that register are written back
851 to memory too.</para>
852 </listitem>
853
854 <listitem>
855 <para>When memory is read into the CPU's floating point
856 registers, the relevant <literal>V</literal> bits are read
857 from memory and they are immediately checked. If any are
858 invalid, an uninitialised value error is emitted. This
859 precludes using the floating-point registers to copy
860 possibly-uninitialised memory, but simplifies Valgrind in
861 that it does not have to track the validity status of the
862 floating-point registers.</para>
863 </listitem>
864
865 <listitem>
866 <para>As a result, when a floating-point register is written
867 to memory, the associated V bits are set to indicate a valid
868 value.</para>
869 </listitem>
870
871 <listitem>
872 <para>When values in integer CPU registers are used to
873 generate a memory address, or to determine the outcome of a
874 conditional branch, the <literal>V</literal> bits for those
875 values are checked, and an error emitted if any of them are
876 undefined.</para>
877 </listitem>
878
879 <listitem>
880 <para>When values in integer CPU registers are used for any
881 other purpose, Valgrind computes the V bits for the result,
882 but does not check them.</para>
883 </listitem>
884
885 <listitem>
    <para>Once the <literal>V</literal> bits for a value in the
    CPU have been checked, they are then set to indicate
    validity. This avoids long chains of errors.</para>
889 </listitem>
890
891 <listitem>
    <para>When values are loaded from memory, Valgrind checks the
    A bits for that location and issues an illegal-address
    warning if needed. In that case, the V bits loaded are
    forced to indicate Valid, despite the location being invalid.</para>
    <para>This apparently strange choice reduces the amount of
    confusing information presented to the user. It avoids the
    unpleasant phenomenon in which memory is read from a place
    which is both unaddressable and contains invalid values, and,
    as a result, you get not only an invalid-address (read/write)
    error, but also a potentially large set of
    uninitialised-value errors, one for every time the value is
    used.</para>
    <para>There is a hazy boundary case to do with multi-byte
    loads from addresses which are partially valid and partially
    invalid. See the description of the
    <computeroutput>--partial-loads-ok</computeroutput> flag for
    details.</para>
909 </listitem>
910
911</itemizedlist>
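
<para>The integer-register rules in the list above can be
sketched as a toy model. Everything here is an assumption made
for illustration: the type <computeroutput>struct
toy_val</computeroutput>, the function names, and the convention
that a set <literal>V</literal> bit means "valid". XOR is used as
the example operation because each result bit depends only on the
corresponding operand bits, so the propagation rule shown is
exact for it; this is not a description of Memcheck's internal
code.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ALL_VALID 0xFFFFFFFFu

/* A 32-bit value plus one V bit per data bit (1 = valid in this toy). */
struct toy_val {
    uint32_t bits;
    uint32_t vbits;
};

static unsigned toy_error_count = 0;   /* errors "emitted" so far */

/* Integer load: the V bits travel with the data and are not consulted. */
struct toy_val toy_load(const struct toy_val *mem)
{
    assert(mem != NULL);
    return *mem;
}

/* Integer store: the register's V bits are written back with the data. */
void toy_store(struct toy_val *mem, struct toy_val reg)
{
    assert(mem != NULL);
    *mem = reg;
}

/* Any other use: compute V bits for the result, but do not check them.
   A result bit of XOR is valid only if both operand bits are valid. */
struct toy_val toy_xor(struct toy_val a, struct toy_val b)
{
    struct toy_val r;
    r.bits  = a.bits ^ b.bits;
    r.vbits = a.vbits & b.vbits;
    return r;
}

/* Use as an address or branch condition: check the V bits, count an
   error if any are invalid, then mark the value valid so that it does
   not cause a chain of further errors. Returns 1 if an error was found. */
int toy_check(struct toy_val *reg)
{
    assert(reg != NULL);
    if (reg->vbits == TOY_ALL_VALID)
        return 0;
    toy_error_count++;
    reg->vbits = TOY_ALL_VALID;
    return 1;
}

unsigned toy_errors(void) { return toy_error_count; }
]]></programlisting>
<para>Loading an undefined value and combining it with a defined
one produces no error in this model; only the first
<computeroutput>toy_check</computeroutput> on the result does,
and a second check on the same value stays silent.</para>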
912
913
914<para>Memcheck intercepts calls to malloc, calloc, realloc,
915valloc, memalign, free, new and delete. The behaviour you get
916is:</para>
917
918<itemizedlist>
919
  <listitem>
    <para>malloc/new: the returned memory is marked as
    addressable but not having valid values. This means you have
    to write on it before you can read it.</para>
  </listitem>

  <listitem>
    <para>calloc: returned memory is marked both addressable and
    valid, since calloc() clears the area to zero.</para>
  </listitem>

  <listitem>
    <para>realloc: if the new size is larger than the old, the
    new section is addressable but invalid, as with
    malloc. If the new size is smaller, the dropped-off section is
    marked as unaddressable.</para>
    <para>You may only pass to realloc a pointer previously
    issued to you by malloc/calloc/realloc.</para>
  </listitem>
942
943 <listitem>
    <para>free/delete: you may only pass to free a pointer
    previously issued to you by malloc/calloc/realloc, or the
    value NULL. Otherwise, Valgrind complains. If the pointer is
    indeed valid, Valgrind marks the entire area it points at as
    unaddressable, and places the block in the
    freed-blocks-queue. The aim is to defer as long as possible
    reallocation of this block. Until that happens, all attempts
    to access it will elicit an invalid-address error, as you
    would hope.</para>
953 </listitem>
954
955</itemizedlist>
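
<para>As an illustration of these rules, here is a small sketch
of heap usage that stays within them; the comments note what
Memcheck would be expected to object to if the code were changed.
The function name
<computeroutput>heap_rules_demo</computeroutput> is made up for
this example.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stdlib.h>

/* Returns 0 + 7 + 9 = 16 on success, or -1 if an allocation fails. */
int heap_rules_demo(void)
{
    int *z = calloc(4, sizeof *z);     /* addressable and valid (zeroed) */
    int *m = malloc(4 * sizeof *m);    /* addressable, contents not valid */
    int *bigger;
    int sum;

    if (z == NULL || m == NULL) { free(z); free(m); return -1; }

    assert(z[0] == 0);   /* fine: calloc cleared the area */
    sum = z[0];
    m[0] = 7;            /* write first ... */
    sum += m[0];         /* ... so this read is of a valid value */
    /* Reading m[1] here would fetch a value that was never written.
       Accessing m[4] would be off the end of the block. */

    bigger = realloc(m, 8 * sizeof *m);  /* new section: not yet valid */
    if (bigger == NULL) { free(z); free(m); return -1; }
    bigger[7] = 9;       /* write to the new section before reading it */
    sum += bigger[7];

    free(z);
    free(bigger);        /* the block is now unaddressable */
    return sum;
}
]]></programlisting>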
956
957</sect2>
958</sect1>
959
960
961
962<sect1 id="mc-manual.leaks" xreflabel="Memory leak detection">
963<title>Memory leak detection</title>
964
965<para>Memcheck keeps track of all memory blocks issued in
966response to calls to malloc/calloc/realloc/new. So when the
967program exits, it knows which blocks are still outstanding --
968have not been returned, in other words. Ideally, you want your
969program to have no blocks still in use at exit. But many
970programs do.</para>
971
972<para>For each such block, Memcheck scans the entire address
973space of the process, looking for pointers to the block. One of
974three situations may result:</para>
975
976<itemizedlist>
977
978 <listitem>
    <para>A pointer to the start of the block is found. This
    usually indicates programming sloppiness; since the block is
    still pointed at, the programmer could, at least in
    principle, have freed it before program exit.</para>
983 </listitem>
984
985 <listitem>
986 <para>A pointer to the interior of the block is found. The
987 pointer might originally have pointed to the start and have
988 been moved along, or it might be entirely unrelated.
989 Memcheck deems such a block as "dubious", that is, possibly
990 leaked, because it's unclear whether or not a pointer to it
991 still exists.</para>
992 </listitem>
993
994 <listitem>
995 <para>The worst outcome is that no pointer to the block can
996 be found. The block is classified as "leaked", because the
997 programmer could not possibly have free'd it at program exit,
998 since no pointer to it exists. This might be a symptom of
999 having lost the pointer at some earlier point in the
1000 program.</para>
1001 </listitem>
1002
1003</itemizedlist>
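
<para>The three outcomes can be expressed as a small
classification function. This is a sketch only: the enum, the
function <computeroutput>toy_classify</computeroutput> and its
parameters are hypothetical, the scan of memory is assumed to
have already produced an array of candidate pointer values, and
letting a start pointer take precedence over an interior pointer
is an assumption of the sketch.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_leak_kind { TOY_REACHABLE, TOY_DUBIOUS, TOY_LEAKED };

/* Classify one outstanding block [block_start, block_start+block_size)
   given words[], the candidate pointer values found by a memory scan. */
enum toy_leak_kind toy_classify(uintptr_t block_start, size_t block_size,
                                const uintptr_t *words, size_t nwords)
{
    enum toy_leak_kind kind = TOY_LEAKED;   /* until a pointer is found */
    size_t i;

    assert(words != NULL || nwords == 0);
    for (i = 0; i < nwords; i++) {
        if (words[i] == block_start)
            return TOY_REACHABLE;           /* pointer to the start */
        if (words[i] > block_start && words[i] - block_start < block_size)
            kind = TOY_DUBIOUS;             /* pointer to the interior */
    }
    return kind;
}
]]></programlisting>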
1004
1005<para>Memcheck reports summaries about leaked and dubious blocks.
1006For each such block, it will also tell you where the block was
1007allocated. This should help you figure out why the pointer to it
1008has been lost. In general, you should attempt to ensure your
1009programs do not have any leaked or dubious blocks at exit.</para>
1010
<para>The precise area of memory in which Memcheck searches for
pointers is: all naturally-aligned 4-byte words for which all A
bits indicate addressability and all V bits indicate that the
stored value is actually valid.</para>
1015
1016</sect1>
1017
1018
1019<sect1 id="mc-manual.clientreqs" xreflabel="Client requests">
<title>Client requests</title>
1021
1022<para>The following client requests are defined in
1023<filename>memcheck.h</filename>. They also work for Addrcheck.
1024See <filename>memcheck.h</filename> for exact details of their
1025arguments.</para>
1026
1027<itemizedlist>
1028
1029 <listitem>
    <para><computeroutput>VALGRIND_MAKE_NOACCESS</computeroutput>,
    <computeroutput>VALGRIND_MAKE_WRITABLE</computeroutput> and
    <computeroutput>VALGRIND_MAKE_READABLE</computeroutput>.
    These mark address ranges as completely inaccessible,
    accessible but containing undefined data, and accessible and
    containing defined data, respectively. Subsequent errors may
    have their faulting addresses described in terms of these
    blocks. Each macro returns a "block handle", or zero when not
    run on Valgrind.</para>
1039 </listitem>
1040
1041 <listitem>
1042 <para><computeroutput>VALGRIND_DISCARD</computeroutput>: At
1043 some point you may want Valgrind to stop reporting errors in
1044 terms of the blocks defined by the previous three macros. To
1045 do this, the above macros return a small-integer "block
1046 handle". You can pass this block handle to
1047 <computeroutput>VALGRIND_DISCARD</computeroutput>. After
1048 doing so, Valgrind will no longer be able to relate
1049 addressing errors to the user-defined block associated with
1050 the handle. The permissions settings associated with the
1051 handle remain in place; this just affects how errors are
1052 reported, not whether they are reported. Returns 1 for an
1053 invalid handle and 0 for a valid handle (although passing
1054 invalid handles is harmless). Always returns 0 when not run
1055 on Valgrind.</para>
1056 </listitem>
1057
1058 <listitem>
1059 <para><computeroutput>VALGRIND_CHECK_WRITABLE</computeroutput>
1060 and <computeroutput>VALGRIND_CHECK_READABLE</computeroutput>:
1061 check immediately whether or not the given address range has
1062 the relevant property, and if not, print an error message.
1063 Also, for the convenience of the client, returns zero if the
1064 relevant property holds; otherwise, the returned value is the
1065 address of the first byte for which the property is not true.
1066 Always returns 0 when not run on Valgrind.</para>
1067 </listitem>
1068
1069 <listitem>
    <para><computeroutput>VALGRIND_CHECK_DEFINED</computeroutput>:
    a quick and easy way to find out whether Valgrind thinks a
    particular variable (lvalue, to be precise) is addressable
    and defined. Prints an error message if not. Returns no
    value.</para>
1075 </listitem>
1076
1077 <listitem>
    <para><computeroutput>VALGRIND_DO_LEAK_CHECK</computeroutput>:
    run the memory leak detector right now. Returns no value.
    This could be used to check for leaks incrementally
    between arbitrary places in the program's execution.
    Warning: not properly tested!</para>
1083 </listitem>
1084
1085 <listitem>
    <para><computeroutput>VALGRIND_COUNT_LEAKS</computeroutput>:
    fills in the four arguments with the number of bytes of
    memory found by the previous leak check to be leaked,
    dubious, reachable and suppressed. This is useful in test
    harness code, after calling
    <computeroutput>VALGRIND_DO_LEAK_CHECK</computeroutput>.</para>
1092 </listitem>
1093
1094 <listitem>
1095 <para><computeroutput>VALGRIND_GET_VBITS</computeroutput> and
1096 <computeroutput>VALGRIND_SET_VBITS</computeroutput>: allow
1097 you to get and set the V (validity) bits for an address
1098 range. You should probably only set V bits that you have got
1099 with <computeroutput>VALGRIND_GET_VBITS</computeroutput>.
1100 Only for those who really know what they are doing.</para>
1101 </listitem>
1102
1103</itemizedlist>
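
<para>As a hedged illustration of how these requests might be
used, the sketch below wraps a trivial pool allocator. Several
things in it are assumptions rather than facts from this manual:
the (address, length) argument order of the macros, the include
path <filename>valgrind/memcheck.h</filename>, the
<computeroutput>USE_MEMCHECK_H</computeroutput> switch, and all
of the <computeroutput>pool_*</computeroutput> names. When
<computeroutput>USE_MEMCHECK_H</computeroutput> is not defined,
stand-in macros that evaluate to zero are used so that the sketch
compiles without Valgrind installed. Consult
<filename>memcheck.h</filename> for the real definitions.</para>
<programlisting><![CDATA[
#include <assert.h>
#include <stddef.h>

#ifdef USE_MEMCHECK_H
#include <valgrind/memcheck.h>   /* assumed install location */
#else
/* Stand-ins (not the real macros): they evaluate to 0, mirroring the
   documented "returns zero when not run on Valgrind" behaviour. */
#define VALGRIND_MAKE_NOACCESS(addr, len)  ((void)(addr), (void)(len), 0)
#define VALGRIND_MAKE_WRITABLE(addr, len)  ((void)(addr), (void)(len), 0)
#define VALGRIND_CHECK_WRITABLE(addr, len) ((void)(addr), (void)(len), 0)
#endif

#define POOL_SIZE 256

static char   pool[POOL_SIZE];
static size_t pool_used = 0;

/* Forbid all access to the pool until pieces of it are handed out. */
void pool_init(void)
{
    pool_used = 0;
    (void)VALGRIND_MAKE_NOACCESS(pool, POOL_SIZE);
}

/* Hand out len bytes: accessible, but with undefined contents. */
void *pool_alloc(size_t len)
{
    void *p;
    assert(pool_used <= POOL_SIZE);
    if (len > POOL_SIZE - pool_used)
        return NULL;
    p = pool + pool_used;
    pool_used += len;
    (void)VALGRIND_MAKE_WRITABLE(p, len);
    return p;
}

/* Returns 1 if the range passes the writability check. With the
   stand-in macros above this is always 1. */
int pool_is_writable(void *p, size_t len)
{
    return VALGRIND_CHECK_WRITABLE(p, len) == 0;
}
]]></programlisting>
<para>With the real macros, an access to a part of
<computeroutput>pool</computeroutput> that has not been handed out
by <computeroutput>pool_alloc</computeroutput> would be expected to
produce an invalid-address error, just as for memory obtained
from malloc.</para>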
1104
1105</sect1>
1106</chapter>