| <?xml version="1.0"?> <!-- -*- sgml -*- --> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" |
| "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" |
| [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]> |
| |
| |
| <chapter id="pc-manual" |
| xreflabel="Ptrcheck: an experimental heap, stack & global array overrun detector"> |
| <title>Ptrcheck: an experimental heap, stack & global array overrun detector</title> |
| |
| <para>To use this tool, you must specify |
| <computeroutput>--tool=exp-ptrcheck</computeroutput> on the Valgrind |
| command line.</para> |
| |
| |
| |
| |
| <sect1 id="pc-manual.overview" xreflabel="Overview"> |
| <title>Overview</title> |
| |
| <para>Ptrcheck is a tool for finding overruns of heap, stack |
| and global arrays. Its functionality overlaps somewhat with |
| Memcheck's, but it is able to catch invalid accesses in a number of |
| cases that Memcheck would miss. A detailed comparison against |
| Memcheck is presented below.</para> |
| |
| <para>Ptrcheck is composed of two almost completely independent tools |
| that have been glued together. One part, |
| in <computeroutput>h_main.[ch]</computeroutput>, checks accesses |
| through heap-derived pointers. The other part, in |
| <computeroutput>sg_main.[ch]</computeroutput>, checks accesses to |
| stack and global arrays. The remaining |
| files <computeroutput>pc_{common,main}.[ch]</computeroutput>, provide |
| common error-management and coordination functions, so as to make it |
| appear as a single tool.</para> |
| |
| <para>The heap-check part is an extensively-hacked (largely rewritten) |
| version of the experimental "Annelid" tool developed and described by |
| Nicholas Nethercote and Jeremy Fitzhardinge. The stack- and global- |
| check part uses a heuristic approach derived from an observation about |
| the likely forms of stack and global array accesses, and, as far as is |
| known, is entirely novel.</para> |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="pc-manual.options" xreflabel="Ptrcheck Options"> |
| <title>Ptrcheck Options</title> |
| |
| <para>The following end-user options are available:</para> |
| |
| <!-- start of xi:include in the manpage --> |
| <variablelist id="pc.opts.list"> |
| |
| <varlistentry id="opt.enable-sg-checks" xreflabel="--enable-sg-checks"> |
| <term> |
| <option><![CDATA[--enable-sg-checks=no|yes |
| [default: yes] ]]></option> |
| </term> |
| <listitem> |
| <para>By default, Ptrcheck checks for overruns of stack, global |
| and heap arrays. |
| With <varname>--enable-sg-checks=no</varname>, the stack and |
| global array checks are omitted, and only heap checking is |
| performed. This can be useful because the stack and global |
| checks are quite expensive, so omitting them speeds Ptrcheck up |
| a lot. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry id="opt.partial-loads-ok" xreflabel="--partial-loads-ok"> |
| <term> |
| <option><![CDATA[--partial-loads-ok=<yes|no> [default: no] ]]></option> |
| </term> |
| <listitem> |
| <para>This option has the same meaning as it does for |
| Memcheck.</para> |
| <para>Controls how Ptrcheck handles word-sized, word-aligned |
| loads which partially overlap the end of heap blocks -- that is, |
| some of the bytes in the word are validly addressable, but |
| others are not. When <varname>yes</varname>, such loads do not |
| produce an address error. When <varname>no</varname> (the |
| default), loads from partially invalid addresses are treated the |
| same as loads from completely invalid addresses: an illegal heap |
| access error is issued. |
| </para> |
| <para>Note that code that behaves in this way is in violation of |
| the the ISO C/C++ standards, and should be considered broken. If |
| at all possible, such code should be fixed. This flag should be |
| used only as a last resort.</para> |
| </listitem> |
| </varlistentry> |
| |
| </variablelist> |
| <!-- end of xi:include in the manpage --> |
| |
| <!-- start of xi:include in the manpage --> |
| <para>In addition, the following debugging options are available for |
| Ptrcheck:</para> |
| |
| <variablelist id="hg.debugopts.list"> |
| |
| <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc"> |
| <term> |
| <option><![CDATA[--trace-malloc=no|yes [no] |
| ]]></option> |
| </term> |
| <listitem> |
| <para>Show all client malloc (etc) and free (etc) requests.</para> |
| </listitem> |
| </varlistentry> |
| |
| </variablelist> |
| <!-- end of xi:include in the manpage --> |
| |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="pc-manual.how-works.heap-checks" |
| xreflabel="How Ptrcheck Works: Heap Checks"> |
| <title>How Ptrcheck Works: Heap Checks</title> |
| |
| <para>Ptrcheck can check for invalid uses of heap pointers, including |
| out of range accesses and accesses to freed memory. The mechanism is |
| however completely different from Memcheck's, and the checking is more |
| powerful.</para> |
| |
| <para>For each pointer in the program, Ptrcheck keeps track of which |
| heap block (if any) it was derived from. Then, when an access is made |
| through that pointer, Ptrcheck compares the access address with the |
| bounds of the associated block, and reports an error if the address is |
| out of bounds, or if the block has been freed.</para> |
| |
| <para>Of course it is rarely the case that one wants to access a block |
| only at the exact address returned by malloc (et al). Ptrcheck |
| understands that adding or subtracting offsets from a pointer to a |
| block results in a pointer to the same block.</para> |
| |
| <para>At a fundamental level, this scheme works because a correct |
| program cannot make assumptions about the addresses returned by |
| malloc. In particular it cannot make any assumptions about the |
| differences in addresses returned by subsequent calls to malloc. |
| Hence there are very few ways to take an address returned by malloc, |
| modify it, and still have a valid address. In short, the only |
| allowable operations are adding and subtracting other non-pointer |
| values. Almost all other operations produce a value which cannot |
| possibly be a valid pointer.</para> |
| |
| </sect1> |
| |
| |
| |
| <sect1 id="pc-manual.how-works.sg-checks" |
| xreflabel="How Ptrcheck Works: Stack and Global Checks"> |
| <title>How Ptrcheck Works: Stack and Global Checks</title> |
| |
| <para>When a source file is compiled |
| with <computeroutput>-g</computeroutput>, the compiler attaches DWARF3 |
| debugging information which describes the location of all stack and |
| global arrays in the file.</para> |
| |
| <para>Checking of accesses to such arrays would then be relatively |
| simple, if the compiler could also tell us which array (if any) each |
| memory referencing instruction was supposed to access. Unfortunately |
| the DWARF3 debugging format does not provide a way to represent such |
| information, so we have to resort to a heuristic technique to |
| approximate the same information. The key observation is that |
| </para> |
| |
| <para> |
| if a memory referencing instruction accesses inside a stack or |
| global array once, then it is highly likely to always access that |
| same array</para> |
| |
| <para>To see how this might be useful, consider the following buggy |
| fragment:</para> |
| <programlisting><![CDATA[ |
| { int i, a[10]; // both are auto vars |
| for (i = 0; i <= 10; i++) |
| a[i] = 42; |
| } |
| ]]></programlisting> |
| |
| <para>At run time we will know the precise address |
| of <computeroutput>a[]</computeroutput> on the stack, and so we can |
| observe that the first store resulting from <computeroutput>a[i] = |
| 42</computeroutput> writes <computeroutput>a[]</computeroutput>, and |
| we will (correctly) assume that that instruction is intended always to |
| access <computeroutput>a[]</computeroutput>. Then, on the 11th |
| iteration, it accesses somewhere else, possibly a different local, |
| possibly an un-accounted for area of the stack (eg, spill slot), so |
| Ptrcheck reports an error.</para> |
| |
| <para>There is an important caveat.</para> |
| |
| <para>Imagine a function such as memcpy, which is used to read and |
| write many different areas of memory over the lifetime of the program. |
| If we insist that the read and write instructions in its memory |
| copying loop only ever access one particular stack or global variable, |
| we will be flooded with errors resulting from calls to memcpy.</para> |
| |
| <para>To avoid this problem, Ptrcheck instantiates fresh likely-target |
| records for each entry to a function, and discards them on exit. This |
| allows detection of cases where (eg) memcpy overflows its source or |
| destination buffers for any specific call, but does not carry any |
| restriction from one call to the next. Indeed, multiple threads may |
| be multiple simultaneous calls to (eg) memcpy without mutual |
| interference.</para> |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="pc-manual.cmp-w-memcheck" |
| xreflabel="Comparison with Memcheck"> |
| <title>Comparison with Memcheck</title> |
| |
| <para>Memcheck does not do any access checks for stack or global arrays, so |
| the presence of those in Ptrcheck is a straight win. (But see |
| "Limitations" below).</para> |
| |
| <para>Memcheck and Ptrcheck use different approaches for checking heap |
| accesses. Memcheck maintains bitmaps telling it which areas of memory |
| are accessible and which are not. If a memory access falls in an |
| unaccessible area, it reports an error. By marking the 16 bytes |
| before and after an allocated block unaccessible, Memcheck is able to |
| detect small over- and underruns of the block. Similarly, by marking |
| freed memory as unaccessible, Memcheck can detect all accesses to |
| freed memory.</para> |
| |
| <para>Memcheck's approach is simple. But it's also weak. It can't |
| catch block overruns beyond 16 bytes. And, more generally, because it |
| focusses only on the question "is the target address accessible", it |
| fails to detect invalid accesses which just happen to fall within some |
| other valid area. This is not improbable, especially in crowded areas |
| of the process' address space.</para> |
| |
| <para>Ptrcheck's approach is to keep track of pointers derived from |
| heap blocks. It tracks pointers which are derived directly from calls |
| to malloc et al, but also ones derived indirectly, by adding or |
| subtracting offsets from the directly-derived pointers. When a |
| pointer is finally used to access memory, Ptrcheck compares the access |
| address with that of the block it was originally derived from, and |
| reports an error if the access address is not within the block |
| bounds.</para> |
| |
| <para>Consequently Ptrcheck can detect any out of bounds access |
| through a heap-derived pointer, no matter how far from the original |
| block it is.</para> |
| |
| <para>A second advantage is that Ptrcheck is better at detecting |
| accesses to blocks freed very far in the past. Memcheck can detect |
| these too, but only for blocks freed relatively recently. To detect |
| accesses to a freed block, Memcheck must make it inaccessible, hence |
| requiring a space overhead proportional to the size of the block. If |
| the blocks are large, Memcheck will have to make them available for |
| re-allocation relatively quickly, thereby losing the ability to detect |
| invalid accesses to them.</para> |
| |
| <para>By contrast, Ptrcheck has a constant per-block space requirement |
| of four machine words, for detection of accesses to freed blocks. A |
| freed block can be reallocated immediately, yet Ptrcheck can still |
| detect all invalid accesses through any pointers derived from the old |
| allocation, providing only that the four-word descriptor for the old |
| allocation is stored. For example, on a 64-bit machine, to detect |
| accesses in any of the most recently freed 10 million blocks, Ptrcheck |
| will require only 320MB of extra storage. Achieving the same level of |
| detection with Memcheck is close to impossible and would likely |
| involve several gigabytes of extra storage.</para> |
| |
| <para>In defense of Memcheck ...</para> |
| |
| <para>Remember that Memcheck performs uninitialised value checking, |
| which Ptrcheck does not. Memcheck has also benefitted from years of |
| refinement, tuning, and experience with production-level usage, and so |
| is much faster than Ptrcheck as it currently stands, as of October |
| 2008.</para> |
| |
| <para>Consequently it is recommended to first make your programs run |
| Memcheck clean. Once that's done, try Ptrcheck to see if you can |
| shake out any further heap, global or stack errors.</para> |
| |
| </sect1> |
| |
| |
| |
| |
| |
| <sect1 id="pc-manual.limitations" |
| xreflabel="Limitations"> |
| <title>Limitations</title> |
| |
| <para>This is an experimental tool, which relies rather too heavily on some |
| not-as-robust-as-I-would-like assumptions on the behaviour of correct |
| programs. There are a number of limitations which you should be aware |
| of.</para> |
| |
| <itemizedlist> |
| |
| <listitem> |
| <para>Heap checks: Ptrcheck can occasionally lose track of, or |
| become confused about, which heap block a given pointer has been |
| derived from. This can cause it to falsely report errors, or to |
| miss some errors. This is not believed to be a serious |
| problem.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Heap checks: Ptrcheck only tracks pointers that are stored |
| properly aligned in memory. If a pointer is stored at a misaligned |
| address, and then later read again, Ptrcheck will lose track of |
| what it points at. Similar problem if a pointer is split into |
| pieces and later reconsitituted.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Heap checks: Ptrcheck needs to "understand" which system |
| calls return pointers and which don't. Many, but not all system |
| calls are handled. If an unhandled one is encountered, Ptrcheck |
| will abort.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Stack checks: It follows from the description above (How Ptrcheck |
| Works: Stack and Global Checks) that the first access by a memory |
| referencing instruction to a stack or global array creates an |
| association between that instruction and the array, which is |
| checked on subsequent accesses by that instruction, until the |
| containing function exits. Hence, the first access by an |
| instruction to an array (in any given function instantiation) is |
| not checked for overrun, since Ptrcheck uses that as the "example" |
| of how subsequent accesses should behave.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Stack checks: Similarly, and more serious, it is clearly |
| possible to write legitimate pieces of code which break the basic |
| assumption upon which the stack/global checking rests. For |
| example:</para> |
| |
| <programlisting><![CDATA[ |
| { int a[10], b[10], *p, i; |
| for (i = 0; i < 10; i++) { |
| p = /* arbitrary condition */ ? &a[i] : &b[i]; |
| *p = 42; |
| } |
| } |
| ]]></programlisting> |
| |
| <para>In this case the store sometimes |
| accesses <computeroutput>a[]</computeroutput> and |
| sometimes <computeroutput>b[]</computeroutput>, but in no cases is |
| the addressed array overrun. Nevertheless the change in target |
| will cause an error to be reported.</para> |
| |
| <para>It is hard to see how to get around this problem. The only |
| mitigating factor is that such constructions appear very rare, at |
| least judging from the results using the tool so far. Such a |
| construction appears only once in the Valgrind sources (running |
| Valgrind on Valgrind) and perhaps two or three times for a start |
| and exit of Firefox. The best that can be done is to suppress the |
| errors.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Performance: the stack/global checks require reading all of |
| the DWARF3 type and variable information on the executable and its |
| shared objects. This is computationally expensive and makes |
| startup quite slow. You can expect debuginfo reading time to be in |
| the region of a minute for an OpenOffice sized application, on a |
| 2.4 GHz Core 2 machine. Reading this information also requires a |
| lot of memory. To make it viable, Ptrcheck goes to considerable |
| trouble to compress the in-memory representation of the DWARF3 |
| data, which is why the process of reading it appears slow.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Performance: Ptrcheck runs slower than Memcheck. This is |
| partly due to a lack of tuning, but partly due to algorithmic |
| difficulties. The heap-check side is potentially quite fast. The |
| stack and global checks can sometimes require a number of range |
| checks per memory access, and these are difficult to short-circuit |
| (despite considerable efforts having been made). |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para>Coverage: the heap checking is relatively robust, requiring |
| only that Ptrcheck can see calls to malloc/free et al. In that |
| sense it has debug-info requirements comparable with Memcheck, and |
| is able to heap-check programs even with no debugging information |
| attached.</para> |
| |
| <para>Stack/global checking is much more fragile. If a shared |
| object does not have debug information attached, then Ptrcheck will |
| not be able to determine the bounds of any stack or global arrays |
| defined within that shared object, and so will not be able to check |
| accesses to them. This is true even when those arrays are accessed |
| from some other shared object which was compiled with debug |
| info.</para> |
| |
| <para>At the moment Ptrcheck accepts objects lacking debuginfo |
| without comment. This is dangerous as it causes Ptrcheck to |
| silently skip stack and global checking for such objects. It would |
| be better to print a warning in such circumstances.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Coverage: Ptrcheck checks that the areas read or written by |
| system calls do not overrun heap blocks. But it doesn't currently |
| check them for overruns stack and global arrays. This would be |
| easy to add.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Platforms: the stack/global checks won't work properly on any |
| PowerPC platforms, only on x86 and amd64 targets. That's because |
| the stack and global checking requires tracking function calls and |
| exits reliably, and there's no obvious way to do it with the PPC |
| ABIs. (cf with the x86 and amd64 ABIs this is relatively |
| straightforward.)</para> |
| </listitem> |
| |
| <listitem> |
| <para>Robustness: related to the previous point. Function |
| call/exit tracking for x86/amd64 is believed to work properly even |
| in the presence of longjmps within the same stack (although this |
| has not been tested). However, code which switches stacks is |
| likely to cause breakage/chaos.</para> |
| </listitem> |
| </itemizedlist> |
| |
| </sect1> |
| |
| |
| |
| |
| |
| <sect1 id="pc-manual.todo-user-visible" |
| xreflabel="Still To Do: User Visible Functionality"> |
| <title>Still To Do: User Visible Functionality</title> |
| |
| <itemizedlist> |
| |
| <listitem> |
| <para>Extend system call checking to work on stack and global arrays.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Print a warning if a shared object does not have debug info |
| attached, or if, for whatever reason, debug info could not be |
| found, or read.</para> |
| </listitem> |
| |
| </itemizedlist> |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="pc-manual.todo-implementation" |
| xreflabel="Still To Do: Implementation Tidying"> |
| <title>Still To Do: Implementation Tidying</title> |
| |
| <para>Items marked CRITICAL are considered important for correctness: |
| non-fixage of them is liable to lead to crashes or assertion failures |
| in real use.</para> |
| |
| <itemizedlist> |
| |
| <listitem> |
| <para>h_main.c: make N_FREED_SEGS command-line configurable.</para> |
| </listitem> |
| |
| <listitem> |
| <para> sg_main.c: Improve the performance of the stack / global |
| checks by doing some up-front filtering to ignore references in |
| areas which "obviously" can't be stack or globals. This will |
| require using information that m_aspacemgr knows about the address |
| space layout.</para> |
| </listitem> |
| |
| <listitem> |
| <para>h_main.c: get rid of the last_seg_added hack; add suitable |
| plumbing to the core/tool interface to do this cleanly.</para> |
| </listitem> |
| |
| <listitem> |
| <para>h_main.c: move vast amounts of arch-dependent uglyness |
| (get_IntRegInfo et al) to its own source file, a la |
| mc_machine.c.</para> |
| </listitem> |
| |
| <listitem> |
| <para>h_main.c: make the lossage-check stuff work again, as a way |
| of doing quality assurance on the implementation.</para> |
| </listitem> |
| |
| <listitem> |
| <para>h_main.c: schemeEw_Atom: don't generate a call to |
| nonptr_or_unknown, this is really stupid, since it could be done at |
| translation time instead.</para> |
| </listitem> |
| |
| <listitem> |
| <para>CRITICAL: h_main.c: h_instrument (main instrumentation fn): |
| generate shadows for word-sized temps defined in the block's |
| preamble. (Why does this work at all, as it stands?)</para> |
| </listitem> |
| |
| <listitem> |
| <para>sg_main.c: fix compute_II_hash to make it a bit more sensible |
| for ppc32/64 targets (except that sg_ doesn't work on ppc32/64 |
| targets, so this is a bit academic at the mo).</para> |
| </listitem> |
| |
| </itemizedlist> |
| |
| </sect1> |
| |
| |
| |
| </chapter> |