<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="sg-manual"
         xreflabel="SGCheck: an experimental stack and global array overrun detector">
  <title>SGCheck: an experimental stack and global array overrun detector</title>

<para>To use this tool, you must specify
<option>--tool=exp-sgcheck</option> on the Valgrind
command line.</para>




<sect1 id="sg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>SGCheck is a tool for finding overruns of stack and global
arrays.  It uses a heuristic derived from an observation about the
forms that accesses to stack and global arrays are likely to take.
</para>

</sect1>




<sect1 id="sg-manual.options" xreflabel="SGCheck Command-line Options">
<title>SGCheck Command-line Options</title>

<para>There are no SGCheck-specific command-line options at present.</para>
<!--
<para>SGCheck-specific command-line options are:</para>


<variablelist id="sg.opts.list">
</variablelist>
-->

</sect1>



<sect1 id="sg-manual.how-works.sg-checks"
       xreflabel="How SGCheck Works">
<title>How SGCheck Works</title>

<para>When a source file is compiled
with <option>-g</option>, the compiler attaches DWARF3
debugging information that describes the location of all stack and
global arrays in the file.</para>

<para>Checking accesses to such arrays would be relatively
simple if the compiler could also tell us which array (if any) each
memory referencing instruction was supposed to access.  Unfortunately
the DWARF3 debugging format does not provide a way to represent such
information, so we have to resort to a heuristic technique to
approximate it.  The key observation is that
   <emphasis>
   if a memory referencing instruction accesses inside a stack or
   global array once, then it is highly likely to always access that
   same array</emphasis>.</para>

<para>To see how this might be useful, consider the following buggy
fragment:</para>
<programlisting><![CDATA[
   { int i, a[10];  // both are auto vars
     for (i = 0; i <= 10; i++)
        a[i] = 42;
   }
]]></programlisting>

<para>At run time we will know the precise address
of <computeroutput>a[]</computeroutput> on the stack, and so we can
observe that the first store resulting from <computeroutput>a[i] =
42</computeroutput> writes to <computeroutput>a[]</computeroutput>, and
we will (correctly) assume that that instruction is intended always to
access <computeroutput>a[]</computeroutput>.  Then, on the 11th
iteration, it accesses somewhere else, possibly a different local,
possibly an unaccounted-for area of the stack (e.g. a spill slot), so
SGCheck reports an error.</para>
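
<para>The heuristic can be modelled in a few lines of C.  The sketch
below is an illustration only, not SGCheck's implementation: the
<computeroutput>ArrayBlock</computeroutput> and
<computeroutput>InstrRecord</computeroutput> types, the
<function>check_access</function> function, the address
<computeroutput>0x1000</computeroutput> and the 4-byte
<computeroutput>int</computeroutput> are all assumptions made for the
example.  It replays the eleven stores made by the loop above against a
single per-instruction record.</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one array known from debug info. */
typedef struct {
    uintptr_t base;   /* address of the first byte */
    size_t    size;   /* size in bytes */
} ArrayBlock;

/* Hypothetical per-instruction record: the array this instruction is
   assumed to target, or NULL if it has not yet hit any array. */
typedef struct {
    const ArrayBlock *likely_target;
} InstrRecord;

/* True if [addr, addr+len) lies entirely inside block b. */
static bool inside(const ArrayBlock *b, uintptr_t addr, size_t len)
{
    return addr >= b->base && len <= b->size
        && addr - b->base <= b->size - len;
}

/* Returns true if the access should be reported as an error. */
static bool check_access(InstrRecord *rec, const ArrayBlock *blocks,
                         size_t n_blocks, uintptr_t addr, size_t len)
{
    if (rec->likely_target == NULL) {
        /* The first hit inside a known array binds the instruction. */
        for (size_t i = 0; i < n_blocks; i++) {
            if (inside(&blocks[i], addr, len)) {
                rec->likely_target = &blocks[i];
                break;
            }
        }
        return false;   /* the binding access itself is never reported */
    }
    return !inside(rec->likely_target, addr, len);
}

/* Replays the stores a[0] .. a[10]; returns the index of the first
   iteration that is reported, or -1 if none is. */
static int first_reported_iteration(void)
{
    /* Assumed layout: a[10] of 4-byte ints at a made-up address. */
    const ArrayBlock blocks[] = { { 0x1000, 10 * 4 } };
    InstrRecord store = { NULL };

    for (int i = 0; i <= 10; i++) {
        if (check_access(&store, blocks, 1, 0x1000 + 4 * (uintptr_t)i, 4))
            return i;
    }
    return -1;
}
]]></programlisting>
<para>Under these assumptions
<function>first_reported_iteration</function> returns 10: iterations 0
to 9 stay inside the bound array, and the 11th store falls outside
it.</para>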

<para>There is an important caveat.  Imagine a function such as
<function>memcpy</function>, which is used
to read and write many different areas of memory over the lifetime of the
program.  If we insist that the read and write instructions in its memory
copying loop only ever access one particular stack or global variable, we
will be flooded with errors resulting from calls to
<function>memcpy</function>.</para>

<para>To avoid this problem, SGCheck instantiates fresh likely-target
records for each entry to a function, and discards them on exit.  This
allows detection of cases where (e.g.) <function>memcpy</function>
overflows its source or destination buffers for any specific call, but
does not carry any restriction from one call to the next.  Indeed,
multiple threads may make multiple simultaneous calls to
(e.g.) <function>memcpy</function> without mutual interference.</para>
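
<para>A rough model of the per-call records, again as an assumption-laden
sketch rather than SGCheck's real data structures: the
<computeroutput>CallRecord</computeroutput> type, the
<function>simulated_copy_call</function> function and the two made-up
16-byte arrays are hypothetical.  Each simulated call starts with an
empty record, so two calls that target different arrays do not
interfere, while a single call that runs off the end of its destination
is still caught.</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } ArrayBlock;

/* Two arrays at made-up addresses, standing in for debug-info data. */
static const ArrayBlock blocks[] = { { 0x1000, 16 }, { 0x2000, 16 } };
#define N_BLOCKS (sizeof blocks / sizeof blocks[0])

/* One record per function activation for the copying loop's store. */
typedef struct { const ArrayBlock *likely_target; } CallRecord;

/* Returns the array containing addr, or NULL if there is none. */
static const ArrayBlock *find_block(uintptr_t addr)
{
    for (size_t i = 0; i < N_BLOCKS; i++)
        if (addr >= blocks[i].base && addr - blocks[i].base < blocks[i].size)
            return &blocks[i];
    return NULL;
}

/* Models one call to a byte-copying routine that writes n bytes at dst.
   Returns the number of stores that would be reported. */
static int simulated_copy_call(uintptr_t dst, size_t n)
{
    CallRecord rec = { NULL };          /* fresh record on entry */
    int errors = 0;

    for (size_t i = 0; i < n; i++) {
        uintptr_t addr = dst + i;
        if (rec.likely_target == NULL)
            rec.likely_target = find_block(addr);  /* bind, do not report */
        else if (find_block(addr) != rec.likely_target)
            errors++;                   /* left the bound array */
    }
    return errors;                      /* rec is discarded on exit */
}
]]></programlisting>
<para>In this model <computeroutput>simulated_copy_call(0x1000,
16)</computeroutput> followed by <computeroutput>simulated_copy_call(0x2000,
16)</computeroutput> reports nothing, whereas
<computeroutput>simulated_copy_call(0x1000, 18)</computeroutput> reports the
two stores that fall past the end of the first array.</para>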

</sect1>




<sect1 id="sg-manual.cmp-w-memcheck"
       xreflabel="Comparison with Memcheck">
<title>Comparison with Memcheck</title>

<para>SGCheck and Memcheck are complementary: their capabilities do
not overlap.  Memcheck performs bounds checks and use-after-free
checks for heap arrays.  It also finds uses of uninitialised values
created by heap or stack allocations.  But it does not perform bounds
checking for stack or global arrays.</para>

<para>SGCheck, on the other hand, does perform bounds checking for stack
and global arrays, but it does nothing else.</para>

</sect1>





<sect1 id="sg-manual.limitations"
       xreflabel="Limitations">
<title>Limitations</title>

<para>This is an experimental tool, which relies rather too heavily on some
not-as-robust-as-I-would-like assumptions about the behaviour of correct
programs.  There are a number of limitations which you should be aware
of.</para>

<itemizedlist>

  <listitem>
   <para>False negatives (missed errors): it follows from the
   description above (<xref linkend="sg-manual.how-works.sg-checks"/>)
   that the first access by a memory referencing instruction to a
   stack or global array creates an association between that
   instruction and the array, which is checked on subsequent accesses
   by that instruction, until the containing function exits.  Hence,
   the first access by an instruction to an array (in any given
   function instantiation) is not checked for overrun, since SGCheck
   uses that as the "example" of how subsequent accesses should
   behave.</para>
  </listitem>

  <listitem>
   <para>False positives (false errors): similarly, and more seriously,
   it is clearly possible to write legitimate pieces of code which
   break the basic assumption upon which the checking algorithm
   depends.  For example:</para>

<programlisting><![CDATA[
  { int a[10], b[10], *p, i;
    for (i = 0; i < 10; i++) {
       p = /* arbitrary condition */  ? &a[i]  : &b[i];
       *p = 42;
    }
  }
]]></programlisting>

   <para>In this case the store sometimes
   accesses <computeroutput>a[]</computeroutput> and
   sometimes <computeroutput>b[]</computeroutput>, but in no case is
   the addressed array overrun.  Nevertheless the change in target
   will cause an error to be reported.</para>

   <para>It is hard to see how to get around this problem.  The only
   mitigating factor is that such constructions appear to be very rare,
   at least judging from the results of using the tool so far.  Such a
   construction appears only once in the Valgrind sources (running
   Valgrind on Valgrind) and perhaps two or three times for a start
   and exit of Firefox.  The best that can be done is to suppress the
   errors.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck has to read all of
   the DWARF3 type and variable information for the executable and its
   shared objects.  This is computationally expensive and makes
   startup quite slow.  You can expect debuginfo reading time to be in
   the region of a minute for an OpenOffice-sized application, on a
   2.4 GHz Core 2 machine.  Reading this information also requires a
   lot of memory.  To make it viable, SGCheck goes to considerable
   trouble to compress the in-memory representation of the DWARF3
   data, which is why the process of reading it appears slow.</para>
  </listitem>

  <listitem>
   <para>Performance: SGCheck runs slower than Memcheck.  This is
   partly due to a lack of tuning, but partly due to algorithmic
   difficulties.  The
   stack and global checks can sometimes require a number of range
   checks per memory access, and these are difficult to short-circuit,
   despite considerable efforts having been made.  A
   redesign and reimplementation could potentially make it much faster.
   </para>
  </listitem>

  <listitem>
   <para>Coverage: Stack and global checking is fragile.  If a shared
   object does not have debug information attached, then SGCheck will
   not be able to determine the bounds of any stack or global arrays
   defined within that shared object, and so will not be able to check
   accesses to them.  This is true even when those arrays are accessed
   from some other shared object which was compiled with debug
   info.</para>

   <para>At the moment SGCheck accepts objects lacking debuginfo
   without comment.  This is dangerous as it causes SGCheck to
   silently skip stack and global checking for such objects.  It would
   be better to print a warning in such circumstances.</para>
  </listitem>

  <listitem>
   <para>Coverage: SGCheck does not check whether the areas read
   or written by system calls overrun stack or global arrays.  This
   would be easy to add.</para>
  </listitem>

  <listitem>
   <para>Platforms: the stack/global checks work properly only on
   X86 and AMD64 targets, not on PowerPC, ARM or S390X platforms.
   That's because the stack and global checking requires tracking
   function calls and exits reliably, and there's no obvious way to do
   it on ABIs that use a link register for function returns.
   </para>
  </listitem>

  <listitem>
   <para>Robustness: related to the previous point.  Function
   call/exit tracking for X86 and AMD64 is believed to work properly
   even in the presence of longjmps within the same stack (although
   this has not been tested).  However, code which switches stacks is
   likely to cause breakage/chaos.</para>
  </listitem>
</itemizedlist>
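
<para>The first two limitations can be reproduced in a toy model of the
checking rule.  The sketch below is not SGCheck code: the
<function>count_reports</function> function, the made-up addresses, the
4-byte <computeroutput>int</computeroutput> and the assumption that
<computeroutput>a[]</computeroutput> and
<computeroutput>b[]</computeroutput> are adjacent in memory are all
hypothetical, and the model keeps the first binding rather than
guessing how the real tool behaves after a report.</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } ArrayBlock;

/* Assumed layout: a[10] and b[10] of 4-byte ints, adjacent in memory. */
static const ArrayBlock arr_a = { 0x1000, 40 };
static const ArrayBlock arr_b = { 0x1028, 40 };

/* Returns the array containing addr, or NULL if there is none. */
static const ArrayBlock *find_block(uintptr_t addr)
{
    if (addr >= arr_a.base && addr - arr_a.base < arr_a.size) return &arr_a;
    if (addr >= arr_b.base && addr - arr_b.base < arr_b.size) return &arr_b;
    return NULL;
}

/* Feeds a sequence of store addresses, all issued by one instruction
   within one function activation, through the model and counts the
   accesses that would be reported. */
static int count_reports(const uintptr_t *addrs, size_t n)
{
    const ArrayBlock *likely_target = NULL;
    int reports = 0;

    for (size_t i = 0; i < n; i++) {
        const ArrayBlock *hit = find_block(addrs[i]);
        if (likely_target == NULL)
            likely_target = hit;        /* binding access: unchecked */
        else if (hit != likely_target)
            reports++;
    }
    return reports;
}

/* False negative: a lone store to a[10] actually lands in b[0], but it
   is the instruction's first access, so it only creates a binding. */
static int false_negative_demo(void)
{
    const uintptr_t addrs[] = { 0x1000 + 4 * 10 };
    return count_reports(addrs, 1);
}

/* False positive: in-bounds stores to a[0], b[1], a[2], b[3]. */
static int false_positive_demo(void)
{
    const uintptr_t addrs[] = { 0x1000, 0x1028 + 4, 0x1000 + 8, 0x1028 + 12 };
    return count_reports(addrs, 4);
}
]]></programlisting>
<para>In this model <function>false_negative_demo</function> returns 0
even though the store overruns <computeroutput>a[]</computeroutput>, and
<function>false_positive_demo</function> returns 2 even though every
store is in bounds.</para>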

</sect1>





<sect1 id="sg-manual.todo-user-visible"
       xreflabel="Still To Do: User-visible Functionality">
<title>Still To Do: User-visible Functionality</title>

<itemizedlist>

  <listitem>
   <para>Extend system call checking to work on stack and global arrays.</para>
  </listitem>

  <listitem>
   <para>Print a warning if a shared object does not have debug info
   attached, or if, for whatever reason, debug info could not be
   found or read.</para>
  </listitem>

  <listitem>
   <para>Add some heuristic filtering that removes obvious false
   positives.  This would be easy to do.  For example, an access
   transition from a heap to a stack object almost certainly isn't a
   bug and so should not be reported to the user.</para>
  </listitem>

</itemizedlist>

</sect1>




<sect1 id="sg-manual.todo-implementation"
       xreflabel="Still To Do: Implementation Tidying">
<title>Still To Do: Implementation Tidying</title>

<para>Items marked CRITICAL are considered important for correctness:
failure to fix them is liable to lead to crashes or assertion failures
in real use.</para>

<itemizedlist>

  <listitem>
   <para>sg_main.c: Redesign and reimplement the basic checking
   algorithm.  It could be made much faster than it is; the current
   implementation isn't very good.
   </para>
  </listitem>

  <listitem>
   <para>sg_main.c: Improve the performance of the stack / global
   checks by doing some up-front filtering to ignore references in
   areas which "obviously" can't be stack or globals.  This will
   require using information that m_aspacemgr knows about the address
   space layout.</para>
  </listitem>

  <listitem>
   <para>sg_main.c: fix compute_II_hash to make it a bit more sensible
   for ppc32/64 targets (except that sg_ doesn't work on ppc32/64
   targets, so this is a bit academic at the moment).</para>
  </listitem>

</itemizedlist>
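
<para>One possible shape for the up-front filtering item is sketched
below.  It is a guess at the idea, not a design taken from the Valgrind
sources: the <computeroutput>Range</computeroutput> type, the
<function>might_be_stack_or_global</function> function and the two
address ranges are invented for illustration, and nothing here reflects
the real m_aspacemgr interface.  The point is only that a cheap test
against a few coarse ranges could reject most addresses before any
per-array range checks are attempted.</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the address space: one half-open range
   [start, end) per region that may hold stack or global data. */
typedef struct { uintptr_t start; uintptr_t end; } Range;

static const Range candidate_ranges[] = {
    { 0x00600000u, 0x00700000u },   /* made-up data segment */
    { 0x7ff00000u, 0x80000000u },   /* made-up stack region */
};

/* Cheap pre-filter: false means the address cannot be in any stack or
   global array, so the expensive per-array checks can be skipped. */
static bool might_be_stack_or_global(uintptr_t addr)
{
    size_t n = sizeof candidate_ranges / sizeof candidate_ranges[0];
    for (size_t i = 0; i < n; i++)
        if (addr >= candidate_ranges[i].start && addr < candidate_ranges[i].end)
            return true;
    return false;
}
]]></programlisting>
<para>A checker built this way would call
<function>might_be_stack_or_global</function> first and fall through to
the full checks only when it returns true.</para>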

</sect1>



</chapter>