<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>10. DHAT: a dynamic heap analysis tool</title>
<link rel="stylesheet" type="text/css" href="vg_basic.css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="index.html" title="Valgrind Documentation">
<link rel="up" href="manual.html" title="Valgrind User Manual">
<link rel="prev" href="ms-manual.html" title="9. Massif: a heap profiler">
<link rel="next" href="sg-manual.html" title="11. SGCheck: an experimental stack and global array overrun detector">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
<td width="22px" align="center" valign="middle"><a accesskey="p" href="ms-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td>
<th align="center" valign="middle">Valgrind User Manual</th>
<td width="22px" align="center" valign="middle"><a accesskey="n" href="sg-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
</tr></table></div>
<div class="chapter">
<div class="titlepage"><div><div><h1 class="title">
<a name="dh-manual"></a>10. DHAT: a dynamic heap analysis tool</h1></div></div></div>
<div class="toc">
<p><b>Table of Contents</b></p>
<dl class="toc">
<dt><span class="sect1"><a href="dh-manual.html#dh-manual.overview">10.1. Overview</a></span></dt>
<dt><span class="sect1"><a href="dh-manual.html#dh-manual.understanding">10.2. Understanding DHAT's output</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="dh-manual.html#idm140394924138288">10.2.1. Interpreting the max-live, tot-alloc and deaths fields</a></span></dt>
<dt><span class="sect2"><a href="dh-manual.html#idm140394926128304">10.2.2. Interpreting the acc-ratios fields</a></span></dt>
<dt><span class="sect2"><a href="dh-manual.html#idm140394925890256">10.2.3. Interpreting "Aggregated access counts by offset" data</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="dh-manual.html#dh-manual.options">10.3. DHAT Command-line Options</a></span></dt>
</dl>
</div>
<p>To use this tool, you must specify
<code class="option">--tool=exp-dhat</code> on the Valgrind
command line.</p>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="dh-manual.overview"></a>10.1. Overview</h2></div></div></div>
<p>DHAT is a tool for examining how programs use their heap
allocations.</p>
<p>It tracks the allocated blocks and inspects every memory access
to determine which block, if any, the access falls in. The following
data is collected and presented per allocation point (allocation
stack):</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>total allocation (number of bytes and
 blocks)</p></li>
<li class="listitem"><p>maximum live volume (number of bytes and
 blocks)</p></li>
<li class="listitem"><p>average block lifetime (number of instructions
 between allocation and freeing)</p></li>
<li class="listitem"><p>average number of reads and writes to each byte in
 the block ("access ratios")</p></li>
<li class="listitem"><p>for allocation points which always allocate blocks
 of a single size, and that size is 4096 bytes or less: counts
 showing how often each byte offset inside the block is
 accessed.</p></li>
</ul></div>
<p>Using these statistics it is possible to identify allocation
points with the following characteristics:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p>potential process-lifetime leaks: blocks allocated
 by the point just accumulate, and are freed only at the end of the
 run.</p></li>
<li class="listitem"><p>excessive turnover: points which chew through a lot
 of heap, even if it is not held onto for very long.</p></li>
<li class="listitem"><p>excessively transient: points which allocate very
 short-lived blocks.</p></li>
<li class="listitem"><p>useless or underused allocations: blocks which are
 allocated but not completely filled in, or are filled in but not
 subsequently read.</p></li>
<li class="listitem"><p>blocks with inefficient layout: areas never
 accessed, or with hot fields scattered throughout the
 block.</p></li>
</ul></div>
<p>As with the Massif heap profiler, DHAT measures program progress
by counting instructions, and so presents all age/time related figures
as instruction counts. This sounds a little odd at first, but it
makes runs repeatable in a way which is not possible if CPU time is
used.</p>
</div>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="dh-manual.understanding"></a>10.2. Understanding DHAT's output</h2></div></div></div>
<p>DHAT provides a lot of useful information on dynamic heap usage.
Most of the art of using it is in interpretation of the resulting
numbers. That is best illustrated via a set of examples.</p>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="idm140394924138288"></a>10.2.1. Interpreting the max-live, tot-alloc and deaths fields</h3></div></div></div>
<div class="sect3"><div class="titlepage"><div><div><h4 class="title">
<a name="idm140394924137584"></a>10.2.1.1. A simple example</h4></div></div></div></div>
<pre class="screen">
   ======== SUMMARY STATISTICS ========

   guest_insns:  1,045,339,534
   [...]
   max-live:    63,490 in 984 blocks
   tot-alloc:   1,904,700 in 29,520 blocks (avg size 64.52)
   deaths:      29,520, at avg age 22,227,424
   acc-ratios:  6.37 rd, 1.14 wr  (12,141,526 b-read, 2,174,460 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x404580: tok_alloc_new (tinycc.c:7151)
      by 0x40870A: next_nomacro1 (tinycc.c:9305)
</pre>
<p>Over the entire run of the program, this stack (allocation
point) allocated 29,520 blocks in total, containing 1,904,700 bytes in
total. By looking at the max-live data, we see that not many blocks
were simultaneously live, though: at the peak, there were 63,490
allocated bytes in 984 blocks. This tells us that the program is
steadily freeing such blocks as it runs, rather than hanging on to all
of them until the end and freeing them all.</p>
<p>The deaths entry tells us that 29,520 blocks allocated by this stack
died (were freed) during the run of the program. Since 29,520 is
also the number of blocks allocated in total, that tells us that
all allocated blocks were freed by the end of the program.</p>
<p>It also tells us that the average age at death was 22,227,424
instructions. From the summary statistics we see that the program ran
for 1,045,339,534 instructions, and so the average age at death is
about 2% of the program's total run time.</p>
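<p>The arithmetic behind this reading can be sketched in a few lines of
Python. This is only an illustration: the numbers are copied by hand from
the record above, and the variable names are ours, not anything DHAT
emits.</p>
<pre class="screen">
```python
# Figures copied by hand from the DHAT record above.
guest_insns = 1_045_339_534   # total instructions executed by the program
tot_bytes = 1_904_700         # tot-alloc, bytes
tot_blocks = 29_520           # tot-alloc, blocks
max_live_blocks = 984         # max-live, blocks
deaths = 29_520               # blocks freed during the run
avg_age = 22_227_424          # average age at death, in instructions

avg_size = tot_bytes / tot_blocks            # average bytes per block
peak_share = max_live_blocks / tot_blocks    # share of all blocks live at the peak
all_freed = deaths == tot_blocks             # True when every block was freed
age_share = avg_age / guest_insns            # block lifetime relative to the run

print(f"avg size         {avg_size:.2f} bytes")
print(f"live at peak     {peak_share:.1%} of all blocks")
print(f"all blocks freed {all_freed}")
print(f"avg age at death {age_share:.1%} of the run")
```
</pre>
<p>A small peak share together with a small age share is what "steadily
freeing such blocks as it runs" looks like in the numbers.</p>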
<div class="sect3"><div class="titlepage"><div><div><h4 class="title">
<a name="idm140394923815680"></a>10.2.1.2. Example of a potential process-lifetime leak</h4></div></div></div></div>
<p>This next example (from a different program than the above)
shows a potential process-lifetime leak. A process-lifetime leak
occurs when a program keeps allocating data, but only frees the
data just before it exits. Hence the program's heap grows constantly
in size, yet Memcheck reports no leak, because the program has
freed up everything at exit. This is particularly a hazard for
long-running programs.</p>
<pre class="screen">
   ======== SUMMARY STATISTICS ========

   guest_insns:  418,901,537
   [...]
   max-live:    32,512 in 254 blocks
   tot-alloc:   32,512 in 254 blocks (avg size 128.00)
   deaths:      254, at avg age 300,467,389
   acc-ratios:  0.26 rd, 0.20 wr  (8,756 b-read, 6,604 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x4C27632: realloc (vg_replace_malloc.c:525)
      by 0x56FF41D: QtFontStyle::pixelSize(unsigned short, bool) (qfontdatabase.cpp:269)
      by 0x5700D69: loadFontConfig() (qfontdatabase_x11.cpp:1146)
</pre>
<p>There are two tell-tale signs that this might be a
process-lifetime leak. Firstly, the max-live and tot-alloc numbers
are identical. The only way that can happen is if every one of these
blocks is allocated before any of them is deallocated.</p>
<p>Secondly, the average age at death (300 million insns) is about 72% of
the total program lifetime (419 million insns), hence this is not a
transient allocate-then-free spike. Rather, it is spread out over a
large part of the entire run. One interpretation is, roughly, that
all 254 blocks were allocated in the first half of the run, held onto
for the second half, and then freed just before exit.</p>
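<p>The two signs can be combined into a rough screening test. The
following Python sketch is our own heuristic, not something DHAT provides:
the function name and the 50% age threshold are assumptions chosen for
illustration, and a hit is only a hint that the allocation point deserves
a closer look.</p>
<pre class="screen">
```python
def looks_like_process_lifetime_leak(max_live_bytes, tot_bytes, avg_age,
                                     guest_insns, age_threshold=0.5):
    """Heuristic only. Flags a record when (1) max-live equals tot-alloc,
    so nothing was freed before the peak, and (2) the average age at death
    is a large share of the run. The 0.5 threshold is an arbitrary
    assumption, not a DHAT setting."""
    never_freed_early = max_live_bytes == tot_bytes
    long_lived = avg_age / guest_insns >= age_threshold
    return never_freed_early and long_lived


# The record above: both signs are present.
print(looks_like_process_lifetime_leak(32_512, 32_512,
                                       300_467_389, 418_901_537))

# The simple example from the previous section: neither sign is present.
print(looks_like_process_lifetime_leak(63_490, 1_904_700,
                                       22_227_424, 1_045_339_534))
```
</pre>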
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="idm140394926128304"></a>10.2.2. Interpreting the acc-ratios fields</h3></div></div></div>
<div class="sect3"><div class="titlepage"><div><div><h4 class="title">
<a name="idm140394923814880"></a>10.2.2.1. A fairly harmless allocation point record</h4></div></div></div></div>
<pre class="screen">
   max-live:    49,398 in 808 blocks
   tot-alloc:   1,481,940 in 24,240 blocks (avg size 61.13)
   deaths:      24,240, at avg age 34,611,026
   acc-ratios:  2.13 rd, 0.91 wr  (3,166,650 b-read, 1,358,820 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x404580: tok_alloc_new (tinycc.c:7151)
      by 0x4046C4: tok_alloc (tinycc.c:7190)
</pre>
<p>The acc-ratios field tells us that each byte in the blocks
allocated here is read an average of 2.13 times before the block is
deallocated. Given that the blocks have an average age at death of
34,611,026 instructions, that's one read of each byte roughly every 16
million instructions. So from that standpoint the blocks aren't
"working" very hard.</p>
<p>More interesting is the write ratio: each byte is written an
average of 0.91 times. This tells us that some parts of the allocated
blocks are never written, at least 8% on average (1,358,820 bytes
written against 1,481,940 bytes allocated). To completely
initialise the block would require writing each byte at least once,
and that would give a write ratio of 1.0. The fact that some block
areas are evidently unused might point to data alignment holes or
other layout inefficiencies.</p>
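<p>As a sketch of where these figures come from, the ratios can be
recomputed in Python from the raw byte counts in the record. The names
below are ours. Working from the raw counts gives slightly different
values than working from the two-decimal ratios DHAT prints, so treat the
last digit of any such estimate with caution.</p>
<pre class="screen">
```python
def access_ratios(bytes_read, bytes_written, tot_bytes):
    """Average number of reads and writes per allocated byte."""
    return bytes_read / tot_bytes, bytes_written / tot_bytes


# Figures copied by hand from the record above.
rd, wr = access_ratios(3_166_650, 1_358_820, 1_481_940)
avg_age = 34_611_026                 # average age at death, in instructions

insns_per_read = avg_age / rd        # instructions between reads of a byte
min_unwritten = max(0.0, 1.0 - wr)   # lower bound on the never-written share

print(f"reads per byte    {rd:.4f}")
print(f"writes per byte   {wr:.4f}")
print(f"insns per read    {insns_per_read:,.0f}")
print(f"never written     at least {min_unwritten:.1%}")
```
</pre>
<p>The unwritten share is only a lower bound: a byte that is written
twice offsets a byte that is never written, so the real figure can be
higher.</p>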
<p>Well, at least all the blocks are freed (24,240 allocations,
24,240 deaths).</p>
<p>If all the blocks had been the same size, DHAT would also show
the access counts by block offset, so we could see where exactly these
unused areas are. However, that isn't the case: the blocks have
varying sizes, so DHAT can't perform such an analysis. We can see
that they must have varying sizes since the average block size, 61.13,
isn't a whole number.</p>
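<p>The whole-number test is easy to state precisely. In this Python
sketch (the function name is ours), a non-zero remainder proves that the
sizes vary, whereas a zero remainder is merely consistent with a single
block size.</p>
<pre class="screen">
```python
def may_be_uniform_size(tot_bytes, tot_blocks):
    """False means the block sizes definitely vary. True means the total
    divides evenly, which is necessary but not sufficient for all blocks
    to have the same size."""
    return tot_bytes % tot_blocks == 0


print(may_be_uniform_size(1_481_940, 24_240))   # the record above
print(may_be_uniform_size(317_408, 5_668))      # a record with avg size 56.00
```
</pre>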
<div class="sect3"><div class="titlepage"><div><div><h4 class="title">
<a name="idm140394922011696"></a>10.2.2.2. A more suspicious looking example</h4></div></div></div></div>
<pre class="screen">
   max-live:    180,224 in 22 blocks
   tot-alloc:   180,224 in 22 blocks (avg size 8192.00)
   deaths:      none (none of these blocks were freed)
   acc-ratios:  0.00 rd, 0.00 wr  (0 b-read, 0 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x40369C: __sym_malloc (tinycc.c:6787)
      by 0x403711: sym_malloc (tinycc.c:6805)
</pre>
<p>Here, both the read and write access ratios are zero. Hence
this point is allocating blocks which are never used, neither read nor
written. Indeed, they are also not freed ("deaths: none") and are
simply leaked. So, here is 180k of completely useless allocation that
could be removed.</p>
<p>Re-running with Memcheck does indeed report the same leak. What
DHAT can tell us, that Memcheck can't, is that not only are the blocks
leaked, they are also never used.</p>
<div class="sect3"><div class="titlepage"><div><div><h4 class="title">
<a name="idm140394922008704"></a>10.2.2.3. Another suspicious example</h4></div></div></div></div>
<p>Here's one where blocks are allocated, written to,
but never read from. We see this immediately from the zero read
access ratio. They do get freed, though:</p>
<pre class="screen">
   max-live:    54 in 3 blocks
   tot-alloc:   1,620 in 90 blocks (avg size 18.00)
   deaths:      90, at avg age 34,558,236
   acc-ratios:  0.00 rd, 1.11 wr  (0 b-read, 1,800 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x40350E: tcc_malloc (tinycc.c:6712)
      by 0x4035BD: tcc_strdup (tinycc.c:6750)
      by 0x41FEBB: tcc_add_sysinclude_path (tinycc.c:20931)
</pre>
<p>In the previous two examples, it is easy to see blocks that are
never written to, or never read from, or some combination of both.
Unfortunately, in C++ code, the situation is less clear. That's
because an object's constructor will write to the underlying block,
and its destructor will read from it. So the block's read and write
ratios will be non-zero even if the object, once constructed, is never
used, but only eventually destructed.</p>
<p>Really, what we want is to measure only memory accesses in
between the end of an object's construction and the start of its
destruction. Unfortunately I do not know of a reliable way to
determine when those transitions are made.</p>
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="idm140394925890256"></a>10.2.3. Interpreting "Aggregated access counts by offset" data</h3></div></div></div>
<p>For allocation points that always allocate blocks of the same
size, and which are 4096 bytes or smaller, DHAT counts accesses
per offset, for example:</p>
<pre class="screen">
   max-live:    317,408 in 5,668 blocks
   tot-alloc:   317,408 in 5,668 blocks (avg size 56.00)
   deaths:      5,668, at avg age 622,890,597
   acc-ratios:  1.03 rd, 1.28 wr  (327,642 b-read, 408,172 b-written)
      at 0x4C275B8: malloc (vg_replace_malloc.c:236)
      by 0x5440C16: QDesignerPropertySheetPrivate::ensureInfo (qhash.h:515)
      by 0x544350B: QDesignerPropertySheet::setVisible (qdesigner_propertysh...)
      by 0x5446232: QDesignerPropertySheet::QDesignerPropertySheet (qdesigne...)

   Aggregated access counts by offset:

   [   0]  28782 28782 28782 28782 28782 28782 28782 28782
   [   8]  20638 20638 20638 20638 0 0 0 0
   [  16]  22738 22738 22738 22738 22738 22738 22738 22738
   [  24]  6013 6013 6013 6013 6013 6013 6013 6013
   [  32]  18883 18883 18883 37422 0 0 0 0
   [  40]  5668 11915 5668 5668 11336 11336 11336 11336
   [  48]  6166 6166 6166 6166 0 0 0 0
</pre>
<p>This is fairly typical, for C++ code running on a 64-bit
platform. Here, we have aggregated access statistics for 5668 blocks,
all of size 56 bytes. Each byte has been accessed at least 5668
times, except for offsets 12--15, 36--39 and 52--55. These are likely
to be alignment holes.</p>
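<p>Finding the never-accessed ranges by eye is error-prone for larger
blocks, so here is a minimal Python sketch that does it mechanically. The
counts are transcribed by hand from the table above, reading it as seven
rows of eight consecutive byte offsets each. The function name is
ours.</p>
<pre class="screen">
```python
# Access counts for offsets 0..55, transcribed row by row from the table.
counts = ([28782] * 8
          + [20638] * 4 + [0] * 4
          + [22738] * 8
          + [6013] * 8
          + [18883, 18883, 18883, 37422] + [0] * 4
          + [5668, 11915, 5668, 5668] + [11336] * 4
          + [6166] * 4 + [0] * 4)


def zero_runs(counts):
    """Return (first, last) offset pairs for each run of zero counts."""
    runs = []
    start = None
    for offset, n in enumerate(counts):
        if n == 0 and start is None:
            start = offset                      # a run of zeroes begins
        elif n != 0 and start is not None:
            runs.append((start, offset - 1))    # the run ended just before
            start = None
    if start is not None:                       # run reaches the block's end
        runs.append((start, len(counts) - 1))
    return runs


for first, last in zero_runs(counts):
    print(f"offsets {first}--{last} were never accessed")
```
</pre>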
<p>Careful interpretation of the numbers reveals useful information.
Groups of N consecutive identical numbers that begin at an N-aligned
offset, for N being 2, 4 or 8, are likely to indicate an N-byte object
in the structure at that point. For example, the first 32 bytes of
this object are likely to have the layout</p>
<pre class="screen">
   [0 ] 64-bit type
   [8 ] 32-bit type
   [12] 32-bit alignment hole
   [16] 64-bit type
   [24] 64-bit type
</pre>
<p>As a counterexample, it's also clear that, whatever is at offset 32,
it is not a 32-bit value. That's because the last number of the group
(37422) is not the same as the first three (18883 18883 18883).</p>
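<p>The grouping rule can be applied mechanically. The Python sketch
below is our own greedy reading of it, not DHAT functionality: at each
offset it takes the widest N (8, 4, then 2) for which the offset is
N-aligned and the next N counts are identical, and falls back to a single
byte otherwise. The counts are transcribed by hand from the table, and
every result is a guess in the same sense as the layout above.</p>
<pre class="screen">
```python
# Access counts for offsets 0..55, transcribed row by row from the table.
counts = ([28782] * 8
          + [20638] * 4 + [0] * 4
          + [22738] * 8
          + [6013] * 8
          + [18883, 18883, 18883, 37422] + [0] * 4
          + [5668, 11915, 5668, 5668] + [11336] * 4
          + [6166] * 4 + [0] * 4)


def guess_fields(counts):
    """Greedily split the block into (offset, width, kind) guesses."""
    fields = []
    offset = 0
    while offset != len(counts):
        width = 1                                  # fallback: a lone byte
        for n in (8, 4, 2):
            group = counts[offset:offset + n]
            aligned = offset % n == 0
            if aligned and len(group) == n and len(set(group)) == 1:
                width = n                          # N identical, N-aligned
                break
        kind = "hole?" if counts[offset] == 0 else "field?"
        fields.append((offset, width, kind))
        offset += width
    return fields


fields = guess_fields(counts)
for offset, width, kind in fields:
    print(f"[{offset:2}] {width * 8:2}-bit {kind}")
```
</pre>
<p>The first five guesses reproduce the layout shown above, and nothing
4 bytes wide is reported at offset 32, in line with the
counterexample.</p>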
<p>This example leads one to enquire (by reading the source code)
whether the zeroes at 12--15 and 52--55 are alignment holes, and
whether 48--51 is indeed a 32-bit type. If so, it might be possible
to place what's at 48--51 at 12--15 instead, which would reduce
the object size from 56 to 48 bytes.</p>
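<p>To see why moving one field saves 8 bytes, here is a small Python
model of natural alignment. It is a sketch under stated assumptions: each
field is aligned to its own size and the struct is padded to a multiple of
its largest field, as is usual on 64-bit platforms. The field lists are
hypothetical. Offsets 0--31 and 48--51 follow the guesses above, but what
sits at 32--35 (modelled as four single bytes) and at 40--47 (modelled as
one 64-bit value) is purely our assumption.</p>
<pre class="screen">
```python
def struct_size(field_sizes):
    """Size of a C-style struct under natural alignment (an assumption:
    each field is aligned to its own size, and the total is rounded up
    to a multiple of the largest field)."""
    offset = 0
    largest = 1
    for size in field_sizes:
        offset += -offset % size     # padding up to this field's alignment
        offset += size
        largest = max(largest, size)
    return offset + -offset % largest


# Hypothetical field order matching the observed access pattern.
original = [8, 4, 8, 8, 1, 1, 1, 1, 8, 4]

# Same fields, with the trailing 32-bit field moved into the hole at 12.
reordered = [8, 4, 4, 8, 8, 1, 1, 1, 1, 8]

print(struct_size(original), struct_size(reordered))
```
</pre>
<p>The saving comes from two places: the moved field fills the hole at
12--15, and the struct no longer needs tail padding at 52--55.</p>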
<p>Bear in mind that the above inferences are all only "maybes". That's
because they are based on dynamic data, not static analysis of the
object layout. For example, the zeroes might not be alignment
holes, but rather just parts of the structure which were not used
at all for this particular run. Experience shows that's unlikely
to be the case, but it could happen.</p>
</div>
</div>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="dh-manual.options"></a>10.3. DHAT Command-line Options</h2></div></div></div>
<p>DHAT-specific command-line options are:</p>
<div class="variablelist">
<a name="dh.opts.list"></a><dl class="variablelist">
<dt>
<a name="opt.show-top-n"></a><span class="term">
 <code class="option">--show-top-n=&lt;number&gt;
 [default: 10] </code>
 </span>
</dt>
<dd><p>At the end of the run, DHAT sorts the accumulated
 allocation points according to some metric, and shows the
 highest scoring entries. <code class="varname">--show-top-n</code>
 controls how many entries are shown. The default of 10 is
 quite small. For realistic applications you will probably need
 to set it much higher, at least several hundred.</p></dd>
<dt>
<a name="opt.sort-by"></a><span class="term">
 <code class="option">--sort-by=&lt;string&gt; [default: max-bytes-live] </code>
 </span>
</dt>
<dd>
<p>At the end of the run, DHAT sorts the accumulated
 allocation points according to some metric, and shows the
 highest scoring entries. <code class="varname">--sort-by</code>
 selects the metric used for sorting:</p>
<p><code class="varname">max-bytes-live </code> maximum live bytes [default]</p>
<p><code class="varname">tot-bytes-allocd </code> bytes allocated in total (turnover)</p>
<p><code class="varname">max-blocks-live </code> maximum live blocks</p>
<p><code class="varname">tot-blocks-allocd </code> blocks allocated in total (turnover)</p>
<p>This controls the order in which allocation points are
 displayed. You can choose to look at allocation points with
 the highest number of live bytes, the highest total byte turnover,
 the highest number of live blocks, or the highest total block
 turnover. These give usefully different pictures of program behaviour.
 For example, sorting by maximum live blocks tends to show up allocation
 points creating large numbers of small objects.</p>
</dd>
</dl>
</div>
<p>One important point to note is that each allocation stack counts
as a separate allocation point. Because stacks by default have 12
frames, this tends to spread data out over multiple allocation points.
You may want to use the flag <code class="option">--num-callers=4</code>,
or some similarly small number, to reduce the spreading.</p>
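<p>Putting the options together, the following Python sketch builds a
DHAT command line and rejects unknown sort metrics before Valgrind is
ever started. It is illustrative only: the helper name and the program
path <code class="computeroutput">./myprog</code> are hypothetical, and
the sketch prints the command rather than running it.</p>
<pre class="screen">
```python
# The four metrics accepted by --sort-by, as listed above.
SORT_METRICS = ("max-bytes-live", "tot-bytes-allocd",
                "max-blocks-live", "tot-blocks-allocd")


def dhat_command(program, show_top_n=10, sort_by="max-bytes-live",
                 num_callers=None):
    """Build an argument list for a DHAT run (hypothetical helper)."""
    if sort_by not in SORT_METRICS:
        raise ValueError(f"unknown --sort-by metric: {sort_by}")
    argv = ["valgrind", "--tool=exp-dhat",
            f"--show-top-n={show_top_n}", f"--sort-by={sort_by}"]
    if num_callers is not None:
        # Fewer stack frames means fewer distinct allocation points.
        argv.append(f"--num-callers={num_callers}")
    argv.append(program)
    return argv


# "./myprog" is a placeholder for the program being profiled.
print(" ".join(dhat_command("./myprog", show_top_n=300,
                            sort_by="max-blocks-live", num_callers=4)))
```
</pre>
<p>The resulting list can be passed as it stands to Python's
<code class="computeroutput">subprocess.run</code> if you want to launch
the run from a script.</p>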
</div>
</div>
<div>
<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
<tr>
<td rowspan="2" width="40%" align="left">
<a accesskey="p" href="ms-manual.html">&lt;&lt; 9. Massif: a heap profiler</a> </td>
<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
<td rowspan="2" width="40%" align="right"> <a accesskey="n" href="sg-manual.html">11. SGCheck: an experimental stack and global array overrun detector &gt;&gt;</a>
</td>
</tr>
<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
</table>
</div>
</body>
</html>