| 1 | <html> |
| 2 | <head> |
| 3 | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
| 4 | <title>6. Callgrind: a call-graph generating cache and branch prediction profiler</title> |
| 5 | <link rel="stylesheet" type="text/css" href="vg_basic.css"> |
| 6 | <meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> |
| 7 | <link rel="home" href="index.html" title="Valgrind Documentation"> |
| 8 | <link rel="up" href="manual.html" title="Valgrind User Manual"> |
| 9 | <link rel="prev" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler"> |
| 10 | <link rel="next" href="hg-manual.html" title="7. Helgrind: a thread error detector"> |
| 11 | </head> |
| 12 | <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| 13 | <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr> |
| 14 | <td width="22px" align="center" valign="middle"><a accesskey="p" href="cg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td> |
| 15 | <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td> |
| 16 | <td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td> |
| 17 | <th align="center" valign="middle">Valgrind User Manual</th> |
| 18 | <td width="22px" align="center" valign="middle"><a accesskey="n" href="hg-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td> |
| 19 | </tr></table></div> |
| 20 | <div class="chapter"> |
| 21 | <div class="titlepage"><div><div><h1 class="title"> |
| 22 | <a name="cl-manual"></a>6. Callgrind: a call-graph generating cache and branch prediction profiler</h1></div></div></div> |
| 23 | <div class="toc"> |
| 24 | <p><b>Table of Contents</b></p> |
| 25 | <dl class="toc"> |
| 26 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.use">6.1. Overview</a></span></dt> |
| 27 | <dd><dl> |
| 28 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.functionality">6.1.1. Functionality</a></span></dt> |
| 29 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.basics">6.1.2. Basic Usage</a></span></dt> |
| 30 | </dl></dd> |
| 31 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.usage">6.2. Advanced Usage</a></span></dt> |
| 32 | <dd><dl> |
| 33 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.dumps">6.2.1. Multiple profiling dumps from one program run</a></span></dt> |
| 34 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.limits">6.2.2. Limiting the range of collected events</a></span></dt> |
| 35 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.busevents">6.2.3. Counting global bus events</a></span></dt> |
| 36 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.cycles">6.2.4. Avoiding cycles</a></span></dt> |
| 37 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.forkingprograms">6.2.5. Forking Programs</a></span></dt> |
| 38 | </dl></dd> |
| 39 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.options">6.3. Callgrind Command-line Options</a></span></dt> |
| 40 | <dd><dl> |
| 41 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.creation">6.3.1. Dump creation options</a></span></dt> |
| 42 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.activity">6.3.2. Activity options</a></span></dt> |
| 43 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.collection">6.3.3. Data collection options</a></span></dt> |
| 44 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.separation">6.3.4. Cost entity separation options</a></span></dt> |
| 45 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.simulation">6.3.5. Simulation options</a></span></dt> |
| 46 | <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.cachesimulation">6.3.6. Cache simulation options</a></span></dt> |
| 47 | </dl></dd> |
| 48 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.monitor-commands">6.4. Callgrind Monitor Commands</a></span></dt> |
| 49 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.clientrequests">6.5. Callgrind specific client requests</a></span></dt> |
| 50 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_annotate-options">6.6. callgrind_annotate Command-line Options</a></span></dt> |
| 51 | <dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_control-options">6.7. callgrind_control Command-line Options</a></span></dt> |
| 52 | </dl> |
| 53 | </div> |
| 54 | <p>To use this tool, you must specify |
| 55 | <code class="option">--tool=callgrind</code> on the |
| 56 | Valgrind command line.</p> |
| 57 | <div class="sect1"> |
| 58 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 59 | <a name="cl-manual.use"></a>6.1. Overview</h2></div></div></div> |
| 60 | <p>Callgrind is a profiling tool that records the call history among |
| 61 | functions in a program's run as a call-graph. |
| 62 | By default, the collected data consists of |
| 63 | the number of instructions executed, their relationship |
| 64 | to source lines, the caller/callee relationship between functions, |
| 65 | and the numbers of such calls. |
| 66 | Optionally, cache simulation and/or branch prediction (similar to Cachegrind) |
| 67 | can produce further information about the runtime behavior of an application. |
| 68 | </p> |
| 69 | <p>The profile data is written out to a file at program |
| 70 | termination. For presentation of the data, and interactive control |
| 71 | of the profiling, two command line tools are provided:</p> |
| 72 | <div class="variablelist"><dl class="variablelist"> |
| 73 | <dt><span class="term"><span class="command"><strong>callgrind_annotate</strong></span></span></dt> |
| 74 | <dd> |
| 75 | <p>This command reads in the profile data, and prints a |
| 76 | sorted list of functions, optionally with source annotation.</p> |
| 77 | <p>For graphical visualization of the data, try |
| 78 | <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>, which is a KDE/Qt based |
| 79 | GUI that makes it easy to navigate the large amount of data that |
| 80 | Callgrind produces.</p> |
| 81 | </dd> |
| 82 | <dt><span class="term"><span class="command"><strong>callgrind_control</strong></span></span></dt> |
| 83 | <dd><p>This command enables you to interactively observe and control |
| 84 | the status of a program currently running under Callgrind's control, |
| 85 | without stopping the program. You can get statistics information as |
| 86 | well as the current stack trace, and you can request zeroing of counters |
| 87 | or dumping of profile data.</p></dd> |
| 88 | </dl></div> |
| 89 | <div class="sect2"> |
| 90 | <div class="titlepage"><div><div><h3 class="title"> |
| 91 | <a name="cl-manual.functionality"></a>6.1.1. Functionality</h3></div></div></div> |
| 92 | <p>Cachegrind collects flat profile data: event counts (data reads, |
| 93 | cache misses, etc.) are attributed directly to the function they |
| 94 | occurred in. This cost attribution mechanism is |
| 95 | called <span class="emphasis"><em>self</em></span> or <span class="emphasis"><em>exclusive</em></span> |
| 96 | attribution.</p> |
| 97 | <p>Callgrind extends this functionality by propagating costs |
| 98 | across function call boundaries. If function <code class="function">foo</code> calls |
| 99 | <code class="function">bar</code>, the costs from <code class="function">bar</code> are added into |
| 100 | <code class="function">foo</code>'s costs. When applied to the program as a whole, |
| 101 | this builds up a picture of so called <span class="emphasis"><em>inclusive</em></span> |
| 102 | costs, that is, where the cost of each function includes the costs of |
| 103 | all functions it called, directly or indirectly.</p> |
| 104 | <p>As an example, the inclusive cost of |
| 105 | <code class="function">main</code> should be almost 100 percent |
| 106 | of the total program cost. Because of costs arising before |
| 107 | <code class="function">main</code> is run, such as |
| 108 | initialization of the run time linker and construction of global C++ |
| 109 | objects, the inclusive cost of <code class="function">main</code> |
| 110 | is not exactly 100 percent of the total program cost.</p> |
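<p>The relationship between self and inclusive cost can be sketched in a few lines of C. This is only an illustration, not Callgrind's implementation: the <code class="computeroutput">fn_node</code> structure, the <code class="computeroutput">example_graph</code> data and the cost values are made up, and the sketch assumes a call tree in which every function has a single caller and no recursion occurs.</p>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* One function in a hypothetical call tree (illustrative names and numbers). */
struct fn_node {
    const char   *name;
    unsigned long self_cost;            /* events that occurred in the function itself */
    int           callees[MAX_CALLEES]; /* indices of the functions it calls */
    int           n_callees;
};

/* main calls foo, foo calls bar. */
static const struct fn_node example_graph[] = {
    { "main", 10, { 1 }, 1 },
    { "foo",  30, { 2 }, 1 },
    { "bar",  60, { 0 }, 0 },
};

/* Inclusive cost = self cost plus the inclusive cost of every callee.
   Valid only for acyclic graphs: on a cycle this recursion would not end. */
unsigned long inclusive_cost(const struct fn_node *graph, int fn)
{
    assert(graph != NULL && fn >= 0);
    unsigned long total = graph[fn].self_cost;
    for (int i = 0; i < graph[fn].n_callees; i++)
        total += inclusive_cost(graph, graph[fn].callees[i]);
    return total;
}
```

<p>With these made-up numbers, <code class="computeroutput">inclusive_cost(example_graph, 0)</code> yields 100 for <code class="function">main</code>, of which only 10 is self cost.</p>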
| 111 | <p>Together with the call graph, this allows you to find the |
| 112 | specific call chains starting from |
| 113 | <code class="function">main</code> in which the majority of the |
| 114 | program's costs occur. Caller/callee cost attribution is also useful |
| 115 | for profiling functions called from multiple call sites, and where |
| 116 | optimization opportunities depend on changing code in the callers, in |
| 117 | particular by reducing the call count.</p> |
| 118 | <p>Callgrind's cache simulation is based on that of Cachegrind. |
| 119 | Read the documentation for <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a> first. The material |
| 120 | below describes the features supported in addition to Cachegrind's |
| 121 | features.</p> |
| 122 | <p>Callgrind's ability to detect function calls and returns depends |
| 123 | on the instruction set of the platform it is run on. It works best on |
| 124 | x86 and amd64, and unfortunately currently does not work so well on |
| 125 | PowerPC, ARM, Thumb or MIPS code. This is because there are no explicit |
| 126 | call or return instructions in these instruction sets, so Callgrind |
| 127 | has to rely on heuristics to detect calls and returns.</p> |
| 128 | </div> |
| 129 | <div class="sect2"> |
| 130 | <div class="titlepage"><div><div><h3 class="title"> |
| 131 | <a name="cl-manual.basics"></a>6.1.2. Basic Usage</h3></div></div></div> |
| 132 | <p>As with Cachegrind, you probably want to compile with debugging info |
| 133 | (the <code class="option">-g</code> option) and with optimization turned on.</p> |
| 134 | <p>To start a profile run for a program, execute: |
| 135 | </p> |
| 136 | <pre class="screen">valgrind --tool=callgrind [callgrind options] your-program [program options]</pre> |
| 137 | <p> |
| 138 | </p> |
| 139 | <p>While the simulation is running, you can observe execution with: |
| 140 | </p> |
| 141 | <pre class="screen">callgrind_control -b</pre> |
| 142 | <p> |
| 143 | This will print out the current backtrace. To annotate the backtrace with |
| 144 | event counts, run |
| 145 | </p> |
| 146 | <pre class="screen">callgrind_control -e -b</pre> |
| 147 | <p> |
| 148 | </p> |
| 149 | <p>After program termination, a profile data file named |
| 150 | <code class="computeroutput">callgrind.out.&lt;pid&gt;</code> |
| 151 | is generated, where <span class="emphasis"><em>pid</em></span> is the process ID |
| 152 | of the program being profiled. |
| 153 | The data file contains information about the calls made in the |
| 154 | program among the functions executed, together with |
| 155 | <span class="command"><strong>Instruction Read</strong></span> (Ir) event counts.</p> |
| 156 | <p>To generate a function-by-function summary from the profile |
| 157 | data file, use |
| 158 | </p> |
| 159 | <pre class="screen">callgrind_annotate [options] callgrind.out.&lt;pid&gt;</pre> |
| 160 | <p> |
| 161 | This summary is similar to the output you get from a Cachegrind |
| 162 | run with cg_annotate: functions are listed in order of their |
| 163 | exclusive cost, and the exclusive costs are the ones that are |
| 164 | shown. |
| 165 | Two options are important for the additional features of |
| 166 | Callgrind:</p> |
| 167 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 168 | <li class="listitem"><p><code class="option">--inclusive=yes</code>: Instead of using |
| 169 | exclusive cost of functions as sorting order, use and show |
| 170 | inclusive cost.</p></li> |
| 171 | <li class="listitem"><p><code class="option">--tree=both</code>: Interleave into the |
| 172 | top level list of functions, information on the callers and the callees |
| 173 | of each function. In these lines, which represent executed |
| 174 | calls, the cost gives the number of events spent in the call. |
| 175 | Indented, above each function, there is the list of callers, |
| 176 | and below, the list of callees. The sum of events in calls to |
| 177 | a given function (caller lines), as well as the sum of events in |
| 178 | calls from the function (callee lines) together with the self |
| 179 | cost, gives the total inclusive cost of the function.</p></li> |
| 180 | </ul></div> |
| 181 | <p>Use <code class="option">--auto=yes</code> to get annotated source code |
| 182 | for all relevant functions for which the source can be found. In |
| 183 | addition to source annotation as produced by |
| 184 | <code class="computeroutput">cg_annotate</code>, you will see the |
| 185 | annotated call sites with call counts. For all other options, |
| 186 | consult the (Cachegrind) documentation for |
| 187 | <code class="computeroutput">cg_annotate</code>. |
| 188 | </p> |
| 189 | <p>For a better call graph browsing experience, it is highly recommended |
| 190 | to use <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>. |
| 191 | If your code |
| 192 | has a significant fraction of its cost in <span class="emphasis"><em>cycles</em></span> (sets |
| 193 | of functions calling each other in a recursive manner), you have to |
| 194 | use KCachegrind, as <code class="computeroutput">callgrind_annotate</code> |
| 195 | currently does not do any cycle detection, which is important to get correct |
| 196 | results in this case.</p> |
| 197 | <p>If you are additionally interested in measuring the |
| 198 | cache behavior of your program, use Callgrind with the option |
| 199 | <code class="option"><a class="xref" href="cl-manual.html#clopt.cache-sim">--cache-sim</a>=yes</code>. For |
| 200 | branch prediction simulation, use <code class="option"><a class="xref" href="cl-manual.html#clopt.branch-sim">--branch-sim</a>=yes</code>. |
| 201 | Expect a further slowdown by approximately a factor of 2.</p> |
| 202 | <p>If the program section you want to profile is somewhere in the |
| 203 | middle of the run, it is beneficial to |
| 204 | <span class="emphasis"><em>fast forward</em></span> to this section without any |
| 205 | profiling, and then enable profiling. This is achieved by using |
| 206 | the command line option |
| 207 | <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code> |
| 208 | and running, in a shell: |
| 209 | <code class="computeroutput">callgrind_control -i on</code> just before the |
| 210 | interesting code section is executed. To exactly specify |
| 211 | the code position where profiling should start, use the client request |
| 212 | <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code>.</p> |
| 213 | <p>If you want to be able to see assembly code level annotation, specify |
| 214 | <code class="option"><a class="xref" href="cl-manual.html#opt.dump-instr">--dump-instr</a>=yes</code>. This will produce |
| 215 | profile data at instruction granularity. Note that the resulting profile |
| 216 | data |
| 217 | can only be viewed with KCachegrind. For assembly annotation, it also is |
| 218 | interesting to see more details of the control flow inside of functions, |
| 219 | i.e. (conditional) jumps. This will be collected by further specifying |
| 220 | <code class="option"><a class="xref" href="cl-manual.html#opt.collect-jumps">--collect-jumps</a>=yes</code>.</p> |
| 221 | </div> |
| 222 | </div> |
| 223 | <div class="sect1"> |
| 224 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 225 | <a name="cl-manual.usage"></a>6.2. Advanced Usage</h2></div></div></div> |
| 226 | <div class="sect2"> |
| 227 | <div class="titlepage"><div><div><h3 class="title"> |
| 228 | <a name="cl-manual.dumps"></a>6.2.1. Multiple profiling dumps from one program run</h3></div></div></div> |
| 229 | <p>Sometimes you are not interested in characteristics of a full |
| 230 | program run, but only of a small part of it, for example execution of one |
| 231 | algorithm. If there are multiple algorithms, or one algorithm |
| 232 | running with different input data, it may even be useful to get different |
| 233 | profile information for different parts of a single program run.</p> |
| 234 | <p>Profile data files have names of the form |
| 235 | </p> |
| 236 | <pre class="screen"> |
| 237 | callgrind.out.<span class="emphasis"><em>pid</em></span>.<span class="emphasis"><em>part</em></span>-<span class="emphasis"><em>threadID</em></span> |
| 238 | </pre> |
| 239 | <p> |
| 240 | </p> |
| 241 | <p>where <span class="emphasis"><em>pid</em></span> is the PID of the running |
| 242 | program, <span class="emphasis"><em>part</em></span> is a number incremented on each |
| 243 | dump (".part" is skipped for the dump at program termination), and |
| 244 | <span class="emphasis"><em>threadID</em></span> is a thread identification |
| 245 | ("-threadID" is only used if you request dumps of individual |
| 246 | threads with <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>).</p> |
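<p>The naming scheme can be sketched as a small C helper. The function <code class="computeroutput">dump_file_name</code> and its conventions are hypothetical: a <code class="computeroutput">part</code> of 0 stands for the dump at program termination, a <code class="computeroutput">thread_id</code> of 0 stands for a run without <code class="option">--separate-threads=yes</code>, and the numbers are printed as plain decimals, since any padding Callgrind applies is not specified here.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Writes "callgrind.out.<pid>[.<part>][-<threadID>]" into buf.
   Returns the name length, or -1 if buf is too small. */
int dump_file_name(char *buf, size_t size, long pid, int part, int thread_id)
{
    assert(buf != NULL && size > 0);

    char part_str[16] = "";     /* stays empty for the dump at termination */
    char thread_str[16] = "";   /* stays empty without per-thread dumps */

    if (part > 0)
        snprintf(part_str, sizeof part_str, ".%d", part);
    if (thread_id > 0)
        snprintf(thread_str, sizeof thread_str, "-%d", thread_id);

    int n = snprintf(buf, size, "callgrind.out.%ld%s%s", pid, part_str, thread_str);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

<p>For example, a PID of 1234 with no part and no thread ID gives <code class="computeroutput">callgrind.out.1234</code>, while part 2 of thread 3 gives <code class="computeroutput">callgrind.out.1234.2-3</code> under the assumptions above.</p>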
| 247 | <p>There are different ways to generate multiple profile dumps |
| 248 | while a program is running under Callgrind's supervision. Nevertheless, |
| 249 | all methods trigger the same action, which is "dump all profile |
| 250 | information since the last dump or program start, and zero cost |
| 251 | counters afterwards". To allow for zeroing cost counters without |
| 252 | dumping, there is a second action "zero all cost counters now". |
| 253 | The different methods are:</p> |
| 254 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 255 | <li class="listitem"><p><span class="command"><strong>Dump on program termination.</strong></span> |
| 256 | This method is the standard way and doesn't need any special |
| 257 | action on your part.</p></li> |
| 258 | <li class="listitem"> |
| 259 | <p><span class="command"><strong>Spontaneous, interactive dumping.</strong></span> Use |
| 260 | </p> |
| 261 | <pre class="screen">callgrind_control -d [hint [PID/Name]]</pre> |
| 262 | <p> to |
| 263 | request the dumping of profile information of the supervised |
| 264 | application with PID or Name. <span class="emphasis"><em>hint</em></span> is an |
| 265 | arbitrary string you can optionally specify to later be able to |
| 266 | distinguish profile dumps. The control program will not terminate |
| 267 | before the dump is completely written. Note that the application |
| 268 | must be actively running for detection of the dump command. So, |
| 269 | for a GUI application, resize the window, or for a server, send a |
| 270 | request.</p> |
| 271 | <p>If you are using <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a> |
| 272 | for browsing of profile information, you can use the toolbar |
| 273 | button <span class="command"><strong>Force dump</strong></span>. This will request a dump |
| 274 | and trigger a reload after the dump is written.</p> |
| 275 | </li> |
| 276 | <li class="listitem"><p><span class="command"><strong>Periodic dumping after execution of a specified |
| 277 | number of basic blocks</strong></span>. For this, use the command line |
| 278 | option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-every-bb">--dump-every-bb</a>=count</code>. |
| 279 | </p></li> |
| 280 | <li class="listitem"> |
| 281 | <p><span class="command"><strong>Dumping at enter/leave of specified functions.</strong></span> |
| 282 | Use the |
| 283 | option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code> |
| 284 | and <code class="option"><a class="xref" href="cl-manual.html#opt.dump-after">--dump-after</a>=function</code>. |
| 285 | To zero cost counters before entering a function, use |
| 286 | <code class="option"><a class="xref" href="cl-manual.html#opt.zero-before">--zero-before</a>=function</code>.</p> |
| 287 | <p>You can specify these options multiple times for different |
| 288 | functions. Function specifications support wildcards: e.g. use |
| 289 | <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>='foo*'</code> to |
| 290 | generate dumps before entering any function starting with |
| 291 | <span class="emphasis"><em>foo</em></span>.</p> |
| 292 | </li> |
| 293 | <li class="listitem"><p><span class="command"><strong>Program controlled dumping.</strong></span> |
| 294 | Insert |
| 295 | <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.dump-stats">CALLGRIND_DUMP_STATS</a>;</code> |
| 296 | at the position in your code where you want a profile dump to happen. Use |
| 297 | <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> to only |
| 298 | zero profile counters. |
| 299 | See <a class="xref" href="cl-manual.html#cl-manual.clientrequests" title="6.5. Callgrind specific client requests">Client request reference</a> for more information on |
| 300 | Callgrind specific client requests.</p></li> |
| 301 | </ul></div> |
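<p>Program controlled dumping might look like the following C sketch. The two workloads and the function <code class="computeroutput">run_two_phases</code> are invented for illustration. The sketch assumes the client request header is installed as <code class="computeroutput">&lt;valgrind/callgrind.h&gt;</code> and falls back to empty macros when it is not, so that the code still builds on machines without Valgrind.</p>

```c
#include <assert.h>

/* Use the real client requests if the header is available (assumed path),
   otherwise define no-op stand-ins so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
#  define CALLGRIND_DUMP_STATS ((void)0)
#  define CALLGRIND_ZERO_STATS ((void)0)
#endif

/* Two hypothetical workloads standing in for "different algorithms". */
static unsigned long sum_to(unsigned long n)
{
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i;
    return s;
}

static unsigned long sum_of_squares(unsigned long n)
{
    unsigned long s = 0;
    for (unsigned long i = 1; i <= n; i++)
        s += i * i;
    return s;
}

unsigned long run_two_phases(void)
{
    CALLGRIND_ZERO_STATS;                  /* discard costs accumulated so far */
    unsigned long a = sum_to(1000);
    CALLGRIND_DUMP_STATS;                  /* dump covers phase A, then counters are zeroed */
    unsigned long b = sum_of_squares(100);
    CALLGRIND_DUMP_STATS;                  /* dump covers phase B only */
    assert(a == 500500UL && b == 338350UL);
    return a + b;
}
```

<p>Outside of Valgrind the client requests do nothing, so the same binary can be run natively or under <code class="option">--tool=callgrind</code>.</p>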
| 302 | <p>If you are running a multi-threaded application and specify the |
| 303 | command line option <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>, |
| 304 | every thread will be profiled on its own and will create its own |
| 305 | profile dump. Thus, the last two methods will only generate one dump |
| 306 | of the currently running thread. With the other methods, you will get |
| 307 | multiple dumps (one for each thread) on a dump request.</p> |
| 308 | </div> |
| 309 | <div class="sect2"> |
| 310 | <div class="titlepage"><div><div><h3 class="title"> |
| 311 | <a name="cl-manual.limits"></a>6.2.2. Limiting the range of collected events</h3></div></div></div> |
| 312 | <p>By default, whenever events are happening (such as an |
| 313 | instruction execution or cache hit/miss), Callgrind is aggregating |
| 314 | them into event counters. However, you may be interested only in |
| 315 | what is happening within a given function or starting from a given |
| 316 | program phase. To this end, you can disable event aggregation for |
| 317 | uninteresting program parts. While attribution of events to |
| 318 | functions as well as producing separate output per program phase |
| 319 | can be done by other means (see previous section), there are two |
| 320 | benefits to disabling aggregation. First, this is very |
| 321 | fine-grained (e.g. just for a loop within a function). Second, |
| 322 | disabling event aggregation for complete program phases allows you to |
| 323 | switch off time-consuming cache simulation and allows Callgrind to |
| 324 | progress at much higher speed with a slowdown of around a factor of 2 |
| 325 | (identical to <code class="computeroutput">valgrind |
| 326 | --tool=none</code>). |
| 327 | </p> |
| 328 | <p>There are two aspects which influence whether Callgrind is |
| 329 | aggregating events at some point in time of program execution. |
| 330 | First, there is the <span class="emphasis"><em>collection state</em></span>. If this |
| 331 | is off, no aggregation will be done. By changing the collection |
| 332 | state, you can control event aggregation at a very fine |
| 333 | granularity. However, there is not much difference in regard to |
| 334 | execution speed of Callgrind. By default, collection is switched |
| 335 | on, but can be disabled by different means (see below). Second, |
| 336 | there is the <span class="emphasis"><em>instrumentation mode</em></span> in which |
| 337 | Callgrind is running. This mode either can be on or off. If |
| 338 | instrumentation is off, no observation of actions in the program |
| 339 | will be done and thus, no actions will be forwarded to the |
| 340 | simulator which could trigger events. In the end, no events will |
| 341 | be aggregated. The huge benefit is the much higher speed with |
| 342 | instrumentation switched off. However, this only should be used |
| 343 | with care and in a coarse fashion: every mode change resets the |
| 344 | simulator state (i.e. whether a memory block is cached or not) and |
| 345 | flushes Valgrind's internal cache of instrumented code blocks, |
| 346 | resulting in a latency penalty at switching time. Also, cache |
| 347 | simulator results directly after switching on instrumentation will |
| 348 | be skewed due to identified cache misses which would not happen in |
| 349 | reality (if you care about this warm-up effect, you should make |
| 350 | sure to temporarily have collection state switched off directly |
| 351 | after turning instrumentation mode on). However, switching |
| 352 | instrumentation state is very useful to skip larger program phases |
| 353 | such as an initialization phase. By default, instrumentation is |
| 354 | switched on, but as with the collection state, can be changed by |
| 355 | various means. |
| 356 | </p> |
| 357 | <p>Callgrind can start with instrumentation mode switched off by |
| 358 | specifying |
| 359 | option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>. |
| 360 | Afterwards, instrumentation can be controlled in two ways: first, |
| 361 | interactively with: </p> |
| 362 | <pre class="screen">callgrind_control -i on</pre> |
| 363 | <p> (and |
| 364 | switching off again by specifying "off" instead of "on"). Second, |
| 365 | instrumentation state can be programmatically changed with the |
| 366 | macros <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a>;</code> |
| 367 | and <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</a>;</code>. |
| 368 | </p> |
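<p>A minimal C sketch of programmatic instrumentation control follows. It is meant to be run with <code class="option">--instr-atstart=no</code>, so that the set-up loop executes uninstrumented. The function names and the workload are invented, and the header path <code class="computeroutput">&lt;valgrind/callgrind.h&gt;</code> is an assumption about the installation; without the header the macros fall back to no-ops.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Use the real client requests if the header is available (assumed path),
   otherwise define no-op stand-ins so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
#  define CALLGRIND_START_INSTRUMENTATION ((void)0)
#  define CALLGRIND_STOP_INSTRUMENTATION  ((void)0)
#endif

/* Hypothetical hot function: sums all bytes of a buffer. */
static unsigned long byte_sum(const unsigned char *data, size_t len)
{
    assert(data != NULL);
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

unsigned long profile_hot_region_only(void)
{
    /* Set-up phase: skipped at full speed when started with --instr-atstart=no. */
    static unsigned char buffer[4096];
    for (size_t i = 0; i < sizeof buffer; i++)
        buffer[i] = (unsigned char)(i & 0xff);

    CALLGRIND_START_INSTRUMENTATION;   /* coarse switch: do this once, not in a loop */
    unsigned long result = byte_sum(buffer, sizeof buffer);
    CALLGRIND_STOP_INSTRUMENTATION;

    return result;
}
```

<p>Because every mode change resets the simulator state, the switch is placed around the whole region of interest rather than inside the loop.</p>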
| 369 | <p>Similarly, the collection state at program start can be |
| 370 | switched off |
| 371 | by <code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a>=no</code>. During |
| 372 | execution, it can be controlled programmatically with the |
| 373 | macro <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code>. |
| 374 | Further, you can limit event collection to a specific function by |
| 375 | using <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a>=function</code>. |
| 376 | This will toggle the collection state on entering and leaving the |
| 377 | specified function. When this option is in effect, the default |
| 378 | collection state at program start is "off". Only events happening |
| 379 | while running inside of the given function will be |
| 380 | collected. Recursive calls of the given function do not trigger |
| 381 | any action. This option can be given multiple times to specify |
| 382 | different functions of interest.</p> |
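<p>Toggling the collection state from within the program can be sketched as follows. The example assumes the run was started with <code class="option">--collect-atstart=no</code>, so the first toggle switches collection on and the second switches it off again. The function name and loops are illustrative, and the macro falls back to a no-op when <code class="computeroutput">&lt;valgrind/callgrind.h&gt;</code> (an assumed install path) is not available.</p>

```c
#include <assert.h>

/* Use the real client request if the header is available (assumed path),
   otherwise define a no-op stand-in so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif

unsigned long collect_only_second_loop(void)
{
    unsigned long warmup = 0, measured = 0;

    /* Not collected when the run starts with collection switched off. */
    for (unsigned long i = 0; i < 1000; i++)
        warmup += i;

    CALLGRIND_TOGGLE_COLLECT;          /* off -> on */
    for (unsigned long i = 0; i < 1000; i++)
        measured += 2 * i;
    CALLGRIND_TOGGLE_COLLECT;          /* on -> off */

    assert(warmup == 499500UL);
    return warmup + measured;
}
```

<p>Unlike instrumentation mode, the collection state can be toggled at this fine granularity without resetting the simulator, although it does not make the uncollected parts run faster.</p>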
| 383 | </div> |
| 384 | <div class="sect2"> |
| 385 | <div class="titlepage"><div><div><h3 class="title"> |
| 386 | <a name="cl-manual.busevents"></a>6.2.3. Counting global bus events</h3></div></div></div> |
| 387 | <p>For access to shared data among threads in |
| 388 | multithreaded code, synchronization is required to avoid race conditions. |
| 389 | Synchronization primitives are usually implemented via atomic instructions. |
| 390 | However, excessive use of such instructions can lead to performance |
| 391 | issues.</p> |
| 392 | <p>To enable analysis of this problem, Callgrind optionally can count |
| 393 | the number of atomic instructions executed. More precisely, for x86/x86_64, |
| 394 | these are instructions using a lock prefix. For architectures supporting |
| 395 | LL/SC, these are the number of SC instructions executed. For both, the term |
| 396 | "global bus events" is used.</p> |
| 397 | <p>The short name of the event type used for global bus events is "Ge". |
| 398 | To count global bus events, use <code class="option"><a class="xref" href="cl-manual.html#clopt.collect-bus">--collect-bus</a>=yes</code>. |
| 399 | </p> |
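<p>As a rough illustration of code that would produce such events, the following C11 sketch performs one atomic increment per loop iteration. On x86 and amd64, compilers typically implement <code class="computeroutput">atomic_fetch_add</code> with a lock-prefixed instruction; that mapping is an assumption about the compiler, and the function name is made up.</p>

```c
#include <assert.h>
#include <stdatomic.h>

/* Counter that several threads could share; only one thread is used here. */
static atomic_ulong shared_counter;

unsigned long count_with_atomics(unsigned long iterations)
{
    atomic_store(&shared_counter, 0);

    /* Each increment is an atomic read-modify-write operation. */
    for (unsigned long i = 0; i < iterations; i++)
        atomic_fetch_add(&shared_counter, 1);

    unsigned long result = atomic_load(&shared_counter);
    assert(result == iterations);
    return result;
}
```

<p>Running such code under Callgrind with <code class="option">--collect-bus=yes</code> should then show the atomic operations in the "Ge" column for this function.</p>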
| 400 | </div> |
| 401 | <div class="sect2"> |
| 402 | <div class="titlepage"><div><div><h3 class="title"> |
| 403 | <a name="cl-manual.cycles"></a>6.2.4. Avoiding cycles</h3></div></div></div> |
| 404 | <p>Informally speaking, a cycle is a group of functions which |
| 405 | call each other in a recursive way.</p> |
| 406 | <p>Formally speaking, a cycle is a nonempty set S of functions, |
| 407 | such that for every pair of functions F and G in S, it is possible |
| 408 | to call from F to G (possibly via intermediate functions) and also |
| 409 | from G to F. Furthermore, S must be maximal -- that is, be the |
| 410 | largest set of functions satisfying this property. For example, if |
| 411 | a third function H is called from inside S and calls back into S, |
| 412 | then H is also part of the cycle and should be included in S.</p> |
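<p>The formal definition can be turned into a small C sketch. It computes which functions can reach each other with Warshall's transitive closure algorithm, which is chosen here for brevity and is not necessarily what KCachegrind uses. The five-function example program (main, a, b, h, leaf) and all function names are made up.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_FUNCS 5   /* 0 = main, 1 = a, 2 = b, 3 = h, 4 = leaf */

/* Record the direct calls of the example: a and b call each other,
   b calls h, and h calls back into a. */
void build_example_calls(bool calls[N_FUNCS][N_FUNCS])
{
    memset(calls, 0, sizeof(bool) * N_FUNCS * N_FUNCS);
    calls[0][1] = true;   /* main -> a    */
    calls[1][2] = true;   /* a    -> b    */
    calls[2][1] = true;   /* b    -> a    */
    calls[2][3] = true;   /* b    -> h    */
    calls[3][1] = true;   /* h    -> a    */
    calls[1][4] = true;   /* a    -> leaf */
}

/* Afterwards reach[f][g] is true if g can be called from f,
   directly or via intermediate functions. */
void transitive_closure(bool reach[N_FUNCS][N_FUNCS])
{
    for (int k = 0; k < N_FUNCS; k++)
        for (int i = 0; i < N_FUNCS; i++)
            for (int j = 0; j < N_FUNCS; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
}

/* F and G belong to the same cycle if each can be reached from the other. */
bool same_cycle(bool reach[N_FUNCS][N_FUNCS], int f, int g)
{
    assert(f >= 0 && f < N_FUNCS && g >= 0 && g < N_FUNCS);
    return reach[f][g] && reach[g][f];
}
```

<p>In this example a, b and h form one cycle, because h is called from inside the cycle and calls back into it, while main and leaf stay outside.</p>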
| 413 | <p>Recursion is quite usual in programs, and therefore, cycles |
| 414 | sometimes appear in the call graph output of Callgrind. However, |
| 415 | the title of this section should raise two questions: What is bad |
| 416 | about cycles which makes you want to avoid them? And: How can |
| 417 | cycles be avoided without changing program code?</p> |
| 418 | <p>Cycles are not bad in themselves, but tend to make performance |
| 419 | analysis of your code harder. This is because inclusive costs |
| 420 | for calls inside of a cycle are meaningless. The definition of |
| 421 | inclusive cost, i.e. self cost of a function plus inclusive cost |
| 422 | of its callees, needs a topological order among functions. For |
| 423 | cycles, this does not hold true: callees of a function in a cycle include |
| 424 | the function itself. Therefore, KCachegrind does cycle detection |
| 425 | and skips visualization of any inclusive cost for calls inside |
| 426 | of cycles. Further, all functions in a cycle are collapsed into artificial |
| 427 | functions with names such as <code class="computeroutput">Cycle 1</code>.</p> |
| 428 | <p>Now, when a program exposes really big cycles (as is |
| 429 | true for some GUI code, or in general code using event or callback based |
| 430 | programming style), you lose the ability to pinpoint |
| 431 | the bottlenecks by following call chains from |
| 432 | <code class="function">main</code>, guided via |
| 433 | inclusive cost. In addition, KCachegrind loses its ability to show |
| 434 | interesting parts of the call graph, as it uses inclusive costs to |
| 435 | cut off uninteresting areas.</p> |
| 436 | <p>Despite the meaninglessness of inclusive costs in cycles, the big |
| 437 | drawback for visualization motivates the possibility to temporarily |
| 438 | switch off cycle detection in KCachegrind, which can lead to |
| 439 | misleading visualization. However, often cycles appear because of |
| 440 | unlucky superposition of independent call chains in a way that |
| 441 | the profile result will see a cycle. Neglecting uninteresting |
| 442 | calls with very small measured inclusive cost would break these |
| 443 | cycles. In such cases, incorrect handling of cycles by not detecting |
| 444 | them still gives meaningful profiling visualization.</p> |
| 445 | <p>Note that <span class="command"><strong>callgrind_annotate</strong></span> currently |
| 446 | does not do any cycle detection at all. For program executions with function |
| 447 | recursion, it can, for example, print nonsense inclusive costs way above 100%.</p> |
| 448 | <p>Having described why cycles are bad for profiling, it is worth |
| 449 | talking about cycle avoidance. The key insight here is that symbols in |
| 450 | the profile data do not have to exactly match the symbols found in the |
| 451 | program. Instead, the symbol name could encode additional information |
| 452 | from the current execution context such as recursion level of the |
| 453 | current function, or even some part of the call chain leading to the |
| 454 | function. While encoding additional information into symbols is |
| 455 | quite capable of avoiding cycles, it has to be used carefully to avoid |
| 456 | symbol explosion. The latter leads to large memory requirements for Callgrind, |
| 457 | possibly to out-of-memory conditions, and to big profile data files.</p> |
| 458 | <p>A further possibility to avoid cycles in Callgrind's profile data |
| 459 | output is to simply leave out given functions in the call graph. Of course, this |
| 460 | also skips any call information from and to an ignored function, and thus can |
| 461 | break a cycle. Candidates for this typically are dispatcher functions in event |
| 462 | driven code. The option to ignore calls to a function is |
| 463 | <code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. Aside from |
| 464 | possibly breaking cycles, this is used in Callgrind to skip |
| 465 | trampoline functions in the PLT sections |
| 466 | for calls to functions in shared libraries. You can see the difference |
| 467 | if you profile with <code class="option"><a class="xref" href="cl-manual.html#opt.skip-plt">--skip-plt</a>=no</code>. |
| 468 | If a call is ignored, its cost events will be propagated to the |
| 469 | enclosing function.</p> |
| 470 | <p>If you have a recursive function, you can distinguish the first |
| 471 | 10 recursion levels by specifying |
| 472 | <code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs-num">--separate-recs10</a>=function</code>. |
| 473 | To do the same for all functions, use |
| 474 | <code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs">--separate-recs</a>=10</code>, but this will |
| 475 | give you much bigger profile data files. In the profile data, you will see |
| 476 | the recursion levels of "func" as different functions with names |
| 477 | "func", "func'2", "func'3" and so on.</p> |
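<p>The naming scheme can be modelled with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the assumption that activations deeper than the configured number of levels share the last name is ours, not something stated above.</p>

```python
def level_name(func, depth, max_levels):
    """Profile symbol for the depth-th nested activation of func (depth >= 1).

    Assumption: activations deeper than max_levels share the last name.
    """
    level = min(depth, max_levels)
    return func if level == 1 else f"{func}'{level}"


def call_edges(func, recursion_depth, max_levels):
    """Caller -> callee edges produced by one recursion of the given depth."""
    names = [level_name(func, d, max_levels)
             for d in range(1, recursion_depth + 1)]
    return list(zip(names, names[1:]))


# Enough levels: every recursion level is its own symbol, so no cycle remains.
edges = call_edges("func", 4, 10)
# Too few levels: the deepest symbol calls itself, so a cycle is still present.
capped = call_edges("func", 4, 2)
print(edges)
print(capped)
```

<p>With enough levels, the recursion unrolls into a chain of distinct symbols. With too few, the deepest symbol still calls itself in this model, so the cycle is only pushed further down.</p>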
| 478 | <p>If you have call chains "A > B > C" and "A > C > B" |
| 479 | in your program, you usually get a "false" cycle "B <> C". Use |
| 480 | <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=B</code> |
| 481 | <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=C</code>, |
| 482 | and functions "B" and "C" will be treated as different functions |
| 483 | depending on the direct caller. Using the apostrophe for appending |
| 484 | this "context" to the function name, you get "A > B'A > C'B" |
| 485 | and "A > C'A > B'C", and there will be no cycle. Use |
| 486 | <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers">--separate-callers</a>=2</code> to get a 2-caller |
| 487 | dependency for all functions. Note that doing this will increase |
| 488 | the size of profile data files.</p> |
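<p>The effect of appending caller context to symbol names can be checked with a small Python model. It is a sketch under stated assumptions: the helper names are hypothetical, and the <code class="computeroutput">depth</code> parameter simply counts how many direct callers are appended to a name; how the numeric argument of the Callgrind options maps to that count is not modelled here.</p>

```python
def context_name(chain, depth):
    """Name of the last function in chain, with up to depth callers appended."""
    func, callers = chain[-1], chain[:-1]
    context = list(reversed(callers))[:depth]
    return "'".join([func] + context)


def profile_edges(chains, depth=0, separated=()):
    """Call edges in the profile when functions in separated carry context."""
    def name(chain):
        if chain[-1] in separated:
            return context_name(chain, depth)
        return chain[-1]

    result = set()
    for chain in chains:
        for i in range(1, len(chain)):
            result.add((name(chain[:i]), name(chain[:i + 1])))
    return result


def has_cycle(edge_set):
    """True if some function can reach itself by following call edges."""
    graph = {}
    for caller, callee in edge_set:
        graph.setdefault(caller, set()).add(callee)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen and reachable(nxt, target, seen | {nxt}):
                return True
        return False

    return any(reachable(node, node, {node}) for node in graph)


chains = [["A", "B", "C"], ["A", "C", "B"]]
plain = profile_edges(chains)
separated = profile_edges(chains, depth=1, separated={"B", "C"})
print(has_cycle(plain), has_cycle(separated))
```

<p>Without context, the edges B > C and C > B form the false cycle. With one caller of context for B and C, the model yields exactly the chains A > B'A > C'B and A > C'A > B'C, and no cycle is left.</p>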
| 489 | </div> |
| 490 | <div class="sect2"> |
| 491 | <div class="titlepage"><div><div><h3 class="title"> |
| 492 | <a name="cl-manual.forkingprograms"></a>6.2.5. Forking Programs</h3></div></div></div> |
| 493 | <p>If your program forks, the child will inherit all the profiling |
| 494 | data that has been gathered for the parent. To start with empty profile |
| 495 | counter values in the child, the client request |
| 496 | <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> |
| 497 | can be inserted into code to be executed by the child, directly after |
| 498 | <code class="computeroutput">fork</code>.</p> |
| 499 | <p>However, you will have to make sure that the output file format string |
| 500 | (controlled by <code class="option">--callgrind-out-file</code>) contains |
| 501 | <code class="option">%p</code> (which is true by default). Otherwise, the |
| 502 | outputs from the parent and child will overwrite each other or will be |
| 503 | intermingled, which almost certainly is not what you want.</p> |
| 504 | <p>You will be able to control the new child independently from |
| 505 | the parent via callgrind_control.</p> |
| 506 | </div> |
| 507 | </div> |
| 508 | <div class="sect1"> |
| 509 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 510 | <a name="cl-manual.options"></a>6.3. Callgrind Command-line Options</h2></div></div></div> |
| 511 | <p> |
| 512 | In the following, options are grouped into classes. |
| 513 | </p> |
| 514 | <p> |
| 515 | Some options allow the specification of a function/symbol name, such as |
| 516 | <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>, or |
| 517 | <code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. All these options |
| 518 | can be specified multiple times for different functions. |
| 519 | In addition, the function specifications are actually patterns: they support |
| 520 | the wildcards '*' (zero or more arbitrary characters) and '?' |
| 521 | (exactly one arbitrary character), similar to file name globbing in the |
| 522 | shell. This feature is especially important for C++, as without wildcards |
| 523 | the function would have to be specified in full, including its |
| 524 | parameter signature. </p> |
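<p>To see which symbols a pattern would select, the two wildcards can be modelled in Python as below. This is a sketch, not Callgrind's matcher: it assumes that a pattern has to match the whole symbol name, and the symbol names used are made up for illustration.</p>

```python
import re


def pattern_to_regex(pattern):
    """Translate '*' and '?' wildcards; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")      # zero or more arbitrary characters
        elif ch == "?":
            parts.append(".")       # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, symbol):
    """Assumption: the pattern must cover the complete symbol name."""
    return pattern_to_regex(pattern).fullmatch(symbol) is not None


# Hypothetical C++ symbol, including its parameter signature.
symbol = "Dispatcher::emit(Event const&, int)"
print(matches("Dispatcher::emit*", symbol))
print(matches("Dispatcher::emit", symbol))
```

<p>The first pattern selects the symbol without spelling out the signature. The second does not match in this model, because the parameter list is part of the symbol name.</p>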
| 525 | <div class="sect2"> |
| 526 | <div class="titlepage"><div><div><h3 class="title"> |
| 527 | <a name="cl-manual.options.creation"></a>6.3.1. Dump creation options</h3></div></div></div> |
| 528 | <p> |
| 529 | These options influence the name and format of the profile data files. |
| 530 | </p> |
| 531 | <div class="variablelist"> |
| 532 | <a name="cl.opts.list.creation"></a><dl class="variablelist"> |
| 533 | <dt> |
| 534 | <a name="opt.callgrind-out-file"></a><span class="term"> |
| 535 | <code class="option">--callgrind-out-file=<file> </code> |
| 536 | </span> |
| 537 | </dt> |
| 538 | <dd><p>Write the profile data to |
| 539 | <code class="computeroutput">file</code> rather than to the default |
| 540 | output file, |
| 541 | <code class="computeroutput">callgrind.out.<pid></code>. The |
| 542 | <code class="option">%p</code> and <code class="option">%q</code> format specifiers |
| 543 | can be used to embed the process ID and/or the contents of an |
| 544 | environment variable in the name, as is the case for the core |
| 545 | option <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>. |
| 546 | When multiple dumps are made, the file name |
| 547 | is modified further; see below.</p></dd> |
| 548 | <dt> |
| 549 | <a name="opt.dump-line"></a><span class="term"> |
| 550 | <code class="option">--dump-line=<no|yes> [default: yes] </code> |
| 551 | </span> |
| 552 | </dt> |
| 553 | <dd><p>This specifies that event counting should be performed at |
| 554 | source line granularity. This allows source annotation for sources |
| 555 | which are compiled with debug information |
| 556 | (<code class="option">-g</code>).</p></dd> |
| 557 | <dt> |
| 558 | <a name="opt.dump-instr"></a><span class="term"> |
| 559 | <code class="option">--dump-instr=<no|yes> [default: no] </code> |
| 560 | </span> |
| 561 | </dt> |
| 562 | <dd><p>This specifies that event counting should be performed at |
| 563 | per-instruction granularity. |
| 564 | This allows for assembly code |
| 565 | annotation. Currently the results can only be |
| 566 | displayed by KCachegrind.</p></dd> |
| 567 | <dt> |
| 568 | <a name="opt.compress-strings"></a><span class="term"> |
| 569 | <code class="option">--compress-strings=<no|yes> [default: yes] </code> |
| 570 | </span> |
| 571 | </dt> |
| 572 | <dd><p>This option influences the output format of the profile data. |
| 573 | It specifies whether strings (file and function names) should be |
| 574 | identified by numbers. This shrinks the file, |
| 575 | but makes it more difficult |
| 576 | for humans to read (which is not recommended in any case).</p></dd> |
| 577 | <dt> |
| 578 | <a name="opt.compress-pos"></a><span class="term"> |
| 579 | <code class="option">--compress-pos=<no|yes> [default: yes] </code> |
| 580 | </span> |
| 581 | </dt> |
| 582 | <dd><p>This option influences the output format of the profile data. |
| 583 | It specifies whether numerical positions are always specified as absolute |
| 584 | values or are allowed to be relative to previous numbers. |
| 585 | This shrinks the file size.</p></dd> |
| 586 | <dt> |
| 587 | <a name="opt.combine-dumps"></a><span class="term"> |
| 588 | <code class="option">--combine-dumps=<no|yes> [default: no] </code> |
| 589 | </span> |
| 590 | </dt> |
| 591 | <dd><p>When enabled and multiple profile data parts are to be |
| 592 | generated, these parts are appended to the same output file. |
| 593 | Not recommended.</p></dd> |
| 594 | </dl> |
| 595 | </div> |
| 596 | </div> |
| 597 | <div class="sect2"> |
| 598 | <div class="titlepage"><div><div><h3 class="title"> |
| 599 | <a name="cl-manual.options.activity"></a>6.3.2. Activity options</h3></div></div></div> |
| 600 | <p> |
| 601 | These options specify when actions relating to event counts are to |
| 602 | be executed. For interactive control use callgrind_control. |
| 603 | </p> |
| 604 | <div class="variablelist"> |
| 605 | <a name="cl.opts.list.activity"></a><dl class="variablelist"> |
| 606 | <dt> |
| 607 | <a name="opt.dump-every-bb"></a><span class="term"> |
| 608 | <code class="option">--dump-every-bb=<count> [default: 0, never] </code> |
| 609 | </span> |
| 610 | </dt> |
| 611 | <dd><p>Dump profile data every <code class="option">count</code> basic blocks. |
| 612 | Whether a dump is needed is only checked when Valgrind's internal |
| 613 | scheduler is run. Therefore, the minimum setting useful is about 100000. |
| 614 | The count is a 64-bit value to make long dump periods possible. |
| 615 | </p></dd> |
| 616 | <dt> |
| 617 | <a name="opt.dump-before"></a><span class="term"> |
| 618 | <code class="option">--dump-before=<function> </code> |
| 619 | </span> |
| 620 | </dt> |
| 621 | <dd><p>Dump when entering <code class="option">function</code>.</p></dd> |
| 622 | <dt> |
| 623 | <a name="opt.zero-before"></a><span class="term"> |
| 624 | <code class="option">--zero-before=<function> </code> |
| 625 | </span> |
| 626 | </dt> |
| 627 | <dd><p>Zero all costs when entering <code class="option">function</code>.</p></dd> |
| 628 | <dt> |
| 629 | <a name="opt.dump-after"></a><span class="term"> |
| 630 | <code class="option">--dump-after=<function> </code> |
| 631 | </span> |
| 632 | </dt> |
| 633 | <dd><p>Dump when leaving <code class="option">function</code>.</p></dd> |
| 634 | </dl> |
| 635 | </div> |
| 636 | </div> |
| 637 | <div class="sect2"> |
| 638 | <div class="titlepage"><div><div><h3 class="title"> |
| 639 | <a name="cl-manual.options.collection"></a>6.3.3. Data collection options</h3></div></div></div> |
| 640 | <p> |
| 641 | These options specify when events are to be aggregated into event counts. |
| 642 | Also see <a class="xref" href="cl-manual.html#cl-manual.limits" title="6.2.2. Limiting the range of collected events">Limiting range of event collection</a>.</p> |
| 643 | <div class="variablelist"> |
| 644 | <a name="cl.opts.list.collection"></a><dl class="variablelist"> |
| 645 | <dt> |
| 646 | <a name="opt.instr-atstart"></a><span class="term"> |
| 647 | <code class="option">--instr-atstart=<yes|no> [default: yes] </code> |
| 648 | </span> |
| 649 | </dt> |
| 650 | <dd> |
| 651 | <p>Specify if you want Callgrind to start simulation and |
| 652 | profiling from the beginning of the program. |
| 653 | When set to <code class="computeroutput">no</code>, |
| 654 | Callgrind will not be able |
| 655 | to collect any information, including calls, but it will have at |
| 656 | most a slowdown of around a factor of 4, which is the minimum Valgrind |
| 657 | overhead. Instrumentation can be interactively enabled via |
| 658 | <code class="computeroutput">callgrind_control -i on</code>.</p> |
| 659 | <p>Note that the resulting call graph will most probably not |
| 660 | contain <code class="function">main</code>, but will contain all the |
| 661 | functions executed after instrumentation was enabled. |
| 662 | Instrumentation can also be programmatically enabled/disabled. See the |
| 663 | Callgrind include file |
| 664 | <code class="computeroutput">callgrind.h</code> for the macro |
| 665 | you have to use in your source code.</p> |
| 666 | <p>For cache |
| 667 | simulation, results will be less accurate when switching on |
| 668 | instrumentation later in the program run, as the simulator starts |
| 669 | with an empty cache at that moment. To cope with this error, switch on |
| 670 | event collection only later, once the simulated cache has warmed up.</p> |
| 671 | </dd> |
| 672 | <dt> |
| 673 | <a name="opt.collect-atstart"></a><span class="term"> |
| 674 | <code class="option">--collect-atstart=<yes|no> [default: yes] </code> |
| 675 | </span> |
| 676 | </dt> |
| 677 | <dd> |
| 678 | <p>Specify whether event collection is enabled at beginning |
| 679 | of the profile run.</p> |
| 680 | <p>To only look at parts of your program, you have two |
| 681 | possibilities:</p> |
| 682 | <div class="orderedlist"><ol class="orderedlist" type="1"> |
| 683 | <li class="listitem"><p>Zero event counters before entering the program part you |
| 684 | want to profile, and dump the event counters to a file after |
| 685 | leaving that program part.</p></li> |
| 686 | <li class="listitem"><p>Switch on/off collection state as needed to only see |
| 687 | event counters happening while inside of the program part you |
| 688 | want to profile.</p></li> |
| 689 | </ol></div> |
| 690 | <p>The second option can be used if the program part you want to |
| 691 | profile is called many times. Option 1, i.e. creating a lot of |
| 692 | dumps, is not practical here.</p> |
| 693 | <p>Collection state can be |
| 694 | toggled at entry and exit of a given function with the |
| 695 | option <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>. If you |
| 696 | use this option, collection |
| 697 | state should be disabled at the beginning. Note that the |
| 698 | specification of <code class="option">--toggle-collect</code> |
| 699 | implicitly sets |
| 700 | <code class="option">--collect-atstart=no</code>.</p> |
| 701 | <p>Collection state can also be toggled by inserting the client request |
| 702 | <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code> |
| 706 | at the needed code positions.</p> |
| 707 | </dd> |
| 708 | <dt> |
| 709 | <a name="opt.toggle-collect"></a><span class="term"> |
| 710 | <code class="option">--toggle-collect=<function> </code> |
| 711 | </span> |
| 712 | </dt> |
| 713 | <dd><p>Toggle collection on entry/exit of <code class="option">function</code>.</p></dd> |
| 714 | <dt> |
| 715 | <a name="opt.collect-jumps"></a><span class="term"> |
| 716 | <code class="option">--collect-jumps=<no|yes> [default: no] </code> |
| 717 | </span> |
| 718 | </dt> |
| 719 | <dd><p>This specifies whether information for (conditional) jumps |
| 720 | should be collected. As above, callgrind_annotate currently is not |
| 721 | able to show you the data. You have to use KCachegrind to get jump |
| 722 | arrows in the annotated code.</p></dd> |
| 723 | <dt> |
| 724 | <a name="opt.collect-systime"></a><span class="term"> |
| 725 | <code class="option">--collect-systime=<no|yes> [default: no] </code> |
| 726 | </span> |
| 727 | </dt> |
| 728 | <dd><p>This specifies whether information for system call times |
| 729 | should be collected.</p></dd> |
| 730 | <dt> |
| 731 | <a name="clopt.collect-bus"></a><span class="term"> |
| 732 | <code class="option">--collect-bus=<no|yes> [default: no] </code> |
| 733 | </span> |
| 734 | </dt> |
| 735 | <dd><p>This specifies whether the number of global bus events executed |
| 736 | should be collected. The event type "Ge" is used for these events.</p></dd> |
| 737 | </dl> |
| 738 | </div> |
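<p>The interaction between the initial collection state and toggling on function entry and exit can be illustrated with a toy model in Python. This is a sketch only: the event trace, the function names and the helper are hypothetical, and recursive calls of the toggled function are not modelled.</p>

```python
def collected_cost(trace, toggle_function, collect_atstart=False):
    """Sum the cost of events that occur while collection is switched on."""
    collecting = collect_atstart
    total = 0
    for kind, value in trace:
        if kind in ("enter", "leave") and value == toggle_function:
            collecting = not collecting   # toggled on entry and again on exit
        elif kind == "cost" and collecting:
            total += value
    return total


# Hypothetical execution trace: ("cost", n) stands for n counted events.
trace = [
    ("cost", 5),
    ("enter", "work"), ("cost", 7),
    ("enter", "helper"), ("cost", 3), ("leave", "helper"),
    ("leave", "work"),
    ("cost", 11),
    ("enter", "work"), ("cost", 2), ("leave", "work"),
]

inside = collected_cost(trace, "work")                          # 7 + 3 + 2
inverted = collected_cost(trace, "work", collect_atstart=True)  # 5 + 11
print(inside, inverted)
```

<p>With collection off at the start, only events inside <code class="computeroutput">work</code> (including its callee) are counted, over both calls and without creating a dump per call. With collection on at the start, the toggling is inverted and everything outside <code class="computeroutput">work</code> is counted instead.</p>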
| 739 | </div> |
| 740 | <div class="sect2"> |
| 741 | <div class="titlepage"><div><div><h3 class="title"> |
| 742 | <a name="cl-manual.options.separation"></a>6.3.4. Cost entity separation options</h3></div></div></div> |
| 743 | <p> |
| 744 | These options specify how event counts should be attributed to execution |
| 745 | contexts. |
| 746 | For example, they specify whether the recursion level or the |
| 747 | call chain leading to a function should be taken into account, |
| 748 | and whether the thread ID should be considered. |
| 749 | Also see <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p> |
| 750 | <div class="variablelist"> |
| 751 | <a name="cmd-options.separation"></a><dl class="variablelist"> |
| 752 | <dt> |
| 753 | <a name="opt.separate-threads"></a><span class="term"> |
| 754 | <code class="option">--separate-threads=<no|yes> [default: no] </code> |
| 755 | </span> |
| 756 | </dt> |
| 757 | <dd><p>This option specifies whether profile data should be generated |
| 758 | separately for every thread. If yes, the file names get "-threadID" |
| 759 | appended.</p></dd> |
| 760 | <dt> |
| 761 | <a name="opt.separate-callers"></a><span class="term"> |
| 762 | <code class="option">--separate-callers=<callers> [default: 0] </code> |
| 763 | </span> |
| 764 | </dt> |
| 765 | <dd><p>Separate contexts by at most <callers> functions in the |
| 766 | call chain. See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| 767 | <dt> |
| 768 | <a name="opt.separate-callers-num"></a><span class="term"> |
| 769 | <code class="option">--separate-callers<number>=<function> </code> |
| 770 | </span> |
| 771 | </dt> |
| 772 | <dd><p>Separate <code class="option">number</code> callers for <code class="option">function</code>. |
| 773 | See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| 774 | <dt> |
| 775 | <a name="opt.separate-recs"></a><span class="term"> |
| 776 | <code class="option">--separate-recs=<level> [default: 2] </code> |
| 777 | </span> |
| 778 | </dt> |
| 779 | <dd><p>Separate function recursions by at most <code class="option">level</code> levels. |
| 780 | See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| 781 | <dt> |
| 782 | <a name="opt.separate-recs-num"></a><span class="term"> |
| 783 | <code class="option">--separate-recs<number>=<function> </code> |
| 784 | </span> |
| 785 | </dt> |
| 786 | <dd><p>Separate <code class="option">number</code> recursions for <code class="option">function</code>. |
| 787 | See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| 788 | <dt> |
| 789 | <a name="opt.skip-plt"></a><span class="term"> |
| 790 | <code class="option">--skip-plt=<no|yes> [default: yes] </code> |
| 791 | </span> |
| 792 | </dt> |
| 793 | <dd><p>Ignore calls to/from PLT sections.</p></dd> |
| 794 | <dt> |
| 795 | <a name="opt.skip-direct-rec"></a><span class="term"> |
| 796 | <code class="option">--skip-direct-rec=<no|yes> [default: yes] </code> |
| 797 | </span> |
| 798 | </dt> |
| 799 | <dd><p>Ignore direct recursions.</p></dd> |
| 800 | <dt> |
| 801 | <a name="opt.fn-skip"></a><span class="term"> |
| 802 | <code class="option">--fn-skip=<function> </code> |
| 803 | </span> |
| 804 | </dt> |
| 805 | <dd> |
| 806 | <p>Ignore calls to/from a given function. E.g. if you have a |
| 807 | call chain A > B > C, and you specify function B to be |
| 808 | ignored, you will only see A > C.</p> |
| 809 | <p>This is very convenient to skip functions handling callback |
| 810 | behaviour. For example, with the signal/slot mechanism in the |
| 811 | Qt graphics library, you only want |
| 812 | to see the function emitting a signal to call the slots connected |
| 813 | to that signal. First, determine the real call chain to see the |
| 814 | functions needed to be skipped, then use this option.</p> |
| 815 | </dd> |
| 816 | </dl> |
| 817 | </div> |
| 818 | </div> |
| 819 | <div class="sect2"> |
| 820 | <div class="titlepage"><div><div><h3 class="title"> |
| 821 | <a name="cl-manual.options.simulation"></a>6.3.5. Simulation options</h3></div></div></div> |
| 822 | <div class="variablelist"> |
| 823 | <a name="cl.opts.list.simulation"></a><dl class="variablelist"> |
| 824 | <dt> |
| 825 | <a name="clopt.cache-sim"></a><span class="term"> |
| 826 | <code class="option">--cache-sim=<yes|no> [default: no] </code> |
| 827 | </span> |
| 828 | </dt> |
| 829 | <dd><p>Specify if you want to do full cache simulation. By default, |
| 830 | only instruction read accesses will be counted ("Ir"). |
| 831 | With cache simulation, further event counters are enabled: |
| 832 | Cache misses on instruction reads ("I1mr"/"ILmr"), |
| 833 | data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"), |
| 834 | data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw"). |
| 835 | For more information, see <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a>. |
| 836 | </p></dd> |
| 837 | <dt> |
| 838 | <a name="clopt.branch-sim"></a><span class="term"> |
| 839 | <code class="option">--branch-sim=<yes|no> [default: no] </code> |
| 840 | </span> |
| 841 | </dt> |
| 842 | <dd><p>Specify if you want to do branch prediction simulation. |
| 843 | Further event counters are enabled: Number of executed conditional |
| 844 | branches and related predictor misses ("Bc"/"Bcm"), executed indirect |
| 845 | jumps and related misses of the jump address predictor ("Bi"/"Bim"). |
| 846 | </p></dd> |
| 847 | </dl> |
| 848 | </div> |
| 849 | </div> |
| 850 | <div class="sect2"> |
| 851 | <div class="titlepage"><div><div><h3 class="title"> |
| 852 | <a name="cl-manual.options.cachesimulation"></a>6.3.6. Cache simulation options</h3></div></div></div> |
| 853 | <div class="variablelist"> |
| 854 | <a name="cl.opts.list.cachesimulation"></a><dl class="variablelist"> |
| 855 | <dt> |
| 856 | <a name="opt.simulate-wb"></a><span class="term"> |
| 857 | <code class="option">--simulate-wb=<yes|no> [default: no] </code> |
| 858 | </span> |
| 859 | </dt> |
| 860 | <dd><p>Specify whether write-back behavior should be simulated, which makes it |
| 861 | possible to distinguish LL cache misses with and without write backs. |
| 862 | The cache model of Cachegrind/Callgrind does not specify write-through |
| 863 | vs. write-back behavior, and this also is not relevant for the number |
| 864 | of generated miss counts. However, with explicit write-back simulation |
| 865 | it can be decided whether a miss triggers not only the loading of a new |
| 866 | cache line, but also if a write back of a dirty cache line had to take |
| 867 | place before. The new dirty miss events are ILdmr, DLdmr, and DLdmw, |
| 868 | for misses because of instruction read, data read, and data write, |
| 869 | respectively. As they produce two memory transactions, they should |
| 870 | account for a doubled time estimation in relation to a normal miss. |
| 871 | </p></dd> |
| 872 | <dt> |
| 873 | <a name="opt.simulate-hwpref"></a><span class="term"> |
| 874 | <code class="option">--simulate-hwpref=<yes|no> [default: no] </code> |
| 875 | </span> |
| 876 | </dt> |
| 877 | <dd><p>Specify whether simulation of a hardware prefetcher should be |
| 878 | added which is able to detect stream access in the second level cache |
| 879 | by comparing accesses separately for each page. |
| 880 | As the simulation cannot decide about any timing issues of prefetching, |
| 881 | it is assumed that any hardware prefetch triggered succeeds before a |
| 882 | real access is done. Thus, this gives a best-case scenario by covering |
| 883 | all possible stream accesses.</p></dd> |
| 884 | <dt> |
| 885 | <a name="opt.cacheuse"></a><span class="term"> |
| 886 | <code class="option">--cacheuse=<yes|no> [default: no] </code> |
| 887 | </span> |
| 888 | </dt> |
| 889 | <dd><p>Specify whether cache line use should be collected. For every |
| 890 | cache line, from loading to it being evicted, the number of accesses |
| 891 | as well as the number of actually used bytes is determined. This |
| 892 | behavior is related to the code which triggered loading of the cache |
| 893 | line. In contrast to miss counters, which show the position where |
| 894 | the symptoms of bad cache behavior (i.e. latencies) happen, the |
| 895 | use counters try to pinpoint the reason (i.e. the code with the |
| 896 | bad access behavior). The new counters are defined in a way such |
| 897 | that worse behavior results in higher cost. |
| 898 | AcCost1 and AcCost2 are counters showing bad temporal locality |
| 899 | for L1 and LL caches, respectively. This is done by summing up |
| 900 | reciprocal values of the numbers of accesses of each cache line, |
| 901 | multiplied by 1000 (as only integer costs are allowed). E.g. for |
| 902 | a given source line with 5 read accesses, a value of 5000 AcCost |
| 903 | means that for every access, a new cache line was loaded and directly |
| 904 | evicted afterwards without further accesses. Similarly, SpLoss1/2 |
| 905 | shows bad spatial locality for L1 and LL caches, respectively. It |
| 906 | gives the <span class="emphasis"><em>spatial loss</em></span> count of bytes which |
| 907 | were loaded into cache but never accessed. It pinpoints code |
| 908 | accessing data in a way such that cache space is wasted. This hints |
| 909 | at bad layout of data structures in memory. Assuming a cache line |
| 910 | size of 64 bytes and 100 L1 misses for a given source line, the |
| 911 | loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a |
| 912 | value of 3200 for this line, this means that half of the loaded data was |
| 913 | never used, or using a better data layout, only half of the cache |
| 914 | space would have been needed. |
| 915 | Please note that for cache line use counters, it currently is |
| 916 | not possible to provide meaningful inclusive costs. Therefore, |
| 917 | inclusive cost of these counters should be ignored. |
| 918 | </p></dd> |
| 919 | <dt> |
| 920 | <a name="opt.I1"></a><span class="term"> |
| 921 | <code class="option">--I1=<size>,<associativity>,<line size> </code> |
| 922 | </span> |
| 923 | </dt> |
| 924 | <dd><p>Specify the size, associativity and line size of the level 1 |
| 925 | instruction cache. </p></dd> |
| 926 | <dt> |
| 927 | <a name="opt.D1"></a><span class="term"> |
| 928 | <code class="option">--D1=<size>,<associativity>,<line size> </code> |
| 929 | </span> |
| 930 | </dt> |
| 931 | <dd><p>Specify the size, associativity and line size of the level 1 |
| 932 | data cache.</p></dd> |
| 933 | <dt> |
| 934 | <a name="opt.LL"></a><span class="term"> |
| 935 | <code class="option">--LL=<size>,<associativity>,<line size> </code> |
| 936 | </span> |
| 937 | </dt> |
| 938 | <dd><p>Specify the size, associativity and line size of the last-level |
| 939 | cache.</p></dd> |
| 940 | </dl> |
| 941 | </div> |
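<p>The arithmetic behind the cache line use counters of <code class="option">--cacheuse</code> can be reproduced with a short Python sketch. The function names are hypothetical, and the use of integer division is an assumption standing in for "only integer costs are allowed"; the exact rounding in Callgrind may differ.</p>

```python
def ac_cost(accesses_per_cache_line):
    """Temporal-locality cost: 1000 / accesses, summed over loaded cache lines.

    Integer division is an assumption made for this sketch.
    """
    return sum(1000 // n for n in accesses_per_cache_line)


def sp_loss(used_bytes_per_cache_line, line_size=64):
    """Spatial loss: bytes loaded into the cache but never accessed."""
    return sum(line_size - used for used in used_bytes_per_cache_line)


# Five accesses, each loading a line that is evicted after that one access.
worst_case = ac_cost([1] * 5)   # 5 * 1000 = 5000
# The same five accesses all hitting one loaded line.
reused = ac_cost([5])           # 1000 / 5 = 200

# 100 L1 misses with 64-byte lines, of which only 32 bytes each are used.
loaded = 100 * 64               # 6400 bytes loaded
lost = sp_loss([32] * 100)      # 3200 bytes never accessed
print(worst_case, reused, loaded, lost, lost / loaded)
```

<p>The values match the examples in the option description: 5000 for five accesses with no reuse, and a spatial loss of 3200 out of 6400 loaded bytes, i.e. half of the loaded data was never used.</p>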
| 942 | </div> |
| 943 | </div> |
| 944 | <div class="sect1"> |
| 945 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 946 | <a name="cl-manual.monitor-commands"></a>6.4. Callgrind Monitor Commands</h2></div></div></div> |
| 947 | <p>The Callgrind tool provides monitor commands handled by the Valgrind |
| 948 | gdbserver (see <a class="xref" href="manual-core-adv.html#manual-core-adv.gdbserver-commandhandling" title="3.2.5. Monitor command handling by the Valgrind gdbserver">Monitor command handling by the Valgrind gdbserver</a>). |
| 949 | </p> |
| 950 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 951 | <li class="listitem"><p><code class="varname">dump [<dump_hint>]</code> requests to dump the |
| 952 | profile data. </p></li> |
| 953 | <li class="listitem"><p><code class="varname">zero</code> requests to zero the profile data |
| 954 | counters. </p></li> |
| 955 | <li class="listitem"><p><code class="varname">instrumentation [on|off]</code> requests to set |
| 956 | (if parameter on/off is given) or get the current instrumentation state. |
| 957 | </p></li> |
| 958 | <li class="listitem"><p><code class="varname">status</code> requests to print out some status |
| 959 | information.</p></li> |
| 960 | </ul></div> |
| 961 | </div> |
| 962 | <div class="sect1"> |
| 963 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 964 | <a name="cl-manual.clientrequests"></a>6.5. Callgrind specific client requests</h2></div></div></div> |
| 965 | <p>Callgrind provides the following specific client requests in |
| 966 | <code class="filename">callgrind.h</code>. See that file for the exact details of |
| 967 | their arguments.</p> |
| 968 | <div class="variablelist"> |
| 969 | <a name="cl.clientrequests.list"></a><dl class="variablelist"> |
| 970 | <dt> |
| 971 | <a name="cr.dump-stats"></a><span class="term"> |
| 972 | <code class="computeroutput">CALLGRIND_DUMP_STATS</code> |
| 973 | </span> |
| 974 | </dt> |
| 975 | <dd><p>Force generation of a profile dump at specified position |
| 976 | in code, for the current thread only. Written counters will be reset |
| 977 | to zero.</p></dd> |
| 978 | <dt> |
| 979 | <a name="cr.dump-stats-at"></a><span class="term"> |
| 980 | <code class="computeroutput">CALLGRIND_DUMP_STATS_AT(string)</code> |
| 981 | </span> |
| 982 | </dt> |
| 983 | <dd><p>Same as <code class="computeroutput">CALLGRIND_DUMP_STATS</code>, |
| 984 | but lets you specify a string to distinguish profile |
| 985 | dumps.</p></dd> |
| 986 | <dt> |
| 987 | <a name="cr.zero-stats"></a><span class="term"> |
| 988 | <code class="computeroutput">CALLGRIND_ZERO_STATS</code> |
| 989 | </span> |
| 990 | </dt> |
| 991 | <dd><p>Reset the profile counters for the current thread to zero.</p></dd> |
| 992 | <dt> |
| 993 | <a name="cr.toggle-collect"></a><span class="term"> |
| 994 | <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT</code> |
| 995 | </span> |
| 996 | </dt> |
| 997 | <dd><p>Toggle the collection state. This allows events to be ignored |
| 998 | with regard to profile counters. See also options |
| 999 | <code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a></code> and |
| 1000 | <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>.</p></dd> |
| 1001 | <dt> |
| 1002 | <a name="cr.start-instr"></a><span class="term"> |
| 1003 | <code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code> |
| 1004 | </span> |
| 1005 | </dt> |
| 1006 | <dd><p>Start full Callgrind instrumentation if not already enabled. |
| 1007 | When cache simulation is enabled, this flushes the simulated cache
| 1008 | and leads to an artificial cache warmup phase afterwards, with
| 1009 | cache misses that would not have happened in reality. See also
| 1010 | option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd> |
| 1011 | <dt> |
| 1012 | <a name="cr.stop-instr"></a><span class="term"> |
| 1013 | <code class="computeroutput">CALLGRIND_STOP_INSTRUMENTATION</code> |
| 1014 | </span> |
| 1015 | </dt> |
| 1016 | <dd><p>Stop full Callgrind instrumentation if not already disabled. |
| 1017 | This flushes Valgrind's translation cache and performs no additional
| 1018 | instrumentation afterwards: the program then effectively runs at the same
| 1019 | speed as under Nulgrind, i.e. at minimal slowdown. Use this to
| 1020 | speed up the Callgrind run for uninteresting code parts. Use
| 1021 | <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code> to |
| 1022 | enable instrumentation again. See also option |
| 1023 | <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd> |
| 1024 | </dl> |
| 1025 | </div> |
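<p>The client requests listed above might be combined as in the following
minimal C sketch, which limits measurement to one region of interest. The
function names <code class="computeroutput">sum_to</code> and
<code class="computeroutput">profile_sum</code> are hypothetical, the include path
assumes the header is installed as <code class="filename">valgrind/callgrind.h</code>,
and the fallback no-op macros exist only so that the sketch still builds where
the header is absent.</p>

```c
/* Sketch: restrict Callgrind measurement to one region of interest. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif

#ifndef HAVE_CALLGRIND_H
/* Assumption: without the Valgrind headers we substitute no-ops so the
   sketch still compiles; these are not the real definitions. */
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#  define CALLGRIND_STOP_INSTRUMENTATION  do { } while (0)
#  define CALLGRIND_ZERO_STATS            do { } while (0)
#  define CALLGRIND_DUMP_STATS_AT(s)      do { (void)(s); } while (0)
#endif

/* Hypothetical workload standing in for the code under study. */
static unsigned long sum_to(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Profile only the call to sum_to(); everything else runs uninstrumented. */
unsigned long profile_sum(unsigned long n)
{
    CALLGRIND_START_INSTRUMENTATION;          /* no effect if already enabled */
    CALLGRIND_ZERO_STATS;                     /* discard counts gathered so far */
    unsigned long result = sum_to(n);
    CALLGRIND_DUMP_STATS_AT("after sum_to");  /* write a labelled dump */
    CALLGRIND_STOP_INSTRUMENTATION;           /* continue at minimal slowdown */
    return result;
}
```

<p>A program that calls <code class="computeroutput">profile_sum</code> would
typically be started with <code class="option">--instr-atstart=no</code>, so that
only the marked region is instrumented. Outside of Valgrind the requests do
nothing and the function simply returns the sum.</p>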
| 1026 | </div> |
| 1027 | <div class="sect1"> |
| 1028 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 1029 | <a name="cl-manual.callgrind_annotate-options"></a>6.6. callgrind_annotate Command-line Options</h2></div></div></div> |
| 1030 | <div class="variablelist"> |
| 1031 | <a name="callgrind_annotate.opts.list"></a><dl class="variablelist"> |
| 1032 | <dt><span class="term"><code class="option">-h --help</code></span></dt> |
| 1033 | <dd><p>Show summary of options.</p></dd> |
| 1034 | <dt><span class="term"><code class="option">--version</code></span></dt> |
| 1035 | <dd><p>Show version of callgrind_annotate.</p></dd> |
| 1036 | <dt><span class="term"> |
| 1037 | <code class="option">--show=A,B,C [default: all]</code> |
| 1038 | </span></dt> |
| 1039 | <dd><p>Only show figures for events A,B,C.</p></dd> |
| 1040 | <dt><span class="term"> |
| 1041 | <code class="option">--sort=A,B,C</code> |
| 1042 | </span></dt> |
| 1043 | <dd>
| 1044 | <p>Sort columns by events A,B,C [default: event column order].</p>
| 1045 | <p>Optionally, each event can be followed by a : and a threshold,
| 1046 | to specify different thresholds depending on the event.</p> |
| 1047 | </dd> |
| 1048 | <dt><span class="term">
| 1049 | <code class="option">--threshold=&lt;0--100&gt; [default: 99%] </code>
| 1050 | </span></dt> |
| 1051 | <dd><p>Percentage of counts (of primary sort event) we are |
| 1052 | interested in.</p></dd> |
| 1053 | <dt><span class="term"> |
| 1054 | <code class="option">--auto=&lt;yes|no&gt; [default: no] </code>
| 1055 | </span></dt> |
| 1056 | <dd><p>Annotate all source files containing functions that helped |
| 1057 | reach the event count threshold.</p></dd> |
| 1058 | <dt><span class="term"> |
| 1059 | <code class="option">--context=N [default: 8] </code> |
| 1060 | </span></dt> |
| 1061 | <dd><p>Print N lines of context before and after annotated |
| 1062 | lines.</p></dd> |
| 1063 | <dt><span class="term"> |
| 1064 | <code class="option">--inclusive=&lt;yes|no&gt; [default: no] </code>
| 1065 | </span></dt> |
| 1066 | <dd><p>Add subroutine costs to function calls.</p></dd>
| 1067 | <dt><span class="term"> |
| 1068 | <code class="option">--tree=&lt;none|caller|calling|both&gt; [default: none] </code>
| 1069 | </span></dt> |
| 1070 | <dd><p>For each function, print its callers, the functions it calls,
| 1071 | or both.</p></dd>
| 1072 | <dt><span class="term"> |
| 1073 | <code class="option">-I, --include=&lt;dir&gt; </code>
| 1074 | </span></dt> |
| 1075 | <dd><p>Add <code class="option">dir</code> to the list of directories to search |
| 1076 | for source files.</p></dd> |
| 1077 | </dl> |
| 1078 | </div> |
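<p>The effect of <code class="option">--threshold</code> can be pictured with
the following C sketch. It assumes that functions are listed in descending
order of the primary sort event until their cumulative count reaches the given
percentage of the total. The type <code class="computeroutput">func_cost</code>
and the function <code class="computeroutput">entries_within_threshold</code>
are hypothetical and are not part of callgrind_annotate.</p>

```c
#include <stddef.h>

/* Hypothetical per-function record; not a callgrind_annotate structure. */
struct func_cost {
    const char   *name;
    unsigned long count;   /* events of the primary sort event */
};

/* Given entries sorted by descending count, return how many leading
   entries are needed before their cumulative count reaches
   threshold_pct percent of the total over all entries. */
size_t entries_within_threshold(const struct func_cost *sorted, size_t n,
                                double threshold_pct)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += sorted[i].count;

    unsigned long running = 0;
    size_t shown = 0;
    while (shown < n) {
        /* Stop as soon as the entries shown so far cover the threshold. */
        if (total > 0 &&
            (double)running * 100.0 / (double)total >= threshold_pct)
            break;
        running += sorted[shown].count;
        shown++;
    }
    return shown;
}
```

<p>With counts of 70, 20, 9 and 1, a threshold of 99 selects the first three
entries, since together they account for 99 of the 100 events.</p>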
| 1079 | </div> |
| 1080 | <div class="sect1"> |
| 1081 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 1082 | <a name="cl-manual.callgrind_control-options"></a>6.7. callgrind_control Command-line Options</h2></div></div></div> |
| 1083 | <p>By default, callgrind_control acts on all programs run by the |
| 1084 | current user under Callgrind. It is possible to limit the actions to |
| 1085 | specified Callgrind runs by providing a list of pids or program names as |
| 1086 | argument. The default action is to give some brief information about the |
| 1087 | applications being run under Callgrind.</p> |
| 1088 | <div class="variablelist"> |
| 1089 | <a name="callgrind_control.opts.list"></a><dl class="variablelist"> |
| 1090 | <dt><span class="term"><code class="option">-h --help</code></span></dt> |
| 1091 | <dd><p>Show a short description, usage, and summary of options.</p></dd> |
| 1092 | <dt><span class="term"><code class="option">--version</code></span></dt> |
| 1093 | <dd><p>Show version of callgrind_control.</p></dd> |
| 1094 | <dt><span class="term"><code class="option">-l --long</code></span></dt> |
| 1095 | <dd><p>Also show the working directory, in addition to the brief
| 1096 | information given by default.
| 1097 | </p></dd> |
| 1098 | <dt><span class="term"><code class="option">-s --stat</code></span></dt> |
| 1099 | <dd><p>Show statistics information about active Callgrind runs.</p></dd> |
| 1100 | <dt><span class="term"><code class="option">-b --back</code></span></dt> |
| 1101 | <dd><p>Show stack/back traces of each thread in active Callgrind runs. For
| 1102 | each active function in the stack trace, the number of invocations
| 1103 | since program start (or the last dump) is also shown. This option can be
| 1104 | combined with -e to show the inclusive cost of active functions.</p></dd>
| 1105 | <dt><span class="term"><code class="option">-e [A,B,...] </code> (default: all)</span></dt> |
| 1106 | <dd><p>Show the current per-thread, exclusive cost values of event |
| 1107 | counters. If no explicit event names are given, figures for all event |
| 1108 | types which are collected in the given Callgrind run are |
| 1109 | shown. Otherwise, only figures for event types A, B, ... are shown. If |
| 1110 | this option is combined with -b, inclusive cost for the functions of |
| 1111 | each active stack frame is provided, too. |
| 1112 | </p></dd> |
| 1113 | <dt><span class="term"><code class="option">--dump[=&lt;desc&gt;] </code> (default: no description)</span></dt>
| 1114 | <dd><p>Request the dumping of profile information. Optionally, a |
| 1115 | description can be specified which is written into the dump as part of |
| 1116 | the information giving the reason which triggered the dump action. This |
| 1117 | can be used to distinguish multiple dumps.</p></dd> |
| 1118 | <dt><span class="term"><code class="option">-z --zero</code></span></dt> |
| 1119 | <dd><p>Zero all event counters.</p></dd> |
| 1120 | <dt><span class="term"><code class="option">-k --kill</code></span></dt> |
| 1121 | <dd><p>Force a Callgrind run to be terminated.</p></dd> |
| 1122 | <dt><span class="term"><code class="option">--instr=&lt;on|off&gt;</code></span></dt>
| 1123 | <dd><p>Switch instrumentation mode on or off. If a Callgrind run has |
| 1124 | instrumentation disabled, no simulation is done and no events are |
| 1125 | counted. This is useful to skip uninteresting program parts, as there |
| 1126 | is much less slowdown (same as with the Valgrind tool "none"). See also |
| 1127 | the Callgrind option <code class="option">--instr-atstart</code>.</p></dd> |
| 1128 | <dt><span class="term"><code class="option">--vgdb-prefix=&lt;prefix&gt;</code></span></dt>
| 1129 | <dd><p>Specify the vgdb prefix to be used by callgrind_control.
| 1130 | callgrind_control internally uses vgdb to find and control the active |
| 1131 | Callgrind runs. If the <code class="option">--vgdb-prefix</code> option was used |
| 1132 | for launching valgrind, then the same option must be given to |
| 1133 | callgrind_control.</p></dd> |
| 1134 | </dl> |
| 1135 | </div> |
| 1136 | </div> |
| 1137 | </div> |
| 1138 | <div> |
| 1139 | <br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer"> |
| 1140 | <tr> |
| 1141 | <td rowspan="2" width="40%" align="left"> |
| 1142 | <a accesskey="p" href="cg-manual.html"><< 5. Cachegrind: a cache and branch-prediction profiler</a> </td> |
| 1143 | <td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td> |
| 1144 | <td rowspan="2" width="40%" align="right"> <a accesskey="n" href="hg-manual.html">7. Helgrind: a thread error detector >></a> |
| 1145 | </td> |
| 1146 | </tr> |
| 1147 | <tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr> |
| 1148 | </table> |
| 1149 | </div> |
| 1150 | </body> |
| 1151 | </html> |