| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
| <title>6. Callgrind: a call-graph generating cache and branch prediction profiler</title> |
| <link rel="stylesheet" type="text/css" href="vg_basic.css"> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> |
| <link rel="home" href="index.html" title="Valgrind Documentation"> |
| <link rel="up" href="manual.html" title="Valgrind User Manual"> |
| <link rel="prev" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler"> |
| <link rel="next" href="hg-manual.html" title="7. Helgrind: a thread error detector"> |
| </head> |
| <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr> |
| <td width="22px" align="center" valign="middle"><a accesskey="p" href="cg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td> |
| <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td> |
| <td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td> |
| <th align="center" valign="middle">Valgrind User Manual</th> |
| <td width="22px" align="center" valign="middle"><a accesskey="n" href="hg-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td> |
| </tr></table></div> |
| <div class="chapter"> |
| <div class="titlepage"><div><div><h1 class="title"> |
| <a name="cl-manual"></a>6. Callgrind: a call-graph generating cache and branch prediction profiler</h1></div></div></div> |
| <div class="toc"> |
| <p><b>Table of Contents</b></p> |
| <dl class="toc"> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.use">6.1. Overview</a></span></dt> |
| <dd><dl> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.functionality">6.1.1. Functionality</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.basics">6.1.2. Basic Usage</a></span></dt> |
| </dl></dd> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.usage">6.2. Advanced Usage</a></span></dt> |
| <dd><dl> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.dumps">6.2.1. Multiple profiling dumps from one program run</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.limits">6.2.2. Limiting the range of collected events</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.busevents">6.2.3. Counting global bus events</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.cycles">6.2.4. Avoiding cycles</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.forkingprograms">6.2.5. Forking Programs</a></span></dt> |
| </dl></dd> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.options">6.3. Callgrind Command-line Options</a></span></dt> |
| <dd><dl> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.creation">6.3.1. Dump creation options</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.activity">6.3.2. Activity options</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.collection">6.3.3. Data collection options</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.separation">6.3.4. Cost entity separation options</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.simulation">6.3.5. Simulation options</a></span></dt> |
| <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.cachesimulation">6.3.6. Cache simulation options</a></span></dt> |
| </dl></dd> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.monitor-commands">6.4. Callgrind Monitor Commands</a></span></dt> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.clientrequests">6.5. Callgrind specific client requests</a></span></dt> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_annotate-options">6.6. callgrind_annotate Command-line Options</a></span></dt> |
| <dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_control-options">6.7. callgrind_control Command-line Options</a></span></dt> |
| </dl> |
| </div> |
| <p>To use this tool, you must specify |
| <code class="option">--tool=callgrind</code> on the |
| Valgrind command line.</p> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.use"></a>6.1. Overview</h2></div></div></div> |
| <p>Callgrind is a profiling tool that records the call history among |
| functions in a program's run as a call-graph. |
| By default, the collected data consists of |
| the number of instructions executed, their relationship |
| to source lines, the caller/callee relationship between functions, |
| and the numbers of such calls. |
| Optionally, cache simulation and/or branch prediction (similar to Cachegrind) |
| can produce further information about the runtime behavior of an application. |
| </p> |
| <p>The profile data is written out to a file at program |
| termination. For presentation of the data, and interactive control |
| of the profiling, two command line tools are provided:</p> |
| <div class="variablelist"><dl class="variablelist"> |
| <dt><span class="term"><span class="command"><strong>callgrind_annotate</strong></span></span></dt> |
| <dd> |
| <p>This command reads in the profile data, and prints a |
| sorted list of functions, optionally with source annotation.</p> |
| <p>For graphical visualization of the data, try |
| <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>, which is a KDE/Qt based |
| GUI that makes it easy to navigate the large amount of data that |
| Callgrind produces.</p> |
| </dd> |
| <dt><span class="term"><span class="command"><strong>callgrind_control</strong></span></span></dt> |
| <dd><p>This command enables you to interactively observe and control |
| the status of a program currently running under Callgrind's control, |
| without stopping the program. You can get statistics information as |
| well as the current stack trace, and you can request zeroing of counters |
| or dumping of profile data.</p></dd> |
| </dl></div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.functionality"></a>6.1.1. Functionality</h3></div></div></div> |
| <p>Cachegrind collects flat profile data: event counts (data reads, |
| cache misses, etc.) are attributed directly to the function they |
| occurred in. This cost attribution mechanism is |
| called <span class="emphasis"><em>self</em></span> or <span class="emphasis"><em>exclusive</em></span> |
| attribution.</p> |
| <p>Callgrind extends this functionality by propagating costs |
| across function call boundaries. If function <code class="function">foo</code> calls |
| <code class="function">bar</code>, the costs from <code class="function">bar</code> are added into |
| <code class="function">foo</code>'s costs. When applied to the program as a whole, |
| this builds up a picture of so-called <span class="emphasis"><em>inclusive</em></span> |
| costs, that is, where the cost of each function includes the costs of |
| all functions it called, directly or indirectly.</p> |
| <p>As an example, the inclusive cost of |
| <code class="function">main</code> should be almost 100 percent |
| of the total program cost. Because of costs arising before |
| <code class="function">main</code> is run, such as |
| initialization of the run time linker and construction of global C++ |
| objects, the inclusive cost of <code class="function">main</code> |
| is not exactly 100 percent of the total program cost.</p> |
| <p>Together with the call graph, this allows you to find the |
| specific call chains starting from |
| <code class="function">main</code> in which the majority of the |
| program's costs occur. Caller/callee cost attribution is also useful |
| for profiling functions called from multiple call sites, and where |
| optimization opportunities depend on changing code in the callers, in |
| particular by reducing the call count.</p> |
| <p>Callgrind's cache simulation is based on that of Cachegrind. |
| Read the documentation for <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a> first. The material |
| below describes the features supported in addition to Cachegrind's |
| features.</p> |
| <p>Callgrind's ability to detect function calls and returns depends |
| on the instruction set of the platform it is run on. It works best on |
| x86 and amd64, and unfortunately currently does not work so well on |
| PowerPC, ARM, Thumb or MIPS code. This is because there are no explicit |
| call or return instructions in these instruction sets, so Callgrind |
| has to rely on heuristics to detect calls and returns.</p> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.basics"></a>6.1.2. Basic Usage</h3></div></div></div> |
| <p>As with Cachegrind, you probably want to compile with debugging info |
| (the <code class="option">-g</code> option) and with optimization turned on.</p> |
| <p>To start a profile run for a program, execute: |
| </p> |
| <pre class="screen">valgrind --tool=callgrind [callgrind options] your-program [program options]</pre> |
| <p> |
| </p> |
| <p>While the simulation is running, you can observe execution with: |
| </p> |
| <pre class="screen">callgrind_control -b</pre> |
| <p> |
| This will print out the current backtrace. To annotate the backtrace with |
| event counts, run |
| </p> |
| <pre class="screen">callgrind_control -e -b</pre> |
| <p> |
| </p> |
| <p>After program termination, a profile data file named |
| <code class="computeroutput">callgrind.out.&lt;pid&gt;</code> |
| is generated, where <span class="emphasis"><em>pid</em></span> is the process ID |
| of the program being profiled. |
| The data file contains information about the calls made in the |
| program among the functions executed, together with |
| <span class="command"><strong>Instruction Read</strong></span> (Ir) event counts.</p> |
| <p>To generate a function-by-function summary from the profile |
| data file, use |
| </p> |
| <pre class="screen">callgrind_annotate [options] callgrind.out.&lt;pid&gt;</pre> |
| <p> |
| This summary is similar to the output you get from a Cachegrind |
| run with cg_annotate: the list of functions is sorted by exclusive |
| cost, and exclusive costs are the ones shown. |
| Two options are important for the additional features of Callgrind:</p> |
| <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| <li class="listitem"><p><code class="option">--inclusive=yes</code>: Instead of using |
| exclusive cost of functions as sorting order, use and show |
| inclusive cost.</p></li> |
| <li class="listitem"><p><code class="option">--tree=both</code>: Interleave into the |
| top level list of functions, information on the callers and the callees |
| of each function. In these lines, which represent executed |
| calls, the cost gives the number of events spent in the call. |
| Indented, above each function, there is the list of callers, |
| and below, the list of callees. The sum of events in calls to |
| a given function (caller lines), as well as the sum of events in |
| calls from the function (callee lines) together with the self |
| cost, gives the total inclusive cost of the function.</p></li> |
| </ul></div> |
| <p>Use <code class="option">--auto=yes</code> to get annotated source code |
| for all relevant functions for which the source can be found. In |
| addition to source annotation as produced by |
| <code class="computeroutput">cg_annotate</code>, you will see the |
| annotated call sites with call counts. For all other options, |
| consult the (Cachegrind) documentation for |
| <code class="computeroutput">cg_annotate</code>. |
| </p> |
| <p>For a better call graph browsing experience, it is highly recommended |
| to use <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>. |
| If your code |
| has a significant fraction of its cost in <span class="emphasis"><em>cycles</em></span> (sets |
| of functions calling each other in a recursive manner), you have to |
| use KCachegrind, as <code class="computeroutput">callgrind_annotate</code> |
| currently does not do any cycle detection, which is important to get correct |
| results in this case.</p> |
| <p>If you are additionally interested in measuring the |
| cache behavior of your program, use Callgrind with the option |
| <code class="option"><a class="xref" href="cl-manual.html#clopt.cache-sim">--cache-sim</a>=yes</code>. For |
| branch prediction simulation, use <code class="option"><a class="xref" href="cl-manual.html#clopt.branch-sim">--branch-sim</a>=yes</code>. |
| Expect a further slowdown by approximately a factor of 2.</p> |
| <p>If the program section you want to profile is somewhere in the |
| middle of the run, it is beneficial to |
| <span class="emphasis"><em>fast forward</em></span> to this section without any |
| profiling, and then enable profiling. This is achieved by using |
| the command line option |
| <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code> |
| and running, in a shell: |
| <code class="computeroutput">callgrind_control -i on</code> just before the |
| interesting code section is executed. To exactly specify |
| the code position where profiling should start, use the client request |
| <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code>.</p> |
| <p>If you want to be able to see assembly code level annotation, specify |
| <code class="option"><a class="xref" href="cl-manual.html#opt.dump-instr">--dump-instr</a>=yes</code>. This will produce |
| profile data at instruction granularity. Note that the resulting profile |
| data |
| can only be viewed with KCachegrind. For assembly annotation, it also is |
| interesting to see more details of the control flow inside of functions, |
| i.e. (conditional) jumps. This will be collected by further specifying |
| <code class="option"><a class="xref" href="cl-manual.html#opt.collect-jumps">--collect-jumps</a>=yes</code>.</p> |
| </div> |
| </div> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.usage"></a>6.2. Advanced Usage</h2></div></div></div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.dumps"></a>6.2.1. Multiple profiling dumps from one program run</h3></div></div></div> |
| <p>Sometimes you are not interested in characteristics of a full |
| program run, but only of a small part of it, for example execution of one |
| algorithm. If there are multiple algorithms, or one algorithm |
| running with different input data, it may even be useful to get different |
| profile information for different parts of a single program run.</p> |
| <p>Profile data files have names of the form |
| </p> |
| <pre class="screen"> |
| callgrind.out.<span class="emphasis"><em>pid</em></span>.<span class="emphasis"><em>part</em></span>-<span class="emphasis"><em>threadID</em></span> |
| </pre> |
| <p> |
| </p> |
| <p>where <span class="emphasis"><em>pid</em></span> is the PID of the running |
| program, <span class="emphasis"><em>part</em></span> is a number incremented on each |
| dump (".part" is skipped for the dump at program termination), and |
| <span class="emphasis"><em>threadID</em></span> is a thread identification |
| ("-threadID" is only used if you request dumps of individual |
| threads with <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>).</p> |
| <p>There are different ways to generate multiple profile dumps |
| while a program is running under Callgrind's supervision. Nevertheless, |
| all methods trigger the same action, which is "dump all profile |
| information since the last dump or program start, and zero cost |
| counters afterwards". To allow for zeroing cost counters without |
| dumping, there is a second action "zero all cost counters now". |
| The different methods are:</p> |
| <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| <li class="listitem"><p><span class="command"><strong>Dump on program termination.</strong></span> |
| This method is the standard way and doesn't need any special |
| action on your part.</p></li> |
| <li class="listitem"> |
| <p><span class="command"><strong>Spontaneous, interactive dumping.</strong></span> Use |
| </p> |
| <pre class="screen">callgrind_control -d [hint [PID/Name]]</pre> |
| <p> to |
| request the dumping of profile information of the supervised |
| application with PID or Name. <span class="emphasis"><em>hint</em></span> is an |
| arbitrary string you can optionally specify to later be able to |
| distinguish profile dumps. The control program will not terminate |
| before the dump is completely written. Note that the application |
| must be actively running for detection of the dump command. So, |
| for a GUI application, resize the window, or for a server, send a |
| request.</p> |
| <p>If you are using <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a> |
| for browsing of profile information, you can use the toolbar |
| button <span class="command"><strong>Force dump</strong></span>. This will request a dump |
| and trigger a reload after the dump is written.</p> |
| </li> |
| <li class="listitem"><p><span class="command"><strong>Periodic dumping after execution of a specified |
| number of basic blocks</strong></span>. For this, use the command line |
| option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-every-bb">--dump-every-bb</a>=count</code>. |
| </p></li> |
| <li class="listitem"> |
| <p><span class="command"><strong>Dumping at enter/leave of specified functions.</strong></span> |
| Use the |
| option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code> |
| and <code class="option"><a class="xref" href="cl-manual.html#opt.dump-after">--dump-after</a>=function</code>. |
| To zero cost counters before entering a function, use |
| <code class="option"><a class="xref" href="cl-manual.html#opt.zero-before">--zero-before</a>=function</code>.</p> |
| <p>You can specify these options multiple times for different |
| functions. Function specifications support wildcards: e.g. use |
| <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>='foo*'</code> to |
| generate dumps before entering any function starting with |
| <span class="emphasis"><em>foo</em></span>.</p> |
| </li> |
| <li class="listitem"><p><span class="command"><strong>Program controlled dumping.</strong></span> |
| Insert |
| <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.dump-stats">CALLGRIND_DUMP_STATS</a>;</code> |
| at the position in your code where you want a profile dump to happen. Use |
| <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> to only |
| zero profile counters. |
| See <a class="xref" href="cl-manual.html#cl-manual.clientrequests" title="6.5. Callgrind specific client requests">Client request reference</a> for more information on |
| Callgrind specific client requests.</p></li> |
| </ul></div> |
| <p>If you are running a multi-threaded application and specify the |
| command line option <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>, |
| every thread will be profiled on its own and will create its own |
| profile dump. Thus, the last two methods will only generate one dump |
| of the currently running thread. With the other methods, you will get |
| multiple dumps (one for each thread) on a dump request.</p> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.limits"></a>6.2.2. Limiting the range of collected events</h3></div></div></div> |
| <p>By default, whenever events are happening (such as an |
| instruction execution or cache hit/miss), Callgrind is aggregating |
| them into event counters. However, you may be interested only in |
| what is happening within a given function or starting from a given |
| program phase. To this end, you can disable event aggregation for |
| uninteresting program parts. While attribution of events to |
| functions as well as producing separate output per program phase |
| can be done by other means (see previous section), there are two |
| benefits to disabling aggregation. First, this is very |
| fine-granular (e.g. just for a loop within a function). Second, |
| disabling event aggregation for complete program phases allows you to |
| switch off time-consuming cache simulation and allows Callgrind to |
| progress at much higher speed, with a slowdown of around a factor of 2 |
| (identical to <code class="computeroutput">valgrind |
| --tool=none</code>). |
| </p> |
| <p>There are two aspects which influence whether Callgrind is |
| aggregating events at some point in time of program execution. |
| First, there is the <span class="emphasis"><em>collection state</em></span>. If this |
| is off, no aggregation will be done. By changing the collection |
| state, you can control event aggregation at a very fine |
| granularity. However, there is not much difference in regard to |
| execution speed of Callgrind. By default, collection is switched |
| on, but can be disabled by different means (see below). Second, |
| there is the <span class="emphasis"><em>instrumentation mode</em></span> in which |
| Callgrind is running. This mode either can be on or off. If |
| instrumentation is off, no observation of actions in the program |
| will be done and thus, no actions will be forwarded to the |
| simulator which could trigger events. In the end, no events will |
| be aggregated. The huge benefit is the much higher speed with |
| instrumentation switched off. However, this only should be used |
| with care and in a coarse fashion: every mode change resets the |
| simulator state (i.e. whether a memory block is cached or not) and |
| flushes Valgrind's internal cache of instrumented code blocks, |
| resulting in a latency penalty at switching time. Also, cache |
| simulator results directly after switching on instrumentation will |
| be skewed by cache misses that would not happen in |
| reality (if you care about this warm-up effect, you should make |
| sure to temporarily have the collection state switched off directly |
| after turning instrumentation mode on). However, switching |
| instrumentation state is very useful to skip larger program phases |
| such as an initialization phase. By default, instrumentation is |
| switched on, but as with the collection state, can be changed by |
| various means. |
| </p> |
| <p>Callgrind can start with instrumentation mode switched off by |
| specifying |
| option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>. |
| Afterwards, instrumentation can be controlled in two ways: first, |
| interactively with: </p> |
| <pre class="screen">callgrind_control -i on</pre> |
| <p> (and |
| switching off again by specifying "off" instead of "on"). Second, |
| instrumentation state can be programmatically changed with the |
| macros <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a>;</code> |
| and <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</a>;</code>. |
| </p> |
| <p>Similarly, the collection state at program start can be |
| switched off |
| by <code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a>=no</code>. During |
| execution, it can be controlled programmatically with the |
| macro <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code>. |
| Further, you can limit event collection to a specific function by |
| using <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a>=function</code>. |
| This will toggle the collection state on entering and leaving the |
| specified function. When this option is in effect, the default |
| collection state at program start is "off". Only events happening |
| while running inside of the given function will be |
| collected. Recursive calls of the given function do not trigger |
| any action. This option can be given multiple times to specify |
| different functions of interest.</p> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.busevents"></a>6.2.3. Counting global bus events</h3></div></div></div> |
| <p>For access to shared data among threads in multithreaded |
| code, synchronization is required to avoid race conditions. |
| Synchronization primitives are usually implemented via atomic instructions. |
| However, excessive use of such instructions can lead to performance |
| issues.</p> |
| <p>To enable analysis of this problem, Callgrind optionally can count |
| the number of atomic instructions executed. More precisely, for x86/x86_64, |
| these are instructions using a lock prefix. For architectures supporting |
| LL/SC, these are the number of SC instructions executed. For both, the term |
| "global bus events" is used.</p> |
| <p>The short name of the event type used for global bus events is "Ge". |
| To count global bus events, use <code class="option"><a class="xref" href="cl-manual.html#clopt.collect-bus">--collect-bus</a>=yes</code>. |
| </p> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.cycles"></a>6.2.4. Avoiding cycles</h3></div></div></div> |
| <p>Informally speaking, a cycle is a group of functions which |
| call each other in a recursive way.</p> |
| <p>Formally speaking, a cycle is a nonempty set S of functions, |
| such that for every pair of functions F and G in S, it is possible |
| to call from F to G (possibly via intermediate functions) and also |
| from G to F. Furthermore, S must be maximal -- that is, be the |
| largest set of functions satisfying this property. For example, if |
| a third function H is called from inside S and calls back into S, |
| then H is also part of the cycle and should be included in S.</p> |
| <p>Recursion is quite usual in programs, and therefore, cycles |
| sometimes appear in the call graph output of Callgrind. However, |
| the title of this section should raise two questions: What is bad |
| about cycles which makes you want to avoid them? And: How can |
| cycles be avoided without changing program code?</p> |
| <p>Cycles are not bad in themselves, but tend to make performance |
| analysis of your code harder. This is because inclusive costs |
| for calls inside of a cycle are meaningless. The definition of |
| inclusive cost, i.e. self cost of a function plus inclusive cost |
| of its callees, needs a topological order among functions. For |
| cycles, this does not hold true: callees of a function in a cycle include |
| the function itself. Therefore, KCachegrind does cycle detection |
| and skips visualization of any inclusive cost for calls inside |
| of cycles. Further, all functions in a cycle are collapsed into artificial |
| functions with names such as <code class="computeroutput">Cycle 1</code>.</p> |
| <p>Now, when a program contains really big cycles (as is |
| true for some GUI code, or in general code using event or callback based |
| programming style), you lose the ability to pinpoint |
| the bottlenecks by following call chains from |
| <code class="function">main</code>, guided via |
| inclusive cost. In addition, KCachegrind loses its ability to show |
| interesting parts of the call graph, as it uses inclusive costs to |
| cut off uninteresting areas.</p> |
| <p>Despite the meaninglessness of inclusive costs in cycles, the big |
| drawback for visualization motivates the possibility to temporarily |
| switch off cycle detection in KCachegrind, which can lead to |
| misleading visualization. However, often cycles appear because of |
| unlucky superposition of independent call chains in a way that |
| the profile result will see a cycle. Neglecting uninteresting |
| calls with very small measured inclusive cost would break these |
| cycles. In such cases, incorrect handling of cycles by not detecting |
| them still gives meaningful profiling visualization.</p> |
| <p>It has to be noted that currently, <span class="command"><strong>callgrind_annotate</strong></span> |
| does not do any cycle detection at all. For program executions with function |
| recursion, it can, for example, print nonsensical inclusive costs far above 100%.</p> |
| <p>After describing why cycles are bad for profiling, it is worth |
| talking about cycle avoidance. The key insight here is that symbols in |
| the profile data do not have to exactly match the symbols found in the |
| program. Instead, the symbol name could encode additional information |
| from the current execution context such as recursion level of the |
| current function, or even some part of the call chain leading to the |
| function. While encoding of additional information into symbols is |
| quite capable of avoiding cycles, it has to be used carefully to not cause |
| symbol explosion. The latter imposes large memory requirements on Callgrind, |
| with possible out-of-memory conditions, and produces big profile data files.</p> |
| <p>A further possibility to avoid cycles in Callgrind's profile data |
| output is to simply leave out given functions in the call graph. Of course, this |
| also skips any call information from and to an ignored function, and thus can |
| break a cycle. Candidates for this typically are dispatcher functions in event |
| driven code. The option to ignore calls to a function is |
| <code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. Aside from |
| possibly breaking cycles, this is used in Callgrind to skip |
| trampoline functions in the PLT sections |
| for calls to functions in shared libraries. You can see the difference |
| if you profile with <code class="option"><a class="xref" href="cl-manual.html#opt.skip-plt">--skip-plt</a>=no</code>. |
| If a call is ignored, its cost events will be propagated to the |
| enclosing function.</p> |
| <p>If you have a recursive function, you can distinguish the first |
| 10 recursion levels by specifying |
| <code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs-num">--separate-recs10</a>=function</code>. |
| To do this for all functions, use |
| <code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs">--separate-recs</a>=10</code>, but note that this will |
| give you much bigger profile data files. In the profile data, you will see |
| the recursion levels of "func" as the different functions with names |
| "func", "func'2", "func'3" and so on.</p> |
| <p>If you have call chains "A > B > C" and "A > C > B" |
| in your program, you usually get a "false" cycle "B <> C". Use |
| <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=B</code> |
| <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=C</code>, |
| and functions "B" and "C" will be treated as different functions |
| depending on the direct caller. Using the apostrophe for appending |
| this "context" to the function name, you get "A > B'A > C'B" |
| and "A > C'A > B'C", and there will be no cycle. Use |
| <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers">--separate-callers</a>=2</code> to get a 2-caller |
| dependency for all functions. Note that doing this will increase |
| the size of profile data files.</p> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.forkingprograms"></a>6.2.5. Forking Programs</h3></div></div></div> |
| <p>If your program forks, the child will inherit all the profiling |
| data that has been gathered for the parent. To start with empty profile |
| counter values in the child, the client request |
| <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> |
| can be inserted into code to be executed by the child, directly after |
| <code class="computeroutput">fork</code>.</p> |
| <p>However, you will have to make sure that the output file format string |
| (controlled by <code class="option">--callgrind-out-file</code>) does contain |
| <code class="option">%p</code> (which is true by default). Otherwise, the |
| outputs from the parent and child will overwrite each other or will be |
| intermingled, which almost certainly is not what you want.</p> |
| <p>You will be able to control the new child independently from |
| the parent via callgrind_control.</p> |
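| <p>A minimal sketch of the pattern described above, assuming the header is |
| installed as <code class="filename">valgrind/callgrind.h</code> (the fallback |
| macro only lets the sketch compile where the header is missing). The names |
| <code class="computeroutput">run_in_child</code> and |
| <code class="computeroutput">sample_child_work</code> are hypothetical helpers, |
| not part of Callgrind.</p> |

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: the Callgrind header lives at <valgrind/callgrind.h>. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_ZERO_STATS
#  define CALLGRIND_ZERO_STATS ((void)0)  /* no-op fallback, illustration only */
#endif

/* Hypothetical workload executed in the child process. */
int sample_child_work(void)
{
    return 7;
}

/* Fork, zero the inherited counters in the child directly after fork(),
   run work() there, and return the child's exit status (-1 on error). */
int run_in_child(int (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        CALLGRIND_ZERO_STATS;   /* child starts with empty profile counters */
        _exit(work());
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

| <p>Outside of Valgrind the client request does nothing, so the same binary can |
| be run with and without Callgrind.</p> |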
| </div> |
| </div> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.options"></a>6.3. Callgrind Command-line Options</h2></div></div></div> |
| <p> |
| In the following, options are grouped into classes. |
| </p> |
| <p> |
| Some options allow the specification of a function/symbol name, such as |
| <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>, or |
| <code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. All these options |
| can be specified multiple times for different functions. |
| In addition, function specifications are actually patterns: they support |
| the wildcards '*' (zero or more arbitrary characters) and '?' |
| (exactly one arbitrary character), similar to file name globbing in the |
| shell. This feature is especially important for C++, as without |
| wildcards the function would have to be specified in full, including its |
| parameter signature. </p> |
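| <p>To get a feeling for how such patterns behave, the following sketch uses the |
| POSIX <code class="computeroutput">fnmatch</code> function as a stand-in. This |
| is only an analogy: Callgrind uses its own matcher, and |
| <code class="computeroutput">fnmatch</code> additionally treats '[' and '\' |
| specially, which the text above does not describe. The helper name |
| <code class="computeroutput">pattern_matches</code> is hypothetical.</p> |

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Returns true if symbol matches the shell-style pattern.
   Illustration only: this is POSIX fnmatch(), not Callgrind's matcher. */
bool pattern_matches(const char *pattern, const char *symbol)
{
    /* flags == 0: '*' matches any run of characters, '?' exactly one */
    return fnmatch(pattern, symbol, 0) == 0;
}
```

| <p>With this helper, the pattern <code class="computeroutput">MyClass::foo*</code> |
| matches the full C++ symbol |
| <code class="computeroutput">MyClass::foo(int, char const*)</code>, while |
| <code class="computeroutput">ba?</code> matches |
| <code class="computeroutput">bar</code> but not |
| <code class="computeroutput">ba</code>.</p> |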
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.options.creation"></a>6.3.1. Dump creation options</h3></div></div></div> |
| <p> |
| These options influence the name and format of the profile data files. |
| </p> |
| <div class="variablelist"> |
| <a name="cl.opts.list.creation"></a><dl class="variablelist"> |
| <dt> |
| <a name="opt.callgrind-out-file"></a><span class="term"> |
| <code class="option">--callgrind-out-file=<file> </code> |
| </span> |
| </dt> |
| <dd><p>Write the profile data to |
| <code class="computeroutput">file</code> rather than to the default |
| output file, |
| <code class="computeroutput">callgrind.out.<pid></code>. The |
| <code class="option">%p</code> and <code class="option">%q</code> format specifiers |
| can be used to embed the process ID and/or the contents of an |
| environment variable in the name, as is the case for the core |
| option <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>. |
| When multiple dumps are made, the file name |
| is modified further; see below.</p></dd> |
| <dt> |
| <a name="opt.dump-line"></a><span class="term"> |
| <code class="option">--dump-line=<no|yes> [default: yes] </code> |
| </span> |
| </dt> |
| <dd><p>This specifies that event counting should be performed at |
| source line granularity. This allows source annotation for sources |
| which are compiled with debug information |
| (<code class="option">-g</code>).</p></dd> |
| <dt> |
| <a name="opt.dump-instr"></a><span class="term"> |
| <code class="option">--dump-instr=<no|yes> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>This specifies that event counting should be performed at |
| per-instruction granularity. |
| This allows for assembly code |
| annotation. Currently the results can only be |
| displayed by KCachegrind.</p></dd> |
| <dt> |
| <a name="opt.compress-strings"></a><span class="term"> |
| <code class="option">--compress-strings=<no|yes> [default: yes] </code> |
| </span> |
| </dt> |
| <dd><p>This option influences the output format of the profile data. |
| It specifies whether strings (file and function names) should be |
| identified by numbers. This shrinks the file, |
| but makes it more difficult |
| for humans to read (which is not recommended in any case).</p></dd> |
| <dt> |
| <a name="opt.compress-pos"></a><span class="term"> |
| <code class="option">--compress-pos=<no|yes> [default: yes] </code> |
| </span> |
| </dt> |
| <dd><p>This option influences the output format of the profile data. |
| It specifies whether numerical positions are always specified as absolute |
| values or are allowed to be relative to previous numbers. |
| This shrinks the file size.</p></dd> |
| <dt> |
| <a name="opt.combine-dumps"></a><span class="term"> |
| <code class="option">--combine-dumps=<no|yes> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>When enabled and multiple profile data parts are to be |
| generated, these parts are appended to the same output file. |
| Not recommended.</p></dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.options.activity"></a>6.3.2. Activity options</h3></div></div></div> |
| <p> |
| These options specify when actions relating to event counts are to |
| be executed. For interactive control use callgrind_control. |
| </p> |
| <div class="variablelist"> |
| <a name="cl.opts.list.activity"></a><dl class="variablelist"> |
| <dt> |
| <a name="opt.dump-every-bb"></a><span class="term"> |
| <code class="option">--dump-every-bb=<count> [default: 0, never] </code> |
| </span> |
| </dt> |
| <dd><p>Dump profile data every <code class="option">count</code> basic blocks. |
| Whether a dump is needed is only checked when Valgrind's internal |
| scheduler is run. Therefore, the minimum setting useful is about 100000. |
| The count is a 64-bit value to make long dump periods possible. |
| </p></dd> |
| <dt> |
| <a name="opt.dump-before"></a><span class="term"> |
| <code class="option">--dump-before=<function> </code> |
| </span> |
| </dt> |
| <dd><p>Dump when entering <code class="option">function</code>.</p></dd> |
| <dt> |
| <a name="opt.zero-before"></a><span class="term"> |
| <code class="option">--zero-before=<function> </code> |
| </span> |
| </dt> |
| <dd><p>Zero all costs when entering <code class="option">function</code>.</p></dd> |
| <dt> |
| <a name="opt.dump-after"></a><span class="term"> |
| <code class="option">--dump-after=<function> </code> |
| </span> |
| </dt> |
| <dd><p>Dump when leaving <code class="option">function</code>.</p></dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.options.collection"></a>6.3.3. Data collection options</h3></div></div></div> |
| <p> |
| These options specify when events are to be aggregated into event counts. |
| Also see <a class="xref" href="cl-manual.html#cl-manual.limits" title="6.2.2. Limiting the range of collected events">Limiting range of event collection</a>.</p> |
| <div class="variablelist"> |
| <a name="cl.opts.list.collection"></a><dl class="variablelist"> |
| <dt> |
| <a name="opt.instr-atstart"></a><span class="term"> |
| <code class="option">--instr-atstart=<yes|no> [default: yes] </code> |
| </span> |
| </dt> |
| <dd> |
| <p>Specify if you want Callgrind to start simulation and |
| profiling from the beginning of the program. |
| When set to <code class="computeroutput">no</code>, |
| Callgrind will not be able |
| to collect any information, including calls, but it will have at |
| most a slowdown of around 4, which is the minimum Valgrind |
| overhead. Instrumentation can be interactively enabled via |
| <code class="computeroutput">callgrind_control -i on</code>.</p> |
| <p>Note that the resulting call graph will most probably not |
| contain <code class="function">main</code>, but will contain all the |
| functions executed after instrumentation was enabled. |
| Instrumentation can also be programmatically enabled/disabled. See the |
| Callgrind include file |
| <code class="computeroutput">callgrind.h</code> for the macro |
| you have to use in your source code.</p> |
| <p>For cache |
| simulation, results will be less accurate when switching on |
| instrumentation later in the program run, as the simulator starts |
| with an empty cache at that moment. Switch on event collection |
| later to cope with this error.</p> |
| </dd> |
| <dt> |
| <a name="opt.collect-atstart"></a><span class="term"> |
| <code class="option">--collect-atstart=<yes|no> [default: yes] </code> |
| </span> |
| </dt> |
| <dd> |
| <p>Specify whether event collection is enabled at beginning |
| of the profile run.</p> |
| <p>To only look at parts of your program, you have two |
| possibilities:</p> |
| <div class="orderedlist"><ol class="orderedlist" type="1"> |
| <li class="listitem"><p>Zero event counters before entering the program part you |
| want to profile, and dump the event counters to a file after |
| leaving that program part.</p></li> |
| <li class="listitem"><p>Switch on/off collection state as needed to only see |
| event counters happening while inside of the program part you |
| want to profile.</p></li> |
| </ol></div> |
| <p>The second option can be used if the program part you want to |
| profile is called many times. Option 1, i.e. creating a lot of |
| dumps, is not practical here.</p> |
| <p>Collection state can be |
| toggled at entry and exit of a given function with the |
| option <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>. If you |
| use this option, collection |
| state should be disabled at the beginning. Note that the |
| specification of <code class="option">--toggle-collect</code> |
| implicitly sets |
| <code class="option">--collect-atstart=no</code>.</p> |
| <p>Collection state can be toggled also by inserting the client request |
| <code class="computeroutput"> |
| |
| CALLGRIND_TOGGLE_COLLECT |
| ;</code> |
| at the needed code positions.</p> |
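| <p>As a sketch of the client-request approach, the code below toggles collection |
| around a single region of interest. It assumes the header is installed as |
| <code class="filename">valgrind/callgrind.h</code> and that the program is run |
| with collection initially off; the function names are hypothetical.</p> |

```c
/* Assumption: the Callgrind header lives at <valgrind/callgrind.h>. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)  /* no-op fallback, illustration only */
#endif

/* Hypothetical region of interest: 1*1 + 2*2 + ... + n*n. */
unsigned long sum_of_squares(unsigned n)
{
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++)
        total += i * i;
    return total;
}

/* Collect events only while sum_of_squares() runs.
   Assumes collection is off on entry (--collect-atstart=no). */
unsigned long profiled_sum_of_squares(unsigned n)
{
    unsigned long result;
    CALLGRIND_TOGGLE_COLLECT;   /* off -> on */
    result = sum_of_squares(n);
    CALLGRIND_TOGGLE_COLLECT;   /* on -> off */
    return result;
}
```

| <p>A matching invocation would be |
| <code class="computeroutput">valgrind --tool=callgrind --collect-atstart=no ./prog</code>, |
| where <code class="computeroutput">./prog</code> stands for your own binary.</p> |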
| </dd> |
| <dt> |
| <a name="opt.toggle-collect"></a><span class="term"> |
| <code class="option">--toggle-collect=<function> </code> |
| </span> |
| </dt> |
| <dd><p>Toggle collection on entry/exit of <code class="option">function</code>.</p></dd> |
| <dt> |
| <a name="opt.collect-jumps"></a><span class="term"> |
| <code class="option">--collect-jumps=<no|yes> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>This specifies whether information for (conditional) jumps |
| should be collected. As above, callgrind_annotate currently is not |
| able to show you the data. You have to use KCachegrind to get jump |
| arrows in the annotated code.</p></dd> |
| <dt> |
| <a name="opt.collect-systime"></a><span class="term"> |
| <code class="option">--collect-systime=<no|yes> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>This specifies whether information for system call times |
| should be collected.</p></dd> |
| <dt> |
| <a name="clopt.collect-bus"></a><span class="term"> |
| <code class="option">--collect-bus=<no|yes> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>This specifies whether the number of global bus events executed |
| should be collected. The event type "Ge" is used for these events.</p></dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.options.separation"></a>6.3.4. Cost entity separation options</h3></div></div></div> |
| <p> |
| These options specify how event counts should be attributed to execution |
| contexts. |
| For example, they specify whether the recursion level or the |
| call chain leading to a function should be taken into account, |
| and whether the thread ID should be considered. |
| Also see <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p> |
| <div class="variablelist"> |
| <a name="cmd-options.separation"></a><dl class="variablelist"> |
| <dt> |
| <a name="opt.separate-threads"></a><span class="term"> |
| <code class="option">--separate-threads=<no|yes> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>This option specifies whether profile data should be generated |
| separately for every thread. If yes, the file names get "-threadID" |
| appended.</p></dd> |
| <dt> |
| <a name="opt.separate-callers"></a><span class="term"> |
| <code class="option">--separate-callers=<callers> [default: 0] </code> |
| </span> |
| </dt> |
| <dd><p>Separate contexts by at most <callers> functions in the |
| call chain. See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| <dt> |
| <a name="opt.separate-callers-num"></a><span class="term"> |
| <code class="option">--separate-callers<number>=<function> </code> |
| </span> |
| </dt> |
| <dd><p>Separate <code class="option">number</code> callers for <code class="option">function</code>. |
| See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| <dt> |
| <a name="opt.separate-recs"></a><span class="term"> |
| <code class="option">--separate-recs=<level> [default: 2] </code> |
| </span> |
| </dt> |
| <dd><p>Separate function recursions by at most <code class="option">level</code> levels. |
| See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| <dt> |
| <a name="opt.separate-recs-num"></a><span class="term"> |
| <code class="option">--separate-recs<number>=<function> </code> |
| </span> |
| </dt> |
| <dd><p>Separate <code class="option">number</code> recursions for <code class="option">function</code>. |
| See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd> |
| <dt> |
| <a name="opt.skip-plt"></a><span class="term"> |
| <code class="option">--skip-plt=<no|yes> [default: yes] </code> |
| </span> |
| </dt> |
| <dd><p>Ignore calls to/from PLT sections.</p></dd> |
| <dt> |
| <a name="opt.skip-direct-rec"></a><span class="term"> |
| <code class="option">--skip-direct-rec=<no|yes> [default: yes] </code> |
| </span> |
| </dt> |
| <dd><p>Ignore direct recursions.</p></dd> |
| <dt> |
| <a name="opt.fn-skip"></a><span class="term"> |
| <code class="option">--fn-skip=<function> </code> |
| </span> |
| </dt> |
| <dd> |
| <p>Ignore calls to/from a given function. E.g. if you have a |
| call chain A > B > C, and you specify function B to be |
| ignored, you will only see A > C.</p> |
| <p>This is very convenient to skip functions handling callback |
| behaviour. For example, with the signal/slot mechanism in the |
| Qt graphics library, you only want |
| to see the function emitting a signal to call the slots connected |
| to that signal. First, determine the real call chain to see the |
| functions needed to be skipped, then use this option.</p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.options.simulation"></a>6.3.5. Simulation options</h3></div></div></div> |
| <div class="variablelist"> |
| <a name="cl.opts.list.simulation"></a><dl class="variablelist"> |
| <dt> |
| <a name="clopt.cache-sim"></a><span class="term"> |
| <code class="option">--cache-sim=<yes|no> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>Specify if you want to do full cache simulation. By default, |
| only instruction read accesses will be counted ("Ir"). |
| With cache simulation, further event counters are enabled: |
| Cache misses on instruction reads ("I1mr"/"ILmr"), |
| data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"), |
| data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw"). |
| For more information, see <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a>. |
| </p></dd> |
| <dt> |
| <a name="clopt.branch-sim"></a><span class="term"> |
| <code class="option">--branch-sim=<yes|no> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>Specify if you want to do branch prediction simulation. |
| Further event counters are enabled: Number of executed conditional |
| branches and related predictor misses ("Bc"/"Bcm"), executed indirect |
| jumps and related misses of the jump address predictor ("Bi"/"Bim"). |
| </p></dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect2"> |
| <div class="titlepage"><div><div><h3 class="title"> |
| <a name="cl-manual.options.cachesimulation"></a>6.3.6. Cache simulation options</h3></div></div></div> |
| <div class="variablelist"> |
| <a name="cl.opts.list.cachesimulation"></a><dl class="variablelist"> |
| <dt> |
| <a name="opt.simulate-wb"></a><span class="term"> |
| <code class="option">--simulate-wb=<yes|no> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>Specify whether write-back behavior should be simulated, allowing |
| LL cache misses with and without write backs to be distinguished. |
| The cache model of Cachegrind/Callgrind does not specify write-through |
| vs. write-back behavior, and this also is not relevant for the number |
| of generated miss counts. However, with explicit write-back simulation |
| it can be decided whether a miss triggers not only the loading of a new |
| cache line, but also if a write back of a dirty cache line had to take |
| place before. The new dirty miss events are ILdmr, DLdmr, and DLdmw, |
| for misses because of instruction read, data read, and data write, |
| respectively. As they produce two memory transactions, they should |
| account for a doubled time estimation in relation to a normal miss. |
| </p></dd> |
| <dt> |
| <a name="opt.simulate-hwpref"></a><span class="term"> |
| <code class="option">--simulate-hwpref=<yes|no> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>Specify whether simulation of a hardware prefetcher should be |
| added. The prefetcher is able to detect stream access in the second level cache |
| by comparing accesses separately for each page. |
| As the simulation cannot decide about any timing issues of prefetching, |
| it is assumed that any hardware prefetch triggered succeeds before a |
| real access is done. Thus, this gives a best-case scenario by covering |
| all possible stream accesses.</p></dd> |
| <dt> |
| <a name="opt.cacheuse"></a><span class="term"> |
| <code class="option">--cacheuse=<yes|no> [default: no] </code> |
| </span> |
| </dt> |
| <dd><p>Specify whether cache line use should be collected. For every |
| cache line, from loading to it being evicted, the number of accesses |
| as well as the number of actually used bytes is determined. This |
| behavior is related to the code which triggered loading of the cache |
| line. In contrast to miss counters, which show the position where |
| the symptoms of bad cache behavior (i.e. latencies) happen, the |
| use counters try to pinpoint the reason (i.e. the code with the |
| bad access behavior). The new counters are defined in a way such |
| that worse behavior results in higher cost. |
| AcCost1 and AcCost2 are counters showing bad temporal locality |
| for L1 and LL caches, respectively. This is done by summing up |
| reciprocal values of the numbers of accesses of each cache line, |
| multiplied by 1000 (as only integer costs are allowed). E.g. for |
| a given source line with 5 read accesses, a value of 5000 AcCost |
| means that for every access, a new cache line was loaded and directly |
| evicted afterwards without further accesses. Similarly, SpLoss1/2 |
| shows bad spatial locality for L1 and LL caches, respectively. It |
| gives the <span class="emphasis"><em>spatial loss</em></span> count of bytes which |
| were loaded into cache but never accessed. It pinpoints code |
| accessing data in a way such that cache space is wasted. This hints |
| at bad layout of data structures in memory. Assuming a cache line |
| size of 64 bytes and 100 L1 misses for a given source line, the |
| loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a |
| value of 3200 for this line, this means that half of the loaded data was |
| never used, or that, with a better data layout, only half of the cache |
| space would have been needed. |
| Please note that for cache line use counters, it currently is |
| not possible to provide meaningful inclusive costs. Therefore, |
| inclusive cost of these counters should be ignored. |
| </p></dd> |
| <dt> |
| <a name="opt.I1"></a><span class="term"> |
| <code class="option">--I1=<size>,<associativity>,<line size> </code> |
| </span> |
| </dt> |
| <dd><p>Specify the size, associativity and line size of the level 1 |
| instruction cache. </p></dd> |
| <dt> |
| <a name="opt.D1"></a><span class="term"> |
| <code class="option">--D1=<size>,<associativity>,<line size> </code> |
| </span> |
| </dt> |
| <dd><p>Specify the size, associativity and line size of the level 1 |
| data cache.</p></dd> |
| <dt> |
| <a name="opt.LL"></a><span class="term"> |
| <code class="option">--LL=<size>,<associativity>,<line size> </code> |
| </span> |
| </dt> |
| <dd><p>Specify the size, associativity and line size of the last-level |
| cache.</p></dd> |
| </dl> |
| </div> |
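| <p>The arithmetic behind the <code class="option">--cacheuse</code> counters |
| described above can be sketched as follows. This only mirrors the two worked |
| examples in the text; the function names are hypothetical, and the use of |
| truncating integer division is an assumption, not a statement about Callgrind's |
| internals.</p> |

```c
#include <stddef.h>

/* AcCost-style value: sum of 1000 / accesses over all cache-line lifetimes.
   accesses[i] is the number of accesses line i saw between load and eviction. */
long ac_cost(const unsigned *accesses, size_t count)
{
    long cost = 0;
    for (size_t i = 0; i < count; i++) {
        if (accesses[i] > 0)
            cost += 1000 / (long)accesses[i];  /* assumed integer rounding */
    }
    return cost;
}

/* Bytes loaded into the cache: every miss loads one full cache line. */
long loaded_bytes(long misses, long line_size)
{
    return misses * line_size;
}

/* Percentage of loaded bytes that were never accessed (SpLoss / loaded). */
long unused_percent(long sp_loss, long misses, long line_size)
{
    long loaded = loaded_bytes(misses, line_size);
    return loaded > 0 ? (sp_loss * 100) / loaded : 0;
}
```

| <p>Five cache lines that are each accessed once give an AcCost of 5000, and 100 |
| misses with 64-byte lines load 6400 bytes, so an SpLoss1 of 3200 corresponds to |
| 50 percent unused data.</p> |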
| </div> |
| </div> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.monitor-commands"></a>6.4. Callgrind Monitor Commands</h2></div></div></div> |
| <p>The Callgrind tool provides monitor commands handled by the Valgrind |
| gdbserver (see <a class="xref" href="manual-core-adv.html#manual-core-adv.gdbserver-commandhandling" title="3.2.5. Monitor command handling by the Valgrind gdbserver">Monitor command handling by the Valgrind gdbserver</a>). |
| </p> |
| <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| <li class="listitem"><p><code class="varname">dump [<dump_hint>]</code> requests to dump the |
| profile data. </p></li> |
| <li class="listitem"><p><code class="varname">zero</code> requests to zero the profile data |
| counters. </p></li> |
| <li class="listitem"><p><code class="varname">instrumentation [on|off]</code> requests to set |
| (if parameter on/off is given) or get the current instrumentation state. |
| </p></li> |
| <li class="listitem"><p><code class="varname">status</code> requests to print out some status |
| information.</p></li> |
| </ul></div> |
| </div> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.clientrequests"></a>6.5. Callgrind specific client requests</h2></div></div></div> |
| <p>Callgrind provides the following specific client requests in |
| <code class="filename">callgrind.h</code>. See that file for the exact details of |
| their arguments.</p> |
| <div class="variablelist"> |
| <a name="cl.clientrequests.list"></a><dl class="variablelist"> |
| <dt> |
| <a name="cr.dump-stats"></a><span class="term"> |
| <code class="computeroutput">CALLGRIND_DUMP_STATS</code> |
| </span> |
| </dt> |
| <dd><p>Force generation of a profile dump at specified position |
| in code, for the current thread only. Written counters will be reset |
| to zero.</p></dd> |
| <dt> |
| <a name="cr.dump-stats-at"></a><span class="term"> |
| <code class="computeroutput">CALLGRIND_DUMP_STATS_AT(string)</code> |
| </span> |
| </dt> |
| <dd><p>Same as <code class="computeroutput">CALLGRIND_DUMP_STATS</code>, |
| but allows a string to be specified so that profile dumps can be |
| distinguished.</p></dd> |
| <dt> |
| <a name="cr.zero-stats"></a><span class="term"> |
| <code class="computeroutput">CALLGRIND_ZERO_STATS</code> |
| </span> |
| </dt> |
| <dd><p>Reset the profile counters for the current thread to zero.</p></dd> |
| <dt> |
| <a name="cr.toggle-collect"></a><span class="term"> |
| <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT</code> |
| </span> |
| </dt> |
| <dd><p>Toggle the collection state. This allows events to be ignored |
| with regard to profile counters. See also options |
| <code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a></code> and |
| <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>.</p></dd> |
| <dt> |
| <a name="cr.start-instr"></a><span class="term"> |
| <code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code> |
| </span> |
| </dt> |
| <dd><p>Start full Callgrind instrumentation if not already enabled. |
| When cache simulation is done, this will flush the simulated cache |
| and lead to an artificial cache warmup phase afterwards with |
| cache misses which would not have happened in reality. See also |
| option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd> |
| <dt> |
| <a name="cr.stop-instr"></a><span class="term"> |
| <code class="computeroutput">CALLGRIND_STOP_INSTRUMENTATION</code> |
| </span> |
| </dt> |
| <dd><p>Stop full Callgrind instrumentation if not already disabled. |
| This flushes Valgrind's translation cache, and does no additional |
| instrumentation afterwards: it effectively will run at the same |
| speed as Nulgrind, i.e. at minimal slowdown. Use this to |
| speed up the Callgrind run for uninteresting code parts. Use |
| <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code> to |
| enable instrumentation again. See also option |
| <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd> |
| </dl> |
| </div> |
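| <p>The following sketch combines several of the requests listed above to profile |
| one phase of a program and write a labelled dump for it. It assumes the header |
| is installed as <code class="filename">valgrind/callgrind.h</code> and that the |
| run was started with <code class="option">--instr-atstart=no</code>; the |
| fallback macros and the names <code class="computeroutput">run_phase</code> and |
| <code class="computeroutput">sample_phase</code> are hypothetical.</p> |

```c
/* Assumption: the Callgrind header lives at <valgrind/callgrind.h>. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
/* No-op fallbacks so the sketch also builds without the header. */
#ifndef CALLGRIND_START_INSTRUMENTATION
#  define CALLGRIND_START_INSTRUMENTATION ((void)0)
#endif
#ifndef CALLGRIND_STOP_INSTRUMENTATION
#  define CALLGRIND_STOP_INSTRUMENTATION ((void)0)
#endif
#ifndef CALLGRIND_DUMP_STATS_AT
#  define CALLGRIND_DUMP_STATS_AT(label) ((void)(label))
#endif

/* Hypothetical workload standing in for an interesting program phase. */
long sample_phase(long n)
{
    return n * 2;
}

/* Instrument only the given phase, then dump its counters with a label
   so that the resulting profile dump can be told apart from others. */
long run_phase(const char *label, long (*phase)(long), long arg)
{
    long result;
    CALLGRIND_START_INSTRUMENTATION;   /* full instrumentation from here on */
    result = phase(arg);
    CALLGRIND_DUMP_STATS_AT(label);    /* dump; written counters are reset */
    CALLGRIND_STOP_INSTRUMENTATION;    /* back to minimal slowdown */
    return result;
}
```

| <p>When cache simulation is enabled, keep in mind that starting instrumentation |
| this way begins with an empty simulated cache, as noted for |
| <code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code>.</p> |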
| </div> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.callgrind_annotate-options"></a>6.6. callgrind_annotate Command-line Options</h2></div></div></div> |
| <div class="variablelist"> |
| <a name="callgrind_annotate.opts.list"></a><dl class="variablelist"> |
| <dt><span class="term"><code class="option">-h --help</code></span></dt> |
| <dd><p>Show summary of options.</p></dd> |
| <dt><span class="term"><code class="option">--version</code></span></dt> |
| <dd><p>Show version of callgrind_annotate.</p></dd> |
| <dt><span class="term"> |
| <code class="option">--show=A,B,C [default: all]</code> |
| </span></dt> |
| <dd><p>Only show figures for events A,B,C.</p></dd> |
| <dt><span class="term"> |
| <code class="option">--sort=A,B,C</code> |
| </span></dt> |
| <dd> |
| <p>Sort columns by events A,B,C [event column order].</p> |
| <p>Optionally, each event is followed by a : and a threshold, |
| to specify different thresholds depending on the event.</p> |
| </dd> |
| <dt><span class="term"> |
| <code class="option">--threshold=<0--100> [default: 99%] </code> |
| </span></dt> |
| <dd><p>Percentage of counts (of primary sort event) we are |
| interested in.</p></dd> |
| <dt><span class="term"> |
| <code class="option">--auto=<yes|no> [default: no] </code> |
| </span></dt> |
| <dd><p>Annotate all source files containing functions that helped |
| reach the event count threshold.</p></dd> |
| <dt><span class="term"> |
| <code class="option">--context=N [default: 8] </code> |
| </span></dt> |
| <dd><p>Print N lines of context before and after annotated |
| lines.</p></dd> |
| <dt><span class="term"> |
| <code class="option">--inclusive=<yes|no> [default: no] </code> |
| </span></dt> |
| <dd><p>Add subroutine costs to function calls.</p></dd> |
| <dt><span class="term"> |
| <code class="option">--tree=<none|caller|calling|both> [default: none] </code> |
| </span></dt> |
| <dd><p>For each function, print its callers, the functions it calls, |
| or both.</p></dd> |
| <dt><span class="term"> |
| <code class="option">-I, --include=<dir> </code> |
| </span></dt> |
| <dd><p>Add <code class="option">dir</code> to the list of directories to search |
| for source files.</p></dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect1"> |
| <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| <a name="cl-manual.callgrind_control-options"></a>6.7. callgrind_control Command-line Options</h2></div></div></div> |
| <p>By default, callgrind_control acts on all programs run by the |
| current user under Callgrind. It is possible to limit the actions to |
| specified Callgrind runs by providing a list of pids or program names as |
| argument. The default action is to give some brief information about the |
| applications being run under Callgrind.</p> |
| <div class="variablelist"> |
| <a name="callgrind_control.opts.list"></a><dl class="variablelist"> |
| <dt><span class="term"><code class="option">-h --help</code></span></dt> |
| <dd><p>Show a short description, usage, and summary of options.</p></dd> |
| <dt><span class="term"><code class="option">--version</code></span></dt> |
| <dd><p>Show version of callgrind_control.</p></dd> |
| <dt><span class="term"><code class="option">-l --long</code></span></dt> |
| <dd><p>Show also the working directory, in addition to the brief |
| information given by default. |
| </p></dd> |
| <dt><span class="term"><code class="option">-s --stat</code></span></dt> |
| <dd><p>Show statistics information about active Callgrind runs.</p></dd> |
| <dt><span class="term"><code class="option">-b --back</code></span></dt> |
| <dd><p>Show stack/back traces of each thread in active Callgrind runs. For |
| each active function in the stack trace, also the number of invocations |
| since program start (or last dump) is shown. This option can be |
| combined with -e to show inclusive cost of active functions.</p></dd> |
| <dt><span class="term"><code class="option">-e [A,B,...] </code> (default: all)</span></dt> |
| <dd><p>Show the current per-thread, exclusive cost values of event |
| counters. If no explicit event names are given, figures for all event |
| types which are collected in the given Callgrind run are |
| shown. Otherwise, only figures for event types A, B, ... are shown. If |
| this option is combined with -b, inclusive cost for the functions of |
| each active stack frame is provided, too. |
| </p></dd> |
| <dt><span class="term"><code class="option">--dump[=<desc>] </code> (default: no description)</span></dt> |
| <dd><p>Request the dumping of profile information. Optionally, a |
| description can be specified which is written into the dump as part of |
| the information giving the reason which triggered the dump action. This |
| can be used to distinguish multiple dumps.</p></dd> |
| <dt><span class="term"><code class="option">-z --zero</code></span></dt> |
| <dd><p>Zero all event counters.</p></dd> |
| <dt><span class="term"><code class="option">-k --kill</code></span></dt> |
| <dd><p>Force a Callgrind run to be terminated.</p></dd> |
| <dt><span class="term"><code class="option">--instr=<on|off></code></span></dt> |
| <dd><p>Switch instrumentation mode on or off. If a Callgrind run has |
| instrumentation disabled, no simulation is done and no events are |
| counted. This is useful to skip uninteresting program parts, as there |
| is much less slowdown (same as with the Valgrind tool "none"). See also |
| the Callgrind option <code class="option">--instr-atstart</code>.</p></dd> |
| <dt><span class="term"><code class="option">--vgdb-prefix=<prefix></code></span></dt> |
| <dd><p>Specify the vgdb prefix to use by callgrind_control. |
| callgrind_control internally uses vgdb to find and control the active |
| Callgrind runs. If the <code class="option">--vgdb-prefix</code> option was used |
| for launching valgrind, then the same option must be given to |
| callgrind_control.</p></dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div> |
| <br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer"> |
| <tr> |
| <td rowspan="2" width="40%" align="left"> |
| <a accesskey="p" href="cg-manual.html"><< 5. Cachegrind: a cache and branch-prediction profiler</a> </td> |
| <td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td> |
| <td rowspan="2" width="40%" align="right"> <a accesskey="n" href="hg-manual.html">7. Helgrind: a thread error detector >></a> |
| </td> |
| </tr> |
| <tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr> |
| </table> |
| </div> |
| </body> |
| </html> |