<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>6. Callgrind: a call-graph generating cache and branch prediction profiler</title>
<link rel="stylesheet" type="text/css" href="vg_basic.css">
<meta name="generator" content="DocBook XSL Stylesheets V1.78.1">
<link rel="home" href="index.html" title="Valgrind Documentation">
<link rel="up" href="manual.html" title="Valgrind User Manual">
<link rel="prev" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">
<link rel="next" href="hg-manual.html" title="7. Helgrind: a thread error detector">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
<td width="22px" align="center" valign="middle"><a accesskey="p" href="cg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td>
<th align="center" valign="middle">Valgrind User Manual</th>
<td width="22px" align="center" valign="middle"><a accesskey="n" href="hg-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
</tr></table></div>
<div class="chapter">
<div class="titlepage"><div><div><h1 class="title">
<a name="cl-manual"></a>6. Callgrind: a call-graph generating cache and branch prediction profiler</h1></div></div></div>
<div class="toc">
<p><b>Table of Contents</b></p>
<dl class="toc">
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.use">6.1. Overview</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.functionality">6.1.1. Functionality</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.basics">6.1.2. Basic Usage</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.usage">6.2. Advanced Usage</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.dumps">6.2.1. Multiple profiling dumps from one program run</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.limits">6.2.2. Limiting the range of collected events</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.busevents">6.2.3. Counting global bus events</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.cycles">6.2.4. Avoiding cycles</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.forkingprograms">6.2.5. Forking Programs</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.options">6.3. Callgrind Command-line Options</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.creation">6.3.1. Dump creation options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.activity">6.3.2. Activity options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.collection">6.3.3. Data collection options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.separation">6.3.4. Cost entity separation options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.simulation">6.3.5. Simulation options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.cachesimulation">6.3.6. Cache simulation options</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.monitor-commands">6.4. Callgrind Monitor Commands</a></span></dt>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.clientrequests">6.5. Callgrind specific client requests</a></span></dt>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_annotate-options">6.6. callgrind_annotate Command-line Options</a></span></dt>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_control-options">6.7. callgrind_control Command-line Options</a></span></dt>
</dl>
</div>
<p>To use this tool, you must specify
<code class="option">--tool=callgrind</code> on the
Valgrind command line.</p>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.use"></a>6.1. Overview</h2></div></div></div>
<p>Callgrind is a profiling tool that records the call history among
functions in a program's run as a call-graph.
By default, the collected data consists of
the number of instructions executed, their relationship
to source lines, the caller/callee relationship between functions,
and the numbers of such calls.
Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
can produce further information about the runtime behavior of an application.
</p>
<p>The profile data is written out to a file at program
termination. For presentation of the data, and interactive control
of the profiling, two command line tools are provided:</p>
<div class="variablelist"><dl class="variablelist">
<dt><span class="term"><span class="command"><strong>callgrind_annotate</strong></span></span></dt>
<dd>
<p>This command reads in the profile data and prints a
sorted list of functions, optionally with source annotation.</p>
<p>For graphical visualization of the data, try
<a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>, which is a KDE/Qt-based
GUI that makes it easy to navigate the large amount of data that
Callgrind produces.</p>
</dd>
<dt><span class="term"><span class="command"><strong>callgrind_control</strong></span></span></dt>
<dd><p>This command enables you to interactively observe and control
the status of a program currently running under Callgrind's control,
without stopping the program. You can get statistics as
well as the current stack trace, and you can request zeroing of counters
or dumping of profile data.</p></dd>
</dl></div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.functionality"></a>6.1.1. Functionality</h3></div></div></div>
<p>Cachegrind collects flat profile data: event counts (data reads,
cache misses, etc.) are attributed directly to the function they
occurred in. This cost attribution mechanism is
called <span class="emphasis"><em>self</em></span> or <span class="emphasis"><em>exclusive</em></span>
attribution.</p>
<p>Callgrind extends this functionality by propagating costs
across function call boundaries. If function <code class="function">foo</code> calls
<code class="function">bar</code>, the costs from <code class="function">bar</code> are added into
<code class="function">foo</code>'s costs. When applied to the program as a whole,
this builds up a picture of so-called <span class="emphasis"><em>inclusive</em></span>
costs, in which the cost of each function includes the costs of
all functions it called, directly or indirectly.</p>
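<p>The difference between the two attribution schemes can be sketched in a few lines of C. The sketch assumes a hypothetical call tree in which every function has exactly one caller, so a function's inclusive cost is simply its self cost plus the inclusive costs of its callees. The function names and cost values are made up for illustration, and this is not how Callgrind itself computes costs: as soon as a callee is shared by several callers, costs have to be tracked per call arc instead.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

/* Hypothetical call tree: main calls foo and baz, foo calls bar. */
enum { MAIN, FOO, BAR, BAZ, NFUNCS };

static const char *const names[NFUNCS] = { "main", "foo", "bar", "baz" };

/* Self (exclusive) cost of each function, e.g. instructions executed. */
static const unsigned long self_cost[NFUNCS] = { 10, 20, 300, 5 };

/* calls[f][g] != 0 means that function f calls function g. */
static const int calls[NFUNCS][NFUNCS] = {
    [MAIN] = { [FOO] = 1, [BAZ] = 1 },
    [FOO]  = { [BAR] = 1 },
};

/* Inclusive cost: self cost plus the inclusive cost of every callee.
   Only valid for an acyclic call tree; a cycle would recurse forever. */
unsigned long inclusive_cost(int f)
{
    unsigned long total = self_cost[f];

    for (int g = 0; g &lt; NFUNCS; g++)
        if (calls[f][g])
            total += inclusive_cost(g);
    return total;
}

/* Print both kinds of cost for every function. */
void print_costs(void)
{
    for (int f = 0; f &lt; NFUNCS; f++)
        printf("%-5s self=%4lu inclusive=%4lu\n",
               names[f], self_cost[f], inclusive_cost(f));
}
</pre>
<p>In this made-up program, <code class="function">main</code> has a self cost of only 10 but an inclusive cost of 335 (10 + 20 + 300 + 5), the total program cost, while <code class="function">bar</code> has by far the highest self cost.</p>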
<p>As an example, the inclusive cost of
<code class="function">main</code> should be almost 100 percent
of the total program cost. Because of costs arising before
<code class="function">main</code> is run, such as
initialization of the run time linker and construction of global C++
objects, the inclusive cost of <code class="function">main</code>
is not exactly 100 percent of the total program cost.</p>
<p>Together with the call graph, this allows you to find the
specific call chains starting from
<code class="function">main</code> in which the majority of the
program's costs occur. Caller/callee cost attribution is also useful
for profiling functions called from multiple call sites, and where
optimization opportunities depend on changing code in the callers, in
particular by reducing the call count.</p>
<p>Callgrind's cache simulation is based on that of Cachegrind.
Read the documentation for <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a> first. The material
below describes the features supported in addition to Cachegrind's
features.</p>
<p>Callgrind's ability to detect function calls and returns depends
on the instruction set of the platform it is run on. It works best on
x86 and amd64, and unfortunately currently does not work so well on
PowerPC, ARM, Thumb or MIPS code. This is because there are no explicit
call or return instructions in these instruction sets, so Callgrind
has to rely on heuristics to detect calls and returns.</p>
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.basics"></a>6.1.2. Basic Usage</h3></div></div></div>
<p>As with Cachegrind, you probably want to compile with debugging info
(the <code class="option">-g</code> option) and with optimization turned on.</p>
<p>To start a profile run for a program, execute:
</p>
<pre class="screen">valgrind --tool=callgrind [callgrind options] your-program [program options]</pre>
<p>
</p>
<p>While the simulation is running, you can observe execution with:
</p>
<pre class="screen">callgrind_control -b</pre>
<p>
This will print out the current backtrace. To annotate the backtrace with
event counts, run
</p>
<pre class="screen">callgrind_control -e -b</pre>
<p>
</p>
<p>After program termination, a profile data file named
<code class="computeroutput">callgrind.out.&lt;pid&gt;</code>
is generated, where <span class="emphasis"><em>pid</em></span> is the process ID
of the program being profiled.
The data file contains information about the calls made in the
program among the functions executed, together with
<span class="command"><strong>Instruction Read</strong></span> (Ir) event counts.</p>
<p>To generate a function-by-function summary from the profile
data file, use
</p>
<pre class="screen">callgrind_annotate [options] callgrind.out.&lt;pid&gt;</pre>
<p>
This summary is similar to the output you get from a Cachegrind
run with cg_annotate: the list
of functions is ordered by the exclusive cost of each function,
and these exclusive costs are also the ones shown.
Two options are important for the additional features of Callgrind:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><code class="option">--inclusive=yes</code>: Instead of using
exclusive cost of functions as sorting order, use and show
inclusive cost.</p></li>
<li class="listitem"><p><code class="option">--tree=both</code>: Interleave into the
top-level list of functions information on the callers and the callees
of each function. In these lines, which represent executed
calls, the cost gives the number of events spent in the call.
Indented, above each function, there is the list of callers,
and below, the list of callees. The sum of events in calls to
a given function (caller lines) gives the total inclusive cost of
the function, and so does the sum of events in calls from the
function (callee lines) together with its self cost.</p></li>
</ul></div>
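<p>The relation between caller lines, callee lines and inclusive cost can be checked with a small worked example. The C sketch below uses a hypothetical function with two caller lines and two callee lines; all numbers are invented, and the code only redoes the arithmetic described above rather than parsing real <code class="computeroutput">callgrind_annotate</code> output.</p>
<pre class="programlisting">
#include &lt;stddef.h&gt;

/* Hypothetical --tree=both entry for one function: events spent in each
   call to it (caller lines) and in each call it makes (callee lines). */
static const unsigned long caller_lines[] = { 700, 300 };
static const unsigned long callee_lines[] = { 450, 250 };
static const unsigned long self_events = 300;

static unsigned long sum(const unsigned long *v, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i &lt; n; i++)
        total += v[i];
    return total;
}

/* Inclusive cost seen from the callers: all events spent in calls to it. */
unsigned long inclusive_via_callers(void)
{
    return sum(caller_lines, sizeof caller_lines / sizeof caller_lines[0]);
}

/* Inclusive cost seen from the function: self cost plus all events
   spent in the calls it makes. */
unsigned long inclusive_via_callees(void)
{
    return self_events
           + sum(callee_lines, sizeof callee_lines / sizeof callee_lines[0]);
}
</pre>
<p>Both functions return 1000 for these numbers: 700 + 300 on the caller side, and 300 + 450 + 250 for the self cost plus the callee side.</p>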
<p>Use <code class="option">--auto=yes</code> to get annotated source code
for all relevant functions for which the source can be found. In
addition to source annotation as produced by
<code class="computeroutput">cg_annotate</code>, you will see the
annotated call sites with call counts. For all other options,
consult the (Cachegrind) documentation for
<code class="computeroutput">cg_annotate</code>.
</p>
<p>For a better call graph browsing experience, it is highly recommended
to use <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>.
If your code
has a significant fraction of its cost in <span class="emphasis"><em>cycles</em></span> (sets
of functions calling each other in a recursive manner), you have to
use KCachegrind, as <code class="computeroutput">callgrind_annotate</code>
currently does not do any cycle detection, which is needed to get correct
results in this case.</p>
<p>If you are additionally interested in measuring the
cache behavior of your program, use Callgrind with the option
<code class="option"><a class="xref" href="cl-manual.html#clopt.cache-sim">--cache-sim</a>=yes</code>. For
branch prediction simulation, use <code class="option"><a class="xref" href="cl-manual.html#clopt.branch-sim">--branch-sim</a>=yes</code>.
Expect a further slowdown of approximately a factor of 2.</p>
<p>If the program section you want to profile is somewhere in the
middle of the run, it is beneficial to
<span class="emphasis"><em>fast forward</em></span> to this section without any
profiling, and then enable profiling. This is achieved by using
the command line option
<code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>
and running, in a shell,
<code class="computeroutput">callgrind_control -i on</code> just before the
interesting code section is executed. To specify exactly
the code position where profiling should start, use the client request
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code>.</p>
<p>If you want to be able to see assembly code level annotation, specify
<code class="option"><a class="xref" href="cl-manual.html#opt.dump-instr">--dump-instr</a>=yes</code>. This will produce
profile data at instruction granularity. Note that the resulting profile
data can only be viewed with KCachegrind. For assembly annotation, it is also
interesting to see more details of the control flow inside of functions,
i.e. (conditional) jumps. These will be collected by further specifying
<code class="option"><a class="xref" href="cl-manual.html#opt.collect-jumps">--collect-jumps</a>=yes</code>.</p>
</div>
</div>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.usage"></a>6.2. Advanced Usage</h2></div></div></div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.dumps"></a>6.2.1. Multiple profiling dumps from one program run</h3></div></div></div>
<p>Sometimes you are not interested in characteristics of a full
program run, but only of a small part of it, for example execution of one
algorithm. If there are multiple algorithms, or one algorithm
running with different input data, it may even be useful to get different
profile information for different parts of a single program run.</p>
<p>Profile data files have names of the form
</p>
<pre class="screen">
callgrind.out.<span class="emphasis"><em>pid</em></span>.<span class="emphasis"><em>part</em></span>-<span class="emphasis"><em>threadID</em></span>
</pre>
<p>
</p>
<p>where <span class="emphasis"><em>pid</em></span> is the PID of the running
program, <span class="emphasis"><em>part</em></span> is a number incremented on each
dump (".part" is skipped for the dump at program termination), and
<span class="emphasis"><em>threadID</em></span> is a thread identification
("-threadID" is only used if you request dumps of individual
threads with <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>).</p>
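<p>The naming rule can be written down as a small C helper. The function <code class="function">dump_file_name</code> below is hypothetical and not part of Valgrind; it uses 0 to mean "no part number" (the dump at program termination) and "no thread ID" (a run without <code class="option">--separate-threads=yes</code>). The exact number formatting Callgrind uses, for example for thread IDs, may differ from this sketch.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

/* Hypothetical helper building callgrind.out.pid[.part][-threadID].
   part == 0:      dump at program termination, ".part" is omitted.
   thread_id == 0: threads not dumped separately, "-threadID" is omitted.
   Returns the length of the full name, as snprintf does. */
int dump_file_name(char *buf, size_t size, int pid, int part, int thread_id)
{
    char part_str[16] = "";
    char thread_str[16] = "";

    if (part &gt; 0)
        snprintf(part_str, sizeof part_str, ".%d", part);
    if (thread_id &gt; 0)
        snprintf(thread_str, sizeof thread_str, "-%d", thread_id);

    return snprintf(buf, size, "callgrind.out.%d%s%s",
                    pid, part_str, thread_str);
}
</pre>
<p>Under these assumptions, a process with PID 1234 would produce <code class="computeroutput">callgrind.out.1234.2-3</code> for the second dump of thread 3, and <code class="computeroutput">callgrind.out.1234</code> for the final dump of a run without separate threads.</p>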
<p>There are different ways to generate multiple profile dumps
while a program is running under Callgrind's supervision. However,
all methods trigger the same action, which is "dump all profile
information since the last dump or program start, and zero cost
counters afterwards". To allow for zeroing cost counters without
dumping, there is a second action "zero all cost counters now".
The different methods are:</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem"><p><span class="command"><strong>Dump on program termination.</strong></span>
This method is the standard way and doesn't need any special
action on your part.</p></li>
<li class="listitem">
<p><span class="command"><strong>Spontaneous, interactive dumping.</strong></span> Use
</p>
<pre class="screen">callgrind_control -d [hint [PID/Name]]</pre>
<p> to
request the dumping of profile information of the supervised
application with PID or Name. <span class="emphasis"><em>hint</em></span> is an
arbitrary string you can optionally specify to later be able to
distinguish profile dumps. The control program will not terminate
before the dump is completely written. Note that the application
must be actively running for the dump command to be detected. So,
for a GUI application, resize the window, or for a server, send a
request.</p>
<p>If you are using <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>
for browsing of profile information, you can use the toolbar
button <span class="command"><strong>Force dump</strong></span>. This will request a dump
and trigger a reload after the dump is written.</p>
</li>
<li class="listitem"><p><span class="command"><strong>Periodic dumping after execution of a specified
number of basic blocks</strong></span>. For this, use the command line
option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-every-bb">--dump-every-bb</a>=count</code>.
</p></li>
<li class="listitem">
<p><span class="command"><strong>Dumping at enter/leave of specified functions.</strong></span>
Use the
option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>
and <code class="option"><a class="xref" href="cl-manual.html#opt.dump-after">--dump-after</a>=function</code>.
To zero cost counters before entering a function, use
<code class="option"><a class="xref" href="cl-manual.html#opt.zero-before">--zero-before</a>=function</code>.</p>
<p>You can specify these options multiple times for different
functions. Function specifications support wildcards: e.g. use
<code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>='foo*'</code> to
generate dumps before entering any function starting with
<span class="emphasis"><em>foo</em></span>.</p>
</li>
<li class="listitem"><p><span class="command"><strong>Program controlled dumping.</strong></span>
Insert
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.dump-stats">CALLGRIND_DUMP_STATS</a>;</code>
at the position in your code where you want a profile dump to happen. Use
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> to only
zero profile counters.
See <a class="xref" href="cl-manual.html#cl-manual.clientrequests" title="6.5. Callgrind specific client requests">Client request reference</a> for more information on
Callgrind specific client requests.</p></li>
</ul></div>
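<p>A sketch of program-controlled dumping for the "one algorithm per dump" case described at the start of this section. It assumes the Valgrind header <code class="computeroutput">valgrind/callgrind.h</code> is installed; if it is not found, the sketch falls back to empty placeholder definitions so that it still compiles. The two workload functions are hypothetical. When the program runs outside of Valgrind, the real client request macros have no effect.</p>
<pre class="programlisting">
/* Use the real client requests if the Valgrind headers are available. */
#ifdef __has_include
#  if __has_include(&lt;valgrind/callgrind.h&gt;)
#    include &lt;valgrind/callgrind.h&gt;
#  endif
#endif

/* Placeholder no-ops, assumed only for builds without the headers. */
#ifndef CALLGRIND_ZERO_STATS
#  define CALLGRIND_ZERO_STATS do { } while (0)
#endif
#ifndef CALLGRIND_DUMP_STATS
#  define CALLGRIND_DUMP_STATS do { } while (0)
#endif

/* Two hypothetical algorithms working on the same input. */
static unsigned long sum_values(const unsigned *v, int n)
{
    unsigned long s = 0;

    for (int i = 0; i &lt; n; i++)
        s += v[i];
    return s;
}

static unsigned long sum_squares(const unsigned *v, int n)
{
    unsigned long s = 0;

    for (int i = 0; i &lt; n; i++)
        s += (unsigned long)v[i] * v[i];
    return s;
}

/* One profile dump per algorithm. Every dump also zeroes the counters,
   so only the first phase needs an explicit CALLGRIND_ZERO_STATS. */
unsigned long profile_phases(const unsigned *v, int n)
{
    unsigned long a, b;

    CALLGRIND_ZERO_STATS;   /* drop costs of everything before phase 1 */
    a = sum_values(v, n);
    CALLGRIND_DUMP_STATS;   /* this dump covers phase 1 only */

    b = sum_squares(v, n);
    CALLGRIND_DUMP_STATS;   /* this dump covers phase 2 only */

    return a + b;
}
</pre>
<p>Run under <code class="computeroutput">valgrind --tool=callgrind</code>, each call of <code class="function">profile_phases</code> should produce two additional numbered dump files, in addition to the final one written at program termination.</p>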
<p>If you are running a multi-threaded application and specify the
command line option <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>,
every thread will be profiled on its own and will create its own
profile dump. Thus, the last two methods will only generate one dump
of the currently running thread. With the other methods, you will get
multiple dumps (one for each thread) on a dump request.</p>
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.limits"></a>6.2.2. Limiting the range of collected events</h3></div></div></div>
<p>By default, whenever events are happening (such as an
instruction execution or cache hit/miss), Callgrind aggregates
them into event counters. However, you may be interested only in
what is happening within a given function or starting from a given
program phase. To this end, you can disable event aggregation for
uninteresting program parts. While attribution of events to
functions as well as producing separate output per program phase
can be done by other means (see previous section), there are two
benefits to disabling aggregation. First, it is very
fine-grained (e.g. just for a loop within a function). Second,
disabling event aggregation for complete program phases makes it possible to
switch off time-consuming cache simulation and allows Callgrind to
progress at much higher speed, with a slowdown of around a factor of 2
(identical to <code class="computeroutput">valgrind
--tool=none</code>).
</p>
<p>There are two aspects which influence whether Callgrind
aggregates events at a given point in program execution.
First, there is the <span class="emphasis"><em>collection state</em></span>. If this
is off, no aggregation will be done. By changing the collection
state, you can control event aggregation at a very fine
granularity. However, changing it makes little difference to
Callgrind's execution speed. By default, collection is switched
on, but it can be disabled by different means (see below). Second,
there is the <span class="emphasis"><em>instrumentation mode</em></span> in which
Callgrind is running. This mode can be either on or off. If
instrumentation is off, the actions of the program are not observed,
so nothing is forwarded to the simulator that could trigger events.
As a result, no events are aggregated. The huge benefit is the much
higher speed with instrumentation switched off. However, this should only be used
with care and in a coarse fashion: every mode change resets the
simulator state (i.e. whether a memory block is cached or not) and
flushes Valgrind's internal cache of instrumented code blocks,
resulting in a latency penalty at switching time. Also, cache
simulator results directly after switching on instrumentation will
be skewed by cache misses which would not happen in
reality (if you care about this warm-up effect, you should make
sure to temporarily have the collection state switched off directly
after turning instrumentation mode on). However, switching the
instrumentation state is very useful to skip larger program phases
such as an initialization phase. By default, instrumentation is
switched on, but as with the collection state, it can be changed by
various means.
</p>
<p>Callgrind can start with instrumentation mode switched off by
specifying the
option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>.
Afterwards, instrumentation can be controlled in two ways: first,
interactively with: </p>
<pre class="screen">callgrind_control -i on</pre>
<p> (and
switching off again by specifying "off" instead of "on"). Second,
the instrumentation state can be changed programmatically with the
macros <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a>;</code>
and <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</a>;</code>.
</p>
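<p>The following C sketch shows the programmatic variant for skipping an initialization phase and is intended to be run with <code class="option">--instr-atstart=no</code>. It assumes <code class="computeroutput">valgrind/callgrind.h</code> is available; the fallback definitions are placeholders for builds without it, and <code class="function">init_table</code> and <code class="function">scan_table</code> are hypothetical.</p>
<pre class="programlisting">
#ifdef __has_include
#  if __has_include(&lt;valgrind/callgrind.h&gt;)
#    include &lt;valgrind/callgrind.h&gt;
#  endif
#endif
/* Placeholder no-ops, assumed only for builds without the headers. */
#ifndef CALLGRIND_START_INSTRUMENTATION
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#endif
#ifndef CALLGRIND_STOP_INSTRUMENTATION
#  define CALLGRIND_STOP_INSTRUMENTATION do { } while (0)
#endif

#define TABLE_SIZE 4096u

static unsigned table[TABLE_SIZE];

/* Hypothetical initialization phase that should run uninstrumented. */
static void init_table(void)
{
    for (unsigned i = 0; i &lt; TABLE_SIZE; i++)
        table[i] = i * 2654435761u;   /* arbitrary filler values */
}

/* Hypothetical phase of interest: count the odd entries. */
static unsigned long scan_table(void)
{
    unsigned long odd = 0;

    for (unsigned i = 0; i &lt; TABLE_SIZE; i++)
        if (table[i] % 2u != 0)
            odd++;
    return odd;
}

/* With --instr-atstart=no, only scan_table() runs instrumented. */
unsigned long run_instrumented_scan(void)
{
    unsigned long odd;

    init_table();
    CALLGRIND_START_INSTRUMENTATION;
    odd = scan_table();
    CALLGRIND_STOP_INSTRUMENTATION;
    return odd;
}
</pre>
<p>Because switching instrumentation on resets the simulator state, cache simulation results at the start of <code class="function">scan_table</code> are subject to the warm-up effect described above.</p>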
<p>Similarly, the collection state at program start can be
switched off
by <code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a>=no</code>. During
execution, it can be controlled programmatically with the
macro <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code>.
Further, you can limit event collection to a specific function by
using <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a>=function</code>.
This will toggle the collection state on entering and leaving the
specified function. When this option is in effect, the default
collection state at program start is "off". Only events happening
while running inside of the given function will be
collected. Recursive calls of the given function do not trigger
any action. This option can be given multiple times to specify
different functions of interest.</p>
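<p>For very fine-grained control, here is a sketch that restricts collection to a single loop with <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT</code>. It assumes the program runs with collection initially off (<code class="option">--collect-atstart=no</code>), so the first toggle switches collection on and the second switches it off again. The header fallback and the function <code class="function">dot</code> are assumptions made for illustration.</p>
<pre class="programlisting">
#ifdef __has_include
#  if __has_include(&lt;valgrind/callgrind.h&gt;)
#    include &lt;valgrind/callgrind.h&gt;
#  endif
#endif
/* Placeholder no-op, assumed only for builds without the headers. */
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT do { } while (0)
#endif

/* Hypothetical function in which only the loop is of interest. */
long dot(const int *a, const int *b, int n)
{
    long acc = 0;

    CALLGRIND_TOGGLE_COLLECT;   /* collection: off -&gt; on */
    for (int i = 0; i &lt; n; i++)
        acc += (long)a[i] * b[i];
    CALLGRIND_TOGGLE_COLLECT;   /* collection: on -&gt; off */

    return acc;
}
</pre>
<p>Without changing the source, <code class="option">--toggle-collect=dot</code> would instead collect all events inside <code class="function">dot</code>, including the code before and after the loop.</p>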
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.busevents"></a>6.2.3. Counting global bus events</h3></div></div></div>
<p>For access to shared data among threads in multithreaded
code, synchronization is required to avoid race conditions.
Synchronization primitives are usually implemented via atomic instructions.
However, excessive use of such instructions can lead to performance
issues.</p>
<p>To enable analysis of this problem, Callgrind can optionally count
the number of atomic instructions executed. More precisely, for x86/x86_64,
these are instructions using a lock prefix. For architectures supporting
LL/SC (load-linked/store-conditional), this is the number of SC instructions
executed. For both, the term "global bus events" is used.</p>
<p>The short name of the event type used for global bus events is "Ge".
To count global bus events, use <code class="option"><a class="xref" href="cl-manual.html#clopt.collect-bus">--collect-bus</a>=yes</code>.
</p>
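<p>As an illustration of what is counted, consider the following C11 sketch. Each <code class="function">atomic_fetch_add</code> on a shared counter typically compiles to a lock-prefixed instruction on x86/x86_64, or to an LL/SC sequence on LL/SC architectures, so with <code class="option">--collect-bus=yes</code> the first function would be expected to show about one global bus event per iteration and the batched variant only a handful in total. The instructions actually emitted depend on the compiler and target, and the function names are hypothetical.</p>
<pre class="programlisting">
#include &lt;stdatomic.h&gt;

/* A counter that several threads might update concurrently. */
static atomic_ulong counter;

/* One atomic read-modify-write per iteration. */
unsigned long count_each(unsigned long n)
{
    for (unsigned long i = 0; i &lt; n; i++)
        atomic_fetch_add(&amp;counter, 1);
    return atomic_load(&amp;counter);
}

/* Accumulate privately, then publish with a single atomic operation. */
unsigned long count_batched(unsigned long n)
{
    unsigned long local = 0;

    for (unsigned long i = 0; i &lt; n; i++)
        local++;
    atomic_fetch_add(&amp;counter, local);
    return atomic_load(&amp;counter);
}
</pre>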
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.cycles"></a>6.2.4. Avoiding cycles</h3></div></div></div>
<p>Informally speaking, a cycle is a group of functions which
call each other in a recursive way.</p>
<p>Formally speaking, a cycle is a nonempty set S of functions,
such that for every pair of functions F and G in S, it is possible
to call from F to G (possibly via intermediate functions) and also
from G to F. Furthermore, S must be maximal, that is, it must be the
largest set of functions satisfying this property. For example, if
a third function H is called from inside S and calls back into S,
then H is also part of the cycle and should be included in S.</p>
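<p>The formal definition translates directly into code. The C sketch below, for a small hypothetical call graph with functions numbered from 0, first computes which functions can reach which (Warshall's transitive closure) and then collects every function that is mutually reachable with a given one. This is a textbook approach chosen for clarity; it is not meant to describe how KCachegrind actually detects cycles.</p>
<pre class="programlisting">
#include &lt;stdbool.h&gt;

#define NF 5   /* number of functions in this hypothetical call graph */

/* reach[f][g]: f can call g, directly or via intermediate functions. */
static bool reach[NF][NF];

void add_call(int caller, int callee)
{
    reach[caller][callee] = true;
}

/* Warshall's algorithm: close the call relation transitively. */
void close_transitively(void)
{
    for (int k = 0; k &lt; NF; k++)
        for (int i = 0; i &lt; NF; i++)
            for (int j = 0; j &lt; NF; j++)
                if (reach[i][k] &amp;&amp; reach[k][j])
                    reach[i][j] = true;
}

/* The cycle containing f as a bit set: every g with f -&gt;* g and g -&gt;* f.
   Including all such g makes the set maximal. The result is 0 if f
   cannot reach itself, i.e. if f is not part of any cycle. */
unsigned cycle_of(int f)
{
    unsigned set = 0;

    for (int g = 0; g &lt; NF; g++)
        if (reach[f][g] &amp;&amp; reach[g][f])
            set |= 1u &lt;&lt; g;
    return set;
}
</pre>
<p>With function 0 (say, <code class="function">main</code>) calling 1 and 4, function 1 calling 2, function 2 calling 1 and 3, and function 3 calling back into 1, <code class="function">cycle_of(1)</code> yields the set {1, 2, 3}: function 3 plays the role of H above, while functions 0 and 4 belong to no cycle.</p>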
<p>Recursion is quite common in programs, and therefore, cycles
sometimes appear in the call graph output of Callgrind. However,
the title of this section should raise two questions: What is bad
about cycles that makes you want to avoid them? And how can
cycles be avoided without changing program code?</p>
<p>Cycles are not bad in themselves, but tend to make performance
analysis of your code harder. This is because inclusive costs
for calls inside of a cycle are meaningless. The definition of
inclusive cost, i.e. self cost of a function plus inclusive cost
of its callees, needs a topological order among functions. For
cycles, this does not hold true: callees of a function in a cycle include
the function itself. Therefore, KCachegrind does cycle detection
and skips visualization of any inclusive cost for calls inside
of cycles. Further, all functions in a cycle are collapsed into artificial
functions with names like <code class="computeroutput">Cycle 1</code>.</p>
<p>Now, when a program contains very large cycles (as is
true for some GUI code, or in general for code using an event- or callback-based
programming style), you lose the ability to pinpoint
the bottlenecks by following call chains from
<code class="function">main</code>, guided via
inclusive cost. In addition, KCachegrind loses its ability to show
interesting parts of the call graph, as it uses inclusive costs to
cut off uninteresting areas.</p>
<p>Despite the meaninglessness of inclusive costs in cycles, this big
drawback for visualization motivates the option to temporarily
switch off cycle detection in KCachegrind, even though this can lead to
misleading visualization. However, cycles often appear only because of an
unlucky superposition of independent call chains, in such a way that
the profile result shows a cycle. Neglecting uninteresting
calls with very small measured inclusive cost would break these
cycles. In such cases, the incorrect handling of cycles (not detecting
them) still gives a meaningful profiling visualization.</p>
<p>Note that currently, <span class="command"><strong>callgrind_annotate</strong></span>
does not do any cycle detection at all. For program executions with function
recursion, it can, for example, print nonsensical inclusive costs well above 100%.</p>
<p>Having described why cycles are bad for profiling, it is worth
talking about cycle avoidance. The key insight here is that symbols in
the profile data do not have to exactly match the symbols found in the
program. Instead, the symbol name could encode additional information
from the current execution context, such as the recursion level of the
current function, or even some part of the call chain leading to the
function. While encoding additional information into symbols is
quite capable of avoiding cycles, it has to be used carefully so as not to cause
symbol explosion. The latter imposes large memory requirements on Callgrind,
with possible out-of-memory conditions, and results in big profile data files.</p>
<p>Another way to avoid cycles in Callgrind's profile data
output is simply to leave out given functions in the call graph. This
skips all call information from and to an ignored function, and thus can
break a cycle. Typical candidates are dispatcher functions in
event-driven code. The option to ignore calls to a function is
<code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. Aside from
possibly breaking cycles, this mechanism is used in Callgrind to skip
trampoline functions in the PLT sections
for calls to functions in shared libraries. You can see the difference
if you profile with <code class="option"><a class="xref" href="cl-manual.html#opt.skip-plt">--skip-plt</a>=no</code>.
If a call is ignored, its cost events are propagated to the
enclosing function.</p>
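<p>The following C sketch (all names are hypothetical) shows how a shared dispatcher can create cycles in event-driven code. Every handler is called through <code class="function">dispatch</code>, and handlers in turn call <code class="function">dispatch</code> to emit the next event. As a result, the call graph contains cycles through <code class="function">dispatch</code> although no handler is recursive. Profiled with <code class="option">--fn-skip=dispatch</code>, the chain should instead read run_events > first_handler > second_handler > third_handler, with the cost of <code class="function">dispatch</code> propagated to the calling functions. Compile without optimization (for example <code class="option">-O0 -g</code>) so that the small functions are not inlined.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for event handlers. */
typedef int (*handler_fn)(int value);

/* Hypothetical central dispatcher, standing in for an event loop or a
   signal/slot activation function: every handler is called through it. */
int dispatch(handler_fn handler, int value)
{
    assert(handler != NULL);
    return handler(value);
}

/* A chain of handlers; each one "emits" the next event via dispatch(). */
static int third_handler(int value)  { return value + 3; }
static int second_handler(int value) { return dispatch(third_handler, value + 2); }
static int first_handler(int value)  { return dispatch(second_handler, value + 1); }

/* Entry point: run_events > dispatch > first_handler > dispatch > ... */
int run_events(int value)
{
    return dispatch(first_handler, value);
}
```

<p>A possible invocation is <code class="computeroutput">valgrind --tool=callgrind --fn-skip=dispatch ./prog</code>, where <code class="computeroutput">prog</code> is a placeholder for a program calling <code class="function">run_events</code>.</p>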
<p>If you have a recursive function, you can distinguish the first
10 recursion levels by specifying
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs-num">--separate-recs10</a>=function</code>.
To do this for all functions, use
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs">--separate-recs</a>=10</code>, but this will
give you much bigger profile data files. In the profile data, you will see
the recursion levels of "func" as different functions named
"func", "func'2", "func'3" and so on.</p>
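<p>As a minimal sketch (the function names are hypothetical), the mutually recursive C functions below form a cycle between <code class="function">is_even</code> and <code class="function">is_odd</code>. Function options can be given multiple times, so profiling with <code class="option">--separate-recs10=is_even --separate-recs10=is_odd</code> should show the levels as separate functions. For example, you should see is_even > is_odd > is_even'2 > is_odd'2 instead of one cycle. Compile with <code class="option">-O0 -g</code>, because at higher optimization levels the compiler may turn these tail calls into jumps, and then no calls would be recorded.</p>

```c
int is_odd(unsigned n);

/* Hypothetical mutually recursive pair: is_even calls is_odd, which calls
   is_even again, so each function is active at several recursion levels. */
int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```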
				478	<p>If you have call chains "A > B > C" and "A > C > B"
				479	in your program, you usually get a "false" cycle "B <> C". Use
				480	<code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=B</code>
				481	<code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=C</code>,
				482	and functions "B" and "C" will be treated as different functions
depending on the direct caller. With the apostrophe used to append
this "context" to the function name, you get "A > B'A > C'B"
				485	and "A > C'A > B'C", and there will be no cycle. Use
				486	<code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers">--separate-callers</a>=2</code> to get a 2-caller
				487	dependency for all functions. Note that doing this will increase
				488	the size of profile data files.</p>
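<p>The false cycle can be reproduced with a small C sketch (function names hypothetical). <code class="function">a_func</code> calls <code class="function">b_func</code>, which calls <code class="function">c_func</code>. <code class="function">a_func</code> also calls <code class="function">c_func</code>, which calls <code class="function">b_func</code>. Neither function is recursive, but the merged call graph has arcs in both directions between them. With <code class="option">--separate-callers2=b_func --separate-callers2=c_func</code>, the functions should appear as b_func'a_func, c_func'b_func, c_func'a_func and b_func'c_func, and the cycle should disappear. Compile with <code class="option">-O0 -g</code> so that the static functions are not inlined.</p>

```c
static int c_func(int x, int called_from_a);

/* b_func: calls c_func when invoked directly by a_func, else it is a leaf. */
static int b_func(int x, int called_from_a)
{
    return called_from_a ? c_func(x + 1, 0) : x * 2;
}

/* c_func: calls b_func when invoked directly by a_func, else it is a leaf. */
static int c_func(int x, int called_from_a)
{
    return called_from_a ? b_func(x + 1, 0) : x * 3;
}

/* a_func runs both chains: a_func > b_func > c_func and
   a_func > c_func > b_func, producing the arcs b_func -> c_func and
   c_func -> b_func, i.e. a "false" cycle in the merged call graph. */
int a_func(int x)
{
    return b_func(x, 1) + c_func(x, 1);
}
```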
				489	</div>
				490	<div class="sect2">
				491	<div class="titlepage"><div><div><h3 class="title">
				492	<a name="cl-manual.forkingprograms"></a>6.2.5. Forking Programs</h3></div></div></div>
				493	<p>If your program forks, the child will inherit all the profiling
				494	data that has been gathered for the parent. To start with empty profile
				495	counter values in the child, the client request
				496	<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code>
				497	can be inserted into code to be executed by the child, directly after
				498	<code class="computeroutput">fork</code>.</p>
<p>You also have to make sure that the output file format string
(controlled by <code class="option">--callgrind-out-file</code>) contains
<code class="option">%p</code> (which is true by default). Otherwise, the
outputs from the parent and child will overwrite each other or will be
intermingled, which almost certainly is not what you want.</p>
<p>You can control the new child independently of
the parent via callgrind_control.</p>
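<p>A minimal sketch of the forking case, assuming a POSIX system: the child issues <code class="computeroutput">CALLGRIND_ZERO_STATS;</code> directly after <code class="computeroutput">fork</code>, so its profile starts from empty counters. The helper names are hypothetical. The no-op fallback for the macro is an addition for this sketch and is used when <code class="filename">valgrind/callgrind.h</code> is not installed. The header is detected with <code class="computeroutput">__has_include</code>, which needs GCC 5 or later, or Clang. Keep the default output file name, or any name containing <code class="option">%p</code>, so that the parent and child write separate files.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Use the real client request if the Valgrind headers are installed;
   otherwise fall back to a no-op so that the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_ZERO_STATS
#  define CALLGRIND_ZERO_STATS do { } while (0)
#endif

/* Hypothetical helper: like fork(), but the child discards the profile
   counts inherited from the parent. Returns the child's PID in the
   parent, 0 in the child, and -1 on failure. */
pid_t fork_with_fresh_profile(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        CALLGRIND_ZERO_STATS;  /* child: start from empty counters */
    }
    return pid;
}

/* Example use: the child runs its (placeholder) work and exits; the
   parent waits. Returns the child's exit status, or -1 on error. */
int run_profiled_child(void)
{
    pid_t pid = fork_with_fresh_profile();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* ... child-only work would go here ... */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```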
				506	</div>
				507	</div>
				508	<div class="sect1">
				509	<div class="titlepage"><div><div><h2 class="title" style="clear: both">
				510	<a name="cl-manual.options"></a>6.3. Callgrind Command-line Options</h2></div></div></div>
				511	<p>
				512	In the following, options are grouped into classes.
				513	</p>
				514	<p>
				515	Some options allow the specification of a function/symbol name, such as
				516	<code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>, or
				517	<code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. All these options
				518	can be specified multiple times for different functions.
In addition, the function specifications are actually patterns: they support
the wildcards '*' (zero or more arbitrary characters) and '?'
(exactly one arbitrary character), similar to file name globbing in the
shell. This feature is especially important for C++, as without wildcards,
a function would have to be specified in full, including its
parameter signature. </p>
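<p>To make the wildcard rules concrete, here is a small C matcher that implements only the semantics described above. It is an illustration written for this section, not Callgrind's own matching code. With it, a pattern such as <code class="computeroutput">QObject::*</code> matches a full C++ signature like <code class="computeroutput">QObject::event(QEvent*)</code>. When passing such patterns on the command line, quote them (for example <code class="option">--fn-skip='QObject::*'</code>) so that the shell does not try to expand them itself.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative matcher for the pattern syntax described above:
   '*' matches zero or more arbitrary characters, '?' matches exactly one,
   and every other character matches itself. Sketch of the semantics only. */
int pattern_matches(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    if (*pattern == '\0')
        return *name == '\0';
    if (*pattern == '*')
        /* Either '*' matches nothing, or it consumes one more character. */
        return pattern_matches(pattern + 1, name)
            || (*name != '\0' && pattern_matches(pattern, name + 1));
    if (*name == '\0')
        return 0;
    if (*pattern == '?' || *pattern == *name)
        return pattern_matches(pattern + 1, name + 1);
    return 0;
}
```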
				525	<div class="sect2">
				526	<div class="titlepage"><div><div><h3 class="title">
				527	<a name="cl-manual.options.creation"></a>6.3.1. Dump creation options</h3></div></div></div>
				528	<p>
				529	These options influence the name and format of the profile data files.
				530	</p>
				531	<div class="variablelist">
				532	<a name="cl.opts.list.creation"></a><dl class="variablelist">
				533	<dt>
				534	<a name="opt.callgrind-out-file"></a><span class="term">
				535	<code class="option">--callgrind-out-file=<file> </code>
				536	</span>
				537	</dt>
				538	<dd><p>Write the profile data to
				539	<code class="computeroutput">file</code> rather than to the default
				540	output file,
				541	<code class="computeroutput">callgrind.out.<pid></code>. The
				542	<code class="option">%p</code> and <code class="option">%q</code> format specifiers
				543	can be used to embed the process ID and/or the contents of an
				544	environment variable in the name, as is the case for the core
				545	option <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
				546	When multiple dumps are made, the file name
				547	is modified further; see below.</p></dd>
				548	<dt>
				549	<a name="opt.dump-line"></a><span class="term">
<code class="option">--dump-line=<no|yes> [default: yes] </code>
				551	</span>
				552	</dt>
				553	<dd><p>This specifies that event counting should be performed at
				554	source line granularity. This allows source annotation for sources
				555	which are compiled with debug information
				556	(<code class="option">-g</code>).</p></dd>
				557	<dt>
				558	<a name="opt.dump-instr"></a><span class="term">
<code class="option">--dump-instr=<no|yes> [default: no] </code>
				560	</span>
				561	</dt>
				562	<dd><p>This specifies that event counting should be performed at
				563	per-instruction granularity.
				564	This allows for assembly code
				565	annotation. Currently the results can only be
				566	displayed by KCachegrind.</p></dd>
				567	<dt>
				568	<a name="opt.compress-strings"></a><span class="term">
<code class="option">--compress-strings=<no|yes> [default: yes] </code>
				570	</span>
				571	</dt>
				572	<dd><p>This option influences the output format of the profile data.
				573	It specifies whether strings (file and function names) should be
identified by numbers. This shrinks the file,
but makes it more difficult
for humans to read (reading the file directly is not recommended in any case).</p></dd>
				577	<dt>
				578	<a name="opt.compress-pos"></a><span class="term">
<code class="option">--compress-pos=<no|yes> [default: yes] </code>
				580	</span>
				581	</dt>
				582	<dd><p>This option influences the output format of the profile data.
It specifies whether numerical positions are always specified as absolute
values or are allowed to be relative to previous numbers.
Allowing relative values shrinks the file size.</p></dd>
				586	<dt>
				587	<a name="opt.combine-dumps"></a><span class="term">
<code class="option">--combine-dumps=<no|yes> [default: no] </code>
				589	</span>
				590	</dt>
<dd><p>When enabled, if multiple profile data parts are to be
generated, these parts are appended to the same output file.
				593	Not recommended.</p></dd>
				594	</dl>
				595	</div>
				596	</div>
				597	<div class="sect2">
				598	<div class="titlepage"><div><div><h3 class="title">
				599	<a name="cl-manual.options.activity"></a>6.3.2. Activity options</h3></div></div></div>
				600	<p>
				601	These options specify when actions relating to event counts are to
be executed. For interactive control, use callgrind_control.
				603	</p>
				604	<div class="variablelist">
				605	<a name="cl.opts.list.activity"></a><dl class="variablelist">
				606	<dt>
				607	<a name="opt.dump-every-bb"></a><span class="term">
				608	<code class="option">--dump-every-bb=<count> [default: 0, never] </code>
				609	</span>
				610	</dt>
				611	<dd><p>Dump profile data every <code class="option">count</code> basic blocks.
				612	Whether a dump is needed is only checked when Valgrind's internal
scheduler is run. Therefore, the minimum useful setting is about 100000.
				614	The count is a 64-bit value to make long dump periods possible.
				615	</p></dd>
				616	<dt>
				617	<a name="opt.dump-before"></a><span class="term">
				618	<code class="option">--dump-before=<function> </code>
				619	</span>
				620	</dt>
				621	<dd><p>Dump when entering <code class="option">function</code>.</p></dd>
				622	<dt>
				623	<a name="opt.zero-before"></a><span class="term">
				624	<code class="option">--zero-before=<function> </code>
				625	</span>
				626	</dt>
				627	<dd><p>Zero all costs when entering <code class="option">function</code>.</p></dd>
				628	<dt>
				629	<a name="opt.dump-after"></a><span class="term">
				630	<code class="option">--dump-after=<function> </code>
				631	</span>
				632	</dt>
				633	<dd><p>Dump when leaving <code class="option">function</code>.</p></dd>
				634	</dl>
				635	</div>
				636	</div>
				637	<div class="sect2">
				638	<div class="titlepage"><div><div><h3 class="title">
				639	<a name="cl-manual.options.collection"></a>6.3.3. Data collection options</h3></div></div></div>
				640	<p>
				641	These options specify when events are to be aggregated into event counts.
				642	Also see <a class="xref" href="cl-manual.html#cl-manual.limits" title="6.2.2. Limiting the range of collected events">Limiting range of event collection</a>.</p>
				643	<div class="variablelist">
				644	<a name="cl.opts.list.collection"></a><dl class="variablelist">
				645	<dt>
				646	<a name="opt.instr-atstart"></a><span class="term">
<code class="option">--instr-atstart=<yes|no> [default: yes] </code>
				648	</span>
				649	</dt>
				650	<dd>
				651	<p>Specify if you want Callgrind to start simulation and
				652	profiling from the beginning of the program.
				653	When set to <code class="computeroutput">no</code>,
				654	Callgrind will not be able
				655	to collect any information, including calls, but it will have at
				656	most a slowdown of around 4, which is the minimum Valgrind
				657	overhead. Instrumentation can be interactively enabled via
				658	<code class="computeroutput">callgrind_control -i on</code>.</p>
				659	<p>Note that the resulting call graph will most probably not
				660	contain <code class="function">main</code>, but will contain all the
				661	functions executed after instrumentation was enabled.
Instrumentation can also be enabled/disabled programmatically. See the
				663	Callgrind include file
				664	<code class="computeroutput">callgrind.h</code> for the macro
				665	you have to use in your source code.</p>
				666	<p>For cache
				667	simulation, results will be less accurate when switching on
				668	instrumentation later in the program run, as the simulator starts
				669	with an empty cache at that moment. Switch on event collection
				670	later to cope with this error.</p>
				671	</dd>
				672	<dt>
				673	<a name="opt.collect-atstart"></a><span class="term">
<code class="option">--collect-atstart=<yes|no> [default: yes] </code>
				675	</span>
				676	</dt>
				677	<dd>
<p>Specify whether event collection is enabled at the beginning
				679	of the profile run.</p>
				680	<p>To only look at parts of your program, you have two
				681	possibilities:</p>
				682	<div class="orderedlist"><ol class="orderedlist" type="1">
				683	<li class="listitem"><p>Zero event counters before entering the program part you
				684	want to profile, and dump the event counters to a file after
				685	leaving that program part.</p></li>
<li class="listitem"><p>Switch collection on and off as needed so that only
events occurring inside the program part you
want to profile are counted.</p></li>
				689	</ol></div>
<p>The second option can be used if the program part you want to
profile is called many times. Option 1, i.e. creating a lot of
dumps, is not practical here.</p>
				693	<p>Collection state can be
				694	toggled at entry and exit of a given function with the
				695	option <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>. If you
				696	use this option, collection
				697	state should be disabled at the beginning. Note that the
				698	specification of <code class="option">--toggle-collect</code>
				699	implicitly sets
<code class="option">--collect-atstart=no</code>.</p>
<p>Collection state can also be toggled by inserting the client request
<code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code>
at the required code positions.</p>
				707	</dd>
				708	<dt>
				709	<a name="opt.toggle-collect"></a><span class="term">
				710	<code class="option">--toggle-collect=<function> </code>
				711	</span>
				712	</dt>
				713	<dd><p>Toggle collection on entry/exit of <code class="option">function</code>.</p></dd>
				714	<dt>
				715	<a name="opt.collect-jumps"></a><span class="term">
<code class="option">--collect-jumps=<no|yes> [default: no] </code>
				717	</span>
				718	</dt>
				719	<dd><p>This specifies whether information for (conditional) jumps
				720	should be collected. As above, callgrind_annotate currently is not
				721	able to show you the data. You have to use KCachegrind to get jump
				722	arrows in the annotated code.</p></dd>
				723	<dt>
				724	<a name="opt.collect-systime"></a><span class="term">
<code class="option">--collect-systime=<no|yes> [default: no] </code>
				726	</span>
				727	</dt>
				728	<dd><p>This specifies whether information for system call times
				729	should be collected.</p></dd>
				730	<dt>
				731	<a name="clopt.collect-bus"></a><span class="term">
<code class="option">--collect-bus=<no|yes> [default: no] </code>
				733	</span>
				734	</dt>
				735	<dd><p>This specifies whether the number of global bus events executed
				736	should be collected. The event type "Ge" is used for these events.</p></dd>
				737	</dl>
				738	</div>
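<p>The C fragment below sketches the second approach listed under <code class="option">--collect-atstart</code>: collection is switched on only around a function that is called many times. The function names are hypothetical. The no-op fallback macro is an addition so that the code compiles without the Valgrind headers. The fragment is meant to be run with <code class="option">--collect-atstart=no</code>, so that the first toggle switches collection on. Without changing the source, <code class="option">--toggle-collect=hot_step</code> should have a similar effect.</p>

```c
/* Use the real client request if the Valgrind headers are installed;
   otherwise fall back to a no-op so that the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT do { } while (0)
#endif

/* Hypothetical function of interest, called many times. */
static unsigned long hot_step(unsigned long x)
{
    return x * 3;
}

/* Hypothetical work we do not want to see in the profile. */
static unsigned long uninteresting_work(unsigned long x)
{
    return x + 1;
}

/* Intended to run with --collect-atstart=no: collection is switched on
   only around hot_step(), so only events inside it are counted. */
unsigned long run_loop(unsigned iterations)
{
    unsigned long acc = 0;
    for (unsigned i = 0; i < iterations; i++) {
        acc = uninteresting_work(acc);
        CALLGRIND_TOGGLE_COLLECT;   /* off -> on */
        acc = hot_step(acc);
        CALLGRIND_TOGGLE_COLLECT;   /* on -> off */
    }
    return acc;
}
```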
				739	</div>
				740	<div class="sect2">
				741	<div class="titlepage"><div><div><h3 class="title">
				742	<a name="cl-manual.options.separation"></a>6.3.4. Cost entity separation options</h3></div></div></div>
				743	<p>
				744	These options specify how event counts should be attributed to execution
				745	contexts.
				746	For example, they specify whether the recursion level or the
				747	call chain leading to a function should be taken into account,
				748	and whether the thread ID should be considered.
				749	Also see <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p>
				750	<div class="variablelist">
				751	<a name="cmd-options.separation"></a><dl class="variablelist">
				752	<dt>
				753	<a name="opt.separate-threads"></a><span class="term">
<code class="option">--separate-threads=<no|yes> [default: no] </code>
				755	</span>
				756	</dt>
				757	<dd><p>This option specifies whether profile data should be generated
				758	separately for every thread. If yes, the file names get "-threadID"
				759	appended.</p></dd>
				760	<dt>
				761	<a name="opt.separate-callers"></a><span class="term">
				762	<code class="option">--separate-callers=<callers> [default: 0] </code>
				763	</span>
				764	</dt>
				765	<dd><p>Separate contexts by at most <callers> functions in the
				766	call chain. See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd>
				767	<dt>
				768	<a name="opt.separate-callers-num"></a><span class="term">
				769	<code class="option">--separate-callers<number>=<function> </code>
				770	</span>
				771	</dt>
				772	<dd><p>Separate <code class="option">number</code> callers for <code class="option">function</code>.
				773	See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd>
				774	<dt>
				775	<a name="opt.separate-recs"></a><span class="term">
				776	<code class="option">--separate-recs=<level> [default: 2] </code>
				777	</span>
				778	</dt>
				779	<dd><p>Separate function recursions by at most <code class="option">level</code> levels.
				780	See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd>
				781	<dt>
				782	<a name="opt.separate-recs-num"></a><span class="term">
				783	<code class="option">--separate-recs<number>=<function> </code>
				784	</span>
				785	</dt>
				786	<dd><p>Separate <code class="option">number</code> recursions for <code class="option">function</code>.
				787	See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4. Avoiding cycles">Avoiding cycles</a>.</p></dd>
				788	<dt>
				789	<a name="opt.skip-plt"></a><span class="term">
<code class="option">--skip-plt=<no|yes> [default: yes] </code>
				791	</span>
				792	</dt>
				793	<dd><p>Ignore calls to/from PLT sections.</p></dd>
				794	<dt>
				795	<a name="opt.skip-direct-rec"></a><span class="term">
<code class="option">--skip-direct-rec=<no|yes> [default: yes] </code>
				797	</span>
				798	</dt>
				799	<dd><p>Ignore direct recursions.</p></dd>
				800	<dt>
				801	<a name="opt.fn-skip"></a><span class="term">
				802	<code class="option">--fn-skip=<function> </code>
				803	</span>
				804	</dt>
				805	<dd>
				806	<p>Ignore calls to/from a given function. E.g. if you have a
				807	call chain A > B > C, and you specify function B to be
				808	ignored, you will only see A > C.</p>
				809	<p>This is very convenient to skip functions handling callback
				810	behaviour. For example, with the signal/slot mechanism in the
				811	Qt graphics library, you only want
				812	to see the function emitting a signal to call the slots connected
to that signal. First, determine the real call chain to find the
functions that need to be skipped, then use this option.</p>
				815	</dd>
				816	</dl>
				817	</div>
				818	</div>
				819	<div class="sect2">
				820	<div class="titlepage"><div><div><h3 class="title">
				821	<a name="cl-manual.options.simulation"></a>6.3.5. Simulation options</h3></div></div></div>
				822	<div class="variablelist">
				823	<a name="cl.opts.list.simulation"></a><dl class="variablelist">
				824	<dt>
				825	<a name="clopt.cache-sim"></a><span class="term">
<code class="option">--cache-sim=<yes|no> [default: no] </code>
				827	</span>
				828	</dt>
				829	<dd><p>Specify if you want to do full cache simulation. By default,
				830	only instruction read accesses will be counted ("Ir").
				831	With cache simulation, further event counters are enabled:
				832	Cache misses on instruction reads ("I1mr"/"ILmr"),
				833	data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
				834	data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
				835	For more information, see <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a>.
				836	</p></dd>
				837	<dt>
				838	<a name="clopt.branch-sim"></a><span class="term">
<code class="option">--branch-sim=<yes|no> [default: no] </code>
				840	</span>
				841	</dt>
				842	<dd><p>Specify if you want to do branch prediction simulation.
				843	Further event counters are enabled: Number of executed conditional
				844	branches and related predictor misses ("Bc"/"Bcm"), executed indirect
				845	jumps and related misses of the jump address predictor ("Bi"/"Bim").
				846	</p></dd>
				847	</dl>
				848	</div>
				849	</div>
				850	<div class="sect2">
				851	<div class="titlepage"><div><div><h3 class="title">
				852	<a name="cl-manual.options.cachesimulation"></a>6.3.6. Cache simulation options</h3></div></div></div>
				853	<div class="variablelist">
				854	<a name="cl.opts.list.cachesimulation"></a><dl class="variablelist">
				855	<dt>
				856	<a name="opt.simulate-wb"></a><span class="term">
<code class="option">--simulate-wb=<yes|no> [default: no] </code>
				858	</span>
				859	</dt>
<dd><p>Specify whether write-back behavior should be simulated, allowing
LL cache misses with and without write-backs to be distinguished.
The cache model of Cachegrind/Callgrind does not specify write-through
vs. write-back behavior, and this also is not relevant for the number
of generated miss counts. However, with explicit write-back simulation
it can be decided whether a miss triggers not only the loading of a new
cache line, but also a prior write-back of a dirty cache line.
The new dirty miss events are ILdmr, DLdmr, and DLdmw,
for misses because of instruction read, data read, and data write,
respectively. As they produce two memory transactions, they should
account for roughly twice the time estimate of a normal miss.
				871	</p></dd>
				872	<dt>
				873	<a name="opt.simulate-hwpref"></a><span class="term">
<code class="option">--simulate-hwpref=<yes|no> [default: no] </code>
				875	</span>
				876	</dt>
<dd><p>Specify whether to add simulation of a hardware prefetcher
which is able to detect stream access in the second-level cache
by comparing accesses separately for each page.
As the simulation cannot model the timing of prefetches,
it is assumed that any triggered hardware prefetch succeeds before the
real access is done. Thus, this gives a best-case scenario by covering
all possible stream accesses.</p></dd>
				884	<dt>
				885	<a name="opt.cacheuse"></a><span class="term">
<code class="option">--cacheuse=<yes|no> [default: no] </code>
				887	</span>
				888	</dt>
				889	<dd><p>Specify whether cache line use should be collected. For every
				890	cache line, from loading to it being evicted, the number of accesses
				891	as well as the number of actually used bytes is determined. This
behavior is attributed to the code which triggered loading of the cache
line. In contrast to miss counters, which show the position where
the symptoms of bad cache behavior (i.e. latencies) happen, the
use counters try to pinpoint the reason (i.e. the code with the
bad access behavior). The new counters are defined in a way such
that worse behavior results in higher cost.
AcCost1 and AcCost2 are counters showing bad temporal locality
for L1 and LL caches, respectively. This is done by summing up
reciprocal values of the numbers of accesses of each cache line,
multiplied by 1000 (as only integer costs are allowed). For example, for
a given source line with 5 read accesses, a value of 5000 AcCost
means that for every access, a new cache line was loaded and directly
evicted afterwards without further accesses. Similarly, SpLoss1/2
shows bad spatial locality for L1 and LL caches, respectively. It
gives the <span class="emphasis"><em>spatial loss</em></span> count of bytes which
were loaded into cache but never accessed. It pinpoints code
accessing data in a way such that cache space is wasted. This hints
at bad layout of data structures in memory. Assuming a cache line
size of 64 bytes and 100 L1 misses for a given source line, the
loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
value of 3200 for this line, this means that half of the loaded data was
never used, or, with a better data layout, only half of the cache
				914	space would have been needed.
				915	Please note that for cache line use counters, it currently is
				916	not possible to provide meaningful inclusive costs. Therefore,
				917	inclusive cost of these counters should be ignored.
				918	</p></dd>
				919	<dt>
				920	<a name="opt.I1"></a><span class="term">
				921	<code class="option">--I1=<size>,<associativity>,<line size> </code>
				922	</span>
				923	</dt>
				924	<dd><p>Specify the size, associativity and line size of the level 1
				925	instruction cache. </p></dd>
				926	<dt>
				927	<a name="opt.D1"></a><span class="term">
				928	<code class="option">--D1=<size>,<associativity>,<line size> </code>
				929	</span>
				930	</dt>
				931	<dd><p>Specify the size, associativity and line size of the level 1
				932	data cache.</p></dd>
				933	<dt>
				934	<a name="opt.LL"></a><span class="term">
				935	<code class="option">--LL=<size>,<associativity>,<line size> </code>
				936	</span>
				937	</dt>
				938	<dd><p>Specify the size, associativity and line size of the last-level
				939	cache.</p></dd>
				940	</dl>
				941	</div>
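<p>The two worked examples in the <code class="option">--cacheuse</code> description can be checked with a small C sketch. It is a simplified model of the formulas as described there, not Callgrind's implementation. Truncating integer division for AcCost is an assumption. Five accesses that each load a new line give 5 times 1000 = 5000 AcCost. An SpLoss1 of 3200 bytes out of 100 times 64 = 6400 loaded bytes is a spatial loss of one half.</p>

```c
#include <assert.h>

/* Simplified model of the AcCost contribution of one cache line: the
   reciprocal of the number of accesses it received between loading and
   eviction, scaled by 1000 (integer division truncates, e.g. 1000/3 = 333). */
unsigned long accost_for_line(unsigned long accesses)
{
    assert(accesses > 0);  /* a loaded line is accessed at least once */
    return 1000 / accesses;
}

/* Fraction of the bytes loaded into the cache that were never accessed,
   given the SpLoss count, the number of misses and the line size. */
double spatial_loss_fraction(unsigned long sp_loss, unsigned long misses,
                             unsigned long line_size)
{
    assert(misses > 0 && line_size > 0);
    return (double)sp_loss / (double)(misses * line_size);
}
```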
				942	</div>
				943	</div>
				944	<div class="sect1">
				945	<div class="titlepage"><div><div><h2 class="title" style="clear: both">
				946	<a name="cl-manual.monitor-commands"></a>6.4. Callgrind Monitor Commands</h2></div></div></div>
				947	<p>The Callgrind tool provides monitor commands handled by the Valgrind
				948	gdbserver (see <a class="xref" href="manual-core-adv.html#manual-core-adv.gdbserver-commandhandling" title="3.2.5. Monitor command handling by the Valgrind gdbserver">Monitor command handling by the Valgrind gdbserver</a>).
				949	</p>
				950	<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
				951	<li class="listitem"><p><code class="varname">dump [<dump_hint>]</code> requests to dump the
				952	profile data. </p></li>
				953	<li class="listitem"><p><code class="varname">zero</code> requests to zero the profile data
				954	counters. </p></li>
<li class="listitem"><p><code class="varname">instrumentation [on|off]</code> requests to set
				956	(if parameter on/off is given) or get the current instrumentation state.
				957	</p></li>
				958	<li class="listitem"><p><code class="varname">status</code> requests to print out some status
				959	information.</p></li>
				960	</ul></div>
				961	</div>
				962	<div class="sect1">
				963	<div class="titlepage"><div><div><h2 class="title" style="clear: both">
				964	<a name="cl-manual.clientrequests"></a>6.5. Callgrind specific client requests</h2></div></div></div>
				965	<p>Callgrind provides the following specific client requests in
				966	<code class="filename">callgrind.h</code>. See that file for the exact details of
				967	their arguments.</p>
				968	<div class="variablelist">
				969	<a name="cl.clientrequests.list"></a><dl class="variablelist">
				970	<dt>
				971	<a name="cr.dump-stats"></a><span class="term">
				972	<code class="computeroutput">CALLGRIND_DUMP_STATS</code>
				973	</span>
				974	</dt>
<dd><p>Force generation of a profile dump at the specified position
				976	in code, for the current thread only. Written counters will be reset
				977	to zero.</p></dd>
				978	<dt>
				979	<a name="cr.dump-stats-at"></a><span class="term">
				980	<code class="computeroutput">CALLGRIND_DUMP_STATS_AT(string)</code>
				981	</span>
				982	</dt>
				983	<dd><p>Same as <code class="computeroutput">CALLGRIND_DUMP_STATS</code>,
but lets you specify a string to distinguish profile
dumps.</p></dd>
				986	<dt>
				987	<a name="cr.zero-stats"></a><span class="term">
				988	<code class="computeroutput">CALLGRIND_ZERO_STATS</code>
				989	</span>
				990	</dt>
				991	<dd><p>Reset the profile counters for the current thread to zero.</p></dd>
				992	<dt>
				993	<a name="cr.toggle-collect"></a><span class="term">
				994	<code class="computeroutput">CALLGRIND_TOGGLE_COLLECT</code>
				995	</span>
				996	</dt>
<dd><p>Toggle the collection state. This allows events to be ignored
with regard to profile counters. See also options
				999	<code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a></code> and
				1000	<code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>.</p></dd>
				1001	<dt>
				1002	<a name="cr.start-instr"></a><span class="term">
				1003	<code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code>
				1004	</span>
				1005	</dt>
				1006	<dd><p>Start full Callgrind instrumentation if not already enabled.
When cache simulation is done, this will flush the simulated cache
and lead to an artificial cache warmup phase afterwards with
				1009	cache misses which would not have happened in reality. See also
				1010	option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd>
				1011	<dt>
				1012	<a name="cr.stop-instr"></a><span class="term">
				1013	<code class="computeroutput">CALLGRIND_STOP_INSTRUMENTATION</code>
				1014	</span>
				1015	</dt>
				1016	<dd><p>Stop full Callgrind instrumentation if not already disabled.
This flushes Valgrind's translation cache, and does no additional
instrumentation afterwards: the program effectively runs at the same
speed as Nulgrind, i.e. at minimal slowdown. Use this to
				1020	speed up the Callgrind run for uninteresting code parts. Use
				1021	<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code> to
				1022	enable instrumentation again. See also option
				1023	<code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd>
				1024	</dl>
				1025	</div>
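<p>The following C sketch combines several of these client requests. Instrumentation is switched on only around a hypothetical phase of interest, the counters are zeroed, and a labelled dump is written afterwards. It is meant to be run with <code class="option">--instr-atstart=no</code>. As noted above, with cache simulation enabled, the first accesses after <code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code> see an artificially cold cache. The function names and the no-op fallback macros (for builds without <code class="filename">valgrind/callgrind.h</code>) are assumptions of this sketch.</p>

```c
/* Use the real client requests if the Valgrind headers are installed;
   otherwise fall back to no-ops so that the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_START_INSTRUMENTATION
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#endif
#ifndef CALLGRIND_STOP_INSTRUMENTATION
#  define CALLGRIND_STOP_INSTRUMENTATION do { } while (0)
#endif
#ifndef CALLGRIND_ZERO_STATS
#  define CALLGRIND_ZERO_STATS do { } while (0)
#endif
#ifndef CALLGRIND_DUMP_STATS_AT
#  define CALLGRIND_DUMP_STATS_AT(pos_str) ((void)(pos_str))
#endif

/* Hypothetical phase we want to profile: sum of squares 1..n. */
static long compute_phase(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i * i;
    return sum;
}

/* Intended to run with --instr-atstart=no: instrumentation is enabled only
   around compute_phase(), and the result is written as a labelled dump. */
long profile_compute_phase(long n)
{
    CALLGRIND_START_INSTRUMENTATION;
    CALLGRIND_ZERO_STATS;                    /* drop anything counted so far */
    long result = compute_phase(n);
    CALLGRIND_DUMP_STATS_AT("after compute_phase");
    CALLGRIND_STOP_INSTRUMENTATION;          /* back to minimal slowdown */
    return result;
}
```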
				1026	</div>
				1027	<div class="sect1">
				1028	<div class="titlepage"><div><div><h2 class="title" style="clear: both">
				1029	<a name="cl-manual.callgrind_annotate-options"></a>6.6. callgrind_annotate Command-line Options</h2></div></div></div>
				1030	<div class="variablelist">
				1031	<a name="callgrind_annotate.opts.list"></a><dl class="variablelist">
				1032	<dt><span class="term"><code class="option">-h --help</code></span></dt>
				1033	<dd><p>Show summary of options.</p></dd>
				1034	<dt><span class="term"><code class="option">--version</code></span></dt>
				1035	<dd><p>Show version of callgrind_annotate.</p></dd>
				1036	<dt><span class="term">
				1037	<code class="option">--show=A,B,C [default: all]</code>
				1038	</span></dt>
				1039	<dd><p>Only show figures for events A,B,C.</p></dd>
				1040	<dt><span class="term">
				1041	<code class="option">--sort=A,B,C</code>
				1042	</span></dt>
				1043	<dd><p>Sort columns by events A,B,C [event column order].</p></dd>
				1044	<dt><span class="term">
				1045	<code class="option">--threshold=&lt;0--100&gt; [default: 99%]</code>
				1046	</span></dt>
				1047	<dd><p>Percentage of the total count of the primary sort event to cover:
				1048	functions are listed in decreasing order of cost until together they reach it.</p></dd>
				1049	<dt><span class="term">
				1050	<code class="option">--auto=&lt;yes|no&gt; [default: no]</code>
				1051	</span></dt>
				1052	<dd><p>Annotate all source files containing functions that helped
				1053	reach the event count threshold.</p></dd>
				1054	<dt><span class="term">
				1055	<code class="option">--context=N [default: 8] </code>
				1056	</span></dt>
				1057	<dd><p>Print N lines of context before and after annotated
				1058	lines.</p></dd>
				1059	<dt><span class="term">
				1060	<code class="option">--inclusive=&lt;yes|no&gt; [default: no]</code>
				1061	</span></dt>
				1062	<dd><p>Add the cost of called subroutines to each function's own cost, i.e. show inclusive cost.</p></dd>
				1063	<dt><span class="term">
				1064	<code class="option">--tree=&lt;none|caller|calling|both&gt; [default: none]</code>
				1065	</span></dt>
				1066	<dd><p>For each function, print its callers, the functions it calls,
				1067	or both.</p></dd>
				1068	<dt><span class="term">
				1069	<code class="option">-I, --include=&lt;dir&gt;</code>
				1070	</span></dt>
				1071	<dd><p>Add <code class="option">dir</code> to the list of directories to search
				1072	for source files.</p></dd>
				1073	</dl>
				1074	</div>
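<p>The interplay of <code class="option">--sort</code> and <code class="option">--threshold</code> can be modelled roughly as below. This is a hypothetical C sketch of a cumulative selection rule, assumed here for illustration and not callgrind_annotate's own (Perl) code: given per-function counts of the primary sort event, already sorted in decreasing order, the function <code class="computeroutput">functions_shown</code> (a made-up name) returns how many functions are listed before their running total reaches the threshold percentage of the overall total.</p>

```c
/* Hypothetical model of --threshold (assumed cumulative semantics):
 * counts[] holds the primary sort event's count per function, sorted in
 * decreasing order.  Returns how many functions would be listed. */
#include <stddef.h>

size_t functions_shown(const unsigned long long *counts, size_t n,
                       double threshold_pct)
{
    unsigned long long total = 0, cumulative = 0;
    for (size_t i = 0; i < n; i++)
        total += counts[i];
    if (total == 0)
        return 0;                      /* nothing to show */

    for (size_t i = 0; i < n; i++) {
        cumulative += counts[i];       /* function i is listed */
        if (cumulative * 100.0 >= threshold_pct * (double)total)
            return i + 1;              /* threshold reached: stop here */
    }
    return n;
}
```

<p>In this model, the default of 99% drops the tail of functions that together account for the last 1% of the primary event, which keeps the listing short for large programs.</p>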
				1075	</div>
				1076	<div class="sect1">
				1077	<div class="titlepage"><div><div><h2 class="title" style="clear: both">
				1078	<a name="cl-manual.callgrind_control-options"></a>6.7. callgrind_control Command-line Options</h2></div></div></div>
				1079	<p>By default, callgrind_control acts on all programs run by the
				1080	current user under Callgrind. To restrict its actions to particular
				1081	Callgrind runs, pass a list of pids or program names as
				1082	arguments. The default action is to print brief information about the
				1083	applications being run under Callgrind.</p>
				1084	<div class="variablelist">
				1085	<a name="callgrind_control.opts.list"></a><dl class="variablelist">
				1086	<dt><span class="term"><code class="option">-h --help</code></span></dt>
				1087	<dd><p>Show a short description, usage, and summary of options.</p></dd>
				1088	<dt><span class="term"><code class="option">--version</code></span></dt>
				1089	<dd><p>Show version of callgrind_control.</p></dd>
				1090	<dt><span class="term"><code class="option">-l --long</code></span></dt>
				1091	<dd><p>Also show the working directory, in addition to the brief
				1092	information given by default.
				1093	</p></dd>
				1094	<dt><span class="term"><code class="option">-s --stat</code></span></dt>
				1095	<dd><p>Show statistics about active Callgrind runs.</p></dd>
				1096	<dt><span class="term"><code class="option">-b --back</code></span></dt>
				1097	<dd><p>Show stack/back traces of each thread in active Callgrind runs. For
				1098	each active function in the stack trace, the number of invocations
				1099	since program start (or since the last dump) is also shown. This option can be
				1100	combined with <code class="option">-e</code> to show the inclusive cost of active functions.</p></dd>
				1101	<dt><span class="term"><code class="option">-e [A,B,...] </code> (default: all)</span></dt>
				1102	<dd><p>Show the current per-thread, exclusive cost values of event
				1103	counters. If no explicit event names are given, figures for all event
				1104	types which are collected in the given Callgrind run are
				1105	shown. Otherwise, only figures for event types A, B, ... are shown. If
				1106	this option is combined with -b, inclusive cost for the functions of
				1107	each active stack frame is provided, too.
				1108	</p></dd>
				1109	<dt><span class="term"><code class="option">--dump[=&lt;desc&gt;]</code> (default: no description)</span></dt>
				1110	<dd><p>Request a dump of the profile information. Optionally, a
				1111	description can be specified; it is written into the dump as the
				1112	reason that triggered the dump action, which helps to
				1113	distinguish multiple dumps.</p></dd>
				1114	<dt><span class="term"><code class="option">-z --zero</code></span></dt>
				1115	<dd><p>Zero all event counters.</p></dd>
				1116	<dt><span class="term"><code class="option">-k --kill</code></span></dt>
				1117	<dd><p>Force a Callgrind run to be terminated.</p></dd>
				1118	<dt><span class="term"><code class="option">--instr=&lt;on|off&gt;</code></span></dt>
				1119	<dd><p>Switch instrumentation mode on or off. If a Callgrind run has
				1120	instrumentation disabled, no simulation is done and no events are
				1121	counted. This is useful to skip uninteresting program parts, as there
				1122	is much less slowdown (same as with the Valgrind tool "none"). See also
				1123	the Callgrind option <code class="option">--instr-atstart</code>.</p></dd>
				1124	<dt><span class="term"><code class="option">--vgdb-prefix=&lt;prefix&gt;</code></span></dt>
				1125	<dd><p>Specify the vgdb prefix to be used by callgrind_control.
				1126	callgrind_control internally uses vgdb to find and control the active
				1127	Callgrind runs. If the <code class="option">--vgdb-prefix</code> option was used
				1128	for launching valgrind, then the same option must be given to
				1129	callgrind_control.</p></dd>
				1130	</dl>
				1131	</div>
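<p>The <code class="option">--zero</code> and <code class="option">--dump</code> actions also have in-program counterparts among Callgrind's client requests. The sketch below assumes the <code class="computeroutput">CALLGRIND_ZERO_STATS</code> and <code class="computeroutput">CALLGRIND_DUMP_STATS_AT</code> macros from <code class="computeroutput">valgrind/callgrind.h</code>; the phase names, the workload function <code class="computeroutput">run_phase</code>, and the no-op fallback macros are hypothetical additions. Each phase is profiled into its own dump, labelled with a description much like the one given to <code class="option">--dump</code>.</p>

```c
/* Sketch: one profile dump per program phase.  The fallback macros (an
 * assumption of this sketch) make the code compile without Valgrind. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_ZERO_STATS
#  define CALLGRIND_ZERO_STATS do { } while (0)
#  define CALLGRIND_DUMP_STATS_AT(pos_str) do { (void)(pos_str); } while (0)
#endif

/* Hypothetical phase of the program: sums 1..n. */
static unsigned long run_phase(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 1; i <= n; i++)
        acc += i;
    return acc;
}

/* Zero the counters before each phase, then dump them afterwards with a
 * description naming the phase, so the dumps can be told apart. */
unsigned long run_phases_with_dumps(void)
{
    static const char *const names[] = { "phase 1: warm-up", "phase 2: main loop" };
    static const unsigned long sizes[] = { 10, 1000 };
    unsigned long total = 0;
    for (int p = 0; p < 2; p++) {
        CALLGRIND_ZERO_STATS;
        total += run_phase(sizes[p]);
        CALLGRIND_DUMP_STATS_AT(names[p]);
    }
    return total;
}
```

<p>Doing this in the program fixes the dump points at exact code locations, whereas <code class="option">callgrind_control --dump</code> triggers a dump at whatever point the run has reached when the request arrives.</p>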
				1132	</div>
				1133	</div>
				1134	<div>
				1135	<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
				1136	<tr>
				1137	<td rowspan="2" width="40%" align="left">
				1138	<a accesskey="p" href="cg-manual.html"><< 5. Cachegrind: a cache and branch-prediction profiler</a> </td>
				1139	<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
				1140	<td rowspan="2" width="40%" align="right"> <a accesskey="n" href="hg-manual.html">7. Helgrind: a thread error detector >></a>
				1141	</td>
				1142	</tr>
				1143	<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
				1144	</table>
				1145	</div>
				1146	</body>
				1147	</html>