Blame - cachegrind/docs/cg-manual.xml - fp2-dev/platform/external/valgrind


njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	1	<?xml version="1.0"?> <!-- -*- sgml -*- -->
				2	<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
sewardj	7aeb10f	2006-12-10 02:59:16 +0000	[diff] [blame]	3	"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
				4	[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	5
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	6
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	7	<chapter id="cg-manual" xreflabel="Cachegrind: a cache-miss profiler">
				8	<title>Cachegrind: a cache profiler</title>
				9
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	10	<sect1 id="cg-manual.cache" xreflabel="Cache profiling">
				11	<title>Cache profiling</title>
				12
				13	<para>To use this tool, you must specify
				14	<computeroutput>--tool=cachegrind</computeroutput> on the
				15	Valgrind command line.</para>
				16
				17	<para>Cachegrind is a tool for doing cache simulations and
				18	annotating your source line-by-line with the number of cache
				19	misses. In particular, it records:</para>
				20	<itemizedlist>
				21	<listitem>
				22	<para>L1 instruction cache reads and misses;</para>
				23	</listitem>
				24	<listitem>
				25	<para>L1 data cache reads and read misses, writes and write
				26	misses;</para>
				27	</listitem>
				28	<listitem>
				29	<para>L2 unified cache reads and read misses, writes and
				30	write misses.</para>
				31	</listitem>
				32	</itemizedlist>
				33
njn	c8cccb1	2005-07-25 23:30:24 +0000	[diff] [blame]	34	<para>On a modern machine, an L1 miss will typically cost
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	35	around 10 cycles, and an L2 miss can cost as much as 200
				36	cycles. Detailed cache profiling can be very useful for improving
				37	the performance of your program.</para>
				38
				39	<para>Also, since one instruction cache read is performed per
				40	instruction executed, you can find out how many instructions are
				41	executed per line, which can be useful for traditional profiling
				42	and test coverage.</para>
				43
				44	<para>Any feedback, bug-fixes, suggestions, etc, welcome.</para>
				45
				46
				47
				48	<sect2 id="cg-manual.overview" xreflabel="Overview">
				49	<title>Overview</title>
				50
				51	<para>First off, as for normal Valgrind use, you probably want to
				52	compile with debugging info (the
				53	<computeroutput>-g</computeroutput> flag). But by contrast with
				54	normal Valgrind use, you probably <command>do</command> want to turn
				55	optimisation on, since you should profile your program as it will
				56	be normally run.</para>
				57
				58	<para>The two steps are:</para>
				59	<orderedlist>
				60	<listitem>
				61	<para>Run your program with <computeroutput>valgrind
				62	--tool=cachegrind</computeroutput> in front of the normal
				63	command line invocation. When the program finishes,
				64	Cachegrind will print summary cache statistics. It also
				65	collects line-by-line information in a file
				66	<computeroutput>cachegrind.out.pid</computeroutput>, where
				67	<computeroutput>pid</computeroutput> is the program's process
				68	id.</para>
				69
				70	<para>This step should be done every time you want to collect
				71	information about a new program, a changed program, or about
				72	the same program with different input.</para>
				73	</listitem>
				74
				75	<listitem>
				76	<para>Generate a function-by-function summary, and possibly
				77	annotate source files, using the supplied
				78	<computeroutput>cg_annotate</computeroutput> program. Source
				79	files to annotate can be specified manually on
				80	the command line, or "interesting" source files can be
				81	annotated automatically with the
				82	<computeroutput>--auto=yes</computeroutput> option. You can
				83	annotate C/C++ files or assembly language files equally
				84	easily.</para>
				85
				86	<para>This step can be performed as many times as you like
				87	for each Step 1. You may want to do multiple annotations
				88	showing different information each time.</para>
				89	</listitem>
				90
				91	</orderedlist>
				92
				93	<para>The steps are described in detail in the following
				94	sections.</para>
				95
				96	</sect2>
				97
				98
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	99	<sect2 id="cache-sim" xreflabel="Cache simulation specifics">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	100	<title>Cache simulation specifics</title>
				101
				102	<para>Cachegrind uses a simulation for a machine with a split L1
				103	cache and a unified L2 cache. This configuration is used for all
				104	(modern) x86-based machines we are aware of. Old Cyrix CPUs had
				105	a unified I and D L1 cache, but they are ancient history
				106	now.</para>
				107
				108	<para>The more specific characteristics of the simulation are as
				109	follows.</para>
				110
				111	<itemizedlist>
				112
				113	<listitem>
				114	<para>Write-allocate: when a write miss occurs, the block
				115	written to is brought into the D1 cache. Most modern caches
				116	have this property.</para>
				117	</listitem>
				118
				119	<listitem>
				120	<para>Bit-selection hash function: the line(s) in the cache
				121	to which a memory block maps is chosen by the middle bits
				122	M--(M+N-1) of the byte address, where:</para>
				123	<itemizedlist>
				124	<listitem>
				125	<para>line size = 2^M bytes</para>
				126	</listitem>
				127	<listitem>
				128	<para>(cache size / line size / associativity) = 2^N sets</para>
				129	</listitem>
				130	</itemizedlist>
				131	</listitem>
				132
				133	<listitem>
				134	<para>Inclusive L2 cache: the L2 cache replicates all the
				135	entries of the L1 cache. This is standard on Pentium chips,
				136	but AMD Athlons use an exclusive L2 cache that only holds
				137	blocks evicted from L1. Ditto AMD Durons and most modern
				138	VIAs.</para>
				139	</listitem>
				140
				141	</itemizedlist>
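
<para>As a rough illustration of the bit-selection scheme, the following
Python sketch computes which set a byte address maps to. It is not
Cachegrind's code: the function name <computeroutput>set_index</computeroutput>
is hypothetical, and it assumes that the number of sets is cache size /
(line size * associativity) and that all three quantities are powers of
two.</para>

<programlisting><![CDATA[
```python
def set_index(addr, cache_size, line_size, assoc):
    """Return the set that byte address `addr` maps to (illustrative only)."""
    m = line_size.bit_length() - 1               # line size = 2^M bytes
    n_sets = cache_size // (line_size * assoc)   # assumed: number of sets = 2^N
    n = n_sets.bit_length() - 1
    # Select the middle bits M..(M+N-1) of the address.
    return (addr >> m) & ((1 << n) - 1)


# Example: 64 KB, 2-way associative, 64-byte lines gives 512 sets (M=6, N=9).
print(set_index(0x12345, 65536, 64, 2))   # prints 141
```
]]></programlisting>

<para>Addresses that differ only in their low M bits fall in the same
line, and addresses that differ by a multiple of 2^(M+N) bytes compete
for the same set.</para>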
				142
				143	<para>The cache configuration simulated (cache size,
				144	associativity and line size) is determined automagically using
				145	the CPUID instruction. If you have an old machine that (a)
				146	doesn't support the CPUID instruction, or (b) supports it in an
				147	early incarnation that doesn't give any cache information, then
				148	Cachegrind will fall back to using a default configuration (that
				149	of a model 3/4 Athlon). Cachegrind will tell you if this
				150	happens. You can manually specify one, two or all three levels
				151	(I1/D1/L2) of the cache from the command line using the
				152	<computeroutput>--I1</computeroutput>,
				153	<computeroutput>--D1</computeroutput> and
				154	<computeroutput>--L2</computeroutput> options.</para>
				155
				156
				157	<para>Other noteworthy behaviour:</para>
				158
				159	<itemizedlist>
				160	<listitem>
				161	<para>References that straddle two cache lines are treated as
				162	follows:</para>
				163	<itemizedlist>
				164	<listitem>
				165	<para>If both blocks hit --> counted as one hit</para>
				166	</listitem>
				167	<listitem>
				168	<para>If one block hits, the other misses --> counted
				169	as one miss.</para>
				170	</listitem>
				171	<listitem>
				172	<para>If both blocks miss --> counted as one miss (not
				173	two)</para>
				174	</listitem>
				175	</itemizedlist>
				176	</listitem>
				177
				178	<listitem>
				179	<para>Instructions that modify a memory location
				180	(eg. <computeroutput>inc</computeroutput> and
				181	<computeroutput>dec</computeroutput>) are counted as doing
				182	just a read, ie. a single data reference. This may seem
				183	strange, but since the write can never cause a miss (the read
				184	guarantees the block is in the cache) it's not very
				185	interesting.</para>
				186
				187	<para>Thus it measures not the number of times the data cache
				188	is accessed, but the number of times a data cache miss could
				189	occur.</para>
				190	</listitem>
				191
				192	</itemizedlist>
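
<para>The straddling rule above can be sketched in a few lines of Python.
This is a toy model only, assuming a cache that never evicts anything
(a plain set of resident line numbers); the name
<computeroutput>count_access</computeroutput> is hypothetical and does not
appear in Cachegrind.</para>

<programlisting><![CDATA[
```python
def count_access(addr, size, line_size, resident):
    """Classify one memory reference as a single 'hit' or a single 'miss'.

    `resident` is a set of line numbers currently cached (toy model,
    no eviction). A reference touching two lines still counts once.
    """
    first = addr // line_size
    last = (addr + size - 1) // line_size
    touched = {first, last}
    missed = [line for line in touched if line not in resident]
    resident.update(touched)          # bring missing lines into the cache
    return "miss" if missed else "hit"


resident = set()
print(count_access(60, 8, 64, resident))    # lines 0 and 1 both miss: one miss
print(count_access(60, 8, 64, resident))    # both lines now resident: one hit
print(count_access(124, 8, 64, resident))   # line 1 hits, line 2 misses: one miss
```
]]></programlisting>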
				193
				194	<para>If you are interested in simulating a cache with different
				195	properties, it is not particularly hard to write your own cache
				196	simulator, or to modify the existing ones in
				197	<computeroutput>vg_cachesim_I1.c</computeroutput>,
				198	<computeroutput>vg_cachesim_D1.c</computeroutput>,
				199	<computeroutput>vg_cachesim_L2.c</computeroutput> and
				200	<computeroutput>vg_cachesim_gen.c</computeroutput>. We'd be
				201	interested to hear from anyone who does.</para>
				202
				203	</sect2>
				204
				205	</sect1>
				206
				207
				208
				209	<sect1 id="cg-manual.profile" xreflabel="Profiling programs">
				210	<title>Profiling programs</title>
				211
				212	<para>To gather cache profiling information about the program
				213	<computeroutput>ls -l</computeroutput>, invoke Cachegrind like
				214	this:</para>
				215
				216	<programlisting><![CDATA[
				217	valgrind --tool=cachegrind ls -l]]></programlisting>
				218
				219	<para>The program will execute (slowly). Upon completion,
				220	summary statistics that look like this will be printed:</para>
				221
				222	<programlisting><![CDATA[
				223	==31751== I   refs:      27,742,716
				224	==31751== I1  misses:           276
				225	==31751== L2  misses:           275
				226	==31751== I1  miss rate:        0.0%
				227	==31751== L2i miss rate:        0.0%
				228	==31751==
				229	==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
				230	==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
				231	==31751== L2  misses:        23,085  (     3,987 rd +    19,098 wr)
				232	==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%  )
				233	==31751== L2d miss rate:        0.1% (       0.0%   +       0.4%  )
				234	==31751==
				235	==31751== L2 misses:         23,360  (     4,262 rd +    19,098 wr)
				236	==31751== L2 miss rate:         0.0% (       0.0%   +       0.4%  )]]></programlisting>
				237
				238	<para>Cache accesses for instruction fetches are summarised
				239	first, giving the number of fetches made (this is the number of
				240	instructions executed, which can be useful to know in its own
				241	right), the number of I1 misses, and the number of L2 instruction
				242	(<computeroutput>L2i</computeroutput>) misses.</para>
				243
				244	<para>Cache accesses for data follow. The information is similar
				245	to that of the instruction fetches, except that the values are
				246	also shown split between reads and writes (note each row's
				247	<computeroutput>rd</computeroutput> and
				248	<computeroutput>wr</computeroutput> values add up to the row's
				249	total).</para>
				250
				251	<para>Combined instruction and data figures for the L2 cache
				252	follow that.</para>
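
<para>To see how the figures in the sample summary relate to each other,
here is a small Python sketch that recomputes them. The helper
<computeroutput>miss_rate</computeroutput> is hypothetical, and the sketch
assumes that the combined L2 miss rate is taken over all instruction and
data references; the sample output itself shows fewer decimal places.</para>

<programlisting><![CDATA[
```python
def miss_rate(misses, refs):
    """Miss rate as a percentage; 0.0 when there were no references."""
    return 100.0 * misses / refs if refs else 0.0


# Figures copied from the sample summary above.
i_refs, d_refs = 27_742_716, 15_430_290
d1_misses = 21_905 + 19_280          # read misses + write misses
l2i_misses, l2d_misses = 275, 23_085

# The combined L2 figure is the sum of the instruction and data parts.
l2_misses = l2i_misses + l2d_misses
print(d1_misses, l2_misses)

print(f"D1 miss rate: {miss_rate(d1_misses, d_refs):.2f}%")
print(f"L2 miss rate: {miss_rate(l2_misses, i_refs + d_refs):.2f}%")
```
]]></programlisting>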
				253
				254
				255
				256	<sect2 id="cg-manual.outputfile" xreflabel="Output file">
				257	<title>Output file</title>
				258
				259	<para>As well as printing summary information, Cachegrind also
				260	writes line-by-line cache profiling information to a file named
				261	<computeroutput>cachegrind.out.pid</computeroutput>. This file
				262	is human-readable, but is best interpreted by the accompanying
				263	program <computeroutput>cg_annotate</computeroutput>, described
				264	in the next section.</para>
				265
				266	<para>Things to note about the
				267	<computeroutput>cachegrind.out.pid</computeroutput>
				268	file:</para>
				269
				270	<itemizedlist>
				271	<listitem>
				272	<para>It is written every time Cachegrind is run, and will
				273	overwrite any existing
				274	<computeroutput>cachegrind.out.pid</computeroutput>
				275	in the current directory (but that won't happen very often
				276	because it takes some time for process ids to be
				277	recycled).</para>
				278	</listitem>
				279	<listitem>
				280	<para>It can be huge: <computeroutput>ls -l</computeroutput>
				281	generates a file of about 350KB. Browsing a few files and
				282	web pages with a Konqueror built with full debugging
				283	information generates a file of around 15 MB.</para>
				284	</listitem>
				285	</itemizedlist>
				286
sewardj	8d9fec5	2005-11-15 20:56:23 +0000	[diff] [blame]	287	<para>The <computeroutput>.pid</computeroutput> suffix
de	7e109d1	2005-11-18 22:09:58 +0000	[diff] [blame]	288	on the output file name serves two purposes. Firstly, it means you
				289	don't have to rename old log files that you don't want to overwrite.
				290	Secondly, and more importantly, it allows correct profiling with the
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	291	<computeroutput>--trace-children=yes</computeroutput> option of
				292	programs that spawn child processes.</para>
				293
				294	</sect2>
				295
				296
				297
				298	<sect2 id="cg-manual.cgopts" xreflabel="Cachegrind options">
				299	<title>Cachegrind options</title>
				300
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	301	<!-- start of xi:include in the manpage -->
				302	<para id="cg.opts.para">Manually specifies the I1/D1/L2 cache
				303	configuration, where <varname>size</varname> and
				304	<varname>line_size</varname> are measured in bytes. The three items
				305	must be comma-separated, but with no spaces, eg:
				306	<literallayout> valgrind --tool=cachegrind --I1=65536,2,64</literallayout>
				307
				308	You can specify one, two or three of the I1/D1/L2 caches. Any level not
				309	manually specified will be simulated using the configuration found in
				310	the normal way (via the CPUID instruction for automagic cache
				311	configuration, or failing that, via defaults).</para>
				312
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	313	<para>Cache-simulation specific options are:</para>
				314
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	315	<variablelist id="cg.opts.list">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	316
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	317	<varlistentry id="opt.I1" xreflabel="--I1">
				318	<term>
				319	<option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
				320	</term>
				321	<listitem>
				322	<para>Specify the size, associativity and line size of the level 1
				323	instruction cache. </para>
				324	</listitem>
				325	</varlistentry>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	326
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	327	<varlistentry id="opt.D1" xreflabel="--D1">
				328	<term>
				329	<option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
				330	</term>
				331	<listitem>
				332	<para>Specify the size, associativity and line size of the level 1
				333	data cache.</para>
				334	</listitem>
				335	</varlistentry>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	336
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	337	<varlistentry id="opt.L2" xreflabel="--L2">
				338	<term>
				339	<option><![CDATA[--L2=<size>,<associativity>,<line size> ]]></option>
				340	</term>
				341	<listitem>
				342	<para>Specify the size, associativity and line size of the level 2
				343	cache.</para>
				344	</listitem>
				345	</varlistentry>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	346
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	347	</variablelist>
				348	<!-- end of xi:include in the manpage -->
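
<para>The comma-separated option value can be illustrated with a short
Python sketch. It is not Cachegrind's parser: the names
<computeroutput>CacheConfig</computeroutput> and
<computeroutput>parse_cache_option</computeroutput> are hypothetical, and
the validity checks (power-of-two line size and set count) are assumptions
about what a simulator of this kind would reasonably require.</para>

<programlisting><![CDATA[
```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    size: int        # total size in bytes
    assoc: int       # associativity (number of ways)
    line_size: int   # line size in bytes


def is_pow2(x):
    return x > 0 and x & (x - 1) == 0


def parse_cache_option(value):
    """Parse '<size>,<associativity>,<line size>' with no spaces."""
    parts = value.split(",")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected three comma-separated integers, no spaces")
    size, assoc, line_size = (int(p) for p in parts)
    # Assumed sanity checks, not taken from the Cachegrind sources.
    if assoc == 0 or not is_pow2(line_size):
        raise ValueError("associativity must be non-zero, line size a power of two")
    if size % (line_size * assoc) != 0 or not is_pow2(size // (line_size * assoc)):
        raise ValueError("number of sets must be a power of two")
    return CacheConfig(size, assoc, line_size)


print(parse_cache_option("65536,2,64"))
```
]]></programlisting>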
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	349
				350	</sect2>
				351
				352
				353
				354	<sect2 id="cg-manual.annotate" xreflabel="Annotating C/C++ programs">
				355	<title>Annotating C/C++ programs</title>
				356
				357	<para>Before using <computeroutput>cg_annotate</computeroutput>,
				358	it is worth widening your window to be at least 120 characters
				359	wide if possible, as the output lines can be quite long.</para>
				360
				361	<para>To get a function-by-function summary, run
				362	<computeroutput>cg_annotate --pid</computeroutput> in a directory
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	363	containing a <filename>cachegrind.out.pid</filename> file. The
				364	<emphasis>--pid</emphasis> is required so that
				365	<computeroutput>cg_annotate</computeroutput> knows which log file to use
				366	when several are present.</para>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	367
				368	<para>The output looks like this:</para>
				369
				370	<programlisting><![CDATA[
				371	--------------------------------------------------------------------------------
				372	I1 cache: 65536 B, 64 B, 2-way associative
				373	D1 cache: 65536 B, 64 B, 2-way associative
				374	L2 cache: 262144 B, 64 B, 8-way associative
				375	Command: concord vg_to_ucode.c
				376	Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				377	Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				378	Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				379	Threshold: 99%
				380	Chosen for annotation:
				381	Auto-annotation: off
				382
				383	--------------------------------------------------------------------------------
				384	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				385	--------------------------------------------------------------------------------
				386	27,742,716 276 275 10,955,517 21,905 3,987 4,474,773 19,280 19,098 PROGRAM TOTALS
				387
				388	--------------------------------------------------------------------------------
				389	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw file:function
				390	--------------------------------------------------------------------------------
				391	8,821,482 5 5 2,242,702 1,621 73 1,794,230 0 0 getc.c:_IO_getc
				392	5,222,023 4 4 2,276,334 16 12 875,959 1 1 concord.c:get_word
				393	2,649,248 2 2 1,344,810 7,326 1,385 . . . vg_main.c:strcmp
				394	2,521,927 2 2 591,215 0 0 179,398 0 0 concord.c:hash
				395	2,242,740 2 2 1,046,612 568 22 448,548 0 0 ctype.c:tolower
				396	1,496,937 4 4 630,874 9,000 1,400 279,388 0 0 concord.c:insert
				397	897,991 51 51 897,831 95 30 62 1 1 ???:???
				398	598,068 1 1 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__flockfile
				399	598,068 0 0 299,034 0 0 149,517 0 0 ../sysdeps/generic/lockfile.c:__funlockfile
				400	598,024 4 4 213,580 35 16 149,506 0 0 vg_clientmalloc.c:malloc
				401	446,587 1 1 215,973 2,167 430 129,948 14,057 13,957 concord.c:add_existing
				402	341,760 2 2 128,160 0 0 128,160 0 0 vg_clientmalloc.c:vg_trap_here_WRAPPER
				403	320,782 4 4 150,711 276 0 56,027 53 53 concord.c:init_hash_table
				404	298,998 1 1 106,785 0 0 64,071 1 1 concord.c:create
				405	149,518 0 0 149,516 0 0 1 0 0 ???:tolower@@GLIBC_2.0
				406	149,518 0 0 149,516 0 0 1 0 0 ???:fgetc@@GLIBC_2.0
				407	95,983 4 4 38,031 0 0 34,409 3,152 3,150 concord.c:new_word_node
				408	85,440 0 0 42,720 0 0 21,360 0 0 vg_clientmalloc.c:vg_bogus_epilogue]]></programlisting>
				409
				410
				411	<para>First up is a summary of the annotation options:</para>
				412
				413	<itemizedlist>
				414
				415	<listitem>
				416	<para>I1 cache, D1 cache, L2 cache: cache configuration. So
				417	you know the configuration with which these results were
				418	obtained.</para>
				419	</listitem>
				420
				421	<listitem>
				422	<para>Command: the command line invocation of the program
				423	under examination.</para>
				424	</listitem>
				425
				426	<listitem>
				427	<para>Events recorded: event abbreviations are:</para>
				428	<itemizedlist>
				429	<listitem>
				430	<para><computeroutput>Ir </computeroutput>: I cache reads
				431	(ie. instructions executed)</para>
				432	</listitem>
				433	<listitem>
				434	<para><computeroutput>I1mr</computeroutput>: I1 cache read
				435	misses</para>
				436	</listitem>
				437	<listitem>
				438	<para><computeroutput>I2mr</computeroutput>: L2 cache
				439	instruction read misses</para>
				440	</listitem>
				441	<listitem>
				442	<para><computeroutput>Dr </computeroutput>: D cache reads
				443	(ie. memory reads)</para>
				444	</listitem>
				445	<listitem>
				446	<para><computeroutput>D1mr</computeroutput>: D1 cache read
				447	misses</para>
				448	</listitem>
				449	<listitem>
				450	<para><computeroutput>D2mr</computeroutput>: L2 cache data
				451	read misses</para>
				452	</listitem>
				453	<listitem>
				454	<para><computeroutput>Dw </computeroutput>: D cache writes
				455	(ie. memory writes)</para>
				456	</listitem>
				457	<listitem>
				458	<para><computeroutput>D1mw</computeroutput>: D1 cache write
				459	misses</para>
				460	</listitem>
				461	<listitem>
				462	<para><computeroutput>D2mw</computeroutput>: L2 cache data
				463	write misses</para>
				464	</listitem>
				465	</itemizedlist>
				466
				467	<para>Note that D1 total misses is given by
				468	<computeroutput>D1mr</computeroutput> +
				469	<computeroutput>D1mw</computeroutput>, and that L2 total
				470	misses is given by <computeroutput>I2mr</computeroutput> +
				471	<computeroutput>D2mr</computeroutput> +
				472	<computeroutput>D2mw</computeroutput>.</para>
				473	</listitem>
				474
				475	<listitem>
				476	<para>Events shown: the events shown (a subset of events
				477	gathered). This can be adjusted with the
				478	<computeroutput>--show</computeroutput> option.</para>
				479	</listitem>
				480
				481	<listitem>
				482	<para>Event sort order: the sort order in which functions are
				483	shown. For example, in this case the functions are sorted
				484	from highest <computeroutput>Ir</computeroutput> counts to
				485	lowest. If two functions have identical
				486	<computeroutput>Ir</computeroutput> counts, they will then be
				487	sorted by <computeroutput>I1mr</computeroutput> counts, and
				488	so on. This order can be adjusted with the
				489	<computeroutput>--sort</computeroutput> option.</para>
				490
				491	<para>Note that this dictates the order the functions appear.
				492	It is <command>not</command> the order in which the columns
				493	appear; that is dictated by the "events shown" line (and can
				494	be changed with the <computeroutput>--show</computeroutput>
				495	option).</para>
				496	</listitem>
				497
				498	cg_annotate shows summaries for the functions that account for
				499	<para>Threshold: <computeroutput>cg_annotate</computeroutput>
				500	by default omits functions that cause very low numbers of
				501	misses to avoid drowning you in information. In this case,
				502	cg_annotate shows summaries the functions that account for
				503	99% of the <computeroutput>Ir</computeroutput> counts;
				504	<computeroutput>Ir</computeroutput> is chosen as the
				505	threshold event since it is the primary sort event. The
				506	threshold can be adjusted with the
				507	<computeroutput>--threshold</computeroutput>
				508	option.</para>
				509	</listitem>
				510
				511	<listitem>
				512	<para>Chosen for annotation: names of files specified
				513	manually for annotation; in this case none.</para>
				514	</listitem>
				515
				516	<listitem>
				517	<para>Auto-annotation: whether auto-annotation was requested
				518	via the <computeroutput>--auto=yes</computeroutput>
				519	option. In this case no.</para>
				520	<para>Then follow summary statistics for the whole
				521
				522	</itemizedlist>
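
<para>The interaction between the sort order and the threshold can be
sketched as follows. This is an illustrative Python model, not the
<computeroutput>cg_annotate</computeroutput> implementation: the function
name <computeroutput>sort_and_cut</computeroutput> and the sample rows are
made up, and the total is taken as the sum of the rows given rather than
the program totals.</para>

<programlisting><![CDATA[
```python
def sort_and_cut(rows, sort_events, threshold_pct):
    """rows is a list of (name, {event: count}) pairs.

    Sort by the given events (highest first, later events break ties),
    then keep rows until they account for threshold_pct percent of the
    primary sort event.
    """
    primary = sort_events[0]
    total = sum(counts.get(primary, 0) for _, counts in rows)
    ordered = sorted(
        rows,
        key=lambda row: tuple(row[1].get(e, 0) for e in sort_events),
        reverse=True,
    )
    shown, running = [], 0
    for name, counts in ordered:
        if running * 100 >= total * threshold_pct:
            break                      # threshold already reached
        shown.append(name)
        running += counts.get(primary, 0)
    return shown


rows = [
    ("b.c:g", {"Ir": 30, "I1mr": 2}),
    ("a.c:f", {"Ir": 60, "I1mr": 0}),
    ("d.c:i", {"Ir": 1, "I1mr": 0}),
    ("c.c:h", {"Ir": 9, "I1mr": 1}),
]
print(sort_and_cut(rows, ["Ir", "I1mr"], 99))   # d.c:i falls below the cut
```
]]></programlisting>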
				523
				524	<para>Then follow function-by-function statistics. Each function
				525	program. These are similar to the summary provided when running
de	03e0e7c	2005-12-03 23:02:33 +0000	[diff] [blame]	526	<computeroutput>valgrind --tool=cachegrind</computeroutput>.</para>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	527
				528	<para>Then follows function-by-function statistics. Each function
				529	is identified by a
				530	<computeroutput>file_name:function_name</computeroutput> pair. If
				531	<computeroutput>???</computeroutput> is used if the file name
				532	that event (eg. the third row shows that
				533	<computeroutput>strcmp()</computeroutput> contains no
				534	instructions that write to memory). The name
				535	<computeroutput>???</computeroutput> is used if the the file name
				536	and/or function name could not be determined from debugging
				537	information. If most of the entries have the form
				538	<computeroutput>???:???</computeroutput> the program probably
				539	wasn't compiled with <computeroutput>-g</computeroutput>. If any
				540	code was invalidated (either due to self-modifying code or
				541	unloading of shared objects) its counts are aggregated into a
				542	single cost centre written as
				543	<computeroutput>(discarded):(discarded)</computeroutput>.</para>
				544
				545	<para>It is worth noting that functions will come from three
				546	types of source files:</para>
				547
				548	<orderedlist>
				549	<listitem>
				550	<para>From the profiled program
				551	(<filename>concord.c</filename> in this example).</para>
				552	</listitem>
				553	<listitem>
				554	<para>From libraries (eg. <filename>getc.c</filename>)</para>
				555	</listitem>
				556	<listitem>
				557	<para>From Valgrind's implementation of some libc functions
				558	(eg. <computeroutput>vg_clientmalloc.c:malloc</computeroutput>).
				559	These are recognisable because the filename begins with
				560	<computeroutput>vg_</computeroutput>, and is probably one of
				561	<filename>vg_main.c</filename>,
				562	<filename>vg_clientmalloc.c</filename> or
				563	<filename>vg_mylibc.c</filename>.</para>
				564	</listitem>
				565
				566	</orderedlist>
				567
				568	<para>There are two ways to annotate source files -- by choosing
				569	them manually, or with the
				570	<computeroutput>--auto=yes</computeroutput> option. To do it
				571	manually, just specify the filenames as arguments to
				572	<computeroutput>cg_annotate</computeroutput>. For example, the
				573	output from running <filename>cg_annotate concord.c</filename>
				574	for our example produces the same output as above followed by an
				575	annotated version of <filename>concord.c</filename>, a section of
				576	which looks like:</para>
				577
				578	<programlisting><![CDATA[
				579	--------------------------------------------------------------------------------
				580	-- User-annotated source: concord.c
				581	--------------------------------------------------------------------------------
				582	Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
				583
				584	[snip]
				585
				586	. . . . . . . . . void init_hash_table(char *file_name, Word_Node *table[])
				587	3 1 1 . . . 1 0 0 {
				588	. . . . . . . . . FILE *file_ptr;
				589	. . . . . . . . . Word_Info *data;
				590	1 0 0 . . . 1 1 1 int line = 1, i;
				591	. . . . . . . . .
				592	5 0 0 . . . 3 0 0 data = (Word_Info *) create(sizeof(Word_Info));
				593	. . . . . . . . .
				594	4,991 0 0 1,995 0 0 998 0 0 for (i = 0; i < TABLE_SIZE; i++)
				595	3,988 1 1 1,994 0 0 997 53 52 table[i] = NULL;
				596	. . . . . . . . .
				597	. . . . . . . . . /* Open file, check it. */
				598	6 0 0 1 0 0 4 0 0 file_ptr = fopen(file_name, "r");
				599	2 0 0 1 0 0 . . . if (!(file_ptr)) {
				600	. . . . . . . . . fprintf(stderr, "Couldn't open '%s'.\n", file_name);
				601	1 1 1 . . . . . . exit(EXIT_FAILURE);
				602	. . . . . . . . . }
				603	. . . . . . . . .
				604	165,062 1 1 73,360 0 0 91,700 0 0 while ((line = get_word(data, line, file_ptr)) != EOF)
				605	146,712 0 0 73,356 0 0 73,356 0 0 insert(data->word, data->line, table);
				606	. . . . . . . . .
				607	4 0 0 1 0 0 2 0 0 free(data);
				608	4 0 0 1 0 0 2 0 0 fclose(file_ptr);
				609	3 0 0 2 0 0 . . . }]]></programlisting>
				610
				611	<para>(Although column widths are automatically minimised, a wide
				612	terminal is clearly useful.)</para>
				613
				614	<para>Each source file is clearly marked
				615	(<computeroutput>User-annotated source</computeroutput>) as
				616	having been chosen manually for annotation. If the file was
				617	found in one of the directories specified with the
				618	<computeroutput>-I / --include</computeroutput> option, the directory
				619	and file are both given.</para>
				620
				621	<para>Each line is annotated with its event counts. Events not
				622	applicable for a line are represented by a `.'; this is useful
				623	for distinguishing between an event which cannot happen, and one
				624	which can but did not.</para>
				625
				626	<para>Sometimes only a small section of a source file is
sewardj	8d9fec5	2005-11-15 20:56:23 +0000	[diff] [blame]	627	executed. To minimise uninteresting output, Cachegrind only shows
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	628	annotated lines and lines within a small distance of annotated
				629	lines. Gaps are marked with the line numbers so you know which
				630	part of a file the shown code comes from, eg:</para>
				631
				632	<programlisting><![CDATA[
				633	(figures and code for line 704)
				634	-- line 704 ----------------------------------------
				635	-- line 878 ----------------------------------------
				636	(figures and code for line 878)]]></programlisting>
				637
				638	<para>The amount of context to show around annotated lines is
				639	controlled by the <computeroutput>--context</computeroutput>
				640	option.</para>
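
<para>The effect of the context setting can be modelled with a short
Python sketch. It only illustrates the idea of showing annotated lines
plus their neighbours and marking the gaps; the function names and the
context value of 4 are assumptions made for the example, not taken from
<computeroutput>cg_annotate</computeroutput>.</para>

<programlisting><![CDATA[
```python
def lines_to_show(annotated, n_lines, context):
    """Return the sorted line numbers within `context` lines of any
    annotated line, clipped to the file's 1..n_lines range."""
    shown = set()
    for line in annotated:
        first = max(1, line - context)
        last = min(n_lines, line + context)
        shown.update(range(first, last + 1))
    return sorted(shown)


def gaps(shown):
    """Return (last line before gap, first line after gap) pairs."""
    return [(a, b) for a, b in zip(shown, shown[1:]) if b - a > 1]


shown = lines_to_show({700, 880}, n_lines=1000, context=4)
print(shown[0], shown[-1])   # 696 884
print(gaps(shown))           # [(704, 876)]
```
]]></programlisting>

<para>Each pair returned by <computeroutput>gaps</computeroutput>
corresponds to one pair of line-number markers in the annotated
output.</para>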
				641
				642	<para>To get automatic annotation, run
				643	<computeroutput>cg_annotate --auto=yes</computeroutput>.
				644	cg_annotate will automatically annotate every source file it can
				645	find that is mentioned in the function-by-function summary.
				646	Therefore, the files chosen for auto-annotation are affected by
				647	the <computeroutput>--sort</computeroutput> and
				648	<computeroutput>--threshold</computeroutput> options. Each
				649	source file is clearly marked (<computeroutput>Auto-annotated
				650	source</computeroutput>) as being chosen automatically. Any
				651	files that could not be found are mentioned at the end of the
				652	output, eg:</para>
				653
				654	<programlisting><![CDATA[
				655	------------------------------------------------------------------
				656	The following files chosen for auto-annotation could not be found:
				657	------------------------------------------------------------------
				658	getc.c
				659	ctype.c
				660	../sysdeps/generic/lockfile.c]]></programlisting>
				661
				662	<para>This is quite common for library files, since libraries are
				663	usually compiled with debugging information, but the source files
				664	are often not present on a system. If a file is chosen for
				665	annotation <command>both</command> manually and automatically, it
				666	is marked as <computeroutput>User-annotated
				667	source</computeroutput>. Use the <computeroutput>-I /
				668	--include</computeroutput> option to tell cg_annotate where to look
				669	for source files if the filenames found from the debugging
				670	information aren't specific enough.</para>
				671
				672	<para>Beware that cg_annotate can take some time to digest large
				673	<computeroutput>cachegrind.out.pid</computeroutput> files,
				674	e.g. 30 seconds or more. Also beware that auto-annotation can
				675	produce a lot of output if your program is large!</para>
				676
				677	</sect2>
				678
				679
				680	<sect2 id="cg-manual.assembler" xreflabel="Annotating assembler programs">
				681	<title>Annotating assembler programs</title>
				682
				683	<para>Valgrind can annotate assembler programs too, or annotate
				684	the assembler generated for your C program. Sometimes this is
				685	useful for understanding what is really happening when an
				686	interesting line of C code is translated into multiple
				687	instructions.</para>
				688
				689	<para>To do this, you just need to assemble your
				690	<computeroutput>.s</computeroutput> files with assembler-level
				691	debug information. gcc doesn't do this, but you can use the GNU
				692	assembler with the <computeroutput>--gstabs</computeroutput>
				693	option to generate object files with this information, eg:</para>
				694
				695	<programlisting><![CDATA[
				696	as --gstabs foo.s]]></programlisting>
				697
				698	<para>You can then profile and annotate source files in the same
				699	way as for C/C++ programs.</para>
				700
				701	</sect2>
				702
				703	</sect1>
				704
				705
				706	<sect1 id="cg-manual.annopts" xreflabel="cg_annotate options">
				707	<title><computeroutput>cg_annotate</computeroutput> options</title>
				708
				709	<itemizedlist>
				710
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	711	<listitem id="pid">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	712	<para><computeroutput>--pid</computeroutput></para>
				713	<para>Indicates which
				714	<computeroutput>cachegrind.out.pid</computeroutput> file to
				715	read. Not actually an option -- it is required.</para>
				716	</listitem>
				717
				718	<listitem>
				719	<para><computeroutput>-h, --help</computeroutput></para>
				720	<para><computeroutput>-v, --version</computeroutput></para>
				721	<para>Help and version, as usual.</para>
				722	</listitem>
				723
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	724	<listitem id="sort">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	725	<para><computeroutput>--sort=A,B,C</computeroutput> [default:
				726	order in
				727	<computeroutput>cachegrind.out.pid</computeroutput>]</para>
				728	<para>Specifies the events upon which the sorting of the
				729	function-by-function entries will be based. Useful if you
				730	want to concentrate on eg. I cache misses
				731	(<computeroutput>--sort=I1mr,I2mr</computeroutput>), or D
				732	cache misses
				733	(<computeroutput>--sort=D1mr,D2mr</computeroutput>), or L2
				734	misses
				735	(<computeroutput>--sort=D2mr,I2mr</computeroutput>).</para>
				736	</listitem>
				737
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	738	<listitem id="show">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	739	<para><computeroutput>--show=A,B,C</computeroutput> [default:
				740	all, using order in
				741	<computeroutput>cachegrind.out.pid</computeroutput>]</para>
				742	<para>Specifies which events to show (and the column
				743	order). Default is to use all present in the
				744	<computeroutput>cachegrind.out.pid</computeroutput> file (and
				745	use the order in the file).</para>
				746	</listitem>
				747
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	748	<listitem id="threshold">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	749	<para><computeroutput>--threshold=X</computeroutput>
				750	[default: 99%]</para>
				751	<para>Sets the threshold for the function-by-function
				752	summary. Functions are shown that account for more than X%
				753	of the primary sort event. If auto-annotating, also affects
				754	which files are annotated.</para>
				755
				756	<para>Note: thresholds can be set for more than one of the
				757	events by following any event given to the
				758	<computeroutput>--sort</computeroutput> option with a colon
				759	and a number (no spaces, though). E.g. if you want to see
				760	the functions that cover 99% of L2 read misses and 99% of L2
				761	write misses, use this option:</para>
				762	<para><computeroutput>--sort=D2mr:99,D2mw:99</computeroutput></para>
				763	</listitem>
				764
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	765	<listitem id="auto">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	766	<para><computeroutput>--auto=no</computeroutput> [default]</para>
				767	<para><computeroutput>--auto=yes</computeroutput></para>
				768	<para>When enabled, automatically annotates every file that
				769	is mentioned in the function-by-function summary that can be
				770	found. Also gives a list of those that couldn't be found.</para>
				771	</listitem>
				772
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	773	<listitem id="context">
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	774	<para><computeroutput>--context=N</computeroutput> [default:
				775	8]</para>
				776	<para>Print N lines of context before and after each
				777	annotated line. Avoids printing large sections of source
				778	files that were not executed. Use a large number
				779	(eg. 10,000) to show all source lines.</para>
				780	</listitem>
				781
de	bc32e82	2005-06-25 14:43:05 +0000	[diff] [blame]	782	<listitem id="include">
sewardj	8d9fec5	2005-11-15 20:56:23 +0000	[diff] [blame]	783	<para><computeroutput>-I&lt;dir&gt;,
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	784	--include=&lt;dir&gt;</computeroutput> [default: empty
				785	string]</para>
				786	<para>Adds a directory to the list in which to search for
				787	files. Multiple -I/--include options can be given to add
				788	multiple directories.</para>
				789	</listitem>
				790
				791	</itemizedlist>
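<para>The effect of <computeroutput>--threshold</computeroutput> can be
illustrated with a short Python sketch. It assumes that the threshold is
applied to the running total of the primary sort event, with functions taken
in descending order of that event; this is one reading of the description
above, not cg_annotate's actual code. The function name
<computeroutput>functions_within_threshold</computeroutput> and the example
counts are hypothetical.</para>
<programlisting><![CDATA[
```python
def functions_within_threshold(fn_counts, threshold_pct=99.0):
    """Return the function names a summary would list.

    fn_counts maps a function name to its count for the primary sort
    event. Functions are taken in descending order of count until the
    ones already listed cover at least threshold_pct of the total.
    """
    total = sum(fn_counts.values())
    shown = []
    running = 0
    ordered = sorted(fn_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ordered:
        # Stop once the functions listed so far reach the threshold.
        if total == 0 or 100.0 * running / total >= threshold_pct:
            break
        shown.append(name)
        running += count
    return shown
```
]]></programlisting>
<para>With hypothetical counts of 90, 9 and 1 for functions a, b and c, the
default threshold of 99% would list a and b and omit c under this
reading.</para>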
				792
				793
				794
				795	<sect2>
				796	<title>Warnings</title>
				797
				798	<para>There are a couple of situations in which
				799	<computeroutput>cg_annotate</computeroutput> issues
				800	warnings.</para>
				801
				802	<itemizedlist>
				803	<listitem>
				804	<para>If a source file is more recent than the
				805	<computeroutput>cachegrind.out.pid</computeroutput> file.
				806	This is because the information in
				807	<computeroutput>cachegrind.out.pid</computeroutput> is only
				808	recorded with line numbers, so if the line numbers change at
				809	all in the source (eg. lines added, deleted, swapped), any
				810	annotations will be incorrect.</para>
				811	</listitem>
				812	<listitem>
				813	<para>If information is recorded about line numbers past the
				814	end of a file. This can be caused by the above problem,
				815	ie. shortening the source file while using an old
				816	<computeroutput>cachegrind.out.pid</computeroutput> file. If
				817	this happens, the figures for the bogus lines are printed
				818	anyway (clearly marked as bogus) in case they are
				819	important.</para>
				820	</listitem>
				821	</itemizedlist>
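<para>The two conditions behind these warnings can be sketched in Python as
follows. This is only an illustration of the checks described above, not the
code cg_annotate uses; the helper names
<computeroutput>source_is_newer</computeroutput> and
<computeroutput>lines_past_end</computeroutput> are hypothetical.</para>
<programlisting><![CDATA[
```python
import os


def source_is_newer(source_path, profile_path):
    """True if the source file was modified after the profile was written."""
    return os.path.getmtime(source_path) > os.path.getmtime(profile_path)


def lines_past_end(source_path, recorded_line_nums):
    """Return the recorded line numbers that lie beyond the end of the file."""
    with open(source_path) as src:
        n_lines = sum(1 for _ in src)
    return sorted(n for n in recorded_line_nums if n > n_lines)
```
]]></programlisting>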
				822
				823	</sect2>
				824
				825
				826
				827	<sect2>
				828	<title>Things to watch out for</title>
				829
				830	<para>Some odd things that can occur during annotation:</para>
				831
				832	<itemizedlist>
				833	<listitem>
				834	<para>If annotating at the assembler level, you might see
				835	something like this:</para>
				836	<programlisting><![CDATA[
				837	1 0 0 . . . . . . leal -12(%ebp),%eax
				838	1 0 0 . . . 1 0 0 movl %eax,84(%ebx)
				839	2 0 0 0 0 0 1 0 0 movl $1,-20(%ebp)
				840	. . . . . . . . . .align 4,0x90
				841	1 0 0 . . . . . . movl $.LnrB,%eax
				842	1 0 0 . . . 1 0 0 movl %eax,-16(%ebp)]]></programlisting>
				843
				844	<para>How can the third instruction be executed twice when
				845	the others are executed only once? As it turns out, it
				846	isn't. Here's a dump of the executable, using
				847	<computeroutput>objdump -d</computeroutput>:</para>
				848	<programlisting><![CDATA[
				849	8048f25: 8d 45 f4 lea 0xfffffff4(%ebp),%eax
				850	8048f28: 89 43 54 mov %eax,0x54(%ebx)
				851	8048f2b: c7 45 ec 01 00 00 00 movl $0x1,0xffffffec(%ebp)
				852	8048f32: 89 f6 mov %esi,%esi
				853	8048f34: b8 08 8b 07 08 mov $0x8078b08,%eax
				854	8048f39: 89 45 f0 mov %eax,0xfffffff0(%ebp)]]></programlisting>
				855
				856	<para>Notice the extra <computeroutput>mov
				857	%esi,%esi</computeroutput> instruction. Where did this come
				858	from? The GNU assembler inserted it to serve as the two
				859	bytes of padding needed to align the <computeroutput>movl
				860	$.LnrB,%eax</computeroutput> instruction on a four-byte
				861	boundary, but pretended it didn't exist when adding debug
				862	information. Thus when Valgrind reads the debug info it
				863	thinks that the <computeroutput>movl
				864	$0x1,0xffffffec(%ebp)</computeroutput> instruction covers the
				865	address range 0x8048f2b--0x8048f33 by itself, and attributes
				866	the counts for the <computeroutput>mov
				867	%esi,%esi</computeroutput> to it.</para>
				868	</listitem>
				869
				870	<listitem>
				871	<para>Inlined functions can cause strange results in the
				872	function-by-function summary. If a function
				873	<computeroutput>inline_me()</computeroutput> is defined in
				874	<filename>foo.h</filename> and inlined in the functions
				875	<computeroutput>f1()</computeroutput>,
				876	<computeroutput>f2()</computeroutput> and
				877	<computeroutput>f3()</computeroutput> in
				878	<filename>bar.c</filename>, there will not be a
				879	<computeroutput>foo.h:inline_me()</computeroutput> function
				880	entry. Instead, there will be separate function entries for
				881	each inlining site, ie.
				882	<computeroutput>foo.h:f1()</computeroutput>,
				883	<computeroutput>foo.h:f2()</computeroutput> and
				884	<computeroutput>foo.h:f3()</computeroutput>. To find the
				885	total counts for
				886	<computeroutput>foo.h:inline_me()</computeroutput>, add up
				887	the counts from each entry.</para>
				888
				889	<para>The reason for this is that although the debug info
				890	output by gcc indicates the switch from
				891	<filename>bar.c</filename> to <filename>foo.h</filename>, it
				892	doesn't indicate the name of the function in
				893	<filename>foo.h</filename>, so Valgrind keeps using the old
				894	one.</para>
				895	</listitem>
				896
				897	<listitem>
				898	<para>Sometimes, the same filename might be represented with
				899	a relative name and with an absolute name in different parts
				900	of the debug info, eg:
				901	<filename>/home/user/proj/proj.h</filename> and
				902	<filename>../proj.h</filename>. In this case, if you use
				903	auto-annotation, the file will be annotated twice with the
				904	counts split between the two.</para>
				905	</listitem>
				906
				907	<listitem>
				908	<para>Files with more than 65,535 lines cause difficulties
				909	for the stabs debug info reader. This is because the line
				910	number in the <computeroutput>struct nlist</computeroutput>
				911	defined in <filename>a.out.h</filename> under Linux is only a
				912	16-bit value. Valgrind can handle some files with more than
				913	65,535 lines correctly by making some guesses to identify
				914	line number overflows. But some cases are beyond it, in
				915	which case you'll get a warning message explaining that
				916	annotations for the file might be incorrect.</para>
				917	</listitem>
				918
				919	<listitem>
				920	<para>If you compile some files with
				921	<computeroutput>-g</computeroutput> and some without, some
				922	events that take place in a file without debug info could be
				923	attributed to the last line of a file with debug info
				924	(whichever one gets placed before the non-debug-info file in
				925	the executable).</para>
				926	</listitem>
				927
				928	</itemizedlist>
				929
				930	<para>This list looks long, but these cases should be fairly
				931	rare.</para>
				932
				933	<formalpara>
				934	<title>Note:</title>
				935	<para><computeroutput>stabs</computeroutput> is not an easy
				936	format to read. If you come across bizarre annotations that
				937	look like they might be caused by a bug in the stabs reader, please
				938	let us know.</para>
				939	</formalpara>
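<para>The padding case in the first item above can be reproduced with a small
Python model. It assumes a simplified lookup in which each executed address is
attributed to the nearest preceding entry in the line table; Valgrind's real
debug info reader is more involved. The start addresses are those in the
<computeroutput>objdump</computeroutput> listing, and the names
<computeroutput>LINE_TABLE</computeroutput>,
<computeroutput>source_line_for</computeroutput> and
<computeroutput>instruction_counts</computeroutput> are hypothetical.</para>
<programlisting><![CDATA[
```python
import bisect

# Start addresses from the objdump listing. The padding instruction at
# 0x8048f32 has no entry, because the assembler emitted no debug info
# for it.
LINE_TABLE = [
    (0x8048f25, "leal -12(%ebp),%eax"),
    (0x8048f28, "movl %eax,84(%ebx)"),
    (0x8048f2b, "movl $1,-20(%ebp)"),
    (0x8048f34, "movl $.LnrB,%eax"),
    (0x8048f39, "movl %eax,-16(%ebp)"),
]
STARTS = [addr for addr, _ in LINE_TABLE]


def source_line_for(addr):
    """Attribute an address to the nearest preceding line table entry."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        raise ValueError("address precedes the first line table entry")
    return LINE_TABLE[i][1]


def instruction_counts(executed_addrs):
    """Count executed instructions per source line."""
    counts = {}
    for addr in executed_addrs:
        line = source_line_for(addr)
        counts[line] = counts.get(line, 0) + 1
    return counts
```
]]></programlisting>
<para>Feeding in the six addresses that are actually executed, including
0x8048f32, gives a count of 2 for the third source line and 1 for each of the
others, matching the annotation shown in that item.</para>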
				940
				941	</sect2>
				942
				943
				944
				945	<sect2>
				946	<title>Accuracy</title>
				947
				948	<para>Valgrind's cache profiling has a number of
				949	shortcomings:</para>
				950
				951	<itemizedlist>
				952	<listitem>
				953	<para>It doesn't account for kernel activity -- the effect of
				954	system calls on the cache contents is ignored.</para>
				955	</listitem>
				956
				957	<listitem>
				958	<para>It doesn't account for other process activity (although
				959	this is probably desirable when considering a single
				960	program).</para>
				961	</listitem>
				962
				963	<listitem>
				964	<para>It doesn't account for virtual-to-physical address
				965	mappings; hence the entire simulation is not a true
				966	representation of what's happening in the
				967	cache.</para>
				968	</listitem>
				969
				970	<listitem>
				971	<para>It doesn't account for cache misses not visible at the
				972	instruction level, eg. those arising from TLB misses, or
				973	speculative execution.</para>
				974	</listitem>
				975
				976	<listitem>
sewardj	8d9fec5	2005-11-15 20:56:23 +0000	[diff] [blame]	977	<para>Valgrind will schedule
				978	threads differently from how they would be when running natively.
				979	This could warp the results for threaded programs.</para>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	980	</listitem>
				981
				982	<listitem>
sewardj	8d9fec5	2005-11-15 20:56:23 +0000	[diff] [blame]	983	<para>The x86/amd64 instructions <computeroutput>bts</computeroutput>,
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	984	<computeroutput>btr</computeroutput> and
				985	<computeroutput>btc</computeroutput> will incorrectly be
				986	counted as doing a data read if both the arguments are
				987	registers, eg:</para>
				988	<programlisting><![CDATA[
				989	btsl %eax, %edx]]></programlisting>
				990
				991	<para>This should only happen rarely.</para>
				992	</listitem>
				993
				994	<listitem>
sewardj	8d9fec5	2005-11-15 20:56:23 +0000	[diff] [blame]	995	<para>x86/amd64 FPU instructions with data sizes of 28 and 108 bytes
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	996	(e.g. <computeroutput>fsave</computeroutput>) are treated as
				997	though they only access 16 bytes. These instructions seem to
				998	be rare so hopefully this won't affect accuracy much.</para>
				999	</listitem>
				1000
				1001	</itemizedlist>
				1002
				1003	<para>Another thing worth noting is that results are very
				1004	sensitive. Changing the size of the
				1005	<filename>valgrind.so</filename> file, the size of the program
				1006	being profiled, or even the length of its name can perturb the
				1007	results. Variations will be small, but don't expect perfectly
				1008	repeatable results if your program changes at all.</para>
				1009
				1010	<para>While these factors mean you shouldn't trust the results to
				1011	be super-accurate, hopefully they should be close enough to be
				1012	useful.</para>
				1013
				1014	</sect2>
				1015
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1016	</sect1>
				1017
				1018	<sect1>
				1019	<title>Implementation details</title>
				1020	<para>This section covers details that you don't need to know in order to
				1021	use Cachegrind, but that may be of interest to some people.</para>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	1022
				1023	<sect2>
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1024	<title>How Cachegrind works</title>
				1025	<para>The best reference for understanding how Cachegrind works is chapter 3 of
				1026	"Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote. It
njn	011215f	2006-10-21 23:00:59 +0000	[diff] [blame]	1027	is available on the <ulink url="&vg-pubs;">Valgrind publications
				1028	page</ulink>.</para>
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1029	</sect2>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	1030
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1031	<sect2>
				1032	<title>Cachegrind output file format</title>
				1033	<para>The file format is fairly straightforward, basically giving the
				1034	cost centre for every line, grouped by files and
				1035	functions. Total counts (eg. total cache accesses, total L1
				1036	misses) are calculated when traversing this structure rather than
				1037	during execution, to save time; the cache simulation functions
				1038	are called so often that even one or two extra adds can make a
				1039	sizeable difference.</para>
				1040
				1041	<para>The file format:</para>
				1042	<programlisting><![CDATA[
				1043	file ::= desc_line* cmd_line events_line data_line+ summary_line
				1044	desc_line ::= "desc:" ws? non_nl_string
				1045	cmd_line ::= "cmd:" ws? cmd
				1046	events_line ::= "events:" ws? (event ws)+
				1047	data_line ::= file_line | fn_line | count_line
				1048	file_line ::= "fl=" filename
				1049	fn_line ::= "fn=" fn_name
				1050	count_line ::= line_num ws? (count ws)+
				1051	summary_line ::= "summary:" ws? (count ws)+
				1052	count ::= num | "."]]></programlisting>
				1053
				1054	<para>Where:</para>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	1055	<itemizedlist>
				1056	<listitem>
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1057	<para><computeroutput>non_nl_string</computeroutput> is any
				1058	string not containing a newline.</para>
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	1059	</listitem>
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1060	<listitem>
				1061	<para><computeroutput>cmd</computeroutput> is a string holding the
				1062	command line of the profiled program.</para>
				1063	</listitem>
				1064	<listitem>
njn	2624212	2007-01-22 03:21:27 +0000	[diff] [blame]	1065	<para><computeroutput>event</computeroutput> is a string containing
				1066	no whitespace.</para>
				1067	</listitem>
				1068	<listitem>
njn	534f781	2006-10-21 22:22:59 +0000	[diff] [blame]	1069	<para><computeroutput>filename</computeroutput> and
				1070	<computeroutput>fn_name</computeroutput> are strings.</para>
				1071	</listitem>
				1072	<listitem>
				1073	<para><computeroutput>num</computeroutput> and
				1074	<computeroutput>line_num</computeroutput> are decimal
				1075	numbers.</para>
				1076	</listitem>
				1077	<listitem>
				1078	<para><computeroutput>ws</computeroutput> is whitespace.</para>
				1079	</listitem>
				1080	</itemizedlist>
				1081
				1082	<para>The contents of the "desc:" lines are printed out at the top
				1083	of the summary. This is a generic way of providing simulation
				1084	specific information, eg. for giving the cache configuration for
				1085	cache simulation.</para>
				1086
				1087	<para>More than one line of info can be presented for each file/fn/line number.
				1088	In such cases, the counts for the named events will be accumulated.</para>
				1089
				1090	<para>Counts can be "." to represent zero. This makes the files easier to
				1091	read.</para>
				1092
				1093	<para>The number of counts in each
				1094	<computeroutput>count_line</computeroutput> and the
				1095	<computeroutput>summary_line</computeroutput> should not exceed
				1096	the number of events in the
				1097	<computeroutput>events_line</computeroutput>. If the number in
				1098	each <computeroutput>count_line</computeroutput> is less, cg_annotate
				1099	treats those missing as though they were a "." entry.</para>
				1100
				1101	<para>A <computeroutput>file_line</computeroutput> changes the
				1102	current file name. A <computeroutput>fn_line</computeroutput>
				1103	changes the current function name. A
				1104	<computeroutput>count_line</computeroutput> contains counts that
				1105	pertain to the current filename/fn_name. A
				1106	<computeroutput>file_line</computeroutput> and a
				1107	<computeroutput>fn_line</computeroutput> must appear before any
				1108	<computeroutput>count_line</computeroutput>s to give the context
				1109	of the first <computeroutput>count_line</computeroutput>s.</para>
				1110
				1111	<para>Each <computeroutput>file_line</computeroutput> will normally be
				1112	immediately followed by a <computeroutput>fn_line</computeroutput>. But it
				1113	doesn't have to be.</para>
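
<para>As an illustration of the rules above, here is a minimal Python sketch
of a reader for this format. It is not cg_annotate's implementation; the
names <computeroutput>parse_cachegrind_out</computeroutput> and
<computeroutput>to_counts</computeroutput> are hypothetical, and skipping
blank lines is an assumption that the grammar does not state.</para>
<programlisting><![CDATA[
```python
def to_counts(fields, n_events):
    """Convert count fields to integers.

    A "." stands for zero, and missing trailing counts are treated as
    though they were "." entries.
    """
    if len(fields) > n_events:
        raise ValueError("more counts than events")
    values = [0 if field == "." else int(field) for field in fields]
    return values + [0] * (n_events - len(values))


def parse_cachegrind_out(text):
    """Parse the text of an output file into a dictionary."""
    desc, cmd, events, summary = [], None, [], None
    counts = {}  # (filename, fn_name, line_num) -> accumulated counts
    cur_file = cur_fn = None
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if line.startswith("desc:"):
            desc.append(line[len("desc:"):].strip())
        elif line.startswith("cmd:"):
            cmd = line[len("cmd:"):].strip()
        elif line.startswith("events:"):
            events = line[len("events:"):].split()
        elif line.startswith("summary:"):
            summary = to_counts(line[len("summary:"):].split(), len(events))
        elif line.startswith("fl="):
            cur_file = line[len("fl="):]
        elif line.startswith("fn="):
            cur_fn = line[len("fn="):]
        else:
            # Anything else is taken to be a count_line.
            if cur_file is None or cur_fn is None:
                raise ValueError("count_line before file_line and fn_line")
            fields = line.split()
            key = (cur_file, cur_fn, int(fields[0]))
            new = to_counts(fields[1:], len(events))
            old = counts.get(key, [0] * len(events))
            # Repeated entries for the same file/fn/line are accumulated.
            counts[key] = [a + b for a, b in zip(old, new)]
    return {"desc": desc, "cmd": cmd, "events": events,
            "counts": counts, "summary": summary}
```
]]></programlisting>
<para>The result maps each (filename, fn_name, line_num) triple to a list of
counts in the order given by the
<computeroutput>events_line</computeroutput>, so totals can be computed
afterwards by traversing this structure.</para>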
				1114
njn	3e986b2	2004-11-30 10:43:45 +0000	[diff] [blame]	1115
				1116	</sect2>
				1117
				1118	</sect1>
				1119	</chapter>