<?xml version="1.0" encoding='ISO-8859-1'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

<book id="oprofile-internals">
<bookinfo>
	<title>OProfile Internals</title>

	<authorgroup>
		<author>
			<firstname>John</firstname>
			<surname>Levon</surname>
			<affiliation>
				<address><email>levon@movementarian.org</email></address>
			</affiliation>
		</author>
	</authorgroup>

	<copyright>
		<year>2003</year>
		<holder>John Levon</holder>
	</copyright>
</bookinfo>

<toc></toc>

<chapter id="introduction">
<title>Introduction</title>

<para>
This document is current for OProfile version <oprofileversion />.
It describes the internal workings of OProfile for the interested
hacker, and assumes strong C, working C++, plus some knowledge of
kernel internals and CPU hardware.
</para>
<note>
<para>
Only the "new" implementation associated with kernel 2.6 and above is covered here. Kernel
2.4 uses a very different kernel module implementation and daemon to produce the sample files.
</para>
</note>

<sect1 id="overview">
<title>Overview</title>
<para>
OProfile is a statistical continuous profiler. In other words, profiles are generated by
regularly sampling the current registers on each CPU (from an interrupt handler, the
saved PC value at the time of the interrupt is stored), and converting that runtime PC
value into something meaningful to the programmer.
</para>
<para>
OProfile achieves this by taking the stream of sampled PC values, along with the detail
of which task was running at the time of the interrupt, and converting it into a file offset
against a particular binary file. Because applications <function>mmap()</function>
the code they run (be it <filename>/bin/bash</filename>, <filename>/lib/libfoo.so</filename>
or whatever), it's possible to find the relevant binary file and offset by walking
the task's list of mapped memory areas. Each PC value is thus converted into a tuple
of binary-image,offset. This is something that the userspace tools can use directly
to reconstruct where the code came from, including the particular assembly instructions,
symbol, and source line (via the binary's debug information if present).
</para>
<para>
Regularly sampling the PC value like this approximates what actually was executed and
how often - more often than not, this statistical approximation is good enough to
reflect reality. In common operation, the time between each sample interrupt is regulated
by a fixed number of clock cycles. This implies that the results will reflect where
the CPU is spending the most time; this is obviously a very useful information source
for performance analysis.
</para>
<para>
Sometimes though, an application programmer needs different kinds of information: for example,
"which of the source routines cause the most cache misses?". The rise in importance of
such metrics in recent years has led many CPU manufacturers to provide hardware performance
counters capable of measuring these events on the hardware level. Typically, these counters
increment once per event, and generate an interrupt on reaching some pre-defined
number of events. OProfile can use these interrupts to generate samples: then, the
profile results are a statistical approximation of which code caused how many of the
given event.
</para>
<para>
Consider a simplified system that only executes two functions A and B. A
takes one cycle to execute, whereas B takes 99 cycles. Imagine we run at
100 cycles a second, and we've set the performance counter to create an
interrupt after a set number of "events" (in this case an event is one
clock cycle). It should be clear that the chances of the interrupt
occurring in function A are 1/100, and 99/100 for function B. Thus, we
statistically approximate the actual relative performance features of
the two functions over time. This same analysis works for other types of
events, providing that the interrupt is tied to the number of events
occurring (that is, after N events, an interrupt is generated).
</para>
<para>
There is typically more than one of these counters, so it's possible to set up profiling
for several different event types. Using these counters gives us a powerful, low-overhead
way of gaining performance metrics. If OProfile, or the CPU, does not support performance
counters, then a simpler method is used: the kernel timer interrupt feeds samples
into OProfile itself.
</para>
<para>
The rest of this document concerns itself with how we get from receiving samples at
interrupt time to producing user-readable profile information.
</para>
</sect1>

<sect1 id="components">
<title>Components of the OProfile system</title>

<sect2 id="arch-specific-components">
<title>Architecture-specific components</title>
<para>
If OProfile supports the hardware performance counters found on
a particular architecture, the code for setting up and managing
these counters can be found in the kernel source
tree in the relevant <filename>arch/<emphasis>arch</emphasis>/oprofile/</filename>
directory. The architecture-specific implementation works by
filling in the <varname>oprofile_operations</varname> structure at init time. This
provides a set of operations such as <function>setup()</function>,
<function>start()</function>, <function>stop()</function>, etc.
that manage the hardware-specific details of fiddling with the
performance counter registers.
</para>
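<para>
As a rough sketch (a paraphrase of the 2.6 driver interface; see
<filename>include/linux/oprofile.h</filename> for the authoritative
definition, whose exact signatures vary between kernel versions), the
structure looks something like this:
</para>
<screen>
struct oprofile_operations {
	/* create any architecture-specific configuration files */
	int (*create_files)(struct super_block *sb, struct dentry *root);
	/* program the counter registers from the configured values */
	int (*setup)(void);
	void (*shutdown)(void);
	/* enable and disable the configured counters */
	int (*start)(void);
	void (*stop)(void);
	/* CPU identification string, e.g. "i386/athlon" */
	char *cpu_type;
};
</screen>
<para>
The architecture's init routine fills this in and hands it back to the
generic driver, which then invokes the operations at the appropriate
points in the profiling session.
</para>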
<para>
The other important facility available to the architecture code is
<function>oprofile_add_sample()</function>. This is where a particular sample
taken at interrupt time is fed into the generic OProfile driver code.
</para>
</sect2>

<sect2 id="filesystem">
<title>oprofilefs</title>
<para>
OProfile implements a pseudo-filesystem known as "oprofilefs", mounted from
userspace at <filename>/dev/oprofile</filename>. This consists of small
files for reporting and receiving configuration from userspace, as well
as the actual character device that the OProfile userspace receives samples
from. At <function>setup()</function> time, the architecture-specific code may
add further configuration files related to the details of the performance
counters. For example, on x86, one numbered directory for each hardware
performance counter is added, with files in each for the event type,
reset value, etc.
</para>
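<para>
For illustration, the counter directories are populated along these
lines (a condensed paraphrase of the x86 NMI driver's
<function>create_files</function> callback; error handling omitted):
</para>
<screen>
/* sketch: one numbered directory per counter, each holding the
 * configuration files for that counter */
static int nmi_create_files(struct super_block *sb, struct dentry *root)
{
	unsigned int i;

	for (i = 0; i &lt; model-&gt;num_counters; ++i) {
		struct dentry *dir;
		char buf[4];

		snprintf(buf, sizeof(buf), "%d", i);
		dir = oprofilefs_mkdir(sb, root, buf);
		oprofilefs_create_ulong(sb, dir, "enabled", &amp;counter_config[i].enabled);
		oprofilefs_create_ulong(sb, dir, "event", &amp;counter_config[i].event);
		oprofilefs_create_ulong(sb, dir, "count", &amp;counter_config[i].count);
		oprofilefs_create_ulong(sb, dir, "unit_mask", &amp;counter_config[i].unit_mask);
		oprofilefs_create_ulong(sb, dir, "kernel", &amp;counter_config[i].kernel);
		oprofilefs_create_ulong(sb, dir, "user", &amp;counter_config[i].user);
	}
	return 0;
}
</screen>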
<para>
The filesystem also contains a <filename>stats</filename> directory with
a number of useful counters for various OProfile events.
</para>
</sect2>

<sect2 id="driver">
<title>Generic kernel driver</title>
<para>
This lives in <filename>drivers/oprofile/</filename>, and forms the core of
how OProfile works in the kernel. Its job is to take samples delivered
from the architecture-specific code (via <function>oprofile_add_sample()</function>),
and buffer this data, in a transformed form as described later, until releasing
the data to the userspace daemon via the <filename>/dev/oprofile/buffer</filename>
character device.
</para>
</sect2>

<sect2 id="daemon">
<title>The OProfile daemon</title>
<para>
The OProfile userspace daemon's job is to take the raw data provided by the
kernel and write it to the disk. It takes the single data stream from the
kernel and logs sample data against a number of sample files (found in
<filename>$SESSION_DIR/samples/current/</filename>, by default located at
<filename>/var/lib/oprofile/samples/current/</filename>). For the benefit
of the "separate" functionality, the names/paths of these sample files
are mangled to reflect where the samples were from: this can include
thread IDs, the binary file path, the event type used, and more.
</para>
<para>
After this final step from interrupt to disk file, the data is now
persistent (that is, changes in the running of the system do not invalidate
stored data). So the post-profiling tools can run on this data at any
time (assuming the original binary files are still available and unchanged,
naturally).
</para>
</sect2>

<sect2 id="post-profiling">
<title>Post-profiling tools</title>
<para>
So far, we've collected data, but we've yet to present it in a useful form
to the user. This is the job of the post-profiling tools. In general form,
they collate a subset of the available sample files, load and process each one
correlated against the relevant binary file, and finally produce user-readable
information.
</para>
</sect2>

</sect1>

</chapter>

<chapter id="performance-counters">
<title>Performance counter management</title>

<sect1 id="performance-counters-ui">
<title>Providing a user interface</title>

<para>
The performance counter registers need programming in order to set the
type of event to count, etc. OProfile uses a standard model across all
CPUs for defining these events as follows:
</para>
<informaltable frame="all">
<tgroup cols='2'>
<tbody>
<row><entry><option>event</option></entry><entry>The event type (e.g. DATA_MEM_REFS)</entry></row>
<row><entry><option>unit mask</option></entry><entry>The sub-events to count (more detailed specification)</entry></row>
<row><entry><option>counter</option></entry><entry>The hardware counter(s) that can count this event</entry></row>
<row><entry><option>count</option></entry><entry>The reset value (how many events before an interrupt)</entry></row>
<row><entry><option>kernel</option></entry><entry>Whether the counter should increment when in kernel space</entry></row>
<row><entry><option>user</option></entry><entry>Whether the counter should increment when in user space</entry></row>
</tbody>
</tgroup>
</informaltable>
<para>
The term "unit mask" is borrowed from the Intel architectures, and can
further specify exactly when a counter is incremented (for example,
cache-related events can be restricted to particular state transitions
of the cache lines).
</para>
<para>
All of the available hardware events and their details are specified in
the textual files in the <filename>events</filename> directory. The
syntax of these files should be fairly obvious. The user specifies the
names and configuration details of the chosen counters via
<command>opcontrol</command>. These are then written to the kernel
module (in numerical form) via <filename>/dev/oprofile/N/</filename>
where N is the physical hardware counter (some events can only be used
on specific counters; OProfile hides these details from the user when
possible). On IA64, the perfmon-based interface behaves somewhat
differently, as described later.
</para>
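<para>
For example, an entry in the i386 event files takes roughly this form
(illustrative only; consult the real files for the precise fields and
values):
</para>
<screen>
# event : hardware event number        counters : counters that can count it
# um : unit mask set                   minimum : smallest sensible reset count
event:0x43 counters:0,1,2,3 um:zero minimum:500 name:DATA_CACHE_ACCESSES : data cache accesses
</screen>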

</sect1>

<sect1 id="performance-counters-programming">
<title>Programming the performance counter registers</title>

<para>
We have described how the user interface fills in the desired
configuration of the counters and transmits the information to the
kernel. It is the job of the <function>-&gt;setup()</function> method
to actually program the performance counter registers. Clearly, the
details of how this is done are architecture-specific; they are also
model-specific on many architectures. For example, i386 provides methods
for each model type that program the counter registers correctly
(see the <filename>op_model_*</filename> files in
<filename>arch/i386/oprofile</filename> for the details). The method
reads the values stored in the virtual oprofilefs files and programs
the registers appropriately, ready for starting the actual profiling
session.
</para>
<para>
The architecture-specific drivers make sure to save the old register
settings before doing OProfile setup. They are restored when OProfile
shuts down. This is useful, for example, on i386, where the NMI watchdog
uses the same performance counter registers as OProfile; they cannot
run concurrently, but OProfile makes sure to restore the setup it found
before it was running.
</para>
<para>
In addition to programming the counter registers themselves, other setup
is often necessary. For example, on i386, the local APIC needs
programming in order to make the counter's overflow interrupt appear as
an NMI (non-maskable interrupt). This allows sampling (and therefore
profiling) of regions where "normal" interrupts are masked, enabling
more reliable profiles.
</para>

<sect2 id="performance-counters-start">
<title>Starting and stopping the counters</title>
<para>
Initiating a profiling session is done by writing an ASCII '1'
to the file <filename>/dev/oprofile/enable</filename>. This sets up the
core, and calls into the architecture-specific driver to actually
enable each configured counter. Again, the details of how this is
done are model-specific (for example, the Athlon models can disable
or enable on a per-counter basis, unlike the PPro models).
</para>
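<para>
In practice, starting and stopping a session thus amounts to:
</para>
<screen>
# echo 1 &gt; /dev/oprofile/enable    (start the counters)
# echo 0 &gt; /dev/oprofile/enable    (stop them again)
</screen>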
</sect2>

<sect2>
<title>IA64 and perfmon</title>
<para>
The IA64 architecture provides a different interface from the other
architectures, using the existing perfmon driver. Register programming
is handled entirely in user-space (see
<filename>daemon/opd_perfmon.c</filename> for the details). A process
is forked for each CPU, which creates a perfmon context and sets the
counter registers appropriately via the
<function>sys_perfmonctl</function> interface. In addition, the actual
initiation and termination of the profiling session is handled via the
same interface using <constant>PFM_START</constant> and
<constant>PFM_STOP</constant>. On IA64, then, there are no oprofilefs
files for the performance counters, as the kernel driver does not
program the registers itself.
</para>
<para>
Instead, the perfmon driver for OProfile simply registers with the
perfmon core, using an OProfile-specific UUID. During a profiling
session, the perfmon core calls into the OProfile perfmon driver and
samples are registered with the OProfile core itself as usual (with
<function>oprofile_add_sample()</function>).
</para>
</sect2>

</sect1>

</chapter>

<chapter id="collecting-samples">
<title>Collecting and processing samples</title>

<sect1 id="receiving-interrupts">
<title>Receiving interrupts</title>
<para>
Naturally, how the overflow interrupts are received is specific
to the hardware architecture, unless we are in "timer" mode, where the
logging routine is called directly from the standard kernel timer
interrupt handler.
</para>
<para>
On the i386 architecture, the local APIC is programmed such that when a
counter overflows (that is, it receives an event that causes an integer
overflow of the register value to zero), an NMI is generated. This calls
into the general handler <function>do_nmi()</function>; because OProfile
has registered itself as capable of handling NMI interrupts, this will
call into the OProfile driver code in
<filename>arch/i386/oprofile</filename>. Here, the saved PC value (the
CPU saves the register set at the time of the interrupt on the stack,
where it is available for inspection) is extracted, and the counters are examined to
find out which one generated the interrupt. Also determined is whether
the system was inside kernel or user space at the time of the interrupt.
These three pieces of information are then forwarded onto the OProfile
core via <function>oprofile_add_sample()</function>. Finally, the
counter values are reset to the chosen count value, to ensure another
interrupt happens after another N events have occurred. Other
architectures behave in a similar manner.
</para>
</sect1>

<sect1 id="core-structure">
<title>Core data structures</title>
<para>
Before considering what happens when we log a sample, we shall digress
for a moment and look at the general structure of the data collection
system.
</para>
<para>
OProfile maintains a small buffer for storing the logged samples for
each CPU on the system. Only this buffer is altered when we actually log
a sample (remember, we may still be in an NMI context, so no locking is
possible). The buffer is managed by a two-handed system; the "head"
iterator dictates where the next sample data should be placed in the
buffer. Of course, overflow of the buffer is possible, in which case
the sample is discarded.
</para>
<para>
It is critical to remember that at this point, the PC value is an
absolute value, and is therefore only meaningful in the context of which
task it was logged against. Thus, these per-CPU buffers also maintain
details of which task each logged sample is for, as described in the
next section. In addition, we store whether the sample was in kernel
space or user space (on some architectures and configurations, the address
space is not sub-divided neatly at a specific PC value, so we must store
this information).
</para>
<para>
As well as these small per-CPU buffers, we have a considerably larger
single buffer. This holds the data that is eventually copied out into
the OProfile daemon. On certain system events, the per-CPU buffers are
processed and entered (in mutated form) into the main buffer, known in
the source as the "event buffer". The "tail" iterator indicates the
point from which the CPU buffer may be read, up to the position of the "head"
iterator. This provides an entirely lock-free method for extracting data
from the CPU buffers. This process is described in detail later in this chapter.
</para>
<figure><title>The OProfile buffers</title>
<graphic fileref="buffers.png" />
</figure>
</sect1>

<sect1 id="logging-sample">
<title>Logging a sample</title>
<para>
As mentioned, the sample is logged into the buffer specific to the
current CPU. The CPU buffer is a simple array of pairs of unsigned long
values; for a sample, they hold the PC value and the counter for the
sample. (The counter value is later used to translate back into the relevant
event type the counter was programmed to count.)
</para>
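<para>
A sketch of the entry format (paraphrased from
<filename>drivers/oprofile/cpu_buffer.h</filename>):
</para>
<screen>
/* one logged entry: the saved PC and the counter that overflowed */
struct op_sample {
	unsigned long eip;
	unsigned long event;
};
</screen>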
<para>
In addition to logging the sample itself, we also log task switches.
This is simply done by storing the address of the last task to log a
sample on that CPU in a data structure, and writing a task switch entry
into the buffer if the new value of <function>current()</function> has
changed. Note that later we will directly dereference this pointer;
this imposes certain restrictions on when and how the CPU buffers need
to be processed.
</para>
<para>
Finally, as mentioned, we log whether we have changed between kernel and
userspace using a similar method. Both of these variables
(<varname>last_task</varname> and <varname>last_is_kernel</varname>) are
reset when the CPU buffer is read.
</para>
</sect1>

<sect1 id="logging-stack">
<title>Logging stack traces</title>
<para>
OProfile can also provide statistical samples of call chains (on x86). To
do this, at sample time, the frame pointer chain is traversed, recording
the return address for each stack frame. This will only work if the code
was compiled with frame pointers, but we're careful to abort the
traversal if the frame pointer appears bad. We store the set of return
addresses straight into the CPU buffer. Note that, since this traversal
is keyed off the standard sample interrupt, the number of times a
function appears in a stack trace is not an indicator of how many times
the call site was executed: rather, it's related to the number of
samples we took where that call site was involved. Thus, the results for
stack traces are not necessarily proportional to the call counts:
typical programs will have many <function>main()</function> samples.
</para>
</sect1>

<sect1 id="synchronising-buffers">
<title>Synchronising the CPU buffers to the event buffer</title>
<!-- FIXME: update when percpu patch goes in -->
<para>
At some point, we have to process the data in each CPU buffer and enter
it into the main (event) buffer. The file
<filename>buffer_sync.c</filename> contains the relevant code. We
periodically (currently every <constant>HZ</constant>/4 jiffies) start
the synchronisation process. In addition, we process the buffers on
certain events, such as an application calling
<function>munmap()</function>. This is particularly important for
<function>exit()</function> - because the CPU buffers contain pointers
to the task structure, if we don't process all the buffers before the
task is actually destroyed and the task structure freed, then we could
end up trying to dereference a bogus pointer in one of the CPU buffers.
</para>
<para>
We also add a notification when a kernel module is loaded; this is so
that user-space can re-read <filename>/proc/modules</filename> to
determine the load addresses of kernel module text sections. Without
this notification, samples for a newly-loaded module could get lost or
be attributed to the wrong module.
</para>
<para>
The synchronisation itself works in the following manner: first, mutual
exclusion on the event buffer is taken. Remember, we do not need to do
that for each CPU buffer, as we only read from the tail iterator
(interrupts might still be arriving at the same buffer, but they will
write to the position of the head iterator, leaving previously written
entries intact). Then, we process each CPU buffer in turn. A CPU switch
notification is added to the buffer first (for
<option>--separate=cpu</option> support). Then the processing of the
actual data starts.
</para>
<para>
As mentioned, the CPU buffer consists of task switch entries and the
actual samples. When the routine <function>sync_buffer()</function> sees
a task switch, the process ID and process group ID are recorded into the
event buffer, along with a dcookie (see below) identifying the
application binary (e.g. <filename>/bin/bash</filename>). The
<varname>mmap_sem</varname> for the task is then taken, to allow safe
iteration across the task's list of mapped areas. Each sample is then
processed as described in the next section.
</para>
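<para>
In outline, the draining loop looks like this (a much-simplified sketch
of <function>sync_buffer()</function>, with names paraphrased and the
kernel/user transitions omitted):
</para>
<screen>
/* sketch: drain one CPU buffer into the event buffer */
for (i = 0; i &lt; available; ++i) {
	struct op_sample *s = &amp;cpu_buf-&gt;buffer[cpu_buf-&gt;tail_pos];

	if (is_code(s-&gt;eip)) {
		/* a context entry: e.g. note the new task and the
		 * dcookie of its application binary */
		struct task_struct *new = (struct task_struct *)s-&gt;event;
		cookie = get_exec_dcookie(new-&gt;mm);
		add_user_ctx_switch(new, cookie);
	} else {
		/* an ordinary sample: convert the PC into an
		 * (image, offset) pair and log it */
		add_sample(mm, s, in_kernel);
	}
	increment_tail(cpu_buf);
}
</screen>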
<para>
After a buffer has been read, the tail iterator is updated to reflect
how much of the buffer was processed. Note that when we determined how
much data there was to read in the CPU buffer, we also called
<function>cpu_buffer_reset()</function> to reset
<varname>last_task</varname> and <varname>last_is_kernel</varname>, as
we've already mentioned. During the processing, more samples may have
been arriving in the CPU buffer; this is OK because we are careful to
only update the tail iterator to how much we actually read - on the next
buffer synchronisation, we will start again from that point.
</para>
</sect1>

<sect1 id="dentry-cookies">
<title>Identifying binary images</title>
<para>
In order to produce useful profiles, we need to be able to associate a
particular PC value sample with an actual ELF binary on the disk. This
leaves us with the problem of how to export this information to
user-space. We create unique IDs that identify a particular directory
entry (dentry), and write those IDs into the event buffer. Later on,
the user-space daemon can call the <function>lookup_dcookie</function>
system call, which looks up the ID and fills in the full path of
the binary image in the buffer user-space passes in. These IDs are
maintained by the code in <filename>fs/dcookies.c</filename>; the
cache lasts for as long as the daemon has the event buffer open.
</para>
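<para>
From user-space the lookup is a plain system call; a minimal sketch
(most C libraries provide no wrapper, so the daemon's own wrapper in
<filename>daemon/opd_cookie.c</filename> is the reference):
</para>
<screen>
#include &lt;sys/syscall.h&gt;
#include &lt;unistd.h&gt;

/* sketch: resolve a dcookie to a fully-qualified path in buf; note
 * that on some 32-bit ABIs the 64-bit cookie is passed as two
 * arguments, which the daemon's wrapper takes care of */
static int resolve_cookie(unsigned long long cookie, char *buf, size_t size)
{
	return syscall(SYS_lookup_dcookie, cookie, buf, size);
}
</screen>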
</sect1>

<sect1 id="finding-dentry">
<title>Finding a sample's binary image and offset</title>
<para>
We haven't yet described how we process the absolute PC value into
something usable by the user-space daemon. When we find a sample entered
into the CPU buffer, we traverse the list of mappings for the task
(remember, we will have seen a task switch earlier, so we know which
task's lists to look at). When a mapping is found that contains the PC
value, we look up the mapped file's dentry in the dcookie cache. This
gives the dcookie ID that will uniquely identify the mapped file. Then
we alter the absolute value such that it is an offset from the start of
the file being mapped (the mapping need not start at the start of the
actual file, so we have to consider the offset value of the mapping). We
store this dcookie ID into the event buffer; this identifies which
binary the samples following it are against.
In this manner, we have converted a PC value, which has transitory
meaning only, into a static offset value for later processing by the
daemon.
</para>
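<para>
The conversion itself is small; a paraphrased sketch of the relevant
logic in <filename>buffer_sync.c</filename>:
</para>
<screen>
/* sketch: convert an absolute PC into a (dcookie, offset) pair; the
 * mapping need not begin at file offset 0, hence vm_pgoff */
for (vma = find_vma(mm, addr); vma; vma = vma-&gt;vm_next) {
	if (addr &lt; vma-&gt;vm_start || addr &gt;= vma-&gt;vm_end)
		continue;
	if (!vma-&gt;vm_file)
		continue;
	cookie = fast_get_dcookie(vma-&gt;vm_file-&gt;f_dentry,
	                          vma-&gt;vm_file-&gt;f_vfsmnt);
	offset = (vma-&gt;vm_pgoff &lt;&lt; PAGE_SHIFT) + addr - vma-&gt;vm_start;
	break;
}
</screen>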
<para>
We also attempt to avoid the relatively expensive lookup of the dentry
cookie value by storing the cookie value directly into the dentry
itself; then we can simply derive the cookie value immediately when we
find the correct mapping.
</para>
</sect1>

</chapter>

<chapter id="sample-files">
<title>Generating sample files</title>

<sect1 id="processing-buffer">
<title>Processing the buffer</title>

<para>
Now we can move onto user-space in our description of how raw interrupt
samples are processed into useful information. As we described in
previous sections, the kernel OProfile driver creates a large buffer of
sample data consisting of offset values, interspersed with
notification of changes in context. These context changes indicate how
following samples should be attributed, and include task switches, CPU
changes, and which dcookie the sample value is against. By processing
this buffer entry-by-entry, we can determine where the samples should
be attributed to. This is particularly important when using the
<option>--separate</option> option.
</para>
<para>
The file <filename>daemon/opd_trans.c</filename> contains the basic routine
for the buffer processing. The <varname>struct transient</varname>
structure is used to hold changes in context. Its members are modified
as we process each entry; it is passed into the routines in
<filename>daemon/opd_sfile.c</filename> for actually logging the sample
to a particular sample file (which will be held in
<filename>$SESSION_DIR/samples/current</filename>).
</para>
<para>
The buffer format is designed for conciseness, as high sampling rates
can easily generate a lot of data. Thus, context changes are prefixed
by an escape code, identified by <function>is_escape_code()</function>.
If an escape code is found, the next entry in the buffer identifies
what type of context change is being read. These are handed off to
various handlers (see the <varname>handlers</varname> array), which
modify the transient structure as appropriate. If it's not an escape
code, then it must be a PC offset value, and the very next entry will
be the numeric hardware counter. These values are read and recorded
in the transient structure; we then do a lookup to find the correct
sample file, and log the sample, as described in the next section.
</para>
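<para>
Schematically, the loop looks like this (a condensed paraphrase of the
processing in <filename>daemon/opd_trans.c</filename>; the real code
also copes with truncated buffers and unknown escape codes):
</para>
<screen>
/* sketch: walk the raw event buffer entry by entry, keeping the
 * current context in the struct transient */
while (trans.remaining) {
	unsigned long code = pop_buffer_value(&amp;trans);

	if (is_escape_code(code)) {
		/* the next word says which context change this is */
		handlers[pop_buffer_value(&amp;trans)](&amp;trans);
	} else {
		/* a sample: the PC offset, then the counter number */
		trans.pc = code;
		trans.event = pop_buffer_value(&amp;trans);
		sfile_log_sample(&amp;trans);
	}
}
</screen>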

<sect2 id="handling-kernel-samples">
<title>Handling kernel samples</title>

<para>
Samples from kernel code require a little special handling. Because
the binary text which the sample is against does not correspond to
any file that the kernel directly knows about, the OProfile driver
stores the absolute PC value in the buffer, instead of the file offset.
Of course, we need an offset against some particular binary. To handle
this, we keep a list of loaded modules by parsing
<filename>/proc/modules</filename> as needed. When a module is loaded,
a notification is placed in the OProfile buffer, and this triggers a
re-read. We store the module name, and the loading address and size.
This is also done for the main kernel image, as specified by the user.
The absolute PC value is matched against each address range, and
modified into an offset when the matching module is found. See
<filename>daemon/opd_kernel.c</filename> for the details.
</para>
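<para>
The matching step is straightforward; a sketch under the assumption of
a simple linked list of ranges (<filename>daemon/opd_kernel.c</filename>
keeps rather more state than this):
</para>
<screen>
#include &lt;stddef.h&gt;

/* sketch: one entry per kernel image or module text range */
struct kernel_image {
	char const *name;
	unsigned long start;	/* loading address */
	unsigned long end;	/* start + size */
	struct kernel_image *next;
};

/* match an absolute kernel PC against each range; the sample's
 * offset is then pc - img-&gt;start */
static struct kernel_image *find_kernel_image(struct kernel_image *list,
                                              unsigned long pc)
{
	struct kernel_image *img;

	for (img = list; img; img = img-&gt;next)
		if (pc &gt;= img-&gt;start &amp;&amp; pc &lt; img-&gt;end)
			return img;
	return NULL;
}
</screen>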

</sect2>

</sect1>

<sect1 id="sample-file-generation">
<title>Locating and creating sample files</title>

<para>
We have a sample value and its satellite data stored in a
<varname>struct transient</varname>, and we must locate an
actual sample file to store the sample in, using the context
information in the transient structure as a key. The transient data to
sample file lookup is handled in
<filename>daemon/opd_sfile.c</filename>. A hash is taken of the
transient values that are relevant (depending upon the setting of
<option>--separate</option>, some values might be irrelevant), and the
hash value is used to look up the list of currently open sample files.
Of course, the sample file might not be found, in which case we need
to create and open it.
</para>
<para>
OProfile uses a rather complex scheme for naming sample files, in order
to make selecting relevant sample files easier for the post-profiling
utilities. The exact details of the scheme are given in
<filename>oprofile-tests/pp_interface</filename>, but for now it will
suffice to remember that the filename will include only relevant
information for the current settings, taken from the transient data. A
fully-specified filename looks something like:
</para>
<computeroutput>
/var/lib/oprofile/samples/current/{root}/usr/bin/xmms/{dep}/{root}/lib/tls/libc-2.3.2.so/CPU_CLK_UNHALTED.100000.0.28082.28089.0
</computeroutput>
<para>
It should be clear that this identifies such information as the
application binary, the dependent (library) binary, the hardware event,
and the process and thread ID. Typically, not all this information is
needed, in which case some values may be replaced with the token
<filename>all</filename>.
</para>
<para>
The code that generates this filename and opens the file is found in
<filename>daemon/opd_mangling.c</filename>. You may have realised that
at this point, we do not have the binary image file names, only the
dcookie values. In order to determine a file name, a dcookie value is
looked up in the dcookie cache. This is to be found in
<filename>daemon/opd_cookie.c</filename>. Since dcookies are both
persistent and unique during a sampling session, we can cache the
values. If the value is not found in the cache, then we ask the kernel
to do the lookup from value to file name for us by calling
<function>lookup_dcookie()</function>. This looks up the value in a
kernel-side cache (see <filename>fs/dcookies.c</filename>) and returns
the fully-qualified file name to userspace.
</para>

</sect1>

<sect1 id="sample-file-writing">
<title>Writing data to a sample file</title>

<para>
Each specific sample file is a hashed collection, where the key is
the PC offset from the transient data, and the value is the number of
samples recorded against that offset. The files are
<function>mmap()</function>ed into the daemon's memory space. The code
that actually logs a sample against the sample file can be found in
<filename>libdb/</filename>.
</para>
<para>
For recording stack traces, we have a more complicated sample filename
mangling scheme that allows us to identify cross-binary calls. We use
the same sample file format, where the key is a 64-bit value composed
from the from,to pair of offsets.
</para>
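<para>
That is, roughly (a sketch; the daemon packs the key along these lines):
</para>
<screen>
#include &lt;stdint.h&gt;

/* sketch: pack a from,to arc into the single 64-bit sample-file key */
static uint64_t arc_key(uint64_t from, uint64_t to)
{
	return (from &lt;&lt; 32) | (to &amp; 0xffffffff);
}
</screen>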

</sect1>

</chapter>

<chapter id="output">
<title>Generating useful output</title>

<para>
All of the tools used to generate human-readable output have to take
roughly the same steps to collect the data for processing. First, the
profile specification given by the user has to be parsed. Next, a list
of sample files matching the specification has to be obtained. Using this
list, we need to locate the binary file for each sample file, and then
use them to extract meaningful data, before a final collation and
presentation to the user.
</para>

<sect1 id="profile-specification">
<title>Handling the profile specification</title>

<para>
The profile specification presented by the user is parsed in
the function <function>profile_spec::create()</function>. This
creates an object representing the specification. Then we
use <function>profile_spec::generate_file_list()</function>
to search for all sample files and match them against the
<varname>profile_spec</varname>.
</para>

<para>
To enable this matching process to work, the attributes of
each sample file are encoded in its filename. This is a low-tech
approach to matching specifications against candidate sample
files, but it works reasonably well. Typical sample files
might look like these:
</para>
<screen>
/var/lib/oprofile/samples/current/{root}/bin/ls/{dep}/{root}/bin/ls/{cg}/{root}/bin/ls/CPU_CLK_UNHALTED.100000.0.all.all.all
/var/lib/oprofile/samples/current/{root}/bin/ls/{dep}/{root}/bin/ls/CPU_CLK_UNHALTED.100000.0.all.all.all
/var/lib/oprofile/samples/current/{root}/bin/ls/{dep}/{root}/bin/ls/CPU_CLK_UNHALTED.100000.0.7423.7424.0
/var/lib/oprofile/samples/current/{kern}/r128/{dep}/{kern}/r128/CPU_CLK_UNHALTED.100000.0.all.all.all
</screen>
<para>
This looks unnecessarily complex, but it's actually fairly simple. First
we have the session of the sample, by default located at
<filename>/var/lib/oprofile/samples/current</filename>. This location
can be changed by specifying the <option>--session-dir</option> option
on the command line.
This session could equally well be inside an archive from <command>oparchive</command>.
Next we have one of the tokens <filename>{root}</filename> or
<filename>{kern}</filename>. <filename>{root}</filename> indicates
that the binary is found on a file system, and we will encode its path
in the next section (e.g. <filename>/bin/ls</filename>).
<filename>{kern}</filename> indicates a kernel module - on 2.6 kernels
the path information is not available from the kernel, so we have to
special-case kernel modules like this; we encode merely the name of the
module as loaded.
</para>
<para>
Next there is a <filename>{dep}</filename> token, indicating another
token/path which identifies the dependent binary image. This is used even for
the "primary" binary (i.e. the one that was
<function>execve()</function>d), as it simplifies processing. Finally,
if this sample file is a normal flat profile, the actual file is next in
the path. If it's a call-graph sample file, we need one further
specification, to allow us to identify cross-binary arcs in the call
graph.
</para>
<para>
The actual sample file name is dot-separated, where the fields are, in
order: event name, event count, unit mask, task group ID, task ID, and
CPU number.
</para>
<para>
This sample file name can be reliably parsed (with
<function>parse_filename()</function>) into a
<varname>filename_spec</varname>. Finally, we can check whether to
include the sample file in the final results by comparing this
<varname>filename_spec</varname> against the
<varname>profile_spec</varname> the user specified (for the interested,
see <function>valid_candidate()</function> and
<function>profile_spec::match</function>). Then comes the really
complicated bit...
</para>

</sect1>

<sect1 id="sample-file-collating">
<title>Collating the candidate sample files</title>

<para>
At this point we have a duplicate-free list of sample files we need
to process. But first we need to do some further arrangement: we
need to classify each sample file, and we may also need to "invert"
the profiles.
</para>

<sect2 id="sample-file-classifying">
<title>Classifying sample files</title>

<para>
It's possible for utilities like <command>opreport</command> to show
data in columnar format: for example, we might want to show the results
of two threads within a process side-by-side. To do this, we need
to classify each sample file into classes - the classes correspond
to each <command>opreport</command> column. The function that handles
this is <function>arrange_profiles()</function>. Each sample file
is added to a particular class. If the sample file is the first in
its class, a template is generated from the sample file. Each template
describes a particular class (thus, in our example above, each template
will have a different thread ID, and this uniquely identifies each
class).
</para>

<para>
Each class has a list of "profile sets" matching that class's template.
A profile set is either a profile of the primary binary image, or any of
its dependent images. After all sample files have been listed in one of
the profile sets belonging to the classes, we have to name each class and
perform error-checking. This is done by
<function>identify_classes()</function>; each class is checked to ensure
that its "axis" is the same as all the others. This is needed because
<command>opreport</command> can't produce results in 3D format: we can
only differ in one aspect, such as thread ID or event name.
</para>

</sect2>

<sect2 id="sample-file-inverting">
<title>Creating inverted profile lists</title>

<para>
Remember that if we're using certain profile separation options, such as
"--separate=lib", a single binary could be a dependent image to many
different binaries. For example, the C library image would be a
dependent image for most programs that have been profiled. As it
happens, this can cause severe performance problems: without some
re-arrangement, these dependent binary images would be opened each
time we need to process sample files for each program.
</para>

<para>
The solution is to "invert" the profiles via
<function>invert_profiles()</function>. We create a new data structure
where the dependent binary is first, and the primary binary images using
that dependent binary are listed as sub-images. This helps our
performance problem, as now we only need to open each dependent image
once, when we process the list of inverted profiles.
</para>

</sect2>

</sect1>

<sect1 id="generating-profile-data">
<title>Generating profile data</title>

<para>
Things don't get any simpler, unfortunately. At this point
we've collected and classified the sample files into the set of inverted
profiles, as described in the previous section. Now we need to process
each inverted profile and make something of the data. The entry point
for this is <function>populate_for_image()</function>.
</para>

<sect2 id="bfd">
<title>Processing the binary image</title>
<para>
The first thing we do with an inverted profile is attempt to open the
binary image (remember each inverted profile set is only for one binary
image, but may have many sample files to process). The
<varname>op_bfd</varname> class provides an abstracted interface to
this; internally it uses <filename>libbfd</filename>. The main purpose
of this class is to process the symbols for the binary image; this is
also where symbol filtering happens. This is actually quite tricky, but
should be clear from the source.
</para>
</sect2>

<sect2 id="processing-sample-files">
<title>Processing the sample files</title>
<para>
The class <varname>profile_container</varname> is a hold-all that
contains all the processed results. It is a container of
<varname>profile_t</varname> objects. The
<function>add_sample_files()</function> method uses
<filename>libdb</filename> to open the given sample file and add the
key/value types to the <varname>profile_t</varname>. Once this has been
done, <function>profile_container::add()</function> is passed the
<varname>profile_t</varname> plus the <varname>op_bfd</varname> for
processing.
</para>
<para>
<function>profile_container::add()</function> walks through the symbols
collected in the <varname>op_bfd</varname>.
<function>op_bfd::get_symbol_range()</function> gives us the start and
end of the symbol as an offset from the start of the binary image,
then we interrogate the <varname>profile_t</varname> for the relevant samples
for that offset range. We create a <varname>symbol_entry</varname>
object for this symbol and fill it in. If needed, here we also collect
debug information from the <varname>op_bfd</varname>, and possibly
record the detailed sample information (as used by <command>opreport
-d</command> and <command>opannotate</command>).
Finally the <varname>symbol_entry</varname> is added to
a private container of <varname>profile_container</varname> - this
<varname>symbol_container</varname> holds all such processed symbols.
</para>
</sect2>

</sect1>


<sect1 id="generating-output">
<title>Generating output</title>

<para>
After the processing described in the previous section, we've now got
full details of what we need to output stored in the
<varname>profile_container</varname> on a symbol-by-symbol basis. To
produce output, we need to replay that data and format it suitably.
</para>
<para>
<command>opreport</command> first asks the
<varname>profile_container</varname> for a
<varname>symbol_collection</varname> (this is also where thresholding
happens).
This is sorted, then an
<varname>opreport_formatter</varname> is initialised.
This object initialises a set of field formatters as requested. Then
<function>opreport_formatter::output()</function> is called. This
iterates through the (sorted) <varname>symbol_collection</varname>;
for each entry, the selected fields (as set by the
<varname>format_flags</varname> options) are output by calling the
field formatters, with the <varname>symbol_entry</varname> passed in.
</para>

</sect1>

</chapter>

<chapter id="ext">
<title>Extended Feature Interface</title>

<sect1 id="ext-intro">
<title>Introduction</title>

<para>
The Extended Feature Interface is a standard callback interface
designed to allow extension of the OProfile daemon's sample processing.
Each feature defines a set of callback handlers which can be enabled or
disabled through the OProfile daemon's command-line option.
This interface can be used to implement support for architecture-specific
features or features not commonly used by general OProfile users.
</para>

</sect1>


<sect1 id="ext-name-and-handlers">
<title>Feature Name and Handlers</title>

<para>
Each extended feature has an entry in the <varname>ext_feature_table</varname>
in <filename>opd_extended.cpp</filename>. Each entry contains a feature name
and a corresponding set of handlers. The feature name is a unique string, which is
used to identify a feature in the table. Each feature provides a set
of handlers, which will be executed by the OProfile daemon from pre-determined
locations to perform certain tasks. At runtime, the OProfile daemon calls a feature
handler wrapper from one of the pre-determined locations to check whether
an extended feature is enabled, and whether a particular handler exists.
Only the handlers of the enabled feature will be executed.
</para>

</sect1>


<sect1 id="ext-enable">
<title>Enabling Features</title>

<para>
Each feature is enabled using the OProfile daemon (oprofiled) command-line
option "--ext-feature=&lt;extended-feature-name&gt;:[args]". The
"extended-feature-name" is used to determine the feature to be enabled.
The optional "args" is passed into the feature-specific initialization handler
(<function>ext_init</function>). Currently, only one extended feature can be
enabled at a time.
</para>

</sect1>

<sect1 id="ext-types-of-handlers">
<title>Types of Handlers</title>

<para>
Each feature is responsible for providing its own set of handlers.
The types of handler are:
</para>

<sect2 id="ext_init">
<title>ext_init Handler</title>

<para>
"ext_init" handles initialization of an extended feature. It takes an
"args" parameter, which is passed in through "oprofiled --ext-feature=&lt;
extended-feature-name&gt;:[args]". This handler is executed in the function
<function>opd_options()</function> in the file <filename>daemon/oprofiled.c
</filename>.
</para>

<note>
<para>
The ext_init handler is required for all features.
</para>
</note>

</sect2>

<sect2 id="ext_print_stats">
<title>ext_print_stats Handler</title>

<para>
"ext_print_stats" handles the extended feature statistics report. It adds
a new section in the OProfile daemon statistics report, which is normally
output to the file
<filename>/var/lib/oprofile/samples/oprofiled.log</filename>.
This handler is executed in the function <function>opd_print_stats()</function>
in the file <filename>daemon/opd_stats.c</filename>.
</para>

</sect2>

<sect2 id="ext_sfile_handlers">
<title>ext_sfile Handler</title>

<para>
"ext_sfile" contains a set of handlers related to operations on the extended
sample files (sample files for events related to the extended feature).
These operations include <function>create_sfile()</function>,
<function>sfile_dup()</function>, <function>close_sfile()</function>,
<function>sync_sfile()</function>, and <function>get_file()</function>
as defined in <filename>daemon/opd_sfile.c</filename>.
An additional field, <varname>odb_t * ext_file</varname>, is added to
<varname>struct sfile</varname> for storing extended sample file
information.
</para>

</sect2>

</sect1>

<sect1 id="ext-implementation">
<title>Extended Feature Reference Implementation</title>

<sect2 id="ext-ibs">
<title>Instruction-Based Sampling (IBS)</title>

<para>
An example of extended feature implementation can be seen by
examining the AMD Instruction-Based Sampling support.
</para>

<sect3 id="ibs-init">
<title>IBS Initialization</title>

<para>
Instruction-Based Sampling (IBS) is a new performance measurement technique
available on AMD Family 10h processors. Enabling IBS profiling is done simply
by specifying IBS performance events through the "--event=" option.
</para>

<screen>
opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;

Note: * Count and unitmask for all IBS fetch events must be the same,
        as must those for all IBS op events.
</screen>

<para>
IBS performance events are listed by <command>opcontrol --list-events</command>.
When users specify these events, opcontrol verifies them using ophelp, which
checks for the <varname>ext:ibs_fetch</varname> or <varname>ext:ibs_op</varname>
tag in the <filename>events/x86-64/family10/events</filename> file.
Then, it configures the driver interface (/dev/oprofile/ibs_fetch/... and
/dev/oprofile/ibs_op/...) and starts the OProfile daemon as follows.
</para>

<screen>
oprofiled \
    --ext-feature=ibs:\
    fetch:&lt;IBS_FETCH_EVENT1&gt;,&lt;IBS_FETCH_EVENT2&gt;,...,:&lt;IBS fetch count&gt;:&lt;IBS Fetch um&gt;|\
    op:&lt;IBS_OP_EVENT1&gt;,&lt;IBS_OP_EVENT2&gt;,...,:&lt;IBS op count&gt;:&lt;IBS op um&gt;
</screen>

<para>
Here, the OProfile daemon parses the <varname>--ext-feature</varname>
option and checks the feature name ("ibs") before calling
the initialization function to handle the string
containing IBS events, counts, and unitmasks.
Then, it stores each event in the IBS virtual-counter table
(<varname>struct opd_event ibs_vc[OP_MAX_IBS_COUNTERS]</varname>) and
stores the event index in the IBS Virtual Counter Index (VCI) map
(<varname>ibs_vci_map[OP_MAX_IBS_COUNTERS]</varname>) with the IBS event value
as the map key.
</para>
</sect3>

<sect3 id="ibs-data-processing">
<title>IBS Data Processing</title>

<para>
During a profile session, the OProfile daemon identifies IBS samples in the
event buffer using the <varname>"IBS_FETCH_CODE"</varname> or
<varname>"IBS_OP_CODE"</varname>. These codes trigger the handlers
<function>code_ibs_fetch_sample()</function> or
<function>code_ibs_op_sample()</function> listed in the
<varname>handler_t handlers[]</varname> vector in
<filename>daemon/opd_trans.c</filename>. These handlers are responsible for
processing IBS samples and translating them into IBS performance events.
</para>

<para>
Unlike traditional performance events, each IBS sample can be derived into
multiple IBS performance events. For each event that the user specifies,
a combination of bits from the Model-Specific Registers (MSRs) is checked
against the bitmask defining the event. If the condition is met, the event
will then be recorded. The derivation logic is in the files
<filename>daemon/opd_ibs_macro.h</filename> and
<filename>daemon/opd_ibs_trans.[h,c]</filename>.
</para>

</sect3>

<sect3 id="ibs-sample-file">
<title>IBS Sample File</title>

<para>
Traditionally, sample file information (<varname>odb_t</varname>) is stored
in <varname>struct sfile::odb_t file[OP_MAX_COUNTER]</varname>.
Currently, <varname>OP_MAX_COUNTER</varname> is 8 on non-alpha and 20 on
alpha-based systems. The event index (the counter number on which the event
is configured) is used to access the corresponding entry in the array.
Unlike the traditional performance events, IBS does not use the actual
counter registers (i.e. <filename>/dev/oprofile/0,1,2,3</filename>).
Also, the number of performance events generated by IBS could be larger than
<varname>OP_MAX_COUNTER</varname> (currently up to 13 IBS-fetch and 46 IBS-op
events). Therefore IBS requires a special data structure and sfile
handlers (<varname>struct opd_ext_sfile_handlers</varname>) for managing
IBS sample files. IBS sample file information is stored in memory
allocated by the handler <function>ibs_sfile_create()</function>, which can
be accessed through <varname>struct sfile::odb_t * ext_files</varname>.
</para>

</sect3>

</sect2>

</sect1>

</chapter>

<glossary id="glossary">
<title>Glossary of OProfile source concepts and types</title>

<glossentry><glossterm>application image</glossterm>
<glossdef><para>
The primary binary image used by an application. This is derived
from the kernel and corresponds to the binary started upon running
an application: for example, <filename>/bin/bash</filename>.
</para></glossdef></glossentry>

<glossentry><glossterm>binary image</glossterm>
<glossdef><para>
An ELF file containing executable code: this includes kernel modules,
the kernel itself (a.k.a. <filename>vmlinux</filename>), shared libraries,
and application binaries.
</para></glossdef></glossentry>

<glossentry><glossterm>dcookie</glossterm>
<glossdef><para>
Short for "dentry cookie". A unique ID that can be looked up to provide
the full path name of a binary image.
</para></glossdef></glossentry>

<glossentry><glossterm>dependent image</glossterm>
<glossdef><para>
A binary image that is dependent upon an application, used with
per-application separation. Most commonly, shared libraries. For example,
if <filename>/bin/bash</filename> is running and we take
some samples inside the C library itself due to <command>bash</command>
calling library code, then the image <filename>/lib/libc.so</filename>
would be dependent upon <filename>/bin/bash</filename>.
</para></glossdef></glossentry>

<glossentry><glossterm>merging</glossterm>
<glossdef><para>
This refers to the ability to merge several distinct sample files
into one set of data at runtime, in the post-profiling tools. For example,
per-thread sample files can be merged into one set of data, because
they are compatible (i.e. the aggregation of the data is meaningful),
but it's not possible to merge sample files for two different events,
because there would be no useful meaning to the results.
</para></glossdef></glossentry>

<glossentry><glossterm>profile class</glossterm>
<glossdef><para>
A collection of profile data that has been collected under the same
class template. For example, if we're using <command>opreport</command>
to show results after profiling with two performance counters enabled,
counting <constant>DATA_MEM_REFS</constant> and <constant>CPU_CLK_UNHALTED</constant>,
there would be two profile classes, one for each event. Or if we're on
an SMP system and doing per-cpu profiling, and we request
<command>opreport</command> to show results for each CPU side-by-side,
there would be a profile class for each CPU.
</para></glossdef></glossentry>

<glossentry><glossterm>profile specification</glossterm>
<glossdef><para>
The parameters the user passes to the post-profiling tools that limit
what sample files are used. This specification is matched against
the available sample files to generate a selection of profile data.
</para></glossdef></glossentry>

<glossentry><glossterm>profile template</glossterm>
<glossdef><para>
The parameters that define what goes in a particular profile class.
This includes a symbolic name (e.g. "cpu:1") and the code-usable
equivalent.
</para></glossdef></glossentry>

</glossary>

</book>