<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>OProfile manual</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.69.1" />
</head>
<body>
<div class="book" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h1 class="title"><a id="oprofile-guide"></a>OProfile manual</h1>
</div>
<div>
<div class="authorgroup">
<div class="author">
<h3 class="author"><span class="firstname">John</span> <span class="surname">Levon</span></h3>
<div class="affiliation">
<div class="address">
<p>
<code class="email">&lt;<a href="mailto:levon@movementarian.org">levon@movementarian.org</a>&gt;</code>
</p>
</div>
</div>
</div>
</div>
</div>
<div>
<p class="copyright">Copyright © 2000-2004 Victoria University of Manchester, John Levon and others</p>
</div>
</div>
<hr />
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt><span class="chapter"><a href="#introduction">1. Introduction</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#applications">1. Applications of OProfile</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#requirements">2. System requirements</a></span></dt>
<dt><span class="sect1"><a href="#resources">3. Internet resources</a></span></dt>
<dt><span class="sect1"><a href="#install">4. Installation</a></span></dt>
<dt><span class="sect1"><a href="#uninstall">5. Uninstalling OProfile</a></span></dt>
</dl>
</dd>
<dt><span class="chapter"><a href="#overview">2. Overview</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#getting-started">1. Getting started</a></span></dt>
<dt><span class="sect1"><a href="#tools-overview">2. Tools summary</a></span></dt>
</dl>
</dd>
<dt><span class="chapter"><a href="#controlling">3. Controlling the profiler</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#controlling-daemon">1. Using <span><strong class="command">opcontrol</strong></span></a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opcontrolexamples">1.1. Examples</a></span></dt>
<dt><span class="sect2"><a href="#eventspec">1.2. Specifying performance counter events</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#setup-jit">2. Setting up the JIT profiling feature</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#setup-jit-jvm">2.1. JVM instrumentation</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#oprofile-gui">3. Using <span><strong class="command">oprof_start</strong></span></a></span></dt>
<dt><span class="sect1"><a href="#detailed-parameters">4. Configuration details</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#hardware-counters">4.1. Hardware performance counters</a></span></dt>
<dt><span class="sect2"><a href="#rtc">4.2. OProfile in RTC mode</a></span></dt>
<dt><span class="sect2"><a href="#timer">4.3. OProfile in timer interrupt mode</a></span></dt>
<dt><span class="sect2"><a href="#p4">4.4. Pentium 4 support</a></span></dt>
<dt><span class="sect2"><a href="#ia64">4.5. Intel Itanium 2 support</a></span></dt>
<dt><span class="sect2"><a href="#ppc64">4.6. PowerPC64 support</a></span></dt>
<dt><span class="sect2"><a href="#cell-be">4.7. Cell Broadband Engine support</a></span></dt>
<dt><span class="sect2"><a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a></span></dt>
<dt><span class="sect2"><a href="#misuse">4.9. Dangerous counter settings</a></span></dt>
</dl>
</dd>
</dl>
</dd>
<dt><span class="chapter"><a href="#results">4. Obtaining results</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#profile-spec">1. Profile specifications</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#profile-spec-examples">1.1. Examples</a></span></dt>
<dt><span class="sect2"><a href="#profile-spec-details">1.2. Profile specification parameters</a></span></dt>
<dt><span class="sect2"><a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a></span></dt>
<dt><span class="sect2"><a href="#no-results">1.4. What to do when you don't get any results</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#opreport">2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opreport-merging">2.1. Merging separate profiles</a></span></dt>
<dt><span class="sect2"><a href="#opreport-comparison">2.2. Side-by-side multiple results</a></span></dt>
<dt><span class="sect2"><a href="#opreport-callgraph">2.3. Callgraph output</a></span></dt>
<dt><span class="sect2"><a href="#opreport-diff">2.4. Differential profiles with <span><strong class="command">opreport</strong></span></a></span></dt>
<dt><span class="sect2"><a href="#opreport-anon">2.5. Anonymous executable mappings</a></span></dt>
<dt><span class="sect2"><a href="#opreport-xml">2.6. XML formatted output</a></span></dt>
<dt><span class="sect2"><a href="#opreport-options">2.7. Options for <span><strong class="command">opreport</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#opannotate">3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opannotate-finding-source">3.1. Locating source files</a></span></dt>
<dt><span class="sect2"><a href="#opannotate-details">3.2. Usage of <span><strong class="command">opannotate</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#getting-jit-reports">4. OProfile results with JIT samples</a></span></dt>
<dt><span class="sect1"><a href="#opgprof">5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opgprof-details">5.1. Usage of <span><strong class="command">opgprof</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#oparchive">6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#oparchive-details">6.1. Usage of <span><strong class="command">oparchive</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#opimport">7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opimport-details">7.1. Usage of <span><strong class="command">opimport</strong></span></a></span></dt>
</dl>
</dd>
</dl>
</dd>
<dt><span class="chapter"><a href="#interpreting">5. Interpreting profiling results</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#irq-latency">1. Profiling interrupt latency</a></span></dt>
<dt><span class="sect1"><a href="#kernel-profiling">2. Kernel profiling</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#irq-masking">2.1. Interrupt masking</a></span></dt>
<dt><span class="sect2"><a href="#idle">2.2. Idle time</a></span></dt>
<dt><span class="sect2"><a href="#kernel-modules">2.3. Profiling kernel modules</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a></span></dt>
<dt><span class="sect1"><a href="#debug-info">4. Inaccuracies in annotated source</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#effect-of-optimizations">4.1. Side effects of optimizations</a></span></dt>
<dt><span class="sect2"><a href="#prologues">4.2. Prologues and epilogues</a></span></dt>
<dt><span class="sect2"><a href="#inlined-function">4.3. Inlined functions</a></span></dt>
<dt><span class="sect2"><a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#symbol-without-debug-info">5. Assembly functions</a></span></dt>
<dt><span class="sect1"><a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a></span></dt>
<dt><span class="sect1"><a href="#hidden-cost">7. Other discrepancies</a></span></dt>
</dl>
</dd>
<dt><span class="chapter"><a href="#ack">6. Acknowledgments</a></span></dt>
</dl>
</div>
<div class="chapter" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="introduction"></a>Chapter 1. Introduction</h2>
</div>
</div>
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt><span class="sect1"><a href="#applications">1. Applications of OProfile</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#requirements">2. System requirements</a></span></dt>
<dt><span class="sect1"><a href="#resources">3. Internet resources</a></span></dt>
<dt><span class="sect1"><a href="#install">4. Installation</a></span></dt>
<dt><span class="sect1"><a href="#uninstall">5. Uninstalling OProfile</a></span></dt>
</dl>
</div>
<p>
This manual applies to OProfile version 0.9.6.
OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It is capable of profiling
all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries
and application binaries. It runs transparently in the background, collecting information with low overhead. These
features make it ideal for profiling entire systems to find bottlenecks in real-world use.
</p>
<p>
Many CPUs provide "performance counters": hardware registers that count "events" such as
cache misses or CPU cycles. OProfile builds profiles of code from these events: every time a
configurable number of events has occurred, the current program counter (PC) value is recorded as a sample.
These samples are aggregated into a profile for each binary image.</p>
<p>
Some hardware setups do not allow OProfile to use performance counters. In these cases no
events are available, and OProfile operates in timer/RTC mode, as described in later chapters.
</p>
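<p>
The sampling scheme described above can be modelled in a few lines. The following is a minimal Python sketch, not OProfile's implementation: the image address ranges in <code>IMAGE_RANGES</code>, the synthetic event stream, and the helper names <code>image_for</code> and <code>sample_events</code> are all hypothetical. Each event decrements a counter that starts at the configured count; when the counter reaches zero, the PC of that event is recorded against the image containing it and the counter is reloaded.
</p>

```python
from collections import Counter

# Hypothetical address ranges of three binary images: (name, start, end).
IMAGE_RANGES = [
    ("vmlinux", 0xC0000000, 0xC0400000),
    ("libc.so", 0x40000000, 0x40100000),
    ("myapp", 0x08048000, 0x08050000),
]


def image_for(pc):
    """Return the name of the image whose address range contains pc."""
    for name, start, end in IMAGE_RANGES:
        if pc in range(start, end):
            return name
    return "(unknown)"


def sample_events(events, count):
    """Record the PC of every count-th event and aggregate samples per image.

    events: iterable of PC values, one per occurrence of the counted event
            (for example, one per cache miss or per CPU cycle).
    count:  the configurable number of events between two samples.
    """
    if not count > 0:
        raise ValueError("count must be positive")
    remaining = count
    per_image = Counter()
    for pc in events:
        remaining -= 1
        if remaining == 0:
            # Counter expired: attribute this sample, then reload the counter.
            per_image[image_for(pc)] += 1
            remaining = count
    return per_image


if __name__ == "__main__":
    # 60% of events in myapp, 30% in libc.so, 10% in the kernel, interleaved.
    stream = ([0x08048100] * 6 + [0x40000200] * 3 + [0xC0001000]) * 1000
    print(sample_events(stream, count=7))
```

<p>
With 10,000 events and a count of 7, only about 1,400 samples are taken, yet their split across the three images stays close to the 6:3:1 ratio of the underlying events.
</p>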
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="applications"></a>1. Applications of OProfile</h2>
</div>
</div>
</div>
<p>
OProfile is useful in a number of situations. You might want to use OProfile when you:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>need low overhead</p>
</li>
<li>
<p>cannot use highly intrusive profiling methods</p>
</li>
<li>
<p>need to profile interrupt handlers</p>
</li>
<li>
<p>need to profile an application and its shared libraries</p>
</li>
<li>
<p>need to profile dynamically compiled code of supported virtual machines (see <a href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, &#8220;Support for dynamically compiled (JIT) code&#8221;</a>)</p>
</li>
<li>
<p>need to capture the performance behaviour of the entire system</p>
</li>
<li>
<p>want to examine hardware effects such as cache misses</p>
</li>
<li>
<p>want detailed source annotation</p>
</li>
<li>
<p>want instruction-level profiles</p>
</li>
<li>
<p>want call-graph profiles</p>
</li>
</ul>
</div>
<p>
OProfile is not a panacea. OProfile might not be a complete solution when you:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>require call-graph profiles on platforms other than 2.6/x86</p>
</li>
<li>
<p>don't have root permissions</p>
</li>
<li>
<p>require 100% instruction-accurate profiles</p>
</li>
<li>
<p>need function call counts or an interstitial profiling API</p>
</li>
<li>
<p>cannot tolerate any disturbance to the system whatsoever</p>
</li>
<li>
<p>need to profile interpreted or dynamically compiled code of non-supported virtual machines</p>
</li>
</ul>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="jitsupport"></a>1.1. Support for dynamically compiled (JIT) code</h3>
</div>
</div>
</div>
<p>
Older versions of OProfile could not attribute samples to symbols in dynamically
compiled code, i.e., "just-in-time (JIT) code". Typical JIT compilers load the JIT code into
anonymous memory regions. OProfile reported the samples from such code, but the attribution
it provided was simply:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">"anon: &lt;tgid&gt;&lt;address range&gt;" </pre>
</td>
</tr>
</table>
<p>
Due to this limitation, it was not possible to profile applications executed by virtual machines (VMs)
such as the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code.
A development library is provided to allow developers
to add support for any VM that produces dynamically compiled code (see the <span class="emphasis"><em>OProfile JIT agent
developer guide</em></span>).
In addition, built-in support is included for the following:</p>
<div class="itemizedlist">
<ul type="disc">
<li>JVMTI agent library for Java (1.5 and higher)</li>
<li>JVMPI agent library for Java (1.5 and lower)</li>
</ul>
</div>
<p>
For information on how to use OProfile's JIT support, see <a href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, &#8220;Setting up the JIT profiling feature&#8221;</a>.
</p>
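<p>
The gap between the old anonymous-region label and symbol-level attribution can be illustrated with a small Python sketch. It assumes a hypothetical list of <code>JitSymbol</code> records (method name, start address, code size) of the kind a JIT agent could report; it is not the OProfile JIT agent API, and the fallback label only mimics the "anon" format shown above.
</p>

```python
from bisect import bisect_right
from typing import List, NamedTuple


class JitSymbol(NamedTuple):
    """Hypothetical description of one dynamically compiled method."""
    name: str
    start: int
    size: int


def attribute(pc: int, tgid: int, region: range, symbols: List[JitSymbol]) -> str:
    """Attribute a sample at pc inside the anonymous JIT region of process tgid.

    If a known JIT symbol covers pc, its name is returned; otherwise the only
    information left is the process and the address range of the mapping.
    """
    ordered = sorted(symbols, key=lambda s: s.start)
    starts = [s.start for s in ordered]
    i = bisect_right(starts, pc) - 1  # last symbol starting at or before pc
    if i >= 0 and ordered[i].start + ordered[i].size > pc:
        return ordered[i].name
    return f"anon: {tgid} {region.start:#x}-{region.stop:#x}"


if __name__ == "__main__":
    region = range(0x7F000000, 0x7F010000)
    syms = [JitSymbol("java.lang.String.hashCode", 0x7F000100, 0x80),
            JitSymbol("Main.loop", 0x7F000400, 0x200)]
    print(attribute(0x7F000420, 4242, region, syms))  # resolved to Main.loop
    print(attribute(0x7F000420, 4242, region, []))    # no symbols: anon label
```

<p>
Without the symbol records, every sample in the region collapses into the same anon label, which is why such samples could not be tied to individual Java methods.
</p>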
</div>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="requirements"></a>2. System requirements</h2>
</div>
</div>
</div>
<div class="variablelist">
<dl>
<dt>
<span class="term">Linux kernel 2.2/2.4/2.6</span>
</dt>
<dd>
<p>
On 2.2 and 2.4 kernels, OProfile uses its own kernel module, which can be compiled for
2.2.11 or later and for 2.4. Kernel 2.4.10 or later is required if you use the
boot-time kernel option <code class="option">nosmp</code>. On 2.6 kernels, OProfile uses the in-kernel
OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels.
</p>
<p>
2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power
management is used or if the BIOS does not correctly handle local APICs.
</p>
<p>
PPC64 processors (Power4/Power5/PPC970, etc.) require a kernel newer than 2.6.5 with the line
<code class="constant">#define PV_970</code> present in <code class="filename">include/asm-ppc64/processor.h</code>.
</p>
<p>
Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires kernel
2.6.18 or later.
Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires kernel
2.6.22 or later. Additionally, full support for SPE profiling requires a BFD library
from binutils code dated January 2007 or later. To ensure that the proper BFD support exists, run
the <code class="code">configure</code> utility with <code class="code">--with-target=cell-be</code>.
</p>
<p>
Profiling the Cell Broadband Engine using SPU events requires kernel 2.6.29-rc1
or later.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the
system to crash.</div>
<p>
Instruction-Based Sampling (IBS) profiling on AMD Family 10h processors requires
kernel 2.6.28-rc2 or later.
</p>
</dd>
<dt>
<span class="term">modutils 2.4.6 or above</span>
</dt>
<dd>
<p>
You should have modutils 2.4.6 or later installed (in fact, earlier versions work well in almost all
cases).
</p>
</dd>
<dt>
<span class="term">Supported architecture</span>
</dt>
<dd>
<p>
For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is
required. In marketing terms, this means anything
from an Intel Pentium Pro (not the Pentium Classic) up to
a Pentium 4 / Xeon, including all Celerons. The AMD
Athlon, Opteron, Phenom, and Turion CPUs are also supported. Other IA32
CPU types support only the RTC mode of OProfile; see
<a href="#rtc" title="4.2. OProfile in RTC mode">Section 4.2, &#8220;OProfile in RTC mode&#8221;</a> for details. Hyper-threaded Pentium 4 CPUs
are not supported on 2.4 kernels. On 2.4 kernels, Intel
IA-64 CPUs are also supported. On 2.6 kernels, there is additionally
support for Alpha, MIPS, ARM, x86-64, sparc64, ppc64, and AVR32 processors, and,
in timer mode, PA-RISC and s390.
</p>
</dd>
<dt>
<span class="term">Uniprocessor or SMP</span>
</dt>
<dd>
<p>
SMP machines are fully supported.
</p>
</dd>
<dt>
<span class="term">Required libraries</span>
</dt>
<dd>
<p>
The following libraries are required: <code class="filename">popt</code>, <code class="filename">bfd</code>,
<code class="filename">liberty</code> (Debian users: libiberty is provided by the binutils-dev package), <code class="filename">dl</code>,
and the standard C++ libraries.
</p>
</dd>
<dt>
<span class="term">Required user account</span>
</dt>
<dd>
<p>
For secure processing of sample data from JIT virtual machines (e.g., Java),
the special user account "oprofile" must exist on the system. The 'configure'
and 'make install' operations print warning messages if this
account is not found. If you intend to profile JITed code, you must create
a group account named 'oprofile' and then create the 'oprofile' user account,
setting its default group to 'oprofile'. If this special user account cannot be found,
a runtime error message is printed to the OProfile daemon log when JIT samples are processed.
</p>
</dd>
<dt>
<span class="term">OProfile GUI</span>
</dt>
<dd>
<p>
Using the GUI to start the profiler requires the <code class="filename">Qt 2</code> library. <code class="filename">Qt 3</code> should
also work.
</p>
</dd>
<dt>
<span class="term">
<span class="acronym">ELF</span>
</span>
</dt>
<dd>
<p>
Probably not too strenuous a requirement, but older <span class="acronym">A.OUT</span> binaries/libraries are not supported.
</p>
</dd>
<dt>
<span class="term">K&amp;R coding style</span>
</dt>
<dd>
<p>
OK, so it's not really a requirement, but I wish it were...
</p>
</dd>
</dl>
</div>
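<p>
Several of the requirements above are kernel version thresholds, plus the "oprofile" user and group needed for JIT profiling. The Python sketch below, which is not part of OProfile, checks both on a Unix system. The feature labels in <code>MIN_KERNEL</code> are made up for this example; the versions are copied from the text (2.6.6 stands in for "newer than 2.6.5"), and the parser assumes release strings of the usual <code>major.minor[.patch][-rcN]...</code> form, as returned by <code>platform.release()</code>.
</p>

```python
import grp
import platform
import pwd
import re

# Minimum kernel releases from the requirements above. The keys are
# illustrative labels for this sketch, not OProfile option names.
MIN_KERNEL = {
    "2.4 module with nosmp": "2.4.10",
    "ppc64": "2.6.6",
    "cell-be ppe": "2.6.18",
    "cell-be spe": "2.6.22",
    "cell-be spu events": "2.6.29-rc1",
    "amd ibs": "2.6.28-rc2",
}

_RELEASE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?")


def parse_release(release):
    """Turn '2.6.28-rc2' or '2.6.32-5-amd64' into a comparable tuple.

    A final release sorts after all of its -rcN release candidates.
    """
    m = _RELEASE.match(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch, rc = m.groups()
    rc_key = int(rc) if rc else float("inf")
    return (int(major), int(minor), int(patch or 0), rc_key)


def kernel_supports(feature, release=None):
    """True if the given (default: running) kernel meets the feature's minimum."""
    release = release or platform.release()
    return parse_release(release) >= parse_release(MIN_KERNEL[feature])


def oprofile_account_ok(name="oprofile"):
    """True if the user and group exist and the group is the user's default."""
    try:
        user = pwd.getpwnam(name)
        group = grp.getgrnam(name)
    except KeyError:
        return False
    return user.pw_gid == group.gr_gid


if __name__ == "__main__":
    for feature in MIN_KERNEL:
        status = "ok" if kernel_supports(feature) else "kernel too old"
        print(f"{feature}: {status}")
    print("oprofile account:", "ok" if oprofile_account_ok() else "missing or misconfigured")
```

<p>
Treating a final release as newer than any of its release candidates is what makes "2.6.28-rc2 or later" accept both 2.6.28-rc2 and 2.6.28.
</p>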
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="resources"></a>3. Internet resources</h2>
</div>
</div>
</div>
<div class="variablelist">
<dl>
<dt>
<span class="term">Web page</span>
</dt>
<dd>
<p>
There is a web page (which you may be reading now) at
<a href="http://oprofile.sf.net/">http://oprofile.sf.net/</a>.
</p>
</dd>
<dt>
<span class="term">Download</span>
</dt>
<dd>
<p>
You can download a source tarball or get anonymous CVS access at the SourceForge project page,
<a href="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</a>.
</p>
</dd>
<dt>
<span class="term">Mailing list</span>
</dt>
<dd>
<p>
There is a low-traffic OProfile-specific mailing list; details are at
<a href="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</a>.
</p>
</dd>
<dt>
<span class="term">Bug tracker</span>
</dt>
<dd>
<p>
There is a bug tracker for OProfile at SourceForge,
<a href="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</a>.
</p>
</dd>
<dt>
<span class="term">IRC channel</span>
</dt>
<dd>
<p>
Several OProfile developers and users sometimes hang out on channel <span><strong class="command">#oprofile</strong></span>
on the <a href="http://oftc.net">OFTC</a> network.
</p>
</dd>
</dl>
</div>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="install"></a>4. Installation</h2>
</div>
</div>
</div>
<p>
First you need to build and install OProfile. Running <span><strong class="command">./configure</strong></span>, <span><strong class="command">make</strong></span>, and <span><strong class="command">make install</strong></span>
is often all you need, but note the following arguments to <span><strong class="command">./configure</strong></span>:
</p>
<div class="variablelist">
<dl>
<dt>
<span class="term">
<code class="option">--with-linux</code>
</span>
</dt>
<dd>
<p>
Use this option to specify the location of the kernel source tree you wish
to compile against. The kernel module is built against this source and
will only work with a running kernel built from the same source with
exactly the same configuration options, so make sure this option points at the source
of the kernel you are actually running.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--with-java</code>
</span>
</dt>
<dd>
<p>
Use this option if you need to profile Java applications (see also "Required user account" in
<a href="#requirements" title="2. System requirements">Section 2, &#8220;System requirements&#8221;</a>). This option
specifies the location of the Java Development Kit (JDK)
installation you wish to use. It is needed to obtain the interface descriptions
of JVMPI (or JVMTI) so that the JIT support code compiles successfully.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
The Java Runtime Environment (JRE) does not include the development
files that are required to compile the JIT support code, so the full
JDK must be installed in order to use this option.
</p>
</div>
<p>
By default, the OProfile JIT support libraries are installed in
<code class="filename">&lt;oprof_install_dir&gt;/lib/oprofile</code>. To build
and install OProfile and the JIT support libraries as 64-bit, you can
do something like the following:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# CFLAGS="-m64" CXXFLAGS="-m64" ./configure \
    --with-kernel-support --with-java={my_jdk_installdir} \
    --libdir=/usr/local/lib64
</pre>
</td>
</tr>
</table>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
If you encounter errors building 64-bit, install
libtool 1.5.26 or later, since that release of
libtool fixes known problems on certain platforms.
If you install libtool into a non-standard location,
you'll need to edit the invocation of 'aclocal' in
OProfile's autogen.sh as follows (assuming an install
location of /usr/local):
</p>
<p>
<code class="code">aclocal -I m4 -I /usr/local/share/aclocal</code>
</p>
</div>
</dd>
<dt>
<span class="term">
<code class="option">--with-kernel-support</code>
</span>
</dt>
<dd>
<p>
Use this option with 2.6 and later kernels to indicate that the
kernel provides the OProfile device driver.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--with-qt-dir/includes/libraries</code>
</span>
</dt>
<dd>
<p>
Specify the location of the Qt headers and libraries. If these are not specified,
configure searches in <code class="constant">$QTDIR</code>.
</p>
</dd>
<dt>
<a id="disable-werror"></a>
<span class="term">
<code class="option">--disable-werror</code>
</span>
</dt>
<dd>
<p>
Development versions of OProfile build by
default with <code class="option">-Werror</code>. This option turns
<code class="option">-Werror</code> off.
</p>
</dd>
<dt>
<a id="disable-optimization"></a>
<span class="term">
<code class="option">--disable-optimization</code>
</span>
</dt>
<dd>
<p>
Disable the <code class="option">-O2</code> compiler flag
(useful if you discover an OProfile bug and want to provide a more useful
backtrace).
</p>
</dd>
</dl>
</div>
<p>
To build the module for 2.4 kernels, you need a configured kernel source tree for the running kernel.
Because every distribution ships its own kernels, the running kernel is unlikely to match a kernel source tree
you installed separately. The safest approach is to recompile your own kernel, boot it, and then compile OProfile.
</p>
<p>
If you have a uniprocessor machine, it is also recommended that you enable local APIC / IO_APIC support in
your kernel (this is enabled automatically for SMP kernels). With many BIOSes and uniprocessor kernels 2.6.9 and later,
enabling local APIC support in the kernel is not sufficient: you must also turn it on explicitly at boot time by passing the "lapic" option to the kernel.
</p>
<p>
On machines with power management, such as laptops, power management
must be turned off when using OProfile with 2.4 kernels, because the power management software
in the BIOS cannot handle the non-maskable interrupts (NMIs) that
OProfile uses for data collection. If you use the NMI watchdog, be aware that
the watchdog is disabled when profiling starts and is not re-enabled until the
OProfile module is removed (or, on 2.6, until OProfile stops running). If you compile OProfile for
a 2.2 kernel, you must be root to compile the module. If you are using
2.6 or later kernels, you do not need the kernel source as long as the
OProfile driver is enabled; additionally, you should not need to disable
power management.
</p>
<p>
Note that you must keep (or have available) the <code class="filename">vmlinux</code> file
generated during the kernel build, because OProfile needs it (you can use
<code class="option">--no-vmlinux</code> instead, but this prevents kernel profiling).
</p>
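<p>
The choice between <code class="option">--with-kernel-support</code> and <code class="option">--with-linux</code> described above depends only on the kernel series, so it can be scripted. The sketch below is a hypothetical Python helper, <code>configure_args</code>, that assembles the environment and argument list for <span><strong class="command">./configure</strong></span> from the options documented in this section; it does not run configure, the <code>/opt/jdk</code> path is a placeholder, and it requires Python 3.8 or later for <code>shlex.join</code>.
</p>

```python
import shlex


def configure_args(kernel_release, kernel_source=None, jdk_dir=None,
                   build_64bit=False, libdir=None):
    """Return (env, argv) for running OProfile's ./configure.

    kernel_release: e.g. "2.6.18" or "2.4.20"; only major.minor is inspected.
    kernel_source:  configured source tree of the running kernel (2.2/2.4 only).
    jdk_dir:        JDK installation directory, enabling JIT support for Java.
    """
    major, minor = (int(part) for part in kernel_release.split(".")[:2])
    argv = ["./configure"]
    if (major, minor) >= (2, 6):
        # 2.6 and later kernels provide the OProfile driver themselves.
        argv.append("--with-kernel-support")
    else:
        # 2.2/2.4: the module is built against the running kernel's source.
        if kernel_source is None:
            raise ValueError("2.2/2.4 kernels need a configured kernel source tree")
        argv.append(f"--with-linux={kernel_source}")
    if jdk_dir:
        argv.append(f"--with-java={jdk_dir}")
    if libdir:
        argv.append(f"--libdir={libdir}")
    env = {"CFLAGS": "-m64", "CXXFLAGS": "-m64"} if build_64bit else {}
    return env, argv


if __name__ == "__main__":
    env, argv = configure_args("2.6.18", jdk_dir="/opt/jdk",
                               build_64bit=True, libdir="/usr/local/lib64")
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    print(prefix, shlex.join(argv))
```

<p>
For a 2.4 kernel the helper refuses to continue without a kernel source tree, mirroring the requirement that the module be built against the source of the running kernel.
</p>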
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="uninstall"></a>5. Uninstalling OProfile</h2>
</div>
</div>
</div>
<p>
You must have the source tree available to uninstall OProfile: running <span><strong class="command">make uninstall</strong></span> in it
removes all installed files except your configuration file in the directory <code class="filename">~/.oprofile</code>.
</p>
</div>
</div>
<div class="chapter" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="overview"></a>Chapter 2. Overview</h2>
</div>
</div>
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt><span class="sect1"><a href="#getting-started">1. Getting started</a></span></dt>
<dt><span class="sect1"><a href="#tools-overview">2. Tools summary</a></span></dt>
</dl>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="getting-started"></a>1. Getting started</h2>
</div>
</div>
</div>
<p>
Before you can use OProfile, you must set it up. The minimum setup required is
to tell OProfile where to find the <code class="filename">vmlinux</code> file corresponding to the
running kernel, for example:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opcontrol --vmlinux=/boot/vmlinux-`uname -r`</pre>
</td>
</tr>
</table>
<p>
If you don't want to profile the kernel itself,
you can tell OProfile that you don't have a <code class="filename">vmlinux</code> file:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opcontrol --no-vmlinux</pre>
</td>
</tr>
</table>
<p>
Now you are ready to start the daemon (<span><strong class="command">oprofiled</strong></span>), which collects
the profile data:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opcontrol --start</pre>
</td>
</tr>
</table>
<p>
To stop profiling, run:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opcontrol --shutdown</pre>
</td>
</tr>
</table>
<p>
Note that unlike <span><strong class="command">gprof</strong></span>, no instrumentation (the <code class="option">-pg</code>
and <code class="option">-a</code> options to <span><strong class="command">gcc</strong></span>)
is necessary.
</p>
<p>
The profile data is written out periodically (and also on <span><strong class="command">opcontrol --shutdown</strong></span> or <span><strong class="command">opcontrol --dump</strong></span>)
into the <code class="filename">$SESSION_DIR/samples</code> directory (<code class="filename">/var/lib/oprofile/samples</code> by default).
These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules.
You can clear the profile data at any time with <span><strong class="command">opcontrol --reset</strong></span>.
</p>
<p>
To place these sample database files in a directory other than the default location (<code class="filename">/var/lib/oprofile</code>), use the <code class="option">--session-dir=dir</code> option. You must then also specify <code class="option">--session-dir</code> in later commands so that the tools continue to use this directory. (In the future, we should allow this to be specified in an environment variable.) For example:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</pre>
</td>
</tr>
</table>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opcontrol --start --session-dir=/home/me/tmpsession</pre>
</td>
</tr>
</table>
<p>
You can get summaries of this data in a number of ways at any time. To get a summary of
data across the entire system for all of these profiles, you can run:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opreport [--session-dir=dir]</pre>
</td>
</tr>
</table>
<p>
Or, to get a more detailed summary for a particular image, you can do something like:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">opreport -l /boot/vmlinux-`uname -r`</pre>
</td>
</tr>
</table>
<p>
There are also a number of other ways of presenting the data, as described later in this manual.
Note that OProfile chooses a default profiling setup for you. However, if you need to change something,
there are a number of options you can pass to <span><strong class="command">opcontrol</strong></span>; these are
also detailed later.
</p>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="tools-overview"></a>2. Tools summary</h2>
</div>
</div>
</div>
<p>
This section gives a brief description of the available OProfile utilities and their purpose.
</p>
<div class="variablelist">
<dl>
<dt>
<span class="term">
<code class="filename">ophelp</code>
</span>
</dt>
<dd>
<p>
This utility lists the available events with short descriptions.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">opcontrol</code>
</span>
</dt>
<dd>
<p>
Used for controlling the OProfile data collection, discussed in <a href="#controlling" title="Chapter 3. Controlling the profiler">Chapter 3, <i>Controlling the profiler</i></a>.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">agent libraries</code>
</span>
</dt>
<dd>
<p>
Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <a href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, &#8220;Setting up the JIT profiling feature&#8221;</a>.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">opreport</code>
</span>
</dt>
<dd>
<p>
This is the main tool for retrieving useful profile data, described in
<a href="#opreport" title="2. Image summaries and symbol summaries (opreport)">Section 2, &#8220;Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)&#8221;</a>.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">opannotate</code>
</span>
</dt>
<dd>
<p>
This utility can be used to produce annotated source, assembly, or mixed source/assembly.
Source-level annotation is available only if the application was compiled with
debugging symbols. See <a href="#opannotate" title="3. Outputting annotated source (opannotate)">Section 3, &#8220;Outputting annotated source (<span><strong class="command">opannotate</strong></span>)&#8221;</a>.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">opgprof</code>
</span>
</dt>
<dd>
<p>
This utility can output gprof-style data files for a binary, for use with
<span><strong class="command">gprof -p</strong></span>. See <a href="#opgprof" title="5. gprof-compatible output (opgprof)">Section 5, &#8220;<span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)&#8221;</a>.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">oparchive</code>
</span>
</dt>
<dd>
<p>
This utility can be used to collect executables, debuginfo,
and sample files and copy the files into an archive.
The archive is self-contained and can be moved to another
machine for further analysis.
See <a href="#oparchive" title="6. Archiving measurements (oparchive)">Section 6, &#8220;Archiving measurements (<span><strong class="command">oparchive</strong></span>)&#8221;</a>.
</p>
</dd>
<dt>
<span class="term">
<code class="filename">opimport</code>
</span>
</dt>
<dd>
<p>
This utility converts sample database files from a foreign binary format (abi) to
the native format. This is useful only when moving sample files between hosts,
for analysis on platforms other than the one used for collection.
See <a href="#opimport" title="7. Converting sample database files (opimport)">Section 7, &#8220;Converting sample database files (<span><strong class="command">opimport</strong></span>)&#8221;</a>.
</p>
</dd>
</dl>
</div>
</div>
</div>
<div class="chapter" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="controlling"></a>Chapter 3. Controlling the profiler</h2>
</div>
</div>
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt>
<span class="sect1">
<a href="#controlling-daemon">1. Using <span><strong class="command">opcontrol</strong></span></a>
</span>
</dt>
<dd>
<dl>
<dt>
<span class="sect2">
<a href="#opcontrolexamples">1.1. Examples</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#eventspec">1.2. Specifying performance counter events</a>
</span>
</dt>
</dl>
</dd>
<dt>
<span class="sect1">
<a href="#setup-jit">2. Setting up the JIT profiling feature</a>
</span>
</dt>
<dd>
<dl>
<dt>
<span class="sect2">
<a href="#setup-jit-jvm">2.1. JVM instrumentation</a>
</span>
</dt>
</dl>
</dd>
<dt>
<span class="sect1">
<a href="#oprofile-gui">3. Using <span><strong class="command">oprof_start</strong></span></a>
</span>
</dt>
<dt>
<span class="sect1">
<a href="#detailed-parameters">4. Configuration details</a>
</span>
</dt>
<dd>
<dl>
<dt>
<span class="sect2">
<a href="#hardware-counters">4.1. Hardware performance counters</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#rtc">4.2. OProfile in RTC mode</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#timer">4.3. OProfile in timer interrupt mode</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#p4">4.4. Pentium 4 support</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#ia64">4.5. Intel Itanium 2 support</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#ppc64">4.6. PowerPC64 support</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#cell-be">4.7. Cell Broadband Engine support</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="#misuse">4.9. Dangerous counter settings</a>
</span>
</dt>
</dl>
</dd>
</dl>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="controlling-daemon"></a>1. Using <span><strong class="command">opcontrol</strong></span></h2>
</div>
</div>
</div>
<p>
In this section we describe the configuration and control of the profiling system
with opcontrol in more depth.
The <span><strong class="command">opcontrol</strong></span> script has a default setup, but you
can alter it with the options given below. In particular,
if your hardware supports performance counters, you can configure them.
The number of counters depends on the CPU (for example, the Pentium III has two: counter 0 and counter 1).
Each of these counters can be programmed with
an event to count, such as cache misses or MMX operations. The event
chosen for each counter is reflected in the profile data collected
by OProfile: the functions and binaries at the top of the profiles are
those in which most of the chosen events occurred.
</p>
<p>
Additionally, each counter has a "count" value, which determines how
detailed the profile is: a sample is taken once every "count" occurrences of the event,
so the lower the value, the more frequently samples are taken. A counter can be set to
sample only kernel code, only user-space code, or both (both is the default). Finally, some
events have a "unit mask", a value that further restricts the types of event that are counted.
The event types and unit masks for your CPU are listed by <span><strong class="command">opcontrol
--list-events</strong></span>.
</p>
<p>
The <span><strong class="command">opcontrol</strong></span> script provides the following actions:
</p>
<div class="variablelist">
<dl>
<dt>
<span class="term">
<code class="option">--init</code>
</span>
</dt>
<dd>
<p>
Loads the OProfile module if required and makes the OProfile driver
interface available.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--setup</code>
</span>
</dt>
<dd>
<p>
Followed by a list of arguments for the profiling setup. The arguments are
saved in <code class="filename">/root/.oprofile/daemonrc</code>.
Giving this option is not necessary; you can pass any
of the setup options directly, e.g. <span><strong class="command">opcontrol --no-vmlinux</strong></span>.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--status</code>
</span>
</dt>
<dd>
<p>
Show configuration information.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--start-daemon</code>
</span>
</dt>
<dd>
<p>
Start the OProfile daemon without starting actual profiling. Profiling
can then be started using <code class="option">--start</code>. This avoids
measuring the cost of daemon startup, as <code class="option">--start</code> is then just a simple
write to a file in oprofilefs. Not available with 2.2/2.4 kernels.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--start</code>
</span>
</dt>
<dd>
<p>
Start data collection, using either the arguments provided by <code class="option">--setup</code>
or the information saved in <code class="filename">/root/.oprofile/daemonrc</code>. Specifying
the additional option <code class="option">--verbose</code> makes the daemon generate lots of debug output
while it is running.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--dump</code>
</span>
</dt>
<dd>
<p>
Force a flush of the collected profiling data to the daemon.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--stop</code>
</span>
</dt>
<dd>
<p>
Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels).
</p>
</dd>
<dt>
<span class="term">
<code class="option">--shutdown</code>
</span>
</dt>
<dd>
<p>
Stop data collection and kill the daemon.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--reset</code>
</span>
</dt>
<dd>
<p>
Clear out data from the current session, but leave saved sessions intact.
</p>
</dd>
<dt>
<span class="term"><code class="option">--save=</code>session_name</span>
</dt>
<dd>
<p>
Save data from the current session to session_name.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--deinit</code>
</span>
</dt>
<dd>
<p>
Shut down the daemon, then unload the OProfile module and oprofilefs.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--list-events</code>
</span>
</dt>
<dd>
<p>
List event types and unit masks.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--help</code>
</span>
</dt>
<dd>
<p>
Generate usage messages.
</p>
</dd>
</dl>
</div>
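<p>
As an illustration, a complete profiling run using several of these actions might look like the sketch below. It assumes a 2.6 kernel (so that <code class="option">--stop</code> is available as a separate step) and no kernel profiling; <span><strong class="command">./my_workload</strong></span> and the session name <code class="filename">run1</code> are hypothetical placeholders.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Load the OProfile module and make the driver interface available
opcontrol --init
# Setup option: no vmlinux file, so the kernel itself is not profiled
opcontrol --no-vmlinux
# Check the configuration, then start collecting samples
opcontrol --status
opcontrol --start
# Hypothetical program to be profiled
./my_workload
# Stop collecting, flush the data, and save it as the session "run1"
opcontrol --stop
opcontrol --dump
opcontrol --save=run1
# Kill the daemon, then unload the OProfile module and oprofilefs
opcontrol --shutdown
opcontrol --deinit
</pre>
</td>
</tr>
</table>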
<p>
There are a number of possible settings, of which only
<code class="option">--vmlinux</code> (or <code class="option">--no-vmlinux</code>)
is required. These settings are stored in <code class="filename">~/.oprofile/daemonrc</code>.
</p>
<div class="variablelist">
<dl>
<dt>
<span class="term"><code class="option">--buffer-size=</code>num</span>
</dt>
<dd>
<p>
Number of samples in the kernel buffer. When using a 2.6 kernel, the
buffer watershed needs to be adjusted when changing this value.
</p>
</dd>
<dt>
<span class="term"><code class="option">--buffer-watershed=</code>num</span>
</dt>
<dd>
<p>
Set the kernel buffer watershed to num samples (2.6 only). Once the kernel buffer
holds buffer-size minus buffer-watershed entries (that is, when only buffer-watershed
free entries remain), its data is flushed to the daemon. The most useful values are in
the range [0.25 - 0.5] * buffer-size.
</p>
</dd>
<dt>
<span class="term"><code class="option">--cpu-buffer-size=</code>num</span>
</dt>
<dd>
<p>
Number of samples in the kernel per-CPU buffer (2.6 only). If you
profile at a high rate and the log file shows an excessive number of samples
lost to CPU buffer overflow, increasing this value can help.
</p>
</dd>
<dt>
<span class="term"><code class="option">--event=</code>[eventspec]</span>
</dt>
<dd>
<p>
Use the given performance counter event to profile.
See <a href="#eventspec" title="1.2. Specifying performance counter events">Section 1.2, &#8220;Specifying performance counter events&#8221;</a> below.
</p>
</dd>
<dt>
<span class="term"><code class="option">--session-dir=</code>dir_path</span>
</dt>
<dd>
<p>
Create and use the sample database in the directory <code class="filename">dir_path</code> instead of
the default location (/var/lib/oprofile).
</p>
</dd>
<dt>
<span class="term"><code class="option">--separate=</code>[none,lib,kernel,thread,cpu,all]</span>
</dt>
<dd>
<p>
By default, every profile is stored in a single file. Thus, for example,
samples in the C library are all credited to the <code class="filename">/lib/libc.o</code>
profile. However, you can choose to create separate sample files by specifying
one of the options below.
</p>
<div class="informaltable">
<table border="1">
<colgroup>
<col />
<col />
</colgroup>
<tbody>
<tr>
<td>
<code class="option">none</code>
</td>
<td>No profile separation (default)</td>
</tr>
<tr>
<td>
<code class="option">lib</code>
</td>
<td>Create per-application profiles for libraries</td>
</tr>
<tr>
<td>
<code class="option">kernel</code>
</td>
<td>Create per-application profiles for the kernel and kernel modules</td>
</tr>
<tr>
<td>
<code class="option">thread</code>
</td>
<td>Create profiles for each thread and each task</td>
</tr>
<tr>
<td>
<code class="option">cpu</code>
</td>
<td>Create profiles for each CPU</td>
</tr>
<tr>
<td>
<code class="option">all</code>
</td>
<td>All of the above options</td>
</tr>
</tbody>
</table>
</div>
<p>
Note that <code class="option">--separate=kernel</code> also turns on <code class="option">--separate=lib</code>.
When using <code class="option">--separate=kernel</code>, samples in hardware interrupts, soft-irqs, or other
asynchronous kernel contexts are credited to the task currently running. This means you may see
seemingly nonsensical profiles, such as <code class="filename">/bin/bash</code> showing samples for the PPP modules.
</p>
<p>
On 2.2/2.4 kernels, only kernel threads that were already running when profiling began are profiled correctly;
samples from newly started kernel threads are credited to the vmlinux (kernel) profile.
</p>
<p>
Using <code class="option">--separate=thread</code> creates a lot
of sample files if you leave OProfile running for a while; it is most
useful for short sessions, or in combination with image filtering.
</p>
</dd>
<dt>
<span class="term"><code class="option">--callgraph=</code>#depth</span>
</dt>
<dd>
<p>
Enable call-graph sample collection with a maximum depth. Use 0 to disable
call-graph profiling. NOTE: call-graph support is available on a limited
number of platforms at this time; for example:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>x86 with a recent 2.6 kernel</p>
</li>
<li>
<p>ARM with a recent 2.6 kernel</p>
</li>
<li>
<p>PowerPC with a 2.6.17 kernel</p>
</li>
</ul>
</div>
</dd>
<dt>
<span class="term"><code class="option">--image=</code>image,[images]|"all"</span>
</dt>
<dd>
<p>
Image filtering. If you specify one or more absolute
paths to binaries, OProfile will only produce profile results for those
binary images. This is useful for restricting the sometimes voluminous
output you may otherwise get, especially with
<code class="option">--separate=thread</code>. Note that if you are using
<code class="option">--separate=lib</code> or
<code class="option">--separate=kernel</code> and you specify an
application binary, the shared libraries and kernel code
<span class="emphasis"><em>are</em></span> included. Specify the value
"all" to profile everything (the default).
</p>
</dd>
<dt>
<span class="term"><code class="option">--vmlinux=</code>file</span>
</dt>
<dd>
<p>
The vmlinux kernel image file corresponding to the running kernel.
</p>
</dd>
<dt>
<span class="term">
<code class="option">--no-vmlinux</code>
</span>
</dt>
<dd>
<p>
Use this when you don't have a kernel vmlinux file, and you don't want
to profile the kernel. This still counts the total number of kernel samples,
but can't give symbol-based results for the kernel or any modules.
</p>
</dd>
</dl>
</div>
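<p>
The rule of thumb given for <code class="option">--buffer-watershed</code> is simple arithmetic. The shell function below is a hypothetical helper, not part of OProfile, that prints the lower and upper ends of the suggested range (0.25 and 0.5 times a given <code class="option">--buffer-size</code>), rounding down:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
#!/bin/sh
# Hypothetical helper, not an OProfile tool: print the suggested
# --buffer-watershed range [0.25 * size, 0.5 * size] for a buffer size.
watershed_range() {
    size=$1
    low=$((size / 4))    # 0.25 * buffer-size, rounded down
    high=$((size / 2))   # 0.5 * buffer-size, rounded down
    echo "$low $high"
}

# 65536 is only an example buffer size, not a recommended value
watershed_range 65536
</pre>
</td>
</tr>
</table>
<p>
For a 65536-sample buffer this prints 16384 32768, so a setting such as <span><strong class="command">opcontrol --buffer-size=65536 --buffer-watershed=16384</strong></span> would sit at the low end of the suggested range.
</p>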
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="opcontrolexamples"></a>1.1. Examples</h3>
</div>
</div>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="examplesperfctr"></a>1.1.1. Intel performance counter setup</h4>
</div>
</div>
</div>
<p>
Here, we have a Pentium III running at 800MHz, and we want to look at where data memory
references are happening most, and also get results for CPU time.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000
# opcontrol --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start
</pre>
</td>
</tr>
</table>
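<p>
To get a feel for what a count value means, the sketch below estimates the sampling rate of the <code class="constant">CPU_CLK_UNHALTED</code> counter in this example. It assumes that the event occurs once per unhalted clock cycle and that the CPU is busy the whole time, so on a partly idle machine the real rate will be lower. The function name is hypothetical:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
#!/bin/sh
# Rough estimate only: assumes one CPU_CLK_UNHALTED event per clock
# cycle and a CPU that never halts.
samples_per_second() {
    clock_hz=$1   # CPU clock frequency in Hz
    count=$2      # counter reset value from the event specification
    echo $((clock_hz / count))
}

# Pentium III at 800MHz with CPU_CLK_UNHALTED:400000
samples_per_second 800000000 400000
</pre>
</td>
</tr>
</table>
<p>
This prints 2000, i.e. roughly 2000 samples per second from that counter; under the same assumptions, the default count of 100000 would give about 8000.
</p>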
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="examplesrtc"></a>1.1.2. RTC mode</h4>
</div>
</div>
</div>
<p>
Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# ophelp -r
CPU with RTC device
# opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024
# opcontrol --start
</pre>
</td>
</tr>
</table>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="examplesstartdaemon"></a>1.1.3. Starting the daemon separately</h4>
</div>
</div>
</div>
<p>
If we're running a 2.6 kernel, we can use <code class="option">--start-daemon</code> to prevent
the profiler startup from affecting the results.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# opcontrol --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start-daemon
# my_favourite_benchmark --init
# opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop
</pre>
</td>
</tr>
</table>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="exampleseparate"></a>1.1.4. Separate profiles for libraries and the kernel</h4>
</div>
</div>
</div>
<p>
Here, we want to see a profile of the OProfile daemon itself, including when
it was running inside the kernel driver, and its use of shared libraries.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start
# my_favourite_stress_test --run
# opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled
</pre>
</td>
</tr>
</table>
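<p>
Per-thread profiles are most useful in combination with image filtering, as noted for <code class="option">--separate=thread</code> above. A sketch of such a setup follows; <code class="filename">/usr/local/bin/my_app</code> is a hypothetical application binary, and the vmlinux path is taken from the other examples:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# One sample file per thread, restricted to a single (hypothetical) binary
opcontrol --separate=thread --image=/usr/local/bin/my_app --vmlinux=/boot/2.6.0/vmlinux
opcontrol --start
# Run the program being profiled
/usr/local/bin/my_app
opcontrol --shutdown
# Symbol summary for the filtered image
opreport -l /usr/local/bin/my_app
</pre>
</td>
</tr>
</table>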
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="examplessessions"></a>1.1.5. Profiling sessions</h4>
</div>
</div>
</div>
<p>
It can often be useful to split up profiling data into several different
time periods. For example, you may want to collect data on an application's
startup separately from the normal runtime data. You can use the simple
command <span><strong class="command">opcontrol --save</strong></span> to do this. For example:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# opcontrol --save=blah
</pre>
</td>
</tr>
</table>
<p>
will create a sub-directory in <code class="filename">$SESSION_DIR/samples</code> containing the samples
up to that point (the current session's sample files are moved into this
directory). You can then pass this session name as a parameter to the post-profiling
analysis tools to get only the data up to the point at which you named the
session. If you do not want to keep a saved session, you can run
<span><strong class="command">rm -rf $SESSION_DIR/samples/sessionname</strong></span> or, for the
current session, <span><strong class="command">opcontrol --reset</strong></span>.
</p>
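<p>
For instance, to keep an application's startup separate from its normal running, you might save a session between the two phases. In the sketch below, my_favourite_benchmark is the hypothetical program from the earlier example, startup is an arbitrary session name, and the <code class="option">session:startup</code> argument assumes the analysis tools' profile-specification syntax for selecting a saved session:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
opcontrol --start
# Startup phase of the (hypothetical) program
my_favourite_benchmark --init
# Flush the samples so far and move them into the session "startup"
opcontrol --dump
opcontrol --save=startup
# Normal running; these samples go into the new current session
my_favourite_benchmark --run
opcontrol --dump
# Report on the saved startup session, then on the current session
opreport session:startup
opreport
</pre>
</td>
</tr>
</table>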
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="eventspec"></a>1.2. Specifying performance counter events</h3>
</div>
</div>
</div>
<p>
The <code class="option">--event</code> option to <span><strong class="command">opcontrol</strong></span>
takes a specification that indicates how each
hardware performance counter should be set up. If you want to
revert to OProfile's default setting (<code class="option">--event</code>
is strictly optional), use <code class="option">--event=default</code>. Use of this
option overrides all previous event selections.
</p>
<p>
You can pass multiple event specifications. OProfile will allocate
hardware counters as necessary. Note that some combinations are not
allowed by the CPU; running <span><strong class="command">opcontrol --list-events</strong></span> gives the details
of each event. The event specification is a colon-separated string
of the form <code class="option"><span class="emphasis"><em>name</em></span>:<span class="emphasis"><em>count</em></span>:<span class="emphasis"><em>unitmask</em></span>:<span class="emphasis"><em>kernel</em></span>:<span class="emphasis"><em>user</em></span></code> as described in this table:
</p>
<div class="informaltable">
<table border="1">
<colgroup>
<col />
<col />
</colgroup>
<tbody>
<tr>
<td>
<code class="option">name</code>
</td>
<td>The symbolic event name, e.g. <code class="constant">CPU_CLK_UNHALTED</code></td>
</tr>
<tr>
<td>
<code class="option">count</code>
</td>
<td>The counter reset value, e.g. 100000</td>
</tr>
<tr>
<td>
<code class="option">unitmask</code>
</td>
<td>The unit mask, as given in the events list, e.g. 0x0f</td>
</tr>
<tr>
<td>
<code class="option">kernel</code>
</td>
<td>Whether to profile kernel code (1 = yes, 0 = no)</td>
</tr>
<tr>
<td>
<code class="option">user</code>
</td>
<td>Whether to profile userspace code (1 = yes, 0 = no)</td>
</tr>
</tbody>
</table>
</div>
<p>
The last three values are optional; if you omit them (e.g. <code class="option">--event=DATA_MEM_REFS:30000</code>),
they are set to their default values (a unit mask of 0, and profiling both kernel and
userspace code). Note that some events require a unit mask.
</p>
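<p>
To make the field layout and the defaults concrete, the shell function below splits an event specification at the colons and fills in the documented defaults for any omitted trailing fields. It is a hypothetical illustration, not the parsing code used by <span><strong class="command">opcontrol</strong></span>, and it does not check names or values against the events your CPU supports:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
#!/bin/sh
# Hypothetical illustration, not opcontrol's own parser: split
# name:count:unitmask:kernel:user and apply the documented defaults.
parse_eventspec() {
    old_ifs=$IFS
    IFS=:
    set -f              # disable globbing while splitting
    set -- $1           # split the spec on colons into $1 ... $5
    set +f
    IFS=$old_ifs
    name=$1
    count=$2
    unitmask=${3:-0}    # default unit mask: 0
    kernel=${4:-1}      # default: profile kernel code
    user=${5:-1}        # default: profile userspace code
    echo "name=$name count=$count unitmask=$unitmask kernel=$kernel user=$user"
}

parse_eventspec DATA_MEM_REFS:30000
parse_eventspec GLOBAL_POWER_EVENTS:100000:1:1:1
</pre>
</td>
</tr>
</table>
<p>
The first call prints name=DATA_MEM_REFS count=30000 unitmask=0 kernel=1 user=1, matching the defaults described above; the second shows a fully specified event taken from the table of default events.
</p>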
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
For the PowerPC platforms, all events specified must be in the same group; i.e., the group number
appended to the event name (e.g. <code class="constant">&lt;<span class="emphasis"><em>some-event-name</em></span>&gt;_GRP9</code>) must be the same.
</p>
</div>
<p>
If OProfile is using RTC mode and you want to alter the default counter value,
you can use something like <code class="option">--event=RTC_INTERRUPTS:2048</code>. Note that the last
three values are ignored in this case.
If OProfile is using timer-interrupt mode, no event configuration is possible.
</p>
<p>
The table below lists the events selected by default
(<code class="option">--event=default</code>) for the various computer architectures:
</p>
<div class="informaltable">
<table border="1">
<colgroup>
<col />
<col />
<col />
</colgroup>
<tbody>
<tr>
<td>Processor</td>
<td>cpu_type</td>
<td>Default event</td>
</tr>
<tr>
<td>Alpha EV4</td>
<td>alpha/ev4</td>
<td>CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>Alpha EV5</td>
<td>alpha/ev5</td>
<td>CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>Alpha PCA56</td>
<td>alpha/pca56</td>
<td>CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>Alpha EV6</td>
<td>alpha/ev6</td>
<td>CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>Alpha EV67</td>
<td>alpha/ev67</td>
<td>CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>ARM/XScale PMU1</td>
<td>arm/xscale1</td>
<td>CPU_CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>ARM/XScale PMU2</td>
<td>arm/xscale2</td>
<td>CPU_CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>ARM/MPCore</td>
<td>arm/mpcore</td>
<td>CPU_CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>AVR32</td>
<td>avr32</td>
<td>CPU_CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>Athlon</td>
<td>i386/athlon</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Pentium Pro</td>
<td>i386/ppro</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Pentium II</td>
<td>i386/pii</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Pentium III</td>
<td>i386/piii</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Pentium M (P6 core)</td>
<td>i386/p6_mobile</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Pentium 4 (non-HT)</td>
<td>i386/p4</td>
<td>GLOBAL_POWER_EVENTS:100000:1:1:1</td>
</tr>
<tr>
<td>Pentium 4 (HT)</td>
<td>i386/p4-ht</td>
<td>GLOBAL_POWER_EVENTS:100000:1:1:1</td>
</tr>
<tr>
<td>Hammer</td>
<td>x86-64/hammer</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Family10h</td>
<td>x86-64/family10</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Family11h</td>
<td>x86-64/family11h</td>
<td>CPU_CLK_UNHALTED:100000:0:1:1</td>
</tr>
<tr>
<td>Itanium</td>
<td>ia64/itanium</td>
<td>CPU_CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>Itanium 2</td>
<td>ia64/itanium2</td>
<td>CPU_CYCLES:100000:0:1:1</td>
</tr>
<tr>
<td>TIMER_INT</td>
<td>timer</td>
<td>None selectable</td>
</tr>
<tr>
<td>IBM iseries</td>
<td>PowerPC 4/5/970</td>
<td>CYCLES:10000:0:1:1</td>
</tr>
<tr>
<td>IBM pseries</td>
<td>PowerPC 4/5/970/Cell</td>
<td>CYCLES:10000:0:1:1</td>
</tr>
<tr>
<td>IBM s390</td>
<td>timer</td>
<td>None selectable</td>
</tr>
<tr>
<td>IBM s390x</td>
<td>timer</td>
<td>None selectable</td>
</tr>
</tbody>
</table>
</div>
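<p>
To check which row of this table applies to your machine, and to return to its default event after experimenting, a sequence such as the following sketch may help. <span><strong class="command">ophelp -r</strong></span> prints the CPU type that OProfile detected (see the RTC mode example for sample output); that output is a descriptive name and may not match the Processor column word for word:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Show the CPU type detected by OProfile
ophelp -r
# Discard earlier --event selections and use this CPU's default event
opcontrol --event=default
# Confirm the resulting configuration
opcontrol --status
</pre>
</td>
</tr>
</table>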
</div>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="setup-jit"></a>2. Setting up the JIT profiling feature</h2>
</div>
</div>
</div>
<p>
To gather information about JITed code from a virtual machine,
the virtual machine needs to be instrumented with an agent library. We use the
agent libraries for Java in the following example. To use the
Java profiling feature, you must build OProfile with the "--with-java" option
(<a href="#install" title="4. Installation">Section 4, &#8220;Installation&#8221;</a>).
</p>
2097 <div class="sect2" lang="en" xml:lang="en">
2098 <div class="titlepage">
2099 <div>
2100 <div>
2101 <h3 class="title"><a id="setup-jit-jvm"></a>2.1. JVM instrumentation</h3>
2102 </div>
2103 </div>
2104 </div>
2105 <p>
For JVMTI, add one of the following to the JVM startup parameters:
2108 </p>
2109 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2110 <tr>
2111 <td>
2112 <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentpath:&lt;libdir&gt;/libjvmti_oprofile.so[=&lt;options&gt;]</code> </pre>
2113 </td>
2114 </tr>
2115 </table>
2116 <p>
2117 or
2118 </p>
2119 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2120 <tr>
2121 <td>
2122 <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentlib:jvmti_oprofile[=&lt;options&gt;]</code> </pre>
2123 </td>
2124 </tr>
2125 </table>
2128 <p>
For JVMPI, the agent is enabled with the command-line option:
2130 </p>
2131 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2132 <tr>
2133 <td>
2134 <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-Xrunjvmpi_oprofile[:&lt;options&gt;]</code> </pre>
2135 </td>
2136 </tr>
2137 </table>
2140 <p>
Currently, only one option is available: <code class="option">debug</code>. For JVMPI,
options are written as <code class="option">option_name=[yes|no]</code>.
For JVMTI, giving just the option name implies "yes", and omitting it implies "no".
2145 </p>
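<p>
The two agent interfaces use different option conventions, which makes them easy to mix up. The
hypothetical shell function <code class="function">jvm_agent_option</code> below is a minimal sketch and is not part of
OProfile. It prints the JVM option that loads the agent by library name, with or without
<code class="option">debug</code>. It assumes that the library directory is already in the library search path,
as described below.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# jvm_agent_option INTERFACE DEBUG
#   INTERFACE: jvmti or jvmpi
#   DEBUG:     yes or no
# Hypothetical helper for illustration; it is not shipped with OProfile.
jvm_agent_option() {
    case "$2" in
    yes|no) ;;
    *) return 1 ;;               # DEBUG must be yes or no
    esac
    case "$1" in
    jvmti)
        # JVMTI: naming the option means "yes", omitting it means "no".
        if [ "$2" = yes ]; then
            printf '%s\n' "-agentlib:jvmti_oprofile=debug"
        else
            printf '%s\n' "-agentlib:jvmti_oprofile"
        fi
        ;;
    jvmpi)
        # JVMPI: options are written as option_name=yes or option_name=no.
        printf '%s\n' "-Xrunjvmpi_oprofile:debug=$2"
        ;;
    *)
        return 1                 # unknown agent interface
        ;;
    esac
}

jvm_agent_option jvmti yes       # prints -agentlib:jvmti_oprofile=debug
jvm_agent_option jvmpi no        # prints -Xrunjvmpi_oprofile:debug=no
</pre>
</td>
</tr>
</table>
<p>
A launch would then look like <span><strong class="command">java "$(jvm_agent_option jvmti yes)" -jar app.jar</strong></span>, where
<code class="filename">app.jar</code> is a placeholder for your application.
</p>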
2146 <p>
The agent library (installed in <code class="filename">&lt;oprof_install_dir&gt;/lib/oprofile</code>)
needs to be in the library search path (e.g. add the library directory
to <code class="constant">LD_LIBRARY_PATH</code>). If the JVM command line is not directly
accessible, look for it within shell scripts or a launcher program. It may also be possible
to add the instrumentation through an environment variable; for Sun JVMs this is
<code class="constant">JAVA_TOOL_OPTIONS</code>. Please check your JVM documentation for
further information on the agent startup options.
2156 </p>
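<p>
When the JVM command line cannot be edited, the environment can carry the same settings. The
following is a minimal sketch. It assumes that OProfile was installed under the hypothetical prefix
<code class="filename">/usr/local</code>, and that the JVM honours <code class="constant">JAVA_TOOL_OPTIONS</code> (as Sun JVMs do).
Adjust both for your system.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# Hypothetical install prefix; substitute your own oprof_install_dir.
OPROF_INSTALL_DIR=/usr/local
OPROF_AGENT_DIR="$OPROF_INSTALL_DIR/lib/oprofile"

# Put the agent directory first in the loader's search path,
# keeping any existing entries.
LD_LIBRARY_PATH="$OPROF_AGENT_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Ask the JVM to load the JVMTI agent, keeping any existing tool options.
JAVA_TOOL_OPTIONS="-agentlib:jvmti_oprofile${JAVA_TOOL_OPTIONS:+ $JAVA_TOOL_OPTIONS}"

export LD_LIBRARY_PATH JAVA_TOOL_OPTIONS
</pre>
</td>
</tr>
</table>
<p>
A JVM started from this shell, or from a launcher script run by it, then sees both variables.
</p>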
2157 </div>
2158 </div>
2159 <div class="sect1" lang="en" xml:lang="en">
2160 <div class="titlepage">
2161 <div>
2162 <div>
2163 <h2 class="title" style="clear: both"><a id="oprofile-gui"></a>3. Using <span><strong class="command">oprof_start</strong></span></h2>
2164 </div>
2165 </div>
2166 </div>
2167 <p>
2168The <span><strong class="command">oprof_start</strong></span> application provides a convenient way to start the profiler.
2169Note that <span><strong class="command">oprof_start</strong></span> is just a wrapper around the <span><strong class="command">opcontrol</strong></span> script,
2170so it does not provide more services than the script itself.
2171</p>
2172 <p>
After <span><strong class="command">oprof_start</strong></span> is started, you can select the event type for each counter;
the sampling rate and other related parameters are explained in <a href="#controlling-daemon" title="1. Using opcontrol">Section 1, &#8220;Using <span><strong class="command">opcontrol</strong></span>&#8221;</a>.
The "Configuration" section allows you to set general parameters such as the buffer size, the kernel filename,
etc. The counter setup interface should be self-explanatory; <a href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, &#8220;Hardware performance counters&#8221;</a> and related
links contain information on using unit masks.
2178</p>
2179 <p>
A status line shows the current status of the profiler: how long it has been running and, summed over all
processors, the total number of interrupts received and the average number received per second.
Note that quitting <span><strong class="command">oprof_start</strong></span> does not stop the profiler.
2183</p>
2184 <p>
2185Your configuration is saved in the same file as <span><strong class="command">opcontrol</strong></span> uses; that is,
2186<code class="filename">~/.oprofile/daemonrc</code>.
2187</p>
2188 </div>
2189 <div class="sect1" lang="en" xml:lang="en">
2190 <div class="titlepage">
2191 <div>
2192 <div>
2193 <h2 class="title" style="clear: both"><a id="detailed-parameters"></a>4. Configuration details</h2>
2194 </div>
2195 </div>
2196 </div>
2197 <div class="sect2" lang="en" xml:lang="en">
2198 <div class="titlepage">
2199 <div>
2200 <div>
2201 <h3 class="title"><a id="hardware-counters"></a>4.1. Hardware performance counters</h3>
2202 </div>
2203 </div>
2204 </div>
2205 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
2206 <h3 class="title">Note</h3>
2207 <p>
Your CPU type may not include the requisite support for hardware performance counters, in which case
you must use OProfile in RTC mode on 2.4 kernels (see <a href="#rtc" title="4.2. OProfile in RTC mode">Section 4.2, &#8220;OProfile in RTC mode&#8221;</a>) or in timer mode on 2.6 kernels (see <a href="#timer" title="4.3. OProfile in timer interrupt mode">Section 4.3, &#8220;OProfile in timer interrupt mode&#8221;</a>).
You only need to read this section if you want to use
events other than the default event chosen by OProfile.
2212</p>
2213 </div>
2214 <p>
2215The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available
2216from <a href="http://developer.intel.com/">http://developer.intel.com/</a>.
2217The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <a href="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf">
2218http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</a>.
2219For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation
2220is available at <a href="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/">
2221http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</a>. (For example, the
2222specific publication containing information on the performance monitor unit for the PowerPC970 is
2223"IBM PowerPC 970FX RISC Microprocessor User's Manual.")
These processors are capable of delivering an interrupt when a counter overflows,
and OProfile is built on this mechanism. On x86 processors the delivery mode is <span class="acronym">NMI</span>,
so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called,
the current <span class="acronym">PC</span> value and the current task are recorded into the profiling structure.
This allows the overflow event to be attached to a specific assembly instruction in a binary image.
The daemon receives this data from the kernel and writes it to the sample files.
2230</p>
2231 <p>
If we use an event such as <code class="constant">CPU_CLK_UNHALTED</code> or <code class="constant">INST_RETIRED</code>
(<code class="constant">GLOBAL_POWER_EVENTS</code> or <code class="constant">INSTR_RETIRED</code>, respectively, on the Pentium 4), we can
use the overflow counts as an estimate of the actual time spent in each part of the code. Alternatively, we can use the other
available counters to profile other interesting data, such as the cache behaviour of routines.
2236</p>
2237 <p>
However, there are several caveats. First, there are the issues listed in the Intel manual. There is also a delay
between the counter overflow and the interrupt delivery that can skew results on a small scale; this means
you cannot rely on instruction-level profiles being perfectly accurate.
If you are using an "event-mode" counter such as the cache counters, a count registered against an instruction does not mean
that the instruction is responsible for that event. It only implies that the counter overflowed in the dynamic
vicinity of that instruction, to within a few instructions. Further details on this problem can be found in
<a href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a> and also in the Digital paper "ProfileMe: A Hardware Performance Counter".
2245</p>
2246 <p>
Each counter has several configuration parameters.
First, there is the unit mask, which further specifies what to count.
Second, there is the counter value, discussed below. Third, there are parameters that select whether to count
events while in kernel space, user space, or both. You can configure these separately for each counter.
2251</p>
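<p>
On the <span><strong class="command">opcontrol</strong></span> command line, these parameters are packed into one colon-separated
event specification of the form <code class="option">name:count:unitmask:kernel:user</code>, as in the default events
listed earlier (for example, <code class="constant">CPU_CLK_UNHALTED:100000:0:1:1</code>). The hypothetical helper below
only assembles such a string. It does not check the event name or values against your CPU; see
<span><strong class="command">opcontrol --list-events</strong></span> for the events your CPU supports.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# event_spec NAME COUNT UNITMASK KERNEL USER
#   KERNEL, USER: 1 to count events in that mode, 0 to ignore them.
# Hypothetical helper: prints NAME:COUNT:UNITMASK:KERNEL:USER.
event_spec() {
    if [ "$#" -ne 5 ]; then
        return 1                 # exactly five fields are required
    fi
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Count unhalted cycles in user space only (kernel=0, user=1).
event_spec CPU_CLK_UNHALTED 100000 0 0 1   # prints CPU_CLK_UNHALTED:100000:0:0:1
</pre>
</td>
</tr>
</table>
<p>
The result can be passed directly to <span><strong class="command">opcontrol --event="$(event_spec CPU_CLK_UNHALTED 100000 0 0 1)"</strong></span>.
</p>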
2252 <p>
The counter value sets the sampling period: after each overflow event, the counter is re-initialized
such that another overflow will occur after this many events have been counted. Thus, higher
values mean less-detailed profiling, and lower values mean more detail but higher overhead.
Picking a good value for this
parameter is, unfortunately, somewhat of a black art, and it of course depends on the event
you have chosen.
Specifying too large a value means that not enough interrupts are generated
to give a realistic profile (though this problem can be ameliorated by profiling for <span class="emphasis"><em>longer</em></span>).
Specifying too small a value can lead to higher performance overhead.
2262</p>
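<p>
Because the counter is reloaded after every overflow, the interrupt rate is roughly the event rate divided
by the count. Here is a back-of-the-envelope sketch for a cycle event such as <code class="constant">CPU_CLK_UNHALTED</code>.
It assumes a hypothetical 2 GHz CPU that is busy all the time. Unhalted cycles stop accumulating while the
CPU is halted, so an idle machine produces fewer samples.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# samples_per_second EVENTS_PER_SECOND COUNT
# Approximate counter-overflow interrupts per second on one CPU
# (integer division; hypothetical helper for illustration).
samples_per_second() {
    echo $(( $1 / $2 ))
}

cpu_hz=2000000000                        # hypothetical 2 GHz CPU, never idle
samples_per_second "$cpu_hz" 100000      # prints 20000
samples_per_second "$cpu_hz" 10000       # prints 200000: ten times as many interrupts
</pre>
</td>
</tr>
</table>
<p>
The same arithmetic shows why very small counts are risky; see <a href="#misuse" title="4.9. Dangerous counter settings">Section 4.9, &#8220;Dangerous counter settings&#8221;</a>.
</p>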
2263 </div>
2264 <div class="sect2" lang="en" xml:lang="en">
2265 <div class="titlepage">
2266 <div>
2267 <div>
2268 <h3 class="title"><a id="rtc"></a>4.2. OProfile in RTC mode</h3>
2269 </div>
2270 </div>
2271 </div>
2272 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
2273 <h3 class="title">Note</h3>
2274 <p>
2275This section applies to 2.2/2.4 kernels only.
2276</p>
2277 </div>
2278 <p>
Some CPU types either do not provide the hardware support needed to use the hardware performance counters,
or are not yet supported by OProfile. This includes some laptops, classic Pentiums, and CPUs such as Cyrix.
On these machines, OProfile falls
back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <span><strong class="command">rtc</strong></span>
module, so you cannot have both the OProfile and rtc modules loaded, nor can you have rtc support compiled into the kernel.
2284</p>
2285 <p>
2286RTC mode is less capable than the hardware counters mode; in particular, it is unable to profile sections of
2287the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value
corresponds to the number of interrupts generated per second (that is, a higher number means finer profiling
resolution but higher overhead). The current implementation of the real-time clock supports only power-of-two
2290sampling rates from 2 to 4096 per second. Other values within this range are rounded to the nearest power of
2291two.
2292</p>
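<p>
The rounding rule can be illustrated with a small sketch. The text above does not say how ties are resolved
(for example 3, or 384, which lies exactly halfway between 256 and 512). It also does not say what happens to
values outside the range. The hypothetical function below therefore simply rounds ties up and rejects
out-of-range requests; the real RTC driver may behave differently.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# rtc_rate REQUESTED
# Hypothetical helper: rounds an integer REQUESTED in 2..4096 to the
# nearest power of two. Ties round up; out-of-range values are rejected.
# (lower and upper are global: POSIX sh has no local variables.)
rtc_rate() {
    if [ "$1" -lt 2 ] || [ "$1" -gt 4096 ]; then
        return 1
    fi
    lower=2
    while [ $(( lower * 2 )) -le "$1" ]; do
        lower=$(( lower * 2 ))
    done
    # lower is now the largest power of two that does not exceed the request.
    upper=$(( lower * 2 ))
    if [ $(( $1 - lower )) -lt $(( upper - $1 )) ]; then
        echo "$lower"
    else
        echo "$upper"
    fi
}

rtc_rate 300     # prints 256
rtc_rate 1000    # prints 1024
</pre>
</td>
</tr>
</table>
<p>
The result could then be used as, for example, <span><strong class="command">opcontrol --event=RTC_INTERRUPTS:$(rtc_rate 1000)</strong></span>.
</p>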
2293 <p>
2294You can force use of the RTC interrupt with the <code class="option">force_rtc=1</code> module parameter.
2295</p>
2296 <p>
Setting the value from the GUI should be straightforward. On the command line, you need to specify the
event to <span><strong class="command">opcontrol</strong></span>, for example:
2299</p>
2300 <p>
2301 <span>
2302 <strong class="command">opcontrol --event=RTC_INTERRUPTS:256</strong>
2303 </span>
2304 </p>
2305 </div>
2306 <div class="sect2" lang="en" xml:lang="en">
2307 <div class="titlepage">
2308 <div>
2309 <div>
2310 <h3 class="title"><a id="timer"></a>4.3. OProfile in timer interrupt mode</h3>
2311 </div>
2312 </div>
2313 </div>
2314 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
2315 <h3 class="title">Note</h3>
2316 <p>
2317This section applies to 2.6 kernels and above only.
2318</p>
2319 </div>
2320 <p>
In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver
falls back to using the timer interrupt for profiling. Like RTC mode in 2.4 kernels, timer mode cannot
profile code that runs with interrupts disabled. Unlike the RTC and hardware performance counter setups,
timer mode has no configuration parameters.
2325</p>
2326 <p>
2327You can force use of the timer interrupt by using the <code class="option">timer=1</code> module
2328parameter (or <code class="option">oprofile.timer=1</code> on the boot command line if OProfile is
2329built-in).
2330</p>
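<p>
To see which mode the driver actually selected, read the CPU type that it exports through the oprofilefs
pseudo filesystem, which <span><strong class="command">opcontrol</strong></span> normally mounts at <code class="filename">/dev/oprofile</code>.
In timer mode the CPU type is <code class="literal">timer</code>, as in the table of default events above. Below is a minimal
sketch with a hypothetical function name. It assumes that the module has been loaded (for example with
<span><strong class="command">modprobe oprofile timer=1</strong></span>) and the filesystem mounted (for example with
<span><strong class="command">opcontrol --init</strong></span>).
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# oprofile_mode [CPU_TYPE_FILE]
# Hypothetical helper: reports whether the driver is in timer interrupt
# mode. CPU_TYPE_FILE defaults to the oprofilefs cpu_type file.
oprofile_mode() {
    cpu_type_file=${1:-/dev/oprofile/cpu_type}
    if [ ! -r "$cpu_type_file" ]; then
        return 1                 # module not loaded or oprofilefs not mounted
    fi
    cpu_type=$(cat "$cpu_type_file")
    if [ "$cpu_type" = timer ]; then
        echo "timer interrupt mode"
    else
        echo "hardware counters ($cpu_type)"
    fi
}
</pre>
</td>
</tr>
</table>
<p>
After forcing timer mode, running <span><strong class="command">oprofile_mode</strong></span> should print <code class="literal">timer interrupt mode</code>.
</p>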
2331 </div>
2332 <div class="sect2" lang="en" xml:lang="en">
2333 <div class="titlepage">
2334 <div>
2335 <div>
2336 <h3 class="title"><a id="p4"></a>4.4. Pentium 4 support</h3>
2337 </div>
2338 </div>
2339 </div>
2340 <p>
2341The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event
2342selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a
2343particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their
operation. Unfortunately, the relationship between these registers is quite complex; not every combination of them can be
used at the same time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of
one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar
to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU.
2348</p>
2349 <p>
2350There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store
2351(DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described
above. On 2.4 kernels, the performance monitoring hardware of Pentium 4 / Xeon processors with Hyper-Threading enabled
(multiple logical processors on a single die) is not supported, although you can still use OProfile if you disable
Hyper-Threading.
2355</p>
2356 </div>
2357 <div class="sect2" lang="en" xml:lang="en">
2358 <div class="titlepage">
2359 <div>
2360 <div>
2361 <h3 class="title"><a id="ia64"></a>4.5. Intel Itanium 2 support</h3>
2362 </div>
2363 </div>
2364 </div>
2365 <p>
2366The Itanium 2 performance monitoring unit (PMU) organizes the counters as four
2367pairs of performance event monitoring registers. Each pair is composed of a
2368Performance Monitoring Configuration (PMC) register and Performance Monitoring
2369Data (PMD) register. The PMC selects the performance event being monitored and
the PMD determines the sampling interval. The PMU
triggers sampling with maskable interrupts, so samples will not occur
in sections of the IA64 kernel where interrupts are disabled.
2373</p>
2374 <p>
None of the advanced features of the Itanium 2 performance monitoring unit
2376such as opcode matching, address range matching, or precise event sampling are
2377supported by this version of OProfile. The Itanium 2 support only maps OProfile's
2378existing interrupt-based model to the PMU hardware.
2379</p>
2380 </div>
2381 <div class="sect2" lang="en" xml:lang="en">
2382 <div class="titlepage">
2383 <div>
2384 <div>
2385 <h3 class="title"><a id="ppc64"></a>4.6. PowerPC64 support</h3>
2386 </div>
2387 </div>
2388 </div>
2389 <p>
2390The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors
2391consists of between 4 and 8 counters (depending on the model), plus three
2392special purpose registers used for programming the counters -- MMCR0, MMCR1,
2393and MMCRA. Advanced features such as instruction matching and thresholding are
2394not supported by this version of OProfile.
2395</p>
2396 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Later versions of the IBM POWER5+ processor (beginning with revision 3.0)
2397run the performance monitor unit in POWER6 mode, effectively removing OProfile's
2398access to counters 5 and 6. These two counters are dedicated to counting
2399instructions completed and cycles, respectively. In POWER6 mode, however, the
2400counters do not generate an interrupt on overflow and so are unusable by
2401OProfile. Kernel versions 2.6.23 and higher will recognize this mode
2402and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem.
2403OProfile userspace responds to this cpu_type by removing these counters from
2404the list of potential events to count. Without this kernel support, attempts
2405to profile using an event from one of these counters will yield incorrect
2406results -- typically, zero (or near zero) samples in the generated report.
2407</div>
2410 </div>
2411 <div class="sect2" lang="en" xml:lang="en">
2412 <div class="titlepage">
2413 <div>
2414 <div>
2415 <h3 class="title"><a id="cell-be"></a>4.7. Cell Broadband Engine support</h3>
2416 </div>
2417 </div>
2418 </div>
2419 <p>
2420The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing
2421Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each
2422consist of a processing unit (PPU and SPU, respectively) and other hardware
2423components, such as memory controllers.
2424</p>
2425 <p>
2426A PPU has two hardware threads (aka "virtual CPUs"). The performance monitor
2427unit of the CBE collects event information on one hardware thread at a time.
2428Therefore, when profiling PPE events,
OProfile collects the profile based on the selected events by time slicing the
performance counter hardware between the two threads. The user must ensure that the
collection interval is long enough for the time spent collecting data for
each thread to be sufficient to obtain a good profile.
2433</p>
2434 <p>
To profile an SPU application, the user should specify the <code class="constant">SPU_CYCLES</code> event.
When starting OProfile with <code class="constant">SPU_CYCLES</code>, the <span><strong class="command">opcontrol</strong></span> script enforces certain
separation parameters (<code class="option">separate=cpu,lib</code>) to ensure that sufficient information
is collected in the sample data to generate a complete report. The
<code class="option">--merge=cpu</code> option of <span><strong class="command">opreport</strong></span> can be used to obtain a more readable report if analyzing
the performance of each separate SPU is not necessary.
2441</p>
2442 <p>
2443Profiling with an SPU event (events 4100 through 4163) is not compatible with any other
event. Furthermore, only one SPU event can be specified at a time. The hardware only
2445supports profiling on one SPU per node at a time. The OProfile kernel code time slices
2446between the eight SPUs to collect data on all SPUs.
2447</p>
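<p>
These restrictions can be checked before starting the profiler. The sketch below is hypothetical; it is not how
<span><strong class="command">opcontrol</strong></span> validates events. It works on event numbers rather than names, and it
rejects any list in which an SPU event (4100 to 4163) appears alongside another event. All other checks are left to
<span><strong class="command">opcontrol</strong></span>.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# is_spu_event NUMBER: succeeds for SPU event numbers 4100..4163.
is_spu_event() {
    [ "$1" -ge 4100 ] || return 1
    [ "$1" -le 4163 ]
}

# spu_events_ok NUMBER...
# Hypothetical check: an SPU event must be the only event requested, so it
# can be combined neither with other events nor with a second SPU event.
spu_events_ok() {
    for ev in "$@"; do
        if is_spu_event "$ev"; then
            [ "$#" -eq 1 ] || return 1
        fi
    done
    return 0
}
</pre>
</td>
</tr>
</table>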
2448 <p>
2449SPU profile reports have some unique characteristics compared to reports for
2450standard architectures:
2451</p>
2452 <div class="itemizedlist">
2453 <ul type="disc">
2454 <li>Typically no "app name" column. This is really standard OProfile behavior
2455when the report contains samples for just a single application, which is
2456commonly the case when profiling SPUs.</li>
2457 <li>"CPU" equates to "SPU"</li>
<li>Specifying <code class="option">--long-filenames</code> on the <span><strong class="command">opreport</strong></span> command does not always result
in long filenames. This happens when the SPU application code is embedded in
the PPE executable or shared library. The embedded SPU ELF data contains only the
short filename (i.e., no path information) for the SPU binary file that was used as
the source for embedding. Only the short filename is used because
the original SPU binary file may not exist or be accessible at runtime. The performance
analyst must have sufficient knowledge of the application to be able to correlate the
SPU binary image names found in the report to the application's source files.
2466<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>
2467Compile the application with -g and generate the OProfile report
2468with -g to facilitate finding the right source file(s) on which to focus.
2469</div></li>
2470 </ul>
2471 </div>
2472 </div>
2473 <div class="sect2" lang="en" xml:lang="en">
2474 <div class="titlepage">
2475 <div>
2476 <div>
2477 <h3 class="title"><a id="amd-ibs-support"></a>4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</h3>
2478 </div>
2479 </div>
2480 </div>
2481 <p>
2482Instruction-Based Sampling (IBS) is a new performance measurement technique
2483available on AMD Family 10h processors. Traditional performance counter
2484sampling is not precise enough to isolate performance issues to individual
2485instructions. IBS, however, precisely identifies instructions which are not
2486making the best use of the processor pipeline and memory hierarchy.
For more information, please refer to the paper "Instruction-Based Sampling:
A New Performance Analysis Technique for AMD Family 10h Processors"
(<a href="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf">
http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</a>).
There are two types of IBS profiles, described in the following sections.
2492</p>
2493 <div class="sect3" lang="en" xml:lang="en">
2494 <div class="titlepage">
2495 <div>
2496 <div>
2497 <h4 class="title"><a id="ibs-fetch"></a>4.8.1. IBS Fetch</h4>
2498 </div>
2499 </div>
2500 </div>
2501 <p>
2502IBS fetch sampling is a statistical sampling method which counts completed
2503fetch operations. When the number of completed fetch operations reaches the
2504maximum fetch count (the sampling period), IBS tags the fetch operation and
2505monitors that operation until it either completes or aborts. When a tagged
2506fetch completes or aborts, a sampling interrupt is generated and an IBS fetch
2507sample is taken. An IBS fetch sample contains a timestamp, the identifier of
2508the interrupted process, the virtual fetch address, and several event flags
2509and values that describe what happened during the fetch operation.
2510</p>
2511 </div>
2512 <div class="sect3" lang="en" xml:lang="en">
2513 <div class="titlepage">
2514 <div>
2515 <div>
2516 <h4 class="title"><a id="ibs-op"></a>4.8.2. IBS Op</h4>
2517 </div>
2518 </div>
2519 </div>
2520 <p>
2521IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64
2522instructions. Two options are available for selecting ops for sampling:
2523</p>
2524 <div class="itemizedlist">
2525 <ul type="disc">
2526 <li>
2527Cycles-based selection counts CPU clock cycles. The op is tagged and monitored
2528when the count reaches a threshold (the sampling period) and a valid op is
2529available.
2530</li>
2531 <li>
2532Dispatched op-based selection counts dispatched macro-ops.
2533When the count reaches a threshold, the next valid op is tagged and monitored.
2534</li>
2535 </ul>
2536 </div>
2537 <p>
2538In both cases, an IBS sample is generated only if the tagged op retires.
2539Thus, IBS op event information does not measure speculative execution activity.
2540The execution stages of the pipeline monitor the tagged macro-op. When the
2541tagged macro-op retires, a sampling interrupt is generated and an IBS op
2542sample is taken. An IBS op sample contains a timestamp, the identifier of
2543the interrupted process, the virtual address of the AMD64 instruction from
2544which the op was issued, and several event flags and values that describe
2545what happened when the macro-op executed.
2546</p>
2547 </div>
2548 <p>
To enable IBS profiling, simply specify IBS performance events
through the <code class="option">--event=</code> option. These events are listed in the output of
<code class="function">opcontrol --list-events</code>.
2552</p>
2553 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2554 <tr>
2555 <td>
2556 <pre class="screen">
opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;

Note: All IBS fetch events must have the same event count and unit mask,
      as must all IBS op events.
2562</pre>
2563 </td>
2564 </tr>
2565 </table>
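<p>
Because of this rule, two IBS fetch events with different counts cannot be combined, for example. The
hypothetical checker below compares the count and unit mask fields of a list of event specifications in the
<code class="option">name:count:unitmask:kernel:user</code> form shown above. It checks the <code class="constant">IBS_FETCH_</code> and
<code class="constant">IBS_OP_</code> groups separately. The event names and counts in the usage line are placeholders in the
manual's own <code class="constant">IBS_FETCH_XXX</code> style, not real event names or recommended values.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="programlisting">
# ibs_specs_consistent SPEC...
# Hypothetical check. Each SPEC is NAME:COUNT:UNITMASK:KERNEL:USER.
# Succeeds if all IBS_FETCH_* specs share one COUNT:UNITMASK pair and
# all IBS_OP_* specs share one; the two groups may differ from each
# other, and non-IBS specs are ignored.
ibs_specs_consistent() {
    fetch_cu=
    op_cu=
    for spec in "$@"; do
        name=${spec%%:*}
        # COUNT:UNITMASK are the two fields that follow the name.
        cu=$(printf '%s\n' "${spec#*:}" | cut -d: -f1,2)
        case "$name" in
        IBS_FETCH_*)
            [ -z "$fetch_cu" ] || [ "$fetch_cu" = "$cu" ] || return 1
            fetch_cu=$cu
            ;;
        IBS_OP_*)
            [ -z "$op_cu" ] || [ "$op_cu" = "$cu" ] || return 1
            op_cu=$cu
            ;;
        esac
    done
    return 0
}

# Placeholder event names and illustrative counts.
ibs_specs_consistent IBS_FETCH_XXX:50000:0:1:1 IBS_FETCH_YYY:60000:0:1:1 ||
    echo "IBS fetch events differ in count or unit mask"
</pre>
</td>
</tr>
</table>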
2566 </div>
2567 <div class="sect2" lang="en" xml:lang="en">
2568 <div class="titlepage">
2569 <div>
2570 <div>
2571 <h3 class="title"><a id="misuse"></a>4.9. Dangerous counter settings</h3>
2572 </div>
2573 </div>
2574 </div>
2575 <p>
OProfile is a low-level profiler that allows continuous profiling at low overhead.
If too low a count reset value is set for a counter, the system can become overloaded with counter
interrupts and appear to have frozen. Whilst some validation is done, it
is not foolproof.
2580</p>
2581 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
2582 <h3 class="title">Note</h3>
2583 <p>
This can happen as follows. When the profiler count
reaches zero, an NMI handler is called, which stores the sample values in an internal buffer and then resets the counter
to its original value. If the count is very low, a pending NMI can be sent before the NMI handler has
2587completed. Due to the priority of the NMI, the local APIC delivers the pending interrupt immediately after
2588completion of the previous interrupt handler, and control never returns to other parts of the system.
2589In this way the system seems to be frozen.
2590</p>
2591 </div>
<p>If this happens, it will be impossible to bring the system back to a workable state.
There is no real safeguard against this, other than making sure to use a reasonable value
for the counter reset. For example, setting the <code class="constant">CPU_CLK_UNHALTED</code> event type with a ridiculously low reset count (e.g. 500)
is likely to freeze the system.
2596</p>
2597 <p>
In short: <span><strong class="command">Don't try a foolish sample count value</strong></span>. Unfortunately, what counts as a foolish value
depends on the event type; if ever in doubt, e-mail </p>
2600 <div class="address">
2601 <p><code class="email">&lt;<a href="mailto:oprofile-list@lists.sf.net">oprofile-list@lists.sf.net</a>&gt;</code>.</p>
2602 </div>
2603 </div>
2604 </div>
2605 </div>
2606 <div class="chapter" lang="en" xml:lang="en">
2607 <div class="titlepage">
2608 <div>
2609 <div>
2610 <h2 class="title"><a id="results"></a>Chapter 4. Obtaining results</h2>
2611 </div>
2612 </div>
2613 </div>
2614 <div class="toc">
2615 <p>
2616 <b>Table of Contents</b>
2617 </p>
2618 <dl>
2619 <dt>
2620 <span class="sect1">
2621 <a href="#profile-spec">1. Profile specifications</a>
2622 </span>
2623 </dt>
2624 <dd>
2625 <dl>
2626 <dt>
2627 <span class="sect2">
2628 <a href="#profile-spec-examples">1.1. Examples</a>
2629 </span>
2630 </dt>
2631 <dt>
2632 <span class="sect2">
2633 <a href="#profile-spec-details">1.2. Profile specification parameters</a>
2634 </span>
2635 </dt>
2636 <dt>
2637 <span class="sect2">
2638 <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a>
2639 </span>
2640 </dt>
2641 <dt>
2642 <span class="sect2">
2643 <a href="#no-results">1.4. What to do when you don't get any results</a>
2644 </span>
2645 </dt>
2646 </dl>
2647 </dd>
2648 <dt>
2649 <span class="sect1">
2650 <a href="#opreport">2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</a>
2651 </span>
2652 </dt>
2653 <dd>
2654 <dl>
2655 <dt>
2656 <span class="sect2">
2657 <a href="#opreport-merging">2.1. Merging separate profiles</a>
2658 </span>
2659 </dt>
2660 <dt>
2661 <span class="sect2">
2662 <a href="#opreport-comparison">2.2. Side-by-side multiple results</a>
2663 </span>
2664 </dt>
2665 <dt>
2666 <span class="sect2">
2667 <a href="#opreport-callgraph">2.3. Callgraph output</a>
2668 </span>
2669 </dt>
2670 <dt>
2671 <span class="sect2">
2672 <a href="#opreport-diff">2.4. Differential profiles with <span><strong class="command">opreport</strong></span></a>
2673 </span>
2674 </dt>
2675 <dt>
2676 <span class="sect2">
2677 <a href="#opreport-anon">2.5. Anonymous executable mappings</a>
2678 </span>
2679 </dt>
2680 <dt>
2681 <span class="sect2">
2682 <a href="#opreport-xml">2.6. XML formatted output</a>
2683 </span>
2684 </dt>
2685 <dt>
2686 <span class="sect2">
2687 <a href="#opreport-options">2.7. Options for <span><strong class="command">opreport</strong></span></a>
2688 </span>
2689 </dt>
2690 </dl>
2691 </dd>
2692 <dt>
2693 <span class="sect1">
2694 <a href="#opannotate">3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</a>
2695 </span>
2696 </dt>
2697 <dd>
2698 <dl>
2699 <dt>
2700 <span class="sect2">
2701 <a href="#opannotate-finding-source">3.1. Locating source files</a>
2702 </span>
2703 </dt>
2704 <dt>
2705 <span class="sect2">
2706 <a href="#opannotate-details">3.2. Usage of <span><strong class="command">opannotate</strong></span></a>
2707 </span>
2708 </dt>
2709 </dl>
2710 </dd>
2711 <dt>
2712 <span class="sect1">
2713 <a href="#getting-jit-reports">4. OProfile results with JIT samples</a>
2714 </span>
2715 </dt>
2716 <dt>
2717 <span class="sect1">
2718 <a href="#opgprof">5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</a>
2719 </span>
2720 </dt>
2721 <dd>
2722 <dl>
2723 <dt>
2724 <span class="sect2">
2725 <a href="#opgprof-details">5.1. Usage of <span><strong class="command">opgprof</strong></span></a>
2726 </span>
2727 </dt>
2728 </dl>
2729 </dd>
2730 <dt>
2731 <span class="sect1">
2732 <a href="#oparchive">6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</a>
2733 </span>
2734 </dt>
2735 <dd>
2736 <dl>
2737 <dt>
2738 <span class="sect2">
2739 <a href="#oparchive-details">6.1. Usage of <span><strong class="command">oparchive</strong></span></a>
2740 </span>
2741 </dt>
2742 </dl>
2743 </dd>
2744 <dt>
2745 <span class="sect1">
2746 <a href="#opimport">7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</a>
2747 </span>
2748 </dt>
2749 <dd>
2750 <dl>
2751 <dt>
2752 <span class="sect2">
2753 <a href="#opimport-details">7.1. Usage of <span><strong class="command">opimport</strong></span></a>
2754 </span>
2755 </dt>
2756 </dl>
2757 </dd>
2758 </dl>
2759 </div>
2760 <p>
2761OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often,
2762OProfile does a little <span class="emphasis"><em>too</em></span> good a job of keeping overhead low, and no data reaches
the profiler. This can happen on lightly-loaded machines. Remember that you can force a dump at any time with:
2764</p>
2765 <p>
2766 <span>
2767 <strong class="command">opcontrol --dump</strong>
2768 </span>
2769 </p>
<p>Remember to do this before complaining that there is no profiling data!
2771Now that we've got some data, it has to be processed. That's the job of <span><strong class="command">opreport</strong></span>,
2772<span><strong class="command">opannotate</strong></span>, or <span><strong class="command">opgprof</strong></span>.
2773</p>
2774 <div class="sect1" lang="en" xml:lang="en">
2775 <div class="titlepage">
2776 <div>
2777 <div>
2778 <h2 class="title" style="clear: both"><a id="profile-spec"></a>1. Profile specifications</h2>
2779 </div>
2780 </div>
2781 </div>
2782 <p>
2783All of the analysis tools take a <span class="emphasis"><em>profile specification</em></span>.
2784This is a set of definitions that describe which actual profiles should be
2785examined. The simplest profile specification is empty: this will match all
the available profile files for the current session (this is what happens
when you run <span><strong class="command">opreport</strong></span> with no arguments).
2788</p>
2789 <p>
Specification parameters are of the form <code class="option">name:value[,value]</code>.
For example, if I wanted to get a combined symbol summary for
<code class="filename">/bin/myprog</code> and <code class="filename">/bin/myprog2</code>,
I could run <span><strong class="command">opreport -l image:/bin/myprog,/bin/myprog2</strong></span>.
As a special case, you don't actually need to specify the <code class="option">image:</code>
part here: anything left on the command line is assumed to be an
<code class="option">image:</code> name. Similarly, if no <code class="option">session:</code>
is specified, then <code class="option">session:current</code> is assumed ("current"
is a special name for the current or most recent profiling session).
2799</p>
2800 <p>
In addition to the comma-separated list shown above, some of the
specification parameters can take <span><strong class="command">glob</strong></span>-style
values. For example, if I want to see image summaries for all
binaries profiled in <code class="filename">/usr/bin/</code>, I could run
<span><strong class="command">opreport image:/usr/bin/\*</strong></span>. Note that the wildcard
must be escaped so that the shell passes it to <span><strong class="command">opreport</strong></span> instead of expanding it.
2807</p>
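 <p>
The following C sketch illustrates how a glob-style value such as
<code class="option">image:/usr/bin/*</code> selects images. It uses the POSIX
<code class="function">fnmatch()</code> function with no flags; whether
<span><strong class="command">opreport</strong></span> matches images in exactly the same way
(for example, whether "*" may match across a "/") is an assumption, so treat
edge cases as unverified. The helper name <code class="function">image_matches()</code> is hypothetical.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;fnmatch.h&gt;
#include &lt;stdio.h&gt;

/* Returns 1 if the image path matches the glob pattern, 0 otherwise.
 * Assumption: plain POSIX fnmatch() with no flags, not OProfile's own matcher. */
static int image_matches(const char *pattern, const char *image)
{
    return fnmatch(pattern, image, 0) == 0;
}

int main(void)
{
    const char *images[] = { "/usr/bin/as", "/usr/bin/madplay", "/bin/bash" };
    size_t i;

    for (i = 0; i &lt; sizeof(images) / sizeof(images[0]); i++)
        printf("%-18s %s\n", images[i],
               image_matches("/usr/bin/*", images[i]) ? "match" : "no match");
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
The two <code class="filename">/usr/bin</code> images match and <code class="filename">/bin/bash</code> does not.
The pattern only reaches the program unexpanded because the shell left it alone, which is why the
"*" has to be escaped on the command line.
</p>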
2808 <p>
2809For <span><strong class="command">opreport</strong></span>, profile specifications can be used to
2810define two profiles, giving differential output. This is done by
2811enclosing each of the two specifications within curly braces, as shown
2812in the examples below. Any specifications outside of curly braces are
2813shared across both.
2814</p>
2815 <div class="sect2" lang="en" xml:lang="en">
2816 <div class="titlepage">
2817 <div>
2818 <div>
2819 <h3 class="title"><a id="profile-spec-examples"></a>1.1. Examples</h3>
2820 </div>
2821 </div>
2822 </div>
2823 <p>
Image summaries for all profiles with <code class="constant">DATA_MEM_REFS</code>
samples in the saved session called "stresstest":
2826</p>
2827 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2828 <tr>
2829 <td>
2830 <pre class="screen">
# opreport session:stresstest event:DATA_MEM_REFS
</pre>
2833 </td>
2834 </tr>
2835 </table>
2836 <p>
Symbol summary for the application called "test_sym53c8xx,9xx". The backslash before
the comma is necessary because <code class="option">image:</code> takes a comma-separated list.
2839</p>
2840 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2841 <tr>
2842 <td>
2843 <pre class="screen">
# opreport -l ./test/test_sym53c8xx\,9xx
</pre>
2846 </td>
2847 </tr>
2848 </table>
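 <p>
Why the backslash is needed can be seen from a minimal C sketch of a list splitter that treats every
unescaped comma as a separator and "\," as a literal comma. The helper
<code class="function">split_spec_list()</code> is hypothetical and is not OProfile's own parser.
The sketch assumes the backslash itself reaches the program: in POSIX shells an unquoted
"\," is reduced to a plain comma before the command runs, whereas single quotes keep the backslash.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical helper: split a value such as "a,b\,c" at unescaped commas.
 * "\," yields a literal comma. Stores at most max malloc()ed entries in out
 * and returns how many were stored. */
static size_t split_spec_list(const char *value, char **out, size_t max)
{
    size_t n = 0, k = 0;
    char *cur = malloc(strlen(value) + 1);   /* an entry is never longer than value */

    if (cur == NULL)
        return 0;
    for (const char *p = value; ; p++) {
        if (*p == '\\' &amp;&amp; p[1] == ',') {       /* escaped comma: keep it */
            cur[k++] = ',';
            p++;
        } else if (*p == ',' || *p == '\0') {   /* end of one entry */
            cur[k] = '\0';
            if (n &lt; max)
                out[n++] = strdup(cur);
            k = 0;
            if (*p == '\0')
                break;
        } else {
            cur[k++] = *p;
        }
    }
    free(cur);
    return n;
}

int main(void)
{
    char *entries[8];
    size_t n = split_spec_list("./test/test_sym53c8xx\\,9xx,/bin/bash", entries, 8);

    for (size_t i = 0; i &lt; n; i++) {
        printf("entry %zu: %s\n", i, entries[i]);
        free(entries[i]);
    }
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
This yields two entries, <code class="filename">./test/test_sym53c8xx,9xx</code> and
<code class="filename">/bin/bash</code>; without the backslash the first name would have been split in two.
</p>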
2849 <p>
Image summaries for all binaries in the <code class="filename">test</code> directory,
except <code class="filename">boring-test</code>:
2852</p>
2853 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2854 <tr>
2855 <td>
2856 <pre class="screen">
# opreport image:./test/\* image-exclude:./test/boring-test
</pre>
2859 </td>
2860 </tr>
2861 </table>
2862 <p>
Differential profile of a binary stored in two archives:
2864</p>
2865 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2866 <tr>
2867 <td>
2868 <pre class="screen">
# opreport -l /bin/bash { archive:./orig } { archive:./new }
</pre>
2871 </td>
2872 </tr>
2873 </table>
2874 <p>
Differential profile of an archived binary against the current session:
2876</p>
2877 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2878 <tr>
2879 <td>
2880 <pre class="screen">
# opreport -l /bin/bash { archive:./orig } { }
</pre>
2883 </td>
2884 </tr>
2885 </table>
2886 </div>
2887 <div class="sect2" lang="en" xml:lang="en">
2888 <div class="titlepage">
2889 <div>
2890 <div>
2891 <h3 class="title"><a id="profile-spec-details"></a>1.2. Profile specification parameters</h3>
2892 </div>
2893 </div>
2894 </div>
2895 <div class="variablelist">
2896 <dl>
2897 <dt>
2898 <span class="term">
2899 <code class="option">archive:</code>
2900 <span class="emphasis">
2901 <em>archivepath</em>
2902 </span>
2903 </span>
2904 </dt>
2905 <dd>
2906 <p>
2907 A path to an archive made with <span><strong class="command">oparchive</strong></span>.
2908 Absence of this tag, unlike others, means "the current system",
2909 equivalent to specifying "archive:".
2910 </p>
2911 </dd>
2912 <dt>
2913 <span class="term">
2914 <code class="option">session:</code>
2915 <span class="emphasis">
2916 <em>sessionlist</em>
2917 </span>
2918 </span>
2919 </dt>
2920 <dd>
2921 <p>
2922 A comma-separated list of session names to resolve in. Absence of this
2923 tag, unlike others, means "the current session", equivalent to
2924 specifying "session:current".
2925 </p>
2926 </dd>
2927 <dt>
2928 <span class="term">
2929 <code class="option">session-exclude:</code>
2930 <span class="emphasis">
2931 <em>sessionlist</em>
2932 </span>
2933 </span>
2934 </dt>
2935 <dd>
2936 <p>
2937 A comma-separated list of sessions to exclude.
2938 </p>
2939 </dd>
2940 <dt>
2941 <span class="term">
2942 <code class="option">image:</code>
2943 <span class="emphasis">
2944 <em>imagelist</em>
2945 </span>
2946 </span>
2947 </dt>
2948 <dd>
2949 <p>
 A comma-separated list of image names to resolve. Each entry may be a relative
 path, a <span><strong class="command">glob</strong></span>-style name, or a full path, e.g.:</p>
2952 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
2953 <tr>
2954 <td>
2955 <pre class="screen">opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</pre>
2956 </td>
2957 </tr>
2958 </table>
2959 </dd>
2960 <dt>
2961 <span class="term">
2962 <code class="option">image-exclude:</code>
2963 <span class="emphasis">
2964 <em>imagelist</em>
2965 </span>
2966 </span>
2967 </dt>
2968 <dd>
2969 <p>
2970 Same as <code class="option">image:</code>, but the matching images are excluded.
2971 </p>
2972 </dd>
2973 <dt>
2974 <span class="term">
2975 <code class="option">lib-image:</code>
2976 <span class="emphasis">
2977 <em>imagelist</em>
2978 </span>
2979 </span>
2980 </dt>
2981 <dd>
2982 <p>
 Same as <code class="option">image:</code>, but only for images that belong to
 a particular primary binary image (namely, an application), such as the
 libraries it uses. This only makes sense if you profiled with
 <code class="option">--separate</code>. It also covers kernel modules and the kernel
 when using <code class="option">--separate=kernel</code>.
2988 </p>
2989 </dd>
2990 <dt>
2991 <span class="term">
2992 <code class="option">lib-image-exclude:</code>
2993 <span class="emphasis">
2994 <em>imagelist</em>
2995 </span>
2996 </span>
2997 </dt>
2998 <dd>
2999 <p>
3000 Same as <code class="option">lib-image:</code>, but the matching images
3001 are excluded.
3002 </p>
3003 </dd>
3004 <dt>
3005 <span class="term">
3006 <code class="option">event:</code>
3007 <span class="emphasis">
3008 <em>eventlist</em>
3009 </span>
3010 </span>
3011 </dt>
3012 <dd>
3013 <p>
3014 The symbolic event name to match on, e.g. <code class="option">event:DATA_MEM_REFS</code>.
3015 You can pass a list of events for side-by-side comparison with <span><strong class="command">opreport</strong></span>.
3016 When using the timer interrupt, the event is always "TIMER".
3017 </p>
3018 </dd>
3019 <dt>
3020 <span class="term">
3021 <code class="option">count:</code>
3022 <span class="emphasis">
3023 <em>eventcountlist</em>
3024 </span>
3025 </span>
3026 </dt>
3027 <dd>
3028 <p>
 The event count to match on, e.g. <code class="option">event:DATA_MEM_REFS count:30000</code>.
 Note that this value refers to the count setting given to <span><strong class="command">opcontrol</strong></span>
 when profiling, and has nothing to do with the sample counts in the profile data
 itself.
 You can pass a list of counts for side-by-side comparison with <span><strong class="command">opreport</strong></span>.
 When using the timer interrupt, the count is always 0 (indicating it cannot be set).
3035 </p>
3036 </dd>
3037 <dt>
3038 <span class="term">
3039 <code class="option">unit-mask:</code>
3040 <span class="emphasis">
3041 <em>masklist</em>
3042 </span>
3043 </span>
3044 </dt>
3045 <dd>
3046 <p>
 The unit mask value of the event to match on, e.g. <code class="option">unit-mask:1</code>.
 You can pass a list of unit masks for side-by-side comparison with <span><strong class="command">opreport</strong></span>.
3049 </p>
3050 </dd>
3051 <dt>
3052 <span class="term">
3053 <code class="option">cpu:</code>
3054 <span class="emphasis">
3055 <em>cpulist</em>
3056 </span>
3057 </span>
3058 </dt>
3059 <dd>
3060 <p>
3061 Only consider profiles for the given numbered CPU (starting from zero).
3062 This is only useful when using CPU profile separation.
3063 </p>
3064 </dd>
3065 <dt>
3066 <span class="term">
3067 <code class="option">tgid:</code>
3068 <span class="emphasis">
3069 <em>pidlist</em>
3070 </span>
3071 </span>
3072 </dt>
3073 <dd>
3074 <p>
3075 Only consider profiles for the given task groups. Unless some program
3076 is using threads, the task group ID of a process is the same
3077 as its process ID. This option corresponds to the POSIX
3078 notion of a thread group.
3079 This is only useful when using per-process profile separation.
3080 </p>
3081 </dd>
3082 <dt>
3083 <span class="term">
3084 <code class="option">tid:</code>
3085 <span class="emphasis">
3086 <em>tidlist</em>
3087 </span>
3088 </span>
3089 </dt>
3090 <dd>
3091 <p>
3092 Only consider profiles for the given threads. When using
3093 recent thread libraries, all threads in a process share the
3094 same task group ID, but have different thread IDs. You can
3095 use this option in combination with <code class="option">tgid:</code> to
3096 restrict the results to particular threads within a process.
3097 This is only useful when using per-process profile separation.
3098 </p>
3099 </dd>
3100 </dl>
3101 </div>
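 <p>
To see which numbers the <code class="option">tgid:</code> and <code class="option">tid:</code> parameters
refer to, the following Linux-specific C sketch prints both IDs from the main thread and from one worker
thread. It assumes Linux with NPTL (the "recent thread libraries" mentioned above) and calls the
<code class="function">gettid</code> system call through <code class="function">syscall()</code>, since older
C libraries provide no <code class="function">gettid()</code> wrapper. The helper name
<code class="function">current_tid()</code> is hypothetical; compile with <code class="option">-pthread</code>.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#define _GNU_SOURCE
#include &lt;pthread.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sys/syscall.h&gt;
#include &lt;unistd.h&gt;

/* Linux-specific: the per-thread ID comes from the gettid system call.
 * getpid() returns the task group ID shared by all threads of the process. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void *worker(void *arg)
{
    (void)arg;
    printf("worker: tgid %d, tid %d\n", (int)getpid(), (int)current_tid());
    return NULL;
}

int main(void)
{
    pthread_t t;

    printf("main:   tgid %d, tid %d\n", (int)getpid(), (int)current_tid());
    if (pthread_create(&amp;t, NULL, worker, NULL) != 0)
        return 1;
    pthread_join(t, NULL);
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
Both lines show the same task group ID, while the thread IDs differ; the main thread's thread ID equals
the task group ID. These are the kinds of values you would pass to <code class="option">tgid:</code> and
<code class="option">tid:</code> when profiling with per-process separation.
</p>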
3102 </div>
3103 <div class="sect2" lang="en" xml:lang="en">
3104 <div class="titlepage">
3105 <div>
3106 <div>
3107 <h3 class="title"><a id="locating-and-managing-binary-images"></a>1.3. Locating and managing binary images</h3>
3108 </div>
3109 </div>
3110 </div>
3111 <p>
Each session's sample files can be found in the <code class="filename">$SESSION_DIR/samples/</code> directory (default: <code class="filename">/var/lib/oprofile/samples/</code>).
These are used, along with the binary image files, to produce human-readable data.
In some circumstances (kernel modules in an initrd, or modules on 2.6 kernels), OProfile
will not be able to find the binary images. All the tools have an <code class="option">--image-path</code>
option to which you can pass a comma-separated list of alternate paths to search. For example,
I can let OProfile find my 2.6 modules by using <span><strong class="command">--image-path /lib/modules/2.6.0/kernel/</strong></span>.
It is your responsibility to ensure that the correct images are found when using this
option.
3120</p>
3121 <p>
3122Note that if a binary image changes after the sample file was created, you won't be able to get useful
3123symbol-based data out. This situation is detected for you. If you replace a binary, you should
3124make sure to save the old binary if you need to do comparative profiles.
3125</p>
3126 </div>
3127 <div class="sect2" lang="en" xml:lang="en">
3128 <div class="titlepage">
3129 <div>
3130 <div>
3131 <h3 class="title"><a id="no-results"></a>1.4. What to do when you don't get any results</h3>
3132 </div>
3133 </div>
3134 </div>
3135 <p>
When attempting to get output, you may see the error:
3137</p>
3138 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3139 <tr>
3140 <td>
3141 <pre class="screen">
error: no sample files found: profile specification too strict ?
</pre>
3144 </td>
3145 </tr>
3146 </table>
3147 <p>
3148What this is saying is that the profile specification you passed in,
3149when matched against the available sample files, resulted in no matches.
3150There are a number of reasons this might happen:
3151</p>
3152 <div class="variablelist">
3153 <dl>
3154 <dt>
3155 <span class="term">spelling</span>
3156 </dt>
3157 <dd>
3158 <p>
You specified a binary name, but spelt it wrongly. Check your spelling!
3160</p>
3161 </dd>
3162 <dt>
3163 <span class="term">profiler wasn't running</span>
3164 </dt>
3165 <dd>
3166 <p>
3167Make very sure that OProfile was actually up and running when you ran
3168the binary.
3169</p>
3170 </dd>
3171 <dt>
3172 <span class="term">binary didn't run long enough</span>
3173 </dt>
3174 <dd>
3175 <p>
Remember that OProfile is a statistical profiler: you're not guaranteed to
get samples for short-running programs. You can improve your chances by using a
lower count for the performance counter, so that many more samples are
taken per second.
3180</p>
3181 </dd>
3182 <dt>
3183 <span class="term">binary spent most of its time in libraries</span>
3184 </dt>
3185 <dd>
3186 <p>
3187Similarly, if the binary spends little time in the main binary image
3188itself, with most of it spent in shared libraries it uses, you might
3189not see any samples for the binary image itself. You can check this
3190by using <span><strong class="command">opcontrol --separate=lib</strong></span> before the
3191profiling session, so <span><strong class="command">opreport</strong></span> and friends show
3192the library profiles on a per-application basis.
3193</p>
3194 </dd>
3195 <dt>
3196 <span class="term">specification was really too strict</span>
3197 </dt>
3198 <dd>
3199 <p>
3200For example, you specified something like <code class="option">tgid:3433</code>,
3201but no task with that group ID ever ran the code.
3202</p>
3203 </dd>
3204 <dt>
3205 <span class="term">binary didn't generate any events</span>
3206 </dt>
3207 <dd>
3208 <p>
If you're using a particular event counter, for example counting MMX
operations, the code might simply not have generated any events in the
first place. Verify that the code you're profiling does what you expect it
to.
3213</p>
3214 </dd>
3215 <dt>
3216 <span class="term">you didn't specify kernel module name correctly</span>
3217 </dt>
3218 <dd>
3219 <p>
If you're using 2.6 kernels and trying to get reports for a kernel
module, make sure to use the <code class="option">-p</code> (<code class="option">--image-path</code>) option, and specify the
module name <span class="emphasis"><em>with</em></span> the <code class="filename">.ko</code>
extension. Also check whether the module was loaded from an initrd, in which case OProfile may not find it on its own.
3224</p>
3225 </dd>
3226 </dl>
3227 </div>
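 <p>
As a rough guide for the "binary didn't run long enough" case: with event-based sampling, a sample is
recorded each time the counter has counted <span class="emphasis"><em>count</em></span> events, so the expected
number of samples is roughly the number of events generated divided by the count. The C sketch below does
this arithmetic for <code class="constant">CPU_CLK_UNHALTED</code>, assuming the CPU stays unhalted for the
whole run. The 863.195 MHz clock comes from the <span><strong class="command">opreport</strong></span> example
in the next section; the 50 ms runtime and the helper name <code class="function">expected_samples()</code>
are hypothetical.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;

/* Expected samples ~= events counted / counter setting ("count").
 * For CPU_CLK_UNHALTED on a busy CPU, events ~= clock rate * seconds. */
static double expected_samples(double events_per_sec, double seconds, double count)
{
    return events_per_sec * seconds / count;
}

int main(void)
{
    const double clock_hz = 863.195e6;   /* PIII from the opreport example */
    const double runtime = 0.05;         /* hypothetical 50 ms program */
    const double counts[] = { 1000000.0, 100000.0, 23150.0 };

    for (int i = 0; i &lt; 3; i++)
        printf("count %7.0f gives about %4.0f samples\n",
               counts[i], expected_samples(clock_hz, runtime, counts[i]));
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
Lowering the count from 1000000 to 23150 raises the expected number of samples for a 50 ms run from
about 43 to about 1864, which is why a lower count helps with short-running programs.
</p>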
3228 </div>
3229 </div>
3230 <div class="sect1" lang="en" xml:lang="en">
3231 <div class="titlepage">
3232 <div>
3233 <div>
3234 <h2 class="title" style="clear: both"><a id="opreport"></a>2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</h2>
3235 </div>
3236 </div>
3237 </div>
3238 <p>
3239The <span><strong class="command">opreport</strong></span> utility is the primary utility you will use for
3240getting formatted data out of OProfile. It produces two types of data: image summaries
3241and symbol summaries. An image summary lists the number of samples for individual
3242binary images such as libraries or applications. Symbol summaries provide per-symbol
3243profile data. In the following example, we're getting an image summary for the whole
3244system:
3245</p>
3246 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3247 <tr>
3248 <td>
3249 <pre class="screen">
$ opreport --long-filenames
CPU: PIII, speed 863.195 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150
 905898  59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus
 214320  14.1338 /boot/2.6.0/vmlinux
 103450   6.8222 /lib/i686/libc-2.3.2.so
  60160   3.9674 /usr/local/bin/madplay
  31769   2.0951 /usr/local/oprofile-pp/bin/oprofiled
  26550   1.7509 /usr/lib/libartsflow.so.1.0.0
  23906   1.5765 /usr/bin/as
  18770   1.2378 /oprofile
  15528   1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5
  11979   0.7900 /usr/X11R6/bin/XFree86
  11328   0.7471 /bin/bash
...
</pre>
3266 </td>
3267 </tr>
3268 </table>
3269 <p>
3270If we had specified <code class="option">--symbols</code> in the previous command, we would have
3271gotten a symbol summary of all the images across the entire system. We can restrict this to only
3272part of the system profile; for example,
3273below is a symbol summary of the OProfile daemon. Note that as we used
3274<span><strong class="command">opcontrol --separate=kernel</strong></span>, symbols from images that <span><strong class="command">oprofiled</strong></span>
3275has used are also shown.
3276</p>
3277 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3278 <tr>
3279 <td>
3280 <pre class="screen">
$ opreport -l `which oprofiled` 2&gt;/dev/null | more
CPU: PIII, speed 863.195 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150
vma      samples  %        image name  symbol name
0804be10 14971    28.1993  oprofiled   odb_insert
0804afdc 7144     13.4564  oprofiled   pop_buffer_value
c01daea0 6113     11.5144  vmlinux     __copy_to_user_ll
0804b060 2816      5.3042  oprofiled   opd_put_sample
0804b4a0 2147      4.0441  oprofiled   opd_process_samples
0804acf4 1855      3.4941  oprofiled   opd_put_image_sample
0804ad84 1766      3.3264  oprofiled   opd_find_image
0804a5ec 1084      2.0418  oprofiled   opd_find_module
0804ba5c 741       1.3957  oprofiled   odb_hash_add_node
...
</pre>
3296 </td>
3297 </tr>
3298 </table>
3299 <p>
3300These are the two basic ways you are most likely to use regularly, but <span><strong class="command">opreport</strong></span>
3301can do a lot more than that, as described below.
3302</p>
3303 <div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-merging"></a>2.1. Merging separate profiles</h3></div></div></div>
 <p>
If you have used one of the <code class="option">--separate=</code> options
whilst profiling, there can be several separate profiles for
a single binary image within a session. Normally the output
will keep these images separated (so, for example, the image summary
output shows library image summaries on a per-application basis
when using <code class="option">--separate=lib</code>).
Sometimes it can be useful to merge these results back together
before producing a report. The <code class="option">--merge</code> option allows
you to do that.
</p>
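 <p>
Conceptually, merging adds together the sample counts of profiles that differ only in the dimension being
merged. The C sketch below shows this for <code class="option">--separate=lib</code>-style data, summing
per-application profiles of a library into one total. The <code class="type">lib_profile</code> struct,
the <code class="function">merged_samples()</code> helper, and the sample counts are all hypothetical
illustrations, not OProfile's internal data structures.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Hypothetical per-application library profile, as kept with --separate=lib */
struct lib_profile {
    const char *app;        /* owning application */
    const char *image;      /* library image */
    unsigned long samples;
};

/* Sum the samples of every profile for the given image, ignoring the owner. */
static unsigned long merged_samples(const struct lib_profile *p, size_t n,
                                    const char *image)
{
    unsigned long total = 0;

    for (size_t i = 0; i &lt; n; i++)
        if (strcmp(p[i].image, image) == 0)
            total += p[i].samples;
    return total;
}

int main(void)
{
    const struct lib_profile profiles[] = {
        { "/bin/bash",   "/lib/libc.so.6", 120 },
        { "/usr/bin/as", "/lib/libc.so.6",  80 },
        { "/usr/bin/as", "/lib/libm.so.6",  15 },
    };
    size_t n = sizeof(profiles) / sizeof(profiles[0]);

    printf("/lib/libc.so.6 merged: %lu samples\n",
           merged_samples(profiles, n, "/lib/libc.so.6"));
    return 0;
}
</pre>
 </td>
 </tr>
 </table>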
3314</div>
3315 <div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-comparison"></a>2.2. Side-by-side multiple results</h3></div></div></div>
 <p>
If you have used multiple events when profiling, by default you get
side-by-side results of each event's sample values from <span><strong class="command">opreport</strong></span>.
You can restrict which events are listed with the <code class="option">event:</code>,
<code class="option">count:</code>, and <code class="option">unit-mask:</code> profile specification parameters.
</p>
3320</div>
3321 <div class="sect2" lang="en" xml:lang="en">
3322 <div class="titlepage">
3323 <div>
3324 <div>
3325 <h3 class="title"><a id="opreport-callgraph"></a>2.3. Callgraph output</h3>
3326 </div>
3327 </div>
3328 </div>
3329 <p>
3330This section provides details on how to use the OProfile callgraph feature.
3331</p>
3332 <div class="sect3" lang="en" xml:lang="en">
3333 <div class="titlepage">
3334 <div>
3335 <div>
3336 <h4 class="title"><a id="op-cg1"></a>2.3.1. Callgraph details</h4>
3337 </div>
3338 </div>
3339 </div>
3340 <p>
3341When using the <code class="option">opcontrol --callgraph</code> option, you can see what
3342functions are calling other functions in the output. Consider the
3343following program:
3344</p>
3345 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3346 <tr>
3347 <td>
3348 <pre class="screen">
#define _GNU_SOURCE             /* strfry() is a GNU extension */
#include &lt;string.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;

#define SIZE 500000

/* qsort() passes pointers to the array elements, i.e. to char * values */
static int compare(const void *s1, const void *s2)
{
    return strcmp(*(char * const *)s1, *(char * const *)s2);
}

static void repeat(void)
{
    int i;
    static char *strings[SIZE];  /* static: keeps the large array off the stack */
    char str[] = "abcdefghijklmnopqrstuvwxyz";

    for (i = 0; i &lt; SIZE; ++i) {
        strings[i] = strdup(str);
        strfry(strings[i]);
    }

    qsort(strings, SIZE, sizeof(char *), compare);

    /* release the copies so the endless loop in main() does not leak */
    for (i = 0; i &lt; SIZE; ++i)
        free(strings[i]);
}

int main(void)
{
    while (1)
        repeat();
}
</pre>
3380 </td>
3381 </tr>
3382 </table>
3383 <p>
3384When running with the call-graph option, OProfile will
3385record the function stack every time it takes a sample.
3386<span><strong class="command">opreport --callgraph</strong></span> outputs an entry for each
3387function, where each entry looks similar to:
3388</p>
3389 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3390 <tr>
3391 <td>
3392 <pre class="screen">
samples    %        image name     symbol name
  197       0.1548  cg             main
  127036   99.8452  cg             repeat
84590      42.5084  libc-2.3.2.so  strfry
  84590    66.4838  libc-2.3.2.so  strfry [self]
  39169    30.7850  libc-2.3.2.so  random_r
  3475      2.7312  libc-2.3.2.so  __i686.get_pc_thunk.bx
-------------------------------------------------------------------------------
</pre>
3402 </td>
3403 </tr>
3404 </table>
3405 <p>
3406Here the non-indented line is the function we're focussing upon
3407(<code class="function">strfry()</code>). This
3408line is the same as you'd get from a normal <span><strong class="command">opreport</strong></span>
3409output.
3410</p>
3411 <p>
3412Above the non-indented line we find the functions that called this
3413function (for example, <code class="function">repeat()</code> calls
3414<code class="function">strfry()</code>). The samples and percentage values here
3415refer to the number of times we took a sample where this call was found
3416in the stack; the percentage is relative to all other callers of the
3417function we're focussing on. Note that these values are
3418<span class="emphasis"><em>not</em></span> call counts; they only reflect the call stack
3419every time a sample is taken; that is, if a call is found in the stack
3420at the time of a sample, it is recorded in this count.
3421</p>
3422 <p>
3423Below the line are functions that are called by
3424<code class="function">strfry()</code> (called <span class="emphasis"><em>callees</em></span>).
3425It's clear here that <code class="function">strfry()</code> calls
3426<code class="function">random_r()</code>. We also see a special entry with a
3427"[self]" marker. This records the normal samples for the function, but
3428the percentage becomes relative to all callees. This allows you to
3429compare time spent in the function itself compared to functions it
3430calls. Note that if a function calls itself, then it will appear in the
3431list of callees of itself, but without the "[self]" marker; so recursive
3432calls are still clearly separable.
3433</p>
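 <p>
The percentages in the <code class="function">strfry()</code> entry above can be reproduced from its sample
counts: each caller's share is its count divided by the total over all callers, and each callee's share
(including "[self]") is its count divided by the total over all callee entries. A small C sketch of that
arithmetic, using the numbers from the example output; the helper name <code class="function">share()</code>
is hypothetical.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;

/* Percentage of one entry relative to the sum of all entries in its group. */
static double share(unsigned long samples, const unsigned long *group, int n)
{
    unsigned long total = 0;

    for (int i = 0; i &lt; n; i++)
        total += group[i];
    return total ? 100.0 * samples / total : 0.0;
}

int main(void)
{
    /* callers of strfry(): main, repeat */
    const unsigned long callers[] = { 197, 127036 };
    /* callees of strfry(): [self], random_r, __i686.get_pc_thunk.bx */
    const unsigned long callees[] = { 84590, 39169, 3475 };

    printf("repeat   %.4f%%\n", share(callers[1], callers, 2));
    printf("[self]   %.4f%%\n", share(callees[0], callees, 3));
    printf("random_r %.4f%%\n", share(callees[1], callees, 3));
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
This prints 99.8452%, 66.4838% and 30.7850%, matching the corresponding lines of the callgraph entry.
</p>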
3434 <p>
3435You may have noticed that the output lists <code class="function">main()</code>
3436as calling <code class="function">strfry()</code>, but it's clear from the source
3437that this doesn't actually happen. See <a href="#interpreting-callgraph" title="3. Interpreting call-graph profiles">Section 3, &#8220;Interpreting call-graph profiles&#8221;</a> for an explanation.
3438</p>
3439 </div>
3440 <div class="sect3" lang="en" xml:lang="en">
3441 <div class="titlepage">
3442 <div>
3443 <div>
3444 <h4 class="title"><a id="cg-with-jitsupport"></a>2.3.2. Callgraph and JIT support</h4>
3445 </div>
3446 </div>
3447 </div>
3448 <p>
Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading.
For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory
named <code class="filename">{anon:anon}/&lt;tgid&gt;.&lt;begin_addr&gt;.&lt;end_addr&gt;</code>.
As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java,
OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code.
However, when viewing callgraph output, any anonymously mapped code in the callstack
will be attributed to <code class="filename">anon (tgid:&lt;tgid&gt; range:&lt;begin_addr&gt;-&lt;end_addr&gt;)</code>,
even if a <code class="filename">.jo</code> ELF file had been created for it. See the example below.
3457</p>
3458 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3459 <tr>
3460 <td>
3461 <pre class="screen">
-------------------------------------------------------------------------------
  1         2.2727  libj9ute23.so                              java.bin  traceV
  2         4.5455  libj9ute23.so                              java.bin  utsTraceV
  4         9.0909  libj9trc23.so                              java.bin  fillInUTInterfaces
  37       84.0909  libj9trc23.so                              java.bin  twGetSequenceCounter
8           0.0154  libj9prt23.so                              java.bin  j9time_hires_clock
  27       61.3636  anon (tgid:10014 range:0x100000-0x103000)  java.bin  (no symbols)
  9        20.4545  libc-2.4.so                                java.bin  gettimeofday
  8        18.1818  libj9prt23.so                              java.bin  j9time_hires_clock [self]
-------------------------------------------------------------------------------
</pre>
3473 </td>
3474 </tr>
3475 </table>
3476 <p>
3477The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of
3478<code class="code">j9time_hires_clock</code>, even though the ELF file <code class="filename">10014.jo</code> was
3479created for this profile run. Unfortunately, there is currently no way to correlate
3480that anonymous callgraph entry with its corresponding <code class="filename">.jo</code> file.
3481</p>
3482 </div>
3483 </div>
3484 <div class="sect2" lang="en" xml:lang="en">
3485 <div class="titlepage">
3486 <div>
3487 <div>
3488 <h3 class="title"><a id="opreport-diff"></a>2.4. Differential profiles with <span><strong class="command">opreport</strong></span></h3>
3489 </div>
3490 </div>
3491 </div>
3492 <p>
Often, we'd like to be able to compare two profiles. For example, when
analysing the performance of an application, we'd like to make code
changes and examine the effect of the change. This is supported in
<span><strong class="command">opreport</strong></span> by giving a profile specification that
identifies two different profiles. The general form is:
3498</p>
3499 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3500 <tr>
3501 <td>
3502 <pre class="screen">
$ opreport &lt;shared-spec&gt; { &lt;first-profile&gt; } { &lt;second-profile&gt; }
</pre>
3505 </td>
3506 </tr>
3507 </table>
3508 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
3509 <h3 class="title">Note</h3>
3510 <p>
3511We lost our Dragon book down the back of the sofa, so you have to be
3512careful to have spaces around those braces, or things will get
3513hopelessly confused. We can only apologise.
3514</p>
3515 </div>
3516 <p>
For each of the two profiles, the shared section is prepended to that profile's
sub-specification, and the combined specification is then analysed. The usual parameters work both within the
shared section and in the sub-specifications within the curly braces.
3520</p>
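 <p>
The need for spaces around the braces is easier to see if you think of each brace as a separate
command-line word: only then can it be told apart from the rest of the specification. The C sketch below
splits an argument list of that form into the two combined specifications, prefixing the shared terms to
each. The <code class="type">spec</code> struct and <code class="function">split_diff_spec()</code> are
hypothetical illustrations of the rule just described, not <span><strong class="command">opreport</strong></span>'s
actual option parser.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define MAX_TERMS 16

/* A profile specification as a list of terms such as "archive:./orig". */
struct spec {
    const char *terms[MAX_TERMS];
    int n;
};

static void add_term(struct spec *s, const char *t)
{
    if (s-&gt;n &lt; MAX_TERMS)
        s-&gt;terms[s-&gt;n++] = t;
}

/* Hypothetical helper: collect the shared terms and the two "{ ... }" groups,
 * then build each profile's specification as shared terms + its own group.
 * Braces are recognised only as separate words; returns -1 on bad nesting. */
static int split_diff_spec(int argc, char **argv,
                           struct spec *first, struct spec *second)
{
    struct spec shared = { .n = 0 };
    struct spec groups[2] = { { .n = 0 }, { .n = 0 } };
    int group = -1, seen = 0, i;

    for (i = 0; i &lt; argc; i++) {
        if (strcmp(argv[i], "{") == 0) {
            if (group != -1 || seen == 2)
                return -1;                /* nested or third group */
            group = seen++;
        } else if (strcmp(argv[i], "}") == 0) {
            if (group == -1)
                return -1;                /* unmatched closing brace */
            group = -1;
        } else {
            add_term(group == -1 ? &amp;shared : &amp;groups[group], argv[i]);
        }
    }
    if (group != -1 || seen != 2)
        return -1;                        /* unclosed brace or missing group */

    first-&gt;n = 0;
    second-&gt;n = 0;
    for (i = 0; i &lt; shared.n; i++) {
        add_term(first, shared.terms[i]);
        add_term(second, shared.terms[i]);
    }
    for (i = 0; i &lt; groups[0].n; i++)
        add_term(first, groups[0].terms[i]);
    for (i = 0; i &lt; groups[1].n; i++)
        add_term(second, groups[1].terms[i]);
    return 0;
}

int main(void)
{
    /* the specification part of: opreport -l /bin/bash { archive:./orig } { } */
    char *args[] = { "/bin/bash", "{", "archive:./orig", "}", "{", "}" };
    struct spec a, b;
    int i;

    if (split_diff_spec(6, args, &amp;a, &amp;b) != 0)
        return 1;
    for (i = 0; i &lt; a.n; i++)
        printf("first:  %s\n", a.terms[i]);
    for (i = 0; i &lt; b.n; i++)
        printf("second: %s\n", b.terms[i]);
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
The first specification becomes <code class="filename">/bin/bash</code> plus
<code class="option">archive:./orig</code>, and the empty second group leaves just the shared term. Written
without spaces, <code class="option">{archive:./orig}</code> would arrive as a single word and be treated
as an ordinary term, and the sketch would then reject the command line for lacking two groups.
</p>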
3521 <p>
3522A typical way to use this feature is with archives created with
3523<span><strong class="command">oparchive</strong></span>. Let's look at an example:
3524</p>
3525 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3526 <tr>
3527 <td>
3528 <pre class="screen">
$ ./a
$ oparchive -o orig ./a
$ opcontrol --reset
  # edit and recompile a
$ ./a
  # now compare the current profile of a with the archived profile
$ opreport -xl ./a { archive:./orig } { }
CPU: PIII, speed 863.233 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        diff %    symbol name
92435    48.5366  +0.4999   a
54226    ---      ---       c
49222    25.8459  +++       d
48787    25.6175  -2.2e-01  b
</pre>
3545 </td>
3546 </tr>
3547 </table>
3548 <p>
3549Note that we specified an empty second profile in the curly braces, as
3550we wanted to use the current session; alternatively, we could
3551have specified another archive, or a tgid etc. We specified the binary
3552<span><strong class="command">a</strong></span> in the shared section, so we matched that in both
3553the profiles we're diffing.
3554</p>
3555 <p>
3556As in the normal output, the results are sorted by the number of
3557samples, and the percentage field represents the relative percentage of
3558the symbol's samples in the second profile.
3559</p>
3560 <p>
Notice the new "diff %" column in the output. This value represents the
percentage change of the relative percent between the first and the
second profile: roughly, "how much more important this symbol is".
Looking at the symbol <code class="function">a()</code>, we can see that it took
roughly the same share of the total profile in both the first and the
second profile. The function <code class="function">c()</code> was not in the new
profile, so it has been marked with <code class="function">---</code>. Note that the
sample value is the number of samples in the first profile; since we're
displaying results for the second profile, we don't list a percentage
value for it, as it would be meaningless. <code class="function">d()</code> is
new in the second profile, and consequently marked with
<code class="function">+++</code>.
3573</p>
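 <p>
One natural reading of "percentage change of the relative percent" is the usual relative change,
100 * (p2 - p1) / p1, where p1 and p2 are the symbol's percentages in the first and second profiles.
The C sketch below computes the column under that assumption (the exact formula is not spelled out here),
and emits the "---" and "+++" markers for symbols present in only one profile. The helper name
<code class="function">format_diff()</code> and the convention of a negative percentage meaning "absent"
are hypothetical.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;

/* Assumed formula: relative change of a symbol's share between profiles.
 * A negative percentage means the symbol is absent from that profile. */
static void format_diff(double old_pct, double new_pct, char *buf, size_t len)
{
    if (old_pct &lt; 0.0)
        snprintf(buf, len, "+++");        /* only in the second profile */
    else if (new_pct &lt; 0.0)
        snprintf(buf, len, "---");        /* only in the first profile */
    else
        snprintf(buf, len, "%+.4f", 100.0 * (new_pct - old_pct) / old_pct);
}

int main(void)
{
    char buf[32];

    format_diff(40.0, 42.0, buf, sizeof(buf));   /* share grew from 40% to 42% */
    printf("example: %s\n", buf);
    format_diff(-1.0, 25.8459, buf, sizeof(buf)); /* like d(): new symbol */
    printf("d: %s\n", buf);
    return 0;
}
</pre>
 </td>
 </tr>
 </table>
 <p>
A symbol whose share grows from 40% to 42% gets +5.0000 under this formula. Note that
<span><strong class="command">opreport</strong></span>'s own number formatting differs (for example,
<code class="function">b()</code> is shown as -2.2e-01).
</p>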
3574 <p>
3575When comparing profiles between different binaries, it should be clear
3576that functions can change in terms of VMA and size. To avoid this
3577problem, <span><strong class="command">opreport</strong></span> considers a symbol to be the same
3578if the symbol name, image name, and owning application name all match;
3579any other factors are ignored. Note that the check for application name
3580means that trying to compare library profiles between two different
3581applications will not work as you might expect: each symbol will be
3582considered different.
3583</p>
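 <p>
The matching rule can be written as a comparison function. The C sketch below uses a hypothetical
<code class="type">diff_symbol</code> struct (not OProfile's own type): the VMA and size fields take no part
in the comparison, while the owning application name does, which is why library symbols from two different
applications never match.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Hypothetical view of a symbol in a differential profile. */
struct diff_symbol {
    const char *name;       /* symbol name */
    const char *image;      /* image containing the symbol */
    const char *app;        /* owning application */
    unsigned long vma;      /* ignored when matching */
    unsigned long size;     /* ignored when matching */
};

/* Two symbols are "the same" if name, image and application all match. */
static int same_symbol(const struct diff_symbol *a, const struct diff_symbol *b)
{
    return strcmp(a-&gt;name, b-&gt;name) == 0 &amp;&amp;
           strcmp(a-&gt;image, b-&gt;image) == 0 &amp;&amp;
           strcmp(a-&gt;app, b-&gt;app) == 0;
}

int main(void)
{
    struct diff_symbol old_a = { "a", "a", "a", 0x8048400, 64 };
    struct diff_symbol new_a = { "a", "a", "a", 0x8048460, 96 };  /* moved and grew */
    struct diff_symbol lib1 = { "strcpy", "libc.so.6", "/bin/bash", 0x1000, 32 };
    struct diff_symbol lib2 = { "strcpy", "libc.so.6", "/usr/bin/as", 0x1000, 32 };

    printf("a() matches: %d\n", same_symbol(&amp;old_a, &amp;new_a));
    printf("strcpy() across applications matches: %d\n", same_symbol(&amp;lib1, &amp;lib2));
    return 0;
}
</pre>
 </td>
 </tr>
 </table>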
3584 </div>
3585 <div class="sect2" lang="en" xml:lang="en">
3586 <div class="titlepage">
3587 <div>
3588 <div>
3589 <h3 class="title"><a id="opreport-anon"></a>2.5. Anonymous executable mappings</h3>
3590 </div>
3591 </div>
3592 </div>
3593 <p>
3594Many applications, typically ones involving dynamic compilation into
3595machine code (just-in-time, or "JIT", compilation), have executable mappings that
3596are not backed by an ELF file. <span><strong class="command">opreport</strong></span> has basic support for showing the
3597samples taken in these regions; for example:
3598</p>
3599 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3600 <tr>
3601 <td>
3602 <pre class="screen">
$ opreport /usr/bin/mono -l
CPU: ppc64 POWER5, speed 1654.34 MHz (estimated)
Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name                                    symbol name
47       58.7500  mono                                          (no symbols)
14       17.5000  anon (tgid:3189 range:0xf72aa000-0xf72fa000)  (no symbols)
9        11.2500  anon (tgid:3189 range:0xf6cca000-0xf6dd9000)  (no symbols)
. . . .
</pre>
3612 </td>
3613 </tr>
3614 </table>
3617 <p>
3618Note that, since such mappings are dependent upon individual invocations of
3619a binary, these mappings are always listed as a dependent image,
3620even when using <code class="option">--separate=none</code>.
3621Equally, the results are not affected by the <code class="option">--merge</code>
3622option.
3623</p>
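 <p>
Each such mapping is identified by its owning task group and its address range, which is what ties it to
one particular invocation of the binary. The C sketch below builds a label in the form shown in the
output above; the helper name <code class="function">anon_label()</code> is hypothetical, and the exact
formatting (lowercase hexadecimal with a <code class="literal">0x</code> prefix) is inferred from that sample
output only.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
#include &lt;stdio.h&gt;

/* Format an anonymous-mapping label like the one opreport prints.
 * Assumption: lowercase hex with a 0x prefix, as in the sample output.
 * Returns the label length, as snprintf() does. */
static int anon_label(char *buf, size_t len, int tgid,
                      unsigned long start, unsigned long end)
{
    return snprintf(buf, len, "anon (tgid:%d range:0x%lx-0x%lx)", tgid, start, end);
}

int main(void)
{
    char label[64];

    anon_label(label, sizeof(label), 3189, 0xf72aa000UL, 0xf72fa000UL);
    puts(label);   /* same form as the second line of the mono example */
    return 0;
}
</pre>
 </td>
 </tr>
 </table>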
 <p>
As the opreport output above shows, OProfile cannot attribute these samples to any
symbol, because there is no ELF file for this code.
Enhanced support for JITed code is available for some virtual machines,
such as the Java Virtual Machine. For details about OProfile output for
JITed code, see <a href="#getting-jit-reports" title="4. OProfile results with JIT samples">Section 4, &#8220;OProfile results with JIT samples&#8221;</a>.
</p>
3631 <p>For more information about JIT support in OProfile, see <a href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, &#8220;Support for dynamically compiled (JIT) code&#8221;</a>.
3632</p>
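 <p>
The anonymous mappings can also be summarized from a saved report. The shell function
below is a minimal sketch, not part of OProfile: the name <code class="code">anon_mappings</code>
is hypothetical, and it assumes the column layout of the <span><strong class="command">opreport -l</strong></span>
example above (sample count, percentage, then the <code class="code">anon (tgid:... range:...)</code> image name).
For each anonymous mapping it prints the tgid, the start and end addresses, the mapping size in
bytes and the sample count.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Sketch only (not part of OProfile): summarize the anonymous mappings in
# "opreport -l" output read from standard input. For each "anon" line it
# prints: tgid, start address, end address, size in bytes, sample count.
# Assumes the column layout of the example report above.
anon_mappings() {
    awk '
    # Convert a hexadecimal string such as 0xf72aa000 to a number.
    function hex(s,    v, i) {
        s = tolower(s)
        sub(/^0x/, "", s)
        v = 0
        for (i = 1; length(s) >= i; i++)
            v = v * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return v
    }
    $3 == "anon" {
        tgid = $4
        sub(/^\(tgid:/, "", tgid)       # "(tgid:3189" becomes "3189"
        range = $5
        sub(/^range:/, "", range)       # drop the "range:" label
        sub(/\)$/, "", range)           # and the closing parenthesis
        split(range, r, "-")
        printf "%s %s %s %.0f %s\n", tgid, r[1], r[2], hex(r[2]) - hex(r[1]), $1
    }'
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
For example, <code class="code">opreport /usr/bin/mono -l | anon_mappings</code> would report the first
mapping shown above as <code class="code">3189 0xf72aa000 0xf72fa000 327680 14</code>, that is, a
0x50000-byte region holding 14 samples.
</p>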
3633 </div>
3634 <div class="sect2" lang="en" xml:lang="en">
3635 <div class="titlepage">
3636 <div>
3637 <div>
3638 <h3 class="title"><a id="opreport-xml"></a>2.6. XML formatted output</h3>
3639 </div>
3640 </div>
3641 </div>
 <p>
The <code class="option">--xml</code> option can be used to generate XML instead of the usual
text format. This lets opreport avoid some of the constraints imposed by the
two-dimensional text format; for example, the sample data can be separated
across multiple events, CPUs and threads. The XML schema implemented by opreport
is found in <code class="filename">doc/opreport.xsd</code>, which also contains
more detailed comments about the structure of the XML generated by opreport.
</p>
 <p>
Since the XML is consumed by a client program rather than read by a user, its structure
is fairly static. In particular, the <code class="option">--sort</code> option is incompatible
with the <code class="option">--xml</code> option. Percentages are not displayed in the XML, so the
options related to percentages have no effect. Full pathnames are always used in the XML, so
<code class="option">--long-filenames</code> is not necessary. The <code class="option">--details</code>
option causes all of the individual sample data to be included in the XML, as well as the
instruction byte stream for each symbol (for doing disassembly), and can result
in very large XML files.
</p>
3660 </div>
3661 <div class="sect2" lang="en" xml:lang="en">
3662 <div class="titlepage">
3663 <div>
3664 <div>
3665 <h3 class="title"><a id="opreport-options"></a>2.7. Options for <span><strong class="command">opreport</strong></span></h3>
3666 </div>
3667 </div>
3668 </div>
3669 <div class="variablelist">
3670 <dl>
3671 <dt>
3672 <span class="term">
3673 <code class="option">--accumulated / -a</code>
3674 </span>
3675 </dt>
3676 <dd>
3677 <p>
3678Accumulate sample and percentage counts in the symbol list.
3679</p>
3680 </dd>
3681 <dt>
3682 <span class="term">
3683 <code class="option">--callgraph / -c</code>
3684 </span>
3685 </dt>
3686 <dd>
3687 <p>
Show call-graph information.
3689</p>
3690 </dd>
3691 <dt>
3692 <span class="term">
3693 <code class="option">--debug-info / -g</code>
3694 </span>
3695 </dt>
3696 <dd>
3697 <p>
3698Show source file and line for each symbol.
3699</p>
3700 </dd>
3701 <dt>
3702 <span class="term">
3703 <code class="option">--demangle / -D none|normal|smart</code>
3704 </span>
3705 </dt>
3706 <dd>
3707 <p>
none: no demangling. normal: use the default demangler (this is the default).
smart: use pattern matching to make demangled C++ symbols more readable.
3710</p>
3711 </dd>
3712 <dt>
3713 <span class="term">
3714 <code class="option">--details / -d</code>
3715 </span>
3716 </dt>
3717 <dd>
3718 <p>
3719Show per-instruction details for all selected symbols. Note that, for
3720binaries without symbol information, the VMA values shown are raw file
3721offsets for the image binary.
3722</p>
3723 </dd>
3724 <dt>
3725 <span class="term">
3726 <code class="option">--exclude-dependent / -x</code>
3727 </span>
3728 </dt>
3729 <dd>
3730 <p>
3731Do not include application-specific images for libraries, kernel modules
3732and the kernel. This option only makes sense if the profile session
3733used --separate.
3734</p>
3735 </dd>
3736 <dt>
3737 <span class="term">
3738 <code class="option">--exclude-symbols / -e [symbols]</code>
3739 </span>
3740 </dt>
3741 <dd>
3742 <p>
3743Exclude all the symbols in the given comma-separated list.
3744</p>
3745 </dd>
3746 <dt>
3747 <span class="term">
3748 <code class="option">--global-percent / -%</code>
3749 </span>
3750 </dt>
3751 <dd>
3752 <p>
3753Make all percentages relative to the whole profile.
3754</p>
3755 </dd>
3756 <dt>
3757 <span class="term">
3758 <code class="option">--help / -? / --usage</code>
3759 </span>
3760 </dt>
3761 <dd>
3762 <p>
3763Show help message.
3764</p>
3765 </dd>
3766 <dt>
3767 <span class="term">
3768 <code class="option">--image-path / -p [paths]</code>
3769 </span>
3770 </dt>
3771 <dd>
3772 <p>
3773Comma-separated list of additional paths to search for binaries.
3774This is needed to find modules in kernels 2.6 and upwards.
3775</p>
3776 </dd>
3777 <dt>
3778 <span class="term">
3779 <code class="option">--root / -R [path]</code>
3780 </span>
3781 </dt>
3782 <dd>
3783 <p>
3784A path to a filesystem to search for additional binaries.
3785</p>
3786 </dd>
3787 <dt>
3788 <span class="term">
3789 <code class="option">--include-symbols / -i [symbols]</code>
3790 </span>
3791 </dt>
3792 <dd>
3793 <p>
3794Only include symbols in the given comma-separated list.
3795</p>
3796 </dd>
3797 <dt>
3798 <span class="term">
3799 <code class="option">--long-filenames / -f</code>
3800 </span>
3801 </dt>
3802 <dd>
3803 <p>
3804Output full paths instead of basenames.
3805</p>
3806 </dd>
3807 <dt>
3808 <span class="term">
3809 <code class="option">--merge / -m [lib,cpu,tid,tgid,unitmask,all]</code>
3810 </span>
3811 </dt>
3812 <dd>
3813 <p>
3814Merge any profiles separated in a --separate session.
3815</p>
3816 </dd>
3817 <dt>
3818 <span class="term">
3819 <code class="option">--no-header</code>
3820 </span>
3821 </dt>
3822 <dd>
3823 <p>
3824Don't output a header detailing profiling parameters.
3825</p>
3826 </dd>
3827 <dt>
3828 <span class="term">
3829 <code class="option">--output-file / -o [file]</code>
3830 </span>
3831 </dt>
3832 <dd>
3833 <p>
3834Output to the given file instead of stdout.
3835</p>
3836 </dd>
3837 <dt>
3838 <span class="term">
3839 <code class="option">--reverse-sort / -r</code>
3840 </span>
3841 </dt>
3842 <dd>
3843 <p>
3844Reverse the sort from the default.
3845</p>
3846 </dd>
3847 <dt>
3848 <span class="term"><code class="option">--session-dir=</code>dir_path</span>
3849 </dt>
3850 <dd>
3851 <p>
Use the sample database in directory <code class="filename">dir_path</code>
instead of the default location (<code class="filename">/var/lib/oprofile</code>).
3854</p>
3855 </dd>
3856 <dt>
3857 <span class="term">
3858 <code class="option">--show-address / -w</code>
3859 </span>
3860 </dt>
3861 <dd>
3862 <p>
3863Show the VMA address of each symbol (off by default).
3864</p>
3865 </dd>
3866 <dt>
3867 <span class="term">
3868 <code class="option">--sort / -s [vma,sample,symbol,debug,image]</code>
3869 </span>
3870 </dt>
3871 <dd>
3872 <p>
Sort the list of symbols by, respectively, symbol address (vma),
number of samples (sample), symbol name (symbol), debug filename and line
number (debug) or binary image filename (image).
3876</p>
3877 </dd>
3878 <dt>
3879 <span class="term">
3880 <code class="option">--symbols / -l</code>
3881 </span>
3882 </dt>
3883 <dd>
3884 <p>
3885List per-symbol information instead of a binary image summary.
3886</p>
3887 </dd>
3888 <dt>
3889 <span class="term">
3890 <code class="option">--threshold / -t [percentage]</code>
3891 </span>
3892 </dt>
3893 <dd>
3894 <p>
3895Only output data for symbols that have more than the given percentage
3896of total samples.
3897</p>
3898 </dd>
3899 <dt>
3900 <span class="term">
3901 <code class="option">--verbose / -V [options]</code>
3902 </span>
3903 </dt>
3904 <dd>
3905 <p>
3906Give verbose debugging output.
3907</p>
3908 </dd>
3909 <dt>
3910 <span class="term">
3911 <code class="option">--version / -v</code>
3912 </span>
3913 </dt>
3914 <dd>
3915 <p>
3916Show version.
3917</p>
3918 </dd>
3919 <dt>
3920 <span class="term">
3921 <code class="option">--xml / -X</code>
3922 </span>
3923 </dt>
3924 <dd>
3925 <p>
3926Generate XML output.
3927</p>
3928 </dd>
3929 </dl>
3930 </div>
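 <p>
If you only have a saved text report, the running totals that <code class="option">--accumulated</code>
describes can be approximated afterwards. The following is a minimal sketch, not OProfile code: the
function name <code class="code">accumulate</code> is hypothetical, and it assumes that symbol lines start
with the sample count and percentage columns, as in the anonymous-mapping example above. Each such line
gets the cumulative sample count and cumulative percentage appended; other lines pass through unchanged.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Sketch only (not part of OProfile): append a running sample total and a
# running percentage to each symbol line of saved "opreport -l" output read
# from standard input. Lines that do not start with a sample count and a
# percentage (headers, for example) are printed unchanged.
accumulate() {
    awk '
    $1 ~ /^[0-9]+$/ {
        if ($2 ~ /^[0-9.]+$/) {
            samples += $1
            percent += $2
            printf "%s %d %.4f\n", $0, samples, percent
            next
        }
    }
    { print }'
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
Because the stored percentages are already rounded, their running sum can drift slightly from the
exact cumulative figure; the cumulative sample column is exact.
</p>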
3931 </div>
3932 </div>
3933 <div class="sect1" lang="en" xml:lang="en">
3934 <div class="titlepage">
3935 <div>
3936 <div>
3937 <h2 class="title" style="clear: both"><a id="opannotate"></a>3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</h2>
3938 </div>
3939 </div>
3940 </div>
 <p>
The <span><strong class="command">opannotate</strong></span> utility generates annotated source files or assembly listings, optionally
mixed with source.
To see annotated source, the profiled application must have debug information, and the source files
must be reachable through the paths recorded in that debug information. With GCC, this means compiling
with the <code class="option">-g</code> option.
If the binary doesn't contain sufficient debug information, you can still
use <span><strong class="command">opannotate <code class="option">--assembly</code></strong></span> to get annotated assembly.
</p>
 <p>
Note that, for the reasons explained in <a href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, &#8220;Hardware performance counters&#8221;</a>, the results can be
inaccurate. The debug information itself can introduce further problems; for example, the line number recorded for a symbol can be
incorrect. The compiler can also reorder and move assembly instructions, which can lead to
source lines being credited with samples that are not really "owned" by them. See also
<a href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a>.
</p>
 <p>
With the <code class="option">--source</code> option, you can output the annotation to a single file
containing all the source found. You can combine this with <code class="option">--assembly</code>
to get mixed source/assembly output.
</p>
 <p>
Alternatively, with <code class="option">--output-dir</code>, you can output a directory of annotated
source files that mirrors the structure of the original sources. Each line in the annotated source is
prefixed with the sample count for that line, and each symbol is additionally annotated with details
for the symbol as a whole. For example:
</p>
3968 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3969 <tr>
3970 <td>
 <pre class="screen">
$ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled
$ ls annotated/home/moz/src/oprofile-pp/daemon/
opd_cookie.h  opd_image.c  opd_kernel.c  opd_sample_files.c  oprofiled.c
</pre>
3976 </td>
3977 </tr>
3978 </table>
 <p>
Line numbers are preserved in the annotated source files, but each file has
a footer appended describing the profiling details. The actual annotation
looks something like this:
</p>
3984 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
3985 <tr>
3986 <td>
 <pre class="screen">
...
               :static uint64_t pop_buffer_value(struct transient * trans)
 11510  1.9661 :{ /* pop_buffer_value total: 89901 15.3566 */
               :        uint64_t val;
               :
 10227  1.7469 :        if (!trans-&gt;remaining) {
               :                fprintf(stderr, "BUG: popping empty buffer !\n");
               :                exit(EXIT_FAILURE);
               :        }
               :
               :        val = get_buffer_value(trans-&gt;buffer, 0);
  2281  0.3896 :        trans-&gt;remaining--;
  2296  0.3922 :        trans-&gt;buffer += kernel_pointer_size;
               :        return val;
 10454  1.7857 :}
...
</pre>
4005 </td>
4006 </tr>
4007 </table>
 <p>
The first number on each line is the number of samples, while the second is
that count as a percentage of the total number of samples.
</p>
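 <p>
To pick out the hottest lines of an annotated source file, you can filter on these two columns.
This is a minimal sketch, not part of OProfile: the function name <code class="code">hot_lines</code> and
its default threshold of 1000 samples are arbitrary choices, and it assumes the
"samples percentage :source" layout shown above. Only the text before the first colon is examined,
so colons in the source code itself are harmless.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Sketch only (not part of OProfile): print the lines of an opannotate
# --source listing (read from standard input) whose sample count is at
# least MIN (first argument, default 1000). Output: samples, percentage,
# then the source text after the ":" separator.
hot_lines() {
    awk -v min="${1:-1000}" '
    {
        c = index($0, ":")
        if (c == 0)
            next                      # not an annotated source line
        n = split(substr($0, 1, c - 1), f, " ")
        if (n == 2)                   # line carries a count and a percentage
            if (f[1] + 0 >= min + 0)
                printf "%s %s %s\n", f[1], f[2], substr($0, c + 1)
    }'
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
For the excerpt above, a threshold of 10000 would keep the three lines with 11510, 10227 and 10454
samples; piping the result through <span><strong class="command">sort -rn</strong></span> orders it by sample count.
</p>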
4012 <div class="sect2" lang="en" xml:lang="en">
4013 <div class="titlepage">
4014 <div>
4015 <div>
4016 <h3 class="title"><a id="opannotate-finding-source"></a>3.1. Locating source files</h3>
4017 </div>
4018 </div>
4019 </div>
 <p>
<span><strong class="command">opannotate</strong></span> needs to be able to locate the source files
for the binary image(s) in order to produce output. Some binary images have debug
information in which the source file paths are relative, not absolute. You can
specify search paths for these files (similar to <span><strong class="command">gdb</strong></span>'s
<code class="option">dir</code> command) with the <code class="option">--search-dirs</code> option.
</p>
 <p>
Sometimes a binary image gives absolute paths for its source files,
but the actual sources are elsewhere (commonly, you've installed an SRPM for
a binary on your system and want annotation from an existing profile). You can
use the <code class="option">--base-dirs</code> option to redirect OProfile to look
elsewhere for source files. For example, suppose the debug information of a binary gives a source
file as <code class="filename">/tmp/build/libfoo/foo.c</code>,
and the source tree matching that binary is installed in <code class="filename">/home/user/libfoo/</code>.
You can redirect OProfile to find <code class="filename">foo.c</code> correctly like this:
</p>
4037 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4038 <tr>
4039 <td>
 <pre class="screen">
$ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so
</pre>
4043 </td>
4044 </tr>
4045 </table>
4046 <p>
4047You can specify multiple (comma-separated) paths to both options.
4048</p>
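 <p>
The lookup that <code class="option">--base-dirs</code> and <code class="option">--search-dirs</code> perform
can be modelled in a few lines of shell. This is a hypothetical sketch of the behaviour described above,
not opannotate's actual implementation: <code class="code">resolve_source</code> strips the first matching
base-dir prefix from the path recorded in the debug information, then returns the first candidate that
exists under one of the search dirs. Paths containing commas or glob characters are not handled.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Hypothetical model of the lookup described for --base-dirs and
# --search-dirs (not opannotate's real implementation).
#   $1  source path recorded in the debug information
#   $2  comma-separated prefixes to strip (as for --base-dirs)
#   $3  comma-separated directories to search (as for --search-dirs)
# Prints the first candidate that exists and returns 0; returns 1 otherwise.
resolve_source() {
    debug_path=$1
    rel=$1
    old_ifs=$IFS
    IFS=,
    # Strip the first base dir that is a directory prefix of the path.
    for base in $2; do
        case $debug_path in
            "${base%/}"/*) rel=${debug_path#"${base%/}"/}; break ;;
        esac
    done
    # Look for the remaining path in each search dir in turn.
    for dir in $3; do
        if [ -f "${dir%/}/$rel" ]; then
            IFS=$old_ifs
            printf '%s\n' "${dir%/}/$rel"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
With the directories from the example above, <code class="code">resolve_source /tmp/build/libfoo/foo.c /tmp/build/libfoo/ /home/user/libfoo/</code>
prints <code class="filename">/home/user/libfoo/foo.c</code> if that file exists.
</p>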
4049 </div>
4050 <div class="sect2" lang="en" xml:lang="en">
4051 <div class="titlepage">
4052 <div>
4053 <div>
4054 <h3 class="title"><a id="opannotate-details"></a>3.2. Usage of <span><strong class="command">opannotate</strong></span></h3>
4055 </div>
4056 </div>
4057 </div>
4058 <div class="variablelist">
4059 <dl>
4060 <dt>
4061 <span class="term">
4062 <code class="option">--assembly / -a</code>
4063 </span>
4064 </dt>
4065 <dd>
4066 <p>
4067Output annotated assembly. If this is combined with --source, then mixed
4068source / assembly annotations are output.
4069</p>
4070 </dd>
4071 <dt>
4072 <span class="term">
 <code class="option">--base-dirs / -b [paths]</code>
4074 </span>
4075 </dt>
4076 <dd>
4077 <p>
Comma-separated list of path prefixes. This can be used to point OProfile to a
different location for source files when the debug information specifies an
absolute source path that does not exist on your system. The prefix is stripped
from the source file paths given in the debug information, and the remainder is then
looked up in the directories specified by <code class="option">--search-dirs</code>.
4083</p>
4084 </dd>
4085 <dt>
4086 <span class="term">
4087 <code class="option">--demangle / -D none|normal|smart</code>
4088 </span>
4089 </dt>
4090 <dd>
4091 <p>
none: no demangling. normal: use the default demangler (this is the default).
smart: use pattern matching to make demangled C++ symbols more readable.
4094</p>
4095 </dd>
4096 <dt>
4097 <span class="term">
4098 <code class="option">--exclude-dependent / -x</code>
4099 </span>
4100 </dt>
4101 <dd>
4102 <p>
4103Do not include application-specific images for libraries, kernel modules
4104and the kernel. This option only makes sense if the profile session
4105used --separate.
4106</p>
4107 </dd>
4108 <dt>
4109 <span class="term">
4110 <code class="option">--exclude-file [files]</code>
4111 </span>
4112 </dt>
4113 <dd>
4114 <p>
4115Exclude all files in the given comma-separated list of glob patterns.
4116</p>
4117 </dd>
4118 <dt>
4119 <span class="term">
4120 <code class="option">--exclude-symbols / -e [symbols]</code>
4121 </span>
4122 </dt>
4123 <dd>
4124 <p>
4125Exclude all the symbols in the given comma-separated list.
4126</p>
4127 </dd>
4128 <dt>
4129 <span class="term">
4130 <code class="option">--help / -? / --usage</code>
4131 </span>
4132 </dt>
4133 <dd>
4134 <p>
4135Show help message.
4136</p>
4137 </dd>
4138 <dt>
4139 <span class="term">
4140 <code class="option">--image-path / -p [paths]</code>
4141 </span>
4142 </dt>
4143 <dd>
4144 <p>
4145Comma-separated list of additional paths to search for binaries.
4146This is needed to find modules in kernels 2.6 and upwards.
4147</p>
4148 </dd>
4149 <dt>
4150 <span class="term">
4151 <code class="option">--root / -R [path]</code>
4152 </span>
4153 </dt>
4154 <dd>
4155 <p>
4156A path to a filesystem to search for additional binaries.
4157</p>
4158 </dd>
4159 <dt>
4160 <span class="term">
4161 <code class="option">--include-file [files]</code>
4162 </span>
4163 </dt>
4164 <dd>
4165 <p>
4166Only include files in the given comma-separated list of glob patterns.
4167</p>
4168 </dd>
4169 <dt>
4170 <span class="term">
4171 <code class="option">--include-symbols / -i [symbols]</code>
4172 </span>
4173 </dt>
4174 <dd>
4175 <p>
4176Only include symbols in the given comma-separated list.
4177</p>
4178 </dd>
4179 <dt>
4180 <span class="term">
4181 <code class="option">--objdump-params [params]</code>
4182 </span>
4183 </dt>
4184 <dd>
4185 <p>
4186Pass the given parameters as extra values when calling objdump.
4187</p>
4188 </dd>
4189 <dt>
4190 <span class="term">
4191 <code class="option">--output-dir / -o [dir]</code>
4192 </span>
4193 </dt>
4194 <dd>
4195 <p>
4196Output directory. This makes opannotate output one annotated file for each
4197source file. This option can't be used in conjunction with --assembly.
4198</p>
4199 </dd>
4200 <dt>
4201 <span class="term">
4202 <code class="option">--search-dirs / -d [paths]</code>
4203 </span>
4204 </dt>
4205 <dd>
4206 <p>
4207Comma-separated list of paths to search for source files. This is useful to find
4208source files when the debug information only contains relative paths.
4209</p>
4210 </dd>
4211 <dt>
4212 <span class="term">
4213 <code class="option">--source / -s</code>
4214 </span>
4215 </dt>
4216 <dd>
4217 <p>
4218Output annotated source. This requires debugging information to be available
4219for the binaries.
4220</p>
4221 </dd>
4222 <dt>
4223 <span class="term">
4224 <code class="option">--threshold / -t [percentage]</code>
4225 </span>
4226 </dt>
4227 <dd>
4228 <p>
4229Only output data for symbols that have more than the given percentage
4230of total samples.
4231</p>
4232 </dd>
4233 <dt>
4234 <span class="term">
4235 <code class="option">--verbose / -V [options]</code>
4236 </span>
4237 </dt>
4238 <dd>
4239 <p>
4240Give verbose debugging output.
4241</p>
4242 </dd>
4243 <dt>
4244 <span class="term">
4245 <code class="option">--version / -v</code>
4246 </span>
4247 </dt>
4248 <dd>
4249 <p>
4250Show version.
4251</p>
4252 </dd>
4253 </dl>
4254 </div>
4255 </div>
4256 </div>
4257 <div class="sect1" lang="en" xml:lang="en">
4258 <div class="titlepage">
4259 <div>
4260 <div>
4261 <h2 class="title" style="clear: both"><a id="getting-jit-reports"></a>4. OProfile results with JIT samples</h2>
4262 </div>
4263 </div>
4264 </div>
4265 <p>
4266 After profiling a Java (or other supported VM) application, the command
4267 </p>
4268 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4269 <tr>
4270 <td>
 <pre class="screen"><span xmlns="http://www.w3.org/1999/xhtml"><strong class="command">opcontrol --dump</strong></span> </pre>
4272 </td>
4273 </tr>
4274 </table>
4275 <p>
4276 flushes the sample buffers and creates ELF binaries from the
4277 intermediate files that were written by the agent library.
4278 The ELF binaries are named <code class="filename">&lt;tgid&gt;.jo</code>.
4279 With the symbol information stored in these ELF files, it is
4280 possible to map samples to the appropriate symbols.
4281 </p>
4282 <p>
4283 The usual analysis tools (<span><strong class="command">opreport</strong></span> and/or
4284 <span><strong class="command">opannotate</strong></span>) can now be used
4285 to get symbols and assembly code for the instrumented VM processes.
4286 </p>
4287 <p>
4288Below is an example of a profile report of a Java application that has been
4289instrumented with the provided agent library.
4290</p>
4291 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4292 <tr>
4293 <td>
 <pre class="screen">
$ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java
CPU: Core Solo / Duo, speed 2167 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name   symbol name
186020   50.0523  no-vmlinux   no-vmlinux   (no symbols)
34333     9.2380  7635.jo      java         void test.f1()
19022     5.1182  libc-2.5.so  libc-2.5.so  _IO_file_xsputn@@GLIBC_2.1
18762     5.0483  libc-2.5.so  libc-2.5.so  vfprintf
16408     4.4149  7635.jo      java         void test$HelloThread.run()
16250     4.3724  7635.jo      java         void test$test_1.f2(int)
15303     4.1176  7635.jo      java         void test.f2(int, int)
13252     3.5657  7635.jo      java         void test.f2(int)
5165      1.3897  7635.jo      java         void test.f4()
955       0.2570  7635.jo      java         void test$HelloThread.run()~
</pre>
4311 </td>
4312 </tr>
4313 </table>
4314 <p>
4315</p>
4316 <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
4317 <h3 class="title">Note</h3>
 <p>
 Depending on the JVM used, certain options of opreport and opannotate
 do not work, because they rely on debug information (e.g. source code line numbers)
 that is not always available. The Sun JVM provides the necessary debug
 information via its JVMTI and JVMPI interfaces,
 but other JVMs do not.
 </p>
4325 </div>
 <p>
 As you can see in the opreport output, the JIT support agent for Java
 generates symbols that include the class and method signature.
 A symbol with the suffix &#732;&lt;n&gt; (e.g.
 <code class="code">void test$HelloThread.run()&#732;1</code>) is
 the &lt;n&gt;th occurrence of an identical name; this happens when a method is re-JITed.
 A symbol with the suffix %&lt;n&gt; means that the address space of this symbol
 was reused during the sample session (see <a href="#overlapping-symbols" title="6. Overlapping symbols in JITed code">Section 6, &#8220;Overlapping symbols in JITed code&#8221;</a>).
 The value &lt;n&gt; is the percentage of time that this symbol/code was present,
 relative to the total lifetime of all the other overlapping symbols. A symbol of the form
 <code class="code">&lt;return_val&gt; &lt;class_name&gt;$&lt;method_sig&gt;</code> denotes a method of an
 inner class.
 </p>
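 <p>
Because a re-JITed method appears several times in the report, it can be useful to add up its samples
across all occurrences. The following minimal sketch (not part of OProfile; <code class="code">jit_totals</code>
is a hypothetical name) reads <span><strong class="command">opreport -l</strong></span> output, keeps the lines
whose image name ends in <code class="filename">.jo</code>, strips a trailing occurrence suffix and sums the
samples per method. It assumes the column layout of the report above (symbol name starting in the fifth column)
and that the occurrence suffix uses an ASCII tilde, as in that report. Symbols with a %&lt;n&gt; suffix are
deliberately kept separate, since they are different code that reused the same address range.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Sketch only (not part of OProfile): sum the samples of each JITed method
# over all of its re-JITed occurrences. Reads "opreport -l" output on
# standard input, looks only at lines whose image name ends in ".jo", and
# strips a trailing "~N" occurrence suffix (ASCII tilde assumed) before
# grouping. Assumes the symbol name starts in the fifth column.
jit_totals() {
    awk '
    $3 ~ /\.jo$/ {
        sym = $5
        for (i = 6; NF >= i; i++)
            sym = sym " " $i           # rejoin a symbol name containing spaces
        sub(/~[0-9]*$/, "", sym)        # drop the occurrence suffix, if any
        total[sym] += $1
    }
    END {
        for (s in total)
            printf "%d %s\n", total[s], s
    }' | sort -rn
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
For the report above, this would combine the two <code class="code">void test$HelloThread.run()</code>
entries into a single line with 17363 (16408 + 955) samples.
</p>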
4339 </div>
4340 <div class="sect1" lang="en" xml:lang="en">
4341 <div class="titlepage">
4342 <div>
4343 <div>
4344 <h2 class="title" style="clear: both"><a id="opgprof"></a>5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</h2>
4345 </div>
4346 </div>
4347 </div>
 <p>
If you're familiar with the output produced by <span><strong class="command">GNU gprof</strong></span>,
you may find <span><strong class="command">opgprof</strong></span> useful. It takes a single binary
as an argument and produces a <code class="filename">gmon.out</code> file for use
with <span><strong class="command">gprof -p</strong></span>. If call-graph profiling was enabled,
the call-graph information is included as well.
</p>
4355 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4356 <tr>
4357 <td>
4358 <pre class="screen">
4359$ opgprof `which oprofiled` # generates gmon.out file
4360$ gprof -p `which oprofiled` | head
4361Flat profile:
4362
4363Each sample counts as 1 samples.
4364 % cumulative self self total
4365 time samples samples calls T1/call T1/call name
4366 33.13 206237.00 206237.00 odb_insert
4367 22.67 347386.00 141149.00 pop_buffer_value
4368 9.56 406881.00 59495.00 opd_put_sample
4369 7.34 452599.00 45718.00 opd_find_image
4370 7.19 497327.00 44728.00 opd_process_samples
4371</pre>
4372 </td>
4373 </tr>
4374 </table>
4375 <div class="sect2" lang="en" xml:lang="en">
4376 <div class="titlepage">
4377 <div>
4378 <div>
4379 <h3 class="title"><a id="opgprof-details"></a>5.1. Usage of <span><strong class="command">opgprof</strong></span></h3>
4380 </div>
4381 </div>
4382 </div>
4383 <div class="variablelist">
4384 <dl>
4385 <dt>
4386 <span class="term">
4387 <code class="option">--help / -? / --usage</code>
4388 </span>
4389 </dt>
4390 <dd>
4391 <p>
4392Show help message.
4393</p>
4394 </dd>
4395 <dt>
4396 <span class="term">
4397 <code class="option">--image-path / -p [paths]</code>
4398 </span>
4399 </dt>
4400 <dd>
4401 <p>
4402Comma-separated list of additional paths to search for binaries.
4403This is needed to find modules in kernels 2.6 and upwards.
4404</p>
4405 </dd>
4406 <dt>
4407 <span class="term">
4408 <code class="option">--root / -R [path]</code>
4409 </span>
4410 </dt>
4411 <dd>
4412 <p>
4413A path to a filesystem to search for additional binaries.
4414</p>
4415 </dd>
4416 <dt>
4417 <span class="term">
4418 <code class="option">--output-filename / -o [file]</code>
4419 </span>
4420 </dt>
4421 <dd>
4422 <p>
Output to the given file instead of the default, <code class="filename">gmon.out</code>.
4424</p>
4425 </dd>
4426 <dt>
4427 <span class="term">
4428 <code class="option">--threshold / -t [percentage]</code>
4429 </span>
4430 </dt>
4431 <dd>
4432 <p>
4433Only output data for symbols that have more than the given percentage
4434of total samples.
4435</p>
4436 </dd>
4437 <dt>
4438 <span class="term">
4439 <code class="option">--verbose / -V [options]</code>
4440 </span>
4441 </dt>
4442 <dd>
4443 <p>
4444Give verbose debugging output.
4445</p>
4446 </dd>
4447 <dt>
4448 <span class="term">
4449 <code class="option">--version / -v</code>
4450 </span>
4451 </dt>
4452 <dd>
4453 <p>
4454Show version.
4455</p>
4456 </dd>
4457 </dl>
4458 </div>
4459 </div>
4460 </div>
4461 <div class="sect1" lang="en" xml:lang="en">
4462 <div class="titlepage">
4463 <div>
4464 <div>
4465 <h2 class="title" style="clear: both"><a id="oparchive"></a>6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</h2>
4466 </div>
4467 </div>
4468 </div>
4469 <p>
4470 The <span><strong class="command">oparchive</strong></span> utility generates a directory populated
4471 with executable, debug, and oprofile sample files. This directory can be
4472 moved to another machine via <span><strong class="command">tar</strong></span> and analyzed without
4473 further use of the data collection machine.
4474</p>
4475 <p>
4476 The following command would collect the sample files, the executables
4477 associated with the sample files, and the debuginfo files associated
4478 with the executables and copy them into
4479 <code class="filename">/tmp/current_data</code>:
4480</p>
4481 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4482 <tr>
4483 <td>
 <pre class="screen">
# oparchive -o /tmp/current_data
</pre>
4487 </td>
4488 </tr>
4489 </table>
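 <p>
Since the archive is an ordinary directory tree, moving it to the analysis machine only needs
<span><strong class="command">tar</strong></span>. The two helper functions below are a minimal sketch;
their names are hypothetical and nothing in them is specific to OProfile.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Sketch only: helper names are illustrative; plain tar does the work.
# pack_archive DIR TARBALL        (run on the collection machine)
pack_archive() {
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
# unpack_archive TARBALL DESTDIR  (run on the analysis machine)
unpack_archive() {
    mkdir -p "$2"
    tar -xf "$1" -C "$2"
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
For example, <code class="code">pack_archive /tmp/current_data /tmp/current_data.tar</code> on the collection
machine, followed by <code class="code">unpack_archive current_data.tar /tmp</code> after copying the tarball to
the analysis machine, recreates <code class="filename">/tmp/current_data</code> there.
</p>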
4490 <div class="sect2" lang="en" xml:lang="en">
4491 <div class="titlepage">
4492 <div>
4493 <div>
4494 <h3 class="title"><a id="oparchive-details"></a>6.1. Usage of <span><strong class="command">oparchive</strong></span></h3>
4495 </div>
4496 </div>
4497 </div>
4498 <div class="variablelist">
4499 <dl>
4500 <dt>
4501 <span class="term">
4502 <code class="option">--help / -? / --usage</code>
4503 </span>
4504 </dt>
4505 <dd>
4506 <p>
4507Show help message.
4508</p>
4509 </dd>
4510 <dt>
4511 <span class="term">
4512 <code class="option">--exclude-dependent / -x</code>
4513 </span>
4514 </dt>
4515 <dd>
4516 <p>
4517Do not include application-specific images for libraries, kernel modules
4518and the kernel. This option only makes sense if the profile session
4519used --separate.
4520</p>
4521 </dd>
4522 <dt>
4523 <span class="term">
4524 <code class="option">--image-path / -p [paths]</code>
4525 </span>
4526 </dt>
4527 <dd>
4528 <p>
4529Comma-separated list of additional paths to search for binaries.
4530This is needed to find modules in kernels 2.6 and upwards.
4531</p>
4532 </dd>
4533 <dt>
4534 <span class="term">
4535 <code class="option">--root / -R [path]</code>
4536 </span>
4537 </dt>
4538 <dd>
4539 <p>
4540A path to a filesystem to search for additional binaries.
4541</p>
4542 </dd>
4543 <dt>
4544 <span class="term">
4545 <code class="option">--output-directory / -o [directory]</code>
4546 </span>
4547 </dt>
4548 <dd>
4549 <p>
Output to the given directory. There is no default, so this option must be specified.
4551</p>
4552 </dd>
4553 <dt>
4554 <span class="term">
4555 <code class="option">--list-files / -l</code>
4556 </span>
4557 </dt>
4558 <dd>
4559 <p>
Only list the files that would be archived; don't copy them.
4561</p>
4562 </dd>
4563 <dt>
4564 <span class="term">
4565 <code class="option">--verbose / -V [options]</code>
4566 </span>
4567 </dt>
4568 <dd>
4569 <p>
4570Give verbose debugging output.
4571</p>
4572 </dd>
4573 <dt>
4574 <span class="term">
4575 <code class="option">--version / -v</code>
4576 </span>
4577 </dt>
4578 <dd>
4579 <p>
4580Show version.
4581</p>
4582 </dd>
4583 </dl>
4584 </div>
4585 </div>
4586 </div>
4587 <div class="sect1" lang="en" xml:lang="en">
4588 <div class="titlepage">
4589 <div>
4590 <div>
4591 <h2 class="title" style="clear: both"><a id="opimport"></a>7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</h2>
4592 </div>
4593 </div>
4594 </div>
 <p>
 This utility converts sample database files from a foreign binary format (ABI) to
 the native format. This is useful only when moving sample files between hosts
 for analysis on a platform other than the one used for collection. The ABI
 of the file to be imported is described in a text file located in <code class="filename">$SESSION_DIR/abi</code>.
</p>
 <p>
 The following command converts the input sample file to the
 output sample file, using the given abi file as a binary description
 of the input file and the current platform's ABI as a binary description
 of the output file.
</p>
4607 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4608 <tr>
4609 <td>
 <pre class="screen">
# opimport -a /var/lib/oprofile/abi -o /tmp/current/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all
</pre>
4613 </td>
4614 </tr>
4615 </table>
4616 <div class="sect2" lang="en" xml:lang="en">
4617 <div class="titlepage">
4618 <div>
4619 <div>
4620 <h3 class="title"><a id="opimport-details"></a>7.1. Usage of <span><strong class="command">opimport</strong></span></h3>
4621 </div>
4622 </div>
4623 </div>
4624 <div class="variablelist">
4625 <dl>
4626 <dt>
4627 <span class="term">
4628 <code class="option">--help / -? / --usage</code>
4629 </span>
4630 </dt>
4631 <dd>
4632 <p>
4633Show help message.
4634</p>
4635 </dd>
4636 <dt>
4637 <span class="term">
4638 <code class="option">--abi / -a [filename]</code>
4639 </span>
4640 </dt>
4641 <dd>
4642 <p>
Location of the ABI description file for the input sample file.
4644</p>
4645 </dd>
4646 <dt>
4647 <span class="term">
4648 <code class="option">--force / -f</code>
4649 </span>
4650 </dt>
4651 <dd>
4652 <p>
Force conversion even if the input and output ABIs are identical.
4654</p>
4655 </dd>
4656 <dt>
4657 <span class="term">
4658 <code class="option">--output / -o [filename]</code>
4659 </span>
4660 </dt>
4661 <dd>
4662 <p>
Specify the output filename. If the output file already exists, it is not
overwritten; instead, the imported data are accumulated into it. Sample filenames
carry information used by the post-profiling tools and must be preserved; in other
words, the part of the pathname starting at the first path component containing a
'{' must be kept as-is in the output filename.
4668</p>
4669 </dd>
4670 <dt>
4671 <span class="term">
4672 <code class="option">--verbose / -V</code>
4673 </span>
4674 </dt>
4675 <dd>
4676 <p>
4677Give verbose debugging output.
4678</p>
4679 </dd>
4680 <dt>
4681 <span class="term">
4682 <code class="option">--version / -v</code>
4683 </span>
4684 </dt>
4685 <dd>
4686 <p>
4687Show version.
4688</p>
4689 </dd>
4690 </dl>
4691 </div>
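 <p>
When choosing the <code class="option">--output</code> filename, the part of the sample path starting at the
first component that contains a '{' has to be carried over unchanged. The function below is a hypothetical
helper, not part of OProfile, that builds such a name under a new directory; the sample path in the usage
note is made up for illustration.
</p>
 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
 <tr>
 <td>
 <pre class="screen">
```shell
# Hypothetical helper (not part of OProfile): build an opimport --output
# name that keeps everything from the first path component containing "{"
# unchanged, placed under a new directory.
#   $1  input sample file path
#   $2  directory to put in front of the preserved part
# Returns 1 if the input path contains no "{".
import_target() {
    printf '%s\n' "$1" | awk -v root="$2" '
    {
        p = index($0, "{")
        if (p == 0)
            exit 1
        head = substr($0, 1, p - 1)
        sub("[^/]*$", "", head)       # back up to the start of that component
        sub("/+$", "", root)          # avoid a doubled slash
        print root "/" substr($0, length(head) + 1)
    }'
}
```
</pre>
 </td>
 </tr>
 </table>
 <p>
For instance, <code class="code">import_target '/mnt/foreign/samples/current/{root}/usr/bin/prog/SAMPLEFILE' /tmp/current/samples/current</code>
prints <code class="filename">/tmp/current/samples/current/{root}/usr/bin/prog/SAMPLEFILE</code>, which could then
be passed to <code class="option">-o</code>.
</p>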
4692 </div>
4693 </div>
4694 </div>
4695 <div class="chapter" lang="en" xml:lang="en">
4696 <div class="titlepage">
4697 <div>
4698 <div>
4699 <h2 class="title"><a id="interpreting"></a>Chapter 5. Interpreting profiling results</h2>
4700 </div>
4701 </div>
4702 </div>
4703 <div class="toc">
4704 <p>
4705 <b>Table of Contents</b>
4706 </p>
4707 <dl>
4708 <dt>
4709 <span class="sect1">
4710 <a href="#irq-latency">1. Profiling interrupt latency</a>
4711 </span>
4712 </dt>
4713 <dt>
4714 <span class="sect1">
4715 <a href="#kernel-profiling">2. Kernel profiling</a>
4716 </span>
4717 </dt>
4718 <dd>
4719 <dl>
4720 <dt>
4721 <span class="sect2">
4722 <a href="#irq-masking">2.1. Interrupt masking</a>
4723 </span>
4724 </dt>
4725 <dt>
4726 <span class="sect2">
4727 <a href="#idle">2.2. Idle time</a>
4728 </span>
4729 </dt>
4730 <dt>
4731 <span class="sect2">
4732 <a href="#kernel-modules">2.3. Profiling kernel modules</a>
4733 </span>
4734 </dt>
4735 </dl>
4736 </dd>
4737 <dt>
4738 <span class="sect1">
4739 <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a>
4740 </span>
4741 </dt>
4742 <dt>
4743 <span class="sect1">
4744 <a href="#debug-info">4. Inaccuracies in annotated source</a>
4745 </span>
4746 </dt>
4747 <dd>
4748 <dl>
4749 <dt>
4750 <span class="sect2">
4751 <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a>
4752 </span>
4753 </dt>
4754 <dt>
4755 <span class="sect2">
4756 <a href="#prologues">4.2. Prologues and epilogues</a>
4757 </span>
4758 </dt>
4759 <dt>
4760 <span class="sect2">
4761 <a href="#inlined-function">4.3. Inlined functions</a>
4762 </span>
4763 </dt>
4764 <dt>
4765 <span class="sect2">
4766 <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a>
4767 </span>
4768 </dt>
4769 </dl>
4770 </dd>
4771 <dt>
4772 <span class="sect1">
4773 <a href="#symbol-without-debug-info">5. Assembly functions</a>
4774 </span>
4775 </dt>
4776 <dt>
4777 <span class="sect1">
4778 <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a>
4779 </span>
4780 </dt>
4781 <dt>
4782 <span class="sect1">
4783 <a href="#hidden-cost">7. Other discrepancies</a>
4784 </span>
4785 </dt>
4786 </dl>
4787 </div>
4788 <p>
The standard caveats of profiling apply in interpreting the results from OProfile:
profile realistic situations, profile different scenarios, profile
for as long a time as possible, avoid system-specific artifacts, and don't trust
the profile data too much. Also bear in mind the comments on the performance
counters above: you <span class="emphasis"><em>cannot</em></span> rely on totally accurate
instruction-level profiling. However, in almost all circumstances the data
can be useful. Ideally a utility such as Intel's VTune would be available to
allow careful instruction-level analysis; go hassle Intel for this, not me ;)
4797</p>
4798 <div class="sect1" lang="en" xml:lang="en">
4799 <div class="titlepage">
4800 <div>
4801 <div>
4802 <h2 class="title" style="clear: both"><a id="irq-latency"></a>1. Profiling interrupt latency</h2>
4803 </div>
4804 </div>
4805 </div>
4806 <p>
4807This is an example of how the latency of delivery of profiling interrupts
4808can impact the reliability of the profiling data. This is pretty much a
4809worst-case-scenario example: these problems are fairly rare.
4810</p>
4811 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4812 <tr>
4813 <td>
4814 <pre class="screen">
4815double fun(double a, double b, double c)
4816{
4817 double result = 0;
4818 for (int i = 0 ; i &lt; 10000; ++i) {
4819 result += a;
4820 result *= b;
4821 result /= c;
4822 }
4823 return result;
4824}
4825</pre>
4826 </td>
4827 </tr>
4828 </table>
4829 <p>
Here the last operation in the loop body (the division) is very costly, and you would expect
the results to reflect that; but they do not (only the instructions inside the loop are shown):
4832</p>
4833 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4834 <tr>
4835 <td>
4836 <pre class="screen">
4837$ opannotate -a -t 10 ./a.out
4838
4839 88 15.38% : 8048337: fadd %st(3),%st
4840 48 8.391% : 8048339: fmul %st(2),%st
4841 68 11.88% : 804833b: fdiv %st(1),%st
4842 368 64.33% : 804833d: inc %eax
4843 : 804833e: cmp $0x270f,%eax
4844 : 8048343: jle 8048337
4845</pre>
4846 </td>
4847 </tr>
4848 </table>
4849 <p>
The problem comes from the x86 hardware: when the counter overflows, the IRQ
is asserted, but several hardware features can delay the NMI.
x86 hardware is synchronous (interrupts are taken between instructions, never during one);
there is also a latency between the IRQ being asserted and the interrupt being taken; and the multiple
execution units and the out-of-order model of modern x86 CPUs also cause
problems. Here is the same function, with source annotation:
4856</p>
4857 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
4858 <tr>
4859 <td>
4860 <pre class="screen">
4861$ opannotate -s -t 10 ./a.out
4862
4863 :double fun(double a, double b, double c)
4864 :{ /* _Z3funddd total: 572 100.0% */
4865 : double result = 0;
4866 368 64.33% : for (int i = 0 ; i &lt; 10000; ++i) {
4867 88 15.38% : result += a;
4868 48 8.391% : result *= b;
4869 68 11.88% : result /= c;
4870 : }
4871 : return result;
4872 :}
4873</pre>
4874 </td>
4875 </tr>
4876 </table>
4877 <p>
The conclusion: don't trust samples coming at the end of a loop,
particularly if the last instruction generated by the compiler is costly. The same
effect can also occur around branches. Always bear in mind that a sample
can be delayed by a few cycles, and so be credited to a later instruction than the
one that caused it. That's a hardware problem and OProfile can do nothing about it.
4883</p>
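<p>
The skid described above can be reproduced with a toy model. The following C++ sketch is only an
illustration under stated assumptions: the cycle costs are made up, the sampling period is a fixed
number of cycles, and the interrupt is assumed to be taken a fixed number of instructions
(<code>skid</code>) after the overflow, whereas real hardware skid varies. <code>Insn</code> and
<code>skid_profile</code> are hypothetical names, not part of OProfile.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One instruction of the loop body; the cycle costs passed to this model are
// made-up illustrative numbers, not measurements.
struct Insn {
    std::string text;
    unsigned cycles;
};

// Toy skid model: the counter overflows once every `period` cycles, but the
// interrupt is only taken `skid` instructions later, so the sample is credited
// to that later instruction. Indices wrap around because the code is a loop:
// the instruction after the final jump is the first one of the body again.
std::vector<unsigned> skid_profile(const std::vector<Insn>& body,
                                   unsigned iterations, unsigned period,
                                   std::size_t skid) {
    assert(period > 0 && !body.empty());
    std::vector<unsigned> samples(body.size(), 0);
    unsigned long long cycles = 0;
    for (unsigned it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < body.size(); ++i) {
            unsigned long long before = cycles;
            cycles += body[i].cycles;
            // Number of counter overflows that happened while instruction i ran.
            unsigned long long overflows = cycles / period - before / period;
            samples[(i + skid) % body.size()] += static_cast<unsigned>(overflows);
        }
    }
    return samples;
}
```

<p>
For example, with a body of fadd (3 cycles), fmul (5), fdiv (40), inc, cmp and jle (1 each),
100 iterations and a period of 1 cycle, a skid of 0 credits fdiv with 4,000 samples, while a skid
of 1 moves all 4,000 of them to the following inc, the same pattern as in the
<span><strong class="command">opannotate</strong></span> output above.
</p>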
4884 </div>
4885 <div class="sect1" lang="en" xml:lang="en">
4886 <div class="titlepage">
4887 <div>
4888 <div>
4889 <h2 class="title" style="clear: both"><a id="kernel-profiling"></a>2. Kernel profiling</h2>
4890 </div>
4891 </div>
4892 </div>
4893 <div class="sect2" lang="en" xml:lang="en">
4894 <div class="titlepage">
4895 <div>
4896 <div>
4897 <h3 class="title"><a id="irq-masking"></a>2.1. Interrupt masking</h3>
4898 </div>
4899 </div>
4900 </div>
4901 <p>
OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4,
Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the
Linux kernel where interrupts are disabled, allowing collection of samples in virtually
all executable code. The RTC, timer interrupt mode, and Itanium 2 collection mechanisms
use maskable interrupts. Thus, these data collection mechanisms have "sample
shadows", or blind spots: regions where no samples will be collected. Typically, the samples that
would have fallen in a shadow are attributed to the code immediately after interrupts are re-enabled.
4909</p>
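<p>
A similar toy model shows where sample shadows come from. This C++ sketch assumes one maskable
interrupt every <code>period</code> cycles and, as a simplification, that at most one interrupt can be
pending while interrupts are disabled. <code>masked_profile</code> is a hypothetical name; this is not
how OProfile or the kernel implements sampling.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a maskable sampling interrupt (hypothetical, simplified):
// - one interrupt is raised every `period` cycles;
// - while irqs_off[i] is true, a raised interrupt stays pending, and at most
//   one can be pending (any further ones are lost);
// - a pending interrupt is taken when the next instruction runs with
//   interrupts enabled, so its sample is credited to that instruction.
std::vector<unsigned> masked_profile(const std::vector<unsigned>& cycles,
                                     const std::vector<bool>& irqs_off,
                                     unsigned period) {
    assert(period > 0 && cycles.size() == irqs_off.size());
    std::vector<unsigned> samples(cycles.size(), 0);
    unsigned long long now = 0;
    bool pending = false;
    for (std::size_t i = 0; i < cycles.size(); ++i) {
        if (pending && !irqs_off[i]) {
            ++samples[i];  // delivered right after interrupts are re-enabled
            pending = false;
        }
        unsigned long long before = now;
        now += cycles[i];
        unsigned long long raised = now / period - before / period;
        if (raised == 0) continue;
        if (irqs_off[i])
            pending = true;  // sample shadow: nothing is recorded here
        else
            samples[i] += static_cast<unsigned>(raised);
    }
    return samples;
}
```

<p>
For four 10-cycle instructions where the middle two run with interrupts disabled and
<code>period</code> is 10, the result is {1, 0, 0, 2}: the disabled region collects nothing, one
interrupt is lost, and the pending one is credited to the first instruction after interrupts are
re-enabled.
</p>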
4910 </div>
4911 <div class="sect2" lang="en" xml:lang="en">
4912 <div class="titlepage">
4913 <div>
4914 <div>
4915 <h3 class="title"><a id="idle"></a>2.2. Idle time</h3>
4916 </div>
4917 </div>
4918 </div>
4919 <p>
4920Your kernel is likely to support halting the processor when a CPU is idle. As
4921the typical hardware events like <code class="constant">CPU_CLK_UNHALTED</code> do not
4922count when the CPU is halted, the kernel profile will not reflect the actual
4923amount of time spent idle. You can change this behaviour by booting with
4924the <code class="option">idle=poll</code> option, which uses a different idle routine. This
4925will appear as <code class="function">poll_idle()</code> in your kernel profile.
4926</p>
4927 </div>
4928 <div class="sect2" lang="en" xml:lang="en">
4929 <div class="titlepage">
4930 <div>
4931 <div>
4932 <h3 class="title"><a id="kernel-modules"></a>2.3. Profiling kernel modules</h3>
4933 </div>
4934 </div>
4935 </div>
4936 <p>
4937OProfile profiles kernel modules by default. However, there are a couple of problems
4938you may have when trying to get results. First, you may have booted via an initrd;
4939this means that the actual path for the module binaries cannot be determined automatically.
4940To get around this, you can use the <code class="option">-p</code> option to the profiling tools
4941to specify where to look for the kernel modules.
4942</p>
4943 <p>
In 2.6 kernels, the information on where kernel module binaries are located has been removed.
This means OProfile needs guidance from the <code class="option">-p</code> option to find your
4946modules. Normally, you can just use your standard module top-level directory for this.
4947Note that due to this problem, OProfile cannot check that the modification times match;
4948it is your responsibility to make sure you do not modify a binary after a profile
4949has been created.
4950</p>
4951 <p>
4952If you have run <span><strong class="command">insmod</strong></span> or <span><strong class="command">modprobe</strong></span> to insert a module
4953in a particular directory, it is important that you specify this directory with the
<code class="option">-p</code> option first, so that it overrides an older module binary that might
exist in other directories you've specified with <code class="option">-p</code>. It is up to you
to make sure that these values are correct: 2.6 kernels simply do not provide enough
information for OProfile to work this out itself.
4958</p>
4959 </div>
4960 </div>
4961 <div class="sect1" lang="en" xml:lang="en">
4962 <div class="titlepage">
4963 <div>
4964 <div>
4965 <h2 class="title" style="clear: both"><a id="interpreting-callgraph"></a>3. Interpreting call-graph profiles</h2>
4966 </div>
4967 </div>
4968 </div>
4969 <p>
Sometimes the results from call-graph profiles may differ from what
you expect to see. The first thing to check is whether the target
binaries were compiled with frame pointers enabled (if the binary was
compiled using <span><strong class="command">gcc</strong></span>'s
<code class="option">-fomit-frame-pointer</code> option, you will not get
meaningful results). Note that as of this writing, the GCC developers
plan to disable frame pointers by default. The Linux kernel is built
without frame pointers by default; there is a configuration option you
can use to turn them on under the "Kernel Hacking" menu.
4979</p>
4980 <p>
4981Often you may see a caller of a function that does not actually directly
4982call the function you're looking at (e.g. if <code class="function">a()</code>
4983calls <code class="function">b()</code>, which in turn calls
4984<code class="function">c()</code>, you may see an entry for
4985<code class="function">a()-&gt;c()</code>). What's actually occurring is that we
4986are taking samples at the very start (or the very end) of
4987<code class="function">c()</code>; at these few instructions, we haven't yet
4988created the new function's frame, so it appears as if
4989<code class="function">a()</code> is calling directly into
4990<code class="function">c()</code>. Be careful not to be misled by these
4991entries.
4992</p>
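<p>
The following C++ sketch models a frame-pointer backtrace to show why
<code class="function">b()</code> can disappear. It is a simplification that assumes each frame
record holds only the name of the function its saved return address points into;
<code>FrameRecord</code> and <code>backtrace</code> are hypothetical names, not OProfile's unwinder.
</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a frame-pointer call-graph walk. Each frame record is
// created by a function's prologue and holds the saved return address; here
// we only keep the name of the function that return address points into.
struct FrameRecord {
    std::string returns_into;
};

// Backtrace = function containing the sampled PC, followed by the return
// address found in each frame record, innermost record first.
std::vector<std::string> backtrace(const std::string& pc_function,
                                   const std::vector<FrameRecord>& chain) {
    assert(!pc_function.empty());
    std::vector<std::string> trace{pc_function};
    for (auto it = chain.rbegin(); it != chain.rend(); ++it)
        trace.push_back(it->returns_into);
    return trace;
}
```

<p>
If <code class="function">main()</code> calls <code class="function">a()</code>, which calls
<code class="function">b()</code>, which calls <code class="function">c()</code>, a sample inside the
body of <code class="function">c()</code> finds records whose return addresses point into main, a and b,
giving c, b, a, main. A sample in <code class="function">c()</code>'s prologue finds only the records
pointing into main and a, because <code class="function">c()</code> has not created its record yet, giving
c, a, main: the apparent <code class="function">a()-&gt;c()</code> arc.
</p>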
4993 <p>
4994Like the rest of OProfile, call-graph profiling uses a statistical
4995approach; this means that sometimes a backtrace sample is truncated, or
4996even partially wrong. Bear this in mind when examining results.
4997</p>
4998 </div>
4999 <div class="sect1" lang="en" xml:lang="en">
5000 <div class="titlepage">
5001 <div>
5002 <div>
5003 <h2 class="title" style="clear: both"><a id="debug-info"></a>4. Inaccuracies in annotated source</h2>
5004 </div>
5005 </div>
5006 </div>
5007 <div class="sect2" lang="en" xml:lang="en">
5008 <div class="titlepage">
5009 <div>
5010 <div>
5011 <h3 class="title"><a id="effect-of-optimizations"></a>4.1. Side effects of optimizations</h3>
5012 </div>
5013 </div>
5014 </div>
5015 <p>
The compiler can introduce some pitfalls in the annotated source output.
The optimizer can move pieces of code in such a way that two lines of code
are interleaved (instruction scheduling). The debug information generated by the compiler
can also behave strangely. This is especially true for complex expressions, e.g. inside
an if statement:
5021</p>
5022 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5023 <tr>
5024 <td>
    if (a &amp;&amp;
        b &amp;&amp;
        c)
5028 c &amp;&amp;)
5029</pre>
5030 </td>
5031 </tr>
5032 </table>
5033 <p>
Here the problem comes from the line number information. The available debug
information does not give enough detail for the if condition, so all samples are
accumulated on the line holding the closing parenthesis of the condition. Using
<span><strong class="command">opannotate <code class="option">-a</code></strong></span> can help to show the real
samples at the assembly level.
5039</p>
5040 </div>
5041 <div class="sect2" lang="en" xml:lang="en">
5042 <div class="titlepage">
5043 <div>
5044 <div>
5045 <h3 class="title"><a id="prologues"></a>4.2. Prologues and epilogues</h3>
5046 </div>
5047 </div>
5048 </div>
5049 <p>
5050The compiler generally needs to generate "glue" code across function calls, dependent
5051on the particular function call conventions used. Additionally other things
5052need to happen, like stack pointer adjustment for the local variables; this
5053code is known as the function prologue. Similar code is needed at function return,
5054and is known as the function epilogue. This will show up in annotations as
5055samples at the very start and end of a function, where there is no apparent
5056executable code in the source.
5057</p>
5058 </div>
5059 <div class="sect2" lang="en" xml:lang="en">
5060 <div class="titlepage">
5061 <div>
5062 <div>
5063 <h3 class="title"><a id="inlined-function"></a>4.3. Inlined functions</h3>
5064 </div>
5065 </div>
5066 </div>
5067 <p>
5068You may see that a function is credited with a certain number of samples, but
the listing does not add up to the correct total. To pick a real example:
5070</p>
5071 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5072 <tr>
5073 <td>
5074 <pre class="screen">
5075 :internal_sk_buff_alloc_security(struct sk_buff *skb)
5076 353 2.342% :{ /* internal_sk_buff_alloc_security total: 1882 12.48% */
5077 :
5078 : sk_buff_security_t *sksec;
5079 15 0.0995% : int rc = 0;
5080 :
5081 10 0.06633% : sksec = skb-&gt;lsm_security;
5082 468 3.104% : if (sksec &amp;&amp; sksec-&gt;magic == DSI_MAGIC) {
5083 : goto out;
5084 : }
5085 :
5086 : sksec = (sk_buff_security_t *) get_sk_buff_memory(skb);
5087 3 0.0199% : if (!sksec) {
5088 38 0.2521% : rc = -ENOMEM;
5089 : goto out;
5090 10 0.06633% : }
5091 : memset(sksec, 0, sizeof (sk_buff_security_t));
5092 44 0.2919% : sksec-&gt;magic = DSI_MAGIC;
5093 32 0.2123% : sksec-&gt;skb = skb;
5094 45 0.2985% : sksec-&gt;sid = DSI_SID_NORMAL;
5095 31 0.2056% : skb-&gt;lsm_security = sksec;
5096 :
5097 : out:
5098 :
5099 146 0.9685% : return rc;
5100 :
5101 98 0.6501% :}
5102</pre>
5103 </td>
5104 </tr>
5105 </table>
5106 <p>
5107Here, the function is credited with 1,882 samples, but the annotations
5108below do not account for this. This is usually because of inline functions -
5109the compiler marks such code with debug entries for the inline function
5110definition, and this is where <span><strong class="command">opannotate</strong></span> annotates
5111such samples. In the case above, <code class="function">memset</code> is the most
5112likely candidate for this problem. Examining the mixed source/assembly
5113output can help identify such results.
5114</p>
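<p>
The size of the gap can be checked with simple arithmetic. This C++ sketch sums the per-line
counts copied from the listing above and subtracts them from the function total;
<code>unattributed</code> is a hypothetical helper, and the sketch assumes every sample-bearing line of
the listing was copied.
</p>

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-line sample counts copied from the annotated listing of
// internal_sk_buff_alloc_security() above, in source order.
const std::vector<unsigned> kLineSamples = {353, 15, 10, 468, 3, 38, 10,
                                            44, 32, 45, 31, 146, 98};
constexpr unsigned kFunctionTotal = 1882;  // from the "total:" comment

// Samples credited to the function but to none of its own source lines.
unsigned unattributed(const std::vector<unsigned>& lines, unsigned total) {
    unsigned sum = std::accumulate(lines.begin(), lines.end(), 0u);
    assert(sum <= total);
    return total - sum;
}
```

<p>
For this listing, the annotated lines account for 1,293 samples, leaving 589 of the 1,882
samples credited to the function without a source line of their own in it.
</p>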
5115 <p>
This problem is more visible when there is no source file available. In the
following example, the per-symbol totals add up to 125 samples, more than the
109 samples credited to the file itself. The difference is accounted for by inline
functions, whose code is counted in the symbols that inline them but whose source lines belong to other files.
5120</p>
5121 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5122 <tr>
5123 <td>
5124 <pre class="screen">
5125/*
5126 * Total samples for file : "arch/i386/kernel/process.c"
5127 *
5128 * 109 2.4616
5129 */
5130
5131 /* default_idle total: 84 1.8970 */
5132 /* cpu_idle total: 21 0.4743 */
5133 /* flush_thread total: 1 0.0226 */
5134 /* prepare_to_copy total: 1 0.0226 */
5135 /* __switch_to total: 18 0.4065 */
5136</pre>
5137 </td>
5138 </tr>
5139 </table>
5140 <p>
The missing samples are not lost: they are credited to the source location
where the inlined function is defined. Samples from all call sites of an
inlined function are merged in one place in the annotated source file, so there
is no way to see which call site the samples for an inlined function come
from.
5146</p>
5147 <p>
5148When running <span><strong class="command">opannotate</strong></span>, you may get a warning
5149"some functions compiled without debug information may have incorrect source line attributions".
5150In some rare cases, OProfile is not able to verify that the derived source line
5151is correct (when some parts of the binary image are compiled without debugging
5152information). Be wary of results if this warning appears.
5153</p>
5154 <p>
5155Furthermore, for some languages the compiler can implicitly generate functions,
5156such as default copy constructors. Such functions are labelled by the compiler
5157as having a line number of 0, which means the source annotation can be confusing.
5158</p>
5159 </div>
5160 <div class="sect2" lang="en" xml:lang="en">
5161 <div class="titlepage">
5162 <div>
5163 <div>
5164 <h3 class="title"><a id="wrong-linenr-info"></a>4.4. Inaccuracy in line number information</h3>
5165 </div>
5166 </div>
5167 </div>
5168 <p>
Depending on your compiler, you may run into the following problem:
5170</p>
5171 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5172 <tr>
5173 <td>
5174 <pre class="screen">
5175struct big_object { int a[500]; };
5176
5177int main()
5178{
5179 big_object a, b;
5180 for (int i = 0 ; i != 1000 * 1000; ++i)
5181 b = a;
5182 return 0;
5183}
5184
5185</pre>
5186 </td>
5187 </tr>
5188 </table>
5189 <p>
5190Compiled with <span><strong class="command">gcc</strong></span> 3.0.4 the annotated source is clearly inaccurate:
5191</p>
5192 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5193 <tr>
5194 <td>
5195 <pre class="screen">
5196 :int main()
5197 :{ /* main total: 7871 100% */
5198 : big_object a, b;
5199 : for (int i = 0 ; i != 1000 * 1000; ++i)
5200 : b = a;
5201 7871 100% : return 0;
5202 :}
5203</pre>
5204 </td>
5205 </tr>
5206 </table>
5207 <p>
The problem here is distinct from the IRQ latency problem: the debug line number
information is not precise enough. Again, looking at the output of <span><strong class="command">opannotate -as</strong></span> can help.
5210</p>
5211 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5212 <tr>
5213 <td>
5214 <pre class="screen">
5215 :int main()
5216 :{
5217 : big_object a, b;
5218 : for (int i = 0 ; i != 1000 * 1000; ++i)
5219 : 80484c0: push %ebp
5220 : 80484c1: mov %esp,%ebp
5221 : 80484c3: sub $0xfac,%esp
5222 : 80484c9: push %edi
5223 : 80484ca: push %esi
5224 : 80484cb: push %ebx
5225 : b = a;
5226 : 80484cc: lea 0xfffff060(%ebp),%edx
5227 : 80484d2: lea 0xfffff830(%ebp),%eax
5228 : 80484d8: mov $0xf423f,%ebx
5229 : 80484dd: lea 0x0(%esi),%esi
5230 : return 0;
5231 3 0.03811% : 80484e0: mov %edx,%edi
5232 : 80484e2: mov %eax,%esi
5233 1 0.0127% : 80484e4: cld
5234 8 0.1016% : 80484e5: mov $0x1f4,%ecx
5235 7850 99.73% : 80484ea: repz movsl %ds:(%esi),%es:(%edi)
5236 9 0.1143% : 80484ec: dec %ebx
5237 : 80484ed: jns 80484e0
5238 : 80484ef: xor %eax,%eax
5239 : 80484f1: pop %ebx
5240 : 80484f2: pop %esi
5241 : 80484f3: pop %edi
5242 : 80484f4: leave
5243 : 80484f5: ret
5244</pre>
5245 </td>
5246 </tr>
5247 </table>
5248 <p>
So here it's clear that the copying is correctly credited with all the samples, but the
line number information is misplaced. <span><strong class="command">objdump -dS</strong></span> exposes the
same problem. Note that maintaining accurate debug information is difficult for optimizing compilers, so this problem is not surprising.
The accuracy of debug information
also depends on the binutils version used; some BFD library versions
contain a work-around for known problems of <span><strong class="command">gcc</strong></span>, while others do not. This is unfortunate, but we must live with it,
since profiling is pointless when you disable optimisation (which would give better debugging entries).
5256</p>
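<p>
The immediates in the disassembly can be related back to the source with a little arithmetic,
which confirms that the <code>repz movsl</code> loop is the <code>b = a;</code> copy. This C++ sketch only
restates that arithmetic as compile-time checks; the constant names are ours, and it assumes a
4-byte <code>int</code>, as on i386.
</p>

```cpp
// Constants copied from the disassembly above; the names are ours.
constexpr long kRepCount = 0x1f4;        // mov $0x1f4,%ecx: dwords per repz movsl
constexpr long kCounterStart = 0xf423f;  // mov $0xf423f,%ebx: loop counter

// `dec %ebx; jns 80484e0` jumps back while the decremented counter is still
// non-negative, so the copy runs once for every value from start down to 0.
constexpr long loop_trips(long counter_start) { return counter_start + 1; }

// Each movsl moves one 4-byte int, so one trip copies the whole int a[500].
static_assert(kRepCount == 500, "one movsl per element of big_object::a");
static_assert(loop_trips(kCounterStart) == 1000 * 1000, "loop bound from main()");
```

<p>
So each of the 1,000,000 loop trips copies 500 dwords (2,000 bytes) inside the single
<code>repz movsl</code> instruction, which is consistent with that instruction collecting 99.73% of the
samples.
</p>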
5257 </div>
5258 </div>
5259 <div class="sect1" lang="en" xml:lang="en">
5260 <div class="titlepage">
5261 <div>
5262 <div>
5263 <h2 class="title" style="clear: both"><a id="symbol-without-debug-info"></a>5. Assembly functions</h2>
5264 </div>
5265 </div>
5266 </div>
5267 <p>
Often the assembler cannot generate debug information automatically.
This means that you cannot get a source report unless
you manually define the necessary debug information; read your assembler documentation for how you might
do that. The only
debugging information currently needed by OProfile is the line-number/filename-VMA association. When profiling assembly
without debugging information, you can always get a report for symbols, and optionally for VMAs, through <span><strong class="command">opreport -l</strong></span>
or <span><strong class="command">opreport -d</strong></span>, but this works only for symbols with the right attributes.
For <span><strong class="command">gas</strong></span> you can set these attributes with
5276</p>
5277 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5278 <tr>
5279 <td>
5280 <pre class="screen">
5281.globl foo
5282 .type foo,@function
5283</pre>
5284 </td>
5285 </tr>
5286 </table>
5287 <p>
5288whilst for <span><strong class="command">nasm</strong></span> you must use
5289</p>
5290 <table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
5291 <tr>
5292 <td>
5293 <pre class="screen">
 GLOBAL foo:function
5295</pre>
5296 </td>
5297 </tr>
5298 </table>
5299 <p>
5300Note that OProfile does not need the global attribute, only the function attribute.
5301</p>
5302 </div>
5303 <div class="sect1" lang="en" xml:lang="en">
5304 <div class="titlepage">
5305 <div>
5306 <div>
5307 <h2 class="title" style="clear: both"><a id="overlapping-symbols"></a>6. Overlapping symbols in JITed code</h2>
5308 </div>
5309 </div>
5310 </div>
5311 <p>
 Some virtual machines (e.g., Java) may re-JIT a method, causing the space previously
 allocated for a piece of compiled code to be reused. This means that, at one distinct
 code address, multiple symbols/methods may be present during the run time of the application.
5315 </p>
5316 <p>
 Since OProfile samples are buffered and do not carry timing information, there is no way
 to correlate samples with the (possibly) varying address ranges in which the code for a symbol
 may reside.
 An alternative would be to flush the OProfile sampling buffer whenever an unload event occurs,
 but this could result in high overhead.
5322 </p>
5323 <p>
5324 To moderate the problem of overlapping symbols, OProfile tries to select the symbol that was
5325 present at this address range most of the time. Additionally, other overlapping symbols
5326 are truncated in the overlapping area.
5327 This gives reasonable results, because in reality, address reuse typically takes place
 during phase changes of the application, in particular during application startup.
 Thus, for optimum profiling results, start the sampling session after application startup
 and burn-in.
5331 </p>
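<p>
The overlap handling described above can be sketched as follows. This is a hypothetical C++
illustration, not OProfile's implementation: it assumes each JIT symbol comes with an address
range and a measure of how long it occupied that range (<code>lifetime</code>), gives contested
addresses to the longest-lived symbol, and trims the others. As a simplification, when a
longer-lived symbol lies strictly inside a shorter-lived one, only the lower part of the latter
is kept.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one JIT-compiled method (not OProfile's own type).
struct JitSymbol {
    std::string name;
    std::uint64_t start;     // first code address (inclusive)
    std::uint64_t end;       // one past the last code address (exclusive)
    std::uint64_t lifetime;  // how long the code stayed at this address
};

// Give each contested address range to the symbol that lived there longest and
// truncate the others. Truncation only ever shrinks a range, so a symbol that
// did not overlap an earlier winner cannot start overlapping it later.
std::vector<JitSymbol> resolve_overlaps(std::vector<JitSymbol> syms) {
    std::sort(syms.begin(), syms.end(),
              [](const JitSymbol& x, const JitSymbol& y) { return x.lifetime > y.lifetime; });
    std::vector<JitSymbol> kept;
    for (JitSymbol s : syms) {
        assert(s.start <= s.end);
        for (const JitSymbol& w : kept) {
            if (s.start >= s.end) break;                         // truncated away
            if (s.end <= w.start || s.start >= w.end) continue;  // no overlap
            if (s.start < w.start)
                s.end = w.start;    // keep the part below the winner
            else
                s.start = w.end;    // keep the part above the winner
        }
        if (s.start < s.end) kept.push_back(s);
    }
    std::sort(kept.begin(), kept.end(),
              [](const JitSymbol& x, const JitSymbol& y) { return x.start < y.start; });
    return kept;
}
```

<p>
For example, a method at [0x1000, 0x1100) with lifetime 90 and a later one at [0x1080, 0x1180)
with lifetime 10 resolve to [0x1000, 0x1100) and [0x1100, 0x1180) respectively; a short-lived
symbol lying entirely inside a longer-lived one is dropped.
</p>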
5332 </div>
5333 <div class="sect1" lang="en" xml:lang="en">
5334 <div class="titlepage">
5335 <div>
5336 <div>
5337 <h2 class="title" style="clear: both"><a id="hidden-cost"></a>7. Other discrepancies</h2>
5338 </div>
5339 </div>
5340 </div>
5341 <p>
Another cause of apparent problems is the hidden cost of instructions. A very
common example is two memory reads, one served from the L1 cache and the other from main memory:
the second read is likely to collect more samples.
There are many other causes of hidden instruction costs. A non-exhaustive
list: mispredicted branches, TLB misses, partial register stalls,
partial register dependencies, memory mismatch stalls, and re-executed µops. If you want to write
programs at the assembly level, be sure to take a look at the Intel and
AMD documentation at <a href="http://developer.intel.com/">http://developer.intel.com/</a>
and <a href="http://developer.amd.com/devguides.jsp">http://developer.amd.com/devguides.jsp</a>.
5351</p>
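<p>
A small program can make this hidden cost visible under OProfile. In the C++ sketch below, both
traversals execute the same load-and-add loop over the same data; only the access order differs.
The function names are hypothetical, and the expectation that the shuffled traversal collects more
samples assumes the array is much larger than the CPU caches.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sum data[order[0]], data[order[1]], ...: the same load+add instructions run
// whatever the order; only the memory access pattern changes.
long long sum_in_order(const std::vector<int>& data,
                       const std::vector<std::size_t>& order) {
    assert(data.size() == order.size());
    long long sum = 0;
    for (std::size_t idx : order) sum += data[idx];
    return sum;
}

// Identity order: consecutive addresses, mostly served from cache.
std::vector<std::size_t> sequential_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    return order;
}

// Shuffled order: for arrays much larger than the caches, most loads miss.
std::vector<std::size_t> shuffled_order(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order = sequential_order(n);
    std::shuffle(order.begin(), order.end(), std::mt19937(seed));
    return order;
}
```

<p>
To profile it, one might fill a vector with several tens of millions of <code>int</code>s and call
<code>sum_in_order</code> once with each order. Both calls return the same sum, so any difference in the
samples credited to <code>sum_in_order</code> comes from the memory system rather than from extra
instructions.
</p>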
5352 </div>
5353 </div>
5354 <div class="chapter" lang="en" xml:lang="en">
5355 <div class="titlepage">
5356 <div>
5357 <div>
5358 <h2 class="title"><a id="ack"></a>Chapter 6. Acknowledgments</h2>
5359 </div>
5360 </div>
5361 </div>
5362 <p>
Thanks to (in no particular order): Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie,
5364Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu,
5365Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh,
5366Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer,
5367Maynard P. Johnson,
5368Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro".
5369</p>
5370 </div>
5371 </div>
5372 </body>
5373</html>