<?xml version="1.0" encoding='ISO-8859-1'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

<book id="oprofile-guide">
<bookinfo>
<title>OProfile manual</title>

<authorgroup>
<author>
<firstname>John</firstname>
<surname>Levon</surname>
<affiliation>
<address><email>levon@movementarian.org</email></address>
</affiliation>
</author>
</authorgroup>

<copyright>
<year>2000-2004</year>
<holder>Victoria University of Manchester, John Levon and others</holder>
</copyright>
</bookinfo>

<toc></toc>

<chapter id="introduction">
<title>Introduction</title>

<para>
This manual applies to OProfile version <oprofileversion />.
OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It is capable of profiling
all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries
to binaries. It runs transparently in the background, collecting information at a low overhead. These
features make it ideal for profiling entire systems to find bottlenecks under real-world conditions.
</para>
<para>
Many CPUs provide "performance counters", hardware registers that can count "events"; for example,
cache misses, or CPU cycles. OProfile builds profiles of code from these counts:
each time a certain (configurable) number of events has occurred, the program counter (PC) value is recorded.
These samples are aggregated into profiles for each binary image.</para>
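<para>
As a rough illustration of this sampling scheme, the following shell sketch walks through a
made-up sequence of PC values, treats each one as a single event, and records a sample every
<literal>count</literal> events. The addresses and the count of 3 are assumptions chosen to keep
the example short; real counts are far larger, and the sampling is done by the hardware and the
kernel, not by a script.
</para>
<screen>
count=3      # assumed: take one sample every 3 events
events=0
samples=""
for pc in 0x10 0x14 0x18 0x1c 0x10 0x14 0x18 0x1c 0x10; do
    events=$((events + 1))
    # Record the current PC each time another "count" events have occurred.
    if [ $((events % count)) -eq 0 ]; then
        samples="$samples $pc"
    fi
done
echo "sampled PCs:$samples"
</screen>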
<para>
Some hardware setups do not allow OProfile to use performance counters: in these cases, no
events are available, and OProfile operates in timer/RTC mode, as described in later chapters.
</para>
<sect1 id="applications">
<title>Applications of OProfile</title>
<para>
OProfile is useful in a number of situations. You might want to use OProfile when you:
</para>
<itemizedlist>
<listitem><para>need low overhead</para></listitem>
<listitem><para>cannot use highly intrusive profiling methods</para></listitem>
<listitem><para>need to profile interrupt handlers</para></listitem>
<listitem><para>need to profile an application and its shared libraries</para></listitem>
<listitem><para>need to profile dynamically compiled code of supported virtual machines (see <xref linkend="jitsupport"/>)</para></listitem>
<listitem><para>need to capture the performance behaviour of an entire system</para></listitem>
<listitem><para>want to examine hardware effects such as cache misses</para></listitem>
<listitem><para>want detailed source annotation</para></listitem>
<listitem><para>want instruction-level profiles</para></listitem>
<listitem><para>want call-graph profiles</para></listitem>
</itemizedlist>
<para>
OProfile is not a panacea. OProfile might not be a complete solution when you:
</para>
<itemizedlist>
<listitem><para>require call graph profiles on platforms other than 2.6/x86</para></listitem>
<listitem><para>don't have root permissions</para></listitem>
<listitem><para>require 100% instruction-accurate profiles</para></listitem>
<listitem><para>need function call counts or an interstitial profiling API</para></listitem>
<listitem><para>cannot tolerate any disturbance to the system whatsoever</para></listitem>
<listitem><para>need to profile interpreted or dynamically compiled code of non-supported virtual machines</para></listitem>
</itemizedlist>
<sect2 id="jitsupport">
<title>Support for dynamically compiled (JIT) code</title>
<para>
Older versions of OProfile were not capable of attributing samples to symbols from dynamically
compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into
anonymous memory regions. OProfile reported the samples from such code, but the attribution
provided was simply:
<screen>"anon: &lt;tgid&gt;&lt;address range&gt;" </screen>
Due to this limitation, it wasn't possible to profile applications executed by virtual machines (VMs)
like the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code.
A development library is provided to allow developers
to add support for any VM that produces dynamically compiled code (see the <emphasis>OProfile JIT agent
developer guide</emphasis>).
In addition, built-in support is included for the following:</para>
<itemizedlist><listitem><para>JVMTI agent library for Java (1.5 and higher)</para></listitem>
<listitem><para>JVMPI agent library for Java (1.5 and lower)</para></listitem>
</itemizedlist>
<para>
For information on how to use OProfile's JIT support, see <xref linkend="setup-jit"/>.
</para>
</sect2>
</sect1>

<sect1 id="requirements">
<title>System requirements</title>

<variablelist>
<varlistentry>
<term>Linux kernel 2.2/2.4/2.6</term>
<listitem><para>
OProfile uses a kernel module that can be compiled for
2.2.11 or later and for 2.4 kernels. Kernel 2.4.10 or above is required if you use the
boot-time kernel option <option>nosmp</option>. 2.6 kernels are supported with the in-kernel
OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels.
</para>

<para>
2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power
management is used, or the BIOS does not correctly deal with local APICs.
</para>

<para>
To use OProfile's JIT support, kernel version 2.6.13 or later is required.
In earlier kernel versions, the anonymous memory regions are not reported to OProfile, which results
in profiling reports without any samples in these regions.
</para>

<para>
PPC64 processors (Power4/Power5/PPC970, etc.) require a recent (&gt; 2.6.5) kernel with the line
<constant>#define PV_970</constant> present in <filename>include/asm-ppc64/processor.h</filename>.
<!-- FIXME: do we require always gte 2.4.10 for nosmp ? -->
</para>
<para>
Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires a kernel version
of 2.6.18 or more recent.
Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires a kernel version
of 2.6.22 or more recent. Additionally, full support of SPE profiling requires a BFD library
from binutils code dated January 2007 or later. To ensure the proper BFD support exists, run
the <code>configure</code> utility with <code>--with-target=cell-be</code>.

Profiling the Cell Broadband Engine using SPU events requires a kernel version of 2.6.29-rc1
or more recent.

<note>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the
system to crash.</note>
</para>

<para>
Instruction-Based Sampling (IBS) profiling on AMD family10h processors requires
kernel version 2.6.28-rc2 or later.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>modutils 2.4.6 or above</term>
<listitem><para>
You should have installed modutils 2.4.6 or higher (in fact earlier versions work well in almost all
cases).
</para></listitem>
</varlistentry>
<varlistentry>
<term>Supported architecture</term>
<listitem><para>
For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is
required. In marketing terms this translates to anything
between an Intel Pentium Pro (not Pentium Classics) and
a Pentium 4 / Xeon, including all Celerons. The AMD
Athlon, Opteron, Phenom, and Turion CPUs are also supported. Other IA32
CPU types only support the RTC mode of OProfile; please
see later in this manual for details. Hyper-threaded Pentium 4s
are not supported in 2.4. For 2.4 kernels, the Intel
IA-64 CPUs are also supported. For 2.6 kernels, there is additionally
support for Alpha processors, MIPS, ARM, x86-64, sparc64, ppc64, AVR32, and,
in timer mode, PA-RISC and s390.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Uniprocessor or SMP</term>
<listitem><para>
SMP machines are fully supported.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Required libraries</term>
<listitem><para>
These libraries are required: <filename>popt</filename>, <filename>bfd</filename>,
<filename>liberty</filename> (Debian users: libiberty is provided in the binutils-dev package), <filename>dl</filename>,
plus the standard C++ libraries.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Required user account</term>
<listitem><para>
For secure processing of sample data from JIT virtual machines (e.g., Java),
the special user account "oprofile" must exist on the system. The 'configure'
and 'make install' operations will print warning messages if this
account is not found. If you intend to profile JITed code, you must create
a group account named 'oprofile' and then create the 'oprofile' user account,
setting the default group to 'oprofile'. A runtime error message is printed to
the oprofile daemon log when processing JIT samples if this special user
account cannot be found.
</para></listitem>
</varlistentry>
<varlistentry>
<term>OProfile GUI</term>
<listitem><para>
The use of the GUI to start the profiler requires the <filename>Qt 2</filename> library. <filename>Qt 3</filename> should
also work.
</para></listitem>
</varlistentry>
<varlistentry>
<term><acronym>ELF</acronym></term>
<listitem><para>
Probably not too strenuous a requirement, but older <acronym>A.OUT</acronym> binaries/libraries are not supported.
</para></listitem>
</varlistentry>
<varlistentry>
<term>K&amp;R coding style</term>
<listitem><para>
OK, so it's not really a requirement, but I wish it was...
</para></listitem>
</varlistentry>
</variablelist>
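<para>
The account described under "Required user account" could be created along the following lines.
This is a minimal sketch, assuming the common <command>getent</command>, <command>groupadd</command>
and <command>useradd</command> utilities are available (your distribution may provide different tools).
The function name <literal>ensure_oprofile_account</literal> is made up for this example.
</para>
<screen>
# Hypothetical helper: create the 'oprofile' group and user only if missing.
# Must be run as root for groupadd and useradd to succeed.
ensure_oprofile_account() {
    # getent prints the existing entry and succeeds if it is already there
    getent group oprofile || groupadd oprofile
    # create the user with 'oprofile' as its default group
    getent passwd oprofile || useradd -g oprofile oprofile
}
</screen>
<para>
Calling <command>ensure_oprofile_account</command> as root before 'configure' and 'make install'
should avoid the warning messages mentioned above.
</para>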


</sect1>

<sect1 id="resources">
<title>Internet resources</title>

<variablelist>
<varlistentry>
<term>Web page</term>
<listitem><para>
There is a web page (which you may be reading now) at
<ulink url="http://oprofile.sf.net/">http://oprofile.sf.net/</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Download</term>
<listitem><para>
You can download a source tarball or get anonymous CVS at the sourceforge page,
<ulink url="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Mailing list</term>
<listitem><para>
There is a low-traffic OProfile-specific mailing list, details at
<ulink url="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Bug tracker</term>
<listitem><para>
There is a bug tracker for OProfile at SourceForge,
<ulink url="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>IRC channel</term>
<listitem><para>
Several OProfile developers and users sometimes hang out on channel <command>#oprofile</command>
on the <ulink url="http://oftc.net">OFTC</ulink> network.
</para></listitem>
</varlistentry>
</variablelist>

</sect1>

<sect1 id="install">
<title>Installation</title>

<para>
First you need to build OProfile and install it. <command>./configure</command>, <command>make</command>, <command>make install</command>
is often all you need, but note these arguments to <command>./configure</command>:
</para>
<variablelist>
<varlistentry>
<term><option>--with-linux</option></term>
<listitem><para>
Use this option to specify the location of the kernel source tree you wish
to compile against. The kernel module is built against this source and
will only work with a running kernel built from the same source with
exactly the same options, so it is important to specify this option if you need
to.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--with-java</option></term>
<listitem>
<para>
Use this option if you need to profile Java applications. Also, see
<xref linkend="requirements"/>, "Required user account". This option
is used to specify the location of the Java Development Kit (JDK)
source tree you wish to use. This is necessary to get the interface description
of the JVMPI (or JVMTI) interface to compile the JIT support code successfully.
</para>
<note>
<para>
The Java Runtime Environment (JRE) does not include the development
files that are required to compile the JIT support code, so the full
JDK must be installed in order to use this option.
</para>
</note>
<para>
By default, the OProfile JIT support libraries will be installed in
<filename>&lt;oprof_install_dir&gt;/lib/oprofile</filename>. To build
and install OProfile and the JIT support libraries as 64-bit, you can
do something like the following:
<screen>
# CFLAGS="-m64" CXXFLAGS="-m64" ./configure \
--with-kernel-support --with-java={my_jdk_installdir} \
--libdir=/usr/local/lib64
</screen>
</para>
<note>
<para>
If you encounter errors building 64-bit, you should
install libtool 1.5.26 or later since that release of
libtool fixes known problems for certain platforms.
If you install libtool into a non-standard location,
you'll need to edit the invocation of 'aclocal' in
OProfile's autogen.sh as follows (assume an install
location of /usr/local):
</para>
<para>
<code>aclocal -I m4 -I /usr/local/share/aclocal</code>
</para>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--with-kernel-support</option></term>
<listitem><para>
Use this option with 2.6 and above kernels to indicate the
kernel provides the OProfile device driver.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--with-qt-dir/includes/libraries</option></term>
<listitem><para>
Specify the location of Qt headers and libraries. It defaults to searching in
<constant>$QTDIR</constant> if these are not specified.
</para></listitem>
</varlistentry>
<varlistentry id="disable-werror">
<term><option>--disable-werror</option></term>
<listitem><para>
Development versions of OProfile build by
default with <option>-Werror</option>. This option turns
<option>-Werror</option> off.
</para></listitem>
</varlistentry>
<varlistentry id="disable-optimization">
<term><option>--disable-optimization</option></term>
<listitem><para>
Disable the <option>-O2</option> compiler flag
(useful if you discover an OProfile bug and want to give a useful
back-trace etc.)
</para></listitem>
</varlistentry>
</variablelist>
<para>
To build the module for 2.4 kernels, you need a configured kernel source tree for the running kernel.
Since distributions all ship different kernels, the running kernel is unlikely to match the configured source
you installed. The safest approach is to compile your own kernel, boot it, and then compile OProfile against it.
If you compile OProfile for a 2.2 kernel you must be root to compile the module.
</para>
<para>
If you have a uniprocessor machine, it is also recommended that you enable the local APIC / IO_APIC support for
your kernel (this is automatically enabled for SMP kernels). With many BIOSes, a uniprocessor (UP) kernel of
version 2.6.9 or later needs more than local APIC support compiled in: you must also turn the local APIC on
explicitly at boot time by passing the "lapic" option to the kernel.
</para>
<para>
On machines with power management, such as laptops, power management
must be turned off when using OProfile with 2.4 kernels. The power management software
in the BIOS cannot handle the non-maskable interrupts (NMIs) used by
OProfile for data collection. If you use the NMI watchdog, be aware that
the watchdog is disabled when profiling starts, and not re-enabled until the
OProfile module is removed (or, in 2.6, when OProfile is not running).
</para>
<para>
If you are using 2.6 kernels or higher, you do not need kernel source, as long as the
OProfile driver is enabled; additionally, you should not need to disable
power management.
</para>
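<para>
The kernel-version distinction above can be turned into a choice of <command>./configure</command>
argument. The sketch below is only an illustration: the source path <filename>/usr/src/linux</filename>
is an assumption, and the script prints the command rather than running it.
</para>
<screen>
# Pick a configure argument from the running kernel's release string.
case "$(uname -r)" in
    2.2.*|2.4.*)
        # module build: point at the configured source of the running kernel
        configure_args="--with-linux=/usr/src/linux" ;;   # assumed path
    *)
        # 2.6 and later: the kernel provides the OProfile driver
        configure_args="--with-kernel-support" ;;
esac
echo "./configure $configure_args"
</screen>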
<para>
Please note that you must save or have available the <filename>vmlinux</filename> file
generated during a kernel compile, as OProfile needs it (you can use
<option>--no-vmlinux</option>, but this will prevent kernel profiling).
</para>

</sect1>

<sect1 id="uninstall">
<title>Uninstalling OProfile</title>
<para>
You must have the source tree available to uninstall OProfile; a <command>make uninstall</command> will
remove all installed files except your configuration file in the directory <filename>~/.oprofile</filename>.
</para>
</sect1>

</chapter>

<chapter id="overview">
<title>Overview</title>

<sect1 id="getting-started">
<title>Getting started</title>
<para>
Before you can use OProfile, you must set it up. The minimum setup required
is to tell OProfile where the <filename>vmlinux</filename> file corresponding to the
running kernel is, for example:
</para>
<screen>opcontrol --vmlinux=/boot/vmlinux-`uname -r`</screen>
<para>
If you don't want to profile the kernel itself,
you can tell OProfile you don't have a <filename>vmlinux</filename> file:
</para>
<screen>opcontrol --no-vmlinux</screen>
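<para>
A setup script can choose between these two forms automatically. The following sketch assumes that the
uncompressed kernel image, if present, lives at the path used in the example above; adjust it for
your system. It prints the resulting command instead of running it.
</para>
<screen>
vmlinux="/boot/vmlinux-$(uname -r)"   # assumed location of the kernel image
if [ -f "$vmlinux" ]; then
    setup_arg="--vmlinux=$vmlinux"
else
    # no image available: the kernel itself will not be profiled
    setup_arg="--no-vmlinux"
fi
echo "opcontrol $setup_arg"
</screen>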
<para>
Now you are ready to start the daemon (<command>oprofiled</command>), which collects
the profile data:
</para>
<screen>opcontrol --start</screen>
<para>
When you want to stop profiling, you can do so with:
</para>
<screen>opcontrol --shutdown</screen>
<para>
Note that unlike <command>gprof</command>, no instrumentation (<option>-pg</option>
and <option>-a</option> options to <command>gcc</command>)
is necessary.
</para>
<para>
Periodically (or on <command>opcontrol --shutdown</command> or <command>opcontrol --dump</command>)
the profile data is written out into the $SESSION_DIR/samples directory (by default at <filename>/var/lib/oprofile/samples</filename>).
These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules.
You can clear the profile data (at any time) with <command>opcontrol --reset</command>.
</para>
<para>
To place these sample database files in a specific directory instead of the default location (<filename>/var/lib/oprofile</filename>), use the <option>--session-dir=dir</option> option. You must also pass <option>--session-dir</option> on subsequent commands so that the tools continue using this directory. (In the future, we should allow this to be specified in an environment variable.) For example:
</para>
<screen>opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</screen>
<screen>opcontrol --start --session-dir=/home/me/tmpsession</screen>
<para>
You can get summaries of this data in a number of ways at any time. To get a summary of
data across the entire system for all of these profiles, you can do:
</para>
<screen>opreport [--session-dir=dir]</screen>
<para>
Or to get a more detailed summary, for a particular image, you can do something like:
</para>
<screen>opreport -l /boot/vmlinux-`uname -r`</screen>
<para>
There are also a number of other ways of presenting the data, as described later in this manual.
Note that OProfile will choose a default profiling setup for you. However, there are a number
of options you can pass to <command>opcontrol</command> if you need to change something,
also detailed later.
</para>
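<para>
Putting the steps of this section together, one profiling run could be wrapped as follows. This is a
sketch, not part of OProfile: the function name <literal>profile_run</literal> is hypothetical, and it
assumes that you run it as root and do not need to profile the kernel itself.
</para>
<screen>
# profile_run SESSION_DIR COMMAND [ARGS...]
# Profiles one command and prints a system-wide summary afterwards.
profile_run() {
    session_dir=$1
    shift
    opcontrol --no-vmlinux --session-dir="$session_dir"
    opcontrol --start --session-dir="$session_dir"
    "$@"                                               # the workload to profile
    opcontrol --shutdown --session-dir="$session_dir"  # stops profiling, writes out the data
    opreport --session-dir="$session_dir"
}
</screen>
<para>
For example, <command>profile_run /home/me/tmpsession ./my_program</command>, where
<filename>./my_program</filename> stands for whatever you want to profile.
</para>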

</sect1>

<sect1 id="tools-overview">
<title>Tools summary</title>
<para>
This section gives a brief description of the available OProfile utilities and their purpose.
</para>
<variablelist>
<varlistentry>
<term><filename>ophelp</filename></term>
<listitem><para>
This utility lists the available events and short descriptions.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opcontrol</filename></term>
<listitem><para>
Used for controlling the OProfile data collection, discussed in <xref linkend="controlling" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>agent libraries</filename></term>
<listitem><para>
Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <xref linkend="setup-jit" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opreport</filename></term>
<listitem><para>
This is the main tool for retrieving useful profile data, described in
<xref linkend="opreport" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opannotate</filename></term>
<listitem><para>
This utility can be used to produce annotated source, assembly or mixed source/assembly.
Source level annotation is available only if the application was compiled with
debugging symbols. See <xref linkend="opannotate" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opgprof</filename></term>
<listitem><para>
This utility can output gprof-style data files for a binary, for use with
<command>gprof -p</command>. See <xref linkend="opgprof" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>oparchive</filename></term>
<listitem><para>
This utility can be used to collect executables, debuginfo,
and sample files and copy the files into an archive.
The archive is self-contained and can be moved to another
machine for further analysis.
See <xref linkend="oparchive" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opimport</filename></term>
<listitem><para>
This utility converts sample database files from a foreign binary format (abi) to
the native format. This is useful only when moving sample files between hosts,
for analysis on platforms other than the one used for collection.
See <xref linkend="opimport" />.
</para></listitem>
</varlistentry>

</variablelist>
</sect1>

</chapter>

<chapter id="controlling">
<title>Controlling the profiler</title>

<sect1 id="controlling-daemon">
<title>Using <command>opcontrol</command></title>
<para>
In this section we describe the configuration and control of the profiling system
with <command>opcontrol</command> in more depth.
The <command>opcontrol</command> script has a default setup, but you
can alter this with the options given below. In particular,
if your hardware supports performance counters, you can configure them.
There are a number of counters (for example, counter 0 and counter 1
on the Pentium III). Each of these counters can be programmed with
an event to count, such as cache misses or MMX operations. The event
chosen for each counter is reflected in the profile data collected
by OProfile: the functions and binaries at the top of the profiles are those
in which most of the chosen events occurred.
</para>
<para>
Additionally, each counter has a "count" value: this determines how
detailed the profile is. The lower the value, the more frequently profile
samples are taken. A counter can be set to sample only kernel code, only user-space code,
or both (both is the default). Finally, some events have a "unit mask":
this is a value that further restricts the types of event that are counted.
The event types and unit masks for your CPU are listed by <command>opcontrol
--list-events</command>.
</para>
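<para>
To get a feel for the "count" value, consider an event that fires once per CPU cycle. Under the
assumptions in the sketch below (a 2 GHz CPU that is never idle, and a count of 100000, both chosen
only for illustration), the expected sampling rate is the clock rate divided by the count.
</para>
<screen>
cpu_hz=2000000000   # assumed clock rate: 2 GHz
count=100000        # assumed count value for a per-cycle event
rate=$((cpu_hz / count))
echo "about $rate samples per second per CPU"
</screen>
<para>
Halving the count doubles the sampling rate, which gives a more detailed profile at the price of
more profiling overhead.
</para>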
<para>
The <command>opcontrol</command> script provides the following actions:
</para>
<variablelist>
<varlistentry>
<term><option>--init</option></term>
<listitem><para>
Loads the OProfile module if required and makes the OProfile driver
interface available.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--setup</option></term>
<listitem><para>
Followed by a list of arguments for profiling setup. The list of arguments is
saved in <filename>/root/.oprofile/daemonrc</filename>.
Giving this option is not necessary; you can just directly pass one
of the setup options, e.g. <command>opcontrol --no-vmlinux</command>.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--status</option></term>
<listitem><para>
Show configuration information.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--start-daemon</option></term>
<listitem><para>
Start the oprofile daemon without starting actual profiling. The profiling
can then be started using <option>--start</option>. This is useful for avoiding
measuring the cost of daemon startup, as <option>--start</option> is a simple
write to a file in oprofilefs. Not available in 2.2/2.4 kernels.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--start</option></term>
<listitem><para>
Start data collection with either arguments provided by <option>--setup</option>
or information saved in <filename>/root/.oprofile/daemonrc</filename>. Additionally
specifying <option>--verbose</option> makes the daemon generate lots of debug data
whilst it is running.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--dump</option></term>
<listitem><para>
Force a flush of the collected profiling data to the daemon.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--stop</option></term>
<listitem><para>
Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels).
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--shutdown</option></term>
<listitem><para>
Stop data collection and kill the daemon.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--reset</option></term>
<listitem><para>
Clears out data from the current session, but leaves saved sessions.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--save=</option>session_name</term>
<listitem><para>
Save data from the current session to session_name.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--deinit</option></term>
<listitem><para>
Shuts down the daemon. Unloads the OProfile module and oprofilefs.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--list-events</option></term>
<listitem><para>
List event types and unit masks.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--help</option></term>
<listitem><para>
Generate usage messages.
</para></listitem>
</varlistentry>
</variablelist>
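<para>
The actions above are typically combined in sequence. The sketch below uses <option>--start-daemon</option>
so that daemon startup is not measured. The function name <literal>timed_profile</literal> is made up for
this example; it assumes that the setup (<option>--vmlinux</option> or <option>--no-vmlinux</option>) has
already been done, and because it relies on the separate <option>--start-daemon</option> and
<option>--stop</option> steps it does not apply to 2.2/2.4 kernels.
</para>
<screen>
# timed_profile SESSION_NAME COMMAND [ARGS...]
timed_profile() {
    session_name=$1
    shift
    opcontrol --start-daemon          # start oprofiled, no sampling yet
    opcontrol --start                 # begin data collection
    "$@"                              # the workload to profile
    opcontrol --dump                  # flush collected data to the daemon
    opcontrol --stop                  # stop data collection, daemon keeps running
    opcontrol --save="$session_name"  # keep the data under this session name
    opcontrol --shutdown              # stop data collection and kill the daemon
}
</screen>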

<para>
There are a number of possible settings, of which only
<option>--vmlinux</option> (or <option>--no-vmlinux</option>)
is required. These settings are stored in <filename>~/.oprofile/daemonrc</filename>.
</para>
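<para>
For the two kernel buffer settings described in the list below, the suggested watershed range can be
worked out from the buffer size. The buffer size of 65536 samples used here is an arbitrary value picked
for the example, not a recommended or default setting, and the script only prints the resulting command.
</para>
<screen>
buffer_size=65536                      # assumed example value
watershed_min=$((buffer_size / 4))     # 0.25 * buffer-size
watershed_max=$((buffer_size / 2))     # 0.5  * buffer-size
echo "useful watershed range: $watershed_min to $watershed_max"
echo "opcontrol --buffer-size=$buffer_size --buffer-watershed=$watershed_min"
</screen>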
| 654 | <variablelist> |
| 655 | <varlistentry> |
| 656 | <term><option>--buffer-size=</option>num</term> |
| 657 | <listitem><para> |
| 658 | Number of samples in the kernel buffer. When using a 2.6 kernel, the |
| 659 | buffer watershed needs to be adjusted when changing this value. |
| 660 | </para></listitem> |
| 661 | </varlistentry> |
| 662 | <varlistentry> |
| 663 | <term><option>--buffer-watershed=</option>num</term> |
| 664 | <listitem><para> |
| 665 | Set the kernel buffer watershed to num samples (2.6 only). When only |
| 666 | buffer-size - buffer-watershed free entries remain in the kernel buffer, data is |
| 667 | flushed to the daemon. The most useful values are in the range [0.25 - 0.5] * buffer-size. |
| 668 | </para></listitem> |
| 669 | </varlistentry> |
| 670 | <varlistentry> |
| 671 | <term><option>--cpu-buffer-size=</option>num</term> |
| 672 | <listitem><para> |
| 673 | Number of samples in the kernel per-CPU buffer (2.6 only). If you |
| 674 | profile at a high rate, increasing this value can help when the log |
| 675 | file shows an excessive count of samples lost to CPU buffer overflow. |
| 676 | </para></listitem> |
| 677 | </varlistentry> |
| 678 | <varlistentry> |
| 679 | <term><option>--event=</option>[eventspec]</term> |
| 680 | <listitem><para> |
| 681 | Use the given performance counter event to profile. |
| 682 | See <xref linkend="eventspec" /> below. |
| 683 | </para></listitem> |
| 684 | </varlistentry> |
| 685 | <varlistentry> |
| 686 | <term><option>--session-dir=</option>dir_path</term> |
| 687 | <listitem><para> |
| 688 | Create or use the sample database in directory <filename>dir_path</filename> instead of |
| 689 | the default location (<filename>/var/lib/oprofile</filename>). |
| 690 | </para></listitem> |
| 691 | </varlistentry> |
| 692 | <varlistentry> |
| 693 | <term><option>--separate=</option>[none,lib,kernel,thread,cpu,all]</term> |
| 694 | <listitem><para> |
| 695 | By default, every profile is stored in a single file. Thus, for example, |
| 696 | samples in the C library are all accredited to the <filename>/lib/libc.o</filename> |
| 697 | profile. However, you can choose to create separate sample files by specifying |
| 698 | one of the options below. |
| 699 | </para> |
| 700 | <informaltable frame="all"> |
| 701 | <tgroup cols='2'> |
| 702 | <tbody> |
| 703 | <row><entry><option>none</option></entry><entry>No profile separation (default)</entry></row> |
| 704 | <row><entry><option>lib</option></entry><entry>Create per-application profiles for libraries</entry></row> |
| 705 | <row><entry><option>kernel</option></entry><entry>Create per-application profiles for the kernel and kernel modules</entry></row> |
| 706 | <row><entry><option>thread</option></entry><entry>Create profiles for each thread and each task</entry></row> |
| 707 | <row><entry><option>cpu</option></entry><entry>Create profiles for each CPU</entry></row> |
| 708 | <row><entry><option>all</option></entry><entry>All of the above options</entry></row> |
| 709 | </tbody> |
| 710 | </tgroup> |
| 711 | </informaltable> |
| 712 | <para> |
| 713 | Note that <option>--separate=kernel</option> also turns on <option>--separate=lib</option>. |
| 714 | <!-- FIXME: update if this change --> |
| 715 | When using <option>--separate=kernel</option>, samples in hardware interrupts, soft-irqs, or other |
| 716 | asynchronous kernel contexts are credited to the task currently running. This means you will see |
| 717 | seemingly nonsense profiles such as <filename>/bin/bash</filename> showing samples for the PPP modules, |
| 718 | etc. |
| 719 | </para> |
| 720 | <para> |
| 721 | On 2.2/2.4 kernels, only kernel threads already started when profiling begins are correctly profiled; |
| 722 | newly started kernel thread samples are credited to the vmlinux (kernel) profile. |
| 723 | </para> |
| 724 | <para> |
| 725 | Using <option>--separate=thread</option> creates a lot |
| 726 | of sample files if you leave OProfile running for a while; it's most |
| 727 | useful when used for short sessions, or when using image filtering. |
| 728 | </para> |
| 729 | </listitem> |
| 730 | </varlistentry> |
| 731 | <varlistentry> |
| 732 | <term><option>--callgraph=</option>#depth</term> |
| 733 | <listitem><para> |
| 734 | Enable call-graph sample collection with a maximum depth. Use 0 to disable |
| 735 | call-graph profiling. Note that call-graph support is available on a limited |
| 736 | number of platforms at this time; for example: |
| 738 | <itemizedlist> |
| 739 | <listitem><para>x86 with a recent 2.6 kernel</para></listitem> |
| 740 | <listitem><para>ARM with a recent 2.6 kernel</para></listitem> |
| 741 | <listitem><para>PowerPC with a 2.6.17 kernel</para></listitem> |
| 742 | </itemizedlist> |
| 744 | </para></listitem> |
| 745 | </varlistentry> |
| 746 | <varlistentry> |
| 747 | <term><option>--image=</option>image,[images]|"all"</term> |
| 748 | <listitem><para> |
| 749 | Image filtering. If you specify one or more absolute |
| 750 | paths to binaries, OProfile will only produce profile results for those |
| 751 | binary images. This is useful for restricting the sometimes voluminous |
| 752 | output you may get otherwise, especially with |
| 753 | <option>--separate=thread</option>. Note that if you are using |
| 754 | <option>--separate=lib</option> or |
| 755 | <option>--separate=kernel</option>, then if you specify an |
| 756 | application binary, the shared libraries and kernel code |
| 757 | <emphasis>are</emphasis> included. Specify the value |
| 758 | "all" to profile everything (the default). |
| 759 | </para></listitem> |
| 760 | </varlistentry> |
| 761 | <varlistentry> |
| 762 | <term><option>--vmlinux=</option>file</term> |
| 763 | <listitem><para> |
| 764 | The (uncompressed) vmlinux kernel image file. |
| 765 | </para></listitem> |
| 766 | </varlistentry> |
| 767 | <varlistentry> |
| 768 | <term><option>--no-vmlinux</option></term> |
| 769 | <listitem><para> |
| 770 | Use this when you don't have a kernel vmlinux file, and you don't want |
| 771 | to profile the kernel. This still counts the total number of kernel samples, |
| 772 | but can't give symbol-based results for the kernel or any modules. |
| 773 | </para></listitem> |
| 774 | </varlistentry> |
| 775 | </variablelist> |
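| | <para> |
| | As a rough sketch of how the buffer settings above fit together on a 2.6 kernel, the following |
| | picks a kernel buffer size and derives a watershed of one quarter of it, the low end of the range |
| | suggested for <option>--buffer-watershed</option>. The value 65536 is an arbitrary illustration, |
| | not a recommended or default setting, and the shell variable name is hypothetical. |
| | </para> |
| | <screen> |
| | # BUFFER_SIZE=65536 |
| | # opcontrol --buffer-size=$BUFFER_SIZE --buffer-watershed=$((BUFFER_SIZE / 4)) |
| | </screen> |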
| 776 | |
| 777 | <sect2 id="opcontrolexamples"> |
| 778 | <title>Examples</title> |
| 779 | |
| 780 | <sect3 id="examplesperfctr"> |
| 781 | <title>Intel performance counter setup</title> |
| 782 | <para> |
| 783 | Here, we have a Pentium III running at 800MHz, and we want to look at where data memory |
| 784 | references are happening most, and also get results for CPU time. |
| 785 | </para> |
| 786 | <screen> |
| 787 | # opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000 |
| 788 | # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| 789 | # opcontrol --start |
| 790 | </screen> |
| 791 | </sect3> |
| 792 | |
| 793 | <sect3 id="examplesrtc"> |
| 794 | <title>RTC mode</title> |
| 795 | <para> |
| 796 | Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel. |
| 797 | </para> |
| 798 | <screen> |
| 799 | # ophelp -r |
| 800 | CPU with RTC device |
| 801 | # opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024 |
| 802 | # opcontrol --start |
| 803 | </screen> |
| 804 | </sect3> |
| 805 | |
| 806 | <sect3 id="examplesstartdaemon"> |
| 807 | <title>Starting the daemon separately</title> |
| 808 | <para> |
| 809 | If we're running 2.6 kernels, we can use <option>--start-daemon</option> to avoid |
| 810 | the profiler startup affecting results. |
| 811 | </para> |
| 812 | <screen> |
| 813 | # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| 814 | # opcontrol --start-daemon |
| 815 | # my_favourite_benchmark --init |
| 816 | # opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop |
| 817 | </screen> |
| 818 | </sect3> |
| 819 | |
| 820 | <sect3 id="exampleseparate"> |
| 821 | <title>Separate profiles for libraries and the kernel</title> |
| 822 | <para> |
| 823 | Here, we want to see a profile of the OProfile daemon itself, including when |
| 824 | it was running inside the kernel driver, and its use of shared libraries. |
| 825 | </para> |
| 826 | <screen> |
| 827 | # opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux |
| 828 | # opcontrol --start |
| 829 | # my_favourite_stress_test --run |
| 830 | # opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled |
| 831 | </screen> |
| 832 | </sect3> |
| 833 | |
| 834 | <sect3 id="examplessessions"> |
| 835 | <title>Profiling sessions</title> |
| 836 | <para> |
| 837 | It can often be useful to split up profiling data into several different |
| 838 | time periods. For example, you may want to collect data on an application's |
| 839 | startup separately from the normal runtime data. You can use the simple |
| 840 | command <command>opcontrol --save</command> to do this. For example: |
| 841 | </para> |
| 842 | <screen> |
| 843 | # opcontrol --save=blah |
| 844 | </screen> |
| 845 | <para> |
| 846 | will create a sub-directory in <filename>$SESSION_DIR/samples</filename> containing the samples |
| 847 | up to that point (the current session's sample files are moved into this |
| 848 | directory). You can then pass this session name as a parameter to the post-profiling |
| 849 | analysis tools, to only get data up to the point you named the |
| 850 | session. If you do not want to save a session, you can do |
| 851 | <command>rm -rf $SESSION_DIR/samples/sessionname</command> or, for the |
| 852 | current session, <command>opcontrol --reset</command>. |
| 853 | </para> |
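| | <para> |
| | As an illustrative sketch, assuming a session was saved under the name |
| | <filename>blah</filename> as above, the session name can be given to |
| | <command>opreport</command> with a <option>session:</option> specification, so that the |
| | report covers only that saved session: |
| | </para> |
| | <screen> |
| | # opreport session:blah |
| | </screen> |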
| 854 | </sect3> |
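| | |
| | <sect3 id="examplescallgraph"> |
| | <title>Call-graph profiling of a single binary</title> |
| | <para> |
| | This sketch combines the <option>--callgraph</option> and <option>--image</option> settings |
| | described earlier, on a platform where call-graph support is available. The depth of 6 and the |
| | path <filename>/usr/local/bin/my_app</filename> are hypothetical values chosen for illustration. |
| | </para> |
| | <screen> |
| | # opcontrol --vmlinux=/boot/2.6.0/vmlinux --callgraph=6 --image=/usr/local/bin/my_app |
| | # opcontrol --start ; /usr/local/bin/my_app ; opcontrol --stop |
| | </screen> |
| | </sect3> |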
| 855 | </sect2> |
| 856 | |
| 857 | <sect2 id="eventspec"> |
| 858 | <title>Specifying performance counter events</title> |
| 859 | <para> |
| 860 | The <option>--event</option> option to <command>opcontrol</command> |
| 861 | takes a specification that indicates how the details of each |
| 862 | hardware performance counter should be set up. If you want to |
| 863 | revert to OProfile's default setting (<option>--event</option> |
| 864 | is strictly optional), use <option>--event=default</option>. Use of this |
| 865 | option overrides all previous event selections. |
| 866 | </para> |
| 867 | <para> |
| 868 | You can pass multiple event specifications. OProfile will allocate |
| 869 | hardware counters as necessary. Note that some combinations are not |
| 870 | allowed by the CPU; running <command>opcontrol --list-events</command> gives the details |
| 871 | of each event. The event specification is a colon-separated string |
| 872 | of the form <option><emphasis>name</emphasis>:<emphasis>count</emphasis>:<emphasis>unitmask</emphasis>:<emphasis>kernel</emphasis>:<emphasis>user</emphasis></option> as described in this table: |
| 873 | </para> |
| 874 | <informaltable frame="all"> |
| 875 | <tgroup cols='2'> |
| 876 | <tbody> |
| 877 | <row><entry><option>name</option></entry><entry>The symbolic event name, e.g. <constant>CPU_CLK_UNHALTED</constant></entry></row> |
| 878 | <row><entry><option>count</option></entry><entry>The counter reset value, e.g. 100000</entry></row> |
| 879 | <row><entry><option>unitmask</option></entry><entry>The unit mask, as given in the events list, e.g. 0x0f</entry></row> |
| 880 | <row><entry><option>kernel</option></entry><entry>Whether to profile kernel code</entry></row> |
| 881 | <row><entry><option>user</option></entry><entry>Whether to profile userspace code</entry></row> |
| 882 | </tbody> |
| 883 | </tgroup> |
| 884 | </informaltable> |
| 885 | <para> |
| 886 | The last three values are optional; if you omit them (e.g. <option>--event=DATA_MEM_REFS:30000</option>), |
| 887 | they will be set to the default values (a unit mask of 0, and profiling both kernel and |
| 888 | userspace code). Note that some events require a unit mask. |
| 889 | </para> |
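| | <para> |
| | For instance, to count data memory references in userspace code only, all five fields can be |
| | given explicitly. This is a sketch: the count of 30000 is reused from the example above, the |
| | unit mask of 0 is the stated default, and the final two fields assume that 0 disables and 1 |
| | enables profiling of kernel and userspace code respectively. |
| | </para> |
| | <screen> |
| | # opcontrol --event=DATA_MEM_REFS:30000:0:0:1 |
| | </screen> |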
| 890 | <note><para> |
| 891 | For the PowerPC platforms, all events specified must be in the same group; i.e., the group number |
| 892 | appended to the event name (e.g. <constant>&lt;<emphasis>some-event-name</emphasis>&gt;_GRP9</constant>) must be the same. |
| 893 | </para></note> |
| 894 | <para> |
| 895 | If OProfile is using RTC mode, and you want to alter the default counter value, |
| 896 | you can use something like <option>--event=RTC_INTERRUPTS:2048</option>. Note the last |
| 897 | three values here are ignored. |
| 898 | If OProfile is using timer-interrupt mode, there is no configuration possible. |
| 899 | </para> |
| 900 | <para> |
| 901 | The table below lists the events selected by default |
| 902 | (<option>--event=default</option>) for the various computer architectures: |
| 903 | </para> |
| 904 | <informaltable frame="all"> |
| 905 | <tgroup cols='3'> |
| 906 | <thead><row><entry>Processor</entry><entry>cpu_type</entry><entry>Default event</entry></row></thead> |
| 907 | <tbody> |
| 908 | <row><entry>Alpha EV4</entry><entry>alpha/ev4</entry><entry>CYCLES:100000:0:1:1</entry></row> |
| 909 | <row><entry>Alpha EV5</entry><entry>alpha/ev5</entry><entry>CYCLES:100000:0:1:1</entry></row> |
| 910 | <row><entry>Alpha PCA56</entry><entry>alpha/pca56</entry><entry>CYCLES:100000:0:1:1</entry></row> |
| 911 | <row><entry>Alpha EV6</entry><entry>alpha/ev6</entry><entry>CYCLES:100000:0:1:1</entry></row> |
| 912 | <row><entry>Alpha EV67</entry><entry>alpha/ev67</entry><entry>CYCLES:100000:0:1:1</entry></row> |
| 913 | <row><entry>ARM/XScale PMU1</entry><entry>arm/xscale1</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row> |
| 914 | <row><entry>ARM/XScale PMU2</entry><entry>arm/xscale2</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row> |
| 915 | <row><entry>ARM/MPCore</entry><entry>arm/mpcore</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row> |
| 916 | <row><entry>AVR32</entry><entry>avr32</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row> |
| 917 | <row><entry>Athlon</entry><entry>i386/athlon</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 918 | <row><entry>Pentium Pro</entry><entry>i386/ppro</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 919 | <row><entry>Pentium II</entry><entry>i386/pii</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 920 | <row><entry>Pentium III</entry><entry>i386/piii</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 921 | <row><entry>Pentium M (P6 core)</entry><entry>i386/p6_mobile</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 922 | <row><entry>Pentium 4 (non-HT)</entry><entry>i386/p4</entry><entry>GLOBAL_POWER_EVENTS:100000:1:1:1</entry></row> |
| 923 | <row><entry>Pentium 4 (HT)</entry><entry>i386/p4-ht</entry><entry>GLOBAL_POWER_EVENTS:100000:1:1:1</entry></row> |
| 924 | <row><entry>Hammer</entry><entry>x86-64/hammer</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 925 | <row><entry>Family10h</entry><entry>x86-64/family10</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 926 | <row><entry>Family11h</entry><entry>x86-64/family11h</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row> |
| 927 | <row><entry>Itanium</entry><entry>ia64/itanium</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row> |
| 928 | <row><entry>Itanium 2</entry><entry>ia64/itanium2</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row> |
| 929 | <row><entry>TIMER_INT</entry><entry>timer</entry><entry>None selectable</entry></row> |
| 930 | <row><entry>IBM iseries</entry><entry>PowerPC 4/5/970</entry><entry>CYCLES:10000:0:1:1</entry></row> |
| 931 | <row><entry>IBM pseries</entry><entry>PowerPC 4/5/970/Cell</entry><entry>CYCLES:10000:0:1:1</entry></row> |
| 932 | <row><entry>IBM s390</entry><entry>timer</entry><entry>None selectable</entry></row> |
| 933 | <row><entry>IBM s390x</entry><entry>timer</entry><entry>None selectable</entry></row> |
| 934 | </tbody> |
| 935 | </tgroup> |
| 936 | </informaltable> |
| 937 | |
| 938 | </sect2> |
| 939 | |
| 940 | </sect1> |
| 941 | |
| 942 | <sect1 id="setup-jit"> |
| 943 | <title>Setting up the JIT profiling feature</title> |
| 944 | <para> |
| 945 | To gather information about JITed code from a virtual machine, |
| 946 | it needs to be instrumented with an agent library. We use the |
| 947 | agent libraries for Java in the following example. To use the |
| 948 | Java profiling feature, you must build OProfile with the "--with-java" option |
| 949 | (<xref linkend="install" />). |
| 950 | |
| 951 | </para> |
| 952 | |
| 953 | <sect2 id="setup-jit-jvm"> |
| 954 | <title>JVM instrumentation</title> |
| 955 | <para> |
| 956 | Add this to the startup parameters of the JVM (for JVMTI): |
| 957 | |
| 958 | <screen><option>-agentpath:&lt;libdir&gt;/libjvmti_oprofile.so[=&lt;options&gt;]</option> </screen> |
| 959 | or |
| 960 | <screen><option>-agentlib:jvmti_oprofile[=&lt;options&gt;]</option> </screen> |
| 961 | </para> |
| 962 | <para> |
| 963 | The JVMPI agent implementation is enabled with the command line option |
| 964 | <screen><option>-Xrunjvmpi_oprofile[:&lt;options&gt;]</option> </screen> |
| 965 | </para> |
| 966 | <para> |
| 967 | Currently, there is just one option available -- <option>debug</option>. For JVMPI, |
| 968 | the convention for specifying an option is <option>option_name=[yes|no]</option>. |
| 969 | For JVMTI, the option specification is simply the option name, implying |
| 970 | "yes"; no option specified implies "no". |
| 971 | </para> |
| 972 | <para> |
| 973 | The agent library (installed in <filename>&lt;oprof_install_dir&gt;/lib/oprofile</filename>) |
| 974 | needs to be in the library search path (e.g. add the library directory |
| 975 | to <constant>LD_LIBRARY_PATH</constant>). If the command line of |
| 976 | the JVM is not directly accessible (for example, because it is buried within shell scripts or a |
| 977 | launcher program), it may be possible to set an environment variable to add |
| 978 | the instrumentation. |
| 979 | For Sun JVMs this is <constant>JAVA_TOOL_OPTIONS</constant>. Please check |
| 980 | your JVM documentation for |
| 981 | further information on the agent startup options. |
| 982 | </para> |
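| | <para> |
| | A minimal sketch of a JVMTI launch follows. It assumes OProfile was installed under |
| | <filename>/usr/local</filename>, so that the agent library is in |
| | <filename>/usr/local/lib/oprofile</filename>, and it uses a hypothetical application |
| | <filename>my_app.jar</filename>; adjust both for your system. The <option>debug</option> |
| | option is given by name only, following the JVMTI convention described above. |
| | </para> |
| | <screen> |
| | # LD_LIBRARY_PATH=/usr/local/lib/oprofile:$LD_LIBRARY_PATH java -agentlib:jvmti_oprofile=debug -jar my_app.jar |
| | </screen> |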
| 983 | |
| 984 | </sect2> |
| 985 | </sect1> |
| 986 | |
| 987 | <sect1 id="oprofile-gui"> |
| 988 | <title>Using <command>oprof_start</command></title> |
| 989 | <para> |
| 990 | The <command>oprof_start</command> application provides a convenient way to start the profiler. |
| 991 | Note that <command>oprof_start</command> is just a wrapper around the <command>opcontrol</command> script, |
| 992 | so it does not provide more services than the script itself. |
| 993 | </para> |
| 994 | <para> |
| 995 | After <command>oprof_start</command> is started you can select the event type for each counter; |
| 996 | the sampling rate and other related parameters are explained in <xref linkend="controlling-daemon" />. |
| 997 | The "Configuration" section allows you to set general parameters such as the buffer size, kernel filename |
| 998 | etc. The counter setup interface should be self-explanatory; <xref linkend="hardware-counters" /> and related |
| 999 | links contain information on using unit masks. |
| 1000 | </para> |
| 1001 | <para> |
| 1002 | A status line shows the current status of the profiler: how long it has been running, the average |
| 1003 | number of interrupts received per second, and the total number of interrupts, over all processors. |
| 1004 | Note that quitting <command>oprof_start</command> does not stop the profiler. |
| 1005 | </para> |
| 1006 | <para> |
| 1007 | Your configuration is saved in the same file as <command>opcontrol</command> uses; that is, |
| 1008 | <filename>~/.oprofile/daemonrc</filename>. |
| 1009 | </para> |
| 1010 | |
| 1011 | </sect1> |
| 1012 | |
| 1013 | <sect1 id="detailed-parameters"> |
| 1014 | <title>Configuration details</title> |
| 1015 | |
| 1016 | <sect2 id="hardware-counters"> |
| 1017 | <title>Hardware performance counters</title> |
| 1018 | <note> |
| 1019 | <para> |
| 1020 | Your CPU type may not include the requisite support for hardware performance counters, in which case |
| 1021 | you must use OProfile in RTC mode in 2.4 (see <xref linkend="rtc" />), or timer mode in 2.6 (see <xref linkend="timer" />). |
| 1022 | You do not really need to read this section unless you are interested in using |
| 1023 | events other than the default event chosen by OProfile. |
| 1024 | </para> |
| 1025 | </note> |
| 1026 | <para> |
| 1027 | The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available |
| 1028 | from <ulink url="http://developer.intel.com/">http://developer.intel.com/</ulink>. |
| 1029 | The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <ulink |
| 1030 | url="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf"> |
| 1031 | http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</ulink>. |
| 1032 | For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation |
| 1033 | is available at <ulink url="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/"> |
| 1034 | http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</ulink>. (For example, the |
| 1035 | specific publication containing information on the performance monitor unit for the PowerPC970 is |
| 1036 | "IBM PowerPC 970FX RISC Microprocessor User's Manual.") |
| 1037 | These processors are capable of delivering an interrupt when a counter overflows. |
| 1038 | This is the basic mechanism on which OProfile is based. The delivery mode is <acronym>NMI</acronym>, |
| 1039 | so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called, |
| 1040 | the current <acronym>PC</acronym> value and the current task are recorded into the profiling structure. |
| 1041 | This allows the overflow event to be attached to a specific assembly instruction in a binary image. |
| 1042 | The daemon receives this data from the kernel, and writes it to the sample files. |
| 1043 | </para> |
| 1044 | <para> |
| 1045 | If we use an event such as <constant>CPU_CLK_UNHALTED</constant> or <constant>INST_RETIRED</constant> |
| 1046 | (<constant>GLOBAL_POWER_EVENTS</constant> or <constant>INSTR_RETIRED</constant>, respectively, on the Pentium 4), we can |
| 1047 | use the overflow counts as an estimate of actual time spent in each part of code. Alternatively we can profile interesting |
| 1048 | data such as the cache behaviour of routines with the other available counters. |
| 1049 | </para> |
| 1050 | <para> |
| 1051 | However, there are several caveats. First, there are those issues listed in the Intel manual. There is a delay |
| 1052 | between the counter overflow and the interrupt delivery that can skew results on a small scale; this means |
| 1053 | you cannot rely on the profiles at the instruction level as being perfectly accurate. |
| 1054 | If you are using an "event-mode" counter such as the cache counters, a count registered against an instruction doesn't mean |
| 1055 | that the instruction is responsible for that event. However, it implies that the counter overflowed in the dynamic |
| 1056 | vicinity of that instruction, to within a few instructions. Further details on this problem can be found in |
| 1057 | <xref linkend="interpreting" /> and also in the Digital paper "ProfileMe: A Hardware Performance Counter". |
| 1058 | </para> |
| 1059 | <para> |
| 1060 | Each counter has several configuration parameters. |
| 1061 | First, there is the unit mask: this simply further specifies what to count. |
| 1062 | Second, there is the counter value, discussed below. Third, there are settings that control whether events are counted |
| 1063 | whilst in kernel space, user space, or both. You can configure these separately for each counter. |
| 1064 | </para> |
| 1065 | <para> |
| 1066 | The counter value sets the sampling interval: after each overflow event, the counter is re-initialized |
| 1067 | such that another overflow will occur after this many events have been counted. Thus, higher |
| 1068 | values mean less-detailed profiling, and lower values mean more detail, but higher overhead. |
| 1069 | Picking a good value for this |
| 1070 | parameter is, unfortunately, somewhat of a black art. It is of course dependent on the event |
| 1071 | you have chosen. |
| 1072 | Specifying too large a value will mean not enough interrupts are generated |
| 1073 | to give a realistic profile (though this problem can be ameliorated by profiling for <emphasis>longer</emphasis>). |
| 1074 | Specifying too small a value can lead to higher performance overhead. |
| 1075 | </para> |
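| | <para> |
| | As a worked illustration of this trade-off, consider the 800MHz Pentium III from the earlier |
| | example, sampling <constant>CPU_CLK_UNHALTED</constant> with a count of 400000. Assuming the CPU |
| | is never halted, the counter overflows about 2000 times per second; halving the count to 200000 |
| | doubles the sample rate, and with it the interrupt overhead. The arithmetic below is only an |
| | estimate of the upper bound, since an idle CPU generates fewer of these events. |
| | </para> |
| | <screen> |
| | # echo $((800000000 / 400000)) |
| | 2000 |
| | # echo $((800000000 / 200000)) |
| | 4000 |
| | </screen> |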
| 1076 | |
| 1077 | </sect2> |
| 1078 | |
| 1079 | <sect2 id="rtc"> |
| 1080 | <title>OProfile in RTC mode</title> |
| 1081 | <note><para> |
| 1082 | This section applies to 2.2/2.4 kernels only. |
| 1083 | </para></note> |
| 1084 | <para> |
| 1085 | Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes |
| 1086 | some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix). |
| 1087 | On these machines, OProfile falls |
| 1088 | back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <command>rtc</command> |
| 1089 | module: you cannot have both the OProfile and rtc modules loaded, nor can rtc support be compiled into the kernel. |
| 1090 | </para> |
| 1091 | <para> |
| 1092 | RTC mode is less capable than the hardware counters mode; in particular, it is unable to profile sections of |
| 1093 | the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value |
| 1094 | corresponds to the number of interrupts generated per second (that is, a higher number means a better profiling |
| 1095 | resolution, and higher overhead). The current implementation of the real-time clock supports only power-of-two |
| 1096 | sampling rates from 2 to 4096 per second. Other values within this range are rounded to the nearest power of |
| 1097 | two. |
| 1098 | </para> |
| 1099 | <para> |
| 1100 | You can force use of the RTC interrupt with the <option>force_rtc=1</option> module parameter. |
| 1101 | </para> |
| 1102 | <para> |
| 1103 | Setting the value from the GUI should be straightforward. On the command line, you need to specify the |
| 1104 | event to <command>opcontrol</command>, e.g.: |
| 1105 | </para> |
| 1106 | <para><command>opcontrol --event=RTC_INTERRUPTS:256</command></para> |
| 1107 | </sect2> |
| 1108 | |
| 1109 | <sect2 id="timer"> |
| 1110 | <title>OProfile in timer interrupt mode</title> |
| 1111 | <note><para> |
| 1112 | This section applies to 2.6 kernels and above only. |
| 1113 | </para></note> |
| 1114 | <para> |
| 1115 | In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver |
| 1116 | falls back to using the timer interrupt for profiling. Like the RTC mode in 2.4 kernels, this is not able to |
| 1117 | profile code that has interrupts disabled. Note that there are no configuration parameters for |
| 1118 | setting this, unlike the RTC and hardware performance counter setup. |
| 1119 | </para> |
| 1120 | <para> |
| 1121 | You can force use of the timer interrupt by using the <option>timer=1</option> module |
| 1122 | parameter (or <option>oprofile.timer=1</option> on the boot command line if OProfile is |
| 1123 | built-in). |
| 1124 | </para> |
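| | <para> |
| | For a modular build, this might look like the following sketch. It assumes the driver module is |
| | named <filename>oprofile</filename> (as the boot parameter above suggests) and that any previously |
| | loaded instance is first unloaded with <command>opcontrol --deinit</command>. |
| | </para> |
| | <screen> |
| | # opcontrol --deinit |
| | # modprobe oprofile timer=1 |
| | </screen> |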
| 1125 | </sect2> |
| 1126 | |
| 1127 | <sect2 id="p4"> |
| 1128 | <title>Pentium 4 support</title> |
| 1129 | <para> |
| 1130 | The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event |
| 1131 | selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a |
| 1132 | particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their |
| 1133 | operation. Unfortunately the relationship between these registers is quite complex; they cannot all be used with one |
| 1134 | another at any time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of |
| 1135 | one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar |
| 1136 | to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU. |
| 1137 | </para> |
| 1138 | <para> |
| 1139 | There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store |
| 1140 | (DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described |
| 1141 | above. Performance monitoring hardware on Pentium 4 / Xeon processors with hyper-threading enabled (multiple logical |
| 1142 | processors on a single die) is not supported in 2.4 kernels (you can use OProfile if you disable hyper-threading, |
| 1143 | though). |
| 1144 | </para> |
| 1145 | </sect2> |
| 1146 | |
| 1147 | <sect2 id="ia64"> |
| 1148 | <title>Intel Itanium 2 support</title> |
| 1149 | <para> |
| 1150 | The Itanium 2 performance monitoring unit (PMU) organizes the counters as four |
| 1151 | pairs of performance event monitoring registers. Each pair is composed of a |
| 1152 | Performance Monitoring Configuration (PMC) register and Performance Monitoring |
| 1153 | Data (PMD) register. The PMC selects the performance event being monitored and |
| 1154 | the PMD determines the sampling interval. The IA64 Performance Monitoring Unit |
| 1155 | (PMU) triggers sampling with maskable interrupts. Thus, samples will not occur |
| 1156 | in sections of the IA64 kernel where interrupts are disabled. |
| 1157 | </para> |
| 1158 | <para> |
| 1159 | None of the advanced features of the Itanium 2 performance monitoring unit |
| 1160 | such as opcode matching, address range matching, or precise event sampling are |
| 1161 | supported by this version of OProfile. The Itanium 2 support only maps OProfile's |
| 1162 | existing interrupt-based model to the PMU hardware. |
| 1163 | </para> |
| 1164 | </sect2> |
| 1165 | |
| 1166 | <sect2 id="ppc64"> |
| 1167 | <title>PowerPC64 support</title> |
| 1168 | <para> |
| 1169 | The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors |
| 1170 | consists of between 4 and 8 counters (depending on the model), plus three |
| 1171 | special purpose registers used for programming the counters -- MMCR0, MMCR1, |
| 1172 | and MMCRA. Advanced features such as instruction matching and thresholding are |
| 1173 | not supported by this version of OProfile. |
| 1174 | <note><para>Later versions of the IBM POWER5+ processor (beginning with revision 3.0) |
| 1175 | run the performance monitor unit in POWER6 mode, effectively removing OProfile's |
| 1176 | access to counters 5 and 6. These two counters are dedicated to counting |
| 1177 | instructions completed and cycles, respectively. In POWER6 mode, however, the |
| 1178 | counters do not generate an interrupt on overflow and so are unusable by |
| 1179 | OProfile. Kernel versions 2.6.23 and higher will recognize this mode |
| 1180 | and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem. |
| 1181 | OProfile userspace responds to this cpu_type by removing these counters from |
| 1182 | the list of potential events to count. Without this kernel support, attempts |
| 1183 | to profile using an event from one of these counters will yield incorrect |
| 1184 | results -- typically, zero (or near zero) samples in the generated report. |
| 1185 | </para></note> |
| 1186 | </para> |
| 1187 | |
| 1188 | </sect2> |
| 1189 | |
| 1190 | <sect2 id="cell-be"> |
| 1191 | <title>Cell Broadband Engine support</title> |
| 1192 | <para> |
| 1193 | The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing |
| 1194 | Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each |
| 1195 | consist of a processing unit (PPU and SPU, respectively) and other hardware |
| 1196 | components, such as memory controllers. |
| 1197 | </para> |
| 1198 | <para> |
| 1199 | A PPU has two hardware threads (aka "virtual CPUs"). The performance monitor |
| 1200 | unit of the CBE collects event information on one hardware thread at a time. |
| 1201 | Therefore, when profiling PPE events, |
| 1202 | OProfile collects the profile based on the selected events by time slicing the |
| 1203 | performance counter hardware between the two threads. The user must ensure the |
| 1204 | collection interval is long enough so that the time spent collecting data for |
| 1205 | each PPU is sufficient to obtain a good profile. |
| 1206 | </para> |
| 1207 | <para> |
| 1208 | To profile an SPU application, the user should specify the SPU_CYCLES event. |
| 1209 | When starting OProfile with SPU_CYCLES, the opcontrol script enforces certain |
| 1210 | separation parameters (separate=cpu,lib) to ensure that sufficient information |
| 1211 | is collected in the sample data in order to generate a complete report. The |
| 1212 | --merge=cpu option can be used to obtain a more readable report if analyzing |
| 1213 | the performance of each separate SPU is not necessary. |
| 1214 | </para> |
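| | <para> |
| | As a rough sketch of the workflow described above, an SPU profiling run might look like |
| | the following. It assumes OProfile has already been set up on the system, and the count |
| | of 100000 is only an illustrative assumption; check <command>opcontrol --list-events</command> |
| | for the minimum count allowed for SPU_CYCLES on your machine. |
| | </para> |
| | <screen> |
| | # opcontrol --event=SPU_CYCLES:100000 |
| | # opcontrol --start |
| | ... run the SPU application ... |
| | # opcontrol --dump |
| | # opreport --merge=cpu |
| | </screen> |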
| 1215 | <para> |
| 1216 | Profiling with an SPU event (events 4100 through 4163) is not compatible with any other |
| 1217 | event. Furthermore, only one SPU event can be specified at a time. The hardware only |
| 1218 | supports profiling on one SPU per node at a time. The OProfile kernel code time slices |
| 1219 | between the eight SPUs to collect data on all SPUs. |
| 1220 | </para> |
| 1221 | <para> |
| 1222 | SPU profile reports have some unique characteristics compared to reports for |
| 1223 | standard architectures: |
| 1224 | </para> |
| 1225 | <itemizedlist> |
| 1226 | <listitem>Typically no "app name" column. This is really standard OProfile behavior |
| 1227 | when the report contains samples for just a single application, which is |
| 1228 | commonly the case when profiling SPUs.</listitem> |
| 1229 | <listitem>"CPU" equates to "SPU"</listitem> |
| 1230 | <listitem>Specifying '--long-filenames' on the opreport command does not always result |
| 1231 | in long filenames. This happens when the SPU application code is embedded in |
| 1232 | the PPE executable or shared library. The embedded SPU ELF data contains only the |
| 1233 | short filename (i.e., no path information) for the SPU binary file that was used as |
| 1234 | the source for embedding. Only the short filename is used because |
| 1235 | the original SPU binary file may not exist or be accessible at runtime. The performance |
| 1236 | analyst must have sufficient knowledge of the application to be able to correlate the |
| 1237 | SPU binary image names found in the report to the application's source files. |
| 1238 | <note> |
| 1239 | Compile the application with -g and generate the OProfile report |
| 1240 | with -g to facilitate finding the right source file(s) on which to focus. |
| 1241 | </note> |
| 1242 | </listitem> |
| 1243 | </itemizedlist> |
| 1244 | |
| 1245 | </sect2> |
| 1246 | |
| 1247 | <sect2 id="amd-ibs-support"> |
| 1248 | <title>AMD64 (x86_64) Instruction-Based Sampling (IBS) support</title> |
| 1249 | |
| 1250 | <para> |
| 1251 | Instruction-Based Sampling (IBS) is a new performance measurement technique |
| 1252 | available on AMD Family 10h processors. Traditional performance counter |
| 1253 | sampling is not precise enough to isolate performance issues to individual |
| 1254 | instructions. IBS, however, precisely identifies instructions which are not |
| 1255 | making the best use of the processor pipeline and memory hierarchy. |
| 1256 | For more information, please refer to the "Instruction-Based Sampling: |
| 1257 | A New Performance Analysis Technique for AMD Family 10h Processors" ( |
| 1258 | <ulink url="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf"> |
| 1259 | http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</ulink>). |
| 1260 | There are two IBS profile types, described in the following sections. |
| 1261 | </para> |
| 1262 | |
| 1263 | <sect3 id="ibs-fetch"> |
| 1264 | <title>IBS Fetch</title> |
| 1265 | |
| 1266 | <para> |
| 1267 | IBS fetch sampling is a statistical sampling method which counts completed |
| 1268 | fetch operations. When the number of completed fetch operations reaches the |
| 1269 | maximum fetch count (the sampling period), IBS tags the fetch operation and |
| 1270 | monitors that operation until it either completes or aborts. When a tagged |
| 1271 | fetch completes or aborts, a sampling interrupt is generated and an IBS fetch |
| 1272 | sample is taken. An IBS fetch sample contains a timestamp, the identifier of |
| 1273 | the interrupted process, the virtual fetch address, and several event flags |
| 1274 | and values that describe what happened during the fetch operation. |
| 1275 | </para> |
| 1276 | |
| 1277 | </sect3> |
| 1278 | |
| 1279 | <sect3 id="ibs-op"> |
| 1280 | <title>IBS Op</title> |
| 1281 | |
| 1282 | <para> |
| 1283 | IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 |
| 1284 | instructions. Two options are available for selecting ops for sampling: |
| 1285 | </para> |
| 1286 | |
| 1287 | <itemizedlist> |
| 1288 | <listitem> |
| 1289 | Cycles-based selection counts CPU clock cycles. The op is tagged and monitored |
| 1290 | when the count reaches a threshold (the sampling period) and a valid op is |
| 1291 | available. |
| 1292 | </listitem> |
| 1293 | |
| 1294 | <listitem> |
| 1295 | Dispatched op-based selection counts dispatched macro-ops. |
| 1296 | When the count reaches a threshold, the next valid op is tagged and monitored. |
| 1297 | </listitem> |
| 1298 | </itemizedlist> |
| 1299 | |
| 1300 | <para> |
| 1301 | In both cases, an IBS sample is generated only if the tagged op retires. |
| 1302 | Thus, IBS op event information does not measure speculative execution activity. |
| 1303 | The execution stages of the pipeline monitor the tagged macro-op. When the |
| 1304 | tagged macro-op retires, a sampling interrupt is generated and an IBS op |
| 1305 | sample is taken. An IBS op sample contains a timestamp, the identifier of |
| 1306 | the interrupted process, the virtual address of the AMD64 instruction from |
| 1307 | which the op was issued, and several event flags and values that describe |
| 1308 | what happened when the macro-op executed. |
| 1309 | </para> |
| 1310 | |
| 1311 | </sect3> |
| 1312 | |
| 1313 | <para> |
| 1314 | IBS profiling is enabled by specifying IBS performance events |
| 1315 | through the "--event=" option. These events are listed in the output of |
| 1316 | <function>opcontrol --list-events</function>. |
| 1317 | </para> |
| 1318 | |
| 1319 | <screen> |
| 1320 | opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt; |
| 1321 | opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt; |
| 1322 | |
| 1323 | Note: * All IBS fetch events must have the same event count and unitmask, |
| 1324 | as must all IBS op events. |
| 1325 | </screen> |
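| | <para> |
| | For illustration only, a concrete invocation might look like the following. The event |
| | names IBS_FETCH_ALL and IBS_OP_ALL and the count of 250000 are assumptions made for this |
| | sketch; use the names and minimum counts that <command>opcontrol --list-events</command> |
| | reports on your processor. The trailing fields are the unit mask (0) and the kernel and |
| | user flags (both 1, i.e. profile both). |
| | </para> |
| | <screen> |
| | # opcontrol --event=IBS_FETCH_ALL:250000:0:1:1 --event=IBS_OP_ALL:250000:0:1:1 |
| | </screen> |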
| 1326 | |
| 1327 | </sect2> |
| 1328 | |
| 1329 | |
| 1330 | <sect2 id="misuse"> |
| 1331 | <title>Dangerous counter settings</title> |
| 1332 | <para> |
| 1333 | OProfile is a low-level profiler which allows continuous profiling at a low overhead cost. |
| 1334 | If too low a count reset value is set for a counter, the system can become overloaded with counter |
| 1335 | interrupts and appear to have frozen. Whilst some validation is done, it |
| 1336 | is not foolproof. |
| 1337 | </para> |
| 1338 | <note><para> |
| 1339 | This can happen as follows: When the profiler count |
| 1340 | reaches zero an NMI handler is called which stores the sample values in an internal buffer, then resets the counter |
| 1341 | to its original value. If the count is very low, a pending NMI can be sent before the NMI handler has |
| 1342 | completed. Due to the priority of the NMI, the local APIC delivers the pending interrupt immediately after |
| 1343 | completion of the previous interrupt handler, and control never returns to other parts of the system. |
| 1344 | In this way the system seems to be frozen. |
| 1345 | </para></note> |
| 1346 | <para>If this happens, it will be impossible to bring the system back to a workable state. |
| 1347 | There is no way to provide real security against this happening, other than making sure to use a reasonable value |
| 1348 | for the counter reset. For example, setting the <constant>CPU_CLK_UNHALTED</constant> event type with a ridiculously low reset count (e.g. 500) |
| 1349 | is likely to freeze the system. |
| 1350 | </para> |
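| | <para> |
| | To put rough numbers on this, take the 863.195 MHz machine and the count of 23150 that |
| | appear in the <command>opreport</command> examples in this manual, and assume (purely for |
| | the sake of the sketch) that the CPU is never halted, so that |
| | <constant>CPU_CLK_UNHALTED</constant> advances on every cycle: |
| | </para> |
| | <screen> |
| | 863195000 cycles/s / 500 cycles per sample   = about 1726390 NMIs per second |
| | 863195000 cycles/s / 23150 cycles per sample = about 37287 NMIs per second |
| | </screen> |
| | <para> |
| | With a reset count of 500, only 500 cycles separate one NMI from the next, which may well |
| | be less than the handler itself needs; the larger count leaves far more time between |
| | interrupts for the rest of the system to run. |
| | </para> |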
| 1351 | <para> |
| 1352 | In short: <command>Don't try a foolish sample count value</command>. Unfortunately the definition of a foolish value |
| 1353 | depends on the event type; if ever in doubt, e-mail </para> |
| 1354 | <address><email>oprofile-list@lists.sf.net</email>.</address> |
| 1355 | </sect2> |
| 1356 | |
| 1357 | </sect1> |
| 1358 | |
| 1359 | </chapter> |
| 1360 | |
| 1361 | <chapter id="results"> |
| 1362 | <title>Obtaining results</title> |
| 1363 | <para> |
| 1364 | OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often, |
| 1365 | OProfile does a little <emphasis>too</emphasis> good a job of keeping overhead low, and no data reaches |
| 1366 | the profiler. This can happen on lightly-loaded machines. Remember you can force a dump at any time with: |
| 1367 | </para> |
| 1368 | <para><command>opcontrol --dump</command></para> |
| 1369 | <para>Remember to do this before complaining there is no profiling data! |
| 1370 | Now that we've got some data, it has to be processed. That's the job of <command>opreport</command>, |
| 1371 | <command>opannotate</command>, or <command>opgprof</command>. |
| 1372 | </para> |
| 1373 | |
| 1374 | <sect1 id="profile-spec"> |
| 1375 | <title>Profile specifications</title> |
| 1376 | |
| 1377 | <para> |
| 1378 | All of the analysis tools take a <emphasis>profile specification</emphasis>. |
| 1379 | This is a set of definitions that describe which actual profiles should be |
| 1380 | examined. The simplest profile specification is empty: this will match all |
| 1381 | the available profile files for the current session (this is what happens |
| 1382 | when you do <command>opreport</command>). |
| 1383 | </para> |
| 1384 | <para> |
| 1385 | Specification parameters are of the form <option>name:value[,value]</option>. |
| 1386 | For example, if I wanted to get a combined symbol summary for |
| 1387 | <filename>/bin/myprog</filename> and <filename>/bin/myprog2</filename>, |
| 1388 | I could do <command>opreport -l image:/bin/myprog,/bin/myprog2</command>. |
| 1389 | As a special case, you don't actually need to specify the <option>image:</option> |
| 1390 | part here: anything left on the command line is assumed to be an |
| 1391 | <option>image:</option> name. Similarly, if no <option>session:</option> |
| 1392 | is specified, then <option>session:current</option> is assumed ("current" |
| 1393 | is a special name for the current or most recent profiling session). |
| 1394 | </para> |
| 1395 | <para> |
| 1396 | In addition to the comma-separated list shown above, some of the |
| 1397 | specification parameters can take <command>glob</command>-style |
| 1398 | values. For example, if I want to see image summaries for all |
| 1399 | binaries profiled in <filename>/usr/bin/</filename>, I could do |
| 1400 | <command>opreport image:/usr/bin/\*</command>. Note the necessity |
| 1401 | to escape the special character from the shell. |
| 1402 | </para> |
| 1403 | <para> |
| 1404 | For <command>opreport</command>, profile specifications can be used to |
| 1405 | define two profiles, giving differential output. This is done by |
| 1406 | enclosing each of the two specifications within curly braces, as shown |
| 1407 | in the examples below. Any specifications outside of curly braces are |
| 1408 | shared across both. |
| 1409 | </para> |
| 1410 | |
| 1411 | <sect2 id="profile-spec-examples"> |
| 1412 | <title>Examples</title> |
| 1413 | |
| 1414 | <para> |
| 1415 | Image summaries for all profiles with <constant>DATA_MEM_REFS</constant> |
| 1416 | samples in the saved session called "stresstest": |
| 1417 | </para> |
| 1418 | <screen> |
| 1419 | # opreport session:stresstest event:DATA_MEM_REFS |
| 1420 | </screen> |
| 1421 | |
| 1422 | <para> |
| 1423 | Symbol summary for the application called "test_sym53c8xx,9xx". Note the |
| 1424 | escaping is necessary as <option>image:</option> takes a comma-separated list. |
| 1425 | </para> |
| 1426 | <screen> |
| 1427 | # opreport -l ./test/test_sym53c8xx\,9xx |
| 1428 | </screen> |
| 1429 | |
| 1430 | <para> |
| 1431 | Image summaries for all binaries in the <filename>test</filename> directory, |
| 1432 | except <filename>boring-test</filename>: |
| 1433 | </para> |
| 1434 | <screen> |
| 1435 | # opreport image:./test/\* image-exclude:./test/boring-test |
| 1436 | </screen> |
| 1437 | |
| 1438 | <para> |
| 1439 | Differential profile of a binary stored in two archives: |
| 1440 | </para> |
| 1441 | <screen> |
| 1442 | # opreport -l /bin/bash { archive:./orig } { archive:./new } |
| 1443 | </screen> |
| 1444 | |
| 1445 | <para> |
| 1446 | Differential profile of an archived binary with the current session: |
| 1447 | </para> |
| 1448 | <screen> |
| 1449 | # opreport -l /bin/bash { archive:./orig } { } |
| 1450 | </screen> |
| 1451 | |
| 1452 | </sect2> <!-- profile spec examples --> |
| 1453 | |
| 1454 | <sect2 id="profile-spec-details"> |
| 1455 | <title>Profile specification parameters</title> |
| 1456 | |
| 1457 | <variablelist> |
| 1458 | <varlistentry> |
| 1459 | <term><option>archive:</option><emphasis>archivepath</emphasis></term> |
| 1460 | <listitem><para> |
| 1461 | A path to an archive made with <command>oparchive</command>. |
| 1462 | Absence of this tag, unlike others, means "the current system", |
| 1463 | equivalent to specifying "archive:". |
| 1464 | </para></listitem> |
| 1465 | </varlistentry> |
| 1466 | <varlistentry> |
| 1467 | <term><option>session:</option><emphasis>sessionlist</emphasis></term> |
| 1468 | <listitem><para> |
| 1469 | A comma-separated list of session names to resolve in. Absence of this |
| 1470 | tag, unlike others, means "the current session", equivalent to |
| 1471 | specifying "session:current". |
| 1472 | </para></listitem> |
| 1473 | </varlistentry> |
| 1474 | <varlistentry> |
| 1475 | <term><option>session-exclude:</option><emphasis>sessionlist</emphasis></term> |
| 1476 | <listitem><para> |
| 1477 | A comma-separated list of sessions to exclude. |
| 1478 | </para></listitem> |
| 1479 | </varlistentry> |
| 1480 | <varlistentry> |
| 1481 | <term><option>image:</option><emphasis>imagelist</emphasis></term> |
| 1482 | <listitem><para> |
| 1483 | A comma-separated list of image names to resolve. Each entry may be a relative |
| 1484 | path, a <command>glob</command>-style name, or a full path, e.g.</para> |
| 1485 | <screen>opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</screen> |
| 1486 | </listitem> |
| 1487 | </varlistentry> |
| 1488 | |
| 1489 | <varlistentry> |
| 1490 | <term><option>image-exclude:</option><emphasis>imagelist</emphasis></term> |
| 1491 | <listitem><para> |
| 1492 | Same as <option>image:</option>, but the matching images are excluded. |
| 1493 | </para></listitem> |
| 1494 | </varlistentry> |
| 1495 | |
| 1496 | <varlistentry> |
| 1497 | <term><option>lib-image:</option><emphasis>imagelist</emphasis></term> |
| 1498 | <listitem><para> |
| 1499 | Same as <option>image:</option>, but only for images that are for |
| 1500 | a particular primary binary image (namely, an application). This only |
| 1501 | makes sense to use if you're using <option>--separate</option>. |
| 1502 | This includes kernel modules and the kernel when using |
| 1503 | <option>--separate=kernel</option>. |
| 1504 | </para></listitem> |
| 1505 | </varlistentry> |
| 1506 | |
| 1507 | <varlistentry> |
| 1508 | <term><option>lib-image-exclude:</option><emphasis>imagelist</emphasis></term> |
| 1509 | <listitem><para> |
| 1510 | Same as <option>lib-image:</option>, but the matching images |
| 1511 | are excluded. |
| 1512 | </para></listitem> |
| 1513 | </varlistentry> |
| 1514 | |
| 1515 | <varlistentry> |
| 1516 | <term><option>event:</option><emphasis>eventlist</emphasis></term> |
| 1517 | <listitem><para> |
| 1518 | The symbolic event name to match on, e.g. <option>event:DATA_MEM_REFS</option>. |
| 1519 | You can pass a list of events for side-by-side comparison with <command>opreport</command>. |
| 1520 | When using the timer interrupt, the event is always "TIMER". |
| 1521 | </para></listitem> |
| 1522 | </varlistentry> |
| 1523 | |
| 1524 | <varlistentry> |
| 1525 | <term><option>count:</option><emphasis>eventcountlist</emphasis></term> |
| 1526 | <listitem><para> |
| 1527 | The event count to match on, e.g. <option>event:DATA_MEM_REFS count:30000</option>. |
| 1528 | Note that this value refers to the setting used for <command>opcontrol</command> |
| 1529 | only, and has nothing to do with the sample counts in the profile data |
| 1530 | itself. |
| 1531 | You can pass a list of counts for side-by-side comparison with <command>opreport</command>. |
| 1532 | When using the timer interrupt, the count is always 0 (indicating it cannot be set). |
| 1533 | </para></listitem> |
| 1534 | </varlistentry> |
| 1535 | |
| 1536 | <varlistentry> |
| 1537 | <term><option>unit-mask:</option><emphasis>masklist</emphasis></term> |
| 1538 | <listitem><para> |
| 1539 | The unit mask value of the event to match on, e.g. <option>unit-mask:1</option>. |
| 1540 | You can pass a list of unit masks for side-by-side comparison with <command>opreport</command>. |
| 1541 | </para></listitem> |
| 1542 | </varlistentry> |
| 1543 | |
| 1544 | <varlistentry> |
| 1545 | <term><option>cpu:</option><emphasis>cpulist</emphasis></term> |
| 1546 | <listitem><para> |
| 1547 | Only consider profiles for the given numbered CPU (starting from zero). |
| 1548 | This is only useful when using CPU profile separation. |
| 1549 | </para></listitem> |
| 1550 | </varlistentry> |
| 1551 | |
| 1552 | <varlistentry> |
| 1553 | <term><option>tgid:</option><emphasis>pidlist</emphasis></term> |
| 1554 | <listitem><para> |
| 1555 | Only consider profiles for the given task groups. Unless some program |
| 1556 | is using threads, the task group ID of a process is the same |
| 1557 | as its process ID. This option corresponds to the POSIX |
| 1558 | notion of a thread group. |
| 1559 | This is only useful when using per-process profile separation. |
| 1560 | </para></listitem> |
| 1561 | </varlistentry> |
| 1562 | |
| 1563 | <varlistentry> |
| 1564 | <term><option>tid:</option><emphasis>tidlist</emphasis></term> |
| 1565 | <listitem><para> |
| 1566 | Only consider profiles for the given threads. When using |
| 1567 | recent thread libraries, all threads in a process share the |
| 1568 | same task group ID, but have different thread IDs. You can |
| 1569 | use this option in combination with <option>tgid:</option> to |
| 1570 | restrict the results to particular threads within a process. |
| 1571 | This is only useful when using per-process profile separation. |
| 1572 | </para></listitem> |
| 1573 | </varlistentry> |
| 1574 | </variablelist> |
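| | <para> |
| | As an illustration of how these parameters combine, the following sketches reuse values |
| | that appear in this section; the image name and the IDs are placeholders rather than real |
| | data. The first restricts a symbol summary to one task group (which assumes per-process |
| | separation was enabled while profiling). The second selects DATA_MEM_REFS samples |
| | collected with a count of 30000 on CPUs 0 and 1 (which assumes CPU separation). |
| | </para> |
| | <screen> |
| | # opreport -l image:/bin/myprog tgid:3433 |
| | # opreport event:DATA_MEM_REFS count:30000 cpu:0,1 |
| | </screen> |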
| 1575 | |
| 1576 | </sect2> |
| 1577 | |
| 1578 | <sect2 id="locating-and-managing-binary-images"> |
| 1579 | <title>Locating and managing binary images</title> |
| 1580 | <para> |
| 1581 | Each session's sample files can be found in the $SESSION_DIR/samples/ directory (default: <filename>/var/lib/oprofile/samples/</filename>). |
| 1582 | These are used, along with the binary image files, to produce human-readable data. |
| 1583 | In some circumstances (kernel modules in an initrd, or modules on 2.6 kernels), OProfile |
| 1584 | will not be able to find the binary images. All the tools have an <option>--image-path</option> |
| 1585 | option to which you can pass a comma-separated list of alternate paths to search. For example, |
| 1586 | I can let OProfile find my 2.6 modules by using <command>--image-path /lib/modules/2.6.0/kernel/</command>. |
| 1587 | It is your responsibility to ensure that the correct images are found when using this |
| 1588 | option. |
| 1589 | </para> |
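| | <para> |
| | For example, a symbol summary for a single module might be requested as follows. The |
| | module name <filename>mymodule.ko</filename> is hypothetical, and the path should be |
| | adjusted to match the running kernel. |
| | </para> |
| | <screen> |
| | # opreport -l --image-path /lib/modules/2.6.0/kernel/ mymodule.ko |
| | </screen> |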
| 1590 | <para> |
| 1591 | Note that if a binary image changes after the sample file was created, you won't be able to get useful |
| 1592 | symbol-based data out. This situation is detected for you. If you replace a binary, you should |
| 1593 | make sure to save the old binary if you need to do comparative profiles. |
| 1594 | </para> |
| 1595 | |
| 1596 | </sect2> |
| 1597 | |
| 1598 | <sect2 id="no-results"> |
| 1599 | <title>What to do when you don't get any results</title> |
| 1600 | <para> |
| 1601 | When attempting to get output, you may see the error: |
| 1602 | </para> |
| 1603 | <screen> |
| 1604 | error: no sample files found: profile specification too strict ? |
| 1605 | </screen> |
| 1606 | <para> |
| 1607 | What this is saying is that the profile specification you passed in, |
| 1608 | when matched against the available sample files, resulted in no matches. |
| 1609 | There are a number of reasons this might happen: |
| 1610 | </para> |
| 1611 | <variablelist> |
| 1612 | <varlistentry><term>spelling</term><listitem><para> |
| 1613 | You specified a binary name, but spelt it wrongly. Check your spelling! |
| 1614 | </para></listitem></varlistentry> |
| 1615 | <varlistentry><term>profiler wasn't running</term><listitem><para> |
| 1616 | Make very sure that OProfile was actually up and running when you ran |
| 1617 | the binary. |
| 1618 | </para></listitem></varlistentry> |
| 1619 | <varlistentry><term>binary didn't run long enough</term><listitem><para> |
| 1620 | Remember OProfile is a statistical profiler - you're not guaranteed to |
| 1621 | get samples for short-running programs. You can help this by using a |
| 1622 | lower count for the performance counter, so there are a lot more samples |
| 1623 | taken per second. |
| 1624 | </para></listitem></varlistentry> |
| 1625 | <varlistentry><term>binary spent most of its time in libraries</term><listitem><para> |
| 1626 | Similarly, if the binary spends little time in the main binary image |
| 1627 | itself, with most of it spent in shared libraries it uses, you might |
| 1628 | not see any samples for the binary image itself. You can check this |
| 1629 | by using <command>opcontrol --separate=lib</command> before the |
| 1630 | profiling session, so <command>opreport</command> and friends show |
| 1631 | the library profiles on a per-application basis. |
| 1632 | </para></listitem></varlistentry> |
| 1633 | <varlistentry><term>specification was really too strict</term><listitem><para> |
| 1634 | For example, you specified something like <option>tgid:3433</option>, |
| 1635 | but no task with that group ID ever ran the code. |
| 1636 | </para></listitem></varlistentry> |
| 1637 | <varlistentry><term>binary didn't generate any events</term><listitem><para> |
| 1638 | If you're using a particular event counter, for example counting MMX |
| 1639 | operations, the code might simply have not generated any events in the |
| 1640 | first place. Verify the code you're profiling does what you expect it |
| 1641 | to. |
| 1642 | </para></listitem></varlistentry> |
| 1643 | <varlistentry><term>you didn't specify kernel module name correctly</term><listitem><para> |
| 1644 | If you're using 2.6 kernels, and trying to get reports for a kernel |
| 1645 | module, make sure to use the <option>-p</option> option, and specify the |
| 1646 | module name <emphasis>with</emphasis> the <filename>.ko</filename> |
| 1647 | extension. Check if the module is one loaded from initrd. |
| 1648 | </para></listitem></varlistentry> |
| 1649 | </variablelist> |
| 1650 | |
| 1651 | </sect2> |
| 1652 | |
| 1653 | </sect1> <!-- profile-spec --> |
| 1654 | |
| 1655 | <sect1 id="opreport"> |
| 1656 | <title>Image summaries and symbol summaries (<command>opreport</command>)</title> |
| 1657 | <para> |
| 1658 | The <command>opreport</command> utility is the primary utility you will use for |
| 1659 | getting formatted data out of OProfile. It produces two types of data: image summaries |
| 1660 | and symbol summaries. An image summary lists the number of samples for individual |
| 1661 | binary images such as libraries or applications. Symbol summaries provide per-symbol |
| 1662 | profile data. In the following example, we're getting an image summary for the whole |
| 1663 | system: |
| 1664 | </para> |
| 1665 | <screen> |
| 1666 | $ opreport --long-filenames |
| 1667 | CPU: PIII, speed 863.195 MHz (estimated) |
| 1668 | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 |
| 1669 | 905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus |
| 1670 | 214320 14.1338 /boot/2.6.0/vmlinux |
| 1671 | 103450 6.8222 /lib/i686/libc-2.3.2.so |
| 1672 | 60160 3.9674 /usr/local/bin/madplay |
| 1673 | 31769 2.0951 /usr/local/oprofile-pp/bin/oprofiled |
| 1674 | 26550 1.7509 /usr/lib/libartsflow.so.1.0.0 |
| 1675 | 23906 1.5765 /usr/bin/as |
| 1676 | 18770 1.2378 /oprofile |
| 1677 | 15528 1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5 |
| 1678 | 11979 0.7900 /usr/X11R6/bin/XFree86 |
| 1679 | 11328 0.7471 /bin/bash |
| 1680 | ... |
| 1681 | </screen> |
| 1682 | <para> |
| 1683 | If we had specified <option>--symbols</option> in the previous command, we would have |
| 1684 | gotten a symbol summary of all the images across the entire system. We can restrict this to only |
| 1685 | part of the system profile; for example, |
| 1686 | below is a symbol summary of the OProfile daemon. Note that as we used |
| 1687 | <command>opcontrol --separate=kernel</command>, symbols from images that <command>oprofiled</command> |
| 1688 | has used are also shown. |
| 1689 | </para> |
| 1690 | <screen> |
| 1691 | $ opreport -l `which oprofiled` 2>/dev/null | more |
| 1692 | CPU: PIII, speed 863.195 MHz (estimated) |
| 1693 | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 |
| 1694 | vma samples % image name symbol name |
| 1695 | 0804be10 14971 28.1993 oprofiled odb_insert |
| 1696 | 0804afdc 7144 13.4564 oprofiled pop_buffer_value |
| 1697 | c01daea0 6113 11.5144 vmlinux __copy_to_user_ll |
| 1698 | 0804b060 2816 5.3042 oprofiled opd_put_sample |
| 1699 | 0804b4a0 2147 4.0441 oprofiled opd_process_samples |
| 1700 | 0804acf4 1855 3.4941 oprofiled opd_put_image_sample |
| 1701 | 0804ad84 1766 3.3264 oprofiled opd_find_image |
| 1702 | 0804a5ec 1084 2.0418 oprofiled opd_find_module |
| 1703 | 0804ba5c 741 1.3957 oprofiled odb_hash_add_node |
| 1704 | ... |
| 1705 | </screen> |
| 1706 | |
| 1707 | <para> |
| 1708 | These are the two basic kinds of report you are most likely to use regularly, but <command>opreport</command> |
| 1709 | can do a lot more than that, as described below. |
| 1710 | </para> |
| 1711 | |
| 1712 | <sect2 id="opreport-merging"> |
| 1713 | <title>Merging separate profiles</title> |
| 1714 | |
| 1715 | <para>If you have used one of the <option>--separate=</option> options |
| 1716 | whilst profiling, there can be several separate profiles for |
| 1717 | a single binary image within a session. Normally the output |
| 1718 | will keep these images separated (so, for example, the image summary |
| 1719 | output shows library image summaries on a per-application basis, |
| 1720 | when using <option>--separate=lib</option>). |
| 1721 | Sometimes it can be useful to merge these profiles back together |
| 1722 | before generating a report. The <option>--merge</option> option allows |
| 1723 | you to do that.</para> |
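| | <para> |
| | A minimal sketch, assuming the session was profiled with <option>--separate=lib</option>: |
| | the first command keeps the per-application library profiles apart, while the second |
| | folds them back into a single entry per library. |
| | </para> |
| | <screen> |
| | # opreport |
| | # opreport --merge=lib |
| | </screen> |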
| 1724 | </sect2> |
| 1725 | |
| 1726 | <sect2 id="opreport-comparison"> |
| 1727 | <title>Side-by-side multiple results</title> |
| 1728 | <para>If you have used multiple events when profiling, by default you get |
| 1729 | side-by-side results of each event's sample values from <command>opreport</command>. |
| 1730 | You can restrict which events to list by appropriate use of the |
| 1731 | <option>event:</option> profile specification.</para> |
| 1732 | </sect2> |
| 1733 | |
| 1734 | <sect2 id="opreport-callgraph"> |
| 1735 | <title>Callgraph output</title> |
| 1736 | <para> |
| 1737 | This section provides details on how to use the OProfile callgraph feature. |
| 1738 | </para> |
| 1739 | <sect3 id="op-cg1"> |
| 1740 | <title>Callgraph details</title> |
| 1741 | <para> |
| 1742 | When using the <option>opcontrol --callgraph</option> option, you can see what |
| 1743 | functions are calling other functions in the output. Consider the |
| 1744 | following program: |
| 1745 | </para> |
| 1746 | <screen> |
| 1747 | #define _GNU_SOURCE /* strfry() is a GNU extension */ |
| 1748 | #include &lt;string.h&gt; |
| 1749 | #include &lt;stdlib.h&gt; |
| 1750 | #include &lt;stdio.h&gt; |
| 1751 | #define SIZE 500000 |
| 1752 | |
| 1753 | static int compare(const void *s1, const void *s2) |
| 1754 | { |
| 1755 | return strcmp(*(char *const *)s1, *(char *const *)s2); /* qsort() passes pointers to the elements */ |
| 1756 | } |
| 1757 | |
| 1758 | static void repeat(void) |
| 1759 | { |
| 1760 | int i; |
| 1761 | char *strings[SIZE]; |
| 1762 | char str[] = "abcdefghijklmnopqrstuvwxyz"; |
| 1763 | |
| 1764 | for (i = 0; i &lt; SIZE; ++i) { |
| 1765 | strings[i] = strdup(str); |
| 1766 | strfry(strings[i]); |
| 1767 | } |
| 1768 | |
| 1769 | qsort(strings, SIZE, sizeof(char *), compare); |
| 1770 | } |
| 1771 | |
| 1772 | int main() |
| 1773 | { |
| 1774 | while (1) |
| 1775 | repeat(); |
| 1776 | } |
| 1777 | </screen> |
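| | <para> |
| | One way to build and profile the program above might be the following sketch. The source |
| | file name <filename>cg.c</filename>, the call-graph depth of 6 and the 30-second run time |
| | are assumptions chosen for illustration; <option>--no-vmlinux</option> simply avoids |
| | having to name a kernel image here. |
| | </para> |
| | <screen> |
| | $ gcc -g -o cg cg.c |
| | # opcontrol --no-vmlinux --callgraph=6 |
| | # opcontrol --start |
| | $ timeout 30 ./cg |
| | # opcontrol --dump |
| | $ opreport --callgraph ./cg |
| | </screen> |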
| 1778 | <para> |
| 1779 | When running with the call-graph option, OProfile will |
| 1780 | record the function stack every time it takes a sample. |
| 1781 | <command>opreport --callgraph</command> outputs an entry for each |
| 1782 | function, where each entry looks similar to: |
| 1783 | </para> |
| 1784 | <screen> |
| 1785 | samples  %        image name               symbol name |
| 1786 |   197       0.1548  cg                       main |
| 1787 |   127036   99.8452  cg                       repeat |
| 1788 | 84590    42.5084  libc-2.3.2.so            strfry |
| 1789 |   84590    66.4838  libc-2.3.2.so            strfry [self] |
| 1790 |   39169    30.7850  libc-2.3.2.so            random_r |
| 1791 |   3475      2.7312  libc-2.3.2.so            __i686.get_pc_thunk.bx |
| 1792 | ------------------------------------------------------------------------------- |
| 1793 | </screen> |
| 1794 | <para> |
| 1795 | Here the non-indented line is the function we're focussing upon |
| 1796 | (<function>strfry()</function>). This |
| 1797 | line is the same as you'd get from a normal <command>opreport</command> |
| 1798 | output. |
| 1799 | </para> |
| 1800 | <para> |
| 1801 | Above the non-indented line we find the functions that called this |
| 1802 | function (for example, <function>repeat()</function> calls |
| 1803 | <function>strfry()</function>). The samples and percentage values here |
| 1804 | refer to the number of times we took a sample where this call was found |
| 1805 | in the stack; the percentage is relative to all other callers of the |
| 1806 | function we're focussing on. Note that these values are |
| 1807 | <emphasis>not</emphasis> call counts; they only reflect the call stack |
| 1808 | every time a sample is taken; that is, if a call is found in the stack |
| 1809 | at the time of a sample, it is recorded in this count. |
| 1810 | </para> |
| 1811 | <para> |
| 1812 | Below the line are functions that are called by |
| 1813 | <function>strfry()</function> (called <emphasis>callees</emphasis>). |
| 1814 | It's clear here that <function>strfry()</function> calls |
| 1815 | <function>random_r()</function>. We also see a special entry with a |
| 1816 | "[self]" marker. This records the normal samples for the function, but |
| 1817 | the percentage becomes relative to all callees. This allows you to |
| 1818 | compare time spent in the function itself compared to functions it |
| 1819 | calls. Note that if a function calls itself, then it will appear in the |
| 1820 | list of callees of itself, but without the "[self]" marker; so recursive |
| 1821 | calls are still clearly separable. |
| 1822 | </para> |
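<para>
As a rough illustration of how the callee percentages relate to one another,
the following Python sketch recomputes them from the sample counts in the
output above. The helper name <function>callee_percentages()</function> is
hypothetical and not part of OProfile; it only shows that each value is the
entry's share of all callee samples, with the "[self]" entry counted like any
other callee.
</para>

```python
def callee_percentages(callees):
    """Return each callee's share (in percent) of all callee samples.

    `callees` maps a label to its sample count; the "[self]" entry is
    treated like any other callee, as in the opreport callgraph output.
    """
    total = sum(callees.values())
    if total == 0:
        return {name: 0.0 for name in callees}
    return {name: 100.0 * count / total for name, count in callees.items()}


# Sample counts copied from the strfry() callee lines shown above.
strfry_callees = {
    "strfry [self]": 84590,
    "random_r": 39169,
    "__i686.get_pc_thunk.bx": 3475,
}

for name, pct in callee_percentages(strfry_callees).items():
    print(f"{pct:8.4f} {name}")
```

<para>
The printed values should agree with the percentage column of the callee
lines, which is a quick way to check your reading of a callgraph entry.
</para>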
| 1823 | <para> |
| 1824 | You may have noticed that the output lists <function>main()</function> |
| 1825 | as calling <function>strfry()</function>, but it's clear from the source |
| 1826 | that this doesn't actually happen. See <xref |
| 1827 | linkend="interpreting-callgraph" /> for an explanation. |
| 1828 | </para> |
| 1829 | </sect3> |
| 1830 | <sect3 id="cg-with-jitsupport"> |
| 1831 | <title>Callgraph and JIT support</title> |
| 1832 | <para> |
| 1833 | Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading. |
| 1834 | For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory |
| 1835 | named <filename>{anon:anon}/<tgid>.<begin_addr>.<end_addr></filename>. |
| 1836 | As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java, |
| 1837 | OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code. |
| 1838 | However, when viewing callgraph output, any anonymously mapped code in the callstack |
| 1839 | will be attributed to <filename>anon (tgid:<tgid> range:<begin_addr>-<end_addr>)</filename>, |
| 1840 | even if a <filename>.jo</filename> ELF file had been created for it. See the example below. |
| 1841 | </para> |
| 1842 | <screen> |
| 1843 | ------------------------------------------------------------------------------- |
| 1844 | 1 2.2727 libj9ute23.so java.bin traceV |
| 1845 | 2 4.5455 libj9ute23.so java.bin utsTraceV |
| 1846 | 4 9.0909 libj9trc23.so java.bin fillInUTInterfaces |
| 1847 | 37 84.0909 libj9trc23.so java.bin twGetSequenceCounter |
| 1848 | 8 0.0154 libj9prt23.so java.bin j9time_hires_clock |
| 1849 | 27 61.3636 anon (tgid:10014 range:0x100000-0x103000) java.bin (no symbols) |
| 1850 | 9 20.4545 libc-2.4.so java.bin gettimeofday |
| 1851 | 8 18.1818 libj9prt23.so java.bin j9time_hires_clock [self] |
| 1852 | ------------------------------------------------------------------------------- |
| 1853 | </screen> |
| 1854 | <para> |
| 1855 | The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of |
| 1856 | <code>j9time_hires_clock</code>, even though the ELF file <filename>10014.jo</filename> was |
| 1857 | created for this profile run. Unfortunately, there is currently no way to correlate |
| 1858 | that anonymous callgraph entry with its corresponding <filename>.jo</filename> file. |
| 1859 | </para> |
| 1860 | </sect3> |
| 1861 | |
| 1862 | |
| 1863 | </sect2> <!-- opreport-callgraph --> |
| 1864 | |
| 1865 | <sect2 id="opreport-diff"> |
| 1866 | <title>Differential profiles with <command>opreport</command></title> |
| 1867 | |
| 1868 | <para> |
| 1869 | Often, we'd like to be able to compare two profiles. For example, when |
| 1870 | analysing the performance of an application, we'd like to make code |
| 1871 | changes and examine the effect of the change. This is supported in |
| 1872 | <command>opreport</command> by giving a profile specification that |
| 1873 | identifies two different profiles. The general form is: |
| 1874 | </para> |
| 1875 | <screen> |
| 1876 | $ opreport <shared-spec> { <first-profile> } { <second-profile> } |
| 1877 | </screen> |
| 1878 | <note><para> |
| 1879 | We lost our Dragon book down the back of the sofa, so you have to be |
| 1880 | careful to have spaces around those braces, or things will get |
| 1881 | hopelessly confused. We can only apologise. |
| 1882 | </para></note> |
| 1883 | <para> |
| 1884 | For each of the profiles, the shared section is prefixed, and then the |
| 1885 | specification is analysed. The usual parameters work both within the |
| 1886 | shared section, and in the sub-specification within the curly braces. |
| 1887 | </para> |
| 1888 | <para> |
| 1889 | A typical way to use this feature is with archives created with |
| 1890 | <command>oparchive</command>. Let's look at an example: |
| 1891 | </para> |
| 1892 | <screen> |
| 1893 | $ ./a |
| 1894 | $ oparchive -o orig ./a |
| 1895 | $ opcontrol --reset |
| 1896 | # edit and recompile a |
| 1897 | $ ./a |
| 1898 | # now compare the current profile of a with the archived profile |
| 1899 | $ opreport -xl ./a { archive:./orig } { } |
| 1900 | CPU: PIII, speed 863.233 MHz (estimated) |
| 1901 | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a |
| 1902 | unit mask of 0x00 (No unit mask) count 100000 |
| 1903 | samples % diff % symbol name |
| 1904 | 92435 48.5366 +0.4999 a |
| 1905 | 54226 --- --- c |
| 1906 | 49222 25.8459 +++ d |
| 1907 | 48787 25.6175 -2.2e-01 b |
| 1908 | </screen> |
| 1909 | <para> |
| 1910 | Note that we specified an empty second profile in the curly braces, as |
| 1911 | we wanted to use the current session; alternatively, we could |
| 1912 | have specified another archive, or a tgid etc. We specified the binary |
| 1913 | <command>a</command> in the shared section, so we matched that in both |
| 1914 | the profiles we're diffing. |
| 1915 | </para> |
| 1916 | <para> |
| 1917 | As in the normal output, the results are sorted by the number of |
| 1918 | samples, and the percentage field represents the relative percentage of |
| 1919 | the symbol's samples in the second profile. |
| 1920 | </para> |
| 1921 | <para> |
| 1922 | Notice the new column in the output. This value represents the |
| 1923 | percentage change of the relative percent between the first and the |
| 1924 | second profile: roughly, "how much more important this symbol is". |
| 1925 | Looking at the symbol <function>a()</function>, we can see that it took |
| 1926 | roughly the same amount of the total profile in both the first and the |
| 1927 | second profile. The function <function>c()</function> was not in the new |
| 1928 | profile, so has been marked with <function>---</function>. Note that the |
| 1929 | sample value is the number of samples in the first profile; since we're |
| 1930 | displaying results for the second profile, we don't list a percentage |
| 1931 | value for it, as it would be meaningless. <function>d()</function> is |
| 1932 | new in the second profile, and consequently marked with |
| 1933 | <function>+++</function>. |
| 1934 | </para> |
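<para>
The new column can be modelled with a few lines of Python. This is only a
sketch of the description above: the function name
<function>diff_column()</function> and the input values are hypothetical, and
the arithmetic is an assumption drawn from the phrase "percentage change of
the relative percent", not from the <command>opreport</command> source.
</para>

```python
def diff_column(first_pct, second_pct):
    """Percentage change of a symbol's relative percentage.

    Pass None when the symbol is absent from a profile. The "+++" and
    "---" markers mirror the opreport output; the arithmetic is an
    assumption based on the description in the text.
    """
    if first_pct is None:
        return "+++"   # symbol only exists in the second profile
    if second_pct is None:
        return "---"   # symbol only exists in the first profile
    change = 100.0 * (second_pct - first_pct) / first_pct
    return f"{change:+.4f}"


# Hypothetical relative percentages, not taken from a real session.
print(diff_column(40.0, 50.0))   # symbol grew from 40% to 50% of the profile
print(diff_column(None, 25.0))   # symbol is new in the second profile
print(diff_column(30.0, None))   # symbol is missing from the second profile
```

<para>
Under this reading, a symbol whose share moves from 40% to 50% of the profile
is reported as a 25% increase, not as a 10-point increase.
</para>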
| 1935 | <para> |
| 1936 | When comparing profiles between different binaries, it should be clear |
| 1937 | that functions can change in terms of VMA and size. To avoid this |
| 1938 | problem, <command>opreport</command> considers a symbol to be the same |
| 1939 | if the symbol name, image name, and owning application name all match; |
| 1940 | any other factors are ignored. Note that the check for application name |
| 1941 | means that trying to compare library profiles between two different |
| 1942 | applications will not work as you might expect: each symbol will be |
| 1943 | considered different. |
| 1944 | </para> |
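<para>
The matching rule can be pictured as a lookup key built from the three names.
The Python sketch below is illustrative only; <code>SymbolKey</code>,
<function>match_symbols()</function>, and the sample data are hypothetical
and do not reflect how <command>opreport</command> is implemented.
</para>

```python
from typing import NamedTuple


class SymbolKey(NamedTuple):
    """The three fields the text says must match."""
    symbol: str
    image: str
    app: str


def match_symbols(first, second):
    """Split two profiles into common, removed and added symbol keys.

    Each profile is a dict mapping SymbolKey -> sample count. VMA and
    size are deliberately not part of the key.
    """
    common = first.keys() & second.keys()
    removed = first.keys() - second.keys()
    added = second.keys() - first.keys()
    return common, removed, added


# Hypothetical data: the same libc symbol profiled under two applications.
first = {SymbolKey("memcpy", "libc-2.3.2.so", "app_one"): 120}
second = {SymbolKey("memcpy", "libc-2.3.2.so", "app_two"): 95}

common, removed, added = match_symbols(first, second)
print(len(common), len(removed), len(added))
```

<para>
Because the owning application differs, the two entries share no key: the
symbol counts as removed from the first profile and as added in the second.
</para>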
| 1945 | |
| 1946 | </sect2> <!-- opreport-diff --> |
| 1947 | |
| 1948 | <sect2 id="opreport-anon"> |
| 1949 | <title>Anonymous executable mappings</title> |
| 1950 | <para> |
| 1951 | Many applications, typically ones involving dynamic compilation into |
| 1952 | machine code (just-in-time, or "JIT", compilation), have executable mappings that |
| 1953 | are not backed by an ELF file. <command>opreport</command> has basic support for showing the |
| 1954 | samples taken in these regions; for example: |
| 1955 | <screen> |
| 1956 | $ opreport /usr/bin/mono -l |
| 1957 | CPU: ppc64 POWER5, speed 1654.34 MHz (estimated) |
| 1958 | Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000 |
| 1959 | samples % image name symbol name |
| 1960 | 47 58.7500 mono (no symbols) |
| 1961 | 14 17.5000 anon (tgid:3189 range:0xf72aa000-0xf72fa000) (no symbols) |
| 1962 | 9 11.2500 anon (tgid:3189 range:0xf6cca000-0xf6dd9000) (no symbols) |
| 1963 | . . . . |
| 1964 | </screen> |
| 1965 | </para> |
| 1966 | <para> |
| 1967 | Note that, since such mappings are dependent upon individual invocations of |
| 1968 | a binary, these mappings are always listed as a dependent image, |
| 1969 | even when using <option>--separate=none</option>. |
| 1970 | Equally, the results are not affected by the <option>--merge</option> |
| 1971 | option. |
| 1972 | </para> |
| 1973 | <para> |
| 1974 | As shown in the opreport output above, OProfile is unable to attribute the samples to any |
| 1975 | symbol(s) because there is no ELF file for this code. |
| 1976 | Enhanced support for JITed code is now available for some virtual machines; |
| 1977 | e.g., the Java Virtual Machine. For details about OProfile output for |
| 1978 | JITed code, see <xref linkend="getting-jit-reports" />. |
| 1979 | </para> |
| 1980 | <para>For more information about JIT support in OProfile, see <xref linkend="jitsupport"/>. |
| 1981 | </para> |
| 1982 | </sect2> <!-- opreport-anon --> |
| 1983 | |
| 1984 | <sect2 id="opreport-xml"> |
| 1985 | <title>XML formatted output</title> |
| 1986 | <para> |
| 1987 | The --xml option can be used to generate XML instead of the usual |
| 1988 | text format. This allows opreport to eliminate some of the constraints |
| 1989 | dictated by the two dimensional text format. For example, it is possible |
| 1990 | to separate the sample data across multiple events, CPUs and threads. The XML |
| 1991 | schema implemented by opreport is found in doc/opreport.xsd. It contains |
| 1992 | more detailed comments about the structure of the XML generated by opreport. |
| 1993 | </para> |
| 1994 | <para> |
| 1995 | Since XML is consumed by a client program rather than a user, its structure |
| 1996 | is fairly static. In particular, the --sort option is incompatible with the |
| 1997 | --xml option. Percentages are not displayed in the XML so the options related |
| 1998 | to percentages will have no effect. Full pathnames are always displayed in |
| 1999 | the XML so --long-filenames is not necessary. The --details option will cause |
| 2000 | all of the individual sample data to be included in the XML as well as the |
| 2001 | instruction byte stream for each symbol (for doing disassembly) and can result |
| 2002 | in very large XML files. |
| 2003 | </para> |
| 2004 | </sect2> <!-- opreport-xml --> |
| 2005 | |
| 2006 | <sect2 id="opreport-options"> |
| 2007 | <title>Options for <command>opreport</command></title> |
| 2008 | |
| 2009 | <variablelist> |
| 2010 | <varlistentry><term><option>--accumulated / -a</option></term><listitem><para> |
| 2011 | Accumulate sample and percentage counts in the symbol list. |
| 2012 | </para></listitem></varlistentry> |
| 2013 | <varlistentry><term><option>--callgraph / -c</option></term><listitem><para> |
| 2014 | Show callgraph information. |
| 2015 | </para></listitem></varlistentry> |
| 2016 | <varlistentry><term><option>--debug-info / -g</option></term><listitem><para> |
| 2017 | Show source file and line for each symbol. |
| 2018 | </para></listitem></varlistentry> |
| 2019 | <varlistentry><term><option>--demangle / -D none|normal|smart</option></term><listitem><para> |
| 2020 | none: no demangling. normal: use default demangler (default). smart: use |
| 2021 | pattern-matching to make C++ symbol demangling more readable. |
| 2022 | </para></listitem></varlistentry> |
| 2023 | <varlistentry><term><option>--details / -d</option></term><listitem><para> |
| 2024 | Show per-instruction details for all selected symbols. Note that, for |
| 2025 | binaries without symbol information, the VMA values shown are raw file |
| 2026 | offsets for the image binary. |
| 2027 | </para></listitem></varlistentry> |
| 2028 | <varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para> |
| 2029 | Do not include application-specific images for libraries, kernel modules |
| 2030 | and the kernel. This option only makes sense if the profile session |
| 2031 | used --separate. |
| 2032 | </para></listitem></varlistentry> |
| 2033 | <varlistentry><term><option>--exclude-symbols / -e [symbols]</option></term><listitem><para> |
| 2034 | Exclude all the symbols in the given comma-separated list. |
| 2035 | </para></listitem></varlistentry> |
| 2036 | <varlistentry><term><option>--global-percent / -%</option></term><listitem><para> |
| 2037 | Make all percentages relative to the whole profile. |
| 2038 | </para></listitem></varlistentry> |
| 2039 | <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para> |
| 2040 | Show help message. |
| 2041 | </para></listitem></varlistentry> |
| 2042 | <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para> |
| 2043 | Comma-separated list of additional paths to search for binaries. |
| 2044 | This is needed to find modules in kernels 2.6 and upwards. |
| 2045 | </para></listitem></varlistentry> |
| 2046 | <varlistentry><term><option>--root / -R [path]</option></term><listitem><para> |
| 2047 | A path to a filesystem to search for additional binaries. |
| 2048 | </para></listitem></varlistentry> |
| 2049 | <varlistentry><term><option>--include-symbols / -i [symbols]</option></term><listitem><para> |
| 2050 | Only include symbols in the given comma-separated list. |
| 2051 | </para></listitem></varlistentry> |
| 2052 | <varlistentry><term><option>--long-filenames / -f</option></term><listitem><para> |
| 2053 | Output full paths instead of basenames. |
| 2054 | </para></listitem></varlistentry> |
| 2055 | <varlistentry><term><option>--merge / -m [lib,cpu,tid,tgid,unitmask,all]</option></term><listitem><para> |
| 2056 | Merge any profiles separated in a --separate session. |
| 2057 | </para></listitem></varlistentry> |
| 2058 | <varlistentry><term><option>--no-header</option></term><listitem><para> |
| 2059 | Don't output a header detailing profiling parameters. |
| 2060 | </para></listitem></varlistentry> |
| 2061 | <varlistentry><term><option>--output-file / -o [file]</option></term><listitem><para> |
| 2062 | Output to the given file instead of stdout. |
| 2063 | </para></listitem></varlistentry> |
| 2064 | <varlistentry><term><option>--reverse-sort / -r</option></term><listitem><para> |
| 2065 | Reverse the sort from the default. |
| 2066 | </para></listitem></varlistentry> |
| 2067 | <varlistentry><term><option>--session-dir=</option>dir_path</term><listitem><para> |
| 2068 | Use sample database out of directory <filename>dir_path</filename> |
| 2069 | instead of the default location (/var/lib/oprofile). |
| 2070 | </para></listitem></varlistentry> |
| 2071 | <varlistentry><term><option>--show-address / -w</option></term><listitem><para> |
| 2072 | Show the VMA address of each symbol (off by default). |
| 2073 | </para></listitem></varlistentry> |
| 2074 | <varlistentry><term><option>--sort / -s [vma,sample,symbol,debug,image]</option></term><listitem><para> |
| 2075 | Sort the list of symbols by, respectively, symbol address, |
| 2076 | number of samples, symbol name, debug filename and line number, |
| 2077 | binary image filename. |
| 2078 | </para></listitem></varlistentry> |
| 2079 | <varlistentry><term><option>--symbols / -l</option></term><listitem><para> |
| 2080 | List per-symbol information instead of a binary image summary. |
| 2081 | </para></listitem></varlistentry> |
| 2082 | <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para> |
| 2083 | Only output data for symbols that have more than the given percentage |
| 2084 | of total samples. |
| 2085 | </para></listitem></varlistentry> |
| 2086 | <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para> |
| 2087 | Give verbose debugging output. |
| 2088 | </para></listitem></varlistentry> |
| 2089 | <varlistentry><term><option>--version / -v</option></term><listitem><para> |
| 2090 | Show version. |
| 2091 | </para></listitem></varlistentry> |
| 2092 | <varlistentry><term><option>--xml / -X</option></term><listitem><para> |
| 2093 | Generate XML output. |
| 2094 | </para></listitem></varlistentry> |
| 2095 | </variablelist> |
| 2096 | |
| 2097 | </sect2> |
| 2098 | |
| 2099 | </sect1> <!-- opreport --> |
| 2100 | |
| 2101 | <sect1 id="opannotate"> |
| 2102 | <title>Outputting annotated source (<command>opannotate</command>)</title> |
| 2103 | <para> |
| 2104 | The <command>opannotate</command> utility generates annotated source files or assembly listings, optionally |
| 2105 | mixed with source. |
| 2106 | If you want to see the source file, the profiled application needs to have debug information, and the source |
| 2107 | must be available through this debug information. For GCC, you must use the <option>-g</option> option |
| 2108 | when you are compiling. |
| 2109 | If the binary doesn't contain sufficient debug information, you can still |
| 2110 | use <command>opannotate <option>--assembly</option></command> to get annotated assembly. |
| 2111 | </para> |
| 2112 | <para> |
| 2113 | Note that for the reason explained in <xref linkend="hardware-counters" /> the results can be |
| 2114 | inaccurate. The debug information itself can add other problems; for example, the line number for a symbol can be |
| 2115 | incorrect. Assembly instructions can be re-ordered and moved by the compiler, and this can lead to |
| 2116 | crediting source lines with samples not really "owned" by this line. Also see |
| 2117 | <xref linkend="interpreting" />. |
| 2118 | </para> |
| 2119 | <para> |
| 2120 | You can output the annotation to one single file, containing all the source found, using the |
| 2121 | <option>--source</option> option. You can use this in conjunction with <option>--assembly</option> |
| 2122 | to get combined source/assembly output. |
| 2123 | </para> |
| 2124 | <para> |
| 2125 | You can also output a directory of annotated source files that maintains the structure of |
| 2126 | the original sources. Each line in the annotated source is prepended with the samples |
| 2127 | for that line. Additionally, each symbol is annotated giving details for the symbol |
| 2128 | as a whole. An example: |
| 2129 | </para> |
| 2130 | <screen> |
| 2131 | $ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled |
| 2132 | $ ls annotated/home/moz/src/oprofile-pp/daemon/ |
| 2133 | opd_cookie.h opd_image.c opd_kernel.c opd_sample_files.c oprofiled.c |
| 2134 | </screen> |
| 2135 | <para> |
| 2136 | Line numbers are maintained in the source files, but each file has |
| 2137 | a footer appended describing the profiling details. The actual annotation |
| 2138 | looks something like this: |
| 2139 | </para> |
| 2140 | <screen> |
| 2141 | ... |
| 2142 | :static uint64_t pop_buffer_value(struct transient * trans) |
| 2143 | 11510 1.9661 :{ /* pop_buffer_value total: 89901 15.3566 */ |
| 2144 | : uint64_t val; |
| 2145 | : |
| 2146 | 10227 1.7469 : if (!trans->remaining) { |
| 2147 | : fprintf(stderr, "BUG: popping empty buffer !\n"); |
| 2148 | : exit(EXIT_FAILURE); |
| 2149 | : } |
| 2150 | : |
| 2151 | : val = get_buffer_value(trans->buffer, 0); |
| 2152 | 2281 0.3896 : trans->remaining--; |
| 2153 | 2296 0.3922 : trans->buffer += kernel_pointer_size; |
| 2154 | : return val; |
| 2155 | 10454 1.7857 :} |
| 2156 | ... |
| 2157 | </screen> |
| 2158 | |
| 2159 | <para> |
| 2160 | The first number on each line is the number of samples, whilst the second is |
| 2161 | the relative percentage of total samples. |
| 2162 | </para> |
| 2163 | |
| 2164 | <sect2 id="opannotate-finding-source"> |
| 2165 | <title>Locating source files</title> |
| 2166 | <para> |
| 2167 | Of course, <command>opannotate</command> needs to be able to locate the source files |
| 2168 | for the binary image(s) in order to produce output. Some binary images have debug |
| 2169 | information where the given source file paths are relative, not absolute. You can |
| 2170 | specify search paths to look for these files (similar to <command>gdb</command>'s |
| 2171 | <option>dir</option> command) with the <option>--search-dirs</option> option. |
| 2172 | </para> |
| 2173 | <para> |
| 2174 | Sometimes you may have a binary image which gives absolute paths for the source files, |
| 2175 | but you have the actual sources elsewhere (commonly, you've installed an SRPM for |
| 2176 | a binary on your system and you want annotation from an existing profile). You can |
| 2177 | use the <option>--base-dirs</option> option to redirect OProfile to look somewhere |
| 2178 | else for source files. For example, imagine we have a binary generated from a source |
| 2179 | file that is given in the debug information as <filename>/tmp/build/libfoo/foo.c</filename>, |
| 2180 | and you have the source tree matching that binary installed in <filename>/home/user/libfoo/</filename>. |
| 2181 | You can redirect OProfile to find <filename>foo.c</filename> correctly like this: |
| 2182 | </para> |
| 2183 | <screen> |
| 2184 | $ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so |
| 2185 | </screen> |
| 2186 | <para> |
| 2187 | You can specify multiple (comma-separated) paths to both options. |
| 2188 | </para> |
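<para>
How the two options combine can be sketched in Python. The function
<function>candidate_paths()</function> is a hypothetical model of the lookup
described above (strip a base-dir prefix, then try each search dir); it is
not <command>opannotate</command>'s actual code, and the handling of
unmatched absolute paths is an assumption.
</para>

```python
import posixpath


def candidate_paths(debug_path, base_dirs, search_dirs):
    """List the places a source file might be looked up.

    Strip the first matching base-dir prefix from `debug_path`, then
    join the remainder onto each search dir. Relative debug paths are
    joined directly; absolute paths that match no base dir are returned
    unchanged. Illustrative model only.
    """
    relative = debug_path
    for base in base_dirs:
        prefix = base.rstrip("/") + "/"
        if debug_path.startswith(prefix):
            relative = debug_path[len(prefix):]
            break
    else:
        # No base dir matched: an absolute path is used as it stands.
        if posixpath.isabs(debug_path):
            return [debug_path]
    return [posixpath.join(d, relative) for d in search_dirs]


# Paths taken from the example in the text.
print(candidate_paths("/tmp/build/libfoo/foo.c",
                      ["/tmp/build/libfoo/"],
                      ["/home/user/libfoo/"]))
```

<para>
With the paths from the example, the only candidate is
<filename>/home/user/libfoo/foo.c</filename>.
</para>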
| 2189 | </sect2> |
| 2190 | |
| 2191 | <sect2 id="opannotate-details"> |
| 2192 | <title>Usage of <command>opannotate</command></title> |
| 2193 | |
| 2194 | <variablelist> |
| 2195 | <varlistentry><term><option>--assembly / -a</option></term><listitem><para> |
| 2196 | Output annotated assembly. If this is combined with --source, then mixed |
| 2197 | source / assembly annotations are output. |
| 2198 | </para></listitem></varlistentry> |
| 2199 | <varlistentry><term><option>--base-dirs / -b [paths]</option></term><listitem><para> |
| 2200 | Comma-separated list of path prefixes. This can be used to point OProfile to a |
| 2201 | different location for source files when the debug information specifies an |
| 2202 | absolute path on your system for the source that does not exist. The prefix |
| 2203 | is stripped from the debug source file paths, then searched in the search dirs |
| 2204 | specified by <option>--search-dirs</option>. |
| 2205 | </para></listitem></varlistentry> |
| 2206 | <varlistentry><term><option>--demangle / -D none|normal|smart</option></term><listitem><para> |
| 2207 | none: no demangling. normal: use default demangler (default). smart: use |
| 2208 | pattern-matching to make C++ symbol demangling more readable. |
| 2209 | </para></listitem></varlistentry> |
| 2210 | <varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para> |
| 2211 | Do not include application-specific images for libraries, kernel modules |
| 2212 | and the kernel. This option only makes sense if the profile session |
| 2213 | used --separate. |
| 2214 | </para></listitem></varlistentry> |
| 2215 | <varlistentry><term><option>--exclude-file [files]</option></term><listitem><para> |
| 2216 | Exclude all files in the given comma-separated list of glob patterns. |
| 2217 | </para></listitem></varlistentry> |
| 2218 | <varlistentry><term><option>--exclude-symbols / -e [symbols]</option></term><listitem><para> |
| 2219 | Exclude all the symbols in the given comma-separated list. |
| 2220 | </para></listitem></varlistentry> |
| 2221 | <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para> |
| 2222 | Show help message. |
| 2223 | </para></listitem></varlistentry> |
| 2224 | <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para> |
| 2225 | Comma-separated list of additional paths to search for binaries. |
| 2226 | This is needed to find modules in kernels 2.6 and upwards. |
| 2227 | </para></listitem></varlistentry> |
| 2228 | <varlistentry><term><option>--root / -R [path]</option></term><listitem><para> |
| 2229 | A path to a filesystem to search for additional binaries. |
| 2230 | </para></listitem></varlistentry> |
| 2231 | <varlistentry><term><option>--include-file [files]</option></term><listitem><para> |
| 2232 | Only include files in the given comma-separated list of glob patterns. |
| 2233 | </para></listitem></varlistentry> |
| 2234 | <varlistentry><term><option>--include-symbols / -i [symbols]</option></term><listitem><para> |
| 2235 | Only include symbols in the given comma-separated list. |
| 2236 | </para></listitem></varlistentry> |
| 2237 | <varlistentry><term><option>--objdump-params [params]</option></term><listitem><para> |
| 2238 | Pass the given parameters as extra values when calling objdump. |
| 2239 | </para></listitem></varlistentry> |
| 2240 | <varlistentry><term><option>--output-dir / -o [dir]</option></term><listitem><para> |
| 2241 | Output directory. This makes opannotate output one annotated file for each |
| 2242 | source file. This option can't be used in conjunction with --assembly. |
| 2243 | </para></listitem></varlistentry> |
| 2244 | <varlistentry><term><option>--search-dirs / -d [paths]</option></term><listitem><para> |
| 2245 | Comma-separated list of paths to search for source files. This is useful to find |
| 2246 | source files when the debug information only contains relative paths. |
| 2247 | </para></listitem></varlistentry> |
| 2248 | <varlistentry><term><option>--source / -s</option></term><listitem><para> |
| 2249 | Output annotated source. This requires debugging information to be available |
| 2250 | for the binaries. |
| 2251 | </para></listitem></varlistentry> |
| 2252 | <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para> |
| 2253 | Only output data for symbols that have more than the given percentage |
| 2254 | of total samples. |
| 2255 | </para></listitem></varlistentry> |
| 2256 | <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para> |
| 2257 | Give verbose debugging output. |
| 2258 | </para></listitem></varlistentry> |
| 2259 | <varlistentry><term><option>--version / -v</option></term><listitem><para> |
| 2260 | Show version. |
| 2261 | </para></listitem></varlistentry> |
| 2262 | </variablelist> |
| 2263 | |
| 2264 | |
| 2265 | </sect2> <!-- opannotate-details --> |
| 2266 | |
| 2267 | </sect1> <!-- opannotate --> |
| 2268 | |
| 2269 | <sect1 id="getting-jit-reports"> |
| 2270 | <title>OProfile results with JIT samples</title> |
| 2271 | <para> |
| 2272 | After profiling a Java (or other supported VM) application, the command |
| 2273 | <screen><command>opcontrol --dump</command></screen> |
| 2274 | flushes the sample buffers and creates ELF binaries from the |
| 2275 | intermediate files that were written by the agent library. |
| 2276 | The ELF binaries are named <filename><tgid>.jo</filename>. |
| 2277 | With the symbol information stored in these ELF files, it is |
| 2278 | possible to map samples to the appropriate symbols. |
| 2279 | </para> |
| 2280 | <para> |
| 2281 | The usual analysis tools (<command>opreport</command> and/or |
| 2282 | <command>opannotate</command>) can now be used |
| 2283 | to get symbols and assembly code for the instrumented VM processes. |
| 2284 | </para> |
| 2285 | <para> |
| 2286 | Below is an example of a profile report of a Java application that has been |
| 2287 | instrumented with the provided agent library. |
| 2288 | <screen> |
| 2289 | $ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java |
| 2290 | CPU: Core Solo / Duo, speed 2167 MHz (estimated) |
| 2291 | Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000 |
| 2292 | samples % image name symbol name |
| 2293 | 186020 50.0523 no-vmlinux no-vmlinux (no symbols) |
| 2294 | 34333 9.2380 7635.jo java void test.f1() |
| 2295 | 19022 5.1182 libc-2.5.so libc-2.5.so _IO_file_xsputn@@GLIBC_2.1 |
| 2296 | 18762 5.0483 libc-2.5.so libc-2.5.so vfprintf |
| 2297 | 16408 4.4149 7635.jo java void test$HelloThread.run() |
| 2298 | 16250 4.3724 7635.jo java void test$test_1.f2(int) |
| 2299 | 15303 4.1176 7635.jo java void test.f2(int, int) |
| 2300 | 13252 3.5657 7635.jo java void test.f2(int) |
| 2301 | 5165 1.3897 7635.jo java void test.f4() |
| 2302 | 955 0.2570 7635.jo java void test$HelloThread.run()~ |
| 2303 | |
| 2304 | </screen> |
| 2305 | </para> |
| 2306 | <note><para> |
| 2307 | Depending on the JVM that is used, certain options of opreport and opannotate |
| 2308 | do NOT work since they rely on debug information (e.g. source code line number) |
| 2309 | that is not always available. The Sun JVM does provide the necessary debug |
| 2310 | information via the JVMTI[PI] interface, |
| 2311 | but other JVMs do not. |
| 2312 | </para></note> |
| 2313 | <para> |
| 2314 | As you can see in the opreport output, the JIT support agent for Java |
| 2315 | generates symbols to include the class and method signature. |
| 2316 | A symbol with the suffix ~<n> (e.g. |
| 2317 | <code>void test$HelloThread.run()~1</code>) means that this is |
| 2318 | the <n>th occurrence of the identical name. This happens if a method is re-JITed. |
| 2319 | A symbol with the suffix %<n> means that the address space of this symbol |
| 2320 | was reused during the sample session (see <xref linkend="overlapping-symbols" />). |
| 2321 | The value <n> is the percentage of time that this symbol/code was present in |
| 2322 | relation to the total lifetime of all other overlapping symbols. A symbol of the form |
| 2323 | <code><return_val> <class_name>$<method_sig></code> denotes an |
| 2324 | inner class. |
| 2325 | </para> |
| 2326 | </sect1> |
| 2327 | |
| 2328 | <sect1 id="opgprof"> |
| 2329 | <title><command>gprof</command>-compatible output (<command>opgprof</command>)</title> |
| 2330 | <para> |
| 2331 | If you're familiar with the output produced by <command>GNU gprof</command>, |
| 2332 | you may find <command>opgprof</command> useful. It takes a single binary |
| 2333 | as an argument, and produces a <filename>gmon.out</filename> file for use |
| 2334 | with <command>gprof -p</command>. If call-graph profiling is enabled, |
| 2335 | then this is also included. |
| 2336 | </para> |
| 2337 | <screen> |
| 2338 | $ opgprof `which oprofiled` # generates gmon.out file |
| 2339 | $ gprof -p `which oprofiled` | head |
| 2340 | Flat profile: |
| 2341 | |
| 2342 | Each sample counts as 1 samples. |
| 2343 | % cumulative self self total |
| 2344 | time samples samples calls T1/call T1/call name |
| 2345 | 33.13 206237.00 206237.00 odb_insert |
| 2346 | 22.67 347386.00 141149.00 pop_buffer_value |
| 2347 | 9.56 406881.00 59495.00 opd_put_sample |
| 2348 | 7.34 452599.00 45718.00 opd_find_image |
| 2349 | 7.19 497327.00 44728.00 opd_process_samples |
| 2350 | </screen> |
| 2351 | |
| 2352 | <sect2 id="opgprof-details"> |
| 2353 | <title>Usage of <command>opgprof</command></title> |
| 2354 | |
| 2355 | <variablelist> |
| 2356 | <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para> |
| 2357 | Show help message. |
| 2358 | </para></listitem></varlistentry> |
| 2359 | <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para> |
| 2360 | Comma-separated list of additional paths to search for binaries. |
| 2361 | This is needed to find modules in kernels 2.6 and upwards. |
| 2362 | </para></listitem></varlistentry> |
| 2363 | <varlistentry><term><option>--root / -R [path]</option></term><listitem><para> |
| 2364 | A path to a filesystem to search for additional binaries. |
| 2365 | </para></listitem></varlistentry> |
| 2366 | <varlistentry><term><option>--output-filename / -o [file]</option></term><listitem><para> |
| 2367 | Output to the given file instead of the default, gmon.out. |
| 2368 | </para></listitem></varlistentry> |
| 2369 | <varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para> |
| 2370 | Only output data for symbols that have more than the given percentage |
| 2371 | of total samples. |
| 2372 | </para></listitem></varlistentry> |
| 2373 | <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para> |
| 2374 | Give verbose debugging output. |
| 2375 | </para></listitem></varlistentry> |
| 2376 | <varlistentry><term><option>--version / -v</option></term><listitem><para> |
| 2377 | Show version. |
| 2378 | </para></listitem></varlistentry> |
| 2379 | </variablelist> |
| 2380 | |
| 2381 | </sect2> <!-- opgprof-details --> |
| 2382 | |
| 2383 | </sect1> <!-- opgprof --> |
| 2384 | |
| 2385 | <sect1 id="oparchive"> |
| 2386 | <title>Archiving measurements (<command>oparchive</command>)</title> |
| 2387 | <para> |
| 2388 | The <command>oparchive</command> utility generates a directory populated |
| 2389 | with executable, debug, and oprofile sample files. This directory can be |
| 2390 | moved to another machine via <command>tar</command> and analyzed without |
| 2391 | further use of the data collection machine. |
| 2392 | </para> |
| 2393 | |
| 2394 | <para> |
| 2395 | The following command would collect the sample files, the executables |
| 2396 | associated with the sample files, and the debuginfo files associated |
| 2397 | with the executables and copy them into |
| 2398 | <filename>/tmp/current_data</filename>: |
| 2399 | </para> |
| 2400 | |
| 2401 | <screen> |
| 2402 | # oparchive -o /tmp/current_data |
| 2403 | </screen> |
| 2404 | |
| 2405 | <sect2 id="oparchive-details"> |
| 2406 | <title>Usage of <command>oparchive</command></title> |
| 2407 | |
| 2408 | <variablelist> |
| 2409 | <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para> |
| 2410 | Show help message. |
| 2411 | </para></listitem></varlistentry> |
| 2412 | <varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para> |
| 2413 | Do not include application-specific images for libraries, kernel modules |
| 2414 | and the kernel. This option only makes sense if the profile session |
| 2415 | used --separate. |
| 2416 | </para></listitem></varlistentry> |
| 2417 | <varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para> |
| 2418 | Comma-separated list of additional paths to search for binaries. |
| 2419 | This is needed to find modules in kernels 2.6 and upwards. |
| 2420 | </para></listitem></varlistentry> |
| 2421 | <varlistentry><term><option>--root / -R [path]</option></term><listitem><para> |
| 2422 | A path to a filesystem to search for additional binaries. |
| 2423 | </para></listitem></varlistentry> |
| 2424 | <varlistentry><term><option>--output-directory / -o [directory]</option></term><listitem><para> |
| 2425 | Output to the given directory. There is no default. This must be specified. |
| 2426 | </para></listitem></varlistentry> |
| 2427 | <varlistentry><term><option>--list-files / -l</option></term><listitem><para> |
| 2428 | Only list the files that would be archived; do not copy them. |
| 2429 | </para></listitem></varlistentry> |
| 2430 | <varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para> |
| 2431 | Give verbose debugging output. |
| 2432 | </para></listitem></varlistentry> |
| 2433 | <varlistentry><term><option>--version / -v</option></term><listitem><para> |
| 2434 | Show version. |
| 2435 | </para></listitem></varlistentry> |
| 2436 | </variablelist> |
| 2437 | |
| 2438 | </sect2> <!-- oparchive-details --> |
| 2439 | |
| 2440 | </sect1> <!-- oparchive --> |
| 2441 | |
| 2442 | <sect1 id="opimport"> |
| 2443 | <title>Converting sample database files (<command>opimport</command>)</title> |
| 2444 | <para> |
| 2445 | This utility converts sample database files from a foreign binary format (abi) to |
| 2446 | the native format. This is useful only when moving sample files between hosts, |
| 2447 | for analysis on platforms other than the one used for collection. The abi format |
| 2448 | of the file to be imported is described in a text file located in <filename>$SESSION_DIR/abi</filename>. |
| 2449 | </para> |
| 2450 | |
| 2451 | <para> |
| 2452 | The following command converts the input sample file to the |
| 2453 | output sample file, using the given abi file as the binary description |
| 2454 | of the input file and the current platform's abi as the binary description |
| 2455 | of the output file. |
| 2456 | </para> |
| 2457 | |
| 2458 | <screen> |
| 2459 | # opimport -a /var/lib/oprofile/abi -o /tmp/current/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all |
| 2460 | </screen> |
| 2461 | |
| 2462 | <sect2 id="opimport-details"> |
| 2463 | <title>Usage of <command>opimport</command></title> |
| 2464 | |
| 2465 | <variablelist> |
| 2466 | <varlistentry><term><option>--help / -? / --usage</option></term><listitem><para> |
| 2467 | Show help message. |
| 2468 | </para></listitem></varlistentry> |
| 2469 | <varlistentry><term><option>--abi / -a [filename]</option></term><listitem><para> |
| 2470 | Input abi file description location. |
| 2471 | </para></listitem></varlistentry> |
| 2472 | <varlistentry><term><option>--force / -f</option></term><listitem><para> |
| 2473 | Force conversion even if the input and output abi are identical. |
| 2474 | </para></listitem></varlistentry> |
| 2475 | <varlistentry><term><option>--output / -o [filename]</option></term><listitem><para> |
| 2476 | Specify the output filename. If the output file already exists, it is |
| 2477 | not overwritten; the data are accumulated into it. Sample filenames are informative |
| 2478 | for the post-profile tools and must be kept identical: in other words, the pathname |
| 2479 | from the first path component containing a '{' must be kept as is in the |
| 2480 | output filename. |
| 2481 | </para></listitem></varlistentry> |
| 2482 | <varlistentry><term><option>--verbose / -V</option></term><listitem><para> |
| 2483 | Give verbose debugging output. |
| 2484 | </para></listitem></varlistentry> |
| 2485 | <varlistentry><term><option>--version / -v</option></term><listitem><para> |
| 2486 | Show version. |
| 2487 | </para></listitem></varlistentry> |
| 2488 | </variablelist> |
| 2489 | |
| 2490 | </sect2> <!-- opimport-details --> |
| 2491 | |
| 2492 | </sect1> <!-- opimport --> |
| 2493 | |
| 2494 | </chapter> |
| 2495 | |
| 2496 | <chapter id="interpreting"> |
| 2497 | <title>Interpreting profiling results</title> |
| 2498 | <para> |
| 2499 | The standard caveats of profiling apply in interpreting the results from OProfile: |
| 2500 | profile realistic situations, profile different scenarios, profile |
| 2501 | for as long a time as possible, avoid system-specific artifacts, don't trust |
| 2502 | the profile data too much. Also bear in mind the comments on the performance |
| 2503 | counters above - you <emphasis>cannot</emphasis> rely on totally accurate |
| 2504 | instruction-level profiling. However, for almost all circumstances the data |
| 2505 | can be useful. Ideally a utility such as Intel's VTUNE would be available to |
| 2506 | allow careful instruction-level analysis; go hassle Intel for this, not me ;) |
| 2507 | </para> |
| 2508 | <sect1 id="irq-latency"> |
| 2509 | <title>Profiling interrupt latency</title> |
| 2510 | <para> |
| 2511 | This is an example of how the latency of delivery of profiling interrupts |
| 2512 | can impact the reliability of the profiling data. This is pretty much a |
| 2513 | worst-case-scenario example: these problems are fairly rare. |
| 2514 | </para> |
| 2515 | <screen> |
| 2516 | double fun(double a, double b, double c) |
| 2517 | { |
| 2518 | double result = 0; |
| 2519 | for (int i = 0 ; i < 10000; ++i) { |
| 2520 | result += a; |
| 2521 | result *= b; |
| 2522 | result /= c; |
| 2523 | } |
| 2524 | return result; |
| 2525 | } |
| 2526 | </screen> |
| 2527 | <para> |
| 2528 | Here the last instruction of the loop is very costly, and you would expect the result |
| 2529 | to reflect that - but (showing only the instructions inside the loop): |
| 2530 | </para> |
| 2531 | <screen> |
| 2532 | $ opannotate -a -t 10 ./a.out |
| 2533 | |
| 2534 | 88 15.38% : 8048337: fadd %st(3),%st |
| 2535 | 48 8.391% : 8048339: fmul %st(2),%st |
| 2536 | 68 11.88% : 804833b: fdiv %st(1),%st |
| 2537 | 368 64.33% : 804833d: inc %eax |
| 2538 | : 804833e: cmp $0x270f,%eax |
| 2539 | : 8048343: jle 8048337 |
| 2540 | </screen> |
| 2541 | <para> |
| 2542 | The problem comes from the x86 hardware; when the counter overflows the IRQ |
| 2543 | is asserted, but the hardware has features that can delay the NMI: |
| 2544 | x86 hardware is synchronous (i.e. cannot interrupt during an instruction); |
| 2545 | there is also a latency when the IRQ is asserted, and the multiple |
| 2546 | execution units and the out-of-order model of modern x86 CPUs also cause |
| 2547 | problems. This is the same function, with source annotation: |
| 2548 | </para> |
| 2549 | <screen> |
| 2550 | $ opannotate -s -t 10 ./a.out |
| 2551 | |
| 2552 | :double fun(double a, double b, double c) |
| 2553 | :{ /* _Z3funddd total: 572 100.0% */ |
| 2554 | : double result = 0; |
| 2555 | 368 64.33% : for (int i = 0 ; i < 10000; ++i) { |
| 2556 | 88 15.38% : result += a; |
| 2557 | 48 8.391% : result *= b; |
| 2558 | 68 11.88% : result /= c; |
| 2559 | : } |
| 2560 | : return result; |
| 2561 | :} |
| 2562 | </screen> |
| 2563 | <para> |
| 2564 | The conclusion: don't trust samples coming at the end of a loop, |
| 2565 | particularly if the last instruction generated by the compiler is costly. This |
| 2566 | case can also occur for branches. Always bear in mind that a sample |
| 2567 | can be delayed by a few cycles from its real position. That's a hardware |
| 2568 | problem and OProfile can do nothing about it. |
| 2569 | </para> |
| 2570 | </sect1> |
| 2571 | <sect1 id="kernel-profiling"> |
| 2572 | <title>Kernel profiling</title> |
| 2573 | <sect2 id="irq-masking"> |
| 2574 | <title>Interrupt masking</title> |
| 2575 | <para> |
| 2576 | OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4, |
| 2577 | Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the |
| 2578 | Linux kernel where interrupts are disabled, allowing collection of samples in virtually |
| 2579 | all executable code. The RTC, timer interrupt mode, and Itanium 2 collection mechanisms |
| 2580 | use maskable interrupts. Thus, the RTC and Itanium 2 data collection mechanisms have "sample |
| 2581 | shadows", or blind spots: regions where no samples will be collected. Typically, the samples |
| 2582 | will be attributed to the code immediately after the interrupts are re-enabled. |
| 2583 | </para> |
| 2584 | </sect2> |
| 2585 | <sect2 id="idle"> |
| 2586 | <title>Idle time</title> |
| 2587 | <para> |
| 2588 | Your kernel is likely to support halting the processor when a CPU is idle. As |
| 2589 | the typical hardware events like <constant>CPU_CLK_UNHALTED</constant> do not |
| 2590 | count when the CPU is halted, the kernel profile will not reflect the actual |
| 2591 | amount of time spent idle. You can change this behaviour by booting with |
| 2592 | the <option>idle=poll</option> option, which uses a different idle routine. This |
| 2593 | will appear as <function>poll_idle()</function> in your kernel profile. |
| 2594 | </para> |
| 2595 | </sect2> |
| 2596 | <sect2 id="kernel-modules"> |
| 2597 | <title>Profiling kernel modules</title> |
| 2598 | <para> |
| 2599 | OProfile profiles kernel modules by default. However, there are a couple of problems |
| 2600 | you may have when trying to get results. First, you may have booted via an initrd; |
| 2601 | this means that the actual path for the module binaries cannot be determined automatically. |
| 2602 | To get around this, you can use the <option>-p</option> option to the profiling tools |
| 2603 | to specify where to look for the kernel modules. |
| 2604 | </para> |
| 2605 | <para> |
| 2606 | In 2.6, the information on where kernel module binaries are located has been removed. |
| 2607 | This means OProfile needs guiding with the <option>-p</option> option to find your |
| 2608 | modules. Normally, you can just use your standard module top-level directory for this. |
| 2609 | Note that due to this problem, OProfile cannot check that the modification times match; |
| 2610 | it is your responsibility to make sure you do not modify a binary after a profile |
| 2611 | has been created. |
| 2612 | </para> |
| 2613 | <para> |
| 2614 | If you have run <command>insmod</command> or <command>modprobe</command> to insert a module |
| 2615 | in a particular directory, it is important that you specify this directory with the |
| 2616 | <option>-p</option> option first, so that it overrides an older module binary that might |
| 2617 | exist in other directories you've specified with <option>-p</option>. It is up to you |
| 2618 | to make sure that these values are correct: 2.6 kernels simply do not provide enough |
| 2619 | information for OProfile to determine this itself. |
| 2620 | </para> |
| 2621 | </sect2> |
| 2622 | </sect1> |
| 2623 | |
| 2624 | <sect1 id="interpreting-callgraph"> |
| 2625 | <title>Interpreting call-graph profiles</title> |
| 2626 | <para> |
| 2627 | Sometimes the results from call-graph profiles may be different to what |
| 2628 | you expect to see. The first thing to check is whether the target |
| 2629 | binaries were compiled with frame pointers enabled (if the binary was |
| 2630 | compiled using <command>gcc</command>'s |
| 2631 | <option>-fomit-frame-pointer</option> option, you will not get |
| 2632 | meaningful results). Note that as of this writing, the GCC developers |
| 2633 | plan to disable frame pointers by default. The Linux kernel is built |
| 2634 | without frame pointers by default; there is a configuration option you |
| 2635 | can use to turn it on under the "Kernel Hacking" menu. |
| 2636 | </para> |
| 2637 | <para> |
| 2638 | Often you may see a caller of a function that does not actually directly |
| 2639 | call the function you're looking at (e.g. if <function>a()</function> |
| 2640 | calls <function>b()</function>, which in turn calls |
| 2641 | <function>c()</function>, you may see an entry for |
| 2642 | <function>a()->c()</function>). What's actually occurring is that we |
| 2643 | are taking samples at the very start (or the very end) of |
| 2644 | <function>c()</function>; at these few instructions, we haven't yet |
| 2645 | created the new function's frame, so it appears as if |
| 2646 | <function>a()</function> is calling directly into |
| 2647 | <function>c()</function>. Be careful not to be misled by these |
| 2648 | entries. |
| 2649 | </para> |
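<para>
The situation described above can be reproduced with a small test program. The
following is only a sketch under stated assumptions: the function names
<function>a()</function>, <function>b()</function> and <function>c()</function> mirror
the text, while <function>run_call_chain()</function> and the
<constant>NOINLINE</constant> macro are hypothetical helpers introduced for this
illustration. It assumes a GCC-compatible compiler and a build that keeps frame
pointers, for example with <option>-fno-omit-frame-pointer</option>.
</para>

```cpp
// Hypothetical call chain a() -> b() -> c() for experimenting with
// call-graph profiles. Suggested build (an assumption, adjust as needed):
//   g++ -O2 -g -fno-omit-frame-pointer chain.cpp
#include <cassert>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))   // keep each call as a real call
#else
#define NOINLINE
#endif

// Leaf function doing the actual work. Samples that land in its first or
// last few instructions are taken before (or after) its own frame exists.
NOINLINE double c(double x)
{
    double result = 0;
    for (int i = 0; i < 10000; ++i) {
        result += x;
        result *= 0.5;
    }
    return result;
}

// The additions after the calls prevent the calls from becoming tail calls.
NOINLINE double b(double x) { return c(x) + 1.0; }
NOINLINE double a(double x) { return b(x) + 1.0; }

// Driver to call from your own main(); more iterations give more samples.
double run_call_chain(int iterations)
{
    assert(iterations >= 0);
    double total = 0;
    for (int i = 0; i < iterations; ++i)
        total += a(1.0);
    return total;
}
```

<para>
When profiling such a program with call-graph support, most samples in
<function>c()</function> should show <function>b()</function> as the caller; a small
number attributed to <function>a()</function> would be consistent with the effect
described above rather than a sign of a real direct call.
</para>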
| 2650 | <para> |
| 2651 | Like the rest of OProfile, call-graph profiling uses a statistical |
| 2652 | approach; this means that sometimes a backtrace sample is truncated, or |
| 2653 | even partially wrong. Bear this in mind when examining results. |
| 2654 | </para> |
| 2655 | <!-- FIXME: what do we need here ? --> |
| 2656 | </sect1> |
| 2657 | |
| 2658 | <sect1 id="debug-info"> |
| 2659 | <title>Inaccuracies in annotated source</title> |
| 2660 | <sect2 id="effect-of-optimizations"> |
| 2661 | <title>Side effects of optimizations</title> |
| 2662 | <para> |
| 2663 | The compiler can introduce some pitfalls in the annotated source output. |
| 2664 | The optimizer can move pieces of code in such a manner that two lines of code |
| 2665 | are interleaved (instruction scheduling). Also, debug info generated by the compiler |
| 2666 | can show strange behavior. This is especially true for complex expressions, e.g. inside |
| 2667 | an if statement: |
| 2668 | </para> |
| 2669 | <screen> |
| 2670 | if (a && .. |
| 2671 | b && .. |
| 2672 | c &&) |
| 2673 | </screen> |
| 2674 | <para> |
| 2675 | Here the problem comes from the position of the line number. The available debug |
| 2676 | info does not give enough detail for the if condition, so all samples are |
| 2677 | accumulated at the position of the closing parenthesis of the expression. Using |
| 2678 | <command>opannotate <option>-a</option></command> can help to show the real |
| 2679 | samples at an assembly level. |
| 2680 | </para> |
| 2681 | </sect2> |
| 2682 | <sect2 id="prologues"> |
| 2683 | <title>Prologues and epilogues</title> |
| 2684 | <para> |
| 2685 | The compiler generally needs to generate "glue" code across function calls, dependent |
| 2686 | on the particular function call conventions used. Additionally other things |
| 2687 | need to happen, like stack pointer adjustment for the local variables; this |
| 2688 | code is known as the function prologue. Similar code is needed at function return, |
| 2689 | and is known as the function epilogue. This will show up in annotations as |
| 2690 | samples at the very start and end of a function, where there is no apparent |
| 2691 | executable code in the source. |
| 2692 | </para> |
| 2693 | </sect2> |
| 2694 | <sect2 id="inlined-function"> |
| 2695 | <title>Inlined functions</title> |
| 2696 | <para> |
| 2697 | You may see that a function is credited with a certain number of samples, but |
| 2698 | the listing does not add up to the correct total. To pick a real example: |
| 2699 | </para> |
| 2700 | <screen> |
| 2701 | :internal_sk_buff_alloc_security(struct sk_buff *skb) |
| 2702 | 353 2.342% :{ /* internal_sk_buff_alloc_security total: 1882 12.48% */ |
| 2703 | : |
| 2704 | : sk_buff_security_t *sksec; |
| 2705 | 15 0.0995% : int rc = 0; |
| 2706 | : |
| 2707 | 10 0.06633% : sksec = skb->lsm_security; |
| 2708 | 468 3.104% : if (sksec && sksec->magic == DSI_MAGIC) { |
| 2709 | : goto out; |
| 2710 | : } |
| 2711 | : |
| 2712 | : sksec = (sk_buff_security_t *) get_sk_buff_memory(skb); |
| 2713 | 3 0.0199% : if (!sksec) { |
| 2714 | 38 0.2521% : rc = -ENOMEM; |
| 2715 | : goto out; |
| 2716 | 10 0.06633% : } |
| 2717 | : memset(sksec, 0, sizeof (sk_buff_security_t)); |
| 2718 | 44 0.2919% : sksec->magic = DSI_MAGIC; |
| 2719 | 32 0.2123% : sksec->skb = skb; |
| 2720 | 45 0.2985% : sksec->sid = DSI_SID_NORMAL; |
| 2721 | 31 0.2056% : skb->lsm_security = sksec; |
| 2722 | : |
| 2723 | : out: |
| 2724 | : |
| 2725 | 146 0.9685% : return rc; |
| 2726 | : |
| 2727 | 98 0.6501% :} |
| 2728 | </screen> |
| 2729 | <para> |
| 2730 | Here, the function is credited with 1,882 samples, but the per-line annotations |
| 2731 | above sum to only 1,293. This is usually because of inline functions - |
| 2732 | the compiler marks such code with debug entries for the inline function |
| 2733 | definition, and this is where <command>opannotate</command> annotates |
| 2734 | such samples. In the case above, <function>memset</function> is the most |
| 2735 | likely candidate for this problem. Examining the mixed source/assembly |
| 2736 | output can help identify such results. |
| 2737 | </para> |
| 2738 | <para> |
| 2739 | This problem is more visible when no source file is available. In the |
| 2740 | following example, the number of samples for the file (109) is clearly less |
| 2741 | than the sum of the samples for its symbols (125). The difference must be |
| 2742 | attributed to inline functions. |
| 2743 | </para> |
| 2744 | <screen> |
| 2745 | /* |
| 2746 | * Total samples for file : "arch/i386/kernel/process.c" |
| 2747 | * |
| 2748 | * 109 2.4616 |
| 2749 | */ |
| 2750 | |
| 2751 | /* default_idle total: 84 1.8970 */ |
| 2752 | /* cpu_idle total: 21 0.4743 */ |
| 2753 | /* flush_thread total: 1 0.0226 */ |
| 2754 | /* prepare_to_copy total: 1 0.0226 */ |
| 2755 | /* __switch_to total: 18 0.4065 */ |
| 2756 | </screen> |
| 2757 | <para> |
| 2758 | The missing samples are not lost; they are credited to the source |
| 2759 | location where the inlined function is defined. Samples for an inlined function |
| 2760 | are collected from all of its call sites and merged in one place in the annotated |
| 2761 | source file, so there is no way to see which call site the |
| 2762 | samples for an inlined function came from. |
| 2763 | </para> |
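<para>
To make the merging of call sites concrete, here is a minimal sketch. All names in it
(<function>mix()</function>, <function>hash_bytes()</function>,
<function>hash_words()</function>) are hypothetical and not taken from any real code
base. It assumes the compiler inlines <function>mix()</function> when optimizing, which
is likely for such a small function but not guaranteed.
</para>

```cpp
// Hypothetical example: one small helper inlined into two callers.
#include <cassert>
#include <cstddef>

// Small helper that an optimizing compiler will probably inline.
// With debug info, samples taken in its inlined body are expected to be
// annotated on THIS line, whichever caller was actually running.
static inline unsigned mix(unsigned h, unsigned v)
{
    return (h ^ v) * 16777619u;
}

// Call site 1: hashes a byte buffer.
unsigned hash_bytes(const unsigned char *p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    unsigned h = 2166136261u;
    for (std::size_t i = 0; i < n; ++i)
        h = mix(h, p[i]);
    return h;
}

// Call site 2: hashes an array of words with the same helper.
unsigned hash_words(const unsigned *p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    unsigned h = 2166136261u;
    for (std::size_t i = 0; i < n; ++i)
        h = mix(h, p[i]);
    return h;
}
```

<para>
If both callers are hot, the annotated source would be expected to show one combined
count at the body of <function>mix()</function>, while the symbol totals for
<function>hash_bytes()</function> and <function>hash_words()</function> still include
their share of those samples.
</para>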
| 2764 | <para> |
| 2765 | When running <command>opannotate</command>, you may get a warning |
| 2766 | "some functions compiled without debug information may have incorrect source line attributions". |
| 2767 | In some rare cases, OProfile is not able to verify that the derived source line |
| 2768 | is correct (when some parts of the binary image are compiled without debugging |
| 2769 | information). Be wary of results if this warning appears. |
| 2770 | </para> |
| 2771 | <para> |
| 2772 | Furthermore, for some languages the compiler can implicitly generate functions, |
| 2773 | such as default copy constructors. Such functions are labelled by the compiler |
| 2774 | as having a line number of 0, which means the source annotation can be confusing. |
| 2775 | </para> |
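<para>
As an illustration of such an implicitly generated function, consider the sketch below.
The type <function>record</function> and the function
<function>total_name_length()</function> are hypothetical names chosen for this example.
The struct declares no copy constructor, so the compiler generates one; whether samples
taken inside it carry a line number of 0 depends on the compiler, as described above.
</para>

```cpp
// Hypothetical example of a compiler-generated copy constructor.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct record {
    std::string name;
    std::vector<int> values;
    // No copy constructor is declared here: the compiler generates
    // record(const record&) implicitly, with no source line of its own.
};

// Copies every record before using it, so that time is spent in the
// implicitly generated copy constructor.
std::size_t total_name_length(const std::vector<record> &in)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        record copy(in[i]);   // runs the implicit copy constructor
        assert(copy.values.size() == in[i].values.size());
        total += copy.name.size();
    }
    return total;
}
```

<para>
In an annotated listing of code like this, samples for the copy may not appear next to
any line of <function>total_name_length()</function>; the assembly output of
<command>opannotate <option>-a</option></command> is a more reliable place to look.
</para>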
| 2776 | <!-- FIXME so what *actually* happens to those samples ? ignored ? --> |
| 2777 | </sect2> |
| 2778 | <sect2 id="wrong-linenr-info"> |
| 2779 | <title>Inaccuracy in line number information</title> |
| 2780 | <para> |
| 2781 | Depending on your compiler, you may run into the following problem: |
| 2782 | </para> |
| 2783 | <screen> |
| 2784 | struct big_object { int a[500]; }; |
| 2785 | |
| 2786 | int main() |
| 2787 | { |
| 2788 | big_object a, b; |
| 2789 | for (int i = 0 ; i != 1000 * 1000; ++i) |
| 2790 | b = a; |
| 2791 | return 0; |
| 2792 | } |
| 2793 | |
| 2794 | </screen> |
| 2795 | <para> |
| 2796 | Compiled with <command>gcc</command> 3.0.4 the annotated source is clearly inaccurate: |
| 2797 | </para> |
| 2798 | <screen> |
| 2799 | :int main() |
| 2800 | :{ /* main total: 7871 100% */ |
| 2801 | : big_object a, b; |
| 2802 | : for (int i = 0 ; i != 1000 * 1000; ++i) |
| 2803 | : b = a; |
| 2804 | 7871 100% : return 0; |
| 2805 | :} |
| 2806 | </screen> |
| 2807 | <para> |
| 2808 | The problem here is distinct from the IRQ latency problem; the debug line number |
| 2809 | information is not precise enough; again, looking at output of <command>opannotate -as</command> can help. |
| 2810 | </para> |
| 2811 | <screen> |
| 2812 | :int main() |
| 2813 | :{ |
| 2814 | : big_object a, b; |
| 2815 | : for (int i = 0 ; i != 1000 * 1000; ++i) |
| 2816 | : 80484c0: push %ebp |
| 2817 | : 80484c1: mov %esp,%ebp |
| 2818 | : 80484c3: sub $0xfac,%esp |
| 2819 | : 80484c9: push %edi |
| 2820 | : 80484ca: push %esi |
| 2821 | : 80484cb: push %ebx |
| 2822 | : b = a; |
| 2823 | : 80484cc: lea 0xfffff060(%ebp),%edx |
| 2824 | : 80484d2: lea 0xfffff830(%ebp),%eax |
| 2825 | : 80484d8: mov $0xf423f,%ebx |
| 2826 | : 80484dd: lea 0x0(%esi),%esi |
| 2827 | : return 0; |
| 2828 | 3 0.03811% : 80484e0: mov %edx,%edi |
| 2829 | : 80484e2: mov %eax,%esi |
| 2830 | 1 0.0127% : 80484e4: cld |
| 2831 | 8 0.1016% : 80484e5: mov $0x1f4,%ecx |
| 2832 | 7850 99.73% : 80484ea: repz movsl %ds:(%esi),%es:(%edi) |
| 2833 | 9 0.1143% : 80484ec: dec %ebx |
| 2834 | : 80484ed: jns 80484e0 |
| 2835 | : 80484ef: xor %eax,%eax |
| 2836 | : 80484f1: pop %ebx |
| 2837 | : 80484f2: pop %esi |
| 2838 | : 80484f3: pop %edi |
| 2839 | : 80484f4: leave |
| 2840 | : 80484f5: ret |
| 2841 | </screen> |
| 2842 | <para> |
| 2843 | So here it's clear that copying is correctly credited with all of the samples, but the |
| 2844 | line number information is misplaced. <command>objdump -dS</command> exposes the |
| 2845 | same problem. Note that maintaining accurate debug information is difficult for compilers when optimizing, so this problem is not surprising. |
| 2846 | The problem of debug information |
| 2847 | accuracy is also dependent on the binutils version used; some BFD library versions |
| 2848 | contain a work-around for known problems of <command>gcc</command>, some others do not. This is unfortunate but we must live with that, |
| 2849 | since profiling is pointless when you disable optimisation (which would give better debugging entries). |
| 2850 | </para> |
| 2851 | </sect2> |
| 2852 | </sect1> |
| 2853 | <sect1 id="symbol-without-debug-info"> |
| 2854 | <title>Assembly functions</title> |
| 2855 | <para> |
| 2856 | Often the assembler cannot generate debug information automatically. |
| 2857 | This means that you cannot get a source report unless |
| 2858 | you manually define the necessary debug information; read your assembler documentation for how you might |
| 2859 | do that. The only |
| 2860 | debugging info currently needed by OProfile is the line-number/filename-VMA association. When profiling assembly |
| 2861 | without debugging info you can always get a report for symbols, and optionally for VMA, through <command>opreport -l</command> |
| 2862 | or <command>opreport -d</command>, but this works only for symbols with the right attributes. |
| 2863 | For <command>gas</command> you can get this by |
| 2864 | </para> |
| 2865 | <screen> |
| 2866 | .globl foo |
| 2867 | .type foo,@function |
| 2868 | </screen> |
| 2869 | <para> |
| 2870 | whilst for <command>nasm</command> you must use |
| 2871 | </para> |
| 2872 | <screen> |
| 2873 | GLOBAL foo:function ; [1] |
| 2874 | </screen> |
| 2875 | <para> |
| 2876 | Note that OProfile does not need the global attribute, only the function attribute. |
| 2877 | </para> |
| 2878 | </sect1> |
| 2879 | <!-- |
| 2880 | |
| 2881 | FIXME: I commented this bit out until we've written something ... |
| 2882 | |
| 2883 | improve this ? but look first why this file is special |
| 2884 | <sect2 id="small-functions"> |
| 2885 | <title>Small functions</title> |
| 2886 | <para> |
| 2887 | Very small functions can show strange behavior. The file in your source |
| 2888 | directory of OProfile <filename>$SRC/test-oprofile/understanding/puzzle.c</filename> |
| 2889 | show such example |
| 2890 | </para> |
| 2891 | </sect2> |
| 2892 | --> |
| 2893 | |
| 2894 | <sect1 id="overlapping-symbols"> |
| 2895 | <title>Overlapping symbols in JITed code</title> |
| 2896 | <para> |
| 2897 | Some virtual machines (e.g., Java) may re-JIT a method, so that space previously |
| 2898 | allocated for a piece of compiled code is reused. This means that, at one distinct |
| 2899 | code address, multiple symbols/methods may be present during the run time of the application. |
| 2900 | </para> |
| 2901 | <para> |
| 2902 | Since OProfile samples are buffered and don't have timing information, there is no way |
| 2903 | to correlate samples with the (possibly) varying address ranges in which the code for a symbol |
| 2904 | may reside. |
| 2905 | An alternative would be flushing the OProfile sampling buffer when we get an unload event, |
| 2906 | but this could result in high overhead. |
| 2907 | </para> |
| 2908 | <para> |
| 2909 | To moderate the problem of overlapping symbols, OProfile tries to select the symbol that was |
| 2910 | present at this address range most of the time. Additionally, other overlapping symbols |
| 2911 | are truncated in the overlapping area. |
| 2912 | This gives reasonable results, because in reality, address reuse typically takes place |
| 2913 | during phase changes of the application -- in particular, during application startup. |
| 2914 | Thus, for optimum profiling results, start the sampling session after application startup |
| 2915 | and burn in. |
| 2916 | </para> |
| 2917 | </sect1> |
| 2918 | |
| 2919 | <sect1 id="hidden-cost"> |
| 2920 | <title>Other discrepancies</title> |
| 2921 | <para> |
| 2922 | Another cause of apparent problems is the hidden cost of instructions. A very |
| 2923 | common example is two memory reads: one from L1 cache and the other from memory: |
| 2924 | the second memory read is likely to have more samples. |
| 2925 | There are many other causes of hidden cost of instructions. A non-exhaustive |
| 2926 | list: mis-predicted branch, TLB cache miss, partial register stall, |
| 2927 | partial register dependencies, memory mismatch stall, re-executed µops. If you want to write |
| 2928 | programs at the assembly level, be sure to take a look at the Intel and |
| 2929 | AMD documentation at <ulink url="http://developer.intel.com/">http://developer.intel.com/</ulink> |
| 2930 | and <ulink url="http://developer.amd.com/devguides.jsp/">http://developer.amd.com/devguides.jsp</ulink>. |
| 2931 | </para> |
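<para>
The memory-read case can be explored with a sketch like the following. The function
names <function>sum_sequential()</function> and <function>sum_strided()</function> are
hypothetical. Both functions execute the same additions on the same elements and differ
only in the order of the reads; on a large enough array the strided version is expected
to miss the cache more often, although the actual effect depends on the hardware and on
the stride chosen.
</para>

```cpp
// Hypothetical example: identical work, different memory access order.
#include <cassert>
#include <cstddef>
#include <vector>

// Reads the elements in index order, so consecutive reads mostly hit
// cache lines that were already fetched.
long sum_sequential(const std::vector<int> &v)
{
    long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// Reads every element exactly once, but jumps `stride` elements between
// consecutive reads. With a large stride and a large vector, more of the
// reads are likely to be served from memory instead of the L1 cache.
long sum_strided(const std::vector<int> &v, std::size_t stride)
{
    assert(stride > 0);
    long s = 0;
    for (std::size_t start = 0; start < stride; ++start)
        for (std::size_t i = start; i < v.size(); i += stride)
            s += v[i];
    return s;
}
```

<para>
Profiling both functions on the same large vector should, under these assumptions, show
more samples on the load in <function>sum_strided()</function> even though the source
lines look equally cheap.
</para>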
| 2932 | </sect1> |
| 2933 | </chapter> |
| 2934 | |
| 2935 | |
| 2936 | <chapter id="ack"> |
| 2937 | <title>Acknowledgments</title> |
| 2938 | <para> |
| 2939 | Thanks to (in no particular order) : Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie, |
| 2940 | Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu, |
| 2941 | Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh, |
| 2942 | Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer, |
| 2943 | Maynard P. Johnson, |
| 2944 | Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro". |
| 2945 | </para> |
| 2946 | </chapter> |
| 2947 | |
| 2948 | </book> |