<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>OProfile manual</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.69.1" />
</head>
<body>
<div class="book" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h1 class="title"><a id="oprofile-guide"></a>OProfile manual</h1>
</div>
<div>
<div class="authorgroup">
<div class="author">
<h3 class="author"><span class="firstname">John</span> <span class="surname">Levon</span></h3>
<div class="affiliation">
<div class="address">
<p>
<code class="email">&lt;<a href="mailto:levon@movementarian.org">levon@movementarian.org</a>&gt;</code>
</p>
</div>
</div>
</div>
</div>
</div>
<div>
<p class="copyright">Copyright © 2000-2004 Victoria University of Manchester, John Levon and others</p>
</div>
</div>
<hr />
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt><span class="chapter"><a href="#introduction">1. Introduction</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#applications">1. Applications of OProfile</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#requirements">2. System requirements</a></span></dt>
<dt><span class="sect1"><a href="#resources">3. Internet resources</a></span></dt>
<dt><span class="sect1"><a href="#install">4. Installation</a></span></dt>
<dt><span class="sect1"><a href="#uninstall">5. Uninstalling OProfile</a></span></dt>
</dl>
</dd>
<dt><span class="chapter"><a href="#overview">2. Overview</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#getting-started">1. Getting started</a></span></dt>
<dt><span class="sect1"><a href="#tools-overview">2. Tools summary</a></span></dt>
</dl>
</dd>
<dt><span class="chapter"><a href="#controlling">3. Controlling the profiler</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#controlling-daemon">1. Using <span><strong class="command">opcontrol</strong></span></a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opcontrolexamples">1.1. Examples</a></span></dt>
<dt><span class="sect2"><a href="#eventspec">1.2. Specifying performance counter events</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#setup-jit">2. Setting up the JIT profiling feature</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#setup-jit-jvm">2.1. JVM instrumentation</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#oprofile-gui">3. Using <span><strong class="command">oprof_start</strong></span></a></span></dt>
<dt><span class="sect1"><a href="#detailed-parameters">4. Configuration details</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#hardware-counters">4.1. Hardware performance counters</a></span></dt>
<dt><span class="sect2"><a href="#rtc">4.2. OProfile in RTC mode</a></span></dt>
<dt><span class="sect2"><a href="#timer">4.3. OProfile in timer interrupt mode</a></span></dt>
<dt><span class="sect2"><a href="#p4">4.4. Pentium 4 support</a></span></dt>
<dt><span class="sect2"><a href="#ia64">4.5. Intel Itanium 2 support</a></span></dt>
<dt><span class="sect2"><a href="#ppc64">4.6. PowerPC64 support</a></span></dt>
<dt><span class="sect2"><a href="#cell-be">4.7. Cell Broadband Engine support</a></span></dt>
<dt><span class="sect2"><a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a></span></dt>
<dt><span class="sect2"><a href="#misuse">4.9. Dangerous counter settings</a></span></dt>
</dl>
</dd>
</dl>
</dd>
<dt><span class="chapter"><a href="#results">4. Obtaining results</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#profile-spec">1. Profile specifications</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#profile-spec-examples">1.1. Examples</a></span></dt>
<dt><span class="sect2"><a href="#profile-spec-details">1.2. Profile specification parameters</a></span></dt>
<dt><span class="sect2"><a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a></span></dt>
<dt><span class="sect2"><a href="#no-results">1.4. What to do when you don't get any results</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#opreport">2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opreport-merging">2.1. Merging separate profiles</a></span></dt>
<dt><span class="sect2"><a href="#opreport-comparison">2.2. Side-by-side multiple results</a></span></dt>
<dt><span class="sect2"><a href="#opreport-callgraph">2.3. Callgraph output</a></span></dt>
<dt><span class="sect2"><a href="#opreport-diff">2.4. Differential profiles with <span><strong class="command">opreport</strong></span></a></span></dt>
<dt><span class="sect2"><a href="#opreport-anon">2.5. Anonymous executable mappings</a></span></dt>
<dt><span class="sect2"><a href="#opreport-xml">2.6. XML formatted output</a></span></dt>
<dt><span class="sect2"><a href="#opreport-options">2.7. Options for <span><strong class="command">opreport</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#opannotate">3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opannotate-finding-source">3.1. Locating source files</a></span></dt>
<dt><span class="sect2"><a href="#opannotate-details">3.2. Usage of <span><strong class="command">opannotate</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#getting-jit-reports">4. OProfile results with JIT samples</a></span></dt>
<dt><span class="sect1"><a href="#opgprof">5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opgprof-details">5.1. Usage of <span><strong class="command">opgprof</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#oparchive">6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#oparchive-details">6.1. Usage of <span><strong class="command">oparchive</strong></span></a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#opimport">7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#opimport-details">7.1. Usage of <span><strong class="command">opimport</strong></span></a></span></dt>
</dl>
</dd>
</dl>
</dd>
<dt><span class="chapter"><a href="#interpreting">5. Interpreting profiling results</a></span></dt>
<dd>
<dl>
<dt><span class="sect1"><a href="#irq-latency">1. Profiling interrupt latency</a></span></dt>
<dt><span class="sect1"><a href="#kernel-profiling">2. Kernel profiling</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#irq-masking">2.1. Interrupt masking</a></span></dt>
<dt><span class="sect2"><a href="#idle">2.2. Idle time</a></span></dt>
<dt><span class="sect2"><a href="#kernel-modules">2.3. Profiling kernel modules</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a></span></dt>
<dt><span class="sect1"><a href="#debug-info">4. Inaccuracies in annotated source</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#effect-of-optimizations">4.1. Side effects of optimizations</a></span></dt>
<dt><span class="sect2"><a href="#prologues">4.2. Prologues and epilogues</a></span></dt>
<dt><span class="sect2"><a href="#inlined-function">4.3. Inlined functions</a></span></dt>
<dt><span class="sect2"><a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#symbol-without-debug-info">5. Assembly functions</a></span></dt>
<dt><span class="sect1"><a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a></span></dt>
<dt><span class="sect1"><a href="#hidden-cost">7. Other discrepancies</a></span></dt>
</dl>
</dd>
<dt><span class="chapter"><a href="#ack">6. Acknowledgments</a></span></dt>
</dl>
</div>
<div class="chapter" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="introduction"></a>Chapter 1. Introduction</h2>
</div>
</div>
</div>
<div class="toc">
<p>
<b>Table of Contents</b>
</p>
<dl>
<dt><span class="sect1"><a href="#applications">1. Applications of OProfile</a></span></dt>
<dd>
<dl>
<dt><span class="sect2"><a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a></span></dt>
</dl>
</dd>
<dt><span class="sect1"><a href="#requirements">2. System requirements</a></span></dt>
<dt><span class="sect1"><a href="#resources">3. Internet resources</a></span></dt>
<dt><span class="sect1"><a href="#install">4. Installation</a></span></dt>
<dt><span class="sect1"><a href="#uninstall">5. Uninstalling OProfile</a></span></dt>
</dl>
</div>
<p>
This manual applies to OProfile version 0.9.6.
OProfile is a profiling system for Linux 2.2, 2.4, and 2.6 kernels on a number of architectures. It is capable of profiling
all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries
to binaries. It runs transparently in the background, collecting information at low overhead. These
features make it ideal for profiling entire systems to find bottlenecks in real-world systems.
</p>
<p>
Many CPUs provide "performance counters": hardware registers that count "events" such as
cache misses or CPU cycles. OProfile builds profiles of code from these events: each time a
certain (configurable) number of events has occurred, it records the current program counter (PC) value.
These samples are aggregated into a profile for each binary image.</p>
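<p>
The sampling scheme described above can be illustrated with a toy shell sketch. It is not how OProfile is implemented, since the real counting happens in hardware and in the kernel driver. The sketch assumes a hypothetical input in which each line is one event, written as an image name followed by a PC value. The function name <code class="code">sample_events</code> is made up for this example. Every PERIOD-th event, the sketch records the PC and tallies it per image and PC.
</p>

```shell
# Toy model of event-based sampling; this is NOT OProfile's implementation.
# Input: one hypothetical hardware event per line, "IMAGE PC".
# Every PERIOD-th event (the counter "overflowing"), the PC is recorded,
# and the recorded PCs are tallied per binary image.
# PERIOD must be a positive integer.
sample_events() {
    period=$1
    awk -v period="$period" '
        {
            n++
            # Only every PERIOD-th event produces a sample.
            if (n % period == 0) samples[$1 " " $2]++
        }
        # One output line per sampled (image, PC) pair: "IMAGE PC COUNT".
        END { for (key in samples) print key, samples[key] }
    ' | sort
}
```

<p>
For example, if four events are fed in with a period of 2, only the second and fourth events are recorded. Raising the period lowers the overhead at the cost of a coarser profile.
</p>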
<p>
Some hardware setups do not allow OProfile to use performance counters. In these cases no
events are available, and OProfile operates in timer/RTC mode, as described in later chapters.
</p>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="applications"></a>1. Applications of OProfile</h2>
</div>
</div>
</div>
<p>
OProfile is useful in a number of situations. You might want to use OProfile when you:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>need low overhead</p>
</li>
<li>
<p>cannot use highly intrusive profiling methods</p>
</li>
<li>
<p>need to profile interrupt handlers</p>
</li>
<li>
<p>need to profile an application and its shared libraries</p>
</li>
<li>
<p>need to profile dynamically compiled code of supported virtual machines (see <a href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, &#8220;Support for dynamically compiled (JIT) code&#8221;</a>)</p>
</li>
<li>
<p>need to capture the performance behaviour of the entire system</p>
</li>
<li>
<p>want to examine hardware effects such as cache misses</p>
</li>
<li>
<p>want detailed source annotation</p>
</li>
<li>
<p>want instruction-level profiles</p>
</li>
<li>
<p>want call-graph profiles</p>
</li>
</ul>
</div>
<p>
OProfile is not a panacea. OProfile might not be a complete solution when you:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>require call-graph profiles on platforms other than 2.6/x86</p>
</li>
<li>
<p>don't have root permissions</p>
</li>
<li>
<p>require 100% instruction-accurate profiles</p>
</li>
<li>
<p>need function call counts or an interstitial profiling API</p>
</li>
<li>
<p>cannot tolerate any disturbance to the system whatsoever</p>
</li>
<li>
<p>need to profile interpreted or dynamically compiled code of non-supported virtual machines</p>
</li>
</ul>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="jitsupport"></a>1.1. Support for dynamically compiled (JIT) code</h3>
</div>
</div>
</div>
<p>
Older versions of OProfile were not capable of attributing samples to symbols from dynamically
compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into
anonymous memory regions. OProfile reported the samples from such code, but the attribution
provided was simply:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">"anon: &lt;tgid&gt;&lt;address range&gt;" </pre>
</td>
</tr>
</table>
<p>
Due to this limitation, it was not possible to profile applications executed by virtual machines (VMs)
such as the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code.
A development library is provided to allow developers
to add support for any VM that produces dynamically compiled code (see the <span class="emphasis"><em>OProfile JIT agent
developer guide</em></span>).
In addition, built-in support is included for the following:</p>
<div class="itemizedlist">
<ul type="disc">
<li>JVMTI agent library for Java (1.5 and higher)</li>
<li>JVMPI agent library for Java (1.5 and lower)</li>
</ul>
</div>
<p>
For information on how to use OProfile's JIT support, see <a href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, &#8220;Setting up the JIT profiling feature&#8221;</a>.
</p>
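<p>
To see the kind of anonymous memory regions referred to above, the following sketch lists the executable mappings of a process that have no backing file. It reads the standard Linux <code class="filename">/proc/PID/maps</code> format. The function name <code class="code">anon_exec_regions</code> is hypothetical. Not every anonymous executable mapping necessarily holds JIT code.
</p>

```shell
# Sketch: list anonymous executable mappings from a /proc/PID/maps file.
# Each line has the fields "address perms offset dev inode [pathname]";
# anonymous mappings have no pathname, so they have only 5 fields.
# Default input: /proc/self/maps, which here is the awk process itself.
anon_exec_regions() {
    maps=${1:-/proc/self/maps}
    awk '{
        # 5 fields: no backing file; "x" in perms: executable.
        if (NF == 5) if ($2 ~ /x/) print $1
    }' "$maps"
}
```

<p>
For example, <code class="code">anon_exec_regions /proc/1234/maps</code> would print the address ranges of such regions for a process whose PID is 1234 (a placeholder PID).
</p>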
</div>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="requirements"></a>2. System requirements</h2>
</div>
</div>
</div>
<div class="variablelist">
<dl>
<dt>
<span class="term">Linux kernel 2.2/2.4/2.6</span>
</dt>
<dd>
<p>
OProfile uses a kernel module that can be compiled for
2.2.11 or later and for 2.4 kernels; 2.4.10 or later is required if you use the
boot-time kernel option <code class="option">nosmp</code>. 2.6 kernels are supported with the in-kernel
OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels.
</p>
<p>
2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power
management is used, or if the BIOS does not correctly handle local APICs.
</p>
<p>
PPC64 processors (Power4/Power5/PPC970, etc.) require a recent kernel (later than 2.6.5) with the line
<code class="constant">#define PV_970</code> present in <code class="filename">include/asm-ppc64/processor.h</code>.
</p>
<p>
Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires kernel version
2.6.18 or later.
Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires kernel version
2.6.22 or later. Additionally, full support for SPE profiling requires a BFD library
from binutils code dated January 2007 or later. To ensure the proper BFD support exists, run
the <code class="code">configure</code> utility with <code class="code">--with-target=cell-be</code>.
</p>
<p>
Profiling the Cell Broadband Engine using SPU events requires kernel version 2.6.29-rc1
or later.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the
system to crash.</div>
<p>
Instruction-Based Sampling (IBS) profiling on AMD Family 10h processors requires
kernel version 2.6.28-rc2 or later.
</p>
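<p>
Several of the features above depend on a minimum kernel version, so a small check before profiling can help. This is a minimal sketch that assumes GNU <span><strong class="command">sort</strong></span> with its <code class="option">-V</code> (version sort) option. The function name <code class="code">kernel_at_least</code> is hypothetical. The sketch ignores anything after the first "-" in a version string, such as <code class="literal">-rc1</code> or a distribution suffix, so release-candidate minimums are only approximated.
</p>

```shell
# Sketch: succeed (exit status 0) if kernel release CUR is at least MIN.
# Usage: kernel_at_least MIN [CUR]   (CUR defaults to the running kernel)
# Suffixes after the first "-" are dropped before comparing, so
# "2.6.29-rc1" is treated as "2.6.29"; this is only an approximation.
kernel_at_least() {
    min=${1%%-*}                  # e.g. 2.6.29-rc1     -> 2.6.29
    cur=${2:-$(uname -r)}
    cur=${cur%%-*}                # e.g. 2.6.32-5-amd64 -> 2.6.32
    # GNU version sort puts the lower version first.
    lowest=$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}
```

<p>
For instance, <code class="code">kernel_at_least 2.6.22</code> succeeds only if the running kernel is 2.6.22 or later, the minimum listed above for SPE profiling.
</p>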
</dd>
<dt>
<span class="term">modutils 2.4.6 or above</span>
</dt>
<dd>
<p>
You should have modutils 2.4.6 or later installed (in fact, earlier versions work well in almost all
cases).
</p>
</dd>
<dt>
<span class="term">Supported architecture</span>
</dt>
<dd>
<p>
For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is
required. In marketing terms this translates to anything
between an Intel Pentium Pro (not Pentium Classics) and
a Pentium 4 / Xeon, including all Celerons. The AMD
Athlon, Opteron, Phenom, and Turion CPUs are also supported. Other IA32
CPU types only support the RTC mode of OProfile; please
see later in this manual for details. Hyper-threaded Pentium 4s
are not supported in 2.4. For 2.4 kernels, the Intel
IA-64 CPUs are also supported. For 2.6 kernels, there is additionally
support for Alpha processors, MIPS, ARM, x86-64, sparc64, ppc64, AVR32, and,
in timer mode, PA-RISC and s390.
</p>
</dd>
<dt>
<span class="term">Uniprocessor or SMP</span>
</dt>
<dd>
<p>
SMP machines are fully supported.
</p>
</dd>
<dt>
<span class="term">Required libraries</span>
</dt>
<dd>
<p>
These libraries are required: <code class="filename">popt</code>, <code class="filename">bfd</code>,
<code class="filename">liberty</code> (Debian users: libiberty is provided in the binutils-dev package), <code class="filename">dl</code>,
plus the standard C++ libraries.
</p>
</dd>
<dt>
<span class="term">Required user account</span>
</dt>
<dd>
<p>
For secure processing of sample data from JIT virtual machines (e.g., Java),
the special user account "oprofile" must exist on the system. The 'configure'
and 'make install' operations print warning messages if this
account is not found. If you intend to profile JITed code, you must create
a group account named 'oprofile' and then create the 'oprofile' user account,
setting its default group to 'oprofile'. If this special user account cannot be found,
a runtime error message is printed to the OProfile daemon log when JIT samples are processed.
</p>
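<p>
The account setup described above can be scripted. The sketch below assumes the standard <span><strong class="command">getent</strong></span>, <span><strong class="command">groupadd</strong></span>, and <span><strong class="command">useradd</strong></span> utilities. The helper name <code class="code">account_setup_cmds</code> is hypothetical. It only prints the commands that are still needed, so you can review them before running them as root.
</p>

```shell
# Sketch: print the commands needed to create the group and user
# required for JIT profiling, skipping whichever already exists.
# Usage: account_setup_cmds [NAME]   (NAME defaults to "oprofile")
account_setup_cmds() {
    name=${1:-oprofile}
    # Create the group first so it can be the user's default group.
    getent group "$name" >/dev/null || printf '%s\n' "groupadd $name"
    # useradd -g sets the new user's initial (default) group.
    getent passwd "$name" >/dev/null || printf '%s\n' "useradd -g $name $name"
}
```

<p>
On a system with neither account, it prints <code class="code">groupadd oprofile</code> followed by <code class="code">useradd -g oprofile oprofile</code>. Piping that output to <span><strong class="command">sh</strong></span> as root creates both accounts.
</p>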
| 729 | </dd> |
| 730 | <dt> |
| 731 | <span class="term">OProfile GUI</span> |
| 732 | </dt> |
| 733 | <dd> |
| 734 | <p> |
| 735 | The use of the GUI to start the profiler requires the <code class="filename">Qt 2</code> library. <code class="filename">Qt 3</code> should |
| 736 | also work. |
| 737 | </p> |
| 738 | </dd> |
| 739 | <dt> |
| 740 | <span class="term"> |
| 741 | <span class="acronym">ELF</span> |
| 742 | </span> |
| 743 | </dt> |
| 744 | <dd> |
| 745 | <p> |
| 746 | Probably not too strenuous a requirement, but older <span class="acronym">A.OUT</span> binaries/libraries are not supported. |
| 747 | </p> |
| 748 | </dd> |
| 749 | <dt> |
| 750 | <span class="term">K&R coding style</span> |
| 751 | </dt> |
| 752 | <dd> |
| 753 | <p> |
| 754 | OK, so it's not really a requirement, but I wish it was... |
| 755 | </p> |
| 756 | </dd> |
| 757 | </dl> |
| 758 | </div> |
| 759 | </div> |
| 760 | <div class="sect1" lang="en" xml:lang="en"> |
| 761 | <div class="titlepage"> |
| 762 | <div> |
| 763 | <div> |
| 764 | <h2 class="title" style="clear: both"><a id="resources"></a>3. Internet resources</h2> |
| 765 | </div> |
| 766 | </div> |
| 767 | </div> |
| 768 | <div class="variablelist"> |
| 769 | <dl> |
| 770 | <dt> |
| 771 | <span class="term">Web page</span> |
| 772 | </dt> |
| 773 | <dd> |
| 774 | <p> |
| 775 | There is a web page (which you may be reading now) at |
| 776 | <a href="http://oprofile.sf.net/">http://oprofile.sf.net/</a>. |
| 777 | </p> |
| 778 | </dd> |
| 779 | <dt> |
| 780 | <span class="term">Download</span> |
| 781 | </dt> |
| 782 | <dd> |
| 783 | <p> |
| 784 | You can download a source tarball or get anonymous CVS at the sourceforge page, |
| 785 | <a href="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</a>. |
| 786 | </p> |
| 787 | </dd> |
| 788 | <dt> |
| 789 | <span class="term">Mailing list</span> |
| 790 | </dt> |
| 791 | <dd> |
| 792 | <p> |
| 793 | There is a low-traffic OProfile-specific mailing list, details at |
| 794 | <a href="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</a>. |
| 795 | </p> |
| 796 | </dd> |
| 797 | <dt> |
| 798 | <span class="term">Bug tracker</span> |
| 799 | </dt> |
| 800 | <dd> |
| 801 | <p> |
| 802 | There is a bug tracker for OProfile at SourceForge, |
| 803 | <a href="http://sf.net/tracker/?group_id=16191&atid=116191">http://sf.net/tracker/?group_id=16191&atid=116191</a>. |
| 804 | </p> |
| 805 | </dd> |
| 806 | <dt> |
| 807 | <span class="term">IRC channel</span> |
| 808 | </dt> |
| 809 | <dd> |
| 810 | <p> |
| 811 | Several OProfile developers and users sometimes hang out on channel <span><strong class="command">#oprofile</strong></span> |
| 812 | on the <a href="http://oftc.net">OFTC</a> network. |
| 813 | </p> |
| 814 | </dd> |
| 815 | </dl> |
| 816 | </div> |
| 817 | </div> |
| 818 | <div class="sect1" lang="en" xml:lang="en"> |
| 819 | <div class="titlepage"> |
| 820 | <div> |
| 821 | <div> |
| 822 | <h2 class="title" style="clear: both"><a id="install"></a>4. Installation</h2> |
| 823 | </div> |
| 824 | </div> |
| 825 | </div> |
| 826 | <p> |
| 827 | First you need to build OProfile and install it. <span><strong class="command">./configure</strong></span>, <span><strong class="command">make</strong></span>, <span><strong class="command">make install</strong></span> |
| 828 | is often all you need, but note these arguments to <span><strong class="command">./configure</strong></span> : |
| 829 | </p> |
| 830 | <div class="variablelist"> |
| 831 | <dl> |
| 832 | <dt> |
| 833 | <span class="term"> |
| 834 | <code class="option">--with-linux</code> |
| 835 | </span> |
| 836 | </dt> |
| 837 | <dd> |
| 838 | <p> |
| 839 | Use this option to specify the location of the kernel source tree you wish |
| 840 | to compile against. The kernel module is built against this source and |
| 841 | will only work with a running kernel built from the same source with |
| 842 | exact same options, so it is important you specify this option if you need |
| 843 | to. |
| 844 | </p> |
| 845 | </dd> |
| 846 | <dt> |
| 847 | <span class="term"> |
| 848 | <code class="option">--with-java</code> |
| 849 | </span> |
| 850 | </dt> |
| 851 | <dd> |
| 852 | <p> |
| 853 | Use this option if you need to profile Java applications. Also, see |
| 854 | <a href="#requirements" title="2. System requirements">Section 2, “System requirements”</a>, "Required user account". This option |
| 855 | is used to specify the location of the Java Development Kit (JDK) |
| 856 | source tree you wish to use. This is necessary to get the interface description |
| 857 | of the JVMPI (or JVMTI) interface to compile the JIT support code successfully. |
| 858 | </p> |
| 859 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 860 | <h3 class="title">Note</h3> |
| 861 | <p> |
| 862 | The Java Runtime Environment (JRE) does not include the development |
| 863 | files that are required to compile the JIT support code, so the full |
| 864 | JDK must be installed in order to use this option. |
| 865 | </p> |
| 866 | </div> |
| 867 | <p> |
| 868 | By default, the Oprofile JIT support libraries will be installed in |
| 869 | <code class="filename"><oprof_install_dir>/lib/oprofile</code>. To build |
| 870 | and install OProfile and the JIT support libraries as 64-bit, you can |
| 871 | do something like the following: |
| 872 | </p> |
| 873 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 874 | <tr> |
| 875 | <td> |
| 876 | <pre class="screen"> |
| 877 | # CFLAGS="-m64" CXXFLAGS="-m64" ./configure \ |
| 878 | --with-kernel-support --with-java={my_jdk_installdir} \ |
| 879 | --libdir=/usr/local/lib64 |
| 880 | </pre> |
| 881 | </td> |
| 882 | </tr> |
| 883 | </table> |
| 884 | <p> |
| 885 | </p> |
| 886 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 887 | <h3 class="title">Note</h3> |
| 888 | <p> |
| 889 | If you encounter errors building 64-bit, you should |
| 890 | install libtool 1.5.26 or later since that release of |
| 891 | libtool fixes known problems for certain platforms. |
| 892 | If you install libtool into a non-standard location, |
| 893 | you'll need to edit the invocation of 'aclocal' in |
| 894 | OProfile's autogen.sh as follows (assume an install |
| 895 | location of /usr/local): |
| 896 | </p> |
| 897 | <p> |
| 898 | <code class="code">aclocal -I m4 -I /usr/local/share/aclocal</code> |
| 899 | </p> |
| 900 | </div> |
| 901 | </dd> |
| 902 | <dt> |
| 903 | <span class="term"> |
| 904 | <code class="option">--with-kernel-support</code> |
| 905 | </span> |
| 906 | </dt> |
| 907 | <dd> |
| 908 | <p> |
| 909 | Use this option with 2.6 and above kernels to indicate the |
| 910 | kernel provides the OProfile device driver. |
| 911 | </p> |
| 912 | </dd> |
| 913 | <dt> |
| 914 | <span class="term"> |
| 915 | <code class="option">--with-qt-dir/includes/libraries</code> |
| 916 | </span> |
| 917 | </dt> |
| 918 | <dd> |
| 919 | <p> |
| 920 | Specify the location of Qt headers and libraries. It defaults to searching in |
| 921 | <code class="constant">$QTDIR</code> if these are not specified. |
| 922 | </p> |
| 923 | </dd> |
| 924 | <dt> |
| 925 | <a id="disable-werror"></a> |
| 926 | <span class="term"> |
| 927 | <code class="option">--disable-werror</code> |
| 928 | </span> |
| 929 | </dt> |
| 930 | <dd> |
| 931 | <p> |
| 932 | Development versions of OProfile build by |
| 933 | default with <code class="option">-Werror</code>. This option turns |
| 934 | <code class="option">-Werror</code> off. |
| 935 | </p> |
| 936 | </dd> |
| 937 | <dt> |
| 938 | <a id="disable-optimization"></a> |
| 939 | <span class="term"> |
| 940 | <code class="option">--disable-optimization</code> |
| 941 | </span> |
| 942 | </dt> |
| 943 | <dd> |
| 944 | <p> |
| 945 | Disable the <code class="option">-O2</code> compiler flag |
| 946 | (useful if you discover an OProfile bug and want to provide a useful |
| 947 | backtrace). |
| 948 | </p> |
| 949 | </dd> |
| 950 | </dl> |
| 951 | </div> |
| 952 | <p> |
| 953 | To build the module for 2.4 kernels, you need a configured kernel source tree for the running kernel. Because each distribution ships its own kernel, the running kernel is unlikely to match a kernel source tree you installed separately. |
| 954 | The safest approach is to recompile your own kernel, boot it, and then compile OProfile. |
| 955 | </p> |
| 956 | <p> |
| 957 | If you have a uniprocessor machine, it is also recommended that you enable local APIC / IO_APIC support in your kernel (this is enabled automatically for SMP kernels). With many BIOSes and a uniprocessor (UP) kernel of version 2.6.9 or later, enabling the local APIC in the kernel is not sufficient: you must also turn it on explicitly at boot time by passing the <code class="option">lapic</code> option to the kernel. |
| 958 | </p> |
| 959 | <p> |
| 960 | On machines with power management, such as laptops, power management must be turned off when using OProfile with 2.4 kernels, because the power management software in the BIOS cannot handle the non-maskable interrupts (NMIs) that OProfile uses for data collection. |
| 961 | If you use the NMI watchdog, be aware that the watchdog is disabled when profiling starts and is not re-enabled until the OProfile module is removed (or, in 2.6, until OProfile stops running). |
| 962 | </p> |
| 963 | <p> |
| 964 | If you compile OProfile for a 2.2 kernel, you must be root to compile the module. If you are using a 2.6 or later kernel, you do not need the kernel source as long as the OProfile driver is enabled, and you should not need to disable power management. |
| 968 | </p> |
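<p>
To confirm that the running kernel was booted with the <code class="option">lapic</code> option, you can look at the kernel command line in <code class="filename">/proc/cmdline</code>. The sketch below uses a hypothetical helper, <code class="code">has_lapic_option</code>, which takes the command line as an argument; it matches <code class="option">lapic</code> only as a whole word, so <code class="option">nolapic</code> does not count.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical helper: succeed if the kernel command line given as $1
# contains "lapic" as a separate word ("nolapic" does not match).
has_lapic_option() {
    case " $1 " in
        *" lapic "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   has_lapic_option "$(cat /proc/cmdline)" || echo "lapic is not on the kernel command line"
</pre>
</td>
</tr>
</table>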
| 969 | <p> |
| 970 | Note that you must keep the <code class="filename">vmlinux</code> file |
| 971 | generated during the kernel build, as OProfile needs it. You can use |
| 972 | <code class="option">--no-vmlinux</code> instead, but OProfile then cannot give symbol-based results for the kernel or its modules. |
| 973 | </p> |
| 974 | </div> |
| 975 | <div class="sect1" lang="en" xml:lang="en"> |
| 976 | <div class="titlepage"> |
| 977 | <div> |
| 978 | <div> |
| 979 | <h2 class="title" style="clear: both"><a id="uninstall"></a>5. Uninstalling OProfile</h2> |
| 980 | </div> |
| 981 | </div> |
| 982 | </div> |
| 983 | <p> |
| 984 | You must have the source tree available to uninstall OProfile; a <span><strong class="command">make uninstall</strong></span> will |
| 985 | remove all installed files except your configuration file in the directory <code class="filename">~/.oprofile</code>. |
| 986 | </p> |
| 987 | </div> |
| 988 | </div> |
| 989 | <div class="chapter" lang="en" xml:lang="en"> |
| 990 | <div class="titlepage"> |
| 991 | <div> |
| 992 | <div> |
| 993 | <h2 class="title"><a id="overview"></a>Chapter 2. Overview</h2> |
| 994 | </div> |
| 995 | </div> |
| 996 | </div> |
| 997 | <div class="toc"> |
| 998 | <p> |
| 999 | <b>Table of Contents</b> |
| 1000 | </p> |
| 1001 | <dl> |
| 1002 | <dt> |
| 1003 | <span class="sect1"> |
| 1004 | <a href="#getting-started">1. Getting started</a> |
| 1005 | </span> |
| 1006 | </dt> |
| 1007 | <dt> |
| 1008 | <span class="sect1"> |
| 1009 | <a href="#tools-overview">2. Tools summary</a> |
| 1010 | </span> |
| 1011 | </dt> |
| 1012 | </dl> |
| 1013 | </div> |
| 1014 | <div class="sect1" lang="en" xml:lang="en"> |
| 1015 | <div class="titlepage"> |
| 1016 | <div> |
| 1017 | <div> |
| 1018 | <h2 class="title" style="clear: both"><a id="getting-started"></a>1. Getting started</h2> |
| 1019 | </div> |
| 1020 | </div> |
| 1021 | </div> |
| 1022 | <p> |
| 1023 | Before you can use OProfile, you must set it up. The minimum setup required |
| 1024 | is to tell OProfile where to find the <code class="filename">vmlinux</code> file corresponding to the |
| 1025 | running kernel, for example: |
| 1026 | </p> |
| 1027 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1028 | <tr> |
| 1029 | <td> |
| 1030 | <pre class="screen">opcontrol --vmlinux=/boot/vmlinux-`uname -r`</pre> |
| 1031 | </td> |
| 1032 | </tr> |
| 1033 | </table> |
| 1034 | <p> |
| 1035 | If you don't want to profile the kernel itself, |
| 1036 | you can tell OProfile that you don't have a <code class="filename">vmlinux</code> file: |
| 1037 | </p> |
| 1038 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1039 | <tr> |
| 1040 | <td> |
| 1041 | <pre class="screen">opcontrol --no-vmlinux</pre> |
| 1042 | </td> |
| 1043 | </tr> |
| 1044 | </table> |
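<p>
The choice between these two setup commands can be scripted. In this sketch, <code class="code">choose_vmlinux_opt</code> is a hypothetical helper that prints <code class="option">--vmlinux=FILE</code> if the given file exists and <code class="option">--no-vmlinux</code> otherwise. The <code class="filename">/boot/vmlinux-`uname -r`</code> path is only the example location used above; your distribution may keep the file elsewhere, or not install it at all.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical helper: print the opcontrol setup option to use, depending
# on whether the vmlinux file given as $1 exists.
choose_vmlinux_opt() {
    if [ -f "$1" ]; then
        printf '%s\n' "--vmlinux=$1"
    else
        printf '%s\n' "--no-vmlinux"
    fi
}

# Usage (as root):
#   opcontrol "$(choose_vmlinux_opt "/boot/vmlinux-$(uname -r)")"
</pre>
</td>
</tr>
</table>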
| 1045 | <p> |
| 1046 | Now we are ready to start the daemon (<span><strong class="command">oprofiled</strong></span>), which collects |
| 1047 | the profile data: |
| 1048 | </p> |
| 1049 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1050 | <tr> |
| 1051 | <td> |
| 1052 | <pre class="screen">opcontrol --start</pre> |
| 1053 | </td> |
| 1054 | </tr> |
| 1055 | </table> |
| 1056 | <p> |
| 1057 | When you want to stop profiling, you can do so with: |
| 1058 | </p> |
| 1059 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1060 | <tr> |
| 1061 | <td> |
| 1062 | <pre class="screen">opcontrol --shutdown</pre> |
| 1063 | </td> |
| 1064 | </tr> |
| 1065 | </table> |
| 1066 | <p> |
| 1067 | Note that unlike <span><strong class="command">gprof</strong></span>, no instrumentation (<code class="option">-pg</code> |
| 1068 | and <code class="option">-a</code> options to <span><strong class="command">gcc</strong></span>) |
| 1069 | is necessary. |
| 1070 | </p> |
| 1071 | <p> |
| 1072 | Periodically (or on <span><strong class="command">opcontrol --shutdown</strong></span> or <span><strong class="command">opcontrol --dump</strong></span>) |
| 1073 | the profile data is written out into the <code class="filename">$SESSION_DIR/samples</code> directory (by default, <code class="filename">/var/lib/oprofile/samples</code>). |
| 1074 | These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules. |
| 1075 | You can clear the profile data (at any time) with <span><strong class="command">opcontrol --reset</strong></span>. |
| 1076 | </p> |
| 1077 | <p> |
| 1078 | To place these sample database files in a specific directory instead of the default location (<code class="filename">/var/lib/oprofile</code>), use the <code class="option">--session-dir=dir</code> option. You must also specify <code class="option">--session-dir</code> on subsequent commands so that the tools continue to use this directory. (In the future, we should allow this to be specified in an environment variable.) For example: |
| 1079 | </p> |
| 1080 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1081 | <tr> |
| 1082 | <td> |
| 1083 | <pre class="screen">opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</pre> |
| 1084 | </td> |
| 1085 | </tr> |
| 1086 | </table> |
| 1087 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1088 | <tr> |
| 1089 | <td> |
| 1090 | <pre class="screen">opcontrol --start --session-dir=/home/me/tmpsession</pre> |
| 1091 | </td> |
| 1092 | </tr> |
| 1093 | </table> |
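<p>
Until such an environment variable exists, a small shell wrapper can save repeating the option. This is a sketch only: <code class="code">OPROF_SESSION_DIR</code> and <code class="code">with_session</code> are hypothetical names that OProfile itself neither reads nor provides; the wrapper simply appends <code class="option">--session-dir=dir</code> to whichever command it runs.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical wrapper: run a command, appending --session-dir when the
# (hypothetical) OPROF_SESSION_DIR shell variable is set and non-empty.
with_session() {
    if [ -n "$OPROF_SESSION_DIR" ]; then
        "$@" --session-dir="$OPROF_SESSION_DIR"
    else
        "$@"
    fi
}

# Usage (as root):
#   OPROF_SESSION_DIR=/home/me/tmpsession
#   with_session opcontrol --no-vmlinux
#   with_session opcontrol --start
#   with_session opreport
</pre>
</td>
</tr>
</table>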
| 1094 | <p> |
| 1095 | You can get summaries of this data in a number of ways at any time. To get a summary of |
| 1096 | data across the entire system for all of these profiles, you can run: |
| 1097 | </p> |
| 1098 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1099 | <tr> |
| 1100 | <td> |
| 1101 | <pre class="screen">opreport [--session-dir=dir]</pre> |
| 1102 | </td> |
| 1103 | </tr> |
| 1104 | </table> |
| 1105 | <p> |
| 1106 | Or, to get a more detailed summary for a particular image, you can do something like: |
| 1107 | </p> |
| 1108 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1109 | <tr> |
| 1110 | <td> |
| 1111 | <pre class="screen">opreport -l /boot/vmlinux-`uname -r`</pre> |
| 1112 | </td> |
| 1113 | </tr> |
| 1114 | </table> |
| 1115 | <p> |
| 1116 | There are also a number of other ways of presenting the data, as described later in this manual. |
| 1117 | Note that OProfile will choose a default profiling setup for you. However, there are a number |
| 1118 | of options you can pass to <span><strong class="command">opcontrol</strong></span> if you need to change something, |
| 1119 | also detailed later. |
| 1120 | </p> |
| 1121 | </div> |
| 1122 | <div class="sect1" lang="en" xml:lang="en"> |
| 1123 | <div class="titlepage"> |
| 1124 | <div> |
| 1125 | <div> |
| 1126 | <h2 class="title" style="clear: both"><a id="tools-overview"></a>2. Tools summary</h2> |
| 1127 | </div> |
| 1128 | </div> |
| 1129 | </div> |
| 1130 | <p> |
| 1131 | This section gives a brief description of the available OProfile utilities and their purpose. |
| 1132 | </p> |
| 1133 | <div class="variablelist"> |
| 1134 | <dl> |
| 1135 | <dt> |
| 1136 | <span class="term"> |
| 1137 | <code class="filename">ophelp</code> |
| 1138 | </span> |
| 1139 | </dt> |
| 1140 | <dd> |
| 1141 | <p> |
| 1142 | This utility lists the available events and short descriptions. |
| 1143 | </p> |
| 1144 | </dd> |
| 1145 | <dt> |
| 1146 | <span class="term"> |
| 1147 | <code class="filename">opcontrol</code> |
| 1148 | </span> |
| 1149 | </dt> |
| 1150 | <dd> |
| 1151 | <p> |
| 1152 | Used for controlling the OProfile data collection, discussed in <a href="#controlling" title="Chapter 3. Controlling the profiler">Chapter 3, <i>Controlling the profiler</i></a>. |
| 1153 | </p> |
| 1154 | </dd> |
| 1155 | <dt> |
| 1156 | <span class="term"> |
| 1157 | <code class="filename">agent libraries</code> |
| 1158 | </span> |
| 1159 | </dt> |
| 1160 | <dd> |
| 1161 | <p> |
| 1162 | Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <a href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. |
| 1163 | </p> |
| 1164 | </dd> |
| 1165 | <dt> |
| 1166 | <span class="term"> |
| 1167 | <code class="filename">opreport</code> |
| 1168 | </span> |
| 1169 | </dt> |
| 1170 | <dd> |
| 1171 | <p> |
| 1172 | This is the main tool for retrieving useful profile data, described in |
| 1173 | <a href="#opreport" title="2. Image summaries and symbol summaries (opreport)">Section 2, “Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)”</a>. |
| 1174 | </p> |
| 1175 | </dd> |
| 1176 | <dt> |
| 1177 | <span class="term"> |
| 1178 | <code class="filename">opannotate</code> |
| 1179 | </span> |
| 1180 | </dt> |
| 1181 | <dd> |
| 1182 | <p> |
| 1183 | This utility can be used to produce annotated source, assembly or mixed source/assembly. |
| 1184 | Source level annotation is available only if the application was compiled with |
| 1185 | debugging symbols. See <a href="#opannotate" title="3. Outputting annotated source (opannotate)">Section 3, “Outputting annotated source (<span><strong class="command">opannotate</strong></span>)”</a>. |
| 1186 | </p> |
| 1187 | </dd> |
| 1188 | <dt> |
| 1189 | <span class="term"> |
| 1190 | <code class="filename">opgprof</code> |
| 1191 | </span> |
| 1192 | </dt> |
| 1193 | <dd> |
| 1194 | <p> |
| 1195 | This utility can output gprof-style data files for a binary, for use with |
| 1196 | <span><strong class="command">gprof -p</strong></span>. See <a href="#opgprof" title="5. gprof-compatible output (opgprof)">Section 5, “<span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)”</a>. |
| 1197 | </p> |
| 1198 | </dd> |
| 1199 | <dt> |
| 1200 | <span class="term"> |
| 1201 | <code class="filename">oparchive</code> |
| 1202 | </span> |
| 1203 | </dt> |
| 1204 | <dd> |
| 1205 | <p> |
| 1206 | This utility can be used to collect executables, debuginfo, |
| 1207 | and sample files and copy the files into an archive. |
| 1208 | The archive is self-contained and can be moved to another |
| 1209 | machine for further analysis. |
| 1210 | See <a href="#oparchive" title="6. Archiving measurements (oparchive)">Section 6, “Archiving measurements (<span><strong class="command">oparchive</strong></span>)”</a>. |
| 1211 | </p> |
| 1212 | </dd> |
| 1213 | <dt> |
| 1214 | <span class="term"> |
| 1215 | <code class="filename">opimport</code> |
| 1216 | </span> |
| 1217 | </dt> |
| 1218 | <dd> |
| 1219 | <p> |
| 1220 | This utility converts sample database files from a foreign binary format (ABI) to |
| 1221 | the native format. This is useful only when moving sample files between hosts, |
| 1222 | for analysis on platforms other than the one used for collection. |
| 1223 | See <a href="#opimport" title="7. Converting sample database files (opimport)">Section 7, “Converting sample database files (<span><strong class="command">opimport</strong></span>)”</a>. |
| 1224 | </p> |
| 1225 | </dd> |
| 1226 | </dl> |
| 1227 | </div> |
| 1228 | </div> |
| 1229 | </div> |
| 1230 | <div class="chapter" lang="en" xml:lang="en"> |
| 1231 | <div class="titlepage"> |
| 1232 | <div> |
| 1233 | <div> |
| 1234 | <h2 class="title"><a id="controlling"></a>Chapter 3. Controlling the profiler</h2> |
| 1235 | </div> |
| 1236 | </div> |
| 1237 | </div> |
| 1238 | <div class="toc"> |
| 1239 | <p> |
| 1240 | <b>Table of Contents</b> |
| 1241 | </p> |
| 1242 | <dl> |
| 1243 | <dt> |
| 1244 | <span class="sect1"> |
| 1245 | <a href="#controlling-daemon">1. Using <span><strong class="command">opcontrol</strong></span></a> |
| 1246 | </span> |
| 1247 | </dt> |
| 1248 | <dd> |
| 1249 | <dl> |
| 1250 | <dt> |
| 1251 | <span class="sect2"> |
| 1252 | <a href="#opcontrolexamples">1.1. Examples</a> |
| 1253 | </span> |
| 1254 | </dt> |
| 1255 | <dt> |
| 1256 | <span class="sect2"> |
| 1257 | <a href="#eventspec">1.2. Specifying performance counter events</a> |
| 1258 | </span> |
| 1259 | </dt> |
| 1260 | </dl> |
| 1261 | </dd> |
| 1262 | <dt> |
| 1263 | <span class="sect1"> |
| 1264 | <a href="#setup-jit">2. Setting up the JIT profiling feature</a> |
| 1265 | </span> |
| 1266 | </dt> |
| 1267 | <dd> |
| 1268 | <dl> |
| 1269 | <dt> |
| 1270 | <span class="sect2"> |
| 1271 | <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> |
| 1272 | </span> |
| 1273 | </dt> |
| 1274 | </dl> |
| 1275 | </dd> |
| 1276 | <dt> |
| 1277 | <span class="sect1"> |
| 1278 | <a href="#oprofile-gui">3. Using <span><strong class="command">oprof_start</strong></span></a> |
| 1279 | </span> |
| 1280 | </dt> |
| 1281 | <dt> |
| 1282 | <span class="sect1"> |
| 1283 | <a href="#detailed-parameters">4. Configuration details</a> |
| 1284 | </span> |
| 1285 | </dt> |
| 1286 | <dd> |
| 1287 | <dl> |
| 1288 | <dt> |
| 1289 | <span class="sect2"> |
| 1290 | <a href="#hardware-counters">4.1. Hardware performance counters</a> |
| 1291 | </span> |
| 1292 | </dt> |
| 1293 | <dt> |
| 1294 | <span class="sect2"> |
| 1295 | <a href="#rtc">4.2. OProfile in RTC mode</a> |
| 1296 | </span> |
| 1297 | </dt> |
| 1298 | <dt> |
| 1299 | <span class="sect2"> |
| 1300 | <a href="#timer">4.3. OProfile in timer interrupt mode</a> |
| 1301 | </span> |
| 1302 | </dt> |
| 1303 | <dt> |
| 1304 | <span class="sect2"> |
| 1305 | <a href="#p4">4.4. Pentium 4 support</a> |
| 1306 | </span> |
| 1307 | </dt> |
| 1308 | <dt> |
| 1309 | <span class="sect2"> |
| 1310 | <a href="#ia64">4.5. Intel Itanium 2 support</a> |
| 1311 | </span> |
| 1312 | </dt> |
| 1313 | <dt> |
| 1314 | <span class="sect2"> |
| 1315 | <a href="#ppc64">4.6. PowerPC64 support</a> |
| 1316 | </span> |
| 1317 | </dt> |
| 1318 | <dt> |
| 1319 | <span class="sect2"> |
| 1320 | <a href="#cell-be">4.7. Cell Broadband Engine support</a> |
| 1321 | </span> |
| 1322 | </dt> |
| 1323 | <dt> |
| 1324 | <span class="sect2"> |
| 1325 | <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> |
| 1326 | </span> |
| 1327 | </dt> |
| 1328 | <dt> |
| 1329 | <span class="sect2"> |
| 1330 | <a href="#misuse">4.9. Dangerous counter settings</a> |
| 1331 | </span> |
| 1332 | </dt> |
| 1333 | </dl> |
| 1334 | </dd> |
| 1335 | </dl> |
| 1336 | </div> |
| 1337 | <div class="sect1" lang="en" xml:lang="en"> |
| 1338 | <div class="titlepage"> |
| 1339 | <div> |
| 1340 | <div> |
| 1341 | <h2 class="title" style="clear: both"><a id="controlling-daemon"></a>1. Using <span><strong class="command">opcontrol</strong></span></h2> |
| 1342 | </div> |
| 1343 | </div> |
| 1344 | </div> |
| 1345 | <p> |
| 1346 | In this section we describe the configuration and control of the profiling system |
| 1347 | with opcontrol in more depth. |
| 1348 | The <span><strong class="command">opcontrol</strong></span> script has a default setup, but you |
| 1349 | can alter this with the options given below. In particular, |
| 1350 | if your hardware supports performance counters, you can configure them. |
| 1351 | There are a number of counters (for example, counter 0 and counter 1 |
| 1352 | on the Pentium III). Each of these counters can be programmed with |
| 1353 | an event to count, such as cache misses or MMX operations. The event |
| 1354 | chosen for each counter is reflected in the profile data collected |
| 1355 | by OProfile: the functions and binaries at the top of the profiles are those |
| 1356 | in which most of the chosen events occurred. |
| 1357 | </p> |
| 1358 | <p> |
| 1359 | Additionally, each counter has a "count" value: one sample is taken every "count" occurrences of |
| 1360 | the chosen event, so this value controls how detailed the profile is. The lower the value, the more frequently profile |
| 1361 | samples are taken. A counter can choose to sample only kernel code, user-space code, |
| 1362 | or both (both is the default). Finally, some events have a "unit mask": |
| 1363 | a value that further restricts the types of event that are counted. |
| 1364 | The event types and unit masks for your CPU are listed by <span><strong class="command">opcontrol |
| 1365 | --list-events</strong></span>. |
| 1366 | </p> |
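<p>
Putting these settings together, a single event is passed to <span><strong class="command">opcontrol</strong></span> as a colon-separated specification. The sketch below assumes the field order <code class="option">name:count:unitmask:kernel:user</code> described in <a href="#eventspec" title="1.2. Specifying performance counter events">Section 1.2, “Specifying performance counter events”</a>, with <code class="option">kernel</code> and <code class="option">user</code> set to 1 to sample in that mode and 0 to skip it. <code class="code">make_eventspec</code> is a hypothetical helper, not an OProfile command, and the unit mask value in the example is only a placeholder.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical helper: build an event specification of the assumed form
#   name:count:unitmask:kernel:user
# The kernel and user flags must each be 0 or 1, and at least one must be 1.
make_eventspec() {
    case "$4$5" in
        01|10|11) ;;
        *) return 1 ;;
    esac
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example: sample CPU_CLK_UNHALTED every 400000 events, user space only.
#   opcontrol --event="$(make_eventspec CPU_CLK_UNHALTED 400000 0 0 1)"
</pre>
</td>
</tr>
</table>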
| 1367 | <p> |
| 1368 | The <span><strong class="command">opcontrol</strong></span> script provides the following actions: |
| 1369 | </p> |
| 1370 | <div class="variablelist"> |
| 1371 | <dl> |
| 1372 | <dt> |
| 1373 | <span class="term"> |
| 1374 | <code class="option">--init</code> |
| 1375 | </span> |
| 1376 | </dt> |
| 1377 | <dd> |
| 1378 | <p> |
| 1379 | Loads the OProfile module if required and makes the OProfile driver |
| 1380 | interface available. |
| 1381 | </p> |
| 1382 | </dd> |
| 1383 | <dt> |
| 1384 | <span class="term"> |
| 1385 | <code class="option">--setup</code> |
| 1386 | </span> |
| 1387 | </dt> |
| 1388 | <dd> |
| 1389 | <p> |
| 1390 | Followed by a list of arguments for the profiling setup. The arguments are |
| 1391 | saved in <code class="filename">/root/.oprofile/daemonrc</code>. |
| 1392 | Giving this option is not necessary; you can just directly pass one |
| 1393 | of the setup options, e.g. <span><strong class="command">opcontrol --no-vmlinux</strong></span>. |
| 1394 | </p> |
| 1395 | </dd> |
| 1396 | <dt> |
| 1397 | <span class="term"> |
| 1398 | <code class="option">--status</code> |
| 1399 | </span> |
| 1400 | </dt> |
| 1401 | <dd> |
| 1402 | <p> |
| 1403 | Show configuration information. |
| 1404 | </p> |
| 1405 | </dd> |
| 1406 | <dt> |
| 1407 | <span class="term"> |
| 1408 | <code class="option">--start-daemon</code> |
| 1409 | </span> |
| 1410 | </dt> |
| 1411 | <dd> |
| 1412 | <p> |
| 1413 | Start the OProfile daemon without starting actual profiling. The profiling |
| 1414 | can then be started using <code class="option">--start</code>. This avoids including |
| 1415 | the cost of daemon startup in your measurements, as <code class="option">--start</code> is a simple |
| 1416 | write to a file in oprofilefs. Not available with 2.2/2.4 kernels. |
| 1417 | </p> |
| 1418 | </dd> |
| 1419 | <dt> |
| 1420 | <span class="term"> |
| 1421 | <code class="option">--start</code> |
| 1422 | </span> |
| 1423 | </dt> |
| 1424 | <dd> |
| 1425 | <p> |
| 1426 | Start data collection with either arguments provided by <code class="option">--setup</code> |
| 1427 | or information saved in <code class="filename">/root/.oprofile/daemonrc</code>. Specifying |
| 1428 | the additional option <code class="option">--verbose</code> makes the daemon generate a lot of debug output |
| 1429 | while it is running. |
| 1430 | </p> |
| 1431 | </dd> |
| 1432 | <dt> |
| 1433 | <span class="term"> |
| 1434 | <code class="option">--dump</code> |
| 1435 | </span> |
| 1436 | </dt> |
| 1437 | <dd> |
| 1438 | <p> |
| 1439 | Force a flush of the collected profiling data to the daemon. |
| 1440 | </p> |
| 1441 | </dd> |
| 1442 | <dt> |
| 1443 | <span class="term"> |
| 1444 | <code class="option">--stop</code> |
| 1445 | </span> |
| 1446 | </dt> |
| 1447 | <dd> |
| 1448 | <p> |
| 1449 | Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels). |
| 1450 | </p> |
| 1451 | </dd> |
| 1452 | <dt> |
| 1453 | <span class="term"> |
| 1454 | <code class="option">--shutdown</code> |
| 1455 | </span> |
| 1456 | </dt> |
| 1457 | <dd> |
| 1458 | <p> |
| 1459 | Stop data collection and kill the daemon. |
| 1460 | </p> |
| 1461 | </dd> |
| 1462 | <dt> |
| 1463 | <span class="term"> |
| 1464 | <code class="option">--reset</code> |
| 1465 | </span> |
| 1466 | </dt> |
| 1467 | <dd> |
| 1468 | <p> |
| 1469 | Clear out data from the current session, but leave saved sessions intact. |
| 1470 | </p> |
| 1471 | </dd> |
| 1472 | <dt> |
| 1473 | <span class="term"><code class="option">--save=</code>session_name</span> |
| 1474 | </dt> |
| 1475 | <dd> |
| 1476 | <p> |
| 1477 | Save data from the current session to session_name. |
| 1478 | </p> |
| 1479 | </dd> |
| 1480 | <dt> |
| 1481 | <span class="term"> |
| 1482 | <code class="option">--deinit</code> |
| 1483 | </span> |
| 1484 | </dt> |
| 1485 | <dd> |
| 1486 | <p> |
| 1487 | Shut down the daemon, and unload the OProfile module and oprofilefs. |
| 1488 | </p> |
| 1489 | </dd> |
| 1490 | <dt> |
| 1491 | <span class="term"> |
| 1492 | <code class="option">--list-events</code> |
| 1493 | </span> |
| 1494 | </dt> |
| 1495 | <dd> |
| 1496 | <p> |
| 1497 | List event types and unit masks. |
| 1498 | </p> |
| 1499 | </dd> |
| 1500 | <dt> |
| 1501 | <span class="term"> |
| 1502 | <code class="option">--help</code> |
| 1503 | </span> |
| 1504 | </dt> |
| 1505 | <dd> |
| 1506 | <p> |
| 1507 | Generate usage messages. |
| 1508 | </p> |
| 1509 | </dd> |
| 1510 | </dl> |
| 1511 | </div> |
| 1512 | <p> |
| 1513 | There are a number of possible settings, of which only |
| 1514 | <code class="option">--vmlinux</code> (or <code class="option">--no-vmlinux</code>) |
| 1515 | is required. These settings are stored in <code class="filename">~/.oprofile/daemonrc</code>. |
| 1516 | </p> |
| 1517 | <div class="variablelist"> |
| 1518 | <dl> |
| 1519 | <dt> |
| 1520 | <span class="term"><code class="option">--buffer-size=</code>num</span> |
| 1521 | </dt> |
| 1522 | <dd> |
| 1523 | <p> |
| 1524 | Number of samples in the kernel buffer. When using a 2.6 kernel, the |
| 1525 | buffer watershed needs to be adjusted when changing this value. |
| 1526 | </p> |
| 1527 | </dd> |
| 1528 | <dt> |
| 1529 | <span class="term"><code class="option">--buffer-watershed=</code>num</span> |
| 1530 | </dt> |
| 1531 | <dd> |
| 1532 | <p> |
| 1533 | Set the kernel buffer watershed to num samples (2.6 only). The data is flushed to the daemon when the number of |
| 1534 | free entries in the kernel buffer falls to buffer-watershed (that is, once buffer-size - buffer-watershed entries have been filled). |
| 1535 | The most useful values are between 0.25 and 0.5 times buffer-size. |
| 1536 | </p> |
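<p>
The suggested range can be computed directly from the buffer size. This minimal sketch uses integer shell arithmetic; <code class="code">watershed_range</code> is a hypothetical helper, the 0.25 and 0.5 factors are simply the guideline given above, and the sizes in the example are illustrative rather than OProfile defaults.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical helper: print the suggested lower and upper bounds for
# --buffer-watershed, i.e. 0.25 and 0.5 times the given --buffer-size.
watershed_range() {
    size=$1
    printf '%s %s\n' "$((size / 4))" "$((size / 2))"
}

# Example: watershed_range 131072 prints "32768 65536", so
#   opcontrol --buffer-size=131072 --buffer-watershed=32768
# stays within the suggested range.
</pre>
</td>
</tr>
</table>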
| 1537 | </dd> |
| 1538 | <dt> |
| 1539 | <span class="term"><code class="option">--cpu-buffer-size=</code>num</span> |
| 1540 | </dt> |
| 1541 | <dd> |
| 1542 | <p> |
| 1543 | Number of samples in the kernel per-CPU buffer (2.6 only). If you |
| 1544 | profile at a high rate, increasing this value can help if the log |
| 1545 | file shows an excessive count of samples lost to CPU buffer overflow. |
| 1546 | </p> |
| 1547 | </dd> |
| 1548 | <dt> |
| 1549 | <span class="term"><code class="option">--event=</code>[eventspec]</span> |
| 1550 | </dt> |
| 1551 | <dd> |
| 1552 | <p> |
| 1553 | Use the given performance counter event to profile. |
| 1554 | See <a href="#eventspec" title="1.2. Specifying performance counter events">Section 1.2, “Specifying performance counter events”</a> below. |
| 1555 | </p> |
| 1556 | </dd> |
| 1557 | <dt> |
| 1558 | <span class="term"><code class="option">--session-dir=</code>dir_path</span> |
| 1559 | </dt> |
| 1560 | <dd> |
| 1561 | <p> |
| 1562 | Create or use the sample database in directory <code class="filename">dir_path</code> instead of |
| 1563 | the default location (<code class="filename">/var/lib/oprofile</code>). |
| 1564 | </p> |
| 1565 | </dd> |
| 1566 | <dt> |
| 1567 | <span class="term"><code class="option">--separate=</code>[none,lib,kernel,thread,cpu,all]</span> |
| 1568 | </dt> |
| 1569 | <dd> |
| 1570 | <p> |
| 1571 | By default, every profile is stored in a single file. Thus, for example, |
| 1572 | samples in the C library are all credited to the <code class="filename">/lib/libc.o</code> |
| 1573 | profile. However, you can choose to create separate sample files by specifying |
| 1574 | one of the options below. |
| 1575 | </p> |
| 1576 | <div class="informaltable"> |
| 1577 | <table border="1"> |
| 1578 | <colgroup> |
| 1579 | <col /> |
| 1580 | <col /> |
| 1581 | </colgroup> |
| 1582 | <tbody> |
| 1583 | <tr> |
| 1584 | <td> |
| 1585 | <code class="option">none</code> |
| 1586 | </td> |
| 1587 | <td>No profile separation (default)</td> |
| 1588 | </tr> |
| 1589 | <tr> |
| 1590 | <td> |
| 1591 | <code class="option">lib</code> |
| 1592 | </td> |
| 1593 | <td>Create per-application profiles for libraries</td> |
| 1594 | </tr> |
| 1595 | <tr> |
| 1596 | <td> |
| 1597 | <code class="option">kernel</code> |
| 1598 | </td> |
| 1599 | <td>Create per-application profiles for the kernel and kernel modules</td> |
| 1600 | </tr> |
| 1601 | <tr> |
| 1602 | <td> |
| 1603 | <code class="option">thread</code> |
| 1604 | </td> |
| 1605 | <td>Create profiles for each thread and each task</td> |
| 1606 | </tr> |
| 1607 | <tr> |
| 1608 | <td> |
| 1609 | <code class="option">cpu</code> |
| 1610 | </td> |
| 1611 | <td>Create profiles for each CPU</td> |
| 1612 | </tr> |
| 1613 | <tr> |
| 1614 | <td> |
| 1615 | <code class="option">all</code> |
| 1616 | </td> |
| 1617 | <td>All of the above options</td> |
| 1618 | </tr> |
| 1619 | </tbody> |
| 1620 | </table> |
| 1621 | </div> |
| 1622 | <p> |
| 1623 | Note that <code class="option">--separate=kernel</code> also turns on <code class="option">--separate=lib</code>. |
| 1624 | </p> |
| 1625 | <p> |
| 1626 | When using <code class="option">--separate=kernel</code>, samples in hardware interrupts, soft-irqs, or other |
| 1627 | asynchronous kernel contexts are credited to the task currently running. This means you may see |
| 1628 | seemingly nonsensical profiles, such as <code class="filename">/bin/bash</code> showing samples for the PPP modules. |
| 1629 | </p> |
| 1630 | <p> |
| 1631 | On 2.2/2.4 kernels, only kernel threads already running when profiling begins are correctly profiled; |
| 1632 | samples from newly started kernel threads are credited to the vmlinux (kernel) profile. |
| 1633 | </p> |
| 1634 | <p> |
| 1635 | Using <code class="option">--separate=thread</code> creates a lot |
| 1636 | of sample files if you leave OProfile running for a while; it's most |
| 1637 | useful when used for short sessions, or when using image filtering. |
| 1638 | </p> |
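<p>
To see how many sample files a session has accumulated, for example after a run with <code class="option">--separate=thread</code>, you can count the files below the <code class="filename">samples</code> directory of the session directory. In this sketch, <code class="code">count_sample_files</code> is a hypothetical helper; it counts every regular file it finds there, so treat the result as approximate.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical helper: count the regular files below DIR/samples, where
# DIR is the session directory (default /var/lib/oprofile).
count_sample_files() {
    session_dir=${1:-/var/lib/oprofile}
    # wc -l may pad its output with spaces; arithmetic expansion strips them.
    echo $(( $(find "$session_dir/samples" -type f | wc -l) ))
}

# Usage:
#   count_sample_files                       # default session directory
#   count_sample_files /home/me/tmpsession   # custom --session-dir
</pre>
</td>
</tr>
</table>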
| 1639 | </dd> |
| 1640 | <dt> |
| 1641 | <span class="term"><code class="option">--callgraph=</code>#depth</span> |
| 1642 | </dt> |
| 1643 | <dd> |
| 1644 | <p> |
| 1645 | Enable call-graph sample collection with a maximum depth. Use 0 to disable |
| 1646 | callgraph profiling. NOTE: Callgraph support is available on a limited |
| 1647 | number of platforms at this time; for example: |
| 1648 | </p> |
| 1651 | <div class="itemizedlist"> |
| 1652 | <ul type="disc"> |
| 1653 | <li> |
| 1654 | <p>x86 with recent 2.6 kernel</p> |
| 1655 | </li> |
| 1656 | <li> |
| 1657 | <p>ARM with recent 2.6 kernel</p> |
| 1658 | </li> |
| 1659 | <li> |
| 1660 | <p>PowerPC with 2.6.17 kernel</p> |
| 1661 | </li> |
| 1662 | </ul> |
| 1663 | </div> |
| 1668 | </dd> |
| 1669 | <dt> |
| 1670 | <span class="term"><code class="option">--image=</code>image,[images]|"all"</span> |
| 1671 | </dt> |
| 1672 | <dd> |
| 1673 | <p> |
| 1674 | Image filtering. If you specify one or more absolute |
| 1675 | paths to binaries, OProfile will only produce profile results for those |
| 1676 | binary images. This is useful for restricting the sometimes voluminous |
| 1677 | output you may get otherwise, especially with |
| 1678 | <code class="option">--separate=thread</code>. Note that if you are using |
| 1679 | <code class="option">--separate=lib</code> or |
| 1680 | <code class="option">--separate=kernel</code>, then if you specify an |
| 1681 | application binary, the shared libraries and kernel code |
| 1682 | <span class="emphasis"><em>are</em></span> included. Specify the value |
| 1683 | "all" to profile everything (the default). |
| 1684 | </p> |
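<p>
Because <code class="option">--image</code> expects absolute paths (or the single word "all"), a quick check before calling <span><strong class="command">opcontrol</strong></span> can catch a relative path. This is a hypothetical pre-check, not part of OProfile; it only tests the syntax of the list and does not check that the files exist. The path <code class="filename">/usr/bin/myapp</code> in the usage comment is likewise hypothetical.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical pre-check for an --image argument: succeed if it is "all"
# or a comma-separated list of absolute paths.
valid_image_spec() {
    if [ "$1" = all ]; then
        return 0
    fi
    rest=$1
    while :; do
        img=${rest%%,*}          # first entry in the list
        case $img in
            /*) ;;               # absolute path: accept
            *) return 1 ;;       # empty or relative entry: reject
        esac
        if [ "$img" = "$rest" ]; then
            return 0             # that was the last entry
        fi
        rest=${rest#*,}          # drop the first entry and its comma
    done
}

# Usage:
#   spec=/usr/bin/myapp,/bin/bash
#   valid_image_spec "$spec" || echo "bad --image list: $spec"
</pre>
</td>
</tr>
</table>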
| 1685 | </dd> |
| 1686 | <dt> |
| 1687 | <span class="term"><code class="option">--vmlinux=</code>file</span> |
| 1688 | </dt> |
| 1689 | <dd> |
| 1690 | <p> |
| 1691 | Path to the vmlinux kernel image. |
| 1692 | </p> |
| 1693 | </dd> |
| 1694 | <dt> |
| 1695 | <span class="term"> |
| 1696 | <code class="option">--no-vmlinux</code> |
| 1697 | </span> |
| 1698 | </dt> |
| 1699 | <dd> |
| 1700 | <p> |
| 1701 | Use this when you don't have a kernel vmlinux file, and you don't want |
| 1702 | to profile the kernel. This still counts the total number of kernel samples, |
| 1703 | but can't give symbol-based results for the kernel or any modules. |
| 1704 | </p> |
| 1705 | </dd> |
| 1706 | </dl> |
| 1707 | </div> |
| 1708 | <div class="sect2" lang="en" xml:lang="en"> |
| 1709 | <div class="titlepage"> |
| 1710 | <div> |
| 1711 | <div> |
| 1712 | <h3 class="title"><a id="opcontrolexamples"></a>1.1. Examples</h3> |
| 1713 | </div> |
| 1714 | </div> |
| 1715 | </div> |
| 1716 | <div class="sect3" lang="en" xml:lang="en"> |
| 1717 | <div class="titlepage"> |
| 1718 | <div> |
| 1719 | <div> |
| 1720 | <h4 class="title"><a id="examplesperfctr"></a>1.1.1. Intel performance counter setup</h4> |
| 1721 | </div> |
| 1722 | </div> |
| 1723 | </div> |
| 1724 | <p> |
| 1725 | Here, we have a Pentium III running at 800MHz, and we want to look at where data memory |
| 1726 | references are happening most, and also get results for CPU time. |
| 1727 | </p> |
| 1728 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1729 | <tr> |
| 1730 | <td> |
| 1731 | <pre class="screen"> |
| 1732 | # opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000 |
| 1733 | # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| 1734 | # opcontrol --start |
| 1735 | </pre> |
| 1736 | </td> |
| 1737 | </tr> |
| 1738 | </table> |
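<p>
Because one sample is taken every "count" events, the count determines the sampling rate. For a cycle-counting event such as CPU_CLK_UNHALTED, a rough estimate is the clock frequency divided by the count: on the 800MHz machine above, 800000000 / 400000 = 2000 samples per second while the CPU is not halted. The hypothetical helper <code class="code">samples_per_second</code> below just performs that division; it assumes a fixed clock frequency and does not apply to events such as DATA_MEM_REFS, whose rate depends on the workload.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Hypothetical helper: approximate samples per second for a cycle-counting
# event, given the CPU clock in Hz and the event count.
# Assumes the CPU is never halted and runs at a fixed frequency.
samples_per_second() {
    clock_hz=$1
    count=$2
    echo $((clock_hz / count))
}

# Example from above (800MHz Pentium III, CPU_CLK_UNHALTED:400000):
#   samples_per_second 800000000 400000    prints 2000
</pre>
</td>
</tr>
</table>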
| 1739 | </div> |
| 1740 | <div class="sect3" lang="en" xml:lang="en"> |
| 1741 | <div class="titlepage"> |
| 1742 | <div> |
| 1743 | <div> |
| 1744 | <h4 class="title"><a id="examplesrtc"></a>1.1.2. RTC mode</h4> |
| 1745 | </div> |
| 1746 | </div> |
| 1747 | </div> |
| 1748 | <p> |
| 1749 | Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel. |
| 1750 | </p> |
| 1751 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1752 | <tr> |
| 1753 | <td> |
| 1754 | <pre class="screen"> |
| 1755 | # ophelp -r |
| 1756 | CPU with RTC device |
| 1757 | # opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024 |
| 1758 | # opcontrol --start |
| 1759 | </pre> |
| 1760 | </td> |
| 1761 | </tr> |
| 1762 | </table> |
| 1763 | </div> |
| 1764 | <div class="sect3" lang="en" xml:lang="en"> |
| 1765 | <div class="titlepage"> |
| 1766 | <div> |
| 1767 | <div> |
| 1768 | <h4 class="title"><a id="examplesstartdaemon"></a>1.1.3. Starting the daemon separately</h4> |
| 1769 | </div> |
| 1770 | </div> |
| 1771 | </div> |
| 1772 | <p> |
| 1773 | If we're running a 2.6 kernel, we can use <code class="option">--start-daemon</code> to start the daemon in advance, |
| 1774 | so that the profiler's own startup does not affect the results. |
| 1775 | </p> |
| 1776 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1777 | <tr> |
| 1778 | <td> |
| 1779 | <pre class="screen"> |
| 1780 | # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| 1781 | # opcontrol --start-daemon |
| 1782 | # my_favourite_benchmark --init |
| 1783 | # opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop |
| 1784 | </pre> |
| 1785 | </td> |
| 1786 | </tr> |
| 1787 | </table> |
| 1788 | </div> |
| 1789 | <div class="sect3" lang="en" xml:lang="en"> |
| 1790 | <div class="titlepage"> |
| 1791 | <div> |
| 1792 | <div> |
| 1793 | <h4 class="title"><a id="exampleseparate"></a>1.1.4. Separate profiles for libraries and the kernel</h4> |
| 1794 | </div> |
| 1795 | </div> |
| 1796 | </div> |
| 1797 | <p> |
| 1798 | Here, we want to see a profile of the OProfile daemon itself, including when |
| 1799 | it was running inside the kernel driver, and its use of shared libraries. |
| 1800 | </p> |
| 1801 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1802 | <tr> |
| 1803 | <td> |
| 1804 | <pre class="screen"> |
| 1805 | # opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux |
| 1806 | # opcontrol --start |
| 1807 | # my_favourite_stress_test --run |
| 1808 | # opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled |
| 1809 | </pre> |
| 1810 | </td> |
| 1811 | </tr> |
| 1812 | </table> |
| 1813 | </div> |
| 1814 | <div class="sect3" lang="en" xml:lang="en"> |
| 1815 | <div class="titlepage"> |
| 1816 | <div> |
| 1817 | <div> |
| 1818 | <h4 class="title"><a id="examplessessions"></a>1.1.5. Profiling sessions</h4> |
| 1819 | </div> |
| 1820 | </div> |
| 1821 | </div> |
| 1822 | <p> |
| 1823 | It can often be useful to split up profiling data into several different |
| 1824 | time periods. For example, you may want to collect data on an application's |
| 1825 | startup separately from the normal runtime data. You can use the simple |
| 1826 | command <span><strong class="command">opcontrol --save</strong></span> to do this. For example: |
| 1827 | </p> |
| 1828 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 1829 | <tr> |
| 1830 | <td> |
| 1831 | <pre class="screen"> |
| 1832 | # opcontrol --save=blah |
| 1833 | </pre> |
| 1834 | </td> |
| 1835 | </tr> |
| 1836 | </table> |
| 1837 | <p> |
| 1838 | will create a sub-directory in <code class="filename">$SESSION_DIR/samples</code> containing the samples |
| 1839 | up to that point (the current session's sample files are moved into this |
| 1840 | directory). You can then pass this session name as a parameter to the post-profiling |
| 1841 | analysis tools to analyse only the data collected up to the point at which you named the |
| 1842 | session. To discard a saved session, run |
| 1843 | <span><strong class="command">rm -rf $SESSION_DIR/samples/sessionname</strong></span>; to discard the |
| 1844 | current session, use <span><strong class="command">opcontrol --reset</strong></span>. |
| 1845 | </p> |
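<p>
The workflow described above, collecting an application's startup separately from its normal
runtime, can be sketched as the shell function below. This is an illustrative sketch, not part of
OProfile: <code>run</code> and <code>profile_in_two_sessions</code> are hypothetical helpers,
<code>my_favourite_benchmark</code> stands in for your application as in the earlier examples, the
explicit <code class="option">--dump</code> before each save is a precaution that may be redundant,
and the <code>session:</code> argument to <span><strong class="command">opreport</strong></span> is
an assumption to check against the <span><strong class="command">opreport</strong></span>
documentation for your version. With <code>DRY_RUN=1</code> the commands are printed instead of run.
</p>
<pre class="programlisting">
```shell
# Hypothetical helper (not part of OProfile): with DRY_RUN=1 it only
# prints each command, so the sequence can be checked without OProfile.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: collect startup and normal runtime samples as
# two named sessions, then report on each one separately.
profile_in_two_sessions() {
    run opcontrol --start
    run my_favourite_benchmark --init     # startup phase
    run opcontrol --dump                  # flush buffered samples (precaution)
    run opcontrol --save=startup          # move samples into "startup"
    run my_favourite_benchmark --run      # normal runtime
    run opcontrol --dump
    run opcontrol --save=runtime          # move samples into "runtime"
    run opcontrol --stop
    # session: profile specification assumed; see the opreport docs.
    run opreport session:startup
    run opreport session:runtime
}
```
</pre>
<p>
Running <code>DRY_RUN=1 profile_in_two_sessions</code> prints the ten commands in order; run it
without <code>DRY_RUN</code> (as root) to perform them.
</p>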
| 1846 | </div> |
| 1847 | </div> |
| 1848 | <div class="sect2" lang="en" xml:lang="en"> |
| 1849 | <div class="titlepage"> |
| 1850 | <div> |
| 1851 | <div> |
| 1852 | <h3 class="title"><a id="eventspec"></a>1.2. Specifying performance counter events</h3> |
| 1853 | </div> |
| 1854 | </div> |
| 1855 | </div> |
| 1856 | <p> |
| 1857 | The <code class="option">--event</code> option to <span><strong class="command">opcontrol</strong></span> |
| 1858 | takes a specification that indicates how each |
| 1859 | hardware performance counter should be set up. If you want to |
| 1860 | revert to OProfile's default setting (<code class="option">--event</code> |
| 1861 | is strictly optional), use <code class="option">--event=default</code>. Use of this |
| 1862 | option overrides all previous event selections. |
| 1863 | </p> |
| 1864 | <p> |
| 1865 | You can pass multiple event specifications. OProfile will allocate |
| 1866 | hardware counters as necessary. Note that some combinations are not |
| 1867 | allowed by the CPU; running <span><strong class="command">opcontrol --list-events</strong></span> gives the details |
| 1868 | of each event. The event specification is a colon-separated string |
| 1869 | of the form <code class="option"><span class="emphasis"><em>name</em></span>:<span class="emphasis"><em>count</em></span>:<span class="emphasis"><em>unitmask</em></span>:<span class="emphasis"><em>kernel</em></span>:<span class="emphasis"><em>user</em></span></code> as described in this table: |
| 1870 | </p> |
| 1871 | <div class="informaltable"> |
| 1872 | <table border="1"> |
| 1873 | <colgroup> |
| 1874 | <col /> |
| 1875 | <col /> |
| 1876 | </colgroup> |
| 1877 | <tbody> |
| 1878 | <tr> |
| 1879 | <td> |
| 1880 | <code class="option">name</code> |
| 1881 | </td> |
| 1882 | <td>The symbolic event name, e.g. <code class="constant">CPU_CLK_UNHALTED</code></td> |
| 1883 | </tr> |
| 1884 | <tr> |
| 1885 | <td> |
| 1886 | <code class="option">count</code> |
| 1887 | </td> |
| 1888 | <td>The counter reset value (the number of events counted between samples), e.g. 100000</td> |
| 1889 | </tr> |
| 1890 | <tr> |
| 1891 | <td> |
| 1892 | <code class="option">unitmask</code> |
| 1893 | </td> |
| 1894 | <td>The unit mask, as given in the events list, e.g. 0x0f</td> |
| 1895 | </tr> |
| 1896 | <tr> |
| 1897 | <td> |
| 1898 | <code class="option">kernel</code> |
| 1899 | </td> |
| 1900 | <td>Whether to profile kernel code</td> |
| 1901 | </tr> |
| 1902 | <tr> |
| 1903 | <td> |
| 1904 | <code class="option">user</code> |
| 1905 | </td> |
| 1906 | <td>Whether to profile userspace code</td> |
| 1907 | </tr> |
| 1908 | </tbody> |
| 1909 | </table> |
| 1910 | </div> |
| 1911 | <p> |
| 1912 | The last three values are optional; if you omit them (e.g. <code class="option">--event=DATA_MEM_REFS:30000</code>), |
| 1913 | they are set to their default values (a unit mask of 0, and profiling of both kernel and |
| 1914 | userspace code). Note that some events require a unit mask. |
| 1915 | </p> |
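<p>
The defaulting rule above can be illustrated with a small shell function. <code>expand_event_spec</code>
is a hypothetical helper, not part of OProfile: it only fills in the documented defaults and does not
check event names, count limits, or whether an event requires a unit mask
(<span><strong class="command">opcontrol --list-events</strong></span> remains the authority for that).
</p>
<pre class="programlisting">
```shell
# expand_event_spec SPEC
# Hypothetical helper (not part of OProfile). Expands an event
# specification name:count[:unitmask[:kernel[:user]]] to all five
# fields, using the documented defaults: unit mask 0, count kernel
# code (1), count userspace code (1). Returns 1 on a malformed spec.
expand_event_spec() {
    old_ifs=$IFS
    IFS=:
    # Intentionally unquoted: split the spec on colons into $1..$N.
    set -- $1
    IFS=$old_ifs
    # name and count are mandatory; at most five fields are allowed.
    if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
        return 1
    fi
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "${3:-0}" "${4:-1}" "${5:-1}"
}
```
</pre>
<p>
For example, <code>opcontrol --event="$(expand_event_spec DATA_MEM_REFS:30000)"</code> passes
<code class="option">DATA_MEM_REFS:30000:0:1:1</code>, which is equivalent to the short form.
</p>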
| 1916 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 1917 | <h3 class="title">Note</h3> |
| 1918 | <p> |
| 1919 | For the PowerPC platforms, all events specified must be in the same group; i.e., the group number |
| 1920 | appended to the event name (e.g. <code class="constant">&lt;<span class="emphasis"><em>some-event-name</em></span>&gt;_GRP9</code>) must be the same. |
| 1921 | </p> |
| 1922 | </div> |
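<p>
On PowerPC, a quick pre-flight check of the same-group rule could look like the hypothetical
<code>same_ppc_group</code> function below (not part of OProfile). It only compares the
<code>_GRP</code><span class="emphasis"><em>n</em></span> suffixes of the event names it is given;
the event names used with it here are made up for illustration, so take real names from
<span><strong class="command">opcontrol --list-events</strong></span>.
</p>
<pre class="programlisting">
```shell
# same_ppc_group SPEC...
# Hypothetical helper (not part of OProfile). Succeeds (returns 0) if
# every event name carries the same _GRPn suffix, and returns 1 if a
# name has no suffix or the suffixes differ. Each SPEC may be a bare
# event name or a full name:count:... specification.
same_ppc_group() {
    group=
    for spec in "$@"; do
        name=${spec%%:*}                    # drop ":count:..." if present
        case $name in
            *_GRP*) g=${name##*_GRP} ;;     # text after the last "_GRP"
            *) return 1 ;;                  # no group suffix at all
        esac
        if [ -z "$group" ]; then
            group=$g                        # first event sets the group
        elif [ "$g" != "$group" ]; then
            return 1                        # mismatch found
        fi
    done
    return 0
}
```
</pre>
<p>
For instance, <code>same_ppc_group EVENT_A_GRP9:10000 EVENT_B_GRP9:10000</code> succeeds, while
mixing <code>EVENT_A_GRP9</code> with <code>EVENT_B_GRP4</code> fails.
</p>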
| 1923 | <p> |
| 1924 | If OProfile is using RTC mode, and you want to alter the default counter value, |
| 1925 | you can use something like <code class="option">--event=RTC_INTERRUPTS:2048</code>. Note that the last |
| 1926 | three values are ignored in RTC mode. |
| 1927 | If OProfile is using timer-interrupt mode, no configuration is possible. |
| 1928 | </p> |
| 1929 | <p> |
| 1930 | The table below lists the events selected by default |
| 1931 | (<code class="option">--event=default</code>) for the various computer architectures: |
| 1932 | </p> |
| 1933 | <div class="informaltable"> |
| 1934 | <table border="1"> |
| 1935 | <colgroup> |
| 1936 | <col /> |
| 1937 | <col /> |
| 1938 | <col /> |
| 1939 | </colgroup> |
| 1940 | <tbody> |
| 1941 | <tr> |
| 1942 | <td>Processor</td> |
| 1943 | <td>cpu_type</td> |
| 1944 | <td>Default event</td> |
| 1945 | </tr> |
| 1946 | <tr> |
| 1947 | <td>Alpha EV4</td> |
| 1948 | <td>alpha/ev4</td> |
| 1949 | <td>CYCLES:100000:0:1:1</td> |
| 1950 | </tr> |
| 1951 | <tr> |
| 1952 | <td>Alpha EV5</td> |
| 1953 | <td>alpha/ev5</td> |
| 1954 | <td>CYCLES:100000:0:1:1</td> |
| 1955 | </tr> |
| 1956 | <tr> |
| 1957 | <td>Alpha PCA56</td> |
| 1958 | <td>alpha/pca56</td> |
| 1959 | <td>CYCLES:100000:0:1:1</td> |
| 1960 | </tr> |
| 1961 | <tr> |
| 1962 | <td>Alpha EV6</td> |
| 1963 | <td>alpha/ev6</td> |
| 1964 | <td>CYCLES:100000:0:1:1</td> |
| 1965 | </tr> |
| 1966 | <tr> |
| 1967 | <td>Alpha EV67</td> |
| 1968 | <td>alpha/ev67</td> |
| 1969 | <td>CYCLES:100000:0:1:1</td> |
| 1970 | </tr> |
| 1971 | <tr> |
| 1972 | <td>ARM/XScale PMU1</td> |
| 1973 | <td>arm/xscale1</td> |
| 1974 | <td>CPU_CYCLES:100000:0:1:1</td> |
| 1975 | </tr> |
| 1976 | <tr> |
| 1977 | <td>ARM/XScale PMU2</td> |
| 1978 | <td>arm/xscale2</td> |
| 1979 | <td>CPU_CYCLES:100000:0:1:1</td> |
| 1980 | </tr> |
| 1981 | <tr> |
| 1982 | <td>ARM/MPCore</td> |
| 1983 | <td>arm/mpcore</td> |
| 1984 | <td>CPU_CYCLES:100000:0:1:1</td> |
| 1985 | </tr> |
| 1986 | <tr> |
| 1987 | <td>AVR32</td> |
| 1988 | <td>avr32</td> |
| 1989 | <td>CPU_CYCLES:100000:0:1:1</td> |
| 1990 | </tr> |
| 1991 | <tr> |
| 1992 | <td>Athlon</td> |
| 1993 | <td>i386/athlon</td> |
| 1994 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 1995 | </tr> |
| 1996 | <tr> |
| 1997 | <td>Pentium Pro</td> |
| 1998 | <td>i386/ppro</td> |
| 1999 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2000 | </tr> |
| 2001 | <tr> |
| 2002 | <td>Pentium II</td> |
| 2003 | <td>i386/pii</td> |
| 2004 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2005 | </tr> |
| 2006 | <tr> |
| 2007 | <td>Pentium III</td> |
| 2008 | <td>i386/piii</td> |
| 2009 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2010 | </tr> |
| 2011 | <tr> |
| 2012 | <td>Pentium M (P6 core)</td> |
| 2013 | <td>i386/p6_mobile</td> |
| 2014 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2015 | </tr> |
| 2016 | <tr> |
| 2017 | <td>Pentium 4 (non-HT)</td> |
| 2018 | <td>i386/p4</td> |
| 2019 | <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> |
| 2020 | </tr> |
| 2021 | <tr> |
| 2022 | <td>Pentium 4 (HT)</td> |
| 2023 | <td>i386/p4-ht</td> |
| 2024 | <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> |
| 2025 | </tr> |
| 2026 | <tr> |
| 2027 | <td>Hammer</td> |
| 2028 | <td>x86-64/hammer</td> |
| 2029 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2030 | </tr> |
| 2031 | <tr> |
| 2032 | <td>Family10h</td> |
| 2033 | <td>x86-64/family10</td> |
| 2034 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2035 | </tr> |
| 2036 | <tr> |
| 2037 | <td>Family11h</td> |
| 2038 | <td>x86-64/family11h</td> |
| 2039 | <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| 2040 | </tr> |
| 2041 | <tr> |
| 2042 | <td>Itanium</td> |
| 2043 | <td>ia64/itanium</td> |
| 2044 | <td>CPU_CYCLES:100000:0:1:1</td> |
| 2045 | </tr> |
| 2046 | <tr> |
| 2047 | <td>Itanium 2</td> |
| 2048 | <td>ia64/itanium2</td> |
| 2049 | <td>CPU_CYCLES:100000:0:1:1</td> |
| 2050 | </tr> |
| 2051 | <tr> |
| 2052 | <td>TIMER_INT</td> |
| 2053 | <td>timer</td> |
| 2054 | <td>None selectable</td> |
| 2055 | </tr> |
| 2056 | <tr> |
| 2057 | <td>IBM iseries</td> |
| 2058 | <td>PowerPC 4/5/970</td> |
| 2059 | <td>CYCLES:10000:0:1:1</td> |
| 2060 | </tr> |
| 2061 | <tr> |
| 2062 | <td>IBM pseries</td> |
| 2063 | <td>PowerPC 4/5/970/Cell</td> |
| 2064 | <td>CYCLES:10000:0:1:1</td> |
| 2065 | </tr> |
| 2066 | <tr> |
| 2067 | <td>IBM s390</td> |
| 2068 | <td>timer</td> |
| 2069 | <td>None selectable</td> |
| 2070 | </tr> |
| 2071 | <tr> |
| 2072 | <td>IBM s390x</td> |
| 2073 | <td>timer</td> |
| 2074 | <td>None selectable</td> |
| 2075 | </tr> |
| 2076 | </tbody> |
| 2077 | </table> |
| 2078 | </div> |
| 2079 | </div> |
| 2080 | </div> |
| 2081 | <div class="sect1" lang="en" xml:lang="en"> |
| 2082 | <div class="titlepage"> |
| 2083 | <div> |
| 2084 | <div> |
| 2085 | <h2 class="title" style="clear: both"><a id="setup-jit"></a>2. Setting up the JIT profiling feature</h2> |
| 2086 | </div> |
| 2087 | </div> |
| 2088 | </div> |
| 2089 | <p> |
| 2090 | To gather information about JITed code from a virtual machine, |
| 2091 | the virtual machine needs to be instrumented with an agent library. We use the |
| 2092 | agent libraries for Java in the following example. To use the |
| 2093 | Java profiling feature, you must build OProfile with the <code class="option">--with-java</code> option |
| 2094 | (<a href="#install" title="4. Installation">Section 4, “Installation”</a>). |
| 2095 | |
| 2096 | </p> |
| 2097 | <div class="sect2" lang="en" xml:lang="en"> |
| 2098 | <div class="titlepage"> |
| 2099 | <div> |
| 2100 | <div> |
| 2101 | <h3 class="title"><a id="setup-jit-jvm"></a>2.1. JVM instrumentation</h3> |
| 2102 | </div> |
| 2103 | </div> |
| 2104 | </div> |
| 2105 | <p> |
| 2106 | Add this to the startup parameters of the JVM (for JVMTI): |
| 2107 | |
| 2108 | </p> |
| 2109 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2110 | <tr> |
| 2111 | <td> |
| 2112 | <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentpath:&lt;libdir&gt;/libjvmti_oprofile.so[=&lt;options&gt;]</code> </pre> |
| 2113 | </td> |
| 2114 | </tr> |
| 2115 | </table> |
| 2116 | <p> |
| 2117 | or |
| 2118 | </p> |
| 2119 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2120 | <tr> |
| 2121 | <td> |
| 2122 | <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentlib:jvmti_oprofile[=&lt;options&gt;]</code> </pre> |
| 2123 | </td> |
| 2124 | </tr> |
| 2125 | </table> |
| 2128 | <p> |
| 2129 | The JVMPI agent implementation is enabled with the command line option |
| 2130 | </p> |
| 2131 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2132 | <tr> |
| 2133 | <td> |
| 2134 | <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-Xrunjvmpi_oprofile[:&lt;options&gt;]</code> </pre> |
| 2135 | </td> |
| 2136 | </tr> |
| 2137 | </table> |
| 2140 | <p> |
| 2141 | Currently, only one option is available: <code class="option">debug</code>. For JVMPI, |
| 2142 | the convention for specifying an option is <code class="option">option_name=[yes|no]</code>. |
| 2143 | For JVMTI, giving just the option name implies |
| 2144 | "yes", and omitting the option implies "no". |
| 2145 | </p> |
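<p>
Because the two option conventions differ, it can help to generate the flag rather than type it.
The sketch below is a hypothetical shell helper, <code>jvm_agent_flag</code> (not part of OProfile),
that builds the agent flag for either interface, with or without <code class="option">debug</code>,
following the conventions just described. The <code class="option">-agentlib</code> form it emits
relies on the agent library being on the library search path.
</p>
<pre class="programlisting">
```shell
# jvm_agent_flag jvmti|jvmpi [debug]
# Hypothetical helper (not part of OProfile). Prints the JVM flag that
# loads the OProfile agent; returns 1 for an unknown interface name.
jvm_agent_flag() {
    case $1 in
        jvmti)
            # JVMTI: the bare option name means "yes"; omit it for "no".
            if [ "$2" = debug ]; then
                printf '%s\n' "-agentlib:jvmti_oprofile=debug"
            else
                printf '%s\n' "-agentlib:jvmti_oprofile"
            fi ;;
        jvmpi)
            # JVMPI: options are written as option_name=yes|no.
            if [ "$2" = debug ]; then
                printf '%s\n' "-Xrunjvmpi_oprofile:debug=yes"
            else
                printf '%s\n' "-Xrunjvmpi_oprofile"
            fi ;;
        *)
            return 1 ;;
    esac
}
```
</pre>
<p>
A typical use would be <code>java "$(jvm_agent_flag jvmti)" -jar myapp.jar</code>, where
<code>myapp.jar</code> is a placeholder for your application.
</p>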
| 2146 | <p> |
| 2147 | The agent library (installed in <code class="filename">&lt;oprof_install_dir&gt;/lib/oprofile</code>) |
| 2148 | needs to be in the library search path (e.g. add the library directory |
| 2149 | to <code class="constant">LD_LIBRARY_PATH</code>). The command line of |
| 2150 | the JVM may not be directly accessible, for example when it is buried within shell scripts or a |
| 2151 | launcher program. In that case, it may be possible to set an environment variable to add |
| 2152 | the instrumentation. |
| 2153 | For Sun JVMs this is <code class="constant">JAVA_TOOL_OPTIONS</code>. Please check |
| 2154 | your JVM documentation for |
| 2155 | further information on the agent startup options. |
| 2156 | </p> |
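<p>
When the JVM command line is hard to reach, the environment-variable route for a Sun JVM might look
like the following sketch. <code>OPROF_PREFIX</code> is a hypothetical variable standing for
<code class="filename">&lt;oprof_install_dir&gt;</code>, and its <code class="filename">/usr/local</code>
fallback is only an assumption about a default source install; adjust both to your system.
</p>
<pre class="programlisting">
```shell
# OPROF_PREFIX is a hypothetical stand-in for the OProfile installation
# directory; /usr/local is only an assumed default.
OPROF_PREFIX=${OPROF_PREFIX:-/usr/local}

# Put the agent library directory on the library search path,
# keeping any existing entries after it.
LD_LIBRARY_PATH="$OPROF_PREFIX/lib/oprofile${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Sun JVMs pick up extra startup options from JAVA_TOOL_OPTIONS.
JAVA_TOOL_OPTIONS="-agentlib:jvmti_oprofile"
export LD_LIBRARY_PATH JAVA_TOOL_OPTIONS

# Now start the launcher script or program as usual from this shell.
```
</pre>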
| 2157 | </div> |
| 2158 | </div> |
| 2159 | <div class="sect1" lang="en" xml:lang="en"> |
| 2160 | <div class="titlepage"> |
| 2161 | <div> |
| 2162 | <div> |
| 2163 | <h2 class="title" style="clear: both"><a id="oprofile-gui"></a>3. Using <span><strong class="command">oprof_start</strong></span></h2> |
| 2164 | </div> |
| 2165 | </div> |
| 2166 | </div> |
| 2167 | <p> |
| 2168 | The <span><strong class="command">oprof_start</strong></span> application provides a convenient way to start the profiler. |
| 2169 | Note that <span><strong class="command">oprof_start</strong></span> is just a wrapper around the <span><strong class="command">opcontrol</strong></span> script, |
| 2170 | so it provides no functionality beyond that of the script itself. |
| 2171 | </p> |
| 2172 | <p> |
| 2173 | Once <span><strong class="command">oprof_start</strong></span> is running, you can select the event type for each counter; |
| 2174 | the sampling rate and other related parameters are explained in <a href="#controlling-daemon" title="1. Using opcontrol">Section 1, “Using <span><strong class="command">opcontrol</strong></span>”</a>. |
| 2175 | The "Configuration" section allows you to set general parameters such as the buffer size, kernel filename, |
| 2176 | etc. The counter setup interface should be self-explanatory; <a href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a> and related |
| 2177 | links contain information on using unit masks. |
| 2178 | </p> |
| 2179 | <p> |
| 2180 | A status line shows the current status of the profiler: how long it has been running, and the number |
| 2181 | of interrupts received over all processors, both as an average per second and as a total. |
| 2182 | Note that quitting <span><strong class="command">oprof_start</strong></span> does not stop the profiler. |
| 2183 | </p> |
| 2184 | <p> |
| 2185 | Your configuration is saved in the same file as <span><strong class="command">opcontrol</strong></span> uses; that is, |
| 2186 | <code class="filename">~/.oprofile/daemonrc</code>. |
| 2187 | </p> |
| 2188 | </div> |
| 2189 | <div class="sect1" lang="en" xml:lang="en"> |
| 2190 | <div class="titlepage"> |
| 2191 | <div> |
| 2192 | <div> |
| 2193 | <h2 class="title" style="clear: both"><a id="detailed-parameters"></a>4. Configuration details</h2> |
| 2194 | </div> |
| 2195 | </div> |
| 2196 | </div> |
| 2197 | <div class="sect2" lang="en" xml:lang="en"> |
| 2198 | <div class="titlepage"> |
| 2199 | <div> |
| 2200 | <div> |
| 2201 | <h3 class="title"><a id="hardware-counters"></a>4.1. Hardware performance counters</h3> |
| 2202 | </div> |
| 2203 | </div> |
| 2204 | </div> |
| 2205 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 2206 | <h3 class="title">Note</h3> |
| 2207 | <p> |
| 2208 | Your CPU type may not include the requisite support for hardware performance counters, in which case |
| 2209 | you must use OProfile in RTC mode on 2.4 kernels (see <a href="#rtc" title="4.2. OProfile in RTC mode">Section 4.2, “OProfile in RTC mode”</a>), or timer mode on 2.6 kernels (see <a href="#timer" title="4.3. OProfile in timer interrupt mode">Section 4.3, “OProfile in timer interrupt mode”</a>). |
| 2210 | You do not really need to read this section unless you are interested in using |
| 2211 | events other than the default event chosen by OProfile. |
| 2212 | </p> |
| 2213 | </div> |
| 2214 | <p> |
| 2215 | The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available |
| 2216 | from <a href="http://developer.intel.com/">http://developer.intel.com/</a>. |
| 2217 | The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <a href="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf"> |
| 2218 | http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</a>. |
| 2219 | For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation |
| 2220 | is available at <a href="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/"> |
| 2221 | http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</a>. (For example, the |
| 2222 | specific publication containing information on the performance monitor unit for the PowerPC970 is |
| 2223 | "IBM PowerPC 970FX RISC Microprocessor User's Manual.") |
| 2224 | These processors are capable of delivering an interrupt when a counter overflows. |
| 2225 | This is the basic mechanism on which OProfile is based. The delivery mode is <span class="acronym">NMI</span>, |
| 2226 | so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called, |
| 2227 | the current <span class="acronym">PC</span> value and the current task are recorded into the profiling structure. |
| 2228 | This allows the overflow event to be attached to a specific assembly instruction in a binary image. |
| 2229 | The daemon receives this data from the kernel, and writes it to the sample files. |
| 2230 | </p> |
| 2231 | <p> |
| 2232 | If we use an event such as <code class="constant">CPU_CLK_UNHALTED</code> or <code class="constant">INST_RETIRED</code> |
| 2233 | (<code class="constant">GLOBAL_POWER_EVENTS</code> or <code class="constant">INSTR_RETIRED</code>, respectively, on the Pentium 4), we can |
| 2234 | use the overflow counts as an estimate of the actual time spent in each part of the code. Alternatively, we can use the other available counters |
| 2235 | to profile other interesting data, such as the cache behaviour of routines. |
| 2236 | </p> |
| 2237 | <p> |
| 2238 | However, there are several caveats. First, there are those issues listed in the Intel manual. There is a delay |
| 2239 | between the counter overflow and the interrupt delivery that can skew results on a small scale; this means |
| 2240 | you cannot rely on the profiles at the instruction level as being perfectly accurate. |
| 2241 | If you are using an "event-mode" counter such as the cache counters, a count registered against an instruction does not mean |
| 2242 | that the instruction is responsible for that event. It does, however, imply that the counter overflowed in the dynamic |
| 2243 | vicinity of that instruction, to within a few instructions. Further details on this problem can be found in |
| 2244 | <a href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a> and also in the Digital paper "ProfileMe: A Hardware Performance Counter". |
| 2245 | </p> |
| 2246 | <p> |
| 2247 | Each counter has several configuration parameters. |
| 2248 | First, there is the unit mask: this simply further specifies what to count. |
| 2249 | Second, there is the counter value, discussed below. Third, there are parameters that control whether to increment counts |
| 2250 | while in kernel space, in user space, or both. You can configure these separately for each counter. |
| 2251 | </p> |
| 2252 | <p> |
| 2253 | The counter value works as follows: after each overflow event, the counter is re-initialized |
| 2254 | such that another overflow occurs after this many events have been counted. Thus, higher |
| 2255 | values mean less-detailed profiling, and lower values mean more detail, but higher overhead. |
| 2256 | Picking a good value for this |
| 2257 | parameter is, unfortunately, something of a black art. It of course depends on the event |
| 2258 | you have chosen. |
| 2259 | Specifying too large a value will mean not enough interrupts are generated |
| 2260 | to give a realistic profile (though this problem can be ameliorated by profiling for <span class="emphasis"><em>longer</em></span>). |
| 2261 | Specifying too small a value can lead to higher performance overhead. |
| 2262 | </p> |
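<p>
To make the trade-off concrete: for a steadily firing event, the sample rate is roughly the event
rate divided by the counter value. The hypothetical helper below (not part of OProfile) does that
arithmetic. For <code class="constant">CPU_CLK_UNHALTED</code> the event rate equals the clock
frequency only while the CPU is busy, because halted cycles are not counted, so the results are
upper bounds per CPU.
</p>
<pre class="programlisting">
```shell
# samples_per_second EVENTS_PER_SECOND COUNT
# Hypothetical helper (not part of OProfile). Estimates how many counter
# overflows (samples) per second a given count value produces, assuming
# the event fires at a steady rate. Integer division; returns 1 if COUNT
# is not positive.
samples_per_second() {
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 / $2 ))
}
```
</pre>
<p>
For the busy 800MHz Pentium III from the earlier example, <code>samples_per_second 800000000 400000</code>
gives 2000 samples per second for <code class="option">CPU_CLK_UNHALTED:400000</code>, whereas the
default count of 100000 would give 8000, four times the detail and roughly four times the interrupt load.
</p>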
| 2263 | </div> |
| 2264 | <div class="sect2" lang="en" xml:lang="en"> |
| 2265 | <div class="titlepage"> |
| 2266 | <div> |
| 2267 | <div> |
| 2268 | <h3 class="title"><a id="rtc"></a>4.2. OProfile in RTC mode</h3> |
| 2269 | </div> |
| 2270 | </div> |
| 2271 | </div> |
| 2272 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 2273 | <h3 class="title">Note</h3> |
| 2274 | <p> |
| 2275 | This section applies to 2.2/2.4 kernels only. |
| 2276 | </p> |
| 2277 | </div> |
| 2278 | <p> |
| 2279 | Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes |
| 2280 | some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix). |
| 2281 | On these machines, OProfile falls |
| 2282 | back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <span><strong class="command">rtc</strong></span> |
| 2283 | module, so you cannot have the OProfile and rtc modules loaded at the same time, nor can rtc support be compiled into the kernel. |
| 2284 | </p> |
| 2285 | <p> |
| 2286 | RTC mode is less capable than the hardware counters mode; in particular, it is unable to profile sections of |
| 2287 | the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value |
| 2288 | corresponds to the number of interrupts generated per second (that is, a higher number means a better profiling |
| 2289 | resolution, and higher overhead). The current implementation of the real-time clock supports only power-of-two |
| 2290 | sampling rates from 2 to 4096 per second. Other values within this range are rounded to the nearest power of |
| 2291 | two. |
| 2292 | </p> |
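<p>
The rounding rule can be illustrated with the hypothetical shell function below; it mirrors the
behaviour described above and is not OProfile's driver code. It rejects values outside 2 to 4096 and
rounds the rest to the nearest power of two. The documentation does not say how exact ties (such as
3, 6 or 12) are resolved, so rounding them up is an assumption of this sketch.
</p>
<pre class="programlisting">
```shell
# rtc_rate REQUESTED
# Hypothetical helper (not part of OProfile). Prints the power-of-two
# RTC sampling rate nearest to REQUESTED; returns 1 if REQUESTED lies
# outside 2..4096. Ties are rounded up (an assumption).
rtc_rate() {
    want=$1
    if [ "$want" -lt 2 ] || [ "$want" -gt 4096 ]; then
        return 1                        # outside the supported range
    fi
    lo=2
    # Find the largest power of two that does not exceed want.
    while [ $((lo * 2)) -le "$want" ]; do
        lo=$((lo * 2))
    done
    hi=$((lo * 2))
    # Pick whichever neighbour is closer; on a tie, prefer hi.
    if [ $((want - lo)) -lt $((hi - want)) ]; then
        echo "$lo"
    else
        echo "$hi"
    fi
}
```
</pre>
<p>
For example, <code>opcontrol --event=RTC_INTERRUPTS:$(rtc_rate 1000)</code> would request 1024
interrupts per second, the rate the driver would use for 1000 anyway.
</p>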
| 2293 | <p> |
| 2294 | You can force use of the RTC interrupt with the <code class="option">force_rtc=1</code> module parameter. |
| 2295 | </p> |
| 2296 | <p> |
| 2297 | Setting the value from the GUI should be straightforward. On the command line, you need to specify the |
| 2298 | event to <span><strong class="command">opcontrol</strong></span>, for example: |
| 2299 | </p> |
| 2300 | <p> |
| 2301 | <span> |
| 2302 | <strong class="command">opcontrol --event=RTC_INTERRUPTS:256</strong> |
| 2303 | </span> |
| 2304 | </p> |
| 2305 | </div> |
| 2306 | <div class="sect2" lang="en" xml:lang="en"> |
| 2307 | <div class="titlepage"> |
| 2308 | <div> |
| 2309 | <div> |
| 2310 | <h3 class="title"><a id="timer"></a>4.3. OProfile in timer interrupt mode</h3> |
| 2311 | </div> |
| 2312 | </div> |
| 2313 | </div> |
| 2314 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 2315 | <h3 class="title">Note</h3> |
| 2316 | <p> |
| 2317 | This section applies to 2.6 kernels and above only. |
| 2318 | </p> |
| 2319 | </div> |
| 2320 | <p> |
| 2321 | In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver |
| 2322 | falls back to using the timer interrupt for profiling. Like the RTC mode in 2.4 kernels, this is not able to |
| 2323 | profile code that has interrupts disabled. Unlike the RTC and hardware performance counter setups, timer mode has |
| 2324 | no configuration parameters. |
| 2325 | </p> |
| 2326 | <p> |
| 2327 | You can force use of the timer interrupt by using the <code class="option">timer=1</code> module |
| 2328 | parameter (or <code class="option">oprofile.timer=1</code> on the boot command line if OProfile is |
| 2329 | built-in). |
| 2330 | </p> |
| 2331 | </div> |
| 2332 | <div class="sect2" lang="en" xml:lang="en"> |
| 2333 | <div class="titlepage"> |
| 2334 | <div> |
| 2335 | <div> |
| 2336 | <h3 class="title"><a id="p4"></a>4.4. Pentium 4 support</h3> |
| 2337 | </div> |
| 2338 | </div> |
| 2339 | </div> |
| 2340 | <p> |
| 2341 | The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event |
| 2342 | selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a |
| 2343 | particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their |
| 2344 | operation. Unfortunately, the relationship between these registers is quite complex; not all of them can be used together |
| 2345 | at the same time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of |
| 2346 | one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar |
| 2347 | to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU. |
| 2348 | </p> |
| 2349 | <p> |
| 2350 | There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store |
| 2351 | (DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described |
| 2352 | above. Performance monitoring hardware on Pentium 4 / Xeon processors with Hyperthreading enabled (multiple logical |
| 2353 | processors on a single die) is not supported in 2.4 kernels (you can use OProfile if you disable hyper-threading, |
| 2354 | though). |
| 2355 | </p> |
| 2356 | </div> |
| 2357 | <div class="sect2" lang="en" xml:lang="en"> |
| 2358 | <div class="titlepage"> |
| 2359 | <div> |
| 2360 | <div> |
| 2361 | <h3 class="title"><a id="ia64"></a>4.5. Intel Itanium 2 support</h3> |
| 2362 | </div> |
| 2363 | </div> |
| 2364 | </div> |
| 2365 | <p> |
| 2366 | The Itanium 2 performance monitoring unit (PMU) organizes the counters as four |
| 2367 | pairs of performance event monitoring registers. Each pair is composed of a |
| 2368 | Performance Monitoring Configuration (PMC) register and Performance Monitoring |
| 2369 | Data (PMD) register. The PMC selects the performance event being monitored and |
| 2370 | the PMD determines the sampling interval. The PMU triggers sampling with |
| 2371 | maskable interrupts. Thus, samples do not occur |
| 2372 | in sections of the IA64 kernel where interrupts are disabled. |
| 2373 | </p> |
| 2374 | <p> |
| 2375 | None of the advanced features of the Itanium 2 performance monitoring unit |
| 2376 | such as opcode matching, address range matching, or precise event sampling are |
| 2377 | supported by this version of OProfile. The Itanium 2 support only maps OProfile's |
| 2378 | existing interrupt-based model to the PMU hardware. |
| 2379 | </p> |
| 2380 | </div> |
| 2381 | <div class="sect2" lang="en" xml:lang="en"> |
| 2382 | <div class="titlepage"> |
| 2383 | <div> |
| 2384 | <div> |
| 2385 | <h3 class="title"><a id="ppc64"></a>4.6. PowerPC64 support</h3> |
| 2386 | </div> |
| 2387 | </div> |
| 2388 | </div> |
| 2389 | <p> |
| 2390 | The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors |
| 2391 | consists of between 4 and 8 counters (depending on the model), plus three |
| 2392 | special-purpose registers used for programming the counters: MMCR0, MMCR1, |
| 2393 | and MMCRA. Advanced features such as instruction matching and thresholding are |
| 2394 | not supported by this version of OProfile. |
| 2395 | </p> |
| 2396 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Later versions of the IBM POWER5+ processor (beginning with revision 3.0) |
| 2397 | run the performance monitor unit in POWER6 mode, effectively removing OProfile's |
| 2398 | access to counters 5 and 6. These two counters are dedicated to counting |
| 2399 | instructions completed and cycles, respectively. In POWER6 mode, however, the |
| 2400 | counters do not generate an interrupt on overflow and so are unusable by |
| 2401 | OProfile. Kernel versions 2.6.23 and higher will recognize this mode |
| 2402 | and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem. |
| 2403 | OProfile userspace responds to this cpu_type by removing these counters from |
| 2404 | the list of potential events to count. Without this kernel support, attempts |
| 2405 | to profile using an event from one of these counters will yield incorrect |
| 2406 | results, typically zero (or near zero) samples in the generated report. |
| 2407 | </div> |
| 2410 | </div> |
| 2411 | <div class="sect2" lang="en" xml:lang="en"> |
| 2412 | <div class="titlepage"> |
| 2413 | <div> |
| 2414 | <div> |
| 2415 | <h3 class="title"><a id="cell-be"></a>4.7. Cell Broadband Engine support</h3> |
| 2416 | </div> |
| 2417 | </div> |
| 2418 | </div> |
| 2419 | <p> |
| 2420 | The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing |
| 2421 | Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each |
| 2422 | consist of a processing unit (PPU and SPU, respectively) and other hardware |
| 2423 | components, such as memory controllers. |
| 2424 | </p> |
<p>
A PPU has two hardware threads (also known as "virtual CPUs"). The performance
monitor unit of the CBE collects event information on only one hardware thread
at a time. Therefore, when profiling PPE events, OProfile collects the profile
for the selected events by time slicing the performance counter hardware between
the two threads. The user must make the collection interval long enough that
each hardware thread is observed for long enough to obtain a good profile.
</p>
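<p>To see why the collection interval matters, the following sketch models the time slicing as an even round-robin over fixed-length slices. The slice length and the round-robin policy are assumptions made for illustration only; this section does not say how long each slice is or exactly how OProfile schedules the slices.</p>

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, List, Sequence


def slice_schedule(units: Sequence[str], n_slices: int) -> List[str]:
    """Assign n_slices equal time slices to units in round-robin order.

    Assumption: an even round-robin policy; OProfile's real policy may differ.
    """
    return list(islice(cycle(units), n_slices))


def observed_ms(units: Sequence[str], total_ms: int, slice_ms: int) -> Dict[str, int]:
    """Milliseconds during which the counters observe each unit.

    Integer milliseconds avoid floating-point rounding in the slice count.
    Any partial slice at the end of the interval is ignored.
    """
    n_slices = total_ms // slice_ms
    tally = Counter(slice_schedule(units, n_slices))
    return {u: tally[u] * slice_ms for u in units}


if __name__ == "__main__":
    # Two PPU hardware threads sharing the counters for a 10 s run,
    # with a hypothetical 100 ms slice.
    print(observed_ms(["thread0", "thread1"], 10_000, 100))
    # {'thread0': 5000, 'thread1': 5000}
```

<p>Under the same assumptions, the eight SPUs described below would each be observed for only about 1.2 to 1.3 seconds of a 10 second run, so SPU profiles need correspondingly longer collection intervals.</p>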
<p>
To profile an SPU application, specify the SPU_CYCLES event.
When OProfile is started with SPU_CYCLES, the opcontrol script enforces certain
separation parameters (separate=cpu,lib) so that the sample data contains enough
information to generate a complete report. If you do not need to analyze the
performance of each SPU separately, the opreport --merge=cpu option can be used
to obtain a more readable report.
</p>
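<p>Conceptually, --merge=cpu folds the per-SPU (per-"CPU") profiles into one by summing their sample counts. The following is a minimal sketch of that idea over a hypothetical in-memory layout, a dictionary keyed by (cpu, image, symbol); it is not how opreport stores or reads sample files, and the image and symbol names are made up.</p>

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical layout: (cpu, image, symbol) -> sample count.
PerCpuSamples = Dict[Tuple[int, str, str], int]
MergedSamples = Dict[Tuple[str, str], int]


def merge_cpu(samples: PerCpuSamples) -> MergedSamples:
    """Drop the CPU (here: SPU) dimension by summing counts across it."""
    merged: Dict[Tuple[str, str], int] = defaultdict(int)
    for (_cpu, image, symbol), count in samples.items():
        merged[(image, symbol)] += count
    return dict(merged)


if __name__ == "__main__":
    per_spu = {
        (0, "matmul.spu", "kernel_loop"): 120,
        (1, "matmul.spu", "kernel_loop"): 95,
        (1, "matmul.spu", "dma_wait"): 30,
    }
    print(merge_cpu(per_spu))
```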
<p>
Profiling with an SPU event (events 4100 through 4163) cannot be combined with any
other event. Furthermore, only one SPU event can be specified at a time. The hardware
supports profiling only one SPU per node at a time, so the OProfile kernel code time
slices between the eight SPUs to collect data on all of them.
</p>
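<p>The event restrictions above can be expressed as a small check. This is a hypothetical helper, not part of opcontrol. It works on event numbers only, treats 4100 through 4163 as the SPU event range as stated above, and checks nothing except the SPU rules.</p>

```python
from typing import Optional, Sequence

# SPU events are numbered 4100 through 4163 inclusive (range end is exclusive).
SPU_EVENTS = range(4100, 4164)


def cell_event_error(events: Sequence[int]) -> Optional[str]:
    """Return a description of the problem, or None if the SPU rules are met.

    Hypothetical helper: an SPU event must be the only event requested,
    which also enforces "at most one SPU event at a time".
    """
    spu = [e for e in events if e in SPU_EVENTS]
    if not spu:
        return None  # no SPU event, so the SPU rules do not apply
    if len(spu) > 1:
        return f"only one SPU event may be specified, got {spu}"
    if len(events) > 1:
        return "an SPU event cannot be combined with any other event"
    return None
```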
| 2448 | <p> |
| 2449 | SPU profile reports have some unique characteristics compared to reports for |
| 2450 | standard architectures: |
| 2451 | </p> |
| 2452 | <div class="itemizedlist"> |
| 2453 | <ul type="disc"> |
<li>Typically, there is no "app name" column. This is standard OProfile behavior
when the report contains samples for just a single application, which is
commonly the case when profiling SPUs.</li>
<li>"CPU" equates to "SPU".</li>
<li>Specifying '--long-filenames' on the opreport command does not always result
in long filenames. This happens when the SPU application code is embedded in
the PPE executable or shared library. The embedded SPU ELF data contains only the
short filename (i.e., no path information) of the SPU binary file that was used as
the source for embedding. Only the short filename is recorded because
the original SPU binary file may not exist or be accessible at runtime. The performance
analyst must know the application well enough to correlate the
SPU binary image names found in the report with the application's source files.
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>
Compile the application with -g and generate the OProfile report
with -g to make it easier to find the right source file(s) on which to focus.
</div></li>
| 2470 | </ul> |
| 2471 | </div> |
| 2472 | </div> |
| 2473 | <div class="sect2" lang="en" xml:lang="en"> |
| 2474 | <div class="titlepage"> |
| 2475 | <div> |
| 2476 | <div> |
| 2477 | <h3 class="title"><a id="amd-ibs-support"></a>4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</h3> |
| 2478 | </div> |
| 2479 | </div> |
| 2480 | </div> |
<p>
Instruction-Based Sampling (IBS) is a new performance measurement technique
available on AMD Family 10h processors. Traditional performance counter
sampling is not precise enough to isolate performance issues to individual
instructions. IBS, however, precisely identifies instructions that are not
making the best use of the processor pipeline and memory hierarchy.
For more information, please refer to the document "Instruction-Based Sampling:
A New Performance Analysis Technique for AMD Family 10h Processors"
(<a href="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf">http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</a>).
There are two types of IBS profiling, described in the following sections.
</p>
| 2493 | <div class="sect3" lang="en" xml:lang="en"> |
| 2494 | <div class="titlepage"> |
| 2495 | <div> |
| 2496 | <div> |
| 2497 | <h4 class="title"><a id="ibs-fetch"></a>4.8.1. IBS Fetch</h4> |
| 2498 | </div> |
| 2499 | </div> |
| 2500 | </div> |
| 2501 | <p> |
| 2502 | IBS fetch sampling is a statistical sampling method which counts completed |
| 2503 | fetch operations. When the number of completed fetch operations reaches the |
| 2504 | maximum fetch count (the sampling period), IBS tags the fetch operation and |
| 2505 | monitors that operation until it either completes or aborts. When a tagged |
| 2506 | fetch completes or aborts, a sampling interrupt is generated and an IBS fetch |
| 2507 | sample is taken. An IBS fetch sample contains a timestamp, the identifier of |
| 2508 | the interrupted process, the virtual fetch address, and several event flags |
| 2509 | and values that describe what happened during the fetch operation. |
| 2510 | </p> |
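<p>The fetch sampling flow can be modeled in a few lines. This is a simplified software model written for illustration, not AMD's hardware logic or OProfile code: it assumes the fetch that follows the period boundary is the one tagged, and it does not count the tagged fetch toward the next period.</p>

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Fetch:
    timestamp: int
    pid: int          # identifier of the interrupted process
    address: int      # virtual fetch address
    completed: bool   # False means the fetch aborted


def ibs_fetch_samples(fetches: Iterable[Fetch], max_count: int) -> List[Fetch]:
    """Return the tagged fetches that would produce IBS fetch samples."""
    samples: List[Fetch] = []
    count = 0
    tag_next = False
    for fetch in fetches:
        if tag_next:
            # A tagged fetch yields a sample whether it completes or aborts.
            samples.append(fetch)
            tag_next = False
            count = 0
            continue
        if fetch.completed:  # only completed fetches advance the count
            count += 1
            if count == max_count:
                tag_next = True
    return samples
```

<p>Note that in this model an aborted fetch can still be sampled once it is tagged, which matches the description above: a sample is taken when the tagged fetch either completes or aborts.</p>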
| 2511 | </div> |
| 2512 | <div class="sect3" lang="en" xml:lang="en"> |
| 2513 | <div class="titlepage"> |
| 2514 | <div> |
| 2515 | <div> |
| 2516 | <h4 class="title"><a id="ibs-op"></a>4.8.2. IBS Op</h4> |
| 2517 | </div> |
| 2518 | </div> |
| 2519 | </div> |
| 2520 | <p> |
| 2521 | IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 |
| 2522 | instructions. Two options are available for selecting ops for sampling: |
| 2523 | </p> |
| 2524 | <div class="itemizedlist"> |
| 2525 | <ul type="disc"> |
| 2526 | <li> |
| 2527 | Cycles-based selection counts CPU clock cycles. The op is tagged and monitored |
| 2528 | when the count reaches a threshold (the sampling period) and a valid op is |
| 2529 | available. |
| 2530 | </li> |
| 2531 | <li> |
| 2532 | Dispatched op-based selection counts dispatched macro-ops. |
| 2533 | When the count reaches a threshold, the next valid op is tagged and monitored. |
| 2534 | </li> |
| 2535 | </ul> |
| 2536 | </div> |
| 2537 | <p> |
| 2538 | In both cases, an IBS sample is generated only if the tagged op retires. |
| 2539 | Thus, IBS op event information does not measure speculative execution activity. |
| 2540 | The execution stages of the pipeline monitor the tagged macro-op. When the |
| 2541 | tagged macro-op retires, a sampling interrupt is generated and an IBS op |
| 2542 | sample is taken. An IBS op sample contains a timestamp, the identifier of |
| 2543 | the interrupted process, the virtual address of the AMD64 instruction from |
| 2544 | which the op was issued, and several event flags and values that describe |
| 2545 | what happened when the macro-op executed. |
| 2546 | </p> |
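<p>The two op selection modes, and the rule that only retired ops are sampled, can be sketched the same way. Again this is a simplified, hypothetical model: each element of the timeline stands for one clock cycle and holds the op that became available in that cycle (or None), which glosses over real pipeline behavior.</p>

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Op:
    address: int   # address of the AMD64 instruction the op came from
    retires: bool  # False models an op that never retires (e.g. speculative)


def ibs_op_samples(timeline: Sequence[Optional[Op]], period: int, mode: str) -> List[Op]:
    """Return the tagged ops that would produce IBS op samples.

    mode "cycles":   count every clock cycle; once the count reaches the
                     period, tag the first cycle that has a valid op.
    mode "dispatch": count dispatched ops; once the count reaches the
                     period, tag the next valid op.
    """
    if mode not in ("cycles", "dispatch"):
        raise ValueError("mode must be 'cycles' or 'dispatch'")
    samples: List[Op] = []
    count = 0
    armed = False  # dispatch mode: threshold reached, waiting for the next op
    for op in timeline:
        tag = False
        if mode == "cycles":
            count += 1
            tag = count >= period and op is not None
        elif op is not None:
            if armed:
                tag = True
            else:
                count += 1
                armed = count >= period
        if tag:
            if op.retires:
                samples.append(op)  # an op that does not retire yields no sample
            count, armed = 0, False
    return samples
```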
| 2547 | </div> |
<p>
To enable IBS profiling, simply specify IBS performance events
with the "--event=" option. These events are listed in the output of
<code class="function">opcontrol --list-events</code>.
</p>
| 2553 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2554 | <tr> |
| 2555 | <td> |
| 2556 | <pre class="screen"> |
opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;

Note: All IBS fetch events must have the same event count and unit mask,
      as must all IBS op events.
| 2562 | </pre> |
| 2563 | </td> |
| 2564 | </tr> |
| 2565 | </table> |
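<p>The constraint in the note above can be checked before starting the profiler. The following hypothetical helper parses event specifications of the form <code class="option">NAME:count:um:kernel:user</code> shown above and reports any IBS group whose events disagree on count or unit mask. The event names used with it are placeholders in the style of IBS_FETCH_XXX, not real event names; use <code class="function">opcontrol --list-events</code> to see the actual ones.</p>

```python
from typing import Dict, List, Set, Tuple


def parse_event(spec: str) -> Tuple[str, int, int]:
    """Split 'NAME:count:um:kernel:user' and return (name, count, unit mask).

    The unit mask may be decimal or hex with a 0x prefix.
    """
    name, count, um, _kernel, _user = spec.split(":")
    mask = int(um, 16) if um.lower().startswith("0x") else int(um)
    return name, int(count), mask


def ibs_event_errors(specs: List[str]) -> List[str]:
    """Report IBS groups whose events do not share one count and unit mask."""
    groups: Dict[str, Set[Tuple[int, int]]] = {}
    for spec in specs:
        name, count, mask = parse_event(spec)
        for prefix in ("IBS_FETCH_", "IBS_OP_"):
            if name.startswith(prefix):
                groups.setdefault(prefix, set()).add((count, mask))
    return [
        f"{prefix}* events use differing (count, unit mask) settings: {sorted(settings)}"
        for prefix, settings in sorted(groups.items())
        if len(settings) > 1
    ]
```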
| 2566 | </div> |
| 2567 | <div class="sect2" lang="en" xml:lang="en"> |
| 2568 | <div class="titlepage"> |
| 2569 | <div> |
| 2570 | <div> |
| 2571 | <h3 class="title"><a id="misuse"></a>4.9. Dangerous counter settings</h3> |
| 2572 | </div> |
| 2573 | </div> |
| 2574 | </div> |
<p>
OProfile is a low-level profiler that allows continuous profiling with low overhead.
If the count reset value for a counter is set too low, the system can become overloaded
with counter interrupts and appear to have frozen. Although some validation is done, it
is not foolproof.
</p>
| 2581 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 2582 | <h3 class="title">Note</h3> |
<p>
This can happen as follows: when the profiler count
reaches zero, an NMI handler is called, which stores the sample values in an internal buffer and then resets the counter
to its original value. If the count is very low, a new NMI can become pending before the NMI handler has
completed. Because of the NMI's priority, the local APIC delivers the pending interrupt immediately after
the previous interrupt handler completes, and control never returns to the rest of the system.
As a result, the system appears to be frozen.
</p>
| 2591 | </div> |
<p>If this happens, it will be impossible to bring the system back to a workable state.
There is no way to protect against this, other than making sure to use a reasonable value
for the counter reset. For example, setting the <code class="constant">CPU_CLK_UNHALTED</code> event with a ridiculously low reset count (e.g. 500)
is likely to freeze the system.
</p>
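<p>Some quick arithmetic shows why a count of 500 is dangerous. CPU_CLK_UNHALTED counts unhalted clock cycles, so on a CPU running at an assumed 2 GHz a reset count of 500 means one NMI every 500 cycles. The handler cost used here (1 microsecond) is a made-up figure for illustration; the real cost depends on the hardware and kernel.</p>

```python
def interrupts_per_second(event_rate_hz: float, reset_count: int) -> float:
    """One counter overflow (NMI) every `reset_count` events."""
    return event_rate_hz / reset_count


def handler_load(event_rate_hz: float, reset_count: int, handler_ns: int) -> float:
    """Fraction of each second spent in the NMI handler.

    A value of 1.0 or more means interrupts arrive faster than they can be
    handled, so control never returns to the rest of the system.
    """
    return interrupts_per_second(event_rate_hz, reset_count) * handler_ns / 1e9


if __name__ == "__main__":
    clock_hz = 2e9       # assumed 2 GHz clock, fully unhalted
    handler_ns = 1000    # hypothetical 1 microsecond per NMI
    for count in (500, 100_000):
        print(count, interrupts_per_second(clock_hz, count), handler_load(clock_hz, count, handler_ns))
```

<p>With these assumed figures, a count of 500 demands four times more handler time than exists (4,000,000 NMIs per second), while a count of 100000 costs about 2% of the CPU.</p>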
<p>
In short: <span><strong class="command">Don't use a foolishly low sample count value</strong></span>. Unfortunately, what counts as foolish
depends on the event type; if in doubt, e-mail </p>
<div class="address">
<p><code class="email">&lt;<a href="mailto:oprofile-list@lists.sf.net">oprofile-list@lists.sf.net</a>&gt;</code>.</p>
</div>
| 2603 | </div> |
| 2604 | </div> |
| 2605 | </div> |
| 2606 | <div class="chapter" lang="en" xml:lang="en"> |
| 2607 | <div class="titlepage"> |
| 2608 | <div> |
| 2609 | <div> |
| 2610 | <h2 class="title"><a id="results"></a>Chapter 4. Obtaining results</h2> |
| 2611 | </div> |
| 2612 | </div> |
| 2613 | </div> |
| 2614 | <div class="toc"> |
| 2615 | <p> |
| 2616 | <b>Table of Contents</b> |
| 2617 | </p> |
| 2618 | <dl> |
| 2619 | <dt> |
| 2620 | <span class="sect1"> |
| 2621 | <a href="#profile-spec">1. Profile specifications</a> |
| 2622 | </span> |
| 2623 | </dt> |
| 2624 | <dd> |
| 2625 | <dl> |
| 2626 | <dt> |
| 2627 | <span class="sect2"> |
| 2628 | <a href="#profile-spec-examples">1.1. Examples</a> |
| 2629 | </span> |
| 2630 | </dt> |
| 2631 | <dt> |
| 2632 | <span class="sect2"> |
| 2633 | <a href="#profile-spec-details">1.2. Profile specification parameters</a> |
| 2634 | </span> |
| 2635 | </dt> |
| 2636 | <dt> |
| 2637 | <span class="sect2"> |
| 2638 | <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> |
| 2639 | </span> |
| 2640 | </dt> |
| 2641 | <dt> |
| 2642 | <span class="sect2"> |
| 2643 | <a href="#no-results">1.4. What to do when you don't get any results</a> |
| 2644 | </span> |
| 2645 | </dt> |
| 2646 | </dl> |
| 2647 | </dd> |
| 2648 | <dt> |
| 2649 | <span class="sect1"> |
| 2650 | <a href="#opreport">2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</a> |
| 2651 | </span> |
| 2652 | </dt> |
| 2653 | <dd> |
| 2654 | <dl> |
| 2655 | <dt> |
| 2656 | <span class="sect2"> |
| 2657 | <a href="#opreport-merging">2.1. Merging separate profiles</a> |
| 2658 | </span> |
| 2659 | </dt> |
| 2660 | <dt> |
| 2661 | <span class="sect2"> |
| 2662 | <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> |
| 2663 | </span> |
| 2664 | </dt> |
| 2665 | <dt> |
| 2666 | <span class="sect2"> |
| 2667 | <a href="#opreport-callgraph">2.3. Callgraph output</a> |
| 2668 | </span> |
| 2669 | </dt> |
| 2670 | <dt> |
| 2671 | <span class="sect2"> |
| 2672 | <a href="#opreport-diff">2.4. Differential profiles with <span><strong class="command">opreport</strong></span></a> |
| 2673 | </span> |
| 2674 | </dt> |
| 2675 | <dt> |
| 2676 | <span class="sect2"> |
| 2677 | <a href="#opreport-anon">2.5. Anonymous executable mappings</a> |
| 2678 | </span> |
| 2679 | </dt> |
| 2680 | <dt> |
| 2681 | <span class="sect2"> |
| 2682 | <a href="#opreport-xml">2.6. XML formatted output</a> |
| 2683 | </span> |
| 2684 | </dt> |
| 2685 | <dt> |
| 2686 | <span class="sect2"> |
| 2687 | <a href="#opreport-options">2.7. Options for <span><strong class="command">opreport</strong></span></a> |
| 2688 | </span> |
| 2689 | </dt> |
| 2690 | </dl> |
| 2691 | </dd> |
| 2692 | <dt> |
| 2693 | <span class="sect1"> |
| 2694 | <a href="#opannotate">3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</a> |
| 2695 | </span> |
| 2696 | </dt> |
| 2697 | <dd> |
| 2698 | <dl> |
| 2699 | <dt> |
| 2700 | <span class="sect2"> |
| 2701 | <a href="#opannotate-finding-source">3.1. Locating source files</a> |
| 2702 | </span> |
| 2703 | </dt> |
| 2704 | <dt> |
| 2705 | <span class="sect2"> |
| 2706 | <a href="#opannotate-details">3.2. Usage of <span><strong class="command">opannotate</strong></span></a> |
| 2707 | </span> |
| 2708 | </dt> |
| 2709 | </dl> |
| 2710 | </dd> |
| 2711 | <dt> |
| 2712 | <span class="sect1"> |
| 2713 | <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> |
| 2714 | </span> |
| 2715 | </dt> |
| 2716 | <dt> |
| 2717 | <span class="sect1"> |
| 2718 | <a href="#opgprof">5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</a> |
| 2719 | </span> |
| 2720 | </dt> |
| 2721 | <dd> |
| 2722 | <dl> |
| 2723 | <dt> |
| 2724 | <span class="sect2"> |
| 2725 | <a href="#opgprof-details">5.1. Usage of <span><strong class="command">opgprof</strong></span></a> |
| 2726 | </span> |
| 2727 | </dt> |
| 2728 | </dl> |
| 2729 | </dd> |
| 2730 | <dt> |
| 2731 | <span class="sect1"> |
| 2732 | <a href="#oparchive">6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</a> |
| 2733 | </span> |
| 2734 | </dt> |
| 2735 | <dd> |
| 2736 | <dl> |
| 2737 | <dt> |
| 2738 | <span class="sect2"> |
| 2739 | <a href="#oparchive-details">6.1. Usage of <span><strong class="command">oparchive</strong></span></a> |
| 2740 | </span> |
| 2741 | </dt> |
| 2742 | </dl> |
| 2743 | </dd> |
| 2744 | <dt> |
| 2745 | <span class="sect1"> |
| 2746 | <a href="#opimport">7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</a> |
| 2747 | </span> |
| 2748 | </dt> |
| 2749 | <dd> |
| 2750 | <dl> |
| 2751 | <dt> |
| 2752 | <span class="sect2"> |
| 2753 | <a href="#opimport-details">7.1. Usage of <span><strong class="command">opimport</strong></span></a> |
| 2754 | </span> |
| 2755 | </dt> |
| 2756 | </dl> |
| 2757 | </dd> |
| 2758 | </dl> |
| 2759 | </div> |
<p>
OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often,
OProfile does a little <span class="emphasis"><em>too</em></span> good a job of keeping overhead low, and no data reaches
the profiler. This can happen on lightly-loaded machines. Remember that you can force a dump at any time with:
</p>
| 2765 | <p> |
| 2766 | <span> |
| 2767 | <strong class="command">opcontrol --dump</strong> |
| 2768 | </span> |
| 2769 | </p> |
<p>Remember to do this before complaining that there is no profiling data!
Now that we've got some data, it has to be processed. That's the job of <span><strong class="command">opreport</strong></span>,
<span><strong class="command">opannotate</strong></span>, or <span><strong class="command">opgprof</strong></span>.
</p>
| 2774 | <div class="sect1" lang="en" xml:lang="en"> |
| 2775 | <div class="titlepage"> |
| 2776 | <div> |
| 2777 | <div> |
| 2778 | <h2 class="title" style="clear: both"><a id="profile-spec"></a>1. Profile specifications</h2> |
| 2779 | </div> |
| 2780 | </div> |
| 2781 | </div> |
<p>
All of the analysis tools take a <span class="emphasis"><em>profile specification</em></span>.
This is a set of definitions that describe which actual profiles should be
examined. The simplest profile specification is empty: this matches all
the available profile files for the current session (this is what happens
when you run <span><strong class="command">opreport</strong></span> without a profile specification).
</p>
<p>
Specification parameters are of the form <code class="option">name:value[,value]</code>.
For example, if I wanted to get a combined symbol summary for
<code class="filename">/bin/myprog</code> and <code class="filename">/bin/myprog2</code>,
I could do <span><strong class="command">opreport -l image:/bin/myprog,/bin/myprog2</strong></span>.
As a special case, you don't actually need to specify the <code class="option">image:</code>
part here; anything left on the command line is assumed to be an
<code class="option">image:</code> name. Similarly, if no <code class="option">session:</code>
is specified, then <code class="option">session:current</code> is assumed ("current"
is a special name for the current or most recent profiling session).
</p>
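<p>The defaulting rules above can be summarized as a small parser. This is a hypothetical sketch, not OProfile's own parsing code: it recognizes the tag names listed under "Profile specification parameters", treats any other argument as an <code class="option">image:</code> value, and ignores escaping such as <code class="option">\,</code>.</p>

```python
from typing import Dict, List

# Tag names taken from the profile specification parameters in this manual.
KNOWN_TAGS = {
    "archive", "session", "session-exclude", "image", "image-exclude",
    "lib-image", "lib-image-exclude", "event", "count", "unit-mask",
    "cpu", "tgid", "tid",
}


def parse_profile_spec(args: List[str]) -> Dict[str, List[str]]:
    """Turn 'name:value[,value]' arguments into a tag -> values mapping."""
    spec: Dict[str, List[str]] = {}
    for arg in args:
        name, sep, value = arg.partition(":")
        if not sep or name not in KNOWN_TAGS:
            # Anything else left on the command line is an image name.
            name, value = "image", arg
        spec.setdefault(name, []).extend(value.split(","))
    # With no session: tag, the current session is assumed.
    spec.setdefault("session", ["current"])
    return spec
```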
<p>
In addition to the comma-separated list shown above, some of the
specification parameters can take <span><strong class="command">glob</strong></span>-style
values. For example, if I want to see image summaries for all
binaries profiled in <code class="filename">/usr/bin/</code>, I could do
<span><strong class="command">opreport image:/usr/bin/\*</strong></span>. Note that the
wildcard must be escaped so that the shell does not expand it.
</p>
| 2808 | <p> |
| 2809 | For <span><strong class="command">opreport</strong></span>, profile specifications can be used to |
| 2810 | define two profiles, giving differential output. This is done by |
| 2811 | enclosing each of the two specifications within curly braces, as shown |
| 2812 | in the examples below. Any specifications outside of curly braces are |
| 2813 | shared across both. |
| 2814 | </p> |
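<p>The brace grouping can be sketched as a function that splits the arguments into the two profile specifications, copying any arguments outside the braces into both. It is a hypothetical illustration, not opreport's implementation, and it assumes each brace is a separate shell argument, as in the examples in the next section.</p>

```python
from typing import List, Optional


def split_differential(tokens: List[str]) -> List[List[str]]:
    """Return one token list per profile: [shared] or [shared+A, shared+B]."""
    shared: List[str] = []
    groups: List[List[str]] = []
    current: Optional[List[str]] = None  # group being read, or None outside braces
    for tok in tokens:
        if tok == "{":
            if current is not None:
                raise ValueError("nested '{' is not allowed")
            current = []
        elif tok == "}":
            if current is None:
                raise ValueError("unmatched '}'")
            groups.append(current)
            current = None
        elif current is None:
            shared.append(tok)
        else:
            current.append(tok)
    if current is not None:
        raise ValueError("missing '}'")
    if not groups:
        return [shared]
    if len(groups) != 2:
        raise ValueError("a differential profile needs exactly two { } groups")
    return [shared + group for group in groups]
```

<p>An empty group, as in <code class="option">{ }</code>, leaves only the shared arguments for that side, which then fall back to the defaults (the current session on the current system).</p>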
| 2815 | <div class="sect2" lang="en" xml:lang="en"> |
| 2816 | <div class="titlepage"> |
| 2817 | <div> |
| 2818 | <div> |
| 2819 | <h3 class="title"><a id="profile-spec-examples"></a>1.1. Examples</h3> |
| 2820 | </div> |
| 2821 | </div> |
| 2822 | </div> |
<p>
Image summaries for all profiles with <code class="constant">DATA_MEM_REFS</code>
samples in the saved session called "stresstest":
</p>
| 2827 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2828 | <tr> |
| 2829 | <td> |
| 2830 | <pre class="screen"> |
| 2831 | # opreport session:stresstest event:DATA_MEM_REFS |
| 2832 | </pre> |
| 2833 | </td> |
| 2834 | </tr> |
| 2835 | </table> |
<p>
Symbol summary for the application called "test_sym53c8xx,9xx". Note that the
escaping is necessary because <code class="option">image:</code> takes a comma-separated list.
</p>
| 2840 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2841 | <tr> |
| 2842 | <td> |
| 2843 | <pre class="screen"> |
| 2844 | # opreport -l ./test/test_sym53c8xx\,9xx |
| 2845 | </pre> |
| 2846 | </td> |
| 2847 | </tr> |
| 2848 | </table> |
<p>
Image summaries for all binaries in the <code class="filename">test</code> directory,
excluding <code class="filename">boring-test</code>:
</p>
| 2853 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2854 | <tr> |
| 2855 | <td> |
| 2856 | <pre class="screen"> |
| 2857 | # opreport image:./test/\* image-exclude:./test/boring-test |
| 2858 | </pre> |
| 2859 | </td> |
| 2860 | </tr> |
| 2861 | </table> |
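<p>The previous example's combination of <code class="option">image:</code> and <code class="option">image-exclude:</code> can be modeled with Python's fnmatch module. This is only an approximation chosen for illustration: fnmatch's <code class="option">*</code> also matches <code class="option">/</code>, and OProfile's own glob handling may differ in details like that.</p>

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_images(images: Iterable[str], include: List[str], exclude: List[str]) -> List[str]:
    """Keep images matching any include pattern and no exclude pattern."""
    return [
        img for img in images
        if any(fnmatchcase(img, pat) for pat in include)
        and not any(fnmatchcase(img, pat) for pat in exclude)
    ]
```

<p>The pattern is written here without the backslash: in the shell command, the backslash only stops the shell from expanding the <code class="option">*</code> and is removed before opreport sees the argument.</p>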
<p>
Differential profile of a binary stored in two archives:
</p>
| 2865 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2866 | <tr> |
| 2867 | <td> |
| 2868 | <pre class="screen"> |
| 2869 | # opreport -l /bin/bash { archive:./orig } { archive:./new } |
| 2870 | </pre> |
| 2871 | </td> |
| 2872 | </tr> |
| 2873 | </table> |
<p>
Differential profile of an archived binary against the current session:
</p>
| 2877 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2878 | <tr> |
| 2879 | <td> |
| 2880 | <pre class="screen"> |
| 2881 | # opreport -l /bin/bash { archive:./orig } { } |
| 2882 | </pre> |
| 2883 | </td> |
| 2884 | </tr> |
| 2885 | </table> |
| 2886 | </div> |
| 2887 | <div class="sect2" lang="en" xml:lang="en"> |
| 2888 | <div class="titlepage"> |
| 2889 | <div> |
| 2890 | <div> |
| 2891 | <h3 class="title"><a id="profile-spec-details"></a>1.2. Profile specification parameters</h3> |
| 2892 | </div> |
| 2893 | </div> |
| 2894 | </div> |
| 2895 | <div class="variablelist"> |
| 2896 | <dl> |
| 2897 | <dt> |
| 2898 | <span class="term"> |
| 2899 | <code class="option">archive:</code> |
| 2900 | <span class="emphasis"> |
| 2901 | <em>archivepath</em> |
| 2902 | </span> |
| 2903 | </span> |
| 2904 | </dt> |
| 2905 | <dd> |
| 2906 | <p> |
| 2907 | A path to an archive made with <span><strong class="command">oparchive</strong></span>. |
| 2908 | Absence of this tag, unlike others, means "the current system", |
| 2909 | equivalent to specifying "archive:". |
| 2910 | </p> |
| 2911 | </dd> |
| 2912 | <dt> |
| 2913 | <span class="term"> |
| 2914 | <code class="option">session:</code> |
| 2915 | <span class="emphasis"> |
| 2916 | <em>sessionlist</em> |
| 2917 | </span> |
| 2918 | </span> |
| 2919 | </dt> |
| 2920 | <dd> |
| 2921 | <p> |
| 2922 | A comma-separated list of session names to resolve in. Absence of this |
| 2923 | tag, unlike others, means "the current session", equivalent to |
| 2924 | specifying "session:current". |
| 2925 | </p> |
| 2926 | </dd> |
| 2927 | <dt> |
| 2928 | <span class="term"> |
| 2929 | <code class="option">session-exclude:</code> |
| 2930 | <span class="emphasis"> |
| 2931 | <em>sessionlist</em> |
| 2932 | </span> |
| 2933 | </span> |
| 2934 | </dt> |
| 2935 | <dd> |
| 2936 | <p> |
| 2937 | A comma-separated list of sessions to exclude. |
| 2938 | </p> |
| 2939 | </dd> |
| 2940 | <dt> |
| 2941 | <span class="term"> |
| 2942 | <code class="option">image:</code> |
| 2943 | <span class="emphasis"> |
| 2944 | <em>imagelist</em> |
| 2945 | </span> |
| 2946 | </span> |
| 2947 | </dt> |
| 2948 | <dd> |
<p>
A comma-separated list of image names to resolve. Each entry may be a relative
path, a <span><strong class="command">glob</strong></span>-style name, or a full path, e.g.</p>
| 2952 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 2953 | <tr> |
| 2954 | <td> |
| 2955 | <pre class="screen">opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</pre> |
| 2956 | </td> |
| 2957 | </tr> |
| 2958 | </table> |
| 2959 | </dd> |
| 2960 | <dt> |
| 2961 | <span class="term"> |
| 2962 | <code class="option">image-exclude:</code> |
| 2963 | <span class="emphasis"> |
| 2964 | <em>imagelist</em> |
| 2965 | </span> |
| 2966 | </span> |
| 2967 | </dt> |
| 2968 | <dd> |
| 2969 | <p> |
| 2970 | Same as <code class="option">image:</code>, but the matching images are excluded. |
| 2971 | </p> |
| 2972 | </dd> |
| 2973 | <dt> |
| 2974 | <span class="term"> |
| 2975 | <code class="option">lib-image:</code> |
| 2976 | <span class="emphasis"> |
| 2977 | <em>imagelist</em> |
| 2978 | </span> |
| 2979 | </span> |
| 2980 | </dt> |
| 2981 | <dd> |
<p>
Same as <code class="option">image:</code>, but only for images that are associated with
a particular primary binary image (namely, an application), such as the shared
libraries it uses. This only makes sense if you're using <code class="option">--separate</code>.
When using <code class="option">--separate=kernel</code>, this also includes kernel
modules and the kernel itself.
</p>
| 2989 | </dd> |
| 2990 | <dt> |
| 2991 | <span class="term"> |
| 2992 | <code class="option">lib-image-exclude:</code> |
| 2993 | <span class="emphasis"> |
| 2994 | <em>imagelist</em> |
| 2995 | </span> |
| 2996 | </span> |
| 2997 | </dt> |
| 2998 | <dd> |
| 2999 | <p> |
| 3000 | Same as <code class="option">lib-image:</code>, but the matching images |
| 3001 | are excluded. |
| 3002 | </p> |
| 3003 | </dd> |
| 3004 | <dt> |
| 3005 | <span class="term"> |
| 3006 | <code class="option">event:</code> |
| 3007 | <span class="emphasis"> |
| 3008 | <em>eventlist</em> |
| 3009 | </span> |
| 3010 | </span> |
| 3011 | </dt> |
| 3012 | <dd> |
| 3013 | <p> |
| 3014 | The symbolic event name to match on, e.g. <code class="option">event:DATA_MEM_REFS</code>. |
| 3015 | You can pass a list of events for side-by-side comparison with <span><strong class="command">opreport</strong></span>. |
| 3016 | When using the timer interrupt, the event is always "TIMER". |
| 3017 | </p> |
| 3018 | </dd> |
| 3019 | <dt> |
| 3020 | <span class="term"> |
| 3021 | <code class="option">count:</code> |
| 3022 | <span class="emphasis"> |
| 3023 | <em>eventcountlist</em> |
| 3024 | </span> |
| 3025 | </span> |
| 3026 | </dt> |
| 3027 | <dd> |
<p>
The event count to match on, e.g. <code class="option">event:DATA_MEM_REFS count:30000</code>.
Note that this value refers to the setting used for <span><strong class="command">opcontrol</strong></span>
only, and has nothing to do with the sample counts in the profile data
itself.
You can pass a list of counts for side-by-side comparison with <span><strong class="command">opreport</strong></span>.
When using the timer interrupt, the count is always 0 (indicating it cannot be set).
</p>
| 3036 | </dd> |
| 3037 | <dt> |
| 3038 | <span class="term"> |
| 3039 | <code class="option">unit-mask:</code> |
| 3040 | <span class="emphasis"> |
| 3041 | <em>masklist</em> |
| 3042 | </span> |
| 3043 | </span> |
| 3044 | </dt> |
| 3045 | <dd> |
<p>
The unit mask value of the event to match on, e.g. <code class="option">unit-mask:1</code>.
You can pass a list of unit masks for side-by-side comparison with <span><strong class="command">opreport</strong></span>.
</p>
| 3050 | </dd> |
| 3051 | <dt> |
| 3052 | <span class="term"> |
| 3053 | <code class="option">cpu:</code> |
| 3054 | <span class="emphasis"> |
| 3055 | <em>cpulist</em> |
| 3056 | </span> |
| 3057 | </span> |
| 3058 | </dt> |
| 3059 | <dd> |
<p>
Only consider profiles for the given numbered CPUs (numbering starts from zero).
This is only useful when using CPU profile separation.
</p>
| 3064 | </dd> |
| 3065 | <dt> |
| 3066 | <span class="term"> |
| 3067 | <code class="option">tgid:</code> |
| 3068 | <span class="emphasis"> |
| 3069 | <em>pidlist</em> |
| 3070 | </span> |
| 3071 | </span> |
| 3072 | </dt> |
| 3073 | <dd> |
<p>
Only consider profiles for the given task groups. The task group ID of a
single-threaded process is the same as its process ID; with recent thread
libraries, all threads of a process share one task group ID. This option
corresponds to the POSIX notion of a thread group.
This is only useful when using per-process profile separation.
</p>
| 3081 | </dd> |
| 3082 | <dt> |
| 3083 | <span class="term"> |
| 3084 | <code class="option">tid:</code> |
| 3085 | <span class="emphasis"> |
| 3086 | <em>tidlist</em> |
| 3087 | </span> |
| 3088 | </span> |
| 3089 | </dt> |
| 3090 | <dd> |
| 3091 | <p> |
| 3092 | Only consider profiles for the given threads. When using |
| 3093 | recent thread libraries, all threads in a process share the |
| 3094 | same task group ID, but have different thread IDs. You can |
| 3095 | use this option in combination with <code class="option">tgid:</code> to |
| 3096 | restrict the results to particular threads within a process. |
| 3097 | This is only useful when using per-process profile separation. |
| 3098 | </p> |
| 3099 | </dd> |
| 3100 | </dl> |
| 3101 | </div> |
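<p>The relationship between task group IDs and thread IDs is easy to observe from any threaded program. The following Python sketch (Python 3.8 or later, for threading.get_native_id) starts a few threads that all report the same process ID, which is the task group ID, but distinct kernel thread IDs. It only demonstrates the IDs that <code class="option">tgid:</code> and <code class="option">tid:</code> refer to; it does not interact with OProfile.</p>

```python
import os
import threading
from typing import List, Tuple


def collect_ids(n_threads: int = 3) -> List[Tuple[int, int]]:
    """Return (process ID, native thread ID) as seen from each worker thread."""
    results: List[Tuple[int, int]] = []
    lock = threading.Lock()
    # The barrier keeps every worker alive at the same moment, so their
    # thread IDs are guaranteed to be distinct.
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        barrier.wait()
        with lock:
            results.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(f"tgid={pid} tid={tid}")
```

<p>On Linux, the main thread's ID is the same as the process ID, which is why a single-threaded program shows identical tgid and tid values.</p>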
| 3102 | </div> |
| 3103 | <div class="sect2" lang="en" xml:lang="en"> |
| 3104 | <div class="titlepage"> |
| 3105 | <div> |
| 3106 | <div> |
| 3107 | <h3 class="title"><a id="locating-and-managing-binary-images"></a>1.3. Locating and managing binary images</h3> |
| 3108 | </div> |
| 3109 | </div> |
| 3110 | </div> |
| 3111 | <p> |
| 3112 | Each session's sample files can be found in the $SESSION_DIR/samples/ directory (default: <code class="filename">/var/lib/oprofile/samples/</code>). |
| 3113 | These are used, along with the binary image files, to produce human-readable data. |
| 3114 | In some circumstances (kernel modules in an initrd, or modules on 2.6 kernels), OProfile |
| 3115 | will not be able to find the binary images. All the tools have an <code class="option">--image-path</code> |
| 3116 | option to which you can pass a comma-separated list of alternate paths to search. For example, |
| 3117 | I can let OProfile find my 2.6 modules by using <span><strong class="command">--image-path /lib/modules/2.6.0/kernel/</strong></span>. |
| 3118 | It is your responsibility to ensure that the correct images are found when using this |
| 3119 | option. |
| 3120 | </p> |
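<p>Conceptually, an alternate image path is a list of directories to search for a binary with a matching name. The following sketch assumes a recursive search by file name, which is an assumption and not a description of OProfile's actual lookup. It returns every match, so ambiguous results (which you would need to resolve yourself) are visible; the directory and module names in the demo are hypothetical.</p>

```python
import os
import tempfile
from typing import List


def find_image(basename: str, image_path: str) -> List[str]:
    """Search each directory in a comma-separated list, recursively,
    for files named `basename`; return all matches, sorted."""
    matches: List[str] = []
    for top in filter(None, image_path.split(",")):
        for dirpath, _dirnames, filenames in os.walk(top):
            if basename in filenames:
                matches.append(os.path.join(dirpath, basename))
    return sorted(matches)


def demo() -> List[str]:
    """Build two throwaway module trees and search both of them."""
    with tempfile.TemporaryDirectory() as tmp:
        for sub in (os.path.join("a", "drivers", "net"), "b"):
            os.makedirs(os.path.join(tmp, sub))
            with open(os.path.join(tmp, sub, "foo.ko"), "w"):
                pass  # an empty placeholder file is enough for the search
        path = ",".join(os.path.join(tmp, d) for d in ("a", "b"))
        return [os.path.relpath(p, tmp) for p in find_image("foo.ko", path)]


if __name__ == "__main__":
    print(demo())
```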
<p>
Note that if a binary image changes after the sample file was created, you won't be able to get useful
symbol-based data out. OProfile detects this situation for you. If you replace a binary, make sure
to save the old binary if you need to do comparative profiles.
</p>
| 3126 | </div> |
| 3127 | <div class="sect2" lang="en" xml:lang="en"> |
| 3128 | <div class="titlepage"> |
| 3129 | <div> |
| 3130 | <div> |
| 3131 | <h3 class="title"><a id="no-results"></a>1.4. What to do when you don't get any results</h3> |
| 3132 | </div> |
| 3133 | </div> |
| 3134 | </div> |
<p>
When attempting to get output, you may see the error:
</p>
| 3138 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3139 | <tr> |
| 3140 | <td> |
| 3141 | <pre class="screen"> |
| 3142 | error: no sample files found: profile specification too strict ? |
| 3143 | </pre> |
| 3144 | </td> |
| 3145 | </tr> |
| 3146 | </table> |
| 3147 | <p> |
| 3148 | What this is saying is that the profile specification you passed in, |
| 3149 | when matched against the available sample files, resulted in no matches. |
| 3150 | There are a number of reasons this might happen: |
| 3151 | </p> |
| 3152 | <div class="variablelist"> |
| 3153 | <dl> |
| 3154 | <dt> |
| 3155 | <span class="term">spelling</span> |
| 3156 | </dt> |
| 3157 | <dd> |
<p>
You specified a binary name, but misspelled it. Check your spelling!
</p>
| 3161 | </dd> |
| 3162 | <dt> |
| 3163 | <span class="term">profiler wasn't running</span> |
| 3164 | </dt> |
| 3165 | <dd> |
| 3166 | <p> |
| 3167 | Make very sure that OProfile was actually up and running when you ran |
| 3168 | the binary. |
| 3169 | </p> |
| 3170 | </dd> |
| 3171 | <dt> |
| 3172 | <span class="term">binary didn't run long enough</span> |
| 3173 | </dt> |
| 3174 | <dd> |
<p>
Remember that OProfile is a statistical profiler: you're not guaranteed to
get samples for short-running programs. You can improve your chances by using a
lower count for the performance counter, so that many more samples are
taken per second (but see <a href="#misuse">Dangerous counter settings</a>).
</p>
| 3181 | </dd> |
| 3182 | <dt> |
| 3183 | <span class="term">binary spent most of its time in libraries</span> |
| 3184 | </dt> |
| 3185 | <dd> |
| 3186 | <p> |
Similarly, if the program spends little time in its own binary image,
with most of its time spent in the shared libraries it uses, you might
not see any samples for the binary image itself. You can check this
| 3190 | by using <span><strong class="command">opcontrol --separate=lib</strong></span> before the |
| 3191 | profiling session, so <span><strong class="command">opreport</strong></span> and friends show |
| 3192 | the library profiles on a per-application basis. |
| 3193 | </p> |
| 3194 | </dd> |
| 3195 | <dt> |
| 3196 | <span class="term">specification was really too strict</span> |
| 3197 | </dt> |
| 3198 | <dd> |
| 3199 | <p> |
| 3200 | For example, you specified something like <code class="option">tgid:3433</code>, |
| 3201 | but no task with that group ID ever ran the code. |
| 3202 | </p> |
| 3203 | </dd> |
| 3204 | <dt> |
| 3205 | <span class="term">binary didn't generate any events</span> |
| 3206 | </dt> |
| 3207 | <dd> |
| 3208 | <p> |
If you're using a particular event counter, for example one counting MMX
operations, the code might simply not have generated any events in the
first place. Verify that the code you're profiling does what you expect it
to.
| 3213 | </p> |
| 3214 | </dd> |
| 3215 | <dt> |
<span class="term">you didn't specify the kernel module name correctly</span>
| 3217 | </dt> |
| 3218 | <dd> |
| 3219 | <p> |
If you're using a 2.6 kernel and trying to get reports for a kernel
module, make sure to use the <code class="option">-p</code> option, and specify the
module name <span class="emphasis"><em>with</em></span> the <code class="filename">.ko</code>
extension. Also check whether the module is one that was loaded from the initrd.
| 3224 | </p> |
| 3225 | </dd> |
| 3226 | </dl> |
| 3227 | </div> |
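<p>
As a rough guide to how short is "too short", the expected number of samples is approximately
the run time multiplied by the rate at which the counted event occurs, divided by the counter's
count. The C sketch below assumes the event occurs at a steady rate for the whole run (for
instance, unhalted clock cycles on a CPU that is busy running the program); real event rates
vary, so treat the result as an estimate only. The function name <code class="function">expected_samples()</code>
is hypothetical.
</p>

```c
/*
 * Rough estimate of the number of samples a statistical profiler can
 * collect for a program. Assumes the counted event occurs at a steady
 * rate (events_per_second) for the whole run, and that one sample is
 * taken every `count` events.
 */
double expected_samples(double seconds, double events_per_second,
                        unsigned long count)
{
    if (count == 0)
        return 0.0;    /* invalid count: avoid dividing by zero */
    return seconds * events_per_second / (double)count;
}
```

<p>
For example, with the PIII figures used in the <span><strong class="command">opreport</strong></span>
examples in the next section (863.195 MHz, CPU_CLK_UNHALTED with count 23150),
<code class="function">expected_samples(0.01, 863.195e6, 23150)</code> gives roughly 373 samples for a
10 ms run that keeps the CPU busy, whereas a count of 100000 gives only about 86. Halving the
count doubles the expected number of samples.
</p>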
| 3228 | </div> |
| 3229 | </div> |
| 3230 | <div class="sect1" lang="en" xml:lang="en"> |
| 3231 | <div class="titlepage"> |
| 3232 | <div> |
| 3233 | <div> |
| 3234 | <h2 class="title" style="clear: both"><a id="opreport"></a>2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</h2> |
| 3235 | </div> |
| 3236 | </div> |
| 3237 | </div> |
| 3238 | <p> |
The <span><strong class="command">opreport</strong></span> utility is the main tool you will use for
getting formatted data out of OProfile. It produces two types of data: image summaries
| 3241 | and symbol summaries. An image summary lists the number of samples for individual |
| 3242 | binary images such as libraries or applications. Symbol summaries provide per-symbol |
| 3243 | profile data. In the following example, we're getting an image summary for the whole |
| 3244 | system: |
| 3245 | </p> |
| 3246 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3247 | <tr> |
| 3248 | <td> |
| 3249 | <pre class="screen"> |
| 3250 | $ opreport --long-filenames |
| 3251 | CPU: PIII, speed 863.195 MHz (estimated) |
| 3252 | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 |
| 3253 | 905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus |
| 3254 | 214320 14.1338 /boot/2.6.0/vmlinux |
| 3255 | 103450 6.8222 /lib/i686/libc-2.3.2.so |
| 3256 | 60160 3.9674 /usr/local/bin/madplay |
| 3257 | 31769 2.0951 /usr/local/oprofile-pp/bin/oprofiled |
| 3258 | 26550 1.7509 /usr/lib/libartsflow.so.1.0.0 |
| 3259 | 23906 1.5765 /usr/bin/as |
| 3260 | 18770 1.2378 /oprofile |
| 3261 | 15528 1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5 |
| 3262 | 11979 0.7900 /usr/X11R6/bin/XFree86 |
| 3263 | 11328 0.7471 /bin/bash |
| 3264 | ... |
| 3265 | </pre> |
| 3266 | </td> |
| 3267 | </tr> |
| 3268 | </table> |
| 3269 | <p> |
| 3270 | If we had specified <code class="option">--symbols</code> in the previous command, we would have |
| 3271 | gotten a symbol summary of all the images across the entire system. We can restrict this to only |
| 3272 | part of the system profile; for example, |
below is a symbol summary of the OProfile daemon. Note that because we used
<span><strong class="command">opcontrol --separate=kernel</strong></span>, symbols from other images that <span><strong class="command">oprofiled</strong></span>
has used (here, the kernel) are also shown.
| 3276 | </p> |
| 3277 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3278 | <tr> |
| 3279 | <td> |
| 3280 | <pre class="screen"> |
| 3281 | $ opreport -l `which oprofiled` 2>/dev/null | more |
| 3282 | CPU: PIII, speed 863.195 MHz (estimated) |
| 3283 | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 |
| 3284 | vma samples % image name symbol name |
| 3285 | 0804be10 14971 28.1993 oprofiled odb_insert |
| 3286 | 0804afdc 7144 13.4564 oprofiled pop_buffer_value |
| 3287 | c01daea0 6113 11.5144 vmlinux __copy_to_user_ll |
| 3288 | 0804b060 2816 5.3042 oprofiled opd_put_sample |
| 3289 | 0804b4a0 2147 4.0441 oprofiled opd_process_samples |
| 3290 | 0804acf4 1855 3.4941 oprofiled opd_put_image_sample |
| 3291 | 0804ad84 1766 3.3264 oprofiled opd_find_image |
| 3292 | 0804a5ec 1084 2.0418 oprofiled opd_find_module |
| 3293 | 0804ba5c 741 1.3957 oprofiled odb_hash_add_node |
| 3294 | ... |
| 3295 | </pre> |
| 3296 | </td> |
| 3297 | </tr> |
| 3298 | </table> |
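<p>
If you post-process <span><strong class="command">opreport</strong></span> text output with your own tools,
each row of a symbol summary like the one above can be split into its columns. The following C
sketch is a hypothetical helper, not part of OProfile; it assumes whitespace-separated columns in
the order shown above (vma, samples, %, image name, symbol name) and does not handle names that
contain spaces, such as "(no symbols)". For anything beyond quick scripts, the XML output
described later in this chapter is a sturdier choice.
</p>

```c
#include <stdio.h>

/* One row of a symbol summary that includes the vma column, e.g.
 * "0804be10 14971 28.1993 oprofiled odb_insert". */
struct symbol_row {
    unsigned long vma;       /* symbol address (hex) */
    unsigned long samples;   /* number of samples */
    double percent;          /* share of all samples, in percent */
    char image[256];         /* image name */
    char symbol[256];        /* symbol name */
};

/* Returns 1 if all five columns were parsed, 0 otherwise (for example
 * on the header line). Names containing spaces or longer than 255
 * characters are not supported. */
int parse_symbol_row(const char *line, struct symbol_row *row)
{
    return sscanf(line, "%lx %lu %lf %255s %255s",
                  &row->vma, &row->samples, &row->percent,
                  row->image, row->symbol) == 5;
}
```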
| 3299 | <p> |
These are the two modes you are most likely to use regularly, but <span><strong class="command">opreport</strong></span>
can do a lot more, as described below.
| 3302 | </p> |
| 3303 | <div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-merging"></a>2.1. Merging separate profiles</h3></div></div></div> |
| 3304 | |
<p>
If you have used one of the <code class="option">--separate=</code> options
whilst profiling, there can be several separate profiles for
a single binary image within a session. Normally the output
keeps these profiles separate (so, for example, the image summary
output shows library image summaries on a per-application basis
when using <code class="option">--separate=lib</code>).
Sometimes it can be useful to merge these results back together
before producing a report. The <code class="option">--merge</code> option allows
you to do that.
</p>
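<p>
Conceptually, merging library profiles that were separated per application amounts to adding
up the samples recorded for the same library image under every application. The C sketch below
illustrates only that idea; the <code class="function">lib_profile</code> structure and
<code class="function">merged_samples()</code> function are hypothetical and do not reflect how
OProfile stores its sample files.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one library profile separated per application,
 * as produced when profiling with --separate=lib. */
struct lib_profile {
    const char *app;          /* owning application */
    const char *image;        /* library image */
    unsigned long samples;    /* samples for this (app, image) pair */
};

/* Add up the samples of every record for `image`, regardless of the
 * application it was separated under. */
unsigned long merged_samples(const struct lib_profile *p, size_t n,
                             const char *image)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; ++i)
        if (strcmp(p[i].image, image) == 0)
            total += p[i].samples;
    return total;
}
```

<p>
For instance, with the hypothetical records {"bash", "libc.so", 100}, {"madplay", "libc.so", 50}
and {"bash", "ld.so", 10}, <code class="function">merged_samples()</code> reports 150 samples for
<code class="filename">libc.so</code>.
</p>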
| 3314 | </div> |
| 3315 | <div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-comparison"></a>2.2. Side-by-side multiple results</h3></div></div></div> |
<p>
If you have used multiple events when profiling, by default <span><strong class="command">opreport</strong></span>
shows each event's sample values side by side.
You can restrict which events are listed with suitable profile specifications,
such as <code class="option">event:</code>.
</p>
| 3320 | </div> |
| 3321 | <div class="sect2" lang="en" xml:lang="en"> |
| 3322 | <div class="titlepage"> |
| 3323 | <div> |
| 3324 | <div> |
| 3325 | <h3 class="title"><a id="opreport-callgraph"></a>2.3. Callgraph output</h3> |
| 3326 | </div> |
| 3327 | </div> |
| 3328 | </div> |
| 3329 | <p> |
| 3330 | This section provides details on how to use the OProfile callgraph feature. |
| 3331 | </p> |
| 3332 | <div class="sect3" lang="en" xml:lang="en"> |
| 3333 | <div class="titlepage"> |
| 3334 | <div> |
| 3335 | <div> |
| 3336 | <h4 class="title"><a id="op-cg1"></a>2.3.1. Callgraph details</h4> |
| 3337 | </div> |
| 3338 | </div> |
| 3339 | </div> |
| 3340 | <p> |
If you profile with the <code class="option">opcontrol --callgraph</code> option, the output shows
which functions are calling which other functions. Consider the
following program:
| 3344 | </p> |
| 3345 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3346 | <tr> |
| 3347 | <td> |
| 3348 | <pre class="screen"> |
#define _GNU_SOURCE     /* for strfry() */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define SIZE 500000

static int compare(const void *s1, const void *s2)
{
    /* qsort() passes pointers to the array elements, i.e. to char * */
    return strcmp(*(char * const *)s1, *(char * const *)s2);
}

static void repeat(void)
{
    int i;
    char *strings[SIZE];
    char str[] = "abcdefghijklmnopqrstuvwxyz";

    for (i = 0; i < SIZE; ++i) {
        strings[i] = strdup(str);
        strfry(strings[i]);
    }

    qsort(strings, SIZE, sizeof(char *), compare);

    /* release the copies so that repeated calls do not leak memory */
    for (i = 0; i < SIZE; ++i)
        free(strings[i]);
}

int main(void)
{
    while (1)
        repeat();
}
| 3379 | </pre> |
| 3380 | </td> |
| 3381 | </tr> |
| 3382 | </table> |
| 3383 | <p> |
| 3384 | When running with the call-graph option, OProfile will |
| 3385 | record the function stack every time it takes a sample. |
| 3386 | <span><strong class="command">opreport --callgraph</strong></span> outputs an entry for each |
| 3387 | function, where each entry looks similar to: |
| 3388 | </p> |
| 3389 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3390 | <tr> |
| 3391 | <td> |
| 3392 | <pre class="screen"> |
| 3393 | samples % image name symbol name |
| 3394 | 197 0.1548 cg main |
| 3395 | 127036 99.8452 cg repeat |
| 3396 | 84590 42.5084 libc-2.3.2.so strfry |
| 3397 | 84590 66.4838 libc-2.3.2.so strfry [self] |
| 3398 | 39169 30.7850 libc-2.3.2.so random_r |
| 3399 | 3475 2.7312 libc-2.3.2.so __i686.get_pc_thunk.bx |
| 3400 | ------------------------------------------------------------------------------- |
| 3401 | </pre> |
| 3402 | </td> |
| 3403 | </tr> |
| 3404 | </table> |
| 3405 | <p> |
| 3406 | Here the non-indented line is the function we're focussing upon |
| 3407 | (<code class="function">strfry()</code>). This |
line is the same as you'd get from normal <span><strong class="command">opreport</strong></span>
output.
| 3410 | </p> |
| 3411 | <p> |
| 3412 | Above the non-indented line we find the functions that called this |
| 3413 | function (for example, <code class="function">repeat()</code> calls |
| 3414 | <code class="function">strfry()</code>). The samples and percentage values here |
| 3415 | refer to the number of times we took a sample where this call was found |
| 3416 | in the stack; the percentage is relative to all other callers of the |
| 3417 | function we're focussing on. Note that these values are |
| 3418 | <span class="emphasis"><em>not</em></span> call counts; they only reflect the call stack |
| 3419 | every time a sample is taken; that is, if a call is found in the stack |
| 3420 | at the time of a sample, it is recorded in this count. |
| 3421 | </p> |
| 3422 | <p> |
| 3423 | Below the line are functions that are called by |
<code class="function">strfry()</code> (the <span class="emphasis"><em>callees</em></span>).
| 3425 | It's clear here that <code class="function">strfry()</code> calls |
| 3426 | <code class="function">random_r()</code>. We also see a special entry with a |
| 3427 | "[self]" marker. This records the normal samples for the function, but |
the percentage becomes relative to all callees. This allows you to
compare the time spent in the function itself with the time spent in the
functions it calls. Note that if a function calls itself, then it will appear in the
| 3431 | list of callees of itself, but without the "[self]" marker; so recursive |
| 3432 | calls are still clearly separable. |
| 3433 | </p> |
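<p>
The percentages in a call-graph entry can be reproduced from its sample counts. In the example
above, the two callers of <code class="function">strfry()</code> have 197 + 127036 = 127233 samples
between them, so <code class="function">main()</code> gets 197 / 127233, or 0.1548%; the callees,
including the "[self]" entry, have 84590 + 39169 + 3475 = 127234 samples, so "[self]" gets
84590 / 127234, or 66.4838%. The small C helper below, a hypothetical illustration rather than
OProfile code, performs that calculation for one group of entries.
</p>

```c
#include <stddef.h>

/* Percentage of `part` relative to the sum of all entries in `group`
 * (for example, all callers of a function, or all of its callees
 * including the "[self]" entry). Returns 0 for an empty group. */
double percent_of(unsigned long part, const unsigned long *group, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; ++i)
        total += group[i];
    return total ? 100.0 * (double)part / (double)total : 0.0;
}
```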
| 3434 | <p> |
| 3435 | You may have noticed that the output lists <code class="function">main()</code> |
| 3436 | as calling <code class="function">strfry()</code>, but it's clear from the source |
| 3437 | that this doesn't actually happen. See <a href="#interpreting-callgraph" title="3. Interpreting call-graph profiles">Section 3, “Interpreting call-graph profiles”</a> for an explanation. |
| 3438 | </p> |
| 3439 | </div> |
| 3440 | <div class="sect3" lang="en" xml:lang="en"> |
| 3441 | <div class="titlepage"> |
| 3442 | <div> |
| 3443 | <div> |
| 3444 | <h4 class="title"><a id="cg-with-jitsupport"></a>2.3.2. Callgraph and JIT support</h4> |
| 3445 | </div> |
| 3446 | </div> |
| 3447 | </div> |
| 3448 | <p> |
| 3449 | Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading. |
| 3450 | For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory |
| 3451 | named <code class="filename">{anon:anon}/<tgid>.<begin_addr>.<end_addr></code>. |
| 3452 | As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java, |
| 3453 | OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code. |
However, when viewing callgraph output, any anonymously mapped code in the callstack
will be attributed to <code class="filename">anon (tgid:<tgid> range:<begin_addr>-<end_addr>)</code>,
even if a <code class="filename">.jo</code> ELF file has been created for it. See the example below.
| 3457 | </p> |
| 3458 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3459 | <tr> |
| 3460 | <td> |
| 3461 | <pre class="screen"> |
| 3462 | ------------------------------------------------------------------------------- |
| 3463 | 1 2.2727 libj9ute23.so java.bin traceV |
| 3464 | 2 4.5455 libj9ute23.so java.bin utsTraceV |
| 3465 | 4 9.0909 libj9trc23.so java.bin fillInUTInterfaces |
| 3466 | 37 84.0909 libj9trc23.so java.bin twGetSequenceCounter |
| 3467 | 8 0.0154 libj9prt23.so java.bin j9time_hires_clock |
| 3468 | 27 61.3636 anon (tgid:10014 range:0x100000-0x103000) java.bin (no symbols) |
| 3469 | 9 20.4545 libc-2.4.so java.bin gettimeofday |
| 3470 | 8 18.1818 libj9prt23.so java.bin j9time_hires_clock [self] |
| 3471 | ------------------------------------------------------------------------------- |
| 3472 | </pre> |
| 3473 | </td> |
| 3474 | </tr> |
| 3475 | </table> |
| 3476 | <p> |
| 3477 | The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of |
| 3478 | <code class="code">j9time_hires_clock</code>, even though the ELF file <code class="filename">10014.jo</code> was |
| 3479 | created for this profile run. Unfortunately, there is currently no way to correlate |
| 3480 | that anonymous callgraph entry with its corresponding <code class="filename">.jo</code> file. |
| 3481 | </p> |
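<p>
Although such an entry cannot be mapped back to its <code class="filename">.jo</code> file, the
label itself carries the thread group ID and the address range of the mapping. If you
post-process the text output, a hypothetical helper such as the C sketch below can extract
those values; it assumes the label format shown in the example output above.
</p>

```c
#include <stdio.h>

/* Values encoded in a label such as
 * "anon (tgid:10014 range:0x100000-0x103000)". */
struct anon_label {
    int tgid;               /* thread group ID of the process */
    unsigned long begin;    /* start address of the mapping */
    unsigned long end;      /* end address of the mapping */
};

/* Returns 1 if `text` starts with a label in the format above, 0
 * otherwise. The %lx conversion accepts the optional 0x prefix. */
int parse_anon_label(const char *text, struct anon_label *l)
{
    return sscanf(text, "anon (tgid:%d range:%lx-%lx)",
                  &l->tgid, &l->begin, &l->end) == 3;
}
```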
| 3482 | </div> |
| 3483 | </div> |
| 3484 | <div class="sect2" lang="en" xml:lang="en"> |
| 3485 | <div class="titlepage"> |
| 3486 | <div> |
| 3487 | <div> |
| 3488 | <h3 class="title"><a id="opreport-diff"></a>2.4. Differential profiles with <span><strong class="command">opreport</strong></span></h3> |
| 3489 | </div> |
| 3490 | </div> |
| 3491 | </div> |
| 3492 | <p> |
| 3493 | Often, we'd like to be able to compare two profiles. For example, when |
| 3494 | analysing the performance of an application, we'd like to make code |
| 3495 | changes and examine the effect of the change. This is supported in |
| 3496 | <span><strong class="command">opreport</strong></span> by giving a profile specification that |
identifies two different profiles. The general form is:
| 3498 | </p> |
| 3499 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3500 | <tr> |
| 3501 | <td> |
| 3502 | <pre class="screen"> |
| 3503 | $ opreport <shared-spec> { <first-profile> } { <second-profile> } |
| 3504 | </pre> |
| 3505 | </td> |
| 3506 | </tr> |
| 3507 | </table> |
| 3508 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 3509 | <h3 class="title">Note</h3> |
| 3510 | <p> |
| 3511 | We lost our Dragon book down the back of the sofa, so you have to be |
| 3512 | careful to have spaces around those braces, or things will get |
| 3513 | hopelessly confused. We can only apologise. |
| 3514 | </p> |
| 3515 | </div> |
| 3516 | <p> |
The shared section is prepended to each of the two profile specifications
in curly braces, and each resulting specification is then analysed. The usual
parameters work both within the shared section and within the sub-specifications
in the curly braces.
| 3520 | </p> |
| 3521 | <p> |
| 3522 | A typical way to use this feature is with archives created with |
| 3523 | <span><strong class="command">oparchive</strong></span>. Let's look at an example: |
| 3524 | </p> |
| 3525 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3526 | <tr> |
| 3527 | <td> |
| 3528 | <pre class="screen"> |
| 3529 | $ ./a |
| 3530 | $ oparchive -o orig ./a |
| 3531 | $ opcontrol --reset |
| 3532 | # edit and recompile a |
| 3533 | $ ./a |
| 3534 | # now compare the current profile of a with the archived profile |
| 3535 | $ opreport -xl ./a { archive:./orig } { } |
| 3536 | CPU: PIII, speed 863.233 MHz (estimated) |
| 3537 | Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a |
| 3538 | unit mask of 0x00 (No unit mask) count 100000 |
| 3539 | samples % diff % symbol name |
| 3540 | 92435 48.5366 +0.4999 a |
| 3541 | 54226 --- --- c |
| 3542 | 49222 25.8459 +++ d |
| 3543 | 48787 25.6175 -2.2e-01 b |
| 3544 | </pre> |
| 3545 | </td> |
| 3546 | </tr> |
| 3547 | </table> |
| 3548 | <p> |
| 3549 | Note that we specified an empty second profile in the curly braces, as |
| 3550 | we wanted to use the current session; alternatively, we could |
| 3551 | have specified another archive, or a tgid etc. We specified the binary |
| 3552 | <span><strong class="command">a</strong></span> in the shared section, so we matched that in both |
| 3553 | the profiles we're diffing. |
| 3554 | </p> |
| 3555 | <p> |
| 3556 | As in the normal output, the results are sorted by the number of |
| 3557 | samples, and the percentage field represents the relative percentage of |
| 3558 | the symbol's samples in the second profile. |
| 3559 | </p> |
| 3560 | <p> |
| 3561 | Notice the new column in the output. This value represents the |
| 3562 | percentage change of the relative percent between the first and the |
| 3563 | second profile: roughly, "how much more important this symbol is". |
| 3564 | Looking at the symbol <code class="function">a()</code>, we can see that it took |
| 3565 | roughly the same amount of the total profile in both the first and the |
second profile. The function <code class="function">c()</code> was not in the new
profile, so it has been marked with <code class="function">---</code>. Note that the
| 3568 | sample value is the number of samples in the first profile; since we're |
| 3569 | displaying results for the second profile, we don't list a percentage |
| 3570 | value for it, as it would be meaningless. <code class="function">d()</code> is |
| 3571 | new in the second profile, and consequently marked with |
| 3572 | <code class="function">+++</code>. |
| 3573 | </p> |
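<p>
Reading the diff % column as the relative change in a symbol's share of the profile, its value
would be 100 * (second% - first%) / first%. The C sketch below formats the column under that
assumption, using "+++" and "---" as described above. The function name is hypothetical, and
the exact number format that <span><strong class="command">opreport</strong></span> prints (such as
-2.2e-01 in the example) is not reproduced.
</p>

```c
#include <stdio.h>

/*
 * Format a "diff %" value for one symbol, assuming it is the relative
 * change of the symbol's share between the two profiles:
 *   100 * (second_pct - first_pct) / first_pct
 * A symbol missing from the second profile is shown as "---"; one that
 * is new in the second profile is shown as "+++".
 */
void format_diff(double first_pct, double second_pct,
                 int in_first, int in_second, char *buf, size_t len)
{
    if (!in_second)
        snprintf(buf, len, "---");
    else if (!in_first || first_pct == 0.0)
        snprintf(buf, len, "+++");
    else
        snprintf(buf, len, "%+.4f",
                 100.0 * (second_pct - first_pct) / first_pct);
}
```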
| 3574 | <p> |
| 3575 | When comparing profiles between different binaries, it should be clear |
| 3576 | that functions can change in terms of VMA and size. To avoid this |
| 3577 | problem, <span><strong class="command">opreport</strong></span> considers a symbol to be the same |
| 3578 | if the symbol name, image name, and owning application name all match; |
| 3579 | any other factors are ignored. Note that the check for application name |
| 3580 | means that trying to compare library profiles between two different |
| 3581 | applications will not work as you might expect: each symbol will be |
| 3582 | considered different. |
| 3583 | </p> |
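<p>
The matching rule above amounts to a comparison of three names. In this C sketch, the
<code class="function">symbol_key</code> structure and <code class="function">same_symbol()</code>
function are hypothetical illustrations; VMA and size are deliberately left out of the key,
which is why a symbol still matches after a recompile moves it.
</p>

```c
#include <string.h>

/* Hypothetical key used to match a symbol across two profiles. */
struct symbol_key {
    const char *symbol;  /* symbol name */
    const char *image;   /* image containing the symbol */
    const char *app;     /* owning application name */
};

/* Symbols match only if symbol, image and application names are all
 * equal; addresses and sizes are ignored. */
int same_symbol(const struct symbol_key *a, const struct symbol_key *b)
{
    return strcmp(a->symbol, b->symbol) == 0 &&
           strcmp(a->image, b->image) == 0 &&
           strcmp(a->app, b->app) == 0;
}
```

<p>
For example, <code class="function">memcpy</code> in <code class="filename">libc.so</code> owned by
application <code class="filename">a</code> and the same symbol owned by a different application
compare as different, which is the library-comparison limitation described above.
</p>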
| 3584 | </div> |
| 3585 | <div class="sect2" lang="en" xml:lang="en"> |
| 3586 | <div class="titlepage"> |
| 3587 | <div> |
| 3588 | <div> |
| 3589 | <h3 class="title"><a id="opreport-anon"></a>2.5. Anonymous executable mappings</h3> |
| 3590 | </div> |
| 3591 | </div> |
| 3592 | </div> |
| 3593 | <p> |
| 3594 | Many applications, typically ones involving dynamic compilation into |
| 3595 | machine code (just-in-time, or "JIT", compilation), have executable mappings that |
| 3596 | are not backed by an ELF file. <span><strong class="command">opreport</strong></span> has basic support for showing the |
| 3597 | samples taken in these regions; for example: |
| 3598 | </p> |
| 3599 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3600 | <tr> |
| 3601 | <td> |
| 3602 | <pre class="screen"> |
| 3603 | $ opreport /usr/bin/mono -l |
| 3604 | CPU: ppc64 POWER5, speed 1654.34 MHz (estimated) |
| 3605 | Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000 |
| 3606 | samples % image name symbol name |
| 3607 | 47 58.7500 mono (no symbols) |
| 3608 | 14 17.5000 anon (tgid:3189 range:0xf72aa000-0xf72fa000) (no symbols) |
| 3609 | 9 11.2500 anon (tgid:3189 range:0xf6cca000-0xf6dd9000) (no symbols) |
| 3610 | . . . . |
| 3611 | </pre> |
| 3612 | </td> |
| 3613 | </tr> |
| 3614 | </table> |
| 3617 | <p> |
| 3618 | Note that, since such mappings are dependent upon individual invocations of |
| 3619 | a binary, these mappings are always listed as a dependent image, |
| 3620 | even when using <code class="option">--separate=none</code>. |
| 3621 | Equally, the results are not affected by the <code class="option">--merge</code> |
| 3622 | option. |
| 3623 | </p> |
| 3624 | <p> |
| 3625 | As shown in the opreport output above, OProfile is unable to attribute the samples to any |
| 3626 | symbol(s) because there is no ELF file for this code. |
| 3627 | Enhanced support for JITed code is now available for some virtual machines; |
| 3628 | e.g., the Java Virtual Machine. For details about OProfile output for |
| 3629 | JITed code, see <a href="#getting-jit-reports" title="4. OProfile results with JIT samples">Section 4, “OProfile results with JIT samples”</a>. |
| 3630 | </p> |
| 3631 | <p>For more information about JIT support in OProfile, see <a href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, “Support for dynamically compiled (JIT) code”</a>. |
| 3632 | </p> |
| 3633 | </div> |
| 3634 | <div class="sect2" lang="en" xml:lang="en"> |
| 3635 | <div class="titlepage"> |
| 3636 | <div> |
| 3637 | <div> |
| 3638 | <h3 class="title"><a id="opreport-xml"></a>2.6. XML formatted output</h3> |
| 3639 | </div> |
| 3640 | </div> |
| 3641 | </div> |
| 3642 | <p> |
The <code class="option">--xml</code> option can be used to generate XML instead of the usual
text format. This allows opreport to avoid some of the constraints
imposed by the two-dimensional text format; for example, it can
separate the sample data across multiple events, CPUs and threads. The XML
schema implemented by opreport is found in <code class="filename">doc/opreport.xsd</code>, which contains
more detailed comments about the structure of the XML generated by opreport.
| 3649 | </p> |
| 3650 | <p> |
Since the XML is consumed by a client program rather than a user, its structure
is fairly static. In particular, the <code class="option">--sort</code> option is incompatible with the
<code class="option">--xml</code> option. Percentages are not displayed in the XML, so the options related
to percentages have no effect. Full pathnames are always displayed in
the XML, so <code class="option">--long-filenames</code> is not necessary. The <code class="option">--details</code> option causes
all of the individual sample data to be included in the XML, as well as the
instruction byte stream for each symbol (for doing disassembly), and can result
in very large XML files.
| 3659 | </p> |
| 3660 | </div> |
| 3661 | <div class="sect2" lang="en" xml:lang="en"> |
| 3662 | <div class="titlepage"> |
| 3663 | <div> |
| 3664 | <div> |
| 3665 | <h3 class="title"><a id="opreport-options"></a>2.7. Options for <span><strong class="command">opreport</strong></span></h3> |
| 3666 | </div> |
| 3667 | </div> |
| 3668 | </div> |
| 3669 | <div class="variablelist"> |
| 3670 | <dl> |
| 3671 | <dt> |
| 3672 | <span class="term"> |
| 3673 | <code class="option">--accumulated / -a</code> |
| 3674 | </span> |
| 3675 | </dt> |
| 3676 | <dd> |
| 3677 | <p> |
| 3678 | Accumulate sample and percentage counts in the symbol list. |
| 3679 | </p> |
| 3680 | </dd> |
| 3681 | <dt> |
| 3682 | <span class="term"> |
| 3683 | <code class="option">--callgraph / -c</code> |
| 3684 | </span> |
| 3685 | </dt> |
| 3686 | <dd> |
| 3687 | <p> |
| 3688 | Show callgraph information. |
| 3689 | </p> |
| 3690 | </dd> |
| 3691 | <dt> |
| 3692 | <span class="term"> |
| 3693 | <code class="option">--debug-info / -g</code> |
| 3694 | </span> |
| 3695 | </dt> |
| 3696 | <dd> |
| 3697 | <p> |
| 3698 | Show source file and line for each symbol. |
| 3699 | </p> |
| 3700 | </dd> |
| 3701 | <dt> |
| 3702 | <span class="term"> |
| 3703 | <code class="option">--demangle / -D none|normal|smart</code> |
| 3704 | </span> |
| 3705 | </dt> |
| 3706 | <dd> |
| 3707 | <p> |
none: no demangling. normal: use the default demangler (default). smart: use
pattern matching to make C++ symbol demangling more readable.
| 3710 | </p> |
| 3711 | </dd> |
| 3712 | <dt> |
| 3713 | <span class="term"> |
| 3714 | <code class="option">--details / -d</code> |
| 3715 | </span> |
| 3716 | </dt> |
| 3717 | <dd> |
| 3718 | <p> |
| 3719 | Show per-instruction details for all selected symbols. Note that, for |
| 3720 | binaries without symbol information, the VMA values shown are raw file |
| 3721 | offsets for the image binary. |
| 3722 | </p> |
| 3723 | </dd> |
| 3724 | <dt> |
| 3725 | <span class="term"> |
| 3726 | <code class="option">--exclude-dependent / -x</code> |
| 3727 | </span> |
| 3728 | </dt> |
| 3729 | <dd> |
| 3730 | <p> |
| 3731 | Do not include application-specific images for libraries, kernel modules |
| 3732 | and the kernel. This option only makes sense if the profile session |
| 3733 | used --separate. |
| 3734 | </p> |
| 3735 | </dd> |
| 3736 | <dt> |
| 3737 | <span class="term"> |
| 3738 | <code class="option">--exclude-symbols / -e [symbols]</code> |
| 3739 | </span> |
| 3740 | </dt> |
| 3741 | <dd> |
| 3742 | <p> |
| 3743 | Exclude all the symbols in the given comma-separated list. |
| 3744 | </p> |
| 3745 | </dd> |
| 3746 | <dt> |
| 3747 | <span class="term"> |
| 3748 | <code class="option">--global-percent / -%</code> |
| 3749 | </span> |
| 3750 | </dt> |
| 3751 | <dd> |
| 3752 | <p> |
| 3753 | Make all percentages relative to the whole profile. |
| 3754 | </p> |
| 3755 | </dd> |
| 3756 | <dt> |
| 3757 | <span class="term"> |
| 3758 | <code class="option">--help / -? / --usage</code> |
| 3759 | </span> |
| 3760 | </dt> |
| 3761 | <dd> |
| 3762 | <p> |
| 3763 | Show help message. |
| 3764 | </p> |
| 3765 | </dd> |
| 3766 | <dt> |
| 3767 | <span class="term"> |
| 3768 | <code class="option">--image-path / -p [paths]</code> |
| 3769 | </span> |
| 3770 | </dt> |
| 3771 | <dd> |
| 3772 | <p> |
| 3773 | Comma-separated list of additional paths to search for binaries. |
| 3774 | This is needed to find modules in kernels 2.6 and upwards. |
| 3775 | </p> |
| 3776 | </dd> |
| 3777 | <dt> |
| 3778 | <span class="term"> |
| 3779 | <code class="option">--root / -R [path]</code> |
| 3780 | </span> |
| 3781 | </dt> |
| 3782 | <dd> |
| 3783 | <p> |
| 3784 | A path to a filesystem to search for additional binaries. |
| 3785 | </p> |
| 3786 | </dd> |
| 3787 | <dt> |
| 3788 | <span class="term"> |
| 3789 | <code class="option">--include-symbols / -i [symbols]</code> |
| 3790 | </span> |
| 3791 | </dt> |
| 3792 | <dd> |
| 3793 | <p> |
| 3794 | Only include symbols in the given comma-separated list. |
| 3795 | </p> |
| 3796 | </dd> |
| 3797 | <dt> |
| 3798 | <span class="term"> |
| 3799 | <code class="option">--long-filenames / -f</code> |
| 3800 | </span> |
| 3801 | </dt> |
| 3802 | <dd> |
| 3803 | <p> |
| 3804 | Output full paths instead of basenames. |
| 3805 | </p> |
| 3806 | </dd> |
| 3807 | <dt> |
| 3808 | <span class="term"> |
| 3809 | <code class="option">--merge / -m [lib,cpu,tid,tgid,unitmask,all]</code> |
| 3810 | </span> |
| 3811 | </dt> |
| 3812 | <dd> |
| 3813 | <p> |
| 3814 | Merge any profiles separated in a --separate session. |
| 3815 | </p> |
| 3816 | </dd> |
| 3817 | <dt> |
| 3818 | <span class="term"> |
| 3819 | <code class="option">--no-header</code> |
| 3820 | </span> |
| 3821 | </dt> |
| 3822 | <dd> |
| 3823 | <p> |
| 3824 | Don't output a header detailing profiling parameters. |
| 3825 | </p> |
| 3826 | </dd> |
| 3827 | <dt> |
| 3828 | <span class="term"> |
| 3829 | <code class="option">--output-file / -o [file]</code> |
| 3830 | </span> |
| 3831 | </dt> |
| 3832 | <dd> |
| 3833 | <p> |
| 3834 | Output to the given file instead of stdout. |
| 3835 | </p> |
| 3836 | </dd> |
| 3837 | <dt> |
| 3838 | <span class="term"> |
| 3839 | <code class="option">--reverse-sort / -r</code> |
| 3840 | </span> |
| 3841 | </dt> |
| 3842 | <dd> |
| 3843 | <p> |
| 3844 | Reverse the sort from the default. |
| 3845 | </p> |
| 3846 | </dd> |
| 3847 | <dt> |
| 3848 | <span class="term"><code class="option">--session-dir=</code>dir_path</span> |
| 3849 | </dt> |
| 3850 | <dd> |
| 3851 | <p> |
Use the sample database in directory <code class="filename">dir_path</code>
instead of the default location (<code class="filename">/var/lib/oprofile</code>).
| 3854 | </p> |
| 3855 | </dd> |
| 3856 | <dt> |
| 3857 | <span class="term"> |
| 3858 | <code class="option">--show-address / -w</code> |
| 3859 | </span> |
| 3860 | </dt> |
| 3861 | <dd> |
| 3862 | <p> |
| 3863 | Show the VMA address of each symbol (off by default). |
| 3864 | </p> |
| 3865 | </dd> |
| 3866 | <dt> |
| 3867 | <span class="term"> |
| 3868 | <code class="option">--sort / -s [vma,sample,symbol,debug,image]</code> |
| 3869 | </span> |
| 3870 | </dt> |
| 3871 | <dd> |
| 3872 | <p> |
| 3873 | Sort the list of symbols by, respectively, symbol address, |
| 3874 | number of samples, symbol name, debug filename and line number, |
| 3875 | binary image filename. |
| 3876 | </p> |
| 3877 | </dd> |
| 3878 | <dt> |
| 3879 | <span class="term"> |
| 3880 | <code class="option">--symbols / -l</code> |
| 3881 | </span> |
| 3882 | </dt> |
| 3883 | <dd> |
| 3884 | <p> |
| 3885 | List per-symbol information instead of a binary image summary. |
| 3886 | </p> |
| 3887 | </dd> |
| 3888 | <dt> |
| 3889 | <span class="term"> |
| 3890 | <code class="option">--threshold / -t [percentage]</code> |
| 3891 | </span> |
| 3892 | </dt> |
| 3893 | <dd> |
| 3894 | <p> |
| 3895 | Only output data for symbols that have more than the given percentage |
| 3896 | of total samples. |
| 3897 | </p> |
| 3898 | </dd> |
| 3899 | <dt> |
| 3900 | <span class="term"> |
| 3901 | <code class="option">--verbose / -V [options]</code> |
| 3902 | </span> |
| 3903 | </dt> |
| 3904 | <dd> |
| 3905 | <p> |
| 3906 | Give verbose debugging output. |
| 3907 | </p> |
| 3908 | </dd> |
| 3909 | <dt> |
| 3910 | <span class="term"> |
| 3911 | <code class="option">--version / -v</code> |
| 3912 | </span> |
| 3913 | </dt> |
| 3914 | <dd> |
| 3915 | <p> |
| 3916 | Show version. |
| 3917 | </p> |
| 3918 | </dd> |
| 3919 | <dt> |
| 3920 | <span class="term"> |
| 3921 | <code class="option">--xml / -X</code> |
| 3922 | </span> |
| 3923 | </dt> |
| 3924 | <dd> |
| 3925 | <p> |
| 3926 | Generate XML output. |
| 3927 | </p> |
| 3928 | </dd> |
| 3929 | </dl> |
| 3930 | </div> |
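<p>
As an illustration of how <code class="option">--threshold</code> and <code class="option">--accumulated</code>
transform a symbol list, the C sketch below applies each rule, as described above, to an in-memory
list already sorted by sample count (percentages accumulate in the same way as sample counts). It is
a hypothetical model of the documented behaviour, not opreport's implementation; in particular, it
does not show how opreport combines the two options.
</p>

```c
#include <stddef.h>

/* Hypothetical in-memory symbol list entry. */
struct sym {
    const char *name;
    unsigned long samples;
};

/* --threshold: keep only symbols with more than `threshold` percent of
 * `total` samples. Kept entries are copied to `out` (which must have room
 * for n entries); returns how many were kept. */
size_t apply_threshold(const struct sym *in, size_t n, unsigned long total,
                       double threshold, struct sym *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; ++i)
        if (total != 0 &&
            100.0 * (double)in[i].samples / (double)total > threshold)
            out[kept++] = in[i];
    return kept;
}

/* --accumulated: replace each sample count with the running total of
 * all counts up to and including that entry, in list order. */
void accumulate(struct sym *s, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        s[i].samples += s[i - 1].samples;
}
```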
| 3931 | </div> |
| 3932 | </div> |
| 3933 | <div class="sect1" lang="en" xml:lang="en"> |
| 3934 | <div class="titlepage"> |
| 3935 | <div> |
| 3936 | <div> |
| 3937 | <h2 class="title" style="clear: both"><a id="opannotate"></a>3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</h2> |
| 3938 | </div> |
| 3939 | </div> |
| 3940 | </div> |
| 3941 | <p> |
The <span><strong class="command">opannotate</strong></span> utility generates annotated source files or assembly listings, optionally
mixed with source.
To see annotated source, the profiled application must have been built with debug information, and the source
files must be reachable through the paths recorded in that debug information. With GCC, this means compiling
with the <code class="option">-g</code> option.
If the binary does not contain sufficient debug information, you can still
use <span><strong class="command">opannotate <code class="option">--assembly</code></strong></span> to get annotated assembly.
| 3949 | </p> |
| 3950 | <p> |
Note that, for the reasons explained in <a href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a>, the results can be
inaccurate. The debug information itself can introduce further problems; for example, the line number recorded for a symbol can be
incorrect. The compiler can also reorder and move assembly instructions, which can lead to
source lines being credited with samples that do not really belong to them. Also see
<a href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a>.
| 3956 | </p> |
| 3957 | <p> |
With the <code class="option">--source</code> option, you can output the annotation as a single file
containing all the source that was found. You can use this in conjunction with <code class="option">--assembly</code>
to get combined source/assembly output.
| 3961 | </p> |
| 3962 | <p> |
You can also output a directory of annotated source files that maintains the structure of
the original sources. Each line in the annotated source is prefixed with the samples
for that line, and each symbol is additionally annotated with details for the symbol
as a whole. An example:
| 3967 | </p> |
| 3968 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3969 | <tr> |
| 3970 | <td> |
| 3971 | <pre class="screen"> |
| 3972 | $ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled |
| 3973 | $ ls annotated/home/moz/src/oprofile-pp/daemon/ |
| 3974 | opd_cookie.h opd_image.c opd_kernel.c opd_sample_files.c oprofiled.c |
| 3975 | </pre> |
| 3976 | </td> |
| 3977 | </tr> |
| 3978 | </table> |
| 3979 | <p> |
The line numbering of the original source files is preserved; each file simply has
a footer appended describing the profiling details. The actual annotation
looks something like this:
| 3983 | </p> |
| 3984 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 3985 | <tr> |
| 3986 | <td> |
| 3987 | <pre class="screen"> |
| 3988 | ... |
| 3989 | :static uint64_t pop_buffer_value(struct transient * trans) |
| 3990 | 11510 1.9661 :{ /* pop_buffer_value total: 89901 15.3566 */ |
| 3991 | : uint64_t val; |
| 3992 | : |
| 3993 | 10227 1.7469 : if (!trans->remaining) { |
| 3994 | : fprintf(stderr, "BUG: popping empty buffer !\n"); |
| 3995 | : exit(EXIT_FAILURE); |
| 3996 | : } |
| 3997 | : |
| 3998 | : val = get_buffer_value(trans->buffer, 0); |
| 3999 | 2281 0.3896 : trans->remaining--; |
| 4000 | 2296 0.3922 : trans->buffer += kernel_pointer_size; |
| 4001 | : return val; |
| 4002 | 10454 1.7857 :} |
| 4003 | ... |
| 4004 | </pre> |
| 4005 | </td> |
| 4006 | </tr> |
| 4007 | </table> |
| 4008 | <p> |
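The first number on each line is the number of samples, whilst the second is
that count as a percentage of the total number of samples. The comment on a function's opening line
(here <code class="code">pop_buffer_value total: 89901 15.3566</code>) gives the same two figures
for the symbol as a whole.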
| 4011 | </p> |
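<p>
The two columns are simple arithmetic on the per-line sample count and the profile-wide total.
This C sketch reproduces them. The function names and the exact column widths are illustrative
assumptions, not opannotate's actual code.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <stdio.h>

/* Percentage of the whole profile that a line's samples represent. */
static double sample_percent(unsigned long samples, unsigned long total)
{
    return total ? 100.0 * (double)samples / (double)total : 0.0;
}

/* Format one annotated line in a layout similar to opannotate's:
 * sample count, percentage, ':' and the source text. Lines without
 * samples get blank columns of the same width (6 + 1 + 8 = 15).
 * Returns snprintf's result. */
static int format_annotated_line(char *buf, size_t len, unsigned long samples,
                                 unsigned long total, const char *src)
{
    if (samples == 0)
        return snprintf(buf, len, "%15s :%s", "", src);
    return snprintf(buf, len, "%6lu %8.4f :%s", samples,
                    sample_percent(samples, total), src);
}
```
</pre>
</td>
</tr>
</table>
<p>
Working backwards from the listing above, 11510 samples shown as 1.9661% imply a profile of
roughly 585,000 samples in total.
</p>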
| 4012 | <div class="sect2" lang="en" xml:lang="en"> |
| 4013 | <div class="titlepage"> |
| 4014 | <div> |
| 4015 | <div> |
| 4016 | <h3 class="title"><a id="opannotate-finding-source"></a>3.1. Locating source files</h3> |
| 4017 | </div> |
| 4018 | </div> |
| 4019 | </div> |
| 4020 | <p> |
| 4021 | Of course, <span><strong class="command">opannotate</strong></span> needs to be able to locate the source files |
| 4022 | for the binary image(s) in order to produce output. Some binary images have debug |
| 4023 | information where the given source file paths are relative, not absolute. You can |
| 4024 | specify search paths to look for these files (similar to <span><strong class="command">gdb</strong></span>'s |
| 4025 | <code class="option">dir</code> command) with the <code class="option">--search-dirs</code> option. |
| 4026 | </p> |
| 4027 | <p> |
| 4028 | Sometimes you may have a binary image which gives absolute paths for the source files, |
| 4029 | but you have the actual sources elsewhere (commonly, you've installed an SRPM for |
| 4030 | a binary on your system and you want annotation from an existing profile). You can |
| 4031 | use the <code class="option">--base-dirs</code> option to redirect OProfile to look somewhere |
else for source files. For example, suppose you have a binary generated from a source
file that is recorded in the debug information as <code class="filename">/tmp/build/libfoo/foo.c</code>,
and the source tree matching that binary is installed in <code class="filename">/home/user/libfoo/</code>.
You can redirect OProfile to find <code class="filename">foo.c</code> like this:
| 4036 | </p> |
| 4037 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 4038 | <tr> |
| 4039 | <td> |
| 4040 | <pre class="screen"> |
| 4041 | $ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so |
| 4042 | </pre> |
| 4043 | </td> |
| 4044 | </tr> |
| 4045 | </table> |
| 4046 | <p> |
| 4047 | You can specify multiple (comma-separated) paths to both options. |
| 4048 | </p> |
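<p>
The path rewriting described above can be sketched in C as follows. The function name
<code class="code">remap_source_path</code> is hypothetical. The sketch handles only one base directory and one search
directory, and it does not check whether the resulting file exists. It is an illustration of the idea,
not opannotate's implementation.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <stdio.h>
#include <string.h>

/* If debug_path begins with base_dir, strip that prefix and append the
 * remainder to search_dir. Returns 0 and fills out on success; -1 if the
 * prefix does not match or out is too small. */
static int remap_source_path(const char *debug_path, const char *base_dir,
                             const char *search_dir, char *out, size_t out_len)
{
    size_t base_len = strlen(base_dir);
    size_t dir_len = strlen(search_dir);
    const char *rest;
    const char *sep;
    int n;

    if (strncmp(debug_path, base_dir, base_len) != 0)
        return -1;
    /* Match whole path components only: "/tmp/build/lib" must not
     * match "/tmp/build/libfoo/foo.c". */
    if (base_len > 0 && base_dir[base_len - 1] != '/' &&
        debug_path[base_len] != '/')
        return -1;

    rest = debug_path + base_len;
    while (*rest == '/')
        rest++;
    sep = (dir_len > 0 && search_dir[dir_len - 1] == '/') ? "" : "/";
    n = snprintf(out, out_len, "%s%s%s", search_dir, sep, rest);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```
</pre>
</td>
</tr>
</table>
<p>
With the paths from the example above, <code class="code">remap_source_path("/tmp/build/libfoo/foo.c",
"/tmp/build/libfoo/", "/home/user/libfoo/", buf, sizeof buf)</code> yields
<code class="filename">/home/user/libfoo/foo.c</code>.
</p>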
| 4049 | </div> |
| 4050 | <div class="sect2" lang="en" xml:lang="en"> |
| 4051 | <div class="titlepage"> |
| 4052 | <div> |
| 4053 | <div> |
| 4054 | <h3 class="title"><a id="opannotate-details"></a>3.2. Usage of <span><strong class="command">opannotate</strong></span></h3> |
| 4055 | </div> |
| 4056 | </div> |
| 4057 | </div> |
| 4058 | <div class="variablelist"> |
| 4059 | <dl> |
| 4060 | <dt> |
| 4061 | <span class="term"> |
| 4062 | <code class="option">--assembly / -a</code> |
| 4063 | </span> |
| 4064 | </dt> |
| 4065 | <dd> |
| 4066 | <p> |
| 4067 | Output annotated assembly. If this is combined with --source, then mixed |
| 4068 | source / assembly annotations are output. |
| 4069 | </p> |
| 4070 | </dd> |
| 4071 | <dt> |
| 4072 | <span class="term"> |
<code class="option">--base-dirs / -b [paths]</code>
| 4074 | </span> |
| 4075 | </dt> |
| 4076 | <dd> |
| 4077 | <p> |
Comma-separated list of path prefixes. This can be used to point OProfile to a
different location for source files when the debug information specifies an
absolute source path that does not exist on your system. The prefix
is stripped from the debug source file paths, and the remainder is then looked up in the
directories specified by <code class="option">--search-dirs</code>.
| 4083 | </p> |
| 4084 | </dd> |
| 4085 | <dt> |
| 4086 | <span class="term"> |
| 4087 | <code class="option">--demangle / -D none|normal|smart</code> |
| 4088 | </span> |
| 4089 | </dt> |
| 4090 | <dd> |
| 4091 | <p> |
none: no demangling. normal: use the default demangler (this is the default).
smart: use pattern-matching to make demangled C++ symbols more readable.
| 4094 | </p> |
| 4095 | </dd> |
| 4096 | <dt> |
| 4097 | <span class="term"> |
| 4098 | <code class="option">--exclude-dependent / -x</code> |
| 4099 | </span> |
| 4100 | </dt> |
| 4101 | <dd> |
| 4102 | <p> |
| 4103 | Do not include application-specific images for libraries, kernel modules |
| 4104 | and the kernel. This option only makes sense if the profile session |
| 4105 | used --separate. |
| 4106 | </p> |
| 4107 | </dd> |
| 4108 | <dt> |
| 4109 | <span class="term"> |
| 4110 | <code class="option">--exclude-file [files]</code> |
| 4111 | </span> |
| 4112 | </dt> |
| 4113 | <dd> |
| 4114 | <p> |
| 4115 | Exclude all files in the given comma-separated list of glob patterns. |
| 4116 | </p> |
| 4117 | </dd> |
| 4118 | <dt> |
| 4119 | <span class="term"> |
| 4120 | <code class="option">--exclude-symbols / -e [symbols]</code> |
| 4121 | </span> |
| 4122 | </dt> |
| 4123 | <dd> |
| 4124 | <p> |
| 4125 | Exclude all the symbols in the given comma-separated list. |
| 4126 | </p> |
| 4127 | </dd> |
| 4128 | <dt> |
| 4129 | <span class="term"> |
| 4130 | <code class="option">--help / -? / --usage</code> |
| 4131 | </span> |
| 4132 | </dt> |
| 4133 | <dd> |
| 4134 | <p> |
| 4135 | Show help message. |
| 4136 | </p> |
| 4137 | </dd> |
| 4138 | <dt> |
| 4139 | <span class="term"> |
| 4140 | <code class="option">--image-path / -p [paths]</code> |
| 4141 | </span> |
| 4142 | </dt> |
| 4143 | <dd> |
| 4144 | <p> |
| 4145 | Comma-separated list of additional paths to search for binaries. |
| 4146 | This is needed to find modules in kernels 2.6 and upwards. |
| 4147 | </p> |
| 4148 | </dd> |
| 4149 | <dt> |
| 4150 | <span class="term"> |
| 4151 | <code class="option">--root / -R [path]</code> |
| 4152 | </span> |
| 4153 | </dt> |
| 4154 | <dd> |
| 4155 | <p> |
| 4156 | A path to a filesystem to search for additional binaries. |
| 4157 | </p> |
| 4158 | </dd> |
| 4159 | <dt> |
| 4160 | <span class="term"> |
| 4161 | <code class="option">--include-file [files]</code> |
| 4162 | </span> |
| 4163 | </dt> |
| 4164 | <dd> |
| 4165 | <p> |
| 4166 | Only include files in the given comma-separated list of glob patterns. |
| 4167 | </p> |
| 4168 | </dd> |
| 4169 | <dt> |
| 4170 | <span class="term"> |
| 4171 | <code class="option">--include-symbols / -i [symbols]</code> |
| 4172 | </span> |
| 4173 | </dt> |
| 4174 | <dd> |
| 4175 | <p> |
| 4176 | Only include symbols in the given comma-separated list. |
| 4177 | </p> |
| 4178 | </dd> |
| 4179 | <dt> |
| 4180 | <span class="term"> |
| 4181 | <code class="option">--objdump-params [params]</code> |
| 4182 | </span> |
| 4183 | </dt> |
| 4184 | <dd> |
| 4185 | <p> |
| 4186 | Pass the given parameters as extra values when calling objdump. |
| 4187 | </p> |
| 4188 | </dd> |
| 4189 | <dt> |
| 4190 | <span class="term"> |
| 4191 | <code class="option">--output-dir / -o [dir]</code> |
| 4192 | </span> |
| 4193 | </dt> |
| 4194 | <dd> |
| 4195 | <p> |
| 4196 | Output directory. This makes opannotate output one annotated file for each |
| 4197 | source file. This option can't be used in conjunction with --assembly. |
| 4198 | </p> |
| 4199 | </dd> |
| 4200 | <dt> |
| 4201 | <span class="term"> |
| 4202 | <code class="option">--search-dirs / -d [paths]</code> |
| 4203 | </span> |
| 4204 | </dt> |
| 4205 | <dd> |
| 4206 | <p> |
| 4207 | Comma-separated list of paths to search for source files. This is useful to find |
| 4208 | source files when the debug information only contains relative paths. |
| 4209 | </p> |
| 4210 | </dd> |
| 4211 | <dt> |
| 4212 | <span class="term"> |
| 4213 | <code class="option">--source / -s</code> |
| 4214 | </span> |
| 4215 | </dt> |
| 4216 | <dd> |
| 4217 | <p> |
| 4218 | Output annotated source. This requires debugging information to be available |
| 4219 | for the binaries. |
| 4220 | </p> |
| 4221 | </dd> |
| 4222 | <dt> |
| 4223 | <span class="term"> |
| 4224 | <code class="option">--threshold / -t [percentage]</code> |
| 4225 | </span> |
| 4226 | </dt> |
| 4227 | <dd> |
| 4228 | <p> |
| 4229 | Only output data for symbols that have more than the given percentage |
| 4230 | of total samples. |
| 4231 | </p> |
| 4232 | </dd> |
| 4233 | <dt> |
| 4234 | <span class="term"> |
| 4235 | <code class="option">--verbose / -V [options]</code> |
| 4236 | </span> |
| 4237 | </dt> |
| 4238 | <dd> |
| 4239 | <p> |
| 4240 | Give verbose debugging output. |
| 4241 | </p> |
| 4242 | </dd> |
| 4243 | <dt> |
| 4244 | <span class="term"> |
| 4245 | <code class="option">--version / -v</code> |
| 4246 | </span> |
| 4247 | </dt> |
| 4248 | <dd> |
| 4249 | <p> |
| 4250 | Show version. |
| 4251 | </p> |
| 4252 | </dd> |
| 4253 | </dl> |
| 4254 | </div> |
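<p>
The <code class="option">--include-file</code> and <code class="option">--exclude-file</code> options take
comma-separated lists of glob patterns. This sketch shows one plausible way to test a file name against
such a list with the POSIX <code class="code">fnmatch()</code> function. The function name is hypothetical,
and the matching flags OProfile actually uses may differ.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <fnmatch.h>
#include <string.h>

/* Return 1 if path matches any pattern in the comma-separated list,
 * else 0. Empty patterns, and patterns of 256 characters or more,
 * are skipped in this sketch. */
static int matches_any_glob(const char *path, const char *pattern_list)
{
    char pat[256];
    const char *p = pattern_list;

    while (*p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len > 0 && len < sizeof pat) {
            memcpy(pat, p, len);
            pat[len] = '\0';
            if (fnmatch(pat, path, 0) == 0)  /* 0 means "matches" */
                return 1;
        }
        if (!comma)
            break;
        p = comma + 1;
    }
    return 0;
}
```
</pre>
</td>
</tr>
</table>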
| 4255 | </div> |
| 4256 | </div> |
| 4257 | <div class="sect1" lang="en" xml:lang="en"> |
| 4258 | <div class="titlepage"> |
| 4259 | <div> |
| 4260 | <div> |
| 4261 | <h2 class="title" style="clear: both"><a id="getting-jit-reports"></a>4. OProfile results with JIT samples</h2> |
| 4262 | </div> |
| 4263 | </div> |
| 4264 | </div> |
| 4265 | <p> |
| 4266 | After profiling a Java (or other supported VM) application, the command |
| 4267 | </p> |
| 4268 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 4269 | <tr> |
| 4270 | <td> |
<pre class="screen"><span xmlns="http://www.w3.org/1999/xhtml"><strong class="command">opcontrol --dump</strong></span></pre>
| 4272 | </td> |
| 4273 | </tr> |
| 4274 | </table> |
| 4275 | <p> |
| 4276 | flushes the sample buffers and creates ELF binaries from the |
| 4277 | intermediate files that were written by the agent library. |
| 4278 | The ELF binaries are named <code class="filename"><tgid>.jo</code>. |
| 4279 | With the symbol information stored in these ELF files, it is |
| 4280 | possible to map samples to the appropriate symbols. |
| 4281 | </p> |
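<p>
Mapping a sample to a symbol amounts to finding the symbol whose address range contains the sampled address.
This C sketch does that with a binary search over symbols sorted by start address. The
<code class="code">struct sym_range</code> type and <code class="code">find_symbol</code> are hypothetical and
are not OProfile's code. The sketch also assumes that symbols do not overlap.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <stddef.h>

/* Hypothetical symbol covering the address range [start, start + size). */
struct sym_range {
    unsigned long start;
    unsigned long size;
    const char *name;
};

/* Return the symbol containing addr, or NULL if none does.
 * syms must be sorted by start address and must not overlap. */
static const struct sym_range *find_symbol(const struct sym_range *syms,
                                           size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    /* After the loop, lo is the number of symbols with start <= addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;  /* addr lies below the first symbol */
    if (addr - syms[lo - 1].start < syms[lo - 1].size)
        return &syms[lo - 1];
    return NULL;      /* addr falls in a gap between symbols */
}
```
</pre>
</td>
</tr>
</table>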
| 4282 | <p> |
| 4283 | The usual analysis tools (<span><strong class="command">opreport</strong></span> and/or |
| 4284 | <span><strong class="command">opannotate</strong></span>) can now be used |
| 4285 | to get symbols and assembly code for the instrumented VM processes. |
| 4286 | </p> |
| 4287 | <p> |
| 4288 | Below is an example of a profile report of a Java application that has been |
| 4289 | instrumented with the provided agent library. |
| 4290 | </p> |
| 4291 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 4292 | <tr> |
| 4293 | <td> |
| 4294 | <pre class="screen"> |
| 4295 | $ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java |
| 4296 | CPU: Core Solo / Duo, speed 2167 MHz (estimated) |
| 4297 | Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000 |
| 4298 | samples % image name symbol name |
| 4299 | 186020 50.0523 no-vmlinux no-vmlinux (no symbols) |
| 4300 | 34333 9.2380 7635.jo java void test.f1() |
| 4301 | 19022 5.1182 libc-2.5.so libc-2.5.so _IO_file_xsputn@@GLIBC_2.1 |
| 4302 | 18762 5.0483 libc-2.5.so libc-2.5.so vfprintf |
| 4303 | 16408 4.4149 7635.jo java void test$HelloThread.run() |
| 4304 | 16250 4.3724 7635.jo java void test$test_1.f2(int) |
| 4305 | 15303 4.1176 7635.jo java void test.f2(int, int) |
| 4306 | 13252 3.5657 7635.jo java void test.f2(int) |
| 4307 | 5165 1.3897 7635.jo java void test.f4() |
| 4308 | 955 0.2570 7635.jo java void test$HelloThread.run()~ |
| 4309 | |
| 4310 | </pre> |
| 4311 | </td> |
| 4312 | </tr> |
| 4313 | </table> |
| 4316 | <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| 4317 | <h3 class="title">Note</h3> |
| 4318 | <p> |
Depending on the JVM that is used, certain options of opreport and opannotate
do not work, since they rely on debug information (e.g. source code line numbers)
that is not always available. The Sun JVM does provide the necessary debug
information via the JVMTI or JVMPI interface,
but other JVMs do not.
| 4324 | </p> |
| 4325 | </div> |
| 4326 | <p> |
As you can see in the opreport output, the JIT support agent for Java
generates symbols that include the class and method signature.
A symbol with the suffix ~<n> (e.g.
<code class="code">void test$HelloThread.run()~1</code>) means that this is
the <n>th occurrence of an identical name. This happens if a method is re-JITed.
A symbol with the suffix %<n> means that the address space of this symbol
was reused during the sample session (see <a href="#overlapping-symbols" title="6. Overlapping symbols in JITed code">Section 6, “Overlapping symbols in JITed code”</a>).
The value <n> is the percentage of time that this symbol/code was present in
relation to the total lifetime of all the other overlapping symbols. A symbol of the form
<code class="code"><return_val> <class_name>$<method_sig></code> denotes an
inner class.
| 4338 | </p> |
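<p>
When post-processing opreport output for JITed code, these suffixes can be recognised mechanically. This C
sketch is hypothetical. It assumes that the suffix begins at the last <code class="code">~</code> or
<code class="code">%</code> in the name and is followed only by a number, and that class and method names
themselves contain neither character.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <ctype.h>
#include <stdlib.h>

/* Classify the suffix of a JIT symbol name.
 * Returns '~' for "~<n>" (n-th occurrence of an identical name),
 * '%' for "%<n>" (address range reused; n is a percentage), or 0 if the
 * name ends in neither form. On success *value receives <n>. */
static int jit_symbol_suffix(const char *name, double *value)
{
    const char *mark = NULL;
    const char *p;
    char *end;
    double v;

    /* Assumption: the suffix starts at the LAST '~' or '%' in the name. */
    for (p = name; *p; p++)
        if (*p == '~' || *p == '%')
            mark = p;
    if (!mark || !isdigit((unsigned char)mark[1]))
        return 0;

    v = strtod(mark + 1, &end);
    if (*end != '\0')  /* something other than a number follows */
        return 0;
    *value = v;
    return *mark;
}
```
</pre>
</td>
</tr>
</table>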
| 4339 | </div> |
| 4340 | <div class="sect1" lang="en" xml:lang="en"> |
| 4341 | <div class="titlepage"> |
| 4342 | <div> |
| 4343 | <div> |
| 4344 | <h2 class="title" style="clear: both"><a id="opgprof"></a>5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</h2> |
| 4345 | </div> |
| 4346 | </div> |
| 4347 | </div> |
| 4348 | <p> |
| 4349 | If you're familiar with the output produced by <span><strong class="command">GNU gprof</strong></span>, |
| 4350 | you may find <span><strong class="command">opgprof</strong></span> useful. It takes a single binary |
| 4351 | as an argument, and produces a <code class="filename">gmon.out</code> file for use |
with <span><strong class="command">gprof -p</strong></span>. If call-graph profiling was enabled,
the call-graph data is also included.
| 4354 | </p> |
| 4355 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 4356 | <tr> |
| 4357 | <td> |
| 4358 | <pre class="screen"> |
| 4359 | $ opgprof `which oprofiled` # generates gmon.out file |
| 4360 | $ gprof -p `which oprofiled` | head |
| 4361 | Flat profile: |
| 4362 | |
| 4363 | Each sample counts as 1 samples. |
| 4364 | % cumulative self self total |
| 4365 | time samples samples calls T1/call T1/call name |
| 4366 | 33.13 206237.00 206237.00 odb_insert |
| 4367 | 22.67 347386.00 141149.00 pop_buffer_value |
| 4368 | 9.56 406881.00 59495.00 opd_put_sample |
| 4369 | 7.34 452599.00 45718.00 opd_find_image |
| 4370 | 7.19 497327.00 44728.00 opd_process_samples |
| 4371 | </pre> |
| 4372 | </td> |
| 4373 | </tr> |
| 4374 | </table> |
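<p>
In the flat profile above, the <code class="code">% time</code> column is each symbol's self samples as a percentage
of all samples. The <code class="code">cumulative samples</code> column is a running total of the self column in the
order listed (for example, 206237 + 141149 = 347386). This C sketch shows that arithmetic. The function
name is hypothetical, and this is not how gprof is implemented internally.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <stddef.h>

/* Given the self-sample counts of symbols in listing order, compute the
 * running "cumulative" column and the "% time" column (self samples as a
 * percentage of total). total is passed in rather than summed here,
 * because a listing (e.g. piped through head) may show only some symbols. */
static void flat_profile_columns(const double *self, size_t n, double total,
                                 double *cumulative, double *pct)
{
    double running = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        running += self[i];
        cumulative[i] = running;
        pct[i] = total > 0.0 ? 100.0 * self[i] / total : 0.0;
    }
}
```
</pre>
</td>
</tr>
</table>
<p>
Feeding in the five self values shown above reproduces the cumulative column exactly, ending at 497327.
Reproducing <code class="code">% time</code> additionally needs the profile's total sample count, which the
truncated output does not show.
</p>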
| 4375 | <div class="sect2" lang="en" xml:lang="en"> |
| 4376 | <div class="titlepage"> |
| 4377 | <div> |
| 4378 | <div> |
| 4379 | <h3 class="title"><a id="opgprof-details"></a>5.1. Usage of <span><strong class="command">opgprof</strong></span></h3> |
| 4380 | </div> |
| 4381 | </div> |
| 4382 | </div> |
| 4383 | <div class="variablelist"> |
| 4384 | <dl> |
| 4385 | <dt> |
| 4386 | <span class="term"> |
| 4387 | <code class="option">--help / -? / --usage</code> |
| 4388 | </span> |
| 4389 | </dt> |
| 4390 | <dd> |
| 4391 | <p> |
| 4392 | Show help message. |
| 4393 | </p> |
| 4394 | </dd> |
| 4395 | <dt> |
| 4396 | <span class="term"> |
| 4397 | <code class="option">--image-path / -p [paths]</code> |
| 4398 | </span> |
| 4399 | </dt> |
| 4400 | <dd> |
| 4401 | <p> |
| 4402 | Comma-separated list of additional paths to search for binaries. |
| 4403 | This is needed to find modules in kernels 2.6 and upwards. |
| 4404 | </p> |
| 4405 | </dd> |
| 4406 | <dt> |
| 4407 | <span class="term"> |
| 4408 | <code class="option">--root / -R [path]</code> |
| 4409 | </span> |
| 4410 | </dt> |
| 4411 | <dd> |
| 4412 | <p> |
| 4413 | A path to a filesystem to search for additional binaries. |
| 4414 | </p> |
| 4415 | </dd> |
| 4416 | <dt> |
| 4417 | <span class="term"> |
| 4418 | <code class="option">--output-filename / -o [file]</code> |
| 4419 | </span> |
| 4420 | </dt> |
| 4421 | <dd> |
| 4422 | <p> |
Output to the given file instead of the default, <code class="filename">gmon.out</code>.
| 4424 | </p> |
| 4425 | </dd> |
| 4426 | <dt> |
| 4427 | <span class="term"> |
| 4428 | <code class="option">--threshold / -t [percentage]</code> |
| 4429 | </span> |
| 4430 | </dt> |
| 4431 | <dd> |
| 4432 | <p> |
| 4433 | Only output data for symbols that have more than the given percentage |
| 4434 | of total samples. |
| 4435 | </p> |
| 4436 | </dd> |
| 4437 | <dt> |
| 4438 | <span class="term"> |
| 4439 | <code class="option">--verbose / -V [options]</code> |
| 4440 | </span> |
| 4441 | </dt> |
| 4442 | <dd> |
| 4443 | <p> |
| 4444 | Give verbose debugging output. |
| 4445 | </p> |
| 4446 | </dd> |
| 4447 | <dt> |
| 4448 | <span class="term"> |
| 4449 | <code class="option">--version / -v</code> |
| 4450 | </span> |
| 4451 | </dt> |
| 4452 | <dd> |
| 4453 | <p> |
| 4454 | Show version. |
| 4455 | </p> |
| 4456 | </dd> |
| 4457 | </dl> |
| 4458 | </div> |
| 4459 | </div> |
| 4460 | </div> |
| 4461 | <div class="sect1" lang="en" xml:lang="en"> |
| 4462 | <div class="titlepage"> |
| 4463 | <div> |
| 4464 | <div> |
| 4465 | <h2 class="title" style="clear: both"><a id="oparchive"></a>6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</h2> |
| 4466 | </div> |
| 4467 | </div> |
| 4468 | </div> |
| 4469 | <p> |
| 4470 | The <span><strong class="command">oparchive</strong></span> utility generates a directory populated |
| 4471 | with executable, debug, and oprofile sample files. This directory can be |
| 4472 | moved to another machine via <span><strong class="command">tar</strong></span> and analyzed without |
| 4473 | further use of the data collection machine. |
| 4474 | </p> |
| 4475 | <p> |
| 4476 | The following command would collect the sample files, the executables |
| 4477 | associated with the sample files, and the debuginfo files associated |
| 4478 | with the executables and copy them into |
| 4479 | <code class="filename">/tmp/current_data</code>: |
| 4480 | </p> |
| 4481 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 4482 | <tr> |
| 4483 | <td> |
| 4484 | <pre class="screen"> |
| 4485 | # oparchive -o /tmp/current_data |
| 4486 | </pre> |
| 4487 | </td> |
| 4488 | </tr> |
| 4489 | </table> |
| 4490 | <div class="sect2" lang="en" xml:lang="en"> |
| 4491 | <div class="titlepage"> |
| 4492 | <div> |
| 4493 | <div> |
| 4494 | <h3 class="title"><a id="oparchive-details"></a>6.1. Usage of <span><strong class="command">oparchive</strong></span></h3> |
| 4495 | </div> |
| 4496 | </div> |
| 4497 | </div> |
| 4498 | <div class="variablelist"> |
| 4499 | <dl> |
| 4500 | <dt> |
| 4501 | <span class="term"> |
| 4502 | <code class="option">--help / -? / --usage</code> |
| 4503 | </span> |
| 4504 | </dt> |
| 4505 | <dd> |
| 4506 | <p> |
| 4507 | Show help message. |
| 4508 | </p> |
| 4509 | </dd> |
| 4510 | <dt> |
| 4511 | <span class="term"> |
| 4512 | <code class="option">--exclude-dependent / -x</code> |
| 4513 | </span> |
| 4514 | </dt> |
| 4515 | <dd> |
| 4516 | <p> |
| 4517 | Do not include application-specific images for libraries, kernel modules |
| 4518 | and the kernel. This option only makes sense if the profile session |
| 4519 | used --separate. |
| 4520 | </p> |
| 4521 | </dd> |
| 4522 | <dt> |
| 4523 | <span class="term"> |
| 4524 | <code class="option">--image-path / -p [paths]</code> |
| 4525 | </span> |
| 4526 | </dt> |
| 4527 | <dd> |
| 4528 | <p> |
| 4529 | Comma-separated list of additional paths to search for binaries. |
| 4530 | This is needed to find modules in kernels 2.6 and upwards. |
| 4531 | </p> |
| 4532 | </dd> |
| 4533 | <dt> |
| 4534 | <span class="term"> |
| 4535 | <code class="option">--root / -R [path]</code> |
| 4536 | </span> |
| 4537 | </dt> |
| 4538 | <dd> |
| 4539 | <p> |
| 4540 | A path to a filesystem to search for additional binaries. |
| 4541 | </p> |
| 4542 | </dd> |
| 4543 | <dt> |
| 4544 | <span class="term"> |
| 4545 | <code class="option">--output-directory / -o [directory]</code> |
| 4546 | </span> |
| 4547 | </dt> |
| 4548 | <dd> |
| 4549 | <p> |
Output to the given directory. This option is mandatory; there is no default.
| 4551 | </p> |
| 4552 | </dd> |
| 4553 | <dt> |
| 4554 | <span class="term"> |
| 4555 | <code class="option">--list-files / -l</code> |
| 4556 | </span> |
| 4557 | </dt> |
| 4558 | <dd> |
| 4559 | <p> |
| 4560 | Only list the files that would be archived, don't copy them. |
| 4561 | </p> |
| 4562 | </dd> |
| 4563 | <dt> |
| 4564 | <span class="term"> |
| 4565 | <code class="option">--verbose / -V [options]</code> |
| 4566 | </span> |
| 4567 | </dt> |
| 4568 | <dd> |
| 4569 | <p> |
| 4570 | Give verbose debugging output. |
| 4571 | </p> |
| 4572 | </dd> |
| 4573 | <dt> |
| 4574 | <span class="term"> |
| 4575 | <code class="option">--version / -v</code> |
| 4576 | </span> |
| 4577 | </dt> |
| 4578 | <dd> |
| 4579 | <p> |
| 4580 | Show version. |
| 4581 | </p> |
| 4582 | </dd> |
| 4583 | </dl> |
| 4584 | </div> |
| 4585 | </div> |
| 4586 | </div> |
| 4587 | <div class="sect1" lang="en" xml:lang="en"> |
| 4588 | <div class="titlepage"> |
| 4589 | <div> |
| 4590 | <div> |
| 4591 | <h2 class="title" style="clear: both"><a id="opimport"></a>7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</h2> |
| 4592 | </div> |
| 4593 | </div> |
| 4594 | </div> |
| 4595 | <p> |
| 4596 | This utility converts sample database files from a foreign binary format (abi) to |
| 4597 | the native format. This is useful only when moving sample files between hosts, |
| 4598 | for analysis on platforms other than the one used for collection. The abi format |
| 4599 | of the file to be imported is described in a text file located in <code class="filename">$SESSION_DIR/abi</code>. |
| 4600 | </p> |
| 4601 | <p> |
The following command would convert the input sample file to the
output sample file, using the given abi file as a binary description
of the input file and the current platform abi as a binary description
of the output file.
| 4606 | </p> |
| 4607 | <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| 4608 | <tr> |
| 4609 | <td> |
| 4610 | <pre class="screen"> |
| 4611 | # opimport -a /var/lib/oprofile/abi -o /tmp/current/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all |
| 4612 | </pre> |
| 4613 | </td> |
| 4614 | </tr> |
| 4615 | </table> |
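<p>
Byte order is one example of the kind of difference an abi description can capture. A sample file written
on a host of the opposite endianness stores multi-byte values with their bytes reversed. This C sketch shows
only that byte-order step. The function names are hypothetical. opimport's real conversion is driven by the
abi file and may have to handle other differences, such as type sizes, that are not modelled here.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value (big-endian <-> little-endian). */
static uint64_t swap_u64(uint64_t v)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);  /* move the lowest remaining byte of v into r */
        v >>= 8;
    }
    return r;
}

/* Decode a 64-bit value stored big-endian, whatever the host's byte order. */
static uint64_t load_be_u64(const unsigned char *p)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++)
        r = (r << 8) | p[i];
    return r;
}

/* Decode a 64-bit value stored little-endian, whatever the host's byte order. */
static uint64_t load_le_u64(const unsigned char *p)
{
    uint64_t r = 0;
    int i;

    for (i = 7; i >= 0; i--)
        r = (r << 8) | p[i];
    return r;
}
```
</pre>
</td>
</tr>
</table>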
| 4616 | <div class="sect2" lang="en" xml:lang="en"> |
| 4617 | <div class="titlepage"> |
| 4618 | <div> |
| 4619 | <div> |
| 4620 | <h3 class="title"><a id="opimport-details"></a>7.1. Usage of <span><strong class="command">opimport</strong></span></h3> |
| 4621 | </div> |
| 4622 | </div> |
| 4623 | </div> |
| 4624 | <div class="variablelist"> |
| 4625 | <dl> |
| 4626 | <dt> |
| 4627 | <span class="term"> |
| 4628 | <code class="option">--help / -? / --usage</code> |
| 4629 | </span> |
| 4630 | </dt> |
| 4631 | <dd> |
| 4632 | <p> |
| 4633 | Show help message. |
| 4634 | </p> |
| 4635 | </dd> |
| 4636 | <dt> |
| 4637 | <span class="term"> |
| 4638 | <code class="option">--abi / -a [filename]</code> |
| 4639 | </span> |
| 4640 | </dt> |
| 4641 | <dd> |
| 4642 | <p> |
Location of the abi file describing the format of the input file.
| 4644 | </p> |
| 4645 | </dd> |
| 4646 | <dt> |
| 4647 | <span class="term"> |
| 4648 | <code class="option">--force / -f</code> |
| 4649 | </span> |
| 4650 | </dt> |
| 4651 | <dd> |
| 4652 | <p> |
| 4653 | Force conversion even if the input and output abi are identical. |
| 4654 | </p> |
| 4655 | </dd> |
| 4656 | <dt> |
| 4657 | <span class="term"> |
| 4658 | <code class="option">--output / -o [filename]</code> |
| 4659 | </span> |
| 4660 | </dt> |
| 4661 | <dd> |
| 4662 | <p> |
Specify the output filename. If the output file already exists, it is
not overwritten; instead, the converted data are accumulated into it. Sample filenames carry information
used by the post-profiling tools and must be kept identical; in other words, the part of the pathname
starting at the first path component that contains a '{' must be kept as-is in the
output filename.
| 4668 | </p> |
| 4669 | </dd> |
| 4670 | <dt> |
| 4671 | <span class="term"> |
| 4672 | <code class="option">--verbose / -V</code> |
| 4673 | </span> |
| 4674 | </dt> |
| 4675 | <dd> |
| 4676 | <p> |
| 4677 | Give verbose debugging output. |
| 4678 | </p> |
| 4679 | </dd> |
| 4680 | <dt> |
| 4681 | <span class="term"> |
| 4682 | <code class="option">--version / -v</code> |
| 4683 | </span> |
| 4684 | </dt> |
| 4685 | <dd> |
| 4686 | <p> |
| 4687 | Show version. |
| 4688 | </p> |
| 4689 | </dd> |
| 4690 | </dl> |
| 4691 | </div> |
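<p>
The naming rule for <code class="option">--output</code> can be restated as: keep everything from the start of
the first path component that contains a '{'. This C sketch implements that rule. The function name is
hypothetical, and the sketch only locates the part of the name to keep; it does not build the full output path.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
```c
#include <string.h>

/* Return a pointer to the part of path that starts at the first path
 * component containing '{', or NULL if no component contains one. */
static const char *sample_name_suffix(const char *path)
{
    const char *brace = strchr(path, '{');
    const char *start;

    if (!brace)
        return NULL;
    /* Walk back to the beginning of the component holding the brace. */
    start = brace;
    while (start > path && start[-1] != '/')
        start--;
    return start;
}
```
</pre>
</td>
</tr>
</table>
<p>
A caller could then join the returned suffix onto the destination directory to form the
<code class="option">--output</code> argument.
</p>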
| 4692 | </div> |
| 4693 | </div> |
| 4694 | </div> |
| 4695 | <div class="chapter" lang="en" xml:lang="en"> |
| 4696 | <div class="titlepage"> |
| 4697 | <div> |
| 4698 | <div> |
| 4699 | <h2 class="title"><a id="interpreting"></a>Chapter 5. Interpreting profiling results</h2> |
| 4700 | </div> |
| 4701 | </div> |
| 4702 | </div> |
| 4703 | <div class="toc"> |
| 4704 | <p> |
| 4705 | <b>Table of Contents</b> |
| 4706 | </p> |
| 4707 | <dl> |
| 4708 | <dt> |
| 4709 | <span class="sect1"> |
| 4710 | <a href="#irq-latency">1. Profiling interrupt latency</a> |
| 4711 | </span> |
| 4712 | </dt> |
| 4713 | <dt> |
| 4714 | <span class="sect1"> |
| 4715 | <a href="#kernel-profiling">2. Kernel profiling</a> |
| 4716 | </span> |
| 4717 | </dt> |
| 4718 | <dd> |
| 4719 | <dl> |
| 4720 | <dt> |
| 4721 | <span class="sect2"> |
| 4722 | <a href="#irq-masking">2.1. Interrupt masking</a> |
| 4723 | </span> |
| 4724 | </dt> |
| 4725 | <dt> |
| 4726 | <span class="sect2"> |
| 4727 | <a href="#idle">2.2. Idle time</a> |
| 4728 | </span> |
| 4729 | </dt> |
| 4730 | <dt> |
| 4731 | <span class="sect2"> |
| 4732 | <a href="#kernel-modules">2.3. Profiling kernel modules</a> |
| 4733 | </span> |
| 4734 | </dt> |
| 4735 | </dl> |
| 4736 | </dd> |
| 4737 | <dt> |
| 4738 | <span class="sect1"> |
| 4739 | <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a> |
| 4740 | </span> |
| 4741 | </dt> |
| 4742 | <dt> |
| 4743 | <span class="sect1"> |
| 4744 | <a href="#debug-info">4. Inaccuracies in annotated source</a> |
| 4745 | </span> |
| 4746 | </dt> |
| 4747 | <dd> |
| 4748 | <dl> |
| 4749 | <dt> |
| 4750 | <span class="sect2"> |
| 4751 | <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a> |
| 4752 | </span> |
| 4753 | </dt> |
| 4754 | <dt> |
| 4755 | <span class="sect2"> |
| 4756 | <a href="#prologues">4.2. Prologues and epilogues</a> |
| 4757 | </span> |
| 4758 | </dt> |
| 4759 | <dt> |
| 4760 | <span class="sect2"> |
| 4761 | <a href="#inlined-function">4.3. Inlined functions</a> |
| 4762 | </span> |
| 4763 | </dt> |
| 4764 | <dt> |
| 4765 | <span class="sect2"> |
| 4766 | <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a> |
| 4767 | </span> |
| 4768 | </dt> |
| 4769 | </dl> |
| 4770 | </dd> |
| 4771 | <dt> |
| 4772 | <span class="sect1"> |
| 4773 | <a href="#symbol-without-debug-info">5. Assembly functions</a> |
| 4774 | </span> |
| 4775 | </dt> |
| 4776 | <dt> |
| 4777 | <span class="sect1"> |
| 4778 | <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a> |
| 4779 | </span> |
| 4780 | </dt> |
| 4781 | <dt> |
| 4782 | <span class="sect1"> |
| 4783 | <a href="#hidden-cost">7. Other discrepancies</a> |
| 4784 | </span> |
| 4785 | </dt> |
| 4786 | </dl> |
| 4787 | </div> |
| 4788 | <p> |
The standard caveats of profiling apply in interpreting the results from OProfile:
profile realistic situations, profile different scenarios, profile
for as long a time as possible, avoid system-specific artifacts, and don't trust
the profile data too much. Also bear in mind the comments on the performance
counters above: you <span class="emphasis"><em>cannot</em></span> rely on totally accurate
instruction-level profiling. However, in almost all circumstances the data
can be useful. Ideally a utility such as Intel's VTUNE would be available to
allow careful instruction-level analysis; go hassle Intel for this, not me ;)
| 4797 | </p> |
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="irq-latency"></a>1. Profiling interrupt latency</h2>
</div>
</div>
</div>
<p>
This is an example of how the latency of delivery of profiling interrupts
can impact the reliability of the profiling data. This is pretty much a
worst-case example: such problems are fairly rare.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
double fun(double a, double b, double c)
{
        double result = 0;
        for (int i = 0 ; i &lt; 10000; ++i) {
                result += a;
                result *= b;
                result /= c;
        }
        return result;
}
</pre>
</td>
</tr>
</table>
<p>
Here the last operation in the loop body, the floating-point division, is very costly, and you
would expect the profile to reflect that. However, the annotated assembly (cut down to the
instructions inside the loop) shows:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
$ opannotate -a -t 10 ./a.out

    88 15.38%   : 8048337:       fadd   %st(3),%st
    48 8.391%   : 8048339:       fmul   %st(2),%st
    68 11.88%   : 804833b:       fdiv   %st(1),%st
   368 64.33%   : 804833d:       inc    %eax
                : 804833e:       cmp    $0x270f,%eax
                : 8048343:       jle    8048337
</pre>
</td>
</tr>
</table>
<p>
The problem comes from the x86 hardware. When the counter overflows, the IRQ
is asserted, but several hardware features can delay the resulting NMI:
x86 hardware is synchronous (i.e. it cannot be interrupted in the middle of an instruction),
there is a latency between the IRQ being asserted and the interrupt being taken, and the multiple
execution units and out-of-order execution of modern x86 CPUs add further delay. Here is the same
function, annotated at the source level:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
$ opannotate -s -t 10 ./a.out

                :double fun(double a, double b, double c)
                :{ /* _Z3funddd total:    572 100.0% */
                :        double result = 0;
   368 64.33%   :        for (int i = 0 ; i &lt; 10000; ++i) {
    88 15.38%   :                result += a;
    48 8.391%   :                result *= b;
    68 11.88%   :                result /= c;
                :        }
                :        return result;
                :}
</pre>
</td>
</tr>
</table>
<p>
The conclusion: don't trust samples coming at the end of a loop,
particularly if the last instruction generated by the compiler is costly. The same
can happen around branches. Always bear in mind that samples
can be delayed by a few cycles from their real position. That's a hardware
problem and OProfile can do nothing about it.
</p>
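<p>
The advice above can be turned into a rough reading aid for <span><strong class="command">opannotate -a</strong></span>
output. The following Python sketch is a heuristic, not an OProfile feature: given
(address, samples, instruction) rows copied from the loop above, it reports the instruction
just before the hottest one, which is often where the time was really spent. The helper
name <code class="function">likely_culprit()</code> is hypothetical, and the one-instruction
shift is an assumption; the real delay can span more than one instruction.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Heuristic sketch (hypothetical helper, not part of OProfile): profiling
# interrupts arrive late, so samples tend to land on the instruction after
# the one that actually cost the time. Report the instruction just before
# the hottest one as the likely culprit.

def likely_culprit(rows):
    """rows: non-empty list of (address, samples, instruction), in address order."""
    hottest = max(range(len(rows)), key=lambda i: rows[i][1])
    if hottest == 0:
        # Nothing precedes the first row in the list; report it as-is.
        return rows[0][0], rows[0][2]
    address, _, instruction = rows[hottest - 1]
    return address, instruction


# Rows copied from the "opannotate -a" output above.
loop = [
    ("8048337", 88, "fadd %st(3),%st"),
    ("8048339", 48, "fmul %st(2),%st"),
    ("804833b", 68, "fdiv %st(1),%st"),
    ("804833d", 368, "inc %eax"),
    ("804833e", 0, "cmp $0x270f,%eax"),
    ("8048343", 0, "jle 8048337"),
]
print(likely_culprit(loop))  # ('804833b', 'fdiv %st(1),%st')
</pre>
</td>
</tr>
</table>
<p>
Here it points at the division, matching the reasoning above. Treat the result as a hint to
check against the source, not as a correction of the profile.
</p>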
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="kernel-profiling"></a>2. Kernel profiling</h2>
</div>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="irq-masking"></a>2.1. Interrupt masking</h3>
</div>
</div>
</div>
<p>
OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4,
Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the
Linux kernel where interrupts are disabled, allowing collection of samples in virtually
all executable code. The RTC, timer interrupt mode, and Itanium 2 collection mechanisms
use maskable interrupts. Thus, these data collection mechanisms have "sample
shadows", or blind spots: regions where no samples will be collected. Typically, the samples
will be attributed to the code immediately after the interrupts are re-enabled.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idle"></a>2.2. Idle time</h3>
</div>
</div>
</div>
<p>
Your kernel is likely to halt the processor when a CPU is idle. As
typical hardware events such as <code class="constant">CPU_CLK_UNHALTED</code> do not
count while the CPU is halted, the kernel profile will not reflect the actual
amount of time spent idle. You can change this behaviour by booting with
the <code class="option">idle=poll</code> option, which selects an idle routine that polls
instead of halting. This routine will appear as <code class="function">poll_idle()</code> in your kernel profile.
</p>
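<p>
To check whether a running Linux kernel was actually booted with this option, you can look at
its command line, which Linux exposes as <code class="filename">/proc/cmdline</code>. A minimal
Python sketch; both function names are hypothetical, and the check only looks for the exact
parameter <code class="option">idle=poll</code>:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Minimal sketch: does a kernel command line select the polling idle
# routine? On Linux, the running kernel's command line is in /proc/cmdline.

def uses_idle_poll(cmdline):
    """True if idle=poll is one of the whitespace-separated boot parameters."""
    return "idle=poll" in cmdline.split()


def running_kernel_uses_idle_poll(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only) and check it."""
    with open(path) as f:
        return uses_idle_poll(f.read())


print(uses_idle_poll("ro root=/dev/hda1 idle=poll"))  # True
print(uses_idle_poll("ro root=/dev/hda1 quiet"))      # False
</pre>
</td>
</tr>
</table>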
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="kernel-modules"></a>2.3. Profiling kernel modules</h3>
</div>
</div>
</div>
<p>
OProfile profiles kernel modules by default. However, there are a couple of problems
you may run into when trying to get results. First, you may have booted via an initrd;
this means that the actual path for the module binaries cannot be determined automatically.
To get around this, you can use the <code class="option">-p</code> option to the profiling tools
to specify where to look for the kernel modules.
</p>
<p>
In 2.6 kernels, the information on where kernel module binaries are located has been removed.
This means OProfile must be told where to find your modules with the <code class="option">-p</code>
option. Normally, you can just use your standard top-level module directory for this.
Note that because of this, OProfile cannot check that the modification times match;
it is your responsibility to make sure you do not modify a binary after a profile
has been created.
</p>
<p>
If you have run <span><strong class="command">insmod</strong></span> or <span><strong class="command">modprobe</strong></span> to insert a module
from a particular directory, it is important that you list this directory first in the
<code class="option">-p</code> option, so that it overrides an older module binary that might
exist in other directories you've specified with <code class="option">-p</code>. It is up to you
to make sure that these values are correct: 2.6 kernels simply do not provide enough
information for OProfile to determine this itself.
</p>
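<p>
The ordering rule above is easy to get wrong by hand. The following Python sketch builds the
value for <code class="option">-p</code>, putting the directory the module was loaded from
first and dropping duplicates. It assumes <code class="option">-p</code> (long form
<code class="option">--image-path</code>) takes a comma-separated list of directories, as the
<span><strong class="command">opreport</strong></span> manual page describes; check the manual page of your
version. The helper name and all paths are hypothetical.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Sketch: build the value for the -p (--image-path) option. Assumes the
# option takes a comma-separated list; the directory you loaded the module
# from goes first so its binary wins over older copies found later.

def image_path_arg(dirs):
    seen = set()
    ordered = []
    for d in dirs:
        d = d.rstrip("/") or "/"       # treat "dir/" and "dir" as the same
        if d not in seen:              # keep only the first occurrence
            seen.add(d)
            ordered.append(d)
    return ",".join(ordered)


# Hypothetical paths: a module built in a home directory, then the
# standard module tree.
arg = image_path_arg([
    "/home/me/mymodule/",
    "/lib/modules/2.6.9/kernel",
    "/home/me/mymodule",
])
print(arg)  # /home/me/mymodule,/lib/modules/2.6.9/kernel
</pre>
</td>
</tr>
</table>
<p>
The result would then be passed on the command line, for example
<span><strong class="command">opreport -l -p /home/me/mymodule,/lib/modules/2.6.9/kernel</strong></span>.
</p>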
</div>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="interpreting-callgraph"></a>3. Interpreting call-graph profiles</h2>
</div>
</div>
</div>
<p>
Sometimes the results from call-graph profiles may differ from what
you expect to see. The first thing to check is whether the target
binaries were compiled with frame pointers enabled (if the binary was
compiled using <span><strong class="command">gcc</strong></span>'s
<code class="option">-fomit-frame-pointer</code> option, you will not get
meaningful results). Note that as of this writing, the GCC developers
plan to disable frame pointers by default. The Linux kernel is built
without frame pointers by default; there is a configuration option you
can use to turn them on under the "Kernel Hacking" menu.
</p>
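<p>
If you have the compiler flags a binary was built with, the frame-pointer setting can be
checked mechanically. This Python sketch relies on gcc's rule that when both the positive and
the negative form of an <code class="option">-f</code> option are given, the last one takes
effect. If neither form appears it returns <code class="constant">None</code>: the compiler's
default then applies, which depends on the target, the optimization level and the gcc version.
The helper name is hypothetical.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Sketch: effective frame-pointer choice from an explicit gcc flag list.
# True  -> frame pointers omitted (call graphs unreliable)
# False -> frame pointers kept
# None  -> not specified; the compiler default applies

def omits_frame_pointer(flags):
    setting = None
    for flag in flags:                 # later flags override earlier ones
        if flag == "-fomit-frame-pointer":
            setting = True
        elif flag == "-fno-omit-frame-pointer":
            setting = False
    return setting


print(omits_frame_pointer(["-O2", "-fomit-frame-pointer", "-g"]))               # True
print(omits_frame_pointer(["-fomit-frame-pointer", "-fno-omit-frame-pointer"]))  # False
print(omits_frame_pointer(["-O2", "-g"]))                                       # None
</pre>
</td>
</tr>
</table>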
<p>
Often you may see a caller of a function that does not actually call it
directly (e.g. if <code class="function">a()</code>
calls <code class="function">b()</code>, which in turn calls
<code class="function">c()</code>, you may see an entry for
<code class="function">a()-&gt;c()</code>). What's actually occurring is that we
are taking samples at the very start (or the very end) of
<code class="function">c()</code>; at these few instructions, the new function's
frame has not yet been created (or has already been torn down), so it appears as if
<code class="function">a()</code> is calling directly into
<code class="function">c()</code>. Be careful not to be misled by these
entries.
</p>
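<p>
One way to spot such entries is to compare the reported caller/callee pairs with the calls the
source actually makes. The following Python sketch flags reported pairs that are not direct calls
but are reachable through exactly one intermediate function, the
<code class="function">a()-&gt;b()-&gt;c()</code> case described above. Both inputs are
hypothetical sets of (caller, callee) names that you would have to extract yourself; OProfile
does not produce them in this form.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Sketch: flag reported caller->callee pairs that are not direct calls in
# the static call graph but go through one intermediate function, e.g.
# a() -> b() -> c() reported as a() -> c().

def skipped_frame_edges(observed, static_calls):
    suspicious = []
    for caller, callee in sorted(observed):
        if (caller, callee) in static_calls:
            continue                   # a genuine direct call
        middles = sorted(mid for (src, mid) in static_calls
                         if src == caller and (mid, callee) in static_calls)
        if middles:
            suspicious.append((caller, callee, middles))
    return suspicious


static = {("a", "b"), ("b", "c")}                # what the source calls
observed = {("a", "b"), ("b", "c"), ("a", "c")}  # what the profile reports
print(skipped_frame_edges(observed, static))     # [('a', 'c', ['b'])]
</pre>
</td>
</tr>
</table>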
<p>
Like the rest of OProfile, call-graph profiling uses a statistical
approach; this means that sometimes a backtrace sample is truncated, or
even partially wrong. Bear this in mind when examining results.
</p>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="debug-info"></a>4. Inaccuracies in annotated source</h2>
</div>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="effect-of-optimizations"></a>4.1. Side effects of optimizations</h3>
</div>
</div>
</div>
<p>
The compiler can introduce some pitfalls in the annotated source output.
The optimizer can move pieces of code in such a manner that two lines of code
are interleaved (instruction scheduling). Also, the debug info generated by the compiler
can show strange behavior. This is especially true for complex expressions, e.g. inside
an if statement:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
        if (a &amp;&amp;
            b &amp;&amp;
            c) {
                /* ... */
        }
</pre>
</td>
</tr>
</table>
<p>
Here the problem comes from the line-number information. The available debug
info does not give enough detail for the if condition, so all samples are
accumulated at the line of the closing parenthesis of the condition. Using
<span><strong class="command">opannotate <code class="option">-a</code></strong></span> can help to show the real
samples at the assembly level.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="prologues"></a>4.2. Prologues and epilogues</h3>
</div>
</div>
</div>
<p>
The compiler generally needs to generate "glue" code across function calls, dependent
on the particular function call conventions used. Additionally, other work
needs to happen, such as adjusting the stack pointer for local variables; this
code is known as the function prologue. Similar code is needed at function return,
and is known as the function epilogue. This will show up in annotations as
samples at the very start and end of a function, where there is no apparent
executable code in the source.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="inlined-function"></a>4.3. Inlined functions</h3>
</div>
</div>
</div>
<p>
You may see that a function is credited with a certain number of samples, but
the listing does not add up to the correct total. To take a real example:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
                :internal_sk_buff_alloc_security(struct sk_buff *skb)
   353 2.342%   :{ /* internal_sk_buff_alloc_security total:   1882 12.48% */
                :
                :        sk_buff_security_t *sksec;
    15 0.0995%  :        int rc = 0;
                :
    10 0.06633% :        sksec = skb-&gt;lsm_security;
   468 3.104%   :        if (sksec &amp;&amp; sksec-&gt;magic == DSI_MAGIC) {
                :                goto out;
                :        }
                :
                :        sksec = (sk_buff_security_t *) get_sk_buff_memory(skb);
     3 0.0199%  :        if (!sksec) {
    38 0.2521%  :                rc = -ENOMEM;
                :                goto out;
    10 0.06633% :        }
                :        memset(sksec, 0, sizeof (sk_buff_security_t));
    44 0.2919%  :        sksec-&gt;magic = DSI_MAGIC;
    32 0.2123%  :        sksec-&gt;skb = skb;
    45 0.2985%  :        sksec-&gt;sid = DSI_SID_NORMAL;
    31 0.2056%  :        skb-&gt;lsm_security = sksec;
                :
                :      out:
                :
   146 0.9685%  :        return rc;
                :
    98 0.6501%  :}
</pre>
</td>
</tr>
</table>
<p>
Here, the function is credited with 1,882 samples, but the per-line counts in the
listing add up to only 1,293. This is usually because of inline functions:
the compiler marks such code with debug entries for the inline function
definition, and this is where <span><strong class="command">opannotate</strong></span> annotates
such samples. In the case above, <code class="function">memset</code> is the most
likely candidate for this problem. Examining the mixed source/assembly
output can help identify such results.
</p>
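<p>
The mismatch can be measured rather than eyeballed. This Python sketch sums the per-line counts of an
<span><strong class="command">opannotate -s</strong></span> listing and reads the function total from
its header comment. It assumes that, as in the listing above, each sampled line starts with the sample
count and a percentage before the first colon; the helper names are hypothetical.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Sketch: compare a function's total with the sum of its per-line counts
# in "opannotate -s" output. Assumes lines shaped like
#   "   353 2.342%   :source text"   (sampled line)
#   "                :source text"   (no samples)

def listed_samples(listing):
    total = 0
    for line in listing.splitlines():
        fields = line.split(":", 1)[0].split()   # text before the first colon
        if fields:
            total += int(fields[0])              # the sample count
    return total


def function_total(listing):
    """Return the N from the "total: N" header comment, or None."""
    for line in listing.splitlines():
        if "total:" in line:
            return int(line.split("total:", 1)[1].split()[0])
    return None


excerpt = """\
                :internal_sk_buff_alloc_security(struct sk_buff *skb)
   353 2.342%   :{ /* internal_sk_buff_alloc_security total:   1882 12.48% */
    15 0.0995%  :        int rc = 0;
                :                goto out;
"""
print(function_total(excerpt), listed_samples(excerpt))  # 1882 368
</pre>
</td>
</tr>
</table>
<p>
Applied to the full listing above, the two functions give 1,882 and 1,293 respectively, so
589 samples are not shown against any line of the function.
</p>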
<p>
This problem is more visible when there is no source file available. In the
following example, the per-symbol totals add up to 125 samples, more than the
109 samples credited to the file itself. The difference must be attributed
to inline functions defined elsewhere.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
/*
 * Total samples for file : "arch/i386/kernel/process.c"
 *
 *    109  2.4616
 */

/* default_idle total:     84  1.8970 */
/* cpu_idle total:     21  0.4743 */
/* flush_thread total:      1  0.0226 */
/* prepare_to_copy total:      1  0.0226 */
/* __switch_to total:     18  0.4065 */
</pre>
</td>
</tr>
</table>
<p>
The missing samples are not lost; they are credited to the source
location where the inlined function is defined. Samples from every call
site of an inlined function are merged in one place in the annotated
source file, so there is no way to see which call site the samples
for an inlined function came from.
</p>
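<p>
The same kind of check works on a per-file summary like the one above. This Python sketch extracts the
file total and the sum of the per-symbol totals; the parsing assumes exactly the comment layout shown,
and the helper name is hypothetical.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Sketch: compare the per-file total with the sum of the per-symbol totals
# in an opannotate summary comment laid out like the example above.

def file_and_symbol_totals(summary):
    file_total = None
    symbol_sum = 0
    for raw in summary.splitlines():
        line = raw.strip().lstrip("/*").strip()   # drop comment markers
        fields = line.split()
        if "total:" in fields:
            # "/* name total: N PCT */": N follows the "total:" token
            symbol_sum += int(fields[fields.index("total:") + 1])
        elif file_total is None and fields and fields[0].isdigit():
            # first bare " * N PCT" line: the file total
            file_total = int(fields[0])
    return file_total, symbol_sum


SUMMARY = """\
/*
 * Total samples for file : "arch/i386/kernel/process.c"
 *
 *    109  2.4616
 */

/* default_idle total:     84  1.8970 */
/* cpu_idle total:     21  0.4743 */
/* flush_thread total:      1  0.0226 */
/* prepare_to_copy total:      1  0.0226 */
/* __switch_to total:     18  0.4065 */
"""
print(file_and_symbol_totals(SUMMARY))  # (109, 125)
</pre>
</td>
</tr>
</table>
<p>
The 16-sample difference is the part credited to source lines outside this file, such as the
definitions of inlined functions.
</p>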
<p>
When running <span><strong class="command">opannotate</strong></span>, you may get a warning
"some functions compiled without debug information may have incorrect source line attributions".
In some rare cases, OProfile is not able to verify that the derived source line
is correct (when some parts of the binary image are compiled without debugging
information). Be wary of results if this warning appears.
</p>
<p>
Furthermore, for some languages the compiler can implicitly generate functions,
such as default copy constructors. Such functions are labelled by the compiler
as having a line number of 0, which means the source annotation can be confusing.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="wrong-linenr-info"></a>4.4. Inaccuracy in line number information</h3>
</div>
</div>
</div>
<p>
Depending on your compiler, you may run into the following problem:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
struct big_object { int a[500]; };

int main()
{
        big_object a, b;
        for (int i = 0 ; i != 1000 * 1000; ++i)
                b = a;
        return 0;
}
</pre>
</td>
</tr>
</table>
<p>
Compiled with <span><strong class="command">gcc</strong></span> 3.0.4, the annotated source is clearly inaccurate:
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
                :int main()
                :{  /* main total: 7871 100% */
                :        big_object a, b;
                :        for (int i = 0 ; i != 1000 * 1000; ++i)
                :                b = a;
  7871 100%     :        return 0;
                :}
</pre>
</td>
</tr>
</table>
<p>
The problem here is distinct from the IRQ latency problem: the debug line number
information is not precise enough. Again, looking at the output of <span><strong class="command">opannotate -as</strong></span> can help.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
                :int main()
                :{
                :        big_object a, b;
                :        for (int i = 0 ; i != 1000 * 1000; ++i)
                : 80484c0:       push   %ebp
                : 80484c1:       mov    %esp,%ebp
                : 80484c3:       sub    $0xfac,%esp
                : 80484c9:       push   %edi
                : 80484ca:       push   %esi
                : 80484cb:       push   %ebx
                :                b = a;
                : 80484cc:       lea    0xfffff060(%ebp),%edx
                : 80484d2:       lea    0xfffff830(%ebp),%eax
                : 80484d8:       mov    $0xf423f,%ebx
                : 80484dd:       lea    0x0(%esi),%esi
                :        return 0;
     3 0.03811% : 80484e0:       mov    %edx,%edi
                : 80484e2:       mov    %eax,%esi
     1 0.0127%  : 80484e4:       cld
     8 0.1016%  : 80484e5:       mov    $0x1f4,%ecx
  7850 99.73%   : 80484ea:       repz movsl %ds:(%esi),%es:(%edi)
     9 0.1143%  : 80484ec:       dec    %ebx
                : 80484ed:       jns    80484e0
                : 80484ef:       xor    %eax,%eax
                : 80484f1:       pop    %ebx
                : 80484f2:       pop    %esi
                : 80484f3:       pop    %edi
                : 80484f4:       leave
                : 80484f5:       ret
</pre>
</td>
</tr>
</table>
<p>
So here it's clear that copying is correctly credited with all the samples, but the
line number information is misplaced. <span><strong class="command">objdump -dS</strong></span> exposes the
same problem. Note that maintaining accurate debug information is difficult for compilers when optimizing, so this problem is not surprising.
The accuracy of debug information also depends on the binutils version used; some BFD library versions
contain a work-around for known problems of <span><strong class="command">gcc</strong></span>, some others do not. This is unfortunate but we must live with it,
since profiling is pointless when you disable optimisation (which would give better debugging entries).
</p>
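<p>
A quick worked check of the constants in the disassembly confirms that the hot loop really is the
structure copy. This Python snippet only does the arithmetic; it assumes a 4-byte
<code class="type">int</code>, as on i386.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Worked check of the constants in the "opannotate -as" output above.
# Assumes a 4-byte int, as on i386.

INT_SIZE = 4

elements = 0x1f4                    # %ecx: dwords moved by each "repz movsl"
assert elements == 500              # matches int a[500]

bytes_per_copy = elements * INT_SIZE
assert bytes_per_copy == 2000       # sizeof(big_object) with a 4-byte int

# %ebx starts at 0xf423f; "dec %ebx / jns" repeats the copy until %ebx goes
# negative, so the copy runs once for every value from 0xf423f down to 0.
iterations = 0xf423f + 1
assert iterations == 1000 * 1000    # the loop bound in the source

print(iterations * bytes_per_copy)  # 2000000000 bytes copied in total
</pre>
</td>
</tr>
</table>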
</div>
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="symbol-without-debug-info"></a>5. Assembly functions</h2>
</div>
</div>
</div>
<p>
Often the assembler cannot generate debug information automatically.
This means that you cannot get a source report unless
you manually define the necessary debug information; read your assembler documentation for how you might
do that. The only
debugging info currently needed by OProfile is the line-number/filename-VMA association. When profiling assembly
without debugging info, you can always get a report for symbols, and optionally for VMAs, through <span><strong class="command">opreport -l</strong></span>
or <span><strong class="command">opreport -d</strong></span>, but this works only for symbols with the right attributes.
For <span><strong class="command">gas</strong></span> you can get this with
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
.globl foo
.type foo,@function
</pre>
</td>
</tr>
</table>
<p>
whilst for <span><strong class="command">nasm</strong></span> you must use
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
GLOBAL foo:function ; [1]
</pre>
</td>
</tr>
</table>
<p>
Note that OProfile does not need the global attribute, only the function attribute.
</p>
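<p>
In a larger <span><strong class="command">gas</strong></span> source it is easy to miss a label. The following Python
sketch lists labels that are defined without a matching <code class="literal">.type NAME,@function</code>
directive. It is deliberately simplistic (one statement per line, x86-style <code class="literal">#</code>
comments, no macros, and labels starting with a dot treated as local), and the helper name is hypothetical.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Sketch: list labels in a gas source that lack ".type NAME,@function",
# the attribute OProfile needs. Simplified: one statement per line,
# "#" comments (x86 gas), labels starting with "." treated as local.

def labels_missing_function_type(asm_text):
    defined = []
    typed = set()
    for raw in asm_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if line.startswith(".type"):
            name, _, kind = line[len(".type"):].partition(",")
            if kind.strip() == "@function":
                typed.add(name.strip())
        elif line.endswith(":") and not line.startswith("."):
            defined.append(line[:-1])            # "foo:" defines foo
    return [name for name in defined if name not in typed]


SRC = """\
        .globl foo
        .type  foo,@function
foo:
        ret
bar:                        # no .type directive
        ret
"""
print(labels_missing_function_type(SRC))  # ['bar']
</pre>
</td>
</tr>
</table>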
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="overlapping-symbols"></a>6. Overlapping symbols in JITed code</h2>
</div>
</div>
</div>
<p>
Some virtual machines (e.g., Java) may re-JIT a method, causing space previously
allocated for a piece of compiled code to be reused. This means that, at one distinct
code address, multiple symbols/methods may be present during the run time of the application.
</p>
<p>
Since OProfile samples are buffered and don't have timing information, there is no way
to correlate samples with the (possibly) varying address ranges in which the code for a symbol
may reside.
An alternative would be flushing the OProfile sampling buffer when we get an unload event,
but this could result in high overhead.
</p>
<p>
To mitigate the problem of overlapping symbols, OProfile tries to select the symbol that was
present at this address range most of the time. Additionally, other overlapping symbols
are truncated in the overlapping area.
This gives reasonable results, because in reality, address reuse typically takes place
during phase changes of the application, in particular during application startup.
Thus, for optimum profiling results, start the sampling session after application startup
and burn-in.
</p>
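<p>
The selection and truncation policy described above can be modelled in a few lines. This Python sketch is a
simplified illustration, not OProfile's actual implementation: each hypothetical record gives a symbol,
the start and end of the address range it occupied, and how long it stayed there. Longer-lived symbols are
placed first and keep their whole range; later ones are cut back to the part that does not overlap. When a
longer-lived symbol sits strictly inside another one, this model keeps only the part below it.
</p>
<table xmlns="" border="0" style="background: #E0E0E0;" width="90%">
<tr>
<td>
<pre class="screen">
# Simplified model (not OProfile's code) of "keep the symbol present for
# most of the time, truncate the others in the overlapping area".
# Each record: (symbol, start, end, lifetime); ranges are [start, end).

def resolve_overlaps(records):
    kept = []
    # Longest-lived first, so it wins every overlap it takes part in.
    for sym, start, end, lifetime in sorted(records, key=lambda r: -r[3]):
        for _, k_start, k_end in kept:
            if start >= k_end or k_start >= end:
                continue                    # no overlap with this kept range
            if k_start > start:
                end = min(end, k_start)     # keep only the part below the winner
            else:
                start = max(start, k_end)   # keep only the part above the winner
        if end > start:                     # drop symbols truncated to nothing
            kept.append((sym, start, end))
    return sorted(kept, key=lambda r: r[1])


# Hypothetical JIT history: two methods that shared part of an address range.
records = [
    ("Foo.bar()", 0x1000, 0x1100, 5),    # short-lived, e.g. during startup
    ("Baz.qux()", 0x1080, 0x1200, 60),   # present for most of the run
]
for sym, start, end in resolve_overlaps(records):
    print(sym, hex(start), hex(end))
# Foo.bar() 0x1000 0x1080
# Baz.qux() 0x1080 0x1200
</pre>
</td>
</tr>
</table>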
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="hidden-cost"></a>7. Other discrepancies</h2>
</div>
</div>
</div>
<p>
Another cause of apparent problems is the hidden cost of instructions. A very
common example is two memory reads, one served from the L1 cache and the other from main memory:
the second memory read is likely to have more samples.
There are many other causes of hidden instruction costs. A non-exhaustive
list: mispredicted branches, TLB misses, partial register stalls,
partial register dependencies, memory mismatch stalls, and re-executed µops. If you want to write
programs at the assembly level, be sure to take a look at the Intel and
AMD documentation at <a href="http://developer.intel.com/">http://developer.intel.com/</a>
and <a href="http://developer.amd.com/devguides.jsp/">http://developer.amd.com/devguides.jsp</a>.
</p>
</div>
</div>
<div class="chapter" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title"><a id="ack"></a>Chapter 6. Acknowledgments</h2>
</div>
</div>
</div>
<p>
Thanks to (in no particular order): Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie,
Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu,
Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh,
Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer,
Maynard P. Johnson,
Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro".
</p>
</div>
</div>
</body>
</html>