.TH profile 8 "2016-07-17" "USER COMMANDS"
.SH NAME
profile \- Profile CPU usage by sampling stack traces. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B profile [\-adfh] [\-p PID] [\-U | \-K] [\-F FREQUENCY]
.B [\-\-stack\-storage\-size COUNT] [duration]
.SH DESCRIPTION
This is a CPU profiler. It works by taking samples of stack traces at timed
intervals. It will help you understand and quantify CPU usage: which code is
executing, and by how much, including both user-level and kernel code.

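By default this samples at 49 Hertz (samples per second), across all CPUs.
This frequency can be tuned using the \-F option. The reason for 49, and not
50, is to avoid lock-step sampling: activity that occurs at a regular interval
(for example, driven by a 50 or 100 Hertz timer) could otherwise be sampled at
the same point in every cycle, and so be consistently over- or
under-represented.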
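.PP
As a rough illustration of lock-step sampling, the following Python sketch (a
toy simulation, not part of this tool) models a hypothetical task that runs for
the first 2 ms of every 20 ms. Sampled at 50 Hertz, every sample lands at the
same point in the task's cycle, so the task appears to use either all or none
of the CPU depending on the starting phase. At 49 Hertz the sample phase
drifts, and the sampled fraction approaches the true 10%.
.nf
```python
# Toy simulation (not part of profile): a hypothetical task that is on-CPU
# for the first 2 ms of every 20 ms, as if driven by its own 50 Hertz timer.
PERIOD_US = 20_000
BUSY_US = 2_000


def busy_fraction(sample_hz, seconds=10, offset_us=0):
    """Fraction of timed samples that land while the task is on-CPU."""
    n = sample_hz * seconds
    hits = 0
    for k in range(n):
        t_us = offset_us + k * 1_000_000 // sample_hz  # time of sample k
        if t_us % PERIOD_US < BUSY_US:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # At 50 Hertz every sample hits the same phase of the task's cycle, so the
    # result is all-or-nothing depending on the starting offset.
    print(busy_fraction(50, offset_us=0), busy_fraction(50, offset_us=5_000))
    # At 49 Hertz the sample phase drifts, approximating the true 10% duty cycle.
    print(round(busy_fraction(49), 3))
```
.fi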

This is also an efficient profiler, as stack traces are frequency counted in
kernel context, rather than passing each stack to user space for frequency
counting there. Only the unique stacks and counts are passed to user space
at the end of the profile, greatly reducing the kernel<->user transfer.
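.PP
The following Python sketch is a toy model of this approach, not bcc's
implementation. Samples are counted in a map keyed by stack, so the number of
records passed to user space equals the number of unique stacks rather than the
number of samples. The workload and frame names are hypothetical.
.nf
```python
from collections import Counter

# Toy model (not bcc's implementation): each timer sample yields a stack key.
# Rather than emitting one record per sample, counts are kept in a map keyed
# by the stack, and only the (stack, count) pairs are read at the end.


def aggregate(samples):
    """Frequency-count stack samples; return the unique-stack counts."""
    counts = Counter()
    for stack in samples:
        counts[tuple(stack)] += 1  # in-kernel, this would be a map increment
    return counts


if __name__ == "__main__":
    # Hypothetical workload: 1000 samples drawn from only three distinct stacks.
    samples = (
        [("main", "parse", "read")] * 700
        + [("main", "compute")] * 250
        + [("main", "write")] * 50
    )
    counts = aggregate(samples)
    print(f"{len(samples)} samples -> {len(counts)} records transferred")
    for stack, n in counts.most_common():
        print(";".join(stack), n)
```
.fi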
.SH REQUIREMENTS
CONFIG_BPF and bcc.

This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See tools/old
for an older version that may work on Linux 4.6 - 4.8.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p PID
Trace this process ID only (filtered in-kernel). Without this, all processes on
all CPUs are profiled.
.TP
\-F frequency
Frequency to sample stacks, in Hertz (default 49).
.TP
\-f
Print output in folded stack format.
.TP
\-d
Include an output delimiter between kernel and user stacks (either "--", or,
in folded mode, "-").
.TP
\-U
Show stacks from user space only (no kernel space stacks).
.TP
\-K
Show stacks from kernel space only (no user space stacks).
.TP
\-\-stack\-storage\-size COUNT
The maximum number of unique stack traces that the kernel will count (default
10240). If the number of unique sampled stacks exceeds this, a warning is
printed.
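.IP
A simplified Python model of this limit, for illustration only. The class and
its names are hypothetical, not bcc's API. The real kernel stack-trace map is
hash-indexed, so it can also fail to store a stack on a hash collision, which
this model ignores.
.nf
```python
# Toy model (hypothetical, not bcc's API) of a fixed-capacity stack table, as
# sized by --stack-storage-size. New unique stacks beyond the capacity cannot
# be stored; they are counted as dropped so a warning can be printed.


class BoundedStackStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.ids = {}      # stack tuple -> stack id
        self.dropped = 0   # samples whose new stack did not fit

    def get_id(self, stack):
        """Return an id for the stack, or -1 if the table is full."""
        key = tuple(stack)
        if key in self.ids:
            return self.ids[key]
        if len(self.ids) >= self.capacity:
            self.dropped += 1
            return -1
        self.ids[key] = len(self.ids)
        return self.ids[key]


if __name__ == "__main__":
    store = BoundedStackStore(capacity=2)
    for s in (["a", "b"], ["a", "c"], ["a", "b"], ["a", "d"]):
        store.get_id(s)
    if store.dropped:
        print(f"WARNING: {store.dropped} stack(s) dropped; increase capacity")
```
.fi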
.TP
duration
Duration to profile, in seconds. Without this, profiling continues until
Ctrl-C.
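.PP
For reference, a folded-format line can be sketched in Python as follows. The
assumed layout is: process name, then user frames outermost first, then the
"-" delimiter when \-d is used, then kernel frames, then a space and the sample
count. Check the tool's source for the exact format. The frame and process
names are hypothetical.
.nf
```python
# Sketch of one folded-format line, assuming a layout of: process name, user
# frames (outermost first), an optional "-" delimiter (the -d option), kernel
# frames (outermost first), then a space and the sample count.


def folded_line(comm, user_stack, kernel_stack, count, delimiter=False):
    """Stacks are given leaf-first, as a stack walk returns them."""
    frames = [comm]
    frames.extend(reversed(user_stack))
    if delimiter and user_stack and kernel_stack:
        frames.append("-")
    frames.extend(reversed(kernel_stack))
    return ";".join(frames) + " " + str(count)


if __name__ == "__main__":
    # Hypothetical frames: a read() system call made from main().
    user = ["read", "main"]
    kernel = ["vfs_read", "ksys_read", "entry_SYSCALL_64"]
    print(folded_line("myapp", user, kernel, 12, delimiter=True))
```
.fi
.PP
Lines in this form can be passed directly to flame graph software, which
splits each line on ";" and uses the trailing number as the frame width.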
.SH EXAMPLES
.TP
Profile (sample) stack traces system-wide at 49 Hertz (samples per second) until Ctrl-C:
#
.B profile
.TP
Profile for 5 seconds only:
#
.B profile 5
.TP
Profile at 99 Hertz for 5 seconds only:
#
.B profile \-F 99 5
.TP
Profile PID 181 only:
#
.B profile \-p 181
.TP
Profile for 5 seconds and output in folded stack format (suitable as input for flame graphs), including a delimiter between kernel and user stacks:
#
.B profile \-df 5
.TP
Profile kernel stacks only:
#
.B profile \-K
.SH DEBUGGING
See "[unknown]" frames with bogus addresses? This can happen for several
reasons. The best approach is to first get Linux perf working, and check for
unknown frames there (e.g., "perf record \-F 49 \-a \-g \-\- sleep 1; perf script"),
and then to try this tool.

The most common reason for "[unknown]" frames is that the target software has
not been compiled with frame pointers, so this simple method of walking the
stack cannot be used. The fix in that case is to use software that has been
built with frame pointers, e.g., gcc \-fno\-omit\-frame\-pointer, or Java's
\-XX:+PreserveFramePointer.
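.PP
A simplified Python model of frame-pointer stack walking may help show why
missing frame pointers break it. It assumes the x86-64 convention where the
saved frame pointer is at [fp] and the return address at [fp+8], and it uses a
hypothetical dict in place of the target's stack memory. If a function was
compiled without frame pointers, the register no longer points at a saved
frame, so the walk stops early or follows bogus addresses, which then appear as
"[unknown]".
.nf
```python
# Simplified model of frame-pointer stack walking on x86-64 (an assumption
# for illustration): each frame's saved frame pointer is stored at [fp] and
# the return address at [fp + 8]. "memory" is a hypothetical dict standing in
# for the target's stack memory.


def walk_frame_pointers(memory, ip, fp, max_depth=127):
    """Return instruction pointers, leaf first, by following the fp chain."""
    stack = [ip]
    while fp and len(stack) < max_depth:  # max_depth is an arbitrary limit
        ret = memory.get(fp + 8)
        if ret is None:  # not a valid frame: the walk cannot continue
            break
        stack.append(ret)
        fp = memory.get(fp, 0)  # caller's frame pointer; 0 ends the chain
    return stack


if __name__ == "__main__":
    memory = {
        0x7F00: 0x7F80, 0x7F08: 0x401400,  # c(): saved fp of b(), return into b()
        0x7F80: 0x7FF0, 0x7F88: 0x401300,  # b(): saved fp of a(), return into a()
        0x7FF0: 0x0000, 0x7FF8: 0x401200,  # a(): end of chain, return into main()
    }
    print([hex(a) for a in walk_frame_pointers(memory, ip=0x401500, fp=0x7F00)])
```
.fi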

Another reason for "[unknown]" frames is JIT-compiled code, which lacks a
traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (for example, Java or Node.js).
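.PP
A minimal Python sketch of reading such a file follows. It assumes the usual
perf map format of one "START SIZE symbolname" entry per line, with START and
SIZE in hex and no 0x prefix. The entries shown are hypothetical. Addresses not
covered by any entry resolve to "[unknown]".
.nf
```python
import bisect

# Sketch of reading a /tmp/perf-PID.map file. Each line is assumed to hold
# "START SIZE symbolname", with START and SIZE in hex (no 0x prefix); the
# symbol name may contain spaces. The sample lines below are hypothetical.


def parse_perf_map(lines):
    """Return a list of (start, end, name) tuples sorted by start address."""
    entries = []
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size = int(parts[0], 16), int(parts[1], 16)
        entries.append((start, start + size, parts[2]))
    entries.sort()
    return entries


def lookup(entries, addr):
    """Return the symbol containing addr, or "[unknown]" if none does."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and entries[i][0] <= addr < entries[i][1]:
        return entries[i][2]
    return "[unknown]"


if __name__ == "__main__":
    lines = [
        "7f3a10000000 40 void Foo.bar(int)",
        "7f3a10000040 80 Interpreter",
    ]
    entries = parse_perf_map(lines)
    print(lookup(entries, 0x7F3A10000050), lookup(entries, 0x7F3A20000000))
```
.fi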

If the output seems to contain unrelated samples, check for other sampling or
tracing tools that may be running. The current version of this tool can include
their events if they were running while profiling. Those samples may be
filtered in a future version.
.SH OVERHEAD
This is an efficient profiler, as stack traces are frequency counted in
kernel context, and only the unique stacks and their counts are passed to
user space. Contrast this with the current "perf record \-F 99 \-a" method
of profiling, which writes each sample to user space (via a ring buffer),
and then to the file system (perf.data), which must be post-processed.

This uses perf_event_open to set up a timer which is instrumented by BPF,
and for efficiency it does not initialize the perf ring buffer, so the
redundant perf samples are not collected.

The overhead while sampling at 49 Hertz (the default) across all CPUs is
expected to be negligible. At higher sample rates, the overhead may become
measurable.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
offcputime(8)