.TH profile 8 "2016-07-17" "USER COMMANDS"
.SH NAME
profile \- Profile CPU usage by sampling stack traces. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B profile [\-adfh] [\-p PID] [\-U | \-K] [\-F FREQUENCY]
.B [\-\-stack\-storage\-size COUNT] [duration]
.SH DESCRIPTION
This is a CPU profiler. It works by taking samples of stack traces at timed
intervals. It helps you understand and quantify CPU usage: which code is
executing, and by how much, including both user-level and kernel code.
.PP
By default this samples at 49 Hertz (samples per second), across all CPUs.
This frequency can be tuned using a command line option. The reason for 49, and
not 50, is to avoid lock-step sampling: sampling in step with activity that
runs at a round-number rate, which would skew the profile.
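.PP
Lock-step sampling can be illustrated with a small model. The sketch below is
not part of this tool: it assumes a hypothetical task that wakes 50 times per
second and stays on-CPU for a quarter of each period, and it counts how many
samples land while that task is running. The function name and its parameters
are made up for illustration.
.PP
.nf
```python
from fractions import Fraction

def busy_fraction_seen(sample_hz, task_hz=50, busy_ratio=Fraction(1, 4), seconds=10):
    """Fraction of samples that land while a periodic task is on-CPU.

    The task is hypothetical: it wakes task_hz times per second and runs for
    busy_ratio of each period. Sample k is taken at k / sample_hz seconds.
    """
    period = Fraction(1, task_hz)
    total = sample_hz * seconds
    hits = 0
    for k in range(total):
        t = Fraction(k, sample_hz)
        phase = (t % period) / period   # position within the task period, 0 <= phase < 1
        if phase < busy_ratio:
            hits += 1
    return Fraction(hits, total)

lockstep = busy_fraction_seen(50)   # every sample lands at the same phase
offset = busy_fraction_seen(49)     # samples drift through all phases
print(float(lockstep), float(offset))
```
.fi
.PP
In this model the 50 Hertz sampler always catches the task at the start of its
period and reports it as busy for every sample, while the 49 Hertz sampler
reports a value close to the true one quarter.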
.PP
This is also an efficient profiler, as stack traces are frequency counted in
kernel context, rather than passing each stack to user space for frequency
counting there. Only the unique stacks and counts are passed to user space
at the end of the profile, greatly reducing the kernel<->user transfer.
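.PP
Frequency counting can be sketched in a few lines. The example below is a
user-space model only, written in Python with made-up stacks. The tool itself
does the counting in kernel context, and the names used here are hypothetical.
.PP
.nf
```python
from collections import Counter

# Hypothetical sampled stacks, root frame first. A real profile has many more.
samples = [
    ("main", "parse", "read_token"),
    ("main", "parse", "read_token"),
    ("main", "render"),
    ("main", "parse", "read_token"),
    ("main", "render"),
]

def count_stacks(stacks):
    """Aggregate identical stacks into a stack -> count mapping."""
    counts = Counter()
    for stack in stacks:
        counts[stack] += 1
    return counts

counts = count_stacks(samples)
for stack, n in counts.most_common():
    print(n, ";".join(stack))
```
.fi
.PP
Five samples collapse into two entries here. Only the entries and their counts
would need to be transferred, not each individual sample.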
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.PP
This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See tools/old
for an older version that may work on Linux 4.6 - 4.8.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p PID
Trace this process ID only (filtered in-kernel). Without this, all CPUs are
profiled.
.TP
\-F FREQUENCY
Frequency to sample stacks, in Hertz (default 49).
.TP
\-f
Print output in folded stack format.
.TP
\-d
Include an output delimiter between kernel and user stacks (either "\-\-", or,
in folded mode, "\-").
.TP
\-U
Show stacks from user space only (no kernel space stacks).
.TP
\-K
Show stacks from kernel space only (no user space stacks).
.TP
\-\-stack\-storage\-size COUNT
The maximum number of unique stack traces that the kernel will count (default
10240). If the sampled count exceeds this, a warning will be printed.
.TP
duration
Duration to trace, in seconds.
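.PP
As a rough illustration of the folded format and the delimiter, the sketch
below joins frames with semicolons and appends the sample count. It is not the
tool's own code: the function name, the frame names, and the choice to list
user frames before kernel frames are assumptions made for this example.
.PP
.nf
```python
def fold(user_frames, kernel_frames, count, delimiter=False):
    """Render one stack as a folded line: frames joined by ';', then the count.

    Frames are given root first. Placing user frames before kernel frames is an
    assumption of this sketch, chosen so the line reads from root to leaf.
    """
    frames = list(user_frames)
    if delimiter and user_frames and kernel_frames:
        frames.append("-")          # folded-mode delimiter, as described for the d option
    frames.extend(kernel_frames)
    return "%s %d" % (";".join(frames), count)

line = fold(["main", "do_read"], ["sys_read", "vfs_read"], 7, delimiter=True)
plain = fold(["main", "do_read"], ["sys_read", "vfs_read"], 7)
print(line)
print(plain)
```
.fi
.PP
One line per unique stack, in this shape, is what flame graph software
typically takes as input.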
.SH EXAMPLES
.TP
Profile (sample) stack traces system-wide at 49 Hertz (samples per second) until Ctrl-C:
#
.B profile
.TP
Profile for 5 seconds only:
#
.B profile 5
.TP
Profile at 99 Hertz for 5 seconds only:
#
.B profile \-F 99 5
.TP
Profile PID 181 only:
#
.B profile \-p 181
.TP
Profile for 5 seconds and output in folded stack format (suitable as input for flame graphs), including a delimiter between kernel and user stacks:
#
.B profile \-df 5
.TP
Profile kernel stacks only:
#
.B profile \-K
.SH DEBUGGING
See "[unknown]" frames with bogus addresses? This can happen for different
reasons. Your best approach is to get Linux perf to work first, and then to
try this tool. For example, run
"perf record \-F 49 \-a \-g \-\- sleep 1; perf script" and
check for unknown frames there.
.PP
The most common reason for "[unknown]" frames is that the target software has
not been compiled with frame pointers, and so we can't use that simple method
for walking the stack. The fix in that case is to use software that does have
frame pointers, e.g., gcc \-fno\-omit\-frame\-pointer, or Java's
\-XX:+PreserveFramePointer.
.PP
Another reason for "[unknown]" frames is JIT compilers, which don't use a
traditional symbol table. The fix in that case is to populate a
/tmp/perf\-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (Java, Node.js).
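.PP
A perf map file is plain text with one symbol per line: start address, size,
and symbol name, with both numbers in hexadecimal and no 0x prefix. The sketch
below writes such a file. It is only an illustration of the file format: real
runtimes emit the file themselves or through an agent, and the function name,
PID, addresses, and symbol names here are invented.
.PP
.nf
```python
import os
import tempfile

def write_perf_map(pid, symbols, directory="/tmp"):
    """Write symbols as 'START SIZE name' lines, START and SIZE in hex without 0x."""
    path = os.path.join(directory, "perf-%d.map" % pid)
    with open(path, "w") as f:
        for start, size, name in sorted(symbols):
            print("%x %x %s" % (start, size, name), file=f)
    return path

# Hypothetical JIT symbols; written to a scratch directory rather than /tmp.
demo_dir = tempfile.mkdtemp()
path = write_perf_map(
    1234,
    [(0x7f3a10002000, 0x80, "Foo::bar"), (0x7f3a10001000, 0x40, "Interpreter")],
    demo_dir,
)
with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```
.fi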
.PP
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of this
tool can include their events if profiling happened concurrently. Those
samples may be filtered in a future version.
.SH OVERHEAD
This is an efficient profiler, as stack traces are frequency counted in
kernel context, and only the unique stacks and their counts are passed to
user space. Contrast this with the current "perf record \-F 99 \-a" method
of profiling, which writes each sample to user space (via a ring buffer),
and then to the file system (perf.data), which must be post-processed.
.PP
This uses perf_event_open to set up a timer which is instrumented by BPF,
and for efficiency it does not initialize the perf ring buffer, so the
redundant perf samples are not collected.
.PP
It's expected that the overhead while sampling at 49 Hertz (the default),
across all CPUs, should be negligible. If you increase the sample rate, the
overhead might begin to be measurable.
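.PP
For a sense of scale, the number of samples grows with the frequency, the CPU
count, and the duration. The sketch below is back-of-the-envelope arithmetic
only. It assumes one sample per CPU per timer tick, and the 8-CPU host and the
999 Hertz rate are hypothetical.
.PP
.nf
```python
def expected_samples(frequency_hz, cpus, seconds):
    """Upper bound on samples taken, assuming one per CPU per timer tick."""
    return frequency_hz * cpus * seconds

default_run = expected_samples(49, cpus=8, seconds=5)    # default rate
faster_run = expected_samples(999, cpus=8, seconds=5)    # much higher rate
print(default_run, faster_run)
```
.fi
.PP
Under these assumptions the default rate yields at most 1960 samples in 5
seconds, and the higher rate at most 39960. The number of unique stacks can
never exceed the number of samples.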
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
offcputime(8)