.TH profile 8 "2016-07-17" "USER COMMANDS"
.SH NAME
profile \- Profile CPU usage by sampling stack traces. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B profile [\-adfh] [\-p PID] [\-U | \-K] [\-F FREQUENCY]
.B [\-\-stack\-storage\-size COUNT] [duration]
.SH DESCRIPTION
This is a CPU profiler. It works by sampling stack traces at timed
intervals. It helps you understand and quantify CPU usage: which code is
executing, and by how much, including both user-level and kernel code.
.PP
By default this samples at 49 Hertz (samples per second), across all CPUs.
This frequency can be tuned using the \-F option. The reason for 49, and
not 50, is to avoid sampling in lock-step with periodic activity.
.PP
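The lock-step effect can be illustrated with a small model. The following
Python sketch is not part of this tool: it assumes a hypothetical workload that
repeats every 10 ms (100 Hertz), and counts how many distinct points in that
cycle a sampling timer would observe.
.PP
.nf
```python
from fractions import Fraction


def sampled_phases(sample_hz, task_hz, n_samples):
    """Return the distinct phases (0 <= p < 1) of a task_hz cycle that are
    observed by a timer firing sample_hz times per second."""
    phases = set()
    for i in range(n_samples):
        t = Fraction(i, sample_hz)     # time of the i-th sample, in seconds
        phases.add((t * task_hz) % 1)  # fraction of the task cycle elapsed
    return phases


# Assumption: the workload repeats at exactly 100 Hertz.
lockstep = sampled_phases(50, 100, 490)
drifting = sampled_phases(49, 100, 490)
print(len(lockstep), len(drifting))
```
.fi
.PP
Under these assumptions, a 50 Hertz timer always lands on the same point of
the cycle, while a 49 Hertz timer drifts across 49 different points.
.PP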
This is also an efficient profiler, as stack traces are frequency counted in
kernel context, rather than passing each stack to user space for frequency
counting there. Only the unique stacks and counts are passed to user space
at the end of the profile, greatly reducing the kernel<->user transfer.
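.PP
The saving from frequency counting can be sketched in plain Python. This is a
user-space model only, not the BPF code this tool runs in the kernel, and the
stack traces shown are made up for illustration.
.PP
.nf
```python
from collections import Counter

# Hypothetical samples: each is a stack trace, root frame first.
samples = [
    ("main", "parse", "read"),
    ("main", "parse", "read"),
    ("main", "render"),
    ("main", "parse", "read"),
]

# Per-sample approach: every sampled stack is transferred, then counted.
transferred_per_sample = len(samples)

# Frequency-count approach: a counter per unique stack is incremented at
# sample time, and only the unique stacks and counts are transferred.
counts = Counter()
for stack in samples:
    counts[stack] += 1
transferred_counted = len(counts)

print(transferred_per_sample, transferred_counted)
```
.fi
.PP
In this toy case, four samples collapse to two unique stacks. The real ratio
depends on how repetitive the workload is.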
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.PP
This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See tools/old
for an older version that may work on Linux 4.6 - 4.8.
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-p PID
Profile this process ID only (filtered in-kernel). Without this, all processes
on all CPUs are profiled.
.TP
\-F FREQUENCY
Frequency to sample stacks, in Hertz (default 49).
.TP
\-f
Print output in folded stack format.
.TP
\-d
Include an output delimiter between kernel and user stacks (either "\-\-", or,
in folded mode, "\-").
.TP
\-U
Show stacks from user space only (no kernel space stacks).
.TP
\-K
Show stacks from kernel space only (no user space stacks).
.TP
\-\-stack\-storage\-size COUNT
The maximum number of unique stack traces that the kernel will count (default
10240). If the number of unique stacks sampled exceeds this, a warning will be
printed.
.TP
duration
Duration to profile, in seconds.
.SH EXAMPLES
.TP
Profile (sample) stack traces system-wide at 49 Hertz (samples per second) until Ctrl-C:
#
.B profile
.TP
Profile for 5 seconds only:
#
.B profile 5
.TP
Profile at 99 Hertz for 5 seconds only:
#
.B profile \-F 99 5
.TP
Profile PID 181 only:
#
.B profile \-p 181
.TP
Profile for 5 seconds and output in folded stack format (suitable as input for flame graphs), including a delimiter between kernel and user stacks:
#
.B profile \-df 5
.TP
Profile kernel stacks only:
#
.B profile \-K
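.PP
Folded output can be post-processed with a few lines of Python. The sketch
below assumes the usual flame graph folded layout (frames joined by ";",
followed by a space and the sample count), and that with \-d the frames before
the "\-" delimiter are user-level and those after it are kernel frames. The
input lines are invented for illustration and are not real output.
.PP
.nf
```python
from collections import Counter


def parse_folded(line):
    """Split one folded line into (list of frames, sample count)."""
    stack, _, count = line.rpartition(" ")
    return stack.split(";"), int(count)


def split_at_delimiter(frames, delimiter="-"):
    """Return (before, after) the delimiter frame; after is empty if absent."""
    if delimiter in frames:
        i = frames.index(delimiter)
        return frames[:i], frames[i + 1:]
    return frames, []


# Hypothetical lines, in the shape "profile -df 5" is assumed to print.
lines = [
    "app;main;do_work;-;sys_read;vfs_read 7",
    "app;main;do_work 3",
]

totals = Counter()
for line in lines:
    frames, count = parse_folded(line)
    user, kernel = split_at_delimiter(frames)
    totals["kernel" if kernel else "user-only"] += count

print(dict(totals))
```
.fi
.PP
The count is split from the right-hand side so that frame names containing
spaces do not break the parse.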
.SH DEBUGGING
See "[unknown]" frames with bogus addresses? This can happen for different
reasons. Your best approach is to get Linux perf to work first, and then to
try this tool. E.g., run "perf record \-F 49 \-a \-g \-\- sleep 1; perf script",
and check for unknown frames there.
.PP
The most common reason for "[unknown]" frames is that the target software has
not been compiled with frame pointers, and so we can't use that simple method
for walking the stack. The fix in that case is to use software that does have
frame pointers, e.g., gcc \-fno\-omit\-frame\-pointer, or Java's
\-XX:+PreserveFramePointer.
.PP
Another reason for "[unknown]" frames is JIT compilers, which don't use a
traditional symbol table. The fix in that case is to populate a
/tmp/perf-PID.map file with the symbols, which this tool should read. How you
do this depends on the runtime (Java, Node.js).
.PP
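As a rough illustration of what such a map file provides, the Python sketch
below resolves an address against map entries. It assumes the Linux perf map
layout of one "START SIZE symbol" entry per line, with START and SIZE in
hexadecimal without a 0x prefix. The entries and function names here are
hypothetical, and this is not the lookup code used by this tool.
.PP
.nf
```python
import bisect


def parse_perf_map(text):
    """Parse lines of the form: START SIZE symbol (hex, no 0x prefix)."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the symbol itself may contain spaces
        if len(parts) == 3:
            start, size, name = parts
            entries.append((int(start, 16), int(size, 16), name))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol whose range covers addr, else the string [unknown]."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return "[unknown]"


# Hypothetical map contents for illustration.
sample = """7f00a0001000 40 Interpreter
7f00a0002000 1a0 LFoo;::bar"""

entries = parse_perf_map(sample)
print(resolve(entries, 0x7F00A0002010))
print(resolve(entries, 0x10))
```
.fi
.PP
An address that falls outside every listed range stays "[unknown]", which is
what is seen when the map file is missing or out of date.
.PP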
If you seem to have unrelated samples in the output, check for other
sampling or tracing tools that may be running. The current version of this
tool can include their events if profiling happened concurrently. Those
samples may be filtered in a future version.
.SH OVERHEAD
This is an efficient profiler, as stack traces are frequency counted in
kernel context, and only the unique stacks and their counts are passed to
user space. Contrast this with the current "perf record \-F 99 \-a" method
of profiling, which writes each sample to user space (via a ring buffer),
and then to the file system (perf.data), which must be post-processed.
.PP
This uses perf_event_open to set up a timer which is instrumented by BPF,
and for efficiency it does not initialize the perf ring buffer, so the
redundant perf samples are not collected.
.PP
It's expected that the overhead while sampling at 49 Hertz (the default),
across all CPUs, should be negligible. If you increase the sample rate, the
overhead might begin to be measurable.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
offcputime(8)