Blame - Documentation/trace/ftrace.rst - kernel/msm-4.19

blob: 7ea16a0ceffc50070e02d602a1e073cd634bd7d4 [file] [log] [blame]

Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	1	========================
				2	ftrace - Function Tracer
				3	========================
				4
				5	Copyright 2008 Red Hat Inc.
				6
				7	:Author: Steven Rostedt <srostedt@redhat.com>
				8	:License: The GNU Free Documentation License, Version 1.2
				9	(dual licensed under the GPL v2)
				10	:Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
				11	John Kacur, and David Teigland.
				12
				13	- Written for: 2.6.28-rc2
				14	- Updated for: 3.10
				15	- Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt
				16	- Converted to rst format - Changbin Du <changbin.du@intel.com>
				17
				18	Introduction
				19	------------
				20
				21	Ftrace is an internal tracer designed to help out developers and
				22	designers of systems to find what is going on inside the kernel.
				23	It can be used for debugging or analyzing latencies and
				24	performance issues that take place outside of user-space.
				25
				26	Although ftrace is typically considered the function tracer, it
				27	is really a frame work of several assorted tracing utilities.
				28	There's latency tracing to examine what occurs between interrupts
				29	disabled and enabled, as well as for preemption and from a time
				30	a task is woken to the task is actually scheduled in.
				31
				32	One of the most common uses of ftrace is the event tracing.
				33	Through out the kernel is hundreds of static event points that
				34	can be enabled via the tracefs file system to see what is
				35	going on in certain parts of the kernel.
				36
				37	See events.txt for more information.
				38
				39
				40	Implementation Details
				41	----------------------
				42
				43	See :doc:`ftrace-design` for details for arch porters and such.
				44
				45
				46	The File System
				47	---------------
				48
				49	Ftrace uses the tracefs file system to hold the control files as
				50	well as the files to display output.
				51
				52	When tracefs is configured into the kernel (which selecting any ftrace
				53	option will do) the directory /sys/kernel/tracing will be created. To mount
				54	this directory, you can add to your /etc/fstab file::
				55
				56	tracefs /sys/kernel/tracing tracefs defaults 0 0
				57
				58	Or you can mount it at run time with::
				59
				60	mount -t tracefs nodev /sys/kernel/tracing
				61
				62	For quicker access to that directory you may want to make a soft link to
				63	it::
				64
				65	ln -s /sys/kernel/tracing /tracing
				66
				67	.. attention::
				68
				69	Before 4.1, all ftrace tracing control files were within the debugfs
				70	file system, which is typically located at /sys/kernel/debug/tracing.
				71	For backward compatibility, when mounting the debugfs file system,
				72	the tracefs file system will be automatically mounted at:
				73
				74	/sys/kernel/debug/tracing
				75
				76	All files located in the tracefs file system will be located in that
				77	debugfs file system directory as well.
				78
				79	.. attention::
				80
				81	Any selected ftrace option will also create the tracefs file system.
				82	The rest of the document will assume that you are in the ftrace directory
				83	(cd /sys/kernel/tracing) and will only concentrate on the files within that
				84	directory and not distract from the content with the extended
				85	"/sys/kernel/tracing" path name.
				86
				87	That's it! (assuming that you have ftrace configured into your kernel)
				88
				89	After mounting tracefs you will have access to the control and output files
				90	of ftrace. Here is a list of some of the key files:
				91
				92
				93	Note: all time values are in microseconds.
				94
				95	current_tracer:
				96
				97	This is used to set or display the current tracer
				98	that is configured.
				99
				100	available_tracers:
				101
				102	This holds the different types of tracers that
				103	have been compiled into the kernel. The
				104	tracers listed here can be configured by
				105	echoing their name into current_tracer.
				106
				107	tracing_on:
				108
				109	This sets or displays whether writing to the trace
				110	ring buffer is enabled. Echo 0 into this file to disable
				111	the tracer or 1 to enable it. Note, this only disables
				112	writing to the ring buffer, the tracing overhead may
				113	still be occurring.
				114
				115	The kernel function tracing_off() can be used within the
				116	kernel to disable writing to the ring buffer, which will
				117	set this file to "0". User space can re-enable tracing by
				118	echoing "1" into the file.
				119
				120	Note, the function and event trigger "traceoff" will also
				121	set this file to zero and stop tracing. Which can also
				122	be re-enabled by user space using this file.
				123
				124	trace:
				125
				126	This file holds the output of the trace in a human
				127	readable format (described below). Note, tracing is temporarily
				128	disabled while this file is being read (opened).
				129
				130	trace_pipe:
				131
				132	The output is the same as the "trace" file but this
				133	file is meant to be streamed with live tracing.
				134	Reads from this file will block until new data is
				135	retrieved. Unlike the "trace" file, this file is a
				136	consumer. This means reading from this file causes
				137	sequential reads to display more current data. Once
				138	data is read from this file, it is consumed, and
				139	will not be read again with a sequential read. The
				140	"trace" file is static, and if the tracer is not
				141	adding more data, it will display the same
				142	information every time it is read. This file will not
				143	disable tracing while being read.
				144
				145	trace_options:
				146
				147	This file lets the user control the amount of data
				148	that is displayed in one of the above output
				149	files. Options also exist to modify how a tracer
				150	or events work (stack traces, timestamps, etc).
				151
				152	options:
				153
				154	This is a directory that has a file for every available
				155	trace option (also in trace_options). Options may also be set
				156	or cleared by writing a "1" or "0" respectively into the
				157	corresponding file with the option name.
				158
				159	tracing_max_latency:
				160
				161	Some of the tracers record the max latency.
				162	For example, the maximum time that interrupts are disabled.
				163	The maximum time is saved in this file. The max trace will also be
				164	stored, and displayed by "trace". A new max trace will only be
				165	recorded if the latency is greater than the value in this file
				166	(in microseconds).
				167
				168	By echoing in a time into this file, no latency will be recorded
				169	unless it is greater than the time in this file.
				170
				171	tracing_thresh:
				172
				173	Some latency tracers will record a trace whenever the
				174	latency is greater than the number in this file.
				175	Only active when the file contains a number greater than 0.
				176	(in microseconds)
				177
				178	buffer_size_kb:
				179
				180	This sets or displays the number of kilobytes each CPU
				181	buffer holds. By default, the trace buffers are the same size
				182	for each CPU. The displayed number is the size of the
				183	CPU buffer and not total size of all buffers. The
				184	trace buffers are allocated in pages (blocks of memory
				185	that the kernel uses for allocation, usually 4 KB in size).
				186	If the last page allocated has room for more bytes
				187	than requested, the rest of the page will be used,
				188	making the actual allocation bigger than requested or shown.
				189	( Note, the size may not be a multiple of the page size
				190	due to buffer management meta-data. )
				191
				192	Buffer sizes for individual CPUs may vary
				193	(see "per_cpu/cpu0/buffer_size_kb" below), and if they do
				194	this file will show "X".
				195
				196	buffer_total_size_kb:
				197
				198	This displays the total combined size of all the trace buffers.
				199
				200	free_buffer:
				201
				202	If a process is performing tracing, and the ring buffer should be
				203	shrunk "freed" when the process is finished, even if it were to be
				204	killed by a signal, this file can be used for that purpose. On close
				205	of this file, the ring buffer will be resized to its minimum size.
				206	Having a process that is tracing also open this file, when the process
				207	exits its file descriptor for this file will be closed, and in doing so,
				208	the ring buffer will be "freed".
				209
				210	It may also stop tracing if disable_on_free option is set.
				211
				212	tracing_cpumask:
				213
				214	This is a mask that lets the user only trace on specified CPUs.
				215	The format is a hex string representing the CPUs.
				216
				217	set_ftrace_filter:
				218
				219	When dynamic ftrace is configured in (see the
				220	section below "dynamic ftrace"), the code is dynamically
				221	modified (code text rewrite) to disable calling of the
				222	function profiler (mcount). This lets tracing be configured
				223	in with practically no overhead in performance. This also
				224	has a side effect of enabling or disabling specific functions
				225	to be traced. Echoing names of functions into this file
				226	will limit the trace to only those functions.
Steffen Maier	32fb7ef	2018-04-13 17:39:15 +0200	[diff] [blame]	227	This influences the tracers "function" and "function_graph"
				228	and thus also function profiling (see "function_profile_enabled").
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	229
				230	The functions listed in "available_filter_functions" are what
				231	can be written into this file.
				232
				233	This interface also allows for commands to be used. See the
				234	"Filter commands" section for more details.
				235
				236	set_ftrace_notrace:
				237
				238	This has an effect opposite to that of
				239	set_ftrace_filter. Any function that is added here will not
				240	be traced. If a function exists in both set_ftrace_filter
				241	and set_ftrace_notrace, the function will _not_ be traced.
				242
				243	set_ftrace_pid:
				244
				245	Have the function tracer only trace the threads whose PID are
				246	listed in this file.
				247
				248	If the "function-fork" option is set, then when a task whose
				249	PID is listed in this file forks, the child's PID will
				250	automatically be added to this file, and the child will be
				251	traced by the function tracer as well. This option will also
				252	cause PIDs of tasks that exit to be removed from the file.
				253
				254	set_event_pid:
				255
				256	Have the events only trace a task with a PID listed in this file.
				257	Note, sched_switch and sched_wake_up will also trace events
				258	listed in this file.
				259
				260	To have the PIDs of children of tasks with their PID in this file
				261	added on fork, enable the "event-fork" option. That option will also
				262	cause the PIDs of tasks to be removed from this file when the task
				263	exits.
				264
				265	set_graph_function:
				266
				267	Functions listed in this file will cause the function graph
				268	tracer to only trace these functions and the functions that
				269	they call. (See the section "dynamic ftrace" for more details).
Steffen Maier	32fb7ef	2018-04-13 17:39:15 +0200	[diff] [blame]	270	Note, set_ftrace_filter and set_ftrace_notrace still affects
				271	what functions are being traced.
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	272
				273	set_graph_notrace:
				274
				275	Similar to set_graph_function, but will disable function graph
				276	tracing when the function is hit until it exits the function.
				277	This makes it possible to ignore tracing functions that are called
				278	by a specific function.
				279
				280	available_filter_functions:
				281
				282	This lists the functions that ftrace has processed and can trace.
				283	These are the function names that you can pass to
Steffen Maier	32fb7ef	2018-04-13 17:39:15 +0200	[diff] [blame]	284	"set_ftrace_filter", "set_ftrace_notrace",
				285	"set_graph_function", or "set_graph_notrace".
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	286	(See the section "dynamic ftrace" below for more details.)
				287
				288	dyn_ftrace_total_info:
				289
				290	This file is for debugging purposes. The number of functions that
				291	have been converted to nops and are available to be traced.
				292
				293	enabled_functions:
				294
				295	This file is more for debugging ftrace, but can also be useful
				296	in seeing if any function has a callback attached to it.
				297	Not only does the trace infrastructure use ftrace function
				298	trace utility, but other subsystems might too. This file
				299	displays all functions that have a callback attached to them
				300	as well as the number of callbacks that have been attached.
				301	Note, a callback may also call multiple functions which will
				302	not be listed in this count.
				303
				304	If the callback registered to be traced by a function with
				305	the "save regs" attribute (thus even more overhead), a 'R'
				306	will be displayed on the same line as the function that
				307	is returning registers.
				308
				309	If the callback registered to be traced by a function with
				310	the "ip modify" attribute (thus the regs->ip can be changed),
				311	an 'I' will be displayed on the same line as the function that
				312	can be overridden.
				313
				314	If the architecture supports it, it will also show what callback
				315	is being directly called by the function. If the count is greater
				316	than 1 it most likely will be ftrace_ops_list_func().
				317
				318	If the callback of the function jumps to a trampoline that is
				319	specific to a the callback and not the standard trampoline,
				320	its address will be printed as well as the function that the
				321	trampoline calls.
				322
				323	function_profile_enabled:
				324
				325	When set it will enable all functions with either the function
				326	tracer, or if configured, the function graph tracer. It will
				327	keep a histogram of the number of functions that were called
				328	and if the function graph tracer was configured, it will also keep
				329	track of the time spent in those functions. The histogram
				330	content can be displayed in the files:
				331
Masami Hiramatsu	1fee4f7	2018-07-26 21:43:36 +0900	[diff] [blame]	332	trace_stat/function<cpu> ( function0, function1, etc).
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	333
Masami Hiramatsu	1fee4f7	2018-07-26 21:43:36 +0900	[diff] [blame]	334	trace_stat:
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	335
				336	A directory that holds different tracing stats.
				337
				338	kprobe_events:
				339
				340	Enable dynamic trace points. See kprobetrace.txt.
				341
				342	kprobe_profile:
				343
				344	Dynamic trace points stats. See kprobetrace.txt.
				345
				346	max_graph_depth:
				347
				348	Used with the function graph tracer. This is the max depth
				349	it will trace into a function. Setting this to a value of
				350	one will show only the first kernel function that is called
				351	from user space.
				352
				353	printk_formats:
				354
				355	This is for tools that read the raw format files. If an event in
				356	the ring buffer references a string, only a pointer to the string
				357	is recorded into the buffer and not the string itself. This prevents
				358	tools from knowing what that string was. This file displays the string
				359	and address for the string allowing tools to map the pointers to what
				360	the strings were.
				361
				362	saved_cmdlines:
				363
				364	Only the pid of the task is recorded in a trace event unless
				365	the event specifically saves the task comm as well. Ftrace
				366	makes a cache of pid mappings to comms to try to display
				367	comms for events. If a pid for a comm is not listed, then
				368	"<...>" is displayed in the output.
				369
				370	If the option "record-cmd" is set to "0", then comms of tasks
				371	will not be saved during recording. By default, it is enabled.
				372
				373	saved_cmdlines_size:
				374
				375	By default, 128 comms are saved (see "saved_cmdlines" above). To
				376	increase or decrease the amount of comms that are cached, echo
				377	in a the number of comms to cache, into this file.
				378
				379	saved_tgids:
				380
				381	If the option "record-tgid" is set, on each scheduling context switch
				382	the Task Group ID of a task is saved in a table mapping the PID of
				383	the thread to its TGID. By default, the "record-tgid" option is
				384	disabled.
				385
				386	snapshot:
				387
				388	This displays the "snapshot" buffer and also lets the user
				389	take a snapshot of the current running trace.
				390	See the "Snapshot" section below for more details.
				391
				392	stack_max_size:
				393
				394	When the stack tracer is activated, this will display the
				395	maximum stack size it has encountered.
				396	See the "Stack Trace" section below.
				397
				398	stack_trace:
				399
				400	This displays the stack back trace of the largest stack
				401	that was encountered when the stack tracer is activated.
				402	See the "Stack Trace" section below.
				403
				404	stack_trace_filter:
				405
				406	This is similar to "set_ftrace_filter" but it limits what
				407	functions the stack tracer will check.
				408
				409	trace_clock:
				410
				411	Whenever an event is recorded into the ring buffer, a
				412	"timestamp" is added. This stamp comes from a specified
				413	clock. By default, ftrace uses the "local" clock. This
				414	clock is very fast and strictly per cpu, but on some
				415	systems it may not be monotonic with respect to other
				416	CPUs. In other words, the local clocks may not be in sync
				417	with local clocks on other CPUs.
				418
				419	Usual clocks for tracing::
				420
				421	# cat trace_clock
				422	[local] global counter x86-tsc
				423
				424	The clock with the square brackets around it is the one in effect.
				425
				426	local:
				427	Default clock, but may not be in sync across CPUs
				428
				429	global:
				430	This clock is in sync with all CPUs but may
				431	be a bit slower than the local clock.
				432
				433	counter:
				434	This is not a clock at all, but literally an atomic
				435	counter. It counts up one by one, but is in sync
				436	with all CPUs. This is useful when you need to
				437	know exactly the order events occurred with respect to
				438	each other on different CPUs.
				439
				440	uptime:
				441	This uses the jiffies counter and the time stamp
				442	is relative to the time since boot up.
				443
				444	perf:
				445	This makes ftrace use the same clock that perf uses.
				446	Eventually perf will be able to read ftrace buffers
				447	and this will help out in interleaving the data.
				448
				449	x86-tsc:
				450	Architectures may define their own clocks. For
				451	example, x86 uses its own TSC cycle clock here.
				452
				453	ppc-tb:
				454	This uses the powerpc timebase register value.
				455	This is in sync across CPUs and can also be used
				456	to correlate events across hypervisor/guest if
				457	tb_offset is known.
				458
				459	mono:
				460	This uses the fast monotonic clock (CLOCK_MONOTONIC)
				461	which is monotonic and is subject to NTP rate adjustments.
				462
				463	mono_raw:
				464	This is the raw monotonic clock (CLOCK_MONOTONIC_RAW)
				465	which is montonic but is not subject to any rate adjustments
				466	and ticks at the same rate as the hardware clocksource.
				467
				468	boot:
Thomas Gleixner	a3ed0e43	2018-04-25 15:33:38 +0200	[diff] [blame]	469	This is the boot clock (CLOCK_BOOTTIME) and is based on the
				470	fast monotonic clock, but also accounts for time spent in
				471	suspend. Since the clock access is designed for use in
				472	tracing in the suspend path, some side effects are possible
				473	if clock is accessed after the suspend time is accounted before
				474	the fast mono clock is updated. In this case, the clock update
				475	appears to happen slightly sooner than it normally would have.
				476	Also on 32-bit systems, it's possible that the 64-bit boot offset
				477	sees a partial update. These effects are rare and post
				478	processing should be able to handle them. See comments in the
				479	ktime_get_boot_fast_ns() function for more information.
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	480
Linus Torvalds	680014d	2018-04-04 14:50:29 -0700	[diff] [blame]	481	To set a clock, simply echo the clock name into this file::
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	482
Linus Torvalds	680014d	2018-04-04 14:50:29 -0700	[diff] [blame]	483	# echo global > trace_clock
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	484
				485	trace_marker:
				486
				487	This is a very useful file for synchronizing user space
				488	with events happening in the kernel. Writing strings into
				489	this file will be written into the ftrace buffer.
				490
				491	It is useful in applications to open this file at the start
				492	of the application and just reference the file descriptor
				493	for the file::
				494
				495	void trace_write(const char *fmt, ...)
				496	{
				497	va_list ap;
				498	char buf[256];
				499	int n;
				500
				501	if (trace_fd < 0)
				502	return;
				503
				504	va_start(ap, fmt);
				505	n = vsnprintf(buf, 256, fmt, ap);
				506	va_end(ap);
				507
				508	write(trace_fd, buf, n);
				509	}
				510
				511	start::
				512
				513	trace_fd = open("trace_marker", WR_ONLY);
				514
Steven Rostedt (VMware)	d3439f9	2018-05-11 15:41:24 -0400	[diff] [blame]	515	Note: Writing into the trace_marker file can also initiate triggers
				516	that are written into /sys/kernel/tracing/events/ftrace/print/trigger
				517	See "Event triggers" in Documentation/trace/events.rst and an
				518	example in Documentation/trace/histogram.rst (Section 3.)
				519
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	520	trace_marker_raw:
				521
				522	This is similar to trace_marker above, but is meant for for binary data
				523	to be written to it, where a tool can be used to parse the data
				524	from trace_pipe_raw.
				525
				526	uprobe_events:
				527
				528	Add dynamic tracepoints in programs.
				529	See uprobetracer.txt
				530
				531	uprobe_profile:
				532
				533	Uprobe statistics. See uprobetrace.txt
				534
				535	instances:
				536
				537	This is a way to make multiple trace buffers where different
				538	events can be recorded in different buffers.
				539	See "Instances" section below.
				540
				541	events:
				542
				543	This is the trace event directory. It holds event tracepoints
				544	(also known as static tracepoints) that have been compiled
				545	into the kernel. It shows what event tracepoints exist
				546	and how they are grouped by system. There are "enable"
				547	files at various levels that can enable the tracepoints
				548	when a "1" is written to them.
				549
				550	See events.txt for more information.
				551
				552	set_event:
				553
				554	By echoing in the event into this file, will enable that event.
				555
				556	See events.txt for more information.
				557
				558	available_events:
				559
				560	A list of events that can be enabled in tracing.
				561
				562	See events.txt for more information.
				563
Linus Torvalds	2a56bb5	2018-04-10 11:27:30 -0700	[diff] [blame]	564	timestamp_mode:
				565
				566	Certain tracers may change the timestamp mode used when
				567	logging trace events into the event buffer. Events with
				568	different modes can coexist within a buffer but the mode in
				569	effect when an event is logged determines which timestamp mode
				570	is used for that event. The default timestamp mode is
				571	'delta'.
				572
				573	Usual timestamp modes for tracing:
				574
				575	# cat timestamp_mode
				576	[delta] absolute
				577
				578	The timestamp mode with the square brackets around it is the
				579	one in effect.
				580
				581	delta: Default timestamp mode - timestamp is a delta against
				582	a per-buffer timestamp.
				583
				584	absolute: The timestamp is a full timestamp, not a delta
				585	against some other value. As such it takes up more
				586	space and is less efficient.
				587
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	588	hwlat_detector:
				589
				590	Directory for the Hardware Latency Detector.
				591	See "Hardware Latency Detector" section below.
				592
				593	per_cpu:
				594
				595	This is a directory that contains the trace per_cpu information.
				596
				597	per_cpu/cpu0/buffer_size_kb:
				598
				599	The ftrace buffer is defined per_cpu. That is, there's a separate
				600	buffer for each CPU to allow writes to be done atomically,
				601	and free from cache bouncing. These buffers may have different
				602	size buffers. This file is similar to the buffer_size_kb
				603	file, but it only displays or sets the buffer size for the
				604	specific CPU. (here cpu0).
				605
				606	per_cpu/cpu0/trace:
				607
				608	This is similar to the "trace" file, but it will only display
				609	the data specific for the CPU. If written to, it only clears
				610	the specific CPU buffer.
				611
				612	per_cpu/cpu0/trace_pipe
				613
				614	This is similar to the "trace_pipe" file, and is a consuming
				615	read, but it will only display (and consume) the data specific
				616	for the CPU.
				617
				618	per_cpu/cpu0/trace_pipe_raw
				619
				620	For tools that can parse the ftrace ring buffer binary format,
				621	the trace_pipe_raw file can be used to extract the data
				622	from the ring buffer directly. With the use of the splice()
				623	system call, the buffer data can be quickly transferred to
				624	a file or to the network where a server is collecting the
				625	data.
				626
				627	Like trace_pipe, this is a consuming reader, where multiple
				628	reads will always produce different data.
				629
				630	per_cpu/cpu0/snapshot:
				631
				632	This is similar to the main "snapshot" file, but will only
				633	snapshot the current CPU (if supported). It only displays
				634	the content of the snapshot for a given CPU, and if
				635	written to, only clears this CPU buffer.
				636
				637	per_cpu/cpu0/snapshot_raw:
				638
				639	Similar to the trace_pipe_raw, but will read the binary format
				640	from the snapshot buffer for the given CPU.
				641
				642	per_cpu/cpu0/stats:
				643
				644	This displays certain stats about the ring buffer:
				645
				646	entries:
				647	The number of events that are still in the buffer.
				648
				649	overrun:
				650	The number of lost events due to overwriting when
				651	the buffer was full.
				652
				653	commit overrun:
				654	Should always be zero.
				655	This gets set if so many events happened within a nested
				656	event (ring buffer is re-entrant), that it fills the
				657	buffer and starts dropping events.
				658
				659	bytes:
				660	Bytes actually read (not overwritten).
				661
				662	oldest event ts:
				663	The oldest timestamp in the buffer
				664
				665	now ts:
				666	The current timestamp
				667
				668	dropped events:
				669	Events lost due to overwrite option being off.
				670
				671	read events:
				672	The number of events read.
				673
				674	The Tracers
				675	-----------
				676
				677	Here is the list of current tracers that may be configured.
				678
				679	"function"
				680
				681	Function call tracer to trace all kernel functions.
				682
				683	"function_graph"
				684
				685	Similar to the function tracer except that the
				686	function tracer probes the functions on their entry
				687	whereas the function graph tracer traces on both entry
				688	and exit of the functions. It then provides the ability
				689	to draw a graph of function calls similar to C code
				690	source.
				691
				692	"blk"
				693
				694	The block tracer. The tracer used by the blktrace user
				695	application.
				696
				697	"hwlat"
				698
				699	The Hardware Latency tracer is used to detect if the hardware
				700	produces any latency. See "Hardware Latency Detector" section
				701	below.
				702
				703	"irqsoff"
				704
				705	Traces the areas that disable interrupts and saves
				706	the trace with the longest max latency.
				707	See tracing_max_latency. When a new max is recorded,
				708	it replaces the old trace. It is best to view this
				709	trace with the latency-format option enabled, which
				710	happens automatically when the tracer is selected.
				711
				712	"preemptoff"
				713
				714	Similar to irqsoff but traces and records the amount of
				715	time for which preemption is disabled.
				716
				717	"preemptirqsoff"
				718
				719	Similar to irqsoff and preemptoff, but traces and
				720	records the largest time for which irqs and/or preemption
				721	is disabled.
				722
				723	"wakeup"
				724
				725	Traces and records the max latency that it takes for
				726	the highest priority task to get scheduled after
				727	it has been woken up.
				728	Traces all tasks as an average developer would expect.
				729
				730	"wakeup_rt"
				731
				732	Traces and records the max latency that it takes for just
				733	RT tasks (as the current "wakeup" does). This is useful
				734	for those interested in wake up timings of RT tasks.
				735
				736	"wakeup_dl"
				737
				738	Traces and records the max latency that it takes for
				739	a SCHED_DEADLINE task to be woken (as the "wakeup" and
				740	"wakeup_rt" does).
				741
				742	"mmiotrace"
				743
				744	A special tracer that is used to trace binary module.
				745	It will trace all the calls that a module makes to the
				746	hardware. Everything it writes and reads from the I/O
				747	as well.
				748
				749	"branch"
				750
				751	This tracer can be configured when tracing likely/unlikely
				752	calls within the kernel. It will trace when a likely and
				753	unlikely branch is hit and if it was correct in its prediction
				754	of being correct.
				755
				756	"nop"
				757
				758	This is the "trace nothing" tracer. To remove all
				759	tracers from tracing simply echo "nop" into
				760	current_tracer.
				761
				762
				763	Examples of using the tracer
				764	----------------------------
				765
				766	Here are typical examples of using the tracers when controlling
				767	them only with the tracefs interface (without using any
				768	user-land utilities).
				769
				770	Output format:
				771	--------------
				772
				773	Here is an example of the output format of the file "trace"::
				774
				775	# tracer: function
				776	#
				777	# entries-in-buffer/entries-written: 140080/250280 #P:4
				778	#
				779	# _-----=> irqs-off
				780	# / _----=> need-resched
				781	# \| / _---=> hardirq/softirq
				782	# \|\| / _--=> preempt-depth
				783	# \|\|\| / delay
				784	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				785	# \| \| \| \|\|\|\| \| \|
				786	bash-1977 [000] .... 17284.993652: sys_close <-system_call_fastpath
				787	bash-1977 [000] .... 17284.993653: __close_fd <-sys_close
				788	bash-1977 [000] .... 17284.993653: _raw_spin_lock <-__close_fd
				789	sshd-1974 [003] .... 17284.993653: __srcu_read_unlock <-fsnotify
				790	bash-1977 [000] .... 17284.993654: add_preempt_count <-_raw_spin_lock
				791	bash-1977 [000] ...1 17284.993655: _raw_spin_unlock <-__close_fd
				792	bash-1977 [000] ...1 17284.993656: sub_preempt_count <-_raw_spin_unlock
				793	bash-1977 [000] .... 17284.993657: filp_close <-__close_fd
				794	bash-1977 [000] .... 17284.993657: dnotify_flush <-filp_close
				795	sshd-1974 [003] .... 17284.993658: sys_select <-system_call_fastpath
				796	....
				797
				798	A header is printed with the tracer name that is represented by
				799	the trace. In this case the tracer is "function". Then it shows the
				800	number of events in the buffer as well as the total number of entries
				801	that were written. The difference is the number of entries that were
				802	lost due to the buffer filling up (250280 - 140080 = 110200 events
				803	lost).
				804
				805	The header explains the content of the events. Task name "bash", the task
				806	PID "1977", the CPU that it was running on "000", the latency format
				807	(explained below), the timestamp in <secs>.<usecs> format, the
				808	function name that was traced "sys_close" and the parent function that
				809	called this function "system_call_fastpath". The timestamp is the time
				810	at which the function was entered.
				811
				812	Latency trace format
				813	--------------------
				814
				815	When the latency-format option is enabled or when one of the latency
				816	tracers is set, the trace file gives somewhat more information to see
				817	why a latency happened. Here is a typical trace::
				818
				819	# tracer: irqsoff
				820	#
				821	# irqsoff latency trace v1.1.5 on 3.8.0-test+
				822	# --------------------------------------------------------------------
				823	# latency: 259 us, #4/4, CPU#2 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				824	# -----------------
				825	# \| task: ps-6143 (uid:0 nice:0 policy:0 rt_prio:0)
				826	# -----------------
				827	# => started at: __lock_task_sighand
				828	# => ended at: _raw_spin_unlock_irqrestore
				829	#
				830	#
				831	# _------=> CPU#
				832	# / _-----=> irqs-off
				833	# \| / _----=> need-resched
				834	# \|\| / _---=> hardirq/softirq
				835	# \|\|\| / _--=> preempt-depth
				836	# \|\|\|\| / delay
				837	# cmd pid \|\|\|\|\| time \| caller
				838	# \ / \|\|\|\|\| \ \| /
				839	ps-6143 2d... 0us!: trace_hardirqs_off <-__lock_task_sighand
				840	ps-6143 2d..1 259us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
				841	ps-6143 2d..1 263us+: time_hardirqs_on <-_raw_spin_unlock_irqrestore
				842	ps-6143 2d..1 306us : <stack trace>
				843	=> trace_hardirqs_on_caller
				844	=> trace_hardirqs_on
				845	=> _raw_spin_unlock_irqrestore
				846	=> do_task_stat
				847	=> proc_tgid_stat
				848	=> proc_single_show
				849	=> seq_read
				850	=> vfs_read
				851	=> sys_read
				852	=> system_call_fastpath
				853
				854
				855	This shows that the current tracer is "irqsoff" tracing the time
				856	for which interrupts were disabled. It gives the trace version (which
				857	never changes) and the version of the kernel upon which this was executed on
				858	(3.8). Then it displays the max latency in microseconds (259 us). The number
				859	of trace entries displayed and the total number (both are four: #4/4).
				860	VP, KP, SP, and HP are always zero and are reserved for later use.
				861	#P is the number of online CPUs (#P:4).
				862
				863	The task is the process that was running when the latency
				864	occurred. (ps pid: 6143).
				865
				866	The start and stop (the functions in which the interrupts were
				867	disabled and enabled respectively) that caused the latencies:
				868
				869	- __lock_task_sighand is where the interrupts were disabled.
				870	- _raw_spin_unlock_irqrestore is where they were enabled again.
				871
				872	The next lines after the header are the trace itself. The header
				873	explains which is which.
				874
				875	cmd: The name of the process in the trace.
				876
				877	pid: The PID of that process.
				878
				879	CPU#: The CPU which the process was running on.
				880
				881	irqs-off: 'd' interrupts are disabled. '.' otherwise.
				882	.. caution:: If the architecture does not support a way to
				883	read the irq flags variable, an 'X' will always
				884	be printed here.
				885
				886	need-resched:
				887	- 'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
				888	- 'n' only TIF_NEED_RESCHED is set,
				889	- 'p' only PREEMPT_NEED_RESCHED is set,
				890	- '.' otherwise.
				891
				892	hardirq/softirq:
				893	- 'Z' - NMI occurred inside a hardirq
				894	- 'z' - NMI is running
				895	- 'H' - hard irq occurred inside a softirq.
				896	- 'h' - hard irq is running
				897	- 's' - soft irq is running
				898	- '.' - normal context.
				899
				900	preempt-depth: The level of preempt_disabled
				901
				902	The above is mostly meaningful for kernel developers.
				903
				904	time:
				905	When the latency-format option is enabled, the trace file
				906	output includes a timestamp relative to the start of the
				907	trace. This differs from the output when latency-format
				908	is disabled, which includes an absolute timestamp.
				909
				910	delay:
				911	This is just to help catch your eye a bit better. And
				912	needs to be fixed to be only relative to the same CPU.
				913	The marks are determined by the difference between this
				914	current trace and the next trace.
				915
				916	- '$' - greater than 1 second
				917	- '@' - greater than 100 milisecond
				918	- '*' - greater than 10 milisecond
				919	- '#' - greater than 1000 microsecond
				920	- '!' - greater than 100 microsecond
				921	- '+' - greater than 10 microsecond
				922	- ' ' - less than or equal to 10 microsecond.
				923
				924	The rest is the same as the 'trace' file.
				925
				926	Note, the latency tracers will usually end with a back trace
				927	to easily find where the latency occurred.
				928
				929	trace_options
				930	-------------
				931
				932	The trace_options file (or the options directory) is used to control
				933	what gets printed in the trace output, or manipulate the tracers.
				934	To see what is available, simply cat the file::
				935
				936	cat trace_options
				937	print-parent
				938	nosym-offset
				939	nosym-addr
				940	noverbose
				941	noraw
				942	nohex
				943	nobin
				944	noblock
				945	trace_printk
				946	annotate
				947	nouserstacktrace
				948	nosym-userobj
				949	noprintk-msg-only
				950	context-info
				951	nolatency-format
				952	record-cmd
				953	norecord-tgid
				954	overwrite
				955	nodisable_on_free
				956	irq-info
				957	markers
				958	noevent-fork
				959	function-trace
				960	nofunction-fork
				961	nodisplay-graph
				962	nostacktrace
				963	nobranch
				964
				965	To disable one of the options, echo in the option prepended with
				966	"no"::
				967
				968	echo noprint-parent > trace_options
				969
				970	To enable an option, leave off the "no"::
				971
				972	echo sym-offset > trace_options
				973
				974	Here are the available options:
				975
				976	print-parent
				977	On function traces, display the calling (parent)
				978	function as well as the function being traced.
				979	::
				980
				981	print-parent:
				982	bash-4000 [01] 1477.606694: simple_strtoul <-kstrtoul
				983
				984	noprint-parent:
				985	bash-4000 [01] 1477.606694: simple_strtoul
				986
				987
				988	sym-offset
				989	Display not only the function name, but also the
				990	offset in the function. For example, instead of
				991	seeing just "ktime_get", you will see
				992	"ktime_get+0xb/0x20".
				993	::
				994
				995	sym-offset:
				996	bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
				997
				998	sym-addr
				999	This will also display the function address as well
				1000	as the function name.
				1001	::
				1002
				1003	sym-addr:
				1004	bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
				1005
				1006	verbose
				1007	This deals with the trace file when the
				1008	latency-format option is enabled.
				1009	::
				1010
				1011	bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
				1012	(+0.000ms): simple_strtoul (kstrtoul)
				1013
				1014	raw
				1015	This will display raw numbers. This option is best for
				1016	use with user applications that can translate the raw
				1017	numbers better than having it done in the kernel.
				1018
				1019	hex
				1020	Similar to raw, but the numbers will be in a hexadecimal format.
				1021
				1022	bin
				1023	This will print out the formats in raw binary.
				1024
				1025	block
				1026	When set, reading trace_pipe will not block when polled.
				1027
				1028	trace_printk
				1029	Can disable trace_printk() from writing into the buffer.
				1030
				1031	annotate
				1032	It is sometimes confusing when the CPU buffers are full
				1033	and one CPU buffer had a lot of events recently, thus
				1034	a shorter time frame, were another CPU may have only had
				1035	a few events, which lets it have older events. When
				1036	the trace is reported, it shows the oldest events first,
				1037	and it may look like only one CPU ran (the one with the
				1038	oldest events). When the annotate option is set, it will
				1039	display when a new CPU buffer started::
				1040
				1041	<idle>-0 [001] dNs4 21169.031481: wake_up_idle_cpu <-add_timer_on
				1042	<idle>-0 [001] dNs4 21169.031482: _raw_spin_unlock_irqrestore <-add_timer_on
				1043	<idle>-0 [001] .Ns4 21169.031484: sub_preempt_count <-_raw_spin_unlock_irqrestore
				1044	##### CPU 2 buffer started ####
				1045	<idle>-0 [002] .N.1 21169.031484: rcu_idle_exit <-cpu_idle
				1046	<idle>-0 [001] .Ns3 21169.031484: _raw_spin_unlock <-clocksource_watchdog
				1047	<idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock
				1048
				1049	userstacktrace
				1050	This option changes the trace. It records a
				1051	stacktrace of the current user space thread after
				1052	each trace event.
				1053
				1054	sym-userobj
				1055	when user stacktrace are enabled, look up which
				1056	object the address belongs to, and print a
				1057	relative address. This is especially useful when
				1058	ASLR is on, otherwise you don't get a chance to
				1059	resolve the address to object/file/line after
				1060	the app is no longer running
				1061
				1062	The lookup is performed when you read
				1063	trace,trace_pipe. Example::
				1064
				1065	a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
				1066	x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
				1067
				1068
				1069	printk-msg-only
				1070	When set, trace_printk()s will only show the format
				1071	and not their parameters (if trace_bprintk() or
				1072	trace_bputs() was used to save the trace_printk()).
				1073
				1074	context-info
				1075	Show only the event data. Hides the comm, PID,
				1076	timestamp, CPU, and other useful data.
				1077
				1078	latency-format
				1079	This option changes the trace output. When it is enabled,
				1080	the trace displays additional information about the
				1081	latency, as described in "Latency trace format".
				1082
				1083	record-cmd
				1084	When any event or tracer is enabled, a hook is enabled
				1085	in the sched_switch trace point to fill comm cache
				1086	with mapped pids and comms. But this may cause some
				1087	overhead, and if you only care about pids, and not the
				1088	name of the task, disabling this option can lower the
				1089	impact of tracing. See "saved_cmdlines".
				1090
				1091	record-tgid
				1092	When any event or tracer is enabled, a hook is enabled
				1093	in the sched_switch trace point to fill the cache of
				1094	mapped Thread Group IDs (TGID) mapping to pids. See
				1095	"saved_tgids".
				1096
				1097	overwrite
				1098	This controls what happens when the trace buffer is
				1099	full. If "1" (default), the oldest events are
				1100	discarded and overwritten. If "0", then the newest
				1101	events are discarded.
				1102	(see per_cpu/cpu0/stats for overrun and dropped)
				1103
				1104	disable_on_free
				1105	When the free_buffer is closed, tracing will
				1106	stop (tracing_on set to 0).
				1107
				1108	irq-info
				1109	Shows the interrupt, preempt count, need resched data.
				1110	When disabled, the trace looks like::
				1111
				1112	# tracer: function
				1113	#
				1114	# entries-in-buffer/entries-written: 144405/9452052 #P:4
				1115	#
				1116	# TASK-PID CPU# TIMESTAMP FUNCTION
				1117	# \| \| \| \| \|
				1118	<idle>-0 [002] 23636.756054: ttwu_do_activate.constprop.89 <-try_to_wake_up
				1119	<idle>-0 [002] 23636.756054: activate_task <-ttwu_do_activate.constprop.89
				1120	<idle>-0 [002] 23636.756055: enqueue_task <-activate_task
				1121
				1122
				1123	markers
				1124	When set, the trace_marker is writable (only by root).
				1125	When disabled, the trace_marker will error with EINVAL
				1126	on write.
				1127
				1128	event-fork
				1129	When set, tasks with PIDs listed in set_event_pid will have
				1130	the PIDs of their children added to set_event_pid when those
				1131	tasks fork. Also, when tasks with PIDs in set_event_pid exit,
				1132	their PIDs will be removed from the file.
				1133
				1134	function-trace
				1135	The latency tracers will enable function tracing
				1136	if this option is enabled (default it is). When
				1137	it is disabled, the latency tracers do not trace
				1138	functions. This keeps the overhead of the tracer down
				1139	when performing latency tests.
				1140
				1141	function-fork
				1142	When set, tasks with PIDs listed in set_ftrace_pid will
				1143	have the PIDs of their children added to set_ftrace_pid
				1144	when those tasks fork. Also, when tasks with PIDs in
				1145	set_ftrace_pid exit, their PIDs will be removed from the
				1146	file.
				1147
				1148	display-graph
				1149	When set, the latency tracers (irqsoff, wakeup, etc) will
				1150	use function graph tracing instead of function tracing.
				1151
				1152	stacktrace
				1153	When set, a stack trace is recorded after any trace event
				1154	is recorded.
				1155
				1156	branch
				1157	Enable branch tracing with the tracer. This enables branch
				1158	tracer along with the currently set tracer. Enabling this
				1159	with the "nop" tracer is the same as just enabling the
				1160	"branch" tracer.
				1161
				1162	.. tip:: Some tracers have their own options. They only appear in this
				1163	file when the tracer is active. They always appear in the
				1164	options directory.
				1165
				1166
				1167	Here are the per tracer options:
				1168
				1169	Options for function tracer:
				1170
				1171	func_stack_trace
				1172	When set, a stack trace is recorded after every
				1173	function that is recorded. NOTE! Limit the functions
				1174	that are recorded before enabling this, with
				1175	"set_ftrace_filter" otherwise the system performance
				1176	will be critically degraded. Remember to disable
				1177	this option before clearing the function filter.
				1178
				1179	Options for function_graph tracer:
				1180
				1181	Since the function_graph tracer has a slightly different output
				1182	it has its own options to control what is displayed.
				1183
				1184	funcgraph-overrun
				1185	When set, the "overrun" of the graph stack is
				1186	displayed after each function traced. The
				1187	overrun, is when the stack depth of the calls
				1188	is greater than what is reserved for each task.
				1189	Each task has a fixed array of functions to
				1190	trace in the call graph. If the depth of the
				1191	calls exceeds that, the function is not traced.
				1192	The overrun is the number of functions missed
				1193	due to exceeding this array.
				1194
				1195	funcgraph-cpu
				1196	When set, the CPU number of the CPU where the trace
				1197	occurred is displayed.
				1198
				1199	funcgraph-overhead
				1200	When set, if the function takes longer than
				1201	A certain amount, then a delay marker is
				1202	displayed. See "delay" above, under the
				1203	header description.
				1204
				1205	funcgraph-proc
				1206	Unlike other tracers, the process' command line
				1207	is not displayed by default, but instead only
				1208	when a task is traced in and out during a context
				1209	switch. Enabling this options has the command
				1210	of each process displayed at every line.
				1211
				1212	funcgraph-duration
				1213	At the end of each function (the return)
				1214	the duration of the amount of time in the
				1215	function is displayed in microseconds.
				1216
				1217	funcgraph-abstime
				1218	When set, the timestamp is displayed at each line.
				1219
				1220	funcgraph-irqs
				1221	When disabled, functions that happen inside an
				1222	interrupt will not be traced.
				1223
				1224	funcgraph-tail
				1225	When set, the return event will include the function
				1226	that it represents. By default this is off, and
				1227	only a closing curly bracket "}" is displayed for
				1228	the return of a function.
				1229
				1230	sleep-time
				1231	When running function graph tracer, to include
				1232	the time a task schedules out in its function.
				1233	When enabled, it will account time the task has been
				1234	scheduled out as part of the function call.
				1235
				1236	graph-time
				1237	When running function profiler with function graph tracer,
				1238	to include the time to call nested functions. When this is
				1239	not set, the time reported for the function will only
				1240	include the time the function itself executed for, not the
				1241	time for functions that it called.
				1242
				1243	Options for blk tracer:
				1244
				1245	blk_classic
				1246	Shows a more minimalistic output.
				1247
				1248
				1249	irqsoff
				1250	-------
				1251
				1252	When interrupts are disabled, the CPU can not react to any other
				1253	external event (besides NMIs and SMIs). This prevents the timer
				1254	interrupt from triggering or the mouse interrupt from letting
				1255	the kernel know of a new mouse event. The result is a latency
				1256	with the reaction time.
				1257
				1258	The irqsoff tracer tracks the time for which interrupts are
				1259	disabled. When a new maximum latency is hit, the tracer saves
				1260	the trace leading up to that latency point so that every time a
				1261	new maximum is reached, the old saved trace is discarded and the
				1262	new trace is saved.
				1263
				1264	To reset the maximum, echo 0 into tracing_max_latency. Here is
				1265	an example::
				1266
				1267	# echo 0 > options/function-trace
				1268	# echo irqsoff > current_tracer
				1269	# echo 1 > tracing_on
				1270	# echo 0 > tracing_max_latency
				1271	# ls -ltr
				1272	[...]
				1273	# echo 0 > tracing_on
				1274	# cat trace
				1275	# tracer: irqsoff
				1276	#
				1277	# irqsoff latency trace v1.1.5 on 3.8.0-test+
				1278	# --------------------------------------------------------------------
				1279	# latency: 16 us, #4/4, CPU#0 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1280	# -----------------
				1281	# \| task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
				1282	# -----------------
				1283	# => started at: run_timer_softirq
				1284	# => ended at: run_timer_softirq
				1285	#
				1286	#
				1287	# _------=> CPU#
				1288	# / _-----=> irqs-off
				1289	# \| / _----=> need-resched
				1290	# \|\| / _---=> hardirq/softirq
				1291	# \|\|\| / _--=> preempt-depth
				1292	# \|\|\|\| / delay
				1293	# cmd pid \|\|\|\|\| time \| caller
				1294	# \ / \|\|\|\|\| \ \| /
				1295	<idle>-0 0d.s2 0us+: _raw_spin_lock_irq <-run_timer_softirq
				1296	<idle>-0 0dNs3 17us : _raw_spin_unlock_irq <-run_timer_softirq
				1297	<idle>-0 0dNs3 17us+: trace_hardirqs_on <-run_timer_softirq
				1298	<idle>-0 0dNs3 25us : <stack trace>
				1299	=> _raw_spin_unlock_irq
				1300	=> run_timer_softirq
				1301	=> __do_softirq
				1302	=> call_softirq
				1303	=> do_softirq
				1304	=> irq_exit
				1305	=> smp_apic_timer_interrupt
				1306	=> apic_timer_interrupt
				1307	=> rcu_idle_exit
				1308	=> cpu_idle
				1309	=> rest_init
				1310	=> start_kernel
				1311	=> x86_64_start_reservations
				1312	=> x86_64_start_kernel
				1313
				1314	Here we see that that we had a latency of 16 microseconds (which is
				1315	very good). The _raw_spin_lock_irq in run_timer_softirq disabled
				1316	interrupts. The difference between the 16 and the displayed
				1317	timestamp 25us occurred because the clock was incremented
				1318	between the time of recording the max latency and the time of
				1319	recording the function that had that latency.
				1320
				1321	Note the above example had function-trace not set. If we set
				1322	function-trace, we get a much larger output::
				1323
				1324	with echo 1 > options/function-trace
				1325
				1326	# tracer: irqsoff
				1327	#
				1328	# irqsoff latency trace v1.1.5 on 3.8.0-test+
				1329	# --------------------------------------------------------------------
				1330	# latency: 71 us, #168/168, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1331	# -----------------
				1332	# \| task: bash-2042 (uid:0 nice:0 policy:0 rt_prio:0)
				1333	# -----------------
				1334	# => started at: ata_scsi_queuecmd
				1335	# => ended at: ata_scsi_queuecmd
				1336	#
				1337	#
				1338	# _------=> CPU#
				1339	# / _-----=> irqs-off
				1340	# \| / _----=> need-resched
				1341	# \|\| / _---=> hardirq/softirq
				1342	# \|\|\| / _--=> preempt-depth
				1343	# \|\|\|\| / delay
				1344	# cmd pid \|\|\|\|\| time \| caller
				1345	# \ / \|\|\|\|\| \ \| /
				1346	bash-2042 3d... 0us : _raw_spin_lock_irqsave <-ata_scsi_queuecmd
				1347	bash-2042 3d... 0us : add_preempt_count <-_raw_spin_lock_irqsave
				1348	bash-2042 3d..1 1us : ata_scsi_find_dev <-ata_scsi_queuecmd
				1349	bash-2042 3d..1 1us : __ata_scsi_find_dev <-ata_scsi_find_dev
				1350	bash-2042 3d..1 2us : ata_find_dev.part.14 <-__ata_scsi_find_dev
				1351	bash-2042 3d..1 2us : ata_qc_new_init <-__ata_scsi_queuecmd
				1352	bash-2042 3d..1 3us : ata_sg_init <-__ata_scsi_queuecmd
				1353	bash-2042 3d..1 4us : ata_scsi_rw_xlat <-__ata_scsi_queuecmd
				1354	bash-2042 3d..1 4us : ata_build_rw_tf <-ata_scsi_rw_xlat
				1355	[...]
				1356	bash-2042 3d..1 67us : delay_tsc <-__delay
				1357	bash-2042 3d..1 67us : add_preempt_count <-delay_tsc
				1358	bash-2042 3d..2 67us : sub_preempt_count <-delay_tsc
				1359	bash-2042 3d..1 67us : add_preempt_count <-delay_tsc
				1360	bash-2042 3d..2 68us : sub_preempt_count <-delay_tsc
				1361	bash-2042 3d..1 68us+: ata_bmdma_start <-ata_bmdma_qc_issue
				1362	bash-2042 3d..1 71us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
				1363	bash-2042 3d..1 71us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
				1364	bash-2042 3d..1 72us+: trace_hardirqs_on <-ata_scsi_queuecmd
				1365	bash-2042 3d..1 120us : <stack trace>
				1366	=> _raw_spin_unlock_irqrestore
				1367	=> ata_scsi_queuecmd
				1368	=> scsi_dispatch_cmd
				1369	=> scsi_request_fn
				1370	=> __blk_run_queue_uncond
				1371	=> __blk_run_queue
				1372	=> blk_queue_bio
				1373	=> generic_make_request
				1374	=> submit_bio
				1375	=> submit_bh
				1376	=> __ext3_get_inode_loc
				1377	=> ext3_iget
				1378	=> ext3_lookup
				1379	=> lookup_real
				1380	=> __lookup_hash
				1381	=> walk_component
				1382	=> lookup_last
				1383	=> path_lookupat
				1384	=> filename_lookup
				1385	=> user_path_at_empty
				1386	=> user_path_at
				1387	=> vfs_fstatat
				1388	=> vfs_stat
				1389	=> sys_newstat
				1390	=> system_call_fastpath
				1391
				1392
				1393	Here we traced a 71 microsecond latency. But we also see all the
				1394	functions that were called during that time. Note that by
				1395	enabling function tracing, we incur an added overhead. This
				1396	overhead may extend the latency times. But nevertheless, this
				1397	trace has provided some very helpful debugging information.
				1398
				1399
				1400	preemptoff
				1401	----------
				1402
				1403	When preemption is disabled, we may be able to receive
				1404	interrupts but the task cannot be preempted and a higher
				1405	priority task must wait for preemption to be enabled again
				1406	before it can preempt a lower priority task.
				1407
				1408	The preemptoff tracer traces the places that disable preemption.
				1409	Like the irqsoff tracer, it records the maximum latency for
				1410	which preemption was disabled. The control of preemptoff tracer
				1411	is much like the irqsoff tracer.
				1412	::
				1413
				1414	# echo 0 > options/function-trace
				1415	# echo preemptoff > current_tracer
				1416	# echo 1 > tracing_on
				1417	# echo 0 > tracing_max_latency
				1418	# ls -ltr
				1419	[...]
				1420	# echo 0 > tracing_on
				1421	# cat trace
				1422	# tracer: preemptoff
				1423	#
				1424	# preemptoff latency trace v1.1.5 on 3.8.0-test+
				1425	# --------------------------------------------------------------------
				1426	# latency: 46 us, #4/4, CPU#1 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1427	# -----------------
				1428	# \| task: sshd-1991 (uid:0 nice:0 policy:0 rt_prio:0)
				1429	# -----------------
				1430	# => started at: do_IRQ
				1431	# => ended at: do_IRQ
				1432	#
				1433	#
				1434	# _------=> CPU#
				1435	# / _-----=> irqs-off
				1436	# \| / _----=> need-resched
				1437	# \|\| / _---=> hardirq/softirq
				1438	# \|\|\| / _--=> preempt-depth
				1439	# \|\|\|\| / delay
				1440	# cmd pid \|\|\|\|\| time \| caller
				1441	# \ / \|\|\|\|\| \ \| /
				1442	sshd-1991 1d.h. 0us+: irq_enter <-do_IRQ
				1443	sshd-1991 1d..1 46us : irq_exit <-do_IRQ
				1444	sshd-1991 1d..1 47us+: trace_preempt_on <-do_IRQ
				1445	sshd-1991 1d..1 52us : <stack trace>
				1446	=> sub_preempt_count
				1447	=> irq_exit
				1448	=> do_IRQ
				1449	=> ret_from_intr
				1450
				1451
				1452	This has some more changes. Preemption was disabled when an
				1453	interrupt came in (notice the 'h'), and was enabled on exit.
				1454	But we also see that interrupts have been disabled when entering
				1455	the preempt off section and leaving it (the 'd'). We do not know if
				1456	interrupts were enabled in the mean time or shortly after this
				1457	was over.
				1458	::
				1459
				1460	# tracer: preemptoff
				1461	#
				1462	# preemptoff latency trace v1.1.5 on 3.8.0-test+
				1463	# --------------------------------------------------------------------
				1464	# latency: 83 us, #241/241, CPU#1 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1465	# -----------------
				1466	# \| task: bash-1994 (uid:0 nice:0 policy:0 rt_prio:0)
				1467	# -----------------
				1468	# => started at: wake_up_new_task
				1469	# => ended at: task_rq_unlock
				1470	#
				1471	#
				1472	# _------=> CPU#
				1473	# / _-----=> irqs-off
				1474	# \| / _----=> need-resched
				1475	# \|\| / _---=> hardirq/softirq
				1476	# \|\|\| / _--=> preempt-depth
				1477	# \|\|\|\| / delay
				1478	# cmd pid \|\|\|\|\| time \| caller
				1479	# \ / \|\|\|\|\| \ \| /
				1480	bash-1994 1d..1 0us : _raw_spin_lock_irqsave <-wake_up_new_task
				1481	bash-1994 1d..1 0us : select_task_rq_fair <-select_task_rq
				1482	bash-1994 1d..1 1us : __rcu_read_lock <-select_task_rq_fair
				1483	bash-1994 1d..1 1us : source_load <-select_task_rq_fair
				1484	bash-1994 1d..1 1us : source_load <-select_task_rq_fair
				1485	[...]
				1486	bash-1994 1d..1 12us : irq_enter <-smp_apic_timer_interrupt
				1487	bash-1994 1d..1 12us : rcu_irq_enter <-irq_enter
				1488	bash-1994 1d..1 13us : add_preempt_count <-irq_enter
				1489	bash-1994 1d.h1 13us : exit_idle <-smp_apic_timer_interrupt
				1490	bash-1994 1d.h1 13us : hrtimer_interrupt <-smp_apic_timer_interrupt
				1491	bash-1994 1d.h1 13us : _raw_spin_lock <-hrtimer_interrupt
				1492	bash-1994 1d.h1 14us : add_preempt_count <-_raw_spin_lock
				1493	bash-1994 1d.h2 14us : ktime_get_update_offsets <-hrtimer_interrupt
				1494	[...]
				1495	bash-1994 1d.h1 35us : lapic_next_event <-clockevents_program_event
				1496	bash-1994 1d.h1 35us : irq_exit <-smp_apic_timer_interrupt
				1497	bash-1994 1d.h1 36us : sub_preempt_count <-irq_exit
				1498	bash-1994 1d..2 36us : do_softirq <-irq_exit
				1499	bash-1994 1d..2 36us : __do_softirq <-call_softirq
				1500	bash-1994 1d..2 36us : __local_bh_disable <-__do_softirq
				1501	bash-1994 1d.s2 37us : add_preempt_count <-_raw_spin_lock_irq
				1502	bash-1994 1d.s3 38us : _raw_spin_unlock <-run_timer_softirq
				1503	bash-1994 1d.s3 39us : sub_preempt_count <-_raw_spin_unlock
				1504	bash-1994 1d.s2 39us : call_timer_fn <-run_timer_softirq
				1505	[...]
				1506	bash-1994 1dNs2 81us : cpu_needs_another_gp <-rcu_process_callbacks
				1507	bash-1994 1dNs2 82us : __local_bh_enable <-__do_softirq
				1508	bash-1994 1dNs2 82us : sub_preempt_count <-__local_bh_enable
				1509	bash-1994 1dN.2 82us : idle_cpu <-irq_exit
				1510	bash-1994 1dN.2 83us : rcu_irq_exit <-irq_exit
				1511	bash-1994 1dN.2 83us : sub_preempt_count <-irq_exit
				1512	bash-1994 1.N.1 84us : _raw_spin_unlock_irqrestore <-task_rq_unlock
				1513	bash-1994 1.N.1 84us+: trace_preempt_on <-task_rq_unlock
				1514	bash-1994 1.N.1 104us : <stack trace>
				1515	=> sub_preempt_count
				1516	=> _raw_spin_unlock_irqrestore
				1517	=> task_rq_unlock
				1518	=> wake_up_new_task
				1519	=> do_fork
				1520	=> sys_clone
				1521	=> stub_clone
				1522
				1523
				1524	The above is an example of the preemptoff trace with
				1525	function-trace set. Here we see that interrupts were not disabled
				1526	the entire time. The irq_enter code lets us know that we entered
				1527	an interrupt 'h'. Before that, the functions being traced still
				1528	show that it is not in an interrupt, but we can see from the
				1529	functions themselves that this is not the case.
				1530
				1531	preemptirqsoff
				1532	--------------
				1533
				1534	Knowing the locations that have interrupts disabled or
				1535	preemption disabled for the longest times is helpful. But
				1536	sometimes we would like to know when either preemption and/or
				1537	interrupts are disabled.
				1538
				1539	Consider the following code::
				1540
				1541	local_irq_disable();
				1542	call_function_with_irqs_off();
				1543	preempt_disable();
				1544	call_function_with_irqs_and_preemption_off();
				1545	local_irq_enable();
				1546	call_function_with_preemption_off();
				1547	preempt_enable();
				1548
				1549	The irqsoff tracer will record the total length of
				1550	call_function_with_irqs_off() and
				1551	call_function_with_irqs_and_preemption_off().
				1552
				1553	The preemptoff tracer will record the total length of
				1554	call_function_with_irqs_and_preemption_off() and
				1555	call_function_with_preemption_off().
				1556
				1557	But neither will trace the time that interrupts and/or
				1558	preemption is disabled. This total time is the time that we can
				1559	not schedule. To record this time, use the preemptirqsoff
				1560	tracer.
				1561
				1562	Again, using this trace is much like the irqsoff and preemptoff
				1563	tracers.
				1564	::
				1565
				1566	# echo 0 > options/function-trace
				1567	# echo preemptirqsoff > current_tracer
				1568	# echo 1 > tracing_on
				1569	# echo 0 > tracing_max_latency
				1570	# ls -ltr
				1571	[...]
				1572	# echo 0 > tracing_on
				1573	# cat trace
				1574	# tracer: preemptirqsoff
				1575	#
				1576	# preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
				1577	# --------------------------------------------------------------------
				1578	# latency: 100 us, #4/4, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1579	# -----------------
				1580	# \| task: ls-2230 (uid:0 nice:0 policy:0 rt_prio:0)
				1581	# -----------------
				1582	# => started at: ata_scsi_queuecmd
				1583	# => ended at: ata_scsi_queuecmd
				1584	#
				1585	#
				1586	# _------=> CPU#
				1587	# / _-----=> irqs-off
				1588	# \| / _----=> need-resched
				1589	# \|\| / _---=> hardirq/softirq
				1590	# \|\|\| / _--=> preempt-depth
				1591	# \|\|\|\| / delay
				1592	# cmd pid \|\|\|\|\| time \| caller
				1593	# \ / \|\|\|\|\| \ \| /
				1594	ls-2230 3d... 0us+: _raw_spin_lock_irqsave <-ata_scsi_queuecmd
				1595	ls-2230 3...1 100us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
				1596	ls-2230 3...1 101us+: trace_preempt_on <-ata_scsi_queuecmd
				1597	ls-2230 3...1 111us : <stack trace>
				1598	=> sub_preempt_count
				1599	=> _raw_spin_unlock_irqrestore
				1600	=> ata_scsi_queuecmd
				1601	=> scsi_dispatch_cmd
				1602	=> scsi_request_fn
				1603	=> __blk_run_queue_uncond
				1604	=> __blk_run_queue
				1605	=> blk_queue_bio
				1606	=> generic_make_request
				1607	=> submit_bio
				1608	=> submit_bh
				1609	=> ext3_bread
				1610	=> ext3_dir_bread
				1611	=> htree_dirblock_to_tree
				1612	=> ext3_htree_fill_tree
				1613	=> ext3_readdir
				1614	=> vfs_readdir
				1615	=> sys_getdents
				1616	=> system_call_fastpath
				1617
				1618
				1619	The trace_hardirqs_off_thunk is called from assembly on x86 when
				1620	interrupts are disabled in the assembly code. Without the
				1621	function tracing, we do not know if interrupts were enabled
				1622	within the preemption points. We do see that it started with
				1623	preemption enabled.
				1624
				1625	Here is a trace with function-trace set::
				1626
				1627	# tracer: preemptirqsoff
				1628	#
				1629	# preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
				1630	# --------------------------------------------------------------------
				1631	# latency: 161 us, #339/339, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1632	# -----------------
				1633	# \| task: ls-2269 (uid:0 nice:0 policy:0 rt_prio:0)
				1634	# -----------------
				1635	# => started at: schedule
				1636	# => ended at: mutex_unlock
				1637	#
				1638	#
				1639	# _------=> CPU#
				1640	# / _-----=> irqs-off
				1641	# \| / _----=> need-resched
				1642	# \|\| / _---=> hardirq/softirq
				1643	# \|\|\| / _--=> preempt-depth
				1644	# \|\|\|\| / delay
				1645	# cmd pid \|\|\|\|\| time \| caller
				1646	# \ / \|\|\|\|\| \ \| /
				1647	kworker/-59 3...1 0us : __schedule <-schedule
				1648	kworker/-59 3d..1 0us : rcu_preempt_qs <-rcu_note_context_switch
				1649	kworker/-59 3d..1 1us : add_preempt_count <-_raw_spin_lock_irq
				1650	kworker/-59 3d..2 1us : deactivate_task <-__schedule
				1651	kworker/-59 3d..2 1us : dequeue_task <-deactivate_task
				1652	kworker/-59 3d..2 2us : update_rq_clock <-dequeue_task
				1653	kworker/-59 3d..2 2us : dequeue_task_fair <-dequeue_task
				1654	kworker/-59 3d..2 2us : update_curr <-dequeue_task_fair
				1655	kworker/-59 3d..2 2us : update_min_vruntime <-update_curr
				1656	kworker/-59 3d..2 3us : cpuacct_charge <-update_curr
				1657	kworker/-59 3d..2 3us : __rcu_read_lock <-cpuacct_charge
				1658	kworker/-59 3d..2 3us : __rcu_read_unlock <-cpuacct_charge
				1659	kworker/-59 3d..2 3us : update_cfs_rq_blocked_load <-dequeue_task_fair
				1660	kworker/-59 3d..2 4us : clear_buddies <-dequeue_task_fair
				1661	kworker/-59 3d..2 4us : account_entity_dequeue <-dequeue_task_fair
				1662	kworker/-59 3d..2 4us : update_min_vruntime <-dequeue_task_fair
				1663	kworker/-59 3d..2 4us : update_cfs_shares <-dequeue_task_fair
				1664	kworker/-59 3d..2 5us : hrtick_update <-dequeue_task_fair
				1665	kworker/-59 3d..2 5us : wq_worker_sleeping <-__schedule
				1666	kworker/-59 3d..2 5us : kthread_data <-wq_worker_sleeping
				1667	kworker/-59 3d..2 5us : put_prev_task_fair <-__schedule
				1668	kworker/-59 3d..2 6us : pick_next_task_fair <-pick_next_task
				1669	kworker/-59 3d..2 6us : clear_buddies <-pick_next_task_fair
				1670	kworker/-59 3d..2 6us : set_next_entity <-pick_next_task_fair
				1671	kworker/-59 3d..2 6us : update_stats_wait_end <-set_next_entity
				1672	ls-2269 3d..2 7us : finish_task_switch <-__schedule
				1673	ls-2269 3d..2 7us : _raw_spin_unlock_irq <-finish_task_switch
				1674	ls-2269 3d..2 8us : do_IRQ <-ret_from_intr
				1675	ls-2269 3d..2 8us : irq_enter <-do_IRQ
				1676	ls-2269 3d..2 8us : rcu_irq_enter <-irq_enter
				1677	ls-2269 3d..2 9us : add_preempt_count <-irq_enter
				1678	ls-2269 3d.h2 9us : exit_idle <-do_IRQ
				1679	[...]
				1680	ls-2269 3d.h3 20us : sub_preempt_count <-_raw_spin_unlock
				1681	ls-2269 3d.h2 20us : irq_exit <-do_IRQ
				1682	ls-2269 3d.h2 21us : sub_preempt_count <-irq_exit
				1683	ls-2269 3d..3 21us : do_softirq <-irq_exit
				1684	ls-2269 3d..3 21us : __do_softirq <-call_softirq
				1685	ls-2269 3d..3 21us+: __local_bh_disable <-__do_softirq
				1686	ls-2269 3d.s4 29us : sub_preempt_count <-_local_bh_enable_ip
				1687	ls-2269 3d.s5 29us : sub_preempt_count <-_local_bh_enable_ip
				1688	ls-2269 3d.s5 31us : do_IRQ <-ret_from_intr
				1689	ls-2269 3d.s5 31us : irq_enter <-do_IRQ
				1690	ls-2269 3d.s5 31us : rcu_irq_enter <-irq_enter
				1691	[...]
				1692	ls-2269 3d.s5 31us : rcu_irq_enter <-irq_enter
				1693	ls-2269 3d.s5 32us : add_preempt_count <-irq_enter
				1694	ls-2269 3d.H5 32us : exit_idle <-do_IRQ
				1695	ls-2269 3d.H5 32us : handle_irq <-do_IRQ
				1696	ls-2269 3d.H5 32us : irq_to_desc <-handle_irq
				1697	ls-2269 3d.H5 33us : handle_fasteoi_irq <-handle_irq
				1698	[...]
				1699	ls-2269 3d.s5 158us : _raw_spin_unlock_irqrestore <-rtl8139_poll
				1700	ls-2269 3d.s3 158us : net_rps_action_and_irq_enable.isra.65 <-net_rx_action
				1701	ls-2269 3d.s3 159us : __local_bh_enable <-__do_softirq
				1702	ls-2269 3d.s3 159us : sub_preempt_count <-__local_bh_enable
				1703	ls-2269 3d..3 159us : idle_cpu <-irq_exit
				1704	ls-2269 3d..3 159us : rcu_irq_exit <-irq_exit
				1705	ls-2269 3d..3 160us : sub_preempt_count <-irq_exit
				1706	ls-2269 3d... 161us : __mutex_unlock_slowpath <-mutex_unlock
				1707	ls-2269 3d... 162us+: trace_hardirqs_on <-mutex_unlock
				1708	ls-2269 3d... 186us : <stack trace>
				1709	=> __mutex_unlock_slowpath
				1710	=> mutex_unlock
				1711	=> process_output
				1712	=> n_tty_write
				1713	=> tty_write
				1714	=> vfs_write
				1715	=> sys_write
				1716	=> system_call_fastpath
				1717
				1718	This is an interesting trace. It started with kworker running and
				1719	scheduling out and ls taking over. But as soon as ls released the
				1720	rq lock and enabled interrupts (but not preemption) an interrupt
				1721	triggered. When the interrupt finished, it started running softirqs.
				1722	But while the softirq was running, another interrupt triggered.
				1723	When an interrupt is running inside a softirq, the annotation is 'H'.
				1724
				1725
				1726	wakeup
				1727	------
				1728
				1729	One common case that people are interested in tracing is the
				1730	time it takes for a task that is woken to actually wake up.
				1731	Now for non Real-Time tasks, this can be arbitrary. But tracing
				1732	it none the less can be interesting.
				1733
				1734	Without function tracing::
				1735
				1736	# echo 0 > options/function-trace
				1737	# echo wakeup > current_tracer
				1738	# echo 1 > tracing_on
				1739	# echo 0 > tracing_max_latency
				1740	# chrt -f 5 sleep 1
				1741	# echo 0 > tracing_on
				1742	# cat trace
				1743	# tracer: wakeup
				1744	#
				1745	# wakeup latency trace v1.1.5 on 3.8.0-test+
				1746	# --------------------------------------------------------------------
				1747	# latency: 15 us, #4/4, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1748	# -----------------
				1749	# \| task: kworker/3:1H-312 (uid:0 nice:-20 policy:0 rt_prio:0)
				1750	# -----------------
				1751	#
				1752	# _------=> CPU#
				1753	# / _-----=> irqs-off
				1754	# \| / _----=> need-resched
				1755	# \|\| / _---=> hardirq/softirq
				1756	# \|\|\| / _--=> preempt-depth
				1757	# \|\|\|\| / delay
				1758	# cmd pid \|\|\|\|\| time \| caller
				1759	# \ / \|\|\|\|\| \ \| /
				1760	<idle>-0 3dNs7 0us : 0:120:R + [003] 312:100:R kworker/3:1H
				1761	<idle>-0 3dNs7 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
				1762	<idle>-0 3d..3 15us : __schedule <-schedule
				1763	<idle>-0 3d..3 15us : 0:120:R ==> [003] 312:100:R kworker/3:1H
				1764
				1765	The tracer only traces the highest priority task in the system
				1766	to avoid tracing the normal circumstances. Here we see that
				1767	the kworker with a nice priority of -20 (not very nice), took
				1768	just 15 microseconds from the time it woke up, to the time it
				1769	ran.
				1770
				1771	Non Real-Time tasks are not that interesting. A more interesting
				1772	trace is to concentrate only on Real-Time tasks.
				1773
				1774	wakeup_rt
				1775	---------
				1776
				1777	In a Real-Time environment it is very important to know the
				1778	wakeup time it takes for the highest priority task that is woken
				1779	up to the time that it executes. This is also known as "schedule
				1780	latency". I stress the point that this is about RT tasks. It is
				1781	also important to know the scheduling latency of non-RT tasks,
				1782	but the average schedule latency is better for non-RT tasks.
				1783	Tools like LatencyTop are more appropriate for such
				1784	measurements.
				1785
				1786	Real-Time environments are interested in the worst case latency.
				1787	That is the longest latency it takes for something to happen,
				1788	and not the average. We can have a very fast scheduler that may
				1789	only have a large latency once in a while, but that would not
				1790	work well with Real-Time tasks. The wakeup_rt tracer was designed
				1791	to record the worst case wakeups of RT tasks. Non-RT tasks are
				1792	not recorded because the tracer only records one worst case and
				1793	tracing non-RT tasks that are unpredictable will overwrite the
				1794	worst case latency of RT tasks (just run the normal wakeup
				1795	tracer for a while to see that effect).
				1796
				1797	Since this tracer only deals with RT tasks, we will run this
				1798	slightly differently than we did with the previous tracers.
				1799	Instead of performing an 'ls', we will run 'sleep 1' under
				1800	'chrt' which changes the priority of the task.
				1801	::
				1802
				1803	# echo 0 > options/function-trace
				1804	# echo wakeup_rt > current_tracer
				1805	# echo 1 > tracing_on
				1806	# echo 0 > tracing_max_latency
				1807	# chrt -f 5 sleep 1
				1808	# echo 0 > tracing_on
				1809	# cat trace
				1810	# tracer: wakeup
				1811	#
				1812	# tracer: wakeup_rt
				1813	#
				1814	# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
				1815	# --------------------------------------------------------------------
				1816	# latency: 5 us, #4/4, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1817	# -----------------
				1818	# \| task: sleep-2389 (uid:0 nice:0 policy:1 rt_prio:5)
				1819	# -----------------
				1820	#
				1821	# _------=> CPU#
				1822	# / _-----=> irqs-off
				1823	# \| / _----=> need-resched
				1824	# \|\| / _---=> hardirq/softirq
				1825	# \|\|\| / _--=> preempt-depth
				1826	# \|\|\|\| / delay
				1827	# cmd pid \|\|\|\|\| time \| caller
				1828	# \ / \|\|\|\|\| \ \| /
				1829	<idle>-0 3d.h4 0us : 0:120:R + [003] 2389: 94:R sleep
				1830	<idle>-0 3d.h4 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
				1831	<idle>-0 3d..3 5us : __schedule <-schedule
				1832	<idle>-0 3d..3 5us : 0:120:R ==> [003] 2389: 94:R sleep
				1833
				1834
				1835	Running this on an idle system, we see that it only took 5 microseconds
				1836	to perform the task switch. Note, since the trace point in the schedule
				1837	is before the actual "switch", we stop the tracing when the recorded task
				1838	is about to schedule in. This may change if we add a new marker at the
				1839	end of the scheduler.
				1840
				1841	Notice that the recorded task is 'sleep' with the PID of 2389
				1842	and it has an rt_prio of 5. This priority is user-space priority
				1843	and not the internal kernel priority. The policy is 1 for
				1844	SCHED_FIFO and 2 for SCHED_RR.
				1845
				1846	Note, that the trace data shows the internal priority (99 - rtprio).
				1847	::
				1848
				1849	<idle>-0 3d..3 5us : 0:120:R ==> [003] 2389: 94:R sleep
				1850
				1851	The 0:120:R means idle was running with a nice priority of 0 (120 - 120)
				1852	and in the running state 'R'. The sleep task was scheduled in with
				1853	2389: 94:R. That is the priority is the kernel rtprio (99 - 5 = 94)
				1854	and it too is in the running state.
				1855
				1856	Doing the same with chrt -r 5 and function-trace set.
				1857	::
				1858
				1859	echo 1 > options/function-trace
				1860
				1861	# tracer: wakeup_rt
				1862	#
				1863	# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
				1864	# --------------------------------------------------------------------
				1865	# latency: 29 us, #85/85, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1866	# -----------------
				1867	# \| task: sleep-2448 (uid:0 nice:0 policy:1 rt_prio:5)
				1868	# -----------------
				1869	#
				1870	# _------=> CPU#
				1871	# / _-----=> irqs-off
				1872	# \| / _----=> need-resched
				1873	# \|\| / _---=> hardirq/softirq
				1874	# \|\|\| / _--=> preempt-depth
				1875	# \|\|\|\| / delay
				1876	# cmd pid \|\|\|\|\| time \| caller
				1877	# \ / \|\|\|\|\| \ \| /
				1878	<idle>-0 3d.h4 1us+: 0:120:R + [003] 2448: 94:R sleep
				1879	<idle>-0 3d.h4 2us : ttwu_do_activate.constprop.87 <-try_to_wake_up
				1880	<idle>-0 3d.h3 3us : check_preempt_curr <-ttwu_do_wakeup
				1881	<idle>-0 3d.h3 3us : resched_curr <-check_preempt_curr
				1882	<idle>-0 3dNh3 4us : task_woken_rt <-ttwu_do_wakeup
				1883	<idle>-0 3dNh3 4us : _raw_spin_unlock <-try_to_wake_up
				1884	<idle>-0 3dNh3 4us : sub_preempt_count <-_raw_spin_unlock
				1885	<idle>-0 3dNh2 5us : ttwu_stat <-try_to_wake_up
				1886	<idle>-0 3dNh2 5us : _raw_spin_unlock_irqrestore <-try_to_wake_up
				1887	<idle>-0 3dNh2 6us : sub_preempt_count <-_raw_spin_unlock_irqrestore
				1888	<idle>-0 3dNh1 6us : _raw_spin_lock <-__run_hrtimer
				1889	<idle>-0 3dNh1 6us : add_preempt_count <-_raw_spin_lock
				1890	<idle>-0 3dNh2 7us : _raw_spin_unlock <-hrtimer_interrupt
				1891	<idle>-0 3dNh2 7us : sub_preempt_count <-_raw_spin_unlock
				1892	<idle>-0 3dNh1 7us : tick_program_event <-hrtimer_interrupt
				1893	<idle>-0 3dNh1 7us : clockevents_program_event <-tick_program_event
				1894	<idle>-0 3dNh1 8us : ktime_get <-clockevents_program_event
				1895	<idle>-0 3dNh1 8us : lapic_next_event <-clockevents_program_event
				1896	<idle>-0 3dNh1 8us : irq_exit <-smp_apic_timer_interrupt
				1897	<idle>-0 3dNh1 9us : sub_preempt_count <-irq_exit
				1898	<idle>-0 3dN.2 9us : idle_cpu <-irq_exit
				1899	<idle>-0 3dN.2 9us : rcu_irq_exit <-irq_exit
				1900	<idle>-0 3dN.2 10us : rcu_eqs_enter_common.isra.45 <-rcu_irq_exit
				1901	<idle>-0 3dN.2 10us : sub_preempt_count <-irq_exit
				1902	<idle>-0 3.N.1 11us : rcu_idle_exit <-cpu_idle
				1903	<idle>-0 3dN.1 11us : rcu_eqs_exit_common.isra.43 <-rcu_idle_exit
				1904	<idle>-0 3.N.1 11us : tick_nohz_idle_exit <-cpu_idle
				1905	<idle>-0 3dN.1 12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
				1906	<idle>-0 3dN.1 12us : ktime_get <-tick_nohz_idle_exit
				1907	<idle>-0 3dN.1 12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
				1908	<idle>-0 3dN.1 13us : cpu_load_update_nohz <-tick_nohz_idle_exit
				1909	<idle>-0 3dN.1 13us : _raw_spin_lock <-cpu_load_update_nohz
				1910	<idle>-0 3dN.1 13us : add_preempt_count <-_raw_spin_lock
				1911	<idle>-0 3dN.2 13us : __cpu_load_update <-cpu_load_update_nohz
				1912	<idle>-0 3dN.2 14us : sched_avg_update <-__cpu_load_update
				1913	<idle>-0 3dN.2 14us : _raw_spin_unlock <-cpu_load_update_nohz
				1914	<idle>-0 3dN.2 14us : sub_preempt_count <-_raw_spin_unlock
				1915	<idle>-0 3dN.1 15us : calc_load_nohz_stop <-tick_nohz_idle_exit
				1916	<idle>-0 3dN.1 15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
				1917	<idle>-0 3dN.1 15us : hrtimer_cancel <-tick_nohz_idle_exit
				1918	<idle>-0 3dN.1 15us : hrtimer_try_to_cancel <-hrtimer_cancel
				1919	<idle>-0 3dN.1 16us : lock_hrtimer_base.isra.18 <-hrtimer_try_to_cancel
				1920	<idle>-0 3dN.1 16us : _raw_spin_lock_irqsave <-lock_hrtimer_base.isra.18
				1921	<idle>-0 3dN.1 16us : add_preempt_count <-_raw_spin_lock_irqsave
				1922	<idle>-0 3dN.2 17us : __remove_hrtimer <-remove_hrtimer.part.16
				1923	<idle>-0 3dN.2 17us : hrtimer_force_reprogram <-__remove_hrtimer
				1924	<idle>-0 3dN.2 17us : tick_program_event <-hrtimer_force_reprogram
				1925	<idle>-0 3dN.2 18us : clockevents_program_event <-tick_program_event
				1926	<idle>-0 3dN.2 18us : ktime_get <-clockevents_program_event
				1927	<idle>-0 3dN.2 18us : lapic_next_event <-clockevents_program_event
				1928	<idle>-0 3dN.2 19us : _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
				1929	<idle>-0 3dN.2 19us : sub_preempt_count <-_raw_spin_unlock_irqrestore
				1930	<idle>-0 3dN.1 19us : hrtimer_forward <-tick_nohz_idle_exit
				1931	<idle>-0 3dN.1 20us : ktime_add_safe <-hrtimer_forward
				1932	<idle>-0 3dN.1 20us : ktime_add_safe <-hrtimer_forward
				1933	<idle>-0 3dN.1 20us : hrtimer_start_range_ns <-hrtimer_start_expires.constprop.11
				1934	<idle>-0 3dN.1 20us : __hrtimer_start_range_ns <-hrtimer_start_range_ns
				1935	<idle>-0 3dN.1 21us : lock_hrtimer_base.isra.18 <-__hrtimer_start_range_ns
				1936	<idle>-0 3dN.1 21us : _raw_spin_lock_irqsave <-lock_hrtimer_base.isra.18
				1937	<idle>-0 3dN.1 21us : add_preempt_count <-_raw_spin_lock_irqsave
				1938	<idle>-0 3dN.2 22us : ktime_add_safe <-__hrtimer_start_range_ns
				1939	<idle>-0 3dN.2 22us : enqueue_hrtimer <-__hrtimer_start_range_ns
				1940	<idle>-0 3dN.2 22us : tick_program_event <-__hrtimer_start_range_ns
				1941	<idle>-0 3dN.2 23us : clockevents_program_event <-tick_program_event
				1942	<idle>-0 3dN.2 23us : ktime_get <-clockevents_program_event
				1943	<idle>-0 3dN.2 23us : lapic_next_event <-clockevents_program_event
				1944	<idle>-0 3dN.2 24us : _raw_spin_unlock_irqrestore <-__hrtimer_start_range_ns
				1945	<idle>-0 3dN.2 24us : sub_preempt_count <-_raw_spin_unlock_irqrestore
				1946	<idle>-0 3dN.1 24us : account_idle_ticks <-tick_nohz_idle_exit
				1947	<idle>-0 3dN.1 24us : account_idle_time <-account_idle_ticks
				1948	<idle>-0 3.N.1 25us : sub_preempt_count <-cpu_idle
				1949	<idle>-0 3.N.. 25us : schedule <-cpu_idle
				1950	<idle>-0 3.N.. 25us : __schedule <-preempt_schedule
				1951	<idle>-0 3.N.. 26us : add_preempt_count <-__schedule
				1952	<idle>-0 3.N.1 26us : rcu_note_context_switch <-__schedule
				1953	<idle>-0 3.N.1 26us : rcu_sched_qs <-rcu_note_context_switch
				1954	<idle>-0 3dN.1 27us : rcu_preempt_qs <-rcu_note_context_switch
				1955	<idle>-0 3.N.1 27us : _raw_spin_lock_irq <-__schedule
				1956	<idle>-0 3dN.1 27us : add_preempt_count <-_raw_spin_lock_irq
				1957	<idle>-0 3dN.2 28us : put_prev_task_idle <-__schedule
				1958	<idle>-0 3dN.2 28us : pick_next_task_stop <-pick_next_task
				1959	<idle>-0 3dN.2 28us : pick_next_task_rt <-pick_next_task
				1960	<idle>-0 3dN.2 29us : dequeue_pushable_task <-pick_next_task_rt
				1961	<idle>-0 3d..3 29us : __schedule <-preempt_schedule
				1962	<idle>-0 3d..3 30us : 0:120:R ==> [003] 2448: 94:R sleep
				1963
				1964	This isn't that big of a trace, even with function tracing enabled,
				1965	so I included the entire trace.
				1966
				1967	The interrupt went off while when the system was idle. Somewhere
				1968	before task_woken_rt() was called, the NEED_RESCHED flag was set,
				1969	this is indicated by the first occurrence of the 'N' flag.
				1970
				1971	Latency tracing and events
				1972	--------------------------
				1973	As function tracing can induce a much larger latency, but without
				1974	seeing what happens within the latency it is hard to know what
				1975	caused it. There is a middle ground, and that is with enabling
				1976	events.
				1977	::
				1978
				1979	# echo 0 > options/function-trace
				1980	# echo wakeup_rt > current_tracer
				1981	# echo 1 > events/enable
				1982	# echo 1 > tracing_on
				1983	# echo 0 > tracing_max_latency
				1984	# chrt -f 5 sleep 1
				1985	# echo 0 > tracing_on
				1986	# cat trace
				1987	# tracer: wakeup_rt
				1988	#
				1989	# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
				1990	# --------------------------------------------------------------------
				1991	# latency: 6 us, #12/12, CPU#2 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
				1992	# -----------------
				1993	# \| task: sleep-5882 (uid:0 nice:0 policy:1 rt_prio:5)
				1994	# -----------------
				1995	#
				1996	# _------=> CPU#
				1997	# / _-----=> irqs-off
				1998	# \| / _----=> need-resched
				1999	# \|\| / _---=> hardirq/softirq
				2000	# \|\|\| / _--=> preempt-depth
				2001	# \|\|\|\| / delay
				2002	# cmd pid \|\|\|\|\| time \| caller
				2003	# \ / \|\|\|\|\| \ \| /
				2004	<idle>-0 2d.h4 0us : 0:120:R + [002] 5882: 94:R sleep
				2005	<idle>-0 2d.h4 0us : ttwu_do_activate.constprop.87 <-try_to_wake_up
				2006	<idle>-0 2d.h4 1us : sched_wakeup: comm=sleep pid=5882 prio=94 success=1 target_cpu=002
				2007	<idle>-0 2dNh2 1us : hrtimer_expire_exit: hrtimer=ffff88007796feb8
				2008	<idle>-0 2.N.2 2us : power_end: cpu_id=2
				2009	<idle>-0 2.N.2 3us : cpu_idle: state=4294967295 cpu_id=2
				2010	<idle>-0 2dN.3 4us : hrtimer_cancel: hrtimer=ffff88007d50d5e0
				2011	<idle>-0 2dN.3 4us : hrtimer_start: hrtimer=ffff88007d50d5e0 function=tick_sched_timer expires=34311211000000 softexpires=34311211000000
				2012	<idle>-0 2.N.2 5us : rcu_utilization: Start context switch
				2013	<idle>-0 2.N.2 5us : rcu_utilization: End context switch
				2014	<idle>-0 2d..3 6us : __schedule <-schedule
				2015	<idle>-0 2d..3 6us : 0:120:R ==> [002] 5882: 94:R sleep
				2016
				2017
				2018	Hardware Latency Detector
				2019	-------------------------
				2020
				2021	The hardware latency detector is executed by enabling the "hwlat" tracer.
				2022
				2023	NOTE, this tracer will affect the performance of the system as it will
				2024	periodically make a CPU constantly busy with interrupts disabled.
				2025	::
				2026
				2027	# echo hwlat > current_tracer
				2028	# sleep 100
				2029	# cat trace
				2030	# tracer: hwlat
				2031	#
				2032	# _-----=> irqs-off
				2033	# / _----=> need-resched
				2034	# \| / _---=> hardirq/softirq
				2035	# \|\| / _--=> preempt-depth
				2036	# \|\|\| / delay
				2037	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				2038	# \| \| \| \|\|\|\| \| \|
				2039	<...>-3638 [001] d... 19452.055471: #1 inner/outer(us): 12/14 ts:1499801089.066141940
				2040	<...>-3638 [003] d... 19454.071354: #2 inner/outer(us): 11/9 ts:1499801091.082164365
				2041	<...>-3638 [002] dn.. 19461.126852: #3 inner/outer(us): 12/9 ts:1499801098.138150062
				2042	<...>-3638 [001] d... 19488.340960: #4 inner/outer(us): 8/12 ts:1499801125.354139633
				2043	<...>-3638 [003] d... 19494.388553: #5 inner/outer(us): 8/12 ts:1499801131.402150961
				2044	<...>-3638 [003] d... 19501.283419: #6 inner/outer(us): 0/12 ts:1499801138.297435289 nmi-total:4 nmi-count:1
				2045
				2046
				2047	The above output is somewhat the same in the header. All events will have
				2048	interrupts disabled 'd'. Under the FUNCTION title there is:
				2049
				2050	#1
				2051	This is the count of events recorded that were greater than the
				2052	tracing_threshold (See below).
				2053
				2054	inner/outer(us): 12/14
				2055
				2056	This shows two numbers as "inner latency" and "outer latency". The test
				2057	runs in a loop checking a timestamp twice. The latency detected within
				2058	the two timestamps is the "inner latency" and the latency detected
				2059	after the previous timestamp and the next timestamp in the loop is
				2060	the "outer latency".
				2061
				2062	ts:1499801089.066141940
				2063
				2064	The absolute timestamp that the event happened.
				2065
				2066	nmi-total:4 nmi-count:1
				2067
				2068	On architectures that support it, if an NMI comes in during the
				2069	test, the time spent in NMI is reported in "nmi-total" (in
				2070	microseconds).
				2071
				2072	All architectures that have NMIs will show the "nmi-count" if an
				2073	NMI comes in during the test.
				2074
				2075	hwlat files:
				2076
				2077	tracing_threshold
				2078	This gets automatically set to "10" to represent 10
				2079	microseconds. This is the threshold of latency that
				2080	needs to be detected before the trace will be recorded.
				2081
				2082	Note, when hwlat tracer is finished (another tracer is
				2083	written into "current_tracer"), the original value for
				2084	tracing_threshold is placed back into this file.
				2085
				2086	hwlat_detector/width
				2087	The length of time the test runs with interrupts disabled.
				2088
				2089	hwlat_detector/window
				2090	The length of time of the window which the test
				2091	runs. That is, the test will run for "width"
				2092	microseconds per "window" microseconds
				2093
				2094	tracing_cpumask
				2095	When the test is started. A kernel thread is created that
				2096	runs the test. This thread will alternate between CPUs
				2097	listed in the tracing_cpumask between each period
				2098	(one "window"). To limit the test to specific CPUs
				2099	set the mask in this file to only the CPUs that the test
				2100	should run on.
				2101
				2102	function
				2103	--------
				2104
				2105	This tracer is the function tracer. Enabling the function tracer
				2106	can be done from the debug file system. Make sure the
				2107	ftrace_enabled is set; otherwise this tracer is a nop.
				2108	See the "ftrace_enabled" section below.
				2109	::
				2110
				2111	# sysctl kernel.ftrace_enabled=1
				2112	# echo function > current_tracer
				2113	# echo 1 > tracing_on
				2114	# usleep 1
				2115	# echo 0 > tracing_on
				2116	# cat trace
				2117	# tracer: function
				2118	#
				2119	# entries-in-buffer/entries-written: 24799/24799 #P:4
				2120	#
				2121	# _-----=> irqs-off
				2122	# / _----=> need-resched
				2123	# \| / _---=> hardirq/softirq
				2124	# \|\| / _--=> preempt-depth
				2125	# \|\|\| / delay
				2126	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				2127	# \| \| \| \|\|\|\| \| \|
				2128	bash-1994 [002] .... 3082.063030: mutex_unlock <-rb_simple_write
				2129	bash-1994 [002] .... 3082.063031: __mutex_unlock_slowpath <-mutex_unlock
				2130	bash-1994 [002] .... 3082.063031: __fsnotify_parent <-fsnotify_modify
				2131	bash-1994 [002] .... 3082.063032: fsnotify <-fsnotify_modify
				2132	bash-1994 [002] .... 3082.063032: __srcu_read_lock <-fsnotify
				2133	bash-1994 [002] .... 3082.063032: add_preempt_count <-__srcu_read_lock
				2134	bash-1994 [002] ...1 3082.063032: sub_preempt_count <-__srcu_read_lock
				2135	bash-1994 [002] .... 3082.063033: __srcu_read_unlock <-fsnotify
				2136	[...]
				2137
				2138
				2139	Note: function tracer uses ring buffers to store the above
				2140	entries. The newest data may overwrite the oldest data.
				2141	Sometimes using echo to stop the trace is not sufficient because
				2142	the tracing could have overwritten the data that you wanted to
				2143	record. For this reason, it is sometimes better to disable
				2144	tracing directly from a program. This allows you to stop the
				2145	tracing at the point that you hit the part that you are
				2146	interested in. To disable the tracing directly from a C program,
				2147	something like following code snippet can be used::
				2148
				2149	int trace_fd;
				2150	[...]
				2151	int main(int argc, char *argv[]) {
				2152	[...]
				2153	trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
				2154	[...]
				2155	if (condition_hit()) {
				2156	write(trace_fd, "0", 1);
				2157	}
				2158	[...]
				2159	}
				2160
				2161
				2162	Single thread tracing
				2163	---------------------
				2164
				2165	By writing into set_ftrace_pid you can trace a
				2166	single thread. For example::
				2167
				2168	# cat set_ftrace_pid
				2169	no pid
				2170	# echo 3111 > set_ftrace_pid
				2171	# cat set_ftrace_pid
				2172	3111
				2173	# echo function > current_tracer
				2174	# cat trace \| head
				2175	# tracer: function
				2176	#
				2177	# TASK-PID CPU# TIMESTAMP FUNCTION
				2178	# \| \| \| \| \|
				2179	yum-updatesd-3111 [003] 1637.254676: finish_task_switch <-thread_return
				2180	yum-updatesd-3111 [003] 1637.254681: hrtimer_cancel <-schedule_hrtimeout_range
				2181	yum-updatesd-3111 [003] 1637.254682: hrtimer_try_to_cancel <-hrtimer_cancel
				2182	yum-updatesd-3111 [003] 1637.254683: lock_hrtimer_base <-hrtimer_try_to_cancel
				2183	yum-updatesd-3111 [003] 1637.254685: fget_light <-do_sys_poll
				2184	yum-updatesd-3111 [003] 1637.254686: pipe_poll <-do_sys_poll
				2185	# echo > set_ftrace_pid
				2186	# cat trace \|head
				2187	# tracer: function
				2188	#
				2189	# TASK-PID CPU# TIMESTAMP FUNCTION
				2190	# \| \| \| \| \|
				2191	##### CPU 3 buffer started ####
				2192	yum-updatesd-3111 [003] 1701.957688: free_poll_entry <-poll_freewait
				2193	yum-updatesd-3111 [003] 1701.957689: remove_wait_queue <-free_poll_entry
				2194	yum-updatesd-3111 [003] 1701.957691: fput <-free_poll_entry
				2195	yum-updatesd-3111 [003] 1701.957692: audit_syscall_exit <-sysret_audit
				2196	yum-updatesd-3111 [003] 1701.957693: path_put <-audit_syscall_exit
				2197
				2198	If you want to trace a function when executing, you could use
				2199	something like this simple program.
				2200	::
				2201
				2202	#include <stdio.h>
				2203	#include <stdlib.h>
				2204	#include <sys/types.h>
				2205	#include <sys/stat.h>
				2206	#include <fcntl.h>
				2207	#include <unistd.h>
				2208	#include <string.h>
				2209
				2210	#define _STR(x) #x
				2211	#define STR(x) _STR(x)
				2212	#define MAX_PATH 256
				2213
				2214	const char *find_tracefs(void)
				2215	{
				2216	static char tracefs[MAX_PATH+1];
				2217	static int tracefs_found;
				2218	char type[100];
				2219	FILE *fp;
				2220
				2221	if (tracefs_found)
				2222	return tracefs;
				2223
				2224	if ((fp = fopen("/proc/mounts","r")) == NULL) {
				2225	perror("/proc/mounts");
				2226	return NULL;
				2227	}
				2228
				2229	while (fscanf(fp, "%*s %"
				2230	STR(MAX_PATH)
				2231	"s %99s %s %d %*d\n",
				2232	tracefs, type) == 2) {
				2233	if (strcmp(type, "tracefs") == 0)
				2234	break;
				2235	}
				2236	fclose(fp);
				2237
				2238	if (strcmp(type, "tracefs") != 0) {
				2239	fprintf(stderr, "tracefs not mounted");
				2240	return NULL;
				2241	}
				2242
				2243	strcat(tracefs, "/tracing/");
				2244	tracefs_found = 1;
				2245
				2246	return tracefs;
				2247	}
				2248
				2249	const char tracing_file(const char file_name)
				2250	{
				2251	static char trace_file[MAX_PATH+1];
				2252	snprintf(trace_file, MAX_PATH, "%s/%s", find_tracefs(), file_name);
				2253	return trace_file;
				2254	}
				2255
				2256	int main (int argc, char **argv)
				2257	{
				2258	if (argc < 1)
				2259	exit(-1);
				2260
				2261	if (fork() > 0) {
				2262	int fd, ffd;
				2263	char line[64];
				2264	int s;
				2265
				2266	ffd = open(tracing_file("current_tracer"), O_WRONLY);
				2267	if (ffd < 0)
				2268	exit(-1);
				2269	write(ffd, "nop", 3);
				2270
				2271	fd = open(tracing_file("set_ftrace_pid"), O_WRONLY);
				2272	s = sprintf(line, "%d\n", getpid());
				2273	write(fd, line, s);
				2274
				2275	write(ffd, "function", 8);
				2276
				2277	close(fd);
				2278	close(ffd);
				2279
				2280	execvp(argv[1], argv+1);
				2281	}
				2282
				2283	return 0;
				2284	}
				2285
				2286	Or this simple script!
				2287	::
				2288
				2289	#!/bin/bash
				2290
				2291	tracefs=`sed -ne 's/^tracefs $.$ tracefs./\1/p' /proc/mounts`
				2292	echo nop > $tracefs/tracing/current_tracer
				2293	echo 0 > $tracefs/tracing/tracing_on
				2294	echo $$ > $tracefs/tracing/set_ftrace_pid
				2295	echo function > $tracefs/tracing/current_tracer
				2296	echo 1 > $tracefs/tracing/tracing_on
				2297	exec "$@"
				2298
				2299
				2300	function graph tracer
				2301	---------------------------
				2302
				2303	This tracer is similar to the function tracer except that it
				2304	probes a function on its entry and its exit. This is done by
				2305	using a dynamically allocated stack of return addresses in each
				2306	task_struct. On function entry the tracer overwrites the return
				2307	address of each function traced to set a custom probe. Thus the
				2308	original return address is stored on the stack of return address
				2309	in the task_struct.
				2310
				2311	Probing on both ends of a function leads to special features
				2312	such as:
				2313
				2314	- measure of a function's time execution
				2315	- having a reliable call stack to draw function calls graph
				2316
				2317	This tracer is useful in several situations:
				2318
				2319	- you want to find the reason of a strange kernel behavior and
				2320	need to see what happens in detail on any areas (or specific
				2321	ones).
				2322
				2323	- you are experiencing weird latencies but it's difficult to
				2324	find its origin.
				2325
				2326	- you want to find quickly which path is taken by a specific
				2327	function
				2328
				2329	- you just want to peek inside a working kernel and want to see
				2330	what happens there.
				2331
				2332	::
				2333
				2334	# tracer: function_graph
				2335	#
				2336	# CPU DURATION FUNCTION CALLS
				2337	# \| \| \| \| \| \| \|
				2338
				2339	0) \| sys_open() {
				2340	0) \| do_sys_open() {
				2341	0) \| getname() {
				2342	0) \| kmem_cache_alloc() {
				2343	0) 1.382 us \| __might_sleep();
				2344	0) 2.478 us \| }
				2345	0) \| strncpy_from_user() {
				2346	0) \| might_fault() {
				2347	0) 1.389 us \| __might_sleep();
				2348	0) 2.553 us \| }
				2349	0) 3.807 us \| }
				2350	0) 7.876 us \| }
				2351	0) \| alloc_fd() {
				2352	0) 0.668 us \| _spin_lock();
				2353	0) 0.570 us \| expand_files();
				2354	0) 0.586 us \| _spin_unlock();
				2355
				2356
				2357	There are several columns that can be dynamically
				2358	enabled/disabled. You can use every combination of options you
				2359	want, depending on your needs.
				2360
				2361	- The cpu number on which the function executed is default
				2362	enabled. It is sometimes better to only trace one cpu (see
				2363	tracing_cpu_mask file) or you might sometimes see unordered
				2364	function calls while cpu tracing switch.
				2365
				2366	- hide: echo nofuncgraph-cpu > trace_options
				2367	- show: echo funcgraph-cpu > trace_options
				2368
				2369	- The duration (function's time of execution) is displayed on
				2370	the closing bracket line of a function or on the same line
				2371	than the current function in case of a leaf one. It is default
				2372	enabled.
				2373
				2374	- hide: echo nofuncgraph-duration > trace_options
				2375	- show: echo funcgraph-duration > trace_options
				2376
				2377	- The overhead field precedes the duration field in case of
				2378	reached duration thresholds.
				2379
				2380	- hide: echo nofuncgraph-overhead > trace_options
				2381	- show: echo funcgraph-overhead > trace_options
				2382	- depends on: funcgraph-duration
				2383
				2384	ie::
				2385
				2386	3) # 1837.709 us \| } /* __switch_to */
				2387	3) \| finish_task_switch() {
				2388	3) 0.313 us \| _raw_spin_unlock_irq();
				2389	3) 3.177 us \| }
				2390	3) # 1889.063 us \| } /* __schedule */
				2391	3) ! 140.417 us \| } /* __schedule */
				2392	3) # 2034.948 us \| } /* schedule */
				2393	3) * 33998.59 us \| } /* schedule_preempt_disabled */
				2394
				2395	[...]
				2396
				2397	1) 0.260 us \| msecs_to_jiffies();
				2398	1) 0.313 us \| __rcu_read_unlock();
				2399	1) + 61.770 us \| }
				2400	1) + 64.479 us \| }
				2401	1) 0.313 us \| rcu_bh_qs();
				2402	1) 0.313 us \| __local_bh_enable();
				2403	1) ! 217.240 us \| }
				2404	1) 0.365 us \| idle_cpu();
				2405	1) \| rcu_irq_exit() {
				2406	1) 0.417 us \| rcu_eqs_enter_common.isra.47();
				2407	1) 3.125 us \| }
				2408	1) ! 227.812 us \| }
				2409	1) ! 457.395 us \| }
				2410	1) @ 119760.2 us \| }
				2411
				2412	[...]
				2413
				2414	2) \| handle_IPI() {
				2415	1) 6.979 us \| }
				2416	2) 0.417 us \| scheduler_ipi();
				2417	1) 9.791 us \| }
				2418	1) + 12.917 us \| }
				2419	2) 3.490 us \| }
				2420	1) + 15.729 us \| }
				2421	1) + 18.542 us \| }
				2422	2) $ 3594274 us \| }
				2423
				2424	Flags::
				2425
				2426	+ means that the function exceeded 10 usecs.
				2427	! means that the function exceeded 100 usecs.
				2428	# means that the function exceeded 1000 usecs.
				2429	* means that the function exceeded 10 msecs.
				2430	@ means that the function exceeded 100 msecs.
				2431	$ means that the function exceeded 1 sec.
				2432
				2433
				2434	- The task/pid field displays the thread cmdline and pid which
				2435	executed the function. It is default disabled.
				2436
				2437	- hide: echo nofuncgraph-proc > trace_options
				2438	- show: echo funcgraph-proc > trace_options
				2439
				2440	ie::
				2441
				2442	# tracer: function_graph
				2443	#
				2444	# CPU TASK/PID DURATION FUNCTION CALLS
				2445	# \| \| \| \| \| \| \| \| \|
				2446	0) sh-4802 \| \| d_free() {
				2447	0) sh-4802 \| \| call_rcu() {
				2448	0) sh-4802 \| \| __call_rcu() {
				2449	0) sh-4802 \| 0.616 us \| rcu_process_gp_end();
				2450	0) sh-4802 \| 0.586 us \| check_for_new_grace_period();
				2451	0) sh-4802 \| 2.899 us \| }
				2452	0) sh-4802 \| 4.040 us \| }
				2453	0) sh-4802 \| 5.151 us \| }
				2454	0) sh-4802 \| + 49.370 us \| }
				2455
				2456
				2457	- The absolute time field is an absolute timestamp given by the
				2458	system clock since it started. A snapshot of this time is
				2459	given on each entry/exit of functions
				2460
				2461	- hide: echo nofuncgraph-abstime > trace_options
				2462	- show: echo funcgraph-abstime > trace_options
				2463
				2464	ie::
				2465
				2466	#
				2467	# TIME CPU DURATION FUNCTION CALLS
				2468	# \| \| \| \| \| \| \| \|
				2469	360.774522 \| 1) 0.541 us \| }
				2470	360.774522 \| 1) 4.663 us \| }
				2471	360.774523 \| 1) 0.541 us \| __wake_up_bit();
				2472	360.774524 \| 1) 6.796 us \| }
				2473	360.774524 \| 1) 7.952 us \| }
				2474	360.774525 \| 1) 9.063 us \| }
				2475	360.774525 \| 1) 0.615 us \| journal_mark_dirty();
				2476	360.774527 \| 1) 0.578 us \| __brelse();
				2477	360.774528 \| 1) \| reiserfs_prepare_for_journal() {
				2478	360.774528 \| 1) \| unlock_buffer() {
				2479	360.774529 \| 1) \| wake_up_bit() {
				2480	360.774529 \| 1) \| bit_waitqueue() {
				2481	360.774530 \| 1) 0.594 us \| __phys_addr();
				2482
				2483
				2484	The function name is always displayed after the closing bracket
				2485	for a function if the start of that function is not in the
				2486	trace buffer.
				2487
				2488	Display of the function name after the closing bracket may be
				2489	enabled for functions whose start is in the trace buffer,
				2490	allowing easier searching with grep for function durations.
				2491	It is default disabled.
				2492
				2493	- hide: echo nofuncgraph-tail > trace_options
				2494	- show: echo funcgraph-tail > trace_options
				2495
				2496	Example with nofuncgraph-tail (default)::
				2497
				2498	0) \| putname() {
				2499	0) \| kmem_cache_free() {
				2500	0) 0.518 us \| __phys_addr();
				2501	0) 1.757 us \| }
				2502	0) 2.861 us \| }
				2503
				2504	Example with funcgraph-tail::
				2505
				2506	0) \| putname() {
				2507	0) \| kmem_cache_free() {
				2508	0) 0.518 us \| __phys_addr();
				2509	0) 1.757 us \| } /* kmem_cache_free() */
				2510	0) 2.861 us \| } /* putname() */
				2511
				2512	You can put some comments on specific functions by using
				2513	trace_printk() For example, if you want to put a comment inside
				2514	the __might_sleep() function, you just have to include
				2515	<linux/ftrace.h> and call trace_printk() inside __might_sleep()::
				2516
				2517	trace_printk("I'm a comment!\n")
				2518
				2519	will produce::
				2520
				2521	1) \| __might_sleep() {
				2522	1) \| /* I'm a comment! */
				2523	1) 1.449 us \| }
				2524
				2525
				2526	You might find other useful features for this tracer in the
				2527	following "dynamic ftrace" section such as tracing only specific
				2528	functions or tasks.
				2529
				2530	dynamic ftrace
				2531	--------------
				2532
				2533	If CONFIG_DYNAMIC_FTRACE is set, the system will run with
				2534	virtually no overhead when function tracing is disabled. The way
				2535	this works is the mcount function call (placed at the start of
				2536	every kernel function, produced by the -pg switch in gcc),
				2537	starts of pointing to a simple return. (Enabling FTRACE will
				2538	include the -pg switch in the compiling of the kernel.)
				2539
				2540	At compile time every C file object is run through the
				2541	recordmcount program (located in the scripts directory). This
				2542	program will parse the ELF headers in the C object to find all
				2543	the locations in the .text section that call mcount. Starting
				2544	with gcc verson 4.6, the -mfentry has been added for x86, which
				2545	calls "__fentry__" instead of "mcount". Which is called before
				2546	the creation of the stack frame.
				2547
				2548	Note, not all sections are traced. They may be prevented by either
				2549	a notrace, or blocked another way and all inline functions are not
				2550	traced. Check the "available_filter_functions" file to see what functions
				2551	can be traced.
				2552
				2553	A section called "__mcount_loc" is created that holds
				2554	references to all the mcount/fentry call sites in the .text section.
				2555	The recordmcount program re-links this section back into the
				2556	original object. The final linking stage of the kernel will add all these
				2557	references into a single table.
				2558
				2559	On boot up, before SMP is initialized, the dynamic ftrace code
				2560	scans this table and updates all the locations into nops. It
				2561	also records the locations, which are added to the
				2562	available_filter_functions list. Modules are processed as they
				2563	are loaded and before they are executed. When a module is
				2564	unloaded, it also removes its functions from the ftrace function
				2565	list. This is automatic in the module unload code, and the
				2566	module author does not need to worry about it.
				2567
				2568	When tracing is enabled, the process of modifying the function
				2569	tracepoints is dependent on architecture. The old method is to use
				2570	kstop_machine to prevent races with the CPUs executing code being
				2571	modified (which can cause the CPU to do undesirable things, especially
				2572	if the modified code crosses cache (or page) boundaries), and the nops are
				2573	patched back to calls. But this time, they do not call mcount
				2574	(which is just a function stub). They now call into the ftrace
				2575	infrastructure.
				2576
				2577	The new method of modifying the function tracepoints is to place
				2578	a breakpoint at the location to be modified, sync all CPUs, modify
				2579	the rest of the instruction not covered by the breakpoint. Sync
				2580	all CPUs again, and then remove the breakpoint with the finished
				2581	version to the ftrace call site.
				2582
				2583	Some archs do not even need to monkey around with the synchronization,
				2584	and can just slap the new code on top of the old without any
				2585	problems with other CPUs executing it at the same time.
				2586
				2587	One special side-effect to the recording of the functions being
				2588	traced is that we can now selectively choose which functions we
				2589	wish to trace and which ones we want the mcount calls to remain
				2590	as nops.
				2591
				2592	Two files are used, one for enabling and one for disabling the
				2593	tracing of specified functions. They are:
				2594
				2595	set_ftrace_filter
				2596
				2597	and
				2598
				2599	set_ftrace_notrace
				2600
				2601	A list of available functions that you can add to these files is
				2602	listed in:
				2603
				2604	available_filter_functions
				2605
				2606	::
				2607
				2608	# cat available_filter_functions
				2609	put_prev_task_idle
				2610	kmem_cache_create
				2611	pick_next_task_rt
				2612	get_online_cpus
				2613	pick_next_task_fair
				2614	mutex_lock
				2615	[...]
				2616
				2617	If I am only interested in sys_nanosleep and hrtimer_interrupt::
				2618
				2619	# echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
				2620	# echo function > current_tracer
				2621	# echo 1 > tracing_on
				2622	# usleep 1
				2623	# echo 0 > tracing_on
				2624	# cat trace
				2625	# tracer: function
				2626	#
				2627	# entries-in-buffer/entries-written: 5/5 #P:4
				2628	#
				2629	# _-----=> irqs-off
				2630	# / _----=> need-resched
				2631	# \| / _---=> hardirq/softirq
				2632	# \|\| / _--=> preempt-depth
				2633	# \|\|\| / delay
				2634	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				2635	# \| \| \| \|\|\|\| \| \|
				2636	usleep-2665 [001] .... 4186.475355: sys_nanosleep <-system_call_fastpath
				2637	<idle>-0 [001] d.h1 4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
				2638	usleep-2665 [001] d.h1 4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
				2639	<idle>-0 [003] d.h1 4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
				2640	<idle>-0 [002] d.h1 4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt
				2641
				2642	To see which functions are being traced, you can cat the file:
				2643	::
				2644
				2645	# cat set_ftrace_filter
				2646	hrtimer_interrupt
				2647	sys_nanosleep
				2648
				2649
				2650	Perhaps this is not enough. The filters also allow glob(7) matching.
				2651
Jonathan Corbet	6234c7b	2018-03-07 10:44:08 -0700	[diff] [blame]	2652	``<match>*``
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	2653	will match functions that begin with <match>
Jonathan Corbet	6234c7b	2018-03-07 10:44:08 -0700	[diff] [blame]	2654	``*<match>``
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	2655	will match functions that end with <match>
Jonathan Corbet	6234c7b	2018-03-07 10:44:08 -0700	[diff] [blame]	2656	``<match>``
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	2657	will match functions that have <match> in it
Jonathan Corbet	6234c7b	2018-03-07 10:44:08 -0700	[diff] [blame]	2658	``<match1>*<match2>``
Changbin Du	1f198e2	2018-02-17 13:39:38 +0800	[diff] [blame]	2659	will match functions that begin with <match1> and end with <match2>
				2660
				2661	.. note::
				2662	It is better to use quotes to enclose the wild cards,
				2663	otherwise the shell may expand the parameters into names
				2664	of files in the local directory.
				2665
				2666	::
				2667
				2668	# echo 'hrtimer_*' > set_ftrace_filter
				2669
				2670	Produces::
				2671
				2672	# tracer: function
				2673	#
				2674	# entries-in-buffer/entries-written: 897/897 #P:4
				2675	#
				2676	# _-----=> irqs-off
				2677	# / _----=> need-resched
				2678	# \| / _---=> hardirq/softirq
				2679	# \|\| / _--=> preempt-depth
				2680	# \|\|\| / delay
				2681	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				2682	# \| \| \| \|\|\|\| \| \|
				2683	<idle>-0 [003] dN.1 4228.547803: hrtimer_cancel <-tick_nohz_idle_exit
				2684	<idle>-0 [003] dN.1 4228.547804: hrtimer_try_to_cancel <-hrtimer_cancel
				2685	<idle>-0 [003] dN.2 4228.547805: hrtimer_force_reprogram <-__remove_hrtimer
				2686	<idle>-0 [003] dN.1 4228.547805: hrtimer_forward <-tick_nohz_idle_exit
				2687	<idle>-0 [003] dN.1 4228.547805: hrtimer_start_range_ns <-hrtimer_start_expires.constprop.11
				2688	<idle>-0 [003] d..1 4228.547858: hrtimer_get_next_event <-get_next_timer_interrupt
				2689	<idle>-0 [003] d..1 4228.547859: hrtimer_start <-__tick_nohz_idle_enter
				2690	<idle>-0 [003] d..2 4228.547860: hrtimer_force_reprogram <-__rem
				2691
				2692	Notice that we lost the sys_nanosleep.
				2693	::
				2694
				2695	# cat set_ftrace_filter
				2696	hrtimer_run_queues
				2697	hrtimer_run_pending
				2698	hrtimer_init
				2699	hrtimer_cancel
				2700	hrtimer_try_to_cancel
				2701	hrtimer_forward
				2702	hrtimer_start
				2703	hrtimer_reprogram
				2704	hrtimer_force_reprogram
				2705	hrtimer_get_next_event
				2706	hrtimer_interrupt
				2707	hrtimer_nanosleep
				2708	hrtimer_wakeup
				2709	hrtimer_get_remaining
				2710	hrtimer_get_res
				2711	hrtimer_init_sleeper
				2712
				2713
				2714	This is because the '>' and '>>' act just like they do in bash.
				2715	To rewrite the filters, use '>'
				2716	To append to the filters, use '>>'
				2717
				2718	To clear out a filter so that all functions will be recorded
				2719	again::
				2720
				2721	# echo > set_ftrace_filter
				2722	# cat set_ftrace_filter
				2723	#
				2724
				2725	Again, now we want to append.
				2726
				2727	::
				2728
				2729	# echo sys_nanosleep > set_ftrace_filter
				2730	# cat set_ftrace_filter
				2731	sys_nanosleep
				2732	# echo 'hrtimer_*' >> set_ftrace_filter
				2733	# cat set_ftrace_filter
				2734	hrtimer_run_queues
				2735	hrtimer_run_pending
				2736	hrtimer_init
				2737	hrtimer_cancel
				2738	hrtimer_try_to_cancel
				2739	hrtimer_forward
				2740	hrtimer_start
				2741	hrtimer_reprogram
				2742	hrtimer_force_reprogram
				2743	hrtimer_get_next_event
				2744	hrtimer_interrupt
				2745	sys_nanosleep
				2746	hrtimer_nanosleep
				2747	hrtimer_wakeup
				2748	hrtimer_get_remaining
				2749	hrtimer_get_res
				2750	hrtimer_init_sleeper
				2751
				2752
				2753	The set_ftrace_notrace prevents those functions from being
				2754	traced.
				2755	::
				2756
				2757	# echo 'preempt' 'lock' > set_ftrace_notrace
				2758
				2759	Produces::
				2760
				2761	# tracer: function
				2762	#
				2763	# entries-in-buffer/entries-written: 39608/39608 #P:4
				2764	#
				2765	# _-----=> irqs-off
				2766	# / _----=> need-resched
				2767	# \| / _---=> hardirq/softirq
				2768	# \|\| / _--=> preempt-depth
				2769	# \|\|\| / delay
				2770	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				2771	# \| \| \| \|\|\|\| \| \|
				2772	bash-1994 [000] .... 4342.324896: file_ra_state_init <-do_dentry_open
				2773	bash-1994 [000] .... 4342.324897: open_check_o_direct <-do_last
				2774	bash-1994 [000] .... 4342.324897: ima_file_check <-do_last
				2775	bash-1994 [000] .... 4342.324898: process_measurement <-ima_file_check
				2776	bash-1994 [000] .... 4342.324898: ima_get_action <-process_measurement
				2777	bash-1994 [000] .... 4342.324898: ima_match_policy <-ima_get_action
				2778	bash-1994 [000] .... 4342.324899: do_truncate <-do_last
				2779	bash-1994 [000] .... 4342.324899: should_remove_suid <-do_truncate
				2780	bash-1994 [000] .... 4342.324899: notify_change <-do_truncate
				2781	bash-1994 [000] .... 4342.324900: current_fs_time <-notify_change
				2782	bash-1994 [000] .... 4342.324900: current_kernel_time <-current_fs_time
				2783	bash-1994 [000] .... 4342.324900: timespec_trunc <-current_fs_time
				2784
				2785	We can see that there's no more lock or preempt tracing.
				2786
				2787
				2788	Dynamic ftrace with the function graph tracer
				2789	---------------------------------------------
				2790
				2791	Although what has been explained above concerns both the
				2792	function tracer and the function-graph-tracer, there are some
				2793	special features only available in the function-graph tracer.
				2794
				2795	If you want to trace only one function and all of its children,
				2796	you just have to echo its name into set_graph_function::
				2797
				2798	echo __do_fault > set_graph_function
				2799
				2800	will produce the following "expanded" trace of the __do_fault()
				2801	function::
				2802
				2803	0) \| __do_fault() {
				2804	0) \| filemap_fault() {
				2805	0) \| find_lock_page() {
				2806	0) 0.804 us \| find_get_page();
				2807	0) \| __might_sleep() {
				2808	0) 1.329 us \| }
				2809	0) 3.904 us \| }
				2810	0) 4.979 us \| }
				2811	0) 0.653 us \| _spin_lock();
				2812	0) 0.578 us \| page_add_file_rmap();
				2813	0) 0.525 us \| native_set_pte_at();
				2814	0) 0.585 us \| _spin_unlock();
				2815	0) \| unlock_page() {
				2816	0) 0.541 us \| page_waitqueue();
				2817	0) 0.639 us \| __wake_up_bit();
				2818	0) 2.786 us \| }
				2819	0) + 14.237 us \| }
				2820	0) \| __do_fault() {
				2821	0) \| filemap_fault() {
				2822	0) \| find_lock_page() {
				2823	0) 0.698 us \| find_get_page();
				2824	0) \| __might_sleep() {
				2825	0) 1.412 us \| }
				2826	0) 3.950 us \| }
				2827	0) 5.098 us \| }
				2828	0) 0.631 us \| _spin_lock();
				2829	0) 0.571 us \| page_add_file_rmap();
				2830	0) 0.526 us \| native_set_pte_at();
				2831	0) 0.586 us \| _spin_unlock();
				2832	0) \| unlock_page() {
				2833	0) 0.533 us \| page_waitqueue();
				2834	0) 0.638 us \| __wake_up_bit();
				2835	0) 2.793 us \| }
				2836	0) + 14.012 us \| }
				2837
				2838	You can also expand several functions at once::
				2839
				2840	echo sys_open > set_graph_function
				2841	echo sys_close >> set_graph_function
				2842
				2843	Now if you want to go back to trace all functions you can clear
				2844	this special filter via::
				2845
				2846	echo > set_graph_function
				2847
				2848
				2849	ftrace_enabled
				2850	--------------
				2851
				2852	Note, the proc sysctl ftrace_enable is a big on/off switch for the
				2853	function tracer. By default it is enabled (when function tracing is
				2854	enabled in the kernel). If it is disabled, all function tracing is
				2855	disabled. This includes not only the function tracers for ftrace, but
				2856	also for any other uses (perf, kprobes, stack tracing, profiling, etc).
				2857
				2858	Please disable this with care.
				2859
				2860	This can be disable (and enabled) with::
				2861
				2862	sysctl kernel.ftrace_enabled=0
				2863	sysctl kernel.ftrace_enabled=1
				2864
				2865	or
				2866
				2867	echo 0 > /proc/sys/kernel/ftrace_enabled
				2868	echo 1 > /proc/sys/kernel/ftrace_enabled
				2869
				2870
				2871	Filter commands
				2872	---------------
				2873
				2874	A few commands are supported by the set_ftrace_filter interface.
				2875	Trace commands have the following format::
				2876
				2877	<function>:<command>:<parameter>
				2878
				2879	The following commands are supported:
				2880
				2881	- mod:
				2882	This command enables function filtering per module. The
				2883	parameter defines the module. For example, if only the write*
				2884	functions in the ext3 module are desired, run:
				2885
				2886	echo 'write*:mod:ext3' > set_ftrace_filter
				2887
				2888	This command interacts with the filter in the same way as
				2889	filtering based on function names. Thus, adding more functions
				2890	in a different module is accomplished by appending (>>) to the
				2891	filter file. Remove specific module functions by prepending
				2892	'!'::
				2893
				2894	echo '!writeback*:mod:ext3' >> set_ftrace_filter
				2895
				2896	Mod command supports module globbing. Disable tracing for all
				2897	functions except a specific module::
				2898
				2899	echo '!*:mod:!ext3' >> set_ftrace_filter
				2900
				2901	Disable tracing for all modules, but still trace kernel::
				2902
				2903	echo '!:mod:' >> set_ftrace_filter
				2904
				2905	Enable filter only for kernel::
				2906
				2907	echo 'write:mod:!*' >> set_ftrace_filter
				2908
				2909	Enable filter for module globbing::
				2910
				2911	echo 'write:mod:snd' >> set_ftrace_filter
				2912
				2913	- traceon/traceoff:
				2914	These commands turn tracing on and off when the specified
				2915	functions are hit. The parameter determines how many times the
				2916	tracing system is turned on and off. If unspecified, there is
				2917	no limit. For example, to disable tracing when a schedule bug
				2918	is hit the first 5 times, run::
				2919
				2920	echo '__schedule_bug:traceoff:5' > set_ftrace_filter
				2921
				2922	To always disable tracing when __schedule_bug is hit::
				2923
				2924	echo '__schedule_bug:traceoff' > set_ftrace_filter
				2925
				2926	These commands are cumulative whether or not they are appended
				2927	to set_ftrace_filter. To remove a command, prepend it by '!'
				2928	and drop the parameter::
				2929
				2930	echo '!__schedule_bug:traceoff:0' > set_ftrace_filter
				2931
				2932	The above removes the traceoff command for __schedule_bug
				2933	that have a counter. To remove commands without counters::
				2934
				2935	echo '!__schedule_bug:traceoff' > set_ftrace_filter
				2936
				2937	- snapshot:
				2938	Will cause a snapshot to be triggered when the function is hit.
				2939	::
				2940
				2941	echo 'native_flush_tlb_others:snapshot' > set_ftrace_filter
				2942
				2943	To only snapshot once:
				2944	::
				2945
				2946	echo 'native_flush_tlb_others:snapshot:1' > set_ftrace_filter
				2947
				2948	To remove the above commands::
				2949
				2950	echo '!native_flush_tlb_others:snapshot' > set_ftrace_filter
				2951	echo '!native_flush_tlb_others:snapshot:0' > set_ftrace_filter
				2952
				2953	- enable_event/disable_event:
				2954	These commands can enable or disable a trace event. Note, because
				2955	function tracing callbacks are very sensitive, when these commands
				2956	are registered, the trace point is activated, but disabled in
				2957	a "soft" mode. That is, the tracepoint will be called, but
				2958	just will not be traced. The event tracepoint stays in this mode
				2959	as long as there's a command that triggers it.
				2960	::
				2961
				2962	echo 'try_to_wake_up:enable_event:sched:sched_switch:2' > \
				2963	set_ftrace_filter
				2964
				2965	The format is::
				2966
				2967	<function>:enable_event:<system>:<event>[:count]
				2968	<function>:disable_event:<system>:<event>[:count]
				2969
				2970	To remove the events commands::
				2971
				2972	echo '!try_to_wake_up:enable_event:sched:sched_switch:0' > \
				2973	set_ftrace_filter
				2974	echo '!schedule:disable_event:sched:sched_switch' > \
				2975	set_ftrace_filter
				2976
				2977	- dump:
				2978	When the function is hit, it will dump the contents of the ftrace
				2979	ring buffer to the console. This is useful if you need to debug
				2980	something, and want to dump the trace when a certain function
				2981	is hit. Perhaps its a function that is called before a tripple
				2982	fault happens and does not allow you to get a regular dump.
				2983
				2984	- cpudump:
				2985	When the function is hit, it will dump the contents of the ftrace
				2986	ring buffer for the current CPU to the console. Unlike the "dump"
				2987	command, it only prints out the contents of the ring buffer for the
				2988	CPU that executed the function that triggered the dump.
				2989
				2990	trace_pipe
				2991	----------
				2992
				2993	The trace_pipe outputs the same content as the trace file, but
				2994	the effect on the tracing is different. Every read from
				2995	trace_pipe is consumed. This means that subsequent reads will be
				2996	different. The trace is live.
				2997	::
				2998
				2999	# echo function > current_tracer
				3000	# cat trace_pipe > /tmp/trace.out &
				3001	[1] 4153
				3002	# echo 1 > tracing_on
				3003	# usleep 1
				3004	# echo 0 > tracing_on
				3005	# cat trace
				3006	# tracer: function
				3007	#
				3008	# entries-in-buffer/entries-written: 0/0 #P:4
				3009	#
				3010	# _-----=> irqs-off
				3011	# / _----=> need-resched
				3012	# \| / _---=> hardirq/softirq
				3013	# \|\| / _--=> preempt-depth
				3014	# \|\|\| / delay
				3015	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				3016	# \| \| \| \|\|\|\| \| \|
				3017
				3018	#
				3019	# cat /tmp/trace.out
				3020	bash-1994 [000] .... 5281.568961: mutex_unlock <-rb_simple_write
				3021	bash-1994 [000] .... 5281.568963: __mutex_unlock_slowpath <-mutex_unlock
				3022	bash-1994 [000] .... 5281.568963: __fsnotify_parent <-fsnotify_modify
				3023	bash-1994 [000] .... 5281.568964: fsnotify <-fsnotify_modify
				3024	bash-1994 [000] .... 5281.568964: __srcu_read_lock <-fsnotify
				3025	bash-1994 [000] .... 5281.568964: add_preempt_count <-__srcu_read_lock
				3026	bash-1994 [000] ...1 5281.568965: sub_preempt_count <-__srcu_read_lock
				3027	bash-1994 [000] .... 5281.568965: __srcu_read_unlock <-fsnotify
				3028	bash-1994 [000] .... 5281.568967: sys_dup2 <-system_call_fastpath
				3029
				3030
				3031	Note, reading the trace_pipe file will block until more input is
				3032	added.
				3033
				3034	trace entries
				3035	-------------
				3036
				3037	Having too much or not enough data can be troublesome in
				3038	diagnosing an issue in the kernel. The file buffer_size_kb is
				3039	used to modify the size of the internal trace buffers. The
				3040	number listed is the number of entries that can be recorded per
				3041	CPU. To know the full size, multiply the number of possible CPUs
				3042	with the number of entries.
				3043	::
				3044
				3045	# cat buffer_size_kb
				3046	1408 (units kilobytes)
				3047
				3048	Or simply read buffer_total_size_kb
				3049	::
				3050
				3051	# cat buffer_total_size_kb
				3052	5632
				3053
				3054	To modify the buffer, simple echo in a number (in 1024 byte segments).
				3055	::
				3056
				3057	# echo 10000 > buffer_size_kb
				3058	# cat buffer_size_kb
				3059	10000 (units kilobytes)
				3060
				3061	It will try to allocate as much as possible. If you allocate too
				3062	much, it can cause Out-Of-Memory to trigger.
				3063	::
				3064
				3065	# echo 1000000000000 > buffer_size_kb
				3066	-bash: echo: write error: Cannot allocate memory
				3067	# cat buffer_size_kb
				3068	85
				3069
				3070	The per_cpu buffers can be changed individually as well:
				3071	::
				3072
				3073	# echo 10000 > per_cpu/cpu0/buffer_size_kb
				3074	# echo 100 > per_cpu/cpu1/buffer_size_kb
				3075
				3076	When the per_cpu buffers are not the same, the buffer_size_kb
				3077	at the top level will just show an X
				3078	::
				3079
				3080	# cat buffer_size_kb
				3081	X
				3082
				3083	This is where the buffer_total_size_kb is useful:
				3084	::
				3085
				3086	# cat buffer_total_size_kb
				3087	12916
				3088
				3089	Writing to the top level buffer_size_kb will reset all the buffers
				3090	to be the same again.
				3091
				3092	Snapshot
				3093	--------
				3094	CONFIG_TRACER_SNAPSHOT makes a generic snapshot feature
				3095	available to all non latency tracers. (Latency tracers which
				3096	record max latency, such as "irqsoff" or "wakeup", can't use
				3097	this feature, since those are already using the snapshot
				3098	mechanism internally.)
				3099
				3100	Snapshot preserves a current trace buffer at a particular point
				3101	in time without stopping tracing. Ftrace swaps the current
				3102	buffer with a spare buffer, and tracing continues in the new
				3103	current (=previous spare) buffer.
				3104
				3105	The following tracefs files in "tracing" are related to this
				3106	feature:
				3107
				3108	snapshot:
				3109
				3110	This is used to take a snapshot and to read the output
				3111	of the snapshot. Echo 1 into this file to allocate a
				3112	spare buffer and to take a snapshot (swap), then read
				3113	the snapshot from this file in the same format as
				3114	"trace" (described above in the section "The File
				3115	System"). Both reads snapshot and tracing are executable
				3116	in parallel. When the spare buffer is allocated, echoing
				3117	0 frees it, and echoing else (positive) values clear the
				3118	snapshot contents.
				3119	More details are shown in the table below.
				3120
				3121	+--------------+------------+------------+------------+
				3122	\|status\\input \| 0 \| 1 \| else \|
				3123	+==============+============+============+============+
				3124	\|not allocated \|(do nothing)\| alloc+swap \|(do nothing)\|
				3125	+--------------+------------+------------+------------+
				3126	\|allocated \| free \| swap \| clear \|
				3127	+--------------+------------+------------+------------+
				3128
				3129	Here is an example of using the snapshot feature.
				3130	::
				3131
				3132	# echo 1 > events/sched/enable
				3133	# echo 1 > snapshot
				3134	# cat snapshot
				3135	# tracer: nop
				3136	#
				3137	# entries-in-buffer/entries-written: 71/71 #P:8
				3138	#
				3139	# _-----=> irqs-off
				3140	# / _----=> need-resched
				3141	# \| / _---=> hardirq/softirq
				3142	# \|\| / _--=> preempt-depth
				3143	# \|\|\| / delay
				3144	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				3145	# \| \| \| \|\|\|\| \| \|
				3146	<idle>-0 [005] d... 2440.603828: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2242 next_prio=120
				3147	sleep-2242 [005] d... 2440.603846: sched_switch: prev_comm=snapshot-test-2 prev_pid=2242 prev_prio=120 prev_state=R ==> next_comm=kworker/5:1 next_pid=60 next_prio=120
				3148	[...]
				3149	<idle>-0 [002] d... 2440.707230: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2229 next_prio=120
				3150
				3151	# cat trace
				3152	# tracer: nop
				3153	#
				3154	# entries-in-buffer/entries-written: 77/77 #P:8
				3155	#
				3156	# _-----=> irqs-off
				3157	# / _----=> need-resched
				3158	# \| / _---=> hardirq/softirq
				3159	# \|\| / _--=> preempt-depth
				3160	# \|\|\| / delay
				3161	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				3162	# \| \| \| \|\|\|\| \| \|
				3163	<idle>-0 [007] d... 2440.707395: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2243 next_prio=120
				3164	snapshot-test-2-2229 [002] d... 2440.707438: sched_switch: prev_comm=snapshot-test-2 prev_pid=2229 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
				3165	[...]
				3166
				3167
				3168	If you try to use this snapshot feature when current tracer is
				3169	one of the latency tracers, you will get the following results.
				3170	::
				3171
				3172	# echo wakeup > current_tracer
				3173	# echo 1 > snapshot
				3174	bash: echo: write error: Device or resource busy
				3175	# cat snapshot
				3176	cat: snapshot: Device or resource busy
				3177
				3178
				3179	Instances
				3180	---------
				3181	In the tracefs tracing directory is a directory called "instances".
				3182	This directory can have new directories created inside of it using
				3183	mkdir, and removing directories with rmdir. The directory created
				3184	with mkdir in this directory will already contain files and other
				3185	directories after it is created.
				3186	::
				3187
				3188	# mkdir instances/foo
				3189	# ls instances/foo
				3190	buffer_size_kb buffer_total_size_kb events free_buffer per_cpu
				3191	set_event snapshot trace trace_clock trace_marker trace_options
				3192	trace_pipe tracing_on
				3193
				3194	As you can see, the new directory looks similar to the tracing directory
				3195	itself. In fact, it is very similar, except that the buffer and
				3196	events are agnostic from the main director, or from any other
				3197	instances that are created.
				3198
				3199	The files in the new directory work just like the files with the
				3200	same name in the tracing directory except the buffer that is used
				3201	is a separate and new buffer. The files affect that buffer but do not
				3202	affect the main buffer with the exception of trace_options. Currently,
				3203	the trace_options affect all instances and the top level buffer
				3204	the same, but this may change in future releases. That is, options
				3205	may become specific to the instance they reside in.
				3206
				3207	Notice that none of the function tracer files are there, nor is
				3208	current_tracer and available_tracers. This is because the buffers
				3209	can currently only have events enabled for them.
				3210	::
				3211
				3212	# mkdir instances/foo
				3213	# mkdir instances/bar
				3214	# mkdir instances/zoot
				3215	# echo 100000 > buffer_size_kb
				3216	# echo 1000 > instances/foo/buffer_size_kb
				3217	# echo 5000 > instances/bar/per_cpu/cpu1/buffer_size_kb
				3218	# echo function > current_trace
				3219	# echo 1 > instances/foo/events/sched/sched_wakeup/enable
				3220	# echo 1 > instances/foo/events/sched/sched_wakeup_new/enable
				3221	# echo 1 > instances/foo/events/sched/sched_switch/enable
				3222	# echo 1 > instances/bar/events/irq/enable
				3223	# echo 1 > instances/zoot/events/syscalls/enable
				3224	# cat trace_pipe
				3225	CPU:2 [LOST 11745 EVENTS]
				3226	bash-2044 [002] .... 10594.481032: _raw_spin_lock_irqsave <-get_page_from_freelist
				3227	bash-2044 [002] d... 10594.481032: add_preempt_count <-_raw_spin_lock_irqsave
				3228	bash-2044 [002] d..1 10594.481032: __rmqueue <-get_page_from_freelist
				3229	bash-2044 [002] d..1 10594.481033: _raw_spin_unlock <-get_page_from_freelist
				3230	bash-2044 [002] d..1 10594.481033: sub_preempt_count <-_raw_spin_unlock
				3231	bash-2044 [002] d... 10594.481033: get_pageblock_flags_group <-get_pageblock_migratetype
				3232	bash-2044 [002] d... 10594.481034: __mod_zone_page_state <-get_page_from_freelist
				3233	bash-2044 [002] d... 10594.481034: zone_statistics <-get_page_from_freelist
				3234	bash-2044 [002] d... 10594.481034: __inc_zone_state <-zone_statistics
				3235	bash-2044 [002] d... 10594.481034: __inc_zone_state <-zone_statistics
				3236	bash-2044 [002] .... 10594.481035: arch_dup_task_struct <-copy_process
				3237	[...]
				3238
				3239	# cat instances/foo/trace_pipe
				3240	bash-1998 [000] d..4 136.676759: sched_wakeup: comm=kworker/0:1 pid=59 prio=120 success=1 target_cpu=000
				3241	bash-1998 [000] dN.4 136.676760: sched_wakeup: comm=bash pid=1998 prio=120 success=1 target_cpu=000
				3242	<idle>-0 [003] d.h3 136.676906: sched_wakeup: comm=rcu_preempt pid=9 prio=120 success=1 target_cpu=003
				3243	<idle>-0 [003] d..3 136.676909: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=9 next_prio=120
				3244	rcu_preempt-9 [003] d..3 136.676916: sched_switch: prev_comm=rcu_preempt prev_pid=9 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
				3245	bash-1998 [000] d..4 136.677014: sched_wakeup: comm=kworker/0:1 pid=59 prio=120 success=1 target_cpu=000
				3246	bash-1998 [000] dN.4 136.677016: sched_wakeup: comm=bash pid=1998 prio=120 success=1 target_cpu=000
				3247	bash-1998 [000] d..3 136.677018: sched_switch: prev_comm=bash prev_pid=1998 prev_prio=120 prev_state=R+ ==> next_comm=kworker/0:1 next_pid=59 next_prio=120
				3248	kworker/0:1-59 [000] d..4 136.677022: sched_wakeup: comm=sshd pid=1995 prio=120 success=1 target_cpu=001
				3249	kworker/0:1-59 [000] d..3 136.677025: sched_switch: prev_comm=kworker/0:1 prev_pid=59 prev_prio=120 prev_state=S ==> next_comm=bash next_pid=1998 next_prio=120
				3250	[...]
				3251
				3252	# cat instances/bar/trace_pipe
				3253	migration/1-14 [001] d.h3 138.732674: softirq_raise: vec=3 [action=NET_RX]
				3254	<idle>-0 [001] dNh3 138.732725: softirq_raise: vec=3 [action=NET_RX]
				3255	bash-1998 [000] d.h1 138.733101: softirq_raise: vec=1 [action=TIMER]
				3256	bash-1998 [000] d.h1 138.733102: softirq_raise: vec=9 [action=RCU]
				3257	bash-1998 [000] ..s2 138.733105: softirq_entry: vec=1 [action=TIMER]
				3258	bash-1998 [000] ..s2 138.733106: softirq_exit: vec=1 [action=TIMER]
				3259	bash-1998 [000] ..s2 138.733106: softirq_entry: vec=9 [action=RCU]
				3260	bash-1998 [000] ..s2 138.733109: softirq_exit: vec=9 [action=RCU]
				3261	sshd-1995 [001] d.h1 138.733278: irq_handler_entry: irq=21 name=uhci_hcd:usb4
				3262	sshd-1995 [001] d.h1 138.733280: irq_handler_exit: irq=21 ret=unhandled
				3263	sshd-1995 [001] d.h1 138.733281: irq_handler_entry: irq=21 name=eth0
				3264	sshd-1995 [001] d.h1 138.733283: irq_handler_exit: irq=21 ret=handled
				3265	[...]
				3266
				3267	# cat instances/zoot/trace
				3268	# tracer: nop
				3269	#
				3270	# entries-in-buffer/entries-written: 18996/18996 #P:4
				3271	#
				3272	# _-----=> irqs-off
				3273	# / _----=> need-resched
				3274	# \| / _---=> hardirq/softirq
				3275	# \|\| / _--=> preempt-depth
				3276	# \|\|\| / delay
				3277	# TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION
				3278	# \| \| \| \|\|\|\| \| \|
				3279	bash-1998 [000] d... 140.733501: sys_write -> 0x2
				3280	bash-1998 [000] d... 140.733504: sys_dup2(oldfd: a, newfd: 1)
				3281	bash-1998 [000] d... 140.733506: sys_dup2 -> 0x1
				3282	bash-1998 [000] d... 140.733508: sys_fcntl(fd: a, cmd: 1, arg: 0)
				3283	bash-1998 [000] d... 140.733509: sys_fcntl -> 0x1
				3284	bash-1998 [000] d... 140.733510: sys_close(fd: a)
				3285	bash-1998 [000] d... 140.733510: sys_close -> 0x0
				3286	bash-1998 [000] d... 140.733514: sys_rt_sigprocmask(how: 0, nset: 0, oset: 6e2768, sigsetsize: 8)
				3287	bash-1998 [000] d... 140.733515: sys_rt_sigprocmask -> 0x0
				3288	bash-1998 [000] d... 140.733516: sys_rt_sigaction(sig: 2, act: 7fff718846f0, oact: 7fff71884650, sigsetsize: 8)
				3289	bash-1998 [000] d... 140.733516: sys_rt_sigaction -> 0x0
				3290
				3291	You can see that the trace of the top most trace buffer shows only
				3292	the function tracing. The foo instance displays wakeups and task
				3293	switches.
				3294
				3295	To remove the instances, simply delete their directories:
				3296	::
				3297
				3298	# rmdir instances/foo
				3299	# rmdir instances/bar
				3300	# rmdir instances/zoot
				3301
				3302	Note, if a process has a trace file open in one of the instance
				3303	directories, the rmdir will fail with EBUSY.
				3304
				3305
				3306	Stack trace
				3307	-----------
				3308	Since the kernel has a fixed sized stack, it is important not to
				3309	waste it in functions. A kernel developer must be conscience of
				3310	what they allocate on the stack. If they add too much, the system
				3311	can be in danger of a stack overflow, and corruption will occur,
				3312	usually leading to a system panic.
				3313
				3314	There are some tools that check this, usually with interrupts
				3315	periodically checking usage. But if you can perform a check
				3316	at every function call that will become very useful. As ftrace provides
				3317	a function tracer, it makes it convenient to check the stack size
				3318	at every function call. This is enabled via the stack tracer.
				3319
				3320	CONFIG_STACK_TRACER enables the ftrace stack tracing functionality.
				3321	To enable it, write a '1' into /proc/sys/kernel/stack_tracer_enabled.
				3322	::
				3323
				3324	# echo 1 > /proc/sys/kernel/stack_tracer_enabled
				3325
				3326	You can also enable it from the kernel command line to trace
				3327	the stack size of the kernel during boot up, by adding "stacktrace"
				3328	to the kernel command line parameter.
				3329
				3330	After running it for a few minutes, the output looks like:
				3331	::
				3332
				3333	# cat stack_max_size
				3334	2928
				3335
				3336	# cat stack_trace
				3337	Depth Size Location (18 entries)
				3338	----- ---- --------
				3339	0) 2928 224 update_sd_lb_stats+0xbc/0x4ac
				3340	1) 2704 160 find_busiest_group+0x31/0x1f1
				3341	2) 2544 256 load_balance+0xd9/0x662
				3342	3) 2288 80 idle_balance+0xbb/0x130
				3343	4) 2208 128 __schedule+0x26e/0x5b9
				3344	5) 2080 16 schedule+0x64/0x66
				3345	6) 2064 128 schedule_timeout+0x34/0xe0
				3346	7) 1936 112 wait_for_common+0x97/0xf1
				3347	8) 1824 16 wait_for_completion+0x1d/0x1f
				3348	9) 1808 128 flush_work+0xfe/0x119
				3349	10) 1680 16 tty_flush_to_ldisc+0x1e/0x20
				3350	11) 1664 48 input_available_p+0x1d/0x5c
				3351	12) 1616 48 n_tty_poll+0x6d/0x134
				3352	13) 1568 64 tty_poll+0x64/0x7f
				3353	14) 1504 880 do_select+0x31e/0x511
				3354	15) 624 400 core_sys_select+0x177/0x216
				3355	16) 224 96 sys_select+0x91/0xb9
				3356	17) 128 128 system_call_fastpath+0x16/0x1b
				3357
				3358	Note, if -mfentry is being used by gcc, functions get traced before
				3359	they set up the stack frame. This means that leaf level functions
				3360	are not tested by the stack tracer when -mfentry is used.
				3361
				3362	Currently, -mfentry is used by gcc 4.6.0 and above on x86 only.
				3363
				3364	More
				3365	----
				3366	More details can be found in the source code, in the `kernel/trace/*.c` files.