Blame - Documentation/admin-guide/pm/cpufreq.rst - kernel/msm-4.19

Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	1	.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>`
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	2	.. |intel_pstate| replace:: :doc:`intel_pstate <intel_pstate>`
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	3
				4	=======================
				5	CPU Performance Scaling
				6	=======================
				7
				8	::
				9
				10	Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
				11
				12	The Concept of CPU Performance Scaling
				13	======================================
				14
				15	The majority of modern processors are capable of operating in a number of
				16	different clock frequency and voltage configurations, often referred to as
				17	Operating Performance Points or P-states (in ACPI terminology). As a rule,
				18	the higher the clock frequency and the higher the voltage, the more instructions
				19	can be retired by the CPU over a unit of time, but also the higher the clock
				20	frequency and the higher the voltage, the more energy is consumed over a unit of
				21	time (or the more power is drawn) by the CPU in the given P-state. Therefore
				22	there is a natural tradeoff between the CPU capacity (the number of instructions
				23	that can be executed over a unit of time) and the power drawn by the CPU.
				24
				25	In some situations it is desirable or even necessary to run a program as fast
				26	as possible and then there is no reason to use any P-states different from the
				27	highest one (i.e. the highest-performance frequency/voltage configuration
				28	available). In some other cases, however, it may not be necessary to execute
				29	instructions so quickly and maintaining the highest available CPU capacity for a
				30	relatively long time without utilizing it entirely may be regarded as wasteful.
				31	It also may not be physically possible to maintain maximum CPU capacity for too
				32	long for thermal or power supply capacity reasons or similar. To cover those
				33	cases, there are hardware interfaces allowing CPUs to be switched between
				34	different frequency/voltage configurations or (in the ACPI terminology) to be
				35	put into different P-states.
				36
				37	Typically, they are used along with algorithms to estimate the required CPU
				38	capacity, so as to decide which P-states to put the CPUs into. Of course, since
				39	the utilization of the system generally changes over time, that has to be done
				40	repeatedly on a regular basis. The activity by which this happens is referred
				41	to as CPU performance scaling or CPU frequency scaling (because it involves
				42	adjusting the CPU clock frequency).
				43
				44
				45	CPU Performance Scaling in Linux
				46	================================
				47
				48	The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
				49	(CPU Frequency scaling) subsystem that consists of three layers of code: the
				50	core, scaling governors and scaling drivers.
				51
				52	The ``CPUFreq`` core provides the common code infrastructure and user space
				53	interfaces for all platforms that support CPU performance scaling. It defines
				54	the basic framework in which the other components operate.
				55
				56	Scaling governors implement algorithms to estimate the required CPU capacity.
				57	As a rule, each governor implements one, possibly parametrized, scaling
				58	algorithm.
				59
				60	Scaling drivers talk to the hardware. They provide scaling governors with
				61	information on the available P-states (or P-state ranges in some cases) and
				62	access platform-specific hardware interfaces to change CPU P-states as requested
				63	by scaling governors.
				64
				65	In principle, all available scaling governors can be used with every scaling
				66	driver. That design is based on the observation that the information used by
				67	performance scaling algorithms for P-state selection can be represented in a
				68	platform-independent form in the majority of cases, so it should be possible
				69	to use the same performance scaling algorithm implemented in exactly the same
				70	way regardless of which scaling driver is used. Consequently, the same set of
				71	scaling governors should be suitable for every supported platform.
				72
				73	However, that observation may not hold for performance scaling algorithms
				74	based on information provided by the hardware itself, for example through
				75	feedback registers, as that information is typically specific to the hardware
				76	interface it comes from and may not be easily represented in an abstract,
				77	platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers
				78	to bypass the governor layer and implement their own performance scaling
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	79	algorithms. That is done by the |intel_pstate| scaling driver.
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	80
				81
				82	``CPUFreq`` Policy Objects
				83	==========================
				84
				85	In some cases the hardware interface for P-state control is shared by multiple
				86	CPUs. That is, for example, the same register (or set of registers) is used to
				87	control the P-state of multiple CPUs at the same time and writing to it affects
				88	all of those CPUs simultaneously.
				89
				90	Sets of CPUs sharing hardware P-state control interfaces are represented by
				91	``CPUFreq`` as |struct cpufreq_policy| objects. For consistency,
				92	|struct cpufreq_policy| is also used when there is only one CPU in the given
				93	set.
				94
				95	The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for
				96	every CPU in the system, including CPUs that are currently offline. If multiple
				97	CPUs share the same hardware P-state control interface, all of the pointers
				98	corresponding to them point to the same |struct cpufreq_policy| object.
				99
				100	``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design
				101	of its user space interface is based on the policy concept.
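
The pointer-sharing arrangement described above can be pictured with a small
model. This is only an illustrative sketch in Python, not kernel code: the
``Policy`` class and ``build_policy_pointers()`` are hypothetical names standing
in for |struct cpufreq_policy| and the per-CPU pointers kept by the core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Toy stand-in for struct cpufreq_policy (hypothetical, illustration only)."""
    related_cpus: frozenset


def build_policy_pointers(groups):
    """Map every CPU number to the policy object of the group it belongs to.

    Each element of 'groups' is a set of CPUs assumed to share one hardware
    P-state control interface.
    """
    pointers = {}
    for cpus in groups:
        policy = Policy(related_cpus=frozenset(cpus))
        for cpu in policy.related_cpus:
            # All CPUs in the group refer to the very same object.
            pointers[cpu] = policy
    return pointers


# CPUs 0 and 1 share an interface; CPU 2 has its own single-CPU policy.
pointers = build_policy_pointers([{0, 1}, {2}])
```

In this model ``pointers[0] is pointers[1]`` holds, while CPU 2 still gets a
policy object of its own even though it is the only CPU in its set.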
				102
				103
				104	CPU Initialization
				105	==================
				106
				107	First of all, a scaling driver has to be registered for ``CPUFreq`` to work.
				108	It is only possible to register one scaling driver at a time, so the scaling
				109	driver is expected to be able to handle all CPUs in the system.
				110
				111	The scaling driver may be registered before or after CPU registration. If
				112	CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to
				113	take a note of all of the already registered CPUs during the registration of the
				114	scaling driver. In turn, if any CPUs are registered after the registration of
				115	the scaling driver, the ``CPUFreq`` core will be invoked to take note of them
				116	at their registration time.
				117
				118	In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
				119	has not seen so far as soon as it is ready to handle that CPU. [Note that the
				120	logical CPU may be a physical single-core processor, or a single core in a
				121	multicore processor, or a hardware thread in a physical processor or processor
				122	core. In what follows "CPU" always means "logical CPU" unless explicitly stated
				123	otherwise and the word "processor" is used to refer to the physical part
				124	possibly including multiple logical CPUs.]
				125
				126	Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set
				127	for the given CPU and if so, it skips the policy object creation. Otherwise,
				128	a new policy object is created and initialized, which involves the creation of
				129	a new policy directory in ``sysfs``, and the policy pointer corresponding to
				130	the given CPU is set to the new policy object's address in memory.
				131
				132	Next, the scaling driver's ``->init()`` callback is invoked with the policy
				133	pointer of the new CPU passed to it as the argument. That callback is expected
				134	to initialize the performance scaling hardware interface for the given CPU (or,
				135	more precisely, for the set of CPUs sharing the hardware interface it belongs
				136	to, represented by its policy object) and, if the policy object it has been
				137	called for is new, to set parameters of the policy, like the minimum and maximum
				138	frequencies supported by the hardware, the table of available frequencies (if
				139	the set of supported P-states is not a continuous range), and the mask of CPUs
				140	that belong to the same policy (including both online and offline CPUs). That
				141	mask is then used by the core to populate the policy pointers for all of the
				142	CPUs in it.
				143
				144	The next major initialization step for a new policy object is to attach a
				145	scaling governor to it (to begin with, that is the default scaling governor
				146	determined by the kernel configuration, but it may be changed later
				147	via ``sysfs``). First, a pointer to the new policy object is passed to the
				148	governor's ``->init()`` callback which is expected to initialize all of the
				149	data structures necessary to handle the given policy and, possibly, to add
				150	a governor ``sysfs`` interface to it. Next, the governor is started by
				151	invoking its ``->start()`` callback.
				152
				153	That callback is expected to register per-CPU utilization update callbacks for
				154	all of the online CPUs belonging to the given policy with the CPU scheduler.
				155	The utilization update callbacks will be invoked by the CPU scheduler on
				156	important events, like task enqueue and dequeue, on every iteration of the
				157	scheduler tick or generally whenever the CPU utilization may change (from the
				158	scheduler's perspective). They are expected to carry out computations needed
				159	to determine the P-state to use for the given policy going forward and to
				160	invoke the scaling driver to make changes to the hardware in accordance with
				161	the P-state selection. The scaling driver may be invoked directly from
				162	scheduler context or asynchronously, via a kernel thread or workqueue, depending
				163	on the configuration and capabilities of the scaling driver and the governor.
				164
				165	Similar steps are taken for policy objects that are not new, but were "inactive"
				166	previously, meaning that all of the CPUs belonging to them were offline. The
				167	only practical difference in that case is that the ``CPUFreq`` core will attempt
				168	to use the scaling governor previously used with the policy that became
				169	"inactive" (and is re-initialized now) instead of the default governor.
				170
				171	In turn, if a previously offline CPU is being brought back online, but some
				172	other CPUs sharing the policy object with it are online already, there is no
				173	need to re-initialize the policy object at all. In that case, it only is
				174	necessary to restart the scaling governor so that it can take the new online CPU
				175	into account. That is achieved by invoking the governor's ``->stop()`` and
				176	``->start()`` callbacks, in this order, for the entire policy.
				177
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	178	As mentioned before, the |intel_pstate| scaling driver bypasses the scaling
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	179	governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	180	Consequently, if |intel_pstate| is used, scaling governors are not attached to
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	181	new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked
				182	to register per-CPU utilization update callbacks for each policy. These
				183	callbacks are invoked by the CPU scheduler in the same way as for scaling
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	184	governors, but in the |intel_pstate| case they both determine the P-state to
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	185	use and change the hardware configuration accordingly in one go from scheduler
				186	context.
				187
				188	The policy objects created during CPU initialization and other data structures
				189	associated with them are torn down when the scaling driver is unregistered
				190	(which happens when the kernel module containing it is unloaded, for example) or
				191	when the last CPU belonging to the given policy is unregistered.
				192
				193
				194	Policy Interface in ``sysfs``
				195	=============================
				196
				197	During the initialization of the kernel, the ``CPUFreq`` core creates a
				198	``sysfs`` directory (kobject) called ``cpufreq`` under
				199	:file:`/sys/devices/system/cpu/`.
				200
				201	That directory contains a ``policyX`` subdirectory (where ``X`` represents an
				202	integer number) for every policy object maintained by the ``CPUFreq`` core.
				203	Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links
				204	under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
				205	that may be different from the one represented by ``X``) for all of the CPUs
				206	associated with (or belonging to) the given policy. The ``policyX`` directories
				207	in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
				208	attributes (files) to control ``CPUFreq`` behavior for the corresponding policy
				209	objects (that is, for all of the CPUs associated with them).
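
As a rough illustration of the layout described above, the following Python
sketch resolves the ``cpufreq`` symbolic link of one CPU to the name of the
``policyX`` directory it points to. The helper name ``policy_for_cpu()`` is
made up for this example, and the sketch assumes the links are ordinary
symbolic links that user space can resolve.

```python
from pathlib import Path


def policy_for_cpu(cpu, cpu_root="/sys/devices/system/cpu"):
    """Return the name of the policyX directory used by the given CPU.

    Returns None if the CPU has no 'cpufreq' link (for example, because no
    scaling driver is registered or the CPU does not exist).
    """
    link = Path(cpu_root) / f"cpu{cpu}" / "cpufreq"
    if not link.exists():
        return None
    # Follow the symbolic link and keep only the last path component.
    return link.resolve().name
```

On a system where CPUs 0 and 1 share a policy, ``policy_for_cpu(0)`` and
``policy_for_cpu(1)`` would be expected to return the same name, while the CPU
number and the policy number need not match.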
				210
				211	Some of those attributes are generic. They are created by the ``CPUFreq`` core
				212	and their behavior generally does not depend on what scaling driver is in use
				213	and what scaling governor is attached to the given policy. Some scaling drivers
				214	also add driver-specific attributes to the policy directories in ``sysfs`` to
				215	control policy-specific aspects of driver behavior.
				216
				217	The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
				218	are the following:
				219
				220	``affected_cpus``
				221	List of online CPUs belonging to this policy (i.e. sharing the hardware
				222	performance scaling interface represented by the ``policyX`` policy
				223	object).
				224
				225	``bios_limit``
				226	If the platform firmware (BIOS) tells the OS to apply an upper limit to
				227	CPU frequencies, that limit will be reported through this attribute (if
				228	present).
				229
				230	The existence of the limit may be a result of some (often unintentional)
				231	BIOS settings, restrictions coming from a service processor or another
				232	BIOS/HW-based mechanism.
				233
				234	This does not cover ACPI thermal limitations which can be discovered
				235	through a generic thermal driver.
				236
				237	This attribute is not present if the scaling driver in use does not
				238	support it.
				239
				240	``cpuinfo_max_freq``
				241	Maximum possible operating frequency the CPUs belonging to this policy
				242	can run at (in kHz).
				243
				244	``cpuinfo_min_freq``
				245	Minimum possible operating frequency the CPUs belonging to this policy
				246	can run at (in kHz).
				247
				248	``cpuinfo_transition_latency``
				249	The time it takes to switch the CPUs belonging to this policy from one
				250	P-state to another, in nanoseconds.
				251
				252	If unknown or if known to be so high that the scaling driver does not
				253	work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
				254	will be returned by reads from this attribute.
				255
				256	``related_cpus``
				257	List of all (online and offline) CPUs belonging to this policy.
				258
				259	``scaling_available_governors``
				260	List of ``CPUFreq`` scaling governors present in the kernel that can
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	261	be attached to this policy or (if the |intel_pstate| scaling driver is
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	262	in use) list of scaling algorithms provided by the driver that can be
				263	applied to this policy.
				264
				265	[Note that some governors are modular and it may be necessary to load a
				266	kernel module for the governor held by it to become available and be
				267	listed by this attribute.]
				268
				269	``scaling_cur_freq``
				270	Current frequency of all of the CPUs belonging to this policy (in kHz).
				271
				272	For the majority of scaling drivers, this is the frequency of the last
				273	P-state requested by the driver from the hardware using the scaling
				274	interface provided by it, which may or may not reflect the frequency
				275	the CPU is actually running at (due to hardware design and other
				276	limitations).
				277
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	278	Some scaling drivers (e.g. |intel_pstate|) attempt to provide
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	279	information more precisely reflecting the current CPU frequency through
				280	this attribute, but that still may not be the exact current CPU
				281	frequency as seen by the hardware at the moment.
				282
				283	``scaling_driver``
				284	The scaling driver currently in use.
				285
				286	``scaling_governor``
				287	The scaling governor currently attached to this policy or (if the
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	288	|intel_pstate| scaling driver is in use) the scaling algorithm
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	289	provided by the driver that is currently applied to this policy.
				290
				291	This attribute is read-write and writing to it will cause a new scaling
				292	governor to be attached to this policy or a new scaling algorithm
				293	provided by the scaling driver to be applied to it (in the
Rafael J. Wysocki	33fc30b	2017-05-14 02:06:03 +0200	[diff] [blame]	294	|intel_pstate| case), as indicated by the string written to this
Rafael J. Wysocki	2a0e492	2017-03-13 23:59:57 +0100	[diff] [blame]	295	attribute (which must be one of the names listed by the
				296	``scaling_available_governors`` attribute described above).
				297
				298	``scaling_max_freq``
				299	Maximum frequency the CPUs belonging to this policy are allowed to be
				300	running at (in kHz).
				301
				302	This attribute is read-write and writing a string representing an
				303	integer to it will cause a new limit to be set (it must not be lower
				304	than the value of the ``scaling_min_freq`` attribute).
				305
				306	``scaling_min_freq``
				307	Minimum frequency the CPUs belonging to this policy are allowed to be
				308	running at (in kHz).
				309
				310	This attribute is read-write and writing a string representing a
				311	non-negative integer to it will cause a new limit to be set (it must not
				312	be higher than the value of the ``scaling_max_freq`` attribute).
				313
				314	``scaling_setspeed``
				315	This attribute is functional only if the `userspace`_ scaling governor
				316	is attached to the given policy.
				317
				318	It returns the last frequency requested by the governor (in kHz) or can
				319	be written to in order to set a new frequency for the policy.
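
The generic attributes listed above are plain text files, so they can be read
with ordinary file operations. The Python sketch below collects a few of them
from one policy directory. The helper name ``read_policy_attrs()`` and the
default selection of attributes are choices made for this example only.

```python
from pathlib import Path


def read_policy_attrs(policy_dir,
                      names=("scaling_driver", "scaling_governor",
                             "scaling_min_freq", "scaling_max_freq",
                             "scaling_cur_freq")):
    """Read selected attributes from a policyX directory.

    Values are returned as stripped strings. An attribute that is missing or
    unreadable (for example, one the scaling driver does not provide) is
    reported as None.
    """
    attrs = {}
    for name in names:
        try:
            attrs[name] = (Path(policy_dir) / name).read_text().strip()
        except OSError:
            attrs[name] = None
    return attrs
```

A call such as
``read_policy_attrs("/sys/devices/system/cpu/cpufreq/policy0")`` only reads
data. Changing ``scaling_governor`` or the frequency limits means writing to
the corresponding files, which typically requires root privileges.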
				320
				321
				322	Generic Scaling Governors
				323	=========================
				324
				325	``CPUFreq`` provides generic scaling governors that can be used with all
				326	scaling drivers. As stated before, each of them implements a single, possibly
				327	parametrized, performance scaling algorithm.
				328
				329	Scaling governors are attached to policy objects and different policy objects
				330	can be handled by different scaling governors at the same time (although that
				331	may lead to suboptimal results in some cases).
				332
				333	The scaling governor for a given policy object can be changed at any time with
				334	the help of the ``scaling_governor`` policy attribute in ``sysfs``.
				335
				336	Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
				337	algorithms implemented by them. Those attributes, referred to as governor
				338	tunables, can be either global (system-wide) or per-policy, depending on the
				339	scaling driver in use. If the driver requires governor tunables to be
				340	per-policy, they are located in a subdirectory of each policy directory.
				341	Otherwise, they are located in a subdirectory under
				342	:file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the
				343	subdirectory containing the governor tunables is the name of the governor
				344	providing them.
				345
				346	``performance``
				347	---------------
				348
				349	When attached to a policy object, this governor causes the highest frequency,
				350	within the ``scaling_max_freq`` policy limit, to be requested for that policy.
				351
				352	The request is made once at the time the governor for the policy is set to
				353	``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
				354	policy limits change after that.
				355
				356	``powersave``
				357	-------------
				358
				359	When attached to a policy object, this governor causes the lowest frequency,
				360	within the ``scaling_min_freq`` policy limit, to be requested for that policy.
				361
				362	The request is made once at the time the governor for the policy is set to
				363	``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq``
				364	policy limits change after that.
				365
				366	``userspace``
				367	-------------
				368
				369	This governor does not do anything by itself. Instead, it allows user space
				370	to set the CPU frequency for the policy it is attached to by writing to the
				371	``scaling_setspeed`` attribute of that policy.
				372
				373	``schedutil``
				374	-------------
				375
				376	This governor uses CPU utilization data available from the CPU scheduler. It
				377	generally is regarded as a part of the CPU scheduler, so it can access the
				378	scheduler's internal data structures directly.
				379
				380	It runs entirely in scheduler context, although in some cases it may need to
				381	invoke the scaling driver asynchronously when it decides that the CPU frequency
				382	should be changed for a given policy (that depends on whether or not the driver
				383	is capable of changing the CPU frequency from scheduler context).
				384
				385	The actions of this governor for a particular CPU depend on the scheduling class
				386	invoking its utilization update callback for that CPU. If it is invoked by the
				387	RT or deadline scheduling classes, the governor will increase the frequency to
				388	the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn,
				389	if it is invoked by the CFS scheduling class, the governor will use the
				390	Per-Entity Load Tracking (PELT) metric for the root control group of the
				391	given CPU as the CPU utilization estimate (see the `Per-entity load tracking`_
				392	LWN.net article for a description of the PELT mechanism). Then, the new
				393	CPU frequency to apply is computed in accordance with the formula
				394
				395	f = 1.25 * ``f_0`` * ``util`` / ``max``
				396
				397	where ``util`` is the PELT number, ``max`` is the theoretical maximum of
				398	``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
				399	policy (if the PELT number is frequency-invariant), or the current CPU frequency
				400	(otherwise).
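
The formula above can be evaluated directly. The following Python sketch is
only a worked example of that arithmetic: the function and parameter names are
invented here, and the result is the raw value of the formula before any policy
limits are applied.

```python
def schedutil_next_freq(util, max_util, max_freq_khz, cur_freq_khz,
                        frequency_invariant=True):
    """Evaluate f = 1.25 * f_0 * util / max from the text.

    f_0 is the maximum possible frequency if the utilization number is
    frequency-invariant, and the current frequency otherwise.
    """
    f_0 = max_freq_khz if frequency_invariant else cur_freq_khz
    return 1.25 * f_0 * util / max_util


# Half-utilized CPU, frequency-invariant utilization, 2 GHz maximum.
example = schedutil_next_freq(512, 1024, 2000000, 1000000)
```

Because of the 1.25 factor, the formula already reaches ``f_0`` when ``util``
is 80% of ``max``, which leaves some headroom before the CPU is fully utilized.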
				401
				402	This governor also employs a mechanism allowing it to temporarily bump up the
				403	CPU frequency for tasks that have been waiting on I/O most recently, called
				404	"IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
				405	is passed by the scheduler to the governor callback which causes the frequency
				406	to go up to the allowed maximum immediately and then draw back to the value
				407	returned by the above formula over time.
				408
				409	This governor exposes only one tunable:
				410
				411	``rate_limit_us``
				412	Minimum time (in microseconds) that has to pass between two consecutive
				413	runs of governor computations (default: 1000 times the scaling driver's
				414	transition latency).
				415
				416	The purpose of this tunable is to reduce the scheduler context overhead
				417	of the governor which might be excessive without it.
				418
				419	This governor generally is regarded as a replacement for the older `ondemand`_
				420	and `conservative`_ governors (described below), as it is simpler and more
				421	tightly integrated with the CPU scheduler, its overhead in terms of CPU context
				422	switches and similar is less significant, and it uses the scheduler's own CPU
				423	utilization metric, so in principle its decisions should not contradict the
				424	decisions made by the other parts of the scheduler.
				425
				426	``ondemand``
				427	------------
				428
				429	This governor uses CPU load as a CPU frequency selection metric.
				430
				431	In order to estimate the current CPU load, it measures the time elapsed between
				432	consecutive invocations of its worker routine and computes the fraction of that
				433	time in which the given CPU was not idle. The ratio of the non-idle (active)
				434	time to the total CPU time is taken as an estimate of the load.
				435
				436	If this governor is attached to a policy shared by multiple CPUs, the load is
				437	estimated for all of them and the greatest result is taken as the load estimate
				438	for the entire policy.
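
The load estimate described above amounts to simple arithmetic on two time
measurements per CPU. The Python sketch below shows that computation under the
assumption that the elapsed and idle times for each CPU are already known. The
function names are hypothetical and do not correspond to kernel symbols.

```python
def cpu_load(elapsed_us, idle_us):
    """Fraction of the elapsed time in which the CPU was not idle."""
    if elapsed_us <= 0:
        return 0.0
    busy_us = elapsed_us - idle_us
    return max(0.0, busy_us / elapsed_us)


def policy_load(samples):
    """Greatest per-CPU load among the CPUs sharing one policy.

    'samples' is an iterable of (elapsed_us, idle_us) pairs, one per CPU.
    """
    return max(cpu_load(elapsed, idle) for elapsed, idle in samples)
```

For example, with one CPU idle for 9000 of 10000 microseconds and another idle
for 2500 of 10000, the per-CPU loads are 0.1 and 0.75, and the policy load is
0.75.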
				439
				440	The worker routine of this governor has to run in process context, so it is
				441	invoked asynchronously (via a workqueue) and CPU P-states are updated from
				442	there if necessary. As a result, the scheduler context overhead from this
				443	governor is minimal, but it causes additional CPU context switches to happen
				444	relatively often and the CPU P-state updates triggered by it can be relatively
				445	irregular. Also, it affects its own CPU load metric by running code that
				446	reduces the CPU idle time (even though the CPU idle time is only reduced very
				447	slightly by it).
				448
				449	It generally selects CPU frequencies proportional to the estimated load, so that
				450	the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of
				451	1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute
				452	corresponds to the load of 0, unless the load exceeds a (configurable)
				453	speedup threshold, in which case it will go straight for the highest frequency
				454	it is allowed to use (the ``scaling_max_freq`` policy limit).
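
One way to read the selection rule above is as a linear interpolation between
``cpuinfo_min_freq`` and ``cpuinfo_max_freq``, with a jump to the
``scaling_max_freq`` limit above the speedup threshold. The Python sketch below
follows that reading. The function name is made up, and the default threshold
of 0.8 is an illustrative assumption, not a value taken from this document.

```python
def ondemand_target(load, cpuinfo_min_khz, cpuinfo_max_khz, scaling_max_khz,
                    up_threshold=0.8):
    """Pick a target frequency for a load estimate between 0 and 1.

    The 0.8 default for up_threshold is an assumption made for this example.
    """
    if load > up_threshold:
        # Above the speedup threshold: go straight to the allowed maximum.
        return scaling_max_khz
    # Otherwise: load 0 maps to cpuinfo_min, load 1 maps to cpuinfo_max.
    return cpuinfo_min_khz + load * (cpuinfo_max_khz - cpuinfo_min_khz)
```

The sketch does not clamp the interpolated value to the ``scaling_min_freq``
and ``scaling_max_freq`` policy limits, which a real implementation would still
have to respect.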
				455
				456	This governor exposes the following tunables:
				457
				458	``sampling_rate``
				459	This is how often the governor's worker routine should run, in
				460	microseconds.
				461
				462	Typically, it is set to values of the order of 10000 (10 ms). Its
				463	default value is equal to the value of ``cpuinfo_transition_latency``
				464	for each policy this governor is attached to (but since the unit here
				465	is greater by 1000, this means that the time represented by
				466	``sampling_rate`` is 1000 times greater than the transition latency by
				467	default).
				468
				469	If this tunable is per-policy, the following shell command sets the time
				470	represented by it to be 750 times as high as the transition latency::
				471
				472	# echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
				473
				474
				475	``min_sampling_rate``
				476	The minimum value of ``sampling_rate``.
				477
				478	Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
				479	:c:data:`tick_nohz_active` are both set or to 20 times the value of
				480	:c:data:`jiffies` in microseconds otherwise.
				481
				482	``up_threshold``
				483	If the estimated CPU load is above this value (in percent), the governor
				484	will set the frequency to the maximum value allowed for the policy.
				485	Otherwise, the selected frequency will be proportional to the estimated
				486	CPU load.
				487
				488	``ignore_nice_load``
				489	If set to 1 (default 0), it will cause the CPU load estimation code to
				490	treat the CPU time spent on executing tasks with "nice" levels greater
				491	than 0 as CPU idle time.
				492
				493	This may be useful if there are tasks in the system that should not be
				494	taken into account when deciding what frequency to run the CPUs at.
				495	Then, to make that happen it is sufficient to increase the "nice" level
				496	of those tasks above 0 and set this attribute to 1.
				497
				498	``sampling_down_factor``
				499	Temporary multiplier, between 1 (default) and 100 inclusive, to apply to
				500	the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
				501
				502	This causes the next execution of the governor's worker routine (after
				503	setting the frequency to the allowed maximum) to be delayed, so the
				504	frequency stays at the maximum level for a longer time.
				505
				506	Frequency fluctuations in some bursty workloads may be avoided this way
				507	at the cost of additional energy spent on maintaining the maximum CPU
				508	capacity.
				509
				510	``powersave_bias``
				511	Reduction factor to apply to the original frequency target of the
				512	governor (including the maximum value used when the ``up_threshold``
				513	value is exceeded by the estimated CPU load) or sensitivity threshold
				514	for the AMD frequency sensitivity powersave bias driver
				515	(:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000
				516	inclusive.
				517
				518	If the AMD frequency sensitivity powersave bias driver is not loaded,
				519	the effective frequency to apply is given by
				520
				521	f * (1 - ``powersave_bias`` / 1000)
				522
				523	where f is the governor's original frequency target. The default value
				524	of this attribute is 0 in that case.
				525
				526	If the AMD frequency sensitivity powersave bias driver is loaded, the
				527	value of this attribute is 400 by default and it is used in a different
				528	way.
				529
				530	On Family 16h (and later) AMD processors there is a mechanism to get a
				531	measured workload sensitivity, between 0 and 100% inclusive, from the
				532	hardware. That value can be used to estimate how the performance of the
				533	workload running on a CPU will change in response to frequency changes.
				534
				535	The performance of a workload with the sensitivity of 0 (memory-bound or
				536	IO-bound) is not expected to increase at all as a result of increasing
				537	the CPU frequency, whereas workloads with the sensitivity of 100%
				538	(CPU-bound) are expected to perform much better if the CPU frequency is
				539	increased.
				540
				541	If the workload sensitivity is less than the threshold represented by
				542	the ``powersave_bias`` value, the sensitivity powersave bias driver
				543	will cause the governor to select a frequency lower than its original
				544	target, so as to avoid over-provisioning workloads that will not benefit
				545	from running at higher CPU frequencies.
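
For the case in which the AMD frequency sensitivity powersave bias driver is
not loaded, the ``powersave_bias`` formula given above can be written out as a
short Python sketch. The function name is hypothetical and the code only
reproduces the arithmetic of the formula.

```python
def apply_powersave_bias(target_khz, powersave_bias):
    """Evaluate f * (1 - powersave_bias / 1000) for a bias between 0 and 1000."""
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias must be between 0 and 1000 inclusive")
    return target_khz * (1 - powersave_bias / 1000)
```

With the default value of 0 the original target is left unchanged, and a value
of 500 halves it.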
				546
``conservative``
----------------

This governor uses CPU load as a CPU frequency selection metric.

It estimates the CPU load in the same way as the `ondemand`_ governor described
above, but the CPU frequency selection algorithm implemented by it is different.

Namely, it avoids changing the frequency significantly over short time
intervals, because such changes may not be suitable for systems with limited
power supply capacity (e.g. battery-powered). To achieve that, it changes the
frequency in relatively small steps, one step at a time, up or down, depending
on whether or not a (configurable) threshold has been exceeded by the estimated
CPU load.

This governor exposes the following tunables:

``freq_step``
    Frequency step in percent of the maximum frequency the governor is
    allowed to set (the ``scaling_max_freq`` policy limit), between 0 and
    100 (5 by default).

    This is how much the frequency is allowed to change in one go. Setting
    it to 0 will cause the default frequency step (5 percent) to be used
    and setting it to 100 effectively causes the governor to periodically
    switch the frequency between the ``scaling_min_freq`` and
    ``scaling_max_freq`` policy limits.

``down_threshold``
    Threshold value (in percent, 20 by default) used to determine the
    frequency change direction.

    If the estimated CPU load is greater than this value, the frequency will
    go up (by ``freq_step``). If the load is less than this value (and the
    ``sampling_down_factor`` mechanism is not in effect), the frequency will
    go down. Otherwise, the frequency will not be changed.

``sampling_down_factor``
    Frequency decrease deferral factor, between 1 (default) and 10
    inclusive.

    It effectively causes the frequency to go down ``sampling_down_factor``
    times slower than it ramps up.


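The ``freq_step`` arithmetic of the ``conservative`` governor described above
can be illustrated with a minimal sketch. It only models the size of a single
step and the clamping to the policy limits. The decision whether to go up or
down is passed in by the caller, and all of the names used here are
assumptions made for the example, so the actual governor code may differ in
detail.

```python
DEFAULT_FREQ_STEP_PERCENT = 5  # default step documented for freq_step


def step_khz(freq_step: int, scaling_max_freq: int) -> int:
    """Return the size of one frequency step (hypothetical helper)."""
    if not 0 <= freq_step <= 100:
        raise ValueError("freq_step must be between 0 and 100 inclusive")
    if freq_step == 0:
        # A value of 0 selects the default step of 5 percent.
        freq_step = DEFAULT_FREQ_STEP_PERCENT
    return scaling_max_freq * freq_step // 100


def next_frequency(current: int, go_up: bool, freq_step: int,
                   scaling_min_freq: int, scaling_max_freq: int) -> int:
    """Move one step up or down and clamp the result to the policy limits."""
    step = step_khz(freq_step, scaling_max_freq)
    candidate = current + step if go_up else current - step
    return max(scaling_min_freq, min(scaling_max_freq, candidate))
```

With ``freq_step`` set to 100 a single step is as large as
``scaling_max_freq`` itself, so every step up ends at ``scaling_max_freq`` and
every step down ends at ``scaling_min_freq``, which matches the behavior
described for that setting.
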
Frequency Boost Support
=======================

Background
----------

Some processors support a mechanism to raise the operating frequency of some
cores in a multicore package temporarily (and above the sustainable frequency
threshold for the whole package) under certain conditions, for example if the
whole chip is not fully utilized and below its intended thermal or power budget.

Different names are used by different vendors to refer to this functionality.
For Intel processors it is referred to as "Turbo Boost", AMD calls it
"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
As a rule, it also is implemented differently by different vendors. The simple
term "frequency boost" is used here for brevity to refer to all of those
implementations.

The frequency boost mechanism may be either hardware-based or software-based.
If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
made by the hardware (although in general it requires the hardware to be put
into a special state in which it can control the CPU frequency within certain
limits). If it is software-based (e.g. on ARM), the scaling driver decides
whether or not to trigger boosting and when to do that.

The ``boost`` File in ``sysfs``
-------------------------------

This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
the "boost" setting for the whole system. It is not present if the underlying
scaling driver does not support the frequency boost mechanism (or supports it,
but provides a driver-specific interface for controlling it, like
|intel_pstate|).

If the value in this file is 1, the frequency boost mechanism is enabled. This
means that either the hardware can be put into states in which it is able to
trigger boosting (in the hardware-based case), or the software is allowed to
trigger boosting (in the software-based case). It does not mean that boosting
is actually in use at the moment on any CPUs in the system. It only means a
permission to use the frequency boost mechanism (which still may never be used
for other reasons).

If the value in this file is 0, the frequency boost mechanism is disabled and
cannot be used at all.

The only values that can be written to this file are 0 and 1.

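As an illustration of the semantics described above, the following minimal
sketch reads and writes the ``boost`` file from user space. The helper names
are assumptions made for the example. Writing to the file generally requires
sufficient privileges, and a missing file is reported as ``None`` to reflect
the case in which the scaling driver does not provide this interface.

```python
from pathlib import Path
from typing import Optional

BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost(path: Path = BOOST_PATH) -> Optional[bool]:
    """Return True for 1, False for 0, or None if the file is not present."""
    try:
        text = path.read_text()
    except FileNotFoundError:
        return None
    return text.strip() == "1"


def write_boost(enabled: bool, path: Path = BOOST_PATH) -> None:
    """Write 1 or 0, the only values accepted by the file."""
    path.write_text("1" if enabled else "0")
```

A return value of ``True`` only means that boosting is permitted, not that any
CPU is actually running at a boosted frequency at the moment.
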
Rationale for Boost Control Knob
--------------------------------

The frequency boost mechanism is generally intended to help to achieve optimum
CPU performance on time scales below software resolution (e.g. below the
scheduler tick interval) and it is demonstrably suitable for many workloads, but
it may lead to problems in certain situations.

For this reason, many systems make it possible to disable the frequency boost
mechanism in the platform firmware (BIOS) setup, but that requires the system to
be restarted for the setting to be adjusted as desired, which may not be
practical at least in some cases. For example:

1. Boosting means overclocking the processor, although under controlled
   conditions. Generally, the processor's energy consumption increases
   as a result of increasing its frequency and voltage, even temporarily.
   That may not be desirable on systems that switch to power sources of
   limited capacity, such as batteries, so the ability to disable the boost
   mechanism while the system is running may help there (but that depends on
   the workload too).

2. In some situations deterministic behavior is more important than
   performance or energy consumption (or both) and the ability to disable
   boosting while the system is running may be useful then.

3. To examine the impact of the frequency boost mechanism itself, it is useful
   to be able to run tests with and without boosting, preferably without
   restarting the system in the meantime.

4. Reproducible results are important when running benchmarks. Since
   the boosting functionality depends on the load of the whole package,
   single-thread performance may vary because of it, which may sometimes
   lead to unreproducible results. That can be avoided by disabling the
   frequency boost mechanism before running benchmarks sensitive to that
   issue.

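The benchmarking and testing use cases listed above can be illustrated with a
minimal sketch that disables boosting for the duration of a measurement and
restores the previous setting afterwards. The ``boost_disabled()`` name is an
assumption made for the example. The sketch assumes that the global ``boost``
file is present and that the caller is allowed to write to it.

```python
from contextlib import contextmanager
from pathlib import Path

BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


@contextmanager
def boost_disabled(path: Path = BOOST_PATH):
    """Disable frequency boost inside the ``with`` block, then restore it."""
    previous = path.read_text().strip()  # "0" or "1"
    path.write_text("0")
    try:
        yield
    finally:
        # Restore the saved setting even if the measured code raises.
        path.write_text(previous)
```

A benchmark can then be wrapped in ``with boost_disabled():`` so that the
previous setting comes back without restarting the system.
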
Legacy AMD ``cpb`` Knob
-----------------------

The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
the global ``boost`` one. It is used for disabling/enabling the "Core
Performance Boost" feature of some AMD processors.

If present, that knob is located in every ``CPUFreq`` policy directory in
``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
``cpb``, which indicates a more fine-grained control interface. The actual
implementation, however, works on the system-wide basis and setting that knob
for one policy causes the same value of it to be set for all of the other
policies at the same time.

That knob is still supported on AMD processors that support its underlying
hardware feature, but it may be configured out of the kernel (via the
:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global
``boost`` knob is present regardless. Thus it is always possible to use the
``boost`` knob instead of the ``cpb`` one, which is highly recommended, as that
is more consistent with what all of the other systems do (and the ``cpb`` knob
may not be supported any more in the future).

The ``cpb`` knob is never present for any processors without the underlying
hardware feature (e.g. all Intel ones), even if the
:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.


.. _Per-entity load tracking: https://lwn.net/Articles/531853/