             Central, scheduler-driven, power-performance control
                               (EXPERIMENTAL)

Abstract
========

The topic of a single, simple power-performance tunable that is wholly
scheduler centric and has well defined and predictable properties has come up
on several occasions in the past [1,2]. With techniques such as
scheduler-driven DVFS [3], we now have a good framework for implementing such
a tunable. This document describes the overall ideas behind its design and
implementation.


Table of Contents
=================

1. Motivation
2. Introduction
3. Signal Boosting Strategy
4. OPP selection using boosted CPU utilization
5. Per task group boosting
6. Questions and Answers
   - What about "auto" mode?
   - How are multiple groups of tasks with different boost values managed?
7. References


1. Motivation
=============

Sched-DVFS [3] is a new event-driven cpufreq governor which allows the
scheduler to select the optimal DVFS operating point (OPP) for running a task
allocated to a CPU. The introduction of sched-DVFS enables running workloads
at the most energy efficient OPPs.

However, sometimes it may be desired to intentionally boost the performance of
a workload even if that could imply a reasonable increase in energy
consumption. For example, in order to reduce the response time of a task, we
may want to run the task at a higher OPP than the one that is actually
required by its CPU bandwidth demand.

This last requirement is especially important if we consider that one of the
main goals of the sched-DVFS component is to replace all currently available
CPUFreq policies. Since sched-DVFS is event-based, as opposed to the
sampling-driven governors we currently have, it is already more responsive at
selecting the optimal OPP to run tasks allocated to a CPU. However, just
tracking the actual task load demand may not be enough from a performance
standpoint. For example, it is not possible to get behaviors similar to those
provided by the "performance" and "interactive" CPUFreq governors.

This document describes an implementation of a tunable, stacked on top of
sched-DVFS, which extends its functionality to support task performance
boosting.

By "performance boosting" we mean the reduction of the time required to
complete a task activation, i.e. the time elapsed from a task wakeup to its
next deactivation (e.g. because it goes back to sleep or it terminates). For
example, if we consider a simple periodic task which executes the same
workload for 5[s] every 20[s] while running at a certain OPP, a boosted
execution of that task must complete each of its activations in less than
5[s].

A previous attempt [5] to introduce such a boosting feature has not been
successful mainly because of the complexity of the proposed solution. The
approach described in this document exposes a single simple interface to
user-space. This single tunable knob allows the tuning of system wide
scheduler behaviours ranging from energy efficiency at one end through to
incremental performance boosting at the other end. This first tunable affects
all tasks. However, a more advanced extension of the concept is also provided
which uses CGroups to boost the performance of only selected tasks while
using the energy efficient default for all others.

The rest of this document introduces in more detail the proposed solution,
which has been named SchedTune.


2. Introduction
===============

SchedTune exposes a simple user-space interface with a single power-performance
tunable:

  /proc/sys/kernel/sched_cfs_boost

This permits expressing a boost value as an integer in the range [0..100].

A value of 0 (default) configures the CFS scheduler for maximum energy
efficiency. This means that sched-DVFS runs the tasks at the minimum OPP
required to satisfy their workload demand.
A value of 100 configures the scheduler for maximum performance, which
translates into the selection of the maximum OPP on that CPU.

Values in the range between 0 and 100 can be used to tune for intermediate
scenarios, for example to favour interactive response, or depending on other
system events (battery level, etc.).

A CGroup based extension is also provided, which permits further user-space
defined task classification to tune the scheduler for different goals
depending on the specific nature of the task, e.g. background vs interactive
vs low-priority.

The overall design of the SchedTune module is built on top of "Per-Entity Load
Tracking" (PELT) signals and sched-DVFS by introducing a bias on the Operating
Performance Point (OPP) selection.
Each time a task is allocated on a CPU, sched-DVFS has the opportunity to tune
the operating frequency of that CPU to better match the workload demand. The
selection of the actual OPP being activated is influenced by the global boost
value, or the boost value for the task CGroup when in use.

This simple biasing approach leverages existing frameworks, which means
minimal modifications to the scheduler, and yet it makes it possible to
achieve a range of different behaviours all from a single simple tunable knob.
The only new concept introduced is that of signal boosting.


3. Signal Boosting Strategy
===========================

The whole PELT machinery works based on the value of a few load tracking
signals which basically track the CPU bandwidth requirements for tasks and
the capacity of CPUs. The basic idea behind the SchedTune knob is to
artificially inflate some of these load tracking signals to make a task or RQ
appear more demanding than it actually is.

Which signals have to be inflated depends on the specific "consumer". However,
independently of the specific (signal, consumer) pair, it is important to
define a simple and possibly consistent strategy for the concept of boosting
a signal.

A boosting strategy defines how the "abstract" user-space defined
sched_cfs_boost value is translated into an internal "margin" value to be
added to a signal to get its inflated value:

  margin := boosting_strategy(sched_cfs_boost, signal)
  boosted_signal := signal + margin
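
As a purely illustrative C sketch (the type and function names below are
hypothetical, not taken from the actual implementation), a boosting strategy
can be thought of as a function mapping a (boost, signal) pair to a margin:

  /* A boosting strategy maps (boost, signal) to a margin. */
  typedef unsigned long (*boosting_strategy_t)(unsigned int boost,
                                               unsigned long signal);

  /* The boosted signal is always the original signal plus its margin. */
  static inline unsigned long boosted(boosting_strategy_t strategy,
                                      unsigned int boost,
                                      unsigned long signal)
  {
          return signal + strategy(boost, signal);
  }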

Different boosting strategies were identified and analyzed before selecting
the one found to be most effective.

Signal Proportional Compensation (SPC)
--------------------------------------

In this boosting strategy the sched_cfs_boost value is used to compute a
margin which is proportional to the complement of the original signal.
When a signal has a maximum possible value, its complement is defined as
the delta between its actual value and that maximum.

Since the tunable implementation uses signals which have SCHED_LOAD_SCALE as
their maximum possible value, and sched_cfs_boost is expressed as a
percentage, the margin becomes:

  margin := (sched_cfs_boost * (SCHED_LOAD_SCALE - signal)) / 100

Using this boosting strategy:
- a 100% sched_cfs_boost means that the signal is scaled to its maximum value
- each value in the range of sched_cfs_boost inflates the signal by a quantity
  which is proportional to its headroom, i.e. its distance from the maximum
  value.

For example, by applying the SPC boosting strategy to the selection of the OPP
to run a task, it is possible to achieve these behaviors:

- 0% boosting: run the task at the minimum OPP required by its workload
- 100% boosting: run the task at the maximum OPP available for the CPU
- 50% boosting: run at the half-way OPP between the minimum and the maximum

This means that, at 50% boosting, a task will be scheduled to run at half of
the maximum theoretically achievable performance on the specific target
platform.
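
To make the arithmetic concrete, here is a minimal, self-contained C sketch of
the SPC margin computation described above (spc_margin() is a hypothetical
name used only in this example; the in-kernel code may differ in its integer
arithmetic):

  #include <stdio.h>

  /* Maximum value of the utilization signals (1024 in the kernel). */
  #define SCHED_LOAD_SCALE 1024UL

  /*
   * SPC: the margin is the boost percentage applied to the signal's
   * headroom, i.e. its distance from SCHED_LOAD_SCALE.
   */
  static unsigned long spc_margin(unsigned long signal, unsigned int boost)
  {
          return (boost * (SCHED_LOAD_SCALE - signal)) / 100;
  }

  int main(void)
  {
          unsigned long signal = 300;     /* example utilization sample */

          /* 50% boost: 300 + 362 = 662, midway between 300 and 1024 */
          printf("50%%:  %lu\n", signal + spc_margin(signal, 50));
          /* 100% boost: 300 + 724 = 1024, saturated to the maximum */
          printf("100%%: %lu\n", signal + spc_margin(signal, 100));
          return 0;
  }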

A graphical representation of an SPC boosted signal is shown in the following
figure, where:
  a) "-" represents the original signal
  b) "b" represents a 50% boosted signal
  c) "p" represents a 100% boosted signal


  ^
  |  SCHED_LOAD_SCALE
  +-----------------------------------------------------------------+
  |pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
  |
  |                                             boosted_signal
  |                                          bbbbbbbbbbbbbbbbbbbbbbbb
  |
  |                                            original signal
  |                  bbbbbbbbbbbbbbbbbbbbbbbb+----------------------+
  |                                          |
  |bbbbbbbbbbbbbbbbbb                        |
  |                                          |
  |                                          |
  |                                          |
  |                  +-----------------------+
  |                  |
  |                  |
  |                  |
  |------------------+
  |
  |
  +----------------------------------------------------------------------->

The plot above shows a ramped load signal (the "original signal") and its
boosted equivalents. For each step of the original signal, the boosted signal
corresponding to a 50% boost is midway between the original signal and the
upper bound. Boosting by 100% generates a boosted signal which is always
saturated to the upper bound.


4. OPP selection using boosted CPU utilization
==============================================

It is worth calling out that the implementation does not introduce any new
load signals. Instead, it provides an API to tune existing signals. This
tuning is done on demand and only in scheduler code paths where it is
sensible to do so. The new API calls are defined to return either the default
signal or a boosted one, depending on the value of sched_cfs_boost. This is a
clean and non-invasive modification of the existing code paths.

The signal representing a CPU's utilization is boosted according to the
previously described SPC boosting strategy. To sched-DVFS, this allows a CPU
(i.e. a CFS run-queue) to appear more utilized than it actually is.

Thus, with sched_cfs_boost enabled, we have the following main functions to
get the current utilization of a CPU:

  cpu_util()
  boosted_cpu_util()

The new boosted_cpu_util() is similar to cpu_util() but returns a boosted
utilization signal, which is a function of the sched_cfs_boost value.

This function is used in the CFS scheduler code paths where sched-DVFS needs
to decide the OPP to run a CPU at.
For example, this allows selecting the highest OPP for a CPU which has the
boost value set to 100%.
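
As a hedged sketch of how these pieces could fit together (the body below is
an illustration reusing spc_margin() from the sketch in Section 3; cpu_util()
and a global sched_cfs_boost variable are assumed to be visible here):

  extern unsigned long cpu_util(int cpu);   /* PELT-based CPU utilization */
  extern unsigned int sched_cfs_boost;      /* global boost knob, [0..100] */

  /* Sketch only: the boosted utilization is the raw one plus its margin. */
  unsigned long boosted_cpu_util(int cpu)
  {
          unsigned long util = cpu_util(cpu);
          unsigned long margin = spc_margin(util, sched_cfs_boost);

          return util + margin;
  }

With a 100% boost the returned value saturates to SCHED_LOAD_SCALE, which is
what drives sched-DVFS to select the highest OPP.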


5. Per task group boosting
==========================

The availability of a single knob which is used to boost all tasks in the
system is certainly a simple solution, but it quite likely doesn't fit many
utilization scenarios, especially in the mobile device space.

For example, on battery powered devices there usually are many background
services which are long running and need energy efficient scheduling. On the
other hand, some applications are more performance sensitive and require an
interactive response and/or maximum performance, regardless of the energy
cost.
To better service such scenarios, the SchedTune implementation has an
extension that provides a more fine-grained boosting interface.

A new CGroup controller, namely "schedtune", can be enabled, which allows
task groups with different boost values to be defined and configured.
Tasks that require special performance can be put into separate CGroups.
The value of the boost associated with the tasks in this group can be
specified using a single knob exposed by the CGroup controller:

  schedtune.boost

This knob allows the definition of a boost value that is to be used for
SPC boosting of all tasks attached to this group.

The current schedtune controller implementation is really simple and has
these main characteristics:

1) It is only possible to create hierarchies that are one level deep

   The root control group defines the system-wide boost value to be applied
   by default to all tasks. Its direct subgroups are named "boost groups" and
   they define the boost value for specific sets of tasks.
   Further nested subgroups are not allowed since they do not have a sensible
   meaning from a user-space standpoint.

2) It is possible to define only a limited number of "boost groups"

   This number is defined at compile time and by default configured to 16.
   This is a design decision motivated by two main reasons:
   a) In a real system we do not expect utilization scenarios with more than
      a few boost groups. For example, a reasonable collection of groups
      could be just "background", "interactive" and "performance".
   b) It simplifies the implementation considerably, especially for the code
      which has to compute the per-CPU boosting when there are multiple
      RUNNABLE tasks with different boost values.

Such a simple design should allow servicing the main utilization scenarios
identified so far. It provides a simple interface which can be used to manage
the power-performance of all tasks or only selected tasks.
Moreover, this interface can be easily integrated by user-space run-times
(e.g. Android, ChromeOS) to implement a QoS solution for task boosting based
on task classification, which has been a long standing requirement.

Setup and usage
---------------

0. Use a kernel with CGROUP_SCHEDTUNE support enabled

1. Check that the "schedtune" CGroup controller is available:

     root@linaro-nano:~# cat /proc/cgroups
     #subsys_name    hierarchy    num_cgroups    enabled
     cpuset          0            1              1
     cpu             0            1              1
     schedtune       0            1              1

2. Mount a tmpfs to create the CGroups mount point (Optional)

     root@linaro-nano:~# sudo mount -t tmpfs cgroups /sys/fs/cgroup

3. Mount the "schedtune" controller

     root@linaro-nano:~# mkdir /sys/fs/cgroup/stune
     root@linaro-nano:~# sudo mount -t cgroup -o schedtune stune /sys/fs/cgroup/stune

4. Setup the system-wide boost value (Optional)

   If not configured, the root control group has a 0% boost value, which
   basically disables boosting for all tasks in the system, thus running
   them in an energy-efficient mode.

     root@linaro-nano:~# echo $SYSBOOST > /sys/fs/cgroup/stune/schedtune.boost

5. Create task groups and configure their specific boost value (Optional)

   For example, here we create a "performance" boost group configured to
   boost all its tasks to 100%:

     root@linaro-nano:~# mkdir /sys/fs/cgroup/stune/performance
     root@linaro-nano:~# echo 100 > /sys/fs/cgroup/stune/performance/schedtune.boost

6. Move tasks into the boost group

   For example, the following moves the task with PID $TASKPID (and all its
   threads) into the "performance" boost group:

     root@linaro-nano:~# echo $TASKPID > /sys/fs/cgroup/stune/performance/cgroup.procs

This simple configuration allows only the threads of the $TASKPID task to
run, when needed, at the highest OPP on the most capable CPU of the system.


6. Questions and Answers
========================

What about "auto" mode?
-----------------------

The "auto" mode as described in [5] can be implemented by interfacing
SchedTune with some suitable user-space element. This element could use the
exposed system-wide or cgroup based interface.

How are multiple groups of tasks with different boost values managed?
---------------------------------------------------------------------

The current SchedTune implementation keeps track of the boosted RUNNABLE
tasks on a CPU. Once sched-DVFS selects the OPP to run a CPU at, the CPU
utilization is boosted with a value which is the maximum of the boost values
of the currently RUNNABLE tasks in its RQ.

This allows sched-DVFS to boost a CPU only while there are boosted tasks
ready to run, and to switch back to the energy efficient mode as soon as the
last boosted task is dequeued.
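
A minimal, self-contained C sketch of this max-aggregation (the per-CPU
accounting structure and all names below are hypothetical; the actual
implementation may track this differently):

  #define BOOSTGROUPS_COUNT 16    /* compile-time limit, see Section 5 */

  /* Hypothetical per-CPU accounting, one slot per boost group. */
  struct boost_group {
          unsigned int boost;     /* boost value configured for the group */
          unsigned int tasks;     /* RUNNABLE tasks of the group on this CPU */
  };

  /*
   * The boost applied to a CPU is the maximum boost value among the
   * groups which currently have RUNNABLE tasks on that CPU.
   */
  static unsigned int cpu_boost_max(const struct boost_group bg[])
  {
          unsigned int boost_max = 0;
          int idx;

          for (idx = 0; idx < BOOSTGROUPS_COUNT; idx++) {
                  if (bg[idx].tasks == 0)
                          continue;       /* no runnable tasks here */
                  if (bg[idx].boost > boost_max)
                          boost_max = bg[idx].boost;
          }
          return boost_max;
  }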


7. References
=============
[1] http://lwn.net/Articles/552889
[2] http://lkml.org/lkml/2012/5/18/91
[3] http://lkml.org/lkml/2015/6/26/620