Deadline Task Scheduling
------------------------

CONTENTS
========

 0. WARNING
 1. Overview
 2. Scheduling algorithm
 3. Scheduling Real-Time Tasks
   3.1 Definitions
   3.2 Schedulability Analysis for Uniprocessor Systems
   3.3 Schedulability Analysis for Multiprocessor Systems
   3.4 Relationship with SCHED_DEADLINE Parameters
 4. Bandwidth management
   4.1 System-wide settings
   4.2 Task interface
   4.3 Default behavior
 5. Tasks CPU affinity
   5.1 SCHED_DEADLINE and cpusets HOWTO
 6. Future plans
 A. Test suite
 B. Minimal main()


0. WARNING
==========

 Fiddling with these settings can result in unpredictable or even unstable
 system behavior. As for -rt (group) scheduling, it is assumed that root users
 know what they're doing.


1. Overview
===========

 The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
 basically an implementation of the Earliest Deadline First (EDF) scheduling
 algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
 that makes it possible to isolate the behavior of tasks from each other.


2. Scheduling algorithm
=======================

 SCHED_DEADLINE uses three parameters, named "runtime", "period", and
 "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
 "runtime" microseconds of execution time every "period" microseconds, and
 these "runtime" microseconds are available within "deadline" microseconds
 from the beginning of the period. In order to implement this behavior,
 every time the task wakes up, the scheduler computes a "scheduling deadline"
 consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
 scheduled using EDF[1] on these scheduling deadlines (the task with the
 earliest scheduling deadline is selected for execution). Notice that the
 task actually receives "runtime" time units within "deadline" if a proper
 "admission control" strategy (see Section "4. Bandwidth management") is used
 (clearly, if the system is overloaded this guarantee cannot be respected).

 Summing up, the CBS[2,3] algorithm assigns scheduling deadlines to tasks so
 that each task runs for at most its runtime every period, avoiding any
 interference between different tasks (bandwidth isolation), while the EDF[1]
 algorithm selects the task with the earliest scheduling deadline as the one
 to be executed next. Thanks to this feature, tasks that do not strictly comply
 with the "traditional" real-time task model (see Section 3) can effectively
 use the new policy.

 In more detail, the CBS algorithm assigns scheduling deadlines to
 tasks in the following way:

  - Each SCHED_DEADLINE task is characterized by the "runtime",
    "deadline", and "period" parameters;

  - The state of the task is described by a "scheduling deadline", and
    a "remaining runtime". These two parameters are initially set to 0;

  - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
    the scheduler checks if

                 remaining runtime                  runtime
        ----------------------------------    >    ---------
        scheduling deadline - current time           period

    then, if the scheduling deadline is smaller than the current time, or
    this condition is verified, the scheduling deadline and the
    remaining runtime are re-initialized as

         scheduling deadline = current time + deadline
         remaining runtime = runtime

    otherwise, the scheduling deadline and the remaining runtime are
    left unchanged;

  - When a SCHED_DEADLINE task executes for an amount of time t, its
    remaining runtime is decreased as

         remaining runtime = remaining runtime - t

    (technically, the runtime is decreased at every tick, or when the
    task is descheduled / preempted);

  - When the remaining runtime becomes less than or equal to 0, the task is
    said to be "throttled" (also known as "depleted" in real-time literature)
    and cannot be scheduled until its scheduling deadline. The "replenishment
    time" for this task (see next item) is set to be equal to the current
    value of the scheduling deadline;

  - When the current time is equal to the replenishment time of a
    throttled task, the scheduling deadline and the remaining runtime are
    updated as

         scheduling deadline = scheduling deadline + period
         remaining runtime = remaining runtime + runtime
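
 The rules above can be sketched as a small state machine. This is only an
 illustrative model, not the kernel implementation: the class name CbsTask
 and its method names are hypothetical, all times are plain integers in one
 arbitrary unit, and the wake-up ratio test is cross-multiplied to avoid a
 division.

```python
from dataclasses import dataclass


@dataclass
class CbsTask:
    """State of one task under the CBS rules listed above (illustrative only)."""
    runtime: int
    deadline: int
    period: int
    sched_deadline: int = 0   # "scheduling deadline", initially 0
    remaining: int = 0        # "remaining runtime", initially 0
    throttled: bool = False

    def wake_up(self, now: int) -> None:
        # Re-initialize if the scheduling deadline is in the past, or if
        # remaining / (sched_deadline - now) > runtime / period.
        stale = self.sched_deadline < now
        overflow = (not stale and
                    self.remaining * self.period >
                    self.runtime * (self.sched_deadline - now))
        if stale or overflow:
            self.sched_deadline = now + self.deadline
            self.remaining = self.runtime

    def run(self, t: int) -> None:
        # Account t units of execution; throttle once the budget is used up.
        self.remaining -= t
        if self.remaining <= 0:
            self.throttled = True

    def replenish(self, now: int) -> None:
        # The replenishment time is the current scheduling deadline.
        if self.throttled and now >= self.sched_deadline:
            self.sched_deadline += self.period
            self.remaining += self.runtime
            self.throttled = self.remaining <= 0


task = CbsTask(runtime=10, deadline=100, period=100)
task.wake_up(1000)
print(task.sched_deadline, task.remaining)
```

 A task that wakes up again while its leftover budget still fits the
 reserved bandwidth keeps its old scheduling deadline; otherwise it gets a
 fresh deadline and a full runtime.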


3. Scheduling Real-Time Tasks
=============================

 * BIG FAT WARNING ******************************************************
 *
 * This section contains a (not-thorough) summary of classical deadline
 * scheduling theory, and how it applies to SCHED_DEADLINE.
 * The reader can "safely" skip to Section 4 if only interested in seeing
 * how the scheduling policy can be used. Anyway, we strongly recommend
 * coming back here and continuing to read (once the urge for testing is
 * satisfied :P) to be sure of fully understanding all technical details.
 ************************************************************************

 There are no limitations on what kind of task can exploit this new
 scheduling discipline, even if it must be said that it is particularly
 suited for periodic or sporadic real-time tasks that need guarantees on their
 timing behavior, e.g., multimedia, streaming, control applications, etc.

3.1 Definitions
---------------

 A typical real-time task is composed of a repetition of computation phases
 (task instances, or jobs) which are activated in a periodic or sporadic
 fashion.
 Each job J_j (where J_j is the j^th job of the task) is characterized by an
 arrival time r_j (the time when the job starts), an amount of computation
 time c_j needed to finish the job, and a job absolute deadline d_j, which
 is the time within which the job should be finished. The maximum execution
 time max{c_j} is called "Worst Case Execution Time" (WCET) for the task.
 A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
 sporadic with minimum inter-arrival time P if r_{j+1} >= r_j + P. Finally,
 d_j = r_j + D, where D is the task's relative deadline.
 Summing up, a real-time task can be described as
	Task = (WCET, D, P)

 The utilization of a real-time task is defined as the ratio between its
 WCET and its period (or minimum inter-arrival time), and represents
 the fraction of CPU time needed to execute the task.

 If the total utilization U=sum(WCET_i/P_i) is larger than M (with M equal
 to the number of CPUs), then the scheduler is unable to respect all the
 deadlines.
 Note that total utilization is defined as the sum of the utilizations
 WCET_i/P_i over all the real-time tasks in the system. When considering
 multiple real-time tasks, the parameters of the i-th task are indicated
 with the "_i" suffix.
 Moreover, if the total utilization is larger than M, then we risk starving
 non real-time tasks by real-time tasks.
 If, instead, the total utilization is smaller than M, then non real-time
 tasks will not be starved and the system might be able to respect all the
 deadlines.
 As a matter of fact, in this case it is possible to provide an upper bound
 for tardiness (defined as the maximum between 0 and the difference
 between the finishing time of a job and its absolute deadline).
 More precisely, it can be proven that using a global EDF scheduler the
 maximum tardiness of each task is smaller than or equal to
	((M − 1) · WCET_max − WCET_min)/(M − (M − 2) · U_max) + WCET_max
 where WCET_max = max{WCET_i} is the maximum WCET, WCET_min=min{WCET_i}
 is the minimum WCET, and U_max = max{WCET_i/P_i} is the maximum
 utilization[12].
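
 As a worked illustration of the two quantities defined above, the sketch
 below computes the total utilization and the tardiness bound for a task set
 given as (WCET, D, P) tuples. The function names and the sample task set are
 made up for this example; the formula is the one quoted above.

```python
def total_utilization(tasks):
    """tasks: list of (WCET, D, P) tuples, all in the same time unit."""
    return sum(wcet / period for wcet, _d, period in tasks)


def tardiness_bound(tasks, m):
    """Upper bound on tardiness under global EDF on m CPUs (formula above)."""
    if total_utilization(tasks) > m:
        raise ValueError("total utilization exceeds the number of CPUs")
    wcets = [wcet for wcet, _d, _p in tasks]
    u_max = max(wcet / period for wcet, _d, period in tasks)
    return ((m - 1) * max(wcets) - min(wcets)) / (m - (m - 2) * u_max) + max(wcets)


# Hypothetical task set: three tasks with a 100 ms period on 2 CPUs.
tasks = [(20, 100, 100), (30, 100, 100), (60, 100, 100)]
print(total_utilization(tasks))
print(tardiness_bound(tasks, 2))
```

 For this sample set the total utilization is 1.1 (below M = 2), and the
 bound evaluates to (60 - 20) / 2 + 60 = 80 time units.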

3.2 Schedulability Analysis for Uniprocessor Systems
----------------------------------------------------

 If M=1 (uniprocessor system), or in case of partitioned scheduling (each
 real-time task is statically assigned to one and only one CPU), it is
 possible to formally check if all the deadlines are respected.
 If D_i = P_i for all tasks, then EDF is able to respect all the deadlines
 of all the tasks executing on a CPU if and only if the total utilization
 of the tasks running on such a CPU is smaller than or equal to 1.
 If D_i != P_i for some task, then it is possible to define the density of
 a task as WCET_i/min{D_i,P_i}, and EDF is able to respect all the deadlines
 of all the tasks running on a CPU if the sum of the densities of the tasks
 running on such a CPU is smaller than or equal to 1:
	sum(WCET_i / min{D_i, P_i}) <= 1
 It is important to notice that this condition is only sufficient, and not
 necessary: there are task sets that are schedulable, but do not respect the
 condition. For example, consider the task set {Task_1,Task_2} composed of
 Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms).
 EDF is clearly able to schedule the two tasks without missing any deadline
 (Task_1 is scheduled as soon as it is released, and finishes just in time
 to respect its deadline; Task_2 is scheduled immediately after Task_1, hence
 its response time cannot be larger than 50ms + 10ms = 60ms) even if
	50 / min{50,100} + 10 / min{100, 100} = 50 / 50 + 10 / 100 = 1.1
 Of course it is possible to test the exact schedulability of tasks with
 D_i != P_i (checking a condition that is both sufficient and necessary),
 but this cannot be done by comparing the total utilization or density with
 a constant. Instead, the so called "processor demand" approach can be used,
 computing the total amount of CPU time h(t) needed by all the tasks to
 respect all of their deadlines in a time interval of size t, and comparing
 such a time with the interval size t. If h(t) is not larger than t (that is,
 the amount of time needed by the tasks in a time interval of size t does
 not exceed the size of the interval) for all the possible values of t, then
 EDF is able to schedule the tasks respecting all of their deadlines.
 Performing this check for all possible values of t is impossible, but it has
 been proven[4,5,6] that it is sufficient to perform the test for values of t
 between 0 and a maximum value L. The cited papers contain all of the
 mathematical details and explain how to compute h(t) and L.
 In any case, this kind of analysis is too complex as well as too
 time-consuming to be performed on-line. Hence, as explained in Section
 4, Linux uses an admission test based on the tasks' utilizations.
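
 The difference between the density test and the processor demand approach
 can be seen on the example task set above. The sketch below is a
 simplification made for illustration and is not taken from the cited
 papers: it assumes integer parameters, periodic tasks that are all first
 released at time 0, and it uses "hyperperiod + largest relative deadline"
 as a conservative stand-in for the maximum value L. All function names are
 hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def density_test(tasks):
    """Sufficient test: sum(WCET_i / min(D_i, P_i)) <= 1."""
    return sum(Fraction(w, min(d, p)) for w, d, p in tasks) <= 1


def demand(tasks, t):
    """h(t): execution time of all jobs with release and deadline in [0, t]."""
    return sum(((t - d) // p + 1) * w for w, d, p in tasks if t >= d)


def processor_demand_test(tasks):
    """Check h(t) <= t at every absolute deadline up to an assumed horizon."""
    if sum(Fraction(w, p) for w, _d, p in tasks) > 1:
        return False  # total utilization above 1 is never schedulable
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b),
                         [p for _w, _d, p in tasks])
    horizon = hyperperiod + max(d for _w, d, _p in tasks)
    checkpoints = sorted({d + k * p
                          for _w, d, p in tasks
                          for k in range(horizon // p + 1)
                          if d + k * p <= horizon})
    return all(demand(tasks, t) <= t for t in checkpoints)


# Task_1 = (50, 50, 100) and Task_2 = (10, 100, 100), in milliseconds.
tasks = [(50, 50, 100), (10, 100, 100)]
print(density_test(tasks), processor_demand_test(tasks))
```

 On this task set the density test fails (the densities sum to 1.1) while the
 demand check passes, matching the discussion above: h(50) = 50 and
 h(100) = 60.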

3.3 Schedulability Analysis for Multiprocessor Systems
------------------------------------------------------

 On multiprocessor systems with global EDF scheduling (non partitioned
 systems), a sufficient test for schedulability cannot be based on the
 utilizations or densities: it can be shown that even if D_i = P_i, task
 sets with utilizations slightly larger than 1 can miss deadlines regardless
 of the number of CPUs.

 Consider a set {Task_1,...Task_{M+1}} of M+1 tasks on a system with M
 CPUs, with the first task Task_1=(P,P,P) having period, relative deadline
 and WCET equal to P. The remaining M tasks Task_i=(e,P-1,P-1) have an
 arbitrarily small worst case execution time (indicated as "e" here) and a
 period smaller than the one of the first task. Hence, if all the tasks
 activate at the same time t, global EDF schedules these M tasks first
 (because their absolute deadlines are equal to t + P - 1, hence they are
 smaller than the absolute deadline of Task_1, which is t + P). As a
 result, Task_1 can be scheduled only at time t + e, and will finish at
 time t + e + P, after its absolute deadline. The total utilization of the
 task set is U = M · e / (P - 1) + P / P = M · e / (P - 1) + 1, and for small
 values of e this can become very close to 1. This is known as "Dhall's
 effect"[7]. Note: the example in the original paper by Dhall has been
 slightly simplified here (for example, Dhall more correctly computed
 lim_{e->0}U).

 More complex schedulability tests for global EDF have been developed in
 real-time literature[8,9], but they are not based on a simple comparison
 between total utilization (or density) and a fixed constant. If all tasks
 have D_i = P_i, a sufficient schedulability condition can be expressed in
 a simple way:
	sum(WCET_i / P_i) <= M - (M - 1) · U_max
 where U_max = max{WCET_i / P_i}[10]. Notice that for U_max = 1,
 M - (M - 1) · U_max becomes M - M + 1 = 1 and this schedulability condition
 just confirms Dhall's effect. A more complete survey of the literature
 about schedulability tests for multi-processor real-time scheduling can be
 found in [11].
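
 The sufficient condition above, and the way it reacts to the Dhall task
 set, can be checked numerically. This is a minimal sketch: the function
 names are invented for the example, tasks are (WCET, D, P) tuples with
 D_i = P_i, and floating point is used, so results right at the boundary
 should not be trusted.

```python
def gfb_test(tasks, m):
    """Sufficient test: sum(WCET_i / P_i) <= M - (M - 1) * U_max."""
    utils = [wcet / period for wcet, _d, period in tasks]
    return sum(utils) <= m - (m - 1) * max(utils)


def dhall_task_set(m, p, e):
    """Task_1 = (P, P, P) plus M light tasks (e, P - 1, P - 1)."""
    return [(p, p, p)] + [(e, p - 1, p - 1)] * m


heavy = dhall_task_set(m=4, p=100, e=1)
light = [(10, 100, 100)] * 8
print(gfb_test(heavy, 4), gfb_test(light, 4))
```

 The Dhall set has U_max = 1, so the right-hand side collapses to 1 and the
 test fails even though the total utilization is only about 1.04 on 4 CPUs;
 the eight light tasks (total utilization 0.8, U_max = 0.1) pass easily.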

 As seen, enforcing that the total utilization is smaller than M does not
 guarantee that global EDF schedules the tasks without missing any deadline
 (in other words, global EDF is not an optimal scheduling algorithm). However,
 a total utilization smaller than M is enough to guarantee that non real-time
 tasks are not starved and that the tardiness of real-time tasks has an upper
 bound[12] (as previously noted). Different bounds on the maximum tardiness
 experienced by real-time tasks have been developed in various papers[13,14],
 but the theoretical result that is important for SCHED_DEADLINE is that if
 the total utilization is smaller than or equal to M then the response times
 of the tasks are limited.

3.4 Relationship with SCHED_DEADLINE Parameters
-----------------------------------------------

 Finally, it is important to understand the relationship between the
 SCHED_DEADLINE scheduling parameters described in Section 2 (runtime,
 deadline and period) and the real-time task parameters (WCET, D, P)
 described in this section. Note that a task's temporal constraints are
 represented by its absolute deadlines d_j = r_j + D described above, while
 SCHED_DEADLINE schedules the tasks according to scheduling deadlines (see
 Section 2).
 If an admission test is used to guarantee that the scheduling deadlines
 are respected, then SCHED_DEADLINE can be used to schedule real-time tasks
 guaranteeing that all the jobs' deadlines of a task are respected.
 In order to do this, a task must be scheduled by setting:

  - runtime >= WCET
  - deadline = D
  - period <= P

 IOW, if runtime >= WCET and if period is <= P, then the scheduling deadlines
 and the absolute deadlines (d_j) coincide, so a proper admission control
 makes it possible to respect the jobs' absolute deadlines for this task (this
 is what is called "hard schedulability property" and is an extension of
 Lemma 1 of [2]).
 Notice that if runtime > deadline the admission control will surely reject
 this task, as it is not possible to respect its temporal constraints.
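
 The mapping above is simple enough to write down directly. The helper
 below is a hypothetical convenience function, not part of any kernel or
 library interface; the optional "margin" argument (extra runtime on top of
 the WCET) is an assumption added for illustration.

```python
def deadline_params(wcet, rel_deadline, min_interarrival, margin=0):
    """Map a real-time task (WCET, D, P) to (runtime, deadline, period)."""
    runtime = wcet + margin        # runtime >= WCET
    deadline = rel_deadline        # deadline = D
    period = min_interarrival      # period <= P (here: exactly P)
    if runtime > deadline:
        # Such a task would be rejected by the admission control anyway.
        raise ValueError("runtime larger than deadline")
    return {"runtime": runtime, "deadline": deadline, "period": period}


print(deadline_params(10, 50, 100))
```

 Choosing the largest allowed period (period = P) and the smallest allowed
 runtime keeps the requested bandwidth runtime/period as low as the mapping
 permits.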

 References:
  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming
      in a hard-real-time environment. Journal of the Association for
      Computing Machinery, 20(1), 1973.
  2 - L. Abeni, G. Buttazzo. Integrating Multimedia Applications in Hard
      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
      Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
      Technical Report. http://disi.unitn.it/~abeni/tr-98-01.pdf
  4 - J. Y. Leung and M. L. Merrill. A Note on Preemptive Scheduling of
      Periodic, Real-Time Tasks. Information Processing Letters, vol. 11,
      no. 3, pp. 115-118, 1980.
  5 - S. K. Baruah, A. K. Mok and L. E. Rosier. Preemptively Scheduling
      Hard-Real-Time Sporadic Tasks on One Processor. Proceedings of the
      11th IEEE Real-time Systems Symposium, 1990.
  6 - S. K. Baruah, L. E. Rosier and R. R. Howell. Algorithms and Complexity
      Concerning the Preemptive Scheduling of Periodic Real-Time tasks on
      One Processor. Real-Time Systems Journal, vol. 4, no. 2, pp 301-324,
      1990.
  7 - S. J. Dhall and C. L. Liu. On a real-time scheduling problem. Operations
      research, vol. 26, no. 1, pp 127-140, 1978.
  8 - T. Baker. Multiprocessor EDF and Deadline Monotonic Schedulability
      Analysis. Proceedings of the 24th IEEE Real-Time Systems Symposium, 2003.
  9 - T. Baker. An Analysis of EDF Schedulability on a Multiprocessor.
      IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8,
      pp 760-768, 2005.
  10 - J. Goossens, S. Funk and S. Baruah, Priority-Driven Scheduling of
       Periodic Task Systems on Multiprocessors. Real-Time Systems Journal,
       vol. 25, no. 2–3, pp. 187–205, 2003.
  11 - R. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for
       Multiprocessor Systems. ACM Computing Surveys, vol. 43, no. 4, 2011.
       http://www-users.cs.york.ac.uk/~robdavis/papers/MPSurveyv5.0.pdf
  12 - U. C. Devi and J. H. Anderson. Tardiness Bounds under Global EDF
       Scheduling on a Multiprocessor. Real-Time Systems Journal, vol. 32,
       no. 2, pp 133-189, 2008.
  13 - P. Valente and G. Lipari. An Upper Bound to the Lateness of Soft
       Real-Time Tasks Scheduled by EDF on Multiprocessors. Proceedings of
       the 26th IEEE Real-Time Systems Symposium, 2005.
  14 - J. Erickson, U. Devi and S. Baruah. Improved tardiness bounds for
       Global EDF. Proceedings of the 22nd Euromicro Conference on
       Real-Time Systems, 2010.


4. Bandwidth management
=======================

 As previously mentioned, in order for -deadline scheduling to be
 effective and useful (that is, to be able to provide "runtime" time units
 within "deadline"), it is important to have some method to keep the allocation
 of the available fractions of CPU time to the various tasks under control.
 This is usually called "admission control" and if it is not performed, then
 no guarantee can be given on the actual scheduling of the -deadline tasks.

 As already stated in Section 3, a necessary condition to be respected to
 correctly schedule a set of real-time tasks is that the total utilization
 is smaller than M. When talking about -deadline tasks, this requires that
 the sum of the ratios between runtime and period for all tasks is smaller
 than M. Notice that the ratio runtime/period is equivalent to the utilization
 of a "traditional" real-time task, and is also often referred to as
 "bandwidth".
 The interface used to control the CPU bandwidth that can be allocated
 to -deadline tasks is similar to the one already used for -rt
 tasks with real-time group scheduling (a.k.a. RT-throttling - see
 Documentation/scheduler/sched-rt-group.txt), and is based on
 readable/writable control files located in procfs (for system wide settings).
 Notice that per-group settings (controlled through cgroupfs) are still not
 defined for -deadline tasks, because more discussion is needed in order to
 figure out how we want to manage SCHED_DEADLINE bandwidth at the task group
 level.

 A main difference between deadline bandwidth management and RT-throttling
 is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
 and thus we don't need a higher level throttling mechanism to enforce the
 desired bandwidth. In other words, this means that interface parameters are
 only used at admission control time (i.e., when the user calls
 sched_setattr()). Scheduling is then performed considering actual tasks'
 parameters, so that CPU bandwidth is allocated to SCHED_DEADLINE tasks
 respecting their needs in terms of granularity. Therefore, using this simple
 interface we can put a cap on total utilization of -deadline tasks (i.e.,
 sum(runtime_i / period_i) < global_dl_utilization_cap).

4.1 System-wide settings
------------------------

 The system-wide settings are configured under the /proc virtual file system.

 For now the -rt knobs are used for -deadline admission control and the
 -deadline runtime is accounted against the -rt runtime. We realize that this
 isn't entirely desirable; however, it is better to have a small interface for
 now, and be able to change it easily later. The ideal situation (see Section
 6) is to run -rt tasks from a -deadline server; in which case the -rt
 bandwidth is a direct subset of dl_bw.

 This means that, for a root_domain comprising M CPUs, -deadline tasks
 can be created while the sum of their bandwidths stays below:

   M * (sched_rt_runtime_us / sched_rt_period_us)

 It is also possible to disable this bandwidth management logic, and
 thus be free to oversubscribe the system up to any arbitrary level.
 This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
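
 The cap above is easy to evaluate by hand or in a few lines of code. The
 sketch below is only a model of the rule as described in this section, not
 the kernel's admission code: the function names are hypothetical, the
 default knob values are the ones given in Section 4.3, and whether a total
 exactly equal to the cap is admitted is an assumption of this example.

```python
def dl_bandwidth_cap(m, rt_runtime_us=950000, rt_period_us=1000000):
    """Bandwidth available to -deadline tasks on a root_domain with m CPUs."""
    if rt_runtime_us == -1:
        return float("inf")   # bandwidth management disabled
    return m * rt_runtime_us / rt_period_us


def admits(existing, new, m, rt_runtime_us=950000, rt_period_us=1000000):
    """existing: list of (runtime, period); new: one (runtime, period)."""
    total = sum(r / p for r, p in existing) + new[0] / new[1]
    # Assumption: a total equal to the cap is still accepted.
    return total <= dl_bandwidth_cap(m, rt_runtime_us, rt_period_us)


print(dl_bandwidth_cap(4))
print(admits([(10, 100)] * 9, (10, 100), m=1))
```

 With the default knobs a 4-CPU root_domain offers a total bandwidth of 3.8,
 and a tenth task asking for 10% on a single CPU is refused because the
 total would reach 100%, above the 95% cap.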


4.2 Task interface
------------------

 Specifying a periodic/sporadic task that executes for a given amount of
 runtime at each instance, and that is scheduled according to the urgency of
 its own timing constraints needs, in general, a way of declaring:
  - a (maximum/typical) instance execution time,
  - a minimum interval between consecutive instances,
  - a time constraint by which each instance must be completed.

 Therefore:
  * a new struct sched_attr, containing all the necessary fields is
    provided;
  * the new scheduling related syscalls that manipulate it, i.e.,
    sched_setattr() and sched_getattr() are implemented.
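
 As a rough illustration of what such a call looks like from user space, the
 sketch below mirrors struct sched_attr with ctypes and wraps the raw
 syscall. Several things here are assumptions to verify against your own
 kernel headers and the sched_setattr(2) manual page: the field layout, the
 policy value 6 for SCHED_DEADLINE, the syscall number 314 (x86_64 only),
 and the use of nanoseconds for the three time fields. The helper names are
 hypothetical.

```python
import ctypes
import os

SCHED_DEADLINE = 6          # assumed value from the Linux UAPI headers
SYS_SCHED_SETATTR = 314     # assumed syscall number, x86_64 only


class SchedAttr(ctypes.Structure):
    """ctypes mirror of struct sched_attr (first published layout)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),    # nanoseconds
        ("sched_deadline", ctypes.c_uint64),   # nanoseconds
        ("sched_period", ctypes.c_uint64),     # nanoseconds
    ]


def make_deadline_attr(runtime_ns, deadline_ns, period_ns):
    """Fill a SchedAttr for a -deadline reservation."""
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    attr.sched_policy = SCHED_DEADLINE
    attr.sched_runtime = runtime_ns
    attr.sched_deadline = deadline_ns
    attr.sched_period = period_ns
    return attr


def set_deadline(pid, attr):
    """Invoke sched_setattr(pid, &attr, 0); usually needs root privileges."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(SYS_SCHED_SETATTR), pid,
                       ctypes.byref(attr), 0)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# 10 ms of runtime every 30 ms, with deadline equal to the period.
attr = make_deadline_attr(10_000_000, 30_000_000, 30_000_000)
print(ctypes.sizeof(attr))
```

 Calling set_deadline(0, attr) would then ask the kernel to move the calling
 thread to SCHED_DEADLINE; the call fails with an OSError if the admission
 test rejects the request or the caller lacks the required privileges.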


4.3 Default behavior
--------------------

 The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
 950000. With rt_period equal to 1000000, by default, it means that -deadline
 tasks can use at most 95%, multiplied by the number of CPUs that compose the
 root_domain, for each root_domain.
 This means that non -deadline tasks will receive at least 5% of the CPU time,
 and that -deadline tasks will receive their runtime with a guaranteed
 worst-case delay with respect to the "deadline" parameter. If "deadline" =
 "period" and the cpuset mechanism is used to implement partitioned scheduling
 (see Section 5), then this simple setting of the bandwidth management is able
 to deterministically guarantee that -deadline tasks will receive their runtime
 in a period.

 Finally, notice that in order not to jeopardize the admission control a
 -deadline task cannot fork.

5. Tasks CPU affinity
=====================

 -deadline tasks cannot have an affinity mask smaller than the entire
 root_domain they are created on. However, affinities can be specified
 through the cpuset facility (Documentation/cgroup-v1/cpusets.txt).

5.1 SCHED_DEADLINE and cpusets HOWTO
------------------------------------

 An example of a simple configuration (pin a -deadline task to CPU0)
 follows (rt-app is used to create a -deadline task).

 mkdir /dev/cpuset
 mount -t cgroup -o cpuset cpuset /dev/cpuset
 cd /dev/cpuset
 mkdir cpu0
 echo 0 > cpu0/cpuset.cpus
 echo 0 > cpu0/cpuset.mems
 echo 1 > cpuset.cpu_exclusive
 echo 0 > cpuset.sched_load_balance
 echo 1 > cpu0/cpuset.cpu_exclusive
 echo 1 > cpu0/cpuset.mem_exclusive
 echo $$ > cpu0/tasks
 rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
 task affinity)

6. Future plans
===============

 Still missing:

  - refinements to deadline inheritance, especially regarding the possibility
    of retaining bandwidth isolation among non-interacting tasks. This is
    being studied from both theoretical and practical points of view, and
    hopefully we should be able to produce some demonstrative code soon;
  - (c)group based bandwidth management, and maybe scheduling;
  - access control for non-root users (and related security concerns to
    address): what is the best way to allow unprivileged use of the mechanisms,
    and how can non-root users be prevented from "cheating" the system?

 As already discussed, we are also planning to merge this work with the EDF
 throttling patches [https://lkml.org/lkml/2010/2/23/239] but we are still in
 the preliminary phases of the merge and we really seek feedback that would
 help us decide on the direction it should take.
Juri Lelli	f580193	2014-09-09 10:57:15 +0100	[diff] [blame]	474
Appendix A. Test suite
======================

The SCHED_DEADLINE policy can be easily tested using two applications that
are part of a wider Linux Scheduler validation suite. The suite is
available as a GitHub repository: https://github.com/scheduler-tools.

The first testing application is called rt-app and can be used to
start multiple threads with specific parameters. rt-app supports
SCHED_{OTHER,FIFO,RR,DEADLINE} scheduling policies and their related
parameters (e.g., niceness, priority, runtime/deadline/period). rt-app
is a valuable tool, as it can be used to synthetically recreate certain
workloads (maybe mimicking real use-cases) and evaluate how the scheduler
behaves under such workloads. In this way, results are easily reproducible.
rt-app is available at: https://github.com/scheduler-tools/rt-app.

Thread parameters can be specified from the command line, with something like
this:

 # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5

The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
priority 10, executes for 20ms every 150ms. The test will run for a total
of 5 seconds.
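
As a rough sanity check of such parameters, the CPU share requested by each
thread can be computed as execution time over period. The following is only
a sketch: cpu_share_ppm() is a hypothetical helper that is not part of
rt-app, and the microsecond unit is inferred from the example above.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: CPU share of a periodic
 * thread in parts per million (ppm), given its execution time and its
 * period in microseconds.
 */
unsigned long long cpu_share_ppm(unsigned long long exec_us,
				 unsigned long long period_us)
{
	assert(period_us > 0);	/* a zero period makes no sense */

	/* Integer division truncates any fractional ppm. */
	return exec_us * 1000000ULL / period_us;
}
```

With the values of the command line above, cpu_share_ppm(10000, 100000)
gives 100000 ppm (10% of one CPU) for the SCHED_DEADLINE thread, and
cpu_share_ppm(20000, 150000) gives 133333 ppm (about 13.3%) for the
SCHED_FIFO one.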

More interestingly, configurations can be described with a JSON file that
can be passed as input to rt-app with something like this:

 # rt-app my_config.json

The parameters that can be specified with the second method are a superset
of the command line options. Please refer to rt-app documentation for more
details (<rt-app-sources>/doc/*.json).

The second testing application is a modification of schedtool, called
schedtool-dl, which can be used to set up SCHED_DEADLINE parameters for a
certain pid/application. schedtool-dl is available at:
https://github.com/scheduler-tools/schedtool-dl.git.

The usage is straightforward:

 # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app

With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
of 10ms every 100ms (note that parameters are expressed in microseconds).
You can also use schedtool to create a reservation for an already running
application, given that you know its pid:

 # schedtool -E -t 10000000:100000000 my_app_pid

Appendix B. Minimal main()
==========================

What follows is a simple (ugly) self-contained code snippet showing how
SCHED_DEADLINE reservations can be created by a real-time application
developer.

 #define _GNU_SOURCE
 #include <unistd.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <time.h>
 #include <linux/unistd.h>
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <sys/syscall.h>
 #include <pthread.h>

 #define gettid() syscall(__NR_gettid)

 #define SCHED_DEADLINE	6

 /* XXX use the proper syscall numbers */
 #ifdef __x86_64__
 #define __NR_sched_setattr		314
 #define __NR_sched_getattr		315
 #endif

 #ifdef __i386__
 #define __NR_sched_setattr		351
 #define __NR_sched_getattr		352
 #endif

 #ifdef __arm__
 #define __NR_sched_setattr		380
 #define __NR_sched_getattr		381
 #endif

 /* Set by main() to ask the deadline thread to stop */
 static volatile int done;

 struct sched_attr {
	__u32 size;

	__u32 sched_policy;
	__u64 sched_flags;

	/* SCHED_NORMAL, SCHED_BATCH */
	__s32 sched_nice;

	/* SCHED_FIFO, SCHED_RR */
	__u32 sched_priority;

	/* SCHED_DEADLINE (nsec) */
	__u64 sched_runtime;
	__u64 sched_deadline;
	__u64 sched_period;
 };

 /* Thin wrappers around the raw system calls */
 int sched_setattr(pid_t pid,
		  const struct sched_attr *attr,
		  unsigned int flags)
 {
	return syscall(__NR_sched_setattr, pid, attr, flags);
 }

 int sched_getattr(pid_t pid,
		  struct sched_attr *attr,
		  unsigned int size,
		  unsigned int flags)
 {
	return syscall(__NR_sched_getattr, pid, attr, size, flags);
 }

 /* Thread body: switch to SCHED_DEADLINE, then spin until told to stop */
 void *run_deadline(void *data)
 {
	struct sched_attr attr;
	int x = 0;
	int ret;
	unsigned int flags = 0;

	printf("deadline thread started [%ld]\n", gettid());

	attr.size = sizeof(attr);
	attr.sched_flags = 0;
	attr.sched_nice = 0;
	attr.sched_priority = 0;

	/* This creates a 10ms/30ms reservation */
	attr.sched_policy = SCHED_DEADLINE;
	attr.sched_runtime = 10 * 1000 * 1000;
	attr.sched_period = attr.sched_deadline = 30 * 1000 * 1000;

	/* pid 0 means the calling thread */
	ret = sched_setattr(0, &attr, flags);
	if (ret < 0) {
		done = 0;
		perror("sched_setattr");
		exit(-1);
	}

	while (!done) {
		x++;
	}

	printf("deadline thread dies [%ld]\n", gettid());
	return NULL;
 }

 int main (int argc, char **argv)
 {
	pthread_t thread;

	printf("main thread [%ld]\n", gettid());

	pthread_create(&thread, NULL, run_deadline, NULL);

	sleep(10);

	done = 1;
	pthread_join(thread, NULL);

	printf("main dies [%ld]\n", gettid());
	return 0;
 }
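
Before calling sched_setattr(), an application may want to check that the
parameters it is about to request are at least self-consistent. The sketch
below is only an illustration under stated assumptions: it checks that
runtime <= deadline <= period, treats a period of 0 as "period equals
deadline", and says nothing about whether admission control will accept
the reservation. dl_params_plausible() and dl_params_example() are
hypothetical names, not kernel or libc interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libc API): basic consistency
 * check on SCHED_DEADLINE parameters, all expressed in nanoseconds.
 * Assumption: a period of 0 means "use the deadline as period".
 */
bool dl_params_plausible(uint64_t runtime_ns, uint64_t deadline_ns,
			 uint64_t period_ns)
{
	/* A reservation needs a non-zero runtime and deadline. */
	if (runtime_ns == 0 || deadline_ns == 0)
		return false;

	/* The runtime must fit within the relative deadline. */
	if (runtime_ns > deadline_ns)
		return false;

	/* If a period is given, the deadline must fit within it. */
	if (period_ns != 0 && deadline_ns > period_ns)
		return false;

	return true;
}

/* The 10ms/30ms reservation used by run_deadline() passes the check. */
void dl_params_example(void)
{
	assert(dl_params_plausible(10 * 1000 * 1000ULL,
				   30 * 1000 * 1000ULL,
				   30 * 1000 * 1000ULL));
}
```

A check of this kind only catches obviously wrong requests early;
sched_setattr() can still fail, for instance when there is not enough
bandwidth available, so its return value must be checked as in
run_deadline() above.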