Blame - Documentation/scheduler/sched-deadline.txt - kernel/msm-4.19

Deadline Task Scheduling
------------------------

CONTENTS
========

 0. WARNING
 1. Overview
 2. Scheduling algorithm
   2.1 Main algorithm
   2.2 Bandwidth reclaiming
 3. Scheduling Real-Time Tasks
   3.1 Definitions
   3.2 Schedulability Analysis for Uniprocessor Systems
   3.3 Schedulability Analysis for Multiprocessor Systems
   3.4 Relationship with SCHED_DEADLINE Parameters
 4. Bandwidth management
   4.1 System-wide settings
   4.2 Task interface
   4.3 Default behavior
   4.4 Behavior of sched_yield()
 5. Tasks CPU affinity
   5.1 SCHED_DEADLINE and cpusets HOWTO
 6. Future plans
 A. Test suite
 B. Minimal main()


0. WARNING
==========

Fiddling with these settings can result in unpredictable or even unstable
system behavior. As for -rt (group) scheduling, it is assumed that root users
know what they're doing.


1. Overview
===========

The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
basically an implementation of the Earliest Deadline First (EDF) scheduling
algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
that makes it possible to isolate the behavior of tasks from each other.


2. Scheduling algorithm
=======================

2.1 Main algorithm
------------------

SCHED_DEADLINE uses three parameters, named "runtime", "period", and
"deadline", to schedule tasks. A SCHED_DEADLINE task should receive
"runtime" microseconds of execution time every "period" microseconds, and
these "runtime" microseconds are available within "deadline" microseconds
from the beginning of the period. In order to implement this behavior,
every time the task wakes up, the scheduler computes a "scheduling deadline"
consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
scheduled using EDF[1] on these scheduling deadlines (the task with the
earliest scheduling deadline is selected for execution). Notice that the
task actually receives "runtime" time units within "deadline" if a proper
"admission control" strategy (see Section "4. Bandwidth management") is used
(clearly, if the system is overloaded this guarantee cannot be respected).

Summing up, the CBS[2,3] algorithm assigns scheduling deadlines to tasks so
that each task runs for at most its runtime every period, avoiding any
interference between different tasks (bandwidth isolation), while the EDF[1]
algorithm selects the task with the earliest scheduling deadline as the one
to be executed next. Thanks to this feature, tasks that do not strictly comply
with the "traditional" real-time task model (see Section 3) can effectively
use the new policy.

In more detail, the CBS algorithm assigns scheduling deadlines to
tasks in the following way:

 - Each SCHED_DEADLINE task is characterized by the "runtime",
   "deadline", and "period" parameters;

 - The state of the task is described by a "scheduling deadline" and
   a "remaining runtime". These two parameters are initially set to 0;

 - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
   the scheduler checks if

                 remaining runtime                  runtime
        ----------------------------------    >    ---------
        scheduling deadline - current time           period

   then, if the scheduling deadline is smaller than the current time, or
   this condition is verified, the scheduling deadline and the
   remaining runtime are re-initialized as

        scheduling deadline = current time + deadline
        remaining runtime = runtime

   otherwise, the scheduling deadline and the remaining runtime are
   left unchanged;

 - When a SCHED_DEADLINE task executes for an amount of time t, its
   remaining runtime is decreased as

        remaining runtime = remaining runtime - t

   (technically, the runtime is decreased at every tick, or when the
   task is descheduled / preempted);

 - When the remaining runtime becomes less than or equal to 0, the task is
   said to be "throttled" (also known as "depleted" in real-time literature)
   and cannot be scheduled until its scheduling deadline. The "replenishment
   time" for this task (see next item) is set to be equal to the current
   value of the scheduling deadline;

 - When the current time is equal to the replenishment time of a
   throttled task, the scheduling deadline and the remaining runtime are
   updated as

        scheduling deadline = scheduling deadline + period
        remaining runtime = remaining runtime + runtime
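
The rules listed above can be sketched in C as follows. This is only an
illustration of the textual description, not the kernel's implementation: the
struct cbs_task type and the cbs_*() helpers are hypothetical names, all times
are assumed to share a single unit, and the wakeup comparison is
cross-multiplied so that it can be evaluated with integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical task model; every field uses the same time unit. */
struct cbs_task {
	int64_t runtime;           /* reservation parameters */
	int64_t deadline;
	int64_t period;
	int64_t sched_deadline;    /* state: scheduling deadline, initially 0 */
	int64_t remaining_runtime; /* state: remaining runtime, initially 0 */
};

/*
 * Wakeup rule: re-initialize the state if the scheduling deadline is in
 * the past, or if
 *     remaining / (sched_deadline - now) > runtime / period.
 * The second test is only reached when sched_deadline >= now, so both
 * sides can be cross-multiplied without changing the comparison.
 */
static void cbs_wakeup(struct cbs_task *t, int64_t now)
{
	assert(t->period > 0);
	if (t->sched_deadline < now ||
	    t->remaining_runtime * t->period >
	    t->runtime * (t->sched_deadline - now)) {
		t->sched_deadline = now + t->deadline;
		t->remaining_runtime = t->runtime;
	}
}

/*
 * Accounting rule: charge "ran" time units of execution. Returns 1 if
 * the task is now throttled (remaining runtime <= 0), 0 otherwise.
 */
static int cbs_account(struct cbs_task *t, int64_t ran)
{
	t->remaining_runtime -= ran;
	return t->remaining_runtime <= 0;
}

/* Replenishment rule, applied at the replenishment time of a throttled task. */
static void cbs_replenish(struct cbs_task *t)
{
	t->sched_deadline += t->period;
	t->remaining_runtime += t->runtime;
}
```

For a task with runtime 10 and deadline = period = 100 that first wakes up at
time 50, cbs_wakeup() sets the scheduling deadline to 150 and the remaining
runtime to 10; once 10 time units have been charged the task is throttled, and
cbs_replenish() moves the scheduling deadline to 250.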


2.2 Bandwidth reclaiming
------------------------

Bandwidth reclaiming for deadline tasks is based on the GRUB (Greedy
Reclamation of Unused Bandwidth) algorithm [15, 16, 17] and it is enabled
when flag SCHED_FLAG_RECLAIM is set.

The following diagram illustrates the state names for tasks handled by GRUB:

                             ------------
                 (d)        |   Active   |
              ------------->|            |
              |             | Contending |
              |              ------------
              |                A      |
          ----------           |      |
         |          |          |      |
         | Inactive |          |(b)   | (a)
         |          |          |      |
          ----------           |      |
              A                |      V
              |              ------------
              |             |   Active   |
              --------------|     Non    |
                 (c)        | Contending |
                             ------------

A task can be in one of the following states:

 - ActiveContending: if it is ready for execution (or executing);

 - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
   time;

 - Inactive: if it is blocked and has surpassed the 0-lag time.

State transitions:

 (a) When a task blocks, it does not become immediately inactive since its
     bandwidth cannot be immediately reclaimed without breaking the
     real-time guarantees. It therefore enters a transitional state called
     ActiveNonContending. The scheduler arms the "inactive timer" to fire at
     the 0-lag time, when the task's bandwidth can be reclaimed without
     breaking the real-time guarantees.

     The 0-lag time for a task entering the ActiveNonContending state is
     computed as

                        (runtime * dl_period)
             deadline - ---------------------
                             dl_runtime

     where runtime is the remaining runtime, while dl_runtime and dl_period
     are the reservation parameters.

 (b) If the task wakes up before the inactive timer fires, the task re-enters
     the ActiveContending state and the "inactive timer" is canceled.
     In addition, if the task wakes up on a different runqueue, then
     the task's utilization must be removed from the previous runqueue's active
     utilization and must be added to the new runqueue's active utilization.
     In order to avoid races between a task waking up on a runqueue while the
     "inactive timer" is running on a different CPU, the "dl_non_contending"
     flag is used to indicate that a task is not on a runqueue but is active
     (so, the flag is set when the task blocks and is cleared when the
     "inactive timer" fires or when the task wakes up).

 (c) When the "inactive timer" fires, the task enters the Inactive state and
     its utilization is removed from the runqueue's active utilization.

 (d) When an inactive task wakes up, it enters the ActiveContending state and
     its utilization is added to the active utilization of the runqueue where
     it has been enqueued.
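
The 0-lag time formula given in transition (a) can be written as a small C
helper. This is a sketch, not kernel code: zero_lag_time() is a hypothetical
name, "deadline" is assumed to be the task's current scheduling deadline, and
the integer division truncates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 0-lag time = deadline - (remaining * dl_period) / dl_runtime,
 * where "remaining" is the remaining runtime of the blocking task.
 */
static int64_t zero_lag_time(int64_t deadline, int64_t remaining,
			     int64_t dl_runtime, int64_t dl_period)
{
	assert(dl_runtime > 0);
	return deadline - (remaining * dl_period) / dl_runtime;
}
```

For instance, a task with dl_runtime = 4 and dl_period = 8 that blocks with a
remaining runtime of 2 and a scheduling deadline of 8 gets a 0-lag time of
8 - (2 * 8) / 4 = 4.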

For each runqueue, the GRUB algorithm keeps track of two different bandwidths:

 - Active bandwidth (running_bw): this is the sum of the bandwidths of all
   tasks in active state (i.e., ActiveContending or ActiveNonContending);

 - Total bandwidth (this_bw): this is the sum of the bandwidths of all tasks
   "belonging" to the runqueue, including the tasks in Inactive state.


The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to

           dq = -max{ Ui, (1 - Uinact) } dt

where Uinact is the inactive utilization, computed as (this_bw - running_bw),
and Ui is the bandwidth of task Ti.
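
As a sketch of the depletion rule above (not the kernel's fixed-point
implementation), the following C helpers compute the pace max{ Ui, 1 - Uinact }
and apply it over an interval dt. The function names are hypothetical and
bandwidths are represented as doubles for readability.

```c
#include <assert.h>

/* Pace at which the runtime of task Ti is consumed: max{ Ui, 1 - Uinact }. */
static double grub_rate(double u_i, double this_bw, double running_bw)
{
	double u_inact = this_bw - running_bw;
	double rate = 1.0 - u_inact;

	assert(u_inact >= 0.0);
	return u_i > rate ? u_i : rate;
}

/* Remaining runtime after executing for dt: q - max{ Ui, 1 - Uinact } * dt. */
static double grub_charge(double remaining, double u_i, double this_bw,
			  double running_bw, double dt)
{
	return remaining - grub_rate(u_i, this_bw, running_bw) * dt;
}
```

With this_bw = 1 and running_bw = 1 (no inactive tasks) a task with Ui = 0.5 is
charged at pace 1; with running_bw = 0.5 the pace drops to 0.5, so the same
budget lasts twice as long.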


Let's now see a trivial example of two deadline tasks with runtime equal
to 4 and period equal to 8 (i.e., bandwidth equal to 0.5):

         A            Task T1
         |
         |                               |
         |                               |
         |--------                       |----
         |       |                       V
         |---|---|---|---|---|---|---|---|--------->t
         0   1   2   3   4   5   6   7   8


         A            Task T2
         |
         |                               |
         |                               |
         |       ------------------------|
         |       |                       V
         |---|---|---|---|---|---|---|---|--------->t
         0   1   2   3   4   5   6   7   8


         A            running_bw
         |
       1 -----------------               ------
         |               |               |
      0.5-               -----------------
         |                               |
         |---|---|---|---|---|---|---|---|--------->t
         0   1   2   3   4   5   6   7   8


 - Time t = 0:

   Both tasks are ready for execution and therefore in ActiveContending state.
   Suppose Task T1 is the first task to start execution.
   Since there are no inactive tasks, its runtime is decreased as dq = -1 dt.

 - Time t = 2:

   Suppose that Task T1 blocks.
   Task T1 therefore enters the ActiveNonContending state. Since its remaining
   runtime is equal to 2, its 0-lag time is equal to t = 4.
   Task T2 starts execution, with runtime still decreased as dq = -1 dt since
   there are no inactive tasks.

 - Time t = 4:

   This is the 0-lag time for Task T1. Since it has not woken up in the
   meantime, it enters the Inactive state. Its bandwidth is removed from
   running_bw.
   Task T2 continues its execution. However, its runtime is now decreased as
   dq = - 0.5 dt because Uinact = 0.5.
   Task T2 therefore reclaims the bandwidth unused by Task T1.

 - Time t = 8:

   Task T1 wakes up. It enters the ActiveContending state again, and the
   running_bw is incremented.
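
The arithmetic behind the timeline above can be checked with a short C sketch
that charges a task's budget over consecutive phases, each with a constant
Uinact. The struct grub_phase type and grub_runtime_left() are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One stretch of execution during which Uinact stays constant. */
struct grub_phase {
	double duration; /* how long the task executes in this phase */
	double u_inact;  /* inactive utilization during the phase */
};

/* Budget left after running through all phases at pace max{ Ui, 1 - Uinact }. */
static double grub_runtime_left(double budget, double u_i,
				const struct grub_phase *ph, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		double rate = 1.0 - ph[i].u_inact;

		assert(ph[i].duration >= 0.0);
		if (rate < u_i)
			rate = u_i;
		budget -= rate * ph[i].duration;
	}
	return budget;
}
```

For Task T2 the phases are { 2, 0 } (from t = 2 to t = 4) and { 4, 0.5 } (from
t = 4 to t = 8): starting from a budget of 4, it has 2 left at t = 4 and 0 left
at t = 8. Without reclaiming, the same budget would have been depleted at t = 6.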


3. Scheduling Real-Time Tasks
=============================

* BIG FAT WARNING ******************************************************
*
* This section contains a (not-thorough) summary on classical deadline
* scheduling theory, and how it applies to SCHED_DEADLINE.
* The reader can "safely" skip to Section 4 if only interested in seeing
* how the scheduling policy can be used. Anyway, we strongly recommend
* to come back here and continue reading (once the urge for testing is
* satisfied :P) to be sure of fully understanding all technical details.
************************************************************************

There are no limitations on what kind of task can exploit this new
scheduling discipline, even if it must be said that it is particularly
suited for periodic or sporadic real-time tasks that need guarantees on their
timing behavior, e.g., multimedia, streaming, control applications, etc.

3.1 Definitions
---------------

A typical real-time task is composed of a repetition of computation phases
(task instances, or jobs) which are activated in a periodic or sporadic
fashion.
Each job J_j (where J_j is the j^th job of the task) is characterized by an
arrival time r_j (the time when the job starts), an amount of computation
time c_j needed to finish the job, and a job absolute deadline d_j, which
is the time within which the job should be finished. The maximum execution
time max{c_j} is called "Worst Case Execution Time" (WCET) for the task.
A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
sporadic with minimum inter-arrival time P if r_{j+1} >= r_j + P. Finally,
d_j = r_j + D, where D is the task's relative deadline.
Summing up, a real-time task can be described as
	Task = (WCET, D, P)

The utilization of a real-time task is defined as the ratio between its
WCET and its period (or minimum inter-arrival time), and represents
the fraction of CPU time needed to execute the task.

If the total utilization U=sum(WCET_i/P_i) is larger than M (with M equal
to the number of CPUs), then the scheduler is unable to respect all the
deadlines.
Note that total utilization is defined as the sum of the utilizations
WCET_i/P_i over all the real-time tasks in the system. When considering
multiple real-time tasks, the parameters of the i-th task are indicated
with the "_i" suffix.
Moreover, if the total utilization is larger than M, then we risk starving
non real-time tasks by real-time tasks.
If, instead, the total utilization is smaller than M, then non real-time
tasks will not be starved and the system might be able to respect all the
deadlines.
As a matter of fact, in this case it is possible to provide an upper bound
for tardiness (defined as the maximum between 0 and the difference
between the finishing time of a job and its absolute deadline).
More precisely, it can be proven that using a global EDF scheduler the
maximum tardiness of each task is smaller than or equal to
	((M − 1) · WCET_max − WCET_min)/(M − (M − 2) · U_max) + WCET_max
where WCET_max = max{WCET_i} is the maximum WCET, WCET_min=min{WCET_i}
is the minimum WCET, and U_max = max{WCET_i/P_i} is the maximum
utilization[12].
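
The tardiness bound above is easy to evaluate numerically. The helper below is
a minimal sketch with a hypothetical name (gedf_tardiness_bound()); it simply
transcribes the formula and assumes that the denominator is positive.

```c
#include <assert.h>

/*
 * ((M - 1) * WCET_max - WCET_min) / (M - (M - 2) * U_max) + WCET_max
 * The result is expressed in the same time unit as the WCETs.
 */
static double gedf_tardiness_bound(unsigned int m, double wcet_max,
				   double wcet_min, double u_max)
{
	double cpus = (double)m;
	double denom = cpus - (cpus - 2.0) * u_max;

	assert(m >= 1 && denom > 0.0);
	return ((cpus - 1.0) * wcet_max - wcet_min) / denom + wcet_max;
}
```

For example, with M = 2, WCET_max = 10, WCET_min = 5 and U_max = 0.5 the bound
is (10 - 5) / 2 + 10 = 12.5 time units.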

3.2 Schedulability Analysis for Uniprocessor Systems
----------------------------------------------------

If M=1 (uniprocessor system), or in case of partitioned scheduling (each
real-time task is statically assigned to one and only one CPU), it is
possible to formally check if all the deadlines are respected.
If D_i = P_i for all tasks, then EDF is able to respect all the deadlines
of all the tasks executing on a CPU if and only if the total utilization
of the tasks running on such a CPU is smaller than or equal to 1.
If D_i != P_i for some task, then it is possible to define the density of
a task as WCET_i/min{D_i,P_i}, and EDF is able to respect all the deadlines
of all the tasks running on a CPU if the sum of the densities of the tasks
running on such a CPU is smaller than or equal to 1:
	sum(WCET_i / min{D_i, P_i}) <= 1
It is important to notice that this condition is only sufficient, and not
necessary: there are task sets that are schedulable, but do not respect the
condition. For example, consider the task set {Task_1,Task_2} composed of
Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms).
EDF is clearly able to schedule the two tasks without missing any deadline
(Task_1 is scheduled as soon as it is released, and finishes just in time
to respect its deadline; Task_2 is scheduled immediately after Task_1, hence
its response time cannot be larger than 50ms + 10ms = 60ms) even if
	50 / min{50,100} + 10 / min{100, 100} = 50 / 50 + 10 / 100 = 1.1
Of course it is possible to test the exact schedulability of tasks with
D_i != P_i (checking a condition that is both sufficient and necessary),
but this cannot be done by comparing the total utilization or density with
a constant. Instead, the so-called "processor demand" approach can be used,
computing the total amount of CPU time h(t) needed by all the tasks to
respect all of their deadlines in a time interval of size t, and comparing
such a time with the interval size t. If h(t) is smaller than t (that is,
the amount of time needed by the tasks in a time interval of size t is
smaller than the size of the interval) for all the possible values of t, then
EDF is able to schedule the tasks respecting all of their deadlines. Since
performing this check for all possible values of t is impossible, it has been
proven[4,5,6] that it is sufficient to perform the test for values of t
between 0 and a maximum value L. The cited papers contain all of the
mathematical details and explain how to compute h(t) and L.
In any case, this kind of analysis is too complex as well as too
time-consuming to be performed on-line. Hence, as explained in Section
4, Linux uses an admission test based on the tasks' utilizations.
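
The density-based test discussed in this section, and the fact that it is only
sufficient, can be illustrated with a few lines of C. The struct rt_task type
and the two helpers are hypothetical names introduced for this sketch; the task
parameters follow the (WCET, D, P) notation of Section 3.1.

```c
#include <assert.h>
#include <stddef.h>

/* A real-time task described as (WCET, D, P), all in the same time unit. */
struct rt_task {
	double wcet;
	double d;
	double p;
};

/* sum(WCET_i / min{D_i, P_i}) over the tasks assigned to one CPU. */
static double density_sum(const struct rt_task *ts, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		double min_dp = ts[i].d < ts[i].p ? ts[i].d : ts[i].p;

		assert(min_dp > 0.0);
		sum += ts[i].wcet / min_dp;
	}
	return sum;
}

/* Sufficient (not necessary) uniprocessor EDF test: 1 means "passes". */
static int density_test(const struct rt_task *ts, size_t n)
{
	return density_sum(ts, n) <= 1.0;
}
```

Applied to Task_1=(50,50,100) and Task_2=(10,100,100), density_sum() yields
about 1.1, so density_test() fails even though EDF schedules the two tasks
without missing any deadline.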

3.3 Schedulability Analysis for Multiprocessor Systems
------------------------------------------------------

On multiprocessor systems with global EDF scheduling (non partitioned
systems), a sufficient test for schedulability cannot be based on the
utilizations or densities: it can be shown that even if D_i = P_i task
sets with utilizations slightly larger than 1 can miss deadlines regardless
of the number of CPUs.

Consider a set {Task_1,...Task_{M+1}} of M+1 tasks on a system with M
CPUs, with the first task Task_1=(P,P,P) having period, relative deadline
and WCET equal to P. The remaining M tasks Task_i=(e,P-1,P-1) have an
arbitrarily small worst case execution time (indicated as "e" here) and a
period smaller than the one of the first task. Hence, if all the tasks
activate at the same time t, global EDF schedules these M tasks first
(because their absolute deadlines are equal to t + P - 1, hence they are
smaller than the absolute deadline of Task_1, which is t + P). As a
result, Task_1 can be scheduled only at time t + e, and will finish at
time t + e + P, after its absolute deadline. The total utilization of the
task set is U = M · e / (P - 1) + P / P = M · e / (P - 1) + 1, and for small
values of e this can become very close to 1. This is known as "Dhall's
effect"[7]. Note: the example in the original paper by Dhall has been
slightly simplified here (for example, Dhall more correctly computed
lim_{e->0}U).
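
The numbers in the example above can be reproduced with the following C sketch
(hypothetical helper names; times are relative to the common activation time
t). It shows that Task_1 misses its deadline for any e > 0 while the total
utilization stays close to 1.

```c
#include <assert.h>

/* U = M * e / (P - 1) + 1 for the task set of the example. */
static double dhall_utilization(unsigned int m, double e, double p)
{
	assert(p > 1.0 && e >= 0.0);
	return (double)m * e / (p - 1.0) + 1.0;
}

/*
 * Task_1 starts at t + e, runs for P and must finish by t + P:
 * returns 1 if it misses its absolute deadline, 0 otherwise.
 */
static int dhall_task1_misses(double e, double p)
{
	double finish = e + p;   /* relative to t */
	double deadline = p;     /* relative to t */

	return finish > deadline;
}
```

With M = 4, e = 1 and P = 101 the total utilization is 4 / 100 + 1 = 1.04,
far below M, and yet Task_1 finishes 1 time unit after its deadline.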

More complex schedulability tests for global EDF have been developed in
real-time literature[8,9], but they are not based on a simple comparison
between total utilization (or density) and a fixed constant. If all tasks
have D_i = P_i, a sufficient schedulability condition can be expressed in
a simple way:
	sum(WCET_i / P_i) <= M - (M - 1) · U_max
where U_max = max{WCET_i / P_i}[10]. Notice that for U_max = 1,
M - (M - 1) · U_max becomes M - M + 1 = 1 and this schedulability condition
just confirms Dhall's effect. A more complete survey of the literature
about schedulability tests for multi-processor real-time scheduling can be
found in [11].
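
The sufficient condition above translates directly into code. The following
sketch (gfb_schedulable() is a hypothetical name) takes the per-task
utilizations WCET_i / P_i and the number of CPUs M, and assumes D_i = P_i for
all tasks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if sum(U_i) <= M - (M - 1) * U_max, 0 otherwise.
 * u[i] is the utilization WCET_i / P_i of the i-th task.
 */
static int gfb_schedulable(const double *u, size_t n, unsigned int m)
{
	double sum = 0.0, u_max = 0.0;
	size_t i;

	assert(m >= 1);
	for (i = 0; i < n; i++) {
		sum += u[i];
		if (u[i] > u_max)
			u_max = u[i];
	}
	return sum <= (double)m - ((double)m - 1.0) * u_max;
}
```

Three tasks with utilization 0.5 pass the test on M = 2 CPUs (1.5 <= 1.5),
while a task set containing a task with utilization 1 plus any other task
fails it for every M, in line with Dhall's effect.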

As seen, enforcing that the total utilization is smaller than M does not
guarantee that global EDF schedules the tasks without missing any deadline
(in other words, global EDF is not an optimal scheduling algorithm). However,
a total utilization smaller than M is enough to guarantee that non real-time
tasks are not starved and that the tardiness of real-time tasks has an upper
bound[12] (as previously noted). Different bounds on the maximum tardiness
experienced by real-time tasks have been developed in various papers[13,14],
but the theoretical result that is important for SCHED_DEADLINE is that if
the total utilization is smaller than or equal to M then the response times
of the tasks are limited.

3.4 Relationship with SCHED_DEADLINE Parameters
-----------------------------------------------

Finally, it is important to understand the relationship between the
SCHED_DEADLINE scheduling parameters described in Section 2 (runtime,
deadline and period) and the real-time task parameters (WCET, D, P)
described in this section. Note that a task's temporal constraints are
represented by its absolute deadlines d_j = r_j + D described above, while
SCHED_DEADLINE schedules the tasks according to scheduling deadlines (see
Section 2).
If an admission test is used to guarantee that the scheduling deadlines
are respected, then SCHED_DEADLINE can be used to schedule real-time tasks
guaranteeing that all the jobs' deadlines of a task are respected.
In order to do this, a task must be scheduled by setting:

 - runtime >= WCET
 - deadline = D
 - period <= P

IOW, if runtime >= WCET and if period is <= P, then the scheduling deadlines
and the absolute deadlines (d_j) coincide, so a proper admission control
makes it possible to respect the jobs' absolute deadlines for this task (this
is what is called "hard schedulability property" and is an extension of
Lemma 1 of [2]).
Notice that if runtime > deadline the admission control will surely reject
this task, as it is not possible to respect its temporal constraints.
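
The mapping described above can be sketched as a small C helper. It picks the
tightest values allowed by the three rules (runtime = WCET, deadline = D,
period = P) and reports the case in which runtime > deadline. The struct
dl_params type and dl_params_from_task() are hypothetical names and do not
correspond to any kernel or libc interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three SCHED_DEADLINE parameters. */
struct dl_params {
	uint64_t runtime;
	uint64_t deadline;
	uint64_t period;
};

/*
 * Derive scheduling parameters from a real-time task (WCET, D, P).
 * Returns 0 on success, or -1 if WCET > D: in that case runtime would
 * exceed deadline and the task would be rejected by admission control.
 */
static int dl_params_from_task(uint64_t wcet, uint64_t d, uint64_t p,
			       struct dl_params *out)
{
	assert(out != 0);
	if (wcet > d)
		return -1;
	out->runtime = wcet;   /* runtime >= WCET */
	out->deadline = d;     /* deadline = D */
	out->period = p;       /* period <= P */
	return 0;
}
```

A task described as (10, 50, 100) maps to runtime 10, deadline 50 and period
100, while (60, 50, 100) is refused because its WCET exceeds its relative
deadline.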

References:
  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming
      in a hard-real-time environment. Journal of the Association for
      Computing Machinery, 20(1), 1973.
  2 - L. Abeni, G. Buttazzo. Integrating Multimedia Applications in Hard
      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
      Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
      Technical Report. http://disi.unitn.it/~abeni/tr-98-01.pdf
  4 - J. Y. Leung and M.L. Merrill. A Note on Preemptive Scheduling of
      Periodic, Real-Time Tasks. Information Processing Letters, vol. 11,
      no. 3, pp. 115-118, 1980.
  5 - S. K. Baruah, A. K. Mok and L. E. Rosier. Preemptively Scheduling
      Hard-Real-Time Sporadic Tasks on One Processor. Proceedings of the
      11th IEEE Real-time Systems Symposium, 1990.
  6 - S. K. Baruah, L. E. Rosier and R. R. Howell. Algorithms and Complexity
      Concerning the Preemptive Scheduling of Periodic Real-Time tasks on
      One Processor. Real-Time Systems Journal, vol. 4, no. 2, pp 301-324,
      1990.
  7 - S. J. Dhall and C. L. Liu. On a real-time scheduling problem. Operations
      research, vol. 26, no. 1, pp 127-140, 1978.
  8 - T. Baker. Multiprocessor EDF and Deadline Monotonic Schedulability
      Analysis. Proceedings of the 24th IEEE Real-Time Systems Symposium, 2003.
  9 - T. Baker. An Analysis of EDF Schedulability on a Multiprocessor.
      IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8,
      pp 760-768, 2005.
  10 - J. Goossens, S. Funk and S. Baruah, Priority-Driven Scheduling of
       Periodic Task Systems on Multiprocessors. Real-Time Systems Journal,
       vol. 25, no. 2-3, pp. 187-205, 2003.
  11 - R. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for
       Multiprocessor Systems. ACM Computing Surveys, vol. 43, no. 4, 2011.
       http://www-users.cs.york.ac.uk/~robdavis/papers/MPSurveyv5.0.pdf
  12 - U. C. Devi and J. H. Anderson. Tardiness Bounds under Global EDF
       Scheduling on a Multiprocessor. Real-Time Systems Journal, vol. 32,
       no. 2, pp 133-189, 2008.
  13 - P. Valente and G. Lipari. An Upper Bound to the Lateness of Soft
       Real-Time Tasks Scheduled by EDF on Multiprocessors. Proceedings of
       the 26th IEEE Real-Time Systems Symposium, 2005.
  14 - J. Erickson, U. Devi and S. Baruah. Improved tardiness bounds for
       Global EDF. Proceedings of the 22nd Euromicro Conference on
       Real-Time Systems, 2010.
  15 - G. Lipari, S. Baruah, Greedy reclamation of unused bandwidth in
       constant-bandwidth servers, 12th IEEE Euromicro Conference on Real-Time
       Systems, 2000.
  16 - L. Abeni, J. Lelli, C. Scordino, L. Palopoli, Greedy CPU reclaiming for
       SCHED DEADLINE. In Proceedings of the Real-Time Linux Workshop (RTLWS),
       Dusseldorf, Germany, 2014.
  17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel
       or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied
       Computing, 2016.


4. Bandwidth management
=======================

As previously mentioned, in order for -deadline scheduling to be
effective and useful (that is, to be able to provide "runtime" time units
within "deadline"), it is important to have some method to keep the allocation
of the available fractions of CPU time to the various tasks under control.
This is usually called "admission control" and if it is not performed, then
no guarantee can be given on the actual scheduling of the -deadline tasks.

As already stated in Section 3, a necessary condition to be respected to
correctly schedule a set of real-time tasks is that the total utilization
is smaller than M. When talking about -deadline tasks, this requires that
the sum of the ratios between runtime and period for all tasks is smaller
than M. Notice that the ratio runtime/period is equivalent to the utilization
of a "traditional" real-time task, and is also often referred to as
"bandwidth".
The interface used to control the CPU bandwidth that can be allocated
to -deadline tasks is similar to the one already used for -rt
tasks with real-time group scheduling (a.k.a. RT-throttling - see
Documentation/scheduler/sched-rt-group.txt), and is based on readable/
writable control files located in procfs (for system wide settings).
Notice that per-group settings (controlled through cgroupfs) are still not
defined for -deadline tasks, because more discussion is needed in order to
figure out how we want to manage SCHED_DEADLINE bandwidth at the task group
level.
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	529
Juri Lelli	0d9ba8b	2014-09-09 10:57:13 +0100	[diff] [blame]	530	A main difference between deadline bandwidth management and RT-throttling
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	531	is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
Juri Lelli	0d9ba8b	2014-09-09 10:57:13 +0100	[diff] [blame]	532	and thus we don't need a higher level throttling mechanism to enforce the
Luca Abeni	b56bfc6	2014-09-09 10:57:14 +0100	[diff] [blame]	533	desired bandwidth. In other words, this means that interface parameters are
				534	only used at admission control time (i.e., when the user calls
				535	sched_setattr()). Scheduling is then performed considering actual tasks'
				536	parameters, so that CPU bandwidth is allocated to SCHED_DEADLINE tasks
				537	respecting their needs in terms of granularity. Therefore, using this simple
				538	interface we can put a cap on total utilization of -deadline tasks (i.e.,
				539	\Sum (runtime_i / period_i) < global_dl_utilization_cap).
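
As a purely illustrative sketch (this is not kernel code, and the kernel uses
its own fixed-point arithmetic), the check above can be written in C as
follows. The struct dl_params type and both function names are hypothetical
and only serve this example; the cap is passed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one reservation; both values in nanoseconds. */
struct dl_params {
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/* Sum of runtime_i / period_i over all tasks (periods assumed non-zero). */
double dl_total_utilization(const struct dl_params *tasks, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += (double)tasks[i].runtime_ns / (double)tasks[i].period_ns;
	return sum;
}

/* Non-zero if the task set stays strictly below the utilization cap. */
int dl_fits_under_cap(const struct dl_params *tasks, size_t n, double cap)
{
	return dl_total_utilization(tasks, n) < cap;
}
```

For instance, two tasks of 10ms/100ms and 20ms/100ms have a total utilization
of about 0.3, which fits under a cap of 0.95 but not under a cap of 0.25.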
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	540
				541	4.1 System wide settings
				542	------------------------
				543
				544	The system wide settings are configured under the /proc virtual file system.
				545
For now the -rt knobs are used for -deadline admission control and the
-deadline runtime is accounted against the -rt runtime. We realize that this
isn't entirely desirable; however, it is better to have a small interface for
now, and be able to change it easily later. The ideal situation (see Section 6)
is to run -rt tasks from a -deadline server, in which case the -rt bandwidth
is a direct subset of dl_bw.
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	552
				553	This means that, for a root_domain comprising M CPUs, -deadline tasks
				554	can be created while the sum of their bandwidths stays below:
				555
				556	M * (sched_rt_runtime_us / sched_rt_period_us)
				557
It is also possible to disable this bandwidth management logic, and
thus be free to oversubscribe the system up to any arbitrary level.
This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
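
The following minimal sketch shows how a userspace tool might compute the
limit above from the two -rt knobs. The function names are hypothetical, the
number of CPUs in the root_domain must be supplied by the caller, and a
negative return value from dl_bandwidth_cap() is our own convention for
"admission control disabled".

```c
#include <stdio.h>

/* Read one integer from a control file; returns 0 on success, -1 on error. */
int read_sysctl_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * M * (sched_rt_runtime_us / sched_rt_period_us) for a root_domain of
 * ncpus CPUs. Returns a negative value when rt_runtime_us is -1, that is,
 * when the bandwidth management logic is disabled. rt_period_us is assumed
 * to be positive.
 */
double dl_bandwidth_cap(long rt_runtime_us, long rt_period_us, int ncpus)
{
	if (rt_runtime_us < 0)
		return -1.0;
	return ncpus * ((double)rt_runtime_us / (double)rt_period_us);
}
```

The two inputs would typically be read with read_sysctl_long() from
/proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us.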
				561
				562
				563	4.2 Task interface
				564	------------------
				565
				566	Specifying a periodic/sporadic task that executes for a given amount of
				567	runtime at each instance, and that is scheduled according to the urgency of
				568	its own timing constraints needs, in general, a way of declaring:
				569	- a (maximum/typical) instance execution time,
				570	- a minimum interval between consecutive instances,
				571	- a time constraint by which each instance must be completed.
				572
				573	Therefore:
				574	* a new struct sched_attr, containing all the necessary fields is
				575	provided;
				576	* the new scheduling related syscalls that manipulate it, i.e.,
				577	sched_setattr() and sched_getattr() are implemented.
				578
Tommaso Cucinotta	59f8c29	2016-10-26 11:17:17 +0200	[diff] [blame]	579	For debugging purposes, the leftover runtime and absolute deadline of a
				580	SCHED_DEADLINE task can be retrieved through /proc/<pid>/sched (entries
				581	dl.runtime and dl.deadline, both values in ns). A programmatic way to
				582	retrieve these values from production code is under discussion.
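
For debugging only, the two entries could be extracted with something like
the sketch below. It assumes that each entry is printed on its own line as
"name : value", which is how /proc/<pid>/sched is usually laid out; the
function names are hypothetical and the parsing is deliberately simple.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name : value" line. Returns 0 and stores the value if the line
 * carries `key`, -1 otherwise (in which case *value is left untouched).
 */
int dl_parse_sched_line(const char *line, const char *key, long long *value)
{
	char name[128];
	long long v;

	if (sscanf(line, " %127[^ \t:] : %lld", name, &v) != 2)
		return -1;
	if (strcmp(name, key) != 0)
		return -1;
	*value = v;
	return 0;
}

/* Look up `key` for a pid (0 means the calling process); -1 on failure. */
int dl_read_sched_entry(int pid, const char *key, long long *value)
{
	char path[64], line[256];
	FILE *f;
	int ret = -1;

	if (pid > 0)
		snprintf(path, sizeof(path), "/proc/%d/sched", pid);
	else
		snprintf(path, sizeof(path), "/proc/self/sched");

	f = fopen(path, "r");
	if (!f)
		return -1;
	/* Stop at the first line that matches the requested key. */
	while (ret != 0 && fgets(line, sizeof(line), f))
		ret = dl_parse_sched_line(line, key, value);
	fclose(f);
	return ret;
}
```

A call such as dl_read_sched_entry(pid, "dl.runtime", &left) would then store
the leftover runtime (in ns) in left when the entry is present.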
				583
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	584
				585	4.3 Default behavior
				586	---------------------
				587
The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
950000. With rt_period equal to 1000000, by default, it means that -deadline
tasks can use at most 95%, multiplied by the number of CPUs that compose the
root_domain, for each root_domain.
This means that non -deadline tasks will receive at least 5% of the CPU time,
and that -deadline tasks will receive their runtime with a guaranteed
worst-case delay with respect to the "deadline" parameter. If "deadline" =
"period" and the cpuset mechanism is used to implement partitioned scheduling
(see Section 5), then this simple setting of the bandwidth management is able
to deterministically guarantee that -deadline tasks will receive their runtime
in a period.
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	599
Luca Abeni	b56bfc6	2014-09-09 10:57:14 +0100	[diff] [blame]	600	Finally, notice that in order not to jeopardize the admission control a
				601	-deadline task cannot fork.
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	602
Tommaso Cucinotta	b95202a	2016-09-09 19:45:17 +0200	[diff] [blame]	603
				604	4.4 Behavior of sched_yield()
				605	-----------------------------
				606
				607	When a SCHED_DEADLINE task calls sched_yield(), it gives up its
				608	remaining runtime and is immediately throttled, until the next
				609	period, when its runtime will be replenished (a special flag
				610	dl_yielded is set and used to handle correctly throttling and runtime
				611	replenishment after a call to sched_yield()).
				612
This behavior of sched_yield() allows the task to wake up exactly at
the beginning of the next period. Also, this may be useful in the
future with bandwidth reclaiming mechanisms, where sched_yield() will
make the leftover runtime available for reclamation by other
SCHED_DEADLINE tasks.
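
A periodic task can rely on this behavior to structure its main loop, as in
the minimal sketch below. The names run_periodic() and count_activation()
are hypothetical, and the loop only behaves periodically if the calling
thread has already been switched to SCHED_DEADLINE with sched_setattr();
under any other policy sched_yield() simply yields the CPU.

```c
#include <sched.h>

/* Placeholder job for this example: counts how many times it has run. */
void count_activation(void *arg)
{
	int *counter = arg;

	(*counter)++;
}

/*
 * Run `job` once per activation, `iterations` times. For a SCHED_DEADLINE
 * task, each sched_yield() gives up the remaining runtime and throttles the
 * task until its runtime is replenished at the next period.
 * Returns the number of completed activations.
 */
int run_periodic(void (*job)(void *), void *arg, int iterations)
{
	int i;

	for (i = 0; i < iterations; i++) {
		job(arg);
		sched_yield();
	}
	return i;
}
```

With this structure the task needs no explicit timer: the job body should
complete within "runtime", and the scheduler takes care of the wake-up.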
				618
				619
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	620	5. Tasks CPU affinity
				621	=====================
				622
-deadline tasks cannot have an affinity mask smaller than the entire
root_domain they are created on. However, affinities can be specified
through the cpuset facility (Documentation/cgroup-v1/cpusets.txt).
Dario Faggioli	712e5e3	2014-01-27 12:20:15 +0100	[diff] [blame]	626
				627	5.1 SCHED_DEADLINE and cpusets HOWTO
				628	------------------------------------
				629
				630	An example of a simple configuration (pin a -deadline task to CPU0)
				631	follows (rt-app is used to create a -deadline task).
				632
				633	mkdir /dev/cpuset
				634	mount -t cgroup -o cpuset cpuset /dev/cpuset
				635	cd /dev/cpuset
				636	mkdir cpu0
				637	echo 0 > cpu0/cpuset.cpus
				638	echo 0 > cpu0/cpuset.mems
				639	echo 1 > cpuset.cpu_exclusive
				640	echo 0 > cpuset.sched_load_balance
				641	echo 1 > cpu0/cpuset.cpu_exclusive
				642	echo 1 > cpu0/cpuset.mem_exclusive
				643	echo $$ > cpu0/tasks
				644	rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
				645	task affinity)
				646
				647	6. Future plans
				648	===============
				649
				650	Still missing:
				651
 - programmatic way to retrieve current runtime and absolute deadline;
 - refinements to deadline inheritance, especially regarding the possibility
   of retaining bandwidth isolation among non-interacting tasks. This is
   being studied from both theoretical and practical points of view, and
   hopefully we should be able to produce some demonstrative code soon;
 - (c)group based bandwidth management, and maybe scheduling;
 - access control for non-root users (and related security concerns to
   address): what is the best way to allow unprivileged use of the mechanisms,
   and how can non-root users be prevented from "cheating" the system?
				661
As already discussed, we are also planning to merge this work with the EDF
throttling patches [https://lkml.org/lkml/2010/2/23/239], but we are still in
the preliminary phases of the merge and we really seek feedback that would
help us decide on the direction it should take.
Juri Lelli	f580193	2014-09-09 10:57:15 +0100	[diff] [blame]	666
				667	Appendix A. Test suite
				668	======================
				669
				670	The SCHED_DEADLINE policy can be easily tested using two applications that
				671	are part of a wider Linux Scheduler validation suite. The suite is
				672	available as a GitHub repository: https://github.com/scheduler-tools.
				673
				674	The first testing application is called rt-app and can be used to
				675	start multiple threads with specific parameters. rt-app supports
				676	SCHED_{OTHER,FIFO,RR,DEADLINE} scheduling policies and their related
				677	parameters (e.g., niceness, priority, runtime/deadline/period). rt-app
				678	is a valuable tool, as it can be used to synthetically recreate certain
				679	workloads (maybe mimicking real use-cases) and evaluate how the scheduler
				680	behaves under such workloads. In this way, results are easily reproducible.
				681	rt-app is available at: https://github.com/scheduler-tools/rt-app.
				682
				683	Thread parameters can be specified from the command line, with something like
				684	this:
				685
				686	# rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5
				687
				688	The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
				689	executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
				690	priority 10, executes for 20ms every 150ms. The test will run for a total
				691	of 5 seconds.
				692
				693	More interestingly, configurations can be described with a json file that
				694	can be passed as input to rt-app with something like this:
				695
				696	# rt-app my_config.json
				697
				698	The parameters that can be specified with the second method are a superset
				699	of the command line options. Please refer to rt-app documentation for more
				700	details (<rt-app-sources>/doc/*.json).
				701
				702	The second testing application is a modification of schedtool, called
				703	schedtool-dl, which can be used to setup SCHED_DEADLINE parameters for a
				704	certain pid/application. schedtool-dl is available at:
				705	https://github.com/scheduler-tools/schedtool-dl.git.
				706
				707	The usage is straightforward:
				708
				709	# schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app
				710
With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
of 10ms every 100ms (note that parameters are expressed in nanoseconds).
You can also use schedtool to create a reservation for an already running
application, given that you know its pid:
				715
				716	# schedtool -E -t 10000000:100000000 my_app_pid
Juri Lelli	13924d2	2014-09-09 10:57:16 +0100	[diff] [blame]	717
				718	Appendix B. Minimal main()
				719	==========================
				720
				721	We provide in what follows a simple (ugly) self-contained code snippet
				722	showing how SCHED_DEADLINE reservations can be created by a real-time
				723	application developer.
				724
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <linux/unistd.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <sys/syscall.h>
#include <pthread.h>

#define gettid() syscall(__NR_gettid)

#define SCHED_DEADLINE	6

/* XXX use the proper syscall numbers */
#ifdef __x86_64__
#define __NR_sched_setattr		314
#define __NR_sched_getattr		315
#endif

#ifdef __i386__
#define __NR_sched_setattr		351
#define __NR_sched_getattr		352
#endif

#ifdef __arm__
#define __NR_sched_setattr		380
#define __NR_sched_getattr		381
#endif

/* Set by main() to ask the deadline thread to stop. */
static volatile int done;

struct sched_attr {
	__u32 size;

	__u32 sched_policy;
	__u64 sched_flags;

	/* SCHED_NORMAL, SCHED_BATCH */
	__s32 sched_nice;

	/* SCHED_FIFO, SCHED_RR */
	__u32 sched_priority;

	/* SCHED_DEADLINE (nsec) */
	__u64 sched_runtime;
	__u64 sched_deadline;
	__u64 sched_period;
};

int sched_setattr(pid_t pid,
		  const struct sched_attr *attr,
		  unsigned int flags)
{
	return syscall(__NR_sched_setattr, pid, attr, flags);
}

int sched_getattr(pid_t pid,
		  struct sched_attr *attr,
		  unsigned int size,
		  unsigned int flags)
{
	return syscall(__NR_sched_getattr, pid, attr, size, flags);
}

void *run_deadline(void *data)
{
	struct sched_attr attr;
	int x = 0;
	int ret;
	unsigned int flags = 0;

	printf("deadline thread started [%ld]\n", gettid());

	attr.size = sizeof(attr);
	attr.sched_flags = 0;
	attr.sched_nice = 0;
	attr.sched_priority = 0;

	/* This creates a 10ms/30ms reservation */
	attr.sched_policy = SCHED_DEADLINE;
	attr.sched_runtime = 10 * 1000 * 1000;
	attr.sched_period = attr.sched_deadline = 30 * 1000 * 1000;

	/* A pid of 0 applies the attributes to the calling thread. */
	ret = sched_setattr(0, &attr, flags);
	if (ret < 0) {
		done = 0;
		perror("sched_setattr");
		exit(-1);
	}

	/* Burn CPU inside the reservation until main() sets done. */
	while (!done) {
		x++;
	}

	printf("deadline thread dies [%ld]\n", gettid());
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t thread;

	printf("main thread [%ld]\n", gettid());

	pthread_create(&thread, NULL, run_deadline, NULL);

	sleep(10);

	done = 1;
	pthread_join(thread, NULL);

	printf("main dies [%ld]\n", gettid());
	return 0;
}