Blame - Documentation/block/row-iosched.txt - kernel/msm

blob: fe8b88b8c56f3611898471daf44b4abcd77f1c10 [file] [log] [blame]

Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	1	Introduction
				2	============
				3
				4	The ROW scheduling algorithm will be used in mobile devices as default
				5	block layer IO scheduling algorithm. ROW stands for "READ Over WRITE"
				6	which is the main requests dispatch policy of this algorithm.
				7
				8	The ROW IO scheduler was developed with the mobile devices needs in
				9	mind. In mobile devices we favor user experience upon everything else,
				10	thus we want to give READ IO requests as much priority as possible.
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	11	The main idea of the ROW scheduling policy is just that:
				12	- If there are READ requests in pipe - dispatch them, while write
				13	starvation is considered.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	14
				15	Software description
				16	====================
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	17	The elevator defines a registering mechanism for different IO scheduler
				18	to implement. This makes implementing a new algorithm quite straight
				19	forward and requires almost no changes to block/elevator framework. A
				20	new IO scheduler just has to implement a set of callback functions
				21	defined by the elevator.
				22	These callbacks cover all the required IO operations such as
				23	adding/removing request to/from the scheduler, merging two requests,
				24	dispatching a request etc.
				25
				26	Design
				27	======
				28
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	29	The requests are kept in queues according to their priority. The
				30	dispatching of requests is done in a Round Robin manner with a
				31	different slice for each queue. The dispatch quantum for a specific
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	32	queue is set according to the queues priority. READ queues are
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	33	given bigger dispatch quantum than the WRITE queues, within a dispatch
				34	cycle.
				35
				36	At the moment there are 6 types of queues the requests are
				37	distributed to:
				38	- High priority READ queue
				39	- High priority Synchronous WRITE queue
				40	- Regular priority READ queue
				41	- Regular priority Synchronous WRITE queue
				42	- Regular priority WRITE queue
				43	- Low priority READ queue
				44
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	45	The marking of request as high/low priority will be done by the
				46	application adding the request and not the scheduler. See TODO section.
				47	If the request is not marked in any way (high/low) the scheduler
				48	assigns it to one of the regular priority queues:
				49	read/write/sync write.
				50
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	51	If in a certain dispatch cycle one of the queues was empty and didn't
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	52	use its quantum that queue will be marked as "un-served". If we're in
				53	a middle of a dispatch cycle dispatching from queue Y and a request
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	54	arrives for queue X that was un-served in the previous cycle, if X's
				55	priority is higher than Y's, queue X will be preempted in the favor of
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	56	queue Y.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	57
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	58	For READ request queues ROW IO scheduler allows idling within a
				59	dispatch quantum in order to give the application a chance to insert
				60	more requests. Idling means adding some extra time for serving a
				61	certain queue even if the queue is empty. The idling is enabled if
				62	the ROW IO scheduler identifies the application is inserting requests
				63	in a high frequency.
				64	Not all queues can idle. ROW scheduler exposes an enablement struct
				65	for idling.
				66	For idling on READ queues, the ROW IO scheduler uses timer mechanism.
				67	When the timer expires we schedule a delayed work that will signal the
				68	device driver to fetch another request for dispatch.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	69
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	70	ROW scheduler will support additional services for block devices that
				71	supports Urgent Requests. That is, the scheduler may inform the
				72	device driver upon urgent requests using a newly defined callback.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	73	In addition it will support rescheduling of requests that were
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	74	interrupted. For example if the device driver issues a long write
				75	request and a sudden urgent request is received by the scheduler.
				76	The scheduler will inform the device driver about the urgent request,
				77	so the device driver can stop the current write request and serve the
				78	urgent request. In such a case the device driver may also insert back
				79	to the scheduler the remainder of the interrupted write request, such
				80	that the scheduler may continue sending urgent requests without the
				81	need to interrupt the ongoing write again and again. The write
				82	remainder will be sent later on according to the scheduler policy.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	83
				84	SMP/multi-core
				85	==============
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	86	At the moment the code is accessed from 2 contexts:
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	87	- Application context (from block/elevator layer): adding the requests.
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	88	- device driver thread: dispatching the requests and notifying on
				89	completion.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	90
				91	One lock is used to synchronize between the two. This lock is provided
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	92	by the block device driver along with the dispatch queue.
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	93
				94	Config options
				95	==============
				96	1. hp_read_quantum: dispatch quantum for the high priority READ queue
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	97	(default is 100 requests)
				98	2. rp_read_quantum: dispatch quantum for the regular priority READ
				99	queue (default is 100 requests)
				100	3. hp_swrite_quantum: dispatch quantum for the high priority
				101	Synchronous WRITE queue (default is 2 requests)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	102	4. rp_swrite_quantum: dispatch quantum for the regular priority
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	103	Synchronous WRITE queue (default is 1 requests)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	104	5. rp_write_quantum: dispatch quantum for the regular priority WRITE
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	105	queue (default is 1 requests)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	106	6. lp_read_quantum: dispatch quantum for the low priority READ queue
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	107	(default is 1 requests)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	108	7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	109	WRITE queue (default is 1 requests)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	110	8. read_idle: how long to idle on read queue in Msec (in case idling
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	111	is enabled on that queue). (default is 5 Msec)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	112	9. read_idle_freq: frequency of inserting READ requests that will
				113	trigger idling. This is the time in Msec between inserting two READ
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	114	requests. (default is 8 Msec)
Tatyana Brokhman	1634906	2012-09-20 10:46:10 +0300	[diff] [blame]	115
Tatyana Brokhman	011f09a	2012-12-03 14:15:40 +0200	[diff] [blame]	116	Note: Dispatch quantum is number of requests that will be dispatched
				117	from a certain queue in a dispatch cycle.
				118
				119	To do
				120	=====
				121	The ROW algorithm takes the scheduling policy one step further, making
				122	it a bit more "user-needs oriented", by allowing the application to
				123	hint on the urgency of its requests. For example: even among the READ
				124	requests several requests may be more urgent for completion than other.
				125	The former will go to the High priority READ queue, that is given the
				126	bigger dispatch quantum than any other queue.
				127
				128	Still need to design the way applications will "hint" on the urgency of
				129	their requests. May be done by ioctl(). We need to look into concrete
				130	use-cases in order to determine the best solution for this.
				131	This will be implemented as a second phase.
				132
				133	Design and implement additional services for block devices that
				134	supports High Priority Requests.