Documentation/block/row-iosched.txt - fp2-dev/kernel/msm - Gitiles

 Introduction
 ============

 The ROW scheduling algorithm will be used in mobile devices as default
 block layer IO scheduling algorithm. ROW stands for "READ Over WRITE"
 which is the main requests dispatch policy of this algorithm.

 The ROW IO scheduler was developed with the mobile devices needs in
 mind. In mobile devices we favor user experience upon everything else,
 thus we want to give READ IO requests as much priority as possible.
 The main idea of the ROW scheduling policy is:
 If there are READ requests in pipe - dispatch them but don't starve
 the WRITE requests too much.

 Software description
 ====================
 The requests are kept in queues according to their priority. The
 dispatching of requests is done in a Round Robin manner with a
 different slice for each queue. The dispatch quantum for a specific
 queue is defined according to the queues priority. READ queues are
 given bigger dispatch quantum than the WRITE queues, within a dispatch
 cycle.

 At the moment there are 6 types of queues the requests are
 distributed to:
 -	High priority READ queue
 -	High priority Synchronous WRITE queue
 -	Regular priority READ queue
 -	Regular priority Synchronous WRITE queue
 -	Regular priority WRITE queue
 -	Low priority READ queue

 If in a certain dispatch cycle one of the queues was empty and didn't
 use its quantum that queue will be marked as "un-served". If we're in a
 middle of a dispatch cycle dispatching from queue Y and a request
 arrives for queue X that was un-served in the previous cycle, if X's
 priority is higher than Y's, queue X will be preempted in the favor of
 queue Y. This won't mean that cycle is restarted. The "dispatched"
 counter of queue X will remain unchanged. Once queue Y uses up it's quantum
 (or there will be no more requests left on it) we'll switch back to queue X
 and allow it to finish it's quantum.

 For READ requests queues we allow idling in within a dispatch quantum in
 order to give the application a chance to insert more requests. Idling
 means adding some extra time for serving a certain queue even if the
 queue is empty. The idling is enabled if we identify the application is
 inserting requests in a high frequency.

 For idling on READ queues we use timer mechanism. When the timer expires,
 if there are requests in the scheduler we will signal the underlying driver
 (for example the MMC driver) to fetch another request for dispatch.

 The ROW algorithm takes the scheduling policy one step further, making
 it a bit more "user-needs oriented", by allowing the application to
 hint on the urgency of its requests. For example: even among the READ
 requests several requests may be more urgent for completion then others.
 The former will go to the High priority READ queue, that is given the
 bigger dispatch quantum than any other queue.

 ROW scheduler will support special services for block devices that
 supports High Priority Requests. That is, the scheduler may inform the
 device upon urgent requests using new callback make_urgent_request.
 In addition it will support rescheduling of requests that were
 interrupted. For example, if the device issues a long write request and
 a sudden high priority read interrupt pops in, the scheduler will
 inform the device about the urgent request, so the device can stop the
 current write request and serve the high priority read request. In such
 a case the device may also send back to the scheduler the reminder of
 the interrupted write request, such that the scheduler may continue
 sending high priority requests without the need to interrupt the
 ongoing write again and again. The write remainder will be sent later on
 according to the scheduler policy.

 Design
 ======
 Existing algorithms (cfq, deadline) sort the io requests according LBA.
 When deciding on the next request to dispatch they choose the closest
 request to the current disk head position (from handling last
 dispatched request). This is done in order to reduce the disk head
 movement to a minimum.
 We feel that this functionality isn't really needed in mobile devices.
 Usually applications that write/read large chunks of data insert the
 requests in already sorted LBA order. Thus dealing with sort trees adds
 unnecessary complexity.

 We're planing to try this enhancement in the future to check if the
 performance is influenced by it.

 SMP/multi-core
 ==============
 At the moment the code is acceded from 2 contexts:
 - Application context (from block/elevator layer): adding the requests.
 - Underlying driver context (for example the mmc driver thread): dispatching
   the requests and notifying on completion.

 One lock is used to synchronize between the two. This lock is provided
 by the underlying driver along with the dispatch queue.

 Config options
 ==============
 1. hp_read_quantum: dispatch quantum for the high priority READ queue
 2. rp_read_quantum: dispatch quantum for the regular priority READ queue
 3. hp_swrite_quantum: dispatch quantum for the high priority Synchronous
    WRITE queue
 4. rp_swrite_quantum: dispatch quantum for the regular priority
    Synchronous WRITE queue
 5. rp_write_quantum: dispatch quantum for the regular priority WRITE
    queue
 6. lp_read_quantum: dispatch quantum for the low priority READ queue
 7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous
    WRITE queue
 8. read_idle: how long to idle on read queue in Msec (in case idling
    is enabled on that queue).
 9. read_idle_freq: frequency of inserting READ requests that will
    trigger idling. This is the time in Msec between inserting two READ
    requests
	Introduction
	============

	The ROW scheduling algorithm will be used in mobile devices as default
	block layer IO scheduling algorithm. ROW stands for "READ Over WRITE"
	which is the main requests dispatch policy of this algorithm.

	The ROW IO scheduler was developed with the mobile devices needs in
	mind. In mobile devices we favor user experience upon everything else,
	thus we want to give READ IO requests as much priority as possible.
	The main idea of the ROW scheduling policy is:
	If there are READ requests in pipe - dispatch them but don't starve
	the WRITE requests too much.

	Software description
	====================
	The requests are kept in queues according to their priority. The
	dispatching of requests is done in a Round Robin manner with a
	different slice for each queue. The dispatch quantum for a specific
	queue is defined according to the queues priority. READ queues are
	given bigger dispatch quantum than the WRITE queues, within a dispatch
	cycle.

	At the moment there are 6 types of queues the requests are
	distributed to:
	- High priority READ queue
	- High priority Synchronous WRITE queue
	- Regular priority READ queue
	- Regular priority Synchronous WRITE queue
	- Regular priority WRITE queue
	- Low priority READ queue

	If in a certain dispatch cycle one of the queues was empty and didn't
	use its quantum that queue will be marked as "un-served". If we're in a
	middle of a dispatch cycle dispatching from queue Y and a request
	arrives for queue X that was un-served in the previous cycle, if X's
	priority is higher than Y's, queue X will be preempted in the favor of
	queue Y. This won't mean that cycle is restarted. The "dispatched"
	counter of queue X will remain unchanged. Once queue Y uses up it's quantum
	(or there will be no more requests left on it) we'll switch back to queue X
	and allow it to finish it's quantum.

	For READ requests queues we allow idling in within a dispatch quantum in
	order to give the application a chance to insert more requests. Idling
	means adding some extra time for serving a certain queue even if the
	queue is empty. The idling is enabled if we identify the application is
	inserting requests in a high frequency.

	For idling on READ queues we use timer mechanism. When the timer expires,
	if there are requests in the scheduler we will signal the underlying driver
	(for example the MMC driver) to fetch another request for dispatch.

	The ROW algorithm takes the scheduling policy one step further, making
	it a bit more "user-needs oriented", by allowing the application to
	hint on the urgency of its requests. For example: even among the READ
	requests several requests may be more urgent for completion then others.
	The former will go to the High priority READ queue, that is given the
	bigger dispatch quantum than any other queue.

	ROW scheduler will support special services for block devices that
	supports High Priority Requests. That is, the scheduler may inform the
	device upon urgent requests using new callback make_urgent_request.
	In addition it will support rescheduling of requests that were
	interrupted. For example, if the device issues a long write request and
	a sudden high priority read interrupt pops in, the scheduler will
	inform the device about the urgent request, so the device can stop the
	current write request and serve the high priority read request. In such
	a case the device may also send back to the scheduler the reminder of
	the interrupted write request, such that the scheduler may continue
	sending high priority requests without the need to interrupt the
	ongoing write again and again. The write remainder will be sent later on
	according to the scheduler policy.

	Design
	======
	Existing algorithms (cfq, deadline) sort the io requests according LBA.
	When deciding on the next request to dispatch they choose the closest
	request to the current disk head position (from handling last
	dispatched request). This is done in order to reduce the disk head
	movement to a minimum.
	We feel that this functionality isn't really needed in mobile devices.
	Usually applications that write/read large chunks of data insert the
	requests in already sorted LBA order. Thus dealing with sort trees adds
	unnecessary complexity.

	We're planing to try this enhancement in the future to check if the
	performance is influenced by it.

	SMP/multi-core
	==============
	At the moment the code is acceded from 2 contexts:
	- Application context (from block/elevator layer): adding the requests.
	- Underlying driver context (for example the mmc driver thread): dispatching
	the requests and notifying on completion.

	One lock is used to synchronize between the two. This lock is provided
	by the underlying driver along with the dispatch queue.

	Config options
	==============
	1. hp_read_quantum: dispatch quantum for the high priority READ queue
	2. rp_read_quantum: dispatch quantum for the regular priority READ queue
	3. hp_swrite_quantum: dispatch quantum for the high priority Synchronous
	WRITE queue
	4. rp_swrite_quantum: dispatch quantum for the regular priority
	Synchronous WRITE queue
	5. rp_write_quantum: dispatch quantum for the regular priority WRITE
	queue
	6. lp_read_quantum: dispatch quantum for the low priority READ queue
	7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous
	WRITE queue
	8. read_idle: how long to idle on read queue in Msec (in case idling
	is enabled on that queue).
	9. read_idle_freq: frequency of inserting READ requests that will
	trigger idling. This is the time in Msec between inserting two READ
	requests