blob: fe8b88b8c56f3611898471daf44b4abcd77f1c10 [file] [log] [blame]
Tatyana Brokhman16349062012-09-20 10:46:10 +03001Introduction
2============
3
4The ROW scheduling algorithm will be used in mobile devices as default
5block layer IO scheduling algorithm. ROW stands for "READ Over WRITE"
6which is the main requests dispatch policy of this algorithm.
7
8The ROW IO scheduler was developed with the mobile devices needs in
9mind. In mobile devices we favor user experience upon everything else,
10thus we want to give READ IO requests as much priority as possible.
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020011The main idea of the ROW scheduling policy is just that:
12- If there are READ requests in pipe - dispatch them, while write
13starvation is considered.
Tatyana Brokhman16349062012-09-20 10:46:10 +030014
15Software description
16====================
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020017The elevator defines a registering mechanism for different IO scheduler
18to implement. This makes implementing a new algorithm quite straight
19forward and requires almost no changes to block/elevator framework. A
20new IO scheduler just has to implement a set of callback functions
21defined by the elevator.
22These callbacks cover all the required IO operations such as
23adding/removing request to/from the scheduler, merging two requests,
24dispatching a request etc.
25
26Design
27======
28
Tatyana Brokhman16349062012-09-20 10:46:10 +030029The requests are kept in queues according to their priority. The
30dispatching of requests is done in a Round Robin manner with a
31different slice for each queue. The dispatch quantum for a specific
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020032queue is set according to the queues priority. READ queues are
Tatyana Brokhman16349062012-09-20 10:46:10 +030033given bigger dispatch quantum than the WRITE queues, within a dispatch
34cycle.
35
36At the moment there are 6 types of queues the requests are
37distributed to:
38- High priority READ queue
39- High priority Synchronous WRITE queue
40- Regular priority READ queue
41- Regular priority Synchronous WRITE queue
42- Regular priority WRITE queue
43- Low priority READ queue
44
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020045The marking of request as high/low priority will be done by the
46application adding the request and not the scheduler. See TODO section.
47If the request is not marked in any way (high/low) the scheduler
48assigns it to one of the regular priority queues:
49read/write/sync write.
50
Tatyana Brokhman16349062012-09-20 10:46:10 +030051If in a certain dispatch cycle one of the queues was empty and didn't
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020052use its quantum that queue will be marked as "un-served". If we're in
53a middle of a dispatch cycle dispatching from queue Y and a request
Tatyana Brokhman16349062012-09-20 10:46:10 +030054arrives for queue X that was un-served in the previous cycle, if X's
55priority is higher than Y's, queue X will be preempted in the favor of
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020056queue Y.
Tatyana Brokhman16349062012-09-20 10:46:10 +030057
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020058For READ request queues ROW IO scheduler allows idling within a
59dispatch quantum in order to give the application a chance to insert
60more requests. Idling means adding some extra time for serving a
61certain queue even if the queue is empty. The idling is enabled if
62the ROW IO scheduler identifies the application is inserting requests
63in a high frequency.
64Not all queues can idle. ROW scheduler exposes an enablement struct
65for idling.
66For idling on READ queues, the ROW IO scheduler uses timer mechanism.
67When the timer expires we schedule a delayed work that will signal the
68device driver to fetch another request for dispatch.
Tatyana Brokhman16349062012-09-20 10:46:10 +030069
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020070ROW scheduler will support additional services for block devices that
71supports Urgent Requests. That is, the scheduler may inform the
72device driver upon urgent requests using a newly defined callback.
Tatyana Brokhman16349062012-09-20 10:46:10 +030073In addition it will support rescheduling of requests that were
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020074interrupted. For example if the device driver issues a long write
75request and a sudden urgent request is received by the scheduler.
76The scheduler will inform the device driver about the urgent request,
77so the device driver can stop the current write request and serve the
78urgent request. In such a case the device driver may also insert back
79to the scheduler the remainder of the interrupted write request, such
80that the scheduler may continue sending urgent requests without the
81need to interrupt the ongoing write again and again. The write
82remainder will be sent later on according to the scheduler policy.
Tatyana Brokhman16349062012-09-20 10:46:10 +030083
84SMP/multi-core
85==============
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020086At the moment the code is accessed from 2 contexts:
Tatyana Brokhman16349062012-09-20 10:46:10 +030087- Application context (from block/elevator layer): adding the requests.
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020088- device driver thread: dispatching the requests and notifying on
89 completion.
Tatyana Brokhman16349062012-09-20 10:46:10 +030090
91One lock is used to synchronize between the two. This lock is provided
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020092by the block device driver along with the dispatch queue.
Tatyana Brokhman16349062012-09-20 10:46:10 +030093
94Config options
95==============
961. hp_read_quantum: dispatch quantum for the high priority READ queue
Tatyana Brokhman011f09a2012-12-03 14:15:40 +020097 (default is 100 requests)
982. rp_read_quantum: dispatch quantum for the regular priority READ
99 queue (default is 100 requests)
1003. hp_swrite_quantum: dispatch quantum for the high priority
101 Synchronous WRITE queue (default is 2 requests)
Tatyana Brokhman16349062012-09-20 10:46:10 +03001024. rp_swrite_quantum: dispatch quantum for the regular priority
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200103 Synchronous WRITE queue (default is 1 requests)
Tatyana Brokhman16349062012-09-20 10:46:10 +03001045. rp_write_quantum: dispatch quantum for the regular priority WRITE
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200105 queue (default is 1 requests)
Tatyana Brokhman16349062012-09-20 10:46:10 +03001066. lp_read_quantum: dispatch quantum for the low priority READ queue
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200107 (default is 1 requests)
Tatyana Brokhman16349062012-09-20 10:46:10 +03001087. lp_swrite_quantum: dispatch quantum for the low priority Synchronous
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200109 WRITE queue (default is 1 requests)
Tatyana Brokhman16349062012-09-20 10:46:10 +03001108. read_idle: how long to idle on read queue in Msec (in case idling
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200111 is enabled on that queue). (default is 5 Msec)
Tatyana Brokhman16349062012-09-20 10:46:10 +03001129. read_idle_freq: frequency of inserting READ requests that will
113 trigger idling. This is the time in Msec between inserting two READ
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200114 requests. (default is 8 Msec)
Tatyana Brokhman16349062012-09-20 10:46:10 +0300115
Tatyana Brokhman011f09a2012-12-03 14:15:40 +0200116Note: Dispatch quantum is number of requests that will be dispatched
117from a certain queue in a dispatch cycle.
118
119To do
120=====
121The ROW algorithm takes the scheduling policy one step further, making
122it a bit more "user-needs oriented", by allowing the application to
123hint on the urgency of its requests. For example: even among the READ
124requests several requests may be more urgent for completion than other.
125The former will go to the High priority READ queue, that is given the
126bigger dispatch quantum than any other queue.
127
128Still need to design the way applications will "hint" on the urgency of
129their requests. May be done by ioctl(). We need to look into concrete
130use-cases in order to determine the best solution for this.
131This will be implemented as a second phase.
132
133Design and implement additional services for block devices that
134supports High Priority Requests.