Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 1 | Introduction |
| 2 | ============ |
| 3 | |
| 4 | The ROW scheduling algorithm will be used in mobile devices as default |
| 5 | block layer IO scheduling algorithm. ROW stands for "READ Over WRITE" |
| 6 | which is the main requests dispatch policy of this algorithm. |
| 7 | |
| 8 | The ROW IO scheduler was developed with the mobile devices needs in |
| 9 | mind. In mobile devices we favor user experience upon everything else, |
| 10 | thus we want to give READ IO requests as much priority as possible. |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 11 | The main idea of the ROW scheduling policy is just that: |
| 12 | - If there are READ requests in pipe - dispatch them, while write |
| 13 | starvation is considered. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 14 | |
| 15 | Software description |
| 16 | ==================== |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 17 | The elevator defines a registering mechanism for different IO scheduler |
| 18 | to implement. This makes implementing a new algorithm quite straight |
| 19 | forward and requires almost no changes to block/elevator framework. A |
| 20 | new IO scheduler just has to implement a set of callback functions |
| 21 | defined by the elevator. |
| 22 | These callbacks cover all the required IO operations such as |
| 23 | adding/removing request to/from the scheduler, merging two requests, |
| 24 | dispatching a request etc. |
| 25 | |
| 26 | Design |
| 27 | ====== |
| 28 | |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 29 | The requests are kept in queues according to their priority. The |
| 30 | dispatching of requests is done in a Round Robin manner with a |
| 31 | different slice for each queue. The dispatch quantum for a specific |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 32 | queue is set according to the queues priority. READ queues are |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 33 | given bigger dispatch quantum than the WRITE queues, within a dispatch |
| 34 | cycle. |
| 35 | |
| 36 | At the moment there are 6 types of queues the requests are |
| 37 | distributed to: |
| 38 | - High priority READ queue |
| 39 | - High priority Synchronous WRITE queue |
| 40 | - Regular priority READ queue |
| 41 | - Regular priority Synchronous WRITE queue |
| 42 | - Regular priority WRITE queue |
| 43 | - Low priority READ queue |
| 44 | |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 45 | The marking of request as high/low priority will be done by the |
| 46 | application adding the request and not the scheduler. See TODO section. |
| 47 | If the request is not marked in any way (high/low) the scheduler |
| 48 | assigns it to one of the regular priority queues: |
| 49 | read/write/sync write. |
| 50 | |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 51 | If in a certain dispatch cycle one of the queues was empty and didn't |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 52 | use its quantum that queue will be marked as "un-served". If we're in |
| 53 | a middle of a dispatch cycle dispatching from queue Y and a request |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 54 | arrives for queue X that was un-served in the previous cycle, if X's |
| 55 | priority is higher than Y's, queue X will be preempted in the favor of |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 56 | queue Y. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 57 | |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 58 | For READ request queues ROW IO scheduler allows idling within a |
| 59 | dispatch quantum in order to give the application a chance to insert |
| 60 | more requests. Idling means adding some extra time for serving a |
| 61 | certain queue even if the queue is empty. The idling is enabled if |
| 62 | the ROW IO scheduler identifies the application is inserting requests |
| 63 | in a high frequency. |
| 64 | Not all queues can idle. ROW scheduler exposes an enablement struct |
| 65 | for idling. |
| 66 | For idling on READ queues, the ROW IO scheduler uses timer mechanism. |
| 67 | When the timer expires we schedule a delayed work that will signal the |
| 68 | device driver to fetch another request for dispatch. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 69 | |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 70 | ROW scheduler will support additional services for block devices that |
| 71 | supports Urgent Requests. That is, the scheduler may inform the |
| 72 | device driver upon urgent requests using a newly defined callback. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 73 | In addition it will support rescheduling of requests that were |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 74 | interrupted. For example if the device driver issues a long write |
| 75 | request and a sudden urgent request is received by the scheduler. |
| 76 | The scheduler will inform the device driver about the urgent request, |
| 77 | so the device driver can stop the current write request and serve the |
| 78 | urgent request. In such a case the device driver may also insert back |
| 79 | to the scheduler the remainder of the interrupted write request, such |
| 80 | that the scheduler may continue sending urgent requests without the |
| 81 | need to interrupt the ongoing write again and again. The write |
| 82 | remainder will be sent later on according to the scheduler policy. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 83 | |
| 84 | SMP/multi-core |
| 85 | ============== |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 86 | At the moment the code is accessed from 2 contexts: |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 87 | - Application context (from block/elevator layer): adding the requests. |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 88 | - device driver thread: dispatching the requests and notifying on |
| 89 | completion. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 90 | |
| 91 | One lock is used to synchronize between the two. This lock is provided |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 92 | by the block device driver along with the dispatch queue. |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 93 | |
| 94 | Config options |
| 95 | ============== |
| 96 | 1. hp_read_quantum: dispatch quantum for the high priority READ queue |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 97 | (default is 100 requests) |
| 98 | 2. rp_read_quantum: dispatch quantum for the regular priority READ |
| 99 | queue (default is 100 requests) |
| 100 | 3. hp_swrite_quantum: dispatch quantum for the high priority |
| 101 | Synchronous WRITE queue (default is 2 requests) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 102 | 4. rp_swrite_quantum: dispatch quantum for the regular priority |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 103 | Synchronous WRITE queue (default is 1 requests) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 104 | 5. rp_write_quantum: dispatch quantum for the regular priority WRITE |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 105 | queue (default is 1 requests) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 106 | 6. lp_read_quantum: dispatch quantum for the low priority READ queue |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 107 | (default is 1 requests) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 108 | 7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 109 | WRITE queue (default is 1 requests) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 110 | 8. read_idle: how long to idle on read queue in Msec (in case idling |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 111 | is enabled on that queue). (default is 5 Msec) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 112 | 9. read_idle_freq: frequency of inserting READ requests that will |
| 113 | trigger idling. This is the time in Msec between inserting two READ |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 114 | requests. (default is 8 Msec) |
Tatyana Brokhman | 1634906 | 2012-09-20 10:46:10 +0300 | [diff] [blame] | 115 | |
Tatyana Brokhman | 011f09a | 2012-12-03 14:15:40 +0200 | [diff] [blame] | 116 | Note: Dispatch quantum is number of requests that will be dispatched |
| 117 | from a certain queue in a dispatch cycle. |
| 118 | |
| 119 | To do |
| 120 | ===== |
| 121 | The ROW algorithm takes the scheduling policy one step further, making |
| 122 | it a bit more "user-needs oriented", by allowing the application to |
| 123 | hint on the urgency of its requests. For example: even among the READ |
| 124 | requests several requests may be more urgent for completion than other. |
| 125 | The former will go to the High priority READ queue, that is given the |
| 126 | bigger dispatch quantum than any other queue. |
| 127 | |
| 128 | Still need to design the way applications will "hint" on the urgency of |
| 129 | their requests. May be done by ioctl(). We need to look into concrete |
| 130 | use-cases in order to determine the best solution for this. |
| 131 | This will be implemented as a second phase. |
| 132 | |
| 133 | Design and implement additional services for block devices that |
| 134 | supports High Priority Requests. |