| Introduction |
| ============ |
| |
| The ROW scheduling algorithm will be used in mobile devices as default |
| block layer IO scheduling algorithm. ROW stands for "READ Over WRITE" |
| which is the main requests dispatch policy of this algorithm. |
| |
| The ROW IO scheduler was developed with the mobile devices needs in |
| mind. In mobile devices we favor user experience upon everything else, |
| thus we want to give READ IO requests as much priority as possible. |
| The main idea of the ROW scheduling policy is just that: |
| - If there are READ requests in pipe - dispatch them, while write |
| starvation is considered. |
| |
| Software description |
| ==================== |
| The elevator defines a registering mechanism for different IO scheduler |
| to implement. This makes implementing a new algorithm quite straight |
| forward and requires almost no changes to block/elevator framework. A |
| new IO scheduler just has to implement a set of callback functions |
| defined by the elevator. |
| These callbacks cover all the required IO operations such as |
| adding/removing request to/from the scheduler, merging two requests, |
| dispatching a request etc. |
| |
| Design |
| ====== |
| |
| The requests are kept in queues according to their priority. The |
| dispatching of requests is done in a Round Robin manner with a |
| different slice for each queue. The dispatch quantum for a specific |
| queue is set according to the queues priority. READ queues are |
| given bigger dispatch quantum than the WRITE queues, within a dispatch |
| cycle. |
| |
| At the moment there are 6 types of queues the requests are |
| distributed to: |
| - High priority READ queue |
| - High priority Synchronous WRITE queue |
| - Regular priority READ queue |
| - Regular priority Synchronous WRITE queue |
| - Regular priority WRITE queue |
| - Low priority READ queue |
| |
| The marking of request as high/low priority will be done by the |
| application adding the request and not the scheduler. See TODO section. |
| If the request is not marked in any way (high/low) the scheduler |
| assigns it to one of the regular priority queues: |
| read/write/sync write. |
| |
| If in a certain dispatch cycle one of the queues was empty and didn't |
| use its quantum that queue will be marked as "un-served". If we're in |
| a middle of a dispatch cycle dispatching from queue Y and a request |
| arrives for queue X that was un-served in the previous cycle, if X's |
| priority is higher than Y's, queue X will be preempted in the favor of |
| queue Y. |
| |
| For READ request queues ROW IO scheduler allows idling within a |
| dispatch quantum in order to give the application a chance to insert |
| more requests. Idling means adding some extra time for serving a |
| certain queue even if the queue is empty. The idling is enabled if |
| the ROW IO scheduler identifies the application is inserting requests |
| in a high frequency. |
| Not all queues can idle. ROW scheduler exposes an enablement struct |
| for idling. |
| For idling on READ queues, the ROW IO scheduler uses timer mechanism. |
| When the timer expires we schedule a delayed work that will signal the |
| device driver to fetch another request for dispatch. |
| |
| ROW scheduler will support additional services for block devices that |
| supports Urgent Requests. That is, the scheduler may inform the |
| device driver upon urgent requests using a newly defined callback. |
| In addition it will support rescheduling of requests that were |
| interrupted. For example if the device driver issues a long write |
| request and a sudden urgent request is received by the scheduler. |
| The scheduler will inform the device driver about the urgent request, |
| so the device driver can stop the current write request and serve the |
| urgent request. In such a case the device driver may also insert back |
| to the scheduler the remainder of the interrupted write request, such |
| that the scheduler may continue sending urgent requests without the |
| need to interrupt the ongoing write again and again. The write |
| remainder will be sent later on according to the scheduler policy. |
| |
| SMP/multi-core |
| ============== |
| At the moment the code is accessed from 2 contexts: |
| - Application context (from block/elevator layer): adding the requests. |
| - device driver thread: dispatching the requests and notifying on |
| completion. |
| |
| One lock is used to synchronize between the two. This lock is provided |
| by the block device driver along with the dispatch queue. |
| |
| Config options |
| ============== |
| 1. hp_read_quantum: dispatch quantum for the high priority READ queue |
| (default is 100 requests) |
| 2. rp_read_quantum: dispatch quantum for the regular priority READ |
| queue (default is 100 requests) |
| 3. hp_swrite_quantum: dispatch quantum for the high priority |
| Synchronous WRITE queue (default is 2 requests) |
| 4. rp_swrite_quantum: dispatch quantum for the regular priority |
| Synchronous WRITE queue (default is 1 requests) |
| 5. rp_write_quantum: dispatch quantum for the regular priority WRITE |
| queue (default is 1 requests) |
| 6. lp_read_quantum: dispatch quantum for the low priority READ queue |
| (default is 1 requests) |
| 7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous |
| WRITE queue (default is 1 requests) |
| 8. read_idle: how long to idle on read queue in Msec (in case idling |
| is enabled on that queue). (default is 5 Msec) |
| 9. read_idle_freq: frequency of inserting READ requests that will |
| trigger idling. This is the time in Msec between inserting two READ |
| requests. (default is 8 Msec) |
| |
| Note: Dispatch quantum is number of requests that will be dispatched |
| from a certain queue in a dispatch cycle. |
| |
| To do |
| ===== |
| The ROW algorithm takes the scheduling policy one step further, making |
| it a bit more "user-needs oriented", by allowing the application to |
| hint on the urgency of its requests. For example: even among the READ |
| requests several requests may be more urgent for completion than other. |
| The former will go to the High priority READ queue, that is given the |
| bigger dispatch quantum than any other queue. |
| |
| Still need to design the way applications will "hint" on the urgency of |
| their requests. May be done by ioctl(). We need to look into concrete |
| use-cases in order to determine the best solution for this. |
| This will be implemented as a second phase. |
| |
| Design and implement additional services for block devices that |
| supports High Priority Requests. |