| Introduction |
| ============ |
| |
| The ROW scheduling algorithm will be used in mobile devices as default |
| block layer IO scheduling algorithm. ROW stands for "READ Over WRITE" |
| which is the main requests dispatch policy of this algorithm. |
| |
| The ROW IO scheduler was developed with the mobile devices needs in |
| mind. In mobile devices we favor user experience upon everything else, |
| thus we want to give READ IO requests as much priority as possible. |
| The main idea of the ROW scheduling policy is: |
| If there are READ requests in pipe - dispatch them but don't starve |
| the WRITE requests too much. |
| |
| Software description |
| ==================== |
| The requests are kept in queues according to their priority. The |
| dispatching of requests is done in a Round Robin manner with a |
| different slice for each queue. The dispatch quantum for a specific |
| queue is defined according to the queues priority. READ queues are |
| given bigger dispatch quantum than the WRITE queues, within a dispatch |
| cycle. |
| |
| At the moment there are 6 types of queues the requests are |
| distributed to: |
| - High priority READ queue |
| - High priority Synchronous WRITE queue |
| - Regular priority READ queue |
| - Regular priority Synchronous WRITE queue |
| - Regular priority WRITE queue |
| - Low priority READ queue |
| |
| If in a certain dispatch cycle one of the queues was empty and didn't |
| use its quantum that queue will be marked as "un-served". If we're in a |
| middle of a dispatch cycle dispatching from queue Y and a request |
| arrives for queue X that was un-served in the previous cycle, if X's |
| priority is higher than Y's, queue X will be preempted in the favor of |
| queue Y. This won't mean that cycle is restarted. The "dispatched" |
| counter of queue X will remain unchanged. Once queue Y uses up it's quantum |
| (or there will be no more requests left on it) we'll switch back to queue X |
| and allow it to finish it's quantum. |
| |
| For READ requests queues we allow idling in within a dispatch quantum in |
| order to give the application a chance to insert more requests. Idling |
| means adding some extra time for serving a certain queue even if the |
| queue is empty. The idling is enabled if we identify the application is |
| inserting requests in a high frequency. |
| |
| For idling on READ queues we use timer mechanism. When the timer expires, |
| if there are requests in the scheduler we will signal the underlying driver |
| (for example the MMC driver) to fetch another request for dispatch. |
| |
| The ROW algorithm takes the scheduling policy one step further, making |
| it a bit more "user-needs oriented", by allowing the application to |
| hint on the urgency of its requests. For example: even among the READ |
| requests several requests may be more urgent for completion then others. |
| The former will go to the High priority READ queue, that is given the |
| bigger dispatch quantum than any other queue. |
| |
| ROW scheduler will support special services for block devices that |
| supports High Priority Requests. That is, the scheduler may inform the |
| device upon urgent requests using new callback make_urgent_request. |
| In addition it will support rescheduling of requests that were |
| interrupted. For example, if the device issues a long write request and |
| a sudden high priority read interrupt pops in, the scheduler will |
| inform the device about the urgent request, so the device can stop the |
| current write request and serve the high priority read request. In such |
| a case the device may also send back to the scheduler the reminder of |
| the interrupted write request, such that the scheduler may continue |
| sending high priority requests without the need to interrupt the |
| ongoing write again and again. The write remainder will be sent later on |
| according to the scheduler policy. |
| |
| Design |
| ====== |
| Existing algorithms (cfq, deadline) sort the io requests according LBA. |
| When deciding on the next request to dispatch they choose the closest |
| request to the current disk head position (from handling last |
| dispatched request). This is done in order to reduce the disk head |
| movement to a minimum. |
| We feel that this functionality isn't really needed in mobile devices. |
| Usually applications that write/read large chunks of data insert the |
| requests in already sorted LBA order. Thus dealing with sort trees adds |
| unnecessary complexity. |
| |
| We're planing to try this enhancement in the future to check if the |
| performance is influenced by it. |
| |
| SMP/multi-core |
| ============== |
| At the moment the code is acceded from 2 contexts: |
| - Application context (from block/elevator layer): adding the requests. |
| - Underlying driver context (for example the mmc driver thread): dispatching |
| the requests and notifying on completion. |
| |
| One lock is used to synchronize between the two. This lock is provided |
| by the underlying driver along with the dispatch queue. |
| |
| Config options |
| ============== |
| 1. hp_read_quantum: dispatch quantum for the high priority READ queue |
| 2. rp_read_quantum: dispatch quantum for the regular priority READ queue |
| 3. hp_swrite_quantum: dispatch quantum for the high priority Synchronous |
| WRITE queue |
| 4. rp_swrite_quantum: dispatch quantum for the regular priority |
| Synchronous WRITE queue |
| 5. rp_write_quantum: dispatch quantum for the regular priority WRITE |
| queue |
| 6. lp_read_quantum: dispatch quantum for the low priority READ queue |
| 7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous |
| WRITE queue |
| 8. read_idle: how long to idle on read queue in Msec (in case idling |
| is enabled on that queue). |
| 9. read_idle_freq: frequency of inserting READ requests that will |
| trigger idling. This is the time in Msec between inserting two READ |
| requests |
| |