Christoph Hellwig | 04ccc65 | 2010-09-03 11:56:17 +0200 | 1 | |
| 2 | Explicit volatile write back cache control |
| 3 | ========================================== |
| 4 | |
| 5 | Introduction |
| 6 | ------------ |
| 7 | |
| 8 | Many storage devices, especially in the consumer market, come with volatile |
| 9 | write back caches. That means the devices signal I/O completion to the |
| 10 | operating system before data actually has hit the non-volatile storage. This |
| 11 | behavior obviously speeds up various workloads, but it means the operating |
| 12 | system needs to force data out to the non-volatile storage when it performs |
| 13 | a data integrity operation like fsync, sync or an unmount. |
| 14 | |
| 15 | The Linux block layer provides two simple mechanisms that let filesystems |
| 16 | control the caching behavior of the storage device. These mechanisms are |
| 17 | a forced cache flush, and the Force Unit Access (FUA) flag for requests. |
| 18 | |
| 19 | |
| 20 | Explicit cache flushes |
| 21 | ---------------------- |
| 22 | |
| 23 | The REQ_FLUSH flag can be ORed into the r/w flags of a bio submitted from |
| 24 | the filesystem and will make sure the volatile cache of the storage device |
| 25 | has been flushed before the actual I/O operation is started. This explicitly |
| 26 | guarantees that previously completed write requests are on non-volatile |
| 27 | storage before the flagged bio starts. In addition the REQ_FLUSH flag can be |
| 28 | set on an otherwise empty bio structure, which causes only an explicit cache |
| 29 | flush without any dependent I/O. It is recommended to use |
| 30 | the blkdev_issue_flush() helper for a pure cache flush. |
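The ordering guarantee described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
sim_dev, sim_submit(), sim_flush(), sim_power_loss() and SIM_REQ_FLUSH are
hypothetical names invented for this example and are not kernel interfaces.
A negative sector number stands in for an otherwise empty bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define SIM_SECTORS	8
#define SIM_REQ_FLUSH	(1u << 0)	/* stands in for REQ_FLUSH */

struct sim_dev {
	char media[SIM_SECTORS];	/* non-volatile storage */
	char cache[SIM_SECTORS];	/* volatile write back cache */
	bool dirty[SIM_SECTORS];	/* cached, not yet on the media */
};

/* Write every dirty cached sector out to the media. */
void sim_flush(struct sim_dev *dev)
{
	for (int i = 0; i < SIM_SECTORS; i++) {
		if (dev->dirty[i]) {
			dev->media[i] = dev->cache[i];
			dev->dirty[i] = false;
		}
	}
}

/*
 * Submit a one-sector write.  A negative sector models an empty bio.
 * With SIM_REQ_FLUSH set the cache is flushed before the write itself
 * is started; the new data still only lands in the volatile cache.
 */
void sim_submit(struct sim_dev *dev, int sector, char data,
		unsigned int flags)
{
	assert(sector < SIM_SECTORS);

	if (flags & SIM_REQ_FLUSH)
		sim_flush(dev);

	if (sector >= 0) {
		dev->cache[sector] = data;
		dev->dirty[sector] = true;
	}
}

/* Power loss: everything that only lived in the cache is gone. */
void sim_power_loss(struct sim_dev *dev)
{
	memset(dev->cache, 0, sizeof(dev->cache));
	memset(dev->dirty, 0, sizeof(dev->dirty));
}
```

In this model a flagged write makes earlier writes durable but not itself,
which is why a data integrity operation typically ends with either an
empty flush or a write that also carries the FUA flag.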
| 31 | |
| 32 | |
| 33 | Force Unit Access |
| 34 | ----------------- |
| 35 | |
| 36 | The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the |
| 37 | filesystem and will make sure that I/O completion for this request is only |
| 38 | signaled after the data has been committed to non-volatile storage. |
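As a rough illustration of how FUA differs from a cache flush, the sketch
below models a device where a FUA write makes only its own data durable
and leaves other cached writes untouched. All names (struct sim_dev,
sim_write(), sim_is_durable(), SIM_REQ_FUA) are hypothetical and exist
only for this example; real devices implement this in hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define SIM_SECTORS	8
#define SIM_REQ_FUA	(1u << 1)	/* stands in for REQ_FUA */

struct sim_dev {
	char media[SIM_SECTORS];	/* non-volatile storage */
	char cache[SIM_SECTORS];	/* volatile write back cache */
	bool dirty[SIM_SECTORS];	/* cached, not yet on the media */
};

/*
 * One-sector write.  Returning from this function models the device
 * signalling I/O completion.  With SIM_REQ_FUA the data is on the
 * media by then; without it the data may only be in the cache.
 */
void sim_write(struct sim_dev *dev, int sector, char data,
	       unsigned int flags)
{
	assert(sector >= 0 && sector < SIM_SECTORS);

	dev->cache[sector] = data;
	if (flags & SIM_REQ_FUA) {
		dev->media[sector] = data;
		dev->dirty[sector] = false;
	} else {
		dev->dirty[sector] = true;
	}
}

/* True if the given data would survive a power loss right now. */
bool sim_is_durable(const struct sim_dev *dev, int sector, char data)
{
	return !dev->dirty[sector] && dev->media[sector] == data;
}
```

Note that in this model the FUA write to one sector does not make an
earlier plain write to another sector durable; that still needs a flush.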
| 39 | |
| 40 | |
| 41 | Implementation details for filesystems |
| 42 | -------------------------------------- |
| 43 | |
| 44 | Filesystems can simply set the REQ_FLUSH and REQ_FUA bits and do not have to |
| 45 | worry about whether the underlying devices need any explicit cache flushing |
| 46 | or how Force Unit Access is implemented. The REQ_FLUSH and REQ_FUA flags |
| 47 | may both be set on a single bio. |
| 48 | |
| 49 | |
| 50 | Implementation details for make_request_fn based block drivers |
| 51 | -------------------------------------------------------------- |
| 52 | |
| 53 | These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit |
| 54 | directly below the submit_bio interface. For remapping drivers the REQ_FUA |
| 55 | bit needs to be propagated to underlying devices, and a global flush needs |
| 56 | to be implemented for bios with the REQ_FLUSH bit set. For real device |
| 57 | drivers that do not have a volatile cache the REQ_FLUSH and REQ_FUA bits |
| 58 | on non-empty bios can simply be ignored, and REQ_FLUSH requests without |
| 59 | data can be completed successfully without doing any work. Drivers for |
| 60 | devices with volatile caches need to implement the support for these |
| 61 | flags themselves without any help from the block layer. |
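The remapping case can be sketched as follows. This is a userspace model
under assumptions, not driver code: struct sim_leg, struct sim_remap,
sim_remap_submit() and the modulo mapping of sectors to underlying devices
are all hypothetical. It shows a flush being fanned out to every
underlying device while the FUA bit follows the data to the one device
that receives it.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define SIM_LEGS	3
#define SIM_REQ_FLUSH	(1u << 0)	/* stands in for REQ_FLUSH */
#define SIM_REQ_FUA	(1u << 1)	/* stands in for REQ_FUA */

/* Counters for what one underlying device has been asked to do. */
struct sim_leg {
	unsigned int flushes;
	unsigned int writes;
	unsigned int fua_writes;
};

struct sim_remap {
	struct sim_leg legs[SIM_LEGS];
};

/*
 * Remap one bio.  Assumed mapping: sector N lives on leg N % SIM_LEGS.
 * A flush has to reach every leg, because previously completed writes
 * may sit in the cache of any of them.  FUA only concerns the data of
 * this bio, so it is propagated to the leg that receives the write.
 */
void sim_remap_submit(struct sim_remap *map, unsigned long sector,
		      bool has_data, unsigned int flags)
{
	struct sim_leg *leg;

	if (flags & SIM_REQ_FLUSH) {
		for (int i = 0; i < SIM_LEGS; i++)
			map->legs[i].flushes++;
	}

	if (!has_data)
		return;		/* empty flush: nothing to remap */

	leg = &map->legs[sector % SIM_LEGS];
	leg->writes++;
	if (flags & SIM_REQ_FUA)
		leg->fua_writes++;
}
```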
| 62 | |
| 63 | |
| 64 | Implementation details for request_fn based block drivers |
| 65 | --------------------------------------------------------- |
| 66 | |
| 67 | For devices that do not support volatile write caches there is no driver |
| 68 | support required; the block layer completes empty REQ_FLUSH requests before |
| 69 | entering the driver and strips off the REQ_FLUSH and REQ_FUA bits from |
| 70 | requests that have a payload. For devices with volatile write caches the |
| 71 | driver needs to tell the block layer that it supports flushing caches by |
| 72 | doing: |
| 73 | |
| 74 | 	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH); |
| 75 | |
| 76 | and handle empty REQ_FLUSH requests in its prep_fn/request_fn. Note that |
| 77 | REQ_FLUSH requests with a payload are automatically turned into a sequence |
| 78 | of an empty REQ_FLUSH request followed by the actual write by the block |
| 79 | layer. For devices that also support the FUA bit the block layer needs |
| 80 | to be told to pass through the REQ_FUA bit using: |
| 81 | |
| 82 | 	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA); |
| 83 | |
| 84 | and the driver must handle write requests that have the REQ_FUA bit set |
| 85 | in prep_fn/request_fn. If the FUA bit is not natively supported the block |
| 86 | layer turns it into an empty REQ_FLUSH request after the actual write. |
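The decomposition rules in this section can be summarized as a small
decision function. The sketch below is a userspace model of those rules
as described in the text, not the block layer's actual code:
sim_flush_policy(), the SIM_STEP_* values and the SIM_REQ_* flags are
hypothetical names. The queue_caps argument stands for what the driver
passed to blk_queue_flush(), and *data_flags receives the flags the
driver would see on the data request.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define SIM_REQ_FLUSH		(1u << 0)	/* stands in for REQ_FLUSH */
#define SIM_REQ_FUA		(1u << 1)	/* stands in for REQ_FUA */

#define SIM_STEP_PREFLUSH	(1u << 0)	/* empty flush before data */
#define SIM_STEP_DATA		(1u << 1)	/* the actual write */
#define SIM_STEP_POSTFLUSH	(1u << 2)	/* empty flush after data */

/*
 * Return the set of steps issued to the driver for one request.
 *
 * queue_caps:  0, SIM_REQ_FLUSH, or SIM_REQ_FLUSH | SIM_REQ_FUA
 * req_flags:   flags set by the filesystem on the bio
 * has_payload: false for an empty flush
 * data_flags:  out, flags left on the data request
 */
unsigned int sim_flush_policy(unsigned int queue_caps,
			      unsigned int req_flags, bool has_payload,
			      unsigned int *data_flags)
{
	unsigned int steps = 0;

	*data_flags = 0;

	if (has_payload)
		steps |= SIM_STEP_DATA;

	/* No volatile cache: both bits are stripped, no extra steps. */
	if (!(queue_caps & SIM_REQ_FLUSH))
		return steps;

	if (req_flags & SIM_REQ_FLUSH)
		steps |= SIM_STEP_PREFLUSH;

	if (req_flags & SIM_REQ_FUA) {
		if (queue_caps & SIM_REQ_FUA)
			*data_flags |= SIM_REQ_FUA;	/* native FUA */
		else
			steps |= SIM_STEP_POSTFLUSH;	/* emulated FUA */
	}

	return steps;
}
```

For example, in this model a request with both flags and a payload on a
queue that only advertised flush support yields a pre-flush, the write,
and a post-flush, while the same request on a queue that also advertised
FUA yields a pre-flush followed by a single FUA write.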