writeback: comment on the bdi dirty threshold
We do "floating proportions" to let active devices to grow its target
share of dirty pages and stalled/inactive devices to decrease its target
share over time.
It works well except in the case of "an inactive disk suddenly goes
busy", where the initial target share may be too small. To mitigate
this, bdi_position_ratio() has the below line to raise a small
bdi_thresh when it's safe to do so, so that the disk be feed with enough
dirty pages for efficient IO and in turn fast rampup of bdi_thresh:
bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
balance_dirty_pages() normally does negative feedback control which
adjusts ratelimit to balance the bdi dirty pages around the target.
In some extreme cases when that is not enough, it will have to block
the tasks completely until the bdi dirty pages drop below bdi_thresh.
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7125248..155efca 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -411,8 +411,13 @@
*
* Returns @bdi's dirty limit in pages. The term "dirty" in the context of
* dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
- * And the "limit" in the name is not seriously taken as hard limit in
- * balance_dirty_pages().
+ *
+ * Note that balance_dirty_pages() will only seriously take it as a hard limit
+ * when sleeping max_pause per page is not enough to keep the dirty pages under
+ * control. For example, when the device is completely stalled due to some error
+ * conditions, or when there are 1000 dd tasks writing to a slow 10MB/s USB key.
+ * In the other normal situations, it acts more gently by throttling the tasks
+ * more (rather than completely block them) when the bdi dirty pages go high.
*
* It allocates high/low dirty limits to fast/slow devices, in order to prevent
* - starving fast devices
@@ -594,6 +599,13 @@
*/
if (unlikely(bdi_thresh > thresh))
bdi_thresh = thresh;
+ /*
+ * It's very possible that bdi_thresh is close to 0 not because the
+ * device is slow, but that it has remained inactive for long time.
+ * Honour such devices a reasonable good (hopefully IO efficient)
+ * threshold, so that the occasional writes won't be blocked and active
+ * writes can rampup the threshold quickly.
+ */
bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
/*
* scale global setpoint to bdi's: