drbd: fix potential distributed deadlock during verify or resync If max-buffers and socket buffer sizes are "too small" for the chosen resync rate, this could lead potentially lead to a distributed deadlock, which may or may not resolve itself via the "ko-count" and request timeout mechanism, or could be resolved by forced disconnect. One option to deal with this is proper configuration: use larger max-buffer and socket buffers settings, or reduce the resync rate. But even with bad configuration we should not deadlock, but "gracefully" recover. The issue is avoided by using only up to max-buffers/2 for resync requests, and by using max-buffers not as a hard limit for data buffer allocations, but as a throttle threshold only. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>

commit: 0e49d7b014c5d591a053d08888a455bd74a88646 [log] [tgz]
author: Lars Ellenberg <lars.ellenberg@linbit.com> Mon Apr 28 18:43:18 2014 +0200
committer: Jens Axboe <axboe@fb.com> Wed Apr 30 13:46:54 2014 -0600
tree: addb770d9de32c447e12d0dd51eb383fe40fdc90
parent: 6377b9235056452cd5d592c3739baa379a8735fe [diff] [blame]
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 125c9e8..6ffbc22 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c

@@ -234,9 +234,17 @@
  * @retry:	whether to retry, if not enough pages are available right now
  *
  * Tries to allocate number pages, first from our own page pool, then from
- * the kernel, unless this allocation would exceed the max_buffers setting.
+ * the kernel.
  * Possibly retry until DRBD frees sufficient pages somewhere else.
  *
+ * If this allocation would exceed the max_buffers setting, we throttle
+ * allocation (schedule_timeout) to give the system some room to breathe.
+ *
+ * We do not use max-buffers as hard limit, because it could lead to
+ * congestion and further to a distributed deadlock during online-verify or
+ * (checksum based) resync, if the max-buffers, socket buffer sizes and
+ * resync-rate settings are mis-configured.
+ *
  * Returns a page chain linked via page->private.
  */
 struct page *drbd_alloc_pages(struct drbd_peer_device *peer_device, unsigned int number,
@@ -246,10 +254,8 @@
 	struct page *page = NULL;
 	struct net_conf *nc;
 	DEFINE_WAIT(wait);
-	int mxb;
+	unsigned int mxb;
 
-	/* Yes, we may run up to @number over max_buffers. If we
-	 * follow it strictly, the admin will get it wrong anyways. */
 	rcu_read_lock();
 	nc = rcu_dereference(peer_device->connection->net_conf);
 	mxb = nc ? nc->max_buffers : 1000000;
@@ -277,7 +283,8 @@
 			break;
 		}
 
-		schedule();
+		if (schedule_timeout(HZ/10) == 0)
+			mxb = UINT_MAX;
 	}
 	finish_wait(&drbd_pp_wait, &wait);
commit	0e49d7b014c5d591a053d08888a455bd74a88646	[log] [tgz]
author	Lars Ellenberg <lars.ellenberg@linbit.com>	Mon Apr 28 18:43:18 2014 +0200
committer	Jens Axboe <axboe@fb.com>	Wed Apr 30 13:46:54 2014 -0600
tree	addb770d9de32c447e12d0dd51eb383fe40fdc90
parent	6377b9235056452cd5d592c3739baa379a8735fe [diff] [blame]