ceph: fix connection fault con_work reentrancy problem
The messenger fault was clearing the BUSY bit, for reasons unclear. This
made it possible for the con->ops->fault function to reopen the connection,
and requeue work in the workqueue--even though the current thread was
already in con_work.
This avoids a problem where the client busy loops with connection failures
on an unreachable OSD, but doesn't address the root cause of that problem.
Signed-off-by: Sage Weil <sage@newdream.net>
diff --git a/fs/ceph/messenger.c b/fs/ceph/messenger.c
index 203c435..9832855 100644
--- a/fs/ceph/messenger.c
+++ b/fs/ceph/messenger.c
@@ -1836,8 +1836,6 @@
goto out;
}
- clear_bit(BUSY, &con->state); /* to avoid an improbable race */
-
mutex_lock(&con->mutex);
if (test_bit(CLOSED, &con->state))
goto out_unlock;