Diff - 9590544694becc64f4874963dbfc4b4d391024b7^! - kernel/msm-4.19

commit	9590544694becc64f4874963dbfc4b4d391024b7	[log] [tgz]
author	NeilBrown <neilb@suse.de>	Wed Sep 24 11:28:32 2014 +1000
committer	Trond Myklebust <trond.myklebust@primarydata.com>	Thu Sep 25 08:25:28 2014 -0400
tree	e0578565ba34391968b509cc5cc8cc1ae7d89614
parent	a4796e37c12e177572b80864cbab9c907ea250b0 [diff] [blame]

NFS: avoid deadlocks with loop-back mounted NFS filesystems.

Support for loop-back mounted NFS filesystems is useful when NFS is
used to access shared storage in a high-availability cluster.

If the node running the NFS server fails, some other node can mount the
filesystem and start providing NFS service.  If that node already had
the filesystem NFS mounted, it will now have it loop-back mounted.

nfsd can suffer a deadlock when allocating memory and entering direct
reclaim.
While direct reclaim does not write to the NFS filesystem it can send
and wait for a COMMIT through nfs_release_page().

This patch modifies nfs_release_page() to wait a limited time for the
commit to complete - one second.  If the commit doesn't complete
in this time, nfs_release_page() will fail.  This means it might now
fail in some cases where it wouldn't before.  These cases are only
when 'gfp' includes '__GFP_WAIT'.

nfs_release_page() is only called by try_to_release_page(), and that
can only be called on an NFS page with required 'gfp' flags from
 - page_cache_pipe_buf_steal() in splice.c
 - shrink_page_list() in vmscan.c
 - invalidate_inode_pages2_range() in truncate.c

The first two handle failure quite safely.  The last is only called
after ->launder_page() has been called, and that will have waited
for the commit to finish already.

So aborting if the commit takes longer than 1 second is perfectly safe.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a623b00..c063a4e 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c

@@ -705,6 +705,8 @@
 		if (likely(!PageSwapCache(head->wb_page))) {
 			set_page_private(head->wb_page, 0);
 			ClearPagePrivate(head->wb_page);
+			smp_mb__after_atomic();
+			wake_up_page(head->wb_page, PG_private);
 			clear_bit(PG_MAPPED, &head->wb_flags);
 		}
 		nfsi->npages--;