prune back iprune_sem

iprune_sem is continuously giving us lockdep warnings because we take it in
read mode in the reclaim path, but we also do non-NOFS allocations while
holding it in write mode.

Taking a slightly deeper look at it, I think it's fixable quite trivially:

 - for invalidate_inodes we do not need iprune_sem at all.  We have an active
   reference on the superblock, so the filesystem is not going away until it
   has finished.
 - for evict_inodes we do need it, to make sure prune_icache has done its
   work before we tear down the superblock.  But there is no reason to
   hold it over the actual reclaim operation - it's enough to cycle through
   it after the actual reclaim to make sure we wait for any pending
   prune_icache to complete.  We just have to remove the WARN_ON for
   otherwise busy inodes as they can actually happen now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/fs/inode.c b/fs/inode.c
index 0647d80..9910c03 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -84,16 +84,13 @@
 DEFINE_SPINLOCK(inode_lock);
 
 /*
- * iprune_sem provides exclusion between the kswapd or try_to_free_pages
- * icache shrinking path, and the umount path.  Without this exclusion,
- * by the time prune_icache calls iput for the inode whose pages it has
- * been invalidating, or by the time it calls clear_inode & destroy_inode
- * from its final dispose_list, the struct super_block they refer to
- * (for inode->i_sb->s_op) may already have been freed and reused.
+ * iprune_sem provides exclusion between the icache shrinking and the
+ * umount path.
  *
- * We make this an rwsem because the fastpath is icache shrinking. In
- * some cases a filesystem may be doing a significant amount of work in
- * its inode reclaim code, so this should improve parallelism.
+ * We don't actually need it to protect anything in the umount path,
+ * but only need to cycle through it to make sure any inode that
+ * prune_icache took off the LRU list has been fully torn down by the
+ * time we are past evict_inodes.
  */
 static DECLARE_RWSEM(iprune_sem);
 
@@ -516,17 +513,12 @@
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-	down_write(&iprune_sem);
-
 	spin_lock(&inode_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		if (atomic_read(&inode->i_count))
 			continue;
-
-		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
-			WARN_ON(1);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
 			continue;
-		}
 
 		inode->i_state |= I_FREEING;
 
@@ -542,6 +534,13 @@
 	spin_unlock(&inode_lock);
 
 	dispose_list(&dispose);
+
+	/*
+	 * Cycle through iprune_sem to make sure any inode that prune_icache
+	 * moved off the list before we took the lock has been fully torn
+	 * down.
+	 */
+	down_write(&iprune_sem);
 	up_write(&iprune_sem);
 }
 
@@ -561,8 +560,6 @@
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
-	down_write(&iprune_sem);
-
 	spin_lock(&inode_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE))
@@ -590,7 +587,6 @@
 	spin_unlock(&inode_lock);
 
 	dispose_list(&dispose);
-	up_write(&iprune_sem);
 
 	return busy;
 }