cfq-iosched: fix RCU race in the cfq io_context destructor handling

put_io_context() drops the RCU read lock before calling into cfq_dtor(),
however we need to hold off freeing there before grabbing and
dereferencing the first object on the list.

So extend the rcu_read_lock() scope to cover the calling of cfq_dtor(),
and optimize cfq_free_io_context() to use a new variant for
call_for_each_cic() that assumes the RCU read lock is already held.

Hit in the wild by Alexey Dobriyan <adobriyan@gmail.com>

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index e34df7c..012f065 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -41,8 +41,8 @@
 		rcu_read_lock();
 		if (ioc->aic && ioc->aic->dtor)
 			ioc->aic->dtor(ioc->aic);
-		rcu_read_unlock();
 		cfq_dtor(ioc);
+		rcu_read_unlock();
 
 		kmem_cache_free(iocontext_cachep, ioc);
 		return 1;