blkcg: reduce stack usage of blkg_rwstat_recursive_sum()

The recent percpu conversion of blkg_rwstat triggered the following
warning in certain configurations:

 block/blk-cgroup.c:654:1: warning: the frame size of 1360 bytes is larger than 1024 bytes

This is because blkg_rwstat now contains four percpu_counters, which
can be quite large depending on debug options, although this shouldn't
be a problem in production configs.  This patch removes one of the two
local blkg_rwstat variables used by blkg_rwstat_recursive_sum(),
summing the percpu counters directly into the result instead, to
reduce stack usage.
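
For illustration only, the effect can be sketched in user space.  This
is not kernel code: fake_counter, fake_rwstat, NR_STATS, NR_CPUS and
both sum functions are hypothetical stand-ins, and the 256-byte pad
is an assumed size that merely mimics debug bloat.  The first function
keeps a full struct copy on the stack, the second reads through the
pointer and needs only scalars.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_STATS 4	/* stand-in for BLKG_RWSTAT_NR */
#define NR_CPUS  8	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for percpu_counter; the pad mimics debug bloat. */
struct fake_counter {
	int64_t per_cpu[NR_CPUS];
	char debug_pad[256];
};

/* Hypothetical stand-in for blkg_rwstat. */
struct fake_rwstat {
	struct fake_counter cpu_cnt[NR_STATS];
	int64_t aux_cnt[NR_STATS];
};

/* Add up all per-CPU slots and clamp a negative total to zero. */
static int64_t fake_counter_sum_positive(const struct fake_counter *c)
{
	int64_t total = 0;

	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		total += c->per_cpu[cpu];
	return total < 0 ? 0 : total;
}

/* Old shape: the whole struct is copied into this function's frame. */
static void sum_with_copy(const struct fake_rwstat *src,
			  int64_t out[NR_STATS])
{
	struct fake_rwstat tmp;

	assert(src != NULL);
	tmp = *src;	/* costs sizeof(tmp) bytes of stack */
	for (size_t i = 0; i < NR_STATS; i++)
		out[i] += fake_counter_sum_positive(&tmp.cpu_cnt[i]) +
			  tmp.aux_cnt[i];
}

/* New shape: read through the pointer, no struct-sized local. */
static void sum_direct(const struct fake_rwstat *src,
		       int64_t out[NR_STATS])
{
	assert(src != NULL);
	for (size_t i = 0; i < NR_STATS; i++)
		out[i] += fake_counter_sum_positive(&src->cpu_cnt[i]) +
			  src->aux_cnt[i];
}
```

Both variants accumulate the same totals into out[]; only the frame
size of the caller-visible function differs.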

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Link: http://article.gmane.org/gmane.linux.kernel.cgroups/13835
Signed-off-by: Jens Axboe <axboe@fb.com>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a25263c..c82c5db 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -744,7 +744,7 @@
 
 	rcu_read_lock();
 	blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
-		struct blkg_rwstat *rwstat, tmp;
+		struct blkg_rwstat *rwstat;
 
 		if (!pos_blkg->online)
 			continue;
@@ -754,12 +754,10 @@
 		else
 			rwstat = (void *)pos_blkg + off;
 
-		tmp = blkg_rwstat_read(rwstat);
-
 		for (i = 0; i < BLKG_RWSTAT_NR; i++)
-			atomic64_add(atomic64_read(&tmp.aux_cnt[i]) +
-				     atomic64_read(&rwstat->aux_cnt[i]),
-				     &sum.aux_cnt[i]);
+			atomic64_add(atomic64_read(&rwstat->aux_cnt[i]) +
+				percpu_counter_sum_positive(&rwstat->cpu_cnt[i]),
+				&sum.aux_cnt[i]);
 	}
 	rcu_read_unlock();