Btrfs: Add async worker threads for pre and post IO checksumming

Btrfs has been using workqueues in an attempt to spread the checksumming
load across the CPUs in the system.  But workqueues only run work on the
same CPU that queued it, which limits their benefit on systems with
higher CPU counts.

This patch adds a generic facility for scheduling work on pools of kthreads,
and changes the bio submission code to queue bios up.  The queueing is
important to make sure large numbers of processes on the system don't
turn streaming workloads into random workloads by sending IO down
concurrently.

The end result is much higher performance (and CPU usage) when
checksumming on large machines.  Two worker pools are created: one for
writes and one for endio processing.  They are kept separate because
servicing both from a single pool could deadlock.
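The pooled-worker idea above can be sketched in userspace, assuming POSIX threads stand in for kthreads.  All names here (struct work, struct worker_pool, pool_start, pool_queue, pool_stop, run_counter_demo) are hypothetical and are not the btrfs interface; the real patch may be structured quite differently.  The sketch shows a fixed set of threads pulling from one shared FIFO, so queued work is not tied to the CPU that queued it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical work item, embedded in the caller's own struct (no allocation). */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

/* Hypothetical pool: up to 8 threads sharing one FIFO of pending work. */
struct worker_pool {
	pthread_mutex_t lock;
	pthread_cond_t more;		/* signalled on new work or on stop */
	struct work *head, *tail;
	int stopping, nr_threads;
	pthread_t threads[8];
};

static void *worker_loop(void *data)
{
	struct worker_pool *pool = data;

	for (;;) {
		pthread_mutex_lock(&pool->lock);
		while (!pool->head && !pool->stopping)
			pthread_cond_wait(&pool->more, &pool->lock);
		struct work *w = pool->head;	/* oldest queued item, if any */
		if (w && !(pool->head = w->next))
			pool->tail = NULL;
		pthread_mutex_unlock(&pool->lock);
		if (!w)
			return NULL;		/* stopping and the queue is drained */
		w->fn(w);			/* run without the pool lock held */
	}
}

/* Starts up to nr (max 8) workers; returns how many actually started. */
static int pool_start(struct worker_pool *pool, int nr)
{
	pthread_mutex_init(&pool->lock, NULL);
	pthread_cond_init(&pool->more, NULL);
	pool->head = pool->tail = NULL;
	pool->stopping = pool->nr_threads = 0;
	while (pool->nr_threads < nr && pool->nr_threads < 8 &&
	       !pthread_create(&pool->threads[pool->nr_threads], NULL, worker_loop, pool))
		pool->nr_threads++;
	return pool->nr_threads;
}

/* Append at the tail so items are picked up in the order they were queued. */
static void pool_queue(struct worker_pool *pool, struct work *w)
{
	w->next = NULL;
	pthread_mutex_lock(&pool->lock);
	if (pool->tail)
		pool->tail->next = w;
	else
		pool->head = w;
	pool->tail = w;
	pthread_cond_signal(&pool->more);
	pthread_mutex_unlock(&pool->lock);
}

/* Let the workers drain whatever is still queued, then join them. */
static void pool_stop(struct worker_pool *pool)
{
	pthread_mutex_lock(&pool->lock);
	pool->stopping = 1;
	pthread_cond_broadcast(&pool->more);
	pthread_mutex_unlock(&pool->lock);
	for (int i = 0; i < pool->nr_threads; i++)
		pthread_join(pool->threads[i], NULL);
	assert(pool->head == NULL);	/* nothing was left behind */
	pthread_mutex_destroy(&pool->lock);
	pthread_cond_destroy(&pool->more);
}

/* Usage sketch: queue nr (<= 64) items that each bump a shared counter.
 * Returns how many ran, or -1 if the request could not be set up. */
static atomic_int demo_count;
static void count_one(struct work *w) { (void)w; atomic_fetch_add(&demo_count, 1); }

static int run_counter_demo(int nr_threads, int nr)
{
	struct worker_pool pool;
	struct work items[64];

	if (nr > 64 || pool_start(&pool, nr_threads) == 0)
		return -1;
	atomic_store(&demo_count, 0);
	for (int i = 0; i < nr; i++) {
		items[i].fn = count_one;
		pool_queue(&pool, &items[i]);
	}
	pool_stop(&pool);		/* workers drain the queue before exiting */
	return atomic_load(&demo_count);
}
```

Items are dequeued in FIFO order, but with more than one worker they can finish out of order; the ordered bio submission described above is a separate concern that this sketch does not model.  Separate write and endio pools would simply be two independent struct worker_pool instances, so a write callback that blocks can never occupy a thread the endio work needs.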

Signed-off-by: Chris Mason <chris.mason@oracle.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0f14697b..7daef8d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -359,7 +359,7 @@
 
 	kfree(sums);
 
-	return btrfs_map_bio(root, rw, bio, mirror_num);
+	return btrfs_map_bio(root, rw, bio, mirror_num, 1);
 }
 
 int btrfs_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
@@ -383,7 +383,7 @@
 				   inode, rw, bio, mirror_num,
 				   __btrfs_submit_bio_hook);
 mapit:
-	return btrfs_map_bio(root, rw, bio, mirror_num);
+	return btrfs_map_bio(root, rw, bio, mirror_num, 0);
 }
 
 int btrfs_readpage_io_hook(struct page *page, u64 start, u64 end)