dm snapshot: implement merge

Implement merge method for the snapshot origin to improve read
performance.

Without merge method, dm asks the upper layers to submit smallest possible
bios --- one page. Submitting such small bios impacts performance negatively
when reading or writing the origin device.

Without this patch, CPU consumption when reading the origin on lvm on md-raid0
was 6 to 12%, with this patch, it drops to 1 to 4%.

Note: in my testing, it actually degraded performance in some settings, I
traced it to Maxtor disks having problems with > 512-sector requests.
Reducing the number of sectors to /sys/block/sd*/queue/max_sectors_kb to
256 fixed the read performance. I think we don't have to care about weird
disks that actually degrade performance because of large requests being
sent to them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index a1f2ab5..96feada 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2171,6 +2171,21 @@
 	return 0;
 }
 
+static int origin_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
+			struct bio_vec *biovec, int max_size)
+{
+	struct dm_dev *dev = ti->private;
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	if (!q->merge_bvec_fn)
+		return max_size;
+
+	bvm->bi_bdev = dev->bdev;
+	bvm->bi_sector = bvm->bi_sector;
+
+	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
+}
+
 static int origin_iterate_devices(struct dm_target *ti,
 				  iterate_devices_callout_fn fn, void *data)
 {
@@ -2188,6 +2203,7 @@
 	.map     = origin_map,
 	.resume  = origin_resume,
 	.status  = origin_status,
+	.merge	 = origin_merge,
 	.iterate_devices = origin_iterate_devices,
 };