Optimize pattern verify

Similar to the patch last week, this optimizes the pattern verify
operation to use optimized library calls like memcmp(), falling back
to a byte-by-byte compare only when a miscompare needs to be located.

This relies on the same premise: the pattern is repeated as many times
as possible so that large compares can be done in a single call.  For a
single-byte pattern, setup fills the whole pattern space, and verify
assumes it is full (pattern_size becomes MAX_PATTERN_SIZE).
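
As a rough illustration of the two-phase compare described above, here
is a minimal standalone sketch.  It is not the code in verify.c: the
helper name find_pattern_mismatch() and its parameters are hypothetical,
and it assumes the pattern buffer holds at least pattern_size bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of fio): return the offset of the first
 * byte in buf that differs from the repeating pattern, or len if the
 * whole buffer matches.  start_mod is the pattern phase at buf[0].
 */
static size_t find_pattern_mismatch(const unsigned char *buf, size_t len,
				    const unsigned char *pattern,
				    size_t pattern_size, size_t start_mod)
{
	size_t i, size, mod;

	assert(pattern_size > 0);
	mod = start_mod % pattern_size;

	/* Fast path: compare up to one pattern copy per memcmp() call. */
	for (i = 0; i < len; i += size) {
		size = pattern_size - mod;
		if (size > len - i)
			size = len - i;
		if (memcmp(buf + i, pattern + mod, size))
			break;	/* mod still holds the phase at buf[i] */
		mod = 0;
	}

	/* Slow path: walk the failing chunk to find the exact byte. */
	for (; i < len; i++) {
		if (buf[i] != pattern[mod])
			return i;
		if (++mod == pattern_size)
			mod = 0;
	}
	return len;
}
```

The larger pattern_size is, the fewer memcmp() calls are needed, which
is presumably why a single-byte pattern is treated as a full pattern
space rather than compared one byte per call.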

Tested with a script that created an 8k file with a 4k block size,
corrupted the pattern one byte at a time, and ran a read pass after
each corruption to verify that miscompares were still found across the
whole range of the pattern.  This was done with pattern lengths of 1
and 3 bytes.
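
The same corrupt-one-byte-at-a-time idea can be sketched in memory.
This is only an analogue of the script, which ran fio against a real
file; verify_fn, reference_verify() and sweep_corruptions() are
hypothetical names, and the byte-wise verifier merely stands in for the
function under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical verifier: offset of first mismatch, or len if clean. */
typedef size_t (*verify_fn)(const unsigned char *buf, size_t len,
			    const unsigned char *pattern,
			    size_t pattern_size);

/* Byte-by-byte reference verifier, used here as the function under test. */
static size_t reference_verify(const unsigned char *buf, size_t len,
			       const unsigned char *pattern,
			       size_t pattern_size)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != pattern[i % pattern_size])
			return i;
	return len;
}

/*
 * Corrupt one byte at a time and count the offsets where the verifier
 * does not report exactly that byte.  Returns -1 if allocation fails.
 */
static long sweep_corruptions(verify_fn verify, size_t len,
			      const unsigned char *pattern,
			      size_t pattern_size)
{
	unsigned char *buf;
	long missed = 0;
	size_t i;

	assert(pattern_size > 0);
	buf = malloc(len);
	if (!buf)
		return -1;
	for (i = 0; i < len; i++)
		buf[i] = pattern[i % pattern_size];

	/* A clean buffer must verify before any corruption is injected. */
	if (verify(buf, len, pattern, pattern_size) != len)
		missed++;

	for (i = 0; i < len; i++) {
		buf[i] ^= 0xff;		/* inject a single-byte corruption */
		if (verify(buf, len, pattern, pattern_size) != i)
			missed++;
		buf[i] ^= 0xff;		/* restore before the next round */
	}

	free(buf);
	return missed;
}
```

A return value of 0 from sweep_corruptions() for an 8192-byte buffer
with 1-byte and 3-byte patterns would correspond to the result reported
above.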

In performance tests, verifies were about 8 times more efficient with
this patch than without it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/verify.c b/verify.c
index 9ee3bc4..eb8eddc 100644
--- a/verify.c
+++ b/verify.c
@@ -322,14 +322,27 @@
 	struct io_u *io_u = vc->io_u;
 	char *buf, *pattern;
 	unsigned int header_size = __hdr_size(td->o.verify);
-	unsigned int len, mod, i;
+	unsigned int len, mod, i, size, pattern_size;
 
 	pattern = td->o.verify_pattern;
+	pattern_size = td->o.verify_pattern_bytes;
+	if (pattern_size <= 1)
+		pattern_size = MAX_PATTERN_SIZE;
 	buf = (void *) hdr + header_size;
 	len = get_hdr_inc(td, io_u) - header_size;
-	mod = header_size % td->o.verify_pattern_bytes;
+	mod = header_size % pattern_size;
 
-	for (i = 0; i < len; i++) {
+	for (i = 0; i < len; i += size) {
+		size = pattern_size - mod;
+		if (size > (len - i))
+			size = len - i;
+		if (memcmp(buf + i, pattern + mod, size))
+			/* Let the slow compare find the first mismatch byte. */
+			break;
+		mod = 0;
+	}
+
+	for (; i < len; i++) {
 		if (buf[i] != pattern[mod]) {
 			unsigned int bits;