fs: prefetch inode data in dcache lookup

This reduces elapsed time for single-threaded git diff by 1.25% +/- 0.05%
on my 2s12c24t Westmere system and by 0.86% +/- 0.05% on my 2s8c Barcelona.
It does so by prefetching the important first cacheline of the inode while
we do the actual name compare and other operations on the dentry.
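
For illustration only, here is a minimal userspace sketch of the same
pattern. It is not the kernel code: struct toy_inode, struct toy_dentry and
toy_lookup() are hypothetical stand-ins, and the GCC/Clang builtin
__builtin_prefetch() is assumed in place of the kernel's prefetch(). The
point is that the prefetch hints are issued as soon as the pointers are
loaded, so the cache misses can overlap with the length check and the name
compare.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct inode and struct dentry. */
struct toy_inode {
	unsigned long ino;
};

struct toy_dentry {
	struct toy_dentry *next;	/* hash chain link */
	const char *name;
	size_t len;
	struct toy_inode *inode;	/* NULL for a negative entry */
};

/* Walk one hash chain, issuing prefetch hints before the name compare. */
static struct toy_dentry *toy_lookup(struct toy_dentry *head,
				     const char *name, size_t len)
{
	struct toy_dentry *d;

	for (d = head; d; d = d->next) {
		const char *tname = d->name;
		struct toy_inode *i = d->inode;

		/* Hints only: they do not fault and may be ignored. */
		__builtin_prefetch(tname);
		if (i)
			__builtin_prefetch(i);

		if (d->len != len)
			continue;
		if (memcmp(tname, name, len) == 0)
			return d;
	}
	return NULL;
}
```

Whether such a hint helps depends on the workload and the CPU, which is why
the numbers in this message are measured rather than assumed.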

There was no measurable slowdown in the single file stat case, or the creat
case (where negative dentries would be common).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
diff --git a/fs/dcache.c b/fs/dcache.c
index 9e6e6db..2a4ce7d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1793,6 +1793,9 @@
 		tlen = dentry->d_name.len;
 		tname = dentry->d_name.name;
 		i = dentry->d_inode;
+		prefetch(tname);
+		if (i)
+			prefetch(i);
 		/*
 		 * This seqcount check is required to ensure name and
 		 * len are loaded atomically, so as not to walk off the