[PATCH] x86_64: prefetch the mmap_sem in the fault path

In a micro-benchmark that stresses the page-fault path, the down_read_trylock
on the mmap_sem showed up quite high in the profile. It turns out this lock
bounces between CPUs quite a bit and is therefore often cache-cold. This patch
prefetches the lock (for write) as early as possible, before some other
somewhat expensive operations. With this patch, the down_read_trylock
basically fell out of the top of the profile.
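For illustration only, the intended ordering looks roughly like this (a
minimal sketch, not the actual fault handler; the function name and the
elided work are placeholders):

	#include <linux/prefetch.h>
	#include <linux/sched.h>

	static void fault_path_sketch(void)
	{
		struct task_struct *tsk = current;
		struct mm_struct *mm = tsk->mm;

		/* start pulling the rwsem cacheline in (exclusive state) now */
		prefetchw(&mm->mmap_sem);

		/*
		 * ... other early fault-path work (e.g. reading the fault
		 * address) overlaps with the prefetch, so the cacheline is
		 * hopefully warm by the time the lock is actually taken ...
		 */

		if (!down_read_trylock(&mm->mmap_sem))
			down_read(&mm->mmap_sem);

		/* ... look up the vma and handle the fault ... */

		up_read(&mm->mmap_sem);
	}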

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index de91e17..316c53d 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -314,11 +314,13 @@
 	unsigned long flags;
 	siginfo_t info;
 
+	tsk = current;
+	mm = tsk->mm;
+	prefetchw(&mm->mmap_sem);
+
 	/* get the address */
 	__asm__("movq %%cr2,%0":"=r" (address));
 
-	tsk = current;
-	mm = tsk->mm;
 	info.si_code = SEGV_MAPERR;