x86: micro-optimize __raw_read_trylock()

The current version of __raw_read_trylock() starts by decrementing the lock
and then reads its new value as a separate operation.

That makes three dereferences of the lock (read, write after the
subtraction, then read again), whereas a single atomic_dec_return() needs
only two pointer dereferences (read, write).
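
For illustration only, the new fast path can be sketched in userspace with
C11 atomics. This is a rough analogue, not the kernel implementation:
dec_return_sketch() and read_trylock_sketch() are hypothetical names, and
atomic_fetch_sub() stands in for the kernel's atomic_dec_return().

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's atomic_dec_return(). */
static int dec_return_sketch(atomic_int *v)
{
	/* atomic_fetch_sub() returns the old value, so subtract 1 again. */
	return atomic_fetch_sub(v, 1) - 1;
}

/* Returns 1 if the read lock was taken, 0 otherwise. */
static int read_trylock_sketch(atomic_int *count)
{
	/* Decrement and test the result in one atomic operation. */
	if (dec_return_sketch(count) >= 0)
		return 1;

	/* The attempt failed: undo the decrement. */
	atomic_fetch_add(count, 1);
	return 0;
}
```

The decrement and the test now use the value returned by the same atomic
operation, so no second read of the counter is needed.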

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d17c919..4d3dcc5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -329,8 +329,7 @@
 {
 	atomic_t *count = (atomic_t *)lock;
 
-	atomic_dec(count);
-	if (atomic_read(count) >= 0)
+	if (atomic_dec_return(count) >= 0)
 		return 1;
 	atomic_inc(count);
 	return 0;