x86: remove extra barriers from load_gs_base()

Impact: optimization

mb() generates an mfence instruction, which is not needed here.  Only
a compiler barrier is needed, and that is handled by the memory clobber
in the wrmsrl function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 32c30b0..794234e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -397,10 +397,7 @@
 
 static inline void load_gs_base(int cpu)
 {
-	/* Memory clobbers used to order pda/percpu accesses */
-	mb();
 	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
-	mb();
 }
 #endif