mlx4_core: Use mmiowb() to avoid firmware commands getting jumbled up

Firmware commands are sent to the HCA by writing multiple words to a
command register block.  Access to this block of registers is
serialized with a mutex.  However, on large SGI systems writes to the
register block may be reordered within the system interconnect and
reach the HCA in a different order than they were issued (even with
the mutex).  Fix this by adding an mmiowb() before dropping the mutex.

This bug was observed with real workloads with the similar FW command
code in the mthca driver, and adding the mmiowb() as in commit
66547550 ("IB/mthca: Use mmiowb() to avoid firmware commands getting
jumbled up") was confirmed to fix the problems, so we should add the
same fix to mlx4.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
diff --git a/drivers/net/mlx4/cmd.c b/drivers/net/mlx4/cmd.c
index b540820..db49051 100644
--- a/drivers/net/mlx4/cmd.c
+++ b/drivers/net/mlx4/cmd.c
@@ -184,6 +184,13 @@
 					       (event ? (1 << HCR_E_BIT) : 0)	|
 					       (op_modifier << HCR_OPMOD_SHIFT) |
 					       op),			  hcr + 6);
+
+	/*
+	 * Make sure that our HCR writes don't get mixed in with
+	 * writes from another CPU starting a FW command.
+	 */
+	mmiowb();
+
 	cmd->toggle = cmd->toggle ^ 1;
 
 	ret = 0;