syslet: add proper read barrier between user_tail and completion read

Also fix a bug in ring indexing: the index must be masked with the real
ring size mask, not the io depth.
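
For illustration only, a minimal sketch of both points. The names
RING_SIZE, RING_MASK, IO_DEPTH, ring_slot() and reap_one() are
hypothetical and not taken from the engine code, and the ring is assumed
to be a power of two at least as large as the io depth. The
read_barrier() here is a portable compiler-builtin stand-in for the
per-arch macro, not the definition added by this patch.

```c
#include <assert.h>

/* Hypothetical sizes: the ring is a power of two >= the io depth. */
#define IO_DEPTH	5u
#define RING_SIZE	8u
#define RING_MASK	(RING_SIZE - 1)

/* Stand-in for the arch-specific read_barrier() (GCC/Clang builtin). */
#define read_barrier()	__atomic_thread_fence(__ATOMIC_ACQUIRE)

/* Map a free-running index to a slot using the real ring size mask. */
static unsigned int ring_slot(unsigned int index)
{
	/* The mask trick is only valid for power-of-two ring sizes. */
	assert((RING_SIZE & RING_MASK) == 0);
	return index & RING_MASK;
}

/* Read the tail, then the completion entry it points at, in that order. */
static int reap_one(const int *ring, unsigned int *user_tail)
{
	unsigned int tail = *user_tail;
	int ev;

	/* Keep the entry load from being reordered before the tail load. */
	read_barrier();

	ev = ring[ring_slot(tail)];
	*user_tail = tail + 1;
	return ev;
}
```

Masking with the io depth instead (index & IO_DEPTH) would select the
wrong slot whenever the depth is not of the form 2^n - 1, which is the
class of bug described above.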

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
diff --git a/arch/arch-ppc.h b/arch/arch-ppc.h
index 0a23c01..9783131 100644
--- a/arch/arch-ppc.h
+++ b/arch/arch-ppc.h
@@ -20,6 +20,14 @@
 
 #define nop	do { } while (0)
 
+#ifdef __powerpc64__
+#define read_barrier()	\
+	__asm__ __volatile__ ("lwsync" : : : "memory")
+#else
+#define read_barrier()	\
+	__asm__ __volatile__ ("sync" : : : "memory")
+#endif
+
 static inline int __ilog2(unsigned long bitmask)
 {
 	int lz;