RAID: add tilegx SIMD implementation of raid6

This change adds TILE-Gx SIMD instructions to the software raid
(md), modeling the Altivec implementation. This is only for Syndrome
generation; there is more that could be done to improve recovery,
as in the recent Intel SSE3 recovery implementation.

The code unrolls 8 times; this turns out to be the best on tilegx
hardware among the set 1, 2, 4, 8 or 16.  The code reads one
cache-line of data from each disk, stores P and Q then goes to the
next cache-line.

The test code in sys/linux/lib/raid6/test reports 2008 MB/s data
read rate for syndrome generation using 18 disks (16 data and 2
parity). It was 1512 MB/s before this SIMD optimizations. This is
running on 1 core with all the data in cache.

This is based on the paper The Mathematics of RAID-6.
(http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf).

Signed-off-by: Ken Steele <ken@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 9f7c184..e5e9021 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -5,6 +5,7 @@
 
 raid6_pq-$(CONFIG_X86) += recov_ssse3.o recov_avx2.o mmx.o sse1.o sse2.o avx2.o
 raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o
+raid6_pq-$(CONFIG_TILEGX) += tilegx8.o
 
 hostprogs-y	+= mktables
 
@@ -70,6 +71,11 @@
 $(obj)/altivec8.c:   $(src)/altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
+targets += tilegx8.c
+$(obj)/tilegx8.c:   UNROLL := 8
+$(obj)/tilegx8.c:   $(src)/tilegx.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
 quiet_cmd_mktable = TABLE   $@
       cmd_mktable = $(obj)/mktables > $@ || ( rm -f $@ && exit 1 )