4x8 GEMM and IGEMM microkernels for Cortex-A55.

7.8% faster end-to-end on MobileNet v2.
Was
  f32_gemm_4x8__aarch64_neonfma_cortex_a53/mobilenet_v2/real_time 132632 us
Now
  f32_gemm_4x8__aarch64_neonfma_cortex_a55/mobilenet_v2/real_time 123029 us

The rev 1 version of Cortex-A55 can co-issue a 64-bit vector load with each
FMA, so rearrange the Cortex-A53 microkernel so that 3 FMAs are paired with
2 loads and an INS.

PiperOrigin-RevId: 301202721
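As a quick sanity check on the quoted figure, the sketch below recomputes the improvement from the two benchmark times in the message. It assumes "7.8% faster" means the ratio of old time to new time; the variable names are illustrative and are not part of XNNPACK or its benchmark tooling.

```python
# Benchmark times quoted in the commit message, in microseconds.
old_us = 132632  # f32_gemm_4x8__aarch64_neonfma_cortex_a53, mobilenet_v2
new_us = 123029  # f32_gemm_4x8__aarch64_neonfma_cortex_a55, mobilenet_v2

# Speedup as a throughput ratio (assumed reading of "faster").
speedup = old_us / new_us

# The same change expressed as a reduction in wall-clock time.
time_reduction = (old_us - new_us) / old_us

print(f"throughput gain: {(speedup - 1) * 100:.1f}%")
print(f"time reduction:  {time_reduction * 100:.1f}%")
```

Under that reading the throughput gain rounds to the 7.8% stated above, while the same measurement expressed as a reduction in run time is slightly smaller, about 7.2%.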

commit: 8fb90559d6a52162ad2b9c34e5d84a989734cd00
author: Frank Barchard <fbarchard@google.com> Mon Mar 16 11:36:09 2020 -0700
committer: XNNPACK Team <xnnpack-github-robot@google.com> Mon Mar 16 11:36:48 2020 -0700
tree: f1a70e06fe367f325f73db8811a43286cd3fee93
parent: 99103dca8efe240c55de9173838fa34f7e12ba8f
diff --git a/test/f32-gemminc.yaml b/test/f32-gemminc.yaml
index bc472f4..cb4670a 100644
--- a/test/f32-gemminc.yaml
+++ b/test/f32-gemminc.yaml
@@ -18,6 +18,10 @@
   k-block: 4
   pipelined: true
   assembly: true
+- name: xnn_f32_gemminc_ukernel_4x8__aarch64_neonfma_cortex_a55
+  k-block: 4
+  pipelined: true
+  assembly: true
 - name: xnn_f32_gemminc_ukernel_4x8__aarch64_neonfma_cortex_a57
   k-block: 8
   pipelined: true