Cortex A7 microkernel based on LD64 with PLD added.

3.2% faster end to end on MobileNet v2.

PLD instructions moved to end of loop to improve VMLA performance.
pld_ld64 microkernel removed.

Was
MobileNetV2_F32/XNNPACK/T:1/real_time 511808 us 509497 us 14 FLOPS=1.17534G/s FPS=1.95386/s Freq=1.1904G
Now
MobileNetV2_F32/XNNPACK/T:1/real_time 496032 us 496007 us 14 FLOPS=1.21273G/s FPS=2.016/s Freq=1.1904G

PiperOrigin-RevId: 321691241

commit: 490febefc20fb13d66e98bdbd25615aaba276236
author: Frank Barchard <fbarchard@google.com> Thu Jul 16 18:42:17 2020 -0700
committer: XNNPACK Team <xnnpack-github-robot@google.com> Thu Jul 16 18:43:35 2020 -0700
tree: 13c2a98b341b3e0576676bb18c92baff740f288f
parent: 1483c53321949ab3890114c76fafc50816db252f
diff --git a/test/f32-igemm-minmax.yaml b/test/f32-igemm-minmax.yaml
index 8274224..acfb79d 100644
--- a/test/f32-igemm-minmax.yaml
+++ b/test/f32-igemm-minmax.yaml

@@ -34,7 +34,7 @@
   k-block: 2
   pipelined: false
   assembly: true
-- name: xnn_f32_igemm_minmax_ukernel_4x8__aarch32_neon_pld_ld64
+- name: xnn_f32_igemm_minmax_ukernel_4x8__aarch32_neon_cortex_a7
   k-block: 2
   pipelined: false
   assembly: true