- e349124 fp32 IGEMM 4x8 and 6x8 ld64 microkernels by Frank Barchard · 3 years, 5 months ago
- 7c9f1f9 Replace // with # for lines that only contain a comment. by Frank Barchard · 3 years, 5 months ago
- 104ae5e Use ISA-specific layouts in F32 [I]GEMM & DWCONV microkernels by Marat Dukhan · 3 years, 6 months ago
- 76f43f0 Apply consistent formatting to assembly by Frank Barchard · 3 years, 6 months ago
- cbfa338 text format white space of prefetch instruction on ARM microkernels by Frank Barchard · 3 years, 6 months ago
- 802fcae Additional SSE/SSE2 GEMM/IGEMM microkernels by Marat Dukhan · 4 years ago
- 0725b8d Rename WebAssembly SIMD source files and functions with x86 or arm suffix after wasmsimd by Frank Barchard · 4 years ago
- 3b26206 Renumber labels in assembly sequentially by Frank Barchard · 4 years, 1 month ago
- 115d3e2 Remove PSIMD variants of GEMM and IGEMM microkernels by Marat Dukhan · 4 years, 4 months ago
- 490febe Cortex A7 microkernel based on LD64 with PLD added. 3.2% faster in end to end mobilenet v2 by Frank Barchard · 4 years, 4 months ago
- 688f6d8 Unify x86 and ARM flavors of WAsm SIMD GEMM/IGEMM/DWCONV with RELU by Marat Dukhan · 4 years, 4 months ago
- e39e646 WAsm SIMD versions of [I]GEMM microkernels with NR=2 by Marat Dukhan · 4 years, 4 months ago
- efc1014 ld64 aarch32 GEMM 4x8 microkernel do all loads before MLA by Frank Barchard · 4 years, 4 months ago
- d6ca9d8 4x8-minmax-aarch32-neon-pld-cortex-a75 Fix prefetch offset to not skip a cache line by Frank Barchard · 4 years, 5 months ago
- 569561d Generate PLD variation of AARCH32 LD64 by Frank Barchard · 4 years, 5 months ago
- 802808c GEMM/IGEMM microkernels with alternative activations in WAsm SIMD by Marat Dukhan · 4 years, 5 months ago
- ac014d7 DWCONV microkernels in WAsm SIMD intrinsics by Marat Dukhan · 4 years, 5 months ago
- 1bbf96b GEMM/IGEMM implementations in WAsm SIMD intrinsics by Marat Dukhan · 4 years, 5 months ago
- 016e586 iOS use Cortex-A75 microkernel which avoids x18 register by Frank Barchard · 4 years, 5 months ago
- 6724218 Avoid x18 register by Frank Barchard · 4 years, 5 months ago
- 909564c Update comment for x18 register by Frank Barchard · 4 years, 5 months ago
- b2217dd Disable tsan for micro-kernels which read out-of-bounds by Marat Dukhan · 4 years, 6 months ago
- 467f636 Fused [I]GEMM+RELU micro-kernels by Marat Dukhan · 4 years, 6 months ago
- b339045 Comment change rename clamp params to params by Frank Barchard · 4 years, 6 months ago
- f5cc7e7 GEMM aarch64 microkernels use LDP to fetch param and cn_stride by Frank Barchard · 4 years, 7 months ago
- c4668ed Comment fix for mr <= 4 by Frank Barchard · 4 years, 7 months ago
- f196d01 Support CMake build with MSVC by Marat Dukhan · 4 years, 7 months ago
- 13a93fa 1x12 Cortex A64 F32 GEMM use 2 sets of accumulators by Frank Barchard · 4 years, 7 months ago
- 3cb54f9 1x8 LD64 F32 GEMM by Frank Barchard · 4 years, 7 months ago
- 163a7e6 Scalar & WAsm GEMM/IGEMM/DWCONV micro-kernels without activation by Marat Dukhan · 4 years, 7 months ago
- de06f49 Add MINMAX suffix to GEMM/IGEMM/DWCONV/PPMM micro-kernel names by Marat Dukhan · 4 years, 7 months ago
- 1c58711 Add MINMAX suffix to filenames of GEMM/IGEMM/PPMM/DWCONV micro-kernels by Marat Dukhan · 4 years, 7 months ago
- eb09a6b Rename F32/U8 output params to minmax params by Marat Dukhan · 4 years, 7 months ago
- a51cf48 Unify layout of min/max parameters by Marat Dukhan · 4 years, 7 months ago
- 0d1052c iOS 6x8 microkernel based on Cortex-A75 but with X18 avoided. by Frank Barchard · 4 years, 8 months ago
- 6f8c966 Use x13 instead of x18. by Frank Barchard · 4 years, 8 months ago
- 8fb9055 4x8 GEMM and IGEMM microkernels for Cortex A55. 7.8% faster for e2e mobile net v2. by Frank Barchard · 4 years, 8 months ago
- 91e1999 6x8 GEMM and IGEMM microkernels for Cortex A55. 9% faster end to end: by Frank Barchard · 4 years, 8 months ago
- b00004d 4x2c4 GEMM micro-kernels for PSIMD and SSE by Marat Dukhan · 4 years, 9 months ago
- c1a0697 Replace load with mov for ks in xnn_f32_igemm_ukernel_4x8__aarch32_neon_cortex_a75 by Frank Barchard · 4 years, 9 months ago
- 8155854 Direct branch to source remainder handler for GEMM/IGEMM. by Frank Barchard · 4 years, 9 months ago
- 79ade18 LD64 microkernels branch directly to remainder if less than 2 channels. by Frank Barchard · 4 years, 9 months ago
- d6ebf0c 6x8 a53 use X8 for GPR shadow register. Eliminate GPR push/pop by Frank Barchard · 4 years, 10 months ago
- 534375d A53 GEMM / IGEMM kernel prefetches adjust by 1 by Frank Barchard · 4 years, 10 months ago
- c03b2bd 4x12 A53 GEMM and IGEMM use X8 for temp GPR by Frank Barchard · 4 years, 10 months ago
- 324f2bb 4x8 A53 GEMM use X4 for temp GPR Saves a push/pop of X19. by Frank Barchard · 4 years, 10 months ago
- 7693acf 4x8 Cortex-A53 GEMM / IGEMM use 1 GPR instead of 2. by Frank Barchard · 4 years, 10 months ago
- f884a7b 6X8 Cortex-A53 GEMM use 1 GPR instead of 2. by Frank Barchard · 4 years, 10 months ago
- 4cd8907 4x12 A53 kernel use prefetches on A by Frank Barchard · 4 years, 10 months ago
- b177732 Remove prefetch of output buffer from A53 kernels. by Frank Barchard · 4 years, 10 months ago
- 279908a A75 / A53 aarch32 epilogue reordered by B the same as main loop. by Frank Barchard · 4 years, 11 months ago
- 387c2d1 Generate A57 micro-kernels from A75 source. by Frank Barchard · 5 years ago
- 0090f5b 4x8 FMA sorted by B to match load order by Frank Barchard · 5 years ago
- abf8154 Code generator for PLD and non-PLD versions of aarch32 4x8 Cortex-A75 kernel by Frank Barchard · 5 years ago
- 07efec4 Run generator for A73 kernel NOP by Frank Barchard · 5 years ago
- 73ccfb4 Move SUBS to 2nd instruction of clamp code. by Frank Barchard · 5 years ago
- c659140 a73 kernel move SUBS before clamp and add NOP before branch by Frank Barchard · 5 years ago
- d94b856 Rename strided gemm and igemm fma3 broadcasts. by Ashkan Aliabadi · 5 years ago
- 2712132 FMA3 microkernels with 4-wide shuffle by Marat Dukhan · 5 years ago
- eccfd71 NR=16 GEMM and IGEMM micro-kernels in AVX and FMA3 implementations by Marat Dukhan · 5 years ago
- cfb3134 Polyfill missing _cvtu32_mask16 intrinsic on old gcc by Marat Dukhan · 5 years ago
- 6383f49 Assembly GEMM kernel NC loop use SUBS instead of CMP+SUBS by Frank Barchard · 5 years ago
- 436ebe6 Separate WAsm micro-kernels and scalar micro-kernels by Marat Dukhan · 5 years ago
- 0f349c4 AVX512F implementation of GEMM & IGEMM micro-kernels by Marat Dukhan · 5 years ago
- c72fa1e Use XNN_ARCH_* macros for architecture-specific parts in micro-kernels by Marat Dukhan · 5 years ago
- 69172d9 6x8 ld128 GEMM microkernels by Frank Barchard · 5 years ago
- 40a672f Move generated micro-kernels into a subdirectory by Marat Dukhan · 5 years ago