  1. c92034d Define constants for +/- infinity to check for clamping in JIT generators by Zhi An Ng · 2 years, 4 months ago
  2. eb7256b Port F32 GEMM A75 1x8 microkernel to JIT and specialize for min/max, add tests and benchmarks by Zhi An Ng · 2 years, 4 months ago
  3. f0f374f Rename f32-gemm/6x8-aarch64-neonfma-prfm-cortex-a75.cc to remove prfm from file name by Zhi An Ng · 2 years, 4 months ago
  4. 237473f Include missing <limits> header in 4x8 F32 GEMM codegen for A53 by Marat Dukhan · 2 years, 4 months ago
  5. 3e3124e Make void* params argument of JIT generators const by Zhi An Ng · 2 years, 4 months ago
  6. 5ebe686 Specialize 6x8-aarch64-neonfma-cortex-a75 on min/max params by Zhi An Ng · 2 years, 4 months ago
  7. f9fc9ec Integrate JIT generated GEMM microkernels into create_convolution2d_nhwc by Zhi An Ng · 2 years, 5 months ago
  8. fbd67a7 Pad K to a multiple of SR in GEMM/IGEMM microkernels by Marat Dukhan · 2 years, 5 months ago
  9. 8b758bf Integrate JIT generated GEMM microkernels into create_convolution2d_nhwc by XNNPACK Team · 2 years, 5 months ago
  10. df51e11 Integrate JIT generated GEMM microkernels into create_convolution2d_nhwc by Zhi An Ng · 2 years, 5 months ago
  11. c607028 Remove wb from JIT aarch32 instructions, use mem operand and ++ instead by Zhi An Ng · 2 years, 5 months ago
  12. adf087d Remove 3 blank lines after last jit assembly instruction before end of function by Frank Barchard · 2 years, 5 months ago
  13. 752b980 Avoid importing the entire xnnpack namespace in aarch32 assembler by Zhi An Ng · 2 years, 5 months ago
  14. c2e2da8 Fix conversion script for aarch64 assembly kernels and convert a single F32 GEMM as a test by Zhi An Ng · 2 years, 5 months ago
  15. e1ff738 Update assembly register usage comments. by Frank Barchard · 2 years, 5 months ago
  16. 70ea0a2 Specialize F32 GEMM A53 JIT microkernel for min/max params by Zhi An Ng · 2 years, 5 months ago
  17. 0ec25cf Duplicate test methods in gemm-microkernel-test for JIT codegen, update IGEMM generator signature and test generation script. by Zhi An Ng · 2 years, 5 months ago
  18. e7225eb Specialize F32 GEMM (a53) on kc by Zhi An Ng · 2 years, 5 months ago
  19. 01f6aee Add unreachable check for F32 GEMM a53 generator by Zhi An Ng · 2 years, 5 months ago
  20. 13599f3 Specialize F32 GEMM (a53) on nc by Zhi An Ng · 2 years, 5 months ago
  21. 83844ae Change JIT generator signature to accept nc and kc to specialize on those values by Zhi An Ng · 2 years, 5 months ago
  22. 13b57dd Add more converted microkernels used in init.c. by Zhi An Ng · 2 years, 5 months ago
  23. 0a40541 Use FMA instructions for scalar microkernels on RISC-V by Marat Dukhan · 2 years, 5 months ago
  24. 7873586 Rename PLD to PRFM for aarch32 microkernels. by Frank Barchard · 2 years, 5 months ago
  25. 0f28193 Minor optimization in F32 GEMM/IGEMM AVX512F microkernels by Marat Dukhan · 2 years, 6 months ago
  26. c83ef3b Refactor F32 MINMAX parameters for WAsm SIMD by Marat Dukhan · 2 years, 6 months ago
  27. a2f1891 Add _prfm to names on Neon microkernels in a consistent way. by Frank Barchard · 2 years, 6 months ago
  28. b43b47a Add a script to convert existing assembly microkernels to JIT codegen. by Zhi An Ng · 2 years, 6 months ago
  29. 7be427a Disable MSan and TSan in most microkernels with Out-of-Bounds reads by Marat Dukhan · 2 years, 6 months ago
  30. 0bf8afa Leverage f32x4.pmin and f32x4.pmax WAsm SIMD instructions by Marat Dukhan · 2 years, 9 months ago
  31. b7a7c30 NEON GEMM/IGEMM microkernels change store/dup to 2 of each by Frank Barchard · 2 years, 10 months ago
  32. 4810905 Leverage v128.const WAsm SIMD instruction by Marat Dukhan · 2 years, 10 months ago
  33. 2dac7bb Unify on wasm_f64x2_splat(0.0) to materialize zero SIMD vector in WAsm by Marat Dukhan · 2 years, 11 months ago
  34. 2837e8b Remove 0 offset from loads. by Frank Barchard · 3 years ago
  35. ee029b2 Replace deprecated wasm_simd128.h intrinsics with new versions by Marat Dukhan · 3 years ago
  36. a03020a Run generator scripts Sort names in BUILD files by Frank Barchard · 3 years ago
  37. 167d667 Comment change x8 is a temporary params pointer by Frank Barchard · 3 years ago
  38. 143a110 Rename GEMM/IGEMM microkernels from Cortex-A57/A75 to prfm_cortex_a75 by Frank Barchard · 3 years ago
  39. e349124 fp32 IGEMM 4x8 and 6x8 ld64 microkernels by Frank Barchard · 3 years ago
  40. 7c9f1f9 Replace // with # for lines that only contain a comment. by Frank Barchard · 3 years ago
  41. 104ae5e Use ISA-specific layouts in F32 [I]GEMM & DWCONV microkernels by Marat Dukhan · 3 years, 1 month ago
  42. 76f43f0 Apply consistent formatting to assembly by Frank Barchard · 3 years, 1 month ago
  43. cbfa338 text format white space of prefetch instruction on ARM microkernels by Frank Barchard · 3 years, 1 month ago
  44. 802fcae Additional SSE/SSE2 GEMM/IGEMM microkernels by Marat Dukhan · 3 years, 6 months ago
  45. 0725b8d Rename WebAssembly SIMD source files and functions with x86 or arm suffix after wasmsimd by Frank Barchard · 3 years, 6 months ago
  46. 3b26206 Renumber labels in assembly sequentially by Frank Barchard · 3 years, 9 months ago
  47. 115d3e2 Remove PSIMD variants of GEMM and IGEMM microkernels by Marat Dukhan · 4 years ago
  48. 490febe Cortex A7 microkernel based on LD64 with PLD added. 3.2% faster in end to end mobilenet v2 by Frank Barchard · 4 years ago
  49. 688f6d8 Unify x86 and ARM flavors of WAsm SIMD GEMM/IGEMM/DWCONV with RELU by Marat Dukhan · 4 years ago
  50. e39e646 WAsm SIMD versions of [I]GEMM microkernels with NR=2 by Marat Dukhan · 4 years ago
  51. efc1014 ld64 aarch32 GEMM 4x8 microkernel do all loads before MLA by Frank Barchard · 4 years ago
  52. d6ca9d8 4x8-minmax-aarch32-neon-pld-cortex-a75 Fix prefetch offset to not skip a cache line by Frank Barchard · 4 years ago
  53. 569561d Generate PLD variation of AARCH32 LD64 by Frank Barchard · 4 years ago
  54. 802808c GEMM/IGEMM microkernels with alternative activations in WAsm SIMD by Marat Dukhan · 4 years ago
  55. ac014d7 DWCONV microkernels in WAsm SIMD intrinsics by Marat Dukhan · 4 years ago
  56. 1bbf96b GEMM/IGEMM implementations in WAsm SIMD intrinsics by Marat Dukhan · 4 years ago
  57. 016e586 iOS use Cortex-A75 microkernel which avoids x18 register by Frank Barchard · 4 years ago
  58. 6724218 Avoid x18 register by Frank Barchard · 4 years ago
  59. 909564c Update comment for x18 register by Frank Barchard · 4 years ago
  60. b2217dd Disable tsan for micro-kernels which read out-of-bounds by Marat Dukhan · 4 years, 1 month ago
  61. 467f636 Fused [I]GEMM+RELU micro-kernels by Marat Dukhan · 4 years, 1 month ago
  62. 32f9381 4x4 LD64 GEMM microkernel in AArch32+VFP assembly by Marat Dukhan · 4 years, 1 month ago
  63. 3b98f6b 4x4 LD64 GEMM+MINMAX microkernel in AArch32+VFP assembly by Marat Dukhan · 4 years, 1 month ago
  64. b339045 Comment change rename clamp params to params by Frank Barchard · 4 years, 1 month ago
  65. f5cc7e7 GEMM aarch64 microkernels use LDP to fetch param and cn_stride by Frank Barchard · 4 years, 2 months ago
  66. c4668ed Comment fix for mr <= 4 by Frank Barchard · 4 years, 2 months ago
  67. f196d01 Support CMake build with MSVC by Marat Dukhan · 4 years, 2 months ago
  68. 13a93fa 1x12 Cortex A64 F32 GEMM use 2 sets of accumulators by Frank Barchard · 4 years, 2 months ago
  69. 3cb54f9 1x8 LD64 F32 GEMM by Frank Barchard · 4 years, 2 months ago
  70. 163a7e6 Scalar & WAsm GEMM/IGEMM/DWCONV micro-kernels without activation by Marat Dukhan · 4 years, 2 months ago
  71. de06f49 Add MINMAX suffix to GEMM/IGEMM/DWCONV/PPMM micro-kernel names by Marat Dukhan · 4 years, 2 months ago
  72. 1c58711 Add MINMAX suffix to filenames of GEMM/IGEMM/PPMM/DWCONV micro-kernels by Marat Dukhan · 4 years, 2 months ago
  73. eb09a6b Rename F32/U8 output params to minmax params by Marat Dukhan · 4 years, 2 months ago
  74. a51cf48 Unify layout of min/max parameters by Marat Dukhan · 4 years, 2 months ago
  75. 0d1052c iOS 6x8 microkernel based on Cortex-A75 but with X18 avoided. by Frank Barchard · 4 years, 3 months ago
  76. 6f8c966 Use x13 instead of x18. by Frank Barchard · 4 years, 3 months ago
  77. 8fb9055 4x8 GEMM and IGEMM microkernels for Cortex A55. 7.8% faster for e2e mobile net v2. by Frank Barchard · 4 years, 3 months ago
  78. 36053aa 4x8 AARCH32 GEMM/IGEMM avoid r2 push/pop. by Frank Barchard · 4 years, 3 months ago
  79. b7dd29e 4x8 GEMM and IGEMM microkernels for AARCH32 Cortex A55. 11.5% faster end to end: by Frank Barchard · 4 years, 3 months ago
  80. f32ae34 Unify the value of $ABC variable across all templates by Marat Dukhan · 4 years, 3 months ago
  81. 91e1999 6x8 GEMM and IGEMM microkernels for Cortex A55. 9% faster end to end: by Frank Barchard · 4 years, 3 months ago
  82. b00004d 4x2c4 GEMM micro-kernels for PSIMD and SSE by Marat Dukhan · 4 years, 4 months ago
  83. c1a0697 Replace load with mov for ks in xnn_f32_igemm_ukernel_4x8__aarch32_neon_cortex_a75 by Frank Barchard · 4 years, 4 months ago
  84. 9b499d6 Load parameters in order of usage. by Frank Barchard · 4 years, 4 months ago
  85. 8155854 Direct branch to source remainder handler for GEMM/IGEMM. by Frank Barchard · 4 years, 4 months ago
  86. 79ade18 LD64 microkernels branch directly to remainder if less than 2 channels. by Frank Barchard · 4 years, 4 months ago
  87. d6ebf0c 6x8 a53 use X8 for GPR shadow register. Eliminate GPR push/pop by Frank Barchard · 4 years, 5 months ago
  88. 534375d A53 GEMM / IGEMM kernel prefetches adjust by 1 by Frank Barchard · 4 years, 5 months ago
  89. c03b2bd 4x12 A53 GEMM and IGEMM use X8 for temp GPR by Frank Barchard · 4 years, 5 months ago
  90. 324f2bb 4x8 A53 GEMM use X4 for temp GPR Saves a push/pop of X19. by Frank Barchard · 4 years, 5 months ago
  91. 7693acf 4x8 Cortex-A53 GEMM / IGEMM use 1 GPR instead of 2. by Frank Barchard · 4 years, 5 months ago
  92. f884a7b 6X8 Cortex-A53 GEMM use 1 GPR instead of 2. by Frank Barchard · 4 years, 5 months ago
  93. 54afb13 Use 2 GPR registers instead of 4 for GPR loads. by Frank Barchard · 4 years, 5 months ago
  94. 4cd8907 4x12 A53 kernel use prefetches on A by Frank Barchard · 4 years, 5 months ago
  95. c01d8a4 A53 aarch32 pipelined. 19.2% faster 4x8 GEMM, 9.3% faster end to end by Frank Barchard · 4 years, 5 months ago
  96. b177732 Remove prefetch of output buffer from A53 kernels. by Frank Barchard · 4 years, 5 months ago
  97. 279908a A75 / A53 aarch32 epilogue reordered by B the same as main loop. by Frank Barchard · 4 years, 6 months ago
  98. 387c2d1 Generate A57 micro-kernels from A75 source. by Frank Barchard · 4 years, 6 months ago
  99. 005feb8 A53 push r1, r2 so they can be used as scratch. Reorder FMA by B by Frank Barchard · 4 years, 6 months ago
  100. 0090f5b 4x8 FMA sorted by B to match load order by Frank Barchard · 4 years, 6 months ago