  1. 5c37527 Make SSE2 microkernels consistent with neon zip microkernels. by Alan Kelly · 2 years, 4 months ago
  2. f2b233b Make SSE2 microkernels consistent with neon zip microkernels. - DEC is now MOV by Alan Kelly · 2 years, 4 months ago
  3. 8b758bf Integrate JIT generated GEMM microkernels into create_convolution2d_nhwc by XNNPACK Team · 2 years, 4 months ago
  4. 64cb10f Guard JIT-related structs and functionality behind XNN_PLATFORM_JIT by XNNPACK Team · 2 years, 4 months ago
  5. c9a2e74 Guard JIT-related structs and functionality behind XNN_PLATFORM_JIT by Zhi An Ng · 2 years, 4 months ago
  6. df51e11 Integrate JIT generated GEMM microkernels into create_convolution2d_nhwc by Zhi An Ng · 2 years, 4 months ago
  7. 15dd611 Check code_buffer capacity before attempting to release it by Zhi An Ng · 2 years, 4 months ago
  8. c607028 Remove wb from JIT aarch32 instructions, use mem operand and ++ instead by Zhi An Ng · 2 years, 4 months ago
  9. d236074 Add F32 GEMM 6x8 aarch64 neonfma cortex a75 JIT microkernel to benchmark by Zhi An Ng · 2 years, 4 months ago
  10. fc67a86 Fix encoding of prfm by Zhi An Ng · 2 years, 4 months ago
  11. 6cc5b48 QS8/QC8 4x8 dot product IGEMM AArch32 microkernel for Cortex A55 by Frank Barchard · 2 years, 4 months ago
  12. 2269ac8 Add default cases for switch, GCC warns that control reaches the end of non-void function. by Zhi An Ng · 2 years, 4 months ago
  13. d2bea50 Remove default member initializer for VRegister and ScalarVRegister so that we can aggregate initialize them (on GCC) by Zhi An Ng · 2 years, 4 months ago
  14. c2f62ea Remove redundant closing brace in CMakeLists by Marat Dukhan · 2 years, 4 months ago
  15. 870108c QS8/QC8 4x8 dot product IGEMM AArch32 microkernel for Cortex A55 by Frank Barchard · 2 years, 4 months ago
  16. e8fd444 QS8 IGEMM AArch64 LD64 round KC up to multiple of 4 before saving it on stack by Frank Barchard · 2 years, 4 months ago
  17. adf087d Remove 3 blank lines after last jit assembly instruction before end of function by Frank Barchard · 2 years, 4 months ago
  18. 773458c Change return type for assembler functions to void to simplify code, move emit32 into common assembler by Zhi An Ng · 2 years, 4 months ago
  19. 752b980 Avoid importing the entire xnnpack namespace in aarch32 assembler by Zhi An Ng · 2 years, 4 months ago
  20. c2e2da8 Fix conversion script for aarch64 assembly kernels and convert a single F32 GEMM as a test by Zhi An Ng · 2 years, 4 months ago
  21. 4a1c6a8 Implement ldp (d registers) offset and post index for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  22. 193f4e1 Disable QU8 dot product for AArch32 IOS by Frank Barchard · 2 years, 4 months ago
  23. 048704d Implement stp (q registers) offset and post indexed for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  24. 3cec451 Implement tst (immediate) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  25. 8709ac9 Implement csel for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  26. ba5091f Enable QC8 4x8 dot product GEMM AArch32 microkernel for Cortex A55 by Frank Barchard · 2 years, 4 months ago
  27. a1cad4a Add x8 transpose bench by Alan Kelly · 2 years, 4 months ago
  28. ba68f44 Add x64 transpose bench by Alan Kelly · 2 years, 4 months ago
  29. e1ff738 Update assembly register usage comments. by Frank Barchard · 2 years, 4 months ago
  30. 35d8e68 Implement stp (d register) offset and pre-index for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  31. 6c30427 Remove unused transpose ukernel declarations and unnecessary semi-colons. by Alan Kelly · 2 years, 4 months ago
  32. c821ea7 Refactor x16 transpose bench and add missing ukernels. by Alan Kelly · 2 years, 4 months ago
  33. 658a67d Implement add (x registers) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  34. 80eac62 Implement cmp (immediate) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  35. c98f0d2 Fix patching of branch instructions immediate by Zhi An Ng · 2 years, 4 months ago
  36. e8bbda0 Re-factor x32 transpose bench by Alan Kelly · 2 years, 4 months ago
  37. ac654f1 QC8 4x8 dot product GEMM AArch32 microkernel for Cortex A55 by Frank Barchard · 2 years, 4 months ago
  38. 364598a Enable QS8 4x8 dot product GEMM AArch32 microkernel little core by Frank Barchard · 2 years, 4 months ago
  39. 1e277fd Bug fixes for QS8 Cortex A55 by Frank Barchard · 2 years, 4 months ago
  40. 491e9e0 Implement ldr for s and d registers and str for d registers (post-indexed) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  41. 708874b Add cpu configs to support iOS simulator builds on M1-based macs. by XNNPACK Team · 2 years, 4 months ago
  42. 1228b3e Enable QS8 4x8 dot product GEMM AArch32 microkernel for Cortex A55 by Frank Barchard · 2 years, 4 months ago
  43. 0f294ad QS8 4x8 dot product GEMM AArch32 microkernel for Cortex A55 by Frank Barchard · 2 years, 4 months ago
  44. 2f24c3e Implement dup (vector) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  45. f761632 Implement str (q register, post-indexed) and str (s register, offset) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  46. 5a5c9e1 Implement mov (VRegister) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  47. 5e31395 Implement stp (post-indexed) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  48. 4915509 Implement add with immediate for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  49. 4ab1390 Rename kTbz enum to kTbxz and add comment to clarify its usage for both TBZ and TBNZ by Zhi An Ng · 2 years, 4 months ago
  50. b10677e Implement unconditional branch for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  51. 56e8b91 Implement tbz for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  52. cdfff79 Implement ret for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  53. 039a388 Exclude quantized AVX512 microkernels from mobile builds by Marat Dukhan · 2 years, 4 months ago
  54. 3176868 Implement sub (x register) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  55. 3f34299 Implement st1 for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  56. 544d73d Implement fmax and fmin (vector) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  57. ecfb1f0 Implement fadd (vector) for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  58. 0981080 Implement tbnz for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  59. 6a1151b Implement fmla for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  60. 157b0f4 Implement ldr ldp for q registers in aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  61. f67f1be Implement labels and B.cond for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  62. e2dc2ec Implement subs for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  63. 234d6b4 Implement prfm (only PLDL1KEEP) on aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  64. 65ccb13 Implement movi for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  65. 6e68f54 Implement ld1 for 1, 2, and 3 registers for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  66. 5702efb Implement ld2r for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  67. 04cdc41 Implement ldr for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  68. 0ba29e7 Implement LDP for aarch64 assembler by Zhi An Ng · 2 years, 4 months ago
  69. 70ea0a2 Specialize F32 GEMM A53 JIT microkernel for min/max params by Zhi An Ng · 2 years, 4 months ago
  70. 109a5eb Initial aarch64 assembler structure by Zhi An Ng · 2 years, 4 months ago
  71. 8f920a6 Initialize F16 microkernel pointers on x86 by Marat Dukhan · 2 years, 4 months ago
  72. ffbf7ff Cleanup transpose microkernels in BUILD & CMakeLists by Marat Dukhan · 2 years, 4 months ago
  73. 66eb508 Add missing declarations and unit tests for F16 DWCONV microkernels by Marat Dukhan · 2 years, 4 months ago
  74. 0ec25cf Duplicate test methods in gemm-microkernel-test for JIT codegen, update IGEMM generator signature and test generation script. by Zhi An Ng · 2 years, 4 months ago
  75. e7225eb Specialize F32 GEMM (a53) on kc by Zhi An Ng · 2 years, 4 months ago
  76. 8d07e40 Enable QU8 4x8 NEON MLA Lane microkernel AArch32 assembly language by Frank Barchard · 2 years, 4 months ago
  77. 901845c QU8 4x8 NEON MLA Lane microkernel AArch32 assembly language by Frank Barchard · 2 years, 4 months ago
  78. b26ead1 F16C implementation of F16 GAVGPOOL microkernels by Marat Dukhan · 2 years, 4 months ago
  79. c7c92b0 Generate F16 GAVGPOOL NEONFP16ARITH microkernels from template by Marat Dukhan · 2 years, 4 months ago
  80. 01f6aee Add unreachable check for F32 GEMM a53 generator by Zhi An Ng · 2 years, 4 months ago
  81. e78eb33 Bump shard count for f32_igemm_minmax_test (timing out on coverage runs) by Zhi An Ng · 2 years, 4 months ago
  82. 13599f3 Specialize F32 GEMM (a53) on nc by Zhi An Ng · 2 years, 4 months ago
  83. 1d6b7c9 Support FP32 weights in FP16 NC Fully Connected operator by Marat Dukhan · 2 years, 4 months ago
  84. d2e8d4d Enable QC8 AArch32 4x8 lane GEMM/IGEMM assembly microkernels for ARMv7 NEON by Frank Barchard · 2 years, 4 months ago
  85. 6989ec4 Support FP32 weights in FP16 NHWC Convolution operator by Marat Dukhan · 2 years, 4 months ago
  86. 5e1a303 QC8 GEMM/IGEMM assembly microkernels for ARMv7 NEON by Frank Barchard · 2 years, 4 months ago
  87. 83844ae Change JIT generator signature to accept nc and kc to specialize on those values by Zhi An Ng · 2 years, 4 months ago
  88. b1a869d Merge generate transpose scripts by Alan Kelly · 2 years, 4 months ago
  89. 9dfdfb5 Remove unused transpose function declarations. by Alan Kelly · 2 years, 4 months ago
  90. 667e0f1 Regenerate transpose tests by Alan Kelly · 2 years, 4 months ago
  91. 4b23423 Split test generator for qu8-gavgpool by Frank Barchard · 2 years, 4 months ago
  92. 5da6d38 SSE2 transpose microkernel code generator. by Alan Kelly · 2 years, 4 months ago
  93. d19bde9 Add x64 scalar transpose microkernels by Alan Kelly · 2 years, 4 months ago
  94. cd21b02 Add x8 scalar transpose microkernels by Alan Kelly · 2 years, 4 months ago
  95. 84aae41 Add x16 scalar transpose microkernels by Alan Kelly · 2 years, 4 months ago
  96. 6315472 Remove declarations for scalar transpose microkernels that don't exist by Alan Kelly · 2 years, 4 months ago
  97. d7111a5 Remove F32 GEMM E2E JIT benchmarks (temporarily) as we are changing the JIT generator interface by Zhi An Ng · 2 years, 4 months ago
  98. 58fe65e Change default JIT code buffer size to 16kb by Zhi An Ng · 2 years, 4 months ago
  99. af9ff85 Fix GEMM test templates to use variable n instead of fixed NR and regenerate tests by Zhi An Ng · 2 years, 4 months ago
  100. 2d38e3c Fix more errors in CMakeLists by Marat Dukhan · 2 years, 4 months ago