1. 2649014 Implement vmax_s8, vmin_s8, vqadd_s16, vqdmulh_s32, vqshl_s32, vrshl_s32 by Zhi An Ng · 2 years, 10 months ago
  2. 4ef8d51 Implement vst1_16, add some more test cases by Zhi An Ng · 2 years, 10 months ago
  3. 00a929f Implement vst1_8 and fix vst1_32 encoding by Zhi An Ng · 2 years, 10 months ago
  4. 9820234 Full set of benchmarks for Convert operator by Marat Dukhan · 2 years, 10 months ago
  5. 1d1df22 Remove comments about potential to use _mm256_maskstore_ps in AVX microkernels by Marat Dukhan · 2 years, 10 months ago
  6. 3c4bb1c Fix conditions for flushing icache (only on arm/arm64) by Zhi An Ng · 2 years, 10 months ago
  7. a38a161 Implement vld1_8, vmlal_s16, vmovl_s8 by Zhi An Ng · 2 years, 10 months ago
  8. 6883abb JIT memory allocation and integration into Assembler by Zhi An Ng · 2 years, 10 months ago
  9. 7bd7ecc qs8 4x8 aarch32/64 GEMM/IGEMM improved prefetch scheduling. by Frank Barchard · 2 years, 10 months ago
  10. 6150425 Disable MSan in AVX512SKX QS8/QC8/QU8 DWCONV microkernels by Marat Dukhan · 2 years, 10 months ago
  11. d541fc0 Annotate remaining microkernels with Out-of-Bounds reads with XNN_OOB_READS by Marat Dukhan · 2 years, 10 months ago
  12. da7b2e2 QS8 4x8 lane GEMM AArch32 microkernel by Frank Barchard · 2 years, 10 months ago
  13. 7be427a Disable MSan and TSan in most microkernels with Out-of-Bounds reads by Marat Dukhan · 2 years, 10 months ago
  14. 4f36e85 Fully qualify std::isnormal in ConvertOperatorTester by Marat Dukhan · 2 years, 10 months ago
  15. 590ca5f Add missing <cstddef> include in AArch32Assembler header by Marat Dukhan · 2 years, 10 months ago
  16. 710fb42 Benchmark for the Convert (F32->QS8) operator by Marat Dukhan · 2 years, 10 months ago
  17. 6338bf0 Include signed quantized operators in TensorFlow Lite build by Marat Dukhan · 2 years, 10 months ago
  18. 914f57b Aarch64 4x8 lane ld64 GEMM/IGEMM microkernels. by Frank Barchard · 2 years, 10 months ago
  19. 77e9e65 Document Convert operator in README by Marat Dukhan · 2 years, 10 months ago
  20. 1130923 Expose QS8/QU8->FP32 Convert operator in Subgraph API by Marat Dukhan · 2 years, 10 months ago
  21. f92206b QS8->F32 and QU8->F32 Convert NC operators by Marat Dukhan · 2 years, 10 months ago
  22. 0db15d3 Define XNN_PLATFORM_WINDOWS on Windows by Zhi An Ng · 2 years, 10 months ago
  23. ad6f2dc Benchmarks for QS8->F32 and QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  24. cb052a3 Remove duplicate template line for 1x8c4 NEON dot product. by Frank Barchard · 2 years, 10 months ago
  25. f0cb91e Fix formatting of bx signature by Zhi An Ng · 2 years, 10 months ago
  26. 86bd270 Scalar QS8/QU8 -> F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  27. d873fa2 SSE2 QS8/QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  28. fbf12b0 WAsm SIMD QS8/QU8 -> F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  29. f9cf55d SSE4.1 QS8/QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  30. fee66be NEON QS8/QU8 -> F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  31. 4bdc9f5 Refactor VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  32. 10475ec Implement bx instruction by Zhi An Ng · 2 years, 10 months ago
  33. 16f3548 Implement pop and vpop (for D registers) by Zhi An Ng · 2 years, 10 months ago
  34. fe4a750 Implement vst1_32 (multiple single elements) and vst1_32 (single element from one lane) by Zhi An Ng · 2 years, 10 months ago
  35. 7c0303e Remove the last remnant of GEMMLOWP requantization in QU8 microkernels by Marat Dukhan · 2 years, 10 months ago
  36. ea612bc Implement vmax_f32 and vmin_f32 by Zhi An Ng · 2 years, 10 months ago
  37. 2fce75b Implement tst with immediate by Zhi An Ng · 2 years, 10 months ago
  38. f73e55b Implement add with immediate (drive-by fix for missing return when error in push) by Zhi An Ng · 2 years, 10 months ago
  39. c9f70f7 Implement vmla.f32, add DRegisterLane for lane-indexed DRegister by Zhi An Ng · 2 years, 10 months ago
  40. 1a55180 Merge pull request #2036 from digantdesai:enable_fp32_arm_kernels by XNNPACK Team · 2 years, 10 months ago
  41. 0f1ed94 QS8/QC8 GEMM/IGEMM WAsm SIMD microkernels using C2S4 layout by Marat Dukhan · 2 years, 10 months ago
  42. dfe8929 Implement vld1 (multiple single element) and vld1r (single element to all lanes) by Zhi An Ng · 2 years, 10 months ago
  43. 737ad01 Add .clang-format and reformat jit related files by Zhi An Ng · 2 years, 10 months ago
  44. 57256c5 Optimize single-threaded execution of vector unary elementwise operators by Marat Dukhan · 2 years, 10 months ago
  45. 354b263 Fix bug in Convert NC operator with large number of elements by Marat Dukhan · 2 years, 10 months ago
  46. 477bdbb Implement vldr instruction by Zhi An Ng · 2 years, 10 months ago
  47. f4beaf1 Implement vmov (q to q, d to d, s to s, core to d) by Zhi An Ng · 2 years, 10 months ago
  48. 7eef0a9 Fix formatting for parameters (use lowercase) by Zhi An Ng · 2 years, 10 months ago
  49. 637becf Implement vldm instruction by Zhi An Ng · 2 years, 10 months ago
  50. 68c27d3 Implement vpush, add SIMD registers and register lists. by Zhi An Ng · 2 years, 10 months ago
  51. 59d6515 Enable FP32 requant variant for QU8 [1,4]x8 Neon MLAL [I]GEMM kernels by Digant Desai · 2 years, 10 months ago
  52. 9982ed3 Enable FP32 requant variant for QU8 NEON dotprod [I]GEMM kernels by Digant Desai · 2 years, 11 months ago
  53. 65584bd Implement labels and branches by Zhi An Ng · 2 years, 10 months ago
  54. 2e2d179 Enable FP32 requant variant for QU8 4x16c4 NEON asm dotprod [I]GEMM kernels by Digant Desai · 2 years, 11 months ago
  55. 10f9f62 Enable FP32 requant variant for QU8 4x16c4 NEON asm dotprod [I]GEMM kernels for CA55r1 by Digant Desai · 2 years, 11 months ago
  56. 9e92451 Include FP16 operators in XNNPACK build for TensorFlow Lite by Marat Dukhan · 2 years, 10 months ago
  57. e20a873 Optimize selection of QS8/QU8 VADD[C] microkernels on WAsm SIMD by Marat Dukhan · 2 years, 10 months ago
  58. d221c54 Better formatting for instruction encoding test errors by Zhi An Ng · 2 years, 10 months ago
  59. 591b917 Implement pld instruction. by Zhi An Ng · 2 years, 10 months ago
  60. e98039f Annotate F32->QS8/QU8 VCVT microkernels reading OoB with XNN_DISABLE_MSAN by Marat Dukhan · 2 years, 10 months ago
  61. 4ab7b93 Implement sub and subs instructions. by Zhi An Ng · 2 years, 10 months ago
  62. ff2e8b2 Implement mov instruction. by Zhi An Ng · 2 years, 10 months ago
  63. 984644f Remove unused header by Zhi An Ng · 2 years, 10 months ago
  64. 663b4fe Implement cmp instruction. by Zhi An Ng · 2 years, 10 months ago
  65. 947805b Fix AArch64 build without assembly microkernels by Marat Dukhan · 2 years, 10 months ago
  66. c9ffad7 Add support for MemOperand with addressing mode and ldr instruction. by Zhi An Ng · 2 years, 10 months ago
  67. c7d0728 Reoptimize FP32 requantization in NEON QS8/QU8 VMUL[C] by Marat Dukhan · 2 years, 10 months ago
  68. 03efa0f Reoptimize FP32 requantization in NEON QS8/QC8/QU8 GEMM/IGEMM/DWCONV by Marat Dukhan · 2 years, 10 months ago
  69. 815092b Reoptimize F32->QS8 and F32->QU8 VCVT NEON microkernels by Marat Dukhan · 2 years, 10 months ago
  70. 2c75d90 Reoptimize F32->QS8/QU8 CVT NEON evaluation stubs by Marat Dukhan · 2 years, 10 months ago
  71. 5a31dc6 Optimize FP32 requantization in NEON QS8/QC8/QU8 GEMM/IGEMM/DWCONV by Marat Dukhan · 2 years, 10 months ago
  72. 7988a18 Refactoring xnn_qs8_minmax_params for NEON/NEONv8 by Marat Dukhan · 2 years, 10 months ago
  73. 9855537 Support requantization scale up to 256 by Marat Dukhan · 2 years, 10 months ago
  74. 8978ac2 Support requantization scale greater than 1 in RNDNU NEON microkernels by Marat Dukhan · 2 years, 10 months ago
  75. 512d44b Add push instruction and RegisterList support by Zhi An Ng · 2 years, 10 months ago
  76. b559fe9 Initial AArch32 structure by Zhi An Ng · 2 years, 10 months ago
  77. 13c9f8d Support requantization scale over 1 in SSE/AVX GEMM/IGEMM/DWCONV by Marat Dukhan · 2 years, 10 months ago
  78. 8999190 Remove GEMMLOWP requantization from QS8 GEMM/IGEMM templates by Marat Dukhan · 2 years, 10 months ago
  79. 17a9e3f Remove GEMMLOWP requantization from QS8 DWCONV templates by Marat Dukhan · 2 years, 10 months ago
  80. 482508b Optimize FP32 requantization in ARMv7 NEON QS8/QU8 VMUL[C] by Marat Dukhan · 2 years, 10 months ago
  81. 20483c7 Expose Convert operator in Subgraph API by Marat Dukhan · 2 years, 10 months ago
  82. d52d20b Use the same F32->QS8/QU8 VCVT WAsm SIMD microkernels on ARM and x86 by Marat Dukhan · 2 years, 10 months ago
  83. 411c18d Optimize FP32 requantization in WAsm SIMD QS8/QC8/QU8 GEMM/IGEMM/DWCONV by Marat Dukhan · 2 years, 10 months ago
  84. af9c4e1 Optimize FP32 requantization in WAsm SIMD QS8/QU8 VMUL[C] by Marat Dukhan · 2 years, 10 months ago
  85. 430b173 F32->QS8/QU8 VCVT scalar microkernels using FP32 min/max by Marat Dukhan · 2 years, 10 months ago
  86. d5ff6ae Remove erroneous assertions from ConvertOperatorTester by Marat Dukhan · 2 years, 10 months ago
  87. ed2d776 F32->QS8 and F32->QU8 Convert NC operators by Marat Dukhan · 2 years, 10 months ago
  88. 03f1297 F32->QS8 and F32->QU8 Convert NC operators by XNNPACK Team · 2 years, 10 months ago
  89. 21d9ac1 Fix debug build of XNNPACK by Marat Dukhan · 2 years, 10 months ago
  90. 7d2d85c F32->QS8 and F32->QU8 Convert NC operators by Marat Dukhan · 2 years, 10 months ago
  91. 19c8644 Fix prefetch offset for QS8 lane prfm GEMM/IGEMM microkernels. by Frank Barchard · 2 years, 10 months ago
  92. 5740f75 Fix trailing whitespace in VCVT benchmarks by Marat Dukhan · 2 years, 10 months ago
  93. 563eee1 Benchmarks for F32->QS8 and F32->QU8 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
  94. 4bd1de9 F32->QS8 and F32->QU8 VCVT WAsm SIMD microkernels using F32->I32 conversion by Marat Dukhan · 2 years, 10 months ago
  95. 00a1085 F32->QS8 and F32->QU8 VCVT scalar microkernels by Marat Dukhan · 2 years, 10 months ago
  96. 98d5552 F32->QS8 and F32->QU8 VCVT WAsm SIMD microkernels by Marat Dukhan · 2 years, 10 months ago
  97. b2d0a2a F32->QS8 and F32->QU8 VCVT NEON microkernels by Marat Dukhan · 2 years, 10 months ago
  98. d24301d F32->QS8/QU8 CVT evaluation stubs for NEON and NEON v8 by Marat Dukhan · 2 years, 10 months ago
  99. 9551075 Fix CMake build by Marat Dukhan · 2 years, 10 months ago
  100. f82ea82 Add PRFM benchmarks for qs8 lane by Frank Barchard · 2 years, 10 months ago