- 77b694c Fixes style issues with SSE microkernel by Alan Kelly · 2 years, 10 months ago
- 691ec40 Use proper intrinsics header in SSE F32 VHSWISH microkernels by Marat Dukhan · 2 years, 10 months ago
- c025831 Refactor declarations of parameter initialization functions by Marat Dukhan · 2 years, 10 months ago
- 51c6134 Amalgamate SSE and AVX512 microkernels for TFLite build by Marat Dukhan · 2 years, 10 months ago
- e0f15ad Split scalar production microkernels into portable, AArch32, and Wasm by Marat Dukhan · 2 years, 10 months ago
- c80ffb0 Fix generation of gemm tests for ADJBLOCK and rerun scripts. by Zhi An Ng · 2 years, 10 months ago
- 0fd983b Adds -Wcast-qual flag to detect cast dropping const. by Alan Kelly · 2 years, 10 months ago
- f527d56 Avoid using C++14 features in AArch32 assembler test by Marat Dukhan · 2 years, 10 months ago
- 19bfefe Support Relaxed SIMD in xnnpack_cc_library and xnnpack_aggregate_library by Marat Dukhan · 2 years, 10 months ago
- 9519816 Enable QS8 4x8 LD64 Neon on AArch32 by Frank Barchard · 2 years, 10 months ago
- e31f29e Declare assembly for QS8 microkernels by Frank Barchard · 2 years, 10 months ago
- 4c61779 Minimally support WebAssembly Relaxed SIMD builds by Marat Dukhan · 2 years, 10 months ago
- 8c7355a Enable QS8 4x8 LD64 dot product on AArch32 by Frank Barchard · 2 years, 10 months ago
- 1e9c5ac Fix CMake build by Marat Dukhan · 2 years, 10 months ago
- c3c6632 Improve compatibility with GCC in AVX512-SKX microkernels by Marat Dukhan · 2 years, 10 months ago
- 50b0bd9 Fix encoding and supported immediate values for vldr and vstr. by Zhi An Ng · 2 years, 10 months ago
- 1aac8e8 Implement vmrs (FPSCR) by Zhi An Ng · 2 years, 10 months ago
- 0a1b7b6 Implement ldrd (immediate) by Zhi An Ng · 2 years, 10 months ago
- 26e55ed Implement vstr instruction by Zhi An Ng · 2 years, 10 months ago
- a787832 PUSH lr instead of r14 in AArch32 assembly microkernels by Frank Barchard · 2 years, 10 months ago
- e0ac223 QS8 IGEMM neon dot comment change float* to int8_t* by Frank Barchard · 2 years, 10 months ago
- 97f99fc Return error if fail to get page size by Zhi An Ng · 2 years, 10 months ago
- 932e823 Implement str (imm) by Zhi An Ng · 2 years, 10 months ago
- 4ebd680 Implement moveq, cmp (imm), sub (imm). by Zhi An Ng · 2 years, 10 months ago
- 2b74ddd Implement vld1_8 with offset register by Zhi An Ng · 2 years, 10 months ago
- fea422d Implement vld1_32 (single element to one lane). by Zhi An Ng · 2 years, 10 months ago
- e48b5c1 QS8 4x8 Neon Lane LD64 IGEMM AArch32 microkernel by Frank Barchard · 2 years, 10 months ago
- 1669dd0 aarch32 avoid the VPUSH/VPOP of unused registers by Frank Barchard · 2 years, 10 months ago
- 4841021 QS8 4x8 dot product LD64 IGEMM AArch32 microkernel by Frank Barchard · 2 years, 10 months ago
- 938ee9b Implement bic, vld1_8 and vld1_32 for QRegisterList, assert encodings don't error out in tests. by Zhi An Ng · 2 years, 10 months ago
- 9364bdc Implement vsdot_s8 instruction by Zhi An Ng · 2 years, 10 months ago
- a251f87 Implement vqmovn_s16, and_, adds. by Zhi An Ng · 2 years, 10 months ago
- 7c8090d Implement vcmpe_f32, vmovpl_f32, vmovmi_f32. by Zhi An Ng · 2 years, 10 months ago
- 2d8180c Implement 2-argument add, vmla_f32, vmov_f32, vmov_f64, vstm. by Zhi An Ng · 2 years, 10 months ago
- 70e8c99 Format source and BUILD file by Frank Barchard · 2 years, 10 months ago
- 9f3f420 QS8 4x8 LD64 dot product GEMM AArch32 microkernel by Frank Barchard · 2 years, 10 months ago
- b63e84c Implement b (unconditional branch) by Zhi An Ng · 2 years, 10 months ago
- be4e6a5 Add align for aligning instructions (similar to .align in assembly) by Zhi An Ng · 2 years, 10 months ago
- ec17e99 Add license to files by Zhi An Ng · 2 years, 10 months ago
- 98393ad AVX512 QS8->F32 and QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- fda06cb SSE transpose microkernel by Alan Kelly · 2 years, 10 months ago
- 7b5f779 AVX2 QS8->F32 and QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- cd4089f AVX QS8->F32 and QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- 2edf863 AVX512 F32->QS8 and F32->QU8 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- 0d399ca AVX2 F32->QS8 and F32->QU8 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- 3bdbe9f Fix xnn_release_code_memory to unmap entire capacity of buffer by Zhi An Ng · 2 years, 10 months ago
- b91432c AVX F32->QS8 and F32->QU8 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- 6fac719 Implement vqmovn_s32 and vext_8 by Zhi An Ng · 2 years, 10 months ago
- 4a58583 Implement vdup_8, vdup_16, vdup_32 by Zhi An Ng · 2 years, 10 months ago
- 2649014 Implement vmax_s8, vmin_s8, vqadd_s16, vqdmulh_s32, vqshl_s32, vrshl_s32 by Zhi An Ng · 2 years, 10 months ago
- 4ef8d51 Implement vst1_16, add some more test cases by Zhi An Ng · 2 years, 10 months ago
- 00a929f Implement vst1_8 and fix vst1_32 encoding by Zhi An Ng · 2 years, 10 months ago
- 9820234 Full set of benchmarks for Convert operator by Marat Dukhan · 2 years, 10 months ago
- 1d1df22 Remove comments about potential to use _mm256_maskstore_ps in AVX microkernels by Marat Dukhan · 2 years, 10 months ago
- 3c4bb1c Fix conditions for flushing icache (only on arm/arm64) by Zhi An Ng · 2 years, 10 months ago
- a38a161 Implement vld1_8, vmlal_s16, vmovl_s8 by Zhi An Ng · 2 years, 10 months ago
- 6883abb JIT memory allocation and integration into Assembler by Zhi An Ng · 2 years, 10 months ago
- 7bd7ecc qs8 4x8 aarch32/64 GEMM/IGEMM improved prefetch scheduling. by Frank Barchard · 2 years, 10 months ago
- 6150425 Disable MSan in AVX512SKX QS8/QC8/QU8 DWCONV microkernels by Marat Dukhan · 2 years, 10 months ago
- d541fc0 Annotate remaining microkernels with Out-of-Bounds reads with XNN_OOB_READS by Marat Dukhan · 2 years, 10 months ago
- da7b2e2 QS8 4x8 lane GEMM AArch32 microkernel by Frank Barchard · 2 years, 10 months ago
- 7be427a Disable MSan and TSan in most microkernels with Out-of-Bounds reads by Marat Dukhan · 2 years, 10 months ago
- 4f36e85 Fully qualify std::isnormal in ConvertOperatorTester by Marat Dukhan · 2 years, 10 months ago
- 590ca5f Add missing <cstddef> include in AArch32Assembler header by Marat Dukhan · 2 years, 10 months ago
- 710fb42 Benchmark for the Convert (F32->QS8) operator by Marat Dukhan · 2 years, 10 months ago
- 6338bf0 Include signed quantized operators in TensorFlow Lite build by Marat Dukhan · 2 years, 10 months ago
- 914f57b Aarch64 4x8 lane ld64 GEMM/IGEMM microkernels. by Frank Barchard · 2 years, 10 months ago
- 77e9e65 Document Convert operator in README by Marat Dukhan · 2 years, 10 months ago
- 1130923 Expose QS8/QU8->FP32 Convert operator in Subgraph API by Marat Dukhan · 2 years, 10 months ago
- f92206b QS8->F32 and QU8->F32 Convert NC operators by Marat Dukhan · 2 years, 10 months ago
- 0db15d3 Define XNN_PLATFORM_WINDOWS on Windows by Zhi An Ng · 2 years, 10 months ago
- ad6f2dc Benchmarks for QS8->F32 and QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- cb052a3 Remove duplicate template line for 1x8c4 NEON dot product. by Frank Barchard · 2 years, 10 months ago
- f0cb91e Fix formatting of bx signature by Zhi An Ng · 2 years, 10 months ago
- 86bd270 Scalar QS8/QU8 -> F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- d873fa2 SSE2 QS8/QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- fbf12b0 WAsm SIMD QS8/QU8 -> F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- f9cf55d SSE4.1 QS8/QU8->F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- fee66be NEON QS8/QU8 -> F32 VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- 4bdc9f5 Refactor VCVT microkernels by Marat Dukhan · 2 years, 10 months ago
- 10475ec Implement bx instruction by Zhi An Ng · 2 years, 10 months ago
- 16f3548 Implement pop and vpop (for D registers) by Zhi An Ng · 2 years, 10 months ago
- fe4a750 Implement vst1_32 (multiple single elements) and vst1_32 (single element from one lane) by Zhi An Ng · 2 years, 10 months ago
- 7c0303e Remove the last remnant of GEMMLOWP requantization in QU8 microkernels by Marat Dukhan · 2 years, 10 months ago
- ea612bc Implement vmax_f32 and vmin_f32 by Zhi An Ng · 2 years, 10 months ago
- 2fce75b Implement tst with immediate by Zhi An Ng · 2 years, 10 months ago
- f73e55b Implement add with immediate (drive-by fix for missing return when error in push) by Zhi An Ng · 2 years, 10 months ago
- c9f70f7 Implement vmla.f32, add DRegisterLane for lane-indexed DRegister by Zhi An Ng · 2 years, 10 months ago
- 1a55180 Merge pull request #2036 from digantdesai:enable_fp32_arm_kernels by XNNPACK Team · 2 years, 10 months ago
- 0f1ed94 QS8/QC8 GEMM/IGEMM WAsm SIMD microkernels using C2S4 layout by Marat Dukhan · 2 years, 10 months ago
- dfe8929 Implement vld1 (multiple single element) and vld1r (single element to all lanes) by Zhi An Ng · 2 years, 10 months ago
- 737ad01 Add .clang-format and reformat jit related files by Zhi An Ng · 2 years, 10 months ago
- 57256c5 Optimize single-threaded execution of vector unary elementwise operators by Marat Dukhan · 2 years, 10 months ago
- 354b263 Fix bug in Convert NC operator with large number of elements by Marat Dukhan · 2 years, 10 months ago
- 477bdbb Implement vldr instruction by Zhi An Ng · 2 years, 10 months ago
- f4beaf1 Implement vmov (q to q, d to d, s to s, core to d) by Zhi An Ng · 2 years, 10 months ago
- 7eef0a9 Fix formatting for parameters (use lowercase) by Zhi An Ng · 2 years, 10 months ago
- 637becf Implement vldm instruction by Zhi An Ng · 2 years, 10 months ago
- 68c27d3 Implement vpush, add SIMD registers and register lists. by Zhi An Ng · 2 years, 10 months ago
- 59d6515 Enable FP32 requant variant for QU8 [1,4]x8 Neon MLAL [I]GEMM kernels by Digant Desai · 2 years, 10 months ago