- bcfb762 Further tweaks to test logic enabling bias and clamping. by Benoit Jacob · 4 years, 10 months ago
- 2b0d243 Ruy - fix test to run platform-specific path by T.J. Alumbaugh · 4 years, 10 months ago
- 5b36bac When benchmarking, avoid randomly turning on/off some variants e.g. bias-addition and nonzero zero-points. This makes a very small performance difference but in benchmarking we should consistently measure the same exact thing. by Benoit Jacob · 4 years, 10 months ago
- 54d2435 Simplify ruy tests by removing the complicated logic determining quantized multipliers and clamp bounds. Now unconditionally doing what we used to do when QUICK_BENCHMARK=1 was passed. That was needed in practice to get quick results, as the old logic was very slow as it had to rely on a reference implementation of matmul (else it would have been very confusing when matmul regressed). by Benoit Jacob · 4 years, 10 months ago
- 73c3214 Use an ordered map for thread roots so that profiles consistently start with the 'main thread' and have a consistent order of enumeration of the other threads. by Benoit Jacob · 4 years, 10 months ago
- eb351f2 Drop the dependency on gemmlowp/fixedpoint. by Benoit Jacob · 4 years, 10 months ago
- d1a14aa Remove ruy's dependency on the gemmlowp profiler. by Benoit Jacob · 4 years, 10 months ago
- 652f111 Update README.md. Add contributing.md and LICENSE. by Benoit Jacob · 4 years, 10 months ago
- 6180f1f Ruy x86: Introduce framework for SSE 4.2 and VNNI. by Alex Stark · 4 years, 10 months ago
- c40e695 Ruy: Add note to x86 AVX2 kernels. by Alex Stark · 4 years, 10 months ago
- 29840ae Fix compilation broken by cl/288340160. by Benoit Jacob · 4 years, 11 months ago
- 23adc55 Keep only the simple auxv method for detecting dotprod instructions. by Benoit Jacob · 4 years, 11 months ago
- 879f593 Ruy GEMV: x86 AVX-512 8-bit rough kernels. by Alex Stark · 5 years ago
- b6632d3 Ruy GEMV: x86 AVX-512 float rough kernels. by Alex Stark · 5 years ago
- f2db1bf Ruy GEMV: x86 AVX2 8-bit rough kernels. by Alex Stark · 5 years ago
- 44ca9b1 Ruy GEMV: x86 AVX2 float rough kernels. by Alex Stark · 5 years ago
- 8d3e931 Add `cacheable` flag to Ruy Matrix so that caller "opts in" to cache behavior on a per-call basis by T.J. Alumbaugh · 5 years ago
- de0a983 In gemv-ish cases, each byte of the big weights matrix is traversed only once, so any notion of data locality is irrelevant. Ignore the 'cache locality score' by forcing it to be 0 in that case. by Benoit Jacob · 5 years ago
- 9e08d80 Limit rectangularness to avoid using too tiny kernel blocks in the case of highly rectangular destination matrices (gemv-ish cases), which would result in too few iterations of the kernel inner loop to be fully efficient. Now aim to have at least 8 iterations of the kernel inner loop if possible. by Benoit Jacob · 5 years ago
- 718aa11 Ruy: remove additional flag guarding use of prepacked cache by T.J. Alumbaugh · 5 years ago
- 7bbf219 NeonCpuBackendGemm uses CpuBackendGemm interface instead of Ruy interface by T.J. Alumbaugh · 5 years ago
- 2c7897c Ruy: Profile cache ejection. by T.J. Alumbaugh · 5 years ago
- c85f6d7 Resubmit of http://cl/283555950 with fix for win32. by Sean Silva · 5 years ago
- 19a85c4 Use separate allocator for cached prepacked matrix allocations. by Ruy Contributors · 5 years ago
- 9a523b8 Use separate allocator for cached prepacked matrix allocations. by Sean Silva · 5 years ago
- 634945d use nullptr for null pointers. by Benoit Jacob · 5 years ago
- 20fc9c1 Move deps to BUILD file to make them easier to manage with automation by Ruy Contributors · 5 years ago
- 48f3bc0 Ruy: Permit GEMV code to thread if thread_count above 1 by T.J. Alumbaugh · 5 years ago
- a67c966 Ruy: Add non-zero mean in matrix test data, exercising along-row summations. by Alex Stark · 5 years ago
- 3a43125 Ruy: Resubmit of: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- b645c83 Ruy: Re-submit of: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- 1cff2c1 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
- b03ae6b Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- cb1db77 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- 3b3dbb1 Ruy ARMv8 quantized GEMV kernel by T.J. Alumbaugh · 5 years ago
- a1aad26 Ruy: ARMv8 GEMV kernel with dotprod support by T.J. Alumbaugh · 5 years ago
- 19a7e80 Provide path to Ruy::Mul for MatrixBatchVectorMultiplyAccumulate by T.J. Alumbaugh · 5 years ago
- b148d09 Tune default Ruy cache behavior (still off by default) by T.J. Alumbaugh · 5 years ago
- bb74635 Ruy: Fix to x86 AVX2 float pack. by Alex Stark · 5 years ago
- 5d5ea3d Ruy: Add a cache policy and implementation. Protected by #ifdef usage and default policy is off. by T.J. Alumbaugh · 5 years ago
- df8c25b Ruy cache of prepacked matrices by T.J. Alumbaugh · 5 years ago
- ab42065 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
- 0a255c6 Ruy: Improve x86 kernel profiling labels. by Alex Stark · 5 years ago
- 78c747e Ruy: Add benchmark variable to change range of sizes. by Alex Stark · 5 years ago
- c02a73a Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- de56663 Ruy: Further optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
- 9f1042d Ruy: Optimize (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- b1e4366 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- 16c6820 Ruy: Clean up x86 packing profiling labels. by Alex Stark · 5 years ago
- 27b03e9 Ruy: Remove unused function from AVX-512 8-bit kernel. by Alex Stark · 5 years ago
- 0a72cad Ruy: Optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
- 10d7034 Ruy: Output message when overriding block map size. by Alex Stark · 5 years ago
- 035e2c7 Use "-O3" for optimized ruy build by Terry Heo · 5 years ago
- 4eee09f Ruy: Workaround compiler problems with low optimization. by Alex Stark · 5 years ago
- 45d833e Ruy: No longer disable AVX2 within x86 by default. by Alex Stark · 5 years ago
- 770a968 Ruy: Refinements to 8-bit x86 kernels. by Alex Stark · 5 years ago
- 3a48f4e Ruy: Improve x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
- d5f0a6c Ruy: Unroll x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
- 45ada67 Ruy: Improve handling of offsets in 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
- c7e47ca Ruy: Improve output stage of 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
- dd70d1c Ruy: Move x86 AVX-512 utility functions into namespace. by Alex Stark · 5 years ago
- ac132b9 Ruy: Improve offsets handling for 8-bit AVX2. by Alex Stark · 5 years ago
- 3fb0f26 Ruy: Rework accumulation in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- e09d9b4 Ruy: Combine output stages in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- 16f513c Ruy: Load RHS data directly in x86 AVX-512 float kernel. by Alex Stark · 5 years ago
- d4508c8 Ruy: Load RHS data directly in x86 AVX2 float kernel. by Alex Stark · 5 years ago
- fc7e615 Ruy: Unroll loops in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- 45159dc Use structured comparison macros e.g. RUY_CHECK_EQ(a, b) by Benoit Jacob · 5 years ago
- a96184a Rewrite RUY_CHECK family of macros: by Benoit Jacob · 5 years ago
- 02f886b Ruy ARM32: additional 8bit optimizations by T.J. Alumbaugh · 5 years ago
- 8cc147e Ruy ARM32 GEMV kernel by T.J. Alumbaugh · 5 years ago
- e5d56a3 Split MakeBlockMap into smaller functions by Benoit Jacob · 5 years ago
- 494a6e6 Ruy: Improvements to AVX-512 8bit code. by Alex Stark · 5 years ago
- 51636ba Ruy: Rough optimization of x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- 06be363 Ruy: Rough optimization of x86 AVX2 float kernel. by Alex Stark · 5 years ago
- f7118e1 Rewrite MakeBlockMap to be more principled and at the same time more explicitly empirically derived. by Benoit Jacob · 5 years ago
- f585116 Make thread_count part of BlockMap. Allow MakeBlockMap to take a tentative thread_count value as input, potentially use that value to choose BlockMap parameters, and decide on the definitive thread_count value. by Benoit Jacob · 5 years ago
- 3e1d455 Rewrite the 'rectangularness' computation. Since 'rectangularness' is now the largest-scale subdivision, it should not (anymore) have anything to do with the kernel layout. Rectangularness is now nothing but the shape 'aspect ratio', (rows/columns), of the destination matrix. This keeps concepts simpler and more orthogonal -- rectangularness is a property of the destination matrix alone, orthogonal to kernel. This will allow writing better, simpler block_map logic. by Benoit Jacob · 5 years ago
- 77344d0 Introduce pot_log2, which checks that its argument is a power of two then returns its log2. by Benoit Jacob · 5 years ago
- fa906af Switch the rectangularness of blockmaps from inner to outer. by Benoit Jacob · 5 years ago
- b580a77 Ruy ARM32: Optimize 8bit kernel by T.J. Alumbaugh · 5 years ago
- 38da52a Ruy - optimize ARM32 quantized int kernel by T.J. Alumbaugh · 5 years ago
- 945aaa1 Remove portable_test_suite inclusion for ruy tests by Jared Duke · 5 years ago
- 8428c44 Ruy: Add compile-time and runtime FMA checking under AVX2. by Alex Stark · 5 years ago
- ffcbb6e Ruy: Add mechanism to mask out paths. by Alex Stark · 5 years ago
- d594e94 Automated rollback of rollback. Fixed in preceding change. by Alex Stark · 5 years ago
- 8d47072 Ruy: Ensure that a couple of classes defined (empty) for non-x86 and non-ARM. by Alex Stark · 5 years ago
- 8d21e2b Fix an assertion. Also edit a comment in ruy/BUILD about debugging. by Benoit Jacob · 5 years ago
- ee13042 Support Emscripten (i.e. typically Wasm). by Benoit Jacob · 5 years ago
- 4184d61 Automated rollback from breakage by Ruy Contributors · 5 years ago
- 06ca74d Ruy: Tests for CPU ID detection. by Alex Stark · 5 years ago
- 179e1d3 Ruy: Minor clean up. by Alex Stark · 5 years ago
- 1963e4d Ruy: Improve includes. by Alex Stark · 5 years ago
- 15eeb2a Ruy: Move common copts to recently-added bzl file. by Alex Stark · 5 years ago
- d9741fb Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
- dd04052 Ruy: Rearrange BUILD file. by Alex Stark · 5 years ago
- e439926 Ruy: Disable x86 enhancements under Clang < 8. by Alex Stark · 5 years ago
- 948a3ff Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
- 5811fa7 Ruy: Introduce CPU ID detection on x86. by Alex Stark · 5 years ago
- 3e2acb1 Don't round the allocator's storage size to the next power of two. This is typically a huge buffer. We're going to reach a steady state where we have only a few such buffers and they won't get frequently reallocated, anyway. by Benoit Jacob · 5 years ago