- 634945d use nullptr for null pointers. by Benoit Jacob · 5 years ago
- 20fc9c1 Move deps to BUILD file to make them easier to manage with automation by Ruy Contributors · 5 years ago
- 48f3bc0 Ruy: Permit GEMV code to thread if thread_count above 1 by T.J. Alumbaugh · 5 years ago
- a67c966 Ruy: Add non-zero mean in matrix test data, exercising along-row summations. by Alex Stark · 5 years ago
- 3a43125 Ruy: Resubmit of: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- b645c83 Ruy: Re-submit of: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- 1cff2c1 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
- b03ae6b Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- cb1db77 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- 3b3dbb1 Ruy ARMv8 quantized GEMV kernel by T.J. Alumbaugh · 5 years ago
- a1aad26 Ruy: ARMv8 GEMV kernel with dotprod support by T.J. Alumbaugh · 5 years ago
- 19a7e80 Provide path to Ruy::Mul for MatrixBatchVectorMultiplyAccumulate by T.J. Alumbaugh · 5 years ago
- b148d09 Tune default Ruy cache behavior (still off by default) by T.J. Alumbaugh · 5 years ago
- bb74635 Ruy: Fix to x86 AVX2 float pack. by Alex Stark · 5 years ago
- 5d5ea3d Ruy: Add a cache policy and implementation. Protected by #ifdef usage and default policy is off. by T.J. Alumbaugh · 5 years ago
- df8c25b Ruy cache of prepacked matrices by T.J. Alumbaugh · 5 years ago
- ab42065 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
- 0a255c6 Ruy: Improve x86 kernel profiling labels. by Alex Stark · 5 years ago
- 78c747e Ruy: Add benchmark variable to change range of sizes. by Alex Stark · 5 years ago
- c02a73a Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- de56663 Ruy: Further optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
- 9f1042d Ruy: Optimize (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- b1e4366 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- 16c6820 Ruy: Clean up x86 packing profiling labels. by Alex Stark · 5 years ago
- 27b03e9 Ruy: Remove unused function from AVX-512 8-bit kernel. by Alex Stark · 5 years ago
- 0a72cad Ruy: Optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
- 10d7034 Ruy: Output message when overriding block map size. by Alex Stark · 5 years ago
- 035e2c7 Use "-O3" for optimized ruy build by Terry Heo · 5 years ago
- 4eee09f Ruy: Workaround compiler problems with low optimization. by Alex Stark · 5 years ago
- 45d833e Ruy: No longer disable AVX2 within x86 by default. by Alex Stark · 5 years ago
- 770a968 Ruy: Refinements to 8-bit x86 kernels. by Alex Stark · 5 years ago
- 3a48f4e Ruy: Improve x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
- d5f0a6c Ruy: Unroll x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
- 45ada67 Ruy: Improve handling of offsets in 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
- c7e47ca Ruy: Improve output stage of 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
- dd70d1c Ruy: Move x86 AVX-512 utility functions into namespace. by Alex Stark · 5 years ago
- ac132b9 Ruy: Improve offsets handling for 8-bit AVX2. by Alex Stark · 5 years ago
- 3fb0f26 Ruy: Rework accumulation in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- e09d9b4 Ruy: Combine output stages in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- 16f513c Ruy: Load RHS data directly in x86 AVX-512 float kernel. by Alex Stark · 5 years ago
- d4508c8 Ruy: Load RHS data directly in x86 AVX2 float kernel. by Alex Stark · 5 years ago
- fc7e615 Ruy: Unroll loops in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- 45159dc Use structured comparison macros e.g. RUY_CHECK_EQ(a, b) by Benoit Jacob · 5 years ago
- a96184a Rewrite RUY_CHECK family of macros: by Benoit Jacob · 5 years ago
- 02f886b Ruy ARM32: additional 8bit optimizations by T.J. Alumbaugh · 5 years ago
- 8cc147e Ruy ARM32 GEMV kernel by T.J. Alumbaugh · 5 years ago
- e5d56a3 Split MakeBlockMap into smaller functions by Benoit Jacob · 5 years ago
- 494a6e6 Ruy: Improvements to AVX-512 8bit code. by Alex Stark · 5 years ago
- 51636ba Ruy: Rough optimization of x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
- 06be363 Ruy: Rough optimization of x86 AVX2 float kernel. by Alex Stark · 5 years ago
- f7118e1 Rewrite MakeBlockMap to be more principled and at the same time more explicitly empirically derived. by Benoit Jacob · 5 years ago
- f585116 Make thread_count part of BlockMap. Allow MakeBlockMap to take a tentative thread_count value as input, potentially use that value to choose BlockMap parameters, and decide on the definitive thread_count value. by Benoit Jacob · 5 years ago
- 3e1d455 Rewrite the 'rectangularness' computation. Since 'rectangularness' is now the largest-scale subdivision, it should not (anymore) have anything to do with the kernel layout. Rectangularness is now nothing but the shape 'aspect ratio', (rows/columns), of the destination matrix. This keeps concepts simpler and more orthogonal -- rectangularness is a property of the destination matrix alone, orthogonal to kernel. This will allow writing better, simpler block_map logic. by Benoit Jacob · 5 years ago
- 77344d0 Introduce pot_log2, which checks that its argument is a power of two then returns its log2. by Benoit Jacob · 5 years ago
- fa906af Switch the rectangularness of blockmaps from inner to outer. by Benoit Jacob · 5 years ago
- b580a77 Ruy ARM32: Optimize 8bit kernel by T.J. Alumbaugh · 5 years ago
- 38da52a Ruy - optimize ARM32 quantized int kernel by T.J. Alumbaugh · 5 years ago
- 945aaa1 Remove portable_test_suite inclusion for ruy tests by Jared Duke · 5 years ago
- 8428c44 Ruy: Add compile-time and runtime FMA checking under AVX2. by Alex Stark · 5 years ago
- ffcbb6e Ruy: Add mechanism to mask out paths. by Alex Stark · 5 years ago
- d594e94 Automated rollback of rollback. Fixed in preceding change. by Alex Stark · 5 years ago
- 8d47072 Ruy: Ensure that a couple of classes defined (empty) for non-x86 and non-ARM. by Alex Stark · 5 years ago
- 8d21e2b Fix an assertion. Also edit a comment in ruy/BUILD about debugging. by Benoit Jacob · 5 years ago
- ee13042 Support Emscripten (ie typically Wasm). by Benoit Jacob · 5 years ago
- 4184d61 Automated rollback from breakage by Ruy Contributors · 5 years ago
- 06ca74d Ruy: Tests for CPU ID detection. by Alex Stark · 5 years ago
- 179e1d3 Ruy: Minor clean up. by Alex Stark · 5 years ago
- 1963e4d Ruy: Improve includes. by Alex Stark · 5 years ago
- 15eeb2a Ruy: Move common copts to recently-added bzl file. by Alex Stark · 5 years ago
- d9741fb Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
- dd04052 Ruy: Rearrange BUILD file. by Alex Stark · 5 years ago
- e439926 Ruy: Disable x86 enhancements under Clang < 8. by Alex Stark · 5 years ago
- 948a3ff Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
- 5811fa7 Ruy: Introduce CPU ID detection on x86. by Alex Stark · 5 years ago
- 3e2acb1 Don't round the allocator's storage size to the next power of two. This is typically a huge buffer. We're going to reach a steady state where we have only a few such buffers and they won't get frequently reallocated, anyway. by Benoit Jacob · 5 years ago
- 607e445 Fix allocator in cases of sizes overflowing 32bit integer arithmetic by Benoit Jacob · 5 years ago
- 30a5e98 Ruy: Reformat bzl files. by Alex Stark · 5 years ago
- b7ebb18 Ruy: AVX2 model C++ code. by Alex Stark · 5 years ago
- fa69a4b Some more fixes to arm32 asm: by Benoit Jacob · 5 years ago
- 9a8ac17 Fix a vld1 instruction, see: by Benoit Jacob · 5 years ago
- bba4baa Ruy ARM32 packing asm by T.J. Alumbaugh · 5 years ago
- 0d203df Ruy ARMv7 asm int8 quantized kernel by T.J. Alumbaugh · 5 years ago
- 4e24eca Require dotprod when running the tests on ChromiumOS/ARM64. At the moment this is being used to run tests on emulator, we're currently getting dotprod support there, we don't want to regress that. by Benoit Jacob · 5 years ago
- e039ebb Ruy: Minor fixes to AVX-512 code. by Alex Stark · 5 years ago
- 6593992 Ruy: Modify guards to use X86 platform in some places. by Alex Stark · 5 years ago
- a218700 Ruy: Exclude GCC and other non-Clang compilers from x86 enhancements. by Alex Stark · 5 years ago
- 5de95a8 Ruy: Restrict path definitions to supported platforms. by Alex Stark · 5 years ago
- 4c5c04d Ruy: Split-off build targets specific to platform / ISA. by Alex Stark · 5 years ago
- f5c43c6 Ruy: Prune dependencies. by Alex Stark · 5 years ago
- bb4bbc4 Ruy: Fix bug in AVX-512 quant packing. by Alex Stark · 5 years ago
- 491ca6b Fix compilation error on arm32 by Benoit Jacob · 5 years ago
- 9284253 Ruy: Correct an include. by Alex Stark · 5 years ago
- ae10ec2 Rewrite the handling of threads==1, so it's a little more readable, and gets compiled with -O3 in a way that puts this case at the start of the function instead of at the end, which for a mysterious reason results in more stable performance. by Benoit Jacob · 5 years ago
- 55cb8a8 Specify -O3 and, on ARM32, -mfpu=neon as rule copts, for all our binary rules. by Benoit Jacob · 5 years ago
- ab0bac8 Disable AVX512 on __APPLE__ for now to unbreak the build. by Benoit Jacob · 5 years ago
- 58302ee Only import <sys/time.h> if we are running on Linux. Otherwise it causes the by Anna Revinskaya · 5 years ago
- 540a765 Ruy: Guard an include, fixing MacOS build. by Alex Stark · 5 years ago
- 8221a67 Ruy: Improve includes. by Alex Stark · 5 years ago
- 4a552ba Improve bzl file. by Alex Stark · 5 years ago
- 642abf9 Ruy: Fix to x86 (AVX-512) pack code. by Alex Stark · 5 years ago