1. bcfb762 Further tweaks to test logic enabling bias and clamping. by Benoit Jacob · 4 years, 10 months ago
  2. 2b0d243 Ruy - fix test to run platform-specific path by T.J. Alumbaugh · 4 years, 10 months ago
  3. 5b36bac When benchmarking, avoid randomly turning on/off some variants e.g. bias-addition and nonzero zero-points. This makes a very small performance difference but in benchmarking we should consistently measure the same exact thing. by Benoit Jacob · 4 years, 10 months ago
  4. 54d2435 Simplify ruy tests by removing the complicated logic determining quantized multipliers and clamp bounds. Now unconditionally doing what we used to do when QUICK_BENCHMARK=1 was passed. That was needed in practice to get quick results, as the old logic was very slow as it had to rely on a reference implementation of matmul (else it would have been very confusing when matmul regressed). by Benoit Jacob · 4 years, 10 months ago
  5. 73c3214 Use an ordered map for thread roots so that profiles consistently start with the 'main thread' and have a consistent order of enumeration of the other threads. by Benoit Jacob · 4 years, 10 months ago
  6. eb351f2 Drop the dependency on gemmlowp/fixedpoint. by Benoit Jacob · 4 years, 10 months ago
  7. d1a14aa Remove ruy's dependency on the gemmlowp profiler. by Benoit Jacob · 4 years, 10 months ago
  8. 652f111 Update README.md. Add contributing.md and LICENSE. by Benoit Jacob · 4 years, 10 months ago
  9. 6180f1f Ruy x86: Introduce framework for SSE 4.2 and VNNI. by Alex Stark · 4 years, 10 months ago
  10. c40e695 Ruy: Add note to x86 AVX2 kernels. by Alex Stark · 4 years, 10 months ago
  11. 29840ae Fix compilation broken by cl/288340160. by Benoit Jacob · 4 years, 11 months ago
  12. 23adc55 Keep only the simple auxv method for detecting dotprod instructions. by Benoit Jacob · 4 years, 11 months ago
  13. 879f593 Ruy GEMV: x86 AVX-512 8-bit rough kernels. by Alex Stark · 5 years ago
  14. b6632d3 Ruy GEMV: x86 AVX-512 float rough kernels. by Alex Stark · 5 years ago
  15. f2db1bf Ruy GEMV: x86 AVX2 8-bit rough kernels. by Alex Stark · 5 years ago
  16. 44ca9b1 Ruy GEMV: x86 AVX2 float rough kernels. by Alex Stark · 5 years ago
  17. 8d3e931 Add `cacheable` flag to Ruy Matrix so that caller "opts in" to cache behavior on a per-call basis by T.J. Alumbaugh · 5 years ago
  18. de0a983 In gemv-ish cases, each byte of the big weights matrix is traversed only once, so any notion of data locality is irrelevant. Ignore the 'cache locality score' by forcing it to be 0 in that case. by Benoit Jacob · 5 years ago
  19. 9e08d80 Limit rectangularness to avoid using too tiny kernel blocks in the case of highly rectangular destination matrices (gemv-ish cases), which would result in too few iterations of the kernel inner loop to be fully efficient. Now aim to have at least 8 iterations of the kernel inner loop if possible. by Benoit Jacob · 5 years ago
  20. 718aa11 Ruy: remove additional flag guarding use of prepacked cache by T.J. Alumbaugh · 5 years ago
  21. 7bbf219 NeonCpuBackendGemm uses CpuBackedGemm interface instead of Ruy interface by T.J. Alumbaugh · 5 years ago
  22. 2c7897c Ruy: Profile cache ejection. by T.J. Alumbaugh · 5 years ago
  23. c85f6d7 Resubmit of http://cl/283555950 with fix for win32. by Sean Silva · 5 years ago
  24. 19a85c4 Use separate allocator for cached prepacked matrix allocations. by Ruy Contributors · 5 years ago
  25. 9a523b8 Use separate allocator for cached prepacked matrix allocations. by Sean Silva · 5 years ago
  26. 634945d use nullptr for null pointers. by Benoit Jacob · 5 years ago
  27. 20fc9c1 Move deps to BUILD file to make them easier to manage with automation by Ruy Contributors · 5 years ago
  28. 48f3bc0 Ruy: Permit GEMV code to thread if thread_count above 1 by T.J. Alumbaugh · 5 years ago
  29. a67c966 Ruy: Add non-zero mean in matrix test data, exercising along-row summations. by Alex Stark · 5 years ago
  30. 3a43125 Ruy: Resubmit of: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  31. b645c83 Ruy: Re-submit of: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  32. 1cff2c1 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
  33. b03ae6b Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  34. cb1db77 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  35. 3b3dbb1 Ruy ARMv8 quantized GEMV kernel by T.J. Alumbaugh · 5 years ago
  36. a1aad26 Ruy: ARMv8 GEMV kernel with dotprod support by T.J. Alumbaugh · 5 years ago
  37. 19a7e80 Provide path to Ruy::Mul for MatrixBatchVectorMultiplyAccumulate by T.J. Alumbaugh · 5 years ago
  38. b148d09 Tune default Ruy cache behavior (still off by default) by T.J. Alumbaugh · 5 years ago
  39. bb74635 Ruy: Fix to x86 AVX2 float pack. by Alex Stark · 5 years ago
  40. 5d5ea3d Ruy: Add a cache policy and implementation. Protected by #ifdef usage and default policy is off. by T.J. Alumbaugh · 5 years ago
  41. df8c25b Ruy cache of prepacked matrices by T.J. Alumbaugh · 5 years ago
  42. ab42065 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
  43. 0a255c6 Ruy: Improve x86 kernel profiling labels. by Alex Stark · 5 years ago
  44. 78c747e Ruy: Add benchmark variable to change range of sizes. by Alex Stark · 5 years ago
  45. c02a73a Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  46. de56663 Ruy: Further optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
  47. 9f1042d Ruy: Optimize (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  48. b1e4366 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  49. 16c6820 Ruy: Clean up x86 packing profiling labels. by Alex Stark · 5 years ago
  50. 27b03e9 Ruy: Remove unused function from AVX-512 8-bit kernel. by Alex Stark · 5 years ago
  51. 0a72cad Ruy: Optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
  52. 10d7034 Ruy: Output message when overriding block map size. by Alex Stark · 5 years ago
  53. 035e2c7 Use "-O3" for optimized ruy build by Terry Heo · 5 years ago
  54. 4eee09f Ruy: Workaround compiler problems with low optimization. by Alex Stark · 5 years ago
  55. 45d833e Ruy: No longer disable AVX2 within x86 by default. by Alex Stark · 5 years ago
  56. 770a968 Ruy: Refinements to 8-bit x86 kernels. by Alex Stark · 5 years ago
  57. 3a48f4e Ruy: Improve x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
  58. d5f0a6c Ruy: Unroll x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
  59. 45ada67 Ruy: Improve handling of offsets in 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
  60. c7e47ca Ruy: Improve output stage of 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
  61. dd70d1c Ruy: Move x86 AVX-512 utility functions into namespace. by Alex Stark · 5 years ago
  62. ac132b9 Ruy: Improve offsets handling for 8-bit AVX2. by Alex Stark · 5 years ago
  63. 3fb0f26 Ruy: Rework accumulation in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  64. e09d9b4 Ruy: Combine output stages in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  65. 16f513c Ruy: Load RHS data directly in x86 AVX-512 float kernel. by Alex Stark · 5 years ago
  66. d4508c8 Ruy: Load RHS data directly in x86 AVX2 float kernel. by Alex Stark · 5 years ago
  67. fc7e615 Ruy: Unroll loops in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  68. 45159dc Use structured comparison macros e.g. RUY_CHECK_EQ(a, b) by Benoit Jacob · 5 years ago
  69. a96184a Rewrite RUY_CHECK family of macros: by Benoit Jacob · 5 years ago
  70. 02f886b Ruy ARM32: additional 8bit optimizations by T.J. Alumbaugh · 5 years ago
  71. 8cc147e Ruy ARM32 GEMV kernel by T.J. Alumbaugh · 5 years ago
  72. e5d56a3 Split MakeBlockMap into smaller functions by Benoit Jacob · 5 years ago
  73. 494a6e6 Ruy: Improvements to AVX-512 8bit code. by Alex Stark · 5 years ago
  74. 51636ba Ruy: Rough optimization of x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  75. 06be363 Ruy: Rough optimization of x86 AVX2 float kernel. by Alex Stark · 5 years ago
  76. f7118e1 Rewrite MakeBlockMap to be more principled and at the same time more explicitly empirically derived. by Benoit Jacob · 5 years ago
  77. f585116 Make thread_count part of BlockMap. Allow MakeBlockMap to take a tentative thread_count value as input, potentially use that value to choose BlockMap parameters, and decide on the definitive thread_count value. by Benoit Jacob · 5 years ago
  78. 3e1d455 Rewrite the 'rectangularness' computation. Since 'rectangularness' is now the largest-scale subdivision, it should not (anymore) have anything to do with the kernel layout. Rectangularness is now nothing but the shape 'aspect ratio', (rows/columns), of the destination matrix. This keeps concepts simpler and more orthogonal -- rectangularness is a property of the destination matrix alone, orthogonal to kernel. This will allow writing better, simpler block_map logic. by Benoit Jacob · 5 years ago
  79. 77344d0 Introduce pot_log2, which checks that its argument is a power of two then returns its log2. by Benoit Jacob · 5 years ago
  80. fa906af Switch the rectangularness of blockmaps from inner to outer. by Benoit Jacob · 5 years ago
  81. b580a77 Ruy ARM32: Optimize 8bit kernel by T.J. Alumbaugh · 5 years ago
  82. 38da52a Ruy - optimize ARM32 quantized int kernel by T.J. Alumbaugh · 5 years ago
  83. 945aaa1 Remove portable_test_suite inclusion for ruy tests by Jared Duke · 5 years ago
  84. 8428c44 Ruy: Add compile-time and runtime FMA checking under AVX2. by Alex Stark · 5 years ago
  85. ffcbb6e Ruy: Add mechanism to mask out paths. by Alex Stark · 5 years ago
  86. d594e94 Automated rollback of rollback. Fixed in preceding change. by Alex Stark · 5 years ago
  87. 8d47072 Ruy: Ensure that a couple of classes defined (empty) for non-x86 and non-ARM. by Alex Stark · 5 years ago
  88. 8d21e2b Fix an assertion. Also edit a comment in ruy/BUILD about debugging. by Benoit Jacob · 5 years ago
  89. ee13042 Support Emscripten (i.e. typically Wasm). by Benoit Jacob · 5 years ago
  90. 4184d61 Automated rollback from breakage by Ruy Contributors · 5 years ago
  91. 06ca74d Ruy: Tests for CPU ID detection. by Alex Stark · 5 years ago
  92. 179e1d3 Ruy: Minor clean up. by Alex Stark · 5 years ago
  93. 1963e4d Ruy: Improve includes. by Alex Stark · 5 years ago
  94. 15eeb2a Ruy: Move common copts to recently-added bzl file. by Alex Stark · 5 years ago
  95. d9741fb Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
  96. dd04052 Ruy: Rearrange BUILD file. by Alex Stark · 5 years ago
  97. e439926 Ruy: Disable x86 enhancements under Clang < 8. by Alex Stark · 5 years ago
  98. 948a3ff Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
  99. 5811fa7 Ruy: Introduce CPU ID detection on x86. by Alex Stark · 5 years ago
  100. 3e2acb1 Don't round the allocator's storage size to the next power of two. This is typically a huge buffer. We're going to reach a steady state where we have only a few such buffers and they won't get frequently reallocated, anyway. by Benoit Jacob · 5 years ago
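
Commit 77344d0 above describes `pot_log2` as a helper that checks that its argument is a power of two and then returns its log2. The following is a minimal sketch of that idea, not the code from the commit: the signature, the use of `assert`, and the loop-based computation are all assumptions made for illustration.

```cpp
#include <cassert>

// Illustrative sketch only (assumed signature, not ruy's actual code).
// Asserts that n is a positive power of two, then returns log2(n).
int pot_log2(int n) {
  // A positive power of two has exactly one bit set, so n & (n - 1) is 0.
  assert(n > 0 && (n & (n - 1)) == 0);
  int exponent = 0;
  // Find the exponent such that (1 << exponent) == n.
  while ((1 << exponent) < n) {
    ++exponent;
  }
  return exponent;
}
```

Under these assumptions, `pot_log2(8)` returns 3, and a non-power-of-two argument such as 6 trips the assertion in a debug build.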