  1. 634945d use nullptr for null pointers. by Benoit Jacob · 5 years ago
  2. 20fc9c1 Move deps to BUILD file to make them easier to manage with automation by Ruy Contributors · 5 years ago
  3. 48f3bc0 Ruy: Permit GEMV code to thread if thread_count above 1 by T.J. Alumbaugh · 5 years ago
  4. a67c966 Ruy: Add non-zero mean in matrix test data, exercising along-row summations. by Alex Stark · 5 years ago
  5. 3a43125 Ruy: Resubmit of: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  6. b645c83 Ruy: Re-submit of: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  7. 1cff2c1 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
  8. b03ae6b Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  9. cb1db77 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  10. 3b3dbb1 Ruy ARMv8 quantized GEMV kernel by T.J. Alumbaugh · 5 years ago
  11. a1aad26 Ruy: ARMv8 GEMV kernel with dotprod support by T.J. Alumbaugh · 5 years ago
  12. 19a7e80 Provide path to Ruy::Mul for MatrixBatchVectorMultiplyAccumulate by T.J. Alumbaugh · 5 years ago
  13. b148d09 Tune default Ruy cache behavior (still off by default) by T.J. Alumbaugh · 5 years ago
  14. bb74635 Ruy: Fix to x86 AVX2 float pack. by Alex Stark · 5 years ago
  15. 5d5ea3d Ruy: Add a cache policy and implementation. Protected by #ifdef usage and default policy is off. by T.J. Alumbaugh · 5 years ago
  16. df8c25b Ruy cache of prepacked matrices by T.J. Alumbaugh · 5 years ago
  17. ab42065 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
  18. 0a255c6 Ruy: Improve x86 kernel profiling labels. by Alex Stark · 5 years ago
  19. 78c747e Ruy: Add benchmark variable to change range of sizes. by Alex Stark · 5 years ago
  20. c02a73a Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  21. de56663 Ruy: Further optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
  22. 9f1042d Ruy: Optimize (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  23. b1e4366 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  24. 16c6820 Ruy: Clean up x86 packing profiling labels. by Alex Stark · 5 years ago
  25. 27b03e9 Ruy: Remove unused function from AVX-512 8-bit kernel. by Alex Stark · 5 years ago
  26. 0a72cad Ruy: Optimization (partial) of AVX2 float pack. by Alex Stark · 5 years ago
  27. 10d7034 Ruy: Output message when overriding block map size. by Alex Stark · 5 years ago
  28. 035e2c7 Use "-O3" for optimized ruy build by Terry Heo · 5 years ago
  29. 4eee09f Ruy: Workaround compiler problems with low optimization. by Alex Stark · 5 years ago
  30. 45d833e Ruy: No longer disable AVX2 within x86 by default. by Alex Stark · 5 years ago
  31. 770a968 Ruy: Refinements to 8-bit x86 kernels. by Alex Stark · 5 years ago
  32. 3a48f4e Ruy: Improve x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
  33. d5f0a6c Ruy: Unroll x86 8-bit AVX-512 kernel loops. by Alex Stark · 5 years ago
  34. 45ada67 Ruy: Improve handling of offsets in 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
  35. c7e47ca Ruy: Improve output stage of 8-bit AVX-512 kernel. by Alex Stark · 5 years ago
  36. dd70d1c Ruy: Move x86 AVX-512 utility functions into namespace. by Alex Stark · 5 years ago
  37. ac132b9 Ruy: Improve offsets handling for 8-bit AVX2. by Alex Stark · 5 years ago
  38. 3fb0f26 Ruy: Rework accumulation in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  39. e09d9b4 Ruy: Combine output stages in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  40. 16f513c Ruy: Load RHS data directly in x86 AVX-512 float kernel. by Alex Stark · 5 years ago
  41. d4508c8 Ruy: Load RHS data directly in x86 AVX2 float kernel. by Alex Stark · 5 years ago
  42. fc7e615 Ruy: Unroll loops in x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  43. 45159dc Use structured comparison macros e.g. RUY_CHECK_EQ(a, b) by Benoit Jacob · 5 years ago
  44. a96184a Rewrite RUY_CHECK family of macros: by Benoit Jacob · 5 years ago
  45. 02f886b Ruy ARM32: additional 8bit optimizations by T.J. Alumbaugh · 5 years ago
  46. 8cc147e Ruy ARM32 GEMV kernel by T.J. Alumbaugh · 5 years ago
  47. e5d56a3 Split MakeBlockMap into smaller functions by Benoit Jacob · 5 years ago
  48. 494a6e6 Ruy: Improvements to AVX-512 8bit code. by Alex Stark · 5 years ago
  49. 51636ba Ruy: Rough optimization of x86 AVX2 8-bit kernel. by Alex Stark · 5 years ago
  50. 06be363 Ruy: Rough optimization of x86 AVX2 float kernel. by Alex Stark · 5 years ago
  51. f7118e1 Rewrite MakeBlockMap to be more principled and at the same time more explicitly empirically derived. by Benoit Jacob · 5 years ago
  52. f585116 Make thread_count part of BlockMap. Allow MakeBlockMap to take a tentative thread_count value as input, potentially use that value to choose BlockMap parameters, and decide on the definitive thread_count value. by Benoit Jacob · 5 years ago
  53. 3e1d455 Rewrite the 'rectangularness' computation. Since 'rectangularness' is now the largest-scale subdivision, it should not (anymore) have anything to do with the kernel layout. Rectangularness is now nothing but the shape 'aspect ratio', (rows/columns), of the destination matrix. This keeps concepts simpler and more orthogonal -- rectangularness is a property of the destination matrix alone, orthogonal to kernel. This will allow writing better, simpler block_map logic. by Benoit Jacob · 5 years ago
  54. 77344d0 Introduce pot_log2, which checks that its argument is a power of two then returns its log2. by Benoit Jacob · 5 years ago
  55. fa906af Switch the rectangularness of blockmaps from inner to outer. by Benoit Jacob · 5 years ago
  56. b580a77 Ruy ARM32: Optimize 8bit kernel by T.J. Alumbaugh · 5 years ago
  57. 38da52a Ruy - optimize ARM32 quantized int kernel by T.J. Alumbaugh · 5 years ago
  58. 945aaa1 Remove portable_test_suite inclusion for ruy tests by Jared Duke · 5 years ago
  59. 8428c44 Ruy: Add compile-time and runtime FMA checking under AVX2. by Alex Stark · 5 years ago
  60. ffcbb6e Ruy: Add mechanism to mask out paths. by Alex Stark · 5 years ago
  61. d594e94 Automated rollback of rollback. Fixed in preceding change. by Alex Stark · 5 years ago
  62. 8d47072 Ruy: Ensure that a couple of classes defined (empty) for non-x86 and non-ARM. by Alex Stark · 5 years ago
  63. 8d21e2b Fix an assertion. Also edit a comment in ruy/BUILD about debugging. by Benoit Jacob · 5 years ago
  64. ee13042 Support Emscripten (ie typically Wasm). by Benoit Jacob · 5 years ago
  65. 4184d61 Automated rollback from breakage by Ruy Contributors · 5 years ago
  66. 06ca74d Ruy: Tests for CPU ID detection. by Alex Stark · 5 years ago
  67. 179e1d3 Ruy: Minor clean up. by Alex Stark · 5 years ago
  68. 1963e4d Ruy: Improve includes. by Alex Stark · 5 years ago
  69. 15eeb2a Ruy: Move common copts to recently-added bzl file. by Alex Stark · 5 years ago
  70. d9741fb Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
  71. dd04052 Ruy: Rearrange BUILD file. by Alex Stark · 5 years ago
  72. e439926 Ruy: Disable x86 enhancements under Clang < 8. by Alex Stark · 5 years ago
  73. 948a3ff Ruy: Add bzl files for copts handling. by Alex Stark · 5 years ago
  74. 5811fa7 Ruy: Introduce CPU ID detection on x86. by Alex Stark · 5 years ago
  75. 3e2acb1 Don't round the allocator's storage size to the next power of two. This is typically a huge buffer. We're going to reach a steady state where we have only a few such buffers and they won't get frequently reallocated, anyway. by Benoit Jacob · 5 years ago
  76. 607e445 Fix allocator in cases of sizes overflowing 32bit integer arithmetic by Benoit Jacob · 5 years ago
  77. 30a5e98 Ruy: Reformat bzl files. by Alex Stark · 5 years ago
  78. b7ebb18 Ruy: AVX2 model C++ code. by Alex Stark · 5 years ago
  79. fa69a4b Some more fixes to arm32 asm: by Benoit Jacob · 5 years ago
  80. 9a8ac17 Fix a vld1 instruction, see: by Benoit Jacob · 5 years ago
  81. bba4baa Ruy ARM32 packing asm by T.J. Alumbaugh · 5 years ago
  82. 0d203df Ruy ARMv7 asm int8 quantized kernel by T.J. Alumbaugh · 5 years ago
  83. 4e24eca Require dotprod when running the tests on ChromiumOS/ARM64. At the moment this is being used to run tests on emulator, we're currently getting dotprod support there, we don't want to regress that. by Benoit Jacob · 5 years ago
  84. e039ebb Ruy: Minor fixes to AVX-512 code. by Alex Stark · 5 years ago
  85. 6593992 Ruy: Modify guards to use X86 platform in some places. by Alex Stark · 5 years ago
  86. a218700 Ruy: Exclude GCC and other non-Clang compilers from x86 enhancements. by Alex Stark · 5 years ago
  87. 5de95a8 Ruy: Restrict path definitions to supported platforms. by Alex Stark · 5 years ago
  88. 4c5c04d Ruy: Split-off build targets specific to platform / ISA. by Alex Stark · 5 years ago
  89. f5c43c6 Ruy: Prune dependencies. by Alex Stark · 5 years ago
  90. bb4bbc4 Ruy: Fix bug in AVX-512 quant packing. by Alex Stark · 5 years ago
  91. 491ca6b Fix compilation error on arm32 by Benoit Jacob · 5 years ago
  92. 9284253 Ruy: Correct an include. by Alex Stark · 5 years ago
  93. ae10ec2 Rewrite the handling of threads==1, so it's a little more readable, and gets compiled with -O3 in a way that puts this case at the start of the function instead of at the end, which for a mysterious reason results in more stable performance. by Benoit Jacob · 5 years ago
  94. 55cb8a8 Specify -O3 and, on ARM32, -mfpu=neon as rule copts, for all our binary rules. by Benoit Jacob · 5 years ago
  95. ab0bac8 Disable AVX512 on __APPLE__ for now to unbreak the build. by Benoit Jacob · 5 years ago
  96. 58302ee Only import <sys/time.h> if we are running on Linux. Otherwise it causes the by Anna Revinskaya · 5 years ago
  97. 540a765 Ruy: Guard an include, fixing MacOS build. by Alex Stark · 5 years ago
  98. 8221a67 Ruy: Improve includes. by Alex Stark · 5 years ago
  99. 4a552ba Improve bzl file. by Alex Stark · 5 years ago
  100. 642abf9 Ruy: Fix to x86 (AVX-512) pack code. by Alex Stark · 5 years ago