1. 0ad580f ruy_advanced API touchups: MulWithPrepacked does not need prepacked operands to be mutable, and PrepackedMatrix does not need accessor methods. by Benoit Jacob · 4 years, 7 months ago
  2. 6b1171e Fix the build by Benoit Jacob · 4 years, 7 months ago
  3. 6039ccc Make context.h minimal, not #including other ruy headers. by Benoit Jacob · 4 years, 7 months ago
  4. 970304d finish c++ifying Context by Benoit Jacob · 4 years, 7 months ago
  5. e866a68 finish c++ifying MulParams by Benoit Jacob · 4 years, 7 months ago
  6. 2bfeb07 finish c++ifying Matrix by Benoit Jacob · 4 years, 7 months ago
  7. de0b1b6 finish c++ifying Layout by Benoit Jacob · 4 years, 7 months ago
  8. 145aecd Rename: by Benoit Jacob · 4 years, 7 months ago
  9. f3c69a7 1. Introduce InternalLayout, a private counterpart of Layout, to be used by internal_matrix.h classes. by Benoit Jacob · 4 years, 7 months ago
  10. 5b0e99d Emulate _BitScanReverse64 on 32-bit MSVC targets by Marat Dukhan · 4 years, 7 months ago
  11. 7a9da95 Increase visibility of size_util by T.J. Alumbaugh · 4 years, 7 months ago
  12. 439c1ac Refactor ruy's predefined Path set constants, introduce a new kDefaultPaths that compiles fewer paths than kAllPaths, and have ruy::Mul(...) use it (overload not taking an explicit Path parameter). by Benoit Jacob · 4 years, 7 months ago
  13. 9f53ba4 Rename :spec to :mul_params. by Benoit Jacob · 4 years, 7 months ago
  14. 98c1b9c Rename BasicSpec to MulParams. by Benoit Jacob · 4 years, 7 months ago
  15. d4dccd6 Introduce new ruy interface: by Benoit Jacob · 4 years, 7 months ago
  16. 2e2658f Follow-up fixes after commit 3a248b34: by Benoit Jacob · 4 years, 7 months ago
  17. 3a248b3 Fix a couple warnings (-Wshorten-64-to-32 and -Wc++98-compat-extra-semi) by Ruy Contributors · 4 years, 7 months ago
  18. 51efe3f Compile without warnings with GCC -Wextra. by Benoit Jacob · 4 years, 7 months ago
  19. 9a1f601 Wrap the gtest header so that we can disable unused-param warnings in it. by Benoit Jacob · 4 years, 7 months ago
  20. dfe0f69 Add -Wall and -Wextra (ie generate lots of warnings) to ruy_copts_base, by Benoit Jacob · 4 years, 7 months ago
  21. 4385cfa And yet one more -Wsign-compare only caught by (zealous?) GCC not Clang. by Benoit Jacob · 4 years, 7 months ago
  22. 5457790 One more -Wsign-compare fix. by Benoit Jacob · 4 years, 7 months ago
  23. 0fa8594 Comments with ASCII art boxes ending in a backslash are causing GCC by Benoit Jacob · 4 years, 7 months ago
  24. d681fa8 Compile without -Wunused-params warnings (enabled at -Wextra). by Benoit Jacob · 4 years, 7 months ago
  25. e7a04be Fix bug, was not returning anything by Benoit Jacob · 4 years, 7 months ago
  26. aa429c3 In the open-source build, link with -pthread to support GCC. by Benoit Jacob · 4 years, 7 months ago
  27. 8212617 Fix -Wsign-compare warnings. by Benoit Jacob · 4 years, 7 months ago
  28. 4452e73 Fix RUY compile time errors. by Fangjun Kuang · 4 years, 7 months ago
  29. e767d8f Rename some .bzl files. Mostly an internal repo change. by Benoit Jacob · 4 years, 7 months ago
  30. e91d8ab Internal change by Benoit Jacob · 4 years, 7 months ago
  31. 8071e5d Internal change by bjacob · 4 years, 7 months ago
  32. 600d1ec Tighten visibility: only make select targets publicly visible, default to private. by Benoit Jacob · 4 years, 8 months ago
  33. 7392ea6 Just some comment fixes as a pretext to test automatic export to GitHub. by Benoit Jacob · 4 years, 8 months ago
  34. 00b6423 Fix include guards after the move out of TFLite. by Benoit Jacob · 4 years, 8 months ago
  35. 184fd58 Reference ruy from its new location as a separate GitHub project. by Benoit Jacob · 4 years, 8 months ago
  36. 91d6280 Internal change (#2) by bjacob · 4 years, 8 months ago
  37. 2b11bd4 Fix -Wreturn-std-move on some toolchains (e.g. MSVC STL with NDEBUG not set) by Ruy Contributors · 4 years, 8 months ago
  38. f7ea583 Move ruy's code to a ruy/ subdirectory. by Benoit Jacob · 4 years, 8 months ago
  39. 299a33a PR #37852: NFC - minor spelling tweaks in documents by Kazuaki Ishizaki · 4 years, 8 months ago
  40. 4d08486 PR #37487: NFC - minor spelling tweaks under lite/experimental directory by Kazuaki Ishizaki · 4 years, 8 months ago
  41. 3d62e95 Comment side_pair.h - mostly to test GitHub export. by Benoit Jacob · 4 years, 8 months ago
  42. f535b38 Do not depend on TensorFlow's config_setting's. by Benoit Jacob · 4 years, 8 months ago
  43. 930045e Give Ruy public visibility by Benoit Jacob · 4 years, 8 months ago
  44. 6062233 Cache pre-packed LHS when RHS <= 4 columns wide by T.J. Alumbaugh · 4 years, 9 months ago
  45. 894be7c PR #36230: Fix spelling errors by comet · 4 years, 10 months ago
  46. 089e927 Rename ruy::WaitUntil to ruy::Wait, because it is most closely related to std::condition_variable::wait, rather than to std::condition_variable::wait_until, so this could have been confusing. For us the "until" means "until the predicate returns true" while in the standard library, the _until suffix means "until some delay has elapsed". by Benoit Jacob · 4 years, 10 months ago
  47. 063cfc2 Add a unit test covering GetBlockByIndex. This is where traversal orders are implemented. A mistake there would not be caught in matrix multiplication tests as it would be a performance-only bug (or even a memory-locality-only bug not necessarily affecting latencies). by Benoit Jacob · 4 years, 10 months ago
  48. 4b90d3f drop the old benchmark_opt_set_* targets. they were broken since the move of code to .cc files in separate libraries caused the defining of the RUY_OPT_SET token in these targets to no longer affect the internal code being compiled. by Benoit Jacob · 4 years, 10 months ago
  49. 07c26e6 better column headers in the benchmark output. by Benoit Jacob · 4 years, 10 months ago
  50. d4abb86 Changes to BlockMap, in particular add Hilbert-curve fractal traversal above a certain size threshold. by Benoit Jacob · 4 years, 10 months ago
  51. c3bb0b7 Fix PMU-querying code to properly count child threads. There were 2 issues: by Benoit Jacob · 4 years, 10 months ago
  52. 906fc4f Use preload-for-write instructions before actual store instructions in kernels. by Benoit Jacob · 4 years, 10 months ago
  53. d7e30f3 Rename: PREFETCH -> PREFETCH_LOAD, in preparation for introducing PREFETCH_STORE. by Benoit Jacob · 4 years, 10 months ago
  54. f63b12e Benchmark tweaks: by Benoit Jacob · 4 years, 10 months ago
  55. a822519 Add a RUY_OPTIMIZE_FOR_MATMUL_BENCHMARK compile-time control allowing one to set the default RUY_OPT_SET to what helps GEMM benchmarks as opposed to the default ruy behavior of doing what helps real applications the most. Unfortunately, some specific optimizations needed for real applications are counterproductive in GEMM benchmarks. In GEMM benchmarking contexts, measuring performance against other libraries more openly optimized for GEMM benchmarks, it makes sense to disable such optimizations that are counterproductive in such settings. by Benoit Jacob · 4 years, 10 months ago
  56. 178084d Soften the penalization of lack of cache locality a little. by Benoit Jacob · 4 years, 10 months ago
  57. 56824f9 TFLM: Fix double-promotion error. by Ruy Contributors · 4 years, 10 months ago
  58. 70aad42 TFLM: Fix double-promotion error. by Ruy Contributors · 4 years, 10 months ago
  59. 8b5d287 TFLM: Fix double-promotion error. by Ruy Contributors · 4 years, 10 months ago
  60. 738c0f5 Fix the build of benchmark_opt_set rules: the build failed when RUY_OPT_ASM was disabled; this RUY_INHERIT_PACK directive was needed regardless of it. by Benoit Jacob · 4 years, 10 months ago
  61. 9f54a1e Ruy - Add cache invalidation by T.J. Alumbaugh · 4 years, 10 months ago
  62. 14bfdeb Allow fixing some of the dimensions while allowing others to vary with RUY_BENCHMARK_CUBIC. Useful to gather narrow/shallow gemm benchmark results, not just cubic. by Benoit Jacob · 4 years, 10 months ago
  63. bf99297 Remove now dead code. by Benoit Jacob · 4 years, 10 months ago
  64. bcfb762 Further tweaks to test logic enabling bias and clamping. by Benoit Jacob · 4 years, 10 months ago
  65. 2b0d243 Ruy - fix test to run platform-specific path by T.J. Alumbaugh · 4 years, 10 months ago
  66. 5b36bac When benchmarking, avoid randomly turning on/off some variants e.g. bias-addition and nonzero zero-points. This makes a very small performance difference but in benchmarking we should consistently measure the same exact thing. by Benoit Jacob · 4 years, 10 months ago
  67. 54d2435 Simplify ruy tests by removing the complicated logic determining quantized multipliers and clamp bounds. Now unconditionally doing what we used to do when QUICK_BENCHMARK=1 was passed. That was needed in practice to get quick results, as the old logic was very slow as it had to rely on a reference implementation of matmul (else it would have been very confusing when matmul regressed). by Benoit Jacob · 4 years, 10 months ago
  68. 73c3214 Use an ordered map for thread roots so that profiles consistently start with the 'main thread' and have a consistent order of enumeration of the other threads. by Benoit Jacob · 4 years, 10 months ago
  69. eb351f2 Drop the dependency on gemmlowp/fixedpoint. by Benoit Jacob · 4 years, 10 months ago
  70. d1a14aa Remove ruy's dependency on the gemmlowp profiler. by Benoit Jacob · 4 years, 10 months ago
  71. 652f111 Update README.md. Add contributing.md and LICENSE. by Benoit Jacob · 4 years, 10 months ago
  72. 6180f1f Ruy x86: Introduce framework for SSE 4.2 and VNNI. by Alex Stark · 4 years, 10 months ago
  73. c40e695 Ruy: Add note to x86 AVX2 kernels. by Alex Stark · 4 years, 10 months ago
  74. 29840ae Fix compilation broken by cl/288340160. by Benoit Jacob · 4 years, 11 months ago
  75. 23adc55 Keep only the simple auxv method for detecting dotprod instructions. by Benoit Jacob · 4 years, 11 months ago
  76. 879f593 Ruy GEMV: x86 AVX-512 8-bit rough kernels. by Alex Stark · 5 years ago
  77. b6632d3 Ruy GEMV: x86 AVX-512 float rough kernels. by Alex Stark · 5 years ago
  78. f2db1bf Ruy GEMV: x86 AVX2 8-bit rough kernels. by Alex Stark · 5 years ago
  79. 44ca9b1 Ruy GEMV: x86 AVX2 float rough kernels. by Alex Stark · 5 years ago
  80. 8d3e931 Add `cacheable` flag to Ruy Matrix so that caller "opts in" to cache behavior on a per-call basis by T.J. Alumbaugh · 5 years ago
  81. de0a983 In gemv-ish cases, each byte of the big weights matrix is traversed only once, so any notion of data locality is irrelevant. Ignore the 'cache locality score' by forcing it to be 0 in that case. by Benoit Jacob · 5 years ago
  82. 9e08d80 Limit rectangularness to avoid using too tiny kernel blocks in the case of highly rectangular destination matrices (gemv-ish cases), which would result in too few iterations of the kernel inner loop to be fully efficient. Now aim to have at least 8 iterations of the kernel inner loop if possible. by Benoit Jacob · 5 years ago
  83. 718aa11 Ruy: remove additional flag guarding use of prepacked cache by T.J. Alumbaugh · 5 years ago
  84. 7bbf219 NeonCpuBackendGemm uses CpuBackendGemm interface instead of Ruy interface by T.J. Alumbaugh · 5 years ago
  85. 2c7897c Ruy: Profile cache ejection. by T.J. Alumbaugh · 5 years ago
  86. c85f6d7 Resubmit of http://cl/283555950 with fix for win32. by Sean Silva · 5 years ago
  87. 19a85c4 Use separate allocator for cached prepacked matrix allocations. by Ruy Contributors · 5 years ago
  88. 9a523b8 Use separate allocator for cached prepacked matrix allocations. by Sean Silva · 5 years ago
  89. 634945d use nullptr for null pointers. by Benoit Jacob · 5 years ago
  90. 20fc9c1 Move deps to BUILD file to make them easier to manage with automation by Ruy Contributors · 5 years ago
  91. 48f3bc0 Ruy: Permit GEMV code to thread if thread_count above 1 by T.J. Alumbaugh · 5 years ago
  92. a67c966 Ruy: Add non-zero mean in matrix test data, exercising along-row summations. by Alex Stark · 5 years ago
  93. 3a43125 Ruy: Resubmit of: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  94. b645c83 Ruy: Re-submit of: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  95. 1cff2c1 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
  96. b03ae6b Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
  97. cb1db77 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
  98. 3b3dbb1 Ruy ARMv8 quantized GEMV kernel by T.J. Alumbaugh · 5 years ago
  99. a1aad26 Ruy: ARMv8 GEMV kernel with dotprod support by T.J. Alumbaugh · 5 years ago
  100. 19a7e80 Provide path to Ruy::Mul for MatrixBatchVectorMultiplyAccumulate by T.J. Alumbaugh · 5 years ago