- 0ad580f ruy_advanced API touchups: MulWithPrepacked does not need prepacked operands to be mutable, and PrepackedMatrix does not need accessor methods. by Benoit Jacob · 4 years, 7 months ago
- 6b1171e Fix the build by Benoit Jacob · 4 years, 7 months ago
- 6039ccc Make context.h minimal, not #including other ruy headers. by Benoit Jacob · 4 years, 7 months ago
- 970304d finish c++ifying Context by Benoit Jacob · 4 years, 7 months ago
- e866a68 finish c++ifying MulParams by Benoit Jacob · 4 years, 7 months ago
- 2bfeb07 finish c++ifying Matrix by Benoit Jacob · 4 years, 7 months ago
- de0b1b6 finish c++ifying Layout by Benoit Jacob · 4 years, 7 months ago
- 145aecd Rename: by Benoit Jacob · 4 years, 7 months ago
- f3c69a7 1. Introduce InternalLayout, a private counterpart of Layout, to be used by internal_matrix.h classes. by Benoit Jacob · 4 years, 7 months ago
- 5b0e99d Emulate _BitScanReverse64 on 32-bit MSVC targets by Marat Dukhan · 4 years, 7 months ago
- 7a9da95 Increase visibility of size_util by T.J. Alumbaugh · 4 years, 7 months ago
- 439c1ac Refactor ruy's predefined Path set constants, introduce a new kDefaultPaths that compiles fewer paths than kAllPaths, and have ruy::Mul(...) use it (overload not taking an explicit Path parameter). by Benoit Jacob · 4 years, 7 months ago
- 9f53ba4 Rename :spec to :mul_params. by Benoit Jacob · 4 years, 7 months ago
- 98c1b9c Rename BasicSpec to MulParams. by Benoit Jacob · 4 years, 7 months ago
- d4dccd6 Introduce new ruy interface: by Benoit Jacob · 4 years, 7 months ago
- 2e2658f Follow-up fixes after commit 3a248b34: by Benoit Jacob · 4 years, 7 months ago
- 3a248b3 Fix a couple warnings (-Wshorten-64-to-32 and -Wc++98-compat-extra-semi) by Ruy Contributors · 4 years, 7 months ago
- 51efe3f Compile without warnings with GCC -Wextra. by Benoit Jacob · 4 years, 7 months ago
- 9a1f601 Wrap the gtest header so that we can disable unused-param warnings in it. by Benoit Jacob · 4 years, 7 months ago
- dfe0f69 Add -Wall and -Wextra (ie generate lots of warnings) to ruy_copts_base, by Benoit Jacob · 4 years, 7 months ago
- 4385cfa And yet one more -Wsign-compare only caught by (zealous?) GCC not Clang. by Benoit Jacob · 4 years, 7 months ago
- 5457790 One more -Wsign-compare fix. by Benoit Jacob · 4 years, 7 months ago
- 0fa8594 Comments with ASCII art boxes ending in a backslash are causing GCC by Benoit Jacob · 4 years, 7 months ago
- d681fa8 Compile without -Wunused-params warnings (enabled at -Wextra). by Benoit Jacob · 4 years, 7 months ago
- e7a04be Fix bug, was not returning anything by Benoit Jacob · 4 years, 7 months ago
- aa429c3 In the open-source build, link with -pthread to support GCC. by Benoit Jacob · 4 years, 7 months ago
- 8212617 Fix -Wsign-compare warnings. by Benoit Jacob · 4 years, 7 months ago
- 4452e73 Fix RUY compile time errors. by Fangjun Kuang · 4 years, 7 months ago
- e767d8f Rename some .bzl files. Mostly an internal repo change. by Benoit Jacob · 4 years, 7 months ago
- e91d8ab Internal change by Benoit Jacob · 4 years, 7 months ago
- 8071e5d Internal change by bjacob · 4 years, 7 months ago
- 600d1ec Tighten visibility: only make select targets publicly visible, default to private. by Benoit Jacob · 4 years, 8 months ago
- 7392ea6 Just some comment fixes as a pretext to test automatic export to GitHub. by Benoit Jacob · 4 years, 8 months ago
- 00b6423 Fix include guards after the move out of the TFLite. by Benoit Jacob · 4 years, 8 months ago
- 184fd58 Reference ruy from its new location as a separate GitHub project. by Benoit Jacob · 4 years, 8 months ago
- 91d6280 Internal change (#2) by bjacob · 4 years, 8 months ago
- 2b11bd4 Fix -Wreturn-std-move on some toolchains (e.g. MSVC STL with NDEBUG not set) by Ruy Contributors · 4 years, 8 months ago
- f7ea583 Move ruy's code to a ruy/ subdirectory. by Benoit Jacob · 4 years, 8 months ago
- 299a33a PR #37852: NFC - minor spelling tweaks in documents by Kazuaki Ishizaki · 4 years, 8 months ago
- 4d08486 PR #37487: NFC - minor spelling tweaks under lite/experimental directory by Kazuaki Ishizaki · 4 years, 8 months ago
- 3d62e95 Comment side_pair.h - mostly to test GitHub export. by Benoit Jacob · 4 years, 8 months ago
- f535b38 Do not depend on TensorFlow's config_setting's. by Benoit Jacob · 4 years, 8 months ago
- 930045e Give Ruy public visibility by Benoit Jacob · 4 years, 8 months ago
- 6062233 Cache pre-packed LHS when RHS <= 4 columns wide by T.J. Alumbaugh · 4 years, 9 months ago
- 894be7c PR #36230: Fix spelling errors by comet · 4 years, 10 months ago
- 089e927 Rename ruy::WaitUntil to ruy::Wait, because it is most closely related to std::condition_variable::wait, rather than to std::condition_variable::wait_until, so this could have been confusing. For us the "until" means "until the predicate returns true" while in the standard library, the _until suffix means "until some delay has elapsed". by Benoit Jacob · 4 years, 10 months ago
- 063cfc2 Add a unit test covering GetBlockByIndex. This is where traversal orders are implemented. A mistake there would not be caught in matrix multiplication tests as it would be a performance-only bug (or even a memory-locality-only bug not necessarily affecting latencies). by Benoit Jacob · 4 years, 10 months ago
- 4b90d3f drop the old benchmark_opt_set_* targets. they were broken since the move of code to .cc files in separate libraries caused the defining of the RUY_OPT_SET token in these targets to no longer affect the internal code being compiled. by Benoit Jacob · 4 years, 10 months ago
- 07c26e6 better column headers in the benchmark output. by Benoit Jacob · 4 years, 10 months ago
- d4abb86 Changes to BlockMap, in particular add Hilbert-curve fractal traversal above a certain size threshold. by Benoit Jacob · 4 years, 10 months ago
- c3bb0b7 Fix PMU-querying code to properly count child threads. There were 2 issues: by Benoit Jacob · 4 years, 10 months ago
- 906fc4f Use preload-for-write instructions before actual store instructions in kernels. by Benoit Jacob · 4 years, 10 months ago
- d7e30f3 Rename: PREFETCH -> PREFETCH_LOAD, in preparation for introducing PREFETCH_STORE. by Benoit Jacob · 4 years, 10 months ago
- f63b12e Benchmark tweaks: by Benoit Jacob · 4 years, 10 months ago
- a822519 Add a RUY_OPTIMIZE_FOR_MATMUL_BENCHMARK compile-time control allowing to set the default RUY_OPT_SET to what helps GEMM benchmarks as opposed to the default ruy behavior of doing what helps real applications the most. Unfortunately, some specific optimizations needed for real applications are counterproductive in GEMM benchmarks. In GEMM benchmarking contexts, measuring performance against other libraries more openly optimized for GEMM benchmarks, it makes sense to disable such optimizations that are counterproductive in such settings. by Benoit Jacob · 4 years, 10 months ago
- 178084d Soften the penalization of lack of cache locality a little. by Benoit Jacob · 4 years, 10 months ago
- 56824f9 TFLM: Fix double-promotion error. by Ruy Contributors · 4 years, 10 months ago
- 70aad42 TFLM: Fix double-promotion error. by Ruy Contributors · 4 years, 10 months ago
- 8b5d287 TFLM: Fix double-promotion error. by Ruy Contributors · 4 years, 10 months ago
- 738c0f5 Fix the build of benchmark_opt_set rules: the build failed when RUY_OPT_ASM was disabled, this RUY_INHERIT_PACK directive was needed regardless of it. by Benoit Jacob · 4 years, 10 months ago
- 9f54a1e Ruy - Add cache invalidation by T.J. Alumbaugh · 4 years, 10 months ago
- 14bfdeb Allow fixing some of the dimensions while allowing others to vary with RUY_BENCHMARK_CUBIC. Useful to gather narrow/shallow gemm benchmark results, not just cubic. by Benoit Jacob · 4 years, 10 months ago
- bf99297 Remove now dead code. by Benoit Jacob · 4 years, 10 months ago
- bcfb762 Further tweaks to test logic enabling bias and clamping. by Benoit Jacob · 4 years, 10 months ago
- 2b0d243 Ruy - fix test to run platform-specific path by T.J. Alumbaugh · 4 years, 10 months ago
- 5b36bac When benchmarking, avoid randomly turning on/off some variants e.g. bias-addition and nonzero zero-points. This makes a very small performance difference but in benchmarking we should consistently measure the same exact thing. by Benoit Jacob · 4 years, 10 months ago
- 54d2435 Simplify ruy tests by removing the complicated logic determining quantized multipliers and clamp bounds. Now unconditionally doing what we used to do when QUICK_BENCHMARK=1 was passed. That was needed in practice to get quick results, as the old logic was very slow as it had to rely on a reference implementation of matmul (else it would have been very confusing when matmul regressed). by Benoit Jacob · 4 years, 10 months ago
- 73c3214 Use an ordered map for thread roots so that profiles consistently start with the 'main thread' and have a consistent order of enumeration of the other threads. by Benoit Jacob · 4 years, 10 months ago
- eb351f2 Drop the dependency on gemmlowp/fixedpoint. by Benoit Jacob · 4 years, 10 months ago
- d1a14aa Remove ruy's dependency on the gemmlowp profiler. by Benoit Jacob · 4 years, 10 months ago
- 652f111 Update README.md. Add contributing.md and LICENSE. by Benoit Jacob · 4 years, 10 months ago
- 6180f1f Ruy x86: Introduce framework for SSE 4.2 and VNNI. by Alex Stark · 4 years, 10 months ago
- c40e695 Ruy: Add note to x86 AVX2 kernels. by Alex Stark · 4 years, 10 months ago
- 29840ae Fix compilation broken by cl/288340160. by Benoit Jacob · 4 years, 11 months ago
- 23adc55 Keep only the simple auxv method for detecting dotprod instructions. by Benoit Jacob · 4 years, 11 months ago
- 879f593 Ruy GEMV: x86 AVX-512 8-bit rough kernels. by Alex Stark · 5 years ago
- b6632d3 Ruy GEMV: x86 AVX-512 float rough kernels. by Alex Stark · 5 years ago
- f2db1bf Ruy GEMV: x86 AVX2 8-bit rough kernels. by Alex Stark · 5 years ago
- 44ca9b1 Ruy GEMV: x86 AVX2 float rough kernels. by Alex Stark · 5 years ago
- 8d3e931 Add `cacheable` flag to Ruy Matrix so that caller "opts in" to cache behavior on a per-call basis by T.J. Alumbaugh · 5 years ago
- de0a983 In gemv-ish cases, each byte of the big weights matrix is traversed only once, so any notion of data locality is irrelevant. Ignore the 'cache locality score' by forcing it to be 0 in that case. by Benoit Jacob · 5 years ago
- 9e08d80 Limit rectangularness to avoid using too tiny kernel blocks in the case of highly rectangular destination matrices (gemv-ish cases), which would result in too few iterations of the kernel inner loop to be fully efficient. Now aim to have at least 8 iterations of the kernel inner loop if possible. by Benoit Jacob · 5 years ago
- 718aa11 Ruy: remove additional flag guarding use of prepacked cache by T.J. Alumbaugh · 5 years ago
- 7bbf219 NeonCpuBackendGemm uses CpuBackedGemm interface instead of Ruy interface by T.J. Alumbaugh · 5 years ago
- 2c7897c Ruy: Profile cache ejection. by T.J. Alumbaugh · 5 years ago
- c85f6d7 Resubmit of http://cl/283555950 with fix for win32. by Sean Silva · 5 years ago
- 19a85c4 Use separate allocator for cached prepacked matrix allocations. by Ruy Contributors · 5 years ago
- 9a523b8 Use separate allocator for cached prepacked matrix allocations. by Sean Silva · 5 years ago
- 634945d use nullptr for null pointers. by Benoit Jacob · 5 years ago
- 20fc9c1 Move deps to BUILD file to make them easier to manage with automation by Ruy Contributors · 5 years ago
- 48f3bc0 Ruy: Permit GEMV code to thread if thread_count above 1 by T.J. Alumbaugh · 5 years ago
- a67c966 Ruy: Add non-zero mean in matrix test data, exercising along-row summations. by Alex Stark · 5 years ago
- 3a43125 Ruy: Resubmit of: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- b645c83 Ruy: Re-submit of: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- 1cff2c1 Ruy: Reduce compiler warnings. by Alex Stark · 5 years ago
- b03ae6b Ruy: Further optimization (partial) of x86 AVX2 8-bit pack. by Alex Stark · 5 years ago
- cb1db77 Ruy: Optimize (partial) of x86 AVX-512 8-bit pack. by Alex Stark · 5 years ago
- 3b3dbb1 Ruy ARMv8 quantized GEMV kernel by T.J. Alumbaugh · 5 years ago
- a1aad26 Ruy: ARMv8 GEMV kernel with dotprod support by T.J. Alumbaugh · 5 years ago
- 19a7e80 Provide path to Ruy::Mul for MatrixBatchVectorMultiplyAccumulate by T.J. Alumbaugh · 5 years ago