Some performance tweaks for DAA
1. Always inline (Clang previously ignored the inline hint and was 25% slower)
2. SIMD everywhere other than x86 gcc:
non-SIMD is only faster on my desktop with gcc;
with Clang on my desktop, SIMD is 50% faster than non-SIMD.
3. Allocate 4x memory instead of 2x when running out of space:
on old Android devices with Linux kernel 3.10 (e.g., Nexus 6P, 5X),
the alloc/memcpy will trigger a major bottleneck in the kernel (30%
of the running time). This bottleneck goes away (the kernel is no
longer doing stupid things during alloc/memcpy) in Linux kernel
3.18 (e.g., Pixel), and that's why DAA is much faster on Pixel than
on Nexus 6P.
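
For illustration only, here is a minimal sketch of points 1 and 3. It is
not the code in this change: DEMO_ALWAYS_INLINE, DemoBuffer, demo_push,
kGrowthFactor and the initial capacity of 16 are all hypothetical names
and values chosen for the example. The idea is that a 4x growth factor
needs roughly half as many alloc/memcpy rounds as 2x to reach the same
capacity, which matters when each round is expensive in the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical force-inline macro (an assumption, not Skia's own macro).
// Plain `inline` is only a hint, so compiler-specific attributes are used.
#if defined(__GNUC__) || defined(__clang__)
    #define DEMO_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
    #define DEMO_ALWAYS_INLINE __forceinline
#else
    #define DEMO_ALWAYS_INLINE inline
#endif

// Hypothetical growable int buffer, used only to show the growth policy.
struct DemoBuffer {
    int*   data     = nullptr;
    size_t count    = 0;
    size_t capacity = 0;
    size_t reallocs = 0;  // number of alloc/memcpy rounds so far

    ~DemoBuffer() { std::free(data); }
};

// Grow by 4x instead of 2x to halve the number of alloc/memcpy rounds.
static constexpr size_t kGrowthFactor = 4;

DEMO_ALWAYS_INLINE void demo_push(DemoBuffer* buf, int value) {
    if (buf->count == buf->capacity) {
        // Assumed initial capacity of 16 elements.
        size_t newCapacity = buf->capacity ? buf->capacity * kGrowthFactor : 16;
        int* grown = static_cast<int*>(std::malloc(newCapacity * sizeof(int)));
        assert(grown != nullptr);
        if (buf->count > 0) {
            // Copy the existing elements into the larger allocation.
            std::memcpy(grown, buf->data, buf->count * sizeof(int));
        }
        std::free(buf->data);
        buf->data = grown;
        buf->capacity = newCapacity;
        buf->reallocs++;
    }
    buf->data[buf->count++] = value;
}
```

With these assumed values, pushing 1000 elements takes 4 allocations
(16, 64, 256, 1024), where a 2x policy starting from 16 would take 7.
The trade-off is more unused memory in the worst case.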
I think maybe I should adopt SkRasterPipeline for device-specific
optimizations.
Bug: skia:
Change-Id: I0408aa7671a5f1b39aad3bec25f8fc994ff5a1bb
Reviewed-on: https://skia-review.googlesource.com/30820
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Yuqian Li <liyuqian@google.com>
2 files changed