Optimize SkBlend by using NEON intrinsics
Use NEON intrinsics to check the alpha channel of the pixels.
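In some cases it is about 14 times faster than the original implementation
(13.8x for LinearSrcOver_yellow_rose.pngVSkOptsDefault). In the compare output,
each row shows the time with this change (neon_opt.log), the time without it
(clean.log) and the speedup. Only the SkOptsDefault variants change; the
SkOptsTrivial, SkOptsNonSimdCore and SkOptsBruteForce variants stay at 1x.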
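A common reason to check alpha in a SrcOver loop is to take fast paths: if a group of source pixels is fully opaque, the result is just the source, and if it is fully transparent (premultiplied), the destination is unchanged. The C sketch below illustrates that idea for groups of four pixels. It is not Skia's SkBlend_opts code; the helpers all_opaque4, all_transparent4, srcover1 and srcover_row are hypothetical, and it assumes 32-bit premultiplied pixels with alpha in bits 24..31 and a simplified 8-bit SrcOver. The NEON path is compiled only when __ARM_NEON is defined; otherwise a scalar fallback with the same results is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Assumption: 32-bit premultiplied pixels with alpha in bits 24..31. */
#define ALPHA_SHIFT 24

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* True if all four pixels have alpha == 255 (NEON path). */
static bool all_opaque4(const uint32_t *px) {
    uint32x4_t a  = vshrq_n_u32(vld1q_u32(px), ALPHA_SHIFT);
    uint32x4_t eq = vceqq_u32(a, vdupq_n_u32(0xFF)); /* lane is all ones if opaque */
    uint32x2_t m  = vand_u32(vget_low_u32(eq), vget_high_u32(eq));
    return (vget_lane_u32(m, 0) & vget_lane_u32(m, 1)) == 0xFFFFFFFFu;
}

/* True if all four pixels have alpha == 0 (NEON path). */
static bool all_transparent4(const uint32_t *px) {
    uint32x4_t a = vshrq_n_u32(vld1q_u32(px), ALPHA_SHIFT);
    uint32x2_t m = vorr_u32(vget_low_u32(a), vget_high_u32(a));
    return (vget_lane_u32(m, 0) | vget_lane_u32(m, 1)) == 0;
}
#else
/* Portable fallback with the same results, so the sketch runs on any CPU. */
static bool all_opaque4(const uint32_t *px) {
    return ((px[0] & px[1] & px[2] & px[3]) >> ALPHA_SHIFT) == 0xFF;
}
static bool all_transparent4(const uint32_t *px) {
    return ((px[0] | px[1] | px[2] | px[3]) >> ALPHA_SHIFT) == 0;
}
#endif

/* Exact rounding of x / 255 for x in [0, 255 * 255]. */
static uint32_t div255(uint32_t x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

/* Simplified per-channel SrcOver on premultiplied pixels: d = s + d * (255 - sa) / 255. */
static uint32_t srcover1(uint32_t d, uint32_t s) {
    uint32_t inv = 255 - (s >> ALPHA_SHIFT);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sc = (s >> shift) & 0xFF;
        uint32_t dc = (d >> shift) & 0xFF;
        out |= (sc + div255(dc * inv)) << shift;
    }
    return out;
}

/* Blend a row four pixels at a time, skipping the full blend when alpha allows it. */
void srcover_row(uint32_t *dst, const uint32_t *src, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        if (all_opaque4(src + i)) {
            /* SrcOver with an opaque source yields the source. */
            memcpy(dst + i, src + i, 4 * sizeof(uint32_t));
        } else if (!all_transparent4(src + i)) {
            /* Mixed alpha: blend each pixel. Premultiplied alpha 0 means the
             * whole source pixel is 0, so a transparent group leaves dst as-is. */
            for (size_t k = 0; k < 4; k++) dst[i + k] = srcover1(dst[i + k], src[i + k]);
        }
    }
    for (; i < n; i++) dst[i] = srcover1(dst[i], src[i]); /* leftover pixels */
}
```

Reducing lanes with vand_u32/vorr_u32 keeps the sketch buildable on both 32-bit ARMv7 and AArch64; on AArch64 alone, vminvq_u32/vmaxvq_u32 could do each reduction in one step.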
$ ./bin/droid out/arm64_release/nanobench --samples 300 --nompd --match LinearSrcOver -v > neon_opt.log
$ ./bin/compare neon_opt.log clean.log
LinearSrcOver_yellow_rose.pngVSkOptsDefault        1.8ms -> 24.9ms  13.8x
LinearSrcOver_iconstrip.pngVSkOptsDefault         5.71ms -> 69.8ms  12.2x
LinearSrcOver_plane.pngVSkOptsDefault             1.45ms ->   11ms  7.62x
LinearSrcOver_baby_tux.pngVSkOptsDefault          1.88ms -> 9.96ms  5.29x
LinearSrcOver_mandrill_512.pngVSkOptsDefault      1.41ms -> 4.62ms  3.29x
LinearSrcOver_yellow_rose.pngVSkOptsTrivial       24.9ms -> 24.9ms  1x
LinearSrcOver_yellow_rose.pngVSkOptsNonSimdCore   2.17ms -> 2.18ms  1x
LinearSrcOver_plane.pngVSkOptsTrivial             11.1ms -> 11.1ms  1x
LinearSrcOver_plane.pngVSkOptsNonSimdCore          1.5ms ->  1.5ms  1x
LinearSrcOver_mandrill_512.pngVSkOptsNonSimdCore  2.39ms -> 2.39ms  1x
LinearSrcOver_iconstrip.pngVSkOptsNonSimdCore     6.43ms -> 6.43ms  1x
LinearSrcOver_baby_tux.pngVSkOptsBruteForce       22.3ms -> 22.3ms  1x
LinearSrcOver_yellow_rose.pngVSkOptsBruteForce    45.5ms -> 45.5ms  1x
LinearSrcOver_baby_tux.pngVSkOptsNonSimdCore      2.02ms -> 2.02ms  1x
LinearSrcOver_iconstrip.pngVSkOptsTrivial         69.7ms -> 69.7ms  1x
LinearSrcOver_baby_tux.pngVSkOptsTrivial          9.96ms -> 9.95ms  1x
LinearSrcOver_mandrill_512.pngVSkOptsBruteForce   99.3ms -> 99.2ms  1x
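In the compare output above, the speedup column is consistent with dividing the clean.log time by the neon_opt.log time (24.9 / 1.8 is about 13.83, printed as 13.8x). The small C check below recomputes it for the five SkOptsDefault rows; the struct, table and helper names are hypothetical, and the inputs are the rounded times as printed.

```c
/* One SkOptsDefault row: printed times in milliseconds and the printed speedup. */
struct bench_row {
    const char *image;
    double neon_ms;   /* neon_opt.log, with the NEON change */
    double clean_ms;  /* clean.log, baseline */
    double reported;  /* speedup printed by compare */
};

static const struct bench_row kDefaultRows[] = {
    {"yellow_rose.png",  1.80, 24.9, 13.8},
    {"iconstrip.png",    5.71, 69.8, 12.2},
    {"plane.png",        1.45, 11.0, 7.62},
    {"baby_tux.png",     1.88, 9.96, 5.29},
    {"mandrill_512.png", 1.41, 4.62, 3.29},
};

#define NUM_DEFAULT_ROWS (sizeof kDefaultRows / sizeof kDefaultRows[0])

/* Speedup = baseline time / optimized time. */
static double speedup(double neon_ms, double clean_ms) {
    return clean_ms / neon_ms;
}

/* Relative difference between the recomputed and the printed speedup. */
static double rel_error(const struct bench_row *r) {
    double d = speedup(r->neon_ms, r->clean_ms) - r->reported;
    return (d < 0 ? -d : d) / r->reported;
}
```

Recomputing from the rounded times lands within 1% of the printed values (for plane.png, 11 / 1.45 is about 7.59 versus the printed 7.62), which suggests compare divides the unrounded timings.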
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ia576365578d65b771440da65fdf41f090ccf0541
Reviewed-on: https://skia-review.googlesource.com/6860
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
1 file changed