Add SSSE3 Optimizations for premul and swap
Improves decode performance for RGBA PNGs.
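For reference, here is a minimal scalar sketch of what each of the three swizzle modes measured below computes per pixel, assuming 8-bit RGBA source bytes. The function names and the divide-by-255 rounding helper are hypothetical illustrations, not Skia's API. The SSSE3 code in this change vectorizes work of this kind across several pixels per instruction, and its exact rounding may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round(x * a / 255) for 8-bit x and a, using the
// common (p + (p >> 8)) >> 8 trick instead of an integer division.
static inline uint8_t mul_div_255_round(uint8_t x, uint8_t a) {
    unsigned p = unsigned(x) * a + 128;
    return static_cast<uint8_t>((p + (p >> 8)) >> 8);
}

// "Premul": scale R, G, B by A/255; channel order stays RGBA.
void premul_rgba(uint8_t* dst, const uint8_t* src, size_t pixels) {
    for (size_t i = 0; i < pixels; ++i) {
        // Read the whole pixel first so dst == src (in place) is safe.
        const uint8_t r = src[4*i], g = src[4*i+1], b = src[4*i+2], a = src[4*i+3];
        dst[4*i + 0] = mul_div_255_round(r, a);
        dst[4*i + 1] = mul_div_255_round(g, a);
        dst[4*i + 2] = mul_div_255_round(b, a);
        dst[4*i + 3] = a;
    }
}

// "Swap": reorder RGBA -> BGRA without premultiplying.
void swap_rb(uint8_t* dst, const uint8_t* src, size_t pixels) {
    for (size_t i = 0; i < pixels; ++i) {
        const uint8_t r = src[4*i], g = src[4*i+1], b = src[4*i+2], a = src[4*i+3];
        dst[4*i + 0] = b;
        dst[4*i + 1] = g;
        dst[4*i + 2] = r;
        dst[4*i + 3] = a;
    }
}

// "SwapPremul": both at once, RGBA -> premultiplied BGRA.
void swap_premul(uint8_t* dst, const uint8_t* src, size_t pixels) {
    for (size_t i = 0; i < pixels; ++i) {
        const uint8_t r = src[4*i], g = src[4*i+1], b = src[4*i+2], a = src[4*i+3];
        dst[4*i + 0] = mul_div_255_round(b, a);
        dst[4*i + 1] = mul_div_255_round(g, a);
        dst[4*i + 2] = mul_div_255_round(r, a);
        dst[4*i + 3] = a;
    }
}
```

Reading all four channels before writing keeps each function correct when called in place, which a row-by-row decoder may do.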
Swizzler Time on z620 (clang):
SwapPremul 0.24x
Premul 0.24x
Swap 0.37x
Decode Time on z620 (clang):
Premul ZeroInit Decodes 0.88x
Unpremul ZeroInit Decodes 0.94x
Premul Regular Decodes 0.91x
Unpremul Regular Decodes 0.98x
Swizzler Time on Dell Venue 8 (gcc):
SwapPremul 0.14x
Premul 0.14x
Swap 0.08x
Decode Time on Dell Venue 8 (gcc):
Premul ZeroInit Decodes 0.79x
Premul Regular Decodes 0.77x
Note:
ZeroInit means the destination memory is already zero-initialized, so we
skip writing large sections of zero pixels (a memory-use optimization for Android).
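A minimal sketch of the ZeroInit idea, assuming the caller guarantees a zero-filled destination: a source pixel whose four bytes are all zero would produce zero output in either the premul or unpremul path, so its store can be skipped entirely. The function name is hypothetical, and this per-pixel check is only an illustration; the actual Skia code may detect and skip zero runs differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round(x * a / 255) for 8-bit x and a.
static inline uint8_t mul_div_255_round(uint8_t x, uint8_t a) {
    unsigned p = unsigned(x) * a + 128;
    return static_cast<uint8_t>((p + (p >> 8)) >> 8);
}

// Hypothetical: premultiply one RGBA row into dst, which the caller
// guarantees is already zero-filled. All-zero source pixels are skipped,
// so their destination bytes are never written (untouched pages can stay
// unbacked). Returns the number of pixels actually written.
size_t premul_row_zero_init(uint8_t* dst, const uint8_t* src, size_t pixels) {
    size_t written = 0;
    for (size_t i = 0; i < pixels; ++i) {
        uint32_t px;
        std::memcpy(&px, src + 4 * i, sizeof px);  // aliasing-safe load
        if (px == 0) {
            continue;  // dst is already zero here: skip the store
        }
        const uint8_t a = src[4 * i + 3];
        for (int c = 0; c < 3; ++c) {
            dst[4 * i + c] = mul_div_255_round(src[4 * i + c], a);
        }
        dst[4 * i + 3] = a;
        ++written;
    }
    return written;
}
```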
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1601883002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1601883002