Look beyond SSE2 for Paeth
You can break this CL down into three steps. Steps 2 and 3 depend on 1.
Step 1: switch to a 16-bit implementation. Speed is roughly unaffected.
Step 2: use the SSSE3 16-bit abs. ~20% speedup to Paeth.
Step 3: use the SSE4.1 blendv, for a total ~25% speedup to Paeth.
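For illustration, the three steps above can be sketched in scalar C++ as a model of a single 16-bit lane. This is not the code in this CL: the helper names (abs16, min16, select16, paeth_lane16, paeth_reference) are made up for the sketch, and the comments only point at the intrinsic each helper would plausibly stand in for (_mm_abs_epi16 from SSSE3, _mm_min_epi16 and _mm_cmpeq_epi16 from SSE2, _mm_blendv_epi8 from SSE4.1).

```cpp
#include <cstdint>

// Step 1: work in 16-bit lanes. Bytes are widened so that differences
// (range -255..255) and their sum (range -510..510) cannot overflow.

// Step 2: per-lane absolute value; models what _mm_abs_epi16 (SSSE3) does.
static inline int16_t abs16(int16_t x) { return (int16_t)(x < 0 ? -x : x); }

// Per-lane signed minimum; models _mm_min_epi16 (SSE2).
static inline int16_t min16(int16_t x, int16_t y) { return x < y ? x : y; }

// Step 3: per-lane select; models _mm_blendv_epi8 (SSE4.1) fed with an
// all-ones or all-zeros lane mask such as _mm_cmpeq_epi16 produces.
static inline int16_t select16(bool mask, int16_t if_set, int16_t if_clear) {
    return mask ? if_set : if_clear;
}

// Branch-free Paeth predictor for one lane.
// a = left, b = above, c = upper-left, each a byte widened to 16 bits.
static inline uint8_t paeth_lane16(int16_t a, int16_t b, int16_t c) {
    // With p = a + b - c: p - a = b - c, p - b = a - c, p - c = (b - c) + (a - c).
    int16_t pa = abs16((int16_t)(b - c));
    int16_t pb = abs16((int16_t)(a - c));
    int16_t pc = abs16((int16_t)((b - c) + (a - c)));

    int16_t smallest = min16(pa, min16(pb, pc));

    // Ties resolve in the order a, then b, then c, as in the PNG spec.
    return (uint8_t)select16(smallest == pa, a,
                    select16(smallest == pb, b, c));
}

// Straightforward branching version from the PNG specification, for comparison.
static inline uint8_t paeth_reference(int a, int b, int c) {
    int p = a + b - c;
    int pa = p > a ? p - a : a - p;
    int pb = p > b ? p - b : b - p;
    int pc = p > c ? p - c : c - p;
    if (pa <= pb && pa <= pc) return (uint8_t)a;
    if (pb <= pc) return (uint8_t)b;
    return (uint8_t)c;
}
```

The min-then-compare form replaces the two branches with selects, which is the shape that maps onto a blend instruction; the exhaustive comparison against paeth_reference over all byte inputs is cheap enough to run as a unit test.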
Overall this can improve PNG decoding by around 8% end-to-end.
I would feel most comfortable landing this only after we have a bot exercising the SSE4.1 code, either by moving this code behind a function pointer (simulating Chrome/Clank) or by adding a builder with at least SSE4.1 enabled at compile time (simulating an Android system build). We already have plenty of bots building with SSSE3 at compile time to test that path.
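As a rough sketch of the function-pointer option, assuming hypothetical names throughout (PaethRowFn, paeth_row_portable, paeth_row_sse41_standin, choose_paeth_row, and the cpu_has_sse41 flag are all invented here and are not Skia APIs), the selection could look like the following. The SSE4.1 entry is only a stand-in that forwards to the portable loop, so the example stays buildable without SSE4.1 compiler flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Scalar Paeth predictor: a = left, b = above, c = upper-left.
static uint8_t paeth_predict(int a, int b, int c) {
    int pa = std::abs(b - c);
    int pb = std::abs(a - c);
    int pc = std::abs(a + b - 2 * c);
    if (pa <= pb && pa <= pc) return (uint8_t)a;
    return (uint8_t)(pb <= pc ? b : c);
}

// Undo the Paeth filter in place. row holds filtered bytes on entry,
// prev is the previous reconstructed row, bpp is bytes per pixel.
static void paeth_row_portable(uint8_t* row, const uint8_t* prev,
                               size_t n, size_t bpp) {
    for (size_t i = 0; i < n; i++) {
        int a = i >= bpp ? row[i - bpp] : 0;
        int b = prev[i];
        int c = i >= bpp ? prev[i - bpp] : 0;
        row[i] = (uint8_t)(row[i] + paeth_predict(a, b, c));
    }
}

// Hypothetical stand-in: a real build would put the SSE4.1 code here.
static void paeth_row_sse41_standin(uint8_t* row, const uint8_t* prev,
                                    size_t n, size_t bpp) {
    paeth_row_portable(row, prev, n, bpp);
}

using PaethRowFn = void (*)(uint8_t*, const uint8_t*, size_t, size_t);

// Pick an implementation once, from a capability flag supplied by the caller.
static PaethRowFn choose_paeth_row(bool cpu_has_sse41) {
    return cpu_has_sse41 ? paeth_row_sse41_standin : paeth_row_portable;
}
```

With this shape, a test can call both choose_paeth_row(true) and choose_paeth_row(false) on the same input and compare the results, which exercises the SSE4.1 path on any bot whose CPU supports it regardless of compile-time flags.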
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dimage&master=false&issue=1657503002
Review URL: https://codereview.chromium.org/1657503002