3-15% speedup to HardLight / Overlay xfermodes.
While investigating my bug (skia:4052) I saw this TODO and figured fixing
it would make me feel better about an otherwise unsuccessful investigation.
This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly
by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls
to 2 cheap comparisons and 1 expensive div255() call.
NEON speeds up by a more modest ~3%.
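
For readers without the xfermode source at hand, here is a rough scalar sketch of the idea of trading a div255() for a select. It is not the SIMD code from this change: div255(), pick(), Terms and the hardlight_* functions are hypothetical names, the rounding in div255() is an assumption, and the formula is the usual premultiplied hard-light blend for one color channel. pick() stands in for a branch-free SIMD select, where both arms are always computed, so selecting before the division leaves only one division to pay for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rounded division by 255 (not Skia's exact div255()).
static uint32_t div255(uint32_t x) { return (x + 127) / 255; }

// Branch-free select standing in for a SIMD thenElse(): mask is all ones or all zeros.
static uint32_t pick(uint32_t mask, uint32_t t, uint32_t e) {
    return (mask & t) | (~mask & e);
}

struct Terms { uint32_t mask, lite, dark, both; };

// s, d: premultiplied color channels; sa, da: their alphas (all in 0..255).
static Terms hardlight_terms(uint32_t s, uint32_t sa, uint32_t d, uint32_t da) {
    assert(s <= sa && d <= da && sa <= 255 && da <= 255);
    Terms t;
    t.mask = (2 * s > sa) ? ~0u : 0u;              // the cheap comparison
    // lite may wrap around when the mask is clear; it is discarded in that case.
    t.lite = sa * da - 2 * (da - d) * (sa - s);
    t.dark = 2 * s * d;
    t.both = s * (255 - da) + d * (255 - sa);      // term shared by both arms
    return t;
}

// Divide each arm, then select: two div255() calls.
static uint32_t hardlight_two_div(uint32_t s, uint32_t sa, uint32_t d, uint32_t da) {
    Terms t = hardlight_terms(s, sa, d, da);
    return pick(t.mask, div255(t.lite + t.both), div255(t.dark + t.both));
}

// Select the arm first, then divide once: one div255() call.
static uint32_t hardlight_one_div(uint32_t s, uint32_t sa, uint32_t d, uint32_t da) {
    Terms t = hardlight_terms(s, sa, d, da);
    return div255(t.both + pick(t.mask, t.lite, t.dark));
}
```

Both functions return the same value for every valid premultiplied input; the second simply moves the select ahead of the division.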
BUG=skia:
Review URL: https://codereview.chromium.org/1230663005
diff --git a/tests/SkNxTest.cpp b/tests/SkNxTest.cpp
index 5893214..4005d25 100644
--- a/tests/SkNxTest.cpp
+++ b/tests/SkNxTest.cpp
@@ -192,3 +192,19 @@
}
}
}
+
+DEF_TEST(Sk4px_widening, r) {
+    SkPMColor colors[] = {
+        SkPreMultiplyColor(0xff00ff00),
+        SkPreMultiplyColor(0x40008000),
+        SkPreMultiplyColor(0x7f020406),
+        SkPreMultiplyColor(0x00000000),
+    };
+    auto packed = Sk4px::Load4(colors);
+
+    auto wideLo = packed.widenLo(),
+         wideHi = packed.widenHi(),
+         wideLoHi = packed.widenLoHi(),
+         wideLoHiAlt = wideLo + wideHi;
+    REPORTER_ASSERT(r, 0 == memcmp(&wideLoHi, &wideLoHiAlt, sizeof(wideLoHi)));
+}
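
The Sk4px_widening test above checks that widenLoHi() equals widenLo() + widenHi(). A scalar sketch of why that should hold for a single 8-bit lane follows. It assumes that widenLo() zero-extends the byte into a 16-bit lane, widenHi() places it in the high byte, and widenLoHi() places it in both bytes. The widen_* functions are hypothetical stand-ins, not Sk4px methods.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-ins for widening one 8-bit lane to 16 bits (assumed semantics).
static uint16_t widen_lo(uint8_t x)    { return x; }                               // 0x00XX
static uint16_t widen_hi(uint8_t x)    { return static_cast<uint16_t>(x << 8); }   // 0xXX00
static uint16_t widen_lo_hi(uint8_t x) { return static_cast<uint16_t>(x * 257); }  // 0xXXXX

// The two halves occupy disjoint bits, so adding them never carries.
static bool widening_agrees(uint8_t x) {
    return static_cast<uint16_t>(widen_lo(x) + widen_hi(x)) == widen_lo_hi(x);
}
```

Under those assumptions widening_agrees() is true for every byte value, which is the per-lane property the memcmp in the test relies on.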