arithmetic mode with Sk4f

After reading the SSE version, I figured I'd show off the new hotness a little.  This gets us SSE, NEON and portable implementations all in one easy-to-read package.

Since we've been talking about it, it's worth noting several ways this implementation is still not constant time:
  - short circuits on 0x00 and 0xff coverage;
  - floating point multiplication with untrusted k1-k4: if someone figures out a clever way to sometimes create denormal floats and sometimes not, there's a gigantic performance difference between the two cases.
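
To make the denormal concern concrete, here is a minimal sketch (not code from this change) that checks whether a product of two floats lands in the subnormal range. The helper name `product_is_subnormal` is made up for illustration, and it assumes default IEEE behaviour with no flush-to-zero or fast-math flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, for illustration only: true if k * c is a
// denormal (subnormal) float.  Assumes flush-to-zero is not enabled.
inline bool product_is_subnormal(float k, float c) {
    // volatile keeps the product rounded to a real float before we classify it.
    volatile float p = k * c;
    float v = p;
    return std::fpclassify(v) == FP_SUBNORMAL;
}
```

Two tiny but perfectly normal inputs such as 1e-20f and 1e-20f multiply to about 1e-40, below the smallest normal float, while everyday values like 0.5f * 0.5f stay normal.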

I would hazard the pin is constant time now though.
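
For reference, a branch-free pin can be sketched roughly like this. It is not the code in this change: the name `pin` and its signature are assumptions, and whether the compiler actually emits branchless min/max instructions depends on the target and flags.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, for illustration only: clamp x into [lo, hi]
// using min/max instead of an if/else chain.
inline float pin(float x, float lo, float hi) {
    assert(lo <= hi);  // precondition of this sketch
    return std::min(std::max(x, lo), hi);
}
```

The point of writing it this way is that in-range and out-of-range inputs follow the same sequence of operations.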

I've also fixed the lerp to lerp between dst and r instead of src and r.  That can't have been right.
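
A scalar sketch of the idea, assuming the usual arithmetic formula k1*src*dst + k2*src + k3*dst + k4 on float channels in [0,1]. The type `Px` and the function names are invented for illustration; this is not the Sk4f code in the change.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical pixel type for this sketch: four float channels in [0,1].
using Px = std::array<float, 4>;

// r = k1*src*dst + k2*src + k3*dst + k4, pinned to [0,1] per channel.
inline Px arithmetic(const Px& src, const Px& dst,
                     float k1, float k2, float k3, float k4) {
    Px r{};
    for (int i = 0; i < 4; i++) {
        float v = k1 * src[i] * dst[i] + k2 * src[i] + k3 * dst[i] + k4;
        r[i] = std::min(std::max(v, 0.0f), 1.0f);
    }
    return r;
}

// Blend by coverage: start from dst and move toward r, so coverage 0
// leaves dst untouched and coverage 1 yields r.
inline Px lerp_coverage(const Px& dst, const Px& r, float coverage) {
    assert(0.0f <= coverage && coverage <= 1.0f);
    Px out{};
    for (int i = 0; i < 4; i++) {
        out[i] = dst[i] + (r[i] - dst[i]) * coverage;
    }
    return out;
}
```

Lerping from src instead would leave a src-coloured pixel at zero coverage, which is why the endpoint has to be dst.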

curr/maxrss	loops	min	median	mean	max	stddev	samples   	config	bench
   9/9   MB	1	25.5ms	25.5ms	25.5ms	25.5ms	0%	▃▁▁▃▂▇▅▆▇█	8888	Xfermode_arithmetic_enforce_pm_aa
   9/9   MB	1	24.1ms	24.2ms	24.2ms	24.3ms	0%	▄▃▁▄█▆▆█▃█	8888	Xfermode_arithmetic_aa
   9/9   MB	1	102ms	102ms	102ms	103ms	0%	▁▅▂▆▂█▂█▁▂	8888	Xfermode_arithmetic_enforce_pm
   9/9   MB	1	94.8ms	95.4ms	95.2ms	95.8ms	0%	▅▅▁▁▁▁▄▇█▇	8888	Xfermode_arithmetic

~~~~>

curr/maxrss	loops	min	median	mean	max	stddev	samples   	config	bench
   9/9   MB	1	9.71ms	9.74ms	9.73ms	9.78ms	0%	█▅▄▄▁▂▂▂▄▄	8888	Xfermode_arithmetic_enforce_pm_aa
   9/9   MB	1	9.5ms	9.57ms	9.58ms	9.7ms	1%	▂▁█▅▂▂▆▃▄▄	8888	Xfermode_arithmetic_aa
   9/9   MB	1	21.8ms	21.8ms	21.8ms	21.9ms	0%	█▂▂▂▂▂▂▁▄▂	8888	Xfermode_arithmetic_enforce_pm
   9/9   MB	1	16.5ms	16.6ms	16.6ms	16.6ms	0%	▃█▁▁▄▄▁▁▆▅	8888	Xfermode_arithmetic

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1873963003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1873963003