SoftLight with SkPMFloat

SSE speeds up about 4.5x over existing integer SSE,
NEON speeds up about 3x over serial integer code.

We expect 1-2 bit component diffs in the usual GMs.

Still guarded by SK_SUPPORT_LEGACY_XFERMODES,
which I'll now try to lift in Chrome.

BUG=skia:

Review URL: https://codereview.chromium.org/1221493002
2 files changed