Implement four more xfermodes with Sk4px.

HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.

This covers all previously-implemented NEON xfermodes.
Three previously-implemented SSE xfermodes remain.  Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help both readability and NEON speed if I move SkPMFloat
into [0,1] space first.

The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated.  This allows
us to emulate conditionals with vectors.
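
A minimal sketch of the idea, using a scalar-emulated 4-lane type.  Vec4i
and lane_max are hypothetical names made up for illustration; this is not
the Sk4px or SkNi code from this change.  The comparison produces a
per-lane mask (all ones or all zeros), and thenElse blends the two
already-computed operands through that mask:

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-lane integer vector emulated with scalars (not Skia's SkNi).
struct Vec4i {
    std::array<uint32_t, 4> lane;

    // Per-lane compare: all bits set where true, zero where false.
    Vec4i operator<(const Vec4i& o) const {
        Vec4i m;
        for (int i = 0; i < 4; i++) {
            m.lane[i] = lane[i] < o.lane[i] ? 0xFFFFFFFFu : 0u;
        }
        return m;
    }

    // Treat *this as a mask: take t where the mask is set, e elsewhere.
    // The caller has already evaluated both t and e in full.
    Vec4i thenElse(const Vec4i& t, const Vec4i& e) const {
        Vec4i r;
        for (int i = 0; i < 4; i++) {
            r.lane[i] = (lane[i] & t.lane[i]) | (~lane[i] & e.lane[i]);
        }
        return r;
    }
};

// Branch-free per-lane maximum, written the way (a < b ? b : a) would read.
Vec4i lane_max(const Vec4i& a, const Vec4i& b) {
    return (a < b).thenElse(b, a);
}
```

Because both arms are always computed, this only pays off when t and e
are cheap and free of side effects, which is the usual case for pixel math.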

This also removes the concept of SkNb.  Instead of producing a standalone
bool vector, comparisons on an SkNi or SkNf now return that same type.
This turns out to be a lot more manageable.
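
To illustrate what "returning their own type" can look like for floats,
here is a rough sketch under the assumption that a true lane is encoded as
an all-ones bit pattern, as SSE float compares do.  Vec4f and lane_bits are
hypothetical stand-ins, not SkNf itself:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical 4-lane float vector emulated with scalars (not Skia's SkNf).
struct Vec4f {
    std::array<float, 4> lane;

    // The result is another Vec4f, not a separate bool-vector type.
    // Each lane carries a raw bit mask: all ones for true, all zeros for false.
    Vec4f operator<(const Vec4f& o) const {
        Vec4f m;
        for (int i = 0; i < 4; i++) {
            uint32_t bits = lane[i] < o.lane[i] ? 0xFFFFFFFFu : 0u;
            std::memcpy(&m.lane[i], &bits, sizeof bits);
        }
        return m;
    }
};

// Read back the raw bits of one lane, for inspection only.
uint32_t lane_bits(const Vec4f& v, int i) {
    uint32_t bits;
    std::memcpy(&bits, &v.lane[i], sizeof bits);
    return bits;
}
```

The mask lanes are not meaningful as numbers (all ones is a NaN pattern);
they are only meant to be fed into a bitwise select such as thenElse.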

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1196713004
7 files changed