SkSplicer: lowp hacking

Add lowp variants for most stages in SkSplicer.  These double the number
of pixels handled by representing each channel with 16 bits, ranging from
0x0000 as 0 to 0x8000 as 1.  This format lets us use the Q15 multiply
instructions available in NEON and SSSE3 at full register width, with
a little platform-specific fix up to smooth over the fact that these
aren't quite Q15 values.

When a lowp stage is unavailable, the entire pipeline upgrades to
floats.  So by simply not implementing sRGB, f16, matrix multiplication,
etc, we naturally express that they're best handled with floats.

These lowp stages ended up different enough that I've found it clearer
to have them live in their own files, noting where they differ from the
float stages.  HSW, aarch64, and armv7 are all supported.

I've seen very good things performance-wise on all platforms.

Change-Id: Ib4f820c6665f2c9020f7449a2b51bbaf6c408a63
Reviewed-on: https://skia-review.googlesource.com/7098
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
7 files changed