lean more on the compiler in lowp stages

This refactors {from,to}_{byte,8888} to lean a bit more on the compiler,
and to share code between the two.  The new algorithm is not exactly the
same as the old one, but it's comparable, and the results are identical.

This new algorithm is a lot easier to generalize to AVX2, and parallels
the full-precision {from,to}_{byte,8888} functions in _stages.cpp.
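
For readers unfamiliar with the shape of these helpers, here is a rough
scalar sketch of the idea, not the code from this change.  It assumes a
hypothetical 1.15 fixed-point channel type (Fixed, where 32768 means 1.0)
and a hypothetical RGBA struct; only the four function names come from
the description above.  The 8888 helpers are written purely in terms of
the byte helpers, and everything is plain arithmetic that the compiler is
free to vectorize.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a 1.15 fixed-point channel, 0 -> 0.0 and 32768 -> 1.0.
using Fixed = uint16_t;

// Byte -> fixed point, rounded to nearest: (b / 255) * 32768.
constexpr Fixed from_byte(uint8_t b) {
    return static_cast<Fixed>((b * 32768u + 127u) / 255u);
}

// Fixed point -> byte, rounded to nearest: (f / 32768) * 255.
// Assumes f <= 32768, i.e. the channel is already clamped to [0, 1].
constexpr uint8_t to_byte(Fixed f) {
    return static_cast<uint8_t>((f * 255u + 16384u) >> 15);
}

// Hypothetical unpacked pixel: one fixed-point value per channel.
struct RGBA { Fixed r, g, b, a; };

// Extract the byte that starts at bit `shift` of a packed 8888 pixel.
constexpr uint8_t byte_at(uint32_t px, int shift) {
    return static_cast<uint8_t>((px >> shift) & 0xffu);
}

// The 8888 helpers reuse the byte helpers for every channel.
constexpr RGBA from_8888(uint32_t px) {
    return { from_byte(byte_at(px,  0)), from_byte(byte_at(px,  8)),
             from_byte(byte_at(px, 16)), from_byte(byte_at(px, 24)) };
}

constexpr uint32_t to_8888(RGBA c) {
    return  static_cast<uint32_t>(to_byte(c.r))
         | (static_cast<uint32_t>(to_byte(c.g)) <<  8)
         | (static_cast<uint32_t>(to_byte(c.b)) << 16)
         | (static_cast<uint32_t>(to_byte(c.a)) << 24);
}

// Self-check: every byte value survives a trip through fixed point.
inline bool bytes_round_trip() {
    for (unsigned b = 0; b < 256; b++) {
        if (to_byte(from_byte(static_cast<uint8_t>(b))) != b) {
            return false;
        }
    }
    return true;
}
```

In this sketch the rounding constants are chosen so that a byte converted
to fixed point and back is unchanged, which is the property that lets two
different algorithms produce identical results.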

Change-Id: I31ea90d65967bf4ede2497d1e2197cb0e7648bf8
Reviewed-on: https://skia-review.googlesource.com/20828
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
3 files changed