add experimental bilerp_clamp_8888 stage

It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.

I've started here with bilerp_clamp_8888, and will
follow up with bgra and 565 variants, lowp versions of those,
and probably the same treatment for nearest-neighbor sampling.
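
For readers unfamiliar with the name, here is a rough scalar sketch of
what a fused bilerp + clamp + 8888-load stage computes for one sample.
It is an illustration only, not the stage code in this change: the
names bilerp_clamp_8888_sketch, unpack_8888, clamp_coord and RGBA are
hypothetical, and the byte order (R in the low byte) is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical float color type used only by this sketch.
struct RGBA { float r, g, b, a; };

// Unpack one 8888 pixel to floats in [0,1].
// Assumption: R is in the low byte, A in the high byte.
static RGBA unpack_8888(uint32_t p) {
    return { ((p >>  0) & 0xff) / 255.0f,
             ((p >>  8) & 0xff) / 255.0f,
             ((p >> 16) & 0xff) / 255.0f,
             ((p >> 24) & 0xff) / 255.0f };
}

// Clamp tiling: coordinates outside the image snap to the nearest edge.
static int clamp_coord(int v, int limit) {
    return std::min(std::max(v, 0), limit - 1);
}

static RGBA lerp(RGBA a, RGBA b, float t) {
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t,
             a.a + (b.a - a.a) * t };
}

// Sample the image at (x, y), with pixel centers at half-integers.
// Loads the four neighboring pixels, clamps each coordinate to the
// image bounds, and blends them by the fractional offsets.
RGBA bilerp_clamp_8888_sketch(const uint32_t* pixels, int width, int height,
                              float x, float y) {
    float fx = x - 0.5f, fy = y - 0.5f;
    int   x0 = (int)std::floor(fx), y0 = (int)std::floor(fy);
    float tx = fx - x0,             ty = fy - y0;

    int cx0 = clamp_coord(x0,     width),  cx1 = clamp_coord(x0 + 1, width);
    int cy0 = clamp_coord(y0,     height), cy1 = clamp_coord(y0 + 1, height);

    RGBA top = lerp(unpack_8888(pixels[cy0 * width + cx0]),
                    unpack_8888(pixels[cy0 * width + cx1]), tx);
    RGBA bot = lerp(unpack_8888(pixels[cy1 * width + cx0]),
                    unpack_8888(pixels[cy1 * width + cx1]), tx);
    return lerp(top, bot, ty);
}
```

Done as separate general-purpose stages, the coordinate math, tiling,
pixel load and blend each hand their results to the next stage; fusing
them into one stage for this common case is where the speedup
described above would come from.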

GM output is pixel-for-pixel identical with this change.

Change-Id: I2f6995767cd38053d670b8d0bfdb71b687803d70
Reviewed-on: https://skia-review.googlesource.com/82100
Reviewed-by: Yuqian Li <liyuqian@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
7 files changed