attempt 2: add experimental bilerp_clamp_8888 stage

It looks like we can specialize hot image shaders into their
own single stages for a good speedup on both x86 and ARM.

I've started here with bilerp_clamp_8888 and will follow up
with bgra and 565 variants, lowp versions of those, and
probably the same for nearest-neighbor sampling.
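
For readers unfamiliar with the name, here is a rough scalar sketch of
what a fused bilerp_clamp_8888 stage computes for one sample point:
clamp the coordinates, load four 8888 pixels, and blend them. It is
not the code in this change. The helper names, the RGBA byte order,
and the half-pixel center convention are all assumptions made for
illustration, and a real stage would process several pixels per call
with SIMD.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative only: every name here is hypothetical, not Skia's stage code.
struct Color { float r, g, b, a; };

// Unpack one 8888 pixel (assumed RGBA, R in the low byte) to floats in [0,1].
static Color load_8888(const uint32_t* px, int stride, int x, int y) {
    uint32_t p = px[y * stride + x];
    return { float((p >>  0) & 0xff) / 255.0f,
             float((p >>  8) & 0xff) / 255.0f,
             float((p >> 16) & 0xff) / 255.0f,
             float((p >> 24) & 0xff) / 255.0f };
}

// Per-channel linear interpolation from a (t = 0) to b (t = 1).
static Color lerp_color(const Color& a, const Color& b, float t) {
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t, a.a + (b.a - a.a) * t };
}

// Clamp a float coordinate to a valid pixel index in [0, limit-1].
static int clamp_index(float v, int limit) {
    return (int)std::max(0.0f, std::min(v, float(limit - 1)));
}

// Sample at (x, y), assuming pixel centers sit at half-integer coordinates.
static Color bilerp_clamp_8888_sketch(const uint32_t* px, int w, int h,
                                      int stride, float x, float y) {
    float fx = x - 0.5f, fy = y - 0.5f;
    float x0 = std::floor(fx), y0 = std::floor(fy);
    float tx = fx - x0, ty = fy - y0;  // fractional blend weights

    // Clamp each neighbor separately so off-image samples repeat the edge.
    int ix0 = clamp_index(x0, w), ix1 = clamp_index(x0 + 1.0f, w);
    int iy0 = clamp_index(y0, h), iy1 = clamp_index(y0 + 1.0f, h);

    Color top = lerp_color(load_8888(px, stride, ix0, iy0),
                           load_8888(px, stride, ix1, iy0), tx);
    Color bot = lerp_color(load_8888(px, stride, ix0, iy1),
                           load_8888(px, stride, ix1, iy1), tx);
    return lerp_color(top, bot, ty);
}
```

Doing the clamp, load, and blend in one function is the point of the
specialization: the generic path would run these as separate stages.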

GM output is pixel-for-pixel identical.

Change-Id: Ib5ed6e528efd9e3eed96ba67d02fbec2e8133a81
Reviewed-on: https://skia-review.googlesource.com/86860
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
7 files changed