SkJumper: handle the <kStride tail in AVX+ mode.
We have plenty general purpose registers to spare on x86-64,
so the cheapest thing to do is use one to hold the usual 'tail'.
Speedups on HSW:
SkRasterPipeline_srgb: 292 -> 170
SkRasterPipeline_f16: 122 -> 90
There's plenty more room to improve here, e.g. using mask loads and
stores, but this seems to be enough to get things working reasonably.
BUG=skia:6289
Change-Id: I8c0ed325391822e9f36636500350205e93942111
Reviewed-on: https://skia-review.googlesource.com/9110
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
4 files changed