SkRasterPipeline: memcpy-free tail code.

The tail code runs far less often than the body code, but whenever it does run and calls memcpy(), we first have to vzeroupper back into the non-AVX world, and that seems to slow things down considerably.  Making the tail code memcpy-free gives a nice speedup, more than you'd expect given how rarely the tail runs (tested on Windows).
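
For illustration only, here is a minimal portable sketch of what a memcpy-free tail load and store could look like.  It is not the code in this change: the names kLanes, load_tail, and store_tail are hypothetical, the 8-lane width is an assumption matching one 256-bit AVX register, and real AVX code would more likely use intrinsics (for example a masked load such as _mm256_maskload_ps) than scalar copies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical lane count: 8 floats fill one 256-bit AVX register.
constexpr size_t kLanes = 8;

// Load `tail` (1..7) floats from src into an 8-lane buffer and zero the
// remaining lanes, using explicit per-lane copies instead of memcpy().
inline void load_tail(size_t tail, const float* src, float dst[kLanes]) {
    assert(tail > 0 && tail < kLanes);
    for (size_t i = 0; i < kLanes; i++) { dst[i] = 0.0f; }
    switch (tail) {
        case 7: dst[6] = src[6];  // fall through
        case 6: dst[5] = src[5];  // fall through
        case 5: dst[4] = src[4];  // fall through
        case 4: dst[3] = src[3];  // fall through
        case 3: dst[2] = src[2];  // fall through
        case 2: dst[1] = src[1];  // fall through
        case 1: dst[0] = src[0];
    }
}

// Store only the first `tail` (1..7) lanes back to dst, again without
// memcpy(); memory past dst[tail-1] is left untouched.
inline void store_tail(size_t tail, const float src[kLanes], float* dst) {
    assert(tail > 0 && tail < kLanes);
    switch (tail) {
        case 7: dst[6] = src[6];  // fall through
        case 6: dst[5] = src[5];  // fall through
        case 5: dst[4] = src[4];  // fall through
        case 4: dst[3] = src[3];  // fall through
        case 3: dst[2] = src[2];  // fall through
        case 2: dst[1] = src[1];  // fall through
        case 1: dst[0] = src[0];
    }
}
```

The point of the sketch is that the copy stays inline, so there is no library call in the tail path that would require a vzeroupper first.  An optimizing compiler may still lower simple loops to library calls, so the generated code is worth checking.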

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3783

Change-Id: I40cbe1e529f2431825edec7638265601b64e7ec5
Reviewed-on: https://skia-review.googlesource.com/3783
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
1 file changed