Move looping logic into start_pipeline().
This should be a big win on Windows, but I haven't timed there yet.
On my Mac, it's a solid 2% speedup.
PS1 was insufficiently ambitious, but was this for posterity:
No need to vzeroupper twice on Windows.
On Windows start_pipeline() will vzeroupper,
so no need to do it in just_return().
Change-Id: I099320b95da85900a60ce96fdb7a216a36db1858
Reviewed-on: https://skia-review.googlesource.com/8821
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
4 files changed