Generate SSE2/SSE4.1 splices and use them.

While we're at it, tidy up build_stages.py a bit.
Redirecting stdout seems a lot easier than writing print >>f all over the place.
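
A minimal sketch of the stdout-redirect idea, not the actual build_stages.py
code. It assumes Python 3 and contextlib.redirect_stdout, whereas the script
uses Python 2 style print >>f. The names emit_header and generate, the splice
names, and the generated text are all hypothetical.

```python
import contextlib
import io


def emit_header(names):
    # Hypothetical generator: plain print() calls, with no file handle
    # threaded through every call site.
    print("// Generated file (illustrative only).")
    for name in names:
        print("static const unsigned char %s[] = {};" % name)


def generate(names, out):
    # Redirect stdout once; every print() inside the block lands in `out`.
    with contextlib.redirect_stdout(out):
        emit_header(names)


# Capture into an in-memory buffer; a real script would pass an open file.
buf = io.StringIO()
generate(["sse2_splice", "sse41_splice"], buf)
generated = buf.getvalue()
```

The redirect happens in one place, so the emitting functions stay unaware of
where their output goes.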

TODO: add non-VEX-encoded before_loop() and after_loop().

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Win2k8-MSVC-GCE-CPU-AVX2-x86_64-Debug

Change-Id: I3f38e55f081670dd598c6050435466d9f394e5be
Reviewed-on: https://skia-review.googlesource.com/8230
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
3 files changed