add _skx stages

This just makes sure all the plumbing is in place to use the Skylake
Xeon subset of AVX-512 instructions.  So far,

  - no Windows
  - no lowp
  - nothing explicitly making use of AVX-512 registers or instructions

This initial pass should run essentially identically to the _hsw AVX2
code we've been using.  Clang _does_ use AVX-512-only instructions to
implement some of the higher-level concepts we've coded, but the
difference is pretty subtle.

Next steps will bump N from 8 to 16 and start threading through an
AVX-512-friendly mask instead of tail.  I'll also want to take a harder
look at how we do blending like if_then_else()... the default codegen
doesn't really take advantage of AVX-512 the way I'd like.
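
For illustration only, here is a rough scalar model of what a lane mask
in place of tail might look like.  Nothing below is in this CL:
mask_from_tail() and this if_then_else() signature are hypothetical, and
treating tail == 0 as "all lanes active" is an assumption, not something
this change defines.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the actual stage code.
constexpr int N = 16;  // 16 32-bit lanes fit in one 512-bit register

// Turn a tail count into a per-lane bitmask, one bit per lane.
// Assumption: tail == 0 means all N lanes are active; otherwise only
// the low `tail` lanes are.
constexpr uint16_t mask_from_tail(size_t tail) {
    return tail == 0 ? uint16_t(0xFFFF)
                     : uint16_t((1u << tail) - 1);
}

// Scalar model of a masked blend: lane i takes t[i] when bit i of the
// mask is set, and e[i] otherwise.
inline std::array<float, N> if_then_else(uint16_t mask,
                                         const std::array<float, N>& t,
                                         const std::array<float, N>& e) {
    std::array<float, N> r{};
    for (int i = 0; i < N; i++) {
        r[i] = ((mask >> i) & 1) ? t[i] : e[i];
    }
    return r;
}
```

A 16-bit mask like this has the same shape as an AVX-512 mask register
for 16 float lanes, which is why a per-lane bitmask would probably be a
more natural thing to pass around than a count.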

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-Clang-GCE-CPU-AVX512-x86_64-Debug

Change-Id: I6c9442488a449ea4770617bb22b2669859cc92e2
Reviewed-on: https://skia-review.googlesource.com/54062
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
6 files changed