We can do masked loads and stores with just AVX.

Previously we used AVX2 instructions both to generate the masks
and for the masked loads and stores themselves.

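AVX already came with float masked loads and stores (vmaskmovps),
which work perfectly fine on 32-bit integer data: the mask only picks
lanes, and the bits are moved through unchanged.  I don't really get
what the point of the 32-bit int masked loads and stores in AVX2
(vpmaskmovd) is, beyond maybe syntax sugar?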
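A minimal sketch of the idea, assuming a GCC/Clang x86-64 toolchain
(the target("avx") attribute avoids needing -mavx). copy_tail_avx and
kMaskTable are hypothetical names for illustration, not Skia's code.
The mask is built with a plain unaligned AVX load out of a sliding
table instead of an AVX2 compare, and the 32-bit pixels go through the
float-typed _mm256_maskload_ps / _mm256_maskstore_ps intrinsics.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical sliding-window table: reading 8 ints starting at
 * index 8 - tail gives `tail` all-ones lanes followed by zero lanes. */
static const int32_t kMaskTable[16] = {
    -1, -1, -1, -1, -1, -1, -1, -1,
     0,  0,  0,  0,  0,  0,  0,  0,
};

/* Copy the first `tail` (0..8) 32-bit pixels from src to dst using
 * only AVX instructions; lanes outside the mask are left untouched. */
__attribute__((target("avx")))
void copy_tail_avx(uint32_t *dst, const uint32_t *src, int tail) {
    /* vmovdqu is plain AVX, so no AVX2 is needed to build the mask. */
    __m256i mask = _mm256_loadu_si256((const __m256i *)(kMaskTable + 8 - tail));

    /* vmaskmovps load: only lanes whose mask sign bit is set are read,
     * masked-off lanes come back as zero. */
    __m256 px = _mm256_maskload_ps((const float *)src, mask);

    /* vmaskmovps store: no float arithmetic happens, so the integer
     * bit patterns are written back exactly as they were loaded. */
    _mm256_maskstore_ps((float *)dst, mask, px);
}
```

With AVX2 the same copy could be written with _mm256_maskload_epi32 /
_mm256_maskstore_epi32 (vpmaskmovd); for moving 32-bit lanes around,
the result is identical.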

Change-Id: I81fa55fb09daea4f5546f8c9ebbc886015edce51
Reviewed-on: https://skia-review.googlesource.com/17452
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Ravi Mistry <rmistry@google.com>
3 files changed