Add AVX to the SkJumper mix.

AVX is a nice little halfway point between SSE4.1 and HSW (Haswell), in terms
of instructions available, performance, and availability.

Intel chips have had AVX since ~2011, compared to ~2013 for HSW and
~2007 for SSE4.1.  Like HSW it's got 8-wide 256-bit float vectors,
but integer operations are essentially still only 128-bit.
It also doesn't have F16 conversion or FMA instructions.
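
To make those two gaps concrete, here is a rough sketch of what they mean for stage code. The helper names mad8 and add8_i32 are made up for illustration and are not SkJumper's actual code; the intrinsics are the standard ones from <immintrin.h>. Without FMA, a multiply-add becomes a separate multiply and add, and without 256-bit integer ops, an 8-lane integer add is done as two 128-bit halves. A scalar fallback is included so the sketch builds without -mavx.

```cpp
#include <cstdint>
#if defined(__AVX__)
    #include <immintrin.h>
#endif

// Hypothetical helpers for illustration only, not SkJumper's stage code.

// out[i] = f[i]*m[i] + a[i] for 8 floats.
void mad8(const float f[8], const float m[8], const float a[8], float out[8]) {
#if defined(__AVX__)
    __m256 F = _mm256_loadu_ps(f),
           M = _mm256_loadu_ps(m),
           A = _mm256_loadu_ps(a);
  #if defined(__FMA__)
    __m256 r = _mm256_fmadd_ps(F, M, A);               // HSW-class: one fused op.
  #else
    __m256 r = _mm256_add_ps(_mm256_mul_ps(F, M), A);  // AVX only: mul, then add.
  #endif
    _mm256_storeu_ps(out, r);
#else
    for (int i = 0; i < 8; i++) { out[i] = f[i]*m[i] + a[i]; }  // Portable fallback.
#endif
}

// out[i] = x[i] + y[i] for 8 32-bit ints.
void add8_i32(const std::int32_t x[8], const std::int32_t y[8], std::int32_t out[8]) {
#if defined(__AVX2__)
    // HSW-class: a single 256-bit integer add.
    __m256i r = _mm256_add_epi32(_mm256_loadu_si256((const __m256i*)x),
                                 _mm256_loadu_si256((const __m256i*)y));
    _mm256_storeu_si256((__m256i*)out, r);
#elif defined(__AVX__)
    // AVX only: split each 256-bit vector into 128-bit halves, add, rejoin.
    __m256i X = _mm256_loadu_si256((const __m256i*)x),
            Y = _mm256_loadu_si256((const __m256i*)y);
    __m128i lo = _mm_add_epi32(_mm256_castsi256_si128(X),
                               _mm256_castsi256_si128(Y));
    __m128i hi = _mm_add_epi32(_mm256_extractf128_si256(X, 1),
                               _mm256_extractf128_si256(Y, 1));
    __m256i r = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
    _mm256_storeu_si256((__m256i*)out, r);
#else
    for (int i = 0; i < 8; i++) { out[i] = x[i] + y[i]; }  // Portable fallback.
#endif
}
```

Which path gets compiled depends on the build flags (-mavx, -mavx2, -mfma); the results should agree for inputs where fused and unfused rounding don't differ.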

It doesn't look like this is going to be a burden to maintain, and only
adds a few KB of code size.  In exchange, we now run 8x wide on 45% to
70% of x86 machines, depending on the OS.

In my brief testing, speed eerily resembles an exact geometric progression:
   SSE4.1:        1x speed (baseline)
      AVX: ~sqrt(2)x speed
      HSW:       ~2x speed

This adds all the basic plumbing for AVX but leaves it disabled.
I'll flip it on once I've implemented the f16 TODOs.
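
For readers unfamiliar with this kind of plumbing, "present but disabled" can be sketched roughly as below. All names here (CpuFeatures, Backend, kEnableAVX, choose_backend) are hypothetical and not taken from this change, and treating HSW as AVX2 + FMA + F16C is an assumption based on the feature list above. The idea is just that the AVX path exists and is selectable, but a single switch keeps AVX-only machines on SSE4.1 until it is flipped.

```cpp
// Hypothetical sketch of backend selection; not SkJumper's actual code.

// CPU feature bits, however they were detected (e.g. via CPUID).
struct CpuFeatures {
    bool sse41, avx, avx2, fma, f16c;
};

enum class Backend { Portable, SSE41, AVX, HSW };

// Assumed switch: stays false until the f16 TODOs are done.
constexpr bool kEnableAVX = false;

// Pick the widest backend the CPU supports and that we have enabled.
Backend choose_backend(const CpuFeatures& cpu, bool enable_avx = kEnableAVX) {
    // Assumption: "HSW" means AVX2 plus FMA and F16 conversion.
    if (cpu.avx && cpu.avx2 && cpu.fma && cpu.f16c) { return Backend::HSW; }
    // AVX is wired up, but only used when explicitly enabled.
    if (enable_avx && cpu.avx)                      { return Backend::AVX; }
    if (cpu.sse41)                                  { return Backend::SSE41; }
    return Backend::Portable;
}
```

With the switch off, a machine with AVX but no AVX2 still gets SSE41; passing enable_avx = true gives it AVX instead.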

Change-Id: I1c378dabb8a06386646371bf78ade9e9432b006f
Reviewed-on: https://skia-review.googlesource.com/8898
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
5 files changed