small ABI + narrow/wide code updates

The only tangible effect this CL should have is to use __vectorcall on
all Windows builds, including scalar ones.  Code generation there is a
little better with __vectorcall than without it, so we might as well use
it.  This is a baby step towards vector stages with MSVC, but a very baby
step indeed.

Mostly this refactors and regroups a bunch of logic to reflect my
current thoughts.  The BUILD.gn changes are essentially no-ops, but they
simplify things and make our flags more similar to how those targets are
built in Chromium.

(And I cleaned up other /arch: uses so this works.)

Change-Id: I73dd39d15cdc7b3d268231a707952bbbfd91496e
Reviewed-on: https://skia-review.googlesource.com/115644
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
diff --git a/BUILD.gn b/BUILD.gn
index 7262034..2be010e 100644
--- a/BUILD.gn
+++ b/BUILD.gn
@@ -303,7 +303,7 @@
 opts("avx") {
   enabled = is_x86
   sources = skia_opts.avx_sources
-  if (!is_clang && is_win) {
+  if (is_win) {
     cflags = [ "/arch:AVX" ]
   } else {
     cflags = [ "-mavx" ]
@@ -313,14 +313,10 @@
 opts("hsw") {
   enabled = is_x86
   sources = skia_opts.hsw_sources
-  if (!is_clang && is_win) {
+  if (is_win) {
     cflags = [ "/arch:AVX2" ]
   } else {
-    cflags = [
-      "-mavx2",
-      "-mf16c",
-      "-mfma",
-    ]
+    cflags = [ "-march=haswell" ]
   }
 
   # Oddly, clang-cl doesn't recognize this as a valid flag.