Revert of use new shuffle to speed up affine matrix mappts (patchset #3 id:40001 of https://codereview.chromium.org/1333983002/ )

Reason for revert:
Unexpected perf impact, plus a large number of new images in Gold (most with no visible difference).

Original issue's description:
> use new shuffle to speed up affine matrix mappts
>
> sse: 25 -> 18
> neon: 95 -> 86
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/e70afc9f48d00828ee6b707899a8ff542b0e8b98

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1335003002
diff --git a/src/opts/SkMatrix_opts.h b/src/opts/SkMatrix_opts.h
index 2d0a142..3fb2701 100644
--- a/src/opts/SkMatrix_opts.h
+++ b/src/opts/SkMatrix_opts.h
@@ -89,11 +89,12 @@
         }
         Sk4s trans4(tx, ty, tx, ty);
         Sk4s scale4(sx, sy, sx, sy);
-        Sk4s  skew4(ky, kx, ky, kx);    // applied src4, then x/y swapped
+        Sk4s  skew4(kx, ky, kx, ky);    // applied to swizzle of src4
         count >>= 1;
         for (int i = 0; i < count; ++i) {
             Sk4s src4 = Sk4s::Load(&src->fX);
-            (trans4 + src4 * scale4 + SkNx_shuffle<1,0,3,2>(src4 * skew4)).store(&dst->fX);
+            Sk4s swz4(src[0].fY, src[0].fX, src[1].fY, src[1].fX);  // need ABCD -> BADC
+            (src4 * scale4 + swz4 * skew4 + trans4).store(&dst->fX);
             src += 2;
             dst += 2;
         }
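
Note on the "invisibly different" images: the two lines in the hunk above compute the same affine map but add the terms in a different order, and float addition is not associative. That may be one contributor to the low-bit pixel differences; the revert message does not confirm it. Below is a minimal scalar sketch of the two orders. It is not Skia code: `Point`, `Affine`, `map_trans_first`, and `map_trans_last` are hypothetical names made up for illustration, and the SIMD lanes are written out as plain floats.

```cpp
// Scalar sketch (not Skia code): the same affine map evaluated in the two
// summation orders that appear in the hunk above.
struct Point { float fX, fY; };

// Hypothetical holder for the six affine coefficients used by the loop.
struct Affine { float sx, sy, kx, ky, tx, ty; };

// Order of the removed line: (trans + src * scale) + shuffled(src * skew).
inline Point map_trans_first(const Affine& m, Point p) {
    return { (m.tx + p.fX * m.sx) + p.fY * m.kx,
             (m.ty + p.fY * m.sy) + p.fX * m.ky };
}

// Order of the restored line: (src * scale + swizzled(src) * skew) + trans.
inline Point map_trans_last(const Affine& m, Point p) {
    return { (p.fX * m.sx + p.fY * m.kx) + m.tx,
             (p.fY * m.sy + p.fX * m.ky) + m.ty };
}
```

When every intermediate value is exactly representable, both functions return identical results. When the translation is large relative to the other terms, or two terms nearly cancel, the intermediate roundings can differ and the outputs may disagree in the last bits.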