Port SkMatrix opts to SkOpts.

No functional changes; the code is just moved around.

This enables vectorized code on ARMv7.
There should be no effect on ARMv8 or x86, which were already vectorized.
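
For readers new to the pattern, here is a rough sketch of the kind of dispatch this port relies on: a function pointer that defaults to a portable implementation, plus an init hook that may swap in a faster one. All names below (portable, fast, opts_sketch, init) are hypothetical and only illustrate the idea; this is not the actual SkOpts code.

```cpp
#include <cstddef>

namespace portable {
    // Baseline implementation, always available.
    inline void translate(float* xs, std::size_t n, float tx) {
        for (std::size_t i = 0; i < n; i++) { xs[i] += tx; }
    }
}

namespace fast {
    // Stand-in for a vectorized specialization (hypothetical). It computes the
    // same result, processing two elements per iteration plus a scalar tail.
    inline void translate(float* xs, std::size_t n, float tx) {
        std::size_t i = 0;
        for (; i + 2 <= n; i += 2) { xs[i] += tx; xs[i + 1] += tx; }
        for (; i < n; i++) { xs[i] += tx; }
    }
}

namespace opts_sketch {
    // The pointer starts out at the portable default...
    void (*translate)(float*, std::size_t, float) = portable::translate;

    // ...and an init hook overrides it when the caller reports SIMD support.
    // (cpu_has_simd is an assumed input; real code would query the CPU.)
    inline void init(bool cpu_has_simd) {
        if (cpu_has_simd) { translate = fast::translate; }
    }
}
```

Callers always go through opts_sketch::translate, so they pick up whichever implementation init() selected without changing any call sites.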

nanobench --match mappoints changes on Nexus 5 (ARMv7):

_affine: 132 -> 95
_scale: 118 -> 47
_trans: 60 -> 37

A teaser:
We should next look at the ABCD->BADC shuffle we've noted we need in _affine.  A quick hack showed that doing it optimally gives another ~35% speedup on x86.  We still have to figure out how to do it best on ARM, though: that same quick hack was a 2x slowdown there.  Good reason to resurrect that SkNx_shuffle() CL!

(I believe the answers are vrev64q_f32(v) and _mm_shuffle_ps(v,v, _MM_SHUFFLE(2,3,0,1)), but we should probably find out in another CL.)
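
To pin down what ABCD->BADC means, here is a portable scalar sketch of the lane permutation. shuffle_badc is a hypothetical helper, not an existing Skia function; the intrinsics named above would be the vectorized equivalents if that guess holds.

```cpp
#include <array>

// Hypothetical helper: swaps adjacent lanes of a 4-float vector,
// turning {A,B,C,D} into {B,A,D,C}.
inline std::array<float, 4> shuffle_badc(const std::array<float, 4>& v) {
    std::array<float, 4> r = {{ v[1], v[0], v[3], v[2] }};
    return r;
}
```

With interleaved points stored as {x0,y0,x1,y1}, this yields {y0,x0,y1,x1}.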

BUG=skia:4117

Review URL: https://codereview.chromium.org/1320673014
diff --git a/src/core/SkOpts.cpp b/src/core/SkOpts.cpp
index 492fae3..a540bc8 100644
--- a/src/core/SkOpts.cpp
+++ b/src/core/SkOpts.cpp
@@ -14,6 +14,7 @@
 #include "SkBlurImageFilter_opts.h"
 #include "SkColorCubeFilter_opts.h"
 #include "SkFloatingPoint_opts.h"
+#include "SkMatrix_opts.h"
 #include "SkMorphologyImageFilter_opts.h"
 #include "SkTextureCompressor_opts.h"
 #include "SkUtils_opts.h"
@@ -58,6 +59,10 @@
 
     decltype(blit_row_color32) blit_row_color32 = sk_default::blit_row_color32;
 
+    decltype(matrix_translate)       matrix_translate       = sk_default::matrix_translate;
+    decltype(matrix_scale_translate) matrix_scale_translate = sk_default::matrix_scale_translate;
+    decltype(matrix_affine)          matrix_affine          = sk_default::matrix_affine;
+
     // Each Init_foo() is defined in src/opts/SkOpts_foo.cpp.
     void Init_ssse3();
     void Init_sse41();