SkNx refresh

   - rearrange a bit
   - fewer macros
   - hooks for all operators
   - add left and right scalar operator overrides
   - add +=, &=, <<=, etc.
   - add SkNx_split() and SkNx_join()
   - simplify the many rsqrt() and invert() options to just what we actually use
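
As a rough illustration of the operator bullets above, here is a standalone sketch using a hypothetical Vec type (Vec, splat, and demo_ops are made-up names; this is not the SkNx code, and it uses plain loops rather than SIMD):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical N-lane integer vector, for illustration only (not SkNx).
template <std::size_t N>
struct Vec {
    std::array<int, N> lanes;
    int operator[](std::size_t i) const { return lanes[i]; }
};

// Broadcast one scalar to every lane.
template <std::size_t N>
Vec<N> splat(int s) {
    Vec<N> r{};
    r.lanes.fill(s);
    return r;
}

// Vector-vector operators, written out lane by lane.
template <std::size_t N>
Vec<N> operator+(const Vec<N>& a, const Vec<N>& b) {
    Vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a[i] + b[i];
    return r;
}
template <std::size_t N>
Vec<N> operator&(const Vec<N>& a, const Vec<N>& b) {
    Vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a[i] & b[i];
    return r;
}
template <std::size_t N>
Vec<N> operator<<(const Vec<N>& a, int bits) {
    Vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a[i] << bits;
    return r;
}

// Left and right scalar overloads forward to the vector-vector form.
template <std::size_t N>
Vec<N> operator+(const Vec<N>& a, int s) { return a + splat<N>(s); }
template <std::size_t N>
Vec<N> operator+(int s, const Vec<N>& a) { return splat<N>(s) + a; }

// Compound assignment is defined in terms of the binary operators.
template <std::size_t N>
Vec<N>& operator+=(Vec<N>& a, const Vec<N>& b) { return a = a + b; }
template <std::size_t N>
Vec<N>& operator&=(Vec<N>& a, const Vec<N>& b) { return a = a & b; }
template <std::size_t N>
Vec<N>& operator<<=(Vec<N>& a, int bits) { return a = a << bits; }

// Exercises each operator form once.
inline Vec<4> demo_ops() {
    Vec<4> v{{1, 2, 3, 4}};
    v = 10 + v;            // scalar on the left
    assert(v[0] == 11);
    v = v + 1;             // scalar on the right
    v += splat<4>(1);      // compound add
    v &= splat<4>(~1);     // clear the low bit of every lane
    v <<= 1;               // double every lane
    return v;
}
```

Routing the scalar and compound forms through the vector-vector operators keeps the per-platform surface small, which is presumably the point of having hooks for all operators.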

This refactoring exposed that our float <-> int NEON conversions were not specialized, so I've implemented them.  It seems nice that a missing specialization is an error rather than a silent fallback to serial code.
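
One general C++ way to turn a missing conversion into a build error is to leave the primary template undefined and supply only explicit specializations.  The sketch below assumes that pattern purely for illustration; F4, I4, LaneCast, and lane_cast are hypothetical names, and this is not necessarily how SkNx does it:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane float and int vectors, for illustration only.
struct F4 { std::array<float, 4> v; };
struct I4 { std::array<int, 4> v; };

// Declared but never defined: using an unsupported (Dst, Src) pair names an
// incomplete type, so the build fails instead of falling back to anything.
template <typename Dst, typename Src>
struct LaneCast;

// float -> int, truncating toward zero like static_cast<int>.
template <>
struct LaneCast<I4, F4> {
    static I4 apply(const F4& src) {
        I4 dst{};
        for (std::size_t i = 0; i < 4; ++i) dst.v[i] = static_cast<int>(src.v[i]);
        return dst;
    }
};

// int -> float.
template <>
struct LaneCast<F4, I4> {
    static F4 apply(const I4& src) {
        F4 dst{};
        for (std::size_t i = 0; i < 4; ++i) dst.v[i] = static_cast<float>(src.v[i]);
        return dst;
    }
};

// Front door: the caller names Dst, and Src is deduced from the argument.
template <typename Dst, typename Src>
Dst lane_cast(const Src& src) { return LaneCast<Dst, Src>::apply(src); }

// Converts a float vector to int lanes.
inline I4 demo_cast() {
    F4 f{{1.9f, -2.5f, 3.0f, 0.2f}};
    I4 i = lane_cast<I4>(f);
    assert(i.v[0] == 1);
    return i;
}
```

With this layout, something like lane_cast<I4>(some_unsupported_type) does not compile, so a gap in coverage shows up at build time.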

It's unclear to me whether split/join should be external functions, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()).  Time will tell?
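
To make the three candidate shapes concrete, here is a standalone sketch with a hypothetical Pack type (Pack, pack_join, pack_split, and demo_join are made-up names, not the SkNx API); all three spellings produce the same result:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical N-lane float vector, for illustration only (not SkNx).
template <std::size_t N>
struct Pack {
    std::array<float, N> lanes;

    // Static-method shape: Pack<4>::Join(lo, hi).
    static Pack Join(const Pack<N / 2>& lo, const Pack<N / 2>& hi) {
        Pack r{};
        for (std::size_t i = 0; i < N / 2; ++i) {
            r.lanes[i] = lo.lanes[i];
            r.lanes[i + N / 2] = hi.lanes[i];
        }
        return r;
    }

    // Non-static shape: lo.join(hi).
    Pack<2 * N> join(const Pack& hi) const { return Pack<2 * N>::Join(*this, hi); }
};

// External shape: pack_join(lo, hi).
template <std::size_t N>
Pack<2 * N> pack_join(const Pack<N>& lo, const Pack<N>& hi) {
    return Pack<2 * N>::Join(lo, hi);
}

// External split: writes the low and high halves of v to *lo and *hi.
template <std::size_t N>
void pack_split(const Pack<N>& v, Pack<N / 2>* lo, Pack<N / 2>* hi) {
    for (std::size_t i = 0; i < N / 2; ++i) {
        lo->lanes[i] = v.lanes[i];
        hi->lanes[i] = v.lanes[i + N / 2];
    }
}

// Joins two halves with each spelling and checks that they agree.
inline Pack<4> demo_join() {
    Pack<2> lo{{1.0f, 2.0f}};
    Pack<2> hi{{3.0f, 4.0f}};
    Pack<4> a = pack_join(lo, hi);
    Pack<4> b = Pack<4>::Join(lo, hi);
    Pack<4> c = lo.join(hi);
    assert(a.lanes == b.lanes && b.lanes == c.lanes);
    return a;
}
```

The external form deduces the width from its arguments, while the static form needs the result type spelled out at the call site; that difference may matter more than the naming.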

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1812233003
diff --git a/src/effects/gradients/SkRadialGradient.cpp b/src/effects/gradients/SkRadialGradient.cpp
index 1560cd2..254e438 100644
--- a/src/effects/gradients/SkRadialGradient.cpp
+++ b/src/effects/gradients/SkRadialGradient.cpp
@@ -94,8 +94,7 @@
         int count, int toggle);
 
 static inline Sk4f fast_sqrt(const Sk4f& R) {
-    // R * R.rsqrt0() is much faster, but it's non-monotonic, which isn't so pretty for gradients.
-    return R * R.rsqrt1();
+    return R * R.rsqrt();
 }
 
 static inline Sk4f sum_squares(const Sk4f& a, const Sk4f& b) {