skeleton for float <-> half optimized procs

Nothing fancy yet, just calls the serial code in a loop.

I will try to follow this up with at least some of:
   - SSE2 version of serial code
   - NEON version of serial code
   - NEON version using vcvt.f32.f16/vcvt.f16.f32
   - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph
The last two should be the fastest, since they use hardware conversion
instructions, but they need runtime CPU feature detection.
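
For illustration, here is a minimal sketch of the "serial code in a loop"
shape behind a function pointer with the half_to_float signature from this
change. The namespace `sketch` and the helpers `half_to_float_serial` and
`half_to_float_portable` are hypothetical names, and the ldexp-based scalar
conversion is a stand-in chosen for readability, not the serial code in the
tree. float_to_half would presumably follow the same pattern.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

namespace sketch {  // hypothetical namespace, not the real SkOpts

// Convert one IEEE 754 binary16 value to float (stand-in scalar routine).
inline float half_to_float_serial(uint16_t h) {
    const int sign = h >> 15;            // 1 sign bit
    const int exp  = (h >> 10) & 0x1F;   // 5 exponent bits, bias 15
    const int man  = h & 0x3FF;          // 10 mantissa bits

    float mag;
    if (exp == 0x1F) {
        // All-ones exponent: infinity if the mantissa is zero, else NaN.
        mag = man ? std::numeric_limits<float>::quiet_NaN()
                  : std::numeric_limits<float>::infinity();
    } else if (exp == 0) {
        // Zero or subnormal: man * 2^-24, no implicit leading 1.
        mag = std::ldexp(static_cast<float>(man), -24);
    } else {
        // Normal: (1024 + man) * 2^(exp - 15 - 10).
        mag = std::ldexp(static_cast<float>(man | 0x400), exp - 25);
    }
    return sign ? -mag : mag;
}

// Portable fallback: just calls the scalar routine in a loop.
inline void half_to_float_portable(float dst[], const uint16_t src[], int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        dst[i] = half_to_float_serial(src[i]);
    }
}

// Same signature as the pointer declared in the diff. It starts out pointing
// at the portable loop; a faster version could be swapped in once the CPU
// features have been detected at runtime.
static void (*half_to_float)(float[], const uint16_t[], int) =
        half_to_float_portable;

}  // namespace sketch
```

Callers would always go through `sketch::half_to_float(dst, src, n)`, so
swapping in an SSE2, NEON, or F16C version later needs no call-site changes.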


Review URL:
diff --git a/src/core/SkOpts.h b/src/core/SkOpts.h
index 2e8778e..c717526 100644
--- a/src/core/SkOpts.h
+++ b/src/core/SkOpts.h
@@ -67,6 +67,9 @@
                         grayA_to_rgbA,         // i.e. expand to color channels and premultiply
                         inverted_CMYK_to_RGB1, // i.e. convert color space
                         inverted_CMYK_to_BGR1; // i.e. convert color space
+    extern void (*half_to_float)(float[], const uint16_t[], int);
+    extern void (*float_to_half)(uint16_t[], const float[], int);