Unify some SkNx code

 - one base case and one N=1 case instead of two each (or three with doubles)
 - use SkNx_cast instead of FromBytes/toBytes
 - 4-at-a-time Sk4f::ToBytes becomes a special standalone Sk4f_ToBytes

If I did everything right, this'll be perf- and pixel- neutral.

https://gold.skia.org/search2?issue=1526523003&unt=true&query=source_type%3Dgm&master=false

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1526523003
11 files changed