Use NEON optimizations for RGB -> RGB(FF) or BGR(FF) in SkSwizzler

Swizzle Bench Runtime Nexus 6P
xxx_xxxa        0.32x
xxx_swaprb_xxxa 0.31x

Swizzle Bench Runtime Nexus 9
xxx_xxxa        1.11x
xxx_swaprb_xxxa 1.14x
(This is a slow down.)

Swizzle Bench Runtime Nexus 5
xxx_xxxa        0.12x
xxx_swaprb      0.12x

RGB PNG Decode Runtime
Nexus 6P        0.94x
Nexus 9         0.98x

I don't know how to explain the fact that the Swizzle Bench was
slower on Nexus 9, but the decode times got faster.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618003002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1618003002
7 files changed