finish up arm64 ops

Some small refactoring to common up redundant opcode building.

Oddly, I think I've got better codegen than what Clang would do here.
Clang doesn't generate uxtl-based code to unpack 8-bit to 32-bit,
instead preferring to load each byte one at a time and insert them one
at a time.

Me:
    ldr  s0, [x0]
    uxtl v0.8h, v0.8b
    uxtl v0.4s, v0.8h

Clang:
    ldrb  w8,  [x0]
    ldrb  w9,  [x0, #1]
    ldrb  w10, [x0, #2]
    ldrb  w11, [x0, #3]
    fmov  s0,      w8
    mov   v0.s[1], w9
    mov   v0.s[2], w10
    mov   v0.s[3], w11

Change-Id: I0fdf5c6cdcde6a4eb9290936284fd3ffcb2159f6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/224821
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
3 files changed