add interleaved load and store instructions

store64 and store128 will use the st2.4s and st4.4s
instructions, and load64/load128 the matching ld2.4s and
ld4.4s.  The tricky bit for both is that each instruction
loads or stores more than a single register, and those
registers must be consecutively numbered.
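
As a rough illustration of the semantics (not the code in this
change), here is a scalar C++ model of an N-register interleaved
store and load.  The names Reg, store_interleaved and
load_interleaved are hypothetical, and the register file is
modeled as a plain array.  It assumes 4 x 32-bit lanes per
register and register numbers that wrap mod 32, as on ARMv8.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: one 128-bit vector register as 4 x 32-bit lanes.
using Reg = std::array<uint32_t, 4>;

// Models stN.4s (N == 2 or 4): writes lane 0 of each of the N registers,
// then lane 1 of each, and so on.  The registers used are first, first+1,
// ..., wrapping around mod 32, which is why they have to be adjacent.
template <int N>
void store_interleaved(const std::array<Reg, 32>& regs, int first, uint32_t* dst) {
    assert(0 <= first && first < 32);
    for (int lane = 0; lane < 4; lane++) {
        for (int r = 0; r < N; r++) {
            dst[lane * N + r] = regs[(first + r) % 32][lane];
        }
    }
}

// Models ldN.4s: the inverse, de-interleaving memory into N adjacent registers.
template <int N>
void load_interleaved(std::array<Reg, 32>& regs, int first, const uint32_t* src) {
    assert(0 <= first && first < 32);
    for (int lane = 0; lane < 4; lane++) {
        for (int r = 0; r < N; r++) {
            regs[(first + r) % 32][lane] = src[lane * N + r];
        }
    }
}
```

With N == 2 and first == 31, the model reads registers 31 and 0,
which is the wraparound case a register allocator would need to
either handle or avoid.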

Change-Id: I613d06cbcc6e00bfc16b1a2c88412dbbbb1c55ed
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/356344
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
3 files changed