use new shuffle to speed up affine matrix mappts

sse: 25 -> 18
neon: 95 -> 86
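
A portable C++ sketch of the general idea, assuming the "new shuffle" is a lane permutation such as [1,0,3,2]. Two points are loaded into one 4-lane vector as (x0, y0, x1, y1), and a shuffle produces (y0, x0, y1, x1). Each output lane then needs only multiplies and adds against pre-splatted scale, skew and translate vectors. Point, Affine, Vec4, shuffle and mapPtsAffine are hypothetical names, not Skia's API. A real implementation would keep the data in SSE/NEON registers instead of a plain float array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical point type (not Skia's SkPoint).
struct Point { float x, y; };

// Hypothetical affine matrix:
//   x' = sx*x + kx*y + tx
//   y' = ky*x + sy*y + ty
struct Affine { float sx, kx, tx, ky, sy, ty; };

// Minimal 4-lane float "vector", a portable stand-in for an SSE/NEON register.
struct Vec4 {
    float v[4];
    Vec4 operator+(const Vec4& o) const {
        return {{v[0] + o.v[0], v[1] + o.v[1], v[2] + o.v[2], v[3] + o.v[3]}};
    }
    Vec4 operator*(const Vec4& o) const {
        return {{v[0] * o.v[0], v[1] * o.v[1], v[2] * o.v[2], v[3] * o.v[3]}};
    }
};

// Lane shuffle: output lane k takes input lane Ik.
template <int I0, int I1, int I2, int I3>
Vec4 shuffle(const Vec4& a) {
    return {{a.v[I0], a.v[I1], a.v[I2], a.v[I3]}};
}

// Maps `count` points through an affine matrix, two points per iteration.
// Safe when dst == src: each pair is read before it is written.
void mapPtsAffine(const Affine& m, Point dst[], const Point src[], std::size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    // Coefficients splatted to match the (x, y, x, y) lane layout.
    const Vec4 scale = {{m.sx, m.sy, m.sx, m.sy}};
    const Vec4 skew  = {{m.kx, m.ky, m.kx, m.ky}};
    const Vec4 trans = {{m.tx, m.ty, m.tx, m.ty}};

    std::size_t i = 0;
    for (; i + 2 <= count; i += 2) {
        const Vec4 xy = {{src[i].x, src[i].y, src[i + 1].x, src[i + 1].y}};
        const Vec4 yx = shuffle<1, 0, 3, 2>(xy);  // swap x and y within each point
        const Vec4 r = trans + xy * scale + yx * skew;
        dst[i]     = {r.v[0], r.v[1]};
        dst[i + 1] = {r.v[2], r.v[3]};
    }
    if (i < count) {  // odd leftover point: scalar path
        const Point p = src[i];
        dst[i] = {m.sx * p.x + m.kx * p.y + m.tx, m.ky * p.x + m.sy * p.y + m.ty};
    }
}
```

The shuffle is what makes the vector path work: without it, the cross terms (kx*y for x', ky*x for y') would need per-lane extraction. With it, one shuffle, two multiplies and two adds produce two mapped points at once.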

BUG=skia:

Review URL: https://codereview.chromium.org/1333983002
2 files changed