Move Sk4px Xfermode code to a header so we can use it twice.

  - Once in SkXfermode as usual to pick up compile-time SSE and NEON
  - Once in SkXfermode_arm_neon to pick up run-time NEON
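A rough sketch of the "same code, compiled twice" pattern in plain C++. Everything here is hypothetical: the namespaces `baseline` and `runtime_neon`, `plus_row`, `choose_plus_row`, and the `cpu_has_neon` flag are illustrative names, not Skia's API. The per-pixel math is scalar, not real Sk4px. A macro that expands into two namespaces stands in for one header #included by two .cpp files built with different flags. The run-time NEON copy is only selected after a CPU check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared header. In a real build this body would
// live in a header that two .cpp files #include, each compiled with different
// flags (e.g. one with NEON enabled). Expanding a macro into two namespaces
// mimics "the same source compiled twice" inside a single file.
#define DEFINE_PLUS_PROCS(NS)                                          \
    namespace NS {                                                     \
    inline uint8_t plus(uint8_t s, uint8_t d) {                        \
        unsigned sum = unsigned(s) + unsigned(d);                      \
        return static_cast<uint8_t>(sum > 255u ? 255u : sum);          \
    }                                                                  \
    inline void plus_row(uint8_t* dst, const uint8_t* src, int n) {    \
        assert(n >= 0);                                                \
        for (int i = 0; i < n; ++i) dst[i] = plus(src[i], dst[i]);     \
    }                                                                  \
    }

// Copy 1: the "as usual" build, using whatever SIMD the baseline flags allow.
DEFINE_PLUS_PROCS(baseline)
// Copy 2: the copy that would be compiled with NEON enabled and only called
// after a run-time CPU check.
DEFINE_PLUS_PROCS(runtime_neon)

// Signature shared by both copies, so either can sit behind one pointer.
using PlusRowProc = void (*)(uint8_t*, const uint8_t*, int);

// Pick an implementation at run time. `cpu_has_neon` is a hypothetical flag;
// a real port would query the CPU's feature bits.
inline PlusRowProc choose_plus_row(bool cpu_has_neon) {
    return cpu_has_neon ? runtime_neon::plus_row : baseline::plus_row;
}
```

For example, with `dst = {10, 200, 255, 0}` and `src = {5, 100, 1, 0}`, calling `choose_plus_row(true)(dst, src, 4)` leaves `dst` as `{15, 255, 255, 0}`, since Plus saturates at 255. Because both copies come from one body, a fix or a new mode (like Plus-AA) lands in both at once.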

This allows us to start cleaning up SkXfermode_arm_neon as we've done
for SkXfermode_SSE2.  I'm saving this catharsis for a day when I need it.

The Sk4px xfermodes are generally faster than the existing NEON procs,
so runtime-NEON builds should also see a perf win as a side effect.

This means our new Plus-AA code works for runtime NEON too.

BUG=skia:3852

Review URL: https://codereview.chromium.org/1150313003
3 files changed