Revise NEON implementations of blend

Reimplement the blend intrinsic using wider memory accesses and, where
possible, a few more 8-bit operations. Implementations are provided for
both AArch32 and AArch64 NEON.
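
For illustration only, a scalar C sketch of the idea, not the code in this
change. It assumes a mask-weighted blend with weights in [0, 64] and a
rounding shift by 6. That formula and the names blend_px and blend_row are
hypothetical and not taken from the patch. The main loop fetches 8 pixels
per access and keeps the data in 8-bit lanes, which is roughly what a NEON
version might do with 64-bit vector loads.

```c
#include <stdint.h>
#include <string.h>

/* Assumed blend formula: weights m in [0, 64], rounded, then shifted by 6. */
static uint8_t blend_px(uint8_t a, uint8_t b, uint8_t m) {
    return (uint8_t)((a * m + b * (64 - m) + 32) >> 6);
}

/* Blend one row of w pixels. Hypothetical helper, for illustration. */
void blend_row(uint8_t *dst, const uint8_t *src0, const uint8_t *src1,
               const uint8_t *mask, int w) {
    int x = 0;

    /* Wide path: fetch 8 pixels from each input with a single copy. */
    for (; x + 8 <= w; x += 8) {
        uint8_t a[8], b[8], m[8], out[8];
        memcpy(a, src0 + x, 8);
        memcpy(b, src1 + x, 8);
        memcpy(m, mask + x, 8);
        for (int i = 0; i < 8; ++i) {
            out[i] = blend_px(a[i], b[i], m[i]);
        }
        memcpy(dst + x, out, 8);
    }

    /* Tail: handle the remaining pixels one at a time. */
    for (; x < w; ++x) {
        dst[x] = blend_px(src0[x], src1[x], mask[x]);
    }
}
```

With a row width of 10, the first 8 pixels take the wide path and the last
2 take the tail loop.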

Change-Id: I5e56010376b1db1628a911cf09d97baf5af289b3
5 files changed