Replace NEON assembly memset16 and memset32 with intrinsic versions.

According to bench/MemsetBench.cpp, the intrinsic versions run somewhere
between 10% slower and a percent or two faster than the old assembly.
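
For illustration only, here is a minimal sketch of what an intrinsic-based
memset16 could look like. It is not the code from this change: the name
memset16_sketch is hypothetical, and the 8-lanes-per-store loop with a scalar
tail is an assumed structure. A plain loop covers builds without NEON.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical name; not the function touched by this change.
void memset16_sketch(uint16_t* dst, uint16_t value, size_t count) {
#if defined(__ARM_NEON)
    // Broadcast the value into all eight 16-bit lanes of a 128-bit register.
    const uint16x8_t wide = vdupq_n_u16(value);
    // Store eight 16-bit values per iteration.
    while (count >= 8) {
        vst1q_u16(dst, wide);
        dst += 8;
        count -= 8;
    }
#endif
    // Scalar tail (and the entire fill on builds without NEON).
    for (; count > 0; --count) {
        *dst++ = value;
    }
}
```

A memset32 along the same lines would presumably use vdupq_n_u32 and
vst1q_u32, storing four 32-bit values per iteration.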

BUG=skia:

CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot

Review URL: https://codereview.chromium.org/1075003002
5 files changed