Add versions of memset16() and memset32() in ARM assembly.

In benchmarks here on Cortex-A9 processors, this code runs 25-30% faster
than the C equivalent.

Patch by: Steve McIntyre (ARM)

http://codereview.appspot.com/1973042

git-svn-id: http://skia.googlecode.com/svn/trunk@594 2bbb7eff-a529-9590-31e7-b0007b416f81