Add Memcpy32 bench.
This compares 32-bit copies using memcpy, autovectorization, and, when SSE2 is
available, aligned and unaligned SSE2.
Running this on my desktop (Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz), I see
all four perform essentially the same, except that Clang's autovectorization
looks a little better than GCC's. memcpy here calls libc 2.19's
__memcpy_sse2_unaligned.
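
For readers without the diff at hand, the four copy strategies being compared
can be sketched roughly as below. This is an illustrative sketch only, not the
code added by this change: the function names are hypothetical, count is
assumed to be a number of 32-bit words, and the aligned variant assumes the
caller passes 16-byte-aligned pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// NOTE: hypothetical names for illustration; the bench's own code may differ.

// 1. Plain memcpy. count is a number of 32-bit words, not bytes.
static void copy_memcpy(uint32_t* dst, const uint32_t* src, int count) {
    std::memcpy(dst, src, count * sizeof(uint32_t));
}

// 2. Simple scalar loop that the compiler is free to autovectorize.
static void copy_loop(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}

#if defined(__SSE2__)
// 3. Unaligned SSE2: four 32-bit words per 128-bit load/store, scalar tail.
static void copy_sse2_unaligned(uint32_t* dst, const uint32_t* src, int count) {
    while (count >= 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
        src += 4; dst += 4; count -= 4;
    }
    while (count-- > 0) {
        *dst++ = *src++;
    }
}

// 4. Aligned SSE2: same shape, but both pointers must be 16-byte aligned.
static void copy_sse2_aligned(uint32_t* dst, const uint32_t* src, int count) {
    assert(reinterpret_cast<uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<uintptr_t>(src) % 16 == 0);
    while (count >= 4) {
        __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(src));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst), v);
        src += 4; dst += 4; count -= 4;
    }
    while (count-- > 0) {
        *dst++ = *src++;
    }
}
#endif
```

A benchmark would time each variant over the same buffers; the sketch omits
the timing harness.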
BUG=skia:
R=reed@google.com, qiankun.miao@intel.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/290533002
git-svn-id: http://skia.googlecode.com/svn/trunk@14799 2bbb7eff-a529-9590-31e7-b0007b416f81
2 files changed