Add Memcpy32 bench.

This compares 32-bit copies using memcpy, autovectorization, and, when SSE2 is
available, aligned and unaligned SSE2.
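For reference, here is a minimal sketch of the four copy strategies being compared. It assumes an x86 target where <emmintrin.h> provides the SSE2 intrinsics. The function names are hypothetical, and this is not the bench's actual code. Both SSE2 variants move four uint32_t values per iteration and copy the 1 to 3 leftover values with a scalar loop. The aligned variant additionally requires 16-byte-aligned pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helpers illustrating the four 32-bit copy strategies;
// not the actual Memcpy32 bench code.

// 1. Let libc pick its implementation (e.g. __memcpy_sse2_unaligned).
void copy_memcpy(uint32_t* dst, const uint32_t* src, size_t count) {
    std::memcpy(dst, src, count * sizeof(uint32_t));
}

// 2. A plain loop that the compiler may autovectorize at -O3.
void copy_loop(uint32_t* dst, const uint32_t* src, size_t count) {
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}

#if defined(__SSE2__)
// 3. SSE2 aligned loads/stores: both pointers must be 16-byte aligned.
void copy_sse2_aligned(uint32_t* dst, const uint32_t* src, size_t count) {
    assert(reinterpret_cast<uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<uintptr_t>(src) % 16 == 0);
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
    for (; i < count; i++) {  // scalar tail: the last 0-3 values
        dst[i] = src[i];
    }
}

// 4. SSE2 unaligned loads/stores: valid for any pointer alignment.
void copy_sse2_unaligned(uint32_t* dst, const uint32_t* src, size_t count) {
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
    for (; i < count; i++) {  // scalar tail: the last 0-3 values
        dst[i] = src[i];
    }
}
#endif
```

The unaligned variant trades the alignment requirement for unaligned load/store instructions, which is the same trade-off libc's __memcpy_sse2_unaligned makes.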

Running this on my desktop (Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz), I see
that all four perform essentially the same, except that Clang's autovectorization
looks a little better than GCC's. memcpy calls libc 2.19's __memcpy_sse2_unaligned.
BUG=skia:
R=reed@google.com, qiankun.miao@intel.com, mtklein@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/290533002

git-svn-id: http://skia.googlecode.com/svn/trunk@14799 2bbb7eff-a529-9590-31e7-b0007b416f81
2 files changed