Fix memcpy32_sse2_unalign.

The whole point of memcpy32_sse2_unalign is that we didn't align dst128
and src128.  So it's not safe at all to cast them back to dst and src.
The cast tells the compiler that dst/src are 16-byte (128-bit) aligned,
and it then autovectorizes the cleanup while-loop with aligned SSE
instructions based on that (false) knowledge.
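
For illustration only, here is a minimal sketch of the safe pattern.  It
is not the actual Skia source: the function name
memcpy32_unaligned_sketch and the 16-pixel unroll are assumptions.  The
idea is to keep dst and src as plain uint32_t pointers for the whole
function and advance them directly, so the cleanup loop never sees a
pointer derived from an __m128i* and the compiler has no reason to
assume 16-byte alignment.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cstring>

// Hypothetical sketch (name and unroll factor are assumptions, not Skia's code).
// Copies `count` 32-bit values; dst and src may have any 4-byte alignment.
static void memcpy32_unaligned_sketch(uint32_t* dst, const uint32_t* src, int count) {
    // Main loop: 16 values (four 128-bit vectors) per iteration, using the
    // unaligned load/store intrinsics.  The __m128i* casts are temporaries
    // local to each intrinsic call; dst and src themselves stay uint32_t*.
    while (count >= 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 0);
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 1);
        __m128i c = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 2);
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 3);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst) + 0, a);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst) + 1, b);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst) + 2, c);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst) + 3, d);
        dst   += 16;
        src   += 16;
        count -= 16;
    }
    // Cleanup loop: works on the original uint32_t pointers, so any
    // vectorization the compiler applies can only assume 4-byte alignment.
    while (count-- > 0) {
        *dst++ = *src++;
    }
}
```

Calling this with pointers offset by one uint32_t from a 16-byte
boundary exercises the misaligned case that the small bench hits.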

This leads to crashes on memcpy32_sse2_unalign_10, which is small enough
that we actually get memory that is not 16-byte aligned.  The larger
benches could be crashing too, but their allocations are big enough that
they're probably always 16-byte aligned anyway.

BUG=skia:2589
R=fmalita@chromium.org, mtklein@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/291893008

git-svn-id: http://skia.googlecode.com/svn/trunk@14851 2bbb7eff-a529-9590-31e7-b0007b416f81
1 file changed