SSE2 implementation of S32A_D565_Opaque

microbenchmark of S32A_D565_Opaque() shows a 3x speedup after SSE optimization with various count on i7-3770.

BUG=
R=mtklein@google.com, reed@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/138163013

git-svn-id: http://skia.googlecode.com/svn/trunk@13495 2bbb7eff-a529-9590-31e7-b0007b416f81
4 files changed