Replace SSE optimization of Color32A_D565
Adds an SSE2 version of the Color32A_D565 function, to replace
the existing SSE4 version. Also does some minor cleanup.
Performance improvement in the following Skia benchmarks.
Measured on Atom Silvermont:
Xfermode_SrcOver - x3
luma_colorfilter_large - x4.6
luma_colorfilter_small - x2
tablebench - ~15%
chart_bw - ~10%
Measured on Corei7 Haswell:
luma_colorfilter_large running SSE2 - x2
luma_colorfilter_large running SSE4 - x2.3
Also improves performance in WPS Office application and 2D subtest of 0xbenchmark on Android.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Review URL: https://codereview.chromium.org/923523002
8 files changed