Xfermode: SSE2 implementation of hardlight mode

With the SSE2 optimization, performance of Xfermode_HardLight improves by
about 45% on a desktop i7-3770. Benchmark data:
before:
Xfermode_HardLight   8888:  cmsecs =  48.43   565:  cmsecs =  63.11
after:
Xfermode_HardLight   8888:  cmsecs =  25.71   565:  cmsecs =  33.46
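
For reference, a minimal scalar sketch of the per-channel hard-light blend
that an SSE2 path would compute on several channels at once. This is not the
code in this change. The helper names (div255_round_clamped, hardlight_channel)
are hypothetical, and the formula assumes premultiplied 8-bit channels,
following the usual hard-light definition (multiply when 2*Sc <= Sa, screen
otherwise).

```cpp
#include <algorithm>

// Hypothetical helper: clamp a product-scale value to [0, 255*255],
// then divide by 255 with rounding to the nearest integer.
static inline int div255_round_clamped(int prod) {
    prod = std::min(std::max(prod, 0), 255 * 255);
    return (prod + 127) / 255;
}

// Hypothetical scalar reference for one premultiplied color channel.
//   sc, sa: source channel and source alpha (0..255, sc <= sa)
//   dc, da: destination channel and destination alpha (0..255, dc <= da)
static inline int hardlight_channel(int sc, int dc, int sa, int da) {
    int rc;
    if (2 * sc <= sa) {
        // Dark source: multiply.
        rc = 2 * sc * dc;
    } else {
        // Light source: screen.
        rc = sa * da - 2 * (da - dc) * (sa - sc);
    }
    // Add the parts of source and destination not covered by the other.
    return div255_round_clamped(rc + sc * (255 - da) + dc * (255 - sa));
}
```

A vectorized version would typically evaluate both branches and select
between them with a comparison mask, since SSE2 has no per-lane branching.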

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/229003004

git-svn-id: http://skia.googlecode.com/svn/trunk@14373 2bbb7eff-a529-9590-31e7-b0007b416f81
1 file changed