Xfermode: SSE2 implementation of softlight_modeproc

With the SSE2 implementation, the Xfermode_SoftLight bench takes roughly
27-28% less time on a desktop i7-3770. Results (cmsecs, lower is better):
before:
Xfermode_SoftLight   8888:  cmsecs = 379.44   565:  cmsecs =  387.74
after:
Xfermode_SoftLight   8888:  cmsecs = 272.29   565:  cmsecs =  284.31
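
For reference, the relative change implied by the bench numbers above works out as follows (time reduction on the left, throughput ratio on the right):

```latex
\begin{aligned}
\text{8888:}\quad & 1 - \tfrac{272.29}{379.44} \approx 0.282, & \tfrac{379.44}{272.29} &\approx 1.39\times \\
\text{565:}\quad  & 1 - \tfrac{284.31}{387.74} \approx 0.267, & \tfrac{387.74}{284.31} &\approx 1.36\times
\end{aligned}
```

So the SSE2 path spends roughly 27-28% less time per run, which is the same as about 1.36-1.39x the throughput.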
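
To illustrate the general technique (not the code in this change), here is a minimal sketch of a branch-free SSE2 soft-light blend. It assumes unpremultiplied float channels in [0, 1] and uses the W3C Compositing and Blending soft-light formula; Skia's softlight_modeproc maps premultiplied SkPMColor values to an SkPMColor, so its integer arithmetic differs. The names softlight_scalar, select_ps, softlight_4 and softlight_sse2 are hypothetical. The point it shows is that SIMD lanes cannot branch individually, so every case of the formula is computed for all four lanes and the per-lane result is picked with comparison masks.

```cpp
// Hypothetical sketch, NOT Skia's softlight_modeproc: W3C soft-light blend
// B(cs, cb) on unpremultiplied float channels in [0, 1], four lanes at a time.
#include <emmintrin.h>  // SSE2 (also provides the SSE float intrinsics)
#include <cassert>
#include <cmath>

// Scalar reference: the three-case soft-light formula.
static inline float softlight_scalar(float cs, float cb) {
    if (cs <= 0.5f) {
        return cb - (1.0f - 2.0f * cs) * cb * (1.0f - cb);
    }
    float d = (cb <= 0.25f) ? ((16.0f * cb - 12.0f) * cb + 4.0f) * cb
                            : std::sqrt(cb);
    return cb + (2.0f * cs - 1.0f) * (d - cb);
}

// Branch-free per-lane select: mask ? a : b (mask lanes are all-ones or all-zeros).
static inline __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Evaluate every case for all four lanes, then choose per lane with masks.
static inline __m128 softlight_4(__m128 cs, __m128 cb) {
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 two_cs = _mm_mul_ps(_mm_set1_ps(2.0f), cs);

    // Case cs <= 0.5: cb - (1 - 2cs) * cb * (1 - cb)
    __m128 dark = _mm_mul_ps(_mm_mul_ps(_mm_sub_ps(one, two_cs), cb), _mm_sub_ps(one, cb));
    dark = _mm_sub_ps(cb, dark);

    // D(cb) = ((16cb - 12)cb + 4)cb for cb <= 0.25, sqrt(cb) otherwise
    __m128 poly = _mm_sub_ps(_mm_mul_ps(_mm_set1_ps(16.0f), cb), _mm_set1_ps(12.0f));
    poly = _mm_add_ps(_mm_mul_ps(poly, cb), _mm_set1_ps(4.0f));
    poly = _mm_mul_ps(poly, cb);
    __m128 d = select_ps(_mm_cmple_ps(cb, _mm_set1_ps(0.25f)), poly, _mm_sqrt_ps(cb));

    // Case cs > 0.5: cb + (2cs - 1) * (D(cb) - cb)
    __m128 light = _mm_add_ps(cb, _mm_mul_ps(_mm_sub_ps(two_cs, one), _mm_sub_ps(d, cb)));

    return select_ps(_mm_cmple_ps(cs, _mm_set1_ps(0.5f)), dark, light);
}

// Blend n channels: SSE2 for groups of four, scalar loop for the tail.
void softlight_sse2(const float* src, const float* dst, float* out, int n) {
    assert(n >= 0);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 cs = _mm_loadu_ps(src + i);
        __m128 cb = _mm_loadu_ps(dst + i);
        _mm_storeu_ps(out + i, softlight_4(cs, cb));
    }
    for (; i < n; ++i) {
        out[i] = softlight_scalar(src[i], dst[i]);
    }
}
```

Computing both branches plus a square root for every lane costs extra arithmetic, but it avoids data-dependent branches and handles four channels per instruction, which is the usual trade-off when vectorizing piecewise blend modes.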

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/236363012

git-svn-id: http://skia.googlecode.com/svn/trunk@14376 2bbb7eff-a529-9590-31e7-b0007b416f81
2 files changed