Xfermode: SSE2 implementation of exclusion_modeproc

With the SSE2 optimization, Xfermode_Exclusion runs about 2x faster on a
desktop i7-3770 (cmsecs drop by a little more than half). Here are the data:
before:
Xfermode_Exclusion   8888:  cmsecs =  40.17   565:  cmsecs = 55.22
after:
Xfermode_Exclusion   8888:  cmsecs =  18.53   565:  cmsecs = 26.55
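
For illustration only, here is a minimal sketch of how the exclusion math can
be vectorized with SSE2. It is not the code from this change. It assumes the
per-channel formula on premultiplied values is
round((255*(s + d) - 2*s*d) / 255), and it leaves out alpha handling and pixel
packing. The names exclusion_channel_scalar and exclusion_channels_sse2 are
hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Scalar reference for one channel (assumed formula, see note above).
// r = s*(255 - d) + d*(255 - s), so it always lies in [0, 255*255].
static inline uint8_t exclusion_channel_scalar(uint8_t s, uint8_t d) {
    int r = 255 * (s + d) - 2 * s * d;
    int prod = r + 128;                       // rounding bias
    return static_cast<uint8_t>((prod + (prod >> 8)) >> 8);  // divide by 255
}

// Hypothetical helper: the same formula on four channel values at once,
// one value per 32-bit lane.
static inline void exclusion_channels_sse2(const uint8_t s[4],
                                           const uint8_t d[4],
                                           uint8_t out[4]) {
    __m128i vs = _mm_setr_epi32(s[0], s[1], s[2], s[3]);
    __m128i vd = _mm_setr_epi32(d[0], d[1], d[2], d[3]);

    // SSE2 has no 32-bit multiply-low. Each lane holds a value <= 255 with a
    // zero upper half, so the 16-bit multiply leaves the full product
    // (<= 65025) in the low half and zero in the high half.
    __m128i sd = _mm_mullo_epi16(vs, vd);

    // 255 * (s + d) computed as (x << 8) - x.
    __m128i sum = _mm_add_epi32(vs, vd);
    __m128i sum255 = _mm_sub_epi32(_mm_slli_epi32(sum, 8), sum);

    // r = 255 * (s + d) - 2 * s * d
    __m128i r = _mm_sub_epi32(sum255, _mm_slli_epi32(sd, 1));

    // Rounded divide by 255: ((r + 128) + ((r + 128) >> 8)) >> 8
    __m128i prod = _mm_add_epi32(r, _mm_set1_epi32(128));
    __m128i res =
        _mm_srli_epi32(_mm_add_epi32(prod, _mm_srli_epi32(prod, 8)), 8);

    int32_t lanes[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), res);
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<uint8_t>(lanes[i]);
    }
}
```

A real modeproc would work on packed 32-bit pixels and handle alpha
separately; the sketch only shows that the per-channel arithmetic fits in
SSE2 integer lanes without overflow.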

BUG=skia:
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/233733005

git-svn-id: http://skia.googlecode.com/svn/trunk@14371 2bbb7eff-a529-9590-31e7-b0007b416f81
1 file changed