SSE2 implementation of S32_D565_Opaque

Benchmarks that hit this path benefit from this patch.
Here is the data (the last column is the relative reduction from before to after):
                                    before      after
        gradient_radial2_mirror   10885.52   10849.48   0.33%
 gradient_radial2_clamp_hicolor   11819.69   11644.83   1.48%
         gradient_radial2_clamp   11816.10   11649.91   1.41%
     bitmaprect_FF_filter_trans       6.27       4.88  22.17%
   bitmaprect_FF_nofilter_trans       6.27       4.88  22.17%
  bitmaprect_FF_filter_identity       6.31       4.86  22.98%
bitmaprect_FF_nofilter_identity       6.25       4.86  22.24%
             bitmap_4444_update       6.26       5.05  19.33%
    bitmap_4444_update_volatile       6.21       5.06  18.52%
                    bitmap_4444       6.22       5.06  18.65%
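
For readers unfamiliar with this path, the following is a minimal sketch of the
kind of conversion an SSE2 S32_D565_Opaque performs: packing 32-bit pixels into
RGB565, eight at a time. It is not the code from this patch. The function names
are hypothetical, the pixel layout is assumed to be 0xAARRGGBB, and the
conversion is assumed to truncate the low bits without dithering.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Scalar reference. Assumes a 0xAARRGGBB pixel layout (an assumption;
// the real layout is platform dependent) and simple truncation.
static inline uint16_t pixel_to_565(uint32_t c) {
    uint32_t r = (c >> 16) & 0xFF;
    uint32_t g = (c >> 8) & 0xFF;
    uint32_t b = c & 0xFF;
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Hypothetical name: scalar loop over a row of pixels.
static void to565_scalar(const uint32_t* src, uint16_t* dst, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        dst[i] = pixel_to_565(src[i]);
    }
}

// Hypothetical name: SSE2 version handling 8 pixels per iteration.
static void to565_sse2(const uint32_t* src, uint16_t* dst, size_t count) {
    const __m128i mask5 = _mm_set1_epi32(0x1F);
    const __m128i mask6 = _mm_set1_epi32(0x3F);

    while (count >= 8) {
        // Unaligned loads of 2 x 4 pixels.
        __m128i p0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        __m128i p1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 4));

        // Extract the top 5/6/5 bits of each channel in 32-bit lanes.
        __m128i r0 = _mm_and_si128(_mm_srli_epi32(p0, 16 + 3), mask5);
        __m128i r1 = _mm_and_si128(_mm_srli_epi32(p1, 16 + 3), mask5);
        __m128i g0 = _mm_and_si128(_mm_srli_epi32(p0, 8 + 2), mask6);
        __m128i g1 = _mm_and_si128(_mm_srli_epi32(p1, 8 + 2), mask6);
        __m128i b0 = _mm_and_si128(_mm_srli_epi32(p0, 3), mask5);
        __m128i b1 = _mm_and_si128(_mm_srli_epi32(p1, 3), mask5);

        // Narrow each channel to 16-bit lanes. The values are at most 63,
        // so the signed saturation of _mm_packs_epi32 never triggers.
        __m128i r = _mm_packs_epi32(r0, r1);
        __m128i g = _mm_packs_epi32(g0, g1);
        __m128i b = _mm_packs_epi32(b0, b1);

        // Assemble RRRRRGGGGGGBBBBB in each 16-bit lane.
        __m128i out = _mm_or_si128(_mm_slli_epi16(r, 11),
                                   _mm_or_si128(_mm_slli_epi16(g, 5), b));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), out);

        src += 8;
        dst += 8;
        count -= 8;
    }

    // Leftover pixels fall back to the scalar path.
    to565_scalar(src, dst, count);
}
```

Each channel is narrowed separately before shifting because SSE2 has no
unsigned 32-to-16 pack; packing the assembled 565 value directly with
_mm_packs_epi32 would saturate anything above 0x7FFF.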

BUG=
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/172083003

git-svn-id: http://skia.googlecode.com/svn/trunk@13556 2bbb7eff-a529-9590-31e7-b0007b416f81
3 files changed