Really use vpmaddwd in hsw::convolve_vertical().

No pixel diffs.
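Performance on 8888 looks like an overall win: 7 of the 9 benches below
get faster (436.27 -> 399.10 micros in total, about 8.5%), while
bitmap_scale_filter_256_64 and bitmap_scale_filter_90_90 regress.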

Before:
    micros  bench
    222.41  bitmap_scale_filter_64_256
     40.06  bitmap_scale_filter_256_64
      8.17  bitmap_scale_filter_90_10
     10.32  bitmap_scale_filter_90_30
     22.50  bitmap_scale_filter_90_80
      1.80  bitmap_scale_filter_90_90
     57.51  bitmap_scale_filter_80_90
     41.99  bitmap_scale_filter_30_90
     31.51  bitmap_scale_filter_10_90

After:
    micros  bench
    193.60  bitmap_scale_filter_64_256
     46.26  bitmap_scale_filter_256_64
      7.81  bitmap_scale_filter_90_10
      9.99  bitmap_scale_filter_90_30
     22.05  bitmap_scale_filter_90_80
      1.96  bitmap_scale_filter_90_90
     52.07  bitmap_scale_filter_80_90
     37.73  bitmap_scale_filter_30_90
     27.63  bitmap_scale_filter_10_90
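
For readers unfamiliar with vpmaddwd: each 32-bit output lane is the sum of
two signed 16-bit products, so one instruction can weight two source rows by
two filter coefficients at once. The scalar sketch below only models that
arithmetic; it is not the code in hsw::convolve_vertical(). The names
madd_lane, blend_two_rows and kShift, and the 14-bit fixed-point coefficient
format, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed fixed-point format for filter coefficients: 1.0 == 1 << kShift.
static const int kShift = 14;

// Scalar model of one 32-bit lane of vpmaddwd: multiply two pairs of
// signed 16-bit values and add the two products as 32-bit integers.
// (The only overflowing input, all four values == -32768, cannot occur
// here because pixel values are in 0..255.)
static int32_t madd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
    return int32_t(a0) * int32_t(b0) + int32_t(a1) * int32_t(b1);
}

// Hypothetical helper: weight two source rows by two coefficients,
// one madd per channel byte, then shift back down and clamp to 0..255.
static void blend_two_rows(const uint8_t* row0, const uint8_t* row1,
                           int16_t c0, int16_t c1,
                           uint8_t* out, int n) {
    for (int i = 0; i < n; i++) {
        int32_t acc = madd_lane(row0[i], row1[i], c0, c1);
        acc >>= kShift;
        out[i] = uint8_t(std::min(std::max(acc, int32_t(0)), int32_t(255)));
    }
}
```

In a vector version, the two rows would be interleaved byte-by-byte and
widened to 16 bits so each pixel pair lines up with its coefficient pair;
how the real code arranges that is not shown here.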

Change-Id: I2f29366b0fd503176c5af4d825fa524e632da21b
Reviewed-on: https://skia-review.googlesource.com/7630
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>