Add SSE4 version of BlurImage optimizations.

Adds an SSE4.1 version of the existing BlurImage optimizations.
Performance of blur_image_filter_* benchmarks show a 10-50%
improvement on Linux/Ubuntu Core i7.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665

R=mtklein@google.com, reed@chromium.org

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/366593004
5 files changed