land http://codereview.appspot.com/6353063/ by Lei
optimizations for D16 using SSE2

skia_bench -config 565 -match bitmap_8888_scale_filter -forceFilter 1 -repeat 30

Results on the Android platform:

Without the optimization:
D/skia    ( 1868): running bench [640 480]     bitmap_8888_scale_filter
D/skia    ( 1868):    565: cmsecs = 286.50

With the optimization:
D/skia    ( 1463): running bench [640 480]     bitmap_8888_scale_filter
D/skia    ( 1463):    565: cmsecs = 186.80

The net gain is 34.80%: (286.50 - 186.80) / 286.50, i.e. the reduction in cmsecs.



git-svn-id: http://skia.googlecode.com/svn/trunk@4729 2bbb7eff-a529-9590-31e7-b0007b416f81
diff --git a/src/core/SkBitmapProcState.h b/src/core/SkBitmapProcState.h
index dc56138..e9951c8 100644
--- a/src/core/SkBitmapProcState.h
+++ b/src/core/SkBitmapProcState.h
@@ -186,5 +186,7 @@
                                  uint32_t xy[], int count, int x, int y);
 void ClampX_ClampY_nofilter_affine(const SkBitmapProcState& s,
                                    uint32_t xy[], int count, int x, int y);
+void S32_D16_filter_DX(const SkBitmapProcState& s,
+                                   const uint32_t* xy, int count, uint16_t* colors);
 
 #endif