More SSE2 optimizations.  This CL implements an SSE2 version of S32_bitmap_D32_filter_DX, and uses aligned loads and stores for dst, in all blending.

Review URL:  http://codereview.appspot.com/157141



git-svn-id: http://skia.googlecode.com/svn/trunk@448 2bbb7eff-a529-9590-31e7-b0007b416f81
7 files changed