Reorder the separable blur passes into XXX/YYY (three X passes, then three Y passes), with an image transpose on the last pass of each group. This results in contiguous memory reads in all passes, giving a 22% speedup on theverge.skp over the previous separable implementation, and a 30%-50% improvement over the existing implementation (depending on platform).
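
For illustration only, the pass ordering can be sketched as below. This is not the code from this change: boxBlurRow and blurXXXYYY are hypothetical names, and the sketch assumes a single-channel 8-bit image, a box filter of width 2*radius+1, and zero padding outside the image. The point it shows is that every pass reads its source row by row, and the last pass of each group writes its output transposed so the next group can also read rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Skia's API): one horizontal box-blur pass over a
// single-channel w x h image. Source rows are always read left to right, so
// reads are contiguous. If transpose is true, the pixel computed for (x, y)
// is written to (y, x) of an h-wide, w-tall destination.
static void boxBlurRow(const std::vector<uint8_t>& src, std::vector<uint8_t>& dst,
                       int w, int h, int radius, bool transpose) {
    assert(src.size() == static_cast<std::size_t>(w) * h);
    assert(dst.size() == src.size());
    const int window = 2 * radius + 1;
    for (int y = 0; y < h; ++y) {
        const uint8_t* row = src.data() + static_cast<std::size_t>(y) * w;
        // Prime the running sum with the window centered on x = 0
        // (pixels outside the image count as zero: an assumption).
        int sum = 0;
        for (int x = 0; x <= radius && x < w; ++x) sum += row[x];
        for (int x = 0; x < w; ++x) {
            const uint8_t v = static_cast<uint8_t>(sum / window);
            if (transpose) {
                dst[static_cast<std::size_t>(x) * h + y] = v;
            } else {
                dst[static_cast<std::size_t>(y) * w + x] = v;
            }
            // Slide the window one pixel to the right.
            const int add = x + radius + 1;
            const int sub = x - radius;
            if (add < w) sum += row[add];
            if (sub >= 0) sum -= row[sub];
        }
    }
}

// Hypothetical driver: three X passes, then three Y passes (XXX/YYY).
// The third pass of each group writes transposed, so the Y passes run as
// row passes over the transposed image and the final pass restores w x h.
std::vector<uint8_t> blurXXXYYY(std::vector<uint8_t> img, int w, int h,
                                int radiusX, int radiusY) {
    std::vector<uint8_t> tmp(img.size());
    boxBlurRow(img, tmp, w, h, radiusX, false);
    boxBlurRow(tmp, img, w, h, radiusX, false);
    boxBlurRow(img, tmp, w, h, radiusX, true);   // tmp is now h wide, w tall
    boxBlurRow(tmp, img, h, w, radiusY, false);
    boxBlurRow(img, tmp, h, w, radiusY, false);
    boxBlurRow(tmp, img, h, w, radiusY, true);   // img is w x h again
    return img;
}
```

In this sketch only the two transposing passes write with a stride. All six passes read sequentially, which is the property the reordering is after.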

Review URL: https://codereview.appspot.com/6851053

git-svn-id: http://skia.googlecode.com/svn/trunk@6445 2bbb7eff-a529-9590-31e7-b0007b416f81
1 file changed