Implement S32A_Opaque_BlitRow32 using ARMv7 NEON instructions.

Takes advantage of the 16 channels in each quadword register.  Also uses
software pipelining to interleave the loads/stores with the vector operations.
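
For reference, a minimal portable sketch of the row operation being
vectorized. It is not the NEON code from this patch: it assumes premultiplied
32-bit pixels with alpha in the top byte and the usual src-over formula
dst = src + dst * (256 - src_alpha) / 256, and the names srcover_pixel and
blit_row_srcover are hypothetical. The main loop handles 4 pixels
(16 channels) per iteration, the amount that fits in one quadword register;
it does not reproduce the software pipelining.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: blend one premultiplied src pixel over dst.
 * Assumes alpha lives in the top byte of each 32-bit pixel. */
static uint32_t srcover_pixel(uint32_t src, uint32_t dst)
{
    const uint32_t mask = 0x00FF00FFu;
    uint32_t scale = 256u - (src >> 24);   /* 1..256 */

    /* Scale two channels at a time: low bytes of each 16-bit half first,
     * then the high bytes, keeping results in their original positions. */
    uint32_t rb = (((dst & mask) * scale) >> 8) & mask;
    uint32_t ag = (((dst >> 8) & mask) * scale) & ~mask;

    return src + (rb | ag);
}

/* Hypothetical row blitter: 4 pixels (16 channels) per main-loop
 * iteration, then a scalar tail for the remaining 0..3 pixels. */
void blit_row_srcover(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i = 0;

    for (; i + 4 <= count; i += 4) {
        uint32_t s[4], d[4];
        int k;

        /* Load a block; a NEON version would use vector loads here. */
        for (k = 0; k < 4; k++) {
            s[k] = src[i + k];
            d[k] = dst[i + k];
        }
        /* Blend the block. */
        for (k = 0; k < 4; k++)
            d[k] = srcover_pixel(s[k], d[k]);
        /* Store the block. */
        for (k = 0; k < 4; k++)
            dst[i + k] = d[k];
    }

    for (; i < count; i++)
        dst[i] = srcover_pixel(src[i], dst[i]);
}
```

In a pipelined version, the loads for the next block would be issued while
the current block is still being blended, so memory latency overlaps with
arithmetic.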

Measured a roughly 70% improvement in simulation environments.

http://codereview.appspot.com/1148042/show

Patch-by: XinQi of codeaurora.org

git-svn-id: http://skia.googlecode.com/svn/trunk@578 2bbb7eff-a529-9590-31e7-b0007b416f81
2 files changed