We extract a sub-function from MAKENAME(_nofilter_DX) that handles only the
inner loop: reading one index array, indexing into the src array, and writing
the result to the dst array.

Because of the scatter-gather nature of this access pattern, we cannot do much
burst/batch reading/writing to improve the performance.
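
For illustration only, the access pattern described above looks roughly like the
sketch below. This is not the code from the patch: the name gather_row, the
16-bit index type, the 32-bit pixel type and the 4x unrolling are all
assumptions made for the example. It shows why each load depends on an index
fetched just before it, which leaves little room for batched reads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (assumed name and types, not the patched function):
 * copies src[idx[i]] into dst[i] for i in [0, count). */
static void gather_row(const uint16_t *idx, const uint32_t *src,
                       uint32_t *dst, size_t count)
{
    size_t i = 0;

    /* Unrolled by four: the index loads are sequential, but every src load
     * goes to an arbitrary address, so the loads cannot be merged. */
    for (; i + 4 <= count; i += 4) {
        uint32_t p0 = src[idx[i + 0]];
        uint32_t p1 = src[idx[i + 1]];
        uint32_t p2 = src[idx[i + 2]];
        uint32_t p3 = src[idx[i + 3]];
        dst[i + 0] = p0;
        dst[i + 1] = p1;
        dst[i + 2] = p2;
        dst[i + 3] = p3;
    }

    /* Remaining zero to three elements. */
    for (; i < count; ++i) {
        dst[i] = src[idx[i]];
    }
}
```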

We tried Neon vector instructions.  We also tried hand-optimizing the
compiler-generated (non-Neon) assembly code.  The latter seems to give the
better gain: about a 6% improvement, which is not much.

Patch-by: Xin Qi of codeaurora.org

http://codereview.appspot.com/1127042/show

git-svn-id: http://skia.googlecode.com/svn/trunk@579 2bbb7eff-a529-9590-31e7-b0007b416f81
3 files changed