Implement GPU path for matrix convolution.  Note that when not convolving alpha,
premultiplication is done less efficiently than in the raster path:  it is
performed on each texture access, rather than as a pre-processing pass.  This
let me implement the filter as a single custom stage; I will try the
optimization separately.
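
To illustrate the trade-off described above, here is a minimal CPU-side sketch,
not the code in this change.  It assumes a hypothetical one-channel Pixel type,
a 1-D kernel, clamp-to-edge sampling, and that the per-pixel alpha conversion
is an unpremultiply; all names (Pixel, convertAlpha, convolvePerAccess,
convolvePrePass) are made up for illustration.  Both functions should produce
the same output, but the per-access version converts once per kernel tap
instead of once per pixel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pixel: one premultiplied color channel plus alpha.
struct Pixel { float c; float a; };

// Counts alpha conversions so the two strategies can be compared.
static int g_conversions = 0;

// Assumed conversion: undo premultiplication (transparent pixels map to 0).
static float convertAlpha(const Pixel& p) {
    ++g_conversions;
    return p.a > 0.0f ? p.c / p.a : 0.0f;
}

// Clamp an index to the valid range [0, n - 1].
static int clampIndex(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

// Per-access strategy: convert inside the kernel loop, as a single-stage
// filter would do on every texture fetch.
std::vector<float> convolvePerAccess(const std::vector<Pixel>& row,
                                     const std::vector<float>& kernel) {
    const int n = static_cast<int>(row.size());
    const int taps = static_cast<int>(kernel.size());
    const int radius = taps / 2;
    std::vector<float> out(row.size());
    for (int x = 0; x < n; ++x) {
        float sum = 0.0f;
        for (int k = 0; k < taps; ++k) {
            sum += kernel[k] * convertAlpha(row[clampIndex(x + k - radius, n)]);
        }
        out[x] = sum;
    }
    return out;
}

// Pre-pass strategy: convert every pixel once, then convolve the result.
std::vector<float> convolvePrePass(const std::vector<Pixel>& row,
                                   const std::vector<float>& kernel) {
    const int n = static_cast<int>(row.size());
    const int taps = static_cast<int>(kernel.size());
    const int radius = taps / 2;
    std::vector<float> converted(row.size());
    for (int i = 0; i < n; ++i) {
        converted[i] = convertAlpha(row[i]);
    }
    std::vector<float> out(row.size());
    for (int x = 0; x < n; ++x) {
        float sum = 0.0f;
        for (int k = 0; k < taps; ++k) {
            sum += kernel[k] * converted[clampIndex(x + k - radius, n)];
        }
        out[x] = sum;
    }
    return out;
}
```

With a 3-tap kernel over 4 pixels, the per-access version performs 12
conversions and the pre-pass version performs 4.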

This implementation gives a ~30X speedup on the GPU results for the
matrixconvolution bench (~10X from running on the GPU, and ~3X from removing
texture uploads and readbacks).

Note:  this changes the matrixconvolution test for the software path as well,
so it will likely break the bots until that test is rebaselined.

Review URL:  https://codereview.appspot.com/6585069/



git-svn-id: http://skia.googlecode.com/svn/trunk@5809 2bbb7eff-a529-9590-31e7-b0007b416f81
3 files changed