Implement GPU path for matrix convolution. Note that when not convolving alpha,
premultiplication is handled less efficiently than in the raster path: it is
done on every texture access rather than in a single pre-processing pass. I did
it this way so that the filter could be implemented as a single custom stage; I
will try that optimization separately.

This implementation speeds up the GPU results for the matrixconvolution bench
by ~30X (~10X from running on the GPU and ~3X from removing texture uploads
and readbacks).

Note: this changes the matrixconvolution test results for the software path as
well, so it will likely break the bots until that test is rebaselined.

Review URL:  https://codereview.appspot.com/6585069/



git-svn-id: http://skia.googlecode.com/svn/trunk@5809 2bbb7eff-a529-9590-31e7-b0007b416f81
diff --git a/include/effects/SkMatrixConvolutionImageFilter.h b/include/effects/SkMatrixConvolutionImageFilter.h
index a938fd0..f6e96f2 100644
--- a/include/effects/SkMatrixConvolutionImageFilter.h
+++ b/include/effects/SkMatrixConvolutionImageFilter.h
@@ -61,6 +61,10 @@
     virtual bool onFilterImage(Proxy*, const SkBitmap& src, const SkMatrix&,
                                SkBitmap* result, SkIPoint* loc) SK_OVERRIDE;
 
+#if SK_SUPPORT_GPU
+    virtual bool asNewCustomStage(GrCustomStage** stage, GrTexture*) const SK_OVERRIDE;
+#endif
+
 private:
     SkISize   fKernelSize;
     SkScalar* fKernel;