Add a hook for CPU-optimized sRGB-sRGB srcover.

Herb is starting to tune sRGB srcover seriously, which becomes a lot
easier with SkOpts' runtime CPU detection. We should be able to optimize
it usefully for SSSE3, SSE4.1, AVX, AVX2, or NEON.
(We can of course implement a subset.)
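
As a rough illustration of what such a hook looks like, here is a minimal sketch of a function pointer that defaults to a portable routine and is swapped once at startup. Every name in it (sketch::srcover_hook, srcover_portable, srcover_sse41, init, has_sse41) is hypothetical and not taken from SkOpts, and the kernel bodies are placeholders that only record which variant ran.

```cpp
#include <cstdint>

namespace sketch {

// Shape of the hook: destination, source, and one count for each.
using SrcoverFn = void (*)(uint32_t* dst, const uint32_t* src, int ndst, int nsrc);

// Placeholder kernels (hypothetical). Real ones would blend pixels; these
// only tag dst[0] so a caller can tell which variant was selected.
void srcover_portable(uint32_t* dst, const uint32_t*, int ndst, int) {
    if (ndst > 0) dst[0] = 1;
}
void srcover_sse41(uint32_t* dst, const uint32_t*, int ndst, int) {
    if (ndst > 0) dst[0] = 2;
}

// Call sites go through this pointer, which starts out portable.
SrcoverFn srcover_hook = srcover_portable;

// has_sse41 stands in for whatever the runtime CPU detection reports.
void init(bool has_sse41) {
    if (has_sse41) srcover_hook = srcover_sse41;
}

}  // namespace sketch
```

Supporting only a subset of instruction sets then just means leaving the pointer at its portable default on the others.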

This function takes two counts, nsrc and ndst, so one entry point can cover several src patterns:
   nsrc >= ndst -> the usual srcover function
   nsrc <  ndst -> repeat src until it fills dst
   nsrc << ndst -> possibly preprocess src into registers
   nsrc == 1    -> equivalent of blitrow_color32, srcover_1, etc.
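
The cases above can be sketched with one portable loop. This is only an illustration under stated assumptions: the names srcover_pixel and srcover_repeat are hypothetical, pixels are assumed to be premultiplied 8-bit RGBA with alpha in the top byte, and the blend is done directly on the encoded bytes, skipping the sRGB-to-linear conversion a real sRGB-sRGB srcover would need.

```cpp
#include <cstdint>

// Hypothetical helper: src-over for one premultiplied 8888 pixel.
// Simplification: blends the encoded bytes, no sRGB <-> linear conversion.
static inline uint32_t srcover_pixel(uint32_t dst, uint32_t src) {
    uint32_t invA = 255 - (src >> 24);             // 255 - src alpha
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF;
        uint32_t d = (dst >> shift) & 0xFF;
        uint32_t c = s + (d * invA + 127) / 255;   // s + d*(1-a), rounded
        if (c > 255) c = 255;                      // guard non-premul input
        out |= c << shift;
    }
    return out;
}

// Hypothetical two-count entry point: blends ndst pixels, cycling through
// the nsrc source pixels. With nsrc >= ndst only the first ndst source
// pixels are read; with nsrc == 1 it blends a single color over dst.
void srcover_repeat(uint32_t* dst, const uint32_t* src, int ndst, int nsrc) {
    if (nsrc <= 0) return;
    for (int i = 0, j = 0; i < ndst; i++) {
        dst[i] = srcover_pixel(dst[i], src[j]);
        if (++j == nsrc) j = 0;                    // wrap to repeat src
    }
}
```

A specialized version could branch on the relationship between the two counts up front, for example loading a short src into registers once when nsrc is much smaller than ndst.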

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939783003

Review-Url: https://codereview.chromium.org/1939783003
3 files changed