Merge consecutive entries that share proxy in bulk texture op

Previously, a batch draw that reused the same proxy consecutively
would create a ViewCountPair for each set entry, with its count == 1.
This turned into 1 draw per entry, so although there'd still be a single
pipeline, it didn't take advantage of merging those consecutive entries
into a larger draw to reduce draw count as well.

Initially, the thinking for the batch API was that it was for tilers
that used unique images for each tile or render pass. However, Chrome's
compositor is also responsible for rendering 9 patches as part of the UI.
These appear as 9 consecutive entries in the image set that all refer to
the same texture. With this CL the texture op will automatically merge
such occurrences into one ViewCountPair with a count of 9.

The bulkrect_1000_[grid|random]_sharedimage_batch leverages this case.
Before this CL its op would hold 1000 view count pairs that each drew
one quad. Now its op will hold 1 view count pair with a count of 1000.
On my linux workstation, the bulkrect_1000_grid_sharedimage_batch time
went from 377us to 206us. For reference, the _ref variant (which already
was a 1 view count pair with ct == 1000 due to merging of each op) has
a time of 497us. The difference between 497us and 206us represents the
overhead of calling through SkCanvas, op creation, quad optimization
analysis 1000x.

Interestingly the bulkrect_1000_random_sharedimage_batch benchmark did not
change on my workstation. My conjecture is that it is bottlenecked by
overdraw of the many overlapping rectangles.

Change-Id: Icc4195de0bcb2219f424fdaa79728281c0418558
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/258418
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
6 files changed