Use GrTBlockList instead of SkAutoSTMalloc to reduce GrAtlasTextOp size

On an x86_64 Windows machine, the original GrAtlasTextOp is 1304 bytes
total, with the following breakdown:
  - SkAutoSTMalloc<12, Geometry> = 1160 bytes, 16 of which are state and
    the rest is all usable data for Geometry (each Geom is 96 bytes).
  - Op state = 144 bytes, 8 of which are related to managing the
    auto-malloc, 32 bytes hold the processor set, and 64 bytes come from
    GrMeshDrawOp and parent classes.
  - This was probably particularly unfortunate when we used the memory
    pool for op allocations, because its block size was 16k. We would
    quickly use up an entire block with fewer than 16 atlas text ops.
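
As a rough check of the numbers above, the arithmetic can be sketched in
C++. This is only an illustration: it assumes "16k" means 16384 bytes
and ignores any bookkeeping the pool keeps per block. The constant and
function names are made up for this sketch.

```cpp
#include <cstddef>

// Sizes quoted above for x86_64 Windows; names are illustrative only.
constexpr std::size_t kGeometrySize   = 96;    // one Geometry
constexpr std::size_t kInlineCount    = 12;    // SkAutoSTMalloc<12, Geometry>
constexpr std::size_t kAutoMallocSize = 1160;  // SkAutoSTMalloc<12, Geometry>
constexpr std::size_t kOpStateSize    = 144;   // remaining op state
constexpr std::size_t kOldOpSize      = kAutoMallocSize + kOpStateSize;

// Assumption: "16k" is 16384 bytes with no per-block pool overhead.
constexpr std::size_t kPoolBlockSize = 16 * 1024;

// Number of whole ops of the given size that fit in one pool block.
constexpr std::size_t opsPerBlock(std::size_t blockSize, std::size_t opSize) {
    return blockSize / opSize;
}

static_assert(kOldOpSize == 1304, "matches the reported total");
static_assert(kInlineCount * kGeometrySize == 1152, "inline Geometry payload");
static_assert(opsPerBlock(kPoolBlockSize, kOldOpSize) == 12, "ops per block");
```

Under those assumptions only 12 of the original ops fit in one pool
block.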

With this change and https://skia-review.googlesource.com/c/skia/+/331657,
GrAtlasTextOp is 264 bytes, broken down as:
  - GrTBlockList<Geometry, 1> = 152 bytes, 96 of which are an inline
    Geometry and the rest is state.
  - Op state = 112 bytes, 96 of which hold the processor set and the
    same parent class state as before.
  - The old atlas op had logic to grow its total storage following the
    sequence 12, 18, 27, 40, 60, 90...
  - The updated op configures the block list to grow with the following
    sequence 6, 18, 36, 60, 90...
  - This can be easily tweaked if we want to explore more aggressive or
    conservative approaches. The current multiplier and policy were
    chosen to reasonably match the old 12*(1.5)^n policy, which is not
    an implemented option in GrBlockAllocator (although it seems more
    useful than the 2^n exponential policy).
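
One way to reproduce both sequences is sketched below. It does not call
GrBlockAllocator. It assumes the old policy grows by n + n/2 with
integer truncation, and that the new policy adds blocks whose sizes
increase linearly in steps of 6 items. Neither detail is stated in this
message; they are simply models that match the listed numbers, and the
function names are hypothetical.

```cpp
#include <cstddef>

// Old policy as modeled here: start at 12 and grow the total storage by
// 1.5x each time, using integer arithmetic (n + n/2).
constexpr std::size_t oldCapacityAfter(int growths) {
    std::size_t capacity = 12;
    for (int i = 0; i < growths; ++i) {
        capacity += capacity / 2;
    }
    return capacity;
}

// Assumed model of the new policy: the k-th block holds 6*k items, so
// the total after `blocks` blocks is 6 * (1 + 2 + ... + blocks).
constexpr std::size_t newCapacityAfter(int blocks) {
    std::size_t capacity = 0;
    for (int k = 1; k <= blocks; ++k) {
        capacity += 6 * static_cast<std::size_t>(k);
    }
    return capacity;
}
```

With these models, oldCapacityAfter(0..5) gives 12, 18, 27, 40, 60, 90
and newCapacityAfter(1..5) gives 6, 18, 36, 60, 90, so the two policies
meet at 60 and 90 items.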

Overall, this is a large reduction in the upfront allocation size, since
the op no longer assumes every atlas text op will need to accumulate
additional geometry. GrTBlockList does have more tracking overhead since
it keeps a linked list of byte arrays (56 bytes vs. 24 for the
auto-malloc), but this lets it avoid memory copies during a merge
without relying on realloc being successful. This extra built-in
overhead is paid for by packing the various flags/config options from
32 bytes into a 4-byte bitfield. Since the atlas op does not upgrade its
config as ops are merged, this bitfield is effectively const and
captures the majority of the logic needed in combineIfPossible() with
just an equality test.
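
The packing idea can be illustrated with a minimal sketch. The option
names below are hypothetical, since the actual flags stored by
GrAtlasTextOp are not listed in this message, and the sketch uses
explicit masks on a uint32_t rather than C++ bit-field members so the
layout does not depend on the compiler.

```cpp
#include <cstdint>

// Hypothetical option layout, for illustration only.
constexpr std::uint32_t kMaskTypeMask       = 0x7;      // bits 0-2
constexpr std::uint32_t kUsesLocalCoordsBit = 1u << 3;  // bit 3
constexpr std::uint32_t kNeedsTransformBit  = 1u << 4;  // bit 4

// All per-op options live in one 32-bit word.
struct PackedConfig {
    std::uint32_t bits;
};

static_assert(sizeof(PackedConfig) == 4, "config fits in 4 bytes");

// Pack the individual options into a single word.
constexpr PackedConfig makeConfig(std::uint32_t maskType,
                                  bool usesLocalCoords,
                                  bool needsTransform) {
    return PackedConfig{(maskType & kMaskTypeMask) |
                        (usesLocalCoords ? kUsesLocalCoordsBit : 0u) |
                        (needsTransform ? kNeedsTransformBit : 0u)};
}

// Two ops have compatible configs only if every packed option matches,
// which reduces to one integer comparison.
constexpr bool sameConfig(PackedConfig a, PackedConfig b) {
    return a.bits == b.bits;
}
```

Because the packed word never changes after construction in this
sketch, a merge check only needs sameConfig() plus whatever checks
cannot be expressed as flags.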

GrTBlockList is one of the types I'm hoping can be relied upon by all of
Ganesh's ops to store the geometric/per-draw state that must be
accumulated during a merge. The atlas text op would be one of the early
adopters, and hopefully it also shows benefits on lower-end devices.
Perf tests came out tied for me on my high-end machine, which is not
surprising, but I also feel like the code simplification is worth it
even if perf shows no net change.

If it does turn out to have regressions, then we can revert and I can
circle back to optimizing the list class more.

Change-Id: Ibb5b198e42006d0ed8aab6f17d9e371fbcad94dc
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/330738
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
2 files changed