Extend SkCanvas matrix stack to be 4x4, but with (basically) the same public API.

Devices receive the 4x4 matrix, but by default they simply downsample it to a
3x3 SkMatrix.
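
As an illustration of what "downsample" can mean here, the sketch below drops
the z row and z column of a 4x4 and keeps x, y and w. The Mat4/Mat3 aliases,
the column-major layout, and the downsample() helper are hypothetical stand-ins
and not Skia's types; that the default conversion drops z is an assumption, not
something stated in this change.

```cpp
#include <array>
#include <cassert>

// Hypothetical column-major matrices: element (row r, col c) lives at
// index c * N + r. These are NOT Skia's SkM44 / SkMatrix.
using Mat4 = std::array<float, 16>;
using Mat3 = std::array<float, 9>;

// Assumed behavior: keep rows/columns 0, 1, 3 (x, y, w) and discard
// row/column 2 (z).
inline Mat3 downsample(const Mat4& src) {
    const int keep[3] = {0, 1, 3};
    Mat3 dst{};
    for (int c = 0; c < 3; ++c) {
        for (int r = 0; r < 3; ++r) {
            dst[c * 3 + r] = src[keep[c] * 4 + keep[r]];
        }
    }
    return dst;
}
```

Any z-dependent terms are lost in this conversion, which is why it only suits
devices that do not need the full 4x4.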

Add a new SkM44 matrix type for the implementation. It differs from SkMatrix44
in a few ways:
- no tracking of "type"
- faster for concat, as it does not use doubles for intermediates
- much simpler API

Some gms show low-bit differences because SkM44::concat is faster but
lower-precision, so this adds a flag that lets clients stage the change.
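
To make the precision trade-off concrete, here is a minimal sketch of a 4x4
concat whose accumulator type is a template parameter. The Mat4 alias and the
concat() helper are hypothetical and are not the SkM44 implementation; the only
point is that float intermediates can differ from double intermediates in the
last few bits of the result.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical column-major 4x4: element (row r, col c) is at c * 4 + r.
using Mat4 = std::array<float, 16>;

// Computes a * b, accumulating each dot product in type Acc.
// Acc = float  : cheaper, result may be off in the low bits.
// Acc = double : intermediates are wider, rounded to float only once.
template <typename Acc>
Mat4 concat(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            Acc sum = 0;
            for (int k = 0; k < 4; ++k) {
                sum += static_cast<Acc>(a[k * 4 + r]) *
                       static_cast<Acc>(b[c * 4 + k]);
            }
            out[c * 4 + r] = static_cast<float>(sum);
        }
    }
    return out;
}
```

Comparing concat<float>(m, m) against concat<double>(m, m) for a matrix with
non-trivial fractional entries is one way to see the kind of low-bit
differences a staging flag would guard against.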

Performance, from running the canvas_matrix bench:

3x3 version:

    167.93  	canvas_matrix_3x3	8888
    209.97  	canvas_matrix_2x3	8888
    174.87  	canvas_matrix_scale	8888
    135.30  	canvas_matrix_trans	8888

4x4 version:

    116.59  	canvas_matrix_3x3	8888
    105.40  	canvas_matrix_2x3	8888
    159.83 ?	canvas_matrix_scale	8888
    113.47  	canvas_matrix_trans	8888

Why faster?
- not tracking matrix_type helps a lot, it seems
- faster full concat (no doubles)

4x4 version before adding the specialized preConcats:

    318.11 ?	canvas_matrix_3x3	8888
    339.38  	canvas_matrix_2x3	8888
    383.28  	canvas_matrix_scale	8888
    251.67  	canvas_matrix_trans	8888
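
For context on why specialized preConcats help, the sketch below contrasts a
pre-translate done through a full 4x4 concat with one that only updates the
last column. The Mat4 alias and all three helpers are hypothetical and assume a
column-major layout; they are not the code in this change.

```cpp
#include <array>
#include <cassert>

// Hypothetical column-major 4x4: element (row r, col c) is at c * 4 + r.
using Mat4 = std::array<float, 16>;

// Full concat: 64 multiplies.
inline Mat4 concat(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k)
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    return out;
}

// General path: build a translation matrix, then do the full concat.
inline Mat4 preTranslateGeneral(const Mat4& m, float x, float y, float z) {
    Mat4 t = {1.f, 0.f, 0.f, 0.f, 0.f, 1.f, 0.f, 0.f,
              0.f, 0.f, 1.f, 0.f, x,   y,   z,   1.f};
    return concat(m, t);
}

// Specialized path: m * T only changes the last column,
//   col3' = col0 * x + col1 * y + col2 * z + col3   (12 multiplies).
inline Mat4 preTranslateFast(Mat4 m, float x, float y, float z) {
    for (int r = 0; r < 4; ++r) {
        m[12 + r] += m[r] * x + m[4 + r] * y + m[8 + r] * z;
    }
    return m;
}
```

The same idea applies to a pre-scale, which only needs to multiply the first
three columns by the respective scale factors.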

Change-Id: I68eac942919fa5418081e789f31710a1e2a752da
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/262056
Reviewed-by: Brian Salomon <bsalomon@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>