assign tmp and dst only as needed

You can tell how apprehensive I am about this by the number of comments
that I've written, but I think it all makes reasonable sense, and does
mean we can run right up to the line of using all registers, never
wasting a tmp or dst register that would go unused.

I don't think there are any function argument evaluation order issues
here, but it's reassuring that we're testing with GCC and Clang both
when I see things like a->vfoops(dst(), tmp(), r[z]).
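
To make that concern concrete, here is a minimal sketch of the pattern,
not the code from this change.  Regs, emit(), and demo() are hypothetical
stand-ins; only the dst()/tmp() call shape comes from the line above.
C++ leaves the evaluation order of function arguments unspecified, so
which register each lazy helper grabs may differ between compilers, but
the two are always distinct, and each argument still lands in its own
parameter slot.

```cpp
#include <cassert>
#include <utility>

// Hypothetical allocator: hands out the lowest-numbered free register.
struct Regs {
    unsigned free_mask = 0xF;  // assume registers 0-3 start out free
    int alloc() {
        for (int r = 0; r < 4; r++) {
            if (free_mask & (1u << r)) {
                free_mask &= ~(1u << r);  // mark r as taken
                return r;
            }
        }
        assert(false && "out of registers");
        return -1;
    }
};

// Hypothetical stand-in for an assembler call like a->vfoops(dst, tmp, src).
// It just reports which registers it was handed.
std::pair<int, int> emit(int dst, int tmp) { return {dst, tmp}; }

std::pair<int, int> demo() {
    Regs regs;
    int dst_reg = -1, tmp_reg = -1;

    // Each helper assigns its register on first use and reuses it afterwards,
    // so an instruction that never calls tmp() never spends a register on it.
    auto dst = [&] { if (dst_reg < 0) dst_reg = regs.alloc(); return dst_reg; };
    auto tmp = [&] { if (tmp_reg < 0) tmp_reg = regs.alloc(); return tmp_reg; };

    // Whether dst() or tmp() runs first is unspecified, so the result may be
    // {0, 1} or {1, 0} depending on the compiler.  Either is a valid assignment.
    return emit(dst(), tmp());
}
```

Calling demo() under both GCC and Clang is one way to see whether the
two compilers pick different orders; correctness should not depend on
which one they pick.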

Tests pass, and the little bit of debug tracing I added temporarily
looked like it made sense.  Have not looked at how the disassembly
changes, mostly because I hacked this up on my Mac.  Will look before
and after on Linux tomorrow if this sticks.

Change-Id: I1e62aaeba12c07787128ed4a2b67fb8bc27039f6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/228520
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
1 file changed