fold srcover coverage with SkVMBlitter

This optimization also decreases register pressure, making it possible
to JIT where we couldn't before, in particular for srcover through an A8
mask into 8888.
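
For reference, here is a rough scalar sketch of the kind of fold meant
here.  It assumes premultiplied float channels and that the fold is the
usual identity lerp(d, s + d*(1-sa), c) == s*c + d*(1 - sa*c), i.e. scale
src by coverage up front and then do a plain srcover.  The names (Color,
srcover_then_lerp, scale_then_srcover, nearly_equal) are made up for
illustration and are not SkVM or SkVMBlitter API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical premultiplied color with float channels in [0,1].
struct Color { float r, g, b, a; };

// Unfolded form: srcover first, then lerp from dst toward the blended
// result by coverage c.  dst, the blended value, and c are all needed
// at the same time for every channel.
static Color srcover_then_lerp(Color s, Color d, float c) {
    assert(0.0f <= c && c <= 1.0f);
    auto channel = [&](float sc, float dc) {
        float blended = sc + dc * (1.0f - s.a);
        return dc + (blended - dc) * c;
    };
    return { channel(s.r, d.r), channel(s.g, d.g),
             channel(s.b, d.b), channel(s.a, d.a) };
}

// Folded form: scale src by coverage, then plain srcover.  Coverage is
// dead once src has been scaled, and dst is only needed for the final
// multiply-add.
static Color scale_then_srcover(Color s, Color d, float c) {
    assert(0.0f <= c && c <= 1.0f);
    Color sc = { s.r * c, s.g * c, s.b * c, s.a * c };
    float inv = 1.0f - sc.a;
    return { sc.r + d.r * inv, sc.g + d.g * inv,
             sc.b + d.b * inv, sc.a + d.a * inv };
}

// Helper for comparing the two forms up to float rounding.
static bool nearly_equal(Color x, Color y, float tol = 1e-5f) {
    return std::fabs(x.r - y.r) <= tol && std::fabs(x.g - y.g) <= tol &&
           std::fabs(x.b - y.b) <= tol && std::fabs(x.a - y.a) <= tol;
}
```

The two functions agree up to rounding for any coverage in [0,1]; the
folded one just keeps fewer values alive at once.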

These programs could use fewer registers still if skvm weren't so
literal-minded about running things in the order you asked, and if it
weren't so much easier to express code as [r,g,b,a] = load_dst() than to
unpack one channel at a time as each is needed.  We sometimes have a
bunch of registers holding temporary values where we'd really only need
one or two if the code were reordered.  This might be an area where it's
better to explore backing SkVM with a more powerful code generator, like
LLVM, cranelift, subzero, V8, etc.  But it's possible I can come up with
some sort of register-pressure-reducing code reorderer?
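
To make the ordering point concrete, here is a toy model, not how skvm
actually allocates registers.  It assumes each instruction defines one
value, that a value stays live from its definition through its last use,
and that a result never reuses the register of an argument dying in the
same instruction.  Inst, load, use, peak_live, eager_order and lazy_order
are all hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy instruction: defines the value numbered by its index in the
// program and reads the earlier values listed in args.
struct Inst { std::vector<int> args; };

static Inst load()     { return Inst{ {} }; }   // reads nothing
static Inst use(int v) { return Inst{ {v} }; }  // reads value v

// Peak number of simultaneously live values when running in order.
static int peak_live(const std::vector<Inst>& program) {
    const int n = static_cast<int>(program.size());
    std::vector<int> last_use(n, -1);
    for (int i = 0; i < n; i++) {
        for (int a : program[i].args) {
            assert(0 <= a && a < i);  // args must be defined earlier
            last_use[a] = i;
        }
    }
    int live = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        live++;                       // value i is defined here
        peak = std::max(peak, live);
        for (int a = 0; a < i; a++) {
            if (last_use[a] == i) live--;   // a dies after this inst
        }
        if (last_use[i] == -1) live--;      // never read again
    }
    return peak;
}

// Like [r,g,b,a] = load_dst(): unpack everything, then use it.
static std::vector<Inst> eager_order() {
    return { load(), load(), load(), load(),
             use(0), use(1), use(2), use(3) };
}

// Unpack each channel right before its only use.
static std::vector<Inst> lazy_order() {
    return { load(), use(0), load(), use(2),
             load(), use(4), load(), use(6) };
}
```

Under this model peak_live(eager_order()) is 5 and
peak_live(lazy_order()) is 2, with the same eight instructions in both.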

Add more debugging tools that helped point to this:
   - tack on debug names to JITted routines so I can tell what's what
   - when debugging is enabled, dump out programs that fail to JIT

Change-Id: I56f1288d830f85d5fce7c59ca0ec3360069665ac
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/242559
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Mike Reed <reed@google.com>
3 files changed