Use stack instead of malloc() for most calls to SkRasterPipeline::run().

Also split the bench into run and compile variants to measure the effect:
 Before:  …f16_compile 1x  …f16_run 1.02x  …srgb_compile 1.56x  …srgb_run 1.61x
 After:   …f16_run 1x  …f16_compile 1.01x  …srgb_compile 1.58x  …srgb_run 1.59x
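
For readers unfamiliar with the stack-instead-of-malloc() pattern named in the subject line, here is a minimal generic sketch. It is not the code in this change: ScratchPointers, its inline capacity parameter, and onStack() are hypothetical names made up for illustration. The idea is that small requests are served from an array inside the object, which sits on the stack when the object is a local, and only larger requests fall back to malloc().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not Skia's API): scratch storage for `count` pointers.
// Requests of up to kInline pointers use the inline array; larger requests
// fall back to a single malloc() that is freed in the destructor.
template <std::size_t kInline>
class ScratchPointers {
public:
    explicit ScratchPointers(std::size_t count)
        : fPtr(count <= kInline
                   ? fInline
                   : static_cast<void**>(std::malloc(count * sizeof(void*)))) {
        assert(fPtr != nullptr);  // sketch only: no real out-of-memory handling
    }

    ~ScratchPointers() {
        if (fPtr != fInline) {
            std::free(fPtr);  // only the heap fallback needs freeing
        }
    }

    // Copying would alias or double-free the buffer, so forbid it.
    ScratchPointers(const ScratchPointers&) = delete;
    ScratchPointers& operator=(const ScratchPointers&) = delete;

    void** get() { return fPtr; }
    bool onStack() const { return fPtr == fInline; }

private:
    void*  fInline[kInline];  // inline storage, part of the object itself
    void** fPtr;              // points at fInline or at the malloc()ed block
};
```

With a capacity chosen to cover the common case, a local such as ScratchPointers<64> would avoid the allocator for most calls while still handling unusually large requests. The value 64 here is an assumption for the example, not a number taken from this change.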

CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Change-Id: I8e65fb2acdbb05ccc0b3894f16d7646603c3e74d
Reviewed-on: https://skia-review.googlesource.com/6621
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
2 files changed