Might as well inline these premultiplies.

We're paying quite a bit of function-call overhead per pixel.

On one test image we spend 3.5% of our total time in swizzle_rgba_to_n32_premul() and 8.8% of our total time in SkPreMultiplyARGB().  That turns into just 8.8% of our total time in swizzle_rgba_to_n32_premul() after inlining.

That's 3.5% + 8.8% = 12.3% of total time down to 8.8%, about 30% less time spent on this step.
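A minimal sketch of the idea follows. It assumes RGBA8888 input and a packed 0xAARRGGBB output word. The premultiply is a static inline helper in the same translation unit as the swizzle loop, so the compiler can fold it into the loop instead of making one call per pixel. The names below (mul_div_255_round, premultiply_argb, sketch_swizzle_rgba_to_argb_premul) are hypothetical. They are not Skia's actual swizzle_rgba_to_n32_premul() or SkPreMultiplyARGB().

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round(a * b / 255) for 8-bit a, b, computed with
// shifts instead of a division.
static inline uint8_t mul_div_255_round(unsigned a, unsigned b) {
    unsigned prod = a * b + 128;
    return static_cast<uint8_t>((prod + (prod >> 8)) >> 8);
}

// Hypothetical inlined premultiply: scales each color channel by alpha and
// packs the result as 0xAARRGGBB (an assumed layout, for illustration only).
static inline uint32_t premultiply_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    if (a != 255) {  // opaque pixels need no scaling
        r = mul_div_255_round(r, a);
        g = mul_div_255_round(g, a);
        b = mul_div_255_round(b, a);
    }
    return (uint32_t(a) << 24) | (uint32_t(r) << 16) | (uint32_t(g) << 8) | uint32_t(b);
}

// Hypothetical swizzle loop: reads `width` RGBA pixels from src and writes
// premultiplied pixels to dst. premultiply_argb is visible here, so there is
// no out-of-line call per pixel once the compiler inlines it.
void sketch_swizzle_rgba_to_argb_premul(uint32_t* dst, const uint8_t* src, size_t width) {
    for (size_t x = 0; x < width; ++x) {
        const uint8_t* p = src + 4 * x;  // p[0]=R, p[1]=G, p[2]=B, p[3]=A
        dst[x] = premultiply_argb(p[3], p[0], p[1], p[2]);
    }
}
```

Marking the helper inline only makes inlining possible. Whether it actually happens is up to the compiler, so a profile like the one above is still the real test.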

This raises the baseline the SIMD procs will be compared against, making them look less impressive, so it's nice to land this first.

BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1571923002

Review URL: https://codereview.chromium.org/1571923002