Sk4x4f: NEON impl.

Notable tricks:
  - v{ld,st}4q_f32 handle transposing loads and stores of floats in one step
  - vcvtq_n_{f32_u32,u32_f32} convert between fixed-point integers and floats, folding the scaling into the conversion so no separate shifts are needed
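
For readers unfamiliar with these intrinsics, here is a minimal sketch of the two tricks. It is not the code in this change: the helper names (transpose_load, green_to_float) and the assumption that green sits in bits 8..15 of a packed pixel are hypothetical, chosen only for illustration. A scalar fallback with the same behavior is included so the sketch also builds on non-NEON targets.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
    #include <arm_neon.h>
#endif

// Hypothetical helper: read 16 row-major floats and write them transposed.
static void transpose_load(const float in[16], float out[16]) {
#if defined(__ARM_NEON)
    // vld4q_f32 de-interleaves with stride 4, so
    // v.val[k] = { in[k], in[k+4], in[k+8], in[k+12] }, i.e. column k.
    float32x4x4_t v = vld4q_f32(in);
    vst1q_f32(out +  0, v.val[0]);
    vst1q_f32(out +  4, v.val[1]);
    vst1q_f32(out +  8, v.val[2]);
    vst1q_f32(out + 12, v.val[3]);
#else
    // Scalar fallback with the same result.
    for (int k = 0; k < 4; k++) {
        for (int i = 0; i < 4; i++) {
            out[4*k + i] = in[4*i + k];
        }
    }
#endif
}

// Hypothetical helper: pull the byte in bits 8..15 (assumed here to be green)
// out of four packed 32-bit pixels as floats in [0, 255].
static void green_to_float(const uint32_t px[4], float out[4]) {
#if defined(__ARM_NEON)
    // Mask the byte in place, then treat the value as fixed-point with
    // 8 fractional bits: the conversion divides by 2^8, so no shift is needed.
    uint32x4_t g = vandq_u32(vld1q_u32(px), vdupq_n_u32(0x0000FF00u));
    vst1q_f32(out, vcvtq_n_f32_u32(g, 8));
#else
    // Scalar fallback with the same result.
    for (int i = 0; i < 4; i++) {
        out[i] = (float)(px[i] & 0x0000FF00u) * (1.0f / 256.0f);
    }
#endif
}
```

vst4q_f32 and vcvtq_n_u32_f32 are the mirror images: the former interleaves four vectors on store, and the latter multiplies by 2^n while converting a float to an integer, which places the value at bit n.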

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1828613002

Review URL: https://codereview.chromium.org/1828613002
1 file changed