Port to real Sk2f.

The bench improves from 39 to 30, about half from porting to Sk2f, half from
x.add(x) instead of x.multiply(two).

Remove Sk4f Load2/store2 now that we have Sk2f.

BUG=skia:

Review URL: https://codereview.chromium.org/1019773004
3 files changed