basic first pass at RGBA F32 support

Draws basically the same as f16.

The existing load_f32, load_f32_dst, and store_f32 stages all had the
same bug that we'd never noticed because dy was always 0 until now.

Change-Id: Ibbd393fa1acc5df414be4cdef0f5a9d11dcccdb3
Reviewed-on: https://skia-review.googlesource.com/137585
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Reed <reed@google.com>
29 files changed