freedreno/a3xx/compiler: half-precision output

Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output.  This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).

Note: for the built-in shaders, I would not expect them to be register
starved.  And in fact it is the solid frag shader that seems to have the
biggest impact.  So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision.  So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers.  So for now it is not enabled by default except for the
built-in shaders.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
6 files changed