pan/bi: Lower and optimize NIR

Pretty much a copypaste from Midgard except where architectural
decisions diverge around vectorization. On that note, we will need our
own ALU scalarization pass at some point (or rather we'll need to extend
nir_lower_alu_scalar) to allow partial lowering for 8/16-bit ops. I.e.
we'll approximately need to lower

   vec4 16 ssa_2 = fadd ssa_0, ssa_1

to

   vec2 16 ssa_2 = fadd ssa_0.xy, ssa_1.xy
   vec2 16 ssa_3 = fadd ssa_0.zw, ssa_1.zw
   vec4 16 ssa_4 = vec4 ssa_2.x, ssa_2.y, ssa_3.x, ssa_4

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4097>
diff --git a/src/panfrost/bifrost/compiler.h b/src/panfrost/bifrost/compiler.h
index 74b9e22..97147d0 100644
--- a/src/panfrost/bifrost/compiler.h
+++ b/src/panfrost/bifrost/compiler.h
@@ -317,6 +317,7 @@
 
 typedef struct {
        nir_shader *nir;
+       gl_shader_stage stage;
        struct list_head blocks; /* list of bi_block */
        uint32_t quirks;
 } bi_context;