r600g: track dirty registers better. (v2)

This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.

This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are not. However certain registers need to always
be written, so we add a new flag to say which ones should always be written
if used. (Note this could probably be done more cleanly with a larger
refactoring, since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS
might be better off as a special state).

It also moves need_bo to be a flag on the register.
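
As a rough illustration of the idea (not the code in this patch), the check
can be sketched like this. The struct sketch_reg type and sketch_reg_set()
helper are hypothetical names made up for the example; only the two
REG_FLAG_* values are taken from the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_FLAG_NEED_BO      1
#define REG_FLAG_DIRTY_ALWAYS 2

/* Hypothetical, simplified register slot; not the driver's real struct. */
struct sketch_reg {
	unsigned flags;   /* REG_FLAG_* bits */
	uint32_t value;   /* last value recorded for this register */
	bool     dirty;   /* true if the value still has to be emitted */
};

/*
 * Record a new value. The register is marked dirty only if the value
 * changed or the register is flagged as always dirty.
 * Returns true if the register needs to be emitted.
 */
static bool sketch_reg_set(struct sketch_reg *reg, uint32_t value)
{
	if (reg->value != value || (reg->flags & REG_FLAG_DIRTY_ALWAYS)) {
		reg->value = value;
		reg->dirty = true;
	}
	return reg->dirty;
}
```

Writing an unchanged value to a plain register leaves it clean, while a
register carrying REG_FLAG_DIRTY_ALWAYS is marked dirty on every write.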

With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.

v2: fix some evergreen dirty bits.

Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.

Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
diff --git a/src/gallium/winsys/r600/drm/r600_priv.h b/src/gallium/winsys/r600/drm/r600_priv.h
index 0e9dba7..534df11 100644
--- a/src/gallium/winsys/r600/drm/r600_priv.h
+++ b/src/gallium/winsys/r600/drm/r600_priv.h
@@ -59,11 +59,14 @@
 	pipe_mutex bo_handles_mutex;
 };
 
+#define REG_FLAG_NEED_BO 1
+#define REG_FLAG_DIRTY_ALWAYS 2
+
 struct r600_reg {
 	unsigned			opcode;
 	unsigned			offset_base;
 	unsigned			offset;
-	unsigned			need_bo;
+	unsigned			flags;
 	unsigned			flush_flags;
 	unsigned			flush_mask;
 };
@@ -194,9 +197,10 @@
 	}
 }
 
-static inline void r600_context_dirty_block(struct r600_context *ctx, struct r600_block *block)
+static inline void r600_context_dirty_block(struct r600_context *ctx, struct r600_block *block,
+					    int dirty)
 {
-	if (!(block->status & R600_BLOCK_STATUS_DIRTY)) {
+	if ((dirty != (block->status & R600_BLOCK_STATUS_DIRTY)) || !(block->status & R600_BLOCK_STATUS_ENABLED)) {
 		block->status |= R600_BLOCK_STATUS_ENABLED;
 		block->status |= R600_BLOCK_STATUS_DIRTY;
 		ctx->pm4_dirty_cdwords += block->pm4_ndwords + block->pm4_flush_ndwords;