i965: Add a debug flag for counting cycles spent in each compiled shader.

This can be used for two purposes: Using hand-coded shaders to determine
per-instruction timings, or figuring out which shader to optimize in a
whole application.

Note that this doesn't cover the instructions that set up the message to
the URB/FB write -- we'd need to convert the MRF usage in these
instructions to GRFs so that our offsets/times don't overwrite our
shader outputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Check the timestamp reset flag in the VS, which is apparently
    getting set fairly regularly in the range we watch, resulting in
    negative numbers getting added to our 32-bit counter, and thus large
    values added to our uint64_t.
v3: Rebase on reladdr changes, removing a new safety check that proved
    impossible to satisfy.  Add a comment to the AOP defs from Ken's
    review, and put them in a slightly more sensible spot.
v4: Check timestamp reset in the FS as well.
17 files changed