Move tgsi machine state init/allocations so they're done less frequently.

Together with expanding all instructions ahead of time, this seems to
have sped up program execution by about 8x.
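
A minimal sketch of the general pattern, not the actual tgsi code: every
name below (struct machine, machine_init, machine_run, the OP_* opcodes) is
hypothetical and chosen only for illustration. It assumes the interpreter
can allocate its register storage and expand the program once, so that the
per-execution path does neither.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the tgsi code. */
enum op { OP_ADD, OP_MUL, OP_END };

struct inst {
    enum op op;
    int dst, src0, src1;
};

struct machine {
    float *temps;        /* scratch registers, allocated once */
    struct inst *insts;  /* instructions expanded ahead of time */
    int n_insts;
};

/* One-time setup: allocate registers and expand the whole program.
 * Returns 0 on success, -1 if an allocation fails. */
int machine_init(struct machine *m, int n_temps,
                 const struct inst *prog, int n_insts)
{
    assert(n_temps > 0 && n_insts > 0);

    m->temps = calloc((size_t)n_temps, sizeof *m->temps);
    m->insts = malloc((size_t)n_insts * sizeof *m->insts);
    if (!m->temps || !m->insts) {
        free(m->temps);
        free(m->insts);
        return -1;
    }

    /* A plain copy stands in for parsing tokens into full instructions. */
    for (int i = 0; i < n_insts; i++)
        m->insts[i] = prog[i];
    m->n_insts = n_insts;
    return 0;
}

/* Hot path: no allocation and no parsing, only dispatch. */
void machine_run(struct machine *m)
{
    for (int i = 0; i < m->n_insts; i++) {
        const struct inst *in = &m->insts[i];

        switch (in->op) {
        case OP_ADD:
            m->temps[in->dst] = m->temps[in->src0] + m->temps[in->src1];
            break;
        case OP_MUL:
            m->temps[in->dst] = m->temps[in->src0] * m->temps[in->src1];
            break;
        case OP_END:
            return;
        }
    }
}

/* Release everything machine_init() allocated. */
void machine_free(struct machine *m)
{
    free(m->temps);
    free(m->insts);
}
```

In this sketch machine_init() runs once per program, while machine_run()
can be called for every batch of inputs without touching the allocator.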
5 files changed