For 32-bit reads of integer guest registers, generate a 64-bit Get
followed by an Iop_64to32 narrowing, rather than doing a 32-bit Get.
This makes the Put-to-Get forwarding optimisation work seamlessly for
code that does 32-bit register operations (very common), something it
never did before.  Also add a folding rule to remove the resulting
32-to-64-to-32 widen-narrow chains.
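
For illustration only, a minimal C sketch of the idea (toy types; the
real code works on VEX's IRExpr trees, and the Expr/mkUnop/fold names
here are made up for exposition, not taken from the patch):

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { EX_GET64, EX_UNOP } ExprTag;
    typedef enum { OP_32Uto64, OP_32Sto64, OP_64to32 } UnOp;

    typedef struct Expr {
       ExprTag      tag;
       UnOp         op;      /* meaningful only when tag == EX_UNOP */
       struct Expr* arg;     /* meaningful only when tag == EX_UNOP */
       int          offset;  /* meaningful only when tag == EX_GET64 */
    } Expr;

    static Expr* mkGet64(int offset) {
       Expr* e = malloc(sizeof(Expr));
       e->tag = EX_GET64; e->offset = offset; e->arg = NULL;
       return e;
    }

    static Expr* mkUnop(UnOp op, Expr* arg) {
       Expr* e = malloc(sizeof(Expr));
       e->tag = EX_UNOP; e->op = op; e->arg = arg;
       return e;
    }

    /* The folding rule: 64to32(32Uto64(x)) ==> x, and the same for
       the signed widen.  This is what removes the widen-narrow
       chains that the "64-bit Get plus 64to32" scheme would
       otherwise leave behind. */
    static Expr* fold(Expr* e) {
       if (e->tag == EX_UNOP && e->arg)
          e->arg = fold(e->arg);
       if (e->tag == EX_UNOP && e->op == OP_64to32
           && e->arg && e->arg->tag == EX_UNOP
           && (e->arg->op == OP_32Uto64 || e->arg->op == OP_32Sto64))
          return e->arg->arg;
       return e;
    }

    int main(void) {
       /* A 32-bit guest register read is now expressed as
          64to32(Get:I64(offset)) rather than a 32-bit Get. */
       Expr* rd32 = mkUnop(OP_64to32, mkGet64(16));
       /* If a later step widens that value again, the resulting
          64to32(32Uto64(x)) chain folds straight back to x. */
       Expr* chain = mkUnop(OP_64to32, mkUnop(OP_32Uto64, rd32));
       printf("chain folds back to the 32-bit read: %s\n",
              fold(chain) == rd32 ? "yes" : "no");
       return 0;
    }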

This reduces the amount of code generated overall by about 3%, but
gives a much larger speedup of about 11% for Memcheck running
perf/bz2.c.  It is not clear why; perhaps it is due to reduced store
bandwidth requirements in the generated code, or to avoiding
store-forwarding stalls when writing and then re-reading the guest
state.



git-svn-id: svn://svn.valgrind.org/vex/trunk@1955 8f6e269a-dfd6-0310-a8e1-e2731360e62c