Subzero: Fold the load instruction into the next cast instruction.

This is similar to the way a load instruction may be folded into the next arithmetic instruction.

Usually the effect is to improve a sequence like:
  mov ax, WORD PTR [mem]
  movsx eax, ax
into this:
  movsx eax, WORD PTR [mem]
This saves an instruction without actually improving register allocation, though other kinds of casts may see different improvements.
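The fold can be sketched on a toy IR. This is a minimal illustration under stated assumptions, not Subzero's implementation: the `Inst`/`Operand` types, `isCast`, `countUses` and `foldLoadIntoCast` are all hypothetical. The sketch rewrites a load followed immediately by a sign- or zero-extending cast of the loaded value, but only when that cast is the value's sole reader. A real backend would more likely consult liveness information and the target's operand encodings instead of counting uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified IR for illustration only;
// these are not Subzero's real instruction or operand classes.
enum class Kind { Load, Sext, Zext, Ret };

struct Operand {
  std::string Name;    // variable name, or address text such as "[mem]"
  bool IsMem = false;  // true if this operand is a memory reference
};

struct Inst {
  Kind K;
  std::string Dest;           // empty if the instruction defines nothing
  std::vector<Operand> Srcs;  // source operands
  bool Deleted = false;       // dead instructions are flagged, not erased
};

static bool isCast(Kind K) { return K == Kind::Sext || K == Kind::Zext; }

// Number of live instructions that read Var as a (non-memory) source.
static std::size_t countUses(const std::vector<Inst> &Insts,
                             const std::string &Var) {
  std::size_t N = 0;
  for (const Inst &I : Insts) {
    if (I.Deleted)
      continue;
    for (const Operand &Op : I.Srcs)
      if (!Op.IsMem && Op.Name == Var)
        ++N;
  }
  return N;
}

// Rewrites "T = load [mem]; D = cast T" into "D = cast [mem]" when the cast
// immediately follows the load and is the only reader of T.
void foldLoadIntoCast(std::vector<Inst> &Insts) {
  for (std::size_t I = 0; I + 1 < Insts.size(); ++I) {
    Inst &Load = Insts[I];
    Inst &Cast = Insts[I + 1];
    if (Load.Deleted || Load.K != Kind::Load)
      continue;
    // Invariant of this toy IR: a load has exactly one memory source.
    assert(Load.Srcs.size() == 1 && Load.Srcs[0].IsMem);
    if (!isCast(Cast.K) || Cast.Srcs.size() != 1)
      continue;
    const Operand &Src = Cast.Srcs[0];
    if (Src.IsMem || Src.Name != Load.Dest)
      continue;
    if (countUses(Insts, Load.Dest) != 1)
      continue;  // T is still needed elsewhere, so the load must stay.
    Cast.Srcs[0] = Load.Srcs[0];  // the cast now reads memory directly
    Load.Deleted = true;          // the separate load is no longer needed
  }
}
```

Because the sketch only inspects adjacent pairs, any other instruction placed between the load and the cast prevents the fold, as does a second reader of the loaded value.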

Some existing tests had to be fixed because they "inadvertently" cast a loaded value to the i32 return type, which triggered the optimization where it wasn't wanted. They were fixed by inserting a "dummy" instruction between the load and the cast.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1152783006
4 files changed