Subzero: Improve/refactor folding loads into the next instruction.

This is now a separate (O2-only) pass that looks for the following pattern:
1. A Load instruction, or an AtomicLoad intrinsic that would be lowered just like a Load instruction
2. Followed immediately by an instruction with a whitelisted kind that uses the Load dest variable as one of its operands
3. Where the whitelisted instruction ends the live range of the Load dest variable.
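The three conditions above can be pictured as a single forward scan over a basic block. This is a minimal sketch over a hypothetical, simplified IR. The names `Kind`, `Inst`, `LowersLikeLoad`, and `findFoldCandidates` are made up for illustration and are not Subzero's API. It also assumes a prior liveness pass has already recorded a per-operand last-use flag on every instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified IR for illustration only; these are not
// Subzero's real Inst classes.
enum class Kind { Load, AtomicLoad, Arithmetic, Icmp, Call };

struct Inst {
  Kind K;
  std::string Dest;               // empty if the instruction has no dest
  std::vector<std::string> Srcs;  // variable operands, by name
  std::vector<bool> LastUse;      // LastUse[i]: Srcs[i]'s live range ends here
  bool LowersLikeLoad;            // AtomicLoad only: lowered like a plain Load
};

// Condition 1: a Load, or an AtomicLoad that would be lowered like one.
bool isLoadLike(const Inst &I) {
  return I.K == Kind::Load || (I.K == Kind::AtomicLoad && I.LowersLikeLoad);
}

// Hypothetical whitelist, reduced to two kinds for brevity.
bool isWhitelisted(Kind K) { return K == Kind::Arithmetic || K == Kind::Icmp; }

// Conditions 2 and 3: I uses Var as an operand and ends its live range.
bool endsLiveRangeOf(const Inst &I, const std::string &Var) {
  for (std::size_t i = 0; i < I.Srcs.size(); ++i)
    if (I.Srcs[i] == Var && I.LastUse[i])
      return true;
  return false;
}

// Returns the index of every load-like instruction that can be folded into
// the instruction immediately following it.
std::vector<std::size_t> findFoldCandidates(const std::vector<Inst> &Insts) {
  std::vector<std::size_t> Result;
  for (std::size_t i = 0; i + 1 < Insts.size(); ++i) {
    const Inst &Cur = Insts[i];
    const Inst &Next = Insts[i + 1];
    if (isLoadLike(Cur) && isWhitelisted(Next.K) &&
        endsLiveRangeOf(Next, Cur.Dest))
      Result.push_back(i);
  }
  return Result;
}
```

Only adjacent pairs are considered, matching the "followed immediately by" requirement; a load whose dest is still live after its user is left alone, so the loaded value remains available in a register for later uses.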

In such cases, the original two instructions are deleted and a new instruction is added that folds the load into the whitelisted instruction.
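A minimal sketch of the rewrite step, again over a hypothetical IR. `Operand`, `Inst`, and `foldLoad` are illustrative names, not Subzero's. The loaded variable in the second instruction is replaced by the load's memory operand, and both originals are marked deleted rather than erased.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified IR for illustration only; not Subzero's classes.
struct Operand {
  bool IsMem;        // true: memory operand [Name]; false: variable Name
  std::string Name;
};

struct Inst {
  std::string Op;             // e.g. "load", "add"
  std::string Dest;
  std::vector<Operand> Srcs;
  bool Deleted;               // deleted instructions are skipped by lowering
};

// Folds the load Insts[I] ("Dest = load [Addr]") into Insts[I + 1]. The caller
// must already have checked that Insts[I + 1] ends Dest's live range.
// Both originals are marked deleted and the folded instruction is inserted
// right after them.
void foldLoad(std::vector<Inst> &Insts, std::size_t I) {
  const Inst Load = Insts[I];     // copied: insert() below may reallocate
  Inst Folded = Insts[I + 1];
  for (Operand &Src : Folded.Srcs)
    if (!Src.IsMem && Src.Name == Load.Dest)
      Src = Load.Srcs[0];         // replace the loaded variable by [Addr]
  Insts[I].Deleted = true;
  Insts[I + 1].Deleted = true;
  Insts.insert(Insts.begin() + static_cast<std::ptrdiff_t>(I + 2), Folded);
}
```

For example, `t1 = load [p]; t2 = add t1, x` becomes `t2 = add [p], x`, which can lower to an x86 `add` with a memory source operand instead of a separate load into a register.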

We also splice the liveness information (Inst::LiveRangesEnded and Inst::isLastUse()) into the new instruction, so that the target lowering pass can still take advantage of it.  Currently this is used quite sparingly, but in the future it could be combined with operator commutativity to choose among different lowering sequences that reduce register pressure.
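One way to picture the splicing is sketched below. It assumes a bitmask of per-operand last-use flags in the spirit of Inst::LiveRangesEnded; the struct, the 32-operand limit, and `spliceLiveness` are hypothetical simplifications, not Subzero's code. A variable's live range ends at the new instruction if it ended at either of the two instructions being replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model loosely patterned on the idea behind Inst::LiveRangesEnded:
// bit i is set when the live range of the i-th variable operand ends here.
// This is not Subzero's actual representation.
struct Inst {
  std::vector<std::string> Vars;  // variable operands, in operand order
  std::uint32_t LiveRangesEnded;  // bit i corresponds to Vars[i] (i < 32)
};

// Analogue of Inst::isLastUse(): does Var's live range end at I?
bool isLastUse(const Inst &I, const std::string &Var) {
  for (std::size_t i = 0; i < I.Vars.size() && i < 32; ++i)
    if (I.Vars[i] == Var && ((I.LiveRangesEnded >> i) & 1u) != 0)
      return true;
  return false;
}

// Recomputes NewInst.LiveRangesEnded from the two instructions it replaces:
// a variable's range ends at NewInst if it ended at either original. The
// load's dest no longer appears among NewInst's operands, so it drops out.
void spliceLiveness(Inst &NewInst, const Inst &Load, const Inst &User) {
  NewInst.LiveRangesEnded = 0;
  for (std::size_t i = 0; i < NewInst.Vars.size() && i < 32; ++i) {
    const std::string &V = NewInst.Vars[i];
    if (isLastUse(Load, V) || isLastUse(User, V))
      NewInst.LiveRangesEnded |= (std::uint32_t{1} << i);
  }
}
```

With `t1 = load [p]` (p ends there) and `t2 = add t1, x` (t1 and x end there), the folded `t2 = add [p], x` ends both p and x, so lowering can still see that their registers become free at that point.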

The whitelisted instruction kinds are chosen based primarily on whether the main operation's native instruction can use a memory operand, e.g., arithmetic (add/sub/imul/etc.), compare (cmp/ucomiss), and cast (movsx/movzx/etc.).  Notably, call and ret are not included, because argument passing is done through simple assignments, for which normal lowering is sufficient.
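The whitelist amounts to a predicate on instruction kind. Here is a hypothetical sketch; this `InstKind` enum is a simplified stand-in, not Subzero's real enumeration. The comments note the x86 form that accepts the memory operand.

```cpp
#include <cassert>

// Hypothetical instruction kinds for illustration; Subzero's real kind
// enumeration is larger and named differently.
enum class InstKind { Arithmetic, Icmp, Fcmp, Cast, Call, Ret };

// Returns true if a preceding load may be folded into an instruction of kind
// K, i.e. the main native instruction can take a memory operand.
bool isFoldableKind(InstKind K) {
  switch (K) {
  case InstKind::Arithmetic: // add/sub/imul/... reg, mem
  case InstKind::Icmp:       // cmp reg, mem
  case InstKind::Fcmp:       // ucomiss/ucomisd xmm, mem
  case InstKind::Cast:       // movsx/movzx reg, mem
    return true;
  case InstKind::Call:       // args go through simple assignments instead
  case InstKind::Ret:
    return false;
  }
  return false;
}
```

Keeping the check as an explicit switch makes it easy to extend the whitelist one kind at a time as more lowering sequences learn to accept memory operands.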

BUG= none
R=jvoung@chromium.org, mtrofin@chromium.org

Review URL: https://codereview.chromium.org/1169493002
9 files changed