AMDGPU: Directly emit m0 initialization with s_mov_b32

Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.

This avoids regressions when SIFoldOperands is run later
from leaving all copies to m0.

llvm-svn: 266377
2 files changed