Add a "nop filler" pass to SPU.

Filling no-ops is done just before emitting of assembly,
when the instruction stream is final. No-ops are inserted
to align the instructions so the dual-issue of the pipeline
is utilized. This speeds up generated code with a minimum of 
1% on a select set of algorithms.

This pass may be redundant if the instruction scheduler and 
all subsequent passes that modify the instruction stream 
(prolog+epilog inserter, register scavenger, are there others?)
are made aware of the instruction alignments.

llvm-svn: 123226
diff --git a/llvm/lib/Target/CellSPU/SPUTargetMachine.cpp b/llvm/lib/Target/CellSPU/SPUTargetMachine.cpp
index 3423c69..3ed7361 100644
--- a/llvm/lib/Target/CellSPU/SPUTargetMachine.cpp
+++ b/llvm/lib/Target/CellSPU/SPUTargetMachine.cpp
@@ -59,3 +59,12 @@
   PM.add(createSPUISelDag(*this));
   return false;
 }
+
+// passes to run just before printing the assembly
+bool SPUTargetMachine::
+addPreEmitPass(PassManagerBase &PM, CodeGenOpt::Level OptLevel) 
+{
+  //align instructions with nops/lnops for dual issue
+  PM.add(createSPUNopFillerPass(*this));
+  return true;
+}