Move early tail duplication earlier.

This fixes the issue noted in PR10251, where early tail duplication of
basic blocks with indirectbr would cause a block to be duplicated into
a loop preheader and then into its predecessors, creating phi nodes
with identical operands just before register allocation. The pass now
runs right after ISel pseudo-instruction expansion, before the PHI
optimization pass.

This helps with jsinterp.o size (__TEXT goes from 163568 to 126656) and
a bit with performance: 1.005x faster on sunspider (jits still enabled).

The result on webkit with the jit disabled is more significant: 1.021x faster.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134372 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/lib/CodeGen/LLVMTargetMachine.cpp b/lib/CodeGen/LLVMTargetMachine.cpp
index b98fbed..0255b28 100644
--- a/lib/CodeGen/LLVMTargetMachine.cpp
+++ b/lib/CodeGen/LLVMTargetMachine.cpp
@@ -388,6 +388,12 @@
   // Expand pseudo-instructions emitted by ISel.
   PM.add(createExpandISelPseudosPass());
 
+  // Pre-ra tail duplication.
+  if (OptLevel != CodeGenOpt::None && !DisableEarlyTailDup) {
+    PM.add(createTailDuplicatePass(true));
+    printAndVerify(PM, "After Pre-RegAlloc TailDuplicate");
+  }
+
   // Optimize PHIs before DCE: removing dead PHI cycles may make more
   // instructions dead.
   if (OptLevel != CodeGenOpt::None)
@@ -416,12 +422,6 @@
     printAndVerify(PM, "After codegen peephole optimization pass");
   }
 
-  // Pre-ra tail duplication.
-  if (OptLevel != CodeGenOpt::None && !DisableEarlyTailDup) {
-    PM.add(createTailDuplicatePass(true));
-    printAndVerify(PM, "After Pre-RegAlloc TailDuplicate");
-  }
-
   // Run pre-ra passes.
   if (addPreRegAlloc(PM, OptLevel))
     printAndVerify(PM, "After PreRegAlloc passes");