In the pre-RA scheduler, maintain cmp+br proximity.
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129508 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/test/CodeGen/X86/tail-opts.ll b/test/CodeGen/X86/tail-opts.ll
index 424bd21..77710ad 100644
--- a/test/CodeGen/X86/tail-opts.ll
+++ b/test/CodeGen/X86/tail-opts.ll
@@ -109,15 +109,15 @@
; CHECK: dont_merge_oddly:
; CHECK-NOT: ret
-; CHECK: ucomiss %xmm1, %xmm2
+; CHECK: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
; CHECK-NEXT: jbe .LBB2_3
-; CHECK-NEXT: ucomiss %xmm0, %xmm1
+; CHECK-NEXT: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
; CHECK-NEXT: ja .LBB2_4
; CHECK-NEXT: .LBB2_2:
; CHECK-NEXT: movb $1, %al
; CHECK-NEXT: ret
; CHECK-NEXT: .LBB2_3:
-; CHECK-NEXT: ucomiss %xmm0, %xmm2
+; CHECK-NEXT: ucomiss %xmm{{[0-2]}}, %xmm{{[0-2]}}
; CHECK-NEXT: jbe .LBB2_2
; CHECK-NEXT: .LBB2_4:
; CHECK-NEXT: xorb %al, %al