[FastISel] Sink local value materializations to first use
Summary:
Local values are constants, global addresses, and stack addresses that
can't be folded into the instruction that uses them. For example, when
storing the address of a global variable into memory, we need to
materialize that address into a register.
FastISel doesn't want to materialize any given local value more than
once, so it emits all local value materialization code at EmitStartPt,
which always dominates the current insertion point. Because the local
value area dominates every later use, FastISel can keep a map of local
value registers and reuse an entry whenever the same value comes up again.
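
As a rough illustration of the caching scheme described above, here is a toy
model in C++. It is not FastISel's code: ToyBlock, getRegForLocalValue, and
the string-based instructions are hypothetical names invented for this sketch,
and the local value area is simplified to the front of the block rather than
a movable EmitStartPt.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only; these names are hypothetical and are not LLVM's API.
struct ToyBlock {
  std::vector<std::string> insts;            // instructions in program order
  std::size_t localValueEnd = 0;             // one past the local value area
  std::map<std::string, int> localValueMap;  // value name -> virtual register
  int nextReg = 0;

  // Return the register holding `value`, materializing it at most once.
  // New materializations are placed in the local value area, ahead of all
  // ordinary instructions, so the definition precedes every later use.
  int getRegForLocalValue(const std::string &value) {
    auto it = localValueMap.find(value);
    if (it != localValueMap.end())
      return it->second;  // already materialized: reuse the register
    int reg = nextReg++;
    auto pos = insts.begin() + static_cast<std::ptrdiff_t>(localValueEnd);
    insts.insert(pos, "materialize " + value + " -> %v" + std::to_string(reg));
    ++localValueEnd;
    localValueMap[value] = reg;
    assert(localValueEnd <= insts.size());
    return reg;
  }

  // Append an ordinary instruction at the current insertion point.
  void emit(const std::string &inst) { insts.push_back(inst); }
};
```

In this model, asking for "&global" after emitting a call still places the
materialization ahead of the call, which is exactly the behavior the next
paragraph identifies as a problem for line tables.
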
The downside is that local value instructions are always emitted without
a source location. This is done to prevent jumpy line tables, but it
means that the local value area will be considered part of the previous
statement. Consider this C code:

  call1();                 // line 1
  ++global;                // line 2
  ++global;                // line 3
  call2(&global, &local);  // line 4

Today we end up with assembly and line tables like this:

  .loc 1 1
  callq call1
  leaq global(%rip), %rdi
  leaq local(%rsp), %rsi
  .loc 1 2
  addq $1, global(%rip)
  .loc 1 3
  addq $1, global(%rip)
  .loc 1 4
  callq call2

The LEA instructions in the local value area have no source location and
are treated as being on line 1. Stepping through the code in a debugger
and correlating it with the assembly won't make much sense, because
these materializations are only required for line 4.
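
To make the attribution rule concrete, the following C++ sketch pairs each
instruction in the listing above with the line of the most recent .loc
directive. The helper names attributeLines and exampleListing are hypothetical,
and the parser only understands the simple ".loc <file> <line>" form used in
this message; it is an illustration, not how a debugger reads line tables.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Pair every instruction with the line of the most recent
// ".loc <file> <line>" directive. Instructions with no location of their
// own, such as local value materializations, inherit the previous line.
std::vector<std::pair<int, std::string>>
attributeLines(const std::vector<std::string> &asmLines) {
  std::vector<std::pair<int, std::string>> result;
  int currentLine = 0;  // 0 means no .loc has been seen yet
  for (const std::string &text : asmLines) {
    std::istringstream in(text);
    std::string first;
    in >> first;
    if (first == ".loc") {
      int file = 0, line = 0;
      if (in >> file >> line)
        currentLine = line;
      continue;  // directives are not instructions
    }
    result.emplace_back(currentLine, text);
  }
  return result;
}

// The assembly listing from this commit message.
std::vector<std::string> exampleListing() {
  return {".loc 1 1", "callq call1",
          "leaq global(%rip), %rdi", "leaq local(%rsp), %rsi",
          ".loc 1 2", "addq $1, global(%rip)",
          ".loc 1 3", "addq $1, global(%rip)",
          ".loc 1 4", "callq call2"};
}
```

Running attributeLines(exampleListing()) assigns both leaq instructions to
line 1, even though only the call on line 4 needs them.
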
This is actually problematic for the VS debugger "set next statement"
feature, which effectively assumes that there are no registers live
across statement boundaries. By sinking the local value code into the
statement and fixing up the source location, we can make that feature
work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and
https://crbug.com/793819.
This change is not enough to make that feature work reliably in all
cases, but I felt that it was worth doing anyway because it usually
generates smaller, more comprehensible -O0 code. The cost is small: I
measured a 0.12% regression in code generation time with llc on the
sqlite3 amalgamation.
Three special cases are worth calling out:
1. local values materialized for phis
2. local values used by no-op casts
3. dead local value code
Local values can be materialized for phis, and this does not show up as
a vreg use in MachineRegisterInfo. In this case, if there are no other
uses, this patch sinks the value to the first terminator, EH label, or
the end of the BB if nothing else exists.
Local values may also be used by no-op casts, which add the register to
the RegFixups table. Without reversing the RegFixups map direction, we
don't have enough information to sink these instructions.
Lastly, if the local value register has no uses at all, we can delete the
instruction that materializes it. This comes up when FastISel tries two
instruction selection approaches: the first materializes the value but
fails, and the second succeeds without using the local value.
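
The three cases can be summarized as a small decision function. This is a
minimal sketch under stated assumptions, not the code in this patch: ToyInst,
SinkDecision, and decide are hypothetical names, the usedByPhi and hasRegFixup
flags stand in for the side tables mentioned above, and a register with a
fixup is simply left where it is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of these names are LLVM's.
enum class Kind { Normal, EHLabel, Terminator };

struct ToyInst {
  Kind kind;
  std::vector<int> uses;  // virtual registers read by this instruction
};

struct SinkDecision {
  enum Action { Keep, Delete, Sink } action;
  std::size_t insertBefore;  // index into `body`; body.size() is end of block
};

// Decide what to do with the local value that defines `reg`. `body` holds
// the instructions that follow the local value area, in program order.
SinkDecision decide(int reg, const std::vector<ToyInst> &body,
                    bool usedByPhi, bool hasRegFixup) {
  // Case 2: uses reached through a no-op cast fixup are not visible here,
  // so this sketch leaves the instruction in place.
  if (hasRegFixup)
    return {SinkDecision::Keep, 0};

  // Ordinary case: sink to just before the first instruction that reads reg.
  for (std::size_t i = 0; i < body.size(); ++i) {
    const std::vector<int> &uses = body[i].uses;
    if (std::find(uses.begin(), uses.end(), reg) != uses.end())
      return {SinkDecision::Sink, i};
  }

  // Case 3: no uses and no phi needs the value, so it is dead.
  if (!usedByPhi)
    return {SinkDecision::Delete, 0};

  // Case 1: only a phi uses the value. Sink to the first terminator or EH
  // label, or to the end of the block if neither exists.
  for (std::size_t i = 0; i < body.size(); ++i)
    if (body[i].kind != Kind::Normal)
      return {SinkDecision::Sink, i};
  return {SinkDecision::Sink, body.size()};
}
```

For a block whose second instruction reads the register, decide returns Sink
with insertBefore equal to 1; with no readers and usedByPhi set, it returns
the index of the first terminator or EH label instead.
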
Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo
Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D43093
llvm-svn: 327581
diff --git a/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll b/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
index 11aba2f..f501d9c 100755
--- a/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
+++ b/llvm/test/CodeGen/X86/avx512-mask-zext-bugfix.ll
@@ -17,25 +17,21 @@
define void @test_xmm(i32 %shift, i32 %mulp, <2 x i64> %a,i8* %arraydecay,i8* %fname){
; CHECK-LABEL: test_xmm:
; CHECK: ## %bb.0:
-; CHECK-NEXT: subq $72, %rsp
-; CHECK-NEXT: .cfi_def_cfa_offset 80
-; CHECK-NEXT: movl $4, %eax
+; CHECK-NEXT: subq $56, %rsp
+; CHECK-NEXT: .cfi_def_cfa_offset 64
; CHECK-NEXT: vpmovw2m %xmm0, %k0
; CHECK-NEXT: movl $2, %esi
-; CHECK-NEXT: movl $8, %edi
-; CHECK-NEXT: movl %edi, {{[0-9]+}}(%rsp) ## 4-byte Spill
+; CHECK-NEXT: movl $8, %eax
; CHECK-NEXT: movq %rdx, %rdi
-; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %r8d ## 4-byte Reload
; CHECK-NEXT: movq %rdx, {{[0-9]+}}(%rsp) ## 8-byte Spill
-; CHECK-NEXT: movl %r8d, %edx
+; CHECK-NEXT: movl %eax, %edx
+; CHECK-NEXT: kmovw %k0, {{[0-9]+}}(%rsp) ## 2-byte Spill
; CHECK-NEXT: movq %rcx, {{[0-9]+}}(%rsp) ## 8-byte Spill
; CHECK-NEXT: vmovaps %xmm0, {{[0-9]+}}(%rsp) ## 16-byte Spill
-; CHECK-NEXT: movl %eax, {{[0-9]+}}(%rsp) ## 4-byte Spill
-; CHECK-NEXT: kmovw %k0, {{[0-9]+}}(%rsp) ## 2-byte Spill
; CHECK-NEXT: callq _calc_expected_mask_val
; CHECK-NEXT: movl %eax, %edx
-; CHECK-NEXT: movw %dx, %r9w
-; CHECK-NEXT: movzwl %r9w, %esi
+; CHECK-NEXT: movw %dx, %r8w
+; CHECK-NEXT: movzwl %r8w, %esi
; CHECK-NEXT: kmovw {{[0-9]+}}(%rsp), %k0 ## 2-byte Reload
; CHECK-NEXT: kmovb %k0, %edi
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rdx ## 8-byte Reload
@@ -45,25 +41,26 @@
; CHECK-NEXT: vpmovd2m %xmm0, %k0
; CHECK-NEXT: kmovq %k0, %k1
; CHECK-NEXT: kmovd %k0, %esi
-; CHECK-NEXT: movb %sil, %r10b
-; CHECK-NEXT: movzbl %r10b, %esi
-; CHECK-NEXT: movw %si, %r9w
+; CHECK-NEXT: movb %sil, %r9b
+; CHECK-NEXT: movzbl %r9b, %esi
+; CHECK-NEXT: movw %si, %r8w
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rdi ## 8-byte Reload
-; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %esi ## 4-byte Reload
+; CHECK-NEXT: movl $4, %esi
+; CHECK-NEXT: movl %esi, {{[0-9]+}}(%rsp) ## 4-byte Spill
; CHECK-NEXT: movl {{[0-9]+}}(%rsp), %edx ## 4-byte Reload
; CHECK-NEXT: movl %eax, {{[0-9]+}}(%rsp) ## 4-byte Spill
; CHECK-NEXT: kmovw %k1, {{[0-9]+}}(%rsp) ## 2-byte Spill
-; CHECK-NEXT: movw %r9w, {{[0-9]+}}(%rsp) ## 2-byte Spill
+; CHECK-NEXT: movw %r8w, {{[0-9]+}}(%rsp) ## 2-byte Spill
; CHECK-NEXT: callq _calc_expected_mask_val
-; CHECK-NEXT: movw %ax, %r9w
-; CHECK-NEXT: movw {{[0-9]+}}(%rsp), %r11w ## 2-byte Reload
-; CHECK-NEXT: movzwl %r11w, %edi
-; CHECK-NEXT: movzwl %r9w, %esi
+; CHECK-NEXT: movw %ax, %r8w
+; CHECK-NEXT: movw {{[0-9]+}}(%rsp), %r10w ## 2-byte Reload
+; CHECK-NEXT: movzwl %r10w, %edi
+; CHECK-NEXT: movzwl %r8w, %esi
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rdx ## 8-byte Reload
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rcx ## 8-byte Reload
; CHECK-NEXT: callq _check_mask16
-; CHECK-NEXT: movl %eax, {{[0-9]+}}(%rsp) ## 4-byte Spill
-; CHECK-NEXT: addq $72, %rsp
+; CHECK-NEXT: movl %eax, (%rsp) ## 4-byte Spill
+; CHECK-NEXT: addq $56, %rsp
; CHECK-NEXT: retq
%d2 = bitcast <2 x i64> %a to <8 x i16>
%m2 = call i8 @llvm.x86.avx512.cvtw2mask.128(<8 x i16> %d2)
diff --git a/llvm/test/CodeGen/X86/bmi-intrinsics-fast-isel.ll b/llvm/test/CodeGen/X86/bmi-intrinsics-fast-isel.ll
index 3c183a5..206738c 100644
--- a/llvm/test/CodeGen/X86/bmi-intrinsics-fast-isel.ll
+++ b/llvm/test/CodeGen/X86/bmi-intrinsics-fast-isel.ll
@@ -24,11 +24,11 @@
;
; X64-LABEL: test__tzcnt_u16:
; X64: # %bb.0:
-; X64-NEXT: movw $16, %cx
-; X64-NEXT: movzwl %di, %edx
-; X64-NEXT: tzcntw %dx, %ax
-; X64-NEXT: cmpl $0, %edx
-; X64-NEXT: cmovew %cx, %ax
+; X64-NEXT: movzwl %di, %eax
+; X64-NEXT: tzcntw %ax, %cx
+; X64-NEXT: cmpl $0, %eax
+; X64-NEXT: movw $16, %ax
+; X64-NEXT: cmovnew %cx, %ax
; X64-NEXT: retq
%zext = zext i16 %a0 to i32
%cmp = icmp ne i32 %zext, 0
@@ -146,9 +146,9 @@
;
; X64-LABEL: test__tzcnt_u32:
; X64: # %bb.0:
-; X64-NEXT: movl $32, %ecx
-; X64-NEXT: tzcntl %edi, %eax
-; X64-NEXT: cmovbl %ecx, %eax
+; X64-NEXT: tzcntl %edi, %ecx
+; X64-NEXT: movl $32, %eax
+; X64-NEXT: cmovael %ecx, %eax
; X64-NEXT: retq
%cmp = icmp ne i32 %a0, 0
%cttz = call i32 @llvm.cttz.i32(i32 %a0, i1 true)
@@ -176,11 +176,11 @@
;
; X64-LABEL: test_tzcnt_u16:
; X64: # %bb.0:
-; X64-NEXT: movw $16, %cx
-; X64-NEXT: movzwl %di, %edx
-; X64-NEXT: tzcntw %dx, %ax
-; X64-NEXT: cmpl $0, %edx
-; X64-NEXT: cmovew %cx, %ax
+; X64-NEXT: movzwl %di, %eax
+; X64-NEXT: tzcntw %ax, %cx
+; X64-NEXT: cmpl $0, %eax
+; X64-NEXT: movw $16, %ax
+; X64-NEXT: cmovnew %cx, %ax
; X64-NEXT: retq
%zext = zext i16 %a0 to i32
%cmp = icmp ne i32 %zext, 0
@@ -311,9 +311,9 @@
;
; X64-LABEL: test_tzcnt_u32:
; X64: # %bb.0:
-; X64-NEXT: movl $32, %ecx
-; X64-NEXT: tzcntl %edi, %eax
-; X64-NEXT: cmovbl %ecx, %eax
+; X64-NEXT: tzcntl %edi, %ecx
+; X64-NEXT: movl $32, %eax
+; X64-NEXT: cmovael %ecx, %eax
; X64-NEXT: retq
%cmp = icmp ne i32 %a0, 0
%cttz = call i32 @llvm.cttz.i32(i32 %a0, i1 true)
diff --git a/llvm/test/CodeGen/X86/fast-isel-call-cleanup.ll b/llvm/test/CodeGen/X86/fast-isel-call-cleanup.ll
index 724d53d..b5c891c 100644
--- a/llvm/test/CodeGen/X86/fast-isel-call-cleanup.ll
+++ b/llvm/test/CodeGen/X86/fast-isel-call-cleanup.ll
@@ -6,10 +6,8 @@
%call = call i32 @targetfn(i32 42)
ret void
; CHECK-LABEL: fastiselcall:
-; Local value area is still there:
-; CHECK: movl $42, {{%[a-z]+}}
-; Fast-ISel's arg mov is not here:
-; CHECK-NOT: movl $42, (%esp)
+; FastISel's local value code was dead, so it's gone.
+; CHECK-NOT: movl $42,
; SDag-ISel's arg mov:
; CHECK: movabsq $_targetfn, %[[REG:[^ ]*]]
; CHECK: movl $42, %edi
diff --git a/llvm/test/CodeGen/X86/fast-isel-store.ll b/llvm/test/CodeGen/X86/fast-isel-store.ll
index 6468186..49f22ec 100644
--- a/llvm/test/CodeGen/X86/fast-isel-store.ll
+++ b/llvm/test/CodeGen/X86/fast-isel-store.ll
@@ -58,11 +58,11 @@
; SSE64-NEXT: movdqu %xmm0, (%eax)
; SSE64-NEXT: retl
;
-; AVXONLY32-LABEL: test_store_4xi32:
-; AVXONLY32: # %bb.0:
-; AVXONLY32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVXONLY32-NEXT: vmovdqu %xmm0, (%rdi)
-; AVXONLY32-NEXT: retq
+; AVX32-LABEL: test_store_4xi32:
+; AVX32: # %bb.0:
+; AVX32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
+; AVX32-NEXT: vmovdqu %xmm0, (%rdi)
+; AVX32-NEXT: retq
;
; AVX64-LABEL: test_store_4xi32:
; AVX64: # %bb.0:
@@ -70,18 +70,6 @@
; AVX64-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX64-NEXT: vmovdqu %xmm0, (%eax)
; AVX64-NEXT: retl
-;
-; KNL32-LABEL: test_store_4xi32:
-; KNL32: # %bb.0:
-; KNL32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; KNL32-NEXT: vmovdqu %xmm0, (%rdi)
-; KNL32-NEXT: retq
-;
-; SKX32-LABEL: test_store_4xi32:
-; SKX32: # %bb.0:
-; SKX32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; SKX32-NEXT: vmovdqu %xmm0, (%rdi)
-; SKX32-NEXT: retq
%foo = add <4 x i32> %value, %value2 ; to force integer type on store
store <4 x i32> %foo, <4 x i32>* %addr, align 1
ret <4 x i32> %foo
@@ -101,11 +89,11 @@
; SSE64-NEXT: movdqa %xmm0, (%eax)
; SSE64-NEXT: retl
;
-; AVXONLY32-LABEL: test_store_4xi32_aligned:
-; AVXONLY32: # %bb.0:
-; AVXONLY32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVXONLY32-NEXT: vmovdqa %xmm0, (%rdi)
-; AVXONLY32-NEXT: retq
+; AVX32-LABEL: test_store_4xi32_aligned:
+; AVX32: # %bb.0:
+; AVX32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
+; AVX32-NEXT: vmovdqa %xmm0, (%rdi)
+; AVX32-NEXT: retq
;
; AVX64-LABEL: test_store_4xi32_aligned:
; AVX64: # %bb.0:
@@ -113,18 +101,6 @@
; AVX64-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX64-NEXT: vmovdqa %xmm0, (%eax)
; AVX64-NEXT: retl
-;
-; KNL32-LABEL: test_store_4xi32_aligned:
-; KNL32: # %bb.0:
-; KNL32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; KNL32-NEXT: vmovdqa %xmm0, (%rdi)
-; KNL32-NEXT: retq
-;
-; SKX32-LABEL: test_store_4xi32_aligned:
-; SKX32: # %bb.0:
-; SKX32-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; SKX32-NEXT: vmovdqa %xmm0, (%rdi)
-; SKX32-NEXT: retq
%foo = add <4 x i32> %value, %value2 ; to force integer type on store
store <4 x i32> %foo, <4 x i32>* %addr, align 16
ret <4 x i32> %foo
diff --git a/llvm/test/CodeGen/X86/inreg.ll b/llvm/test/CodeGen/X86/inreg.ll
index e4610e3..445542a 100644
--- a/llvm/test/CodeGen/X86/inreg.ll
+++ b/llvm/test/CodeGen/X86/inreg.ll
@@ -20,7 +20,7 @@
; FAST-LABEL: g1:
; FAST: subl $[[AMT:.*]], %esp
- ; FAST-NEXT: leal 8(%esp), %eax
+ ; FAST-NEXT: leal 16(%esp), %eax
; FAST-NEXT: movl $41, %edx
; FAST-NEXT: movl $42, %ecx
; FAST: $43, (%esp)
diff --git a/llvm/test/CodeGen/X86/pr32241.ll b/llvm/test/CodeGen/X86/pr32241.ll
index 69c32ea..89a73d7 100644
--- a/llvm/test/CodeGen/X86/pr32241.ll
+++ b/llvm/test/CodeGen/X86/pr32241.ll
@@ -4,19 +4,16 @@
define i32 @_Z3foov() {
; CHECK-LABEL: _Z3foov:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: pushl %esi
-; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: subl $16, %esp
-; CHECK-NEXT: .cfi_def_cfa_offset 24
-; CHECK-NEXT: .cfi_offset %esi, -8
-; CHECK-NEXT: movb $1, %al
+; CHECK-NEXT: .cfi_def_cfa_offset 20
; CHECK-NEXT: movw $10959, {{[0-9]+}}(%esp) # imm = 0x2ACF
; CHECK-NEXT: movw $-15498, {{[0-9]+}}(%esp) # imm = 0xC376
; CHECK-NEXT: movw $19417, {{[0-9]+}}(%esp) # imm = 0x4BD9
-; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
+; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: cmpw $0, {{[0-9]+}}(%esp)
-; CHECK-NEXT: movl %ecx, {{[0-9]+}}(%esp) # 4-byte Spill
-; CHECK-NEXT: movb %al, {{[0-9]+}}(%esp) # 1-byte Spill
+; CHECK-NEXT: movb $1, %cl
+; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
+; CHECK-NEXT: movb %cl, {{[0-9]+}}(%esp) # 1-byte Spill
; CHECK-NEXT: jne .LBB0_2
; CHECK-NEXT: # %bb.1: # %lor.rhs
; CHECK-NEXT: xorl %eax, %eax
@@ -25,17 +22,17 @@
; CHECK-NEXT: jmp .LBB0_2
; CHECK-NEXT: .LBB0_2: # %lor.end
; CHECK-NEXT: movb {{[0-9]+}}(%esp), %al # 1-byte Reload
-; CHECK-NEXT: movb $1, %cl
; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %edx
-; CHECK-NEXT: movl {{[0-9]+}}(%esp), %esi # 4-byte Reload
-; CHECK-NEXT: cmpl %edx, %esi
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edx # 4-byte Reload
+; CHECK-NEXT: cmpl %ecx, %edx
; CHECK-NEXT: setl %al
; CHECK-NEXT: andb $1, %al
-; CHECK-NEXT: movzbl %al, %edx
-; CHECK-NEXT: xorl $-1, %edx
-; CHECK-NEXT: cmpl $0, %edx
-; CHECK-NEXT: movb %cl, {{[0-9]+}}(%esp) # 1-byte Spill
+; CHECK-NEXT: movzbl %al, %ecx
+; CHECK-NEXT: xorl $-1, %ecx
+; CHECK-NEXT: cmpl $0, %ecx
+; CHECK-NEXT: movb $1, %al
+; CHECK-NEXT: movb %al, {{[0-9]+}}(%esp) # 1-byte Spill
; CHECK-NEXT: jne .LBB0_4
; CHECK-NEXT: # %bb.3: # %lor.rhs4
; CHECK-NEXT: xorl %eax, %eax
@@ -50,7 +47,6 @@
; CHECK-NEXT: movw %dx, {{[0-9]+}}(%esp)
; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: addl $16, %esp
-; CHECK-NEXT: popl %esi
; CHECK-NEXT: retl
entry:
%aa = alloca i16, align 2
diff --git a/llvm/test/CodeGen/X86/pr32284.ll b/llvm/test/CodeGen/X86/pr32284.ll
index 44367cb..69dc24b 100644
--- a/llvm/test/CodeGen/X86/pr32284.ll
+++ b/llvm/test/CodeGen/X86/pr32284.ll
@@ -121,10 +121,10 @@
define void @f1() {
; X86-O0-LABEL: f1:
; X86-O0: # %bb.0: # %entry
-; X86-O0-NEXT: movabsq $8381627093, %rax # imm = 0x1F3957AD5
-; X86-O0-NEXT: movslq var_5, %rcx
-; X86-O0-NEXT: addq %rax, %rcx
-; X86-O0-NEXT: cmpq $0, %rcx
+; X86-O0-NEXT: movslq var_5, %rax
+; X86-O0-NEXT: movabsq $8381627093, %rcx # imm = 0x1F3957AD5
+; X86-O0-NEXT: addq %rcx, %rax
+; X86-O0-NEXT: cmpq $0, %rax
; X86-O0-NEXT: setne %dl
; X86-O0-NEXT: andb $1, %dl
; X86-O0-NEXT: movb %dl, -{{[0-9]+}}(%rsp)
@@ -308,30 +308,30 @@
define void @f2() {
; X86-O0-LABEL: f2:
; X86-O0: # %bb.0: # %entry
-; X86-O0-NEXT: # implicit-def: $rax
-; X86-O0-NEXT: movzbl var_7, %ecx
+; X86-O0-NEXT: movzbl var_7, %eax
; X86-O0-NEXT: cmpb $0, var_7
-; X86-O0-NEXT: setne %dl
-; X86-O0-NEXT: xorb $-1, %dl
-; X86-O0-NEXT: andb $1, %dl
-; X86-O0-NEXT: movzbl %dl, %esi
-; X86-O0-NEXT: xorl %esi, %ecx
-; X86-O0-NEXT: movw %cx, %di
-; X86-O0-NEXT: movw %di, -{{[0-9]+}}(%rsp)
-; X86-O0-NEXT: movzbl var_7, %ecx
-; X86-O0-NEXT: movw %cx, %di
-; X86-O0-NEXT: cmpw $0, %di
-; X86-O0-NEXT: setne %dl
-; X86-O0-NEXT: xorb $-1, %dl
-; X86-O0-NEXT: andb $1, %dl
-; X86-O0-NEXT: movzbl %dl, %ecx
-; X86-O0-NEXT: movzbl var_7, %esi
-; X86-O0-NEXT: cmpl %esi, %ecx
-; X86-O0-NEXT: sete %dl
-; X86-O0-NEXT: andb $1, %dl
-; X86-O0-NEXT: movzbl %dl, %ecx
-; X86-O0-NEXT: movw %cx, %di
-; X86-O0-NEXT: movw %di, (%rax)
+; X86-O0-NEXT: setne %cl
+; X86-O0-NEXT: xorb $-1, %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %edx
+; X86-O0-NEXT: xorl %edx, %eax
+; X86-O0-NEXT: movw %ax, %si
+; X86-O0-NEXT: movw %si, -{{[0-9]+}}(%rsp)
+; X86-O0-NEXT: movzbl var_7, %eax
+; X86-O0-NEXT: movw %ax, %si
+; X86-O0-NEXT: cmpw $0, %si
+; X86-O0-NEXT: setne %cl
+; X86-O0-NEXT: xorb $-1, %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %eax
+; X86-O0-NEXT: movzbl var_7, %edx
+; X86-O0-NEXT: cmpl %edx, %eax
+; X86-O0-NEXT: sete %cl
+; X86-O0-NEXT: andb $1, %cl
+; X86-O0-NEXT: movzbl %cl, %eax
+; X86-O0-NEXT: movw %ax, %si
+; X86-O0-NEXT: # implicit-def: $rdi
+; X86-O0-NEXT: movw %si, (%rdi)
; X86-O0-NEXT: retq
;
; X64-LABEL: f2:
@@ -353,41 +353,37 @@
;
; 686-O0-LABEL: f2:
; 686-O0: # %bb.0: # %entry
-; 686-O0-NEXT: pushl %edi
-; 686-O0-NEXT: .cfi_def_cfa_offset 8
; 686-O0-NEXT: pushl %esi
-; 686-O0-NEXT: .cfi_def_cfa_offset 12
+; 686-O0-NEXT: .cfi_def_cfa_offset 8
; 686-O0-NEXT: subl $2, %esp
-; 686-O0-NEXT: .cfi_def_cfa_offset 14
-; 686-O0-NEXT: .cfi_offset %esi, -12
-; 686-O0-NEXT: .cfi_offset %edi, -8
-; 686-O0-NEXT: # implicit-def: $eax
-; 686-O0-NEXT: movzbl var_7, %ecx
+; 686-O0-NEXT: .cfi_def_cfa_offset 10
+; 686-O0-NEXT: .cfi_offset %esi, -8
+; 686-O0-NEXT: movzbl var_7, %eax
; 686-O0-NEXT: cmpb $0, var_7
-; 686-O0-NEXT: setne %dl
-; 686-O0-NEXT: xorb $-1, %dl
-; 686-O0-NEXT: andb $1, %dl
-; 686-O0-NEXT: movzbl %dl, %esi
-; 686-O0-NEXT: xorl %esi, %ecx
-; 686-O0-NEXT: movw %cx, %di
-; 686-O0-NEXT: movw %di, (%esp)
-; 686-O0-NEXT: movzbl var_7, %ecx
-; 686-O0-NEXT: movw %cx, %di
-; 686-O0-NEXT: cmpw $0, %di
-; 686-O0-NEXT: setne %dl
-; 686-O0-NEXT: xorb $-1, %dl
-; 686-O0-NEXT: andb $1, %dl
-; 686-O0-NEXT: movzbl %dl, %ecx
-; 686-O0-NEXT: movzbl var_7, %esi
-; 686-O0-NEXT: cmpl %esi, %ecx
-; 686-O0-NEXT: sete %dl
-; 686-O0-NEXT: andb $1, %dl
-; 686-O0-NEXT: movzbl %dl, %ecx
-; 686-O0-NEXT: movw %cx, %di
-; 686-O0-NEXT: movw %di, (%eax)
+; 686-O0-NEXT: setne %cl
+; 686-O0-NEXT: xorb $-1, %cl
+; 686-O0-NEXT: andb $1, %cl
+; 686-O0-NEXT: movzbl %cl, %edx
+; 686-O0-NEXT: xorl %edx, %eax
+; 686-O0-NEXT: movw %ax, %si
+; 686-O0-NEXT: movw %si, (%esp)
+; 686-O0-NEXT: movzbl var_7, %eax
+; 686-O0-NEXT: movw %ax, %si
+; 686-O0-NEXT: cmpw $0, %si
+; 686-O0-NEXT: setne %cl
+; 686-O0-NEXT: xorb $-1, %cl
+; 686-O0-NEXT: andb $1, %cl
+; 686-O0-NEXT: movzbl %cl, %eax
+; 686-O0-NEXT: movzbl var_7, %edx
+; 686-O0-NEXT: cmpl %edx, %eax
+; 686-O0-NEXT: sete %cl
+; 686-O0-NEXT: andb $1, %cl
+; 686-O0-NEXT: movzbl %cl, %eax
+; 686-O0-NEXT: movw %ax, %si
+; 686-O0-NEXT: # implicit-def: $eax
+; 686-O0-NEXT: movw %si, (%eax)
; 686-O0-NEXT: addl $2, %esp
; 686-O0-NEXT: popl %esi
-; 686-O0-NEXT: popl %edi
; 686-O0-NEXT: retl
;
; 686-LABEL: f2:
diff --git a/llvm/test/CodeGen/X86/pr32340.ll b/llvm/test/CodeGen/X86/pr32340.ll
index f5a67c1..cb604c08 100644
--- a/llvm/test/CodeGen/X86/pr32340.ll
+++ b/llvm/test/CodeGen/X86/pr32340.ll
@@ -15,31 +15,31 @@
; X64: # %bb.0: # %entry
; X64-NEXT: xorl %eax, %eax
; X64-NEXT: movl %eax, %ecx
-; X64-NEXT: movabsq $-1142377792914660288, %rdx # imm = 0xF02575732E06E440
; X64-NEXT: movw $0, var_825
; X64-NEXT: movzwl var_32, %eax
-; X64-NEXT: movzwl var_901, %esi
-; X64-NEXT: movl %eax, %edi
-; X64-NEXT: xorl %esi, %edi
+; X64-NEXT: movzwl var_901, %edx
; X64-NEXT: movl %eax, %esi
-; X64-NEXT: xorl %edi, %esi
-; X64-NEXT: addl %eax, %esi
-; X64-NEXT: movslq %esi, %r8
-; X64-NEXT: movq %r8, var_826
+; X64-NEXT: xorl %edx, %esi
+; X64-NEXT: movl %eax, %edx
+; X64-NEXT: xorl %esi, %edx
+; X64-NEXT: addl %eax, %edx
+; X64-NEXT: movslq %edx, %rdi
+; X64-NEXT: movq %rdi, var_826
; X64-NEXT: movzwl var_32, %eax
-; X64-NEXT: movl %eax, %r8d
+; X64-NEXT: movl %eax, %edi
; X64-NEXT: movzwl var_901, %eax
; X64-NEXT: xorl $51981, %eax # imm = 0xCB0D
-; X64-NEXT: movslq %eax, %r9
-; X64-NEXT: xorq %rdx, %r9
-; X64-NEXT: movq %r8, %rdx
-; X64-NEXT: xorq %r9, %rdx
-; X64-NEXT: xorq $-1, %rdx
-; X64-NEXT: xorq %rdx, %r8
-; X64-NEXT: movq %r8, %rdx
-; X64-NEXT: orq var_57, %rdx
-; X64-NEXT: orq %rdx, %r8
-; X64-NEXT: movw %r8w, %r10w
+; X64-NEXT: movslq %eax, %r8
+; X64-NEXT: movabsq $-1142377792914660288, %r9 # imm = 0xF02575732E06E440
+; X64-NEXT: xorq %r9, %r8
+; X64-NEXT: movq %rdi, %r9
+; X64-NEXT: xorq %r8, %r9
+; X64-NEXT: xorq $-1, %r9
+; X64-NEXT: xorq %r9, %rdi
+; X64-NEXT: movq %rdi, %r8
+; X64-NEXT: orq var_57, %r8
+; X64-NEXT: orq %r8, %rdi
+; X64-NEXT: movw %di, %r10w
; X64-NEXT: movw %r10w, var_900
; X64-NEXT: cmpq var_28, %rcx
; X64-NEXT: setne %r11b
diff --git a/llvm/test/CodeGen/X86/pr32345.ll b/llvm/test/CodeGen/X86/pr32345.ll
index 4c34d0a..4910f98 100644
--- a/llvm/test/CodeGen/X86/pr32345.ll
+++ b/llvm/test/CodeGen/X86/pr32345.ll
@@ -10,28 +10,28 @@
define void @foo() {
; X640-LABEL: foo:
; X640: # %bb.0: # %bb
-; X640-NEXT: # implicit-def: $rax
-; X640-NEXT: movzwl var_22, %ecx
-; X640-NEXT: movzwl var_27, %edx
-; X640-NEXT: xorl %edx, %ecx
-; X640-NEXT: movzwl var_27, %edx
-; X640-NEXT: xorl %edx, %ecx
-; X640-NEXT: movslq %ecx, %rsi
-; X640-NEXT: movq %rsi, -{{[0-9]+}}(%rsp)
-; X640-NEXT: movzwl var_22, %ecx
-; X640-NEXT: movzwl var_27, %edx
-; X640-NEXT: xorl %edx, %ecx
-; X640-NEXT: movzwl var_27, %edx
-; X640-NEXT: xorl %edx, %ecx
-; X640-NEXT: movslq %ecx, %rsi
+; X640-NEXT: movzwl var_22, %eax
; X640-NEXT: movzwl var_27, %ecx
-; X640-NEXT: subl $16610, %ecx # imm = 0x40E2
-; X640-NEXT: movl %ecx, %ecx
-; X640-NEXT: # kill: def $rcx killed $ecx
+; X640-NEXT: xorl %ecx, %eax
+; X640-NEXT: movzwl var_27, %ecx
+; X640-NEXT: xorl %ecx, %eax
+; X640-NEXT: movslq %eax, %rdx
+; X640-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)
+; X640-NEXT: movzwl var_22, %eax
+; X640-NEXT: movzwl var_27, %ecx
+; X640-NEXT: xorl %ecx, %eax
+; X640-NEXT: movzwl var_27, %ecx
+; X640-NEXT: xorl %ecx, %eax
+; X640-NEXT: movslq %eax, %rdx
+; X640-NEXT: movzwl var_27, %eax
+; X640-NEXT: subl $16610, %eax # imm = 0x40E2
+; X640-NEXT: movl %eax, %eax
+; X640-NEXT: movl %eax, %ecx
; X640-NEXT: # kill: def $cl killed $rcx
-; X640-NEXT: sarq %cl, %rsi
-; X640-NEXT: movb %sil, %cl
-; X640-NEXT: movb %cl, (%rax)
+; X640-NEXT: sarq %cl, %rdx
+; X640-NEXT: movb %dl, %cl
+; X640-NEXT: # implicit-def: $rdx
+; X640-NEXT: movb %cl, (%rdx)
; X640-NEXT: retq
;
; 6860-LABEL: foo:
@@ -49,36 +49,36 @@
; 6860-NEXT: .cfi_offset %esi, -20
; 6860-NEXT: .cfi_offset %edi, -16
; 6860-NEXT: .cfi_offset %ebx, -12
-; 6860-NEXT: # implicit-def: $eax
-; 6860-NEXT: movw var_22, %cx
-; 6860-NEXT: movzwl var_27, %edx
-; 6860-NEXT: movw %dx, %si
-; 6860-NEXT: xorw %si, %cx
-; 6860-NEXT: # implicit-def: $edi
-; 6860-NEXT: movw %cx, %di
-; 6860-NEXT: xorl %edx, %edi
-; 6860-NEXT: movw %di, %cx
-; 6860-NEXT: movzwl %cx, %edx
-; 6860-NEXT: movl %edx, {{[0-9]+}}(%esp)
+; 6860-NEXT: movw var_22, %ax
+; 6860-NEXT: movzwl var_27, %ecx
+; 6860-NEXT: movw %cx, %dx
+; 6860-NEXT: xorw %dx, %ax
+; 6860-NEXT: # implicit-def: $esi
+; 6860-NEXT: movw %ax, %si
+; 6860-NEXT: xorl %ecx, %esi
+; 6860-NEXT: movw %si, %ax
+; 6860-NEXT: movzwl %ax, %ecx
+; 6860-NEXT: movl %ecx, {{[0-9]+}}(%esp)
; 6860-NEXT: movl $0, {{[0-9]+}}(%esp)
-; 6860-NEXT: movw var_22, %cx
-; 6860-NEXT: movzwl var_27, %edx
-; 6860-NEXT: movw %dx, %si
-; 6860-NEXT: xorw %si, %cx
-; 6860-NEXT: # implicit-def: $edi
-; 6860-NEXT: movw %cx, %di
-; 6860-NEXT: xorl %edx, %edi
-; 6860-NEXT: movw %di, %cx
-; 6860-NEXT: movzwl %cx, %edi
-; 6860-NEXT: addl $-16610, %edx # imm = 0xBF1E
-; 6860-NEXT: movb %dl, %bl
-; 6860-NEXT: xorl %edx, %edx
+; 6860-NEXT: movw var_22, %ax
+; 6860-NEXT: movzwl var_27, %ecx
+; 6860-NEXT: movw %cx, %dx
+; 6860-NEXT: xorw %dx, %ax
+; 6860-NEXT: # implicit-def: $esi
+; 6860-NEXT: movw %ax, %si
+; 6860-NEXT: xorl %ecx, %esi
+; 6860-NEXT: movw %si, %ax
+; 6860-NEXT: movzwl %ax, %esi
+; 6860-NEXT: addl $-16610, %ecx # imm = 0xBF1E
+; 6860-NEXT: movb %cl, %bl
+; 6860-NEXT: xorl %ecx, %ecx
+; 6860-NEXT: movl %ecx, {{[0-9]+}}(%esp) # 4-byte Spill
; 6860-NEXT: movb %bl, %cl
-; 6860-NEXT: shrdl %cl, %edx, %edi
+; 6860-NEXT: movl {{[0-9]+}}(%esp), %edi # 4-byte Reload
+; 6860-NEXT: shrdl %cl, %edi, %esi
; 6860-NEXT: testb $32, %bl
-; 6860-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
; 6860-NEXT: movl %edi, {{[0-9]+}}(%esp) # 4-byte Spill
-; 6860-NEXT: movl %edx, {{[0-9]+}}(%esp) # 4-byte Spill
+; 6860-NEXT: movl %esi, {{[0-9]+}}(%esp) # 4-byte Spill
; 6860-NEXT: jne .LBB0_2
; 6860-NEXT: # %bb.1: # %bb
; 6860-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
@@ -86,7 +86,7 @@
; 6860-NEXT: .LBB0_2: # %bb
; 6860-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
; 6860-NEXT: movb %al, %cl
-; 6860-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
+; 6860-NEXT: # implicit-def: $eax
; 6860-NEXT: movb %cl, (%eax)
; 6860-NEXT: leal -12(%ebp), %esp
; 6860-NEXT: popl %esi
diff --git a/llvm/test/CodeGen/X86/pr32484.ll b/llvm/test/CodeGen/X86/pr32484.ll
index de28044..4c3f6c3 100644
--- a/llvm/test/CodeGen/X86/pr32484.ll
+++ b/llvm/test/CodeGen/X86/pr32484.ll
@@ -7,9 +7,9 @@
; CHECK-NEXT: # implicit-def: $rax
; CHECK-NEXT: jmpq *%rax
; CHECK-NEXT: .LBB0_1:
-; CHECK-NEXT: # implicit-def: $rax
; CHECK-NEXT: xorps %xmm0, %xmm0
; CHECK-NEXT: pcmpeqd %xmm1, %xmm1
+; CHECK-NEXT: # implicit-def: $rax
; CHECK-NEXT: movdqu %xmm1, (%rax)
; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp) # 16-byte Spill
; CHECK-NEXT: .LBB0_2:
diff --git a/llvm/test/CodeGen/X86/sink-local-value.ll b/llvm/test/CodeGen/X86/sink-local-value.ll
new file mode 100644
index 0000000..df2feb9
--- /dev/null
+++ b/llvm/test/CodeGen/X86/sink-local-value.ll
@@ -0,0 +1,210 @@
+; RUN: llc -O0 < %s | FileCheck %s
+
+target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
+target triple = "i386-linux-gnu"
+
+; Try some simple cases that show how local value sinking improves line tables.
+
+@sink_across = external global i32
+
+declare void @simple_callee(i32, i32)
+
+define void @simple() !dbg !5 {
+ store i32 44, i32* @sink_across, !dbg !7
+ call void @simple_callee(i32 13, i32 55), !dbg !8
+ ret void, !dbg !9
+}
+
+; CHECK-LABEL: simple:
+; CHECK-NOT: movl $13,
+; CHECK: .loc 1 1 1 prologue_end
+; CHECK: movl $44, sink_across
+; CHECK: .loc 1 2 1
+; CHECK: movl $13,
+; CHECK: movl $55,
+; CHECK: calll simple_callee
+
+declare void @simple_reg_callee(i32 inreg, i32 inreg)
+
+define void @simple_reg() !dbg !10 {
+ store i32 44, i32* @sink_across, !dbg !11
+ call void @simple_reg_callee(i32 inreg 13, i32 inreg 55), !dbg !12
+ ret void, !dbg !13
+}
+
+; CHECK-LABEL: simple_reg:
+; CHECK: .loc 1 4 1 prologue_end
+; CHECK: movl $44, sink_across
+; CHECK: .loc 1 5 1
+; CHECK: movl $13,
+; CHECK: movl $55,
+; CHECK: calll simple_reg_callee
+
+; There are two interesting cases where local values have no uses but are not
+; dead: when the local value is directly used by a phi, and when the local
+; value is used by a no-op cast instruction. In these cases, we get side tables
+; referring to the local value vreg that we need to check.
+
+define i8* @phi_const(i32 %c) !dbg !14 {
+entry:
+ %tobool = icmp eq i32 %c, 0, !dbg !20
+ call void @llvm.dbg.value(metadata i1 %tobool, metadata !16, metadata !DIExpression()), !dbg !20
+ br i1 %tobool, label %if.else, label %if.then, !dbg !21
+
+if.then: ; preds = %entry
+ br label %if.end, !dbg !22
+
+if.else: ; preds = %entry
+ br label %if.end, !dbg !23
+
+if.end: ; preds = %if.else, %if.then
+ %r.0 = phi i8* [ inttoptr (i32 42 to i8*), %if.then ], [ inttoptr (i32 1 to i8*), %if.else ], !dbg !24
+ call void @llvm.dbg.value(metadata i8* %r.0, metadata !18, metadata !DIExpression()), !dbg !24
+ ret i8* %r.0, !dbg !25
+}
+
+; CHECK-LABEL: phi_const:
+; CHECK: # %entry
+; CHECK: cmpl $0,
+; CHECK: # %if.then
+; CHECK: movl $42,
+; CHECK: jmp
+; CHECK: # %if.else
+; CHECK: movl $1,
+; CHECK: # %if.end
+
+define i8* @phi_const_cast(i32 %c) !dbg !26 {
+entry:
+ %tobool = icmp eq i32 %c, 0, !dbg !32
+ call void @llvm.dbg.value(metadata i1 %tobool, metadata !28, metadata !DIExpression()), !dbg !32
+ br i1 %tobool, label %if.else, label %if.then, !dbg !33
+
+if.then: ; preds = %entry
+ %v42 = inttoptr i32 42 to i8*, !dbg !34
+ call void @llvm.dbg.value(metadata i8* %v42, metadata !29, metadata !DIExpression()), !dbg !34
+ br label %if.end, !dbg !35
+
+if.else: ; preds = %entry
+ %v1 = inttoptr i32 1 to i8*, !dbg !36
+ call void @llvm.dbg.value(metadata i8* %v1, metadata !30, metadata !DIExpression()), !dbg !36
+ br label %if.end, !dbg !37
+
+if.end: ; preds = %if.else, %if.then
+ %r.0 = phi i8* [ %v42, %if.then ], [ %v1, %if.else ], !dbg !38
+ call void @llvm.dbg.value(metadata i8* %r.0, metadata !31, metadata !DIExpression()), !dbg !38
+ ret i8* %r.0, !dbg !39
+}
+
+; CHECK-LABEL: phi_const_cast:
+; CHECK: # %entry
+; CHECK: cmpl $0,
+; CHECK: # %if.then
+; CHECK: movl $42, %[[REG:[a-z]+]]
+; CHECK: #DEBUG_VALUE: phi_const_cast:4 <- $[[REG]]
+; CHECK: jmp
+; CHECK: # %if.else
+; CHECK: movl $1, %[[REG:[a-z]+]]
+; CHECK: #DEBUG_VALUE: phi_const_cast:5 <- $[[REG]]
+; CHECK: # %if.end
+
+declare void @may_throw() local_unnamed_addr #1
+
+declare i32 @__gxx_personality_v0(...)
+
+define i32 @invoke_phi() personality i32 (...)* @__gxx_personality_v0 {
+entry:
+ store i32 42, i32* @sink_across
+ invoke void @may_throw()
+ to label %try.cont unwind label %lpad
+
+lpad: ; preds = %entry
+ %0 = landingpad { i8*, i32 }
+ catch i8* null
+ store i32 42, i32* @sink_across
+ br label %try.cont
+
+try.cont: ; preds = %entry, %lpad
+ %r.0 = phi i32 [ 13, %entry ], [ 55, %lpad ]
+ ret i32 %r.0
+}
+
+; The constant materialization should be *after* the stores to sink_across, but
+; before any EH_LABEL.
+
+; CHECK-LABEL: invoke_phi:
+; CHECK: movl $42, sink_across
+; CHECK: movl $13, %{{[a-z]*}}
+; CHECK: .Ltmp{{.*}}:
+; CHECK: calll may_throw
+; CHECK: .Ltmp{{.*}}:
+; CHECK: jmp .LBB{{.*}}
+; CHECK: .LBB{{.*}}: # %lpad
+; CHECK: movl $42, sink_across
+; CHECK: movl $55, %{{[a-z]*}}
+; CHECK: .LBB{{.*}}: # %try.cont
+; CHECK: retl
+
+
+; Function Attrs: nounwind readnone speculatable
+declare void @llvm.dbg.value(metadata, metadata, metadata) #0
+
+attributes #0 = { nounwind readnone speculatable }
+
+!llvm.dbg.cu = !{!0}
+!llvm.debugify = !{!3, !4}
+!llvm.module.flags = !{!52, !53}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
+!1 = !DIFile(filename: "../llvm/test/CodeGen/X86/sink-local-value.ll", directory: "/")
+!2 = !{}
+!3 = !{i32 27}
+!4 = !{i32 8}
+!5 = distinct !DISubprogram(name: "simple", linkageName: "simple", scope: null, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, variables: !2)
+!6 = !DISubroutineType(types: !2)
+!7 = !DILocation(line: 1, column: 1, scope: !5)
+!8 = !DILocation(line: 2, column: 1, scope: !5)
+!9 = !DILocation(line: 3, column: 1, scope: !5)
+!10 = distinct !DISubprogram(name: "simple_reg", linkageName: "simple_reg", scope: null, file: !1, line: 4, type: !6, isLocal: false, isDefinition: true, scopeLine: 4, isOptimized: true, unit: !0, variables: !2)
+!11 = !DILocation(line: 4, column: 1, scope: !10)
+!12 = !DILocation(line: 5, column: 1, scope: !10)
+!13 = !DILocation(line: 6, column: 1, scope: !10)
+!14 = distinct !DISubprogram(name: "phi_const", linkageName: "phi_const", scope: null, file: !1, line: 7, type: !6, isLocal: false, isDefinition: true, scopeLine: 7, isOptimized: true, unit: !0, variables: !15)
+!15 = !{!16, !18}
+!16 = !DILocalVariable(name: "1", scope: !14, file: !1, line: 7, type: !17)
+!17 = !DIBasicType(name: "ty8", size: 8, encoding: DW_ATE_unsigned)
+!18 = !DILocalVariable(name: "2", scope: !14, file: !1, line: 11, type: !19)
+!19 = !DIBasicType(name: "ty32", size: 32, encoding: DW_ATE_unsigned)
+!20 = !DILocation(line: 7, column: 1, scope: !14)
+!21 = !DILocation(line: 8, column: 1, scope: !14)
+!22 = !DILocation(line: 9, column: 1, scope: !14)
+!23 = !DILocation(line: 10, column: 1, scope: !14)
+!24 = !DILocation(line: 11, column: 1, scope: !14)
+!25 = !DILocation(line: 12, column: 1, scope: !14)
+!26 = distinct !DISubprogram(name: "phi_const_cast", linkageName: "phi_const_cast", scope: null, file: !1, line: 13, type: !6, isLocal: false, isDefinition: true, scopeLine: 13, isOptimized: true, unit: !0, variables: !27)
+!27 = !{!28, !29, !30, !31}
+!28 = !DILocalVariable(name: "3", scope: !26, file: !1, line: 13, type: !17)
+!29 = !DILocalVariable(name: "4", scope: !26, file: !1, line: 15, type: !19)
+!30 = !DILocalVariable(name: "5", scope: !26, file: !1, line: 17, type: !19)
+!31 = !DILocalVariable(name: "6", scope: !26, file: !1, line: 19, type: !19)
+!32 = !DILocation(line: 13, column: 1, scope: !26)
+!33 = !DILocation(line: 14, column: 1, scope: !26)
+!34 = !DILocation(line: 15, column: 1, scope: !26)
+!35 = !DILocation(line: 16, column: 1, scope: !26)
+!36 = !DILocation(line: 17, column: 1, scope: !26)
+!37 = !DILocation(line: 18, column: 1, scope: !26)
+!38 = !DILocation(line: 19, column: 1, scope: !26)
+!39 = !DILocation(line: 20, column: 1, scope: !26)
+!40 = distinct !DISubprogram(name: "invoke_phi", linkageName: "invoke_phi", scope: null, file: !1, line: 21, type: !6, isLocal: false, isDefinition: true, scopeLine: 21, isOptimized: true, unit: !0, variables: !41)
+!41 = !{!42, !44}
+!42 = !DILocalVariable(name: "7", scope: !40, file: !1, line: 23, type: !43)
+!43 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
+!44 = !DILocalVariable(name: "8", scope: !40, file: !1, line: 26, type: !19)
+!45 = !DILocation(line: 21, column: 1, scope: !40)
+!46 = !DILocation(line: 22, column: 1, scope: !40)
+!47 = !DILocation(line: 23, column: 1, scope: !40)
+!48 = !DILocation(line: 24, column: 1, scope: !40)
+!49 = !DILocation(line: 25, column: 1, scope: !40)
+!50 = !DILocation(line: 26, column: 1, scope: !40)
+!51 = !DILocation(line: 27, column: 1, scope: !40)
+!52 = !{i32 2, !"Dwarf Version", i32 4}
+!53 = !{i32 2, !"Debug Info Version", i32 3}
diff --git a/llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll b/llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
index 17e8e51..eeacb79 100644
--- a/llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
+++ b/llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
@@ -1485,8 +1485,8 @@
;
; X64-LABEL: test_mm_setcsr:
; X64: # %bb.0:
-; X64-NEXT: leaq -{{[0-9]+}}(%rsp), %rax
; X64-NEXT: movl %edi, -{{[0-9]+}}(%rsp)
+; X64-NEXT: leaq -{{[0-9]+}}(%rsp), %rax
; X64-NEXT: ldmxcsr (%rax)
; X64-NEXT: retq
%st = alloca i32, align 4
diff --git a/llvm/test/CodeGen/X86/win32_sret.ll b/llvm/test/CodeGen/X86/win32_sret.ll
index 0a5d62c..70fa22b 100644
--- a/llvm/test/CodeGen/X86/win32_sret.ll
+++ b/llvm/test/CodeGen/X86/win32_sret.ll
@@ -137,9 +137,9 @@
; Load the address of the result and put it onto stack
; The this pointer goes to ECX.
; (through %ecx in the -O0 build).
-; WIN32: leal {{[0-9]*}}(%esp), %e{{[a-d]}}x
-; WIN32: {{leal [1-9]+\(%esp\)|movl %esp}}, %ecx
-; WIN32: {{pushl %e[a-d]x|movl %e[a-d]x, \(%esp\)}}
+; WIN32-DAG: leal {{[0-9]*}}(%esp), %e{{[a-d]}}x
+; WIN32-DAG: {{leal [1-9]+\(%esp\)|movl %esp}}, %ecx
+; WIN32-DAG: {{pushl %e[a-d]x|movl %e[a-d]x, \(%esp\)}}
; WIN32-NEXT: calll "?foo@C5@@QAE?AUS5@@XZ"
; WIN32: retl
ret void
@@ -154,21 +154,21 @@
; LINUX-LABEL: test6_f:
; The %x argument is moved to %ecx. It will be the this pointer.
-; WIN32: movl {{16|20}}(%esp), %ecx
+; WIN32-DAG: movl {{16|20}}(%esp), %ecx
; The sret pointer is (%esp)
-; WIN32: {{leal 4\(%esp\)|movl %esp}}, %eax
-; WIN32-NEXT: {{pushl %eax|movl %eax, \(%esp\)}}
+; WIN32-DAG: {{leal 4\(%esp\)|movl %esp}}, %eax
+; WIN32-DAG: {{pushl %eax|movl %eax, \(%esp\)}}
; The sret pointer is %ecx
; The %x argument is moved to (%esp). It will be the this pointer.
-; MINGW_X86: {{leal 4\(%esp\)|movl %esp}}, %ecx
-; MINGW_X86-NEXT: {{pushl 16\(%esp\)|movl %eax, \(%esp\)}}
+; MINGW_X86-DAG: {{leal 4\(%esp\)|movl %esp}}, %ecx
+; MINGW_X86-DAG: {{pushl 16\(%esp\)|movl %eax, \(%esp\)}}
; MINGW_X86-NEXT: calll _test6_g
-; CYGWIN: {{leal 4\(%esp\)|movl %esp}}, %ecx
-; CYGWIN-NEXT: {{pushl 16\(%esp\)|movl %eax, \(%esp\)}}
+; CYGWIN-DAG: {{leal 4\(%esp\)|movl %esp}}, %ecx
+; CYGWIN-DAG: {{pushl 16\(%esp\)|movl %eax, \(%esp\)}}
; CYGWIN-NEXT: calll _test6_g
%tmp = alloca %struct.test6, align 4