[X86] Add two combine rules to simplify DAG nodes introduced during type legalization when promoting nodes with illegal vector types.
This patch teaches the backend how to simplify/canonicalize the DAG node
sequences that type legalization normally introduces when promoting DAG
nodes with an illegal vector type.
This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
        (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
2) fold (BINOP (shuffle A, Undef, <Mask>), (shuffle B, Undef, <Mask>)) ->
        (shuffle (BINOP A, B), Undef, <Mask>)
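As an illustration of why rule 2 preserves semantics, here is a minimal Python sketch (not LLVM code). It models vectors as lists and undef lanes as None; the helper names `shuffle` and `binop` and the sample values are assumptions made up for this example. It checks that shuffling both operands with the same mask and then applying a lane-wise operation gives the same lanes as applying the operation first and shuffling once.

```python
from operator import add

def shuffle(vec, mask):
    # Single-input shuffle (second operand is Undef): a negative mask
    # index produces an undef lane, modelled here as None.
    return [None if m < 0 else vec[m] for m in mask]

def binop(op, a, b):
    # Lane-wise binary operation; any undef input lane yields undef.
    return [None if x is None or y is None else op(x, y)
            for x, y in zip(a, b)]

A = [1, 2, 3, 4]
B = [10, 20, 30, 40]
mask = [0, 2, -1, -1]

# Left-hand side of rule 2: two shuffles feeding one BINOP.
before = binop(add, shuffle(A, mask), shuffle(B, mask))
# Right-hand side of rule 2: one BINOP feeding one shuffle.
after = shuffle(binop(add, A, B), mask)

assert before == after
```

The rewritten form needs one shuffle instead of two, which is the saving the rule is after.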
Both rules are only triggered on the type-legalized DAG.
Rule 1 is a target-specific combine rule that attempts to sink a bitcast
into the operands of a binary operation.
Rule 2 is a target-independent rule that attempts to move a shuffle
immediately after a binary operation.
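The effect of rule 1 can be sketched in Python (not LLVM code) for the case the tests below exercise, where a 64-bit add becomes a 32-bit add. The sketch assumes a little-endian lane layout, an integer add as the BINOP, and a mask that selects only the low half of each wide lane; the helper names and sample values are hypothetical. Under those assumptions the selected lanes match whether the add happens before or after the bitcast, although the unselected lanes can differ.

```python
M32 = (1 << 32) - 1

def bitcast_v2i64_to_v4i32(vec):
    # Little-endian reinterpretation: each i64 lane becomes (low, high) i32.
    out = []
    for x in vec:
        out.extend([x & M32, (x >> 32) & M32])
    return out

def add_lanes(a, b, bits):
    # Lane-wise wrapping add at the given lane width.
    m = (1 << bits) - 1
    return [(x + y) & m for x, y in zip(a, b)]

def shuffle(vec, mask):
    # Negative mask index means an undef lane, modelled as None.
    return [None if m < 0 else vec[m] for m in mask]

A = [0x00000001FFFFFFFF, 0x0000000080000000]
B = [0x0000000000000001, 0x0000000080000000]
mask = [0, 2, -1, -1]  # low i32 half of each i64 lane

# Before rule 1: add at 64 bits, bitcast, then shuffle.
wide_full = bitcast_v2i64_to_v4i32(add_lanes(A, B, 64))
# After rule 1: bitcast the operands, add at 32 bits, then shuffle.
narrow_full = add_lanes(bitcast_v2i64_to_v4i32(A),
                        bitcast_v2i64_to_v4i32(B), 32)

assert shuffle(wide_full, mask) == shuffle(narrow_full, mask)
# The high halves differ because the 32-bit add drops the carry,
# so the equivalence depends on the mask ignoring those lanes.
assert wide_full != narrow_full
```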
llvm-svn: 209930
diff --git a/llvm/test/CodeGen/X86/lower-bitcast.ll b/llvm/test/CodeGen/X86/lower-bitcast.ll
index b9b29a5..769831e 100644
--- a/llvm/test/CodeGen/X86/lower-bitcast.ll
+++ b/llvm/test/CodeGen/X86/lower-bitcast.ll
@@ -14,7 +14,7 @@
; CHECK-LABEL: test1
; CHECK-NOT: movsd
; CHECK: pshufd
-; CHECK-NEXT: paddq
+; CHECK-NEXT: paddd
; CHECK-NEXT: pshufd
; CHECK-NEXT: ret
@@ -26,16 +26,9 @@
%3 = bitcast <2 x i32> %add to double
ret double %3
}
-; FIXME: Ideally we should be able to fold the entire body of @test2 into a
-; single 'paddd %xmm1, %xmm0' instruction. At the moment we produce the
-; sequence pshufd+pshufd+paddq+pshufd.
-
; CHECK-LABEL: test2
; CHECK-NOT: movsd
-; CHECK: pshufd
-; CHECK-NEXT: pshufd
-; CHECK-NEXT: paddq
-; CHECK-NEXT: pshufd
+; CHECK: paddd
; CHECK-NEXT: ret
@@ -91,7 +84,7 @@
; CHECK-LABEL: test6
; CHECK-NOT: movsd
; CHECK: punpcklwd
-; CHECK-NEXT: paddd
+; CHECK-NEXT: paddw
; CHECK-NEXT: pshufb
; CHECK-NEXT: ret
@@ -103,16 +96,10 @@
%3 = bitcast <4 x i16> %add to double
ret double %3
}
-; FIXME: Ideally we should be able to fold the entire body of @test7 into a
-; single 'paddw %xmm1, %xmm0' instruction. At the moment we produce the
-; sequence pshufd+pshufd+paddd+pshufd.
-
; CHECK-LABEL: test7
; CHECK-NOT: movsd
-; CHECK: punpcklwd
-; CHECK-NEXT: punpcklwd
-; CHECK-NEXT: paddd
-; CHECK-NEXT: pshufb
+; CHECK-NOT: punpcklwd
+; CHECK: paddw
; CHECK-NEXT: ret
@@ -129,7 +116,7 @@
; CHECK-LABEL: test8
; CHECK-NOT: movsd
; CHECK: punpcklbw
-; CHECK-NEXT: paddw
+; CHECK-NEXT: paddb
; CHECK-NEXT: pshufb
; CHECK-NEXT: ret
@@ -141,15 +128,9 @@
%3 = bitcast <8 x i8> %add to double
ret double %3
}
-; FIXME: Ideally we should be able to fold the entire body of @test9 into a
-; single 'paddb %xmm1, %xmm0' instruction. At the moment we produce the
-; sequence pshufd+pshufd+paddw+pshufd.
-
; CHECK-LABEL: test9
; CHECK-NOT: movsd
-; CHECK: punpcklbw
-; CHECK-NEXT: punpcklbw
-; CHECK-NEXT: paddw
-; CHECK-NEXT: pshufb
+; CHECK-NOT: punpcklbw
+; CHECK: paddb
; CHECK-NEXT: ret