[X86] Improve mul w/ overflow codegen, to MUL8+SETO.
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.
This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.
i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.
Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.
A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):
// FIXME: Used for 8-bit mul, ignore result upper 8 bits.
// This probably ought to be moved to a def : Pat<> if the
// syntax can be accepted.
[(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]
Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.
Before this change:
movsbl %sil, %eax
movsbl %dil, %ecx
imull %eax, %ecx
movb %cl, %al
sarb $7, %al
movzbl %al, %eax
movzbl %ch, %esi
cmpl %eax, %esi
setne %al
After:
movb %dil, %al
imulb %sil
seto %al
Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.
Differential Revision: http://reviews.llvm.org/D5809
llvm-svn: 220516
diff --git a/llvm/test/CodeGen/X86/xaluo.ll b/llvm/test/CodeGen/X86/xaluo.ll
index 6a98037..54a4d6a 100644
--- a/llvm/test/CodeGen/X86/xaluo.ll
+++ b/llvm/test/CodeGen/X86/xaluo.ll
@@ -123,12 +123,9 @@
; Check boundary conditions for large immediates.
define zeroext i1 @saddo.i64imm2(i64 %v1, i64* %res) {
entry:
-; SDAG-LABEL: saddo.i64imm2
-; SDAG: addq $-2147483648, %rdi
-; SDAG-NEXT: seto %al
-; FAST-LABEL: saddo.i64imm2
-; FAST: addq $-2147483648, %rdi
-; FAST-NEXT: seto %al
+; CHECK-LABEL: saddo.i64imm2
+; CHECK: addq $-2147483648, %rdi
+; CHECK-NEXT: seto %al
%t = call {i64, i1} @llvm.sadd.with.overflow.i64(i64 %v1, i64 -2147483648)
%val = extractvalue {i64, i1} %t, 0
%obit = extractvalue {i64, i1} %t, 1
@@ -297,10 +294,10 @@
; SMULO
define zeroext i1 @smulo.i8(i8 %v1, i8 %v2, i8* %res) {
entry:
-; FAST-LABEL: smulo.i8
-; FAST: movb %dil, %al
-; FAST-NEXT: imulb %sil
-; FAST-NEXT: seto %cl
+; CHECK-LABEL: smulo.i8
+; CHECK: movb %dil, %al
+; CHECK-NEXT: imulb %sil
+; CHECK-NEXT: seto %cl
%t = call {i8, i1} @llvm.smul.with.overflow.i8(i8 %v1, i8 %v2)
%val = extractvalue {i8, i1} %t, 0
%obit = extractvalue {i8, i1} %t, 1
@@ -347,10 +344,10 @@
; UMULO
define zeroext i1 @umulo.i8(i8 %v1, i8 %v2, i8* %res) {
entry:
-; FAST-LABEL: umulo.i8
-; FAST: movb %dil, %al
-; FAST-NEXT: mulb %sil
-; FAST-NEXT: seto %cl
+; CHECK-LABEL: umulo.i8
+; CHECK: movb %dil, %al
+; CHECK-NEXT: mulb %sil
+; CHECK-NEXT: seto %cl
%t = call {i8, i1} @llvm.umul.with.overflow.i8(i8 %v1, i8 %v2)
%val = extractvalue {i8, i1} %t, 0
%obit = extractvalue {i8, i1} %t, 1