[InstCombine] convert bitcast-shuffle to vector trunc
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract, and the existing instcombine
vector transforms then create a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires at
least this pass sequence: -instcombine -early-cse -instcombine.
Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.
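To make the pattern concrete, here is a little-endian IR sketch of the
shuffle-of-bitcast form and the vector trunc it is folded to (%x is a
placeholder value, not taken from the actual tests):

  %bc = bitcast <4 x i16> %x to <8 x i8>
  %r = shufflevector <8 x i8> %bc, <8 x i8> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  -->
  %r = trunc <4 x i16> %x to <4 x i8>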
Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.
This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.
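For reference, the insert/extract chain mentioned above is roughly the
following scalarized form of the same trunc (a sketch only; value names
are placeholders):

  %e0 = extractelement <4 x i16> %x, i32 0
  %t0 = trunc i16 %e0 to i8
  %v0 = insertelement <4 x i8> undef, i8 %t0, i32 0
  %e1 = extractelement <4 x i16> %x, i32 1
  %t1 = trunc i16 %e1 to i8
  %v1 = insertelement <4 x i8> %v0, i8 %t1, i32 1
  %e2 = extractelement <4 x i16> %x, i32 2
  %t2 = trunc i16 %e2 to i8
  %v2 = insertelement <4 x i8> %v1, i8 %t2, i32 2
  %e3 = extractelement <4 x i16> %x, i32 3
  %t3 = trunc i16 %e3 to i8
  %v3 = insertelement <4 x i8> %v2, i8 %t3, i32 3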
Differential Revision: https://reviews.llvm.org/D77299
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index 1e95f23..da5a910 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -1657,6 +1657,47 @@
return NewBO;
}
+/// Convert a narrowing shuffle of a bitcasted vector into a vector truncate.
+/// Example (little endian):
+/// shuf (bitcast <4 x i16> X to <8 x i8>), <0, 2, 4, 6> --> trunc X to <4 x i8>
+static Instruction *foldTruncShuffle(ShuffleVectorInst &Shuf,
+ bool IsBigEndian) {
+ // This must be a bitcasted shuffle of 1 vector integer operand.
+ Type *DestType = Shuf.getType();
+ Value *X;
+ if (!match(Shuf.getOperand(0), m_BitCast(m_Value(X))) ||
+ !match(Shuf.getOperand(1), m_Undef()) || !DestType->isIntOrIntVectorTy())
+ return nullptr;
+
+ // The source type must have the same number of elements as the shuffle,
+ // and the source element type must be larger than the shuffle element type.
+ Type *SrcType = X->getType();
+ if (!SrcType->isVectorTy() || !SrcType->isIntOrIntVectorTy() ||
+ SrcType->getVectorNumElements() != DestType->getVectorNumElements() ||
+ SrcType->getScalarSizeInBits() % DestType->getScalarSizeInBits() != 0)
+ return nullptr;
+
+ assert(Shuf.changesLength() && !Shuf.increasesLength() &&
+ "Expected a shuffle that decreases length");
+
+ // Last, check that the mask chooses the correct low bits for each narrow
+ // element in the result.
+ uint64_t TruncRatio =
+ SrcType->getScalarSizeInBits() / DestType->getScalarSizeInBits();
+ ArrayRef<int> Mask = Shuf.getShuffleMask();
+ for (unsigned i = 0, e = Mask.size(); i != e; ++i) {
+ if (Mask[i] == UndefMaskElem)
+ continue;
+ uint64_t LSBIndex = IsBigEndian ? (i + 1) * TruncRatio - 1 : i * TruncRatio;
+ assert(LSBIndex <= std::numeric_limits<int32_t>::max() &&
+ "Overflowed 32-bits");
+ if (Mask[i] != (int)LSBIndex)
+ return nullptr;
+ }
+
+ return new TruncInst(X, DestType);
+}
+
/// Match a shuffle-select-shuffle pattern where the shuffles are widening and
/// narrowing (concatenating with undef and extracting back to the original
/// length). This allows replacing the wide select with a narrow select.
@@ -1951,6 +1992,9 @@
if (Instruction *I = foldSelectShuffle(SVI, Builder, DL))
return I;
+ if (Instruction *I = foldTruncShuffle(SVI, DL.isBigEndian()))
+ return I;
+
if (Instruction *I = narrowVectorSelect(SVI, Builder))
return I;