[X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required
This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if its save to make 512-bit vectors illegal when the preference is for 256-bit vectors.
For code that has no vectors in it originally and only get vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harm our AVX2 codegen this patch tries to make the legalizer behavior similar.
This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.
I've qualified every place I could find in X86ISelLowering.cpp and added tests cases for many of them with 2 different values for the attribute to see the codegen differences.
We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.
Differential Revision: https://reviews.llvm.org/D42724
llvm-svn: 324834
diff --git a/llvm/lib/Target/X86/X86Subtarget.h b/llvm/lib/Target/X86/X86Subtarget.h
index b8adb63..4a23f4b 100644
--- a/llvm/lib/Target/X86/X86Subtarget.h
+++ b/llvm/lib/Target/X86/X86Subtarget.h
@@ -407,6 +407,9 @@
/// features.
unsigned PreferVectorWidth;
+ /// Required vector width from function attribute.
+ unsigned RequiredVectorWidth;
+
/// True if compiling for 64-bit, false for 16-bit or 32-bit.
bool In64BitMode;
@@ -433,7 +436,8 @@
///
X86Subtarget(const Triple &TT, StringRef CPU, StringRef FS,
const X86TargetMachine &TM, unsigned StackAlignOverride,
- unsigned PreferVectorWidthOverride);
+ unsigned PreferVectorWidthOverride,
+ unsigned RequiredVectorWidth);
const X86TargetLowering *getTargetLowering() const override {
return &TLInfo;
@@ -622,6 +626,7 @@
bool useRetpolineExternalThunk() const { return UseRetpolineExternalThunk; }
unsigned getPreferVectorWidth() const { return PreferVectorWidth; }
+ unsigned getRequiredVectorWidth() const { return RequiredVectorWidth; }
// Helper functions to determine when we should allow widening to 512-bit
// during codegen.
@@ -634,6 +639,16 @@
return hasBWI() && canExtendTo512DQ();
}
+ // If there are no 512-bit vectors and we prefer not to use 512-bit registers,
+ // disable them in the legalizer.
+ bool useAVX512Regs() const {
+ return hasAVX512() && (canExtendTo512DQ() || RequiredVectorWidth > 256);
+ }
+
+ bool useBWIRegs() const {
+ return hasBWI() && useAVX512Regs();
+ }
+
bool isXRaySupported() const override { return is64Bit(); }
X86ProcFamilyEnum getProcFamily() const { return X86ProcFamily; }