[X86] Fix bug in vectorcall calling convention

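The original implementation does not correctly handle __m256 and __m512
arguments passed by reference through the stack. The shadow-stack check
compared the assigned register only against XMM4 and XMM5, so it never
matched the wider YMM4/YMM5 and ZMM4/ZMM5 registers. This patch uses
TargetRegisterInfo::regsOverlap for the check instead.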
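
As a rough illustration of why an exact comparison against XMM4/XMM5
misses wider vector arguments, here is a standalone sketch. It does not
use LLVM: VecWidth, VecReg, needsShadowSlotOld, needsShadowSlotNew and
the local regsOverlap are hypothetical stand-ins, and the model assumes
only that XMMn, YMMn and ZMMn alias the same physical register.

```cpp
#include <cassert>

// Hypothetical toy model, not LLVM code: a vector register is described
// by its width class and its index. XMMn, YMMn and ZMMn share storage.
enum class VecWidth { XMM, YMM, ZMM };

struct VecReg {
  VecWidth Width;
  unsigned Index;
};

// Stand-in for a register-overlap query in this toy model: two vector
// registers overlap when they have the same index, whatever their width.
bool regsOverlap(VecReg A, VecReg B) { return A.Index == B.Index; }

// Models the old check: exact match against XMM4 or XMM5 only.
bool needsShadowSlotOld(VecReg R) {
  return R.Width == VecWidth::XMM && (R.Index == 4 || R.Index == 5);
}

// Models the new check: anything overlapping XMM4 or XMM5 qualifies.
bool needsShadowSlotNew(VecReg R) {
  const VecReg XMM4{VecWidth::XMM, 4};
  const VecReg XMM5{VecWidth::XMM, 5};
  return regsOverlap(R, XMM4) || regsOverlap(R, XMM5);
}
```

In this model a register such as YMM4 or ZMM5 is rejected by the old
check and accepted by the new one, while XMM3 is rejected by both.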

Patch by Wei Xiao!

Differential Revision: https://reviews.llvm.org/D57643

llvm-svn: 354921
diff --git a/llvm/lib/Target/X86/X86CallingConv.cpp b/llvm/lib/Target/X86/X86CallingConv.cpp
index 9be1147..aee344a 100644
--- a/llvm/lib/Target/X86/X86CallingConv.cpp
+++ b/llvm/lib/Target/X86/X86CallingConv.cpp
@@ -162,7 +162,10 @@
       // created on top of the basic 32 bytes of win64.
       // It can happen if the fifth or sixth argument is vector type or HVA.
       // At that case for each argument a shadow stack of 8 bytes is allocated.
-      if (Reg == X86::XMM4 || Reg == X86::XMM5)
+      const TargetRegisterInfo *TRI =
+          State.getMachineFunction().getSubtarget().getRegisterInfo();
+      if (TRI->regsOverlap(Reg, X86::XMM4) ||
+          TRI->regsOverlap(Reg, X86::XMM5))
         State.AllocateStack(8, 8);
 
       if (!ArgFlags.isHva()) {