[X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector.
This is unnecessary on CPUs that support AVX512VL, such as SKX. We can just emit a 128-bit masked load/store here regardless of the target; the backend will widen it to 512 bits on KNL CPUs.
Fixes the frontend portion of PR37386. The backend still needs to be taught to optimize the new sequences well.
llvm-svn: 331958
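For context, the helper these cases call looked roughly like this around this revision (a sketch, not verbatim source; getMaskVecValue is the existing CGBuiltin.cpp helper that widens the scalar mask to an <N x i1> vector). The third argument is the alignment in bytes of the emitted llvm.masked.load, so passing 1 encodes that masked VMOVSS/VMOVSD have no alignment requirement:

  static Value *EmitX86MaskedLoad(CodeGenFunction &CGF, ArrayRef<Value *> Ops,
                                  unsigned Align) {
    // Cast the pointer operand to a pointer to the result vector type.
    Value *Ptr = CGF.Builder.CreateBitCast(
        Ops[0], llvm::PointerType::getUnqual(Ops[1]->getType()));

    // Build the <N x i1> mask from the scalar i8 mask operand.
    Value *MaskVec = getMaskVecValue(
        CGF, Ops[2], Ops[1]->getType()->getVectorNumElements());

    // Emit llvm.masked.load with the requested alignment and pass-through.
    return CGF.Builder.CreateMaskedLoad(Ptr, Align, MaskVec, Ops[1]);
  }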
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index dfb9370..4d3bbd6 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -8735,7 +8735,7 @@
   case X86::BI__builtin_ia32_storess128_mask:
   case X86::BI__builtin_ia32_storesd128_mask: {
-    return EmitX86MaskedStore(*this, Ops, 16);
+    return EmitX86MaskedStore(*this, Ops, 1);
   }
   case X86::BI__builtin_ia32_vpopcntb_128:
   case X86::BI__builtin_ia32_vpopcntd_128:
@@ -8819,7 +8819,7 @@
   case X86::BI__builtin_ia32_loadss128_mask:
   case X86::BI__builtin_ia32_loadsd128_mask:
-    return EmitX86MaskedLoad(*this, Ops, 16);
+    return EmitX86MaskedLoad(*this, Ops, 1);
   case X86::BI__builtin_ia32_loadaps128_mask:
   case X86::BI__builtin_ia32_loadaps256_mask:
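For reference, a minimal usage sketch of the user-facing intrinsics that funnel into the builtins above (hypothetical example; the intrinsics themselves are the real <immintrin.h> ones). With this patch the frontend emits a one-active-lane 128-bit llvm.masked.load/llvm.masked.store here instead of widening through a 512-bit vector:

  // Build with -mavx512f. On AVX512VL CPUs like SKX this can select a 128-bit
  // masked VMOVSS directly; on KNL the backend widens the operation to 512 bits.
  #include <immintrin.h>

  float load_first_or_passthru(const float *p, __mmask8 k, __m128 passthru) {
    // Lowers through __builtin_ia32_loadss128_mask; only bit 0 of k is used.
    __m128 v = _mm_mask_load_ss(passthru, k, p);
    return _mm_cvtss_f32(v);
  }

  void store_first_if_enabled(float *p, __mmask8 k, __m128 v) {
    // Lowers through __builtin_ia32_storess128_mask.
    _mm_mask_store_ss(p, k, v);
  }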