[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes

There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)

llvm-svn: 305802
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 1d58ccc..f13933e 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -2232,6 +2232,12 @@
   return (CallerBits & CalleeBits) == CalleeBits;
 }
 
+bool X86TTIImpl::expandMemCmp(Instruction *I, unsigned &MaxLoadSize) {
+  // TODO: We can increase these based on available vector ops.
+  MaxLoadSize = ST->is64Bit() ? 8 : 4;
+  return true;
+}
+
 bool X86TTIImpl::enableInterleavedAccessVectorization() {
   // TODO: We expect this to be beneficial regardless of arch,
   // but there are currently some unexplained performance artifacts on Atom.