Use movups to lower memcpy and memset even if it's not fast (like corei7).
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122955 91177308-0d34-0410-b5e6-96231b3b80d8

commit: 461f1fc359dff438dad25e809499845b10a3d032
author: Evan Cheng <evan.cheng@apple.com> Thu Jan 06 07:58:36 2011 +0000
committer: Evan Cheng <evan.cheng@apple.com> Thu Jan 06 07:58:36 2011 +0000
tree: 143a2a682ffdd84409d6bd1673e22630d42d565e
parent: cce240d26bbf1c2bec9cfff4838d8d807b215586
diff --git a/test/CodeGen/X86/memset64-on-x86-32.ll b/test/CodeGen/X86/memset64-on-x86-32.ll
index 3f069b4..5a0e893 100644
--- a/test/CodeGen/X86/memset64-on-x86-32.ll
+++ b/test/CodeGen/X86/memset64-on-x86-32.ll

@@ -1,6 +1,5 @@
 ; RUN: llc < %s -mtriple=i386-apple-darwin   -mcpu=nehalem | grep movups | count 5
-; RUN: llc < %s -mtriple=i386-apple-darwin   -mcpu=core2   | grep movl   | count 20
-; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core2   | grep movq   | count 10
+; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core2   | grep movups   | count 5
 
 define void @bork() nounwind {
 entry:
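
The instruction counts in the RUN lines follow from simple arithmetic on store widths. The sketch below is only an illustration, not part of the commit: it assumes the test zeroes 80 bytes in total (a size inferred from the counts, since the body of @bork is not shown here) and that the lowering uses a single store width throughout. The helper name `stores_needed` is hypothetical.

```python
def stores_needed(total_bytes: int, store_width: int) -> int:
    """Number of fixed-width stores needed to cover total_bytes (ceiling division)."""
    return -(-total_bytes // store_width)

# Bytes written by one store of each instruction the RUN lines grep for.
WIDTHS = {"movl": 4, "movq": 8, "movups": 16}

# Assumption: 80 bytes, inferred from the counts (5 * 16 == 10 * 8 == 20 * 4).
TOTAL_BYTES = 80

counts = {name: stores_needed(TOTAL_BYTES, width) for name, width in WIDTHS.items()}
print(counts)
```

Under that assumption, one movups replaces a pair of movq or a quad of movl, which is why both remaining RUN lines expect a count of 5 after this change.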