[x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)

This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branch and compares for loads and logic ops. 
This makes the code smaller and hopefully faster in most cases.

The 24-byte test shows an interesting construct: we load the trailing scalar 
elements into vector registers and generate the same pcmpeq+movmsk code that 
we expected for a pair of full vector elements (see the 32- and 64-byte tests).

Differential Revision: https://reviews.llvm.org/D41714

llvm-svn: 321934
5 files changed
tree: b3339387d653e097d5942262ae56c44ece95f984
  1. clang/
  2. clang-tools-extra/
  3. compiler-rt/
  4. debuginfo-tests/
  5. libclc/
  6. libcxx/
  7. libcxxabi/
  8. libunwind/
  9. lld/
  10. lldb/
  11. llgo/
  12. llvm/
  13. openmp/
  14. parallel-libs/
  15. polly/
  16. README.md
README.md

Low Level Virtual Machine (LLVM)

This directory and its subdirectories contain source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and runtime environments.