add ERMS (enhanced rep mov/sto) SkOpts slice
Intel's got two CPUID bits indicating the speed of rep mov/sto
(memcpy/memset):
- ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
- FSRM, Fast Short Rep Mov, newer, small copies are fast?
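For reference, a minimal sketch of reading those two bits, assuming GCC or Clang and their <cpuid.h> helper. Both flags live in CPUID leaf 7, subleaf 0 (ERMS in EBX bit 9, FSRM in EDX bit 4). The names RepMovFeatures, decode_leaf7, and detect_rep_mov_features are made up for illustration and are not the ones used in this CL.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// Hypothetical holder for the two feature flags.
struct RepMovFeatures {
    bool erms;  // Enhanced Rep Mov/Sto
    bool fsrm;  // Fast Short Rep Mov
};

// Decode the flags from the EBX/EDX values of CPUID leaf 7, subleaf 0.
inline RepMovFeatures decode_leaf7(uint32_t ebx, uint32_t edx) {
    RepMovFeatures f;
    f.erms = (ebx & (1u << 9)) != 0;  // EBX bit 9
    f.fsrm = (edx & (1u << 4)) != 0;  // EDX bit 4
    return f;
}

// Query the running CPU; reports both flags as false on non-x86 targets
// or when leaf 7 is not available.
inline RepMovFeatures detect_rep_mov_features() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return decode_leaf7(ebx, edx);
    }
#endif
    return RepMovFeatures{false, false};
}
```

Keeping the decode step separate from the CPUID query makes the bit positions checkable without depending on the host CPU.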
ERMS has been around a long time on Intel, but is relatively recent on
Ryzen, and FSRM is new across the board. The startup cost for
ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
previous SSE/AVX routines when N is small.
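The cutover can be sketched roughly as below, assuming GCC/Clang inline asm on x86-64. This is an illustration, not the code in this CL: kSmallCount is a made-up placeholder for the tuned cutoff, and a plain loop stands in for the previous SSE/AVX routines.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cutoff in elements; a real value should come from benchmarks.
constexpr size_t kSmallCount = 256;

// Stand-in for the existing SSE/AVX memset32 routine.
inline void memset32_loop(uint32_t* dst, uint32_t value, size_t count) {
    for (size_t i = 0; i < count; i++) {
        dst[i] = value;
    }
}

inline void memset32_erms(uint32_t* dst, uint32_t value, size_t count) {
    // Small N: skip the rep sto startup cost and use the older routine.
    if (count < kSmallCount) {
        memset32_loop(dst, value, count);
        return;
    }
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    // rep stosl writes EAX to [RDI] RCX times, advancing RDI by 4 each step.
    __asm__ volatile("rep stosl"
                     : "+D"(dst), "+c"(count)
                     : "a"(value)
                     : "memory");
#else
    // No rep sto on this target; fall back to the loop.
    memset32_loop(dst, value, count);
#endif
}
```

A caller would pick memset32_erms only when the ERMS bit is set, so the small-N branch is what guards ERMS-but-not-FSRM hardware.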
I've left the memset benchmarks in, as I found them most useful when
tuning the small/large cutoff in this CL.
Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
6 files changed