arm64: xchg: Implement cmpxchg_double
The arm64 architecture has the ability to exclusively load and store
a pair of registers from an address (ldxp/stxp). Also the SLUB can take
advantage of a cmpxchg_double implementation to avoid taking some
locks.
This patch provides an implementation of cmpxchg_double for 64-bit
pairs, and activates the logic required for the SLUB to use these
functions (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE).
Also definitions of this_cpu_cmpxchg_8 and this_cpu_cmpxchg_double_8
are wired up to cmpxchg_local and cmpxchg_double_local (rather than the
stock implementations that perform non-atomic operations with
interrupts disabled) as they are used by the SLUB.
On a Juno platform running on only the A57s I get quite a noticeable
performance improvement with 5 runs of hackbench on v3.17:
Baseline | With Patch
-----------------+-----------
Mean 119.2312 | 106.1782
StdDev 0.4919 | 0.4494
(times taken to complete `./hackbench 100 process 1000', in seconds)
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2c3c2ca..1e0e467 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -34,6 +34,7 @@
select GENERIC_TIME_VSYSCALL
select HANDLE_DOMAIN_IRQ
select HARDIRQS_SW_RESEND
+ select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB
@@ -41,6 +42,7 @@
select HAVE_BPF_JIT
select HAVE_C_RECORDMCOUNT
select HAVE_CC_STACKPROTECTOR
+ select HAVE_CMPXCHG_DOUBLE
select HAVE_DEBUG_BUGVERBOSE
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_API_DEBUG