Change the TT_FAST hash function for from "insn_address >> 2" to
"insn_address >> 1".  The former is appropriate for ARM code, where
all insns are 4-sized and 4-aligned, but not for Thumb code, where the
minimum size and alignment is 2.  The old scheme happened to work for
Thumb (indeed, any hash function would), but caused huge amounts of
conflict misses in the fast cache for some programs.

The change has been observed to reduce conflict misses by up to 100
times, and in some cases, improves performance significantly for Thumb
code.  Performance of ARM code is unchanged or possibly a bit worse.



git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11716 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/coregrind/pub_core_transtab_asm.h b/coregrind/pub_core_transtab_asm.h
index 76e48db..f769247 100644
--- a/coregrind/pub_core_transtab_asm.h
+++ b/coregrind/pub_core_transtab_asm.h
@@ -54,12 +54,16 @@
 
 /* This macro isn't usable in asm land; nevertheless this seems
    like a good place to put it. */
+
 #if defined(VGA_x86) || defined(VGA_amd64)
 #  define VG_TT_FAST_HASH(_addr)  ((((UWord)(_addr))     ) & VG_TT_FAST_MASK)
-#elif defined(VGA_ppc32) || defined(VGA_ppc64) || defined(VGA_arm)
-#  define VG_TT_FAST_HASH(_addr)  ((((UWord)(_addr)) >> 2) & VG_TT_FAST_MASK)
-#elif defined(VGA_s390x)
+
+#elif defined(VGA_s390x) || defined(VGA_arm)
 #  define VG_TT_FAST_HASH(_addr)  ((((UWord)(_addr)) >> 1) & VG_TT_FAST_MASK)
+
+#elif defined(VGA_ppc32) || defined(VGA_ppc64)
+#  define VG_TT_FAST_HASH(_addr)  ((((UWord)(_addr)) >> 2) & VG_TT_FAST_MASK)
+
 #else
 #  error "VG_TT_FAST_HASH: unknown platform"
 #endif