-----------------------------------------------------------------------------
overview
-----------------------------------------------------------------------------
This commit introduces an optimisation that speeds up Memcheck by between
-3% (i.e. a slight slowdown) and 28%, and Addrcheck by between 1% and 36%, at
least for the SPEC2000 benchmarks on my 1400MHz Athlon.
Basic idea: the old handling of A/V bit updates on %esp adjustments was quite
sub-optimal. For each "PUT ESP", a function was called that computed the delta
between the old and new ESP values, and then called a looping function to deal
with it.
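For comparison with the new scheme, here is a rough C sketch of the old
per-"PUT ESP" behaviour: the delta is recomputed on every call and the affected
bytes are walked in a loop. The toy shadow map, the region bounds and the names
handle_esp_assignment() and set_A_range() are illustrative assumptions (not
the real synth_handle_esp_assignment() machinery), and only A bits are
modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Addr;
typedef uint32_t UInt;
typedef int32_t  Int;

/* Toy shadow map: one "addressable" flag per byte of a small stack region. */
#define STACK_BASE 0x1000u
#define STACK_SIZE 256u
static uint8_t shadow_A[STACK_SIZE];
static Addr current_ESP = STACK_BASE + STACK_SIZE;   /* stack starts empty */

/* Byte-at-a-time update of the A flags for [a, a+len). */
static void set_A_range(Addr a, UInt len, uint8_t flag) {
   for (UInt i = 0; i < len; i++) {
      assert(a + i >= STACK_BASE && a + i < STACK_BASE + STACK_SIZE);
      shadow_A[a + i - STACK_BASE] = flag;
   }
}

/* Old scheme (hypothetical name): called for every "PUT ESP"; the delta is
   recomputed at run time even when it was obvious from the code. */
static void handle_esp_assignment(Addr new_ESP) {
   Addr old_ESP = current_ESP;
   Int  delta   = (Int)(new_ESP - old_ESP);
   if (delta < 0)
      set_A_range(new_ESP, (UInt)(-delta), 1);   /* stack grew: new memory */
   else if (delta > 0)
      set_A_range(old_ESP, (UInt)delta, 0);      /* stack shrank: dead memory */
   current_ESP = new_ESP;
}
```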
Improvements:
1. Most of the time, the delta can be seen from the code, so there's no need
to compute it at run time.
2. When the delta is known, we can directly call a skin function to handle it.
3. We can specialise for certain common cases (e.g. +/- 4, 8, 12, 16, 32),
including having unrolled loops for these.
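A minimal sketch of improvement 3, assuming a toy shadow map with one
"addressable" flag byte per address: the general hook must loop over a
run-time length, while a hook specialised for a delta of exactly 16 needs no
length argument and can be fully unrolled into four word-sized stores. All
names here (shadow_A, STACK_BASE, new_mem_stack_general, new_mem_stack_16) are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Addr;
typedef uint32_t UInt;

/* Hypothetical shadow map: one "addressable" flag byte per address. */
#define STACK_BASE 0x1000u
#define STACK_SIZE 256u
static uint8_t shadow_A[STACK_SIZE];

/* General case: the length is only known at run time, so loop. */
static void new_mem_stack_general(Addr a, UInt len) {
   for (UInt i = 0; i < len; i++)
      shadow_A[a - STACK_BASE + i] = 1;
}

/* Specialised case for a 16-byte push: no length argument and no loop,
   just four unrolled word-sized stores (assumes new_ESP is 4-aligned). */
static void new_mem_stack_16(Addr new_ESP) {
   const uint32_t four_flags = 0x01010101u;   /* 4 bytes, each equal to 1 */
   uint8_t *p = &shadow_A[new_ESP - STACK_BASE];
   assert((new_ESP & 3) == 0);
   memcpy(p + 0,  &four_flags, 4);
   memcpy(p + 4,  &four_flags, 4);
   memcpy(p + 8,  &four_flags, 4);
   memcpy(p + 12, &four_flags, 4);
}
```

Both paths leave the shadow in the same state; the specialised one just gets
there without a loop counter or a length computation.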
This slightly bloats UCode, because of setting up the arguments for the call
and updating ESP in the generated code (previously this was done in the called
C function). E.g. for `date' the code expansion ratio goes from 14.2 to 14.6.
But it's much faster.
Note that skins don't have to use the specialised cases; they can just define
the general case if they want. The specialised cases are only used if present.
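The "used only if present" rule can be sketched as a choice made at
translation time, once the delta is known. This assumes a hypothetical
StackHooks table holding only the 4- and 8-byte growth hooks plus the general
one, and it also checks the rule from vg_skin.h that defining any specialised
hook requires the general hook. It is an illustration, not the actual logic in
vg_translate.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Addr;
typedef uint32_t UInt;
typedef int32_t  Int;

/* Hypothetical table of hooks registered by a skin; NULL = not provided. */
typedef struct {
   void (*new_mem_stack_4)(Addr new_ESP);
   void (*new_mem_stack_8)(Addr new_ESP);
   void (*new_mem_stack)(Addr a, UInt len);   /* general case */
} StackHooks;

typedef enum { CALL_SPECIALISED, CALL_GENERAL, CALL_NOTHING } HookChoice;

/* Any specialised hook implies the general hook must exist too. */
static int hooks_consistent(const StackHooks *h) {
   int any_specialised = h->new_mem_stack_4 != NULL || h->new_mem_stack_8 != NULL;
   return !any_specialised || h->new_mem_stack != NULL;
}

/* Pick the call to emit for "ESP += delta" when delta is a known constant.
   Only stack growth (delta < 0) is modelled here. */
static HookChoice choose_new_mem_hook(const StackHooks *h, Int delta) {
   assert(hooks_consistent(h));
   if (delta >= 0)
      return CALL_NOTHING;             /* not growth; die_* hooks would apply */
   if (delta == -4 && h->new_mem_stack_4 != NULL) return CALL_SPECIALISED;
   if (delta == -8 && h->new_mem_stack_8 != NULL) return CALL_SPECIALISED;
   return h->new_mem_stack != NULL ? CALL_GENERAL : CALL_NOTHING;
}

/* Dummy hooks for illustration only. */
static void my_new_4(Addr new_ESP)           { (void)new_ESP; }
static void my_new_general(Addr a, UInt len) { (void)a; (void)len; }
```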
-----------------------------------------------------------------------------
details
-----------------------------------------------------------------------------
Removed addrcheck/ac_common.c and put its (minimal) contents in ac_main.c.
Updated the major interface version, because this change isn't binary
compatible with the old core/skin interface.
Removed the hooks {new,die}_mem_stack_aligned and replaced them with the better
{new,die}_mem_stack_{4,8,12,16,32}. The generic {new,die}_mem_stack hooks are
still present. These hooks are called directly from UCode, thanks to a new pass
that runs between instrumentation and register allocation (but only if the
skin uses these stack-adjustment hooks). VG_(unknown_esp_update)() is called
from UCode for the generic case; it determines whether the update is a stack
switch, and calls the generic {new,die}_mem_stack hooks accordingly. This meant
synth_handle_esp_assignment() could be removed.
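A rough sketch of what the generic path has to do when the delta is not known
statically. The HUGE_DELTA threshold used to recognise a stack switch is a
made-up value, the hooks just record their arguments, and on a switch this
sketch simply skips the hooks; the real VG_(unknown_esp_update)() may treat
that case differently.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Addr;
typedef uint32_t UInt;
typedef int32_t  Int;

/* Hypothetical threshold: an ESP change larger than this is taken to be a
   switch to another stack rather than a push or pop. */
#define HUGE_DELTA 2000000

static Addr current_ESP;
static Addr last_new_a, last_die_a;     /* record hook calls for the demo */
static UInt last_new_len, last_die_len;

static void new_mem_stack(Addr a, UInt len) { last_new_a = a; last_new_len = len; }
static void die_mem_stack(Addr a, UInt len) { last_die_a = a; last_die_len = len; }

/* Generic case: delta computed at run time, then dispatched. */
static void unknown_esp_update(Addr new_ESP) {
   Addr old_ESP = current_ESP;
   Int  delta   = (Int)(new_ESP - old_ESP);
   if (delta < -HUGE_DELTA || delta > HUGE_DELTA) {
      /* Stack switch: the gap is not pushed or popped memory. */
   } else if (delta < 0) {
      new_mem_stack(new_ESP, (UInt)(-delta));   /* [new_ESP, old_ESP) is new */
   } else if (delta > 0) {
      die_mem_stack(old_ESP, (UInt)delta);      /* [old_ESP, new_ESP) dies */
   }
   current_ESP = new_ESP;
}
```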
The new %esp-delta computation phase is in vg_translate.c.
In Memcheck and Addrcheck, added functions for updating the A and V bits of a
single aligned word and of a single aligned doubleword. These are called from
the specialised functions new_mem_stack_4, etc. This made it possible to remove
the function used by the old hooks new_mem_stack_aligned and
die_mem_stack_aligned.
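The word and doubleword helpers can be sketched as follows, using the
IS_ALIGNED4_ADDR and IS_ALIGNED8_ADDR macros from the diff below. The flat A
and V arrays, the 0xFF "undefined" V encoding and the helper names are
assumptions for illustration; the point is that an aligned address lets the
shadow be updated a whole word or doubleword at a time, with a byte loop only
as the fallback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Addr;
typedef uint32_t UInt;

#define IS_ALIGNED4_ADDR(aaa_p) (0 == (((UInt)(aaa_p)) & 3))
#define IS_ALIGNED8_ADDR(aaa_p) (0 == (((UInt)(aaa_p)) & 7))

/* Hypothetical flat shadow maps for a tiny simulated region. */
#define BASE 0x2000u
#define SIZE 64u
static uint8_t A[SIZE];   /* 1 = addressable */
static uint8_t V[SIZE];   /* 0xFF = undefined (assumed encoding) */

/* General byte-at-a-time path. */
static void make_writable(Addr a, UInt len) {
   for (UInt i = 0; i < len; i++) { A[a - BASE + i] = 1; V[a - BASE + i] = 0xFF; }
}

/* One aligned word: 4 shadow bytes per map in a single copy each. */
static void make_aligned_word_writable(Addr a) {
   const uint32_t a_bits = 0x01010101u, v_bits = 0xFFFFFFFFu;
   assert(IS_ALIGNED4_ADDR(a));
   memcpy(&A[a - BASE], &a_bits, 4);
   memcpy(&V[a - BASE], &v_bits, 4);
}

/* One aligned doubleword: 8 shadow bytes per map in a single copy each. */
static void make_aligned_doubleword_writable(Addr a) {
   const uint64_t a_bits = 0x0101010101010101ull, v_bits = UINT64_MAX;
   assert(IS_ALIGNED8_ADDR(a));
   memcpy(&A[a - BASE], &a_bits, 8);
   memcpy(&V[a - BASE], &v_bits, 8);
}

/* Specialised hooks: fast paths when aligned, general path otherwise. */
static void new_mem_stack_4(Addr new_ESP) {
   if (IS_ALIGNED4_ADDR(new_ESP)) make_aligned_word_writable(new_ESP);
   else                           make_writable(new_ESP, 4);
}

static void new_mem_stack_8(Addr new_ESP) {
   if (IS_ALIGNED8_ADDR(new_ESP)) {
      make_aligned_doubleword_writable(new_ESP);
   } else if (IS_ALIGNED4_ADDR(new_ESP)) {
      make_aligned_word_writable(new_ESP);
      make_aligned_word_writable(new_ESP + 4);
   } else {
      make_writable(new_ESP, 8);
   }
}
```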
In mc_common.h, added a big macro containing the definitions of new_mem_stack_4
et al. It's ``instantiated'' separately by Memcheck and Addrcheck. The macro
is a bit kludgey, but I did it that way because speed is vital for these
functions; e.g. using a function pointer would have slowed things down.
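The macro trick can be illustrated with a hypothetical DEFINE_STACK_HOOKS macro
(not the real one in mc_common.h, whose parameters are not shown here): each
instantiation pastes in the helper names, so the generated hooks call those
helpers directly and the compiler can inline them, instead of paying for an
indirect call through a function pointer on every stack adjustment. The helpers
below just count calls.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Addr;
typedef uint32_t UInt;

#define IS_ALIGNED4_ADDR(aaa_p) (0 == (((UInt)(aaa_p)) & 3))

/* Expands to specialised hooks named PREFIX_new_mem_stack_{4,8} that call
   the given helpers by name (direct, inlinable calls). */
#define DEFINE_STACK_HOOKS(PREFIX, ALIGNED_WORD_FN, GENERAL_FN)      \
   static void PREFIX##_new_mem_stack_4(Addr new_ESP) {                \
      if (IS_ALIGNED4_ADDR(new_ESP)) ALIGNED_WORD_FN(new_ESP);         \
      else                           GENERAL_FN(new_ESP, 4);           \
   }                                                                   \
   static void PREFIX##_new_mem_stack_8(Addr new_ESP) {                \
      if (IS_ALIGNED4_ADDR(new_ESP)) {                                 \
         ALIGNED_WORD_FN(new_ESP);                                     \
         ALIGNED_WORD_FN(new_ESP + 4);                                 \
      } else {                                                         \
         GENERAL_FN(new_ESP, 8);                                       \
      }                                                                \
   }

/* Two "skins" with their own helpers (hypothetical, counting calls). */
static unsigned mc_word_calls, mc_general_calls;
static void mc_word(Addr a)            { (void)a; mc_word_calls++; }
static void mc_general(Addr a, UInt n) { (void)a; (void)n; mc_general_calls++; }

static unsigned ac_word_calls, ac_general_calls;
static void ac_word(Addr a)            { (void)a; ac_word_calls++; }
static void ac_general(Addr a, UInt n) { (void)a; (void)n; ac_general_calls++; }

DEFINE_STACK_HOOKS(mc, mc_word, mc_general)
DEFINE_STACK_HOOKS(ac, ac_word, ac_general)
```

The cost is that the hook bodies are duplicated in each skin's object code,
which is the kludge the paragraph above accepts in exchange for speed.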
Updated the built-in profiling events appropriately for the changes (removed
one old event, added a new one; finding their names is left as an exercise for
the reader).
Fixed memory event profiling in {Addr,Mem}check, which had rotted.
A few other minor things.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1510 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/include/vg_skin.h b/include/vg_skin.h
index 71940d7..986b8f3 100644
--- a/include/vg_skin.h
+++ b/include/vg_skin.h
@@ -116,8 +116,8 @@
interface; if the core and skin major versions don't match, Valgrind
will abort. The minor version indicates binary-compatible changes.
*/
-#define VG_CORE_INTERFACE_MAJOR_VERSION 1
-#define VG_CORE_INTERFACE_MINOR_VERSION 2
+#define VG_CORE_INTERFACE_MAJOR_VERSION 2
+#define VG_CORE_INTERFACE_MINOR_VERSION 0
extern const Int VG_(skin_interface_major_version);
extern const Int VG_(skin_interface_minor_version);
@@ -187,11 +187,11 @@
VGP_PAIR(VgpSched, "scheduler"), \
VGP_PAIR(VgpMalloc, "low-lev malloc/free"), \
VGP_PAIR(VgpCliMalloc, "client malloc/free"), \
- VGP_PAIR(VgpStack, "adjust-stack"), \
VGP_PAIR(VgpTranslate, "translate-main"), \
VGP_PAIR(VgpToUCode, "to-ucode"), \
VGP_PAIR(VgpFromUcode, "from-ucode"), \
VGP_PAIR(VgpImprove, "improve"), \
+ VGP_PAIR(VgpESPUpdate, "ESP-update"), \
VGP_PAIR(VgpRegAlloc, "reg-alloc"), \
VGP_PAIR(VgpLiveness, "liveness-analysis"), \
VGP_PAIR(VgpDoLRU, "do-lru"), \
@@ -255,6 +255,7 @@
/* Check if an address is 4-byte aligned */
#define IS_ALIGNED4_ADDR(aaa_p) (0 == (((UInt)(aaa_p)) & 3))
+#define IS_ALIGNED8_ADDR(aaa_p) (0 == (((UInt)(aaa_p)) & 7))
/* ------------------------------------------------------------------ */
@@ -1429,13 +1430,26 @@
EV VG_(track_new_mem_startup) ( void (*f)(Addr a, UInt len,
Bool rr, Bool ww, Bool xx) );
EV VG_(track_new_mem_heap) ( void (*f)(Addr a, UInt len, Bool is_inited) );
-EV VG_(track_new_mem_stack) ( void (*f)(Addr a, UInt len) );
-EV VG_(track_new_mem_stack_aligned) ( void (*f)(Addr a, UInt len) );
EV VG_(track_new_mem_stack_signal) ( void (*f)(Addr a, UInt len) );
EV VG_(track_new_mem_brk) ( void (*f)(Addr a, UInt len) );
EV VG_(track_new_mem_mmap) ( void (*f)(Addr a, UInt len,
Bool rr, Bool ww, Bool xx) );
+/* The specialised ones are called in preference to the general one, if they
+ are defined. These functions are called a lot if they are used, so
+ specialising can optimise things significantly. If any of the
+ specialised cases are defined, the general case must be defined too.
+
+ Nb: they must all use the __attribute__((regparm(n))) attribute. */
+EV VG_(track_new_mem_stack_4) ( void (*f)(Addr new_ESP) );
+EV VG_(track_new_mem_stack_8) ( void (*f)(Addr new_ESP) );
+EV VG_(track_new_mem_stack_12) ( void (*f)(Addr new_ESP) );
+EV VG_(track_new_mem_stack_16) ( void (*f)(Addr new_ESP) );
+EV VG_(track_new_mem_stack_32) ( void (*f)(Addr new_ESP) );
+EV VG_(track_new_mem_stack) ( void (*f)(Addr a, UInt len) );
+
+EV VG_(track_change_mem_stack) ( void (*f)(Addr new_ESP) );
+
EV VG_(track_copy_mem_heap) ( void (*f)(Addr from, Addr to, UInt len) );
EV VG_(track_copy_mem_remap) ( void (*f)(Addr from, Addr to, UInt len) );
EV VG_(track_change_mem_mprotect) ( void (*f)(Addr a, UInt len,
@@ -1446,12 +1460,18 @@
EV VG_(track_ban_mem_stack) ( void (*f)(Addr a, UInt len) );
EV VG_(track_die_mem_heap) ( void (*f)(Addr a, UInt len) );
-EV VG_(track_die_mem_stack) ( void (*f)(Addr a, UInt len) );
-EV VG_(track_die_mem_stack_aligned) ( void (*f)(Addr a, UInt len) );
EV VG_(track_die_mem_stack_signal) ( void (*f)(Addr a, UInt len) );
EV VG_(track_die_mem_brk) ( void (*f)(Addr a, UInt len) );
EV VG_(track_die_mem_munmap) ( void (*f)(Addr a, UInt len) );
+/* See comments for VG_(track_new_mem_stack_4) et al above */
+EV VG_(track_die_mem_stack_4) ( void (*f)(Addr die_ESP) );
+EV VG_(track_die_mem_stack_8) ( void (*f)(Addr die_ESP) );
+EV VG_(track_die_mem_stack_12) ( void (*f)(Addr die_ESP) );
+EV VG_(track_die_mem_stack_16) ( void (*f)(Addr die_ESP) );
+EV VG_(track_die_mem_stack_32) ( void (*f)(Addr die_ESP) );
+EV VG_(track_die_mem_stack) ( void (*f)(Addr a, UInt len) );
+
EV VG_(track_bad_free) ( void (*f)(ThreadState* tst, Addr a) );
EV VG_(track_mismatched_free) ( void (*f)(ThreadState* tst, Addr a) );