-----------------------------------------------------------------------------
Notes on performance
-----------------------------------------------------------------------------
The intent of this file is to record progress in improving performance.

-----------------------------------------------------------------------------
Just before 3.1.0:
- Julian made LibVEX_Alloc() inlinable. Saved a couple of percent.
- Julian started building Vex at -O2. Saved up to 8% or so(?) in some
  cases.

Post 3.1.0:
- Julian made the tree builder linear. Saved 2--13% on a range of programs.
- Nick improved vg_SP_update_pass() to identify more small constant
  increments/decrements of SP, so the fast cases can be used more often.
  Saved 1--3% on a few programs.
- r5345,r5346,r5352: Julian improved the dispatcher so that x86 and
  AMD64 use jumps instead of call/return for calling translations.
  Also, on x86, amd64, ppc32 and ppc64, --profile-flags style profiling was
  removed from the dispatch loop unless --profile-flags is being used.
  Improved Nulgrind performance typically by 10--20%, and Memcheck
  performance typically by 2--20%.
- Julian changed findSb to slowly move superblocks to the front of the list
  as they were accessed. This sped up perf/heap by 25--50%, and some big
  programs (e.g. ktuberling) by a couple of percent.
- Nick reduced the iteration count of the loop in swizzle() from 20 to 5,
  which gave almost identical results while saving 2% in perf/tinycc and 10%
  in perf/heap on a 3GHz Prescott P4.
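
Illustration of the findSb change above. This is a minimal sketch of a
"move one step toward the front on each hit" search, not the actual Valgrind
code. The Superblock type, the array layout and the name find_sb() are
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocator superblock: it covers the
   address range [start, start + size). */
typedef struct {
    uintptr_t start;
    size_t    size;
} Superblock;

/* Linear search for the superblock containing addr in blocks[0..n-1].
   On a hit at index i > 0 the entry is swapped with its predecessor, so
   frequently accessed superblocks drift toward the front one step at a
   time.  Returns the entry's index after the swap, or -1 if no
   superblock contains addr. */
static int find_sb(Superblock *blocks, int n, uintptr_t addr)
{
    for (int i = 0; i < n; i++) {
        if (addr >= blocks[i].start &&
            addr - blocks[i].start < blocks[i].size) {
            if (i > 0) {
                Superblock tmp = blocks[i - 1];
                blocks[i - 1]  = blocks[i];
                blocks[i]      = tmp;
                return i - 1;
            }
            return 0;
        }
    }
    return -1;
}
```

Moving a hit forward by one slot rather than straight to the front means a
single stray lookup cannot displace a superblock that is accessed often.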

COMPVBITS branch:
- Nick converted to compressed V bits. The initial version saved 0--5% in
  most cases, with a 30% improvement in one case (tsim_arch), which calls
  set_address_range_perms() a lot.
- Nick rewrote set_address_range_perms(), which typically gained 0--3%,
  and 22% on tsim_arch.
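
Illustration of why a compressed representation can make range updates
cheap. This is a minimal sketch that assumes each byte of client memory is
summarised by a 2-bit state, packed four to a map byte. The encoding and the
names (ST_*, get_state, set_state, set_range) are hypothetical and are not
taken from the Memcheck source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-bit per-byte states. */
enum { ST_NOACCESS = 0, ST_UNDEFINED = 1, ST_DEFINED = 2, ST_PARTDEFINED = 3 };

/* Read the 2-bit state of client byte i from the packed map. */
static unsigned get_state(const uint8_t *map, size_t i)
{
    unsigned shift = (unsigned)(i & 3) * 2;
    return (map[i >> 2] >> shift) & 3u;
}

/* Write the 2-bit state of client byte i into the packed map. */
static void set_state(uint8_t *map, size_t i, unsigned state)
{
    unsigned shift = (unsigned)(i & 3) * 2;
    map[i >> 2] = (uint8_t)((map[i >> 2] & ~(3u << shift))
                            | ((state & 3u) << shift));
}

/* Set the state of client bytes [start, start + len).  The unaligned
   head and tail are handled one byte at a time; the aligned middle is
   filled with memset(), four client bytes per map byte. */
static void set_range(uint8_t *map, size_t start, size_t len, unsigned state)
{
    size_t end = start + len;

    while (start < end && (start & 3) != 0)
        set_state(map, start++, state);

    size_t whole = (end - start) / 4;
    /* Multiplying by 0x55 replicates the 2-bit state into all four slots. */
    memset(map + start / 4, (int)((state & 3u) * 0x55u), whole);
    start += whole * 4;

    while (start < end)
        set_state(map, start++, state);
}
```

With this layout, a large range update touches a quarter as many map bytes
as a one-byte-per-byte map would, which is the kind of saving a caller-heavy
workload such as tsim_arch would notice.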