-----------------------------------------------------------------------------
Notes on performance
-----------------------------------------------------------------------------
The intent of this file is to record progress in improving performance.

-----------------------------------------------------------------------------
Just before 3.1.0:
- Julian made LibVEX_Alloc() inlinable. Saved a couple of percent.
- Julian started building Vex at -O2. Saved up to 8% or so(?) in some
  cases.

Post 3.1.0:
- Julian made the tree builder linear. Saved 2--13% on a range of programs.
- Nick improved vg_SP_update_pass() to identify more small constant
  increments/decrements of SP, so the fast cases can be used more often.
  Saved 1--3% on a few programs.
- r5345, r5346, r5352: Julian improved the dispatcher so that x86 and
  amd64 use jumps instead of call/return for calling translations.
  Also, on x86, amd64, ppc32 and ppc64, --profile-flags style profiling was
  removed from the dispatch loop unless --profile-flags is being used.
  Improved Nulgrind performance typically by 10--20%, and Memcheck
  performance typically by 2--20%.
- Julian changed findSb to slowly move superblocks to the front of the list
  as they are accessed. This sped up perf/heap by 25--50%, and some big
  programs (e.g. ktuberling) by a couple of percent.
- Nick reduced the iteration count of the loop in swizzle() from 20 to 5,
  which gave almost identical results while saving 2% in perf/tinycc and 10%
  in perf/heap on a 3GHz Prescott P4.
- Nick changed ExeContext gathering to not record/save extra zeroes at the
  end. Saved 7% on perf/heap with --num-callers=50, and about 1% on
  perf/tinycc.
- Julian vectorised copy_address_range_perms for common cases, which
  gives about a 40% speedup on artificial programs which just do
  realloc() and nothing else, and about a 3--4% speedup on starting
  kpresenter-1.5.0 and loading a 16-slide presentation.
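
The findSb change above can be illustrated with a small sketch. It assumes
the "slow move to the front" is done by swapping a found entry with its
predecessor on each hit (the transposition heuristic); the notes do not say
exactly how findSb does it. Block and find_block() are hypothetical names
for illustration only, not the actual allocator code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a superblock: one contiguous address range. */
typedef struct {
    uintptr_t start;
    size_t    size;
} Block;

/* Linear search for the block containing addr.  On a hit that is not
   already in slot 0, the block is swapped one slot towards the front, so
   frequently accessed blocks drift forwards over repeated lookups.
   Returns the matching block, or NULL if no block contains addr. */
static Block *find_block(Block **blocks, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++) {
        Block *b = blocks[i];
        if (addr >= b->start && addr - b->start < b->size) {
            if (i > 0) {
                blocks[i]     = blocks[i - 1];
                blocks[i - 1] = b;
            }
            return b;
        }
    }
    return NULL;
}
```

Moving one slot per hit, rather than straight to slot 0, keeps a single
stray access from displacing blocks that are genuinely hot.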

COMPVBITS branch:
- Nick converted to compressed V bits. The initial version saved 0--5% in
  most cases, with a 30% improvement in one case (tsim_arch) which calls
  set_address_range_perms() a lot.
- Nick rewrote set_address_range_perms(), which gained 0--3% typically,
  and 22% on tsim_arch.
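
As a rough illustration of why a range setter over compressed state can be
made fast, the sketch below packs 2 state bits per tracked byte. It sets an
unaligned head and tail one entry at a time and fills the aligned middle
with memset(). The 2-bit encoding, the flat map and all names here are
assumptions made for the example; the notes above do not describe the
actual layout or what the rewrite of set_address_range_perms() changed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed map: 2 state bits per tracked byte, so one map byte
   covers 4 tracked bytes. */
enum { STATE_BITS = 2, PER_MAP_BYTE = 4 };

/* Set the 2-bit state of a single tracked byte. */
static void set_state1(uint8_t *map, size_t addr, unsigned state)
{
    size_t   idx   = addr / PER_MAP_BYTE;
    unsigned shift = (unsigned)(addr % PER_MAP_BYTE) * STATE_BITS;
    map[idx] = (uint8_t)((map[idx] & ~(3u << shift)) | ((state & 3u) << shift));
}

/* Read the 2-bit state of a single tracked byte. */
static unsigned get_state1(const uint8_t *map, size_t addr)
{
    unsigned shift = (unsigned)(addr % PER_MAP_BYTE) * STATE_BITS;
    return (map[addr / PER_MAP_BYTE] >> shift) & 3u;
}

/* Set the state of tracked bytes addr .. addr+len-1. */
static void set_range(uint8_t *map, size_t addr, size_t len, unsigned state)
{
    /* Head: single entries up to the next 4-aligned address. */
    while (len > 0 && addr % PER_MAP_BYTE != 0) {
        set_state1(map, addr, state);
        addr++; len--;
    }
    /* Middle: whole map bytes at once.  Multiplying by 0x55 replicates
       the 2-bit state into all four slots of a map byte. */
    size_t whole = len / PER_MAP_BYTE;
    if (whole > 0) {
        uint8_t fill = (uint8_t)((state & 3u) * 0x55u);
        memset(map + addr / PER_MAP_BYTE, fill, whole);
        addr += whole * PER_MAP_BYTE;
        len  -= whole * PER_MAP_BYTE;
    }
    /* Tail: the remaining 0..3 entries. */
    while (len > 0) {
        set_state1(map, addr, state);
        addr++; len--;
    }
}
```

With this layout a large range costs roughly one memset() over len/4 map
bytes plus at most six single-entry updates, instead of len separate
read-modify-write operations.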