docs/internals/performance.txt - platform/external/valgrind

-----------------------------------------------------------------------------
Notes on performance
-----------------------------------------------------------------------------
The intent of this file is to record progress in improving performance.

-----------------------------------------------------------------------------
Just before 3.1.0:
- Julian made LibVEX_Alloc() inlinable. Saved a couple of percent.
- Julian started building Vex at -O2. Saved up to 8% or so(?) in some
  cases.

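As an illustration of why making an allocator inlinable can help, here is a
minimal sketch of a bump allocator whose fast path lives in a static inline
function while the rare slow path stays out of line. This is not Vex's actual
code: the names arena_alloc and arena_alloc_slow, the fixed 4096-byte arena
and the 8-byte rounding are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena; not Vex's real data structures. */
#define ARENA_BYTES 4096
static uint64_t arena_words[ARENA_BYTES / 8];   /* 8-byte aligned storage */
static unsigned char *arena_curr = (unsigned char *)arena_words;
static unsigned char *const arena_last =
    (unsigned char *)arena_words + ARENA_BYTES;

/* Slow path, kept out of line: reached only when the arena cannot satisfy
   the request. A real allocator would refill or abort here; this sketch
   just reports failure. */
static void *arena_alloc_slow(size_t nbytes)
{
    (void)nbytes;
    return NULL;
}

/* Fast path: a compare and a pointer bump. Being static inline, the
   compiler can expand it at each call site and avoid the call overhead. */
static inline void *arena_alloc(size_t nbytes)
{
    size_t rounded;
    if (nbytes > ARENA_BYTES)
        return arena_alloc_slow(nbytes);
    rounded = (nbytes + 7) & ~(size_t)7;        /* round up to 8 bytes */
    if (rounded <= (size_t)(arena_last - arena_curr)) {
        void *p = arena_curr;
        arena_curr += rounded;
        return p;
    }
    return arena_alloc_slow(nbytes);
}
```

The design point is that the common case is only a few instructions, so the
cost of a function call is a noticeable fraction of it; the uncommon case is
left in a separate function so that inlining does not bloat every call site.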
Post 3.1.0:
- Julian made the tree builder linear. Saved 2--13% on a range of programs.
- Nick improved vg_SP_update_pass() to identify more small constant
  increments/decrements of SP, so the fast cases can be used more often.
  Saved 1--3% on a few programs.
- r5345, r5346, r5352: Julian improved the dispatcher so that x86 and
  AMD64 use jumps instead of call/return for calling translations.
  Also, on x86, amd64, ppc32 and ppc64, --profile-flags style profiling was
  removed from the dispatch loop unless --profile-flags is being used.
  Improved Nulgrind performance typically by 10--20%, and Memcheck
  performance typically by 2--20%.
- Julian changed findSb to slowly move superblocks to the front of the list
  as they were accessed. This sped up perf/heap by 25--50%, and some big
  programs (e.g. ktuberling) by a couple of percent.
- Nick reduced the iteration count of the loop in swizzle() from 20 to 5,
  which gave almost identical results while saving 2% in perf/tinycc and 10%
  in perf/heap on a 3GHz Prescott P4.
- Nick changed ExeContext gathering to not record/save extra zeroes at the
  end. Saved 7% on perf/heap with --num-callers=50, and about 1% on
  perf/tinycc.
- Julian vectorised copy_address_range_perms for common cases, which
  gives about 40% speedup on artificial programs which just do
  realloc() and nothing else, and about a 3--4% speedup on starting
  kpresenter-1.5.0 and loading a 16-slide presentation.

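The findSb change above describes a "move towards the front on access"
heuristic. A minimal sketch of that idea follows, assuming the superblocks
are held in an array of pointers and that a hit is swapped one position
forward. The Block type and find_block() are hypothetical names for this
example, not Valgrind's actual code.

```c
#include <stddef.h>

/* Hypothetical block descriptor: covers [start, start + len). */
typedef struct {
    unsigned long start;
    unsigned long len;
} Block;

/* Linear search for the block containing addr. On a hit that is not
   already first, swap the block with its predecessor, so frequently
   accessed blocks drift slowly towards the front of the list. */
static Block *find_block(Block **blocks, size_t n, unsigned long addr)
{
    size_t i;
    for (i = 0; i < n; i++) {
        Block *b = blocks[i];
        if (addr >= b->start && addr - b->start < b->len) {
            if (i > 0) {
                blocks[i] = blocks[i - 1];
                blocks[i - 1] = b;
            }
            return b;
        }
    }
    return NULL;   /* addr is not inside any known block */
}
```

Swapping by one position, rather than jumping straight to the front, means a
single stray access cannot displace a block that is genuinely hot.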
COMPVBITS branch:
- Nick converted to compressed V bits; the initial version saved 0--5% in
  most cases, with a 30% improvement in one case (tsim_arch) which calls
  set_address_range_perms() a lot.
- Nick rewrote set_address_range_perms(), which gained 0--3% typically,
  and 22% on tsim_arch.
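
To show why a compressed representation makes range updates cheap, here is a
sketch that stores a 2-bit state per byte of memory, four states packed into
each shadow byte, and sets a range by filling whole shadow bytes wherever the
range is 4-byte aligned. The encoding, the flat 1024-byte shadow array and
the names set_state, get_state and set_range are assumptions made for the
example; this is not Memcheck's actual set_address_range_perms().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-bit encoding of a byte's state. */
enum { ST_NOACCESS = 0, ST_UNDEFINED = 1, ST_DEFINED = 2 };

#define MEM_BYTES 1024                  /* size of the toy address space */
static uint8_t shadow[MEM_BYTES / 4];   /* four 2-bit states per byte */

/* Slow path: update the state of a single byte. */
static void set_state(size_t addr, unsigned st)
{
    unsigned shift = (unsigned)(addr & 3) * 2;
    uint8_t *p = &shadow[addr >> 2];
    *p = (uint8_t)((*p & ~(3u << shift)) | (st << shift));
}

static unsigned get_state(size_t addr)
{
    return (shadow[addr >> 2] >> ((addr & 3) * 2)) & 3u;
}

/* Set [addr, addr + len) to st. The caller must keep the range inside
   MEM_BYTES. Unaligned head and tail bytes go one at a time; the aligned
   middle is filled one shadow byte (four states) per store. */
static void set_range(size_t addr, size_t len, unsigned st)
{
    size_t end = addr + len;
    size_t nfull;
    while (addr < end && (addr & 3) != 0)
        set_state(addr++, st);
    nfull = (end - addr) / 4;
    /* st * 0x55 replicates the 2-bit state into all four slots. */
    memset(&shadow[addr >> 2], (int)(st * 0x55u), nfull);
    addr += nfull * 4;
    while (addr < end)
        set_state(addr++, st);
}
```

With this layout a large aligned range costs one store per four bytes of
memory, which is consistent with the notes' observation that a program making
many range updates (tsim_arch) benefited most.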