//===---------------------------------------------------------------------===//

Common register allocation / spilling problem:

        mul lr, r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        ldr r4, [sp, #+52]
        mla r4, r3, lr, r4

can be:

        mul lr, r4, lr
        mov r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

and then "merge" mul and mov:

        mul r4, r4, lr
        str r4, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

It also increases the likelihood that the store may become dead.
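
A minimal Python model of the first rewrite above (the reload replaced by a
copy placed before the clobber), assuming a simplified instruction form. The
Instr class, its def/use tuples and forward_reloads are hypothetical, not LLVM
APIs. The sketch only checks that the reload's destination is untouched between
the spill store and the reload; a real pass would also have to rule out other
writes to the slot, calls, and so on. The mul/mov "merge" is left to coalescing
and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str                          # assembly text, for display only
    defs: Tuple[str, ...] = ()         # registers written
    uses: Tuple[str, ...] = ()         # registers read (stored value first)
    store_slot: Optional[int] = None   # sp offset written by a spill store
    load_slot: Optional[int] = None    # sp offset read by a reload


def forward_reloads(code):
    """Replace each reload with a copy inserted just before the matching spill
    store, provided the reload's destination register is neither read nor
    written between the store and the reload (hypothetical helper)."""
    out = list(code)
    i = 0
    while i < len(out):
        reload = out[i]
        if reload.load_slot is not None:
            dst = reload.defs[0]
            for j in range(i - 1, -1, -1):
                store = out[j]
                if store.store_slot != reload.load_slot:
                    continue
                src = store.uses[0]
                # The copy would define dst at position j; dst must stay
                # free from there up to the reload.
                if all(dst not in ins.defs and dst not in ins.uses
                       for ins in out[j:i]):
                    del out[i]
                    out.insert(j, Instr(f"mov {dst}, {src}",
                                        defs=(dst,), uses=(src,)))
                break  # only the nearest store to this slot is considered
        i += 1
    return out


code = [
    Instr("mul lr, r4, lr", defs=("lr",), uses=("r4", "lr")),
    Instr("str lr, [sp, #+52]", uses=("lr", "sp"), store_slot=52),
    Instr("ldr lr, [r1, #+32]", defs=("lr",), uses=("r1",)),
    Instr("sxth r3, r3", defs=("r3",), uses=("r3",)),
    Instr("ldr r4, [sp, #+52]", defs=("r4",), uses=("sp",), load_slot=52),
    Instr("mla r4, r3, lr, r4", defs=("r4",), uses=("r3", "lr", "r4")),
]
for ins in forward_reloads(code):
    print(ins.text)
```

On the sequence above this produces exactly the "can be" version: the reload
from [sp, #+52] disappears and a mov r4, lr appears right after the mul.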

//===---------------------------------------------------------------------===//

bb27 ...
        ...
        %reg1037 = ADDri %reg1039, 1
        %reg1038 = ADDrs %reg1032, %reg1039, %NOREG, 10
    Successors according to CFG: 0x8b03bf0 (#5)

bb76 (0x8b03bf0, LLVM BB @0x8b032d0, ID#5):
    Predecessors according to CFG: 0x8b0c5f0 (#3) 0x8b0a7c0 (#4)
        %reg1039 = PHI %reg1070, mbb<bb76.outer,0x8b0c5f0>, %reg1037, mbb<bb27,0x8b0a7c0>

Note that ADDri is not a two-address instruction. However, its result %reg1037
is an operand of the PHI node in bb76, and its operand %reg1039 is the result of
that PHI node. We should treat it as two-address code, so that %reg1037 and
%reg1039 can share a register, and make sure the ADDri is scheduled after any
node that reads %reg1039.
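
A rough sketch of the check, assuming a block is modeled as (dst, srcs) pairs in
program order and PHIs as a map from result to incoming registers.
phi_tied_edges is a hypothetical helper; it returns (reader, writer) index pairs
meaning "reader must be scheduled before writer".

```python
def phi_tied_edges(block, phis):
    """block: list of (dst, srcs) pairs in program order.
    phis: dict mapping a PHI result register to its incoming registers."""
    edges = []
    for k, (dst, srcs) in enumerate(block):
        for phi_res, incoming in phis.items():
            # dst flows back into the PHI that defines one of dst's own
            # sources, so both want one register, as if the operand were tied.
            if dst in incoming and phi_res in srcs:
                for r, (_, reader_srcs) in enumerate(block):
                    if r != k and phi_res in reader_srcs:
                        edges.append((r, k))
    return edges


# bb27 from the dump above; only registers are modeled.
bb27 = [
    ("%reg1037", ["%reg1039"]),               # ADDri %reg1039, 1
    ("%reg1038", ["%reg1032", "%reg1039"]),   # ADDrs %reg1032, %reg1039, ...
]
phis = {"%reg1039": {"%reg1070", "%reg1037"}}  # the PHI in bb76
print(phi_tied_edges(bb27, phis))  # [(1, 0)]
```

The single edge says the ADDrs (index 1) must be scheduled before the ADDri
(index 0), because once the two registers are coalesced the ADDri overwrites
the value the ADDrs reads.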

//===---------------------------------------------------------------------===//

Use local information (i.e. the register scavenger) to assign the reloaded value
a free register so that it can be reused:

        ldr r3, [sp, #+4]
        add r3, r3, #3
        ldr r2, [sp, #+8]
        add r2, r2, #2
        ldr r1, [sp, #+4] <==
        add r1, r1, #1
        ldr r0, [sp, #+4]
        add r0, r0, #2
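
One way to model this, assuming the scavenger can report registers that are
free across the whole sequence (r12 here is an assumption) and that nothing
stores to the slots in between. The tuple encoding and reuse_reloads are
hypothetical. Each reload that is immediately overwritten by an in-place add is
rewritten to read a preserved copy of the slot instead.

```python
def reuse_reloads(code, free_regs):
    """code: ("ldr", dst, slot) and ("add", dst, src, imm) tuples.
    free_regs: registers assumed free across the whole sequence."""
    holder = {}              # stack slot -> scavenged register holding it
    free = list(free_regs)
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # A reload whose destination is immediately overwritten in place.
        if (ins[0] == "ldr" and nxt is not None and nxt[0] == "add"
                and nxt[1] == ins[1] and nxt[2] == ins[1]):
            dst, slot = ins[1], ins[2]
            if slot not in holder and free:
                holder[slot] = free.pop(0)
                out.append(("ldr", holder[slot], slot))
            if slot in holder:
                # Read the preserved value instead of reloading it.
                out.append(("add", dst, holder[slot], nxt[3]))
                i += 2
                continue
        out.append(ins)
        i += 1
    return out


seq = [
    ("ldr", "r3", 4), ("add", "r3", "r3", 3),
    ("ldr", "r2", 8), ("add", "r2", "r2", 2),
    ("ldr", "r1", 4), ("add", "r1", "r1", 1),
    ("ldr", "r0", 4), ("add", "r0", "r0", 2),
]
for ins in reuse_reloads(seq, ["r12"]):
    print(ins)
```

With one free register the eight instructions become six, and [sp, #+4] is
loaded only once.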

//===---------------------------------------------------------------------===//

LLVM aggressively hoists common subexpressions out of loops. Sometimes this can
have negative side effects:

        R1 = X + 4
        R2 = X + 7
        R3 = X + 15

loop:
        load [i + R1]
        ...
        load [i + R2]
        ...
        load [i + R3]

Suppose there is high register pressure; then R1, R2, and R3 may be spilled. We
need to implement proper re-materialization to handle this:

        R1 = X + 4
        R2 = X + 7
        R3 = X + 15

loop:
        R1 = X + 4 @ re-materialized
        load [i + R1]
        ...
        R2 = X + 7 @ re-materialized
        load [i + R2]
        ...
        R3 = X + 15 @ re-materialized
        load [i + R3]
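
A toy model of that rewrite. It assumes every hoisted value has the form
reg = base + const with base still live inside the loop, and it keeps the first
num_free_regs values in registers, a naive stand-in for the allocator's real
choice. plan_loop_invariants is hypothetical; the example above corresponds to
num_free_regs=0.

```python
def plan_loop_invariants(hoisted, loads, num_free_regs):
    """hoisted: reg -> (base, const), i.e. "reg = base + const" before the loop.
    loads: registers used by the loop's loads, in order.
    Values that do not get a register are recomputed before each use
    instead of being spilled and reloaded."""
    in_regs = set(list(hoisted)[:num_free_regs])  # naive choice of survivors
    body = []
    for reg in loads:
        if reg not in in_regs:
            base, const = hoisted[reg]
            body.append(f"{reg} = {base} + {const} @ re-materialized")
        body.append(f"load [i + {reg}]")
    return body


hoisted = {"R1": ("X", 4), "R2": ("X", 7), "R3": ("X", 15)}
for line in plan_loop_invariants(hoisted, ["R1", "R2", "R3"], num_free_regs=0):
    print(line)
```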

Furthermore, with re-association, we can enable sharing:

        R1 = X + 4
        R2 = X + 7
        R3 = X + 15

loop:
        T = i + X
        load [T + 4]
        ...
        load [T + 7]
        ...
        load [T + 15]
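
The re-association can be sketched as grouping addresses of the form
index + (base + const) by (index, base). This assumes wrapping integer addition,
so (i + X) + c == i + (X + c), and that each const fits the target's reg+imm
addressing mode. reassociate is a hypothetical helper; it returns the
per-iteration temporaries followed by the rewritten loads.

```python
def reassociate(addresses):
    """addresses: (index, base, const) triples for loads of index + (base + const).
    Rewrites them as (index + base) + const so that one temporary per
    (index, base) pair is shared by every load that uses it."""
    temps = {}
    defs, loads = [], []
    for index, base, const in addresses:
        key = (index, base)
        if key not in temps:
            temps[key] = f"T{len(temps)}"
            defs.append(f"{temps[key]} = {index} + {base}")
        loads.append(f"load [{temps[key]} + {const}]")
    return defs + loads


for line in reassociate([("i", "X", 4), ("i", "X", 7), ("i", "X", 15)]):
    print(line)
```

In this form the hoisted R1, R2 and R3 are no longer read inside the loop; only
X (and i) must stay in registers across it.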

//===---------------------------------------------------------------------===//

It's not always a good idea to choose rematerialization over spilling. If all
the load/store instructions would be folded, then spilling is cheaper because
it won't require new live intervals or registers. See 2003-05-31-LongShifts for
an example.
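
A hypothetical scoring function that captures the rule above. The weights (one
unit per extra instruction and one per new live interval) are made up for
illustration and do not reflect how LLVM actually computes spill weights.

```python
from dataclasses import dataclass


@dataclass
class SpillCandidate:
    uses: int            # uses that need the value
    folded_uses: int     # uses whose reload folds into a memory operand
    store_folds: bool    # whether the spill store folds into the def
    remat_cost: int = 1  # instructions needed to recompute the value once


def spill_score(c):
    unfolded = c.uses - c.folded_uses
    extra_instrs = (0 if c.store_folds else 1) + unfolded
    extra_regs = unfolded  # each unfolded reload starts a new live interval
    return extra_instrs + extra_regs


def remat_score(c):
    # Every use gets its own recomputation and hence its own new interval.
    return c.uses * c.remat_cost + c.uses


def should_rematerialize(c):
    return remat_score(c) < spill_score(c)


print(should_rematerialize(SpillCandidate(uses=3, folded_uses=3, store_folds=True)))
print(should_rematerialize(SpillCandidate(uses=3, folded_uses=0, store_folds=False)))
```

When every load and store folds, spilling scores zero and always wins; when
nothing folds, cheap rematerialization comes out ahead.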

//===---------------------------------------------------------------------===//

With a copying garbage collector, derived pointers must not be retained across
collector safe points; the collector could move the objects and invalidate the
derived pointer. This is bad enough in the first place, but safe points can
crop up unpredictably. Consider:

        %array = load { i32, [0 x %obj*] }** %array_addr
        %nth_el = getelementptr { i32, [0 x %obj*] }* %array, i32 0, i32 1, i32 %n
        %old = load %obj** %nth_el
        %z = sdiv i64 %x, %y
        store %obj* %new, %obj** %nth_el

If the i64 division is lowered to a libcall, then a safe point will (must)
appear for the call site. If a collection occurs there, %array and %nth_el no
longer point into the correct object.

The fix for this is to copy address calculations so that dependent pointers
are never live across safe point boundaries. But the loads cannot be copied
like this if there was an intervening store, so this may be hard to get right.

Only a concurrent mutator can trigger a collection at the libcall safe point,
so single-threaded programs do not have this requirement, even with a copying
collector. Still, LLVM optimizations would probably undo a front-end's careful
work.
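
A toy simulation of the hazard, assuming a copying collector that moves every
object reachable from the roots and updates only the roots (ToyCopyingHeap and
update_element are hypothetical). In one run the element address is derived
before the safe point; in the other it is recomputed from the root afterward,
which is the "copy address calculations" fix.

```python
class ToyCopyingHeap:
    """Objects are runs of cells at integer addresses. collect() copies each
    root's object into a fresh region and updates only the roots, so any
    derived address the program still holds goes stale."""

    def __init__(self):
        self.cells = {}
        self.top = 0

    def alloc(self, values):
        base = self.top
        for k, v in enumerate(values):
            self.cells[base + k] = v
        self.top += len(values)
        return base

    def collect(self, roots, sizes):
        new_cells, top = {}, self.top + 1000  # to-space, away from from-space
        for name, base in list(roots.items()):
            for k in range(sizes[name]):
                new_cells[top + k] = self.cells[base + k]
            roots[name] = top
            top += sizes[name]
        self.cells, self.top = new_cells, top  # from-space is discarded

    def store(self, addr, value):
        if addr in self.cells:  # a stale address points into discarded space
            self.cells[addr] = value


def update_element(derive_before_safepoint, n=1):
    heap = ToyCopyingHeap()
    # Cell 0 plays the role of the leading i32 field of the array object.
    roots = {"array_addr": heap.alloc([3, "a", "b", "c"])}
    sizes = {"array_addr": 4}
    nth_el = roots["array_addr"] + 1 + n      # derived pointer
    heap.collect(roots, sizes)                # safe point at the libcall
    if not derive_before_safepoint:
        nth_el = roots["array_addr"] + 1 + n  # recompute after the safe point
    heap.store(nth_el, "new")
    base = roots["array_addr"]
    return [heap.cells[base + k] for k in range(sizes["array_addr"])]


print(update_element(derive_before_safepoint=True))   # store is lost
print(update_element(derive_before_safepoint=False))  # store reaches the array
```

The model does not cover the harder case mentioned above, where a load cannot
simply be repeated because of an intervening store.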

//===---------------------------------------------------------------------===//

The OCaml frametable structure supports liveness information. It would be good
to support it.

//===---------------------------------------------------------------------===//

The FIXME in ComputeCommonTailLength in BranchFolding.cpp needs to be
revisited. The check is there to work around a misuse of directives in inline
assembly.

//===---------------------------------------------------------------------===//

It would be good to detect collector/target compatibility instead of silently
doing the wrong thing.

//===---------------------------------------------------------------------===//

It would be really nice to be able to write patterns in .td files for copies,
which would eliminate a bunch of explicit predicates on them (e.g. no side
effects). Once this is in place, it would be even better to have tblgen
synthesize the various copy insertion/inspection methods in TargetInstrInfo.

//===---------------------------------------------------------------------===//

Stack coloring improvements:

1. Do proper LiveStackAnalysis on all stack objects, including those which are
   not spill slots.
2. Reorder objects to fill in gaps between objects.
   e.g. 4, 1, <gap>, 4, 1, 1, 1, <gap>, 4 => 4, 1, 1, 1, 1, 4, 4
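
The effect of item 2 can be checked with a small layout model, assuming each
object is aligned to its own size (valid for power-of-two sizes only).
frame_layout and reorder_by_alignment are hypothetical helpers. Sorting by
decreasing alignment is one simple way to avoid the gaps; it yields a different
order from the example above but the same total size.

```python
def frame_layout(sizes):
    """Assign offsets in order, aligning each object to its own size.
    Returns (offsets, end offset); final rounding of the frame is ignored."""
    offsets, off = [], 0
    for size in sizes:
        off = (off + size - 1) // size * size  # round up to the alignment
        offsets.append(off)
        off += size
    return offsets, off


def reorder_by_alignment(sizes):
    """One simple gap-avoiding order: largest alignment first."""
    return sorted(sizes, reverse=True)


original = [4, 1, 4, 1, 1, 1, 4]
print(frame_layout(original)[1])                        # 20
print(frame_layout([4, 1, 1, 1, 1, 4, 4])[1])           # 16
print(frame_layout(reorder_by_alignment(original))[1])  # 16
```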

//===---------------------------------------------------------------------===//

The scheduler should be able to sort nearby instructions by their address. For
example, in an expanded memset sequence it's not uncommon to see code like this:

        movl $0, 4(%rdi)
        movl $0, 8(%rdi)
        movl $0, 12(%rdi)
        movl $0, 0(%rdi)

Each of the stores is independent, and the scheduler is currently making an
arbitrary decision about the order.
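
Given a group of stores the scheduler already knows to be independent, a
deterministic order is just a sort on (base register, offset). The regex and
sort_independent_stores below are hypothetical and only parse the movl form
used above.

```python
import re

STORE = re.compile(r"movl \$(-?\d+), (-?\d+)\((%\w+)\)")


def sort_independent_stores(stores):
    """Order mutually independent stores by (base register, offset).
    Assumes the caller already proved that none of them alias or depend
    on each other, as in the expanded memset above."""
    def key(text):
        _imm, disp, base = STORE.fullmatch(text.strip()).groups()
        return (base, int(disp))
    return sorted(stores, key=key)


memset = ["movl $0, 4(%rdi)", "movl $0, 8(%rdi)",
          "movl $0, 12(%rdi)", "movl $0, 0(%rdi)"]
for line in sort_independent_stores(memset):
    print(line)
```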

//===---------------------------------------------------------------------===//

Another opportunity in this code is that the $0 could be moved to a register:

        movl $0, 4(%rdi)
        movl $0, 8(%rdi)
        movl $0, 12(%rdi)
        movl $0, 0(%rdi)

This would save substantial code size, especially for longer sequences like
this. It would be easy to have a rule telling isel to avoid matching MOV32mi
if the immediate has more than some fixed number of uses. It's more involved
to teach the register allocator how to do late folding to recover from
excessive register pressure.
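
A sketch of the isel-side rule, assuming a free scratch register (%eax here)
and an arbitrary threshold max_uses; hoist_repeated_immediates is hypothetical
and only handles the single most common immediate. xorl is the usual x86 idiom
for zeroing a register, but it clobbers EFLAGS, so a real rule would also need
the flags to be dead at that point. Late folding under register pressure is not
modeled.

```python
from collections import Counter


def hoist_repeated_immediates(stores, max_uses=2, scratch="%eax"):
    """stores: (imm, addr) pairs standing for "movl $imm, addr".
    If one immediate is stored more than max_uses times, materialize it
    once in scratch and store the register instead."""
    counts = Counter(imm for imm, _ in stores)
    imm, n = counts.most_common(1)[0] if counts else (None, 0)
    if n <= max_uses:
        return [f"movl ${i}, {a}" for i, a in stores]
    out = [f"xorl {scratch}, {scratch}" if imm == 0
           else f"movl ${imm}, {scratch}"]
    for i, a in stores:
        out.append(f"movl {scratch}, {a}" if i == imm else f"movl ${i}, {a}")
    return out


memset = [(0, "4(%rdi)"), (0, "8(%rdi)"), (0, "12(%rdi)"), (0, "0(%rdi)")]
for line in hoist_repeated_immediates(memset):
    print(line)
```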