//===---------------------------------------------------------------------===//

Common register allocation / spilling problem:

        mul     lr, r4, lr
        str     lr, [sp, #+52]
        ldr     lr, [r1, #+32]
        sxth    r3, r3
        ldr     r4, [sp, #+52]
        mla     r4, r3, lr, r4

can be:

        mul     lr, r4, lr
        mov     r4, lr
        str     lr, [sp, #+52]
        ldr     lr, [r1, #+32]
        sxth    r3, r3
        mla     r4, r3, lr, r4

and then "merge" mul and mov:

        mul     r4, r4, lr
        str     r4, [sp, #+52]
        ldr     lr, [r1, #+32]
        sxth    r3, r3
        mla     r4, r3, lr, r4

It also increases the likelihood that the store will become dead.

//===---------------------------------------------------------------------===//

bb27 ...
        ...
        %reg1037 = ADDri %reg1039, 1
        %reg1038 = ADDrs %reg1032, %reg1039, %NOREG, 10
    Successors according to CFG: 0x8b03bf0 (#5)

bb76 (0x8b03bf0, LLVM BB @0x8b032d0, ID#5):
    Predecessors according to CFG: 0x8b0c5f0 (#3) 0x8b0a7c0 (#4)
        %reg1039 = PHI %reg1070, mbb<bb76.outer,0x8b0c5f0>, %reg1037, mbb<bb27,0x8b0a7c0>

Note ADDri is not a two-address instruction. However, its result %reg1037 is an
operand of the PHI node in bb76, and its operand %reg1039 is the result of that
PHI node. We should treat it as two-address code and make sure the ADDri is
scheduled after any node that reads %reg1039.
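
The constraint can be sketched as an extra dependence edge in a toy list
scheduler (the names and the dependence-graph representation below are
illustrative, not LLVM's actual scheduler API):

```python
# Toy list scheduler: add an artificial "two-address" edge so that an
# instruction defining a PHI input runs after all readers of the PHI result.

def schedule(instrs, phi_pairs):
    """instrs: list of (name, defs, uses) with defs/uses as sets.
    phi_pairs: {def_reg: phi_reg}, meaning def_reg feeds a PHI whose
    result phi_reg is live in this block."""
    deps = {name: set() for name, _, _ in instrs}
    # Ordinary data dependences: a use must follow its def.
    for i, (name, defs, uses) in enumerate(instrs):
        for pname, pdefs, _ in instrs[:i]:
            if pdefs & uses:
                deps[name].add(pname)
    # Artificial constraint: if an instruction defines a PHI input, it must
    # follow every instruction that reads the PHI result.
    for name, defs, uses in instrs:
        for d in defs:
            phi_reg = phi_pairs.get(d)
            if phi_reg:
                for oname, _, ouses in instrs:
                    if oname != name and phi_reg in ouses:
                        deps[name].add(oname)
    # Topological order (Kahn's algorithm).
    order, ready = [], [n for n, d in deps.items() if not d]
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m, d in deps.items():
            d.discard(n)
            if not d and m not in order and m not in ready:
                ready.append(m)
    return order

# %reg1037 = ADDri %reg1039, 1 feeds the PHI defining %reg1039, and ADDrs
# also reads %reg1039, so ADDri must be scheduled after ADDrs.
print(schedule([("ADDri", {"r1037"}, {"r1039"}),
                ("ADDrs", {"r1038"}, {"r1032", "r1039"})],
               {"r1037": "r1039"}))   # -> ['ADDrs', 'ADDri']
```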

//===---------------------------------------------------------------------===//

Use local information (i.e. the register scavenger) to assign the value a free
register to allow reuse:
        ldr     r3, [sp, #+4]
        add     r3, r3, #3
        ldr     r2, [sp, #+8]
        add     r2, r2, #2
        ldr     r1, [sp, #+4]  <==
        add     r1, r1, #1
        ldr     r0, [sp, #+4]
        add     r0, r0, #2

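A sketch of the idea (not LLVM's actual RegisterScavenger API): when the same
stack slot is reloaded several times, scavenge a free register, load the slot
into it once, and turn later reloads into cheap register copies. This assumes
the scavenged register ("r12" below, a hypothetical choice) stays free across
the whole sequence:

```python
from collections import Counter

def hoist_reloads(code, free_regs):
    """code: list of (op, dst, src) pseudo-instructions."""
    loads = Counter(src for op, _, src in code if op == "ldr")
    keep = {}                                    # slot -> scavenged register
    out = []
    for op, dst, src in code:
        if op == "ldr" and loads[src] > 1:
            if src not in keep:
                keep[src] = free_regs.pop(0)         # scavenge a free register
                out.append(("ldr", keep[src], src))  # load the slot once
            out.append(("mov", dst, keep[src]))      # later reloads become moves
        else:
            out.append((op, dst, src))
    return out

code = [("ldr", "r3", "[sp,#+4]"), ("add", "r3", "#3"),
        ("ldr", "r2", "[sp,#+8]"), ("add", "r2", "#2"),
        ("ldr", "r1", "[sp,#+4]"), ("add", "r1", "#1"),
        ("ldr", "r0", "[sp,#+4]"), ("add", "r0", "#2")]
for insn in hoist_reloads(code, ["r12"]):
    print(insn)
```

On the sequence above this leaves one load of [sp, #+4] instead of three.
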
//===---------------------------------------------------------------------===//

LLVM aggressively hoists common subexpressions out of loops. Sometimes this can
have negative side effects:

        R1 = X + 4
        R2 = X + 7
        R3 = X + 15

    loop:
        load [i + R1]
        ...
        load [i + R2]
        ...
        load [i + R3]

Suppose there is high register pressure; R1, R2, and R3 can be spilled. We need
to implement proper re-materialization to handle this:

        R1 = X + 4
        R2 = X + 7
        R3 = X + 15

    loop:
        R1 = X + 4  @ re-materialized
        load [i + R1]
        ...
        R2 = X + 7  @ re-materialized
        load [i + R2]
        ...
        R3 = X + 15 @ re-materialized
        load [i + R3]

Furthermore, with re-association, we can enable sharing:

        R1 = X + 4
        R2 = X + 7
        R3 = X + 15

    loop:
        T = i + X
        load [T + 4]
        ...
        load [T + 7]
        ...
        load [T + 15]
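
The re-association above is just (i + (X + c)) == ((i + X) + c): hoisting
T = i + X lets the three loop loads share one base register and use small
immediate offsets. A quick arithmetic check of the equivalence (the concrete
values are arbitrary):

```python
def addrs_before(i, X):
    R1, R2, R3 = X + 4, X + 7, X + 15       # three live registers in the loop
    return [i + R1, i + R2, i + R3]

def addrs_after(i, X):
    T = i + X                               # one live register in the loop
    return [T + 4, T + 7, T + 15]

assert addrs_before(123, 4096) == addrs_after(123, 4096)
print(addrs_after(123, 4096))   # -> [4223, 4226, 4234]
```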
//===---------------------------------------------------------------------===//

It's not always a good idea to choose rematerialization over spilling. If all
the load / store instructions would be folded then spilling is cheaper because
it won't require new live intervals / registers. See 2003-05-31-LongShifts for
an example.

//===---------------------------------------------------------------------===//

With a copying garbage collector, derived pointers must not be retained across
collector safe points; the collector could move the objects and invalidate the
derived pointer. This is bad enough in the first place, but safe points can
crop up unpredictably. Consider:

        %array = load { i32, [0 x %obj] }** %array_addr
        %nth_el = getelementptr { i32, [0 x %obj] }* %array, i32 0, i32 %n
        %old = load %obj** %nth_el
        %z = div i64 %x, %y
        store %obj* %new, %obj** %nth_el

If the i64 division is lowered to a libcall, then a safe point will (must)
appear for the call site. If a collection occurs, %array and %nth_el no longer
point into the correct object.

The fix for this is to copy address calculations so that dependent pointers
are never live across safe point boundaries. But the loads cannot be copied
like this if there was an intervening store, so this may be hard to get right.

Only a concurrent mutator can trigger a collection at the libcall safe point,
so single-threaded programs do not have this requirement, even with a copying
collector. Still, LLVM optimizations would probably undo a front-end's careful
work.

//===---------------------------------------------------------------------===//

The ocaml frametable structure supports liveness information. It would be good
to support it.

//===---------------------------------------------------------------------===//

The FIXME in ComputeCommonTailLength in BranchFolding.cpp needs to be
revisited. The check is there to work around a misuse of directives in inline
assembly.

//===---------------------------------------------------------------------===//

It would be good to detect collector/target compatibility instead of silently
doing the wrong thing.

//===---------------------------------------------------------------------===//

It would be really nice to be able to write patterns in .td files for copies,
which would eliminate a bunch of explicit predicates on them (e.g. no side
effects).  Once this is in place, it would be even better to have tblgen
synthesize the various copy insertion/inspection methods in TargetInstrInfo.

//===---------------------------------------------------------------------===//

Stack coloring improvements:

1. Do proper LiveStackAnalysis on all stack objects including those which are
   not spill slots.
2. Reorder objects to fill in gaps between objects.
   e.g. 4, 1, <gap>, 4, 1, 1, 1, <gap>, 4 => 4, 1, 1, 1, 1, 4, 4
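
A minimal sketch of improvement 2, under the simplifying assumption that each
object's alignment equals its size (these are the numbers from the example
above; this is not LLVM's actual stack-coloring code). Allocating in program
order leaves alignment gaps; reordering, here largest-first, removes them:

```python
def layout(sizes):
    """Allocate slots in the given order; each object aligned to its size."""
    off = 0
    for s in sizes:
        off = (off + s - 1) // s * s    # align up; any padding is a <gap>
        off += s
    return off                          # total frame bytes used

objects = [4, 1, 4, 1, 1, 1, 4]                 # the sequence from the example
print(layout(objects))                          # -> 20 (two gaps)
print(layout(sorted(objects, reverse=True)))    # -> 16 (no gaps)
```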

//===---------------------------------------------------------------------===//

The scheduler should be able to sort nearby instructions by their address. For
example, in an expanded memset sequence it's not uncommon to see code like this:

        movl $0, 4(%rdi)
        movl $0, 8(%rdi)
        movl $0, 12(%rdi)
        movl $0, 0(%rdi)

Each of the stores is independent, and the scheduler is currently making an
arbitrary decision about the order.
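
The tie-break the text suggests is easy to sketch outside the scheduler (this
is not the actual SelectionDAG scheduler): among provably independent stores
through the same base register, order by ascending offset so the memory stream
is sequential:

```python
def sort_stores(stores):
    """stores: list of (value, base, offset) for independent stores."""
    return sorted(stores, key=lambda s: s[2])

memset_part = [("$0", "%rdi", 4), ("$0", "%rdi", 8),
               ("$0", "%rdi", 12), ("$0", "%rdi", 0)]
for val, base, off in sort_stores(memset_part):
    print(f"movl {val}, {off}({base})")   # offsets come out 0, 4, 8, 12
```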

//===---------------------------------------------------------------------===//

Another opportunity in this code is that the $0 could be moved to a register:

        movl $0, 4(%rdi)
        movl $0, 8(%rdi)
        movl $0, 12(%rdi)
        movl $0, 0(%rdi)

This would save substantial code size, especially for longer sequences like
this. It would be easy to have a rule telling isel to avoid matching MOV32mi
if the immediate has more than some fixed number of uses. It's more involved
to teach the register allocator how to do late folding to recover from
excessive register pressure.
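
The suggested rule, sketched outside of isel (the threshold of 3 and the choice
of %eax as the scratch register are arbitrary, illustrative assumptions): count
how many times the same immediate is stored and, past the threshold,
materialize it into a register first:

```python
from collections import Counter

def rewrite_imm_stores(stores, threshold=3):
    """stores: list of (imm, addr). Returns pseudo-asm lines."""
    uses = Counter(imm for imm, _ in stores)
    out, reg_for = [], {}
    for imm, addr in stores:
        if uses[imm] >= threshold and imm not in reg_for:
            reg_for[imm] = "%eax"                       # hypothetical scratch reg
            out.append(f"movl ${imm}, {reg_for[imm]}")  # materialize once
        if imm in reg_for:
            out.append(f"movl {reg_for[imm]}, {addr}")  # shorter reg-store form
        else:
            out.append(f"movl ${imm}, {addr}")          # rare imm: keep MOV32mi
    return out

for line in rewrite_imm_stores([(0, "4(%rdi)"), (0, "8(%rdi)"),
                                (0, "12(%rdi)"), (0, "0(%rdi)")]):
    print(line)
```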