//===---------------------------------------------------------------------===//
// Random ideas for the ARM backend.
//===---------------------------------------------------------------------===//

Reimplement 'select' in terms of 'SEL'.

* We would really like to support UXTAB16, but we need to prove that the
  add cannot overflow between the two 16-bit chunks (see the sketch after
  this list).

* Implement predication support.
* Implement pre/post increment support. (e.g. PR935)
* Coalesce stack slots!
* Implement smarter constant generation for binops with large immediates.

* Consider materializing FP constants like 0.0f and 1.0f using integer
  immediate instructions, then copying to the FPU. Would that be slower than
  loading into the FPU?

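For reference, here is a rough C model of what UXTAB16 computes (my reading of
the ARM ARM; treat the exact semantics as an assumption). The proof obligation
for the item above is that neither 16-bit add carries out of its lane, since
the real instruction performs the two adds independently:

#include <stdint.h>

/* uxtab16 rd, rn, rm:
   low halfword  = rn[15:0]  + zero_extend(rm[7:0])
   high halfword = rn[31:16] + zero_extend(rm[23:16])
   with no carry propagating between the two halfwords. */
uint32_t uxtab16_model(uint32_t rn, uint32_t rm) {
    uint32_t lo = (rn + (rm & 0xFFu)) & 0xFFFFu;
    uint32_t hi = ((rn >> 16) + ((rm >> 16) & 0xFFu)) & 0xFFFFu;
    return lo | (hi << 16);
}
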
//===---------------------------------------------------------------------===//

The constant island pass is in good shape.  Some cleanups might be desirable,
but there is unlikely to be much improvement in the generated code.

1.  There may be some advantage to trying to be smarter about the initial
placement, rather than putting everything at the end.

2.  The handling of 2-byte padding for Thumb is overly conservative.  There
would be a small gain to keeping accurate track of the padding (which would
require aligning functions containing constant pools to 4-byte boundaries).

3.  There might be some compile-time efficiency to be had by representing
consecutive islands as a single block rather than multiple blocks.

4.  Use a priority queue to sort constant pool users in inverse order of
    position, so we always process the one closest to the end of the function
    first. This may simplify CreateNewWater.

//===---------------------------------------------------------------------===//

We need to start generating predicated instructions.  The .td files have a way
to express this now (see the PPC conditional return instruction), but the
branch folding pass (or a new if-cvt pass) should start producing these, at
least in the trivial case.

Among the obvious wins, doing so can eliminate the need to custom expand
copysign (i.e. we won't need to custom expand it to get the conditional
negate).

This allows us to eliminate one instruction from:

define i32 @_Z6slow4bii(i32 %x, i32 %y) {
        %tmp = icmp sgt i32 %x, %y
        %retval = select i1 %tmp, i32 %x, i32 %y
        ret i32 %retval
}

__Z6slow4bii:
        cmp r0, r1
        movgt r1, r0
        mov r0, r1
        bx lr

//===---------------------------------------------------------------------===//

Implement long long "X-3" with instructions that fold the immediate in.  These
were disabled due to badness with the ARM carry flag on subtracts.

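A minimal test case for this (the two-instruction expansion named in the
comment is my assumption of what the folded form would look like):

/* With immediate folding, the i64 subtract below should become something
   like "subs r0, r0, #3; sbc r1, r1, #0" instead of first materializing
   the constant 3 into a register pair. */
long long sub3(long long x) {
    return x - 3;
}
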
//===---------------------------------------------------------------------===//

We currently compile abs:
int foo(int p) { return p < 0 ? -p : p; }

into:

_foo:
        rsb r1, r0, #0
        cmn r0, #1
        movgt r1, r0
        mov r0, r1
        bx lr

This is very, uh, literal.  This could be a 3 operation sequence:
  t = (p sra 31);
  res = (p xor t)-t

Which would be better.  This occurs in png decode.

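In C, the suggested sequence is the usual sign-mask trick (a sketch, assuming
the compiler uses an arithmetic shift for the signed right shift):

/* Branchless abs: t is all-ones when p is negative and zero otherwise,
   so (p ^ t) - t negates p exactly when it is negative.  Three ARM ops
   (asr, eor, sub) instead of rsb/cmn/movgt/mov. */
int abs_branchless(int p) {
    int t = p >> 31;          /* arithmetic shift assumed */
    return (p ^ t) - t;
}
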
//===---------------------------------------------------------------------===//

More load / store optimizations:
1) Look past instructions without side-effects (not load, store, branch, etc.)
   when forming the list of loads / stores to optimize.

2) Smarter register allocation?
We are probably missing some opportunities to use ldm / stm. Consider:

ldr r5, [r0]
ldr r4, [r0, #4]

This cannot be merged into an ldm. Perhaps we will need to do the transformation
before register allocation. Then teach the register allocator to allocate a
chunk of consecutive registers.

3) Better representation for block transfer? This is from Olden/power:

        fldd d0, [r4]
        fstd d0, [r4, #+32]
        fldd d0, [r4, #+8]
        fstd d0, [r4, #+40]
        fldd d0, [r4, #+16]
        fstd d0, [r4, #+48]
        fldd d0, [r4, #+24]
        fstd d0, [r4, #+56]

If we can spare the registers, it would be better to use fldm and fstm here.
Need major register allocator enhancement though.

4) Can we recognize the relative position of constantpool entries? i.e. Treat

        ldr r0, LCPI17_3
        ldr r1, LCPI17_4
        ldr r2, LCPI17_5

   as
        ldr r0, LCPI17
        ldr r1, LCPI17+4
        ldr r2, LCPI17+8

   Then the ldr's can be combined into a single ldm. See Olden/power.

Note that for ARM v4 gcc uses ldmia to load a pair of 32-bit values to
represent a double 64-bit FP constant:

        adr r0, L6
        ldmia r0, {r0-r1}

        .align 2
L6:
        .long   -858993459
        .long   1074318540

5) Can we make use of ldrd and strd? Instead of generating ldm / stm, use
ldrd/strd instead if there are only two destination registers that form an
odd/even pair. However, we probably would pay a penalty if the address is not
aligned on an 8-byte boundary. This requires more information on load / store
nodes (and MI's?) than we currently carry.

6) struct copies appear to be done field by field
instead of by words, at least sometimes:

struct foo { int x; short s; char c1; char c2; };
void cpy(struct foo*a, struct foo*b) { *a = *b; }

llvm code (-O2)
        ldrb r3, [r1, #+6]
        ldr r2, [r1]
        ldrb r12, [r1, #+7]
        ldrh r1, [r1, #+4]
        str r2, [r0]
        strh r1, [r0, #+4]
        strb r3, [r0, #+6]
        strb r12, [r0, #+7]
gcc code (-O2)
        ldmia r1, {r1-r2}
        stmia r0, {r1-r2}

In this benchmark poor handling of aggregate copies has shown up as
having a large effect on size, and possibly speed as well (we don't have
a good way to measure on ARM).

//===---------------------------------------------------------------------===//

* Consider this silly example:

double bar(double x) {
  double r = foo(3.1);
  return x+r;
}

_bar:
        sub sp, sp, #16
        str r4, [sp, #+12]
        str r5, [sp, #+8]
        str lr, [sp, #+4]
        mov r4, r0
        mov r5, r1
        ldr r0, LCPI2_0
        bl _foo
        fmsr f0, r0
        fcvtsd d0, f0
        fmdrr d1, r4, r5
        faddd d0, d0, d1
        fmrrd r0, r1, d0
        ldr lr, [sp, #+4]
        ldr r5, [sp, #+8]
        ldr r4, [sp, #+12]
        add sp, sp, #16
        bx lr

Ignore the prologue and epilogue stuff for a second. Note
        mov r4, r0
        mov r5, r1
the copies to callee-save registers and the fact that they are only used by the
fmdrr instruction. It would have been better had the fmdrr been scheduled
before the call, placing the result in a callee-save DPR register. The two
mov ops would not have been necessary.

//===---------------------------------------------------------------------===//

Calling convention related stuff:

* gcc's parameter passing implementation is terrible and we suffer as a result:

e.g.
struct s {
        double d1;
        int s1;
};

void foo(struct s S) {
        printf("%g, %d\n", S.d1, S.s1);
}

'S' is passed via registers r0, r1, r2. But gcc stores them to the stack, and
then reloads them to r1, r2, and r3 before issuing the call (r0 contains the
address of the format string):

        stmfd   sp!, {r7, lr}
        add     r7, sp, #0
        sub     sp, sp, #12
        stmia   sp, {r0, r1, r2}
        ldmia   sp, {r1-r2}
        ldr     r0, L5
        ldr     r3, [sp, #8]
L2:
        add     r0, pc, r0
        bl      L_printf$stub

Instead of an stmia, an ldmia, and an ldr, wouldn't it be better to do three
moves?

* Returning an aggregate type is even worse:

e.g.
struct s foo(void) {
  struct s S = {1.1, 2};
  return S;
}

        mov     ip, r0
        ldr     r0, L5
        sub     sp, sp, #12
L2:
        add     r0, pc, r0
        @ lr needed for prologue
        ldmia   r0, {r0, r1, r2}
        stmia   sp, {r0, r1, r2}
        stmia   ip, {r0, r1, r2}
        mov     r0, ip
        add     sp, sp, #12
        bx      lr

r0 (and later ip) is the hidden parameter from the caller, holding the address
in which to store the return value. The first ldmia loads the constants into
r0, r1, r2. The last stmia stores r0, r1, r2 into the address passed in.
However, there is one additional stmia that stores r0, r1, and r2 to some
stack location. That store is dead.

The llvm-gcc generated code looks like this:

csretcc void %foo(%struct.s* %agg.result) {
entry:
        %S = alloca %struct.s, align 4          ; <%struct.s*> [#uses=1]
        %memtmp = alloca %struct.s              ; <%struct.s*> [#uses=1]
        cast %struct.s* %S to sbyte*            ; <sbyte*>:0 [#uses=2]
        call void %llvm.memcpy.i32( sbyte* %0, sbyte* cast ({ double, int }* %C.0.904 to sbyte*), uint 12, uint 4 )
        cast %struct.s* %agg.result to sbyte*   ; <sbyte*>:1 [#uses=2]
        call void %llvm.memcpy.i32( sbyte* %1, sbyte* %0, uint 12, uint 0 )
        cast %struct.s* %memtmp to sbyte*       ; <sbyte*>:2 [#uses=1]
        call void %llvm.memcpy.i32( sbyte* %2, sbyte* %1, uint 12, uint 0 )
        ret void
}

llc ends up issuing two memcpy's (the first memcpy becomes 3 loads from the
constantpool). Perhaps we should 1) fix llvm-gcc so the memcpy is translated
into a number of loads and stores, or 2) custom lower memcpy (of small size) to
be ldmia / stmia. I think option 2 is better, but the current register
allocator cannot allocate a chunk of registers at a time.

A feasible temporary solution is to use specific physical registers at
lowering time for small (<= 4 words?) transfer sizes.

* ARM CSRet calling convention requires the hidden argument to be returned by
the callee.

//===---------------------------------------------------------------------===//

We can definitely do a better job on BB placements to eliminate some branches.
It's very common to see llvm generated assembly code that looks like this:

LBB3:
 ...
LBB4:
...
 beq LBB3
 b LBB2

If BB4 is the only predecessor of BB3, then we can emit BB3 after BB4. We can
then eliminate the beq and turn the unconditional branch to LBB2 into a bne.

See McCat/18-imp/ComputeBoundingBoxes for an example.

//===---------------------------------------------------------------------===//

Register scavenging is now implemented.  The example in the previous version
of this document produces optimal code at -O2.

//===---------------------------------------------------------------------===//

Pre-/post- indexed load / stores:

1) We should not make the pre/post- indexed load/store transform if the base
ptr is guaranteed to be live beyond the load/store. This can happen if the
base ptr is live out of the block in which we are performing the optimization,
e.g.

mov r1, r2
ldr r3, [r1], #4
...

vs.

ldr r3, [r2]
add r1, r2, #4
...

In most cases, this is just a wasted optimization. However, sometimes it can
negatively impact the performance because two-address code is more restrictive
when it comes to scheduling.

Unfortunately, liveout information is currently unavailable during DAG combine
time.

2) Consider splitting an indexed load / store into an add/sub plus a
   load/store to solve #1 (in TwoAddressInstructionPass.cpp).

3) Enhance LSR to generate more opportunities for indexed ops.

4) Once we add support for multiple result patterns, write indexed load
   patterns instead of C++ instruction selection code.

5) Use FLDM / FSTM to emulate indexed FP load / store.

//===---------------------------------------------------------------------===//

We should add i64 support to take advantage of the 64-bit load / stores.
We can add a pseudo i64 register class containing pseudo registers that are
register pairs. All other ops (e.g. add, sub) would be expanded as usual.

We need to add pseudo instructions (i.e. gethi / getlo) to extract i32 registers
from the i64 register. These are single moves which can be eliminated if the
destination register is a sub-register of the source. We should implement proper
subreg support in the register allocator to coalesce these away.

There are other minor issues such as multiple instructions for a spill / restore
/ move.

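A small test case that would benefit once i64 register pairs exist (a sketch;
the instruction choices named in the comment are my assumption):

/* With a register-pair i64 class, this copy could become a single ldrd/ldm
   followed by a single strd/stm, rather than two 32-bit loads plus two
   32-bit stores. */
void copy_i64(long long *dst, const long long *src) {
    *dst = *src;
}
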
//===---------------------------------------------------------------------===//

Implement support for some more tricky ways to materialize immediates.  For
example, to get 0xffff8000, we can use:

mov r9, #&3f8000
sub r9, r9, #&400000

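A quick sanity check of that sequence in C (the claim that both constants are
encodable as ARM rotated 8-bit immediates is my own check, not part of the
original note):

#include <stdint.h>
#include <assert.h>

int main(void) {
    uint32_t r9 = 0x3f8000u;    /* mov r9, #0x3f8000 */
    r9 -= 0x400000u;            /* sub r9, r9, #0x400000 */
    assert(r9 == 0xffff8000u);  /* wraps to the desired constant */
    return 0;
}
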
//===---------------------------------------------------------------------===//

We sometimes generate multiple add / sub instructions to update sp in prologue
and epilogue if the inc / dec value is too large to fit in a single immediate
operand. In some cases, perhaps it might be better to load the value from a
constantpool instead.

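For example (a hypothetical case; the exact threshold depends on what fits in
an ARM rotated-immediate operand):

/* A frame of 1,000,000 bytes cannot be encoded in one immediate operand,
   so, per the note above, the prologue and epilogue each currently need
   several sp adjustments. */
void big_frame(void) {
    volatile char buf[1000000];
    buf[0] = 0;
}
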
//===---------------------------------------------------------------------===//

GCC generates significantly better code for this function.

int foo(int StackPtr, unsigned char *Line, unsigned char *Stack, int LineLen) {
    int i = 0;

    if (StackPtr != 0) {
        while (StackPtr != 0 && i < (((LineLen) < (32768)) ? (LineLen) : (32768)))
            Line[i++] = Stack[--StackPtr];
        if (LineLen > 32768)
        {
            while (StackPtr != 0 && i < LineLen)
            {
                i++;
                --StackPtr;
            }
        }
    }
    return StackPtr;
}

//===---------------------------------------------------------------------===//

This should compile to the mlas instruction:
int mlas(int x, int y, int z) { return ((x * y + z) < 0) ? 7 : 13; }

//===---------------------------------------------------------------------===//

At some point, we should triage these to see if they still apply to us:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19598
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18560
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27016

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11831
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11826
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11825
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11824
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11823
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11820
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10982

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10242
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9831
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9760
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9759
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9703
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9702
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9663

http://www.inf.u-szeged.hu/gcc-arm/
http://citeseer.ist.psu.edu/debus04linktime.html

//===---------------------------------------------------------------------===//

gcc generates smaller code for this function at -O2 or -Os:

void foo(signed char* p) {
  if (*p == 3)
    bar();
  else if (*p == 4)
    baz();
  else if (*p == 5)
    quux();
}

llvm decides it's a good idea to turn the repeated if...else into a
binary tree, as if it were a switch; the resulting code needs one fewer
compare-and-branch when *p<=2 or *p==5, the same number when *p==4
or *p>6, and one more when *p==3. So it should be a speed win
(on balance). However, the revised code is larger, with 4 conditional
branches instead of 3.

More seriously, there is a byte->word extend before
each comparison, where there should be only one, and the condition codes
are not remembered when the same two values are compared twice.

//===---------------------------------------------------------------------===//

More register scavenging work:

1. Use the register scavenger to track frame indices materialized into registers
   (those that do not fit in addressing modes) to allow reuse in the same BB.
2. Finish scavenging for Thumb.
3. We know some spills and restores are unnecessary. The issue is that once
   live intervals are merged, they are never split. So every def is spilled
   and every use requires a restore if the register allocator decides the
   resulting live interval is not assigned a physical register. It may be
   possible (with the help of the scavenger) to turn some spill / restore
   pairs into register copies.

//===---------------------------------------------------------------------===//

Teach LSR about ARM addressing modes.
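
A loop like the one below is the sort of case in mind (a sketch; whether LSR
would actually form the post-incremented addressing here is an assumption, not
a claim about current behavior):

/* If LSR understands ARM post-increment addressing, the pointer walk can
   become "ldr rX, [rP], #4" instead of a separate induction-variable add
   on every iteration. */
int sum_ints(const int *p, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}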