Target Independent Opportunities:

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs.  Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math).  Misc/mandel will like this. :)
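
In C terms the intended transform is roughly this (a sketch; only legal under
fast-math relaxations, since it changes rounding and overflow behavior):

#include <math.h>

double dist(double x, double y) {
  return hypot(x, y);              /* today: a libm call */
  /* fast-math expansion:  return sqrt(x*x + y*y);  */
}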

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that X and Y
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack.  But that is a short-term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;
}

into:
   long long tmp = 1;
   for (i = ...; ++i, tmp += tmp)
     x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
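
That is, a sign test of a loaded i32 only needs the byte that holds the sign
bit.  A sketch in C (which byte that is depends on the target's endianness):

int is_negative(int *P) {
  return *P < 0;                             /* currently a 4-byte load */
  /* could be:  return ((signed char *)P)[3] < 0;   on little-endian */
}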

//===---------------------------------------------------------------------===//

Reassociate should turn: X*X*X*X -> t=(X*X) (t*t) to eliminate a multiply.
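
For example (a trivial sketch):

int pow4(int x) {
  return x*x*x*x;                  /* 3 multiplies today */
  /* reassociated:  int t = x*x;  return t*t;   -- 2 multiplies */
}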

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

Reassociate should handle the example in GCC PR16157.

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j, int *l) { return memcmp(j, l, 4); }
int h(int *j, int *l) { return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1, 2, 4, and 8 bytes.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize.  It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For vector types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size.  This works, but can be overly conservative, since the
alignment of specific vector types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

typedef float v4sf __attribute__((vector_size(16)));  /* GCC vector extension */

v4sf example(float *P) {
  return (v4sf){ P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold vector type casts at the LLVM level, regardless of the
cast.  Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!
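
For example, folding a bitcast of a byte-vector constant down to an integer
needs the target byte order.  The same ambiguity, sketched in C:

union { unsigned char b[4]; unsigned int i; } u = { { 1, 2, 3, 4 } };
/* u.i constant-folds to 0x04030201 on a little-endian target and to
   0x01020304 on a big-endian one. */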

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns.  Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl    _foo, %eax
        cmpl    $1, %edi
        sbbl    $-1, %eax
        movl    %eax, _foo

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers.  See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
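
A sketch of the expansion half (assuming the GNU sincos extension), showing why
the destination pointers become SROA-able:

#define _GNU_SOURCE
#include <math.h>

void demo(double x, double *out1, double *out2) {
  double s, c;
  sincos(x, &s, &c);               /* s and c are address-taken here */
  *out1 = s; *out2 = c;
  /* expanded form:  double s = sin(x), c = cos(x);  *out1 = s; *out2 = c;
     no local has its address taken, so SROA/mem2reg can promote them */
}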

//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { i32, i32 }
        %struct.Y = type { %struct.X }

define i64 @bar() {
        %retval = alloca %struct.Y, align 8
        %tmp12 = getelementptr %struct.Y* %retval, i32 0, i32 0, i32 0
        store i32 0, i32* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, i32 0, i32 0, i32 1
        store i32 1, i32* %tmp15
        %retval.upgrd.1 = bitcast %struct.Y* %retval to i64*
        %retval.upgrd.2 = load i64* %retval.upgrd.1
        ret i64 %retval.upgrd.2
}

It should be extended to do so.

//===---------------------------------------------------------------------===//

-scalarrepl should promote this to be a vector scalar.

        %struct..0anon = type { <4 x float> }

define void @test1(<4 x float> %V, float* %P) {
        %u = alloca %struct..0anon, align 16
        %tmp = getelementptr %struct..0anon* %u, i32 0, i32 0
        store <4 x float> %V, <4 x float>* %tmp
        %tmp1 = bitcast %struct..0anon* %u to [4 x float]*
        %tmp.upgrd.1 = getelementptr [4 x float]* %tmp1, i32 0, i32 1
        %tmp.upgrd.2 = load float* %tmp.upgrd.1
        %tmp3 = mul float %tmp.upgrd.2, 2.000000e+00
        store float %tmp3, float* %P
        ret void
}

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}
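
The or/and constants force the top byte to 0xC5 and leave the other bytes
untouched, so the desired codegen is a one-byte store.  Roughly, in C (the
byte index assumes a little-endian layout):

void test(unsigned int *P) {
  ((unsigned char *)P)[3] = 0xC5;            /* store only the high byte */
}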

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

Legalize should lower cttz like this:
  cttz(x) = popcnt((x-1) & ~x)

on targets that have popcnt but not cttz.  Itanium, what else?
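
A quick check of the identity in C (a sketch using the GCC popcount builtin;
for x == 0 it yields the operand width, which is also a sensible result):

unsigned cttz32(unsigned x) {
  /* (x-1) & ~x sets exactly the bit positions below x's lowest set bit */
  return __builtin_popcount((x - 1) & ~x);
}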

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int.  On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but alas...

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine:

unsigned int swap_32(unsigned int v) {
  v = ((v & 0x00ff00ffU) << 8)  | ((v & 0xff00ff00U) >> 8);
  v = ((v & 0x0000ffffU) << 16) | ((v & 0xffff0000U) >> 16);
  return v;
}

Nor is this (yes, it really is bswap):

unsigned long reverse(unsigned v) {
    unsigned t;
    t = v ^ ((v << 16) | (v >> 16));
    t &= ~0xff0000;
    v = (v << 24) | (v >> 8);
    return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   icmp pred (sdiv X, C1), C2
when X, C1, and C2 are unsigned.  Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match.  See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.
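
The shape of code involved looks roughly like this (a hypothetical sketch, not
lifted from the hash tests):

int fits(int x) {
  if (x < 0) return 0;             /* x is known non-negative below */
  return (x / 8) < 100;            /* signed divide; should fold to x < 800 */
}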

//===---------------------------------------------------------------------===//

Instcombine misses several of these cases (see the testcase in the patch):
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01519.html

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level.  We need a "loops to memcpy"
pass.
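
The kind of loop meant is roughly this (a hypothetical sketch; the real code
copies viterbi history arrays):

void copy_history(short *dst, const short *src, int n) {
  int i;
  for (i = 0; i < n; i++)    /* should become memcpy(dst, src, n*sizeof(short)) */
    dst[i] = src[i];
}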

//===---------------------------------------------------------------------===//

Consider:

typedef unsigned U32;
typedef unsigned long long U64;
int test (U32 *inst, U64 *regs) {
    U64 effective_addr2;
    U32 temp = *inst;
    int r1 = (temp >> 20) & 0xf;
    int b2 = (temp >> 16) & 0xf;
    effective_addr2 = temp & 0xfff;
    if (b2) effective_addr2 += regs[b2];
    b2 = (temp >> 12) & 0xf;
    if (b2) effective_addr2 += regs[b2];
    effective_addr2 &= regs[4];
    if ((effective_addr2 & 3) == 0)
        return 1;
    return 0;
}

Note that only the low 2 bits of effective_addr2 are used.  On 32-bit systems,
we don't eliminate the computation of the top half of effective_addr2 because
we don't have whole-function selection dags.  On x86, this means the function
uses one extra register when effective_addr2 is declared as U64 rather than
U32.

//===---------------------------------------------------------------------===//

Promotion of i32 bswap can use i64 bswap + shr.  Useful on targets with 64-bit
regs and bswap, like Itanium.
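
The idea in C (a sketch using the GCC __builtin_bswap64 builtin):

unsigned bswap32_via_64(unsigned x) {
  /* byte-swap in 64 bits, then shift the swapped bytes down into the low 32 */
  return (unsigned)(__builtin_bswap64((unsigned long long)x) >> 32);
}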

//===---------------------------------------------------------------------===//

LSR should know what GPR types a target has.  This code:

volatile short X, Y; // globals

void foo(int N) {
  int i;
  for (i = 0; i < N; i++) { X = i; Y = i*4; }
}

produces two identical IVs (after promotion) on PPC/ARM:

LBB1_1: @bb.preheader
        mov r3, #0
        mov r2, r3
        mov r1, r3
LBB1_2: @bb
        ldr r12, LCPI1_0
        ldr r12, [r12]
        strh r2, [r12]
        ldr r12, LCPI1_1
        ldr r12, [r12]
        strh r3, [r12]
        add r1, r1, #1    <- [0,+,1]
        add r3, r3, #4
        add r2, r2, #1    <- [0,+,1]
        cmp r1, r0
        bne LBB1_2  @bb

//===---------------------------------------------------------------------===//

Tail call elim should be more aggressive, checking to see if the call is
followed by an uncond branch to an exit block.

; This testcase is due to tail-duplication not wanting to copy the return
; instruction into the terminating blocks because there was other code
; optimized out of the function after the taildup happened.
; RUN: llvm-upgrade < %s | llvm-as | opt -tailcallelim | llvm-dis | not grep call

int %t4(int %a) {
entry:
        %tmp.1 = and int %a, 1
        %tmp.2 = cast int %tmp.1 to bool
        br bool %tmp.2, label %then.0, label %else.0

then.0:
        %tmp.5 = add int %a, -1
        %tmp.3 = call int %t4( int %tmp.5 )
        br label %return

else.0:
        %tmp.7 = setne int %a, 0
        br bool %tmp.7, label %then.1, label %return

then.1:
        %tmp.11 = add int %a, -2
        %tmp.9 = call int %t4( int %tmp.11 )
        br label %return

return:
        %result.0 = phi int [ 0, %else.0 ], [ %tmp.3, %then.0 ],
                            [ %tmp.9, %then.1 ]
        ret int %result.0
}