Target Independent Opportunities:

//===---------------------------------------------------------------------===//

We should make the following changes to clean up MachineInstr:

1. Add an Opcode field to TargetInstrDescriptor, so you can tell the opcode of
   an instruction with just a TargetInstrDescriptor*.
2. Remove the Opcode field from MachineInstr, replacing it with a
   TargetInstrDescriptor*.
3. Getting information about a machine instr then becomes:
     MI->getInfo()->isTwoAddress()
   instead of:
     const TargetInstrInfo &TII = ...
     TII.isTwoAddrInstr(MI->getOpcode())

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m. It would be better to avoid the memcpy and just do
loads from the static array.

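For reference, a hypothetical source-level equivalent of what we would like the
optimizer to reach (m0u_opt is an invented name; it uses the same p_type as
above, and the table is never modified so it can live in constant static
storage):

static p_type m0u_opt(p_type p) {
  /* same table, but constant and static: no per-call copy into a local */
  static const int m[] = {0, 8, 1, 2, 16, 5, 13, 7, 14, 9,
                          3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}
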
//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :)

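A minimal sketch of the intended expansion (fast_hypot is a hypothetical name;
this is only valid under -ffast-math, where errno and the extra range/precision
handling of hypot() don't matter):

#include <math.h>

double fast_hypot(double x, double y) {
  return sqrt(x*x + y*y);   /* i.e. llvm.sqrt(x*x + y*y), as described above */
}
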
//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;

into:
 long long tmp = 1;
 for (i = ...; ++i, tmp+=tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

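A C-level illustration of the idea (is_negative is a hypothetical example;
"Phi" presumably means the byte of P that holds the sign bit):

int is_negative(int *P) {
  /* today: a full 32-bit load; only the byte holding the sign bit is needed,
     e.g. ((signed char*)P)[3] < 0 on a little-endian target */
  return *P < 0;
}
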
//===---------------------------------------------------------------------===//

Reassociate should turn X*X*X*X -> t=(X*X), t*t, to eliminate a multiply
(two multiplies instead of three).

//===---------------------------------------------------------------------===//

Interesting(?) testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l) { return memcmp(j,l,4); }
int h(int *j, int *l) { return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1,2,4,8 bytes.

//===---------------------------------------------------------------------===//

This code:
int rot(unsigned char b) { int a = ((b>>1) ^ (b<<7)) & 0xff; return a; }

Can be improved in two ways:

1. The instcombiner should eliminate the type conversions.
2. The X86 backend should turn this into a rotate by one bit.

//===---------------------------------------------------------------------===//

Add LSR exit value substitution. It'll probably be a win for Ackermann, etc.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize. It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For packed types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size. It works but can be overly conservative, as the alignment of
specific packed types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold packed type casts at the LLVM level, regardless of the
cast. Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns. Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl _foo, %eax
        cmpl $1, %edi
        sbbl $-1, %eax
        movl %eax, _foo

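For reference, a hypothetical C source pattern that produces the branchy
sequence above; the desired codegen computes foo += (x != 0) with no branch
(which is exactly what the cmp/sbb pair does):

int foo;

void inc_if_nonzero(int x) {
  if (x != 0)
    ++foo;        /* want: foo += (x != 0), i.e. the cmp/sbb sequence above */
}
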
//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers. See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687

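As an example of the first combine, a hypothetical call site like this
(sincos is a GNU libm extension, assumed to be available) would have its
separate sin and cos calls merged into one sincos call:

#include <math.h>

void polar_to_xy(double r, double theta, double *x, double *y) {
  double s = sin(theta), c = cos(theta);  /* combinable into sincos(theta, &s, &c) */
  *x = r * c;
  *y = r * s;
}
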
//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { int, int }
        %struct.Y = type { %struct.X }
ulong %bar() {
        %retval = alloca %struct.Y, align 8     ; <%struct.Y*> [#uses=3]
        %tmp12 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 0
        store int 0, int* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 1
        store int 1, int* %tmp15
        %retval = cast %struct.Y* %retval to ulong*
        %retval = load ulong* %retval           ; <ulong> [#uses=1]
        ret ulong %retval
}

It should be extended to do so.

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified; the constants are 0xC5000000 and 0xC5FFFFFF, so only the top byte
changes):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

Legalize should lower cttz like this:
  cttz(x) = popcnt((x-1) & ~x)

on targets that have popcnt but not cttz (Itanium, what else?). Note that this
identity counts trailing zeros; ctlz needs the leading set bit smeared downward
first, after which ctlz(x) = popcnt(~x).

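A minimal C sketch of both lowerings for 32-bit values, using GCC's
__builtin_popcount as the popcnt primitive (illustrative only; the function
names are invented):

unsigned cttz32(unsigned x) {
  return __builtin_popcount((x - 1) & ~x);  /* bits below the lowest set bit */
}

unsigned ctlz32(unsigned x) {
  /* smear the highest set bit downward, then count the zero bits above it */
  x |= x >> 1;  x |= x >> 2;  x |= x >> 4;  x |= x >> 8;  x |= x >> 16;
  return __builtin_popcount(~x);
}
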
//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but alas...

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine:

unsigned int swap_32(unsigned int v) {
  v = ((v & 0x00ff00ffU) << 8) | ((v & 0xff00ff00U) >> 8);
  v = ((v & 0x0000ffffU) << 16) | ((v & 0xffff0000U) >> 16);
  return v;
}

Nor is this (yes, it really is bswap):

unsigned long reverse(unsigned v) {
    unsigned t;
    t = v ^ ((v << 16) | (v >> 16));
    t &= ~0xff0000;
    v = (v << 24) | (v >> 8);
    return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-scalarrepl should promote this to be a vector scalar.

        %struct..0anon = type { <4 x float> }
implementation   ; Functions:
void %test1(<4 x float> %V, float* %P) {
entry:
        %u = alloca %struct..0anon, align 16    ; <%struct..0anon*> [#uses=2]
        %tmp = getelementptr %struct..0anon* %u, int 0, uint 0  ; <<4 x float>*> [#uses=1]
        store <4 x float> %V, <4 x float>* %tmp
        %tmp1 = cast %struct..0anon* %u to [4 x float]*         ; <[4 x float]*> [#uses=1]
        %tmp = getelementptr [4 x float]* %tmp1, int 0, int 1   ; <float*> [#uses=1]
        %tmp = load float* %tmp                 ; <float> [#uses=1]
        %tmp3 = mul float %tmp, 2.000000e+00    ; <float> [#uses=1]
        store float %tmp3, float* %P
        ret void
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   setcc (sdiv X, C1), C2
when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match. See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.

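A small worked example of the kind of fold meant here (assuming unsigned
arithmetic throughout; big_bucket is an invented name): for unsigned x,
(x / 8) > 100 holds exactly when x > 807, so the divide can be folded into
the compare:

unsigned big_bucket(unsigned x) {
  return (x / 8) > 100;   /* could be simplified to: x > 807 */
}
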
//===---------------------------------------------------------------------===//

Instcombine misses several of these cases (see the testcase in the patch):
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01519.html

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level. We need a "loops to memcpy"
pass.

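A hypothetical example of the kind of "history" copy loop meant here; a
loops-to-memcpy pass would rewrite the loop as a single
memcpy(dst, src, n * sizeof(*dst)) call:

void copy_history(int *dst, const int *src, unsigned n) {
  unsigned i;
  for (i = 0; i < n; i++)
    dst[i] = src[i];
}
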
//===---------------------------------------------------------------------===//

-predsimplify should transform this:

void bad(unsigned x)
{
  if (x > 4)
    bar(12);
  else if (x > 3)
    bar(523);
  else if (x > 2)
    bar(36);
  else if (x > 1)
    bar(65);
  else if (x > 0)
    bar(45);
  else
    bar(367);
}

into:

void good(unsigned x)
{
  if (x == 4)
    bar(523);
  else if (x == 3)
    bar(36);
  else if (x == 2)
    bar(65);
  else if (x == 1)
    bar(45);
  else if (x == 0)
    bar(367);
  else
    bar(12);
}

to enable further optimizations.

//===---------------------------------------------------------------------===//