Blame - lib/Target/README.txt - fp2-dev/platform/external/llvm

Target Independent Opportunities:

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :) This isn't
safe in general, even on darwin. See the libm implementation of hypot for
examples (which special case when x/y are exactly zero to get signed zeros etc
right).

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
  x = 1ULL << i;
}

into:
long long tmp = 1;
for (i = ...; ++i, tmp+=tmp)
  x = tmp;

This would be a win on ppc32, but not x86 or ppc64.
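
The two loop shapes can be written out as a small self-contained sketch. The
function names and the output array are assumptions added for illustration;
the loops in the text leave their bounds elided.

```c
#include <stdint.h>

/* Reference form: recompute the 64-bit shift for each i (n must be <= 64). */
void fill_shift(uint64_t *out, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    out[i] = 1ULL << i;
}

/* Strength-reduced form: carry the value along and double it each step. */
void fill_doubling(uint64_t *out, unsigned n) {
  uint64_t tmp = 1;
  for (unsigned i = 0; i < n; ++i, tmp += tmp)
    out[i] = tmp;
}
```

Both functions fill the array with the same values; the second needs only a
64-bit add per iteration, which is two instructions (add with carry) on a
32-bit target.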

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

//===---------------------------------------------------------------------===//

Reassociate should turn: X*X*X*X -> t=(X*X) (t*t) to eliminate a multiply.

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

Reassociate should handle the example in GCC PR16157.

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l) { return memcmp(j,l,4); }
int h(int *j, int *l) { return *j - *l; }

this could be done in SelectionDAGISel.cpp, along with other special cases,
for 1,2,4,8 bytes.
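
A minimal sketch of the underlying idea, assuming only the sign of the result
matters: memcmp compares bytes as unsigned char in address order, which is the
same ordering as an unsigned comparison of the two words read big-endian. The
helper names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assemble four bytes in big-endian order, independent of host endianness. */
static uint32_t load_be32(const unsigned char *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: -1, 0 or 1 with the same sign as memcmp(a, b, 4). */
int memcmp4_sign(const void *a, const void *b) {
  uint32_t x = load_be32(a), y = load_be32(b);
  return (x > y) - (x < y);
}
```

On a big-endian target load_be32 is a plain 32-bit load, so the whole call
reduces to one compare of two registers.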

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize. It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For vector types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size. It works but can be overly conservative as the alignment of
specific vector types is target dependent.

//===---------------------------------------------------------------------===//

We should produce an unaligned load from code like this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns. Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl _foo, %eax
        cmpl $1, %edi
        sbbl $-1, %eax
        movl %eax, _foo
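
In C terms the two listings correspond roughly to the sketch below (function
names are made up for illustration). The cmp/sbb pair works because
"cmpl $1, x" sets the carry flag exactly when x is zero, and "sbbl $-1, foo"
then computes foo + 1 - carry.

```c
/* Branchy form, as in the first listing: increment only when x is nonzero. */
int inc_if_branch(int foo, int x) {
  if (x != 0)
    foo++;
  return foo;
}

/* Branch-free form: add the 0/1 result of the comparison. */
int inc_if_flag(int foo, int x) {
  return foo + (x != 0);
}
```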

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      void sincos(double x, double *sin, double *cos);
      void sincosf(float x, float *sin, float *cos);
      void sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers. See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687

This is now easily doable with MRVs. We could even make an intrinsic for this
if anyone cared enough about sincos.

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

define void @test(i32* %P) {
        %tmp = load i32* %P
        %tmp14 = or i32 %tmp, 3305111552
        %tmp15 = and i32 %tmp14, 3321888767
        store i32 %tmp15, i32* %P
        ret void
}
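
The constants are easier to read in hex: 3305111552 is 0xC5000000 and
3321888767 is 0xC5FFFFFF, so the or/and pair forces the top byte to 0xC5 and
leaves the low 24 bits alone. A small sketch on plain values (function names
are hypothetical):

```c
#include <stdint.h>

/* The computation from the IR above, written on a plain value. */
uint32_t or_then_and(uint32_t v) {
  return (v | 0xC5000000u) & 0xC5FFFFFFu; /* 3305111552, 3321888767 */
}

/* Equivalent form: keep the low 24 bits and force the top byte to 0xC5. */
uint32_t set_top_byte(uint32_t v) {
  return (v & 0x00FFFFFFu) | 0xC5000000u;
}
```

On a little-endian target the top byte lives at byte offset 3, so the whole
function can become a one-byte store of 0xC5 to that address.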

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr
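
The identity relies on clz returning 32 for a zero input, which is the only
value with bit 5 set. The sketch below uses a hand-written clz32 (a
hypothetical helper) rather than __builtin_clz, because GCC documents the
builtin's result as undefined for 0.

```c
#include <stdint.h>

/* Portable count-leading-zeros that returns 32 for x == 0. */
unsigned clz32(uint32_t x) {
  unsigned n = 0;
  if (x == 0)
    return 32;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    n++;
  }
  return n;
}
```

For every nonzero x the result is at most 31, so clz32(x) >> 5 is 0; for
x == 0 it is 1.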

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but alas.

//===---------------------------------------------------------------------===//

This should be optimized to one 'and' and one 'or', from PR4216:

define i32 @test_bitfield(i32 %bf.prev.low) nounwind ssp {
entry:
  %bf.prev.lo.cleared10 = or i32 %bf.prev.low, 32962 ; <i32> [#uses=1]
  %0 = and i32 %bf.prev.low, -65536 ; <i32> [#uses=1]
  %1 = and i32 %bf.prev.lo.cleared10, 40186 ; <i32> [#uses=1]
  %2 = or i32 %1, %0 ; <i32> [#uses=1]
  ret i32 %2
}
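
One way to see the target form, sketched in C with hypothetical function
names: 32962 is 0x80C2 and 40186 is 0x9CFA, and every bit of 0x80C2 is also
set in 0x9CFA. The 'or' can therefore be moved after a single mask that
combines 0x9CFA with the high half.

```c
#include <stdint.h>

/* Direct transcription of @test_bitfield: two 'and's and two 'or's. */
uint32_t bitfield_ref(uint32_t x) {
  uint32_t cleared = x | 32962u;  /* 0x80C2 */
  uint32_t hi = x & 0xFFFF0000u;  /* and with -65536 */
  uint32_t lo = cleared & 40186u; /* 0x9CFA */
  return lo | hi;
}

/* One 'and' and one 'or'. */
uint32_t bitfield_opt(uint32_t x) {
  return (x & 0xFFFF9CFAu) | 0x80C2u;
}
```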

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine (yes, it really is bswap):

unsigned long reverse(unsigned v) {
    unsigned t;
    t = v ^ ((v << 16) | (v >> 16));
    t &= ~0xff0000;
    v = (v << 24) | (v >> 8);
    return v ^ (t >> 8);
}
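
For readers who want to convince themselves, here is a small sketch that
restates the idiom with fixed-width types next to a straightforward byte
swap. Both function names are made up for this note.

```c
#include <stdint.h>

/* The idiom from the text, with fixed-width types. */
uint32_t reverse32(uint32_t v) {
  uint32_t t = v ^ ((v << 16) | (v >> 16));
  t &= ~(uint32_t)0xff0000;
  v = (v << 24) | (v >> 8);
  return v ^ (t >> 8);
}

/* Straightforward byte swap for comparison. */
uint32_t bswap32_ref(uint32_t v) {
  return (v << 24) | ((v & 0xff00u) << 8) | ((v >> 8) & 0xff00u) | (v >> 24);
}
```

For example, reverse32(0x11223344) yields 0x44332211, the same as
bswap32_ref.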

//===---------------------------------------------------------------------===//

These idioms should be recognized as popcount (see PR1488):

unsigned countbits_slow(unsigned v) {
  unsigned c;
  for (c = 0; v; v >>= 1)
    c += v & 1;
  return c;
}
unsigned countbits_fast(unsigned v){
  unsigned c;
  for (c = 0; v; c++)
    v &= v - 1; // clear the least significant bit set
  return c;
}

typedef unsigned long long BITBOARD;
int PopCnt(register BITBOARD a) {
  register int c=0;
  while(a) {
    c++;
    a &= a - 1;
  }
  return c;
}
unsigned int popcount(unsigned int input) {
  unsigned int count = 0;
  for (unsigned int i = 0; i < 4 * 8; i++)
    count += (input >> i) & 1;
  return count;
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   icmp pred (sdiv X / C1 ), C2
when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match. See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.
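
A sketch of the simplest instance, the unsigned equality case, assuming the
intended result is a range check on X that avoids the divide. The function
names are hypothetical, and the bounds are computed in 64 bits so that they
cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: divide, then compare.  c1 must be nonzero. */
bool div_eq(uint32_t x, uint32_t c1, uint32_t c2) {
  return x / c1 == c2;
}

/* Folded form: x / c1 == c2 exactly when c1*c2 <= x < c1*c2 + c1. */
bool div_eq_range(uint32_t x, uint32_t c1, uint32_t c2) {
  uint64_t lo = (uint64_t)c1 * c2;
  return x >= lo && x - lo < c1;
}
```

The signed cases need more care because division rounds toward zero, which
is presumably why the sign of the operands matters in the text above.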

//===---------------------------------------------------------------------===//

viterbi speeds up significantly if the various "history" related copy loops
are turned into memcpy calls at the source level. We need a "loops to memcpy"
pass.

//===---------------------------------------------------------------------===//

Consider:

typedef unsigned U32;
typedef unsigned long long U64;
int test (U32 *inst, U64 *regs) {
    U64 effective_addr2;
    U32 temp = *inst;
    int r1 = (temp >> 20) & 0xf;
    int b2 = (temp >> 16) & 0xf;
    effective_addr2 = temp & 0xfff;
    if (b2) effective_addr2 += regs[b2];
    b2 = (temp >> 12) & 0xf;
    if (b2) effective_addr2 += regs[b2];
    effective_addr2 &= regs[4];
    if ((effective_addr2 & 3) == 0)
        return 1;
    return 0;
}

Note that only the low 2 bits of effective_addr2 are used. On 32-bit systems,
we don't eliminate the computation of the top half of effective_addr2 because
we don't have whole-function selection dags. On x86, this means we use one
extra register for the function when effective_addr2 is declared as U64 than
when it is declared U32.

//===---------------------------------------------------------------------===//

LSR should know what GPR types a target has. This code:

volatile short X, Y; // globals

void foo(int N) {
  int i;
  for (i = 0; i < N; i++) { X = i; Y = i*4; }
}

produces two near identical IV's (after promotion) on PPC/ARM:

LBB1_2:
        ldr r3, LCPI1_0
        ldr r3, [r3]
        strh r2, [r3]
        ldr r3, LCPI1_1
        ldr r3, [r3]
        strh r1, [r3]
        add r1, r1, #4
        add r2, r2, #1 <- [0,+,1]
        sub r0, r0, #1 <- [0,-,1]
        cmp r0, #0
        bne LBB1_2

LSR should reuse the "+" IV for the exit test.

//===---------------------------------------------------------------------===//

Tail call elim should be more aggressive, checking to see if the call is
followed by an uncond branch to an exit block.

; This testcase is due to tail-duplication not wanting to copy the return
; instruction into the terminating blocks because there was other code
; optimized out of the function after the taildup happened.
; RUN: llvm-as < %s | opt -tailcallelim | llvm-dis | not grep call

define i32 @t4(i32 %a) {
entry:
        %tmp.1 = and i32 %a, 1 ; <i32> [#uses=1]
        %tmp.2 = icmp ne i32 %tmp.1, 0 ; <i1> [#uses=1]
        br i1 %tmp.2, label %then.0, label %else.0

then.0: ; preds = %entry
        %tmp.5 = add i32 %a, -1 ; <i32> [#uses=1]
        %tmp.3 = call i32 @t4( i32 %tmp.5 ) ; <i32> [#uses=1]
        br label %return

else.0: ; preds = %entry
        %tmp.7 = icmp ne i32 %a, 0 ; <i1> [#uses=1]
        br i1 %tmp.7, label %then.1, label %return

then.1: ; preds = %else.0
        %tmp.11 = add i32 %a, -2 ; <i32> [#uses=1]
        %tmp.9 = call i32 @t4( i32 %tmp.11 ) ; <i32> [#uses=1]
        br label %return

return: ; preds = %then.1, %else.0, %then.0
        %result.0 = phi i32 [ 0, %else.0 ], [ %tmp.3, %then.0 ],
                            [ %tmp.9, %then.1 ]
        ret i32 %result.0
}

//===---------------------------------------------------------------------===//

Tail recursion elimination is not transforming this function, because it is
returning n, which fails the isDynamicConstant check in the accumulator
recursion checks.

long long fib(const long long n) {
  switch(n) {
    case 0:
    case 1:
      return n;
    default:
      return fib(n-1) + fib(n-2);
  }
}

//===---------------------------------------------------------------------===//

Tail recursion elimination should handle:

int pow2m1(int n) {
  if (n == 0)
    return 0;
  return 2 * pow2m1 (n - 1) + 1;
}

Also, multiplies can be turned into SHL's, so they should be handled as if
they were associative. "return foo() << 1" can be tail recursion eliminated.
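
For reference, one possible loop form of pow2m1 is sketched below. It is
written by hand and is not necessarily the shape the pass would emit; the
name pow2m1_loop is made up. Since the step r -> 2*r + 1 does not depend on
n, it can simply be replayed n times starting from the base case.

```c
/* Recursive form from the text. */
int pow2m1(int n) {
  if (n == 0)
    return 0;
  return 2 * pow2m1(n - 1) + 1;
}

/* Hand-written loop computing the same value, 2^n - 1. */
int pow2m1_loop(int n) {
  int r = 0;
  while (n != 0) {
    r = 2 * r + 1;
    n--;
  }
  return r;
}
```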

//===---------------------------------------------------------------------===//

Argument promotion should promote arguments for recursive functions, like
this:

; RUN: llvm-as < %s | opt -argpromotion | llvm-dis | grep x.val

define internal i32 @foo(i32* %x) {
entry:
        %tmp = load i32* %x ; <i32> [#uses=0]
        %tmp.foo = call i32 @foo( i32* %x ) ; <i32> [#uses=1]
        ret i32 %tmp.foo
}

define i32 @bar(i32* %x) {
entry:
        %tmp3 = call i32 @foo( i32* %x ) ; <i32> [#uses=1]
        ret i32 %tmp3
}

//===---------------------------------------------------------------------===//

"basicaa" should know how to look through "or" instructions that act like add
instructions. For example in this code, the x*4+1 is turned into x*4 | 1, and
basicaa can't analyze the array subscript, leading to duplicated loads in the
generated code:

void test(int X, int Y, int a[]) {
  int i;
  for (i=2; i<1000; i+=4) {
    a[i+0] = a[i-1+0]*a[i-2+0];
    a[i+1] = a[i-1+1]*a[i-2+1];
    a[i+2] = a[i-1+2]*a[i-2+2];
    a[i+3] = a[i-1+3]*a[i-2+3];
  }
}

BasicAA also doesn't do this for add. It needs to know that &A[i+1] != &A[i].
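
The condition under which an "or" behaves like an add can be stated in one
line. The sketch below is only an illustration of that condition (the helper
name is hypothetical), not a description of how basicaa is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when 'x | c' and 'x + c' agree because no bit is set in both
   operands, so the addition never produces a carry. */
bool or_acts_like_add(uint32_t x, uint32_t c) {
  return (x & c) == 0;
}
```

For x*4 the low two bits are known to be zero, so x*4 | 1 equals x*4 + 1 for
every x.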

//===---------------------------------------------------------------------===//

We should investigate an instruction sinking pass. Consider this silly
example in pic mode:

#include <assert.h>
void foo(int x) {
  assert(x);
  //...
}

we compile this to:
_foo:
        subl $28, %esp
        call "L1$pb"
"L1$pb":
        popl %eax
        cmpl $0, 32(%esp)
        je LBB1_2 # cond_true
LBB1_1: # return
        # ...
        addl $28, %esp
        ret
LBB1_2: # cond_true
...

The PIC base computation (call+popl) is only used on one path through the
code, but is currently always computed in the entry block. It would be
better to sink the picbase computation down into the block for the
assertion, as it is the only one that uses it. This happens for a lot of
code with early outs.

Another example is loads of arguments, which are usually emitted into the
entry block on targets like x86. If not used in all paths through a
function, they should be sunk into the ones that do.

In this case, whole-function-isel would also handle this.

//===---------------------------------------------------------------------===//

Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html

//===---------------------------------------------------------------------===//

We should turn things like "load+fabs+store" and "load+fneg+store" into the
corresponding integer operations. On a yonah, this loop:

double a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];
}

is twice as slow as this loop:

long long a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] ^= (1ULL << 63);
}

and I suspect other processors are similar. On X86 in particular this is a
big win because doing this with integers allows the use of read/modify/write
instructions.
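
The equivalence between the two loops can be checked on a single value. This
is a minimal sketch assuming IEEE-754 binary64 doubles; fneg_via_int is a
made-up name, and memcpy is used to move the bits so that the code does not
depend on pointer casts.

```c
#include <stdint.h>
#include <string.h>

/* Negate a double by flipping its sign bit through an integer view. */
double fneg_via_int(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  bits ^= 1ULL << 63;
  memcpy(&d, &bits, sizeof d);
  return d;
}
```

The same trick with an 'and' of ~(1ULL << 63) gives fabs.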

//===---------------------------------------------------------------------===//

DAG Combiner should try to combine small loads into larger loads when
profitable. For example, we compile this C++ example:

struct THotKey { short Key; bool Control; bool Shift; bool Alt; };
extern THotKey m_HotKey;
THotKey GetHotKey () { return m_HotKey; }

into (-O3 -fno-exceptions -static -fomit-frame-pointer):

__Z9GetHotKeyv:
        pushl %esi
        movl 8(%esp), %eax
        movb _m_HotKey+3, %cl
        movb _m_HotKey+4, %dl
        movb _m_HotKey+2, %ch
        movw _m_HotKey, %si
        movw %si, (%eax)
        movb %ch, 2(%eax)
        movb %cl, 3(%eax)
        movb %dl, 4(%eax)
        popl %esi
        ret $4

GCC produces:

__Z9GetHotKeyv:
        movl _m_HotKey, %edx
        movl 4(%esp), %eax
        movl %edx, (%eax)
        movzwl _m_HotKey+4, %edx
        movw %dx, 4(%eax)
        ret $4

The LLVM IR contains the needed alignment info, so we should be able to
merge the loads and stores into 4-byte loads:

        %struct.THotKey = type { i16, i8, i8, i8 }
define void @_Z9GetHotKeyv(%struct.THotKey* sret %agg.result) nounwind {
...
        %tmp2 = load i16* getelementptr (@m_HotKey, i32 0, i32 0), align 8
        %tmp5 = load i8* getelementptr (@m_HotKey, i32 0, i32 1), align 2
        %tmp8 = load i8* getelementptr (@m_HotKey, i32 0, i32 2), align 1
        %tmp11 = load i8* getelementptr (@m_HotKey, i32 0, i32 3), align 2

Alternatively, we should use a small amount of base-offset alias analysis
to make it so the scheduler doesn't need to hold all the loads in regs at
once.

//===---------------------------------------------------------------------===//

We should add an FRINT node to the DAG to model targets that have legal
implementations of ceil/floor/rint.
				595	//===---------------------------------------------------------------------===//
Chris Lattner	497b7e9	2008-01-11 06:17:47 +0000	[diff] [blame]	596
Nate Begeman	e9fe65c	2008-02-18 18:39:23 +0000	[diff] [blame]	597	We should add an FRINT node to the DAG to model targets that have legal
				598	implementations of ceil/floor/rint.
Chris Lattner	48840f8	2008-02-28 05:34:27 +0000	[diff] [blame]	599
				600	//===---------------------------------------------------------------------===//
				601
				602	Consider:
				603
				604	int test() {
				605	long long input[8] = {1,1,1,1,1,1,1,1};
				606	foo(input);
				607	}
				608
				609	We currently compile this into a memcpy from a global array since the
				610	initializer is fairly large and not memset'able. This is good, but the memcpy
				611	gets lowered to load/stores in the code generator. This is also ok, except
				612	that the codegen lowering for memcpy doesn't handle the case when the source
				613	is a constant global. This gives us atrocious code like this:
				614
				615	call "L1$pb"
				616	"L1$pb":
				617	popl %eax
				618	movl _C.0.1444-"L1$pb"+32(%eax), %ecx
				619	movl %ecx, 40(%esp)
				620	movl _C.0.1444-"L1$pb"+20(%eax), %ecx
				621	movl %ecx, 28(%esp)
				622	movl _C.0.1444-"L1$pb"+36(%eax), %ecx
				623	movl %ecx, 44(%esp)
				624	movl _C.0.1444-"L1$pb"+44(%eax), %ecx
				625	movl %ecx, 52(%esp)
				626	movl _C.0.1444-"L1$pb"+40(%eax), %ecx
				627	movl %ecx, 48(%esp)
				628	movl _C.0.1444-"L1$pb"+12(%eax), %ecx
				629	movl %ecx, 20(%esp)
				630	movl _C.0.1444-"L1$pb"+4(%eax), %ecx
				631	...
				632
				633	instead of:
				634	movl $1, 16(%esp)
				635	movl $0, 20(%esp)
				636	movl $1, 24(%esp)
				637	movl $0, 28(%esp)
				638	movl $1, 32(%esp)
				639	movl $0, 36(%esp)
				640	...
				641
				642	//===---------------------------------------------------------------------===//
Chris Lattner	a11deb0	2008-03-02 02:51:40 +0000	[diff] [blame]	643
				644	http://llvm.org/PR717:
				645
				646	The following code should compile into "ret int undef". Instead, LLVM
				647	produces "ret int 0":
				648
				649	int f() {
				650	int x = 4;
				651	int y;
				652	if (x == 3) y = 0;
				653	return y;
				654	}
				655
				656	//===---------------------------------------------------------------------===//
Chris Lattner	53b7277	2008-03-02 19:29:42 +0000	[diff] [blame]	657
				658	The loop unroller should partially unroll loops (instead of peeling them)
				659	when code growth isn't too bad and when an unroll count allows simplification
				660	of some code within the loop. One trivial example is:
				661
				662	#include <stdio.h>
				663	int main() {
				664	int nRet = 17;
				665	int nLoop;
				666	for ( nLoop = 0; nLoop < 1000; nLoop++ ) {
				667	if ( nLoop & 1 )
				668	nRet += 2;
				669	else
				670	nRet -= 1;
				671	}
				672	return nRet;
				673	}
				674
				675	Unrolling by 2 would eliminate the '&1' in both copies, leading to a net
				676	reduction in code size. The resultant code would then also be suitable for
				677	exit value computation.
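
Written out by hand, the effect of unrolling by 2 looks like the sketch
below. The loop body is moved into functions (hypothetical names) so the two
versions can sit side by side.

```c
/* The loop from the text, returning nRet. */
int loop_ref(void) {
  int nRet = 17;
  for (int nLoop = 0; nLoop < 1000; nLoop++) {
    if (nLoop & 1)
      nRet += 2;
    else
      nRet -= 1;
  }
  return nRet;
}

/* Unrolled by two: the even copy always takes the else arm and the odd
   copy always takes the then arm, so the '&1' tests disappear. */
int loop_unrolled(void) {
  int nRet = 17;
  for (int nLoop = 0; nLoop < 1000; nLoop += 2) {
    nRet -= 1; /* iteration nLoop (even) */
    nRet += 2; /* iteration nLoop + 1 (odd) */
  }
  return nRet;
}
```

Each pair of iterations adds a net 1, so both functions return 17 + 500 =
517, which is the closed form that exit value computation could produce.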
				678
				679	//===---------------------------------------------------------------------===//
Chris Lattner	349155b	2008-03-17 01:47:51 +0000	[diff] [blame]	680
				681	We miss a bunch of rotate opportunities on various targets, including ppc, x86,
				682	etc. On X86, we miss a bunch of 'rotate by variable' cases because the rotate
				683	matching code in dag combine doesn't look through truncates aggressively
				684	enough. Here are some testcases reduces from GCC PR17886:
				685
				686	unsigned long long f(unsigned long long x, int y) {
				687	return (x << y) | (x >> 64-y);
				688	}
				689	unsigned f2(unsigned x, int y){
				690	return (x << y) | (x >> 32-y);
				691	}
				692	unsigned long long f3(unsigned long long x){
				693	int y = 9;
				694	return (x << y) | (x >> 64-y);
				695	}
				696	unsigned f4(unsigned x){
				697	int y = 10;
				698	return (x << y) | (x >> 32-y);
				699	}
				700	unsigned long long f5(unsigned long long x, unsigned long long y) {
				701	return (x << 8) | ((y >> 48) & 0xffull);
				702	}
				703	unsigned long long f6(unsigned long long x, unsigned long long y, int z) {
				704	switch(z) {
				705	case 1:
				706	return (x << 8) | ((y >> 48) & 0xffull);
				707	case 2:
				708	return (x << 16) | ((y >> 40) & 0xffffull);
				709	case 3:
				710	return (x << 24) | ((y >> 32) & 0xffffffull);
				711	case 4:
				712	return (x << 32) | ((y >> 24) & 0xffffffffull);
				713	default:
				714	return (x << 40) | ((y >> 16) & 0xffffffffffull);
				715	}
				716	}
				717
Dan Gohman	cb747c5	2008-10-17 21:39:27 +0000	[diff] [blame]	718	On X86-64, we only handle f2/f3/f4 right. On x86-32, a few of these
Chris Lattner	349155b	2008-03-17 01:47:51 +0000	[diff] [blame]	719	generate truly horrible code, instead of using shld and friends. On
				720	ARM, we end up with calls to L___lshrdi3/L___ashldi3 in f, which is
				721	badness. PPC64 misses f, f5 and f6. CellSPU aborts in isel.
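
For reference, here is a sketch of the variable-rotate idiom with masked shift
counts. It is an assumption of this note, not part of the testcases: f and f2
shift by 64 or 32 when y == 0, which C leaves undefined, so the masked form is
the one that is safe for every count. The names rotl64/rotl32 and the unsigned
count type are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by y, with both shift counts reduced modulo the width. */
uint64_t rotl64(uint64_t x, unsigned y) {
  return (x << (y & 63)) | (x >> (-y & 63));
}

uint32_t rotl32(uint32_t x, unsigned y) {
  return (x << (y & 31)) | (x >> (-y & 31));
}

/* Compares the masked forms with the f/f2 expressions for every count where
   the latter are defined; returns the number of counts checked. */
unsigned check_rotates(void) {
  const uint64_t x64 = 0x0123456789abcdefULL;
  const uint32_t x32 = 0x89abcdefU;
  unsigned y, checked = 0;
  for (y = 1; y < 64; y++, checked++)
    assert(rotl64(x64, y) == ((x64 << y) | (x64 >> (64 - y))));
  for (y = 1; y < 32; y++, checked++)
    assert(rotl32(x32, y) == ((x32 << y) | (x32 >> (32 - y))));
  return checked;
}
```

Either spelling should ideally end up as a single rotate instruction.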
				722
				723	//===---------------------------------------------------------------------===//
Chris Lattner	f70107f	2008-03-20 04:46:13 +0000	[diff] [blame]	724
				725	We do a number of simplifications in simplify libcalls to strength reduce
				726	standard library functions, but we don't currently merge them together. For
				727	example, it is useful to merge memcpy(a,b,strlen(b)) -> strcpy. This can only
				728	be done safely if "b" isn't modified between the strlen and memcpy of course.
				729
				730	//===---------------------------------------------------------------------===//
				731
Chris Lattner	10c5d36	2008-07-14 00:19:59 +0000	[diff] [blame]	732	Reassociate should turn things like:
				733
				734	int factorial(int X) {
				735	return X*X*X*X*X*X*X*X;
				736	}
				737
				738	into llvm.powi calls, allowing the code generator to produce balanced
				739	multiplication trees.
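
A sketch of what a balanced tree buys for an eighth power. Unsigned arithmetic
is used here (our choice) so that wraparound is well defined and both forms can
be compared exactly; the function names are made up.

```c
#include <assert.h>

/* Left-to-right chain as a front-end would emit it: 7 dependent multiplies. */
unsigned pow8_chain(unsigned X) {
  return X*X*X*X*X*X*X*X;
}

/* Balanced tree by repeated squaring: 3 multiplies. */
unsigned pow8_tree(unsigned X) {
  unsigned X2 = X * X;
  unsigned X4 = X2 * X2;
  return X4 * X4;
}

/* Multiplication modulo 2^N is associative, so the two must always agree.
   Returns the number of inputs checked. */
unsigned check_pow8(void) {
  unsigned X;
  for (X = 0; X < 1000; X++)
    assert(pow8_chain(X) == pow8_tree(X));
  return X;
}
```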
				740
				741	//===---------------------------------------------------------------------===//
				742
Chris Lattner	26e150f	2008-08-10 01:14:08 +0000	[diff] [blame]	743	We generate a horrible libcall for llvm.powi. For example, we compile:
				744
				745	#include <cmath>
				746	double f(double a) { return std::pow(a, 4); }
				747
				748	into:
				749
				750	__Z1fd:
				751	subl $12, %esp
				752	movsd 16(%esp), %xmm0
				753	movsd %xmm0, (%esp)
				754	movl $4, 8(%esp)
				755	call L___powidf2$stub
				756	addl $12, %esp
				757	ret
				758
				759	GCC produces:
				760
				761	__Z1fd:
				762	subl $12, %esp
				763	movsd 16(%esp), %xmm0
				764	mulsd %xmm0, %xmm0
				765	mulsd %xmm0, %xmm0
				766	movsd %xmm0, (%esp)
				767	fldl (%esp)
				768	addl $12, %esp
				769	ret
				770
				771	//===---------------------------------------------------------------------===//
				772
				773	We compile this program: (from GCC PR11680)
				774	http://gcc.gnu.org/bugzilla/attachment.cgi?id=4487
				775
				776	Into code that runs the same speed in fast/slow modes, but both modes run 2x
				777	slower than when compiled with GCC (either 4.0 or 4.2):
				778
				779	$ llvm-g++ perf.cpp -O3 -fno-exceptions
				780	$ time ./a.out fast
				781	1.821u 0.003s 0:01.82 100.0% 0+0k 0+0io 0pf+0w
				782
				783	$ g++ perf.cpp -O3 -fno-exceptions
				784	$ time ./a.out fast
				785	0.821u 0.001s 0:00.82 100.0% 0+0k 0+0io 0pf+0w
				786
				787	It looks like we are making the same inlining decisions, so this may be raw
				788	codegen badness or something else (haven't investigated).
				789
				790	//===---------------------------------------------------------------------===//
				791
				792	We miss some instcombines for stuff like this:
				793	void bar (void);
				794	void foo (unsigned int a) {
				795	/* This one is equivalent to a >= (3 << 2). */
				796	if ((a >> 2) >= 3)
				797	bar ();
				798	}
				799
				800	A few other related ones are in GCC PR14753.
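
A quick sanity check of the fold named in the comment, sampled at both ends of
the unsigned range (the helper names are ours):

```c
#include <assert.h>
#include <limits.h>

/* The comparison as written in the testcase. */
int shifted_compare(unsigned a) { return (a >> 2) >= 3; }

/* The folded form: the shift moves onto the constant. */
int folded_compare(unsigned a) { return a >= (3u << 2); }

/* Checks the lowest and highest 4096 values; returns how many were checked. */
unsigned check_shift_compare(void) {
  unsigned a, checked = 0;
  for (a = 0; a < 4096u; a++, checked++)
    assert(shifted_compare(a) == folded_compare(a));
  /* a wraps to 0 after UINT_MAX, which ends the loop. */
  for (a = UINT_MAX - 4095u; a != 0; a++, checked++)
    assert(shifted_compare(a) == folded_compare(a));
  return checked;
}
```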
				801
				802	//===---------------------------------------------------------------------===//
				803
				804	Divisibility by constant can be simplified (according to GCC PR12849) from
				805	being a mulhi to being a mul lo (cheaper). Testcase:
				806
				807	void bar(unsigned n) {
				808	if (n % 3 == 0)
				809	true();
				810	}
				811
				812	I think this basically amounts to a dag combine to simplify comparisons against
				813	multiply hi's into a comparison against the mullo.
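
One way to get a mul-lo based test is the usual multiplicative-inverse trick.
This is a sketch under the assumption of 32-bit unsigned values; it may not be
exactly what PR12849 proposes, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* n % 3 == 0 using only the low 32 bits of a multiply.  0xAAAAAAAB is the
   inverse of 3 modulo 2^32 and 0x55555555 is floor((2^32 - 1) / 3), so the
   multiples of 3 are exactly the values mapped into [0, 0x55555555]. */
int divisible_by_3(uint32_t n) {
  return (uint32_t)(n * UINT32_C(0xAAAAAAAB)) <= UINT32_C(0x55555555);
}

/* Compares against the '%' form at both ends of the range; returns the
   number of values checked. */
uint32_t check_divisible_by_3(void) {
  uint32_t n, checked = 0;
  for (n = 0; n < 100000u; n++, checked++)
    assert(divisible_by_3(n) == (n % 3u == 0));
  /* n wraps to 0 after UINT32_MAX, which ends the loop. */
  for (n = UINT32_MAX - 99999u; n != 0; n++, checked++)
    assert(divisible_by_3(n) == (n % 3u == 0));
  return checked;
}
```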
				814
				815	//===---------------------------------------------------------------------===//
Chris Lattner	23f35bc	2008-08-19 06:22:16 +0000	[diff] [blame]	816
Chris Lattner	db03983	2008-10-15 16:06:03 +0000	[diff] [blame]	817	Better mod/ref analysis for scanf would allow us to eliminate the vtable and a
				818	bunch of other stuff from this example (see PR1604):
				819
				820	#include <cstdio>
				821	struct test {
				822	int val;
				823	virtual ~test() {}
				824	};
				825
				826	int main() {
				827	test t;
				828	std::scanf("%d", &t.val);
				829	std::printf("%d\n", t.val);
				830	}
				831
				832	//===---------------------------------------------------------------------===//
				833
Chris Lattner	3b364cb	2008-10-15 16:33:52 +0000	[diff] [blame]	834	Instcombine will merge comparisons like (x >= 10) && (x < 20) by producing (x -
				835	10) u< 10, but only when the comparisons have matching sign.
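
The matching-sign case can be written out in C like this (a sketch; the
function names are ours):

```c
#include <assert.h>
#include <limits.h>

/* Two signed comparisons, as in the source. */
int range_two_compares(int x) { return x >= 10 && x < 20; }

/* One unsigned comparison: values below 10 wrap around to huge numbers. */
int range_one_compare(int x) { return ((unsigned)x - 10u) < 10u; }

/* Returns the number of inputs on which the two forms were compared. */
int check_range_fold(void) {
  int x, checked = 0;
  for (x = -1000; x <= 1000; x++, checked++)
    assert(range_two_compares(x) == range_one_compare(x));
  assert(range_two_compares(INT_MIN) == range_one_compare(INT_MIN));
  assert(range_two_compares(INT_MAX) == range_one_compare(INT_MAX));
  return checked + 2;
}
```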
				836
				837	This could be converted with a similar technique. (PR1941)
				838
				839	define i1 @test(i8 %x) {
				840	%A = icmp uge i8 %x, 5
				841	%B = icmp slt i8 %x, 20
				842	%C = and i1 %A, %B
				843	ret i1 %C
				844	}
				845
				846	//===---------------------------------------------------------------------===//
Nick Lewycky	df563ca	2008-11-27 22:12:22 +0000	[diff] [blame]	847
Nick Lewycky	d2f0db1	2008-11-27 22:41:45 +0000	[diff] [blame]	848	These functions perform the same computation, but produce different assembly.
Nick Lewycky	df563ca	2008-11-27 22:12:22 +0000	[diff] [blame]	849
				850	define i8 @select(i8 %x) readnone nounwind {
				851	%A = icmp ult i8 %x, 250
				852	%B = select i1 %A, i8 0, i8 1
				853	ret i8 %B
				854	}
				855
				856	define i8 @addshr(i8 %x) readnone nounwind {
				857	%A = zext i8 %x to i9
				858	%B = add i9 %A, 6 ;; 256 - 250 == 6
				859	%C = lshr i9 %B, 8
				860	%D = trunc i9 %C to i8
				861	ret i8 %D
				862	}
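
The equivalence of the two functions can be checked exhaustively with a C
rendering. This is a sketch: the i9 arithmetic is modeled with a plain
unsigned, which is safe because x + 6 never exceeds 261.

```c
#include <assert.h>

/* C rendering of @select: 0 when x < 250, else 1. */
unsigned char select_form(unsigned char x) {
  return x < 250 ? 0 : 1;
}

/* C rendering of @addshr: widen, add 256 - 250, keep the carry out of bit 7. */
unsigned char addshr_form(unsigned char x) {
  return (unsigned char)(((unsigned)x + 6u) >> 8);
}

/* Checks all 256 inputs and returns how many were checked. */
int check_select_addshr(void) {
  int x;
  for (x = 0; x < 256; x++)
    assert(select_form((unsigned char)x) == addshr_form((unsigned char)x));
  return x;
}
```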
				863
				864	//===---------------------------------------------------------------------===//
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	865
				866	From gcc bug 24696:
				867	int
				868	f (unsigned long a, unsigned long b, unsigned long c)
				869	{
				870	return ((a & (c - 1)) != 0) || ((b & (c - 1)) != 0);
				871	}
				872	int
				873	f (unsigned long a, unsigned long b, unsigned long c)
				874	{
				875	return ((a & (c - 1)) != 0) | ((b & (c - 1)) != 0);
				876	}
				877	Both should combine to ((a|b) & (c-1)) != 0. Currently not optimized with
				878	"clang -emit-llvm-bc | opt -std-compile-opts".
				879
				880	//===---------------------------------------------------------------------===//
				881
				882	From GCC Bug 20192:
				883	#define PMD_MASK (~((1UL << 23) - 1))
				884	void clear_pmd_range(unsigned long start, unsigned long end)
				885	{
				886	if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK))
				887	f();
				888	}
				889	The expression should optimize to something like
				890	"!((start|end)&~PMD_MASK)". Currently not optimized with "clang
				891	-emit-llvm-bc | opt -std-compile-opts".
				892
				893	//===---------------------------------------------------------------------===//
				894
				895	From GCC Bug 15241:
				896	unsigned int
				897	foo (unsigned int a, unsigned int b)
				898	{
				899	if (a <= 7 && b <= 7)
				900	baz ();
				901	}
				902	Should combine to "(a|b) <= 7". Currently not optimized with "clang
				903	-emit-llvm-bc | opt -std-compile-opts".
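
The fold works because 7 is of the form 2^k - 1: "x <= 7" just says that no bit
above bit 2 is set, and a|b has such a bit exactly when a or b does. A small
exhaustive check over 8-bit inputs (a sketch, names are ours):

```c
#include <assert.h>

int both_small(unsigned a, unsigned b) { return a <= 7 && b <= 7; }
int or_small(unsigned a, unsigned b)   { return (a | b) <= 7; }

/* Checks every pair in [0, 255] x [0, 255]; returns the number of pairs. */
int check_or_compare(void) {
  unsigned a, b;
  int checked = 0;
  for (a = 0; a < 256; a++)
    for (b = 0; b < 256; b++, checked++)
      assert(both_small(a, b) == or_small(a, b));
  return checked;
}
```

The same rewrite is not valid for an arbitrary bound: with 6 in place of 7,
a = 1 and b = 6 both pass the original test but a|b is 7.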
				904
				905	//===---------------------------------------------------------------------===//
				906
				907	From GCC Bug 3756:
				908	int
				909	pn (int n)
				910	{
				911	return (n >= 0 ? 1 : -1);
				912	}
				913	Should combine to (n >> 31) | 1. Currently not optimized with "clang
				914	-emit-llvm-bc | opt -std-compile-opts | llc".
				915
				916	//===---------------------------------------------------------------------===//
				917
				918	From GCC Bug 28685:
				919	int test(int a, int b)
				920	{
				921	int lt = a < b;
				922	int eq = a == b;
				923
				924	return (lt || eq);
				925	}
				926	Should combine to "a <= b". Currently not optimized with "clang
				927	-emit-llvm-bc | opt -std-compile-opts | llc".
				928
				929	//===---------------------------------------------------------------------===//
				930
				931	void a(int variable)
				932	{
				933	if (variable == 4 || variable == 6)
				934	bar();
				935	}
				936	This should optimize to "if ((variable | 2) == 6)". Currently not
				937	optimized with "clang -emit-llvm-bc | opt -std-compile-opts | llc".
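
4 and 6 differ only in bit 1, so forcing that bit on makes both map to 6 and
nothing else does. A sketch that checks this over a range of inputs (the
function names are made up):

```c
#include <assert.h>

int two_compares(int v) { return v == 4 || v == 6; }
int or_compare(int v)   { return (v | 2) == 6; }

/* Returns the number of inputs on which the two forms were compared. */
int check_or_fold(void) {
  int v, checked = 0;
  for (v = -1000; v <= 1000; v++, checked++)
    assert(two_compares(v) == or_compare(v));
  return checked;
}
```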
				938
				939	//===---------------------------------------------------------------------===//
				940
				941	unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return
				942	i;}
				943	unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
				944	These should combine to the same thing. Currently, the first function
				945	produces better code on X86.
				946
				947	//===---------------------------------------------------------------------===//
				948
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	949	From GCC Bug 15784:
				950	#define abs(x) x>0?x:-x
				951	int f(int x, int y)
				952	{
				953	return (abs(x)) >= 0;
				954	}
				955	This should optimize to x != INT_MIN. (With -fwrapv.) Currently not
				956	optimized with "clang -emit-llvm-bc | opt -std-compile-opts".
				957
				958	//===---------------------------------------------------------------------===//
				959
				960	From GCC Bug 14753:
				961	void
				962	rotate_cst (unsigned int a)
				963	{
				964	a = (a << 10) | (a >> 22);
				965	if (a == 123)
				966	bar ();
				967	}
				968	void
				969	minus_cst (unsigned int a)
				970	{
				971	unsigned int tem;
				972
				973	tem = 20 - a;
				974	if (tem == 5)
				975	bar ();
				976	}
				977	void
				978	mask_gt (unsigned int a)
				979	{
				980	/* This is equivalent to a > 15. */
				981	if ((a & ~7) > 8)
				982	bar ();
				983	}
				984	void
				985	rshift_gt (unsigned int a)
				986	{
				987	/* This is equivalent to a > 23. */
				988	if ((a >> 2) > 5)
				989	bar ();
				990	}
				991	All should simplify to a single comparison. All of these are
				992	currently not optimized with "clang -emit-llvm-bc | opt
				993	-std-compile-opts".
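
For concreteness, these are the single comparisons we would expect, written as
C and checked against the original conditions. This is a sketch: 32-bit
unsigned is assumed (hence uint32_t), and the *_folded names are ours.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl10(uint32_t a) { return (a << 10) | (a >> 22); }
static uint32_t rotr10(uint32_t a) { return (a >> 10) | (a << 22); }

/* The conditions as written in the testcases. */
int rotate_cst(uint32_t a) { return rotl10(a) == 123u; }
int minus_cst(uint32_t a)  { return (uint32_t)(20u - a) == 5u; }
int mask_gt(uint32_t a)    { return (a & ~UINT32_C(7)) > 8u; }
int rshift_gt(uint32_t a)  { return (a >> 2) > 5u; }

/* Each one as a single comparison against a constant. */
int rotate_cst_folded(uint32_t a) { return a == rotr10(123u); }
int minus_cst_folded(uint32_t a)  { return a == 15u; }
int mask_gt_folded(uint32_t a)    { return a > 15u; }
int rshift_gt_folded(uint32_t a)  { return a > 23u; }

static void check_one(uint32_t a) {
  assert(rotate_cst(a) == rotate_cst_folded(a));
  assert(minus_cst(a) == minus_cst_folded(a));
  assert(mask_gt(a) == mask_gt_folded(a));
  assert(rshift_gt(a) == rshift_gt_folded(a));
}

/* Samples small values plus the neighborhood of the one value that makes
   rotate_cst true; returns the number of values checked. */
unsigned check_pr14753(void) {
  const uint32_t hit = rotr10(123u);
  uint32_t a;
  unsigned checked = 0;
  for (a = 0; a < 4096u; a++, checked++)
    check_one(a);
  for (a = hit - 64u; a != hit + 65u; a++, checked++)
    check_one(a);
  return checked;
}
```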
				994
				995	//===---------------------------------------------------------------------===//
				996
				997	From GCC Bug 32605:
				998	int c(int* x) {return (char)x+2 == (char)x;}
				999	Should combine to 0. Currently not optimized with "clang
				1000	-emit-llvm-bc | opt -std-compile-opts" (although llc can optimize it).
				1001
				1002	//===---------------------------------------------------------------------===//
				1003
				1004	int a(unsigned char* b) {return *b > 99;}
				1005	There's an unnecessary zext in the generated code with "clang
				1006	-emit-llvm-bc | opt -std-compile-opts".
				1007
				1008	//===---------------------------------------------------------------------===//
				1009
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	1010	int a(unsigned b) {return ((b << 31) | (b << 30)) >> 31;}
				1011	Should be combined to "((b >> 1) | b) & 1". Currently not optimized
				1012	with "clang -emit-llvm-bc | opt -std-compile-opts".
				1013
				1014	//===---------------------------------------------------------------------===//
				1015
				1016	unsigned a(unsigned x, unsigned y) { return x | (y & 1) | (y & 2);}
				1017	Should combine to "x | (y & 3)". Currently not optimized with "clang
				1018	-emit-llvm-bc | opt -std-compile-opts".
				1019
				1020	//===---------------------------------------------------------------------===//
				1021
				1022	unsigned a(unsigned a) {return ((a | 1) & 3) | (a & -4);}
				1023	Should combine to "a | 1". Currently not optimized with "clang
				1024	-emit-llvm-bc | opt -std-compile-opts".
				1025
				1026	//===---------------------------------------------------------------------===//
				1027
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	1028	int a(int a, int b, int c) {return (~a & c) | ((c|a) & b);}
				1029	Should fold to "(~a & c) | (a & b)". Currently not optimized with
				1030	"clang -emit-llvm-bc | opt -std-compile-opts".
				1031
				1032	//===---------------------------------------------------------------------===//
				1033
				1034	int a(int a,int b) {return (~(a|b))|a;}
				1035	Should fold to "a|~b". Currently not optimized with "clang
				1036	-emit-llvm-bc | opt -std-compile-opts".
				1037
				1038	//===---------------------------------------------------------------------===//
				1039
				1040	int a(int a, int b) {return (a&&b) || (a&&!b);}
				1041	Should fold to "a". Currently not optimized with "clang -emit-llvm-bc
				1042	| opt -std-compile-opts".
				1043
				1044	//===---------------------------------------------------------------------===//
				1045
				1046	int a(int a, int b, int c) {return (a&&b) || (!a&&c);}
				1047	Should fold to "a ? b : c", or at least something sane. Currently not
				1048	optimized with "clang -emit-llvm-bc | opt -std-compile-opts".
				1049
				1050	//===---------------------------------------------------------------------===//
				1051
				1052	int a(int a, int b, int c) {return (a&&b) || (a&&c) || (a&&b&&c);}
				1053	Should fold to a && (b || c). Currently not optimized with "clang
				1054	-emit-llvm-bc | opt -std-compile-opts".
				1055
				1056	//===---------------------------------------------------------------------===//
				1057
				1058	int a(int x) {return x | ((x & 8) ^ 8);}
				1059	Should combine to x | 8. Currently not optimized with "clang
				1060	-emit-llvm-bc | opt -std-compile-opts".
				1061
				1062	//===---------------------------------------------------------------------===//
				1063
				1064	int a(int x) {return x ^ ((x & 8) ^ 8);}
				1065	Should also combine to x | 8. Currently not optimized with "clang
				1066	-emit-llvm-bc | opt -std-compile-opts".
				1067
				1068	//===---------------------------------------------------------------------===//
				1069
				1070	int a(int x) {return (x & 8) == 0 ? -1 : -9;}
				1071	Should combine to (x | -9) ^ 8. Currently not optimized with "clang
				1072	-emit-llvm-bc | opt -std-compile-opts".
				1073
				1074	//===---------------------------------------------------------------------===//
				1075
				1076	int a(int x) {return (x & 8) == 0 ? -9 : -1;}
				1077	Should combine to x | -9. Currently not optimized with "clang
				1078	-emit-llvm-bc | opt -std-compile-opts".
				1079
				1080	//===---------------------------------------------------------------------===//
				1081
				1082	int a(int x) {return ((x | -9) ^ 8) & x;}
				1083	Should combine to x & -9. Currently not optimized with "clang
				1084	-emit-llvm-bc | opt -std-compile-opts".
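
The last five entries all hinge on bit 3 of x. A sketch that checks each claimed
fold over a range of inputs (two's complement ints are assumed, as everywhere
in practice; the function name is ours):

```c
#include <assert.h>

/* Checks the five "x & 8" folds listed above; returns the number of inputs. */
int check_bit3_folds(void) {
  int x, checked = 0;
  for (x = -1024; x <= 1024; x++, checked++) {
    assert((x | ((x & 8) ^ 8)) == (x | 8));
    assert((x ^ ((x & 8) ^ 8)) == (x | 8));
    assert(((x & 8) == 0 ? -1 : -9) == ((x | -9) ^ 8));
    assert(((x & 8) == 0 ? -9 : -1) == (x | -9));
    assert((((x | -9) ^ 8) & x) == (x & -9));
  }
  return checked;
}
```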
				1085
				1086	//===---------------------------------------------------------------------===//
				1087
				1088	unsigned a(unsigned a) {return a * 0x11111111 >> 28 & 1;}
				1089	Should combine to "a * 0x88888888 >> 31". Currently not optimized
				1090	with "clang -emit-llvm-bc | opt -std-compile-opts".
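
Multiplying by 0x88888888 is multiplying by 0x11111111 and then shifting left
by 3 (modulo 2^32), so bit 28 of the first product is bit 31 of the second. A
sketch that checks this on a spread of inputs, assuming 32-bit unsigned (the
function names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* As written: bit 28 of a * 0x11111111. */
uint32_t bit28_of_product(uint32_t a) {
  return (uint32_t)(a * UINT32_C(0x11111111)) >> 28 & 1u;
}

/* Folded: top bit of a * 0x88888888. */
uint32_t top_bit_of_product(uint32_t a) {
  return (uint32_t)(a * UINT32_C(0x88888888)) >> 31;
}

/* Checks small values and a scattered set; returns the number of checks. */
uint32_t check_product_bit(void) {
  uint32_t i, checked = 0;
  for (i = 0; i < 65536u; i++) {
    uint32_t scattered = (uint32_t)(i * UINT32_C(2654435761));
    assert(bit28_of_product(i) == top_bit_of_product(i));
    assert(bit28_of_product(scattered) == top_bit_of_product(scattered));
    checked += 2;
  }
  return checked;
}
```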
				1091
				1092	//===---------------------------------------------------------------------===//
				1093
				1094	unsigned a(char* x) {if ((*x & 32) == 0) return b();}
				1095	There's an unnecessary zext in the generated code with "clang
				1096	-emit-llvm-bc | opt -std-compile-opts".
				1097
				1098	//===---------------------------------------------------------------------===//
				1099
				1100	unsigned a(unsigned long long x) {return 40 * (x >> 1);}
				1101	Should combine to "20 * (((unsigned)x) & -2)". Currently not
				1102	optimized with "clang -emit-llvm-bc | opt -std-compile-opts".
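
The point is that only the low bits survive the truncating return, so the
64-bit shift and multiply are unnecessary: 40 * (x >> 1) is 20 * (x with bit 0
cleared). A sketch comparing the two forms (the function names are ours):

```c
#include <assert.h>

/* As written: 64-bit shift and multiply, truncated on return. */
unsigned scaled_original(unsigned long long x) {
  return (unsigned)(40 * (x >> 1));
}

/* Folded: truncate first, clear bit 0, multiply in the narrow type. */
unsigned scaled_folded(unsigned long long x) {
  return 20 * (((unsigned)x) & -2);
}

/* Checks small values and values with high bits set; returns the count. */
unsigned check_scaled(void) {
  unsigned long long i;
  unsigned checked = 0;
  for (i = 0; i < 10000; i++) {
    unsigned long long wide = i * 0x9E3779B97F4A7C15ULL;
    assert(scaled_original(i) == scaled_folded(i));
    assert(scaled_original(wide) == scaled_folded(wide));
    checked += 2;
  }
  return checked;
}
```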
				1103
				1104	//===---------------------------------------------------------------------===//
Bill Wendling	3bdcda8	2008-12-02 05:12:47 +0000	[diff] [blame]	1105
Chris Lattner	88d84b2	2008-12-02 06:32:34 +0000	[diff] [blame]	1106	This was noticed in the entryblock for grokdeclarator in 403.gcc:
				1107
				1108	%tmp = icmp eq i32 %decl_context, 4
				1109	%decl_context_addr.0 = select i1 %tmp, i32 3, i32 %decl_context
				1110	%tmp1 = icmp eq i32 %decl_context_addr.0, 1
				1111	%decl_context_addr.1 = select i1 %tmp1, i32 0, i32 %decl_context_addr.0
				1112
				1113	tmp1 should be simplified to something like:
				1114	(!tmp && decl_context == 1)
				1115
				1116	This allows recursive simplifications, tmp1 is used all over the place in
				1117	the function, e.g. by:
				1118
				1119	%tmp23 = icmp eq i32 %decl_context_addr.1, 0 ; <i1> [#uses=1]
				1120	%tmp24 = xor i1 %tmp1, true ; <i1> [#uses=1]
				1121	%or.cond8 = and i1 %tmp23, %tmp24 ; <i1> [#uses=1]
				1122
				1123	later.
				1124
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1125	//===---------------------------------------------------------------------===//
				1126
				1127	Store sinking: This code:
				1128
				1129	void f (int n, int *cond, int *res) {
				1130	int i;
				1131	*res = 0;
				1132	for (i = 0; i < n; i++)
				1133	if (*cond)
				1134	*res ^= 234; /* (*) */
				1135	}
				1136
				1137	On this function GVN hoists the fully redundant value of *res, but nothing
				1138	moves the store out. This gives us this code:
				1139
				1140	bb: ; preds = %bb2, %entry
				1141	%.rle = phi i32 [ 0, %entry ], [ %.rle6, %bb2 ]
				1142	%i.05 = phi i32 [ 0, %entry ], [ %indvar.next, %bb2 ]
				1143	%1 = load i32* %cond, align 4
				1144	%2 = icmp eq i32 %1, 0
				1145	br i1 %2, label %bb2, label %bb1
				1146
				1147	bb1: ; preds = %bb
				1148	%3 = xor i32 %.rle, 234
				1149	store i32 %3, i32* %res, align 4
				1150	br label %bb2
				1151
				1152	bb2: ; preds = %bb, %bb1
				1153	%.rle6 = phi i32 [ %3, %bb1 ], [ %.rle, %bb ]
				1154	%indvar.next = add i32 %i.05, 1
				1155	%exitcond = icmp eq i32 %indvar.next, %n
				1156	br i1 %exitcond, label %return, label %bb
				1157
				1158	DSE should sink partially dead stores to get the store out of the loop.
				1159
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1160	Here's another partial dead case:
				1161	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395
				1162
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1163	//===---------------------------------------------------------------------===//
				1164
				1165	Scalar PRE hoists the mul in the common block up to the else:
				1166
				1167	int test (int a, int b, int c, int g) {
				1168	int d, e;
				1169	if (a)
				1170	d = b * c;
				1171	else
				1172	d = b - c;
				1173	e = b * c + g;
				1174	return d + e;
				1175	}
				1176
				1177	It would be better to do the mul once to reduce codesize above the if.
				1178	This is GCC PR38204.
				1179
				1180	//===---------------------------------------------------------------------===//
				1181
				1182	GCC PR37810 is an interesting case where we should sink load/store reload
				1183	into the if block and outside the loop, so we don't reload/store it on the
				1184	non-call path.
				1185
				1186	for () {
				1187	*P += 1;
				1188	if ()
				1189	call();
				1190	else
				1191	...
				1192	->
				1193	tmp = *P
				1194	for () {
				1195	tmp += 1;
				1196	if () {
				1197	*P = tmp;
				1198	call();
				1199	tmp = *P;
				1200	} else ...
				1201	}
				1202	*P = tmp;
				1203
Chris Lattner	8f416f3	2008-12-15 07:49:24 +0000	[diff] [blame]	1204	We now hoist the reload after the call (Transforms/GVN/lpre-call-wrap.ll), but
				1205	we don't sink the store. We need partially dead store sinking.
				1206
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1207	//===---------------------------------------------------------------------===//
				1208
Chris Lattner	8f416f3	2008-12-15 07:49:24 +0000	[diff] [blame]	1209	[PHI TRANSLATE GEPs]
				1210
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1211	GCC PR37166: Sinking of loads prevents SROA'ing the "g" struct on the stack
				1212	leading to excess stack traffic. This could be handled by GVN with some crazy
				1213	symbolic phi translation. The code we get looks like (g is on the stack):
				1214
				1215	bb2: ; preds = %bb1
				1216	..
				1217	%9 = getelementptr %struct.f* %g, i32 0, i32 0
				1218	store i32 %8, i32* %9, align 4
					br label %bb3
				1219
				1220	bb3: ; preds = %bb1, %bb2, %bb
				1221	%c_addr.0 = phi %struct.f* [ %g, %bb2 ], [ %c, %bb ], [ %c, %bb1 ]
				1222	%b_addr.0 = phi %struct.f* [ %b, %bb2 ], [ %g, %bb ], [ %b, %bb1 ]
				1223	%10 = getelementptr %struct.f* %c_addr.0, i32 0, i32 0
				1224	%11 = load i32* %10, align 4
				1225
				1226	%11 is fully redundant, and in BB2 it should have the value %8.
				1227
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1228	GCC PR33344 is a similar case.
				1229
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1230	//===---------------------------------------------------------------------===//
				1231
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1232	There are many load PRE testcases in testsuite/gcc.dg/tree-ssa/loadpre* in the
				1233	GCC testsuite. There are many pre testcases as ssa-pre-*.c
				1234
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1235	//===---------------------------------------------------------------------===//
				1236
				1237	There are some interesting cases in testsuite/gcc.dg/tree-ssa/predcom* in the
				1238	GCC testsuite. For example, predcom-1.c is:
				1239
				1240	for (i = 2; i < 1000; i++)
				1241	fib[i] = (fib[i-1] + fib[i - 2]) & 0xffff;
				1242
				1243	which compiles into:
				1244
				1245	bb1: ; preds = %bb1, %bb1.thread
				1246	%indvar = phi i32 [ 0, %bb1.thread ], [ %0, %bb1 ]
				1247	%i.0.reg2mem.0 = add i32 %indvar, 2
				1248	%0 = add i32 %indvar, 1 ; <i32> [#uses=3]
				1249	%1 = getelementptr [1000 x i32]* @fib, i32 0, i32 %0
				1250	%2 = load i32* %1, align 4 ; <i32> [#uses=1]
				1251	%3 = getelementptr [1000 x i32]* @fib, i32 0, i32 %indvar
				1252	%4 = load i32* %3, align 4 ; <i32> [#uses=1]
				1253	%5 = add i32 %4, %2 ; <i32> [#uses=1]
				1254	%6 = and i32 %5, 65535 ; <i32> [#uses=1]
				1255	%7 = getelementptr [1000 x i32]* @fib, i32 0, i32 %i.0.reg2mem.0
				1256	store i32 %6, i32* %7, align 4
				1257	%exitcond = icmp eq i32 %0, 998 ; <i1> [#uses=1]
				1258	br i1 %exitcond, label %return, label %bb1
				1259
				1260	This is basically:
				1261	LOAD fib[i+1]
				1262	LOAD fib[i]
				1263	STORE fib[i+2]
				1264
				1265	Instead of handling this as a loop or other xform, all we'd need to do is teach
				1266	load PRE to phi translate the %0 add (i+1) into the predecessor as (i'+1+1) =
				1267	(i'+2) (where i' is the previous iteration of i). This would find the store
				1268	which feeds it.
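
At the source level, the effect we are after looks roughly like this. It is a
hand-written sketch of the result, not of how load PRE would get there; the
function names and the 0/1 seed values are ours.

```c
#include <assert.h>
#include <string.h>

enum { N = 1000 };

/* The loop as written: two loads and one store per iteration. */
void fib_reference(int fib[N]) {
  int i;
  for (i = 2; i < N; i++)
    fib[i] = (fib[i-1] + fib[i - 2]) & 0xffff;
}

/* With the stored values forwarded to the next iterations: no loads in the
   loop, just the store. */
void fib_forwarded(int fib[N]) {
  int prev2 = fib[0], prev1 = fib[1];
  int i;
  for (i = 2; i < N; i++) {
    int cur = (prev1 + prev2) & 0xffff;
    fib[i] = cur;
    prev2 = prev1;
    prev1 = cur;
  }
}

/* Runs both versions from the same seed and compares the arrays. */
int check_predcom(void) {
  static int a[N], b[N];
  a[0] = b[0] = 0;
  a[1] = b[1] = 1;
  fib_reference(a);
  fib_forwarded(b);
  assert(memcmp(a, b, sizeof a) == 0);
  return 1;
}
```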
				1269
				1270	predcom-2.c is apparently the same as predcom-1.c
				1271	predcom-3.c is very similar but needs loads feeding each other instead of
				1272	store->load.
				1273	predcom-4.c seems the same as the rest.
				1274
				1275
				1276	//===---------------------------------------------------------------------===//
				1277
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1278	Other simple load PRE cases:
Chris Lattner	8f416f3	2008-12-15 07:49:24 +0000	[diff] [blame]	1279	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35287 [LPRE crit edge splitting]
				1280
				1281	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34677 (licm does this, LPRE crit edge)
				1282	llvm-gcc t2.c -S -o - -O0 -emit-llvm | llvm-as | opt -mem2reg -simplifycfg -gvn | llvm-dis
				1283
Chris Lattner	93c6c77	2009-09-21 02:53:57 +0000	[diff] [blame]	1284	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16799 [BITCAST PHI TRANS]
				1285
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1286	//===---------------------------------------------------------------------===//
				1287
				1288	Type based alias analysis:
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1289	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14705
				1290
				1291	//===---------------------------------------------------------------------===//
				1292
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1293	A/B get pinned to the stack because we turn an if/then into a select instead
				1294	of PRE'ing the load/store. This may be fixable in instcombine:
				1295	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37892
				1296
Chris Lattner	93c6c77	2009-09-21 02:53:57 +0000	[diff] [blame]	1297	struct X { int i; };
				1298	int foo (int x) {
				1299	struct X a;
				1300	struct X b;
				1301	struct X *p;
				1302	a.i = 1;
				1303	b.i = 2;
				1304	if (x)
				1305	p = &a;
				1306	else
				1307	p = &b;
				1308	return p->i;
				1309	}
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1310
Chris Lattner	93c6c77	2009-09-21 02:53:57 +0000	[diff] [blame]	1311	//===---------------------------------------------------------------------===//
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1312
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1313	Interesting missed case because of control flow flattening (should be 2 loads):
				1314	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26629
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1315	With: llvm-gcc t2.c -S -o - -O0 -emit-llvm | llvm-as |
				1316	opt -mem2reg -gvn -instcombine | llvm-dis
				1317	we miss it because we need 1) GEP PHI TRAN, 2) CRIT EDGE 3) MULTIPLE DIFFERENT
				1318	VALS PRODUCED BY ONE BLOCK OVER DIFFERENT PATHS
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1319
				1320	//===---------------------------------------------------------------------===//
				1321
				1322	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19633
				1323	We could eliminate the branch condition here, loading from null is undefined:
				1324
				1325	struct S { int w, x, y, z; };
				1326	struct T { int r; struct S s; };
				1327	void bar (struct S, int);
				1328	void foo (int a, struct T b)
				1329	{
				1330	struct S *c = 0;
				1331	if (a)
				1332	c = &b.s;
				1333	bar (*c, a);
				1334	}
				1335
				1336	//===---------------------------------------------------------------------===//
Chris Lattner	88d84b2	2008-12-02 06:32:34 +0000	[diff] [blame]	1337
Chris Lattner	9cf8ef6	2008-12-23 20:52:52 +0000	[diff] [blame]	1338	simplifylibcalls should do several optimizations for strspn/strcspn:
				1339
				1340	strcspn(x, "") -> strlen(x)
				1341	strcspn("", x) -> 0
				1342	strspn("", x) -> 0
				1343	strspn(x, "") -> 0
				1344	strcspn(x, "a") -> strchr(x, 'a')-x (when 'a' occurs in x, else strlen(x))
				1345
				1346	strcspn(x, "a") -> inlined loop for up to 3 letters (similarly for strspn):
				1347
				1348	size_t __strcspn_c3 (__const char *__s, int __reject1, int __reject2,
				1349	int __reject3) {
				1350	register size_t __result = 0;
				1351	while (__s[__result] != '\0' && __s[__result] != __reject1 &&
				1352	__s[__result] != __reject2 && __s[__result] != __reject3)
				1353	++__result;
				1354	return __result;
				1355	}
				1356
				1357	This should turn into a switch on the character. See PR3253 for some notes on
				1358	codegen.
				1359
				1360	456.hmmer apparently uses strcspn and strspn a lot. 471.omnetpp uses strspn.
				1361
				1362	//===---------------------------------------------------------------------===//
Chris Lattner	d23b799	2008-12-31 00:54:13 +0000	[diff] [blame]	1363
				1364	"gas" uses this idiom:
				1365	else if (strchr ("+-/*%|&^:[]()~", *intel_parser.op_string))
				1366	..
				1367	else if (strchr ("<>", *intel_parser.op_string))
				1368
				1369	Those should be turned into a switch.
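
A sketch of the switch form for the shorter of the two sets. One detail to get
right: strchr also matches the terminating '\0' of the set, so the switch has
to accept it too. The function names are made up for this note.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* The idiom as gas writes it. */
int is_angle_strchr(char c) {
  return strchr("<>", c) != NULL;
}

/* The equivalent switch, including the '\0' case that strchr matches. */
int is_angle_switch(char c) {
  switch (c) {
  case '<':
  case '>':
  case '\0':
    return 1;
  default:
    return 0;
  }
}

/* Compares the two for every char value; returns how many were checked. */
int check_angle_switch(void) {
  int v, checked = 0;
  for (v = CHAR_MIN; v <= CHAR_MAX; v++, checked++)
    assert(is_angle_strchr((char)v) == is_angle_switch((char)v));
  return checked;
}
```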
				1370
				1371	//===---------------------------------------------------------------------===//
Chris Lattner	ffb08f5	2009-01-08 06:52:57 +0000	[diff] [blame]	1372
				1373	252.eon contains this interesting code:
				1374
				1375	%3072 = getelementptr [100 x i8]* %tempString, i32 0, i32 0
				1376	%3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
				1377	%strlen = call i32 @strlen(i8* %3072) ; uses = 1
				1378	%endptr = getelementptr [100 x i8]* %tempString, i32 0, i32 %strlen
				1379	call void @llvm.memcpy.i32(i8* %endptr,
				1380	i8* getelementptr ([5 x i8]* @"\01LC42", i32 0, i32 0), i32 5, i32 1)
				1381	%3074 = call i32 @strlen(i8* %endptr) nounwind readonly
				1382
				1383	This is interesting for a couple reasons. First, in this:
				1384
				1385	%3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
				1386	%strlen = call i32 @strlen(i8* %3072)
				1387
				1388	Were the strcpy an stpcpy (which returns a pointer to the end of the string;
				1389	strcpy itself returns its destination), the strlen would just be %3073 - %3072
				1390	and the endptr GEP would equal %3073, eliminating a strlen call and a GEP.
				1391
				1392	Second, the strlen after the memcpy can be replaced with:
				1393
				1394	%3074 = call i32 @strlen([5 x i8]* @"\01LC42") nounwind readonly
				1395
				1396	Because that string was just copied into the specified memory buffer. This,
				1397	in turn, can be constant folded to "4".
				1398
				1399	In other code, it contains:
				1400
				1401	%endptr6978 = bitcast i8* %endptr69 to i32*
				1402	store i32 7107374, i32* %endptr6978, align 1
				1403	%3167 = call i32 @strlen(i8* %endptr69) nounwind readonly
				1404
				1405	Which could also be constant folded. Whatever is producing this should probably
				1406	be fixed to leave this as a memcpy from a string.
				1407
				1408	Further, eon also has an interesting partially redundant strlen call:
				1409
				1410	bb8: ; preds = %_ZN18eonImageCalculatorC1Ev.exit
				1411	%682 = getelementptr i8** %argv, i32 6 ; <i8**> [#uses=2]
				1412	%683 = load i8** %682, align 4 ; <i8*> [#uses=4]
				1413	%684 = load i8* %683, align 1 ; <i8> [#uses=1]
				1414	%685 = icmp eq i8 %684, 0 ; <i1> [#uses=1]
				1415	br i1 %685, label %bb10, label %bb9
				1416
				1417	bb9: ; preds = %bb8
				1418	%686 = call i32 @strlen(i8* %683) nounwind readonly
				1419	%687 = icmp ugt i32 %686, 254 ; <i1> [#uses=1]
				1420	br i1 %687, label %bb10, label %bb11
				1421
				1422	bb10: ; preds = %bb9, %bb8
				1423	%688 = call i32 @strlen(i8* %683) nounwind readonly
				1424
				1425	This could be eliminated by doing the strlen once in bb8, saving code size and
				1426	improving perf on the bb8->9->10 path.
				1427
				1428	//===---------------------------------------------------------------------===//
Chris Lattner	9fee08f	2009-01-08 07:34:55 +0000	[diff] [blame]	1429
				1430	I see an interesting fully redundant call to strlen left in 186.crafty:InputMove
				1431	which looks like:
				1432	%movetext11 = getelementptr [128 x i8]* %movetext, i32 0, i32 0
				1433
				1434
				1435	bb62: ; preds = %bb55, %bb53
				1436	%promote.0 = phi i32 [ %169, %bb55 ], [ 0, %bb53 ]
				1437	%171 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
				1438	%172 = add i32 %171, -1 ; <i32> [#uses=1]
				1439	%173 = getelementptr [128 x i8]* %movetext, i32 0, i32 %172
				1440
				1441	... no stores ...
				1442	br i1 %or.cond, label %bb65, label %bb72
				1443
				1444	bb65: ; preds = %bb62
				1445	store i8 0, i8* %173, align 1
				1446	br label %bb72
				1447
				1448	bb72: ; preds = %bb65, %bb62
				1449	%trank.1 = phi i32 [ %176, %bb65 ], [ -1, %bb62 ]
				1450	%177 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
				1451
				1452	Note that on the bb62->bb72 path, the %177 strlen call is partially
				1453	redundant with the %171 call. At worst, we could shove the %177 strlen call
				1454	up into the bb65 block moving it out of the bb62->bb72 path. However, note
				1455	that bb65 stores to the string, zeroing out the last byte. This means that on
				1456	that path the value of %177 is actually just %171-1. A sub is cheaper than a
				1457	strlen!
				1458
				1459	This pattern repeats several times, basically doing:
				1460
				1461	A = strlen(P);
				1462	P[A-1] = 0;
				1463	B = strlen(P);
				1464	where it is "obvious" that B = A-1.
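
A minimal C sketch of the identity, assuming a hypothetical helper named
chop_last and a non-empty, writable string. The assert only documents the claim
B = A-1; it is not part of the proposed transformation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drops the last character of a string and returns the
   new length without needing a second strlen in release builds. */
size_t chop_last(char *p) {
    size_t a = strlen(p);            /* A = strlen(P) */
    if (a == 0)
        return 0;                    /* P[A-1] would be out of bounds */
    p[a - 1] = '\0';                 /* P[A-1] = 0 */
    assert(strlen(p) == a - 1);      /* the identity being claimed: B = A-1 */
    return a - 1;
}
```

The empty-string guard is an assumption of the sketch; the IR above never
shows how crafty rules that case out.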
				1465
				1466	//===---------------------------------------------------------------------===//
				1467
				1468	186.crafty contains this interesting pattern:
				1469
				1470	%77 = call i8* @strstr(i8* getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0),
				1471	i8* %30)
				1472	%phitmp648 = icmp eq i8* %77, getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0)
				1473	br i1 %phitmp648, label %bb70, label %bb76
				1474
				1475	bb70: ; preds = %OptionMatch.exit91, %bb69
				1476	%78 = call i32 @strlen(i8* %30) nounwind readonly align 1 ; <i32> [#uses=1]
				1477
				1478	This is basically:
				1479	cststr = "abcdef";
				1480	if (strstr(cststr, P) == cststr) {
				1481	x = strlen(P);
				1482	...
				1483
				1484	The strstr call would be significantly cheaper written as:
				1485
				1486	cststr = "abcdef";
				1487	if (memcmp(P, cststr, strlen(P)) == 0)
				1488	x = strlen(P);
				1489
				1490	This is memcmp+strlen instead of strstr. This also makes the strlen fully
				1491	redundant.
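
A small sketch comparing the two prefix tests. The function names are
hypothetical, and strncmp is swapped in for memcmp here (an assumption of this
sketch, not something the note proposes) so that nothing is read past the end
of the constant string when P is longer than it:

```c
#include <assert.h>
#include <string.h>

/* The pattern as it appears in crafty: is p a prefix of cststr? */
int is_prefix_strstr(const char *cststr, const char *p) {
    return strstr(cststr, p) == cststr;
}

/* Cheaper form: one strlen plus a bounded compare. strncmp stops at the
   terminator of cststr, so a longer p simply compares unequal. */
int is_prefix_strncmp(const char *cststr, const char *p) {
    size_t n = strlen(p);
    return strncmp(cststr, p, n) == 0;
}
```

Both functions agree on the empty string, which is a prefix of everything.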
				1492
				1493	//===---------------------------------------------------------------------===//
				1494
				1495	186.crafty also contains this code:
				1496
				1497	%1906 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
				1498	%1907 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1906
				1499	%1908 = call i8* @strcpy(i8* %1907, i8* %1905) nounwind align 1
				1500	%1909 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
				1501	%1910 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1909
				1502
				1503	The last strlen is computable as %1906 + strlen(%1905), so %1910 = %1908 + strlen(%1905).
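
A C-level sketch of the same sequence, assuming a hypothetical helper named
append_and_len and a caller that guarantees the buffer has room. It relies on
strcpy returning its destination pointer, so the length after the append is
the old length plus the length of the source:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring the IR: append src at the current end of dst
   and return the new total length without rescanning dst. */
size_t append_and_len(char *dst, const char *src) {
    size_t old_len = strlen(dst);    /* plays the role of %1906 */
    char *end = dst + old_len;       /* %1907 */
    char *ret = strcpy(end, src);    /* %1908 */
    assert(ret == end);              /* strcpy returns its destination */
    return old_len + strlen(src);    /* stands in for %1909 */
}
```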
				1504
				1505	//===---------------------------------------------------------------------===//
				1506
				1507	186.crafty has this interesting pattern with the "out.4543" variable:
				1508
				1509	call void @llvm.memcpy.i32(
				1510	i8* getelementptr ([10 x i8]* @out.4543, i32 0, i32 0),
				1511	i8* getelementptr ([7 x i8]* @"\01LC28700", i32 0, i32 0), i32 7, i32 1)
				1512	%101 = call @printf(i8* ... @out.4543, i32 0, i32 0)) nounwind
				1513
				1514	It is basically doing:
				1515
				1516	memcpy(globalarray, "string", 7);
				1517	printf(..., globalarray);
				1518
				1519	Knowing that printf only reads the memory, we can forward substitute the
				1520	string directly into the printf, which eliminates reads from globalarray.
				1521	Since this pattern occurs frequently in crafty (due to the "DisplayTime" and
				1522	other similar functions) there are many stores to "out". Once all the printfs
				1523	stop using "out", all that is left is the memcpy's into it. This should allow
				1524	globalopt to remove the "stored only" global.
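
A before/after sketch at the C level. The function names are hypothetical, and
snprintf stands in for printf so that the two results can be compared:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char out[10];   /* stands in for @out.4543 */

/* Before: the literal is staged through a global, then read back. */
int format_via_global(char *dst, size_t cap) {
    memcpy(out, "string", 7);    /* 6 characters plus the terminator */
    return snprintf(dst, cap, "%s", out);
}

/* After forward substitution: the global is never read. */
int format_direct(char *dst, size_t cap) {
    return snprintf(dst, cap, "%s", "string");
}
```

Once every reader looks like format_direct, the only remaining uses of the
global are stores.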
				1525
				1526	//===---------------------------------------------------------------------===//
				1527
Dan Gohman	8289b05	2009-01-20 01:07:33 +0000	[diff] [blame]	1528	This code:
				1529
				1530	define inreg i32 @foo(i8* inreg %p) nounwind {
				1531	%tmp0 = load i8* %p
				1532	%tmp1 = ashr i8 %tmp0, 5
				1533	%tmp2 = sext i8 %tmp1 to i32
				1534	ret i32 %tmp2
				1535	}
				1536
				1537	could be dagcombine'd to a sign-extending load with a shift.
				1538	For example, on x86 this currently gets this:
				1539
				1540	movb (%eax), %al
				1541	sarb $5, %al
				1542	movsbl %al, %eax
				1543
				1544	while it could get this:
				1545
				1546	movsbl (%eax), %eax
				1547	sarl $5, %eax
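
The two orders can be checked against each other exhaustively. This is only a
sketch: the helper names are hypothetical, and the arithmetic shifts are
modelled with masks and floor division so the check does not depend on how a
given C compiler shifts negative values:

```c
#include <assert.h>

/* Sign-extend the low 8 bits of u to int. */
static int sext8(unsigned u) {
    u &= 0xFFu;
    return (u & 0x80u) ? (int)u - 256 : (int)u;
}

/* 8-bit arithmetic shift right by 5, then sign-extend (sarb; movsbl). */
int narrow_shift_then_extend(unsigned char byte) {
    unsigned shifted = byte >> 5;    /* logical shift of the 8-bit value */
    if (byte & 0x80u)
        shifted |= 0xF8u;            /* copy the sign bit into the top 5 bits */
    return sext8(shifted);
}

/* Sign-extend first, then shift the wide value (movsbl; sarl). Floor
   division by 32 models the arithmetic shift. */
int extend_then_wide_shift(unsigned char byte) {
    int v = sext8(byte);
    return v >= 0 ? v / 32 : -((-v + 31) / 32);
}

/* Returns 1 if both orders agree for every possible byte. */
int orders_agree(void) {
    for (unsigned b = 0; b < 256; ++b)
        if (narrow_shift_then_extend((unsigned char)b) !=
            extend_then_wide_shift((unsigned char)b))
            return 0;
    return 1;
}
```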
				1548
				1549	//===---------------------------------------------------------------------===//
Chris Lattner	256baa4	2009-01-22 07:16:03 +0000	[diff] [blame]	1550
				1551	GCC PR31029:
				1552
				1553	int test(int x) { return 1-x == x; } // --> return false
				1554	int test2(int x) { return 2-x == x; } // --> return x == 1 ?
				1555
				1556	Always foldable for odd constants, what is the rule for even?
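
One way to explore the even case is to enumerate a narrow type. The sketch
below (count_solutions is a hypothetical name) counts the 8-bit values with
C - x == x under wrapping arithmetic. In this model an odd C has no solution,
and an even C has exactly two, C/2 and C/2 + 128, since 2x == C (mod 256). For
signed int, where overflow is undefined in C, the second solution appears to
require an overflowing subtraction, which would leave just x == C/2:

```c
#include <assert.h>

/* Counts the 8-bit values x with c - x == x in wrapping arithmetic and
   records them in sols, which must have room for 256 entries. */
int count_solutions(unsigned char c, unsigned char sols[256]) {
    int n = 0;
    for (unsigned x = 0; x < 256; ++x)
        if ((unsigned char)(c - x) == (unsigned char)x)
            sols[n++] = (unsigned char)x;
    return n;
}
```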
				1557
				1558	//===---------------------------------------------------------------------===//
				1559
Torok Edwin	e46a686	2009-01-24 19:30:25 +0000	[diff] [blame]	1560	PR 3381: GEP to field of size 0 inside a struct could be turned into GEP
				1561	for next field in struct (which is at same address).
				1562
				1563	For example: store of float into { {{}}, float } could be turned into a store to
				1564	the float directly.
				1565
Torok Edwin	474479f	2009-02-20 18:42:06 +0000	[diff] [blame]	1566	//===---------------------------------------------------------------------===//
Nick Lewycky	20babb1	2009-02-25 06:52:48 +0000	[diff] [blame]	1567
Torok Edwin	474479f	2009-02-20 18:42:06 +0000	[diff] [blame]	1568	#include <math.h>
				1569	double foo(double a) { return sin(a); }
				1570
				1571	This compiles into this on x86-64 Linux:
				1572	foo:
				1573	subq $8, %rsp
				1574	call sin
				1575	addq $8, %rsp
				1576	ret
				1577	vs. a tail call:
				1578
				1579	foo:
				1580	jmp sin
				1581
Nick Lewycky	20babb1	2009-02-25 06:52:48 +0000	[diff] [blame]	1582	//===---------------------------------------------------------------------===//
				1583
Chris Lattner	32c5f17	2009-05-11 17:41:40 +0000	[diff] [blame]	1584	The arg promotion pass should make use of nocapture to make its alias analysis
				1585	stuff much more precise.
				1586
				1587	//===---------------------------------------------------------------------===//
				1588
				1589	The following functions should be optimized to use a select instead of a
				1590	branch (from gcc PR40072):
				1591
				1592	char char_int(int m) {if(m>7) return 0; return m;}
				1593	int int_char(char m) {if(m>7) return 0; return m;}
				1594
				1595	//===---------------------------------------------------------------------===//
				1596
Nick Lewycky	20babb1	2009-02-25 06:52:48 +0000	[diff] [blame]	1597	Instcombine should replace the load with a constant in:
				1598
				1599	static const char x[4] = {'a', 'b', 'c', 'd'};
				1600
				1601	unsigned int y(void) {
				1602	return *(unsigned int *)x;
				1603	}
				1604
				1605	It currently only does this transformation when the size of the constant
				1606	is the same as the size of the integer (so, try x[5]) and the last byte
				1607	is a null (making it a C string). There's no need for these restrictions.
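
As an illustration of what the folded constant would be, here is a sketch
assuming a 32-bit load and an ASCII target. The function names are
hypothetical, uint32_t replaces unsigned int to pin down the width, and memcpy
replaces the pointer cast to sidestep alignment and aliasing questions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const char x[4] = {'a', 'b', 'c', 'd'};

/* The load, reading all four bytes of x as one integer. */
uint32_t load_x(void) {
    uint32_t v;
    memcpy(&v, x, sizeof v);
    return v;
}

/* The constant a compiler could substitute, chosen by the byte order. */
uint32_t folded_x(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first ? 0x64636261u     /* little-endian: 'a' is the low byte */
                 : 0x61626364u;    /* big-endian: 'a' is the high byte */
}
```

Note that the array has no trailing null and exactly fills the integer, which
is the case the current transformation skips.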
				1608
				1609	//===---------------------------------------------------------------------===//
				1610
Chris Lattner	d919a8b	2009-05-11 17:36:33 +0000	[diff] [blame]	1611	InstCombine's "turn load from constant into constant" optimization should be
				1612	more aggressive in the presence of bitcasts. For example, because of unions,
				1613	this code:
				1614
				1615	union vec2d {
				1616	double e[2];
				1617	double v __attribute__((vector_size(16)));
				1618	};
				1619	typedef union vec2d vec2d;
				1620
				1621	static vec2d a={{1,2}}, b={{3,4}};
				1622
				1623	vec2d foo () {
				1624	return (vec2d){ .v = a.v + b.v * (vec2d){{5,5}}.v };
				1625	}
				1626
				1627	Compiles into:
				1628
				1629	@a = internal constant %0 { [2 x double]
				1630	[double 1.000000e+00, double 2.000000e+00] }, align 16
				1631	@b = internal constant %0 { [2 x double]
				1632	[double 3.000000e+00, double 4.000000e+00] }, align 16
				1633	...
				1634	define void @foo(%struct.vec2d* noalias nocapture sret %agg.result) nounwind {
				1635	entry:
				1636	%0 = load <2 x double>* getelementptr (%struct.vec2d*
				1637	bitcast (%0* @a to %struct.vec2d*), i32 0, i32 0), align 16
				1638	%1 = load <2 x double>* getelementptr (%struct.vec2d*
				1639	bitcast (%0* @b to %struct.vec2d*), i32 0, i32 0), align 16
				1640
				1641
				1642	Instcombine should be able to optimize away the loads (and thus the globals).
				1643
Chris Lattner	c2b0d48	2009-09-14 16:49:26 +0000	[diff] [blame]	1644	See also PR4973
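
For reference, a sketch of the values the fully folded function would produce,
written with plain arrays instead of the vector extension (foo_elems is a
hypothetical name). With a = {1,2} and b = {3,4}, a + b * 5 is {16, 22}:

```c
#include <assert.h>

static const double a[2] = {1, 2}, b[2] = {3, 4};

/* Element-wise a + b * 5, the computation foo performs on the two globals. */
void foo_elems(double result[2]) {
    for (int i = 0; i < 2; ++i)
        result[i] = a[i] + b[i] * 5.0;
}
```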
Chris Lattner	d919a8b	2009-05-11 17:36:33 +0000	[diff] [blame]	1645
				1646	//===---------------------------------------------------------------------===//