Blame - lib/Target/README.txt - fp2-dev/platform/external/llvm

blob: 8ff08fbf036e53b1ac893318307047d34b83cdfc [file] [log] [blame]

Chris Lattner	086c014	2006-02-03 06:21:43 +0000	[diff] [blame]	1	Target Independent Opportunities:
				2
Chris Lattner	f308ea0	2006-09-28 06:01:17 +0000	[diff] [blame]	3	//===---------------------------------------------------------------------===//
				4
Chris Lattner	9b62b45	2006-11-14 01:57:53 +0000	[diff] [blame]	5	With the recent changes to make the implicit def/use set explicit in
				6	machineinstrs, we should change the target descriptions for 'call' instructions
				7	so that the .td files don't list all the call-clobbered registers as implicit
				8	defs. Instead, these should be added by the code generator (e.g. on the dag).
				9
				10	This has a number of uses:
				11
				12	1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
				13	for their different impdef sets.
				14	2. Targets with multiple calling convs (e.g. x86) which have different clobber
				15	sets don't need copies of call instructions.
				16	3. 'Interprocedural register allocation' can be done to reduce the clobber sets
				17	of calls.
				18
				19	//===---------------------------------------------------------------------===//
				20
Nate Begeman	81e8097	2006-03-17 01:40:33 +0000	[diff] [blame]	21	Make the PPC branch selector target independant
				22
				23	//===---------------------------------------------------------------------===//
Chris Lattner	086c014	2006-02-03 06:21:43 +0000	[diff] [blame]	24
				25	Get the C front-end to expand hypot(x,y) -> llvm.sqrt(xx+yy) when errno and
Chris Lattner	2dae65d	2008-12-10 01:30:48 +0000	[diff] [blame]	26	precision don't matter (ffastmath). Misc/mandel will like this. :) This isn't
				27	safe in general, even on darwin. See the libm implementation of hypot for
				28	examples (which special case when x/y are exactly zero to get signed zeros etc
				29	right).
Chris Lattner	086c014	2006-02-03 06:21:43 +0000	[diff] [blame]	30
Chris Lattner	086c014	2006-02-03 06:21:43 +0000	[diff] [blame]	31	//===---------------------------------------------------------------------===//
				32
				33	Solve this DAG isel folding deficiency:
				34
				35	int X, Y;
				36
				37	void fn1(void)
				38	{
				39	X = X \| (Y << 3);
				40	}
				41
				42	compiles to
				43
				44	fn1:
				45	movl Y, %eax
				46	shll $3, %eax
				47	orl X, %eax
				48	movl %eax, X
				49	ret
				50
				51	The problem is the store's chain operand is not the load X but rather
				52	a TokenFactor of the load X and load Y, which prevents the folding.
				53
				54	There are two ways to fix this:
				55
				56	1. The dag combiner can start using alias analysis to realize that y/x
				57	don't alias, making the store to X not dependent on the load from Y.
				58	2. The generated isel could be made smarter in the case it can't
				59	disambiguate the pointers.
				60
				61	Number 1 is the preferred solution.
				62
Evan Cheng	e617b08	2006-03-13 23:19:10 +0000	[diff] [blame]	63	This has been "fixed" by a TableGen hack. But that is a short term workaround
				64	which will be removed once the proper fix is made.
				65
Chris Lattner	086c014	2006-02-03 06:21:43 +0000	[diff] [blame]	66	//===---------------------------------------------------------------------===//
				67
Chris Lattner	b27b69f	2006-03-04 01:19:34 +0000	[diff] [blame]	68	On targets with expensive 64-bit multiply, we could LSR this:
				69
				70	for (i = ...; ++i) {
				71	x = 1ULL << i;
				72
				73	into:
				74	long long tmp = 1;
				75	for (i = ...; ++i, tmp+=tmp)
				76	x = tmp;
				77
				78	This would be a win on ppc32, but not x86 or ppc64.
				79
Chris Lattner	ad01993	2006-03-04 08:44:51 +0000	[diff] [blame]	80	//===---------------------------------------------------------------------===//
Chris Lattner	5b0fe7d	2006-03-05 20:00:08 +0000	[diff] [blame]	81
				82	Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
				83
				84	//===---------------------------------------------------------------------===//
Chris Lattner	549f27d2	2006-03-07 02:46:26 +0000	[diff] [blame]	85
Chris Lattner	c20995e	2006-03-11 20:17:08 +0000	[diff] [blame]	86	Reassociate should turn: XXXX -> t=(XX) (t*t) to eliminate a multiply.
				87
				88	//===---------------------------------------------------------------------===//
				89
Chris Lattner	74cfb7d	2006-03-11 20:20:40 +0000	[diff] [blame]	90	Interesting? testcase for add/shift/mul reassoc:
				91
				92	int bar(int x, int y) {
				93	return xxx+y+xxxxxyyyy;
				94	}
				95	int foo(int z, int n) {
				96	return bar(z, n) + bar(2z, 2n);
				97	}
				98
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	99	Reassociate should handle the example in GCC PR16157.
				100
Chris Lattner	74cfb7d	2006-03-11 20:20:40 +0000	[diff] [blame]	101	//===---------------------------------------------------------------------===//
				102
Chris Lattner	82c78b2	2006-03-09 20:13:21 +0000	[diff] [blame]	103	These two functions should generate the same code on big-endian systems:
				104
				105	int g(int j,int l) { return memcmp(j,l,4); }
				106	int h(int j, int l) { return j - l; }
				107
				108	this could be done in SelectionDAGISel.cpp, along with other special cases,
				109	for 1,2,4,8 bytes.
				110
				111	//===---------------------------------------------------------------------===//
				112
Chris Lattner	c04b423	2006-03-22 07:33:46 +0000	[diff] [blame]	113	It would be nice to revert this patch:
				114	http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html
				115
				116	And teach the dag combiner enough to simplify the code expanded before
				117	legalize. It seems plausible that this knowledge would let it simplify other
				118	stuff too.
				119
Chris Lattner	e6cd96d	2006-03-24 19:59:17 +0000	[diff] [blame]	120	//===---------------------------------------------------------------------===//
				121
Reid Spencer	ac9dcb9	2007-02-15 03:39:18 +0000	[diff] [blame]	122	For vector types, TargetData.cpp::getTypeInfo() returns alignment that is equal
Evan Cheng	67d3d4c	2006-03-31 22:35:14 +0000	[diff] [blame]	123	to the type size. It works but can be overly conservative as the alignment of
Reid Spencer	ac9dcb9	2007-02-15 03:39:18 +0000	[diff] [blame]	124	specific vector types are target dependent.
Chris Lattner	eaa7c06	2006-04-01 04:08:29 +0000	[diff] [blame]	125
				126	//===---------------------------------------------------------------------===//
				127
Dan Gohman	1f3be1a	2009-05-11 18:51:16 +0000	[diff] [blame]	128	We should produce an unaligned load from code like this:
Chris Lattner	eaa7c06	2006-04-01 04:08:29 +0000	[diff] [blame]	129
				130	v4sf example(float *P) {
				131	return (v4sf){P[0], P[1], P[2], P[3] };
				132	}
				133
				134	//===---------------------------------------------------------------------===//
				135
Chris Lattner	16abfdf	2006-05-18 18:26:13 +0000	[diff] [blame]	136	Add support for conditional increments, and other related patterns. Instead
				137	of:
				138
				139	movl 136(%esp), %eax
				140	cmpl $0, %eax
				141	je LBB16_2 #cond_next
				142	LBB16_1: #cond_true
				143	incl _foo
				144	LBB16_2: #cond_next
				145
				146	emit:
				147	movl _foo, %eax
				148	cmpl $1, %edi
				149	sbbl $-1, %eax
				150	movl %eax, _foo
				151
				152	//===---------------------------------------------------------------------===//
Chris Lattner	870cf1b	2006-05-19 20:45:08 +0000	[diff] [blame]	153
				154	Combine: a = sin(x), b = cos(x) into a,b = sincos(x).
				155
				156	Expand these to calls of sin/cos and stores:
				157	double sincos(double x, double sin, double cos);
				158	float sincosf(float x, float sin, float cos);
				159	long double sincosl(long double x, long double sin, long double cos);
				160
				161	Doing so could allow SROA of the destination pointers. See also:
				162	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
				163
Chris Lattner	2dae65d	2008-12-10 01:30:48 +0000	[diff] [blame]	164	This is now easily doable with MRVs. We could even make an intrinsic for this
				165	if anyone cared enough about sincos.
				166
Chris Lattner	870cf1b	2006-05-19 20:45:08 +0000	[diff] [blame]	167	//===---------------------------------------------------------------------===//
Chris Lattner	f00f68a	2006-05-19 21:01:38 +0000	[diff] [blame]	168
Chris Lattner	e8263e6	2006-05-21 03:57:07 +0000	[diff] [blame]	169	Turn this into a single byte store with no load (the other 3 bytes are
				170	unmodified):
				171
Dan Gohman	5c8274b	2009-05-11 18:04:52 +0000	[diff] [blame]	172	define void @test(i32* %P) {
				173	%tmp = load i32* %P
				174	%tmp14 = or i32 %tmp, 3305111552
				175	%tmp15 = and i32 %tmp14, 3321888767
				176	store i32 %tmp15, i32* %P
Chris Lattner	e8263e6	2006-05-21 03:57:07 +0000	[diff] [blame]	177	ret void
				178	}
				179
Chris Lattner	9e18ef5	2006-05-30 21:29:15 +0000	[diff] [blame]	180	//===---------------------------------------------------------------------===//
				181
				182	dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.
				183
				184	Compile:
				185
				186	int bar(int x)
				187	{
				188	int t = __builtin_clz(x);
				189	return -(t>>5);
				190	}
				191
				192	to:
				193
				194	_bar: addic r3,r3,-1
				195	subfe r3,r3,r3
				196	blr
				197
Chris Lattner	cbce2f6	2006-09-15 20:31:36 +0000	[diff] [blame]	198	//===---------------------------------------------------------------------===//
				199
Chris Lattner	7ed96ab	2006-09-16 23:57:51 +0000	[diff] [blame]	200	quantum_sigma_x in 462.libquantum contains the following loop:
				201
				202	for(i=0; i<reg->size; i++)
				203	{
				204	/* Flip the target bit of each basis state */
				205	reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
				206	}
				207
				208	Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
				209	so cool to turn it into something like:
				210
Chris Lattner	b33a42a	2006-09-18 04:54:35 +0000	[diff] [blame]	211	long long Res = ((MAX_UNSIGNED) 1 << target);
Chris Lattner	7ed96ab	2006-09-16 23:57:51 +0000	[diff] [blame]	212	if (target < 32) {
				213	for(i=0; i<reg->size; i++)
Chris Lattner	b33a42a	2006-09-18 04:54:35 +0000	[diff] [blame]	214	reg->node[i].state ^= Res & 0xFFFFFFFFULL;
Chris Lattner	7ed96ab	2006-09-16 23:57:51 +0000	[diff] [blame]	215	} else {
				216	for(i=0; i<reg->size; i++)
Chris Lattner	b33a42a	2006-09-18 04:54:35 +0000	[diff] [blame]	217	reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL
Chris Lattner	7ed96ab	2006-09-16 23:57:51 +0000	[diff] [blame]	218	}
				219
				220	... which would only do one 32-bit XOR per loop iteration instead of two.
				221
				222	It would also be nice to recognize the reg->size doesn't alias reg->node[i], but
				223	alas...
				224
				225	//===---------------------------------------------------------------------===//
Chris Lattner	fb981f3	2006-09-25 17:12:14 +0000	[diff] [blame]	226
Chris Lattner	b1ac769	2008-10-05 02:16:12 +0000	[diff] [blame]	227	This isn't recognized as bswap by instcombine (yes, it really is bswap):
Chris Lattner	f9bae43	2006-12-08 02:01:32 +0000	[diff] [blame]	228
				229	unsigned long reverse(unsigned v) {
				230	unsigned t;
				231	t = v ^ ((v << 16) \| (v >> 16));
				232	t &= ~0xff0000;
				233	v = (v << 24) \| (v >> 8);
				234	return v ^ (t >> 8);
				235	}
				236
Chris Lattner	fb981f3	2006-09-25 17:12:14 +0000	[diff] [blame]	237	//===---------------------------------------------------------------------===//
				238
Chris Lattner	f4fee2a	2008-10-15 16:02:15 +0000	[diff] [blame]	239	These idioms should be recognized as popcount (see PR1488):
				240
				241	unsigned countbits_slow(unsigned v) {
				242	unsigned c;
				243	for (c = 0; v; v >>= 1)
				244	c += v & 1;
				245	return c;
				246	}
				247	unsigned countbits_fast(unsigned v){
				248	unsigned c;
				249	for (c = 0; v; c++)
				250	v &= v - 1; // clear the least significant bit set
				251	return c;
				252	}
				253
				254	BITBOARD = unsigned long long
				255	int PopCnt(register BITBOARD a) {
				256	register int c=0;
				257	while(a) {
				258	c++;
				259	a &= a - 1;
				260	}
				261	return c;
				262	}
				263	unsigned int popcount(unsigned int input) {
				264	unsigned int count = 0;
				265	for (unsigned int i = 0; i < 4 * 8; i++)
				266	count += (input >> i) & i;
				267	return count;
				268	}
				269
				270	//===---------------------------------------------------------------------===//
				271
Chris Lattner	fb981f3	2006-09-25 17:12:14 +0000	[diff] [blame]	272	These should turn into single 16-bit (unaligned?) loads on little/big endian
				273	processors.
				274
				275	unsigned short read_16_le(const unsigned char *adr) {
				276	return adr[0] \| (adr[1] << 8);
				277	}
				278	unsigned short read_16_be(const unsigned char *adr) {
				279	return (adr[0] << 8) \| adr[1];
				280	}
				281
				282	//===---------------------------------------------------------------------===//
Chris Lattner	cf10391	2006-10-24 16:12:47 +0000	[diff] [blame]	283
Reid Spencer	1628cec	2006-10-26 06:15:43 +0000	[diff] [blame]	284	-instcombine should handle this transform:
Reid Spencer	e4d87aa	2006-12-23 06:05:41 +0000	[diff] [blame]	285	icmp pred (sdiv X / C1 ), C2
Reid Spencer	1628cec	2006-10-26 06:15:43 +0000	[diff] [blame]	286	when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.
				287
				288	Currently InstCombine avoids this transform but will do it when the signs of
				289	the operands and the sign of the divide match. See the FIXME in
				290	InstructionCombining.cpp in the visitSetCondInst method after the switch case
				291	for Instruction::UDiv (around line 4447) for more details.
				292
				293	The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
				294	this construct.
Chris Lattner	d7c628d	2006-11-03 22:27:39 +0000	[diff] [blame]	295
				296	//===---------------------------------------------------------------------===//
				297
Chris Lattner	578d2df	2006-11-10 00:23:26 +0000	[diff] [blame]	298	viterbi speeds up significantly if the various "history" related copy loops
				299	are turned into memcpy calls at the source level. We need a "loops to memcpy"
				300	pass.
				301
				302	//===---------------------------------------------------------------------===//
Nick Lewycky	bf63734	2006-11-13 00:23:28 +0000	[diff] [blame]	303
Chris Lattner	03a6d96	2007-01-16 06:39:48 +0000	[diff] [blame]	304	Consider:
				305
				306	typedef unsigned U32;
				307	typedef unsigned long long U64;
				308	int test (U32 inst, U64 regs) {
				309	U64 effective_addr2;
				310	U32 temp = *inst;
				311	int r1 = (temp >> 20) & 0xf;
				312	int b2 = (temp >> 16) & 0xf;
				313	effective_addr2 = temp & 0xfff;
				314	if (b2) effective_addr2 += regs[b2];
				315	b2 = (temp >> 12) & 0xf;
				316	if (b2) effective_addr2 += regs[b2];
				317	effective_addr2 &= regs[4];
				318	if ((effective_addr2 & 3) == 0)
				319	return 1;
				320	return 0;
				321	}
				322
				323	Note that only the low 2 bits of effective_addr2 are used. On 32-bit systems,
				324	we don't eliminate the computation of the top half of effective_addr2 because
				325	we don't have whole-function selection dags. On x86, this means we use one
				326	extra register for the function when effective_addr2 is declared as U64 than
				327	when it is declared U32.
				328
				329	//===---------------------------------------------------------------------===//
				330
Chris Lattner	1a77a55	2007-03-24 06:01:32 +0000	[diff] [blame]	331	LSR should know what GPR types a target has. This code:
				332
				333	volatile short X, Y; // globals
				334
				335	void foo(int N) {
				336	int i;
				337	for (i = 0; i < N; i++) { X = i; Y = i*4; }
				338	}
				339
Chris Lattner	c1491f3	2009-09-20 17:37:38 +0000	[diff] [blame]	340	produces two near identical IV's (after promotion) on PPC/ARM:
Chris Lattner	1a77a55	2007-03-24 06:01:32 +0000	[diff] [blame]	341
Chris Lattner	c1491f3	2009-09-20 17:37:38 +0000	[diff] [blame]	342	LBB1_2:
				343	ldr r3, LCPI1_0
				344	ldr r3, [r3]
				345	strh r2, [r3]
				346	ldr r3, LCPI1_1
				347	ldr r3, [r3]
				348	strh r1, [r3]
				349	add r1, r1, #4
				350	add r2, r2, #1 <- [0,+,1]
				351	sub r0, r0, #1 <- [0,-,1]
				352	cmp r0, #0
				353	bne LBB1_2
				354
				355	LSR should reuse the "+" IV for the exit test.
Chris Lattner	1a77a55	2007-03-24 06:01:32 +0000	[diff] [blame]	356
				357
				358	//===---------------------------------------------------------------------===//
				359
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	360	Tail call elim should be more aggressive, checking to see if the call is
				361	followed by an uncond branch to an exit block.
				362
				363	; This testcase is due to tail-duplication not wanting to copy the return
				364	; instruction into the terminating blocks because there was other code
				365	; optimized out of the function after the taildup happened.
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	366	; RUN: llvm-as < %s \| opt -tailcallelim \| llvm-dis \| not grep call
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	367
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	368	define i32 @t4(i32 %a) {
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	369	entry:
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	370	%tmp.1 = and i32 %a, 1 ; <i32> [#uses=1]
				371	%tmp.2 = icmp ne i32 %tmp.1, 0 ; <i1> [#uses=1]
				372	br i1 %tmp.2, label %then.0, label %else.0
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	373
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	374	then.0: ; preds = %entry
				375	%tmp.5 = add i32 %a, -1 ; <i32> [#uses=1]
				376	%tmp.3 = call i32 @t4( i32 %tmp.5 ) ; <i32> [#uses=1]
				377	br label %return
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	378
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	379	else.0: ; preds = %entry
				380	%tmp.7 = icmp ne i32 %a, 0 ; <i1> [#uses=1]
				381	br i1 %tmp.7, label %then.1, label %return
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	382
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	383	then.1: ; preds = %else.0
				384	%tmp.11 = add i32 %a, -2 ; <i32> [#uses=1]
				385	%tmp.9 = call i32 @t4( i32 %tmp.11 ) ; <i32> [#uses=1]
				386	br label %return
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	387
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	388	return: ; preds = %then.1, %else.0, %then.0
				389	%result.0 = phi i32 [ 0, %else.0 ], [ %tmp.3, %then.0 ],
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	390	[ %tmp.9, %then.1 ]
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	391	ret i32 %result.0
Chris Lattner	5e14b0d	2007-05-05 22:29:06 +0000	[diff] [blame]	392	}
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	393
				394	//===---------------------------------------------------------------------===//
				395
Chris Lattner	e1bb6ab	2007-10-03 06:10:59 +0000	[diff] [blame]	396	Tail recursion elimination is not transforming this function, because it is
				397	returning n, which fails the isDynamicConstant check in the accumulator
				398	recursion checks.
				399
				400	long long fib(const long long n) {
				401	switch(n) {
				402	case 0:
				403	case 1:
				404	return n;
				405	default:
				406	return fib(n-1) + fib(n-2);
				407	}
				408	}
				409
				410	//===---------------------------------------------------------------------===//
				411
Chris Lattner	c90b866	2008-08-10 00:47:21 +0000	[diff] [blame]	412	Tail recursion elimination should handle:
				413
				414	int pow2m1(int n) {
				415	if (n == 0)
				416	return 0;
				417	return 2 * pow2m1 (n - 1) + 1;
				418	}
				419
				420	Also, multiplies can be turned into SHL's, so they should be handled as if
				421	they were associative. "return foo() << 1" can be tail recursion eliminated.
				422
				423	//===---------------------------------------------------------------------===//
				424
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	425	Argument promotion should promote arguments for recursive functions, like
				426	this:
				427
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	428	; RUN: llvm-as < %s \| opt -argpromotion \| llvm-dis \| grep x.val
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	429
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	430	define internal i32 @foo(i32* %x) {
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	431	entry:
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	432	%tmp = load i32* %x ; <i32> [#uses=0]
				433	%tmp.foo = call i32 @foo( i32* %x ) ; <i32> [#uses=1]
				434	ret i32 %tmp.foo
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	435	}
				436
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	437	define i32 @bar(i32* %x) {
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	438	entry:
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	439	%tmp3 = call i32 @foo( i32* %x ) ; <i32> [#uses=1]
				440	ret i32 %tmp3
Chris Lattner	f110a2b	2007-05-05 22:44:08 +0000	[diff] [blame]	441	}
				442
Chris Lattner	81f2d71	2007-12-05 23:05:06 +0000	[diff] [blame]	443	//===---------------------------------------------------------------------===//
Chris Lattner	166a268	2007-12-28 04:42:05 +0000	[diff] [blame]	444
				445	"basicaa" should know how to look through "or" instructions that act like add
				446	instructions. For example in this code, the x4+1 is turned into x4 \| 1, and
				447	basicaa can't analyze the array subscript, leading to duplicated loads in the
				448	generated code:
				449
				450	void test(int X, int Y, int a[]) {
				451	int i;
				452	for (i=2; i<1000; i+=4) {
				453	a[i+0] = a[i-1+0]*a[i-2+0];
				454	a[i+1] = a[i-1+1]*a[i-2+1];
				455	a[i+2] = a[i-1+2]*a[i-2+2];
				456	a[i+3] = a[i-1+3]*a[i-2+3];
				457	}
				458	}
				459
Chris Lattner	2dae65d	2008-12-10 01:30:48 +0000	[diff] [blame]	460	BasicAA also doesn't do this for add. It needs to know that &A[i+1] != &A[i].
				461
Chris Lattner	a1643ba	2007-12-28 22:30:05 +0000	[diff] [blame]	462	//===---------------------------------------------------------------------===//
Chris Lattner	166a268	2007-12-28 04:42:05 +0000	[diff] [blame]	463
Chris Lattner	a1643ba	2007-12-28 22:30:05 +0000	[diff] [blame]	464	We should investigate an instruction sinking pass. Consider this silly
				465	example in pic mode:
				466
				467	#include <assert.h>
				468	void foo(int x) {
				469	assert(x);
				470	//...
				471	}
				472
				473	we compile this to:
				474	_foo:
				475	subl $28, %esp
				476	call "L1$pb"
				477	"L1$pb":
				478	popl %eax
				479	cmpl $0, 32(%esp)
				480	je LBB1_2 # cond_true
				481	LBB1_1: # return
				482	# ...
				483	addl $28, %esp
				484	ret
				485	LBB1_2: # cond_true
				486	...
				487
				488	The PIC base computation (call+popl) is only used on one path through the
				489	code, but is currently always computed in the entry block. It would be
				490	better to sink the picbase computation down into the block for the
				491	assertion, as it is the only one that uses it. This happens for a lot of
				492	code with early outs.
				493
Chris Lattner	92c06a0	2007-12-29 01:05:01 +0000	[diff] [blame]	494	Another example is loads of arguments, which are usually emitted into the
				495	entry block on targets like x86. If not used in all paths through a
				496	function, they should be sunk into the ones that do.
				497
Chris Lattner	a1643ba	2007-12-28 22:30:05 +0000	[diff] [blame]	498	In this case, whole-function-isel would also handle this.
Chris Lattner	166a268	2007-12-28 04:42:05 +0000	[diff] [blame]	499
				500	//===---------------------------------------------------------------------===//
Chris Lattner	b304194	2008-01-07 21:38:14 +0000	[diff] [blame]	501
				502	Investigate lowering of sparse switch statements into perfect hash tables:
				503	http://burtleburtle.net/bob/hash/perfect.html
				504
				505	//===---------------------------------------------------------------------===//
Chris Lattner	f61b63e	2008-01-09 00:17:57 +0000	[diff] [blame]	506
				507	We should turn things like "load+fabs+store" and "load+fneg+store" into the
				508	corresponding integer operations. On a yonah, this loop:
				509
				510	double a[256];
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	511	void foo() {
				512	int i, b;
				513	for (b = 0; b < 10000000; b++)
				514	for (i = 0; i < 256; i++)
				515	a[i] = -a[i];
				516	}
Chris Lattner	f61b63e	2008-01-09 00:17:57 +0000	[diff] [blame]	517
				518	is twice as slow as this loop:
				519
				520	long long a[256];
Chris Lattner	7c4e9a4	2008-02-18 18:46:39 +0000	[diff] [blame]	521	void foo() {
				522	int i, b;
				523	for (b = 0; b < 10000000; b++)
				524	for (i = 0; i < 256; i++)
				525	a[i] ^= (1ULL << 63);
				526	}
Chris Lattner	f61b63e	2008-01-09 00:17:57 +0000	[diff] [blame]	527
				528	and I suspect other processors are similar. On X86 in particular this is a
				529	big win because doing this with integers allows the use of read/modify/write
				530	instructions.
				531
				532	//===---------------------------------------------------------------------===//
Chris Lattner	8372601	2008-01-10 18:25:41 +0000	[diff] [blame]	533
				534	DAG Combiner should try to combine small loads into larger loads when
				535	profitable. For example, we compile this C++ example:
				536
				537	struct THotKey { short Key; bool Control; bool Shift; bool Alt; };
				538	extern THotKey m_HotKey;
				539	THotKey GetHotKey () { return m_HotKey; }
				540
				541	into (-O3 -fno-exceptions -static -fomit-frame-pointer):
				542
				543	__Z9GetHotKeyv:
				544	pushl %esi
				545	movl 8(%esp), %eax
				546	movb _m_HotKey+3, %cl
				547	movb _m_HotKey+4, %dl
				548	movb _m_HotKey+2, %ch
				549	movw _m_HotKey, %si
				550	movw %si, (%eax)
				551	movb %ch, 2(%eax)
				552	movb %cl, 3(%eax)
				553	movb %dl, 4(%eax)
				554	popl %esi
				555	ret $4
				556
				557	GCC produces:
				558
				559	__Z9GetHotKeyv:
				560	movl _m_HotKey, %edx
				561	movl 4(%esp), %eax
				562	movl %edx, (%eax)
				563	movzwl _m_HotKey+4, %edx
				564	movw %dx, 4(%eax)
				565	ret $4
				566
				567	The LLVM IR contains the needed alignment info, so we should be able to
				568	merge the loads and stores into 4-byte loads:
				569
				570	%struct.THotKey = type { i16, i8, i8, i8 }
				571	define void @_Z9GetHotKeyv(%struct.THotKey* sret %agg.result) nounwind {
				572	...
				573	%tmp2 = load i16* getelementptr (@m_HotKey, i32 0, i32 0), align 8
				574	%tmp5 = load i8* getelementptr (@m_HotKey, i32 0, i32 1), align 2
				575	%tmp8 = load i8* getelementptr (@m_HotKey, i32 0, i32 2), align 1
				576	%tmp11 = load i8* getelementptr (@m_HotKey, i32 0, i32 3), align 2
				577
				578	Alternatively, we should use a small amount of base-offset alias analysis
				579	to make it so the scheduler doesn't need to hold all the loads in regs at
				580	once.
				581
				582	//===---------------------------------------------------------------------===//
Chris Lattner	497b7e9	2008-01-11 06:17:47 +0000	[diff] [blame]	583
Nate Begeman	e9fe65c	2008-02-18 18:39:23 +0000	[diff] [blame]	584	We should add an FRINT node to the DAG to model targets that have legal
				585	implementations of ceil/floor/rint.
Chris Lattner	48840f8	2008-02-28 05:34:27 +0000	[diff] [blame]	586
				587	//===---------------------------------------------------------------------===//
				588
				589	Consider:
				590
				591	int test() {
				592	long long input[8] = {1,1,1,1,1,1,1,1};
				593	foo(input);
				594	}
				595
				596	We currently compile this into a memcpy from a global array since the
				597	initializer is fairly large and not memset'able. This is good, but the memcpy
				598	gets lowered to load/stores in the code generator. This is also ok, except
				599	that the codegen lowering for memcpy doesn't handle the case when the source
				600	is a constant global. This gives us atrocious code like this:
				601
				602	call "L1$pb"
				603	"L1$pb":
				604	popl %eax
				605	movl _C.0.1444-"L1$pb"+32(%eax), %ecx
				606	movl %ecx, 40(%esp)
				607	movl _C.0.1444-"L1$pb"+20(%eax), %ecx
				608	movl %ecx, 28(%esp)
				609	movl _C.0.1444-"L1$pb"+36(%eax), %ecx
				610	movl %ecx, 44(%esp)
				611	movl _C.0.1444-"L1$pb"+44(%eax), %ecx
				612	movl %ecx, 52(%esp)
				613	movl _C.0.1444-"L1$pb"+40(%eax), %ecx
				614	movl %ecx, 48(%esp)
				615	movl _C.0.1444-"L1$pb"+12(%eax), %ecx
				616	movl %ecx, 20(%esp)
				617	movl _C.0.1444-"L1$pb"+4(%eax), %ecx
				618	...
				619
				620	instead of:
				621	movl $1, 16(%esp)
				622	movl $0, 20(%esp)
				623	movl $1, 24(%esp)
				624	movl $0, 28(%esp)
				625	movl $1, 32(%esp)
				626	movl $0, 36(%esp)
				627	...
				628
				629	//===---------------------------------------------------------------------===//
Chris Lattner	a11deb0	2008-03-02 02:51:40 +0000	[diff] [blame]	630
				631	http://llvm.org/PR717:
				632
				633	The following code should compile into "ret int undef". Instead, LLVM
				634	produces "ret int 0":
				635
				636	int f() {
				637	int x = 4;
				638	int y;
				639	if (x == 3) y = 0;
				640	return y;
				641	}
				642
				643	//===---------------------------------------------------------------------===//
Chris Lattner	53b7277	2008-03-02 19:29:42 +0000	[diff] [blame]	644
				645	The loop unroller should partially unroll loops (instead of peeling them)
				646	when code growth isn't too bad and when an unroll count allows simplification
				647	of some code within the loop. One trivial example is:
				648
				649	#include <stdio.h>
				650	int main() {
				651	int nRet = 17;
				652	int nLoop;
				653	for ( nLoop = 0; nLoop < 1000; nLoop++ ) {
				654	if ( nLoop & 1 )
				655	nRet += 2;
				656	else
				657	nRet -= 1;
				658	}
				659	return nRet;
				660	}
				661
				662	Unrolling by 2 would eliminate the '&1' in both copies, leading to a net
				663	reduction in code size. The resultant code would then also be suitable for
				664	exit value computation.
				665
				666	//===---------------------------------------------------------------------===//
Chris Lattner	349155b	2008-03-17 01:47:51 +0000	[diff] [blame]	667
				668	We miss a bunch of rotate opportunities on various targets, including ppc, x86,
				669	etc. On X86, we miss a bunch of 'rotate by variable' cases because the rotate
				670	matching code in dag combine doesn't look through truncates aggressively
				671	enough. Here are some testcases reduces from GCC PR17886:
				672
				673	unsigned long long f(unsigned long long x, int y) {
				674	return (x << y) \| (x >> 64-y);
				675	}
				676	unsigned f2(unsigned x, int y){
				677	return (x << y) \| (x >> 32-y);
				678	}
				679	unsigned long long f3(unsigned long long x){
				680	int y = 9;
				681	return (x << y) \| (x >> 64-y);
				682	}
				683	unsigned f4(unsigned x){
				684	int y = 10;
				685	return (x << y) \| (x >> 32-y);
				686	}
				687	unsigned long long f5(unsigned long long x, unsigned long long y) {
				688	return (x << 8) \| ((y >> 48) & 0xffull);
				689	}
				690	unsigned long long f6(unsigned long long x, unsigned long long y, int z) {
				691	switch(z) {
				692	case 1:
				693	return (x << 8) \| ((y >> 48) & 0xffull);
				694	case 2:
				695	return (x << 16) \| ((y >> 40) & 0xffffull);
				696	case 3:
				697	return (x << 24) \| ((y >> 32) & 0xffffffull);
				698	case 4:
				699	return (x << 32) \| ((y >> 24) & 0xffffffffull);
				700	default:
				701	return (x << 40) \| ((y >> 16) & 0xffffffffffull);
				702	}
				703	}
				704
Dan Gohman	cb747c5	2008-10-17 21:39:27 +0000	[diff] [blame]	705	On X86-64, we only handle f2/f3/f4 right. On x86-32, a few of these
Chris Lattner	349155b	2008-03-17 01:47:51 +0000	[diff] [blame]	706	generate truly horrible code, instead of using shld and friends. On
				707	ARM, we end up with calls to L___lshrdi3/L___ashldi3 in f, which is
				708	badness. PPC64 misses f, f5 and f6. CellSPU aborts in isel.
				709
				710	//===---------------------------------------------------------------------===//
Chris Lattner	f70107f	2008-03-20 04:46:13 +0000	[diff] [blame]	711
				712	We do a number of simplifications in simplify libcalls to strength reduce
				713	standard library functions, but we don't currently merge them together. For
				714	example, it is useful to merge memcpy(a,b,strlen(b)) -> strcpy. This can only
				715	be done safely if "b" isn't modified between the strlen and memcpy of course.
				716
				717	//===---------------------------------------------------------------------===//
				718
Chris Lattner	10c5d36	2008-07-14 00:19:59 +0000	[diff] [blame]	719	Reassociate should turn things like:
				720
				721	int factorial(int X) {
				722	return XXXXXXX*X;
				723	}
				724
				725	into llvm.powi calls, allowing the code generator to produce balanced
				726	multiplication trees.
				727
				728	//===---------------------------------------------------------------------===//
				729
Chris Lattner	26e150f	2008-08-10 01:14:08 +0000	[diff] [blame]	730	We generate a horrible libcall for llvm.powi. For example, we compile:
				731
				732	#include <cmath>
				733	double f(double a) { return std::pow(a, 4); }
				734
				735	into:
				736
				737	__Z1fd:
				738	subl $12, %esp
				739	movsd 16(%esp), %xmm0
				740	movsd %xmm0, (%esp)
				741	movl $4, 8(%esp)
				742	call L___powidf2$stub
				743	addl $12, %esp
				744	ret
				745
				746	GCC produces:
				747
				748	__Z1fd:
				749	subl $12, %esp
				750	movsd 16(%esp), %xmm0
				751	mulsd %xmm0, %xmm0
				752	mulsd %xmm0, %xmm0
				753	movsd %xmm0, (%esp)
				754	fldl (%esp)
				755	addl $12, %esp
				756	ret
				757
				758	//===---------------------------------------------------------------------===//
				759
				760	We compile this program: (from GCC PR11680)
				761	http://gcc.gnu.org/bugzilla/attachment.cgi?id=4487
				762
				763	Into code that runs the same speed in fast/slow modes, but both modes run 2x
				764	slower than when compile with GCC (either 4.0 or 4.2):
				765
				766	$ llvm-g++ perf.cpp -O3 -fno-exceptions
				767	$ time ./a.out fast
				768	1.821u 0.003s 0:01.82 100.0% 0+0k 0+0io 0pf+0w
				769
				770	$ g++ perf.cpp -O3 -fno-exceptions
				771	$ time ./a.out fast
				772	0.821u 0.001s 0:00.82 100.0% 0+0k 0+0io 0pf+0w
				773
				774	It looks like we are making the same inlining decisions, so this may be raw
				775	codegen badness or something else (haven't investigated).
				776
				777	//===---------------------------------------------------------------------===//
				778
				779	We miss some instcombines for stuff like this:
				780	void bar (void);
				781	void foo (unsigned int a) {
				782	/* This one is equivalent to a >= (3 << 2). */
				783	if ((a >> 2) >= 3)
				784	bar ();
				785	}
				786
				787	A few other related ones are in GCC PR14753.
				788
				789	//===---------------------------------------------------------------------===//
				790
				791	Divisibility by constant can be simplified (according to GCC PR12849) from
				792	being a mulhi to being a mul lo (cheaper). Testcase:
				793
				794	void bar(unsigned n) {
				795	if (n % 3 == 0)
				796	true();
				797	}
				798
				799	I think this basically amounts to a dag combine to simplify comparisons against
				800	multiply hi's into a comparison against the mullo.
				801
				802	//===---------------------------------------------------------------------===//
Chris Lattner	23f35bc	2008-08-19 06:22:16 +0000	[diff] [blame]	803
Chris Lattner	db03983	2008-10-15 16:06:03 +0000	[diff] [blame]	804	Better mod/ref analysis for scanf would allow us to eliminate the vtable and a
				805	bunch of other stuff from this example (see PR1604):
				806
				807	#include <cstdio>
				808	struct test {
				809	int val;
				810	virtual ~test() {}
				811	};
				812
				813	int main() {
				814	test t;
				815	std::scanf("%d", &t.val);
				816	std::printf("%d\n", t.val);
				817	}
				818
				819	//===---------------------------------------------------------------------===//
				820
Chris Lattner	3b364cb	2008-10-15 16:33:52 +0000	[diff] [blame]	821	Instcombine will merge comparisons like (x >= 10) && (x < 20) by producing (x -
				822	10) u< 10, but only when the comparisons have matching sign.
				823
				824	This could be converted with a similiar technique. (PR1941)
				825
				826	define i1 @test(i8 %x) {
				827	%A = icmp uge i8 %x, 5
				828	%B = icmp slt i8 %x, 20
				829	%C = and i1 %A, %B
				830	ret i1 %C
				831	}
				832
				833	//===---------------------------------------------------------------------===//
Nick Lewycky	df563ca	2008-11-27 22:12:22 +0000	[diff] [blame]	834
Nick Lewycky	d2f0db1	2008-11-27 22:41:45 +0000	[diff] [blame]	835	These functions perform the same computation, but produce different assembly.
Nick Lewycky	df563ca	2008-11-27 22:12:22 +0000	[diff] [blame]	836
				837	define i8 @select(i8 %x) readnone nounwind {
				838	%A = icmp ult i8 %x, 250
				839	%B = select i1 %A, i8 0, i8 1
				840	ret i8 %B
				841	}
				842
				843	define i8 @addshr(i8 %x) readnone nounwind {
				844	%A = zext i8 %x to i9
				845	%B = add i9 %A, 6 ;; 256 - 250 == 6
				846	%C = lshr i9 %B, 8
				847	%D = trunc i9 %C to i8
				848	ret i8 %D
				849	}
				850
				851	//===---------------------------------------------------------------------===//
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	852
				853	From gcc bug 24696:
				854	int
				855	f (unsigned long a, unsigned long b, unsigned long c)
				856	{
				857	return ((a & (c - 1)) != 0) \|\| ((b & (c - 1)) != 0);
				858	}
				859	int
				860	f (unsigned long a, unsigned long b, unsigned long c)
				861	{
				862	return ((a & (c - 1)) != 0) \| ((b & (c - 1)) != 0);
				863	}
				864	Both should combine to ((a\|b) & (c-1)) != 0. Currently not optimized with
				865	"clang -emit-llvm-bc \| opt -std-compile-opts".
				866
				867	//===---------------------------------------------------------------------===//
				868
				869	From GCC Bug 20192:
				870	#define PMD_MASK (~((1UL << 23) - 1))
				871	void clear_pmd_range(unsigned long start, unsigned long end)
				872	{
				873	if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK))
				874	f();
				875	}
				876	The expression should optimize to something like
				877	"!((start\|end)&~PMD_MASK). Currently not optimized with "clang
				878	-emit-llvm-bc \| opt -std-compile-opts".
				879
				880	//===---------------------------------------------------------------------===//
				881
				882	From GCC Bug 15241:
				883	unsigned int
				884	foo (unsigned int a, unsigned int b)
				885	{
				886	if (a <= 7 && b <= 7)
				887	baz ();
				888	}
				889	Should combine to "(a\|b) <= 7". Currently not optimized with "clang
				890	-emit-llvm-bc \| opt -std-compile-opts".
				891
				892	//===---------------------------------------------------------------------===//
				893
				894	From GCC Bug 3756:
				895	int
				896	pn (int n)
				897	{
				898	return (n >= 0 ? 1 : -1);
				899	}
				900	Should combine to (n >> 31) \| 1. Currently not optimized with "clang
				901	-emit-llvm-bc \| opt -std-compile-opts \| llc".
				902
				903	//===---------------------------------------------------------------------===//
				904
				905	From GCC Bug 28685:
				906	int test(int a, int b)
				907	{
				908	int lt = a < b;
				909	int eq = a == b;
				910
				911	return (lt \|\| eq);
				912	}
				913	Should combine to "a <= b". Currently not optimized with "clang
				914	-emit-llvm-bc \| opt -std-compile-opts \| llc".
				915
				916	//===---------------------------------------------------------------------===//
				917
				918	void a(int variable)
				919	{
				920	if (variable == 4 \|\| variable == 6)
				921	bar();
				922	}
				923	This should optimize to "if ((variable \| 2) == 6)". Currently not
				924	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts \| llc".
				925
				926	//===---------------------------------------------------------------------===//
				927
				928	unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return
				929	i;}
				930	unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
				931	These should combine to the same thing. Currently, the first function
				932	produces better code on X86.
				933
				934	//===---------------------------------------------------------------------===//
				935
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	936	From GCC Bug 15784:
				937	#define abs(x) x>0?x:-x
				938	int f(int x, int y)
				939	{
				940	return (abs(x)) >= 0;
				941	}
				942	This should optimize to x == INT_MIN. (With -fwrapv.) Currently not
				943	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts".
				944
				945	//===---------------------------------------------------------------------===//
				946
				947	From GCC Bug 14753:
				948	void
				949	rotate_cst (unsigned int a)
				950	{
				951	a = (a << 10) \| (a >> 22);
				952	if (a == 123)
				953	bar ();
				954	}
				955	void
				956	minus_cst (unsigned int a)
				957	{
				958	unsigned int tem;
				959
				960	tem = 20 - a;
				961	if (tem == 5)
				962	bar ();
				963	}
				964	void
				965	mask_gt (unsigned int a)
				966	{
				967	/* This is equivalent to a > 15. */
				968	if ((a & ~7) > 8)
				969	bar ();
				970	}
				971	void
				972	rshift_gt (unsigned int a)
				973	{
				974	/* This is equivalent to a > 23. */
				975	if ((a >> 2) > 5)
				976	bar ();
				977	}
				978	All should simplify to a single comparison. All of these are
				979	currently not optimized with "clang -emit-llvm-bc \| opt
				980	-std-compile-opts".
				981
				982	//===---------------------------------------------------------------------===//
				983
				984	From GCC Bug 32605:
				985	int c(int* x) {return (char)x+2 == (char)x;}
				986	Should combine to 0. Currently not optimized with "clang
				987	-emit-llvm-bc \| opt -std-compile-opts" (although llc can optimize it).
				988
				989	//===---------------------------------------------------------------------===//
				990
				991	int a(unsigned char* b) {return *b > 99;}
				992	There's an unnecessary zext in the generated code with "clang
				993	-emit-llvm-bc \| opt -std-compile-opts".
				994
				995	//===---------------------------------------------------------------------===//
				996
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	997	int a(unsigned b) {return ((b << 31) \| (b << 30)) >> 31;}
				998	Should be combined to "((b >> 1) \| b) & 1". Currently not optimized
				999	with "clang -emit-llvm-bc \| opt -std-compile-opts".
				1000
				1001	//===---------------------------------------------------------------------===//
				1002
				1003	unsigned a(unsigned x, unsigned y) { return x \| (y & 1) \| (y & 2);}
				1004	Should combine to "x \| (y & 3)". Currently not optimized with "clang
				1005	-emit-llvm-bc \| opt -std-compile-opts".
				1006
				1007	//===---------------------------------------------------------------------===//
				1008
				1009	unsigned a(unsigned a) {return ((a \| 1) & 3) \| (a & -4);}
				1010	Should combine to "a \| 1". Currently not optimized with "clang
				1011	-emit-llvm-bc \| opt -std-compile-opts".
				1012
				1013	//===---------------------------------------------------------------------===//
				1014
Eli Friedman	4e16b29	2008-11-30 07:36:04 +0000	[diff] [blame]	1015	int a(int a, int b, int c) {return (~a & c) \| ((c\|a) & b);}
				1016	Should fold to "(~a & c) \| (a & b)". Currently not optimized with
				1017	"clang -emit-llvm-bc \| opt -std-compile-opts".
				1018
				1019	//===---------------------------------------------------------------------===//
				1020
				1021	int a(int a,int b) {return (~(a\|b))\|a;}
				1022	Should fold to "a\|~b". Currently not optimized with "clang
				1023	-emit-llvm-bc \| opt -std-compile-opts".
				1024
				1025	//===---------------------------------------------------------------------===//
				1026
				1027	int a(int a, int b) {return (a&&b) \|\| (a&&!b);}
				1028	Should fold to "a". Currently not optimized with "clang -emit-llvm-bc
				1029	\| opt -std-compile-opts".
				1030
				1031	//===---------------------------------------------------------------------===//
				1032
				1033	int a(int a, int b, int c) {return (a&&b) \|\| (!a&&c);}
				1034	Should fold to "a ? b : c", or at least something sane. Currently not
				1035	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts".
				1036
				1037	//===---------------------------------------------------------------------===//
				1038
				1039	int a(int a, int b, int c) {return (a&&b) \|\| (a&&c) \|\| (a&&b&&c);}
				1040	Should fold to a && (b \|\| c). Currently not optimized with "clang
				1041	-emit-llvm-bc \| opt -std-compile-opts".
				1042
				1043	//===---------------------------------------------------------------------===//
				1044
				1045	int a(int x) {return x \| ((x & 8) ^ 8);}
				1046	Should combine to x \| 8. Currently not optimized with "clang
				1047	-emit-llvm-bc \| opt -std-compile-opts".
				1048
				1049	//===---------------------------------------------------------------------===//
				1050
				1051	int a(int x) {return x ^ ((x & 8) ^ 8);}
				1052	Should also combine to x \| 8. Currently not optimized with "clang
				1053	-emit-llvm-bc \| opt -std-compile-opts".
				1054
				1055	//===---------------------------------------------------------------------===//
				1056
				1057	int a(int x) {return (x & 8) == 0 ? -1 : -9;}
				1058	Should combine to (x \| -9) ^ 8. Currently not optimized with "clang
				1059	-emit-llvm-bc \| opt -std-compile-opts".
				1060
				1061	//===---------------------------------------------------------------------===//
				1062
				1063	int a(int x) {return (x & 8) == 0 ? -9 : -1;}
				1064	Should combine to x \| -9. Currently not optimized with "clang
				1065	-emit-llvm-bc \| opt -std-compile-opts".
				1066
				1067	//===---------------------------------------------------------------------===//
				1068
				1069	int a(int x) {return ((x \| -9) ^ 8) & x;}
				1070	Should combine to x & -9. Currently not optimized with "clang
				1071	-emit-llvm-bc \| opt -std-compile-opts".
				1072
				1073	//===---------------------------------------------------------------------===//
				1074
				1075	unsigned a(unsigned a) {return a * 0x11111111 >> 28 & 1;}
				1076	Should combine to "a * 0x88888888 >> 31". Currently not optimized
				1077	with "clang -emit-llvm-bc \| opt -std-compile-opts".
				1078
				1079	//===---------------------------------------------------------------------===//
				1080
				1081	unsigned a(char* x) {if ((*x & 32) == 0) return b();}
				1082	There's an unnecessary zext in the generated code with "clang
				1083	-emit-llvm-bc \| opt -std-compile-opts".
				1084
				1085	//===---------------------------------------------------------------------===//
				1086
				1087	unsigned a(unsigned long long x) {return 40 * (x >> 1);}
				1088	Should combine to "20 * (((unsigned)x) & -2)". Currently not
				1089	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts".
				1090
				1091	//===---------------------------------------------------------------------===//
Bill Wendling	3bdcda8	2008-12-02 05:12:47 +0000	[diff] [blame]	1092
Chris Lattner	88d84b2	2008-12-02 06:32:34 +0000	[diff] [blame]	1093	This was noticed in the entryblock for grokdeclarator in 403.gcc:
				1094
				1095	%tmp = icmp eq i32 %decl_context, 4
				1096	%decl_context_addr.0 = select i1 %tmp, i32 3, i32 %decl_context
				1097	%tmp1 = icmp eq i32 %decl_context_addr.0, 1
				1098	%decl_context_addr.1 = select i1 %tmp1, i32 0, i32 %decl_context_addr.0
				1099
				1100	tmp1 should be simplified to something like:
				1101	(!tmp \|\| decl_context == 1)
				1102
				1103	This allows recursive simplifications, tmp1 is used all over the place in
				1104	the function, e.g. by:
				1105
				1106	%tmp23 = icmp eq i32 %decl_context_addr.1, 0 ; <i1> [#uses=1]
				1107	%tmp24 = xor i1 %tmp1, true ; <i1> [#uses=1]
				1108	%or.cond8 = and i1 %tmp23, %tmp24 ; <i1> [#uses=1]
				1109
				1110	later.
				1111
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1112	//===---------------------------------------------------------------------===//
				1113
				1114	Store sinking: This code:
				1115
				1116	void f (int n, int cond, int res) {
				1117	int i;
				1118	*res = 0;
				1119	for (i = 0; i < n; i++)
				1120	if (*cond)
				1121	res ^= 234; / () /
				1122	}
				1123
				1124	On this function GVN hoists the fully redundant value of *res, but nothing
				1125	moves the store out. This gives us this code:
				1126
				1127	bb: ; preds = %bb2, %entry
				1128	%.rle = phi i32 [ 0, %entry ], [ %.rle6, %bb2 ]
				1129	%i.05 = phi i32 [ 0, %entry ], [ %indvar.next, %bb2 ]
				1130	%1 = load i32* %cond, align 4
				1131	%2 = icmp eq i32 %1, 0
				1132	br i1 %2, label %bb2, label %bb1
				1133
				1134	bb1: ; preds = %bb
				1135	%3 = xor i32 %.rle, 234
				1136	store i32 %3, i32* %res, align 4
				1137	br label %bb2
				1138
				1139	bb2: ; preds = %bb, %bb1
				1140	%.rle6 = phi i32 [ %3, %bb1 ], [ %.rle, %bb ]
				1141	%indvar.next = add i32 %i.05, 1
				1142	%exitcond = icmp eq i32 %indvar.next, %n
				1143	br i1 %exitcond, label %return, label %bb
				1144
				1145	DSE should sink partially dead stores to get the store out of the loop.
				1146
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1147	Here's another partial dead case:
				1148	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395
				1149
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1150	//===---------------------------------------------------------------------===//
				1151
				1152	Scalar PRE hoists the mul in the common block up to the else:
				1153
				1154	int test (int a, int b, int c, int g) {
				1155	int d, e;
				1156	if (a)
				1157	d = b * c;
				1158	else
				1159	d = b - c;
				1160	e = b * c + g;
				1161	return d + e;
				1162	}
				1163
				1164	It would be better to do the mul once to reduce codesize above the if.
				1165	This is GCC PR38204.
				1166
				1167	//===---------------------------------------------------------------------===//
				1168
				1169	GCC PR37810 is an interesting case where we should sink load/store reload
				1170	into the if block and outside the loop, so we don't reload/store it on the
				1171	non-call path.
				1172
				1173	for () {
				1174	*P += 1;
				1175	if ()
				1176	call();
				1177	else
				1178	...
				1179	->
				1180	tmp = *P
				1181	for () {
				1182	tmp += 1;
				1183	if () {
				1184	*P = tmp;
				1185	call();
				1186	tmp = *P;
				1187	} else ...
				1188	}
				1189	*P = tmp;
				1190
Chris Lattner	8f416f3	2008-12-15 07:49:24 +0000	[diff] [blame]	1191	We now hoist the reload after the call (Transforms/GVN/lpre-call-wrap.ll), but
				1192	we don't sink the store. We need partially dead store sinking.
				1193
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1194	//===---------------------------------------------------------------------===//
				1195
Chris Lattner	8f416f3	2008-12-15 07:49:24 +0000	[diff] [blame]	1196	[PHI TRANSLATE GEPs]
				1197
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1198	GCC PR37166: Sinking of loads prevents SROA'ing the "g" struct on the stack
				1199	leading to excess stack traffic. This could be handled by GVN with some crazy
				1200	symbolic phi translation. The code we get looks like (g is on the stack):
				1201
				1202	bb2: ; preds = %bb1
				1203	..
				1204	%9 = getelementptr %struct.f* %g, i32 0, i32 0
				1205	store i32 %8, i32* %9, align bel %bb3
				1206
				1207	bb3: ; preds = %bb1, %bb2, %bb
				1208	%c_addr.0 = phi %struct.f* [ %g, %bb2 ], [ %c, %bb ], [ %c, %bb1 ]
				1209	%b_addr.0 = phi %struct.f* [ %b, %bb2 ], [ %g, %bb ], [ %b, %bb1 ]
				1210	%10 = getelementptr %struct.f* %c_addr.0, i32 0, i32 0
				1211	%11 = load i32* %10, align 4
				1212
				1213	%11 is fully redundant, an in BB2 it should have the value %8.
				1214
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1215	GCC PR33344 is a similar case.
				1216
Chris Lattner	78a7e7c	2008-12-06 19:28:22 +0000	[diff] [blame]	1217	//===---------------------------------------------------------------------===//
				1218
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1219	There are many load PRE testcases in testsuite/gcc.dg/tree-ssa/loadpre* in the
				1220	GCC testsuite. There are many pre testcases as ssa-pre-*.c
				1221
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1222	//===---------------------------------------------------------------------===//
				1223
				1224	There are some interesting cases in testsuite/gcc.dg/tree-ssa/pred-comm* in the
				1225	GCC testsuite. For example, predcom-1.c is:
				1226
				1227	for (i = 2; i < 1000; i++)
				1228	fib[i] = (fib[i-1] + fib[i - 2]) & 0xffff;
				1229
				1230	which compiles into:
				1231
				1232	bb1: ; preds = %bb1, %bb1.thread
				1233	%indvar = phi i32 [ 0, %bb1.thread ], [ %0, %bb1 ]
				1234	%i.0.reg2mem.0 = add i32 %indvar, 2
				1235	%0 = add i32 %indvar, 1 ; <i32> [#uses=3]
				1236	%1 = getelementptr [1000 x i32]* @fib, i32 0, i32 %0
				1237	%2 = load i32* %1, align 4 ; <i32> [#uses=1]
				1238	%3 = getelementptr [1000 x i32]* @fib, i32 0, i32 %indvar
				1239	%4 = load i32* %3, align 4 ; <i32> [#uses=1]
				1240	%5 = add i32 %4, %2 ; <i32> [#uses=1]
				1241	%6 = and i32 %5, 65535 ; <i32> [#uses=1]
				1242	%7 = getelementptr [1000 x i32]* @fib, i32 0, i32 %i.0.reg2mem.0
				1243	store i32 %6, i32* %7, align 4
				1244	%exitcond = icmp eq i32 %0, 998 ; <i1> [#uses=1]
				1245	br i1 %exitcond, label %return, label %bb1
				1246
				1247	This is basically:
				1248	LOAD fib[i+1]
				1249	LOAD fib[i]
				1250	STORE fib[i+2]
				1251
				1252	instead of handling this as a loop or other xform, all we'd need to do is teach
				1253	load PRE to phi translate the %0 add (i+1) into the predecessor as (i'+1+1) =
				1254	(i'+2) (where i' is the previous iteration of i). This would find the store
				1255	which feeds it.
				1256
				1257	predcom-2.c is apparently the same as predcom-1.c
				1258	predcom-3.c is very similar but needs loads feeding each other instead of
				1259	store->load.
				1260	predcom-4.c seems the same as the rest.
				1261
				1262
				1263	//===---------------------------------------------------------------------===//
				1264
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1265	Other simple load PRE cases:
Chris Lattner	8f416f3	2008-12-15 07:49:24 +0000	[diff] [blame]	1266	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35287 [LPRE crit edge splitting]
				1267
				1268	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34677 (licm does this, LPRE crit edge)
				1269	llvm-gcc t2.c -S -o - -O0 -emit-llvm \| llvm-as \| opt -mem2reg -simplifycfg -gvn \| llvm-dis
				1270
Chris Lattner	93c6c77	2009-09-21 02:53:57 +0000	[diff] [blame]	1271	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16799 [BITCAST PHI TRANS]
				1272
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1273	//===---------------------------------------------------------------------===//
				1274
				1275	Type based alias analysis:
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1276	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14705
				1277
				1278	//===---------------------------------------------------------------------===//
				1279
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1280	A/B get pinned to the stack because we turn an if/then into a select instead
				1281	of PRE'ing the load/store. This may be fixable in instcombine:
				1282	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37892
				1283
Chris Lattner	93c6c77	2009-09-21 02:53:57 +0000	[diff] [blame]	1284	struct X { int i; };
				1285	int foo (int x) {
				1286	struct X a;
				1287	struct X b;
				1288	struct X *p;
				1289	a.i = 1;
				1290	b.i = 2;
				1291	if (x)
				1292	p = &a;
				1293	else
				1294	p = &b;
				1295	return p->i;
				1296	}
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1297
Chris Lattner	93c6c77	2009-09-21 02:53:57 +0000	[diff] [blame]	1298	//===---------------------------------------------------------------------===//
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1299
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1300	Interesting missed case because of control flow flattening (should be 2 loads):
				1301	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26629
Chris Lattner	582048d	2008-12-15 08:32:28 +0000	[diff] [blame]	1302	With: llvm-gcc t2.c -S -o - -O0 -emit-llvm \| llvm-as \|
				1303	opt -mem2reg -gvn -instcombine \| llvm-dis
				1304	we miss it because we need 1) GEP PHI TRAN, 2) CRIT EDGE 3) MULTIPLE DIFFERENT
				1305	VALS PRODUCED BY ONE BLOCK OVER DIFFERENT PATHS
Chris Lattner	6a09a74	2008-12-06 22:52:12 +0000	[diff] [blame]	1306
				1307	//===---------------------------------------------------------------------===//
				1308
				1309	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19633
				1310	We could eliminate the branch condition here, loading from null is undefined:
				1311
				1312	struct S { int w, x, y, z; };
				1313	struct T { int r; struct S s; };
				1314	void bar (struct S, int);
				1315	void foo (int a, struct T b)
				1316	{
				1317	struct S *c = 0;
				1318	if (a)
				1319	c = &b.s;
				1320	bar (*c, a);
				1321	}
				1322
				1323	//===---------------------------------------------------------------------===//
Chris Lattner	88d84b2	2008-12-02 06:32:34 +0000	[diff] [blame]	1324
Chris Lattner	9cf8ef6	2008-12-23 20:52:52 +0000	[diff] [blame]	1325	simplifylibcalls should do several optimizations for strspn/strcspn:
				1326
				1327	strcspn(x, "") -> strlen(x)
				1328	strcspn("", x) -> 0
				1329	strspn("", x) -> 0
				1330	strspn(x, "") -> strlen(x)
				1331	strspn(x, "a") -> strchr(x, 'a')-x
				1332
				1333	strcspn(x, "a") -> inlined loop for up to 3 letters (similarly for strspn):
				1334
				1335	size_t __strcspn_c3 (__const char *__s, int __reject1, int __reject2,
				1336	int __reject3) {
				1337	register size_t __result = 0;
				1338	while (__s[__result] != '\0' && __s[__result] != __reject1 &&
				1339	__s[__result] != __reject2 && __s[__result] != __reject3)
				1340	++__result;
				1341	return __result;
				1342	}
				1343
				1344	This should turn into a switch on the character. See PR3253 for some notes on
				1345	codegen.
				1346
				1347	456.hmmer apparently uses strcspn and strspn a lot. 471.omnetpp uses strspn.
				1348
				1349	//===---------------------------------------------------------------------===//
Chris Lattner	d23b799	2008-12-31 00:54:13 +0000	[diff] [blame]	1350
				1351	"gas" uses this idiom:
				1352	else if (strchr ("+-/%\|&^:[]()~", intel_parser.op_string))
				1353	..
				1354	else if (strchr ("<>", *intel_parser.op_string)
				1355
				1356	Those should be turned into a switch.
				1357
				1358	//===---------------------------------------------------------------------===//
Chris Lattner	ffb08f5	2009-01-08 06:52:57 +0000	[diff] [blame]	1359
				1360	252.eon contains this interesting code:
				1361
				1362	%3072 = getelementptr [100 x i8]* %tempString, i32 0, i32 0
				1363	%3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
				1364	%strlen = call i32 @strlen(i8* %3072) ; uses = 1
				1365	%endptr = getelementptr [100 x i8]* %tempString, i32 0, i32 %strlen
				1366	call void @llvm.memcpy.i32(i8* %endptr,
				1367	i8* getelementptr ([5 x i8]* @"\01LC42", i32 0, i32 0), i32 5, i32 1)
				1368	%3074 = call i32 @strlen(i8* %endptr) nounwind readonly
				1369
				1370	This is interesting for a couple reasons. First, in this:
				1371
				1372	%3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
				1373	%strlen = call i32 @strlen(i8* %3072)
				1374
				1375	The strlen could be replaced with: %strlen = sub %3072, %3073, because the
				1376	strcpy call returns a pointer to the end of the string. Based on that, the
				1377	endptr GEP just becomes equal to 3073, which eliminates a strlen call and GEP.
				1378
				1379	Second, the memcpy+strlen strlen can be replaced with:
				1380
				1381	%3074 = call i32 @strlen([5 x i8]* @"\01LC42") nounwind readonly
				1382
				1383	Because the destination was just copied into the specified memory buffer. This,
				1384	in turn, can be constant folded to "4".
				1385
				1386	In other code, it contains:
				1387
				1388	%endptr6978 = bitcast i8* %endptr69 to i32*
				1389	store i32 7107374, i32* %endptr6978, align 1
				1390	%3167 = call i32 @strlen(i8* %endptr69) nounwind readonly
				1391
				1392	Which could also be constant folded. Whatever is producing this should probably
				1393	be fixed to leave this as a memcpy from a string.
				1394
				1395	Further, eon also has an interesting partially redundant strlen call:
				1396
				1397	bb8: ; preds = %_ZN18eonImageCalculatorC1Ev.exit
				1398	%682 = getelementptr i8 %argv, i32 6 ; <i8> [#uses=2]
				1399	%683 = load i8** %682, align 4 ; <i8*> [#uses=4]
				1400	%684 = load i8* %683, align 1 ; <i8> [#uses=1]
				1401	%685 = icmp eq i8 %684, 0 ; <i1> [#uses=1]
				1402	br i1 %685, label %bb10, label %bb9
				1403
				1404	bb9: ; preds = %bb8
				1405	%686 = call i32 @strlen(i8* %683) nounwind readonly
				1406	%687 = icmp ugt i32 %686, 254 ; <i1> [#uses=1]
				1407	br i1 %687, label %bb10, label %bb11
				1408
				1409	bb10: ; preds = %bb9, %bb8
				1410	%688 = call i32 @strlen(i8* %683) nounwind readonly
				1411
				1412	This could be eliminated by doing the strlen once in bb8, saving code size and
				1413	improving perf on the bb8->9->10 path.
				1414
				1415	//===---------------------------------------------------------------------===//
Chris Lattner	9fee08f	2009-01-08 07:34:55 +0000	[diff] [blame]	1416
				1417	I see an interesting fully redundant call to strlen left in 186.crafty:InputMove
				1418	which looks like:
				1419	%movetext11 = getelementptr [128 x i8]* %movetext, i32 0, i32 0
				1420
				1421
				1422	bb62: ; preds = %bb55, %bb53
				1423	%promote.0 = phi i32 [ %169, %bb55 ], [ 0, %bb53 ]
				1424	%171 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
				1425	%172 = add i32 %171, -1 ; <i32> [#uses=1]
				1426	%173 = getelementptr [128 x i8]* %movetext, i32 0, i32 %172
				1427
				1428	... no stores ...
				1429	br i1 %or.cond, label %bb65, label %bb72
				1430
				1431	bb65: ; preds = %bb62
				1432	store i8 0, i8* %173, align 1
				1433	br label %bb72
				1434
				1435	bb72: ; preds = %bb65, %bb62
				1436	%trank.1 = phi i32 [ %176, %bb65 ], [ -1, %bb62 ]
				1437	%177 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
				1438
				1439	Note that on the bb62->bb72 path, that the %177 strlen call is partially
				1440	redundant with the %171 call. At worst, we could shove the %177 strlen call
				1441	up into the bb65 block moving it out of the bb62->bb72 path. However, note
				1442	that bb65 stores to the string, zeroing out the last byte. This means that on
				1443	that path the value of %177 is actually just %171-1. A sub is cheaper than a
				1444	strlen!
				1445
				1446	This pattern repeats several times, basically doing:
				1447
				1448	A = strlen(P);
				1449	P[A-1] = 0;
				1450	B = strlen(P);
				1451	where it is "obvious" that B = A-1.
				1452
				1453	//===---------------------------------------------------------------------===//
				1454
				1455	186.crafty contains this interesting pattern:
				1456
				1457	%77 = call i8* @strstr(i8* getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0),
				1458	i8* %30)
				1459	%phitmp648 = icmp eq i8* %77, getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0)
				1460	br i1 %phitmp648, label %bb70, label %bb76
				1461
				1462	bb70: ; preds = %OptionMatch.exit91, %bb69
				1463	%78 = call i32 @strlen(i8* %30) nounwind readonly align 1 ; <i32> [#uses=1]
				1464
				1465	This is basically:
				1466	cststr = "abcdef";
				1467	if (strstr(cststr, P) == cststr) {
				1468	x = strlen(P);
				1469	...
				1470
				1471	The strstr call would be significantly cheaper written as:
				1472
				1473	cststr = "abcdef";
				1474	if (memcmp(P, str, strlen(P)))
				1475	x = strlen(P);
				1476
				1477	This is memcmp+strlen instead of strstr. This also makes the strlen fully
				1478	redundant.
				1479
				1480	//===---------------------------------------------------------------------===//
				1481
				1482	186.crafty also contains this code:
				1483
				1484	%1906 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
				1485	%1907 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1906
				1486	%1908 = call i8* @strcpy(i8* %1907, i8* %1905) nounwind align 1
				1487	%1909 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
				1488	%1910 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1909
				1489
				1490	The last strlen is computable as 1908-@pgn_event, which means 1910=1908.
				1491
				1492	//===---------------------------------------------------------------------===//
				1493
				1494	186.crafty has this interesting pattern with the "out.4543" variable:
				1495
				1496	call void @llvm.memcpy.i32(
				1497	i8* getelementptr ([10 x i8]* @out.4543, i32 0, i32 0),
				1498	i8* getelementptr ([7 x i8]* @"\01LC28700", i32 0, i32 0), i32 7, i32 1)
				1499	%101 = call@printf(i8* ... @out.4543, i32 0, i32 0)) nounwind
				1500
				1501	It is basically doing:
				1502
				1503	memcpy(globalarray, "string");
				1504	printf(..., globalarray);
				1505
				1506	Anyway, by knowing that printf just reads the memory and forward substituting
				1507	the string directly into the printf, this eliminates reads from globalarray.
				1508	Since this pattern occurs frequently in crafty (due to the "DisplayTime" and
				1509	other similar functions) there are many stores to "out". Once all the printfs
				1510	stop using "out", all that is left is the memcpy's into it. This should allow
				1511	globalopt to remove the "stored only" global.
				1512
				1513	//===---------------------------------------------------------------------===//
				1514
Dan Gohman	8289b05	2009-01-20 01:07:33 +0000	[diff] [blame]	1515	This code:
				1516
				1517	define inreg i32 @foo(i8* inreg %p) nounwind {
				1518	%tmp0 = load i8* %p
				1519	%tmp1 = ashr i8 %tmp0, 5
				1520	%tmp2 = sext i8 %tmp1 to i32
				1521	ret i32 %tmp2
				1522	}
				1523
				1524	could be dagcombine'd to a sign-extending load with a shift.
				1525	For example, on x86 this currently gets this:
				1526
				1527	movb (%eax), %al
				1528	sarb $5, %al
				1529	movsbl %al, %eax
				1530
				1531	while it could get this:
				1532
				1533	movsbl (%eax), %eax
				1534	sarl $5, %eax
				1535
				1536	//===---------------------------------------------------------------------===//
Chris Lattner	256baa4	2009-01-22 07:16:03 +0000	[diff] [blame]	1537
				1538	GCC PR31029:
				1539
				1540	int test(int x) { return 1-x == x; } // --> return false
				1541	int test2(int x) { return 2-x == x; } // --> return x == 1 ?
				1542
				1543	Always foldable for odd constants, what is the rule for even?
				1544
				1545	//===---------------------------------------------------------------------===//
				1546
Torok Edwin	e46a686	2009-01-24 19:30:25 +0000	[diff] [blame]	1547	PR 3381: GEP to field of size 0 inside a struct could be turned into GEP
				1548	for next field in struct (which is at same address).
				1549
				1550	For example: store of float into { {{}}, float } could be turned into a store to
				1551	the float directly.
				1552
Torok Edwin	474479f	2009-02-20 18:42:06 +0000	[diff] [blame]	1553	//===---------------------------------------------------------------------===//
Nick Lewycky	20babb1	2009-02-25 06:52:48 +0000	[diff] [blame]	1554
Torok Edwin	474479f	2009-02-20 18:42:06 +0000	[diff] [blame]	1555	#include <math.h>
				1556	double foo(double a) { return sin(a); }
				1557
				1558	This compiles into this on x86-64 Linux:
				1559	foo:
				1560	subq $8, %rsp
				1561	call sin
				1562	addq $8, %rsp
				1563	ret
				1564	vs:
				1565
				1566	foo:
				1567	jmp sin
				1568
Nick Lewycky	20babb1	2009-02-25 06:52:48 +0000	[diff] [blame]	1569	//===---------------------------------------------------------------------===//
				1570
Chris Lattner	32c5f17	2009-05-11 17:41:40 +0000	[diff] [blame]	1571	The arg promotion pass should make use of nocapture to make its alias analysis
				1572	stuff much more precise.
				1573
				1574	//===---------------------------------------------------------------------===//
				1575
				1576	The following functions should be optimized to use a select instead of a
				1577	branch (from gcc PR40072):
				1578
				1579	char char_int(int m) {if(m>7) return 0; return m;}
				1580	int int_char(char m) {if(m>7) return 0; return m;}
				1581
				1582	//===---------------------------------------------------------------------===//
				1583
Nick Lewycky	20babb1	2009-02-25 06:52:48 +0000	[diff] [blame]	1584	Instcombine should replace the load with a constant in:
				1585
				1586	static const char x[4] = {'a', 'b', 'c', 'd'};
				1587
				1588	unsigned int y(void) {
				1589	return (unsigned int )x;
				1590	}
				1591
				1592	It currently only does this transformation when the size of the constant
				1593	is the same as the size of the integer (so, try x[5]) and the last byte
				1594	is a null (making it a C string). There's no need for these restrictions.
				1595
				1596	//===---------------------------------------------------------------------===//
				1597
Chris Lattner	d919a8b	2009-05-11 17:36:33 +0000	[diff] [blame]	1598	InstCombine's "turn load from constant into constant" optimization should be
				1599	more aggressive in the presence of bitcasts. For example, because of unions,
				1600	this code:
				1601
				1602	union vec2d {
				1603	double e[2];
				1604	double v __attribute__((vector_size(16)));
				1605	};
				1606	typedef union vec2d vec2d;
				1607
				1608	static vec2d a={{1,2}}, b={{3,4}};
				1609
				1610	vec2d foo () {
				1611	return (vec2d){ .v = a.v + b.v * (vec2d){{5,5}}.v };
				1612	}
				1613
				1614	Compiles into:
				1615
				1616	@a = internal constant %0 { [2 x double]
				1617	[double 1.000000e+00, double 2.000000e+00] }, align 16
				1618	@b = internal constant %0 { [2 x double]
				1619	[double 3.000000e+00, double 4.000000e+00] }, align 16
				1620	...
				1621	define void @foo(%struct.vec2d* noalias nocapture sret %agg.result) nounwind {
				1622	entry:
				1623	%0 = load <2 x double>* getelementptr (%struct.vec2d*
				1624	bitcast (%0* @a to %struct.vec2d*), i32 0, i32 0), align 16
				1625	%1 = load <2 x double>* getelementptr (%struct.vec2d*
				1626	bitcast (%0* @b to %struct.vec2d*), i32 0, i32 0), align 16
				1627
				1628
				1629	Instcombine should be able to optimize away the loads (and thus the globals).
				1630
Chris Lattner	c2b0d48	2009-09-14 16:49:26 +0000	[diff] [blame]	1631	See also PR4973
Chris Lattner	d919a8b	2009-05-11 17:36:33 +0000	[diff] [blame]	1632
				1633	//===---------------------------------------------------------------------===//