Blame - llvm/lib/Target/README.txt - toolchain/llvm-project

blob: a02863c6a4797748d47ab5daa8c7dd0fd593be0b [file] [log] [blame]

Chris Lattner	d1aaee0	2006-02-03 06:21:43 +0000	[diff] [blame]	1	Target Independent Opportunities:
				2
Chris Lattner	3cbd160	2006-09-28 06:01:17 +0000	[diff] [blame]	3	//===---------------------------------------------------------------------===//
				4
Chris Lattner	83a4a98	2009-11-27 17:12:30 +0000	[diff] [blame]	5	Dead argument elimination should be enhanced to handle cases when an argument is
				6	dead to an externally visible function. Though the argument can't be removed
				7	from the externally visible function, the caller doesn't need to pass it in.
				8	For example in this testcase:
				9
				10	void foo(int X) __attribute__((noinline));
				11	void foo(int X) { sideeffect(); }
				12	void bar(int A) { foo(A+1); }
				13
				14	We compile bar to:
				15
				16	define void @bar(i32 %A) nounwind ssp {
				17	%0 = add nsw i32 %A, 1 ; <i32> [#uses=1]
				18	tail call void @foo(i32 %0) nounwind noinline ssp
				19	ret void
				20	}
				21
				22	The add is dead, we could pass in 'i32 undef' instead. This occurs for C++
				23	templates etc, which usually have linkonce_odr/weak_odr linkage, not internal
				24	linkage.
				25
				26	//===---------------------------------------------------------------------===//
				27
Chris Lattner	6dc2233	2006-11-14 01:57:53 +0000	[diff] [blame]	28	With the recent changes to make the implicit def/use set explicit in
				29	machineinstrs, we should change the target descriptions for 'call' instructions
				30	so that the .td files don't list all the call-clobbered registers as implicit
				31	defs. Instead, these should be added by the code generator (e.g. on the dag).
				32
				33	This has a number of uses:
				34
				35	1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
				36	for their different impdef sets.
				37	2. Targets with multiple calling convs (e.g. x86) which have different clobber
				38	sets don't need copies of call instructions.
				39	3. 'Interprocedural register allocation' can be done to reduce the clobber sets
				40	of calls.
				41
				42	//===---------------------------------------------------------------------===//
				43
Chris Lattner	2e33985	2010-12-15 07:25:55 +0000	[diff] [blame]	44	We should recognized various "overflow detection" idioms and translate them into
Chris Lattner	5e0c0c7	2010-12-19 19:37:52 +0000	[diff] [blame]	45	llvm.uadd.with.overflow and similar intrinsics. Here is a multiply idiom:
Chris Lattner	5174921	2010-12-15 07:28:58 +0000	[diff] [blame]	46
				47	unsigned int mul(unsigned int a,unsigned int b) {
				48	if ((unsigned long long)a*b>0xffffffff)
				49	exit(0);
				50	return a*b;
				51	}
				52
Chris Lattner	51415d2	2011-01-02 18:31:38 +0000	[diff] [blame]	53	The legalization code for mul-with-overflow needs to be made more robust before
				54	this can be implemented though.
				55
Nate Begeman	bb01d4f	2006-03-17 01:40:33 +0000	[diff] [blame]	56	//===---------------------------------------------------------------------===//
Chris Lattner	d1aaee0	2006-02-03 06:21:43 +0000	[diff] [blame]	57
				58	Get the C front-end to expand hypot(x,y) -> llvm.sqrt(xx+yy) when errno and
Chris Lattner	56fe52e	2008-12-10 01:30:48 +0000	[diff] [blame]	59	precision don't matter (ffastmath). Misc/mandel will like this. :) This isn't
				60	safe in general, even on darwin. See the libm implementation of hypot for
				61	examples (which special case when x/y are exactly zero to get signed zeros etc
				62	right).
Chris Lattner	d1aaee0	2006-02-03 06:21:43 +0000	[diff] [blame]	63
Chris Lattner	d1aaee0	2006-02-03 06:21:43 +0000	[diff] [blame]	64	//===---------------------------------------------------------------------===//
				65
Chris Lattner	e43e5c0	2006-03-04 01:19:34 +0000	[diff] [blame]	66	On targets with expensive 64-bit multiply, we could LSR this:
				67
				68	for (i = ...; ++i) {
				69	x = 1ULL << i;
				70
				71	into:
				72	long long tmp = 1;
				73	for (i = ...; ++i, tmp+=tmp)
				74	x = tmp;
				75
				76	This would be a win on ppc32, but not x86 or ppc64.
				77
Chris Lattner	c9a318d	2006-03-04 08:44:51 +0000	[diff] [blame]	78	//===---------------------------------------------------------------------===//
Chris Lattner	5032c32	2006-03-05 20:00:08 +0000	[diff] [blame]	79
				80	Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
				81
				82	//===---------------------------------------------------------------------===//
Chris Lattner	bccb0e0	2006-03-07 02:46:26 +0000	[diff] [blame]	83
Chris Lattner	71cf7c2	2010-01-01 01:29:26 +0000	[diff] [blame]	84	Reassociate should turn things like:
				85
				86	int factorial(int X) {
				87	return XXXXXXX*X;
				88	}
				89
				90	into llvm.powi calls, allowing the code generator to produce balanced
				91	multiplication trees.
				92
				93	First, the intrinsic needs to be extended to support integers, and second the
				94	code generator needs to be enhanced to lower these to multiplication trees.
Chris Lattner	003f633	2006-03-11 20:17:08 +0000	[diff] [blame]	95
				96	//===---------------------------------------------------------------------===//
				97
Chris Lattner	4e56b68	2006-03-11 20:20:40 +0000	[diff] [blame]	98	Interesting? testcase for add/shift/mul reassoc:
				99
				100	int bar(int x, int y) {
				101	return xxx+y+xxxxxyyyy;
				102	}
				103	int foo(int z, int n) {
				104	return bar(z, n) + bar(2z, 2n);
				105	}
				106
Chris Lattner	71cf7c2	2010-01-01 01:29:26 +0000	[diff] [blame]	107	This is blocked on not handling XXX -> powi(X, 3) (see note above). The issue
				108	is that we end up getting t = 2X s = tt and don't turn this into 4XX,
				109	which is the same number of multiplies and is canonical, because the 2*X has
				110	multiple uses. Here's a simple example:
				111
				112	define i32 @test15(i32 %X1) {
				113	%B = mul i32 %X1, 47 ; X1*47
				114	%C = mul i32 %B, %B
				115	ret i32 %C
				116	}
				117
				118
				119	//===---------------------------------------------------------------------===//
				120
				121	Reassociate should handle the example in GCC PR16157:
				122
				123	extern int a0, a1, a2, a3, a4; extern int b0, b1, b2, b3, b4;
				124	void f () { /* this can be optimized to four additions... */
				125	b4 = a4 + a3 + a2 + a1 + a0;
				126	b3 = a3 + a2 + a1 + a0;
				127	b2 = a2 + a1 + a0;
				128	b1 = a1 + a0;
				129	}
				130
				131	This requires reassociating to forms of expressions that are already available,
				132	something that reassoc doesn't think about yet.
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	133
Chris Lattner	7e3f8b6	2010-01-24 20:01:41 +0000	[diff] [blame]	134
				135	//===---------------------------------------------------------------------===//
				136
				137	This function: (derived from GCC PR19988)
				138	double foo(double x, double y) {
				139	return ((x + 0.1234 * y) * (x + -0.1234 * y));
				140	}
				141
				142	compiles to:
				143	_foo:
				144	movapd %xmm1, %xmm2
				145	mulsd LCPI1_1(%rip), %xmm1
				146	mulsd LCPI1_0(%rip), %xmm2
				147	addsd %xmm0, %xmm1
				148	addsd %xmm0, %xmm2
				149	movapd %xmm1, %xmm0
				150	mulsd %xmm2, %xmm0
				151	ret
				152
Chris Lattner	e3a68d1	2010-01-24 20:17:09 +0000	[diff] [blame]	153	Reassociate should be able to turn it into:
Chris Lattner	7e3f8b6	2010-01-24 20:01:41 +0000	[diff] [blame]	154
				155	double foo(double x, double y) {
				156	return ((x + 0.1234 * y) * (x - 0.1234 * y));
				157	}
				158
				159	Which allows the multiply by constant to be CSE'd, producing:
				160
				161	_foo:
				162	mulsd LCPI1_0(%rip), %xmm1
				163	movapd %xmm1, %xmm2
				164	addsd %xmm0, %xmm2
				165	subsd %xmm1, %xmm0
				166	mulsd %xmm2, %xmm0
				167	ret
				168
				169	This doesn't need -ffast-math support at all. This is particularly bad because
				170	the llvm-gcc frontend is canonicalizing the later into the former, but clang
				171	doesn't have this problem.
				172
Chris Lattner	4e56b68	2006-03-11 20:20:40 +0000	[diff] [blame]	173	//===---------------------------------------------------------------------===//
				174
Chris Lattner	f136299	2006-03-09 20:13:21 +0000	[diff] [blame]	175	These two functions should generate the same code on big-endian systems:
				176
				177	int g(int j,int l) { return memcmp(j,l,4); }
				178	int h(int j, int l) { return j - l; }
				179
				180	this could be done in SelectionDAGISel.cpp, along with other special cases,
				181	for 1,2,4,8 bytes.
				182
				183	//===---------------------------------------------------------------------===//
				184
Chris Lattner	e24cf9d	2006-03-22 07:33:46 +0000	[diff] [blame]	185	It would be nice to revert this patch:
				186	http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html
				187
				188	And teach the dag combiner enough to simplify the code expanded before
				189	legalize. It seems plausible that this knowledge would let it simplify other
				190	stuff too.
				191
Chris Lattner	0affd76	2006-03-24 19:59:17 +0000	[diff] [blame]	192	//===---------------------------------------------------------------------===//
				193
Reid Spencer	09575ba	2007-02-15 03:39:18 +0000	[diff] [blame]	194	For vector types, TargetData.cpp::getTypeInfo() returns alignment that is equal
Evan Cheng	dc1161c	2006-03-31 22:35:14 +0000	[diff] [blame]	195	to the type size. It works but can be overly conservative as the alignment of
Reid Spencer	09575ba	2007-02-15 03:39:18 +0000	[diff] [blame]	196	specific vector types are target dependent.
Chris Lattner	0baebb1	2006-04-01 04:08:29 +0000	[diff] [blame]	197
				198	//===---------------------------------------------------------------------===//
				199
Dan Gohman	1dbb40f	2009-05-11 18:51:16 +0000	[diff] [blame]	200	We should produce an unaligned load from code like this:
Chris Lattner	0baebb1	2006-04-01 04:08:29 +0000	[diff] [blame]	201
				202	v4sf example(float *P) {
				203	return (v4sf){P[0], P[1], P[2], P[3] };
				204	}
				205
				206	//===---------------------------------------------------------------------===//
				207
Chris Lattner	4cda95b	2006-05-18 18:26:13 +0000	[diff] [blame]	208	Add support for conditional increments, and other related patterns. Instead
				209	of:
				210
				211	movl 136(%esp), %eax
				212	cmpl $0, %eax
				213	je LBB16_2 #cond_next
				214	LBB16_1: #cond_true
				215	incl _foo
				216	LBB16_2: #cond_next
				217
				218	emit:
				219	movl _foo, %eax
				220	cmpl $1, %edi
				221	sbbl $-1, %eax
				222	movl %eax, _foo
				223
				224	//===---------------------------------------------------------------------===//
Chris Lattner	240f846	2006-05-19 20:45:08 +0000	[diff] [blame]	225
				226	Combine: a = sin(x), b = cos(x) into a,b = sincos(x).
				227
				228	Expand these to calls of sin/cos and stores:
				229	double sincos(double x, double sin, double cos);
				230	float sincosf(float x, float sin, float cos);
				231	long double sincosl(long double x, long double sin, long double cos);
				232
				233	Doing so could allow SROA of the destination pointers. See also:
				234	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
				235
Chris Lattner	56fe52e	2008-12-10 01:30:48 +0000	[diff] [blame]	236	This is now easily doable with MRVs. We could even make an intrinsic for this
				237	if anyone cared enough about sincos.
				238
Chris Lattner	240f846	2006-05-19 20:45:08 +0000	[diff] [blame]	239	//===---------------------------------------------------------------------===//
Chris Lattner	29d7bde	2006-05-19 21:01:38 +0000	[diff] [blame]	240
Chris Lattner	f7e3478	2006-09-16 23:57:51 +0000	[diff] [blame]	241	quantum_sigma_x in 462.libquantum contains the following loop:
				242
				243	for(i=0; i<reg->size; i++)
				244	{
				245	/* Flip the target bit of each basis state */
				246	reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
				247	}
				248
				249	Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
				250	so cool to turn it into something like:
				251
Chris Lattner	4a13d3b	2006-09-18 04:54:35 +0000	[diff] [blame]	252	long long Res = ((MAX_UNSIGNED) 1 << target);
Chris Lattner	f7e3478	2006-09-16 23:57:51 +0000	[diff] [blame]	253	if (target < 32) {
				254	for(i=0; i<reg->size; i++)
Chris Lattner	4a13d3b	2006-09-18 04:54:35 +0000	[diff] [blame]	255	reg->node[i].state ^= Res & 0xFFFFFFFFULL;
Chris Lattner	f7e3478	2006-09-16 23:57:51 +0000	[diff] [blame]	256	} else {
				257	for(i=0; i<reg->size; i++)
Chris Lattner	4a13d3b	2006-09-18 04:54:35 +0000	[diff] [blame]	258	reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL
Chris Lattner	f7e3478	2006-09-16 23:57:51 +0000	[diff] [blame]	259	}
				260
				261	... which would only do one 32-bit XOR per loop iteration instead of two.
				262
				263	It would also be nice to recognize the reg->size doesn't alias reg->node[i], but
Chris Lattner	8e09ad6	2009-11-26 01:51:18 +0000	[diff] [blame]	264	this requires TBAA.
Chris Lattner	6185878	2009-09-21 06:04:07 +0000	[diff] [blame]	265
				266	//===---------------------------------------------------------------------===//
				267
Chris Lattner	f9325e5	2008-10-05 02:16:12 +0000	[diff] [blame]	268	This isn't recognized as bswap by instcombine (yes, it really is bswap):
Chris Lattner	4d475f6	2006-12-08 02:01:32 +0000	[diff] [blame]	269
				270	unsigned long reverse(unsigned v) {
				271	unsigned t;
				272	t = v ^ ((v << 16) \| (v >> 16));
				273	t &= ~0xff0000;
				274	v = (v << 24) \| (v >> 8);
				275	return v ^ (t >> 8);
				276	}
				277
Chris Lattner	f11327d	2006-09-25 17:12:14 +0000	[diff] [blame]	278	//===---------------------------------------------------------------------===//
				279
Chris Lattner	249da5c	2010-01-23 18:49:30 +0000	[diff] [blame]	280	[LOOP RECOGNITION]
				281
Chris Lattner	843dacc	2008-10-15 16:02:15 +0000	[diff] [blame]	282	These idioms should be recognized as popcount (see PR1488):
				283
				284	unsigned countbits_slow(unsigned v) {
				285	unsigned c;
				286	for (c = 0; v; v >>= 1)
				287	c += v & 1;
				288	return c;
				289	}
				290	unsigned countbits_fast(unsigned v){
				291	unsigned c;
				292	for (c = 0; v; c++)
				293	v &= v - 1; // clear the least significant bit set
				294	return c;
				295	}
				296
				297	BITBOARD = unsigned long long
				298	int PopCnt(register BITBOARD a) {
				299	register int c=0;
				300	while(a) {
				301	c++;
				302	a &= a - 1;
				303	}
				304	return c;
				305	}
				306	unsigned int popcount(unsigned int input) {
				307	unsigned int count = 0;
				308	for (unsigned int i = 0; i < 4 * 8; i++)
				309	count += (input >> i) & i;
				310	return count;
				311	}
				312
Chris Lattner	51415d2	2011-01-02 18:31:38 +0000	[diff] [blame]	313	This sort of thing should be added to the loop idiom pass.
Chris Lattner	8e09ad6	2009-11-26 01:51:18 +0000	[diff] [blame]	314
Chris Lattner	843dacc	2008-10-15 16:02:15 +0000	[diff] [blame]	315	//===---------------------------------------------------------------------===//
				316
Chris Lattner	f11327d	2006-09-25 17:12:14 +0000	[diff] [blame]	317	These should turn into single 16-bit (unaligned?) loads on little/big endian
				318	processors.
				319
				320	unsigned short read_16_le(const unsigned char *adr) {
				321	return adr[0] \| (adr[1] << 8);
				322	}
				323	unsigned short read_16_be(const unsigned char *adr) {
				324	return (adr[0] << 8) \| adr[1];
				325	}
				326
				327	//===---------------------------------------------------------------------===//
Chris Lattner	f054003	2006-10-24 16:12:47 +0000	[diff] [blame]	328
Reid Spencer	7e80b0b	2006-10-26 06:15:43 +0000	[diff] [blame]	329	-instcombine should handle this transform:
Reid Spencer	266e42b	2006-12-23 06:05:41 +0000	[diff] [blame]	330	icmp pred (sdiv X / C1 ), C2
Reid Spencer	7e80b0b	2006-10-26 06:15:43 +0000	[diff] [blame]	331	when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.
				332
				333	Currently InstCombine avoids this transform but will do it when the signs of
				334	the operands and the sign of the divide match. See the FIXME in
				335	InstructionCombining.cpp in the visitSetCondInst method after the switch case
				336	for Instruction::UDiv (around line 4447) for more details.
				337
				338	The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
				339	this construct.
Chris Lattner	2048373	2006-11-03 22:27:39 +0000	[diff] [blame]	340
				341	//===---------------------------------------------------------------------===//
				342
Chris Lattner	082da53	2010-01-23 17:59:23 +0000	[diff] [blame]	343	[LOOP OPTIMIZATION]
				344
				345	SingleSource/Benchmarks/Misc/dt.c shows several interesting optimization
				346	opportunities in its double_array_divs_variable function: it needs loop
				347	interchange, memory promotion (which LICM already does), vectorization and
				348	variable trip count loop unrolling (since it has a constant trip count). ICC
				349	apparently produces this very nice code with -ffast-math:
				350
				351	..B1.70: # Preds ..B1.70 ..B1.69
				352	mulpd %xmm0, %xmm1 #108.2
				353	mulpd %xmm0, %xmm1 #108.2
				354	mulpd %xmm0, %xmm1 #108.2
				355	mulpd %xmm0, %xmm1 #108.2
				356	addl $8, %edx #
				357	cmpl $131072, %edx #108.2
				358	jb ..B1.70 # Prob 99% #108.2
				359
				360	It would be better to count down to zero, but this is a lot better than what we
				361	do.
				362
				363	//===---------------------------------------------------------------------===//
				364
Chris Lattner	89e5813	2007-01-16 06:39:48 +0000	[diff] [blame]	365	Consider:
				366
				367	typedef unsigned U32;
				368	typedef unsigned long long U64;
				369	int test (U32 inst, U64 regs) {
				370	U64 effective_addr2;
				371	U32 temp = *inst;
				372	int r1 = (temp >> 20) & 0xf;
				373	int b2 = (temp >> 16) & 0xf;
				374	effective_addr2 = temp & 0xfff;
				375	if (b2) effective_addr2 += regs[b2];
				376	b2 = (temp >> 12) & 0xf;
				377	if (b2) effective_addr2 += regs[b2];
				378	effective_addr2 &= regs[4];
				379	if ((effective_addr2 & 3) == 0)
				380	return 1;
				381	return 0;
				382	}
				383
				384	Note that only the low 2 bits of effective_addr2 are used. On 32-bit systems,
				385	we don't eliminate the computation of the top half of effective_addr2 because
				386	we don't have whole-function selection dags. On x86, this means we use one
				387	extra register for the function when effective_addr2 is declared as U64 than
				388	when it is declared U32.
				389
Chris Lattner	0169fd7	2009-11-10 23:47:45 +0000	[diff] [blame]	390	PHI Slicing could be extended to do this.
				391
Chris Lattner	89e5813	2007-01-16 06:39:48 +0000	[diff] [blame]	392	//===---------------------------------------------------------------------===//
				393
Chris Lattner	8e09ad6	2009-11-26 01:51:18 +0000	[diff] [blame]	394	LSR should know what GPR types a target has from TargetData. This code:
Chris Lattner	43cab75	2007-03-24 06:01:32 +0000	[diff] [blame]	395
				396	volatile short X, Y; // globals
				397
				398	void foo(int N) {
				399	int i;
				400	for (i = 0; i < N; i++) { X = i; Y = i*4; }
				401	}
				402
Chris Lattner	fc2d846	2009-09-20 17:37:38 +0000	[diff] [blame]	403	produces two near identical IV's (after promotion) on PPC/ARM:
Chris Lattner	43cab75	2007-03-24 06:01:32 +0000	[diff] [blame]	404
Chris Lattner	fc2d846	2009-09-20 17:37:38 +0000	[diff] [blame]	405	LBB1_2:
				406	ldr r3, LCPI1_0
				407	ldr r3, [r3]
				408	strh r2, [r3]
				409	ldr r3, LCPI1_1
				410	ldr r3, [r3]
				411	strh r1, [r3]
				412	add r1, r1, #4
				413	add r2, r2, #1 <- [0,+,1]
				414	sub r0, r0, #1 <- [0,-,1]
				415	cmp r0, #0
				416	bne LBB1_2
				417
				418	LSR should reuse the "+" IV for the exit test.
Chris Lattner	43cab75	2007-03-24 06:01:32 +0000	[diff] [blame]	419
Chris Lattner	43cab75	2007-03-24 06:01:32 +0000	[diff] [blame]	420	//===---------------------------------------------------------------------===//
				421
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	422	Tail call elim should be more aggressive, checking to see if the call is
				423	followed by an uncond branch to an exit block.
				424
				425	; This testcase is due to tail-duplication not wanting to copy the return
				426	; instruction into the terminating blocks because there was other code
				427	; optimized out of the function after the taildup happened.
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	428	; RUN: llvm-as < %s \| opt -tailcallelim \| llvm-dis \| not grep call
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	429
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	430	define i32 @t4(i32 %a) {
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	431	entry:
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	432	%tmp.1 = and i32 %a, 1 ; <i32> [#uses=1]
				433	%tmp.2 = icmp ne i32 %tmp.1, 0 ; <i1> [#uses=1]
				434	br i1 %tmp.2, label %then.0, label %else.0
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	435
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	436	then.0: ; preds = %entry
				437	%tmp.5 = add i32 %a, -1 ; <i32> [#uses=1]
				438	%tmp.3 = call i32 @t4( i32 %tmp.5 ) ; <i32> [#uses=1]
				439	br label %return
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	440
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	441	else.0: ; preds = %entry
				442	%tmp.7 = icmp ne i32 %a, 0 ; <i1> [#uses=1]
				443	br i1 %tmp.7, label %then.1, label %return
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	444
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	445	then.1: ; preds = %else.0
				446	%tmp.11 = add i32 %a, -2 ; <i32> [#uses=1]
				447	%tmp.9 = call i32 @t4( i32 %tmp.11 ) ; <i32> [#uses=1]
				448	br label %return
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	449
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	450	return: ; preds = %then.1, %else.0, %then.0
				451	%result.0 = phi i32 [ 0, %else.0 ], [ %tmp.3, %then.0 ],
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	452	[ %tmp.9, %then.1 ]
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	453	ret i32 %result.0
Chris Lattner	2cca31f	2007-05-05 22:29:06 +0000	[diff] [blame]	454	}
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	455
				456	//===---------------------------------------------------------------------===//
				457
Chris Lattner	4afb010	2008-08-10 00:47:21 +0000	[diff] [blame]	458	Tail recursion elimination should handle:
				459
				460	int pow2m1(int n) {
				461	if (n == 0)
				462	return 0;
				463	return 2 * pow2m1 (n - 1) + 1;
				464	}
				465
				466	Also, multiplies can be turned into SHL's, so they should be handled as if
				467	they were associative. "return foo() << 1" can be tail recursion eliminated.
				468
				469	//===---------------------------------------------------------------------===//
				470
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	471	Argument promotion should promote arguments for recursive functions, like
				472	this:
				473
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	474	; RUN: llvm-as < %s \| opt -argpromotion \| llvm-dis \| grep x.val
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	475
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	476	define internal i32 @foo(i32* %x) {
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	477	entry:
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	478	%tmp = load i32* %x ; <i32> [#uses=0]
				479	%tmp.foo = call i32 @foo( i32* %x ) ; <i32> [#uses=1]
				480	ret i32 %tmp.foo
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	481	}
				482
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	483	define i32 @bar(i32* %x) {
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	484	entry:
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	485	%tmp3 = call i32 @foo( i32* %x ) ; <i32> [#uses=1]
				486	ret i32 %tmp3
Chris Lattner	9958b82	2007-05-05 22:44:08 +0000	[diff] [blame]	487	}
				488
Chris Lattner	5e224c3	2007-12-05 23:05:06 +0000	[diff] [blame]	489	//===---------------------------------------------------------------------===//
Chris Lattner	180f0e9	2007-12-28 04:42:05 +0000	[diff] [blame]	490
Chris Lattner	9d53b61	2007-12-28 22:30:05 +0000	[diff] [blame]	491	We should investigate an instruction sinking pass. Consider this silly
				492	example in pic mode:
				493
				494	#include <assert.h>
				495	void foo(int x) {
				496	assert(x);
				497	//...
				498	}
				499
				500	we compile this to:
				501	_foo:
				502	subl $28, %esp
				503	call "L1$pb"
				504	"L1$pb":
				505	popl %eax
				506	cmpl $0, 32(%esp)
				507	je LBB1_2 # cond_true
				508	LBB1_1: # return
				509	# ...
				510	addl $28, %esp
				511	ret
				512	LBB1_2: # cond_true
				513	...
				514
				515	The PIC base computation (call+popl) is only used on one path through the
				516	code, but is currently always computed in the entry block. It would be
				517	better to sink the picbase computation down into the block for the
				518	assertion, as it is the only one that uses it. This happens for a lot of
				519	code with early outs.
				520
Chris Lattner	7cafd92	2007-12-29 01:05:01 +0000	[diff] [blame]	521	Another example is loads of arguments, which are usually emitted into the
				522	entry block on targets like x86. If not used in all paths through a
				523	function, they should be sunk into the ones that do.
				524
Chris Lattner	9d53b61	2007-12-28 22:30:05 +0000	[diff] [blame]	525	In this case, whole-function-isel would also handle this.
Chris Lattner	180f0e9	2007-12-28 04:42:05 +0000	[diff] [blame]	526
				527	//===---------------------------------------------------------------------===//
Chris Lattner	730d088	2008-01-07 21:38:14 +0000	[diff] [blame]	528
				529	Investigate lowering of sparse switch statements into perfect hash tables:
				530	http://burtleburtle.net/bob/hash/perfect.html
				531
				532	//===---------------------------------------------------------------------===//
Chris Lattner	45e5032	2008-01-09 00:17:57 +0000	[diff] [blame]	533
				534	We should turn things like "load+fabs+store" and "load+fneg+store" into the
				535	corresponding integer operations. On a yonah, this loop:
				536
				537	double a[256];
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	538	void foo() {
				539	int i, b;
				540	for (b = 0; b < 10000000; b++)
				541	for (i = 0; i < 256; i++)
				542	a[i] = -a[i];
				543	}
Chris Lattner	45e5032	2008-01-09 00:17:57 +0000	[diff] [blame]	544
				545	is twice as slow as this loop:
				546
				547	long long a[256];
Chris Lattner	6bb6a55	2008-02-18 18:46:39 +0000	[diff] [blame]	548	void foo() {
				549	int i, b;
				550	for (b = 0; b < 10000000; b++)
				551	for (i = 0; i < 256; i++)
				552	a[i] ^= (1ULL << 63);
				553	}
Chris Lattner	45e5032	2008-01-09 00:17:57 +0000	[diff] [blame]	554
				555	and I suspect other processors are similar. On X86 in particular this is a
				556	big win because doing this with integers allows the use of read/modify/write
				557	instructions.
				558
				559	//===---------------------------------------------------------------------===//
Chris Lattner	1d07b65	2008-01-10 18:25:41 +0000	[diff] [blame]	560
				561	DAG Combiner should try to combine small loads into larger loads when
				562	profitable. For example, we compile this C++ example:
				563
				564	struct THotKey { short Key; bool Control; bool Shift; bool Alt; };
				565	extern THotKey m_HotKey;
				566	THotKey GetHotKey () { return m_HotKey; }
				567
Chris Lattner	51415d2	2011-01-02 18:31:38 +0000	[diff] [blame]	568	into (-m64 -O3 -fno-exceptions -static -fomit-frame-pointer):
Chris Lattner	1d07b65	2008-01-10 18:25:41 +0000	[diff] [blame]	569
Chris Lattner	51415d2	2011-01-02 18:31:38 +0000	[diff] [blame]	570	__Z9GetHotKeyv: ## @_Z9GetHotKeyv
				571	movq _m_HotKey@GOTPCREL(%rip), %rax
				572	movzwl (%rax), %ecx
				573	movzbl 2(%rax), %edx
				574	shlq $16, %rdx
				575	orq %rcx, %rdx
				576	movzbl 3(%rax), %ecx
				577	shlq $24, %rcx
				578	orq %rdx, %rcx
				579	movzbl 4(%rax), %eax
				580	shlq $32, %rax
				581	orq %rcx, %rax
				582	ret
Chris Lattner	1d07b65	2008-01-10 18:25:41 +0000	[diff] [blame]	583
				584	//===---------------------------------------------------------------------===//
Chris Lattner	87b0c13	2008-01-11 06:17:47 +0000	[diff] [blame]	585
Nate Begeman	0fddc34	2008-02-18 18:39:23 +0000	[diff] [blame]	586	We should add an FRINT node to the DAG to model targets that have legal
				587	implementations of ceil/floor/rint.
Chris Lattner	765be88	2008-02-28 05:34:27 +0000	[diff] [blame]	588
				589	//===---------------------------------------------------------------------===//
				590
				591	Consider:
				592
				593	int test() {
Benjamin Kramer	dfa40f8	2010-12-23 15:32:07 +0000	[diff] [blame]	594	long long input[8] = {1,0,1,0,1,0,1,0};
Chris Lattner	765be88	2008-02-28 05:34:27 +0000	[diff] [blame]	595	foo(input);
				596	}
				597
Chris Lattner	e5d5a41	2011-01-01 22:52:11 +0000	[diff] [blame]	598	Clang compiles this into:
Chris Lattner	765be88	2008-02-28 05:34:27 +0000	[diff] [blame]	599
Chris Lattner	e5d5a41	2011-01-01 22:52:11 +0000	[diff] [blame]	600	call void @llvm.memset.p0i8.i64(i8* %tmp, i8 0, i64 64, i32 16, i1 false)
				601	%0 = getelementptr [8 x i64]* %input, i64 0, i64 0
				602	store i64 1, i64* %0, align 16
				603	%1 = getelementptr [8 x i64]* %input, i64 0, i64 2
				604	store i64 1, i64* %1, align 16
				605	%2 = getelementptr [8 x i64]* %input, i64 0, i64 4
				606	store i64 1, i64* %2, align 16
				607	%3 = getelementptr [8 x i64]* %input, i64 0, i64 6
				608	store i64 1, i64* %3, align 16
Chris Lattner	765be88	2008-02-28 05:34:27 +0000	[diff] [blame]	609
Chris Lattner	e5d5a41	2011-01-01 22:52:11 +0000	[diff] [blame]	610	Which gets codegen'd into:
				611
				612	pxor %xmm0, %xmm0
				613	movaps %xmm0, -16(%rbp)
				614	movaps %xmm0, -32(%rbp)
				615	movaps %xmm0, -48(%rbp)
				616	movaps %xmm0, -64(%rbp)
				617	movq $1, -64(%rbp)
				618	movq $1, -48(%rbp)
				619	movq $1, -32(%rbp)
				620	movq $1, -16(%rbp)
				621
				622	It would be better to have 4 movq's of 0 instead of the movaps's.
Chris Lattner	765be88	2008-02-28 05:34:27 +0000	[diff] [blame]	623
				624	//===---------------------------------------------------------------------===//
Chris Lattner	647c664	2008-03-02 02:51:40 +0000	[diff] [blame]	625
				626	http://llvm.org/PR717:
				627
				628	The following code should compile into "ret int undef". Instead, LLVM
				629	produces "ret int 0":
				630
				631	int f() {
				632	int x = 4;
				633	int y;
				634	if (x == 3) y = 0;
				635	return y;
				636	}
				637
				638	//===---------------------------------------------------------------------===//
Chris Lattner	d51372a	2008-03-02 19:29:42 +0000	[diff] [blame]	639
				640	The loop unroller should partially unroll loops (instead of peeling them)
				641	when code growth isn't too bad and when an unroll count allows simplification
				642	of some code within the loop. One trivial example is:
				643
				644	#include <stdio.h>
				645	int main() {
				646	int nRet = 17;
				647	int nLoop;
				648	for ( nLoop = 0; nLoop < 1000; nLoop++ ) {
				649	if ( nLoop & 1 )
				650	nRet += 2;
				651	else
				652	nRet -= 1;
				653	}
				654	return nRet;
				655	}
				656
				657	Unrolling by 2 would eliminate the '&1' in both copies, leading to a net
				658	reduction in code size. The resultant code would then also be suitable for
				659	exit value computation.
				660
				661	//===---------------------------------------------------------------------===//
Chris Lattner	af8d3c6	2008-03-17 01:47:51 +0000	[diff] [blame]	662
				663	We miss a bunch of rotate opportunities on various targets, including ppc, x86,
				664	etc. On X86, we miss a bunch of 'rotate by variable' cases because the rotate
				665	matching code in dag combine doesn't look through truncates aggressively
				666	enough. Here are some testcases reduces from GCC PR17886:
				667
Chris Lattner	af8d3c6	2008-03-17 01:47:51 +0000	[diff] [blame]	668	unsigned long long f5(unsigned long long x, unsigned long long y) {
				669	return (x << 8) \| ((y >> 48) & 0xffull);
				670	}
				671	unsigned long long f6(unsigned long long x, unsigned long long y, int z) {
				672	switch(z) {
				673	case 1:
				674	return (x << 8) \| ((y >> 48) & 0xffull);
				675	case 2:
				676	return (x << 16) \| ((y >> 40) & 0xffffull);
				677	case 3:
				678	return (x << 24) \| ((y >> 32) & 0xffffffull);
				679	case 4:
				680	return (x << 32) \| ((y >> 24) & 0xffffffffull);
				681	default:
				682	return (x << 40) \| ((y >> 16) & 0xffffffffffull);
				683	}
				684	}
				685
Chris Lattner	af8d3c6	2008-03-17 01:47:51 +0000	[diff] [blame]	686	//===---------------------------------------------------------------------===//
Chris Lattner	fd5fe2a	2008-03-20 04:46:13 +0000	[diff] [blame]	687
Chris Lattner	27ecda1	2010-12-15 07:10:43 +0000	[diff] [blame]	688	This (and similar related idioms):
				689
				690	unsigned int foo(unsigned char i) {
				691	return i \| (i<<8) \| (i<<16) \| (i<<24);
				692	}
				693
				694	compiles into:
				695
				696	define i32 @foo(i8 zeroext %i) nounwind readnone ssp noredzone {
				697	entry:
				698	%conv = zext i8 %i to i32
				699	%shl = shl i32 %conv, 8
				700	%shl5 = shl i32 %conv, 16
				701	%shl9 = shl i32 %conv, 24
				702	%or = or i32 %shl9, %conv
				703	%or6 = or i32 %or, %shl5
				704	%or10 = or i32 %or6, %shl
				705	ret i32 %or10
				706	}
				707
				708	it would be better as:
				709
				710	unsigned int bar(unsigned char i) {
				711	unsigned int j=i \| (i << 8);
				712	return j \| (j<<16);
				713	}
				714
				715	aka:
				716
				717	define i32 @bar(i8 zeroext %i) nounwind readnone ssp noredzone {
				718	entry:
				719	%conv = zext i8 %i to i32
				720	%shl = shl i32 %conv, 8
				721	%or = or i32 %shl, %conv
				722	%shl5 = shl i32 %or, 16
				723	%or6 = or i32 %shl5, %or
				724	ret i32 %or6
				725	}
				726
				727	or even i*0x01010101, depending on the speed of the multiplier. The best way to
				728	handle this is to canonicalize it to a multiply in IR and have codegen handle
				729	lowering multiplies to shifts on cpus where shifts are faster.
				730
				731	//===---------------------------------------------------------------------===//
				732
Chris Lattner	fd5fe2a	2008-03-20 04:46:13 +0000	[diff] [blame]	733	We do a number of simplifications in simplify libcalls to strength reduce
				734	standard library functions, but we don't currently merge them together. For
				735	example, it is useful to merge memcpy(a,b,strlen(b)) -> strcpy. This can only
				736	be done safely if "b" isn't modified between the strlen and memcpy of course.
				737
				738	//===---------------------------------------------------------------------===//
				739
Chris Lattner	113b336	2008-08-10 01:14:08 +0000	[diff] [blame]	740	We compile this program: (from GCC PR11680)
				741	http://gcc.gnu.org/bugzilla/attachment.cgi?id=4487
				742
				743	Into code that runs the same speed in fast/slow modes, but both modes run 2x
				744	slower than when compile with GCC (either 4.0 or 4.2):
				745
				746	$ llvm-g++ perf.cpp -O3 -fno-exceptions
				747	$ time ./a.out fast
				748	1.821u 0.003s 0:01.82 100.0% 0+0k 0+0io 0pf+0w
				749
				750	$ g++ perf.cpp -O3 -fno-exceptions
				751	$ time ./a.out fast
				752	0.821u 0.001s 0:00.82 100.0% 0+0k 0+0io 0pf+0w
				753
				754	It looks like we are making the same inlining decisions, so this may be raw
				755	codegen badness or something else (haven't investigated).
				756
				757	//===---------------------------------------------------------------------===//
				758
				759	We miss some instcombines for stuff like this:
				760	void bar (void);
				761	void foo (unsigned int a) {
				762	/* This one is equivalent to a >= (3 << 2). */
				763	if ((a >> 2) >= 3)
				764	bar ();
				765	}
				766
				767	A few other related ones are in GCC PR14753.
				768
				769	//===---------------------------------------------------------------------===//
				770
				771	Divisibility by constant can be simplified (according to GCC PR12849) from
				772	being a mulhi to being a mul lo (cheaper). Testcase:
				773
				774	void bar(unsigned n) {
				775	if (n % 3 == 0)
				776	true();
				777	}
				778
Eli Friedman	96cf7f4	2009-12-12 23:23:43 +0000	[diff] [blame]	779	This is equivalent to the following, where 2863311531 is the multiplicative
				780	inverse of 3, and 1431655766 is ((2^32)-1)/3+1:
				781	void bar(unsigned n) {
				782	if (n * 2863311531U < 1431655766U)
				783	true();
				784	}
				785
				786	The same transformation can work with an even modulo with the addition of a
				787	rotate: rotate the result of the multiply to the right by the number of bits
				788	which need to be zero for the condition to be true, and shrink the compare RHS
				789	by the same amount. Unless the target supports rotates, though, that
				790	transformation probably isn't worthwhile.
				791
				792	The transformation can also easily be made to work with non-zero equality
				793	comparisons: just transform, for example, "n % 3 == 1" to "(n-1) % 3 == 0".
Chris Lattner	113b336	2008-08-10 01:14:08 +0000	[diff] [blame]	794
				795	//===---------------------------------------------------------------------===//
Chris Lattner	d7dd8b8	2008-08-19 06:22:16 +0000	[diff] [blame]	796
Chris Lattner	6d275fd	2008-10-15 16:06:03 +0000	[diff] [blame]	797	Better mod/ref analysis for scanf would allow us to eliminate the vtable and a
				798	bunch of other stuff from this example (see PR1604):
				799
				800	#include <cstdio>
				801	struct test {
				802	int val;
				803	virtual ~test() {}
				804	};
				805
				806	int main() {
				807	test t;
				808	std::scanf("%d", &t.val);
				809	std::printf("%d\n", t.val);
				810	}
				811
				812	//===---------------------------------------------------------------------===//
				813
Nick Lewycky	edd5d3e	2008-11-27 22:41:45 +0000	[diff] [blame]	814	These functions perform the same computation, but produce different assembly.
Nick Lewycky	b3dc4ad	2008-11-27 22:12:22 +0000	[diff] [blame]	815
				816	define i8 @select(i8 %x) readnone nounwind {
				817	%A = icmp ult i8 %x, 250
				818	%B = select i1 %A, i8 0, i8 1
				819	ret i8 %B
				820	}
				821
				822	define i8 @addshr(i8 %x) readnone nounwind {
				823	%A = zext i8 %x to i9
				824	%B = add i9 %A, 6 ;; 256 - 250 == 6
				825	%C = lshr i9 %B, 8
				826	%D = trunc i9 %C to i8
				827	ret i8 %D
				828	}
				829
				830	//===---------------------------------------------------------------------===//
Eli Friedman	e16c0ff	2008-11-30 07:36:04 +0000	[diff] [blame]	831
				832	From gcc bug 24696:
				833	int
				834	f (unsigned long a, unsigned long b, unsigned long c)
				835	{
				836	return ((a & (c - 1)) != 0) \|\| ((b & (c - 1)) != 0);
				837	}
				838	int
				839	f (unsigned long a, unsigned long b, unsigned long c)
				840	{
				841	return ((a & (c - 1)) != 0) \| ((b & (c - 1)) != 0);
				842	}
				843	Both should combine to ((a\|b) & (c-1)) != 0. Currently not optimized with
				844	"clang -emit-llvm-bc \| opt -std-compile-opts".
				845
				846	//===---------------------------------------------------------------------===//
				847
				848	From GCC Bug 20192:
				849	#define PMD_MASK (~((1UL << 23) - 1))
				850	void clear_pmd_range(unsigned long start, unsigned long end)
				851	{
				852	if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK))
				853	f();
				854	}
				855	The expression should optimize to something like
				856	"!((start\|end)&~PMD_MASK). Currently not optimized with "clang
				857	-emit-llvm-bc \| opt -std-compile-opts".
				858
				859	//===---------------------------------------------------------------------===//
				860
Eli Friedman	e16c0ff	2008-11-30 07:36:04 +0000	[diff] [blame]	861	unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return
				862	i;}
				863	unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
				864	These should combine to the same thing. Currently, the first function
				865	produces better code on X86.
				866
				867	//===---------------------------------------------------------------------===//
				868
Eli Friedman	e16c0ff	2008-11-30 07:36:04 +0000	[diff] [blame]	869	From GCC Bug 15784:
				870	#define abs(x) x>0?x:-x
				871	int f(int x, int y)
				872	{
				873	return (abs(x)) >= 0;
				874	}
				875	This should optimize to x == INT_MIN. (With -fwrapv.) Currently not
				876	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts".
				877
				878	//===---------------------------------------------------------------------===//
				879
				880	From GCC Bug 14753:
				881	void
				882	rotate_cst (unsigned int a)
				883	{
				884	a = (a << 10) \| (a >> 22);
				885	if (a == 123)
				886	bar ();
				887	}
				888	void
				889	minus_cst (unsigned int a)
				890	{
				891	unsigned int tem;
				892
				893	tem = 20 - a;
				894	if (tem == 5)
				895	bar ();
				896	}
				897	void
				898	mask_gt (unsigned int a)
				899	{
				900	/* This is equivalent to a > 15. */
				901	if ((a & ~7) > 8)
				902	bar ();
				903	}
				904	void
				905	rshift_gt (unsigned int a)
				906	{
				907	/* This is equivalent to a > 23. */
				908	if ((a >> 2) > 5)
				909	bar ();
				910	}
				911	All should simplify to a single comparison. All of these are
				912	currently not optimized with "clang -emit-llvm-bc \| opt
				913	-std-compile-opts".
				914
				915	//===---------------------------------------------------------------------===//
				916
				917	From GCC Bug 32605:
				918	int c(int* x) {return (char)x+2 == (char)x;}
				919	Should combine to 0. Currently not optimized with "clang
				920	-emit-llvm-bc \| opt -std-compile-opts" (although llc can optimize it).
				921
				922	//===---------------------------------------------------------------------===//
				923
Eli Friedman	e16c0ff	2008-11-30 07:36:04 +0000	[diff] [blame]	924	int a(unsigned b) {return ((b << 31) \| (b << 30)) >> 31;}
				925	Should be combined to "((b >> 1) \| b) & 1". Currently not optimized
				926	with "clang -emit-llvm-bc \| opt -std-compile-opts".
				927
				928	//===---------------------------------------------------------------------===//
				929
				930	unsigned a(unsigned x, unsigned y) { return x \| (y & 1) \| (y & 2);}
				931	Should combine to "x \| (y & 3)". Currently not optimized with "clang
				932	-emit-llvm-bc \| opt -std-compile-opts".
				933
				934	//===---------------------------------------------------------------------===//
				935
Eli Friedman	e16c0ff	2008-11-30 07:36:04 +0000	[diff] [blame]	936	int a(int a, int b, int c) {return (~a & c) \| ((c\|a) & b);}
				937	Should fold to "(~a & c) \| (a & b)". Currently not optimized with
				938	"clang -emit-llvm-bc \| opt -std-compile-opts".
				939
				940	//===---------------------------------------------------------------------===//
				941
				942	int a(int a,int b) {return (~(a\|b))\|a;}
				943	Should fold to "a\|~b". Currently not optimized with "clang
				944	-emit-llvm-bc \| opt -std-compile-opts".
				945
				946	//===---------------------------------------------------------------------===//
				947
				948	int a(int a, int b) {return (a&&b) \|\| (a&&!b);}
				949	Should fold to "a". Currently not optimized with "clang -emit-llvm-bc
				950	\| opt -std-compile-opts".
				951
				952	//===---------------------------------------------------------------------===//
				953
				954	int a(int a, int b, int c) {return (a&&b) \|\| (!a&&c);}
				955	Should fold to "a ? b : c", or at least something sane. Currently not
				956	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts".
				957
				958	//===---------------------------------------------------------------------===//
				959
				960	int a(int a, int b, int c) {return (a&&b) \|\| (a&&c) \|\| (a&&b&&c);}
				961	Should fold to a && (b \|\| c). Currently not optimized with "clang
				962	-emit-llvm-bc \| opt -std-compile-opts".
				963
				964	//===---------------------------------------------------------------------===//
				965
				966	int a(int x) {return x \| ((x & 8) ^ 8);}
				967	Should combine to x \| 8. Currently not optimized with "clang
				968	-emit-llvm-bc \| opt -std-compile-opts".
				969
				970	//===---------------------------------------------------------------------===//
				971
				972	int a(int x) {return x ^ ((x & 8) ^ 8);}
				973	Should also combine to x \| 8. Currently not optimized with "clang
				974	-emit-llvm-bc \| opt -std-compile-opts".
				975
				976	//===---------------------------------------------------------------------===//
				977
Eli Friedman	e16c0ff	2008-11-30 07:36:04 +0000	[diff] [blame]	978	int a(int x) {return ((x \| -9) ^ 8) & x;}
				979	Should combine to x & -9. Currently not optimized with "clang
				980	-emit-llvm-bc \| opt -std-compile-opts".
				981
				982	//===---------------------------------------------------------------------===//
				983
				984	unsigned a(unsigned a) {return a * 0x11111111 >> 28 & 1;}
				985	Should combine to "a * 0x88888888 >> 31". Currently not optimized
				986	with "clang -emit-llvm-bc \| opt -std-compile-opts".
				987
				988	//===---------------------------------------------------------------------===//
				989
				990	unsigned a(char* x) {if ((*x & 32) == 0) return b();}
				991	There's an unnecessary zext in the generated code with "clang
				992	-emit-llvm-bc \| opt -std-compile-opts".
				993
				994	//===---------------------------------------------------------------------===//
				995
				996	unsigned a(unsigned long long x) {return 40 * (x >> 1);}
				997	Should combine to "20 * (((unsigned)x) & -2)". Currently not
				998	optimized with "clang -emit-llvm-bc \| opt -std-compile-opts".
				999
				1000	//===---------------------------------------------------------------------===//
Bill Wendling	85de4b3	2008-12-02 05:12:47 +0000	[diff] [blame]	1001
Chris Lattner	0cdc0bb	2008-12-02 06:32:34 +0000	[diff] [blame]	1002	This was noticed in the entryblock for grokdeclarator in 403.gcc:
				1003
				1004	%tmp = icmp eq i32 %decl_context, 4
				1005	%decl_context_addr.0 = select i1 %tmp, i32 3, i32 %decl_context
				1006	%tmp1 = icmp eq i32 %decl_context_addr.0, 1
				1007	%decl_context_addr.1 = select i1 %tmp1, i32 0, i32 %decl_context_addr.0
				1008
				1009	tmp1 should be simplified to something like:
				1010	(!tmp \|\| decl_context == 1)
				1011
				1012	This allows recursive simplifications, tmp1 is used all over the place in
				1013	the function, e.g. by:
				1014
				1015	%tmp23 = icmp eq i32 %decl_context_addr.1, 0 ; <i1> [#uses=1]
				1016	%tmp24 = xor i1 %tmp1, true ; <i1> [#uses=1]
				1017	%or.cond8 = and i1 %tmp23, %tmp24 ; <i1> [#uses=1]
				1018
				1019	later.
				1020
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1021	//===---------------------------------------------------------------------===//
				1022
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1023	[STORE SINKING]
				1024
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1025	Store sinking: This code:
				1026
				1027	void f (int n, int cond, int res) {
				1028	int i;
				1029	*res = 0;
				1030	for (i = 0; i < n; i++)
				1031	if (*cond)
				1032	res ^= 234; / () /
				1033	}
				1034
				1035	On this function GVN hoists the fully redundant value of *res, but nothing
				1036	moves the store out. This gives us this code:
				1037
				1038	bb: ; preds = %bb2, %entry
				1039	%.rle = phi i32 [ 0, %entry ], [ %.rle6, %bb2 ]
				1040	%i.05 = phi i32 [ 0, %entry ], [ %indvar.next, %bb2 ]
				1041	%1 = load i32* %cond, align 4
				1042	%2 = icmp eq i32 %1, 0
				1043	br i1 %2, label %bb2, label %bb1
				1044
				1045	bb1: ; preds = %bb
				1046	%3 = xor i32 %.rle, 234
				1047	store i32 %3, i32* %res, align 4
				1048	br label %bb2
				1049
				1050	bb2: ; preds = %bb, %bb1
				1051	%.rle6 = phi i32 [ %3, %bb1 ], [ %.rle, %bb ]
				1052	%indvar.next = add i32 %i.05, 1
				1053	%exitcond = icmp eq i32 %indvar.next, %n
				1054	br i1 %exitcond, label %return, label %bb
				1055
				1056	DSE should sink partially dead stores to get the store out of the loop.
				1057
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1058	Here's another partial dead case:
				1059	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395
				1060
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1061	//===---------------------------------------------------------------------===//
				1062
				1063	Scalar PRE hoists the mul in the common block up to the else:
				1064
				1065	int test (int a, int b, int c, int g) {
				1066	int d, e;
				1067	if (a)
				1068	d = b * c;
				1069	else
				1070	d = b - c;
				1071	e = b * c + g;
				1072	return d + e;
				1073	}
				1074
				1075	It would be better to do the mul once to reduce codesize above the if.
				1076	This is GCC PR38204.
				1077
				1078	//===---------------------------------------------------------------------===//
				1079
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1080	[STORE SINKING]
				1081
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1082	GCC PR37810 is an interesting case where we should sink load/store reload
				1083	into the if block and outside the loop, so we don't reload/store it on the
				1084	non-call path.
				1085
				1086	for () {
				1087	*P += 1;
				1088	if ()
				1089	call();
				1090	else
				1091	...
				1092	->
				1093	tmp = *P
				1094	for () {
				1095	tmp += 1;
				1096	if () {
				1097	*P = tmp;
				1098	call();
				1099	tmp = *P;
				1100	} else ...
				1101	}
				1102	*P = tmp;
				1103
Chris Lattner	81ee731	2008-12-15 07:49:24 +0000	[diff] [blame]	1104	We now hoist the reload after the call (Transforms/GVN/lpre-call-wrap.ll), but
				1105	we don't sink the store. We need partially dead store sinking.
				1106
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1107	//===---------------------------------------------------------------------===//
				1108
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1109	[LOAD PRE CRIT EDGE SPLITTING]
Chris Lattner	81ee731	2008-12-15 07:49:24 +0000	[diff] [blame]	1110
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1111	GCC PR37166: Sinking of loads prevents SROA'ing the "g" struct on the stack
				1112	leading to excess stack traffic. This could be handled by GVN with some crazy
				1113	symbolic phi translation. The code we get looks like (g is on the stack):
				1114
				1115	bb2: ; preds = %bb1
				1116	..
				1117	%9 = getelementptr %struct.f* %g, i32 0, i32 0
				1118	store i32 %8, i32* %9, align bel %bb3
				1119
				1120	bb3: ; preds = %bb1, %bb2, %bb
				1121	%c_addr.0 = phi %struct.f* [ %g, %bb2 ], [ %c, %bb ], [ %c, %bb1 ]
				1122	%b_addr.0 = phi %struct.f* [ %b, %bb2 ], [ %g, %bb ], [ %b, %bb1 ]
				1123	%10 = getelementptr %struct.f* %c_addr.0, i32 0, i32 0
				1124	%11 = load i32* %10, align 4
				1125
Chris Lattner	ca9e0e8	2009-11-27 16:53:57 +0000	[diff] [blame]	1126	%11 is partially redundant, an in BB2 it should have the value %8.
Chris Lattner	543d6c6	2008-12-06 19:28:22 +0000	[diff] [blame]	1127
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1128	GCC PR33344 and PR35287 are similar cases.
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1129
Chris Lattner	06c26d9	2009-11-05 18:19:19 +0000	[diff] [blame]	1130
				1131	//===---------------------------------------------------------------------===//
				1132
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1133	[LOAD PRE]
				1134
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1135	There are many load PRE testcases in testsuite/gcc.dg/tree-ssa/loadpre* in the
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1136	GCC testsuite, ones we don't get yet are (checked through loadpre25):
				1137
				1138	[CRIT EDGE BREAKING]
				1139	loadpre3.c predcom-4.c
				1140
				1141	[PRE OF READONLY CALL]
				1142	loadpre5.c
				1143
				1144	[TURN SELECT INTO BRANCH]
				1145	loadpre14.c loadpre15.c
				1146
				1147	actually a conditional increment: loadpre18.c loadpre19.c
				1148
Chris Lattner	aded09f	2010-12-15 06:38:24 +0000	[diff] [blame]	1149	//===---------------------------------------------------------------------===//
				1150
				1151	[LOAD PRE / STORE SINKING / SPEC HACK]
				1152
				1153	This is a chunk of code from 456.hmmer:
				1154
				1155	int f(int M, int mc, int mpp, int tpmm, int ip, int tpim, int dpp,
				1156	int tpdm, int xmb, int bp, int *ms) {
				1157	int k, sc;
				1158	for (k = 1; k <= M; k++) {
				1159	mc[k] = mpp[k-1] + tpmm[k-1];
				1160	if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
				1161	if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
				1162	if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
				1163	mc[k] += ms[k];
				1164	}
				1165	}
				1166
				1167	It is very profitable for this benchmark to turn the conditional stores to mc[k]
				1168	into a conditional move (select instr in IR) and allow the final store to do the
				1169	store. See GCC PR27313 for more details. Note that this is valid to xform even
				1170	with the new C++ memory model, since mc[k] is previously loaded and later
				1171	stored.
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1172
				1173	//===---------------------------------------------------------------------===//
				1174
				1175	[SCALAR PRE]
				1176	There are many PRE testcases in testsuite/gcc.dg/tree-ssa/ssa-pre-*.c in the
				1177	GCC testsuite.
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1178
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1179	//===---------------------------------------------------------------------===//
				1180
				1181	There are some interesting cases in testsuite/gcc.dg/tree-ssa/pred-comm* in the
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1182	GCC testsuite. For example, we get the first example in predcom-1.c, but
				1183	miss the second one:
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1184
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1185	unsigned fib[1000];
				1186	unsigned avg[1000];
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1187
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1188	__attribute__ ((noinline))
				1189	void count_averages(int n) {
				1190	int i;
				1191	for (i = 1; i < n; i++)
				1192	avg[i] = (((unsigned long) fib[i - 1] + fib[i] + fib[i + 1]) / 3) & 0xffff;
				1193	}
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1194
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1195	which compiles into two loads instead of one in the loop.
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1196
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1197	predcom-2.c is the same as predcom-1.c
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1198
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1199	predcom-3.c is very similar but needs loads feeding each other instead of
				1200	store->load.
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1201
				1202
				1203	//===---------------------------------------------------------------------===//
				1204
Chris Lattner	082da53	2010-01-23 17:59:23 +0000	[diff] [blame]	1205	[ALIAS ANALYSIS]
				1206
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1207	Type based alias analysis:
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1208	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14705
				1209
Chris Lattner	082da53	2010-01-23 17:59:23 +0000	[diff] [blame]	1210	We should do better analysis of posix_memalign. At the least it should
				1211	no-capture its pointer argument, at best, we should know that the out-value
				1212	result doesn't point to anything (like malloc). One example of this is in
				1213	SingleSource/Benchmarks/Misc/dt.c
				1214
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1215	//===---------------------------------------------------------------------===//
				1216
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1217	A/B get pinned to the stack because we turn an if/then into a select instead
				1218	of PRE'ing the load/store. This may be fixable in instcombine:
				1219	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37892
				1220
Chris Lattner	9858859	2009-09-21 02:53:57 +0000	[diff] [blame]	1221	struct X { int i; };
				1222	int foo (int x) {
				1223	struct X a;
				1224	struct X b;
				1225	struct X *p;
				1226	a.i = 1;
				1227	b.i = 2;
				1228	if (x)
				1229	p = &a;
				1230	else
				1231	p = &b;
				1232	return p->i;
				1233	}
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1234
Chris Lattner	9858859	2009-09-21 02:53:57 +0000	[diff] [blame]	1235	//===---------------------------------------------------------------------===//
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1236
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1237	Interesting missed case because of control flow flattening (should be 2 loads):
				1238	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26629
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1239	With: llvm-gcc t2.c -S -o - -O0 -emit-llvm \| llvm-as \|
				1240	opt -mem2reg -gvn -instcombine \| llvm-dis
Chris Lattner	58ccf88	2009-11-29 02:19:52 +0000	[diff] [blame]	1241	we miss it because we need 1) CRIT EDGE 2) MULTIPLE DIFFERENT
Chris Lattner	5d196e6	2008-12-15 08:32:28 +0000	[diff] [blame]	1242	VALS PRODUCED BY ONE BLOCK OVER DIFFERENT PATHS
Chris Lattner	da93063	2008-12-06 22:52:12 +0000	[diff] [blame]	1243
				1244	//===---------------------------------------------------------------------===//
				1245
				1246	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19633
				1247	We could eliminate the branch condition here, loading from null is undefined:
				1248
				1249	struct S { int w, x, y, z; };
				1250	struct T { int r; struct S s; };
				1251	void bar (struct S, int);
				1252	void foo (int a, struct T b)
				1253	{
				1254	struct S *c = 0;
				1255	if (a)
				1256	c = &b.s;
				1257	bar (*c, a);
				1258	}
				1259
				1260	//===---------------------------------------------------------------------===//
Chris Lattner	0cdc0bb	2008-12-02 06:32:34 +0000	[diff] [blame]	1261
Chris Lattner	8a35adf	2008-12-23 20:52:52 +0000	[diff] [blame]	1262	simplifylibcalls should do several optimizations for strspn/strcspn:
				1263
Chris Lattner	8a35adf	2008-12-23 20:52:52 +0000	[diff] [blame]	1264	strcspn(x, "a") -> inlined loop for up to 3 letters (similarly for strspn):
				1265
				1266	size_t __strcspn_c3 (__const char *__s, int __reject1, int __reject2,
				1267	int __reject3) {
				1268	register size_t __result = 0;
				1269	while (__s[__result] != '\0' && __s[__result] != __reject1 &&
				1270	__s[__result] != __reject2 && __s[__result] != __reject3)
				1271	++__result;
				1272	return __result;
				1273	}
				1274
				1275	This should turn into a switch on the character. See PR3253 for some notes on
				1276	codegen.
				1277
				1278	456.hmmer apparently uses strcspn and strspn a lot. 471.omnetpp uses strspn.
				1279
				1280	//===---------------------------------------------------------------------===//
Chris Lattner	a414225	2008-12-31 00:54:13 +0000	[diff] [blame]	1281
				1282	"gas" uses this idiom:
				1283	else if (strchr ("+-/%\|&^:[]()~", intel_parser.op_string))
				1284	..
				1285	else if (strchr ("<>", *intel_parser.op_string)
				1286
				1287	Those should be turned into a switch.
				1288
				1289	//===---------------------------------------------------------------------===//
Chris Lattner	7cb3ae0	2009-01-08 06:52:57 +0000	[diff] [blame]	1290
				1291	252.eon contains this interesting code:
				1292
				1293	%3072 = getelementptr [100 x i8]* %tempString, i32 0, i32 0
				1294	%3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
				1295	%strlen = call i32 @strlen(i8* %3072) ; uses = 1
				1296	%endptr = getelementptr [100 x i8]* %tempString, i32 0, i32 %strlen
				1297	call void @llvm.memcpy.i32(i8* %endptr,
				1298	i8* getelementptr ([5 x i8]* @"\01LC42", i32 0, i32 0), i32 5, i32 1)
				1299	%3074 = call i32 @strlen(i8* %endptr) nounwind readonly
				1300
				1301	This is interesting for a couple reasons. First, in this:
				1302
Benjamin Kramer	dfa40f8	2010-12-23 15:32:07 +0000	[diff] [blame]	1303	The memcpy+strlen strlen can be replaced with:
Chris Lattner	7cb3ae0	2009-01-08 06:52:57 +0000	[diff] [blame]	1304
				1305	%3074 = call i32 @strlen([5 x i8]* @"\01LC42") nounwind readonly
				1306
				1307	Because the destination was just copied into the specified memory buffer. This,
				1308	in turn, can be constant folded to "4".
				1309
				1310	In other code, it contains:
				1311
				1312	%endptr6978 = bitcast i8* %endptr69 to i32*
				1313	store i32 7107374, i32* %endptr6978, align 1
				1314	%3167 = call i32 @strlen(i8* %endptr69) nounwind readonly
				1315
				1316	Which could also be constant folded. Whatever is producing this should probably
				1317	be fixed to leave this as a memcpy from a string.
				1318
				1319	Further, eon also has an interesting partially redundant strlen call:
				1320
				1321	bb8: ; preds = %_ZN18eonImageCalculatorC1Ev.exit
				1322	%682 = getelementptr i8 %argv, i32 6 ; <i8> [#uses=2]
				1323	%683 = load i8** %682, align 4 ; <i8*> [#uses=4]
				1324	%684 = load i8* %683, align 1 ; <i8> [#uses=1]
				1325	%685 = icmp eq i8 %684, 0 ; <i1> [#uses=1]
				1326	br i1 %685, label %bb10, label %bb9
				1327
				1328	bb9: ; preds = %bb8
				1329	%686 = call i32 @strlen(i8* %683) nounwind readonly
				1330	%687 = icmp ugt i32 %686, 254 ; <i1> [#uses=1]
				1331	br i1 %687, label %bb10, label %bb11
				1332
				1333	bb10: ; preds = %bb9, %bb8
				1334	%688 = call i32 @strlen(i8* %683) nounwind readonly
				1335
				1336	This could be eliminated by doing the strlen once in bb8, saving code size and
				1337	improving perf on the bb8->9->10 path.
				1338
				1339	//===---------------------------------------------------------------------===//
Chris Lattner	6c2ee50	2009-01-08 07:34:55 +0000	[diff] [blame]	1340
				1341	I see an interesting fully redundant call to strlen left in 186.crafty:InputMove
				1342	which looks like:
				1343	%movetext11 = getelementptr [128 x i8]* %movetext, i32 0, i32 0
				1344
				1345
				1346	bb62: ; preds = %bb55, %bb53
				1347	%promote.0 = phi i32 [ %169, %bb55 ], [ 0, %bb53 ]
				1348	%171 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
				1349	%172 = add i32 %171, -1 ; <i32> [#uses=1]
				1350	%173 = getelementptr [128 x i8]* %movetext, i32 0, i32 %172
				1351
				1352	... no stores ...
				1353	br i1 %or.cond, label %bb65, label %bb72
				1354
				1355	bb65: ; preds = %bb62
				1356	store i8 0, i8* %173, align 1
				1357	br label %bb72
				1358
				1359	bb72: ; preds = %bb65, %bb62
				1360	%trank.1 = phi i32 [ %176, %bb65 ], [ -1, %bb62 ]
				1361	%177 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
				1362
				1363	Note that on the bb62->bb72 path, that the %177 strlen call is partially
				1364	redundant with the %171 call. At worst, we could shove the %177 strlen call
				1365	up into the bb65 block moving it out of the bb62->bb72 path. However, note
				1366	that bb65 stores to the string, zeroing out the last byte. This means that on
				1367	that path the value of %177 is actually just %171-1. A sub is cheaper than a
				1368	strlen!
				1369
				1370	This pattern repeats several times, basically doing:
				1371
				1372	A = strlen(P);
				1373	P[A-1] = 0;
				1374	B = strlen(P);
				1375	where it is "obvious" that B = A-1.
				1376
				1377	//===---------------------------------------------------------------------===//
				1378
Chris Lattner	6c2ee50	2009-01-08 07:34:55 +0000	[diff] [blame]	1379	186.crafty has this interesting pattern with the "out.4543" variable:
				1380
				1381	call void @llvm.memcpy.i32(
				1382	i8* getelementptr ([10 x i8]* @out.4543, i32 0, i32 0),
				1383	i8* getelementptr ([7 x i8]* @"\01LC28700", i32 0, i32 0), i32 7, i32 1)
				1384	%101 = call@printf(i8* ... @out.4543, i32 0, i32 0)) nounwind
				1385
				1386	It is basically doing:
				1387
				1388	memcpy(globalarray, "string");
				1389	printf(..., globalarray);
				1390
				1391	Anyway, by knowing that printf just reads the memory and forward substituting
				1392	the string directly into the printf, this eliminates reads from globalarray.
				1393	Since this pattern occurs frequently in crafty (due to the "DisplayTime" and
				1394	other similar functions) there are many stores to "out". Once all the printfs
				1395	stop using "out", all that is left is the memcpy's into it. This should allow
				1396	globalopt to remove the "stored only" global.
				1397
				1398	//===---------------------------------------------------------------------===//
				1399
Dan Gohman	83d2e06	2009-01-20 01:07:33 +0000	[diff] [blame]	1400	This code:
				1401
				1402	define inreg i32 @foo(i8* inreg %p) nounwind {
				1403	%tmp0 = load i8* %p
				1404	%tmp1 = ashr i8 %tmp0, 5
				1405	%tmp2 = sext i8 %tmp1 to i32
				1406	ret i32 %tmp2
				1407	}
				1408
				1409	could be dagcombine'd to a sign-extending load with a shift.
				1410	For example, on x86 this currently gets this:
				1411
				1412	movb (%eax), %al
				1413	sarb $5, %al
				1414	movsbl %al, %eax
				1415
				1416	while it could get this:
				1417
				1418	movsbl (%eax), %eax
				1419	sarl $5, %eax
				1420
				1421	//===---------------------------------------------------------------------===//
Chris Lattner	705ac70	2009-01-22 07:16:03 +0000	[diff] [blame]	1422
				1423	GCC PR31029:
				1424
				1425	int test(int x) { return 1-x == x; } // --> return false
				1426	int test2(int x) { return 2-x == x; } // --> return x == 1 ?
				1427
				1428	Always foldable for odd constants, what is the rule for even?
				1429
				1430	//===---------------------------------------------------------------------===//
				1431
Torok Edwin	3cedd4d	2009-01-24 19:30:25 +0000	[diff] [blame]	1432	PR 3381: GEP to field of size 0 inside a struct could be turned into GEP
				1433	for next field in struct (which is at same address).
				1434
				1435	For example: store of float into { {{}}, float } could be turned into a store to
				1436	the float directly.
				1437
Torok Edwin	87d5ca0	2009-02-20 18:42:06 +0000	[diff] [blame]	1438	//===---------------------------------------------------------------------===//
Nick Lewycky	5c10a3a	2009-02-25 06:52:48 +0000	[diff] [blame]	1439
Chris Lattner	17a999e	2009-05-11 17:41:40 +0000	[diff] [blame]	1440	The arg promotion pass should make use of nocapture to make its alias analysis
				1441	stuff much more precise.
				1442
				1443	//===---------------------------------------------------------------------===//
				1444
				1445	The following functions should be optimized to use a select instead of a
				1446	branch (from gcc PR40072):
				1447
				1448	char char_int(int m) {if(m>7) return 0; return m;}
				1449	int int_char(char m) {if(m>7) return 0; return m;}
				1450
				1451	//===---------------------------------------------------------------------===//
				1452
Bill Wendling	fd2730e	2009-10-27 22:48:31 +0000	[diff] [blame]	1453	int func(int a, int b) { if (a & 0x80) b \|= 0x80; else b &= ~0x80; return b; }
				1454
				1455	Generates this:
				1456
				1457	define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
				1458	entry:
				1459	%0 = and i32 %a, 128 ; <i32> [#uses=1]
				1460	%1 = icmp eq i32 %0, 0 ; <i1> [#uses=1]
				1461	%2 = or i32 %b, 128 ; <i32> [#uses=1]
				1462	%3 = and i32 %b, -129 ; <i32> [#uses=1]
				1463	%b_addr.0 = select i1 %1, i32 %3, i32 %2 ; <i32> [#uses=1]
				1464	ret i32 %b_addr.0
				1465	}
				1466
				1467	However, it's functionally equivalent to:
				1468
				1469	b = (b & ~0x80) \| (a & 0x80);
				1470
				1471	Which generates this:
				1472
				1473	define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
				1474	entry:
				1475	%0 = and i32 %b, -129 ; <i32> [#uses=1]
				1476	%1 = and i32 %a, 128 ; <i32> [#uses=1]
				1477	%2 = or i32 %0, %1 ; <i32> [#uses=1]
				1478	ret i32 %2
				1479	}
				1480
				1481	This can be generalized for other forms:
				1482
				1483	b = (b & ~0x80) \| (a & 0x40) << 1;
				1484
				1485	//===---------------------------------------------------------------------===//
Bill Wendling	2e5198f	2009-10-27 23:30:07 +0000	[diff] [blame]	1486
				1487	These two functions produce different code. They shouldn't:
				1488
				1489	#include <stdint.h>
				1490
				1491	uint8_t p1(uint8_t b, uint8_t a) {
				1492	b = (b & ~0xc0) \| (a & 0xc0);
				1493	return (b);
				1494	}
				1495
				1496	uint8_t p2(uint8_t b, uint8_t a) {
				1497	b = (b & ~0x40) \| (a & 0x40);
				1498	b = (b & ~0x80) \| (a & 0x80);
				1499	return (b);
				1500	}
				1501
				1502	define zeroext i8 @p1(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
				1503	entry:
				1504	%0 = and i8 %b, 63 ; <i8> [#uses=1]
				1505	%1 = and i8 %a, -64 ; <i8> [#uses=1]
				1506	%2 = or i8 %1, %0 ; <i8> [#uses=1]
				1507	ret i8 %2
				1508	}
				1509
				1510	define zeroext i8 @p2(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
				1511	entry:
				1512	%0 = and i8 %b, 63 ; <i8> [#uses=1]
				1513	%.masked = and i8 %a, 64 ; <i8> [#uses=1]
				1514	%1 = and i8 %a, -128 ; <i8> [#uses=1]
				1515	%2 = or i8 %1, %0 ; <i8> [#uses=1]
				1516	%3 = or i8 %2, %.masked ; <i8> [#uses=1]
				1517	ret i8 %3
				1518	}
				1519
				1520	//===---------------------------------------------------------------------===//
Chris Lattner	539bdf0	2009-11-11 17:51:27 +0000	[diff] [blame]	1521
				1522	IPSCCP does not currently propagate argument dependent constants through
				1523	functions where it does not not all of the callers. This includes functions
				1524	with normal external linkage as well as templates, C99 inline functions etc.
				1525	Specifically, it does nothing to:
				1526
				1527	define i32 @test(i32 %x, i32 %y, i32 %z) nounwind {
				1528	entry:
				1529	%0 = add nsw i32 %y, %z
				1530	%1 = mul i32 %0, %x
				1531	%2 = mul i32 %y, %z
				1532	%3 = add nsw i32 %1, %2
				1533	ret i32 %3
				1534	}
				1535
				1536	define i32 @test2() nounwind {
				1537	entry:
				1538	%0 = call i32 @test(i32 1, i32 2, i32 4) nounwind
				1539	ret i32 %0
				1540	}
				1541
				1542	It would be interesting extend IPSCCP to be able to handle simple cases like
				1543	this, where all of the arguments to a call are constant. Because IPSCCP runs
				1544	before inlining, trivial templates and inline functions are not yet inlined.
				1545	The results for a function + set of constant arguments should be memoized in a
				1546	map.
				1547
				1548	//===---------------------------------------------------------------------===//
Chris Lattner	7a09964	2009-11-11 17:54:02 +0000	[diff] [blame]	1549
				1550	The libcall constant folding stuff should be moved out of SimplifyLibcalls into
				1551	libanalysis' constantfolding logic. This would allow IPSCCP to be able to
				1552	handle simple things like this:
				1553
				1554	static int foo(const char *X) { return strlen(X); }
				1555	int bar() { return foo("abcd"); }
				1556
				1557	//===---------------------------------------------------------------------===//
Nick Lewycky	ef4ea9a	2009-11-15 17:51:23 +0000	[diff] [blame]	1558
				1559	InstCombine should use SimplifyDemandedBits to remove the or instruction:
				1560
				1561	define i1 @test(i8 %x, i8 %y) {
				1562	%A = or i8 %x, 1
				1563	%B = icmp ugt i8 %A, 3
				1564	ret i1 %B
				1565	}
				1566
				1567	Currently instcombine calls SimplifyDemandedBits with either all bits or just
				1568	the sign bit, if the comparison is obviously a sign test. In this case, we only
				1569	need all but the bottom two bits from %A, and if we gave that mask to SDB it
				1570	would delete the or instruction for us.
				1571
				1572	//===---------------------------------------------------------------------===//
Chris Lattner	d1e4ee3	2009-12-03 07:41:54 +0000	[diff] [blame]	1573
Duncan Sands	c8493da	2010-01-06 15:37:47 +0000	[diff] [blame]	1574	functionattrs doesn't know much about memcpy/memset. This function should be
Duncan Sands	78376ad	2010-01-06 08:45:52 +0000	[diff] [blame]	1575	marked readnone rather than readonly, since it only twiddles local memory, but
				1576	functionattrs doesn't handle memset/memcpy/memmove aggressively:
Chris Lattner	f05330a	2009-12-03 07:43:46 +0000	[diff] [blame]	1577
				1578	struct X { int p; int q; };
				1579	int foo() {
				1580	int i = 0, j = 1;
				1581	struct X x, y;
				1582	int **p;
				1583	y.p = &i;
				1584	x.q = &j;
				1585	p = __builtin_memcpy (&x, &y, sizeof (int *));
				1586	return **p;
				1587	}
				1588
Chris Lattner	e5d5a41	2011-01-01 22:52:11 +0000	[diff] [blame]	1589	This can be seen at:
				1590	$ clang t.c -S -o - -mkernel -O0 -emit-llvm \| opt -functionattrs -S
				1591
				1592
Chris Lattner	d1e4ee3	2009-12-03 07:41:54 +0000	[diff] [blame]	1593	//===---------------------------------------------------------------------===//
				1594
Eli Friedman	9ed49c5	2010-01-18 22:36:59 +0000	[diff] [blame]	1595	Missed instcombine transformation:
				1596	define i1 @a(i32 %x) nounwind readnone {
				1597	entry:
				1598	%cmp = icmp eq i32 %x, 30
				1599	%sub = add i32 %x, -30
				1600	%cmp2 = icmp ugt i32 %sub, 9
				1601	%or = or i1 %cmp, %cmp2
				1602	ret i1 %or
				1603	}
				1604	This should be optimized to a single compare. Testcase derived from gcc.
				1605
				1606	//===---------------------------------------------------------------------===//
				1607
Eli Friedman	9ed49c5	2010-01-18 22:36:59 +0000	[diff] [blame]	1608	Missed instcombine or reassociate transformation:
				1609	int a(int a, int b) { return (a==12)&(b>47)&(b<58); }
				1610
				1611	The sgt and slt should be combined into a single comparison. Testcase derived
				1612	from gcc.
				1613
				1614	//===---------------------------------------------------------------------===//
				1615
				1616	Missed instcombine transformation:
Chris Lattner	9165d9d	2010-11-21 07:05:31 +0000	[diff] [blame]	1617
				1618	%382 = srem i32 %tmp14.i, 64 ; [#uses=1]
				1619	%383 = zext i32 %382 to i64 ; [#uses=1]
				1620	%384 = shl i64 %381, %383 ; [#uses=1]
				1621	%385 = icmp slt i32 %tmp14.i, 64 ; [#uses=1]
				1622
Benjamin Kramer	94a622a	2010-11-23 20:33:57 +0000	[diff] [blame]	1623	The srem can be transformed to an and because if %tmp14.i is negative, the
				1624	shift is undefined. Testcase derived from 403.gcc.
Chris Lattner	9165d9d	2010-11-21 07:05:31 +0000	[diff] [blame]	1625
				1626	//===---------------------------------------------------------------------===//
				1627
				1628	This is a range comparison on a divided result (from 403.gcc):
				1629
				1630	%1337 = sdiv i32 %1336, 8 ; [#uses=1]
				1631	%.off.i208 = add i32 %1336, 7 ; [#uses=1]
				1632	%1338 = icmp ult i32 %.off.i208, 15 ; [#uses=1]
				1633
				1634	We already catch this (removing the sdiv) if there isn't an add, we should
				1635	handle the 'add' as well. This is a common idiom with it's builtin_alloca code.
				1636	C testcase:
				1637
				1638	int a(int x) { return (unsigned)(x/16+7) < 15; }
				1639
				1640	Another similar case involves truncations on 64-bit targets:
				1641
				1642	%361 = sdiv i64 %.046, 8 ; [#uses=1]
				1643	%362 = trunc i64 %361 to i32 ; [#uses=2]
				1644	...
				1645	%367 = icmp eq i32 %362, 0 ; [#uses=1]
				1646
Eli Friedman	0de0b36	2010-01-31 04:55:32 +0000	[diff] [blame]	1647	//===---------------------------------------------------------------------===//
				1648
				1649	Missed instcombine/dagcombine transformation:
				1650	define void @lshift_lt(i8 zeroext %a) nounwind {
				1651	entry:
				1652	%conv = zext i8 %a to i32
				1653	%shl = shl i32 %conv, 3
				1654	%cmp = icmp ult i32 %shl, 33
				1655	br i1 %cmp, label %if.then, label %if.end
				1656
				1657	if.then:
				1658	tail call void @bar() nounwind
				1659	ret void
				1660
				1661	if.end:
				1662	ret void
				1663	}
				1664	declare void @bar() nounwind
				1665
				1666	The shift should be eliminated. Testcase derived from gcc.
Eli Friedman	9ed49c5	2010-01-18 22:36:59 +0000	[diff] [blame]	1667
				1668	//===---------------------------------------------------------------------===//
Chris Lattner	187242b	2010-02-09 00:11:10 +0000	[diff] [blame]	1669
				1670	These compile into different code, one gets recognized as a switch and the
				1671	other doesn't due to phase ordering issues (PR6212):
				1672
				1673	int test1(int mainType, int subType) {
				1674	if (mainType == 7)
				1675	subType = 4;
				1676	else if (mainType == 9)
				1677	subType = 6;
				1678	else if (mainType == 11)
				1679	subType = 9;
				1680	return subType;
				1681	}
				1682
				1683	int test2(int mainType, int subType) {
				1684	if (mainType == 7)
				1685	subType = 4;
				1686	if (mainType == 9)
				1687	subType = 6;
				1688	if (mainType == 11)
				1689	subType = 9;
				1690	return subType;
				1691	}
				1692
				1693	//===---------------------------------------------------------------------===//
Chris Lattner	1f6689a	2010-03-10 21:42:42 +0000	[diff] [blame]	1694
				1695	The following test case (from PR6576):
				1696
				1697	define i32 @mul(i32 %a, i32 %b) nounwind readnone {
				1698	entry:
				1699	%cond1 = icmp eq i32 %b, 0 ; <i1> [#uses=1]
				1700	br i1 %cond1, label %exit, label %bb.nph
				1701	bb.nph: ; preds = %entry
				1702	%tmp = mul i32 %b, %a ; <i32> [#uses=1]
				1703	ret i32 %tmp
				1704	exit: ; preds = %entry
				1705	ret i32 0
				1706	}
				1707
				1708	could be reduced to:
				1709
				1710	define i32 @mul(i32 %a, i32 %b) nounwind readnone {
				1711	entry:
				1712	%tmp = mul i32 %b, %a
				1713	ret i32 %tmp
				1714	}
				1715
				1716	//===---------------------------------------------------------------------===//
				1717
Chris Lattner	cfc921c	2010-04-16 23:52:30 +0000	[diff] [blame]	1718	We should use DSE + llvm.lifetime.end to delete dead vtable pointer updates.
				1719	See GCC PR34949
				1720
Chris Lattner	4dc833c	2010-05-21 23:16:21 +0000	[diff] [blame]	1721	Another interesting case is that something related could be used for variables
				1722	that go const after their ctor has finished. In these cases, globalopt (which
				1723	can statically run the constructor) could mark the global const (so it gets put
				1724	in the readonly section). A testcase would be:
				1725
				1726	#include <complex>
				1727	using namespace std;
				1728	const complex<char> should_be_in_rodata (42,-42);
				1729	complex<char> should_be_in_data (42,-42);
				1730	complex<char> should_be_in_bss;
				1731
				1732	Where we currently evaluate the ctors but the globals don't become const because
				1733	the optimizer doesn't know they "become const" after the ctor is done. See
				1734	GCC PR4131 for more examples.
				1735
Chris Lattner	cfc921c	2010-04-16 23:52:30 +0000	[diff] [blame]	1736	//===---------------------------------------------------------------------===//
				1737
Dan Gohman	73c8145	2010-05-03 14:31:00 +0000	[diff] [blame]	1738	In this code:
				1739
				1740	long foo(long x) {
				1741	return x > 1 ? x : 1;
				1742	}
				1743
				1744	LLVM emits a comparison with 1 instead of 0. 0 would be equivalent
				1745	and cheaper on most targets.
				1746
				1747	LLVM prefers comparisons with zero over non-zero in general, but in this
				1748	case it choses instead to keep the max operation obvious.
				1749
				1750	//===---------------------------------------------------------------------===//
Eli Friedman	e17e4ae	2010-06-12 05:54:27 +0000	[diff] [blame]	1751
				1752	Take the following testcase on x86-64 (similar testcases exist for all targets
				1753	with addc/adde):
				1754
				1755	define void @a(i64* nocapture %s, i64* nocapture %t, i64 %a, i64 %b,
				1756	i64 %c) nounwind {
				1757	entry:
				1758	%0 = zext i64 %a to i128 ; <i128> [#uses=1]
				1759	%1 = zext i64 %b to i128 ; <i128> [#uses=1]
				1760	%2 = add i128 %1, %0 ; <i128> [#uses=2]
				1761	%3 = zext i64 %c to i128 ; <i128> [#uses=1]
				1762	%4 = shl i128 %3, 64 ; <i128> [#uses=1]
				1763	%5 = add i128 %4, %2 ; <i128> [#uses=1]
				1764	%6 = lshr i128 %5, 64 ; <i128> [#uses=1]
				1765	%7 = trunc i128 %6 to i64 ; <i64> [#uses=1]
				1766	store i64 %7, i64* %s, align 8
				1767	%8 = trunc i128 %2 to i64 ; <i64> [#uses=1]
				1768	store i64 %8, i64* %t, align 8
				1769	ret void
				1770	}
				1771
				1772	Generated code:
				1773	addq %rcx, %rdx
				1774	movl $0, %eax
				1775	adcq $0, %rax
				1776	addq %r8, %rax
				1777	movq %rax, (%rdi)
				1778	movq %rdx, (%rsi)
				1779	ret
				1780
				1781	Expected code:
				1782	addq %rcx, %rdx
				1783	adcq $0, %r8
				1784	movq %r8, (%rdi)
				1785	movq %rdx, (%rsi)
				1786	ret
				1787
				1788	The generated SelectionDAG has an ADD of an ADDE, where both operands of the
				1789	ADDE are zero. Replacing one of the operands of the ADDE with the other operand
				1790	of the ADD, and replacing the ADD with the ADDE, should give the desired result.
				1791
				1792	(That said, we are doing a lot better than gcc on this testcase. :) )
				1793
				1794	//===---------------------------------------------------------------------===//
Eli Friedman	836fdbc	2010-07-03 07:38:12 +0000	[diff] [blame]	1795
				1796	Switch lowering generates less than ideal code for the following switch:
				1797	define void @a(i32 %x) nounwind {
				1798	entry:
				1799	switch i32 %x, label %if.end [
				1800	i32 0, label %if.then
				1801	i32 1, label %if.then
				1802	i32 2, label %if.then
				1803	i32 3, label %if.then
				1804	i32 5, label %if.then
				1805	]
				1806	if.then:
				1807	tail call void @foo() nounwind
				1808	ret void
				1809	if.end:
				1810	ret void
				1811	}
				1812	declare void @foo()
				1813
				1814	Generated code on x86-64 (other platforms give similar results):
				1815	a:
				1816	cmpl $5, %edi
				1817	ja .LBB0_2
				1818	movl %edi, %eax
				1819	movl $47, %ecx
				1820	btq %rax, %rcx
				1821	jb .LBB0_3
				1822	.LBB0_2:
				1823	ret
				1824	.LBB0_3:
Eli Friedman	c8f5952	2010-07-03 08:43:32 +0000	[diff] [blame]	1825	jmp foo # TAILCALL
Eli Friedman	836fdbc	2010-07-03 07:38:12 +0000	[diff] [blame]	1826
				1827	The movl+movl+btq+jb could be simplified to a cmpl+jne.
				1828
Eli Friedman	c8f5952	2010-07-03 08:43:32 +0000	[diff] [blame]	1829	Or, if we wanted to be really clever, we could simplify the whole thing to
				1830	something like the following, which eliminates a branch:
				1831	xorl $1, %edi
				1832	cmpl $4, %edi
				1833	ja .LBB0_2
				1834	ret
				1835	.LBB0_2:
				1836	jmp foo # TAILCALL
Nick Lewycky	bb10e90	2010-08-08 07:04:25 +0000	[diff] [blame]	1837	//===---------------------------------------------------------------------===//
				1838	Given a branch where the two target blocks are identical ("ret i32 %b" in
				1839	both), simplifycfg will simplify them away. But not so for a switch statement:
Eli Friedman	c8f5952	2010-07-03 08:43:32 +0000	[diff] [blame]	1840
Nick Lewycky	bb10e90	2010-08-08 07:04:25 +0000	[diff] [blame]	1841	define i32 @f(i32 %a, i32 %b) nounwind readnone {
				1842	entry:
				1843	switch i32 %a, label %bb3 [
				1844	i32 4, label %bb
				1845	i32 6, label %bb
				1846	]
				1847
				1848	bb: ; preds = %entry, %entry
				1849	ret i32 %b
				1850
				1851	bb3: ; preds = %entry
				1852	ret i32 %b
				1853	}
Eli Friedman	836fdbc	2010-07-03 07:38:12 +0000	[diff] [blame]	1854	//===---------------------------------------------------------------------===//
Chris Lattner	4d94e47	2010-11-09 19:37:28 +0000	[diff] [blame]	1855
				1856	clang -O3 fails to devirtualize this virtual inheritance case: (GCC PR45875)
Chris Lattner	1d6aa32	2010-11-11 17:17:56 +0000	[diff] [blame]	1857	Looks related to PR3100
Chris Lattner	4d94e47	2010-11-09 19:37:28 +0000	[diff] [blame]	1858
				1859	struct c1 {};
				1860	struct c10 : c1{
				1861	virtual void foo ();
				1862	};
				1863	struct c11 : c10, c1{
				1864	virtual void f6 ();
				1865	};
				1866	struct c28 : virtual c11{
				1867	void f6 ();
				1868	};
				1869	void check_c28 () {
				1870	c28 obj;
				1871	c11 *ptr = &obj;
				1872	ptr->f6 ();
				1873	}
				1874
				1875	//===---------------------------------------------------------------------===//
Chris Lattner	932aab3	2010-11-11 18:23:57 +0000	[diff] [blame]	1876
				1877	We compile this:
				1878
				1879	int foo(int a) { return (a & (~15)) / 16; }
				1880
				1881	Into:
				1882
				1883	define i32 @foo(i32 %a) nounwind readnone ssp {
				1884	entry:
				1885	%and = and i32 %a, -16
				1886	%div = sdiv i32 %and, 16
				1887	ret i32 %div
				1888	}
				1889
				1890	but this code (X & -A)/A is X >> log2(A) when A is a power of 2, so this case
				1891	should be instcombined into just "a >> 4".
				1892
				1893	We do get this at the codegen level, so something knows about it, but
				1894	instcombine should catch it earlier:
				1895
				1896	_foo: ## @foo
				1897	## BB#0: ## %entry
				1898	movl %edi, %eax
				1899	sarl $4, %eax
				1900	ret
				1901
				1902	//===---------------------------------------------------------------------===//
				1903
Chris Lattner	14cb11d	2010-12-13 00:15:25 +0000	[diff] [blame]	1904	This code (from GCC PR28685):
				1905
				1906	int test(int a, int b) {
				1907	int lt = a < b;
				1908	int eq = a == b;
				1909	if (lt)
				1910	return 1;
				1911	return eq;
				1912	}
				1913
				1914	Is compiled to:
				1915
				1916	define i32 @test(i32 %a, i32 %b) nounwind readnone ssp {
				1917	entry:
				1918	%cmp = icmp slt i32 %a, %b
				1919	br i1 %cmp, label %return, label %if.end
				1920
				1921	if.end: ; preds = %entry
				1922	%cmp5 = icmp eq i32 %a, %b
				1923	%conv6 = zext i1 %cmp5 to i32
				1924	ret i32 %conv6
				1925
				1926	return: ; preds = %entry
				1927	ret i32 1
				1928	}
				1929
				1930	it could be:
				1931
				1932	define i32 @test__(i32 %a, i32 %b) nounwind readnone ssp {
				1933	entry:
				1934	%0 = icmp sle i32 %a, %b
				1935	%retval = zext i1 %0 to i32
				1936	ret i32 %retval
				1937	}
				1938
				1939	//===---------------------------------------------------------------------===//
Duncan Sands	772749a	2011-01-01 20:08:02 +0000	[diff] [blame]	1940
				1941	This compare could fold to false:
				1942
				1943	define i1 @g(i32 a) nounwind readnone {
				1944	%add = shl i32 %a, 1
				1945	%mul = shl i32 %a, 1
				1946	%cmp = icmp ugt i32 %add, %mul
				1947	ret i1 %cmp
				1948	}
				1949
				1950	//===---------------------------------------------------------------------===//
Chris Lattner	6c3fc0a	2011-01-01 22:57:31 +0000	[diff] [blame]	1951
				1952	This code can be seen in viterbi:
				1953
				1954	%64 = call noalias i8* @malloc(i64 %62) nounwind
				1955	...
				1956	%67 = call i64 @llvm.objectsize.i64(i8* %64, i1 false) nounwind
				1957	%68 = call i8* @__memset_chk(i8* %64, i32 0, i64 %62, i64 %67) nounwind
				1958
				1959	llvm.objectsize.i64 should be taught about malloc/calloc, allowing it to
				1960	fold to %62. This is a security win (overflows of malloc will get caught)
				1961	and also a performance win by exposing more memsets to the optimizer.
				1962
				1963	This occurs several times in viterbi.
				1964
				1965	//===---------------------------------------------------------------------===//
				1966
Chris Lattner	73552c2	2011-01-06 07:09:23 +0000	[diff] [blame^]	1967	This code (from Benchmarks/Dhrystone/dry.c):
				1968
				1969	define i32 @Func1(i32, i32) nounwind readnone optsize ssp {
				1970	entry:
				1971	%sext = shl i32 %0, 24
				1972	%conv = ashr i32 %sext, 24
				1973	%sext6 = shl i32 %1, 24
				1974	%conv4 = ashr i32 %sext6, 24
				1975	%cmp = icmp eq i32 %conv, %conv4
				1976	%. = select i1 %cmp, i32 10000, i32 0
				1977	ret i32 %.
				1978	}
				1979
				1980	Should be simplified into something like:
				1981
				1982	define i32 @Func1(i32, i32) nounwind readnone optsize ssp {
				1983	entry:
				1984	%sext = shl i32 %0, 24
				1985	%conv = and i32 %sext, 0xFF000000
				1986	%sext6 = shl i32 %1, 24
				1987	%conv4 = and i32 %sext6, 0xFF000000
				1988	%cmp = icmp eq i32 %conv, %conv4
				1989	%. = select i1 %cmp, i32 10000, i32 0
				1990	ret i32 %.
				1991	}
				1992
				1993	and then to:
				1994
				1995	define i32 @Func1(i32, i32) nounwind readnone optsize ssp {
				1996	entry:
				1997	%conv = and i32 %0, 0xFF
				1998	%conv4 = and i32 %1, 0xFF
				1999	%cmp = icmp eq i32 %conv, %conv4
				2000	%. = select i1 %cmp, i32 10000, i32 0
				2001	ret i32 %.
				2002	}
				2003	//===---------------------------------------------------------------------===//
Chris Lattner	6c3fc0a	2011-01-01 22:57:31 +0000	[diff] [blame]	2004