Target Independent Opportunities:

//===---------------------------------------------------------------------===//

Dead argument elimination should be enhanced to handle cases when an argument is
dead to an externally visible function. Though the argument can't be removed
from the externally visible function, the caller doesn't need to pass it in.
For example in this testcase:

  void foo(int X) __attribute__((noinline));
  void foo(int X) { sideeffect(); }
  void bar(int A) { foo(A+1); }

We compile bar to:

define void @bar(i32 %A) nounwind ssp {
  %0 = add nsw i32 %A, 1                ; <i32> [#uses=1]
  tail call void @foo(i32 %0) nounwind noinline ssp
  ret void
}

The add is dead, we could pass in 'i32 undef' instead. This occurs for C++
templates etc, which usually have linkonce_odr/weak_odr linkage, not internal
linkage.

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :) This isn't
safe in general, even on darwin. See the libm implementation of hypot for
examples (which special case when x/y are exactly zero to get signed zeros etc
right).

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
    movl Y, %eax
    shll $3, %eax
    orl X, %eax
    movl %eax, X
    ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;

into:
 long long tmp = 1;
 for (i = ...; ++i, tmp+=tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

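For example, (loadi32 P) < 0 depends only on the byte holding the sign bit, so
the i32 load could shrink to an i8 load. A hand-written sketch in C (function
names made up; the offset 3 assumes a little-endian target):

int sign_test(int *P) { return *P < 0; }

int sign_test_shrunk(int *P) {
  const signed char *B = (const signed char *)P;
  return B[3] < 0;   /* loads one byte instead of four */
}
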
//===---------------------------------------------------------------------===//

Reassociate should turn things like:

int factorial(int X) {
  return X*X*X*X*X*X*X*X;
}

into llvm.powi calls, allowing the code generator to produce balanced
multiplication trees.

First, the intrinsic needs to be extended to support integers, and second the
code generator needs to be enhanced to lower these to multiplication trees.

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

This is blocked on not handling X*X*X -> powi(X, 3) (see note above). The issue
is that we end up getting t = 2*X  s = t*t  and don't turn this into 4*X*X,
which is the same number of multiplies and is canonical, because the 2*X has
multiple uses. Here's a simple example:

define i32 @test15(i32 %X1) {
  %B = mul i32 %X1, 47   ; X1*47
  %C = mul i32 %B, %B
  ret i32 %C
}


//===---------------------------------------------------------------------===//

Reassociate should handle the example in GCC PR16157:

extern int a0, a1, a2, a3, a4; extern int b0, b1, b2, b3, b4;
void f () {  /* this can be optimized to four additions... */
  b4 = a4 + a3 + a2 + a1 + a0;
  b3 = a3 + a2 + a1 + a0;
  b2 = a2 + a1 + a0;
  b1 = a1 + a0;
}

This requires reassociating to forms of expressions that are already available,
something that reassoc doesn't think about yet.


//===---------------------------------------------------------------------===//

This function: (derived from GCC PR19988)
double foo(double x, double y) {
  return ((x + 0.1234 * y) * (x + -0.1234 * y));
}

compiles to:
_foo:
    movapd %xmm1, %xmm2
    mulsd LCPI1_1(%rip), %xmm1
    mulsd LCPI1_0(%rip), %xmm2
    addsd %xmm0, %xmm1
    addsd %xmm0, %xmm2
    movapd %xmm1, %xmm0
    mulsd %xmm2, %xmm0
    ret

Reassociate should be able to turn it into:

double foo(double x, double y) {
  return ((x + 0.1234 * y) * (x - 0.1234 * y));
}

Which allows the multiply by constant to be CSE'd, producing:

_foo:
    mulsd LCPI1_0(%rip), %xmm1
    movapd %xmm1, %xmm2
    addsd %xmm0, %xmm2
    subsd %xmm1, %xmm0
    mulsd %xmm2, %xmm0
    ret

This doesn't need -ffast-math support at all. This is particularly bad because
the llvm-gcc frontend is canonicalizing the latter into the former, but clang
doesn't have this problem.
197
Chris Lattner74cfb7d2006-03-11 20:20:40 +0000198//===---------------------------------------------------------------------===//
199
Chris Lattner82c78b22006-03-09 20:13:21 +0000200These two functions should generate the same code on big-endian systems:
201
202int g(int *j,int *l) { return memcmp(j,l,4); }
203int h(int *j, int *l) { return *j - *l; }
204
205this could be done in SelectionDAGISel.cpp, along with other special cases,
206for 1,2,4,8 bytes.
207
208//===---------------------------------------------------------------------===//
209
Chris Lattnerc04b4232006-03-22 07:33:46 +0000210It would be nice to revert this patch:
211http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html
212
213And teach the dag combiner enough to simplify the code expanded before
214legalize. It seems plausible that this knowledge would let it simplify other
215stuff too.
216
Chris Lattnere6cd96d2006-03-24 19:59:17 +0000217//===---------------------------------------------------------------------===//
218
For vector types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size. It works but can be overly conservative as the alignment of
specific vector types is target dependent.

//===---------------------------------------------------------------------===//

We should produce an unaligned load from code like this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns. Instead
of:

    movl 136(%esp), %eax
    cmpl $0, %eax
    je LBB16_2      # cond_next
LBB16_1:            # cond_true
    incl _foo
LBB16_2:            # cond_next

emit:
    movl _foo, %eax
    cmpl $1, %edi
    sbbl $-1, %eax
    movl %eax, _foo

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers. See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687

This is now easily doable with MRVs. We could even make an intrinsic for this
if anyone cared enough about sincos.

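A sketch of the combine at the source level (hand-written; assumes the target's
libm provides the sincos entry point declared above):

  double a = sin(x);
  double b = cos(x);

would become:

  double a, b;
  sincos(x, &a, &b);
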
//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

define void @test(i32* %P) {
  %tmp = load i32* %P
  %tmp14 = or i32 %tmp, 3305111552
  %tmp15 = and i32 %tmp14, 3321888767
  store i32 %tmp15, i32* %P
  ret void
}

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but this requires TBAA.

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine (yes, it really is bswap):

unsigned long reverse(unsigned v) {
  unsigned t;
  t = v ^ ((v << 16) | (v >> 16));
  t &= ~0xff0000;
  v = (v << 24) | (v >> 8);
  return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

[LOOP RECOGNITION]

These idioms should be recognized as popcount (see PR1488):

unsigned countbits_slow(unsigned v) {
  unsigned c;
  for (c = 0; v; v >>= 1)
    c += v & 1;
  return c;
}
unsigned countbits_fast(unsigned v){
  unsigned c;
  for (c = 0; v; c++)
    v &= v - 1; // clear the least significant bit set
  return c;
}

BITBOARD = unsigned long long
int PopCnt(register BITBOARD a) {
  register int c=0;
  while(a) {
    c++;
    a &= a - 1;
  }
  return c;
}
unsigned int popcount(unsigned int input) {
  unsigned int count = 0;
  for (unsigned int i = 0; i < 4 * 8; i++)
    count += (input >> i) & 1;
  return count;
}

This is a form of idiom recognition for loops, the same thing that could be
useful for recognizing memset/memcpy.

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

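Roughly, read_16_le should become the equivalent of this hand-written sketch
(name made up; assumes <string.h> and cheap unaligned 16-bit loads on the
target), and read_16_be the same load plus a byte swap on little-endian
hardware:

unsigned short read_16_le_opt(const unsigned char *adr) {
  unsigned short r;
  memcpy(&r, adr, 2);   /* lowers to a single 16-bit load */
  return r;
}
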
//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   icmp pred (sdiv X, C1), C2
when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match. See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.

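For instance (a hand-worked example; assumes the scaled constant doesn't
overflow): in the all-unsigned case,

  (x / 10) < 5

can fold to

  x < 50

because x/10 < 5 holds exactly when x < 50.
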
//===---------------------------------------------------------------------===//

[LOOP RECOGNITION]

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level. We need a "loops to memcpy"
pass.

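That is, recognize copy loops of this shape (sketch; only valid when source and
destination don't overlap and the trip count is loop-invariant):

  for (i = 0; i < n; i++)
    dst[i] = src[i];

and rewrite them as:

  memcpy(dst, src, n * sizeof(dst[0]));
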
//===---------------------------------------------------------------------===//

[LOOP OPTIMIZATION]

SingleSource/Benchmarks/Misc/dt.c shows several interesting optimization
opportunities in its double_array_divs_variable function: it needs loop
interchange, memory promotion (which LICM already does), vectorization and
variable trip count loop unrolling (since it has a constant trip count). ICC
apparently produces this very nice code with -ffast-math:

..B1.70:                # Preds ..B1.70 ..B1.69
    mulpd %xmm0, %xmm1          #108.2
    mulpd %xmm0, %xmm1          #108.2
    mulpd %xmm0, %xmm1          #108.2
    mulpd %xmm0, %xmm1          #108.2
    addl $8, %edx               #
    cmpl $131072, %edx          #108.2
    jb ..B1.70                  # Prob 99%  #108.2

It would be better to count down to zero, but this is a lot better than what we
do.

//===---------------------------------------------------------------------===//

Consider:

typedef unsigned U32;
typedef unsigned long long U64;
int test (U32 *inst, U64 *regs) {
  U64 effective_addr2;
  U32 temp = *inst;
  int r1 = (temp >> 20) & 0xf;
  int b2 = (temp >> 16) & 0xf;
  effective_addr2 = temp & 0xfff;
  if (b2) effective_addr2 += regs[b2];
  b2 = (temp >> 12) & 0xf;
  if (b2) effective_addr2 += regs[b2];
  effective_addr2 &= regs[4];
  if ((effective_addr2 & 3) == 0)
    return 1;
  return 0;
}

Note that only the low 2 bits of effective_addr2 are used. On 32-bit systems,
we don't eliminate the computation of the top half of effective_addr2 because
we don't have whole-function selection dags. On x86, this means we use one
extra register for the function when effective_addr2 is declared as U64 than
when it is declared U32.

PHI Slicing could be extended to do this.

//===---------------------------------------------------------------------===//

LSR should know what GPR types a target has from TargetData. This code:

volatile short X, Y; // globals

void foo(int N) {
  int i;
  for (i = 0; i < N; i++) { X = i; Y = i*4; }
}

produces two near identical IV's (after promotion) on PPC/ARM:

LBB1_2:
    ldr r3, LCPI1_0
    ldr r3, [r3]
    strh r2, [r3]
    ldr r3, LCPI1_1
    ldr r3, [r3]
    strh r1, [r3]
    add r1, r1, #4
    add r2, r2, #1   <- [0,+,1]
    sub r0, r0, #1   <- [0,-,1]
    cmp r0, #0
    bne LBB1_2

LSR should reuse the "+" IV for the exit test.

//===---------------------------------------------------------------------===//

Tail call elim should be more aggressive, checking to see if the call is
followed by an uncond branch to an exit block.

; This testcase is due to tail-duplication not wanting to copy the return
; instruction into the terminating blocks because there was other code
; optimized out of the function after the taildup happened.
; RUN: llvm-as < %s | opt -tailcallelim | llvm-dis | not grep call

define i32 @t4(i32 %a) {
entry:
  %tmp.1 = and i32 %a, 1                ; <i32> [#uses=1]
  %tmp.2 = icmp ne i32 %tmp.1, 0        ; <i1> [#uses=1]
  br i1 %tmp.2, label %then.0, label %else.0

then.0:         ; preds = %entry
  %tmp.5 = add i32 %a, -1               ; <i32> [#uses=1]
  %tmp.3 = call i32 @t4( i32 %tmp.5 )   ; <i32> [#uses=1]
  br label %return

else.0:         ; preds = %entry
  %tmp.7 = icmp ne i32 %a, 0            ; <i1> [#uses=1]
  br i1 %tmp.7, label %then.1, label %return

then.1:         ; preds = %else.0
  %tmp.11 = add i32 %a, -2              ; <i32> [#uses=1]
  %tmp.9 = call i32 @t4( i32 %tmp.11 )  ; <i32> [#uses=1]
  br label %return

return:         ; preds = %then.1, %else.0, %then.0
  %result.0 = phi i32 [ 0, %else.0 ], [ %tmp.3, %then.0 ],
              [ %tmp.9, %then.1 ]
  ret i32 %result.0
}

//===---------------------------------------------------------------------===//

Tail recursion elimination should handle:

int pow2m1(int n) {
  if (n == 0)
    return 0;
  return 2 * pow2m1 (n - 1) + 1;
}

Also, multiplies can be turned into SHL's, so they should be handled as if
they were associative. "return foo() << 1" can be tail recursion eliminated.

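Because f(n) = 2*f(n-1) + 1 has the form a*f(n-1) + b, it can be computed
iteratively with a pair of accumulators. A hand-written sketch of the
transformed function (not what any current pass produces):

int pow2m1(int n) {
  int mul_acc = 1, add_acc = 0;   /* result == mul_acc * f(0) + add_acc */
  while (n != 0) {
    add_acc = 2 * add_acc + 1;    /* fold one level of "*2 + 1" in */
    mul_acc = 2 * mul_acc;
    n = n - 1;
  }
  return mul_acc * 0 + add_acc;   /* base case: f(0) == 0 */
}
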
//===---------------------------------------------------------------------===//

Argument promotion should promote arguments for recursive functions, like
this:

; RUN: llvm-as < %s | opt -argpromotion | llvm-dis | grep x.val

define internal i32 @foo(i32* %x) {
entry:
  %tmp = load i32* %x                   ; <i32> [#uses=0]
  %tmp.foo = call i32 @foo( i32* %x )   ; <i32> [#uses=1]
  ret i32 %tmp.foo
}

define i32 @bar(i32* %x) {
entry:
  %tmp3 = call i32 @foo( i32* %x )      ; <i32> [#uses=1]
  ret i32 %tmp3
}

//===---------------------------------------------------------------------===//

We should investigate an instruction sinking pass. Consider this silly
example in pic mode:

#include <assert.h>
void foo(int x) {
  assert(x);
  //...
}

we compile this to:
_foo:
    subl $28, %esp
    call "L1$pb"
"L1$pb":
    popl %eax
    cmpl $0, 32(%esp)
    je LBB1_2       # cond_true
LBB1_1:             # return
    # ...
    addl $28, %esp
    ret
LBB1_2:             # cond_true
...

The PIC base computation (call+popl) is only used on one path through the
code, but is currently always computed in the entry block. It would be
better to sink the picbase computation down into the block for the
assertion, as it is the only one that uses it. This happens for a lot of
code with early outs.

Another example is loads of arguments, which are usually emitted into the
entry block on targets like x86. If not used in all paths through a
function, they should be sunk into the ones that do.

In this case, whole-function-isel would also handle this.

//===---------------------------------------------------------------------===//

Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html

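A sketch of the idea (constants hand-picked for illustration; a real lowering
would search for a collision-free hash function): for a switch on {1, 100,
10000}, x % 7 happens to be collision-free, so dispatch becomes one hash, one
table load, and one compare. Empty slots hold a key that cannot hash to them,
so no separate "occupied" flag is needed:

  /* switch (x) { case 1: ... case 100: ... case 10000: ... default: ... } */
  static const int keys[7] = { 1, 1, 100, 1, 10000, 1, 1 };
  unsigned h = (unsigned)x % 7;    /* 1 -> 1, 100 -> 2, 10000 -> 4 */
  if (keys[h] == x) {
    /* jump through a 7-entry table indexed by h */
  } else {
    /* default */
  }
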
//===---------------------------------------------------------------------===//

We should turn things like "load+fabs+store" and "load+fneg+store" into the
corresponding integer operations. On a yonah, this loop:

double a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];
}

is twice as slow as this loop:

long long a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] ^= (1ULL << 63);
}

and I suspect other processors are similar. On X86 in particular this is a
big win because doing this with integers allows the use of read/modify/write
instructions.

//===---------------------------------------------------------------------===//

DAG Combiner should try to combine small loads into larger loads when
profitable. For example, we compile this C++ example:

struct THotKey { short Key; bool Control; bool Shift; bool Alt; };
extern THotKey m_HotKey;
THotKey GetHotKey () { return m_HotKey; }

into (-O3 -fno-exceptions -static -fomit-frame-pointer):

__Z9GetHotKeyv:
    pushl %esi
    movl 8(%esp), %eax
    movb _m_HotKey+3, %cl
    movb _m_HotKey+4, %dl
    movb _m_HotKey+2, %ch
    movw _m_HotKey, %si
    movw %si, (%eax)
    movb %ch, 2(%eax)
    movb %cl, 3(%eax)
    movb %dl, 4(%eax)
    popl %esi
    ret $4

GCC produces:

__Z9GetHotKeyv:
    movl _m_HotKey, %edx
    movl 4(%esp), %eax
    movl %edx, (%eax)
    movzwl _m_HotKey+4, %edx
    movw %dx, 4(%eax)
    ret $4

The LLVM IR contains the needed alignment info, so we should be able to
merge the loads and stores into 4-byte loads:

  %struct.THotKey = type { i16, i8, i8, i8 }
define void @_Z9GetHotKeyv(%struct.THotKey* sret %agg.result) nounwind {
...
  %tmp2 = load i16* getelementptr (@m_HotKey, i32 0, i32 0), align 8
  %tmp5 = load i8* getelementptr (@m_HotKey, i32 0, i32 1), align 2
  %tmp8 = load i8* getelementptr (@m_HotKey, i32 0, i32 2), align 1
  %tmp11 = load i8* getelementptr (@m_HotKey, i32 0, i32 3), align 2

Alternatively, we should use a small amount of base-offset alias analysis
to make it so the scheduler doesn't need to hold all the loads in regs at
once.

//===---------------------------------------------------------------------===//

We should add an FRINT node to the DAG to model targets that have legal
implementations of ceil/floor/rint.

//===---------------------------------------------------------------------===//

Consider:

int test() {
  long long input[8] = {1,1,1,1,1,1,1,1};
  foo(input);
}

We currently compile this into a memcpy from a global array since the
initializer is fairly large and not memset'able. This is good, but the memcpy
gets lowered to load/stores in the code generator. This is also ok, except
that the codegen lowering for memcpy doesn't handle the case when the source
is a constant global. This gives us atrocious code like this:

    call "L1$pb"
"L1$pb":
    popl %eax
    movl _C.0.1444-"L1$pb"+32(%eax), %ecx
    movl %ecx, 40(%esp)
    movl _C.0.1444-"L1$pb"+20(%eax), %ecx
    movl %ecx, 28(%esp)
    movl _C.0.1444-"L1$pb"+36(%eax), %ecx
    movl %ecx, 44(%esp)
    movl _C.0.1444-"L1$pb"+44(%eax), %ecx
    movl %ecx, 52(%esp)
    movl _C.0.1444-"L1$pb"+40(%eax), %ecx
    movl %ecx, 48(%esp)
    movl _C.0.1444-"L1$pb"+12(%eax), %ecx
    movl %ecx, 20(%esp)
    movl _C.0.1444-"L1$pb"+4(%eax), %ecx
...

instead of:
    movl $1, 16(%esp)
    movl $0, 20(%esp)
    movl $1, 24(%esp)
    movl $0, 28(%esp)
    movl $1, 32(%esp)
    movl $0, 36(%esp)
    ...

//===---------------------------------------------------------------------===//

http://llvm.org/PR717:

The following code should compile into "ret int undef". Instead, LLVM
produces "ret int 0":

int f() {
  int x = 4;
  int y;
  if (x == 3) y = 0;
  return y;
}

//===---------------------------------------------------------------------===//

The loop unroller should partially unroll loops (instead of peeling them)
when code growth isn't too bad and when an unroll count allows simplification
of some code within the loop. One trivial example is:

#include <stdio.h>
int main() {
  int nRet = 17;
  int nLoop;
  for ( nLoop = 0; nLoop < 1000; nLoop++ ) {
    if ( nLoop & 1 )
      nRet += 2;
    else
      nRet -= 1;
  }
  return nRet;
}

Unrolling by 2 would eliminate the '&1' in both copies, leading to a net
reduction in code size. The resultant code would then also be suitable for
exit value computation.

//===---------------------------------------------------------------------===//

We miss a bunch of rotate opportunities on various targets, including ppc, x86,
etc. On X86, we miss a bunch of 'rotate by variable' cases because the rotate
matching code in dag combine doesn't look through truncates aggressively
enough. Here are some testcases reduced from GCC PR17886:

unsigned long long f(unsigned long long x, int y) {
  return (x << y) | (x >> 64-y);
}
unsigned f2(unsigned x, int y){
  return (x << y) | (x >> 32-y);
}
unsigned long long f3(unsigned long long x){
  int y = 9;
  return (x << y) | (x >> 64-y);
}
unsigned f4(unsigned x){
  int y = 10;
  return (x << y) | (x >> 32-y);
}
unsigned long long f5(unsigned long long x, unsigned long long y) {
  return (x << 8) | ((y >> 48) & 0xffull);
}
unsigned long long f6(unsigned long long x, unsigned long long y, int z) {
  switch(z) {
  case 1:
    return (x << 8) | ((y >> 48) & 0xffull);
  case 2:
    return (x << 16) | ((y >> 40) & 0xffffull);
  case 3:
    return (x << 24) | ((y >> 32) & 0xffffffull);
  case 4:
    return (x << 32) | ((y >> 24) & 0xffffffffull);
  default:
    return (x << 40) | ((y >> 16) & 0xffffffffffull);
  }
}

On X86-64, we only handle f2/f3/f4 right. On x86-32, a few of these
generate truly horrible code, instead of using shld and friends. On
ARM, we end up with calls to L___lshrdi3/L___ashldi3 in f, which is
badness. PPC64 misses f, f5 and f6. CellSPU aborts in isel.

//===---------------------------------------------------------------------===//

We do a number of simplifications in simplify libcalls to strength reduce
standard library functions, but we don't currently merge them together. For
example, it is useful to merge memcpy(a,b,strlen(b)) -> strcpy. This can only
be done safely if "b" isn't modified between the strlen and memcpy of course.

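For instance, including the extra byte for the nul terminator makes the two
forms byte-for-byte equivalent (sketch; assumes "b" is unmodified in between
and the buffers don't overlap):

  memcpy(a, b, strlen(b) + 1);   /* writes the same bytes as strcpy(a, b) */
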
//===---------------------------------------------------------------------===//

We compile this program: (from GCC PR11680)
http://gcc.gnu.org/bugzilla/attachment.cgi?id=4487

Into code that runs the same speed in fast/slow modes, but both modes run 2x
slower than when compiled with GCC (either 4.0 or 4.2):

$ llvm-g++ perf.cpp -O3 -fno-exceptions
$ time ./a.out fast
1.821u 0.003s 0:01.82 100.0%    0+0k 0+0io 0pf+0w

$ g++ perf.cpp -O3 -fno-exceptions
$ time ./a.out fast
0.821u 0.001s 0:00.82 100.0%    0+0k 0+0io 0pf+0w

It looks like we are making the same inlining decisions, so this may be raw
codegen badness or something else (haven't investigated).

//===---------------------------------------------------------------------===//

We miss some instcombines for stuff like this:
void bar (void);
void foo (unsigned int a) {
  /* This one is equivalent to a >= (3 << 2). */
  if ((a >> 2) >= 3)
    bar ();
}

A few other related ones are in GCC PR14753.

//===---------------------------------------------------------------------===//

Divisibility by constant can be simplified (according to GCC PR12849) from
being a mulhi to being a mul lo (cheaper). Testcase:

void bar(unsigned n) {
  if (n % 3 == 0)
    true();
}

This is equivalent to the following, where 2863311531 is the multiplicative
inverse of 3, and 1431655766 is ((2^32)-1)/3+1:
void bar(unsigned n) {
  if (n * 2863311531U < 1431655766U)
    true();
}

The same transformation can work with an even modulo with the addition of a
rotate: rotate the result of the multiply to the right by the number of bits
which need to be zero for the condition to be true, and shrink the compare RHS
by the same amount. Unless the target supports rotates, though, that
transformation probably isn't worthwhile.

The transformation can also easily be made to work with non-zero equality
comparisons: just transform, for example, "n % 3 == 1" to "(n-1) % 3 == 0".

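A sketch for an even modulus (constants hand-computed for this example):
testing n % 6 == 0 multiplies by the inverse of the odd part (3), then rotates
right by one bit for the factor of 2:

void bar(unsigned n) {
  unsigned q = n * 2863311531U;        /* multiplicative inverse of 3 */
  unsigned r = (q >> 1) | (q << 31);   /* rotate right by 1 */
  if (r < 715827883U)                  /* ((2^32)-1)/6+1 */
    true();
}
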
//===---------------------------------------------------------------------===//

Better mod/ref analysis for scanf would allow us to eliminate the vtable and a
bunch of other stuff from this example (see PR1604):

#include <cstdio>
struct test {
  int val;
  virtual ~test() {}
};

int main() {
  test t;
  std::scanf("%d", &t.val);
  std::printf("%d\n", t.val);
}

//===---------------------------------------------------------------------===//

These functions perform the same computation, but produce different assembly.

define i8 @select(i8 %x) readnone nounwind {
  %A = icmp ult i8 %x, 250
  %B = select i1 %A, i8 0, i8 1
  ret i8 %B
}

define i8 @addshr(i8 %x) readnone nounwind {
  %A = zext i8 %x to i9
  %B = add i9 %A, 6     ;; 256 - 250 == 6
  %C = lshr i9 %B, 8
  %D = trunc i9 %C to i8
  ret i8 %D
}

//===---------------------------------------------------------------------===//

From gcc bug 24696:
int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a & (c - 1)) != 0) || ((b & (c - 1)) != 0);
}
int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a & (c - 1)) != 0) | ((b & (c - 1)) != 0);
}
Both should combine to ((a|b) & (c-1)) != 0. Currently not optimized with
"clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 20192:
#define PMD_MASK    (~((1UL << 23) - 1))
void clear_pmd_range(unsigned long start, unsigned long end)
{
  if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK))
    f();
}
The expression should optimize to something like
"!((start|end)&~PMD_MASK)". Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 3756:
int
pn (int n)
{
  return (n >= 0 ? 1 : -1);
}
Should combine to (n >> 31) | 1. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts | llc".

//===---------------------------------------------------------------------===//

void a(int variable)
{
  if (variable == 4 || variable == 6)
    bar();
}
This should optimize to "if ((variable | 2) == 6)". Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts | llc".

//===---------------------------------------------------------------------===//

unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return i;}
unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
These should combine to the same thing. Currently, the first function
produces better code on X86.

//===---------------------------------------------------------------------===//

From GCC Bug 15784:
#define abs(x) x>0?x:-x
int f(int x, int y)
{
  return (abs(x)) >= 0;
}
This should optimize to x != INT_MIN. (With -fwrapv.) Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 14753:
void
rotate_cst (unsigned int a)
{
  a = (a << 10) | (a >> 22);
  if (a == 123)
    bar ();
}
void
minus_cst (unsigned int a)
{
  unsigned int tem;

  tem = 20 - a;
  if (tem == 5)
    bar ();
}
void
mask_gt (unsigned int a)
{
  /* This is equivalent to a > 15. */
  if ((a & ~7) > 8)
    bar ();
}
void
rshift_gt (unsigned int a)
{
  /* This is equivalent to a > 23. */
  if ((a >> 2) > 5)
    bar ();
}
All should simplify to a single comparison. All of these are
currently not optimized with "clang -emit-llvm-bc | opt
-std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 32605:
int c(int* x) {return (char*)x+2 == (char*)x;}
Should combine to 0. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts" (although llc can optimize it).

//===---------------------------------------------------------------------===//

int a(unsigned b) {return ((b << 31) | (b << 30)) >> 31;}
Should be combined to "((b >> 1) | b) & 1". Currently not optimized
with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(unsigned x, unsigned y) { return x | (y & 1) | (y & 2);}
Should combine to "x | (y & 3)". Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (~a & c) | ((c|a) & b);}
Should fold to "(~a & c) | (a & b)". Currently not optimized with
"clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a,int b) {return (~(a|b))|a;}
Should fold to "a|~b". Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b) {return (a&&b) || (a&&!b);}
Should fold to "a". Currently not optimized with "clang -emit-llvm-bc
| opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (a&&b) || (!a&&c);}
Should fold to "a ? b : c", or at least something sane. Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (a&&b) || (a&&c) || (a&&b&&c);}
Should fold to a && (b || c). Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return x | ((x & 8) ^ 8);}
Should combine to x | 8. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return x ^ ((x & 8) ^ 8);}
Should also combine to x | 8. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return (x & 8) == 0 ? -1 : -9;}
Should combine to (x | -9) ^ 8. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return (x & 8) == 0 ? -9 : -1;}
Should combine to x | -9. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return ((x | -9) ^ 8) & x;}
Should combine to x & -9. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(unsigned a) {return a * 0x11111111 >> 28 & 1;}
Should combine to "a * 0x88888888 >> 31". Currently not optimized
with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(char* x) {if ((*x & 32) == 0) return b();}
There's an unnecessary zext in the generated code with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(unsigned long long x) {return 40 * (x >> 1);}
Should combine to "20 * (((unsigned)x) & -2)". Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

This was noticed in the entryblock for grokdeclarator in 403.gcc:

  %tmp = icmp eq i32 %decl_context, 4
  %decl_context_addr.0 = select i1 %tmp, i32 3, i32 %decl_context
  %tmp1 = icmp eq i32 %decl_context_addr.0, 1
  %decl_context_addr.1 = select i1 %tmp1, i32 0, i32 %decl_context_addr.0

tmp1 should be simplified to something like:
  (!tmp || decl_context == 1)

This allows recursive simplifications, tmp1 is used all over the place in
the function, e.g. by:

  %tmp23 = icmp eq i32 %decl_context_addr.1, 0          ; <i1> [#uses=1]
  %tmp24 = xor i1 %tmp1, true                           ; <i1> [#uses=1]
  %or.cond8 = and i1 %tmp23, %tmp24                     ; <i1> [#uses=1]

later.

//===---------------------------------------------------------------------===//

[STORE SINKING]

Store sinking: This code:

void f (int n, int *cond, int *res) {
  int i;
  *res = 0;
  for (i = 0; i < n; i++)
    if (*cond)
      *res ^= 234; /* (*) */
}

On this function GVN hoists the fully redundant value of *res, but nothing
moves the store out. This gives us this code:

bb:             ; preds = %bb2, %entry
  %.rle = phi i32 [ 0, %entry ], [ %.rle6, %bb2 ]
  %i.05 = phi i32 [ 0, %entry ], [ %indvar.next, %bb2 ]
  %1 = load i32* %cond, align 4
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %bb2, label %bb1

bb1:            ; preds = %bb
  %3 = xor i32 %.rle, 234
  store i32 %3, i32* %res, align 4
  br label %bb2

bb2:            ; preds = %bb, %bb1
  %.rle6 = phi i32 [ %3, %bb1 ], [ %.rle, %bb ]
  %indvar.next = add i32 %i.05, 1
  %exitcond = icmp eq i32 %indvar.next, %n
  br i1 %exitcond, label %return, label %bb

DSE should sink partially dead stores to get the store out of the loop.

Here's another partial dead case:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395

//===---------------------------------------------------------------------===//

Scalar PRE hoists the mul in the common block up to the else:

int test (int a, int b, int c, int g) {
  int d, e;
  if (a)
    d = b * c;
  else
    d = b - c;
  e = b * c + g;
  return d + e;
}

It would be better to do the mul once to reduce codesize above the if.
This is GCC PR38204.

//===---------------------------------------------------------------------===//

[STORE SINKING]

GCC PR37810 is an interesting case where we should sink load/store reload
into the if block and outside the loop, so we don't reload/store it on the
non-call path.

for () {
  *P += 1;
  if ()
    call();
  else
    ...
->
tmp = *P
for () {
  tmp += 1;
  if () {
    *P = tmp;
    call();
    tmp = *P;
  } else ...
}
*P = tmp;

We now hoist the reload after the call (Transforms/GVN/lpre-call-wrap.ll), but
we don't sink the store. We need partially dead store sinking.

//===---------------------------------------------------------------------===//

[LOAD PRE CRIT EDGE SPLITTING]

GCC PR37166: Sinking of loads prevents SROA'ing the "g" struct on the stack
leading to excess stack traffic. This could be handled by GVN with some crazy
symbolic phi translation. The code we get looks like (g is on the stack):

bb2:            ; preds = %bb1
..
  %9 = getelementptr %struct.f* %g, i32 0, i32 0
  store i32 %8, i32* %9, align 4
  br label %bb3

bb3:            ; preds = %bb1, %bb2, %bb
  %c_addr.0 = phi %struct.f* [ %g, %bb2 ], [ %c, %bb ], [ %c, %bb1 ]
  %b_addr.0 = phi %struct.f* [ %b, %bb2 ], [ %g, %bb ], [ %b, %bb1 ]
  %10 = getelementptr %struct.f* %c_addr.0, i32 0, i32 0
  %11 = load i32* %10, align 4

%11 is partially redundant, and in BB2 it should have the value %8.

GCC PR33344 and PR35287 are similar cases.


//===---------------------------------------------------------------------===//

[LOAD PRE]

There are many load PRE testcases in testsuite/gcc.dg/tree-ssa/loadpre* in the
GCC testsuite, ones we don't get yet are (checked through loadpre25):

[CRIT EDGE BREAKING]
loadpre3.c predcom-4.c

[PRE OF READONLY CALL]
loadpre5.c

[TURN SELECT INTO BRANCH]
loadpre14.c loadpre15.c

actually a conditional increment: loadpre18.c loadpre19.c


//===---------------------------------------------------------------------===//

[SCALAR PRE]
There are many PRE testcases in testsuite/gcc.dg/tree-ssa/ssa-pre-*.c in the
GCC testsuite.

//===---------------------------------------------------------------------===//

There are some interesting cases in testsuite/gcc.dg/tree-ssa/pred-comm* in the
GCC testsuite. For example, we get the first example in predcom-1.c, but
miss the second one:

unsigned fib[1000];
unsigned avg[1000];

__attribute__ ((noinline))
void count_averages(int n) {
  int i;
  for (i = 1; i < n; i++)
    avg[i] = (((unsigned long) fib[i - 1] + fib[i] + fib[i + 1]) / 3) & 0xffff;
}

which compiles into two loads instead of one in the loop.

predcom-2.c is the same as predcom-1.c

predcom-3.c is very similar but needs loads feeding each other instead of
store->load.


//===---------------------------------------------------------------------===//

[ALIAS ANALYSIS]

Type based alias analysis:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14705

We should do better analysis of posix_memalign. At the least it should
no-capture its pointer argument, at best, we should know that the out-value
result doesn't point to anything (like malloc). One example of this is in
SingleSource/Benchmarks/Misc/dt.c

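A sketch of the first piece as an IR declaration (hand-written; i32 stands in
for a 32-bit target's int and size_t):

  declare i32 @posix_memalign(i8** nocapture, i32, i32)
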
//===---------------------------------------------------------------------===//

A/B get pinned to the stack because we turn an if/then into a select instead
of PRE'ing the load/store. This may be fixable in instcombine:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37892

struct X { int i; };
int foo (int x) {
  struct X a;
  struct X b;
  struct X *p;
  a.i = 1;
  b.i = 2;
  if (x)
    p = &a;
  else
    p = &b;
  return p->i;
}

//===---------------------------------------------------------------------===//

Interesting missed case because of control flow flattening (should be 2 loads):
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26629
With: llvm-gcc t2.c -S -o - -O0 -emit-llvm | llvm-as |
      opt -mem2reg -gvn -instcombine | llvm-dis
we miss it because we need 1) CRIT EDGE 2) MULTIPLE DIFFERENT
VALS PRODUCED BY ONE BLOCK OVER DIFFERENT PATHS

//===---------------------------------------------------------------------===//

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19633
We could eliminate the branch condition here, loading from null is undefined:

struct S { int w, x, y, z; };
struct T { int r; struct S s; };
void bar (struct S, int);
void foo (int a, struct T b)
{
  struct S *c = 0;
  if (a)
    c = &b.s;
  bar (*c, a);
}

//===---------------------------------------------------------------------===//

simplifylibcalls should do several optimizations for strspn/strcspn:

strcspn(x, "") -> strlen(x)
strcspn("", x) -> 0
strspn("", x) -> 0
strspn(x, "") -> strlen(x)
strspn(x, "a") -> strchr(x, 'a')-x

strcspn(x, "a") -> inlined loop for up to 3 letters (similarly for strspn):

size_t __strcspn_c3 (__const char *__s, int __reject1, int __reject2,
                     int __reject3) {
  register size_t __result = 0;
  while (__s[__result] != '\0' && __s[__result] != __reject1 &&
         __s[__result] != __reject2 && __s[__result] != __reject3)
    ++__result;
  return __result;
}

This should turn into a switch on the character. See PR3253 for some notes on
codegen.

456.hmmer apparently uses strcspn and strspn a lot. 471.omnetpp uses strspn.

//===---------------------------------------------------------------------===//

"gas" uses this idiom:
  else if (strchr ("+-/*%|&^:[]()~", *intel_parser.op_string))
..
  else if (strchr ("<>", *intel_parser.op_string)

Those should be turned into a switch.

//===---------------------------------------------------------------------===//

252.eon contains this interesting code:

  %3072 = getelementptr [100 x i8]* %tempString, i32 0, i32 0
  %3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
  %strlen = call i32 @strlen(i8* %3072)    ; uses = 1
  %endptr = getelementptr [100 x i8]* %tempString, i32 0, i32 %strlen
  call void @llvm.memcpy.i32(i8* %endptr,
    i8* getelementptr ([5 x i8]* @"\01LC42", i32 0, i32 0), i32 5, i32 1)
  %3074 = call i32 @strlen(i8* %endptr) nounwind readonly

This is interesting for a couple reasons. First, in this:

  %3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
  %strlen = call i32 @strlen(i8* %3072)

The strlen could be replaced with: %strlen = sub %3072, %3073, because the
strcpy call returns a pointer to the end of the string. Based on that, the
endptr GEP just becomes equal to 3073, which eliminates a strlen call and GEP.

Second, the memcpy+strlen strlen can be replaced with:

  %3074 = call i32 @strlen([5 x i8]* @"\01LC42") nounwind readonly

Because the destination was just copied into the specified memory buffer. This,
in turn, can be constant folded to "4".

In other code, it contains:

  %endptr6978 = bitcast i8* %endptr69 to i32*
  store i32 7107374, i32* %endptr6978, align 1
  %3167 = call i32 @strlen(i8* %endptr69) nounwind readonly

Which could also be constant folded. Whatever is producing this should probably
be fixed to leave this as a memcpy from a string.

Further, eon also has an interesting partially redundant strlen call:

bb8:            ; preds = %_ZN18eonImageCalculatorC1Ev.exit
  %682 = getelementptr i8** %argv, i32 6        ; <i8**> [#uses=2]
  %683 = load i8** %682, align 4                ; <i8*> [#uses=4]
  %684 = load i8* %683, align 1                 ; <i8> [#uses=1]
  %685 = icmp eq i8 %684, 0                     ; <i1> [#uses=1]
  br i1 %685, label %bb10, label %bb9

bb9:            ; preds = %bb8
  %686 = call i32 @strlen(i8* %683) nounwind readonly
  %687 = icmp ugt i32 %686, 254                 ; <i1> [#uses=1]
  br i1 %687, label %bb10, label %bb11

bb10:           ; preds = %bb9, %bb8
  %688 = call i32 @strlen(i8* %683) nounwind readonly

This could be eliminated by doing the strlen once in bb8, saving code size and
improving perf on the bb8->9->10 path.

//===---------------------------------------------------------------------===//

I see an interesting fully redundant call to strlen left in 186.crafty:InputMove
which looks like:
  %movetext11 = getelementptr [128 x i8]* %movetext, i32 0, i32 0


bb62:           ; preds = %bb55, %bb53
  %promote.0 = phi i32 [ %169, %bb55 ], [ 0, %bb53 ]
  %171 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
  %172 = add i32 %171, -1                       ; <i32> [#uses=1]
  %173 = getelementptr [128 x i8]* %movetext, i32 0, i32 %172

... no stores ...
  br i1 %or.cond, label %bb65, label %bb72

bb65:           ; preds = %bb62
  store i8 0, i8* %173, align 1
  br label %bb72

bb72:           ; preds = %bb65, %bb62
  %trank.1 = phi i32 [ %176, %bb65 ], [ -1, %bb62 ]
  %177 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1

Note that on the bb62->bb72 path, the %177 strlen call is partially
1440redundant with the %171 call. At worst, we could shove the %177 strlen call
1441up into the bb65 block moving it out of the bb62->bb72 path. However, note
1442that bb65 stores to the string, zeroing out the last byte. This means that on
1443that path the value of %177 is actually just %171-1. A sub is cheaper than a
1444strlen!
1445
1446This pattern repeats several times, basically doing:
1447
1448 A = strlen(P);
1449 P[A-1] = 0;
1450 B = strlen(P);
1451 where it is "obvious" that B = A-1.

//===---------------------------------------------------------------------===//

186.crafty contains this interesting pattern:

%77 = call i8* @strstr(i8* getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0),
                       i8* %30)
%phitmp648 = icmp eq i8* %77, getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0)
br i1 %phitmp648, label %bb70, label %bb76

bb70:           ; preds = %OptionMatch.exit91, %bb69
  %78 = call i32 @strlen(i8* %30) nounwind readonly align 1 ; <i32> [#uses=1]

This is basically:
  cststr = "abcdef";
  if (strstr(cststr, P) == cststr) {
    x = strlen(P);
    ...

The strstr call would be significantly cheaper written as:

  cststr = "abcdef";
  if (memcmp(cststr, P, strlen(P)) == 0)
    x = strlen(P);

This is memcmp+strlen instead of strstr. This also makes the strlen fully
redundant.

//===---------------------------------------------------------------------===//

186.crafty also contains this code:

%1906 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
%1907 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1906
%1908 = call i8* @strcpy(i8* %1907, i8* %1905) nounwind align 1
%1909 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
%1910 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1909

The last strlen is computable as %1908-@pgn_event, which means %1910 = %1908.
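
In C terms the pattern is an append (a sketch; the helper and s are invented):

#include <string.h>

static char pgn_event[32];

void append(const char *s) {          /* s plays the role of %1905 */
  size_t len = strlen(pgn_event);     /* %1906 */
  strcpy(pgn_event + len, s);         /* %1907, %1908 */
  size_t end = strlen(pgn_event);     /* %1909: computable as len + strlen(s),
                                         no need to rescan the buffer */
  (void)end;
}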

//===---------------------------------------------------------------------===//

186.crafty has this interesting pattern with the "out.4543" variable:

call void @llvm.memcpy.i32(
  i8* getelementptr ([10 x i8]* @out.4543, i32 0, i32 0),
  i8* getelementptr ([7 x i8]* @"\01LC28700", i32 0, i32 0), i32 7, i32 1)
%101 = call i32 @printf(i8* ... @out.4543, i32 0, i32 0)) nounwind

It is basically doing:

  memcpy(globalarray, "string");
  printf(..., globalarray);

Anyway, since printf just reads the memory, forward substituting the string
directly into the printf call eliminates the reads from globalarray. Since
this pattern occurs frequently in crafty (due to the "DisplayTime" and other
similar functions) there are many stores to "out". Once all the printfs stop
using "out", all that is left is the memcpy's into it. This should allow
globalopt to remove the "stored only" global.
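
The effect, in a hedged C sketch (names invented):

#include <stdio.h>
#include <string.h>

static char out[10];                 /* stands in for @out.4543 */

void before(void) {
  memcpy(out, "string", 7);
  printf("%s", out);                 /* reads the global */
}

void after(void) {
  memcpy(out, "string", 7);          /* dead once no readers remain */
  printf("%s", "string");            /* literal forward-substituted */
}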

//===---------------------------------------------------------------------===//

This code:

define inreg i32 @foo(i8* inreg %p) nounwind {
  %tmp0 = load i8* %p
  %tmp1 = ashr i8 %tmp0, 5
  %tmp2 = sext i8 %tmp1 to i32
  ret i32 %tmp2
}

could be dagcombine'd to a sign-extending load with a shift.
For example, on x86 this currently gets this:

        movb    (%eax), %al
        sarb    $5, %al
        movsbl  %al, %eax

while it could get this:

        movsbl  (%eax), %eax
        sarl    $5, %eax

//===---------------------------------------------------------------------===//

GCC PR31029:

int test(int x) { return 1-x == x; }     // --> return false
int test2(int x) { return 2-x == x; }    // --> return x == 1 ?

Always foldable to false for odd constants: C-x == x requires 2*x == C, which
has no solution mod 2^32 when C is odd. For even C, 2*x == C has two solutions
mod 2^32 (C/2 and C/2 + 2^31), so a single equality isn't enough; what is the
right expansion?

//===---------------------------------------------------------------------===//

PR 3381: GEP to field of size 0 inside a struct could be turned into a GEP
for the next field in the struct (which is at the same address).

For example: a store of float into { {{}}, float } could be turned into a
store to the float directly.

//===---------------------------------------------------------------------===//

#include <math.h>
double foo(double a) { return sin(a); }

On x86-64 Linux, this currently compiles to:

foo:
        subq    $8, %rsp
        call    sin
        addq    $8, %rsp
        ret
vs:

foo:
        jmp     sin

//===---------------------------------------------------------------------===//

The arg promotion pass should make use of nocapture to make its alias analysis
stuff much more precise.

//===---------------------------------------------------------------------===//

The following functions should be optimized to use a select instead of a
branch (from gcc PR40072):

char char_int(int m) {if(m>7) return 0; return m;}
int int_char(char m) {if(m>7) return 0; return m;}
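
In source terms, the selectable form is simply (a sketch):

char char_int_sel(int m) { return m > 7 ? 0 : m; }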

//===---------------------------------------------------------------------===//

int func(int a, int b) { if (a & 0x80) b |= 0x80; else b &= ~0x80; return b; }

Generates this:

define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = and i32 %a, 128                            ; <i32> [#uses=1]
  %1 = icmp eq i32 %0, 0                          ; <i1> [#uses=1]
  %2 = or i32 %b, 128                             ; <i32> [#uses=1]
  %3 = and i32 %b, -129                           ; <i32> [#uses=1]
  %b_addr.0 = select i1 %1, i32 %3, i32 %2        ; <i32> [#uses=1]
  ret i32 %b_addr.0
}

However, it's functionally equivalent to:

  b = (b & ~0x80) | (a & 0x80);

Which generates this:

define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = and i32 %b, -129                           ; <i32> [#uses=1]
  %1 = and i32 %a, 128                            ; <i32> [#uses=1]
  %2 = or i32 %0, %1                              ; <i32> [#uses=1]
  ret i32 %2
}

This can be generalized for other forms:

  b = (b & ~0x80) | (a & 0x40) << 1;

//===---------------------------------------------------------------------===//

These two functions produce different code. They shouldn't:

#include <stdint.h>

uint8_t p1(uint8_t b, uint8_t a) {
  b = (b & ~0xc0) | (a & 0xc0);
  return (b);
}

uint8_t p2(uint8_t b, uint8_t a) {
  b = (b & ~0x40) | (a & 0x40);
  b = (b & ~0x80) | (a & 0x80);
  return (b);
}

define zeroext i8 @p1(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63                              ; <i8> [#uses=1]
  %1 = and i8 %a, -64                             ; <i8> [#uses=1]
  %2 = or i8 %1, %0                               ; <i8> [#uses=1]
  ret i8 %2
}

define zeroext i8 @p2(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63                              ; <i8> [#uses=1]
  %.masked = and i8 %a, 64                        ; <i8> [#uses=1]
  %1 = and i8 %a, -128                            ; <i8> [#uses=1]
  %2 = or i8 %1, %0                               ; <i8> [#uses=1]
  %3 = or i8 %2, %.masked                         ; <i8> [#uses=1]
  ret i8 %3
}

//===---------------------------------------------------------------------===//

IPSCCP does not currently propagate argument-dependent constants through
functions where it cannot see all of the callers. This includes functions
with normal external linkage as well as templates, C99 inline functions etc.
Specifically, it does nothing to:

define i32 @test(i32 %x, i32 %y, i32 %z) nounwind {
entry:
  %0 = add nsw i32 %y, %z
  %1 = mul i32 %0, %x
  %2 = mul i32 %y, %z
  %3 = add nsw i32 %1, %2
  ret i32 %3
}

define i32 @test2() nounwind {
entry:
  %0 = call i32 @test(i32 1, i32 2, i32 4) nounwind
  ret i32 %0
}

It would be interesting to extend IPSCCP to handle simple cases like this,
where all of the arguments to a call are constant. Because IPSCCP runs before
inlining, trivial templates and inline functions are not yet inlined. The
results for a function + set of constant arguments should be memoized in a
map. (Here, @test2 would then fold to "ret i32 14".)

//===---------------------------------------------------------------------===//

The libcall constant folding stuff should be moved out of SimplifyLibcalls into
libanalysis' constantfolding logic. This would allow IPSCCP to be able to
handle simple things like this:

static int foo(const char *X) { return strlen(X); }
int bar() { return foo("abcd"); }

//===---------------------------------------------------------------------===//

InstCombine should use SimplifyDemandedBits to remove the or instruction:

define i1 @test(i8 %x, i8 %y) {
  %A = or i8 %x, 1
  %B = icmp ugt i8 %A, 3
  ret i1 %B
}

Currently instcombine calls SimplifyDemandedBits with either all bits or just
the sign bit, if the comparison is obviously a sign test. In this case, we only
need all but the bottom two bits from %A, and if we gave that mask to SDB it
would delete the or instruction for us.
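
In C terms (a sketch): the or only touches bit 0, which an unsigned comparison
against 3 never demands, so the two functions below agree for all inputs:

#include <stdint.h>

int test(uint8_t x) { return (uint8_t)(x | 1) > 3; }
int want(uint8_t x) { return x > 3; }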

//===---------------------------------------------------------------------===//

functionattrs doesn't know much about memcpy/memset. This function should be
marked readnone rather than readonly, since it only twiddles local memory, but
functionattrs doesn't handle memset/memcpy/memmove aggressively:

struct X { int *p; int *q; };
int foo() {
  int i = 0, j = 1;
  struct X x, y;
  int **p;
  y.p = &i;
  x.q = &j;
  p = __builtin_memcpy (&x, &y, sizeof (int *));
  return **p;
}

//===---------------------------------------------------------------------===//

Missed instcombine transformation:
define i1 @a(i32 %x) nounwind readnone {
entry:
  %cmp = icmp eq i32 %x, 30
  %sub = add i32 %x, -30
  %cmp2 = icmp ugt i32 %sub, 9
  %or = or i1 %cmp, %cmp2
  ret i1 %or
}
This should be optimized to a single compare. Testcase derived from gcc.
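
The or is true for every x except 31..39, so a single-compare form is
(a sketch; unsigned arithmetic used to avoid signed overflow):

int a(int x) { return ((unsigned)x - 31u) > 8u; }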

//===---------------------------------------------------------------------===//

Missed instcombine transformation:
void b();
void a(int x) { if (((1<<x)&8)==0) b(); }

The shift should be optimized out. Testcase derived from gcc.
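
Since (1<<x)&8 is nonzero exactly when x == 3 (for in-range shift amounts),
the expected result is (a sketch):

void b(void);
void a(int x) { if (x != 3) b(); }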

//===---------------------------------------------------------------------===//

Missed instcombine or reassociate transformation:
int a(int a, int b) { return (a==12)&(b>47)&(b<58); }

The sgt and slt should be combined into a single comparison. Testcase derived
from gcc.
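
The b range test is "b in [48,57]", which folds to one unsigned compare
(a sketch; unsigned arithmetic used to avoid signed overflow):

int a(int a, int b) { return (a == 12) & (((unsigned)b - 48u) < 10u); }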

//===---------------------------------------------------------------------===//

Missed instcombine transformation:
define i32 @a(i32 %x) nounwind readnone {
entry:
  %shr = lshr i32 %x, 5                           ; <i32> [#uses=1]
  %xor = xor i32 %shr, 67108864                   ; <i32> [#uses=1]
  %sub = add i32 %xor, -67108864                  ; <i32> [#uses=1]
  ret i32 %sub
}

This function is equivalent to "ashr i32 %x, 5". Testcase derived from gcc.
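
This is the (v ^ m) - m sign-extension idiom with m = 1<<26, applied to the
27-bit value produced by the lshr. A C sketch of the equivalence (assuming
two's complement and arithmetic >> on signed int):

#include <stdint.h>

int32_t a(int32_t x) {
  uint32_t v = (uint32_t)x >> 5;                   /* %shr */
  return (int32_t)((v ^ 0x4000000u) - 0x4000000u); /* %xor, %sub */
}
int32_t want(int32_t x) { return x >> 5; }         /* same for all x */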

//===---------------------------------------------------------------------===//