Target Independent Opportunities:

//===---------------------------------------------------------------------===//

Dead argument elimination should be enhanced to handle cases when an argument is
dead to an externally visible function. Though the argument can't be removed
from the externally visible function, the caller doesn't need to pass it in.
For example, in this testcase:

  void foo(int X) __attribute__((noinline));
  void foo(int X) { sideeffect(); }
  void bar(int A) { foo(A+1); }

We compile bar to:

define void @bar(i32 %A) nounwind ssp {
  %0 = add nsw i32 %A, 1                          ; <i32> [#uses=1]
  tail call void @foo(i32 %0) nounwind noinline ssp
  ret void
}

The add is dead; we could pass in 'i32 undef' instead, making the call
'tail call void @foo(i32 undef)'. This occurs for C++ templates etc., which
usually have linkonce_odr/weak_odr linkage, not internal linkage.

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling conventions (e.g. x86) which have different
   clobber sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :) This isn't
safe in general, even on darwin. See the libm implementation of hypot for
examples (which special case when x/y are exactly zero to get signed zeros etc
right).

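For instance, a fast-math-only sketch (this deliberately drops the overflow,
underflow, and signed-zero handling described above):

#include <math.h>

double hypot_fast(double x, double y) {
  /* valid only when errno and precision don't matter */
  return sqrt(x*x + y*y);
}
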
//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
  x = 1ULL << i;
}

into:

  long long tmp = 1;
  for (i = ...; ++i, tmp+=tmp)
    x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

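In C terms, a sketch of the idea (assuming a little-endian target, so the sign
bit of the i32 lives in the byte at offset 3):

int is_negative(int *P) {
  return *P < 0;                      /* currently a 4-byte load */
}
int is_negative_shrunk(int *P) {
  return ((signed char *)P)[3] < 0;   /* same answer from a 1-byte load */
}
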
//===---------------------------------------------------------------------===//

Reassociate should turn X*X*X*X into t=(X*X); t*t, to eliminate a multiply.

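In C terms, the rewritten form looks like:

int pow4(int x) {
  int t = x * x;
  return t * t;   /* two multiplies instead of three */
}
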
//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

Reassociate should handle the example in GCC PR16157.

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l) { return memcmp(j,l,4); }
int h(int *j, int *l) { return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1, 2, 4, and 8 bytes.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize. It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For vector types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size. It works but can be overly conservative, as the alignment of
specific vector types is target dependent.

//===---------------------------------------------------------------------===//

We should produce an unaligned load from code like this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns. Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl _foo, %eax
        cmpl $1, %edi
        sbbl $-1, %eax
        movl %eax, _foo

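In C terms, the cmp/sbb sequence above implements a branchless conditional
increment; a sketch (the global and parameter names here are made up):

extern int foo_counter;
void cond_inc(int x) {
  foo_counter += (x != 0);   /* increment without a branch */
}
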
//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers. See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687

This is now easily doable with MRVs. We could even make an intrinsic for this
if anyone cared enough about sincos.

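A source-level sketch of the combine (assuming a C library that provides the
GNU-style sincos declared above):

#include <math.h>
extern void sincos(double x, double *sn, double *cs);

void before(double x, double *a, double *b) {
  *a = sin(x);           /* two libcalls today */
  *b = cos(x);
}
void after(double x, double *a, double *b) {
  double sn, cs;
  sincos(x, &sn, &cs);   /* one libcall computes both */
  *a = sn;
  *b = cs;
}
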
//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

define void @test(i32* %P) {
  %tmp = load i32* %P
  %tmp14 = or i32 %tmp, 3305111552     ; 0xC5000000
  %tmp15 = and i32 %tmp14, 3321888767  ; 0xC5FFFFFF
  store i32 %tmp15, i32* %P
  ret void
}

(Both masks only affect the top byte, so the net effect is storing the single
byte 0xC5 into the most significant byte of the i32.)

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
      for(i=0; i<reg->size; i++)
         reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
      for(i=0; i<reg->size; i++)
         reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but this requires TBAA.

//===---------------------------------------------------------------------===//

This should be optimized to one 'and' and one 'or', from PR4216:

define i32 @test_bitfield(i32 %bf.prev.low) nounwind ssp {
entry:
  %bf.prev.lo.cleared10 = or i32 %bf.prev.low, 32962      ; <i32> [#uses=1]
  %0 = and i32 %bf.prev.low, -65536                       ; <i32> [#uses=1]
  %1 = and i32 %bf.prev.lo.cleared10, 40186               ; <i32> [#uses=1]
  %2 = or i32 %1, %0                                      ; <i32> [#uses=1]
  ret i32 %2
}

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine (yes, it really is bswap):

unsigned long reverse(unsigned v) {
  unsigned t;
  t = v ^ ((v << 16) | (v >> 16));
  t &= ~0xff0000;
  v = (v << 24) | (v >> 8);
  return v ^ (t >> 8);
}

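The whole body folds to a single byte swap; e.g. reverse(0x12345678) is
0x78563412. With compilers that provide the builtin, the equivalent is:

unsigned reverse_bswap(unsigned v) {
  return __builtin_bswap32(v);   /* one bswap instruction on x86 */
}
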
//===---------------------------------------------------------------------===//

These idioms should be recognized as popcount (see PR1488):

unsigned countbits_slow(unsigned v) {
  unsigned c;
  for (c = 0; v; v >>= 1)
    c += v & 1;
  return c;
}
unsigned countbits_fast(unsigned v){
  unsigned c;
  for (c = 0; v; c++)
    v &= v - 1; // clear the least significant bit set
  return c;
}

BITBOARD = unsigned long long
int PopCnt(register BITBOARD a) {
  register int c=0;
  while(a) {
    c++;
    a &= a - 1;
  }
  return c;
}
unsigned int popcount(unsigned int input) {
  unsigned int count = 0;
  for (unsigned int i = 0; i < 4 * 8; i++)
    count += (input >> i) & 1;
  return count;
}

This is a form of idiom recognition for loops, the same thing that could be
useful for recognizing memset/memcpy.

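All of the above should collapse to a single population-count operation, e.g.:

unsigned countbits(unsigned v) {
  return __builtin_popcount(v);   /* llvm.ctpop; hardware popcount where legal */
}
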
//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

instcombine should handle this transform:
   icmp pred (sdiv X, C1), C2
when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match. See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level. We need a "loops to memcpy"
pass.

//===---------------------------------------------------------------------===//

Consider:

typedef unsigned U32;
typedef unsigned long long U64;
int test (U32 *inst, U64 *regs) {
    U64 effective_addr2;
    U32 temp = *inst;
    int r1 = (temp >> 20) & 0xf;
    int b2 = (temp >> 16) & 0xf;
    effective_addr2 = temp & 0xfff;
    if (b2) effective_addr2 += regs[b2];
    b2 = (temp >> 12) & 0xf;
    if (b2) effective_addr2 += regs[b2];
    effective_addr2 &= regs[4];
    if ((effective_addr2 & 3) == 0)
        return 1;
    return 0;
}

Note that only the low 2 bits of effective_addr2 are used. On 32-bit systems,
we don't eliminate the computation of the top half of effective_addr2 because
we don't have whole-function selection dags. On x86, this means we use one
extra register for the function when effective_addr2 is declared as U64 than
when it is declared U32.

PHI Slicing could be extended to do this.

//===---------------------------------------------------------------------===//

LSR should know what GPR types a target has from TargetData. This code:

volatile short X, Y; // globals

void foo(int N) {
  int i;
  for (i = 0; i < N; i++) { X = i; Y = i*4; }
}

produces two near-identical IVs (after promotion) on PPC/ARM:

LBB1_2:
        ldr r3, LCPI1_0
        ldr r3, [r3]
        strh r2, [r3]
        ldr r3, LCPI1_1
        ldr r3, [r3]
        strh r1, [r3]
        add r1, r1, #4
        add r2, r2, #1   <- [0,+,1]
        sub r0, r0, #1   <- [0,-,1]
        cmp r0, #0
        bne LBB1_2

LSR should reuse the "+" IV for the exit test.

//===---------------------------------------------------------------------===//

Tail call elim should be more aggressive, checking to see if the call is
followed by an uncond branch to an exit block.

; This testcase is due to tail-duplication not wanting to copy the return
; instruction into the terminating blocks because there was other code
; optimized out of the function after the taildup happened.
; RUN: llvm-as < %s | opt -tailcallelim | llvm-dis | not grep call

define i32 @t4(i32 %a) {
entry:
        %tmp.1 = and i32 %a, 1          ; <i32> [#uses=1]
        %tmp.2 = icmp ne i32 %tmp.1, 0          ; <i1> [#uses=1]
        br i1 %tmp.2, label %then.0, label %else.0

then.0:         ; preds = %entry
        %tmp.5 = add i32 %a, -1         ; <i32> [#uses=1]
        %tmp.3 = call i32 @t4( i32 %tmp.5 )             ; <i32> [#uses=1]
        br label %return

else.0:         ; preds = %entry
        %tmp.7 = icmp ne i32 %a, 0              ; <i1> [#uses=1]
        br i1 %tmp.7, label %then.1, label %return

then.1:         ; preds = %else.0
        %tmp.11 = add i32 %a, -2                ; <i32> [#uses=1]
        %tmp.9 = call i32 @t4( i32 %tmp.11 )            ; <i32> [#uses=1]
        br label %return

return:         ; preds = %then.1, %else.0, %then.0
        %result.0 = phi i32 [ 0, %else.0 ], [ %tmp.3, %then.0 ],
                            [ %tmp.9, %then.1 ]
        ret i32 %result.0
}

//===---------------------------------------------------------------------===//

Tail recursion elimination should handle:

int pow2m1(int n) {
  if (n == 0)
    return 0;
  return 2 * pow2m1 (n - 1) + 1;
}

Also, multiplies can be turned into SHL's, so they should be handled as if
they were associative. "return foo() << 1" can be tail recursion eliminated.

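Treating the multiply as associative enables the usual accumulator rewrite; a
hand-derived sketch of what TRE could produce for pow2m1:

int pow2m1_iter(int n) {
  int acc = 0;                  /* value for the n == 0 base case */
  for (; n != 0; --n)
    acc = 2 * acc + 1;          /* or equivalently (acc << 1) | 1 */
  return acc;
}
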
//===---------------------------------------------------------------------===//

Argument promotion should promote arguments for recursive functions, like
this:

; RUN: llvm-as < %s | opt -argpromotion | llvm-dis | grep x.val

define internal i32 @foo(i32* %x) {
entry:
        %tmp = load i32* %x             ; <i32> [#uses=0]
        %tmp.foo = call i32 @foo( i32* %x )             ; <i32> [#uses=1]
        ret i32 %tmp.foo
}

define i32 @bar(i32* %x) {
entry:
        %tmp3 = call i32 @foo( i32* %x )                ; <i32> [#uses=1]
        ret i32 %tmp3
}

//===---------------------------------------------------------------------===//

We should investigate an instruction sinking pass. Consider this silly
example in pic mode:

#include <assert.h>
void foo(int x) {
  assert(x);
  //...
}

we compile this to:
_foo:
        subl    $28, %esp
        call    "L1$pb"
"L1$pb":
        popl    %eax
        cmpl    $0, 32(%esp)
        je      LBB1_2  # cond_true
LBB1_1: # return
        # ...
        addl    $28, %esp
        ret
LBB1_2: # cond_true
...

The PIC base computation (call+popl) is only used on one path through the
code, but is currently always computed in the entry block. It would be
better to sink the picbase computation down into the block for the
assertion, as it is the only one that uses it. This happens for a lot of
code with early outs.

Another example is loads of arguments, which are usually emitted into the
entry block on targets like x86. If not used in all paths through a
function, they should be sunk into the ones that do.

In this case, whole-function-isel would also handle this.

//===---------------------------------------------------------------------===//

Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html

//===---------------------------------------------------------------------===//

We should turn things like "load+fabs+store" and "load+fneg+store" into the
corresponding integer operations. On a yonah, this loop:

double a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] = -a[i];
}

is twice as slow as this loop:

long long a[256];
void foo() {
  int i, b;
  for (b = 0; b < 10000000; b++)
    for (i = 0; i < 256; i++)
      a[i] ^= (1ULL << 63);
}

and I suspect other processors are similar. On X86 in particular this is a
big win because doing this with integers allows the use of read/modify/write
instructions.

//===---------------------------------------------------------------------===//

DAG Combiner should try to combine small loads into larger loads when
profitable. For example, we compile this C++ example:

struct THotKey { short Key; bool Control; bool Shift; bool Alt; };
extern THotKey m_HotKey;
THotKey GetHotKey () { return m_HotKey; }

into (-O3 -fno-exceptions -static -fomit-frame-pointer):

__Z9GetHotKeyv:
        pushl   %esi
        movl    8(%esp), %eax
        movb    _m_HotKey+3, %cl
        movb    _m_HotKey+4, %dl
        movb    _m_HotKey+2, %ch
        movw    _m_HotKey, %si
        movw    %si, (%eax)
        movb    %ch, 2(%eax)
        movb    %cl, 3(%eax)
        movb    %dl, 4(%eax)
        popl    %esi
        ret     $4

GCC produces:

__Z9GetHotKeyv:
        movl    _m_HotKey, %edx
        movl    4(%esp), %eax
        movl    %edx, (%eax)
        movzwl  _m_HotKey+4, %edx
        movw    %dx, 4(%eax)
        ret     $4

The LLVM IR contains the needed alignment info, so we should be able to
merge the loads and stores into 4-byte loads:

        %struct.THotKey = type { i16, i8, i8, i8 }
define void @_Z9GetHotKeyv(%struct.THotKey* sret %agg.result) nounwind {
...
        %tmp2 = load i16* getelementptr (@m_HotKey, i32 0, i32 0), align 8
        %tmp5 = load i8* getelementptr (@m_HotKey, i32 0, i32 1), align 2
        %tmp8 = load i8* getelementptr (@m_HotKey, i32 0, i32 2), align 1
        %tmp11 = load i8* getelementptr (@m_HotKey, i32 0, i32 3), align 2

Alternatively, we should use a small amount of base-offset alias analysis
to make it so the scheduler doesn't need to hold all the loads in regs at
once.

//===---------------------------------------------------------------------===//

We should add an FRINT node to the DAG to model targets that have legal
implementations of ceil/floor/rint.

//===---------------------------------------------------------------------===//

Consider:

int test() {
  long long input[8] = {1,1,1,1,1,1,1,1};
  foo(input);
}

We currently compile this into a memcpy from a global array since the
initializer is fairly large and not memset'able. This is good, but the memcpy
gets lowered to load/stores in the code generator. This is also ok, except
that the codegen lowering for memcpy doesn't handle the case when the source
is a constant global. This gives us atrocious code like this:

        call    "L1$pb"
"L1$pb":
        popl    %eax
        movl    _C.0.1444-"L1$pb"+32(%eax), %ecx
        movl    %ecx, 40(%esp)
        movl    _C.0.1444-"L1$pb"+20(%eax), %ecx
        movl    %ecx, 28(%esp)
        movl    _C.0.1444-"L1$pb"+36(%eax), %ecx
        movl    %ecx, 44(%esp)
        movl    _C.0.1444-"L1$pb"+44(%eax), %ecx
        movl    %ecx, 52(%esp)
        movl    _C.0.1444-"L1$pb"+40(%eax), %ecx
        movl    %ecx, 48(%esp)
        movl    _C.0.1444-"L1$pb"+12(%eax), %ecx
        movl    %ecx, 20(%esp)
        movl    _C.0.1444-"L1$pb"+4(%eax), %ecx
...

instead of:
        movl    $1, 16(%esp)
        movl    $0, 20(%esp)
        movl    $1, 24(%esp)
        movl    $0, 28(%esp)
        movl    $1, 32(%esp)
        movl    $0, 36(%esp)
        ...

//===---------------------------------------------------------------------===//

http://llvm.org/PR717:

The following code should compile into "ret int undef". Instead, LLVM
produces "ret int 0":

int f() {
  int x = 4;
  int y;
  if (x == 3) y = 0;
  return y;
}

//===---------------------------------------------------------------------===//

The loop unroller should partially unroll loops (instead of peeling them)
when code growth isn't too bad and when an unroll count allows simplification
of some code within the loop. One trivial example is:

#include <stdio.h>
int main() {
    int nRet = 17;
    int nLoop;
    for ( nLoop = 0; nLoop < 1000; nLoop++ ) {
        if ( nLoop & 1 )
            nRet += 2;
        else
            nRet -= 1;
    }
    return nRet;
}

Unrolling by 2 would eliminate the '&1' in both copies, leading to a net
reduction in code size. The resultant code would then also be suitable for
exit value computation.

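For instance, unrolled by 2 (a hand-derived sketch), the parity of nLoop is
known in each copy, so the '&1' tests fold away:

    for ( nLoop = 0; nLoop < 1000; nLoop += 2 ) {
        nRet -= 1;   /* nLoop is even */
        nRet += 2;   /* nLoop+1 is odd */
    }

which then simplifies to nRet += 1 per pair of iterations.
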
//===---------------------------------------------------------------------===//

We miss a bunch of rotate opportunities on various targets, including ppc, x86,
etc. On X86, we miss a bunch of 'rotate by variable' cases because the rotate
matching code in dag combine doesn't look through truncates aggressively
enough. Here are some testcases reduced from GCC PR17886:

unsigned long long f(unsigned long long x, int y) {
  return (x << y) | (x >> 64-y);
}
unsigned f2(unsigned x, int y){
  return (x << y) | (x >> 32-y);
}
unsigned long long f3(unsigned long long x){
  int y = 9;
  return (x << y) | (x >> 64-y);
}
unsigned f4(unsigned x){
  int y = 10;
  return (x << y) | (x >> 32-y);
}
unsigned long long f5(unsigned long long x, unsigned long long y) {
  return (x << 8) | ((y >> 48) & 0xffull);
}
unsigned long long f6(unsigned long long x, unsigned long long y, int z) {
  switch(z) {
  case 1:
    return (x << 8) | ((y >> 48) & 0xffull);
  case 2:
    return (x << 16) | ((y >> 40) & 0xffffull);
  case 3:
    return (x << 24) | ((y >> 32) & 0xffffffull);
  case 4:
    return (x << 32) | ((y >> 24) & 0xffffffffull);
  default:
    return (x << 40) | ((y >> 16) & 0xffffffffffull);
  }
}

On X86-64, we only handle f2/f3/f4 right. On x86-32, a few of these
generate truly horrible code, instead of using shld and friends. On
ARM, we end up with calls to L___lshrdi3/L___ashldi3 in f, which is
badness. PPC64 misses f, f5 and f6. CellSPU aborts in isel.

//===---------------------------------------------------------------------===//

We do a number of simplifications in simplify libcalls to strength reduce
standard library functions, but we don't currently merge them together. For
example, it is useful to merge memcpy(a,b,strlen(b)) -> strcpy. This can only
be done safely if "b" isn't modified between the strlen and memcpy of course.

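A sketch of the intended rewrite (note the copy must include the NUL
terminator for the merge to be byte-exact):

  memcpy(a, b, strlen(b) + 1);   /* -> strcpy(a, b), if b is unmodified in between */
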
//===---------------------------------------------------------------------===//

Reassociate should turn things like:

int factorial(int X) {
  return X*X*X*X*X*X*X*X;
}

into llvm.powi calls, allowing the code generator to produce balanced
multiplication trees.

//===---------------------------------------------------------------------===//

We generate a horrible libcall for llvm.powi. For example, we compile:

#include <cmath>
double f(double a) { return std::pow(a, 4); }

into:

__Z1fd:
        subl    $12, %esp
        movsd   16(%esp), %xmm0
        movsd   %xmm0, (%esp)
        movl    $4, 8(%esp)
        call    L___powidf2$stub
        addl    $12, %esp
        ret

GCC produces:

__Z1fd:
        subl    $12, %esp
        movsd   16(%esp), %xmm0
        mulsd   %xmm0, %xmm0
        mulsd   %xmm0, %xmm0
        movsd   %xmm0, (%esp)
        fldl    (%esp)
        addl    $12, %esp
        ret

//===---------------------------------------------------------------------===//

We compile this program (from GCC PR11680):
http://gcc.gnu.org/bugzilla/attachment.cgi?id=4487

into code that runs the same speed in fast/slow modes, but both modes run 2x
slower than when compiled with GCC (either 4.0 or 4.2):

$ llvm-g++ perf.cpp -O3 -fno-exceptions
$ time ./a.out fast
1.821u 0.003s 0:01.82 100.0%    0+0k 0+0io 0pf+0w

$ g++ perf.cpp -O3 -fno-exceptions
$ time ./a.out fast
0.821u 0.001s 0:00.82 100.0%    0+0k 0+0io 0pf+0w

It looks like we are making the same inlining decisions, so this may be raw
codegen badness or something else (haven't investigated).

//===---------------------------------------------------------------------===//

We miss some instcombines for stuff like this:
void bar (void);
void foo (unsigned int a) {
  /* This one is equivalent to a >= (3 << 2). */
  if ((a >> 2) >= 3)
    bar ();
}

A few other related ones are in GCC PR14753.

//===---------------------------------------------------------------------===//

Divisibility by constant can be simplified (according to GCC PR12849) from
being a mulhi to being a mul-lo (cheaper). Testcase:

void bar(unsigned n) {
  if (n % 3 == 0)
    true();
}

This is equivalent to the following, where 2863311531 is the multiplicative
inverse of 3, and 1431655766 is ((2^32)-1)/3+1:
void bar(unsigned n) {
  if (n * 2863311531U < 1431655766U)
    true();
}

The same transformation can work with an even modulo with the addition of a
rotate: rotate the result of the multiply to the right by the number of bits
which need to be zero for the condition to be true, and shrink the compare RHS
by the same amount. Unless the target supports rotates, though, that
transformation probably isn't worthwhile.

The transformation can also easily be made to work with non-zero equality
comparisons: just transform, for example, "n % 3 == 1" to "(n-1) % 3 == 0".

//===---------------------------------------------------------------------===//

Better mod/ref analysis for scanf would allow us to eliminate the vtable and a
bunch of other stuff from this example (see PR1604):

#include <cstdio>
struct test {
  int val;
  virtual ~test() {}
};

int main() {
  test t;
  std::scanf("%d", &t.val);
  std::printf("%d\n", t.val);
}

//===---------------------------------------------------------------------===//

These functions perform the same computation, but produce different assembly.

define i8 @select(i8 %x) readnone nounwind {
  %A = icmp ult i8 %x, 250
  %B = select i1 %A, i8 0, i8 1
  ret i8 %B
}

define i8 @addshr(i8 %x) readnone nounwind {
  %A = zext i8 %x to i9
  %B = add i9 %A, 6       ;; 256 - 250 == 6
  %C = lshr i9 %B, 8
  %D = trunc i9 %C to i8
  ret i8 %D
}

//===---------------------------------------------------------------------===//

From GCC bug 24696:
int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a & (c - 1)) != 0) || ((b & (c - 1)) != 0);
}
int
f (unsigned long a, unsigned long b, unsigned long c)
{
  return ((a & (c - 1)) != 0) | ((b & (c - 1)) != 0);
}
Both should combine to ((a|b) & (c-1)) != 0. Currently not optimized with
"clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 20192:
#define PMD_MASK (~((1UL << 23) - 1))
void clear_pmd_range(unsigned long start, unsigned long end)
{
  if (!(start & ~PMD_MASK) && !(end & ~PMD_MASK))
    f();
}
The expression should optimize to something like
"!((start|end)&~PMD_MASK)". Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 3756:
int
pn (int n)
{
    return (n >= 0 ? 1 : -1);
}
Should combine to (n >> 31) | 1. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts | llc".

//===---------------------------------------------------------------------===//

void a(int variable)
{
    if (variable == 4 || variable == 6)
        bar();
}
This should optimize to "if ((variable | 2) == 6)". Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts | llc".

//===---------------------------------------------------------------------===//

unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return
i;}
unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
These should combine to the same thing. Currently, the first function
produces better code on X86.

//===---------------------------------------------------------------------===//

From GCC Bug 15784:
#define abs(x) x>0?x:-x
int f(int x, int y)
{
 return (abs(x)) >= 0;
}
This should optimize to x == INT_MIN. (With -fwrapv.) Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 14753:
void
rotate_cst (unsigned int a)
{
  a = (a << 10) | (a >> 22);
  if (a == 123)
    bar ();
}
void
minus_cst (unsigned int a)
{
  unsigned int tem;

  tem = 20 - a;
  if (tem == 5)
    bar ();
}
void
mask_gt (unsigned int a)
{
  /* This is equivalent to a > 15. */
  if ((a & ~7) > 8)
    bar ();
}
void
rshift_gt (unsigned int a)
{
  /* This is equivalent to a > 23. */
  if ((a >> 2) > 5)
    bar ();
}
All should simplify to a single comparison. All of these are
currently not optimized with "clang -emit-llvm-bc | opt
-std-compile-opts".

//===---------------------------------------------------------------------===//

From GCC Bug 32605:
int c(int* x) {return (char*)x+2 == (char*)x;}
Should combine to 0. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts" (although llc can optimize it).

//===---------------------------------------------------------------------===//

int a(unsigned b) {return ((b << 31) | (b << 30)) >> 31;}
Should be combined to "((b >> 1) | b) & 1". Currently not optimized
with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(unsigned x, unsigned y) { return x | (y & 1) | (y & 2);}
Should combine to "x | (y & 3)". Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (~a & c) | ((c|a) & b);}
Should fold to "(~a & c) | (a & b)". Currently not optimized with
"clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a,int b) {return (~(a|b))|a;}
Should fold to "a|~b". Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b) {return (a&&b) || (a&&!b);}
Should fold to "a". Currently not optimized with "clang -emit-llvm-bc
| opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (a&&b) || (!a&&c);}
Should fold to "a ? b : c", or at least something sane. Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int a, int b, int c) {return (a&&b) || (a&&c) || (a&&b&&c);}
Should fold to a && (b || c). Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return x | ((x & 8) ^ 8);}
Should combine to x | 8. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return x ^ ((x & 8) ^ 8);}
Should also combine to x | 8. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return (x & 8) == 0 ? -1 : -9;}
Should combine to (x | -9) ^ 8. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return (x & 8) == 0 ? -9 : -1;}
Should combine to x | -9. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

int a(int x) {return ((x | -9) ^ 8) & x;}
Should combine to x & -9. Currently not optimized with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(unsigned a) {return a * 0x11111111 >> 28 & 1;}
Should combine to "a * 0x88888888 >> 31". Currently not optimized
with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(char* x) {if ((*x & 32) == 0) return b();}
There's an unnecessary zext in the generated code with "clang
-emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

unsigned a(unsigned long long x) {return 40 * (x >> 1);}
Should combine to "20 * (((unsigned)x) & -2)". Currently not
optimized with "clang -emit-llvm-bc | opt -std-compile-opts".

//===---------------------------------------------------------------------===//

This was noticed in the entryblock for grokdeclarator in 403.gcc:

        %tmp = icmp eq i32 %decl_context, 4
        %decl_context_addr.0 = select i1 %tmp, i32 3, i32 %decl_context
        %tmp1 = icmp eq i32 %decl_context_addr.0, 1
        %decl_context_addr.1 = select i1 %tmp1, i32 0, i32 %decl_context_addr.0

tmp1 should be simplified to something like:
  (!tmp || decl_context == 1)

This allows recursive simplifications; tmp1 is used all over the place in
the function, e.g. by:

        %tmp23 = icmp eq i32 %decl_context_addr.1, 0            ; <i1> [#uses=1]
        %tmp24 = xor i1 %tmp1, true             ; <i1> [#uses=1]
        %or.cond8 = and i1 %tmp23, %tmp24               ; <i1> [#uses=1]

later.

//===---------------------------------------------------------------------===//

[STORE SINKING]

Store sinking: This code:

void f (int n, int *cond, int *res) {
  int i;
  *res = 0;
  for (i = 0; i < n; i++)
    if (*cond)
      *res ^= 234; /* (*) */
}

On this function GVN hoists the fully redundant value of *res, but nothing
moves the store out. This gives us this code:

bb:             ; preds = %bb2, %entry
        %.rle = phi i32 [ 0, %entry ], [ %.rle6, %bb2 ]
        %i.05 = phi i32 [ 0, %entry ], [ %indvar.next, %bb2 ]
        %1 = load i32* %cond, align 4
        %2 = icmp eq i32 %1, 0
        br i1 %2, label %bb2, label %bb1

bb1:            ; preds = %bb
        %3 = xor i32 %.rle, 234
        store i32 %3, i32* %res, align 4
        br label %bb2

bb2:            ; preds = %bb, %bb1
        %.rle6 = phi i32 [ %3, %bb1 ], [ %.rle, %bb ]
        %indvar.next = add i32 %i.05, 1
        %exitcond = icmp eq i32 %indvar.next, %n
        br i1 %exitcond, label %return, label %bb

DSE should sink partially dead stores to get the store out of the loop.

Here's another partially dead case:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12395

//===---------------------------------------------------------------------===//

Scalar PRE hoists the mul in the common block up to the else:

int test (int a, int b, int c, int g) {
  int d, e;
  if (a)
    d = b * c;
  else
    d = b - c;
  e = b * c + g;
  return d + e;
}

It would be better to do the mul once to reduce codesize above the if.
This is GCC PR38204.

//===---------------------------------------------------------------------===//

[STORE SINKING]

GCC PR37810 is an interesting case where we should sink the load/store reload
into the if block and outside the loop, so we don't reload/store it on the
non-call path.

for () {
  *P += 1;
  if ()
    call();
  else
    ...
}
->
tmp = *P
for () {
  tmp += 1;
  if () {
    *P = tmp;
    call();
    tmp = *P;
  } else ...
}
*P = tmp;

We now hoist the reload after the call (Transforms/GVN/lpre-call-wrap.ll), but
we don't sink the store. We need partially dead store sinking.

//===---------------------------------------------------------------------===//

[LOAD PRE CRIT EDGE SPLITTING]

GCC PR37166: Sinking of loads prevents SROA'ing the "g" struct on the stack
leading to excess stack traffic. This could be handled by GVN with some crazy
symbolic phi translation. The code we get looks like (g is on the stack):

bb2:            ; preds = %bb1
..
        %9 = getelementptr %struct.f* %g, i32 0, i32 0
        store i32 %8, i32* %9, align 4
        br label %bb3

bb3:            ; preds = %bb1, %bb2, %bb
        %c_addr.0 = phi %struct.f* [ %g, %bb2 ], [ %c, %bb ], [ %c, %bb1 ]
        %b_addr.0 = phi %struct.f* [ %b, %bb2 ], [ %g, %bb ], [ %b, %bb1 ]
        %10 = getelementptr %struct.f* %c_addr.0, i32 0, i32 0
        %11 = load i32* %10, align 4

%11 is partially redundant; in BB2 it should have the value %8.

GCC PR33344 and PR35287 are similar cases.

//===---------------------------------------------------------------------===//

[LOAD PRE]

There are many load PRE testcases in testsuite/gcc.dg/tree-ssa/loadpre* in the
GCC testsuite; the ones we don't get yet are (checked through loadpre25):

[CRIT EDGE BREAKING]
loadpre3.c predcom-4.c

[PRE OF READONLY CALL]
loadpre5.c

[TURN SELECT INTO BRANCH]
loadpre14.c loadpre15.c

actually a conditional increment: loadpre18.c loadpre19.c

//===---------------------------------------------------------------------===//

[SCALAR PRE]
There are many PRE testcases in testsuite/gcc.dg/tree-ssa/ssa-pre-*.c in the
GCC testsuite.

//===---------------------------------------------------------------------===//

There are some interesting cases in testsuite/gcc.dg/tree-ssa/pred-comm* in the
GCC testsuite. For example, we get the first example in predcom-1.c, but
miss the second one:

unsigned fib[1000];
unsigned avg[1000];

__attribute__ ((noinline))
void count_averages(int n) {
  int i;
  for (i = 1; i < n; i++)
    avg[i] = (((unsigned long) fib[i - 1] + fib[i] + fib[i + 1]) / 3) & 0xffff;
}

which compiles into two loads instead of one in the loop.

predcom-2.c is the same as predcom-1.c.

predcom-3.c is very similar but needs loads feeding each other instead of
store->load.

//===---------------------------------------------------------------------===//

Type-based alias analysis:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14705

//===---------------------------------------------------------------------===//

A/B get pinned to the stack because we turn an if/then into a select instead
of PRE'ing the load/store. This may be fixable in instcombine:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37892

struct X { int i; };
int foo (int x) {
  struct X a;
  struct X b;
  struct X *p;
  a.i = 1;
  b.i = 2;
  if (x)
    p = &a;
  else
    p = &b;
  return p->i;
}

//===---------------------------------------------------------------------===//

Interesting missed case because of control flow flattening (should be 2 loads):
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26629
With: llvm-gcc t2.c -S -o - -O0 -emit-llvm | llvm-as |
      opt -mem2reg -gvn -instcombine | llvm-dis
we miss it because we need 1) CRIT EDGE 2) MULTIPLE DIFFERENT
VALS PRODUCED BY ONE BLOCK OVER DIFFERENT PATHS

//===---------------------------------------------------------------------===//

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19633
We could eliminate the branch condition here; loading from null is undefined:

struct S { int w, x, y, z; };
struct T { int r; struct S s; };
void bar (struct S, int);
void foo (int a, struct T b)
{
  struct S *c = 0;
  if (a)
    c = &b.s;
  bar (*c, a);
}

//===---------------------------------------------------------------------===//

simplifylibcalls should do several optimizations for strspn/strcspn:

strcspn(x, "") -> strlen(x)
strcspn("", x) -> 0
strspn("", x) -> 0
strspn(x, "") -> strlen(x)
strspn(x, "a") -> strchr(x, 'a')-x

strcspn(x, "a") -> inlined loop for up to 3 letters (similarly for strspn):

size_t __strcspn_c3 (__const char *__s, int __reject1, int __reject2,
                     int __reject3) {
  register size_t __result = 0;
  while (__s[__result] != '\0' && __s[__result] != __reject1 &&
         __s[__result] != __reject2 && __s[__result] != __reject3)
    ++__result;
  return __result;
}

This should turn into a switch on the character. See PR3253 for some notes on
codegen.

456.hmmer apparently uses strcspn and strspn a lot. 471.omnetpp uses strspn.

//===---------------------------------------------------------------------===//

1316"gas" uses this idiom:
1317 else if (strchr ("+-/*%|&^:[]()~", *intel_parser.op_string))
1318..
1319 else if (strchr ("<>", *intel_parser.op_string)
1320
1321Those should be turned into a switch.
1322
1323//===---------------------------------------------------------------------===//
Chris Lattnerffb08f52009-01-08 06:52:57 +00001324
252.eon contains this interesting code:

        %3072 = getelementptr [100 x i8]* %tempString, i32 0, i32 0
        %3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
        %strlen = call i32 @strlen(i8* %3072)    ; uses = 1
        %endptr = getelementptr [100 x i8]* %tempString, i32 0, i32 %strlen
        call void @llvm.memcpy.i32(i8* %endptr,
          i8* getelementptr ([5 x i8]* @"\01LC42", i32 0, i32 0), i32 5, i32 1)
        %3074 = call i32 @strlen(i8* %endptr) nounwind readonly

This is interesting for a couple reasons. First, in this:

        %3073 = call i8* @strcpy(i8* %3072, i8* %3071) nounwind
        %strlen = call i32 @strlen(i8* %3072)

The strlen could be replaced with: %strlen = sub %3072, %3073, because the
strcpy call returns a pointer to the end of the string. Based on that, the
endptr GEP just becomes equal to 3073, which eliminates a strlen call and GEP.

Second, the memcpy+strlen strlen can be replaced with:

        %3074 = call i32 @strlen([5 x i8]* @"\01LC42") nounwind readonly

Because the destination was just copied into the specified memory buffer. This,
in turn, can be constant folded to "4".

In other code, it contains:

        %endptr6978 = bitcast i8* %endptr69 to i32*
        store i32 7107374, i32* %endptr6978, align 1
        %3167 = call i32 @strlen(i8* %endptr69) nounwind readonly

Which could also be constant folded. Whatever is producing this should probably
be fixed to leave this as a memcpy from a string.

Further, eon also has an interesting partially redundant strlen call:

bb8:            ; preds = %_ZN18eonImageCalculatorC1Ev.exit
        %682 = getelementptr i8** %argv, i32 6          ; <i8**> [#uses=2]
        %683 = load i8** %682, align 4          ; <i8*> [#uses=4]
        %684 = load i8* %683, align 1           ; <i8> [#uses=1]
        %685 = icmp eq i8 %684, 0               ; <i1> [#uses=1]
        br i1 %685, label %bb10, label %bb9

bb9:            ; preds = %bb8
        %686 = call i32 @strlen(i8* %683) nounwind readonly
        %687 = icmp ugt i32 %686, 254           ; <i1> [#uses=1]
        br i1 %687, label %bb10, label %bb11

bb10:           ; preds = %bb9, %bb8
        %688 = call i32 @strlen(i8* %683) nounwind readonly

This could be eliminated by doing the strlen once in bb8, saving code size and
improving perf on the bb8->9->10 path.

//===---------------------------------------------------------------------===//

I see an interesting fully redundant call to strlen left in 186.crafty:InputMove
which looks like:
        %movetext11 = getelementptr [128 x i8]* %movetext, i32 0, i32 0

bb62:           ; preds = %bb55, %bb53
        %promote.0 = phi i32 [ %169, %bb55 ], [ 0, %bb53 ]
        %171 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1
        %172 = add i32 %171, -1         ; <i32> [#uses=1]
        %173 = getelementptr [128 x i8]* %movetext, i32 0, i32 %172

... no stores ...
        br i1 %or.cond, label %bb65, label %bb72

bb65:           ; preds = %bb62
        store i8 0, i8* %173, align 1
        br label %bb72

bb72:           ; preds = %bb65, %bb62
        %trank.1 = phi i32 [ %176, %bb65 ], [ -1, %bb62 ]
        %177 = call i32 @strlen(i8* %movetext11) nounwind readonly align 1

Note that on the bb62->bb72 path, the %177 strlen call is partially
redundant with the %171 call. At worst, we could shove the %177 strlen call
up into the bb65 block, moving it out of the bb62->bb72 path. However, note
that bb65 stores to the string, zeroing out the last byte. This means that on
that path the value of %177 is actually just %171-1. A sub is cheaper than a
strlen!

This pattern repeats several times, basically doing:

  A = strlen(P);
  P[A-1] = 0;
  B = strlen(P);
  where it is "obvious" that B = A-1.

//===---------------------------------------------------------------------===//

186.crafty contains this interesting pattern:

%77 = call i8* @strstr(i8* getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0),
                       i8* %30)
%phitmp648 = icmp eq i8* %77, getelementptr ([6 x i8]* @"\01LC5", i32 0, i32 0)
br i1 %phitmp648, label %bb70, label %bb76

bb70:           ; preds = %OptionMatch.exit91, %bb69
  %78 = call i32 @strlen(i8* %30) nounwind readonly align 1 ; <i32> [#uses=1]

This is basically:
  cststr = "abcdef";
  if (strstr(cststr, P) == cststr) {
    x = strlen(P);
    ...

The strstr call would be significantly cheaper written as:

cststr = "abcdef";
if (memcmp(P, cststr, strlen(P)) == 0)
  x = strlen(P);

This is memcmp+strlen instead of strstr. This also makes the strlen fully
redundant.

//===---------------------------------------------------------------------===//

186.crafty also contains this code:

%1906 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
%1907 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1906
%1908 = call i8* @strcpy(i8* %1907, i8* %1905) nounwind align 1
%1909 = call i32 @strlen(i8* getelementptr ([32 x i8]* @pgn_event, i32 0,i32 0))
%1910 = getelementptr [32 x i8]* @pgn_event, i32 0, i32 %1909

The last strlen is computable as 1908-@pgn_event, which means 1910=1908.

//===---------------------------------------------------------------------===//

1459186.crafty has this interesting pattern with the "out.4543" variable:
1460
1461call void @llvm.memcpy.i32(
1462 i8* getelementptr ([10 x i8]* @out.4543, i32 0, i32 0),
1463 i8* getelementptr ([7 x i8]* @"\01LC28700", i32 0, i32 0), i32 7, i32 1)
1464%101 = call@printf(i8* ... @out.4543, i32 0, i32 0)) nounwind
1465
1466It is basically doing:
1467
1468 memcpy(globalarray, "string");
1469 printf(..., globalarray);
1470
1471Anyway, by knowing that printf just reads the memory and forward substituting
1472the string directly into the printf, this eliminates reads from globalarray.
1473Since this pattern occurs frequently in crafty (due to the "DisplayTime" and
1474other similar functions) there are many stores to "out". Once all the printfs
1475stop using "out", all that is left is the memcpy's into it. This should allow
1476globalopt to remove the "stored only" global.
1477
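After forward substitution, the call would read the constant directly
(a sketch):

  printf(..., "string");

and the memcpys into globalarray become removable once no readers remain.
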
//===---------------------------------------------------------------------===//

This code:

define inreg i32 @foo(i8* inreg %p) nounwind {
  %tmp0 = load i8* %p
  %tmp1 = ashr i8 %tmp0, 5
  %tmp2 = sext i8 %tmp1 to i32
  ret i32 %tmp2
}

could be dagcombine'd to a sign-extending load with a shift: because the shift
is arithmetic, sext(ashr(x, 5)) equals ashr(sext(x), 5). For example, on x86
this currently gets this:

        movb    (%eax), %al
        sarb    $5, %al
        movsbl  %al, %eax

while it could get this:

        movsbl  (%eax), %eax
        sarl    $5, %eax
//===---------------------------------------------------------------------===//

GCC PR31029:

int test(int x) { return 1-x == x; }     // --> return false
int test2(int x) { return 2-x == x; }    // --> return x == 1 ?

Always foldable for odd constants, what is the rule for even?

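A note on the even case: C-x == x is equivalent to 2*x == C, and with 32-bit
wrapping arithmetic 2*x == C has no solutions when C is odd (2*x is always
even) but exactly two when C is even: x = C/2 and x = C/2 + 0x80000000. So
test2 could fold to the two-way compare sketched below; since 2-x overflows
for the second solution, exploiting signed-overflow UB would fold it further
to just x == 1 (presumably the intent of the comment above):

int test2(int x) { return x == 1 || x == (int)0x80000001; }
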
//===---------------------------------------------------------------------===//

PR 3381: a GEP to a field of size 0 inside a struct could be turned into a GEP
to the next field in the struct (which is at the same address).

For example: a store of float into { {{}}, float } could be turned into a store
to the float directly.

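A sketch of the rewrite, using the types from the example (the size-0 field
and the float live at the same offset):

  %p = getelementptr { {{}}, float }* %s, i32 0, i32 0    ; size-0 field
could become
  %p = getelementptr { {{}}, float }* %s, i32 0, i32 1    ; the float itself
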
//===---------------------------------------------------------------------===//

#include <math.h>
double foo(double a) { return sin(a); }

This compiles into this on x86-64 Linux:
foo:
        subq    $8, %rsp
        call    sin
        addq    $8, %rsp
        ret
vs:

foo:
        jmp     sin

//===---------------------------------------------------------------------===//

The arg promotion pass should make use of nocapture to make its alias analysis
stuff much more precise.

//===---------------------------------------------------------------------===//

The following functions should be optimized to use a select instead of a
branch (from gcc PR40072):

char char_int(int m) {if(m>7) return 0; return m;}
int int_char(char m) {if(m>7) return 0; return m;}

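For int_char, for instance, the branch-free form we would hope for looks
something like this (a sketch in the IR style used elsewhere in this file;
the signext parameter ABI is an assumption):

define i32 @int_char(i8 signext %m) nounwind readnone {
entry:
  %conv = sext i8 %m to i32
  %cmp = icmp sgt i8 %m, 7
  %retval = select i1 %cmp, i32 0, i32 %conv
  ret i32 %retval
}
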
//===---------------------------------------------------------------------===//

int func(int a, int b) { if (a & 0x80) b |= 0x80; else b &= ~0x80; return b; }

Generates this:

define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = and i32 %a, 128                            ; <i32> [#uses=1]
  %1 = icmp eq i32 %0, 0                          ; <i1> [#uses=1]
  %2 = or i32 %b, 128                             ; <i32> [#uses=1]
  %3 = and i32 %b, -129                           ; <i32> [#uses=1]
  %b_addr.0 = select i1 %1, i32 %3, i32 %2        ; <i32> [#uses=1]
  ret i32 %b_addr.0
}

However, it's functionally equivalent to:

  b = (b & ~0x80) | (a & 0x80);

Which generates this:

define i32 @func(i32 %a, i32 %b) nounwind readnone ssp {
entry:
  %0 = and i32 %b, -129                           ; <i32> [#uses=1]
  %1 = and i32 %a, 128                            ; <i32> [#uses=1]
  %2 = or i32 %0, %1                              ; <i32> [#uses=1]
  ret i32 %2
}

This can be generalized for other forms:

  b = (b & ~0x80) | (a & 0x40) << 1;

//===---------------------------------------------------------------------===//

These two functions produce different code. They shouldn't:

#include <stdint.h>

uint8_t p1(uint8_t b, uint8_t a) {
  b = (b & ~0xc0) | (a & 0xc0);
  return (b);
}

uint8_t p2(uint8_t b, uint8_t a) {
  b = (b & ~0x40) | (a & 0x40);
  b = (b & ~0x80) | (a & 0x80);
  return (b);
}

define zeroext i8 @p1(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63                              ; <i8> [#uses=1]
  %1 = and i8 %a, -64                             ; <i8> [#uses=1]
  %2 = or i8 %1, %0                               ; <i8> [#uses=1]
  ret i8 %2
}

define zeroext i8 @p2(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63                              ; <i8> [#uses=1]
  %.masked = and i8 %a, 64                        ; <i8> [#uses=1]
  %1 = and i8 %a, -128                            ; <i8> [#uses=1]
  %2 = or i8 %1, %0                               ; <i8> [#uses=1]
  %3 = or i8 %2, %.masked                         ; <i8> [#uses=1]
  ret i8 %3
}

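Since the masks 0x40 and 0x80 are disjoint, the two partial inserts in p2
combine: (a & 64) | (a & -128) is just a & -64. The desired canonical form for
p2 is therefore the same as p1 (a sketch):

define zeroext i8 @p2(i8 zeroext %b, i8 zeroext %a) nounwind readnone ssp {
entry:
  %0 = and i8 %b, 63
  %1 = and i8 %a, -64
  %2 = or i8 %1, %0
  ret i8 %2
}
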
//===---------------------------------------------------------------------===//

IPSCCP does not currently propagate argument-dependent constants through
functions where it does not know all of the callers. This includes functions
with normal external linkage as well as templates, C99 inline functions, etc.
Specifically, it does nothing to:

define i32 @test(i32 %x, i32 %y, i32 %z) nounwind {
entry:
  %0 = add nsw i32 %y, %z
  %1 = mul i32 %0, %x
  %2 = mul i32 %y, %z
  %3 = add nsw i32 %1, %2
  ret i32 %3
}

define i32 @test2() nounwind {
entry:
  %0 = call i32 @test(i32 1, i32 2, i32 4) nounwind
  ret i32 %0
}

It would be interesting to extend IPSCCP to be able to handle simple cases like
this, where all of the arguments to a call are constant. Because IPSCCP runs
before inlining, trivial templates and inline functions are not yet inlined.
The results for a function + set of constant arguments should be memoized in a
map.

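With such an extension, @test2 would fold to a constant: test(1, 2, 4) is
(2+4)*1 + 2*4 = 14, i.e. (a sketch):

define i32 @test2() nounwind {
entry:
  ret i32 14
}
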
//===---------------------------------------------------------------------===//

The libcall constant folding stuff should be moved out of SimplifyLibcalls into
libanalysis' constantfolding logic. This would allow IPSCCP to be able to
handle simple things like this:

static int foo(const char *X) { return strlen(X); }
int bar() { return foo("abcd"); }
//===---------------------------------------------------------------------===//

InstCombine should use SimplifyDemandedBits to remove the or instruction:

define i1 @test(i8 %x, i8 %y) {
  %A = or i8 %x, 1
  %B = icmp ugt i8 %A, 3
  ret i1 %B
}

Currently instcombine calls SimplifyDemandedBits with either all bits or just
the sign bit, if the comparison is obviously a sign test. In this case, we only
need all but the bottom two bits from %A, and if we gave that mask to SDB it
would delete the or instruction for us.

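Only bits 2 and above of %A are demanded by the ugt-3 compare, and the or only
sets bit 0, so it is dead and the function would simplify to (a sketch):

define i1 @test(i8 %x, i8 %y) {
  %B = icmp ugt i8 %x, 3
  ret i1 %B
}
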
//===---------------------------------------------------------------------===//

FunctionAttrs is not marking this function as readnone (just readonly):
$ clang t.c -emit-llvm -S -o - -O0 | opt -mem2reg -S -functionattrs

int t(int a, int b, int c) {
 int *p;
 if (a)
   p = &a;
 else
   p = &c;
 return *p;
}

This is because we codegen this to:

define i32 @t(i32 %a, i32 %b, i32 %c) nounwind readonly ssp {
entry:
  %a.addr = alloca i32                            ; <i32*> [#uses=3]
  %c.addr = alloca i32                            ; <i32*> [#uses=2]
...

if.end:
  %p.0 = phi i32* [ %a.addr, %if.then ], [ %c.addr, %if.else ]
  %tmp2 = load i32* %p.0                          ; <i32> [#uses=1]
  ret i32 %tmp2
}

And functionattrs doesn't realize that the p.0 load points to function-local
memory.

Also, functionattrs doesn't know about memcpy/memset. This function should be
marked readnone, since it only twiddles local memory, but functionattrs doesn't
handle memset/memcpy/memmove aggressively:

struct X { int *p; int *q; };
int foo() {
  int i = 0, j = 1;
  struct X x, y;
  int **p;
  y.p = &i;
  x.q = &j;
  p = __builtin_memcpy (&x, &y, sizeof (int *));
  return **p;
}

(The memcpy copies y.p, which is &i, into x.p and returns &x, so **p is just
the local i; the function only reads and writes its own locals.)

//===---------------------------------------------------------------------===//