Target Independent Opportunities:

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m. It would be better to avoid the memcpy and just do
loads from the static array.
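
For illustration, this is roughly the effect we want (m_table and m0u_opt are
invented names; p_type comes from the benchmark):

static const int m_table[] = {0, 8, 1, 2, 16, 5, 13, 7, 14, 9,
                              3, 4, 11, 12, 15, 10, 17, 6};

static p_type m0u_opt(p_type p) {
  p_type pu;
  /* index the read-only table directly; no local copy, no memcpy */
  pu.a = m_table[p.a];
  pu.b = m_table[p.b];
  pu.c = m_table[p.c];
  return pu;
}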

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :)
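
A minimal example of the kind of call this would affect (the function is just
an illustration):

#include <math.h>

double dist(double x, double y) {
  /* with -ffast-math this call could be expanded to sqrt(x*x + y*y) */
  return hypot(x, y);
}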

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
  x = 1ULL << i;

into:
  long long tmp = 1;
  for (i = ...; ++i, tmp+=tmp)
    x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

//===---------------------------------------------------------------------===//

Reassociate should turn: X*X*X*X -> t=(X*X) (t*t) to eliminate a multiply.
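
In C, the reassociated form would look like this (pow4 is an invented name):

int pow4(int x) {
  int t = x * x;   /* two multiplies total instead of three */
  return t * t;
}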

//===---------------------------------------------------------------------===//

An interesting(?) testcase for add/shift/mul reassociation:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j, int *l) { return memcmp(j, l, 4); }
int h(int *j, int *l) { return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1, 2, 4, and 8 bytes.

//===---------------------------------------------------------------------===//

Add LSR exit value substitution. It'll probably be a win for Ackermann, etc.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize. It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For packed types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size. It works but can be overly conservative, as the alignment of
specific packed types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold packed type casts at the LLVM level, regardless of the
cast. Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns. Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl _foo, %eax
        cmpl $1, %edi
        sbbl $-1, %eax
        movl %eax, _foo
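
The C source that gives rise to this pattern is essentially a guarded
increment (the names here are invented):

int foo;

void maybe_inc(int x) {
  if (x != 0)
    foo++;   /* the second sequence above updates foo without a branch */
}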

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers. See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
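
A small case where the sin/cos combine would fire (the function is just an
illustration):

#include <math.h>

void polar_to_xy(double r, double theta, double *x, double *y) {
  /* sin and cos of the same argument: one sincos(theta, ...) call could
     compute both values */
  *x = r * cos(theta);
  *y = r * sin(theta);
}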

//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { int, int }
        %struct.Y = type { %struct.X }
ulong %bar() {
        %retval = alloca %struct.Y, align 8
        %tmp12 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 0
        store int 0, int* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 1
        store int 1, int* %tmp15
        %retval1 = bitcast %struct.Y* %retval to ulong*
        %retval2 = load ulong* %retval1
        ret ulong %retval2
}

It should be extended to do so.

//===---------------------------------------------------------------------===//

-scalarrepl should promote this to be a vector scalar.

        %struct..0anon = type { <4 x float> }

implementation   ; Functions:

void %test1(<4 x float> %V, float* %P) {
        %u = alloca %struct..0anon, align 16
        %tmp = getelementptr %struct..0anon* %u, int 0, uint 0
        store <4 x float> %V, <4 x float>* %tmp
        %tmp1 = bitcast %struct..0anon* %u to [4 x float]*
        %tmp2 = getelementptr [4 x float]* %tmp1, int 0, int 1
        %tmp3 = load float* %tmp2
        %tmp4 = mul float %tmp3, 2.000000e+00
        store float %tmp4, float* %P
        ret void
}

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}
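
For reference, the same thing in C (the constants are 0xC5000000 and
0xC5FFFFFF, so only the most-significant byte of *P changes):

void test(unsigned *P) {
  /* equivalent to storing the single byte 0xC5 into the high byte of *P */
  *P = (*P & 0x00FFFFFF) | 0xC5000000;
}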

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

Legalize should lower cttz like this:
        cttz(x) = popcnt((x-1) & ~x)

on targets that have popcnt but not cttz (Itanium, what else?). The same trick
handles ctlz if the leading set bit is first smeared to the right.
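
A sketch of both lowerings in C, with the GCC builtin standing in for a native
popcnt instruction:

unsigned cttz32(unsigned x) {
  /* (x-1) & ~x has exactly the trailing-zero bits of x set
     (all 32 bits when x == 0, giving cttz(0) == 32) */
  return __builtin_popcount((x - 1) & ~x);
}

unsigned ctlz32(unsigned x) {
  /* smear the leading set bit right, then count the remaining zero bits */
  x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
  x |= x >> 8;  x |= x >> 16;
  return __builtin_popcount(~x);
}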

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but alas...

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine:

unsigned int swap_32(unsigned int v) {
  v = ((v & 0x00ff00ffU) << 8) | ((v & 0xff00ff00U) >> 8);
  v = ((v & 0x0000ffffU) << 16) | ((v & 0xffff0000U) >> 16);
  return v;
}

Nor is this (yes, it really is bswap):

unsigned long reverse(unsigned v) {
  unsigned t;
  t = v ^ ((v << 16) | (v >> 16));
  t &= ~0xff0000;
  v = (v << 24) | (v >> 8);
  return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   icmp pred (sdiv X, C1), C2
when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match. See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.
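
A hypothetical source-level instance, where the unsigned divide-and-compare
can fold into a plain range check (x / 16 >= 4 is the same as x >= 64):

int in_range(unsigned x) {
  return (x / 16) >= 4;
}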

//===---------------------------------------------------------------------===//

Instcombine misses several of these cases (see the testcase in the patch):
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01519.html

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level. We need a "loops to memcpy"
pass.
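
The copy loops in question look roughly like this (illustrative, not the
actual viterbi source); such a pass would turn the loop into a single memcpy
call:

void copy_history(int *dst, const int *src, int n) {
  int i;
  for (i = 0; i < n; i++)
    dst[i] = src[i];
}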

//===---------------------------------------------------------------------===//

-predsimplify should transform this:

void bad(unsigned x)
{
  if (x > 4)
    bar(12);
  else if (x > 3)
    bar(523);
  else if (x > 2)
    bar(36);
  else if (x > 1)
    bar(65);
  else if (x > 0)
    bar(45);
  else
    bar(367);
}

into:

void good(unsigned x)
{
  if (x == 4)
    bar(523);
  else if (x == 3)
    bar(36);
  else if (x == 2)
    bar(65);
  else if (x == 1)
    bar(45);
  else if (x == 0)
    bar(367);
  else
    bar(12);
}

to enable further optimizations.

//===---------------------------------------------------------------------===//