Target Independent Opportunities:

//===---------------------------------------------------------------------===//

We should make the following changes to clean up MachineInstr:

1. Add an Opcode field to TargetInstrDescriptor, so you can tell the opcode of
   an instruction with just a TargetInstrDescriptor*.
2. Remove the Opcode field from MachineInstr, replacing it with a
   TargetInstrDescriptor*.
3. Getting information about a machine instr then becomes:
     MI->getInfo()->isTwoAddress()
   instead of:
     const TargetInstrInfo &TII = ...
     TII.isTwoAddrInstr(MI->getOpcode())

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m. It would be better to avoid the memcpy and just do
loads from the static array.

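For reference, a hypothetical source-level equivalent of what we would like the
optimizer to reach (m0u_opt is an invented name; it uses the same p_type as
above, and the table is never modified so it can live in constant static
storage):

static p_type m0u_opt(p_type p) {
  /* same table, but constant and static: no per-call copy into a local */
  static const int m[] = {0, 8, 1, 2, 16, 5, 13, 7, 14, 9,
                          3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}
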
//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :)

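A minimal sketch of the intended expansion (fast_hypot is a hypothetical name;
this is only valid under -ffast-math, where errno and the extra range/precision
handling of hypot() don't matter):

#include <math.h>

double fast_hypot(double x, double y) {
  return sqrt(x*x + y*y);   /* i.e. llvm.sqrt(x*x + y*y), as described above */
}
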
//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;

into:
 long long tmp = 1;
 for (i = ...; ++i, tmp+=tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)

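A C-level illustration of the idea (is_negative is a hypothetical example;
"Phi" presumably means the byte of P that holds the sign bit):

int is_negative(int *P) {
  /* today: a full 32-bit load; only the byte holding the sign bit is needed,
     e.g. ((signed char*)P)[3] < 0 on a little-endian target */
  return *P < 0;
}
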
//===---------------------------------------------------------------------===//

Reassociate should turn X*X*X*X -> t=(X*X), t*t, to eliminate a multiply
(two multiplies instead of three).

//===---------------------------------------------------------------------===//

Interesting(?) testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l) { return memcmp(j,l,4); }
int h(int *j, int *l) { return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1,2,4,8 bytes.

//===---------------------------------------------------------------------===//

This code:
int rot(unsigned char b) { int a = ((b>>1) ^ (b<<7)) & 0xff; return a; }

Can be improved in two ways:

1. The instcombiner should eliminate the type conversions.
2. The X86 backend should turn this into a rotate by one bit.

//===---------------------------------------------------------------------===//

Add LSR exit value substitution. It'll probably be a win for Ackermann, etc.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize. It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For packed types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size. It works but can be overly conservative, as the alignment of
specific packed types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold packed type casts at the LLVM level, regardless of the
cast. Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns. Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl _foo, %eax
        cmpl $1, %edi
        sbbl $-1, %eax
        movl %eax, _foo

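For reference, a hypothetical C source pattern that produces the branchy
sequence above; the desired codegen computes foo += (x != 0) with no branch
(which is exactly what the cmp/sbb pair does):

int foo;

void inc_if_nonzero(int x) {
  if (x != 0)
    ++foo;        /* want: foo += (x != 0), i.e. the cmp/sbb sequence above */
}
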
//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers. See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687

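As an example of the first combine, a hypothetical call site like this
(sincos is a GNU libm extension, assumed to be available) would have its
separate sin and cos calls merged into one sincos call:

#include <math.h>

void polar_to_xy(double r, double theta, double *x, double *y) {
  double s = sin(theta), c = cos(theta);  /* combinable into sincos(theta, &s, &c) */
  *x = r * c;
  *y = r * s;
}
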
//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { int, int }
        %struct.Y = type { %struct.X }
ulong %bar() {
        %retval = alloca %struct.Y, align 8     ; <%struct.Y*> [#uses=3]
        %tmp12 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 0
        store int 0, int* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 1
        store int 1, int* %tmp15
        %retval = cast %struct.Y* %retval to ulong*
        %retval = load ulong* %retval           ; <ulong> [#uses=1]
        ret ulong %retval
}

It should be extended to do so.

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified; the constants are 0xC5000000 and 0xC5FFFFFF, so only the top byte
changes):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

Legalize should lower cttz like this:
  cttz(x) = popcnt((x-1) & ~x)

on targets that have popcnt but not cttz (Itanium, what else?). Note that this
identity counts trailing zeros; ctlz needs the leading set bit smeared downward
first, after which ctlz(x) = popcnt(~x).

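A minimal C sketch of both lowerings for 32-bit values, using GCC's
__builtin_popcount as the popcnt primitive (illustrative only; the function
names are invented):

unsigned cttz32(unsigned x) {
  return __builtin_popcount((x - 1) & ~x);  /* bits below the lowest set bit */
}

unsigned ctlz32(unsigned x) {
  /* smear the highest set bit downward, then count the zero bits above it */
  x |= x >> 1;  x |= x >> 2;  x |= x >> 4;  x |= x >> 8;  x |= x >> 16;
  return __builtin_popcount(~x);
}
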
//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int. On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but alas...

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine:

unsigned int swap_32(unsigned int v) {
  v = ((v & 0x00ff00ffU) << 8) | ((v & 0xff00ff00U) >> 8);
  v = ((v & 0x0000ffffU) << 16) | ((v & 0xffff0000U) >> 16);
  return v;
}

Nor is this (yes, it really is bswap):

unsigned long reverse(unsigned v) {
    unsigned t;
    t = v ^ ((v << 16) | (v >> 16));
    t &= ~0xff0000;
    v = (v << 24) | (v >> 8);
    return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-scalarrepl should promote this to be a vector scalar.

        %struct..0anon = type { <4 x float> }
implementation   ; Functions:
void %test1(<4 x float> %V, float* %P) {
entry:
        %u = alloca %struct..0anon, align 16    ; <%struct..0anon*> [#uses=2]
        %tmp = getelementptr %struct..0anon* %u, int 0, uint 0  ; <<4 x float>*> [#uses=1]
        store <4 x float> %V, <4 x float>* %tmp
        %tmp1 = cast %struct..0anon* %u to [4 x float]*         ; <[4 x float]*> [#uses=1]
        %tmp = getelementptr [4 x float]* %tmp1, int 0, int 1   ; <float*> [#uses=1]
        %tmp = load float* %tmp                 ; <float> [#uses=1]
        %tmp3 = mul float %tmp, 2.000000e+00    ; <float> [#uses=1]
        store float %tmp3, float* %P
        ret void
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   setcc (sdiv X, C1), C2
when X, C1, and C2 are unsigned. Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match. See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.

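A small worked example of the kind of fold meant here (assuming unsigned
arithmetic throughout; big_bucket is an invented name): for unsigned x,
(x / 8) > 100 holds exactly when x > 807, so the divide can be folded into
the compare:

unsigned big_bucket(unsigned x) {
  return (x / 8) > 100;   /* could be simplified to: x > 807 */
}
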
//===---------------------------------------------------------------------===//

Instcombine misses several of these cases (see the testcase in the patch):
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01519.html

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level. We need a "loops to memcpy"
pass.

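A hypothetical example of the kind of "history" copy loop meant here; a
loops-to-memcpy pass would rewrite the loop as a single
memcpy(dst, src, n * sizeof(*dst)) call:

void copy_history(int *dst, const int *src, unsigned n) {
  unsigned i;
  for (i = 0; i < n; i++)
    dst[i] = src[i];
}
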
//===---------------------------------------------------------------------===//

-predsimplify should transform this:

void bad(unsigned x)
{
  if (x > 4)
    bar(12);
  else if (x > 3)
    bar(523);
  else if (x > 2)
    bar(36);
  else if (x > 1)
    bar(65);
  else if (x > 0)
    bar(45);
  else
    bar(367);
}

into:

void good(unsigned x)
{
  if (x == 4)
    bar(523);
  else if (x == 3)
    bar(36);
  else if (x == 2)
    bar(65);
  else if (x == 1)
    bar(45);
  else if (x == 0)
    bar(367);
  else
    bar(12);
}

to enable further optimizations.

//===---------------------------------------------------------------------===//