Target Independent Opportunities:

//===---------------------------------------------------------------------===//

We should make the following changes to clean up MachineInstr:

1. Add an Opcode field to TargetInstrDescriptor, so you can tell the opcode of
   an instruction with just a TargetInstrDescriptor*.
2. Remove the Opcode field from MachineInstr, replacing it with a
   TargetInstrDescriptor*.
3. Getting information about a machine instr then becomes:
     MI->getInfo()->isTwoAddress()
   instead of:
     const TargetInstrInfo &TII = ...
     TII.isTwoAddrInstr(MI->getOpcode())
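
A rough sketch of what this could look like (the field and accessor names here
are illustrative, not the actual LLVM API):

struct TargetInstrDescriptor {
  unsigned Opcode;          // (1) the opcode now lives in the descriptor
  unsigned Flags;
  bool isTwoAddress() const { return Flags & (1 << 0); }  // flag bit illustrative
};

class MachineInstr {
  const TargetInstrDescriptor *Desc;   // (2) replaces the old Opcode field
public:
  const TargetInstrDescriptor *getInfo() const { return Desc; }
  unsigned getOpcode() const { return Desc->Opcode; }      // (3) still cheap
};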

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs.  Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.

//===---------------------------------------------------------------------===//

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m.  It would be better to avoid the memcpy and just do
loads from the static array.

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math).  Misc/mandel will like this. :)
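
For example, a sketch of the intended expansion at the source level (llvm.sqrt
written as sqrt here):

double dist(double x, double y) {
  return hypot(x, y);          /* libm call; has to get errno and rounding right */
}

/* what the front-end could emit when errno/precision don't matter: */
double dist_fast(double x, double y) {
  return sqrt(x*x + y*y);
}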

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;

into:
 long long tmp = 1;
 for (i = ...; ++i, tmp+=tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
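
In source terms (a hedged illustration; the byte offset shown assumes a
32-bit little-endian target):

int is_neg(int *P) {
  return *P < 0;               /* today: 32-bit load, then compare */
}
/* Only the sign byte matters, so this can be done with a 1-byte load,
   e.g. ((signed char*)P)[3] < 0 on a 32-bit little-endian target. */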

//===---------------------------------------------------------------------===//

Reassociate should turn: X*X*X*X -> t=(X*X) (t*t) to eliminate a multiply.
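
For example (a sketch of the intended rewrite):

int pow4(int x) { return x*x*x*x; }          /* three multiplies */

int pow4_reassoc(int x) {
  int t = x*x;                               /* one multiply */
  return t*t;                                /* two multiplies total */
}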

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l)  {  return memcmp(j,l,4);  }
int h(int *j, int *l) {  return *j - *l; }

this could be done in SelectionDAGISel.cpp, along with other special cases,
for 1,2,4,8 bytes.

//===---------------------------------------------------------------------===//

This code:
int rot(unsigned char b) { int a = ((b>>1) ^ (b<<7)) & 0xff; return a; }

Can be improved in two ways:

1. The instcombiner should eliminate the type conversions.
2. The X86 backend should turn this into a rotate by one bit.

//===---------------------------------------------------------------------===//

Add LSR exit value substitution. It'll probably be a win for Ackermann, etc.
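
A rough illustration of the kind of rewrite meant here (assuming the trip
count is known and non-negative):

int f(int n) {
  int i, p = 0;
  for (i = 0; i < n; ++i)
    p += 4;                  /* p is only used after the loop */
  return p;
}
/* The exit value of p is simply 4*n (for n >= 0), so the final value can be
   substituted and the in-loop update of p deleted. */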

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize.  It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For packed types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size.  It works but can be overly conservative, as the alignment of
specific packed types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold packed type casts at the LLVM level, regardless of the
cast.  Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns.  Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl    _foo, %eax
        cmpl    $1, %edi
        sbbl    $-1, %eax
        movl    %eax, _foo
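
The source pattern in question is roughly (illustrative only):

int foo;

void maybe_inc(int x) {
  if (x)
    ++foo;                   /* compiles to the compare+branch+incl above */
}

/* branchless form the backend could select instead: foo += (x != 0); */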

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      double sincos(double x, double *sin, double *cos);
      float sincosf(float x, float *sin, float *cos);
      long double sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers.  See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
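
For example (a sketch using the sincos prototype listed above):

void polar_to_xy(double r, double theta, double *x, double *y) {
  *x = r * cos(theta);       /* two separate libm calls today */
  *y = r * sin(theta);
}
/* ... could become a single call:
     double s, c;
     sincos(theta, &s, &c);
     *x = r * c;  *y = r * s;   */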

//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { int, int }
        %struct.Y = type { %struct.X }
ulong %bar() {
        %retval = alloca %struct.Y, align 8             ; <%struct.Y*> [#uses=3]
        %tmp12 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 0
        store int 0, int* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 1
        store int 1, int* %tmp15
        %retval = cast %struct.Y* %retval to ulong*
        %retval = load ulong* %retval           ; <ulong> [#uses=1]
        ret ulong %retval
}

it should be extended to do so.

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}
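
In C terms the IR above amounts to (3305111552 is 0xC5000000 and 3321888767 is
0xC5FFFFFF):

void test(unsigned *P) {
  *P = (*P | 0xC5000000U) & 0xC5FFFFFFU;  /* only the top byte changes: it becomes 0xC5 */
}

so a single one-byte store of 0xC5 to the high byte of *P (offset 3 on a
little-endian target, offset 0 on a big-endian one) is sufficient.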

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

Legalize should lower cttz like this:
  cttz(x) = popcnt((x-1) & ~x)

and similarly lower ctlz via popcnt (after smearing the high bit into all of
the lower bit positions), on targets that have popcnt but not ctlz/cttz.
itanium, what else?
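
A sketch of the cttz expansion in C, using GCC's popcount builtin:

unsigned cttz32(unsigned x) {
  return __builtin_popcount((x - 1) & ~x);   /* trailing-zero count; yields 32 for x==0 */
}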

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int.  On a 32-bit platform it would be just
so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias reg->node[i],
but alas...

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine:

unsigned int swap_32(unsigned int v) {
  v = ((v & 0x00ff00ffU) << 8)  | ((v & 0xff00ff00U) >> 8);
  v = ((v & 0x0000ffffU) << 16) | ((v & 0xffff0000U) >> 16);
  return v;
}

Nor is this (yes, it really is bswap):

unsigned long reverse(unsigned v) {
    unsigned t;
    t = v ^ ((v << 16) | (v >> 16));
    t &= ~0xff0000;
    v = (v << 24) | (v >> 8);
    return v ^ (t >> 8);
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-scalarrepl should promote this to be a vector scalar.

        %struct..0anon = type { <4 x float> }
implementation   ; Functions:
void %test1(<4 x float> %V, float* %P) {
entry:
        %u = alloca %struct..0anon, align 16            ; <%struct..0anon*> [#uses=2]
        %tmp = getelementptr %struct..0anon* %u, int 0, uint 0          ; <<4 x float>*> [#uses=1]
        store <4 x float> %V, <4 x float>* %tmp
        %tmp1 = cast %struct..0anon* %u to [4 x float]*         ; <[4 x float]*> [#uses=1]
        %tmp = getelementptr [4 x float]* %tmp1, int 0, int 1           ; <float*> [#uses=1]
        %tmp = load float* %tmp                 ; <float> [#uses=1]
        %tmp3 = mul float %tmp, 2.000000e+00            ; <float> [#uses=1]
        store float %tmp3, float* %P
        ret void
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   setcc (sdiv X / C1 ), C2
when X, C1, and C2 are unsigned.  Similarly for udiv and signed operands.
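
An illustrative instance of the transform (here with an unsigned divide):

int in_fifties(unsigned x) {
  return (x / 10) == 5;      /* today: a divide followed by a compare */
}
/* could be folded into a divide-free range check:
     return x >= 50 && x < 60;   */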

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match.  See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.

//===---------------------------------------------------------------------===//

Instcombine misses several of these cases (see the testcase in the patch):
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01519.html

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level.  We need a "loops to memcpy"
pass.
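
The kind of loop such a pass would recognize (a generic illustration, not the
actual viterbi source):

for (i = 0; i < n; ++i)
  dst[i] = src[i];
/* -> memcpy(dst, src, n * sizeof(*dst));  (valid when dst and src don't overlap) */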

//===---------------------------------------------------------------------===//

-predsimplify should transform this:

void bad(unsigned x)
{
  if (x > 4)
    bar(12);
  else if (x > 3)
    bar(523);
  else if (x > 2)
    bar(36);
  else if (x > 1)
    bar(65);
  else if (x > 0)
    bar(45);
  else
    bar(367);
}

into:

void good(unsigned x)
{
  if (x == 4)
    bar(523);
  else if (x == 3)
    bar(36);
  else if (x == 2)
    bar(65);
  else if (x == 1)
    bar(45);
  else if (x == 0)
    bar(367);
  else
    bar(12);
}

to enable further optimizations.

//===---------------------------------------------------------------------===//