Target Independent Opportunities:

//===---------------------------------------------------------------------===//

We should make the following changes to clean up MachineInstr:

1. Add an Opcode field to TargetInstrDescriptor, so you can tell the opcode of
   an instruction with just a TargetInstrDescriptor*.
2. Remove the Opcode field from MachineInstr, replacing it with a
   TargetInstrDescriptor*.
3. Getting information about a machine instr then becomes:
     MI->getInfo()->isTwoAddress()
   instead of:
     const TargetInstrInfo &TII = ...
     TII.isTwoAddrInstr(MI->getOpcode())

//===---------------------------------------------------------------------===//

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m.  It would be better to avoid the memcpy and just do
loads from the static array.

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math).  Misc/mandel will like this. :)
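
A sketch in C of the intended transform (function names invented; note the
expansion is only safe under -ffast-math, since sqrt(x*x+y*y) can overflow
or lose precision where hypot would not):

#include <math.h>

double dist(double x, double y) {
  return hypot(x, y);          /* library call today, may set errno */
}

double dist_expanded(double x, double y) {
  return sqrt(x*x + y*y);      /* desired -ffast-math expansion */
}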

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

 for (i = ...; ++i)
   x = 1ULL << i;

into:

 long long tmp = 1;
 for (i = ...; ++i, tmp += tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
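
In C terms, a sketch of the shrink (the byte holding the sign bit is at
offset 3 on a little-endian target, offset 0 on big-endian):

int is_neg(int *P) {
  return *P < 0;                     /* setlt (loadi32 P), 0 */
}

int is_neg_shrunk(int *P) {
  return ((signed char *)P)[3] < 0;  /* little-endian: load one byte */
}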

//===---------------------------------------------------------------------===//

Reassociate should turn X*X*X*X into t = X*X; t*t, eliminating a multiply.
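
For example (a sketch; function names invented):

int pow4(int x) { return x*x*x*x; }   /* three multiplies today */

int pow4_reassoc(int x) {
  int t = x*x;                        /* two multiplies after reassociation */
  return t*t;
}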

//===---------------------------------------------------------------------===//

An interesting testcase for add/shift/mul reassociation:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j, int *l) { return memcmp(j, l, 4); }
int h(int *j, int *l) { return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1, 2, 4, and 8 bytes.

//===---------------------------------------------------------------------===//

This code:
int rot(unsigned char b) { int a = ((b>>1) ^ (b<<7)) & 0xff; return a; }

can be improved in two ways:

1. The instcombiner should eliminate the type conversions.
2. The X86 backend should turn this into a rotate by one bit.

//===---------------------------------------------------------------------===//

Add LSR exit value substitution.  It'll probably be a win for Ackermann, etc.
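
A sketch of the transform (invented example): a use of the induction
variable after the loop can be rewritten in terms of the trip count, so the
loop no longer has to keep the variable live:

int count_up(int n) {
  int i;
  for (i = 0; i < n; ++i)
    ;                          /* body doesn't matter here */
  return i;                    /* substitute the exit value: n > 0 ? n : 0 */
}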

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize.  It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

For packed types, TargetData.cpp::getTypeInfo() returns alignment that is equal
to the type size.  It works but can be overly conservative as the alignment of
specific packed types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold packed type casts at the LLVM level, regardless of the
cast.  Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns.  Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl    _foo, %eax
        cmpl    $1, %edi
        sbbl    $-1, %eax
        movl    %eax, _foo
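
The 'emit' sequence presumably corresponds to source like this (a guess at
the testcase; 'foo' is a global counter):

int foo;
void maybe_inc(int x) {
  if (x)
    foo++;                     /* cmp/sbb computes foo += (x != 0), no branch */
}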

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      void sincos(double x, double *sin, double *cos);
      void sincosf(float x, float *sin, float *cos);
      void sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers.  See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
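
A sketch of the expansion (assuming the GNU sincos extension):

#include <math.h>
extern void sincos(double x, double *sp, double *cp);

double f(double x) {
  double s, c;
  sincos(x, &s, &c);           /* expand to: s = sin(x); c = cos(x); */
  return s + c;                /* ... after which SROA can promote s and c */
}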

//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { int, int }
        %struct.Y = type { %struct.X }
ulong %bar() {
        %retval = alloca %struct.Y, align 8             ; <%struct.Y*> [#uses=3]
        %tmp12 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 0
        store int 0, int* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 1
        store int 1, int* %tmp15
        %retval = cast %struct.Y* %retval to ulong*
        %retval = load ulong* %retval           ; <ulong> [#uses=1]
        ret ulong %retval
}

It should be extended to do so.

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}

(The constants are 0xC5000000 and 0xC5FFFFFF: together the or/and force the
top byte to 0xC5 and leave the other three bytes unchanged.)

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr

//===---------------------------------------------------------------------===//

Legalize should lower cttz like this:
   cttz(x) = popcnt((x-1) & ~x)

on targets that have popcnt but not cttz.  Itanium, what else?
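
A quick sketch of the identity in C (GCC builtins used for illustration):

unsigned cttz_via_popcnt(unsigned x) {
  /* (x-1) & ~x sets exactly the bits below the lowest set bit of x, so
     its popcount is the number of trailing zeros (x == 0 gives 32). */
  return __builtin_popcount((x - 1) & ~x);
}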

//===---------------------------------------------------------------------===//

quantum_sigma_x in 462.libquantum contains the following loop:

      for(i=0; i<reg->size; i++)
        {
          /* Flip the target bit of each basis state */
          reg->node[i].state ^= ((MAX_UNSIGNED) 1 << target);
        }

Where MAX_UNSIGNED/state is a 64-bit int.  On a 32-bit platform it would be
just so cool to turn it into something like:

   long long Res = ((MAX_UNSIGNED) 1 << target);
   if (target < 32) {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFFULL;
   } else {
     for(i=0; i<reg->size; i++)
       reg->node[i].state ^= Res & 0xFFFFFFFF00000000ULL;
   }

... which would only do one 32-bit XOR per loop iteration instead of two.

It would also be nice to recognize that reg->size doesn't alias
reg->node[i], but alas...

//===---------------------------------------------------------------------===//

This isn't recognized as bswap by instcombine:

unsigned int swap_32(unsigned int v) {
  v = ((v & 0x00ff00ffU) << 8)  | ((v & 0xff00ff00U) >> 8);
  v = ((v & 0x0000ffffU) << 16) | ((v & 0xffff0000U) >> 16);
  return v;
}

Nor is this (the same 16-bit swap, in LLVM code and then in C):

ushort %bad(ushort %a) {
entry:
        %tmp = cast ushort %a to uint           ; <uint> [#uses=1]
        %tmp2 = shr uint %tmp, ubyte 8          ; <uint> [#uses=1]
        %tmp2 = cast uint %tmp2 to ushort       ; <ushort> [#uses=1]
        %tmp5 = shl ushort %a, ubyte 8          ; <ushort> [#uses=1]
        %tmp6 = or ushort %tmp2, %tmp5          ; <ushort> [#uses=1]
        ret ushort %tmp6
}

unsigned short bad(unsigned short a) {
  return ((a & 0xff00) >> 8 | (a & 0x00ff) << 8);
}

//===---------------------------------------------------------------------===//

These should turn into single 16-bit (unaligned?) loads on little/big endian
processors.

unsigned short read_16_le(const unsigned char *adr) {
  return adr[0] | (adr[1] << 8);
}
unsigned short read_16_be(const unsigned char *adr) {
  return (adr[0] << 8) | adr[1];
}

//===---------------------------------------------------------------------===//

-scalarrepl should promote this to be a vector scalar.

        %struct..0anon = type { <4 x float> }
implementation   ; Functions:
void %test1(<4 x float> %V, float* %P) {
entry:
        %u = alloca %struct..0anon, align 16            ; <%struct..0anon*> [#uses=2]
        %tmp = getelementptr %struct..0anon* %u, int 0, uint 0          ; <<4 x float>*> [#uses=1]
        store <4 x float> %V, <4 x float>* %tmp
        %tmp1 = cast %struct..0anon* %u to [4 x float]*         ; <[4 x float]*> [#uses=1]
        %tmp = getelementptr [4 x float]* %tmp1, int 0, int 1           ; <float*> [#uses=1]
        %tmp = load float* %tmp         ; <float> [#uses=1]
        %tmp3 = mul float %tmp, 2.000000e+00            ; <float> [#uses=1]
        store float %tmp3, float* %P
        ret void
}

//===---------------------------------------------------------------------===//

-instcombine should handle this transform:
   setcc (sdiv X, C1), C2
when X, C1, and C2 are unsigned.  Similarly for udiv and signed operands.

Currently InstCombine avoids this transform but will do it when the signs of
the operands and the sign of the divide match.  See the FIXME in
InstructionCombining.cpp in the visitSetCondInst method after the switch case
for Instruction::UDiv (around line 4447) for more details.

The SingleSource/Benchmarks/Shootout-C++/hash and hash2 tests have examples of
this construct.
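
A sketch of the kind of rewrite intended (constants invented for
illustration; correct for unsigned x):

int in_bucket(unsigned x) {
  return x / 10 == 5;          /* fold to a range check: x - 50 < 10 */
}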

//===---------------------------------------------------------------------===//

Instcombine misses several of these cases (see the testcase in the patch):
http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01519.html

//===---------------------------------------------------------------------===//

viterbi speeds up *significantly* if the various "history" related copy loops
are turned into memcpy calls at the source level.  We need a "loops to memcpy"
pass.
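
The pattern such a pass would recognize is a plain element-by-element copy
loop (a sketch, not the actual viterbi source; assumes n >= 0 and
non-overlapping arrays):

#include <string.h>

void copy_history(int *dst, const int *src, int n) {
  for (int i = 0; i < n; ++i)  /* recognize this loop ... */
    dst[i] = src[i];
}

/* ... and rewrite it as: */
void copy_history_memcpy(int *dst, const int *src, int n) {
  memcpy(dst, src, (size_t)n * sizeof *dst);
}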

//===---------------------------------------------------------------------===//