Target Independent Opportunities:

===-------------------------------------------------------------------------===

FreeBench/mason contains code like this:

static p_type m0u(p_type p) {
  int m[]={0, 8, 1, 2, 16, 5, 13, 7, 14, 9, 3, 4, 11, 12, 15, 10, 17, 6};
  p_type pu;
  pu.a = m[p.a];
  pu.b = m[p.b];
  pu.c = m[p.c];
  return pu;
}

We currently compile this into a memcpy from a static array into 'm', then
a bunch of loads from m.  It would be better to avoid the memcpy and just do
loads from the static array.
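
As a reference point, the code we would like to reach is roughly what you get
by making the table constant (a sketch; p_type and the table contents come
from the mason source above, the function and table names are made up):

static const int m_tab[] = {0, 8, 1, 2, 16, 5, 13, 7, 14, 9,
                            3, 4, 11, 12, 15, 10, 17, 6};
static p_type m0u_expected(p_type p) {
  p_type pu;
  pu.a = m_tab[p.a];   /* loads straight from constant data, no memcpy */
  pu.b = m_tab[p.b];
  pu.c = m_tab[p.c];
  return pu;
}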

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math).  Misc/mandel will like this. :)
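
A sketch of the transformation at the source level (dist is a made-up name):
with -ffast-math the hypot call below could legally become the open-coded
form, giving up hypot's overflow/underflow and errno guarantees.

#include <math.h>

double dist(double x, double y) {
  return hypot(x, y);           /* desired: sqrt(x*x + y*y) */
}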

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
        movl Y, %eax
        shll $3, %eax
        orl X, %eax
        movl %eax, X
        ret

The problem is that the store's chain operand is not the load of X but rather
a TokenFactor of the loads of X and Y, which prevents folding the load/or/store
of X into a single read-modify-write instruction such as 'orl %eax, X'.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that y/x
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

Turn this into a signed shift right in instcombine:

int f(unsigned x) {
  return x >> 31 ? -1 : 0;
}

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25600
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01492.html
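
For reference, a sketch of the form we want instcombine to produce
(f_expected is a made-up name; this relies on >> of a negative int being an
arithmetic shift, which is implementation-defined in C but holds on the
targets we care about):

int f_expected(unsigned x) {
  return (int)x >> 31;          /* -1 if bit 31 is set, 0 otherwise */
}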

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
  x = 1ULL << i;
}

into:

  long long tmp = 1;
  for (i = ...; ++i, tmp += tmp)
    x = tmp;

This would be a win on ppc32, but not x86 or ppc64.

//===---------------------------------------------------------------------===//

Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
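
In C terms this is the "is this loaded value negative?" test (a sketch; is_neg
is a made-up name): the result only depends on the byte holding the sign bit,
so the 32-bit load can shrink to an 8-bit load of that byte (which byte it is
depends on the target's endianness).

int is_neg(int *P) {
  return *P < 0;
}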

//===---------------------------------------------------------------------===//

Reassociate should turn: X*X*X*X -> t=(X*X) (t*t) to eliminate a multiply.
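
In source form (a sketch; pow4 is a made-up name): x*x*x*x takes three
multiplies as written, while the reassociated form below takes two.

int pow4(int x) {
  int t = x * x;
  return t * t;
}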

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems:

int g(int *j,int *l) {  return memcmp(j,l,4);  }
int h(int *j, int *l) {  return *j - *l; }

This could be done in SelectionDAGISel.cpp, along with other special cases,
for 1, 2, 4, and 8 byte sizes.

//===---------------------------------------------------------------------===//

This code:
int rot(unsigned char b) { int a = ((b>>1) ^ (b<<7)) & 0xff; return a; }

Can be improved in two ways:

1. The instcombiner should eliminate the type conversions.
2. The X86 backend should turn this into a rotate by one bit.

//===---------------------------------------------------------------------===//

Add LSR exit value substitution. It'll probably be a win for Ackermann, etc.

//===---------------------------------------------------------------------===//

It would be nice to revert this patch:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060213/031986.html

And teach the dag combiner enough to simplify the code expanded before
legalize.  It seems plausible that this knowledge would let it simplify other
stuff too.

//===---------------------------------------------------------------------===//

The loop unroller should be enhanced to be able to unroll loops that aren't
single basic blocks.  It should be able to handle stuff like this:

  for (i = 0; i < c1; ++i)
     if (c2 & (1 << i))
       foo

where c1/c2 are constants.

//===---------------------------------------------------------------------===//

For packed types, TargetData.cpp::getTypeInfo() returns alignment that is
equal to the type size.  It works but can be overly conservative, as the
alignment of specific packed types is target dependent.

//===---------------------------------------------------------------------===//

We should add 'unaligned load/store' nodes, and produce them from code like
this:

typedef float v4sf __attribute__((vector_size(16)));   /* GCC vector extension */

v4sf example(float *P) {
  return (v4sf){P[0], P[1], P[2], P[3] };
}

//===---------------------------------------------------------------------===//

We should constant fold packed type casts at the LLVM level, regardless of the
cast.  Currently we cannot fold some casts because we don't have TargetData
information in the constant folder, so we don't know the endianness of the
target!

//===---------------------------------------------------------------------===//

Consider this:

unsigned short swap_16(unsigned short v) { return (v>>8) | (v<<8); }

Compiled with the ppc backend:

_swap_16:
        slwi r2, r3, 8
        srwi r3, r3, 8
        or r2, r3, r2
        rlwinm r3, r2, 0, 16, 31
        blr

The rlwinm (an and by 65535) is dead.  The dag combiner should propagate bits
better than that to see this.

//===---------------------------------------------------------------------===//

Add support for conditional increments, and other related patterns.  Instead
of:

        movl 136(%esp), %eax
        cmpl $0, %eax
        je LBB16_2      #cond_next
LBB16_1:        #cond_true
        incl _foo
LBB16_2:        #cond_next

emit:
        movl _foo, %eax
        cmpl $1, %edi
        sbbl $-1, %eax
        movl %eax, _foo
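
A sketch of the kind of source that produces the branchy code above (the
names are made up):

int foo;
void maybe_bump(int x) {
  if (x)
    ++foo;              /* branchless form: foo += (x != 0); */
}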

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

Expand these to calls of sin/cos and stores:
      void sincos(double x, double *sin, double *cos);
      void sincosf(float x, float *sin, float *cos);
      void sincosl(long double x, long double *sin, long double *cos);

Doing so could allow SROA of the destination pointers.  See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
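
A sketch of the pattern the sin/cos combine would catch (polar is a made-up
name):

#include <math.h>

void polar(double x, double *s, double *c) {
  *s = sin(x);          /* two libm calls today...              */
  *c = cos(x);          /* ...could become one sincos(x, s, c)  */
}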

//===---------------------------------------------------------------------===//

Scalar Repl cannot currently promote this testcase to 'ret long cst':

        %struct.X = type { int, int }
        %struct.Y = type { %struct.X }
ulong %bar() {
        %retval = alloca %struct.Y, align 8             ; <%struct.Y*> [#uses=3]
        %tmp12 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 0  ; <int*> [#uses=1]
        store int 0, int* %tmp12
        %tmp15 = getelementptr %struct.Y* %retval, int 0, uint 0, uint 1  ; <int*> [#uses=1]
        store int 1, int* %tmp15
        %tmp16 = cast %struct.Y* %retval to ulong*      ; <ulong*> [#uses=1]
        %tmp17 = load ulong* %tmp16                     ; <ulong> [#uses=1]
        ret ulong %tmp17
}

It should be extended to do so.

//===---------------------------------------------------------------------===//

Turn this into a single byte store with no load (the other 3 bytes are
unmodified):

void %test(uint* %P) {
        %tmp = load uint* %P
        %tmp14 = or uint %tmp, 3305111552
        %tmp15 = and uint %tmp14, 3321888767
        store uint %tmp15, uint* %P
        ret void
}
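
In C terms (a sketch; the function name is made up): 3305111552 is 0xC5000000
and 3321888767 is 0xC5FFFFFF, so the or/and pair just forces the top byte of
*P to 0xC5 and leaves the other three bytes alone.

void test_c(unsigned *P) {
  *P = (*P | 0xC5000000u) & 0xC5FFFFFFu;
  /* desired codegen: a single store of the byte 0xC5 into whichever byte of
     *P holds bits 24-31 (its address depends on the target's endianness). */
}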

//===---------------------------------------------------------------------===//

dag/inst combine "clz(x)>>5 -> x==0" for 32-bit x.

Compile:

int bar(int x)
{
  int t = __builtin_clz(x);
  return -(t>>5);
}

to:

_bar:   addic r3,r3,-1
        subfe r3,r3,r3
        blr
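
Why the combine is valid (a sketch; bar_expected is a made-up name): for a
non-zero 32-bit x, clz(x) is in the range 0..31, and PPC's cntlzw returns 32
for x == 0 (__builtin_clz(0) is undefined in C), so t>>5 is exactly (x == 0)
and bar is equivalent to:

int bar_expected(int x) {
  return -(x == 0);     /* -1 if x is zero, 0 otherwise */
}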