Dan Gohman | f17a25c | 2007-07-18 16:29:46 +0000 | [diff] [blame] | 1 | //===---------------------------------------------------------------------===// |
// Random ideas for the X86 backend: FP stack related stuff
//===---------------------------------------------------------------------===//

//===---------------------------------------------------------------------===//

Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html

//===---------------------------------------------------------------------===//

This should use fiadd on chips where it is profitable:
double foo(double P, int *I) { return P+*I; }

We have fiadd patterns now, but the following two patterns have the same cost
and complexity. We need a way to specify that the latter is more profitable.

def FpADD32m  : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (extloadf64f32 addr:$src2)))]>;
                // ST(0) = ST(0) + [mem32]

def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (X86fild addr:$src2, i32)))]>;
                // ST(0) = ST(0) + [mem32int]

//===---------------------------------------------------------------------===//

The FP stackifier needs to be global. Also, it should handle simple
permutations to reduce the number of shuffle instructions, e.g. turning:

fld P   ->      fld Q
fld Q           fld P
fxch

or:

fxch    ->      fucomi
fucomi          jl X
jg X

Ideas:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
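
To see why the first permutation is safe, here is a minimal sketch in C of a
toy register-stack model. It is an illustration only, not the stackifier: the
type X87Stack and the helpers fld, fxch, same_stack, with_fxch and
without_fxch are hypothetical names made up for this example. It shows that
"fld P; fld Q; fxch" and "fld Q; fld P" leave the same values in ST(0) and
ST(1).

```c
#include <assert.h>
#include <string.h>

/* Toy model of the x87 register stack (hypothetical, for illustration).
   st[0] plays the role of ST(0), the top of the stack. */
typedef struct {
    double st[8];
    int depth;
} X87Stack;

/* fld: push a value; the old ST(0) becomes ST(1), and so on. */
static void fld(X87Stack *s, double v) {
    assert(s->depth < 8);
    memmove(&s->st[1], &s->st[0], (size_t)s->depth * sizeof(double));
    s->st[0] = v;
    s->depth++;
}

/* fxch: swap ST(0) and ST(1). */
static void fxch(X87Stack *s) {
    assert(s->depth >= 2);
    double t = s->st[0];
    s->st[0] = s->st[1];
    s->st[1] = t;
}

/* Returns 1 when both stacks hold the same values in the same slots. */
static int same_stack(const X87Stack *a, const X87Stack *b) {
    if (a->depth != b->depth)
        return 0;
    for (int i = 0; i < a->depth; i++)
        if (a->st[i] != b->st[i])
            return 0;
    return 1;
}

/* Original sequence: fld P; fld Q; fxch. */
static X87Stack with_fxch(double P, double Q) {
    X87Stack s = { {0}, 0 };
    fld(&s, P);
    fld(&s, Q);
    fxch(&s);
    return s;
}

/* Permuted sequence: fld Q; fld P. */
static X87Stack without_fxch(double P, double Q) {
    X87Stack s = { {0}, 0 };
    fld(&s, Q);
    fld(&s, P);
    return s;
}
```

Comparing with_fxch(P, Q) and without_fxch(P, Q) using same_stack should
report equal stacks for any P and Q, which is the property a permuting
stackifier would rely on to drop the fxch.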


//===---------------------------------------------------------------------===//

Add a target-specific hook to the DAG combiner to handle SINT_TO_FP and
FP_TO_SINT when the source operand is already in memory.
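
As a possible pair of test cases for this note, the C functions below convert
a value that starts out in memory. The function names are hypothetical and
chosen only for this sketch; whether a given compiler folds the load into the
conversion is exactly what the note is asking about.

```c
/* SINT_TO_FP where the integer operand is loaded from memory. */
static double int_from_memory(const int *p) {
    return (double)*p;
}

/* FP_TO_SINT where the floating-point operand is loaded from memory. */
static int double_from_memory(const double *p) {
    return (int)*p;   /* C conversion truncates toward zero */
}
```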

//===---------------------------------------------------------------------===//

Open code rint, floor, ceil, trunc:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html

Open code the sincos[f] libcall.
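
For a rough idea of what open coding floor, ceil and trunc means, the sketch
below computes them inline in C through an integer round trip instead of a
libm call. This is an assumption-laden illustration, not necessarily the
approach taken in the linked patches: the names open_trunc, open_floor,
open_ceil and OPEN_CODE_LIMIT are hypothetical, inputs are restricted to
magnitudes below 2^53, and signed zeros, NaNs and infinities are not handled.

```c
#include <assert.h>

/* 2^53: at or above this magnitude every double is already an integer.
   Hypothetical limit used to keep the long long cast well defined. */
#define OPEN_CODE_LIMIT 9007199254740992.0

/* trunc: the C cast to an integer type rounds toward zero. */
static double open_trunc(double x) {
    assert(x > -OPEN_CODE_LIMIT && x < OPEN_CODE_LIMIT);
    return (double)(long long)x;
}

/* floor: truncation rounds negative non-integers up, so step back down. */
static double open_floor(double x) {
    double t = open_trunc(x);
    return (t > x) ? t - 1.0 : t;
}

/* ceil: truncation rounds positive non-integers down, so step back up. */
static double open_ceil(double x) {
    double t = open_trunc(x);
    return (t < x) ? t + 1.0 : t;
}
```

A real lowering would also have to preserve the sign of zero and pass
out-of-range values through unchanged, which this sketch leaves out.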

//===---------------------------------------------------------------------===//

None of the FPStack instructions are handled in
X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
folding spill code into the instructions.

//===---------------------------------------------------------------------===//

Currently the x86 codegen isn't very good at mixing SSE and FPStack
code:

unsigned int foo(double x) { return x; }

foo:
        subl $20, %esp
        movsd 24(%esp), %xmm0
        movsd %xmm0, 8(%esp)
        fldl 8(%esp)
        fisttpll (%esp)
        movl (%esp), %eax
        addl $20, %esp
        ret

Chris Lattner | 1500dc3 | 2008-02-14 05:41:38 +0000 | [diff] [blame] | 83 | This just requires being smarter when custom expanding fptoui. |
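
One conventional way to expand a double-to-unsigned conversion uses only
signed 32-bit conversions, which SSE2 can perform directly (cvttsd2si) without
a round trip through the FP stack. The C sketch below illustrates that idea
under stated assumptions: int is 32 bits wide, the input lies in [0, 2^32),
and the function name fptoui_via_signed is hypothetical. It is not a
description of LLVM's actual lowering.

```c
#include <assert.h>

/* Hypothetical helper: double -> unsigned 32-bit using only signed
   double -> int conversions. Assumes a 32-bit int. */
static unsigned int fptoui_via_signed(double x) {
    assert(x >= 0.0 && x < 4294967296.0);   /* representable range only */

    if (x < 2147483648.0)                   /* below 2^31: fits in an int */
        return (unsigned int)(int)x;

    /* Shift into the signed range, convert, then add 2^31 back
       as an unsigned integer. */
    return (unsigned int)(int)(x - 2147483648.0) + 0x80000000u;
}
```

For example, fptoui_via_signed(3000000000.0) takes the second path:
3000000000 - 2^31 = 852516352 converts as a signed value, and adding 2^31 back
gives 3000000000.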

//===---------------------------------------------------------------------===//