Chris Lattner | 34cb154 | 2009-04-01 21:11:04 +0000
//===----------------------------------------------------------------------===//
// Representing sign/zero extension of function results
//===----------------------------------------------------------------------===//

Mar 25, 2009 - Initial Revision

Most ABIs specify that functions which return small integers do so in a
specific integer GPR. This is an efficient way to go, but raises the question:
if the returned value is smaller than the register, what do the high bits hold?

There are three (interesting) possible answers: undefined, zero extended, or
sign extended. The number of bits in question depends on the data-type that
the front-end is referencing (typically i1/i8/i16/i32).
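
To make the zero extended and sign extended answers concrete, here is a small
C sketch of what a 32-bit register would hold in each case. The helper names
(zext_to_reg, sext_to_reg) and the fixed 32-bit register width are assumptions
made for illustration; they are not part of any ABI or of LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: every bit above the low `bits` bits is cleared. */
static uint32_t zext_to_reg(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    if (bits == 32)
        return value;
    return value & ((UINT32_C(1) << bits) - 1);
}

/* Sign extension: every bit above the low `bits` bits is a copy of
   bit (bits - 1), the sign bit of the small value. */
static uint32_t sext_to_reg(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    if (bits == 32)
        return value;
    uint32_t low = value & ((UINT32_C(1) << bits) - 1);
    uint32_t sign = UINT32_C(1) << (bits - 1);
    return (low ^ sign) - sign; /* unsigned wraparound is well defined */
}
```

In the "undefined" case neither function applies: the high bits are whatever
the callee happened to leave there, so the caller has to pick one of the two
itself before using the full register.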

Knowing the answer to this is important for two reasons: 1) we want to be able
to implement the ABI correctly. If we need to sign extend the result according
to the ABI, we really do need to do this to preserve correctness. 2) this
information is often useful for optimization purposes, and we want the
mid-level optimizers to be able to process this (e.g. eliminate redundant
extensions).

For example, let's pretend that X86 requires the callee to properly extend the
result of a return (I'm not sure this is the case, but the argument doesn't
depend on this). Given this, we should compile this:

  int a();
  short b() { return a(); }

into:

_b:
        subl    $12, %esp
        call    L_a$stub
        addl    $12, %esp
        cwtl
        ret

An optimization example is that we should be able to eliminate the explicit
sign extension in this example:

  short y();
  int z() {
    return ((int)y() << 16) >> 16;
  }

_z:
        subl    $12, %esp
        call    _y
        ;;  movswl %ax, %eax   -> not needed because eax is already sext'd
        addl    $12, %esp
        ret
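
The reason the extension is removable can be checked with a small C model. It
assumes, as the example does, that the 16-bit result arrives already sign
extended in a 32-bit register. The names y_in_register, z_with_extension and
z_without_extension are hypothetical stand-ins, and sext16 is a portable
spelling of the shift pair that avoids shifting negative values in C.

```c
#include <stdint.h>

/* Portable equivalent of ((int)v << 16) >> 16: keep the low 16 bits and
   replicate bit 15 into the upper half. */
static int32_t sext16(int32_t v) {
    uint32_t low = (uint32_t)v & 0xFFFFu;
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}

/* Hypothetical stand-in for the register contents after calling y(),
   assuming the result is delivered sign extended to 32 bits. */
static int32_t y_in_register(int16_t result) {
    return (int32_t)result;
}

/* z as written in the source: re-extends the value it received. */
static int32_t z_with_extension(int16_t result) {
    return sext16(y_in_register(result));
}

/* z after the optimization: uses the register contents directly. */
static int32_t z_without_extension(int16_t result) {
    return y_in_register(result);
}
```

Both versions of z agree for every 16-bit input, but only because of the
assumption baked into y_in_register. If the high bits were undefined instead,
dropping the extension would be a miscompile.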

//===----------------------------------------------------------------------===//
// What we have right now.
//===----------------------------------------------------------------------===//

Currently, this is modelled by compiling a function to return the small type
and attaching a signext/zeroext marker to it. For example, we compile z into:

  define i32 @z() nounwind {
  entry:
    %0 = tail call signext i16 (...)* @y() nounwind
    %1 = sext i16 %0 to i32
    ret i32 %1
  }

and b into:

  define signext i16 @b() nounwind {
  entry:
    %0 = tail call i32 (...)* @a() nounwind       ; <i32> [#uses=1]
    %retval12 = trunc i32 %0 to i16               ; <i16> [#uses=1]
    ret i16 %retval12
  }

This has some problems:

  1) the actual precise semantics are really poorly defined (see PR3779).
  2) some targets might want the caller to extend, and some might want the
     callee to extend.
  3) the mid-level optimizer doesn't know the size of the GPR, so it doesn't
     know that %0 is sign extended up to 32 bits here, and even if it did, it
     could not eliminate the sext.
  4) the code generator has historically assumed that the result is extended
     to i32, which is a problem on PIC16 (and is also probably wrong on Alpha
     and other 64-bit targets).

//===----------------------------------------------------------------------===//
// The proposal
//===----------------------------------------------------------------------===//

I suggest that we have the front-end fully lower out the ABI issues here to
LLVM IR. This makes it 100% explicit what is going on and means that there is
no cause for confusion. For example, the cases above should compile into:

  define i32 @z() nounwind {
  entry:
    %0 = tail call i32 (...)* @y() nounwind
    %1 = trunc i32 %0 to i16
    %2 = sext i16 %1 to i32
    ret i32 %2
  }
  define i32 @b() nounwind {
  entry:
    %0 = tail call i32 (...)* @a() nounwind
    %retval12 = trunc i32 %0 to i16
    %tmp = sext i16 %retval12 to i32
    ret i32 %tmp
  }

In this model, no functions will return an i1/i8/i16 (and on an x86-64 target
that extends results to i64, no i32). This solves the ambiguity issue, allows us
to fully describe all possible ABIs, and now allows the optimizers to reason
about and eliminate these extensions.

The one thing that is missing is the ability for the front-end and optimizer to
specify/infer the guarantees provided by the ABI to allow other optimizations.
For example, in the y/z case, since y is known to return a sign extended value,
the trunc/sext in z should be eliminable.

This can be done by introducing new sext/zext attributes which mean "I know
that the result of the function is sign extended at least N bits". Given this,
and given that it is attached to the y function, the mid-level optimizer could
easily eliminate the extensions etc. with existing functionality.
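
As a rough sketch of the reasoning such an attribute would enable, the check
below decides whether a trunc followed by a sext back to the register width
can be dropped. Everything in it is an assumption made for illustration: the
function name is hypothetical, and "known sign bits" is taken to mean the
number of top bits guaranteed to equal the sign bit, counting the sign bit
itself. The proposal above does not pin down the exact meaning of N.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when "trunc to narrow_bits, then sext back to reg_bits" is
   guaranteed to leave the value unchanged.

   The trunc discards the top (reg_bits - narrow_bits) bits and the sext
   refills them with copies of bit (narrow_bits - 1). Nothing changes if
   those discarded bits already equal that bit, i.e. if at least
   (reg_bits - narrow_bits + 1) top bits are known to be identical. */
static bool trunc_sext_is_redundant(unsigned known_sign_bits,
                                    unsigned reg_bits,
                                    unsigned narrow_bits) {
    assert(narrow_bits >= 1 && narrow_bits <= reg_bits);
    assert(known_sign_bits >= 1 && known_sign_bits <= reg_bits);
    return known_sign_bits >= reg_bits - narrow_bits + 1;
}
```

For the y/z case, an i32 result known to be sign extended from 16 bits has 17
known sign bits under this counting, so trunc_sext_is_redundant(17, 32, 16)
holds and the trunc/sext pair in z can go away. With only 16 known sign bits
the check fails, because bit 15 could differ from the bits above it.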

The major disadvantage of doing this sort of thing is that it makes the ABI
lowering stuff even more explicit in the front-end, and that we would like to
eventually move to having the code generator do more of this work. However,
the sad truth of the matter is that a) that move is unlikely to happen anytime
in the near future, and b) this proposal is no worse than what we have now
with the existing attributes.

C compilers fundamentally have to reason about the target in many ways.
This is ugly and horrible, but a fact of life.