.. Blame: Reid Kleckner, a534a38, 2013-12-19 02:14:12 +0000

==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

.. Warning:: This feature is unstable and not fully implemented.

The :ref:`attr_inalloca` attribute is designed to allow taking the
address of an aggregate argument that is being passed by value through
memory. Primarily, this feature is required for compatibility with the
Microsoft C++ ABI. Under that ABI, class instances that are passed by
value are constructed directly into argument stack memory. Prior to the
addition of inalloca, calls in LLVM were indivisible instructions.
There was no way to perform intermediate work, such as object
construction, between the first stack adjustment and the final control
transfer. With inalloca, each argument is modelled as an alloca, which
can be stored to independently of the call. Unfortunately, this
complicated feature comes with a large set of restrictions designed to
bound the lifetime of the argument memory around the call, which are
explained in this document.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

In the example below, ``f`` is attempting to pass a default-constructed
``Foo`` object to ``g`` by value.

.. code-block:: llvm

    %Foo = type { i32, i32 }
    declare void @Foo_ctor(%Foo* %this)
    declare void @g(%Foo* inalloca %arg)

    define void @f() {
      ...

    bb1:
      %base = call i8* @llvm.stacksave()
      %arg = alloca %Foo
      invoke void @Foo_ctor(%Foo* %arg)
          to label %invoke.cont unwind %invoke.unwind

    invoke.cont:
      call void @g(%Foo* inalloca %arg)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      call void @llvm.stackrestore(i8* %base)
      ...
    }

The alloca in this example is dynamic, meaning it is not in the entry
block, and it can be executed more than once. Due to the restrictions
against allocas between an alloca used with inalloca and its associated
call site, all allocas used with inalloca are considered dynamic.

To avoid any stack leakage, the frontend saves the current stack pointer
with a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it
allocates the argument stack space with alloca and calls the default
constructor. One important consideration is that the default
constructor could throw an exception, so the frontend has to create a
landing pad. At this point, if there were any other inalloca arguments,
the frontend would have to destruct them before restoring the stack
pointer. If the constructor does not unwind, ``g`` is called, and then
the stack is restored.
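
At the source level, the situation described above corresponds roughly
to the following C++ sketch. The ``Foo`` definition, its user-provided
copy constructor, and the ``constructed_at`` and ``copies`` members are
assumptions made for illustration only. The sketch checks that the
object built by the constructor is the same object that ``g`` receives,
which is the property that in-place argument construction has to
preserve.

.. code-block:: c++

    // Hypothetical Foo: the user-provided copy constructor makes the class
    // non-trivially copyable, so the argument cannot be passed in registers.
    struct Foo {
      int a, b;
      static const Foo *constructed_at;  // address seen by the last constructor
      static int copies;                 // number of copy constructions
      Foo() : a(0), b(0) { constructed_at = this; }
      Foo(const Foo &other) : a(other.a), b(other.b) {
        constructed_at = this;
        ++copies;
      }
    };
    const Foo *Foo::constructed_at = nullptr;
    int Foo::copies = 0;

    // g takes its argument by value and reports whether the parameter is
    // the very object that the constructor initialized.
    bool g(Foo arg) { return &arg == Foo::constructed_at; }

    // f passes a default-constructed Foo to g by value.
    bool f() { return g(Foo()); }

With a compiler that elides the temporary (guaranteed since C++17),
``f()`` is expected to return ``true`` and ``Foo::copies`` to remain
zero.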

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory that is at the end of the call
frame to pass arguments. We cannot vend pointers to that memory at
function entry because after code generation they will alias. In the
current design, the rule against allocas between the inalloca alloca
values and the call site avoids this problem, but it creates a cleanup
problem. Cleanup and lifetime are handled explicitly with stack save
and restore calls. In the future, we may be able to avoid this by using
:ref:`llvm.lifetime.start <int_lifestart>` and :ref:`llvm.lifetime.end
<int_lifeend>` instead.

Nested Calls and Copy Elision
-----------------------------

The next consideration is the ability for the frontend to perform copy
elision in the face of nested calls. Consider the evaluation of
``foo(foo(Bar()))``, where ``foo`` takes and returns a ``Bar`` object by
value and ``Bar`` has non-trivial constructors. In this case, we want
to be able to elide copies into ``foo``'s argument slots. That means we
need to have more than one set of argument frames active at the same
time. First, we need to allocate the frame for the outer call so we can
pass it in as the hidden struct return pointer to the middle call. Then
we do the same for the middle call, allocating a frame and passing its
address to ``Bar``'s default constructor. By wrapping the evaluation of
the inner ``foo`` with stack save and restore, we can have multiple
overlapping active call frames.
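
The source-level behaviour that this elision has to preserve can be
observed with a small C++ sketch. The ``Bar`` definition, its counters,
and the ``evaluate_nested`` helper are hypothetical names chosen for
illustration. The sketch counts constructor calls while evaluating
``foo(foo(Bar()))``.

.. code-block:: c++

    // Hypothetical Bar with non-trivial constructors that count their calls.
    struct Bar {
      static int default_ctors;
      static int copies;
      int value;
      Bar() : value(1) { ++default_ctors; }
      Bar(const Bar &other) : value(other.value) { ++copies; }
      ~Bar() {}
    };
    int Bar::default_ctors = 0;
    int Bar::copies = 0;

    // foo takes and returns a Bar by value. Returning a parameter cannot be
    // elided, so each call copies its parameter into the return slot once.
    Bar foo(Bar b) {
      b.value += 1;
      return b;
    }

    // Evaluates the nested expression from the text and returns the final value.
    int evaluate_nested() {
      Bar result = foo(foo(Bar()));
      return result.value;
    }

When the copies into the argument slots are elided (guaranteed since
C++17), the only copies left are the two made when each ``foo`` returns
its parameter: the temporary ``Bar()`` is built directly in the inner
argument slot, and the inner return value is built directly in the
outer argument slot.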

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments. In some sense, this means that
the allocas are automatically cleared by the call. However, LLVM
models this as a write of undef to all of the inalloca values passed to
the call rather than as a stack adjustment. Frontends should still
restore the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.
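
The C++ sketch below shows the source-level case that forces this
requirement: one argument's constructor throws, so the call never
happens, yet any argument that was already constructed still has to be
destroyed. The ``Arg`` type, its counters, ``take_two``, and
``call_with_failing_argument`` are hypothetical names used only for
this illustration.

.. code-block:: c++

    #include <stdexcept>

    // Hypothetical argument type that counts constructions and destructions.
    struct Arg {
      static int constructed;
      static int destroyed;
      explicit Arg(bool fail) {
        if (fail) throw std::runtime_error("constructor failed");
        ++constructed;
      }
      Arg(const Arg &) { ++constructed; }
      ~Arg() { ++destroyed; }
    };
    int Arg::constructed = 0;
    int Arg::destroyed = 0;

    static bool callee_ran = false;

    // The callee only records that it was reached.
    void take_two(Arg, Arg) { callee_ran = true; }

    // Returns true if the exception escaped from argument evaluation.
    bool call_with_failing_argument() {
      try {
        take_two(Arg(false), Arg(true));  // evaluation order is unspecified
      } catch (const std::runtime_error &) {
        return true;
      }
      return false;
    }

Whichever argument is evaluated first, the callee is never entered and
every successfully constructed ``Arg`` is destroyed during unwinding,
so the cleanup cannot be attached to the call itself.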

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer. While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.