==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

.. Warning:: This feature is unstable and not fully implemented.

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory. Primarily, this feature is required for
compatibility with the Microsoft C++ ABI. Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory. Prior to the addition of inalloca, calls in LLVM
were indivisible instructions. There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer. With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call. Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

  // Foo is non-trivial.
  struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
  void g(Foo a, Foo b);
  void f() {
    g(Foo(), Foo());
  }

.. code-block:: llvm

  %struct.Foo = type { i32, i32 }
  declare void @Foo_ctor(%struct.Foo* %this)
  declare void @Foo_dtor(%struct.Foo* %this)
  declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

  define void @f() {
  entry:
    %base = call i8* @llvm.stacksave()
    %memargs = alloca inalloca <{ %struct.Foo, %struct.Foo }>
    %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
    call void @Foo_ctor(%struct.Foo* %b)

    ; If a's ctor throws, we must destruct b.
    %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
    invoke void @Foo_ctor(%struct.Foo* %a)
        to label %invoke.cont unwind label %invoke.unwind

  invoke.cont:
    call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
    call void @llvm.stackrestore(i8* %base)
    ...

  invoke.unwind:
    call void @Foo_dtor(%struct.Foo* %b)
    call void @llvm.stackrestore(i8* %base)
    ...
  }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
argument stack space with alloca and calls the default constructor for
``b``. The default constructor for ``a`` could throw an exception, so
the frontend has to create a landing pad. The landing pad has to destroy
the already constructed argument ``b`` before restoring the stack
pointer. If the constructor does not unwind, ``g`` is called. In the
Microsoft C++ ABI, ``g`` will destroy its arguments, and then the stack
is restored in ``f``.
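
The control flow described above can be emulated in portable C++ with
placement new. This is only a sketch: the globals ``trace``,
``ctor_calls`` and ``throw_on_ctor_call``, the helper ``g_emulated``,
and the driver ``run_f`` are hypothetical names introduced for
illustration, and a local byte buffer stands in for the argument memory
that a real frontend would allocate with an inalloca alloca.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Everything here is a hypothetical emulation, not code from LLVM or Clang.
static std::string trace;           // records the order of events
static int ctor_calls = 0;          // number of Foo default constructors started
static int throw_on_ctor_call = 0;  // 0 means "never throw"

struct Foo {
  int a, b;
  Foo() : a(0), b(0) {
    if (++ctor_calls == throw_on_ctor_call)
      throw std::runtime_error("Foo constructor failed");
    trace += "ctor;";
  }
  Foo(const Foo &other) : a(other.a), b(other.b) { trace += "copy;"; }
  ~Foo() { trace += "dtor;"; }
};

// Stand-in for g: the callee destroys its by-value arguments itself,
// as described for the Microsoft C++ ABI.
void g_emulated(Foo *a, Foo *b) {
  trace += "g;";
  a->~Foo();
  b->~Foo();
}

// Mirrors the control flow of @f in the IR example. Returns the event trace.
std::string run_f(bool a_ctor_throws) {
  trace.clear();
  ctor_calls = 0;
  throw_on_ctor_call = a_ctor_throws ? 2 : 0;  // a is the second object built

  // Plays the role of %memargs: one block holding both arguments.
  alignas(Foo) unsigned char memargs[2 * sizeof(Foo)];
  Foo *b = new (memargs + sizeof(Foo)) Foo();  // slot 1, constructed first
  Foo *a = nullptr;
  try {
    a = new (memargs) Foo();  // slot 0, may throw
  } catch (const std::runtime_error &) {
    trace += "unwind;";
    b->~Foo();  // like invoke.unwind: destroy the argument already built
    return trace;
  }
  g_emulated(a, b);  // like invoke.cont: the callee cleans up a and b
  return trace;
}
```

Calling ``run_f(false)`` yields the trace ``ctor;ctor;g;dtor;dtor;``,
while ``run_f(true)`` yields ``ctor;unwind;dtor;``, showing that only
``b`` is destroyed on the unwind path.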

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments. We cannot vend pointers to that memory at function entry
because after code generation they will alias.

The rule against allocas between argument allocations and the call site
avoids this problem, but it creates a cleanup problem. Cleanup and
lifetime are handled explicitly with stack save and restore calls. In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack adjusting cleanup is less
powerful than a full stack save and restore.
103powerful than a full stack save and restore.
Reid Klecknera534a382013-12-19 02:14:12 +0000104
105Nested Calls and Copy Elision
106-----------------------------
107
Reid Kleckner60d3a832014-01-16 22:59:24 +0000108We also want to be able to support copy elision into these argument
109slots. This means we have to support multiple live argument
110allocations.
111
112Consider the evaluation of:
113
114.. code-block:: c++
115
116 // Foo is non-trivial.
117 struct Foo { int a; Foo(); Foo(const &Foo); ~Foo(); };
118 Foo bar(Foo b);
119 int main() {
120 bar(bar(Foo()));
121 }
122
123In this case, we want to be able to elide copies into ``bar``'s argument
124slots. That means we need to have more than one set of argument frames
125active at the same time. First, we need to allocate the frame for the
126outer call so we can pass it in as the hidden struct return pointer to
127the middle call. Then we do the same for the middle call, allocating a
128frame and passing its address to ``Foo``'s default constructor. By
129wrapping the evaluation of the inner ``bar`` with stack save and
130restore, we can have multiple overlapping active call frames.
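
The effect of eliding copies into the argument slots can be observed
from C++ by counting special member calls. The sketch below makes
several assumptions that are not in the original example: the
``Counts`` bookkeeping and ``evaluate_nested_call`` are hypothetical,
``bar`` is given a body that simply returns its parameter, and the code
is assumed to be compiled as C++17 so that elision of the temporaries
is guaranteed.

```cpp
// Hypothetical bookkeeping for the special member functions of Foo.
struct Counts {
  int default_ctors = 0;
  int copy_ctors = 0;
  int dtors = 0;
};
static Counts counts;

// Foo is non-trivial.
struct Foo {
  int a;
  Foo() : a(0) { ++counts.default_ctors; }
  Foo(const Foo &other) : a(other.a) { ++counts.copy_ctors; }
  ~Foo() { ++counts.dtors; }
};

// Assumed body: returning a by-value parameter always runs the copy
// constructor, because parameters are not eligible for return elision.
Foo bar(Foo b) { return b; }

// Evaluates the nested call and reports what was executed.
Counts evaluate_nested_call() {
  counts = Counts();
  bar(bar(Foo()));
  return counts;
}
```

Under these assumptions the result is one default construction, two
copy constructions (one for each ``return b``), and three destructions.
No copy is made to initialize either parameter: ``Foo()`` is built
directly in the inner call's argument, and the inner call's return
value is built directly in the outer call's argument.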

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments. In some sense, this means that
the allocas are automatically cleared by the call. However, LLVM models
this as a write of undef to all of the inalloca values passed to the
call rather than as a stack adjustment. Frontends should still restore
the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer. While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.
162generation recommendations may change.