.. Reid Kleckner, a534a38, 2013-12-19 02:14:12 +0000

==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

.. Warning:: This feature is unstable and not fully implemented.

The :ref:`attr_inalloca` attribute is designed to allow taking the
address of an aggregate argument that is being passed by value through
memory. Primarily, this feature is required for compatibility with the
Microsoft C++ ABI. Under that ABI, class instances that are passed by
value are constructed directly into argument stack memory. Prior to the
addition of inalloca, calls in LLVM were indivisible instructions.
There was no way to perform intermediate work, such as object
construction, between the first stack adjustment and the final control
transfer. With inalloca, each argument is modelled as an alloca, which
can be stored to independently of the call. This flexibility comes with
a large set of restrictions, explained in this document, that bound the
lifetime of the argument memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

In the example below, ``f`` is attempting to pass a default-constructed
``Foo`` object to ``g`` by value.

.. code-block:: llvm

    %Foo = type { i32, i32 }
    declare void @Foo_ctor(%Foo* %this)
    declare void @g(%Foo* inalloca %arg)

    define void @f() {
      ...

    bb1:
      %base = call i8* @llvm.stacksave()
      %arg = alloca %Foo
      invoke void @Foo_ctor(%Foo* %arg)
          to label %invoke.cont unwind label %invoke.unwind

    invoke.cont:
      call void @g(%Foo* inalloca %arg)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      call void @llvm.stackrestore(i8* %base)
      ...
    }

The alloca in this example is dynamic, meaning it is not in the entry
block, and it can be executed more than once. Due to the restrictions
against allocas between an alloca used with inalloca and its associated
call site, all allocas used with inalloca are considered dynamic.

To avoid any stack leakage, the frontend saves the current stack pointer
with a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it
allocates the argument stack space with alloca and calls the default
constructor. One important consideration is that the default
constructor could throw an exception, so the frontend has to create a
landing pad. At this point, if there were any other inalloca arguments,
the frontend would have to destruct them before restoring the stack
pointer. If the constructor does not unwind, ``g`` is called, and then
the stack is restored.
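
As a minimal sketch of why the save and restore matter, the variant below
performs the same call inside a loop. It is an illustration rather than
recommended frontend output: ``@f_loop`` and its ``%n`` parameter are
hypothetical, the loop assumes ``%n`` is at least 1, exception handling is
omitted for brevity, and the syntax follows the example above. Each
iteration restores the stack pointer it saved, so argument memory
allocated by one iteration does not accumulate across iterations.

.. code-block:: llvm

    %Foo = type { i32, i32 }
    declare i8* @llvm.stacksave()
    declare void @llvm.stackrestore(i8*)
    declare void @Foo_ctor(%Foo* %this)
    declare void @g(%Foo* inalloca %arg)

    ; Hypothetical caller: passes a fresh Foo to @g %n times (assumes %n >= 1).
    define void @f_loop(i32 %n) {
    entry:
      br label %loop

    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      ; Save the stack pointer before allocating argument memory.
      %base = call i8* @llvm.stacksave()
      %arg = alloca %Foo
      call void @Foo_ctor(%Foo* %arg)
      call void @g(%Foo* inalloca %arg)
      ; Release this iteration's argument memory.
      call void @llvm.stackrestore(i8* %base)
      %i.next = add i32 %i, 1
      %done = icmp eq i32 %i.next, %n
      br i1 %done, label %exit, label %loop

    exit:
      ret void
    }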

Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory that is at the end of the call
frame to pass arguments. We cannot vend pointers to that memory at
function entry because after code generation they will alias. In the
current design, the rule against allocas between the inalloca alloca
values and the call site avoids this problem, but it creates a cleanup
problem. Cleanup and lifetime are handled explicitly with stack save and
restore calls. In the future, we may be able to avoid this by using
:ref:`llvm.lifetime.start <int_lifestart>` and :ref:`llvm.lifetime.end
<int_lifeend>` instead.

Nested Calls and Copy Elision
-----------------------------

The next consideration is the ability of the frontend to perform copy
elision in the face of nested calls. Consider the evaluation of
``foo(foo(Bar()))``, where ``foo`` takes and returns a ``Bar`` object by
value and ``Bar`` has non-trivial constructors. In this case, we want
to be able to elide copies into ``foo``'s argument slots. That means we
need to have more than one set of argument frames active at the same
time. First, we need to allocate the frame for the outer call so we can
pass it in as the hidden struct return pointer to the middle call. Then
we do the same for the middle call, allocating a frame and passing its
address to ``Bar``'s default constructor. By wrapping the evaluation of
the inner ``foo`` with stack save and restore, we can have multiple
overlapping active call frames.
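
One way to picture the overlapping frames is the sketch below. It is an
illustration under stated assumptions, not output from any frontend: the
names ``@caller`` and ``@Bar_ctor`` and the ``sret``-plus-``inalloca``
signature chosen for ``@foo`` are hypothetical, destructors and exception
handling are omitted, and the syntax mirrors the example in Intended
Usage. The inner call writes its result directly into the argument slot
of the outer call, so no copy of ``Bar`` is made.

.. code-block:: llvm

    %Bar = type { i32 }
    declare i8* @llvm.stacksave()
    declare void @llvm.stackrestore(i8*)
    declare void @Bar_ctor(%Bar* %this)
    ; Hypothetical lowering of "Bar foo(Bar)": hidden return slot plus argument.
    declare void @foo(%Bar* sret %result, %Bar* inalloca %arg)

    ; Hypothetical caller evaluating foo(foo(Bar())) into %result.
    define void @caller(%Bar* %result) {
    entry:
      ; Argument frame for the outer call.
      %outer.save = call i8* @llvm.stacksave()
      %outer.arg = alloca %Bar

      ; Argument frame for the inner call, wrapped in its own save/restore.
      %inner.save = call i8* @llvm.stacksave()
      %inner.arg = alloca %Bar
      call void @Bar_ctor(%Bar* %inner.arg)
      ; The inner call returns into the outer call's argument slot.
      call void @foo(%Bar* sret %outer.arg, %Bar* inalloca %inner.arg)
      call void @llvm.stackrestore(i8* %inner.save)

      ; The outer frame is again the most recent allocation.
      call void @foo(%Bar* sret %result, %Bar* inalloca %outer.arg)
      call void @llvm.stackrestore(i8* %outer.save)
      ret void
    }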

Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments. In some sense, this means that
the allocas are automatically cleared by the call. However, LLVM models
this as a write of undef to all of the inalloca values passed to the
call rather than as a stack adjustment. Frontends should still restore
the stack pointer to avoid a stack leak.
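
The sketch below illustrates this rule with the ``x86_thiscallcc``
calling convention. The names ``@Foo_method`` and ``@call_method`` are
hypothetical, only ``@Foo_method`` is given a callee-cleanup convention
to keep the sketch short, and exception handling is omitted. Even though
the callee pops its own arguments, the caller still pairs the stack save
with a stack restore.

.. code-block:: llvm

    %Foo = type { i32, i32 }
    declare i8* @llvm.stacksave()
    declare void @llvm.stackrestore(i8*)
    declare void @Foo_ctor(%Foo* %this)
    ; Hypothetical method using a callee-cleanup convention.
    declare x86_thiscallcc void @Foo_method(%Foo* %this, %Foo* inalloca %arg)

    define void @call_method(%Foo* %obj) {
    entry:
      %base = call i8* @llvm.stacksave()
      %arg = alloca %Foo
      call void @Foo_ctor(%Foo* %arg)
      call x86_thiscallcc void @Foo_method(%Foo* %obj, %Foo* inalloca %arg)
      ; The contents of %arg are now modelled as undef and must not be read.
      ; The explicit restore is still required to avoid a stack leak.
      call void @llvm.stackrestore(i8* %base)
      ret void
    }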

Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer. While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.