===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. To LLVM, patch points
look much like calls. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project (see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_), but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. ``GCFunctionMetadata`` provides the location of
pointers into a collected heap captured by the ``GCRoot`` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the ``GCFunctionMetadata`` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

  declare void
    @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the
``live values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.
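
Because LLVM does not check IDs for uniqueness, a runtime that expects
one record per ID may want to verify that itself. The following is a
minimal sketch in Python, assuming the records have already been
decoded into ``(id, offset)`` pairs. The function name and the record
shape are illustrative and are not part of LLVM.

.. code-block:: python

  def index_records_by_id(records):
      """Map each stack map ID to its instruction offset.

      `records` is an iterable of (id, offset) pairs. Raises ValueError
      when two records share an ID, since LLVM itself never checks.
      """
      index = {}
      for record_id, offset in records:
          if record_id in index:
              raise ValueError("duplicate stack map ID %d" % record_id)
          index[record_id] = offset
      return index

A runtime that deliberately reuses IDs would instead collect a list of
offsets per ID; which policy fits depends on how the IR generator
assigns IDs.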

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
                                                         i64* %ptr)
  %val = load i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

may require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl $0xffff, %eax <--- patched code at stack map address
  0x0a callq *%rax        <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

  declare void
    @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                       i8* <target>, i32 <numArgs>, ...)
  declare i64
    @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                      i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

On PowerPC, note that ``<target>`` must be the actual intended target of
the indirect call. Specifically, even when compiling for the ELF V1 ABI,
``<target>`` is not the function-descriptor address normally used as the C/C++
function-pointer representation. As a result, the call target must be local
because no adjustment or restoration of the TOC pointer (in register r2) will
be performed.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like those of
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.
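
The requirement to patch all reserved bytes can be sketched as a small
helper. This is an illustration in Python, not an LLVM API:
``build_patch`` and its parameters are hypothetical names, and the
default filler assumes an x86 target, where ``0x90`` encodes a one-byte
nop.

.. code-block:: python

  X86_NOP = b"\x90"  # one-byte nop on x86; other targets need their own filler


  def build_patch(code, num_bytes, nop=X86_NOP):
      """Return exactly `num_bytes` bytes: `code` followed by nop padding."""
      if len(code) > num_bytes:
          raise ValueError("patch needs %d bytes, only %d reserved"
                           % (len(code), num_bytes))
      padding, remainder = divmod(num_bytes - len(code), len(nop))
      if remainder:
          raise ValueError("leftover space is not a whole number of nops")
      return code + nop * padding

The runtime would then write the returned bytes over the whole reserved
region, so that nothing LLVM originally emitted there remains.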

This example shows a patch point reserving 15 bytes, with one argument
in ``%rdi`` and a return value in ``%rax`` per the native calling
convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15-bytes
  0x0f addq    $0x3, %rax
  0x10 movq    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: llvm

  %val = call anyregcc i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1,
                                             i64* %ptr)

The stack map now indicates the locations of the return value and the
``%ptr`` argument:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in ``%r8`` and return a value allocated in ``%r9``:

.. code-block:: none

  0x00 movslq 4(%r8), %r9 <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                <--- end of reserved 15-bytes
  0x0f addq $0x3, %r9
  0x10 movq %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 1)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (location flags)
      uint16 : Dwarf RegNum
      int32  : Offset or SmallConstant
    }
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 byte)
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
-------- ------------- ------------------- ---------------------------
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================
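
As an illustration of the layout listed above, the following Python
sketch decodes a version 1 section held in a ``bytes`` object. It is
not code from LLVM or from any runtime: the function name and the shape
of the returned dictionaries are assumptions made for this example, and
the section is assumed to be little-endian (as on x86-64) unless a
different ``byteorder`` prefix is passed.

.. code-block:: python

  import struct

  # Location type encodings from the table above.
  LOCATION_TYPES = {1: "Register", 2: "Direct", 3: "Indirect",
                    4: "Constant", 5: "ConstantIndex"}


  def parse_stackmap_section(data, byteorder="<"):
      """Decode a version 1 stack map section into dicts and lists."""
      pos = 0

      def read(fmt):
          # Unpack one fixed-size group of fields and advance the cursor.
          nonlocal pos
          fmt = byteorder + fmt
          fields = struct.unpack_from(fmt, data, pos)
          pos += struct.calcsize(fmt)
          return fields

      version, _reserved8, _reserved16 = read("BBH")
      if version != 1:
          raise ValueError("unsupported stack map version %d" % version)
      num_functions, num_constants, num_records = read("III")

      # Each function entry is an (address, stack size) pair.
      functions = [read("QQ") for _ in range(num_functions)]
      constants = [read("Q")[0] for _ in range(num_constants)]

      records = []
      for _ in range(num_records):
          record_id, offset, _flags, num_locations = read("QIHH")
          locations = []
          for _ in range(num_locations):
              kind, _loc_flags, regnum, value = read("BBHi")
              locations.append((LOCATION_TYPES[kind], regnum, value))
          _padding, num_liveouts = read("HH")
          liveouts = []
          for _ in range(num_liveouts):
              regnum, _reserved, size = read("HBB")
              liveouts.append((regnum, size))
          pos = (pos + 7) & ~7  # skip the optional padding to 8-byte alignment
          records.append({"id": record_id, "offset": offset,
                          "locations": locations, "liveouts": liveouts})

      return {"functions": functions, "constants": constants,
              "records": records}

Every fixed-size part before the records is a multiple of 8 bytes, so
rounding the cursor up to the next multiple of 8 after each record is
enough to skip the trailing padding when it is present.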

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.
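
The five location types can be turned into values as sketched below.
This is a hedged Python illustration rather than a runtime's actual
code: ``resolve_location``, the ``registers`` mapping (DWARF register
number to current value) and the ``read_word`` callback (load from a
memory address) are hypothetical stand-ins for whatever the runtime
uses to inspect a paused frame.

.. code-block:: python

  REGISTER, DIRECT, INDIRECT, CONSTANT, CONSTANT_INDEX = 0x1, 0x2, 0x3, 0x4, 0x5


  def resolve_location(kind, regnum, offset, registers, read_word, constants):
      """Return the value described by one stack map location.

      registers -- mapping from DWARF register number to its current value
      read_word -- callable that loads a value from a memory address
      constants -- the Constants[] pool of the stack map section
      """
      if kind == REGISTER:
          return registers[regnum]
      if kind == DIRECT:
          # The computed address is itself the requested value.
          return registers[regnum] + offset
      if kind == INDIRECT:
          # Spilled value: load it from the computed stack address.
          return read_word(registers[regnum] + offset)
      if kind == CONSTANT:
          return offset
      if kind == CONSTANT_INDEX:
          return constants[offset]
      raise ValueError("unknown location type %r" % (kind,))

Only ``Register`` and ``Indirect`` need live machine state; the other
three can be evaluated from the stack map and the frame base alone.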

At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead, the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.
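
The conservative reading of a liveout entry might look like the Python
sketch below. It is only an illustration under stated assumptions: the
target is x86-64 with the System V DWARF register numbering, in which
numbers 17 through 32 denote ``%xmm0`` to ``%xmm15`` (shared with the
corresponding ``%ymm`` registers), and every other register is treated
as an 8-byte general-purpose register, which is a simplification. The
function name is hypothetical.

.. code-block:: python

  # Assumed: %xmm0-%xmm15 in the x86-64 System V DWARF register numbering.
  VECTOR_DWARF_REGS = range(17, 33)


  def bytes_to_save(dwarf_regnum, live_size):
      """Pick a save width that conservatively covers `live_size` bytes."""
      if dwarf_regnum in VECTOR_DWARF_REGS:
          if live_size > 32:
              raise ValueError("vector liveout wider than 32 bytes")
          # The low 16 bytes are the %xmm part; 32 bytes is the full %ymm.
          return 16 if live_size <= 16 else 32
      if live_size > 8:
          raise ValueError("general-purpose liveout wider than 8 bytes")
      # Sub-registers are not distinguished, so save the whole register.
      return 8

With these assumptions, one live byte in ``%rax`` still saves all of
``%rax``, while 16 live bytes in ``%ymm0`` save only ``%xmm0``.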

The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

On Darwin, the stack map section name is "__llvm_stackmaps". The
segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, which LLVM preserves.
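
Since only the ID and the order of locations survive compilation, the
runtime typically keeps its own side table that gives each position a
meaning. A minimal Python sketch, in which ``label_locations`` and the
``layouts`` table are hypothetical names chosen for this example:

.. code-block:: python

  def label_locations(record_id, locations, layouts):
      """Pair each recorded location with the runtime's name for it.

      `layouts` maps a stack map ID to the ordered names of the values
      that the IR generator passed to the intrinsic with that ID.
      """
      names = layouts[record_id]
      if len(names) != len(locations):
          raise ValueError("stack map %d: expected %d locations, got %d"
                           % (record_id, len(names), len(locations)))
      return dict(zip(names, locations))

For the ``anyregcc`` example earlier in this document, a table entry
such as ``{78: ("result", "ptr")}`` would label ``Loc0`` as the result
and ``Loc1`` as the pointer argument.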

Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow metadata to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.
503stack map's location is a Direct location type.