\documentclass{article}
\usepackage[fancyhdr,pdf]{latex2man}

\input{common.tex}

\begin{document}

\begin{Name}{3}{libunwind-dynamic}{David Mosberger-Tang}{Programming Library}{Introduction to dynamic unwind-info}libunwind-dynamic -- libunwind-support for runtime-generated code
\end{Name}

\section{Introduction}

For \Prog{libunwind} to do its job, it needs to be able to reconstruct
the \emph{frame state} of each frame in a call-chain. The frame state
describes the subset of the machine-state that consists of the
\emph{frame registers} (typically the instruction-pointer and the
stack-pointer) and all callee-saved registers (preserved registers).
The frame state describes each register either by providing its
current value (for frame registers) or by providing the location at
which the current value is stored (callee-saved registers).

For statically generated code, the compiler normally takes care of
emitting \emph{unwind-info}, which provides the minimum amount of
information needed to reconstruct the frame state for each instruction
in a procedure. For dynamically generated code, the runtime code
generator must use the dynamic unwind-info interface provided by
\Prog{libunwind} to supply the equivalent information. This manual
page describes the format of this information in detail.

For the purpose of this discussion, a \emph{procedure} is defined to
be an arbitrary piece of \emph{contiguous} code. Normally, each
procedure directly corresponds to a function in the source-language,
but this is not strictly required. For example, a runtime
code-generator could translate a given function into two separate
(discontiguous) procedures: one for frequently-executed (hot) code and
one for rarely-executed (cold) code. Similarly, simple
source-language functions (usually leaf functions) may get translated
into code for which the default unwind-conventions apply; for such
code, it is not strictly necessary to register dynamic unwind-info.

A procedure logically consists of a sequence of \emph{regions}.
Regions are nested in the sense that the frame state at the end of one
region is, by default, assumed to be the frame state for the next
region. Each region is thought of as being divided into a
\emph{prologue}, a \emph{body}, and an \emph{epilogue}, any of which
can be empty. If non-empty, the prologue sets up the frame state for
the body. For example, the prologue may need to allocate some space
on the stack and save certain callee-saved registers. The body
performs the actual work of the procedure but does not change the
frame state in any way. If non-empty, the epilogue restores the
previous frame state and as such undoes or cancels the effect of
the prologue. In fact, a single epilogue may undo the effect of the
prologues of several (nested) regions.

We should point out that even though the prologue, body, and epilogue
are logically separate entities, optimizing code-generators will
generally interleave instructions from all three entities. For this
reason, the dynamic unwind-info interface of \Prog{libunwind} makes no
distinction whatsoever between prologue and body. Similarly, the
exact set of instructions that make up an epilogue is also irrelevant.
The only point in the epilogue that needs to be described explicitly
by the dynamic unwind-info is the point at which the stack-pointer
gets restored. This point must be described because once the
stack-pointer is restored, all values saved in the deallocated portion
of the stack frame become invalid, and \Prog{libunwind} needs to know
that. The portion of the frame state not saved on the stack is assumed
to remain valid through the end of the region. For this reason, there
is usually no need to describe instructions which restore the contents
of callee-saved registers.

Within a region, each instruction that affects the frame state in some
fashion needs to be described with an operation descriptor. For this
purpose, each instruction in the region is assigned a unique index.
Exactly how this index is derived depends on the architecture. For
example, on RISC and EPIC-style architectures, instructions have a
fixed size, so it is possible to simply number the instructions. In
contrast, most CISC architectures use variable-length instruction
encodings, so it is usually necessary to use a byte-offset as the
index. Given the instruction index, the operation descriptor specifies
the effect of the instruction in an abstract manner. For example, it
might express that the instruction stores callee-saved register
\Var{r1} at offset 16 in the stack frame.
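
As an illustration of the two indexing schemes just described, the
following C sketch maps an instruction address to its region-relative
index. The helper \Func{insn\_index}() is hypothetical and not part of
the \Prog{libunwind} API; the instruction size passed to it (for
example, 4 bytes on a fixed-size RISC encoding, or 1 to obtain a plain
byte offset on a variable-length CISC encoding) is an assumption the
caller must supply for its architecture.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical helper, not part of libunwind: convert an
   instruction address into a region-relative index.  INSN_SIZE
   is the fixed instruction size in bytes, or 1 to use the byte
   offset itself as the index (variable-length encodings).  */
static int32_t
insn_index (uint64_t ip, uint64_t region_start, unsigned insn_size)
{
  return (int32_t) ((ip - region_start) / insn_size);
}
\end{verbatim}

With 4-byte instructions and a region starting at 0x1000, the
instruction at 0x1008 has index 2; with byte-offset indexing, the
instruction at 0x1007 has index 7.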

\section{Procedures}

A runtime code-generator registers the dynamic unwind-info of a
procedure by setting up a structure of type \Type{unw\_dyn\_info\_t}
and calling \Func{\_U\_dyn\_register}(), passing the address of the
structure as the sole argument. The members of the
\Type{unw\_dyn\_info\_t} structure are described below:
\begin{itemize}
\item[\Type{void~*}\Var{next}] Private to \Prog{libunwind}. Must not be used
  by the application.
\item[\Type{void~*}\Var{prev}] Private to \Prog{libunwind}. Must not be used
  by the application.
\item[\Type{unw\_word\_t} \Var{start\_ip}] The start-address of the
  instructions of the procedure (remember: procedures are defined to be
  contiguous pieces of code, so a single code-range is sufficient).
\item[\Type{unw\_word\_t} \Var{end\_ip}] The end-address of the
  instructions of the procedure (non-inclusive, that is,
  \Var{end\_ip}-\Var{start\_ip} is the size of the procedure in
  bytes).
\item[\Type{unw\_word\_t} \Var{gp}] The global-pointer value in use
  for this procedure. The exact meaning of the global-pointer is
  architecture-specific, and on some architectures it is not used at
  all.
\item[\Type{int32\_t} \Var{format}] The format of the unwind-info.
  This member can be one of \Const{UNW\_INFO\_FORMAT\_DYNAMIC},
  \Const{UNW\_INFO\_FORMAT\_TABLE}, or
  \Const{UNW\_INFO\_FORMAT\_REMOTE\_TABLE}.
\item[\Type{union} \Var{u}] This union contains one sub-member
  structure for every possible unwind-info format:
  \begin{description}
  \item[\Type{unw\_dyn\_proc\_info\_t} \Var{pi}] This member is used
    for format \Const{UNW\_INFO\_FORMAT\_DYNAMIC}.
  \item[\Type{unw\_dyn\_table\_info\_t} \Var{ti}] This member is used
    for format \Const{UNW\_INFO\_FORMAT\_TABLE}.
  \item[\Type{unw\_dyn\_remote\_table\_info\_t} \Var{rti}] This member
    is used for format \Const{UNW\_INFO\_FORMAT\_REMOTE\_TABLE}.
  \end{description}
  The format of these sub-members is described in detail below.
\end{itemize}
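
The half-open code range described by \Var{start\_ip} and
\Var{end\_ip} can be illustrated with the following C sketch. The
structure is a hypothetical stand-in that mirrors only those two
members so that the example compiles without \Prog{libunwind}; real
code fills in an \Type{unw\_dyn\_info\_t} from the \Prog{libunwind}
headers and passes its address to \Func{\_U\_dyn\_register}().

\begin{verbatim}
#include <stdint.h>

/* Hypothetical stand-in mirroring two members of unw_dyn_info_t;
   real code uses unw_dyn_info_t from the libunwind headers.  */
struct dyn_info_sketch
{
  uint64_t start_ip;  /* first byte of the procedure */
  uint64_t end_ip;    /* first byte NOT in the procedure */
};

/* Size of the procedure in bytes (end_ip is non-inclusive).  */
static uint64_t
proc_size (const struct dyn_info_sketch *di)
{
  return di->end_ip - di->start_ip;
}

/* Nonzero if IP lies in the half-open range [start_ip, end_ip).  */
static int
covers_ip (const struct dyn_info_sketch *di, uint64_t ip)
{
  return ip >= di->start_ip && ip < di->end_ip;
}
\end{verbatim}

For a procedure that starts at 0x1000 and has an \Var{end\_ip} of
0x1040, the size is 64 bytes and the address 0x1040 itself is no
longer covered.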

\subsection{Proc-info format}

This is the preferred dynamic unwind-info format, and it is generally
the one used by full-blown runtime code-generators. In this format,
the details of a procedure are described by a structure of type
\Type{unw\_dyn\_proc\_info\_t}. This structure contains the following
members:
\begin{description}

\item[\Type{unw\_word\_t} \Var{name\_ptr}] The address of a
  (human-readable) name of the procedure or 0 if no such name is
  available. If non-zero, the string stored at this address must be
  ASCII NUL-terminated. For source languages that use name-mangling
  (such as C++ or Java), the string stored at this address should be
  the \emph{demangled} version of the name.

\item[\Type{unw\_word\_t} \Var{handler}] The address of the
  personality-routine for this procedure. Personality-routines are
  used in conjunction with exception handling. See the C++ ABI draft
  (http://www.codesourcery.com/cxx-abi/) for an overview and a
  description of the personality routine. If the procedure has no
  personality routine, \Var{handler} must be set to 0.

\item[\Type{uint32\_t} \Var{flags}] A bitmask of flags. At the
  moment, no flags have been defined and this member must be
  set to 0.

\item[\Type{unw\_dyn\_region\_info\_t~*}\Var{regions}] A NULL-terminated
  linked list of region-descriptors. See section ``Region
  descriptors'' below for more details.

\end{description}
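
A consumer of the proc-info format follows the \Var{regions} list
until it reaches a \Var{next} pointer of \Const{NULL}. The C sketch
below illustrates this, along with storing a name in the word-sized
\Var{name\_ptr} member. The structures are hypothetical stand-ins that
mirror only the members used here, and the helper names
\Func{set\_name}() and \Func{count\_regions}() are made up for the
example; they are not the real \Prog{libunwind} types or routines.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring a few members of
   unw_dyn_region_info_t and unw_dyn_proc_info_t.  */
struct region_sketch
{
  struct region_sketch *next;  /* NULL for the last region */
  int32_t insn_count;
};

struct proc_info_sketch
{
  uintptr_t name_ptr;   /* address of a NUL-terminated name, or 0 */
  uintptr_t handler;    /* personality routine, or 0 if none */
  uint32_t flags;       /* no flags defined: must be 0 */
  struct region_sketch *regions;
};

/* name_ptr is a word, not a pointer: store the string's address.  */
static void
set_name (struct proc_info_sketch *pi, const char *name)
{
  pi->name_ptr = (uintptr_t) name;
}

/* Count the regions by following next pointers until NULL.  */
static unsigned
count_regions (const struct proc_info_sketch *pi)
{
  unsigned n = 0;
  for (const struct region_sketch *r = pi->regions; r != NULL; r = r->next)
    ++n;
  return n;
}
\end{verbatim}

The name string must outlive the registration, so a string literal or
other static storage is the simplest choice in a sketch like this.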

\subsection{Table-info format}

This format is generally used when the dynamically generated code was
derived from static code and the unwind-info for the dynamic and the
static versions is identical. For example, this format can be useful
when loading statically-generated code into an address-space in a
non-standard fashion (i.e., through some means other than
\Func{dlopen}()). In this format, the details of a group of procedures
are described by a structure of type \Type{unw\_dyn\_table\_info\_t}.
This structure contains the following members:
\begin{description}

\item[\Type{unw\_word\_t} \Var{name\_ptr}] The address of a
  (human-readable) name of the procedure or 0 if no such name is
  available. If non-zero, the string stored at this address must be
  ASCII NUL-terminated. For source languages that use name-mangling
  (such as C++ or Java), the string stored at this address should be
  the \emph{demangled} version of the name.

\item[\Type{unw\_word\_t} \Var{segbase}] The segment-base value
  that needs to be added to the segment-relative values stored in the
  unwind-info. The exact meaning of this value is
  architecture-specific.

\item[\Type{unw\_word\_t} \Var{table\_len}] The length of the
  unwind-info (\Var{table\_data}) counted in units of words
  (\Type{unw\_word\_t}).

\item[\Type{unw\_word\_t~*}\Var{table\_data}] A pointer to the actual
  data encoding the unwind-info. The exact format is
  architecture-specific (see architecture-specific sections below).

\end{description}

\subsection{Remote table-info format}

The remote table-info format has the same basic purpose as the regular
table-info format. The only difference is that when \Prog{libunwind}
uses the unwind-info, it will keep the table data in the target
address-space (which may be remote). Consequently, the type of the
\Var{table\_data} member is \Type{unw\_word\_t} rather than a pointer.
This implies that \Prog{libunwind} will have to access the table-data
via the address-space's \Func{access\_mem}() call-back, rather than
through a direct memory reference.

From the point of view of a runtime code-generator, the remote
table-info format offers no advantage, and it is expected that such
generators will describe their procedures either with the proc-info
format or the normal table-info format. The main reason that the
remote table-info format exists is to enable the
address-space-specific \Func{find\_proc\_info}() callback (see
\SeeAlso{unw\_create\_addr\_space}(3)) to return unwind tables whose
data remains in remote memory. This can speed up unwinding (e.g., for
a debugger) because it reduces the amount of data that needs to be
loaded from remote memory.

\section{Region descriptors}

A region descriptor is a variable-length structure that describes how
each instruction in the region affects the frame state. Of course,
most instructions in a region do not change the frame state, and for
those, nothing needs to be recorded in the region descriptor. A
region descriptor is a structure of type
\Type{unw\_dyn\_region\_info\_t} and has the following members:
\begin{description}
\item[\Type{unw\_dyn\_region\_info\_t~*}\Var{next}] A pointer to the
  next region. If this is the last region, \Var{next} is \Const{NULL}.
\item[\Type{int32\_t} \Var{insn\_count}] The length of the region in
  instructions. Each instruction is assumed to have a fixed size (see
  architecture-specific sections for details). The value of
  \Var{insn\_count} may be negative in the last region of a procedure
  (i.e., it may be negative only if \Var{next} is \Const{NULL}). A
  negative value indicates that the region covers the last \emph{N}
  instructions of the procedure, where \emph{N} is the absolute value
  of \Var{insn\_count}.
\item[\Type{uint32\_t} \Var{op\_count}] The (allocated) length of
  the \Var{op} array.
\item[\Type{unw\_dyn\_op\_t} \Var{op}] An array of dynamic unwind
  directives. See Section ``Dynamic unwind directives'' for a
  description of the directives.
\end{description}
A region descriptor with an \Var{insn\_count} of zero is an
\emph{empty region}, and such regions are perfectly legal. In fact,
empty regions can be useful to establish a particular frame state
before the start of another region.
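
The rule that only the last region may carry a negative
\Var{insn\_count} is easy to check before registration. The following
C sketch is a hypothetical validation helper operating on a stand-in
structure (not the real \Type{unw\_dyn\_region\_info\_t}); whether
\Prog{libunwind} itself performs such a check is not specified here.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring next/insn_count of
   unw_dyn_region_info_t.  */
struct region_sketch
{
  struct region_sketch *next;
  int32_t insn_count;   /* 0 = empty region; < 0 only if last */
};

/* Return 1 if no region except possibly the last one has a
   negative insn_count, 0 otherwise.  An empty list is valid.  */
static int
region_list_ok (const struct region_sketch *r)
{
  for (; r != NULL; r = r->next)
    if (r->insn_count < 0 && r->next != NULL)
      return 0;
  return 1;
}
\end{verbatim}

Empty regions (an \Var{insn\_count} of zero) pass the check anywhere
in the list, matching the rule that they are perfectly legal.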

A single region list can be shared across multiple procedures provided
those procedures share a common prologue and epilogue (their bodies
may differ, of course). Normally, such procedures consist of a canned
prologue, the body, and a canned epilogue. This could be described by
two regions: one covering the prologue and one covering the epilogue.
Since the body length is variable, the latter region would need to
specify a negative value in \Var{insn\_count} such that
\Prog{libunwind} knows that the region covers the end of the procedure
(up to the address specified by \Var{end\_ip}).
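
The arithmetic behind a negative \Var{insn\_count} can be sketched as
follows: the last region starts \emph{N} instructions before the end
of the procedure, whatever the length of the body. The helper
\Func{region\_start}() is hypothetical and only illustrates the
convention; it is not \Prog{libunwind} code, and the procedure length
is assumed to be given in instructions.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical helper: index of the first instruction covered by a
   region.  CURSOR is where the previous region ended; PROC_LEN is
   the length of the procedure in instructions.  A negative
   INSN_COUNT (legal only for the last region) means "the last
   |insn_count| instructions of the procedure".  */
static int32_t
region_start (int32_t insn_count, int32_t cursor, int32_t proc_len)
{
  if (insn_count < 0)
    return proc_len + insn_count;   /* proc_len - |insn_count| */
  return cursor;
}
\end{verbatim}

For a canned prologue of 4 instructions and an epilogue region with an
\Var{insn\_count} of -3, the epilogue starts at index 17 in a
20-instruction procedure and at index 97 in a 100-instruction one, so
the same region list serves both procedures.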

The region descriptor is a variable-length structure to make it
possible to allocate all the necessary memory with a single
memory-allocation request. To facilitate the allocation of region
descriptors, \Prog{libunwind} provides a helper routine with the
following synopsis:

\noindent
\Type{size\_t} \Func{\_U\_dyn\_region\_size}(\Type{int} \Var{op\_count});

This routine returns the number of bytes needed to hold a region
descriptor with space for \Var{op\_count} unwind directives. Note
that the length of the \Var{op} array does not have to match exactly
with the number of directives in a region. Instead, it is sufficient
if the \Var{op} array contains at least as many entries as there are
directives, since the end of the directives can always be indicated
with the \Const{UNW\_DYN\_STOP} directive.
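
The single-allocation pattern can be sketched in C as follows. The
structures are hypothetical stand-ins that mirror the members of
\Type{unw\_dyn\_op\_t} and \Type{unw\_dyn\_region\_info\_t} listed on
this page (assuming a 64-bit \Type{unw\_word\_t}), and
\Func{region\_size}() merely plays the role of
\Func{\_U\_dyn\_region\_size}(); real code should use the
\Prog{libunwind} types and helper. Because \Const{UNW\_DYN\_STOP} is
guaranteed to be 0, zero-filling the allocation leaves every unused
\Var{op} slot acting as a stop directive.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for unw_dyn_op_t and
   unw_dyn_region_info_t.  */
struct op_sketch
{
  int8_t tag;       /* 0 plays the role of UNW_DYN_STOP */
  int8_t qp;
  int16_t reg;
  int32_t when;
  uint64_t val;
};

struct region_sketch
{
  struct region_sketch *next;
  int32_t insn_count;
  uint32_t op_count;        /* allocated length of op[] */
  struct op_sketch op[];    /* variable-length array */
};

/* Bytes needed for a region with room for OP_COUNT directives.  */
static size_t
region_size (uint32_t op_count)
{
  return offsetof (struct region_sketch, op)
         + op_count * sizeof (struct op_sketch);
}

/* Allocate a zero-filled region in a single request; every op
   slot starts out as a stop directive (tag 0).  */
static struct region_sketch *
region_alloc (uint32_t op_count, int32_t insn_count)
{
  struct region_sketch *r = calloc (1, region_size (op_count));
  if (r != NULL)
    {
      r->insn_count = insn_count;
      r->op_count = op_count;
    }
  return r;
}
\end{verbatim}

A code-generator that over-estimates the number of directives can
simply leave the trailing slots untouched.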

\section{Dynamic unwind directives}

A dynamic unwind directive describes how the frame state changes
at a particular point within a region. The description is in
the form of a structure of type \Type{unw\_dyn\_op\_t}. This
structure has the following members:
\begin{description}
\item[\Type{int8\_t} \Var{tag}] The operation tag. Must be one
  of the \Type{unw\_dyn\_operation\_t} values described below.
\item[\Type{int8\_t} \Var{qp}] The qualifying predicate that controls
  whether or not this directive is active. This is useful for
  predicated architectures such as IA-64 or ARM, where the contents of
  another (callee-saved) register determine whether or not an
  instruction is executed (takes effect). If the directive is always
  active, this member should be set to the manifest constant
  \Const{\_U\_QP\_TRUE} (this constant is defined for all
  architectures, predicated or not).
\item[\Type{int16\_t} \Var{reg}] The number of the register affected
  by the instruction.
\item[\Type{int32\_t} \Var{when}] The region-relative number of
  the instruction to which this directive applies. For example,
  a value of 0 means that the effect described by this directive
  has taken place once the first instruction in the region has
  executed.
\item[\Type{unw\_word\_t} \Var{val}] The value to be applied by the
  operation tag. The exact meaning of this value varies by tag. See
  Section ``Operation tags'' below.
\end{description}
It is perfectly legitimate to specify multiple dynamic unwind
directives with the same \Var{when} value, if a particular instruction
has a complex effect on the frame state.
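
The following C sketch shows two directives sharing one \Var{when}
value for a single instruction that both spills a register and lowers
the stack-pointer. Everything in it is a hypothetical stand-in: the
structure mirrors \Type{unw\_dyn\_op\_t}, and apart from the stop tag
being 0 (which this page guarantees), the tag values, the predicate
value used in place of \Const{\_U\_QP\_TRUE}, the register numbers, and
the spill offset are made up for illustration.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical stand-ins for the UNW_DYN_* tags and _U_QP_TRUE.
   Only OP_STOP == 0 is guaranteed; the rest is illustrative.  */
enum { OP_STOP = 0, OP_SAVE_REG, OP_SPILL_FP_REL, OP_SPILL_SP_REL,
       OP_ADD };
enum { QP_TRUE_SKETCH = 0, REG_SP_SKETCH = 12, REG_R4_SKETCH = 4 };

struct op_sketch        /* stand-in for unw_dyn_op_t */
{
  int8_t tag;
  int8_t qp;
  int16_t reg;
  int32_t when;
  uint64_t val;
};

static struct op_sketch
make_op (int tag, int32_t when, int reg, uint64_t val)
{
  struct op_sketch op = { (int8_t) tag, QP_TRUE_SKETCH, (int16_t) reg,
                          when, val };
  return op;
}

/* A hypothetical instruction at region index 2 spills r4 and lowers
   sp by 16, so both directives use when == 2.  The spill offset 0 is
   illustrative; its meaning is architecture-specific.  Returns the
   number of entries written, including the stop directive.  */
static int
describe_push (struct op_sketch *ops)
{
  int n = 0;
  ops[n++] = make_op (OP_SPILL_SP_REL, 2, REG_R4_SKETCH, 0);
  ops[n++] = make_op (OP_ADD, 2, REG_SP_SKETCH, (uint64_t) -16);
  ops[n++] = make_op (OP_STOP, 0, 0, 0);
  return n;
}
\end{verbatim}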

Empty regions by definition contain no actual instructions, so their
directives are not tied to a particular instruction. By convention,
however, the \Var{when} member should be set to 0.

There is no need for the dynamic unwind directives to appear
in order of increasing \Var{when} values. If the directives happen to
be sorted in that order, it may result in slightly faster execution,
but a runtime code-generator should not go to extra lengths just to
ensure that the directives are sorted.

IMPLEMENTATION NOTE: should \Prog{libunwind} implementations for
certain architectures prefer the list of unwind directives to be
sorted, it is recommended that such implementations first check
whether the list happens to be sorted already and, if not, sort the
directives explicitly before the first use. With this approach, the
overhead of explicit sorting is only paid when there is a real benefit,
and if the runtime code-generator happens to generate sorted lists
naturally, the performance penalty is limited to a simple O(N) check.
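
The check-then-sort strategy of the implementation note can be
sketched as follows. The structure is a hypothetical stand-in for
\Type{unw\_dyn\_op\_t} (assuming a 64-bit \Type{unw\_word\_t}), and the
choice of insertion sort is an assumption of this sketch rather than a
description of any \Prog{libunwind} port: it is stable, so directives
that share a \Var{when} value keep their relative order, and it is
reasonable for short lists. Entries from the first stop tag (value 0)
onward are left untouched.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

struct op_sketch        /* hypothetical stand-in for unw_dyn_op_t */
{
  int8_t tag;           /* 0 = stop */
  int8_t qp;
  int16_t reg;
  int32_t when;
  uint64_t val;
};

/* Number of directives before the first stop tag (or op_count).  */
static size_t
op_len (const struct op_sketch *op, size_t op_count)
{
  size_t n = 0;
  while (n < op_count && op[n].tag != 0)
    ++n;
  return n;
}

/* The O(N) check: are the when values non-decreasing?  */
static int
ops_sorted (const struct op_sketch *op, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    if (op[i - 1].when > op[i].when)
      return 0;
  return 1;
}

/* Sort by when only if needed, using a stable insertion sort.
   Returns 1 if a sort was actually performed, 0 otherwise.  */
static int
ops_sort_if_needed (struct op_sketch *op, size_t op_count)
{
  size_t n = op_len (op, op_count);
  if (ops_sorted (op, n))
    return 0;
  for (size_t i = 1; i < n; ++i)
    {
      struct op_sketch key = op[i];
      size_t j = i;
      while (j > 0 && op[j - 1].when > key.when)
        {
          op[j] = op[j - 1];
          --j;
        }
      op[j] = key;
    }
  return 1;
}
\end{verbatim}

Calling \Func{ops\_sort\_if\_needed}() a second time on the same list
only pays for the linear check.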

\subsection{Operation tags}

The possible operation tags are defined by the enumeration type
\Type{unw\_dyn\_operation\_t}, which defines the following
values:
\begin{description}

\item[\Const{UNW\_DYN\_STOP}] Marks the end of the dynamic unwind
  directive list. All remaining entries in the \Var{op} array of the
  region-descriptor are ignored. This tag is guaranteed to have a
  value of 0.

\item[\Const{UNW\_DYN\_SAVE\_REG}] Marks an instruction which saves
  register \Var{reg} to register \Var{val}.

\item[\Const{UNW\_DYN\_SPILL\_FP\_REL}] Marks an instruction which
  spills register \Var{reg} to a frame-pointer-relative location. The
  frame-pointer-relative offset is given by the value stored in member
  \Var{val}. See the architecture-specific sections for a description
  of the stack frame layout.

\item[\Const{UNW\_DYN\_SPILL\_SP\_REL}] Marks an instruction which
  spills register \Var{reg} to a stack-pointer-relative location. The
  stack-pointer-relative offset is given by the value stored in member
  \Var{val}. See the architecture-specific sections for a description
  of the stack frame layout.

\item[\Const{UNW\_DYN\_ADD}] Marks an instruction which adds
  the constant value \Var{val} to register \Var{reg}. To subtract
  a constant value, store the two's-complement of the value in
  \Var{val}. The set of registers that can be specified for this tag
  is described in the architecture-specific sections below.

\item[\Const{UNW\_DYN\_POP\_FRAMES}]

\item[\Const{UNW\_DYN\_LABEL\_STATE}]

\item[\Const{UNW\_DYN\_COPY\_STATE}]

\item[\Const{UNW\_DYN\_ALIAS}]

\end{description}
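
The two's-complement convention for \Const{UNW\_DYN\_ADD} can be
checked with a short C sketch, assuming a 64-bit \Type{unw\_word\_t}
(modelled here by a hypothetical typedef). Because unsigned arithmetic
wraps around modulo the word size, adding the two's-complement of a
value has the same effect as subtracting the value. The helper names
\Func{add\_val}() and \Func{apply\_add}() are made up for the example.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit unw_word_t.  */
typedef uint64_t word_sketch;

/* Encode a signed adjustment for the val member of an ADD
   directive; a negative delta becomes its two's-complement.  */
static word_sketch
add_val (int64_t delta)
{
  return (word_sketch) delta;
}

/* Apply the directive: unsigned addition wraps modulo 2^64, so
   adding the two's-complement of 16 subtracts 16.  */
static word_sketch
apply_add (word_sketch reg_value, word_sketch val)
{
  return reg_value + val;
}
\end{verbatim}

Thus a directive that lowers the stack-pointer by 16 stores
0xfffffffffffffff0 in \Var{val}.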

The following convenience routines help initialize a structure of
type \Type{unw\_dyn\_op\_t}:
\begin{itemize}
\item \Func{\_U\_dyn\_op\_save\_reg}()
\item \Func{\_U\_dyn\_op\_spill\_fp\_rel}()
\item \Func{\_U\_dyn\_op\_spill\_sp\_rel}()
\item \Func{\_U\_dyn\_op\_add}()
\item \Func{\_U\_dyn\_op\_pop\_frames}()
\item \Func{\_U\_dyn\_op\_label\_state}()
\item \Func{\_U\_dyn\_op\_copy\_state}()
\item \Func{\_U\_dyn\_op\_alias}()
\item \Func{\_U\_dyn\_op\_stop}()
\end{itemize}

\section{IA-64 specifics}

\begin{itemize}
\item Meaning of the \Var{segbase} member in the table-info and
  remote table-info formats.
\item Format of \Var{table\_data} in the table-info and remote
  table-info formats.
\item Instruction size: each bundle is counted as 3 instructions,
  regardless of template (MLX).
\item Describe the stack-frame layout, especially with regard to
  sp-relative and fp-relative addressing.
\item \Const{UNW\_DYN\_ADD} can only add to ``sp'' (always a negative
  value); use \Const{UNW\_DYN\_POP\_FRAMES} otherwise.
\end{itemize}
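
The bundle-counting rule above can be sketched as follows. IA-64
bundles are 16 bytes long and hold three instruction slots; the helper
\Func{ia64\_insn\_index}() is hypothetical and takes the bundle
address and the slot number (0, 1, or 2) as separate arguments, which
is an assumption of this sketch rather than how \Prog{libunwind}
represents an instruction address.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical helper: region-relative instruction index on IA-64,
   where each 16-byte bundle counts as 3 instructions regardless of
   its template (an MLX bundle still occupies 3 index positions).
   BUNDLE_ADDR must be 16-byte aligned; SLOT must be 0, 1 or 2.  */
static int32_t
ia64_insn_index (uint64_t region_start, uint64_t bundle_addr,
                 unsigned slot)
{
  return (int32_t) (3 * ((bundle_addr - region_start) / 16) + slot);
}
\end{verbatim}

For a region starting at 0x4000, slot 1 of the bundle at 0x4020 has
index 7; an MLX bundle still advances the index by 3 even though its
last two slots hold a single long instruction.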

\section{See Also}

\SeeAlso{libunwind(3)},
\SeeAlso{\_U\_dyn\_register(3)},
\SeeAlso{\_U\_dyn\_cancel(3)}

\section{Author}

\noindent
David Mosberger-Tang\\
Email: \Email{dmosberger@gmail.com}\\
WWW: \URL{http://www.nongnu.org/libunwind/}.
\LatexManEnd

\end{document}