| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <html> |
| <head> |
| <title>Source Level Debugging with LLVM</title> |
| <link rel="stylesheet" href="llvm.css" type="text/css"> |
| </head> |
| <body> |
| |
| <div class="doc_title">Source Level Debugging with LLVM</div> |
| |
| <ul> |
| |
| <img src="venusflytrap.jpg" width=247 height=369 align=right> |
| |
| <li><a href="#introduction">Introduction</a></li> |
| <ol> |
| <li><a href="#phil">Philosophy behind LLVM debugging information</a></li> |
| <li><a href="#debugopt">Debugging optimized code</a></li> |
| <li><a href="#future">Future work</a></li> |
| </ol> |
| <li><a href="#llvm-db">Using the <tt>llvm-db</tt> tool</a> |
| <ol> |
| <li><a href="#limitations">Limitations of <tt>llvm-db</tt></a></li> |
| <li><a href="#sample">A sample <tt>llvm-db</tt> session</a></li> |
| <li><a href="#startup">Starting the debugger</a></li> |
| <li><a href="#commands">Commands recognized by the debugger</a></li> |
| </ol></li> |
| |
| <li><a href="#architecture">Architecture of the LLVM debugger</a></li> |
| <ol> |
| <li><a href="#arch_todo">Short-term TODO list</a></li> |
| </ol> |
| |
| <li><a href="#implementation">Debugging information implementation</a></li> |
| <ol> |
| <li><a href="#impl_common_anchors">Anchors for global objects</a></li> |
| <li><a href="#impl_common_stoppoint">Representing stopping points in the source program</a></li> |
| <li><a href="#impl_common_lifetime">Object lifetimes and scoping</a></li> |
| <li><a href="#impl_common_descriptors">Object descriptor formats</a></li> |
| <ul> |
| <li><a href="#impl_common_source_files">Representation of source files</a></li> |
| <li><a href="#impl_common_globals">Representation of global objects</a></li> |
| <li><a href="#impl_common_localvars">Representation of local variables</a></li> |
| </ul> |
| <li><a href="#impl_common_intrinsics">Other intrinsic functions</a></li> |
| </ol> |
| <li><a href="#impl_ccxx">C/C++ front-end specific debug information</a></li> |
| <ol> |
| <li><a href="#impl_ccxx_descriptors">Object descriptor formats</a></li> |
| </ol> |
| </ul> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="introduction">Introduction</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p>This document is the central repository for all information pertaining to |
| debug information in LLVM. It describes how to use the <a |
| href="CommandGuide/llvm-db.html"><tt>llvm-db</tt> tool</a>, which provides a |
| powerful <a href="#llvm-db">source-level debugger</a> to users of LLVM-based |
| compilers. When compiling a program in debug mode, the front-end in use adds |
| LLVM debugging information to the program in the form of normal <a |
| href="LangRef.html">LLVM program objects</a> as well as a small set of LLVM <a |
| href="#implementation">intrinsic functions</a>, which specify the mapping of the |
| program in LLVM form to the program in the source language. |
| </p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="phil">Philosophy behind LLVM debugging information</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p> |
| The idea of the LLVM debugging information is to capture how the important |
| pieces of the source-language's Abstract Syntax Tree map onto LLVM code. |
| Several design aspects have shaped the solution that appears here. The |
| important ones are:</p> |
| |
| <p><ul> |
| <li>Debugging information should have very little impact on the rest of the |
| compiler. No transformations, analyses, or code generators should need to be |
| modified because of debugging information.</li> |
| |
| <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and |
| easily described ways</a> with the debugging information.</li> |
| |
| <li>Because LLVM is designed to support arbitrary programming languages, |
| LLVM-to-LLVM tools should not need to know anything about the semantics of the |
| source-level language.</li> |
| |
| <li>Source-level languages are often <b>widely</b> different from one another. |
| LLVM should not put any restrictions on the flavor of the source-language, and |
| the debugging information should work with any language.</li> |
| |
| <li>With code generator support, it should be possible to use an LLVM compiler |
| to compile a program to native machine code with standard debugging formats. |
| This allows compatibility with traditional machine-code level debuggers, like |
| GDB or DBX.</li> |
| |
| </ul></p> |
| |
| <p> |
| The approach used by the LLVM implementation is to use a small set of <a |
| href="#impl_common_intrinsics">intrinsic functions</a> to define a mapping |
| between LLVM program objects and the source-level objects. The description of |
| the source-level program is maintained in LLVM global variables in an <a |
| href="#impl_ccxx">implementation-defined format</a> (the C/C++ front-end |
| currently uses working draft 7 of the <a |
| href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3 standard</a>).</p> |
| |
| <p> |
| When a program is debugged, the debugger interacts with the user and turns the |
| stored debug information into source-language specific information. As such, |
| the debugger must be aware of the source-language, and is thus tied to a |
| specific language or family of languages. The <a href="#llvm-db">LLVM |
| debugger</a> is designed to be modular in its support for source-languages. |
| </p> |
| |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="debugopt">Debugging optimized code</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| An extremely high priority of LLVM debugging information is to make it interact |
| well with optimizations and analysis. In particular, the LLVM debug information |
| provides the following guarantees:</p> |
| |
| <p><ul> |
| |
| <li>LLVM debug information <b>always provides information to accurately read the |
| source-level state of the program</b>, regardless of which LLVM optimizations |
| have been run, and without any modification to the optimizations themselves. |
| However, some optimizations may impact the ability to modify the current state |
| of the program with a debugger, such as setting program variables, or calling |
| functions that have been deleted.</li> |
| |
| <li>LLVM optimizations gracefully interact with debugging information. If they |
| are not aware of debug information, they are automatically disabled as necessary |
| in the cases that would invalidate the debug info. This retains the LLVM |
| features making it easy to write new transformations.</li> |
| |
| <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM |
| debugging information, allowing them to update the debugging information as they |
| perform aggressive optimizations. This means that, with effort, the LLVM |
| optimizers could optimize debug code just as well as non-debug code.</li> |
| |
| <li>LLVM debug information does not prevent many important optimizations from |
| happening (for example inlining, basic block reordering/merging/cleanup, tail |
| duplication, etc), further reducing the amount of the compiler that eventually |
| is "aware" of debugging information.</li> |
| |
| <li>LLVM debug information is automatically optimized along with the rest of the |
| program, using existing facilities. For example, duplicate information is |
| automatically merged by the linker, and unused information is automatically |
| removed.</li> |
| |
| </ul></p> |
| |
| <p> |
| Basically, the debug information allows you to compile a program with "<tt>-O0 |
| -g</tt>" and get full debug information, allowing you to arbitrarily modify the |
| program as it executes from the debugger. Compiling a program with "<tt>-O3 |
| -g</tt>" gives you full debug information that is always available and accurate |
| for reading (e.g., you get accurate stack traces despite tail call elimination |
| and inlining), but you might lose the ability to modify the program and call |
| functions that were optimized out of the program, or inlined away completely. |
| </p> |
| |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="future">Future work</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| There are several important extensions that could eventually be added to the |
| LLVM debugger. The most important extension would be to upgrade the LLVM code |
| generators to support debugging information. This would also allow, for |
| example, the X86 code generator to emit native objects that contain debugging |
| information consumable by traditional source-level debuggers like GDB or |
| DBX.</p> |
| |
| <p> |
| Additionally, LLVM optimizations can be upgraded to incrementally update the |
| debugging information, <a href="#commands">new commands</a> can be added to the |
| debugger, and thread support could be added to the debugger.</p> |
| |
| <p> |
| The "SourceLanguage" modules provided by <tt>llvm-db</tt> could be substantially |
| improved to provide good support for C++ language features like namespaces and |
| scoping rules.</p> |
| |
| <p> |
| After working with the debugger for a while, perhaps the nicest improvement |
| would be to add some sort of line editor, such as GNU readline (but one that is |
| compatible with the LLVM license).</p> |
| |
| <p> |
| For someone so inclined, it should be straightforward to write different |
| front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly |
| separated from the <tt>llvm-db</tt> front-end. A GUI debugger or IDE would be |
| an interesting project. |
| </p> |
| |
| </div> |
| |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"> |
| <a name="llvm-db">Using the <tt>llvm-db</tt> tool</a> |
| </div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p> |
| The <tt>llvm-db</tt> tool provides a GDB-like interface for source-level |
| debugging of programs. This tool provides many standard commands for inspecting |
| and modifying the program as it executes, loading new programs, single stepping, |
| placing breakpoints, etc. This section describes how to use the debugger. |
| </p> |
| |
| <p><tt>llvm-db</tt> has been designed to be as similar to GDB in its user |
| interface as possible. This should make it extremely easy to learn |
| <tt>llvm-db</tt> if you already know <tt>GDB</tt>. In general, <tt>llvm-db</tt> |
| provides the subset of GDB commands that are applicable to LLVM debugging users. |
| If there is a command missing that makes a reasonable amount of sense within the |
| <a href="#limitations">limitations of <tt>llvm-db</tt></a>, please report it as |
| a bug or, better yet, submit a patch to add it. :)</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="limitations">Limitations of <tt>llvm-db</tt></a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p><tt>llvm-db</tt> is the first LLVM debugger, and as such was designed to be |
| quick to prototype and build, and simple to extend. It is missing many |
| features, though they should be easy to add over time (patches welcomed!). |
| Because the (currently only) debugger backend (implemented in |
| "lib/Debugger/UnixLocalInferiorProcess.cpp") was designed to work without any |
| cooperation from the code generators, it suffers from the following inherent |
| limitations:</p> |
| |
| <p><ul> |
| |
| <li>Running a program in <tt>llvm-db</tt> is a bit slower than running it with |
| <tt>lli</tt>.</li> |
| |
| <li>Inspection of the target hardware is not supported. This means that you |
| cannot, for example, print the contents of X86 registers.</li> |
| |
| <li>Inspection of LLVM code is not supported. This means that you cannot print |
| the contents of arbitrary LLVM values, or use commands such as <tt>stepi</tt>. |
| This also means that you cannot debug code without debug information.</li> |
| |
| <li>Portions of the debugger run in the same address space as the program being |
| debugged. This means that memory corruption by the program could trample on |
| portions of the debugger.</li> |
| |
| <li>Attaching to existing processes and core files is not currently |
| supported.</li> |
| |
| </ul></p> |
| |
| <p>That said, it is still quite useful, and all of these limitations can be |
| eliminated by integrating support for the debugger into the code generators. |
| See the <a href="#future">future work</a> section for ideas of how to extend |
| the LLVM debugger despite these limitations.</p> |
| |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="sample">A sample <tt>llvm-db</tt> session</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p> |
| TODO |
| </p> |
| |
| </div> |
| |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="startup">Starting the debugger</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p>There are three ways to start up the <tt>llvm-db</tt> debugger:</p> |
| |
| <p>When run with no options, just <tt>llvm-db</tt>, the debugger starts up |
| without a program loaded at all. You must use the <a |
| href="#c_file"><tt>file</tt> command</a> to load a program, and the <a |
| href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a> |
| commands to specify the arguments for the program.</p> |
| |
| <p>If you start the debugger with one argument, as <tt>llvm-db |
| &lt;program&gt;</tt>, the debugger will start up and load in the specified |
| program. You can then optionally specify arguments to the program with the <a |
| href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a> |
| commands.</p> |
| |
| <p>The third way to start the debugger is with the <tt>--args</tt> option. This |
| option allows you to specify the program to load and the arguments to start out |
| with. <!-- No options to <tt>llvm-db</tt> may be specified after the |
| <tt>-args</tt> option. --> Example use: <tt>llvm-db --args ls /home</tt></p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="commands">Commands recognized by the debugger</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p>FIXME: this needs work obviously. See the <a |
| href="http://sources.redhat.com/gdb/documentation/">GDB documentation</a> for |
| information about what these do, or try '<tt>help [command]</tt>' within |
| <tt>llvm-db</tt> to get information.</p> |
| |
| <p> |
| <h2>General usage:</h2> |
| <ul> |
| <li>help [command]</li> |
| <li>quit</li> |
| <li><a name="c_file">file</a> [program]</li> |
| </ul> |
| |
| <h2>Program inspection and interaction:</h2> |
| <ul> |
| <li>create (start the program, stopping it ASAP in <tt>main</tt>)</li> |
| <li>kill</li> |
| <li>run [args]</li> |
| <li>step [num]</li> |
| <li>next [num]</li> |
| <li>cont</li> |
| <li>finish</li> |
| |
| <li>list [start[, end]]</li> |
| <li>info source</li> |
| <li>info sources</li> |
| <li>info functions</li> |
| </ul> |
| |
| <h2>Call stack inspection:</h2> |
| <ul> |
| <li>backtrace</li> |
| <li>up [n]</li> |
| <li>down [n]</li> |
| <li>frame [n]</li> |
| </ul> |
| |
| |
| <h2>Debugger inspection and interaction:</h2> |
| <ul> |
| <li>info target</li> |
| <li>show prompt</li> |
| <li>set prompt</li> |
| <li>show listsize</li> |
| <li>set listsize</li> |
| <li>show language</li> |
| <li>set language</li> |
| </ul> |
| |
| <h2>TODO:</h2> |
| <ul> |
| <li>info frame</li> |
| <li>break</li> |
| <li>print</li> |
| <li>ptype</li> |
| |
| <li>info types</li> |
| <li>info variables</li> |
| <li>info program</li> |
| |
| <li>info args</li> |
| <li>info locals</li> |
| <li>info catch</li> |
| <li>... many others</li> |
| </ul> |
| </p> |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"> |
| <a name="architecture">Architecture of the LLVM debugger</a> |
| </div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p><pre> |
| lib/Debugger |
| - UnixLocalInferiorProcess.cpp |
| |
| tools/llvm-db |
| - SourceLanguage interfaces |
| - ProgramInfo/RuntimeInfo |
| - Commands |
| |
| </pre></p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="arch_todo">Short-term TODO list</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p> |
| FIXME: this section will eventually go away. These are notes to myself of |
| things that should be implemented, but haven't yet. |
| </p> |
| |
| <p> |
| <b>Breakpoints:</b> Support is already implemented in the 'InferiorProcess' |
| class, though it hasn't been tested yet. To finish breakpoint support, we need |
| to implement breakCommand (which should reuse the linespec parser from the list |
| command), and handle the fact that 'break foo' or 'break file.c:53' may insert |
| multiple breakpoints. Also, if you say 'break file.c:53' and there is no |
| stoppoint on line 53, the breakpoint should go on the next available line. My |
| idea was to have the Debugger class provide a "Breakpoint" class which |
| encapsulated this messiness, giving the debugger front-end a simple interface. |
| The debugger front-end would have to map the really complex semantics of |
| temporary breakpoints and 'conditional' breakpoints onto this intermediate |
| level. Also, breakpoints should survive as much as possible across program |
| reloads. |
| </p> |
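<p>
The "next available line" rule described above can be sketched in a few lines.
This is only an illustration written in Python, not code from the debugger:
the function name <tt>resolve_breakpoint_line</tt> and the sorted list of
stop-point lines are hypothetical, and the sketch assumes the stop points of a
single source file have already been collected.
</p>

```python
from bisect import bisect_left


def resolve_breakpoint_line(stoppoint_lines, requested):
    """Return the first line >= requested that has a stop point, or None.

    stoppoint_lines is a hypothetical input: the ascending line numbers of
    every stop point found in one source file.
    """
    index = bisect_left(stoppoint_lines, requested)
    if index == len(stoppoint_lines):
        return None  # no stop point at or after the requested line
    return stoppoint_lines[index]


# Lines that carry stop points in the foo() example used later on this page.
lines = [2, 3, 5, 6, 8]
print(resolve_breakpoint_line(lines, 4))  # line 4 has no stop point
print(resolve_breakpoint_line(lines, 9))  # past the last stop point
```

<p>
A real implementation would also have to handle one linespec mapping to
several breakpoints, which this sketch ignores.
</p>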
| |
| <p> |
| <b>run (with args)</b> &amp; <b>set args</b>: These need to be implemented. |
| Currently run doesn't support setting arguments as part of the command. The |
| only tricky part is parsing quoted arguments correctly.</p> |
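<p>
As a rough picture of the quoting behavior a user would probably expect from
<tt>run</tt>, the sketch below splits an argument string using shell-like
rules. It is written in Python purely for illustration and relies on the
standard <tt>shlex</tt> module; the helper name <tt>parse_run_args</tt> is
hypothetical, and the debugger itself would need an equivalent routine of its
own.
</p>

```python
import shlex


def parse_run_args(command_line):
    """Split the text following 'run' into arguments, honoring quotes."""
    return shlex.split(command_line)


# A quoted argument containing a space stays a single argument.
print(parse_run_args('-l "/home/my dir" extra'))
```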
| |
| <p> |
| <b>UnixLocalInferiorProcess.cpp speedup</b>: There is no reason for the debugged |
| process to code gen the globals corresponding to debug information. The |
| IntrinsicLowering object could instead change descriptors into constant expr |
| casts of the constant address of the LLVM objects for the descriptors. This |
| would also allow us to eliminate the mapping back and forth between physical |
| addresses that must be done.</p> |
| |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"> |
| <a name="implementation">Debugging information implementation</a> |
| </div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p>LLVM debugging information has been carefully designed to make it possible |
| for the optimizer to optimize the program and debugging information without |
| necessarily having to know anything about debugging information. In particular, |
| the global constant merging pass automatically eliminates duplicated debugging |
| information (often caused by header files), the global dead code elimination |
| pass automatically deletes debugging information for a function if it decides to |
| delete the function, and the linker eliminates debug information when it merges |
| <tt>linkonce</tt> functions.</p> |
| |
| <p>To do this, most of the debugging information (descriptors for types, |
| variables, functions, source files, etc) is inserted by the language front-end |
| in the form of LLVM global variables. These LLVM global variables are no |
| different from any other global variables, except that they have a web of LLVM |
| intrinsic functions that point to them. If the last references to a particular |
| piece of debugging information are deleted (for example, by the |
| <tt>-globaldce</tt> pass), the extraneous debug information will automatically |
| become dead and be removed by the optimizer.</p> |
| |
| <p>The debugger is designed to be agnostic about the contents of most of the |
| debugging information. It uses a source-language-specific module to decode the |
| information that represents variables, types, functions, namespaces, etc: this |
| allows for arbitrary source-language semantics and type-systems to be used, as |
| long as there is a module written for the debugger to interpret the information. |
| </p> |
| |
| <p> |
| To provide basic functionality, the LLVM debugger does have to make some |
| assumptions about the source-level language being debugged, though it keeps |
| these to a minimum. The only common features that the LLVM debugger assumes |
| exist are <a href="#impl_common_source_files">source files</a>, <a |
| href="#impl_common_globals">global objects</a> (aka methods, messages, global |
| variables, etc), and <a href="#impl_common_localvars">local variables</a>. |
| These abstract objects are used by the debugger to form stack traces, show |
| information about local variables, etc.</p> |
| |
| <p>This section of the documentation first describes the representation aspects |
| <a href="#impl_common">common to any source-language</a>. The next section |
| describes the data layout conventions used by the <a href="#impl_ccxx">C and C++ |
| front-ends</a>.</p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="impl_common_anchors">Anchors for global objects</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| One important aspect of the LLVM debug representation is that it allows the LLVM |
| debugger to efficiently index all of the global objects without having to scan |
| the program. To do this, all of the global objects use "anchor" globals of type |
| "<tt>{}</tt>", with designated names. These anchor objects obviously do not |
| contain any content or meaning by themselves, but all of the global objects of a |
| particular type (e.g., source file descriptors) contain a pointer to the anchor. |
| This pointer allows the debugger to use def-use chains to find all global |
| objects of that type. |
| </p> |
| |
| <p> |
| So far, the following names are recognized as anchors by the LLVM debugger: |
| </p> |
| |
| <p><pre> |
| %<a href="#impl_common_source_files">llvm.dbg.translation_units</a> = linkonce global {} {} |
| %<a href="#impl_common_globals">llvm.dbg.globals</a> = linkonce global {} {} |
| </pre></p> |
| |
| <p> |
| Using anchors in this way (where the source file descriptor points to the |
| anchors, as opposed to having a list of source file descriptors) allows for the |
| standard dead global elimination and merging passes to automatically remove |
| unused debugging information. If the globals were kept track of through lists, |
| there would always be an object pointing to the descriptors, so they would never |
| be deleted. |
| </p> |
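<p>
The effect of pointing descriptors at an anchor, rather than listing them, can
be modeled with a toy def-use graph. The Python sketch below is an
illustration under simplifying assumptions, not LLVM code: the
<tt>Global</tt> class and the <tt>remove_dead</tt> helper are hypothetical
stand-ins for LLVM globals and the dead global elimination pass.
</p>

```python
class Global:
    """Toy model of a global value with use-def and def-use edges."""

    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)  # use-def: what this global points to
        self.users = []                 # def-use: who points to this global
        for operand in self.operands:
            operand.users.append(self)

    def erase(self):
        """Delete this global, dropping it from its operands' user lists."""
        for operand in self.operands:
            operand.users.remove(self)
        self.operands = []


def remove_dead(all_globals, roots):
    """Repeatedly erase non-root globals that have no users."""
    live = list(all_globals)
    changed = True
    while changed:
        changed = False
        for g in list(live):
            if g not in roots and not g.users:
                g.erase()
                live.remove(g)
                changed = True
    return live


anchor = Global("llvm.dbg.translation_units")
file_a = Global("a.c descriptor", [anchor])
file_b = Global("b.c descriptor", [anchor])
main_fn = Global("main", [file_a])  # only a.c is still referenced by code

# The debugger indexes source files by walking the anchor's def-use chain.
print([user.name for user in anchor.users])

# Nothing uses b.c's descriptor, so it is removed like any other dead global.
survivors = remove_dead([anchor, file_a, file_b, main_fn], roots=[main_fn])
print([user.name for user in anchor.users])
```

<p>
Had the anchor instead held a list of descriptors, <tt>file_b</tt> would
always have had a user and could never have been considered dead.
</p>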
| |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="impl_common_stoppoint"> |
| Representing stopping points in the source program |
| </a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p>LLVM debugger "stop points" are a key part of the debugging representation |
| that allows LLVM to maintain simple semantics for <a |
| href="#debugopt">debugging optimized code</a>. The basic idea is that the |
| front-end inserts calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic function |
| at every point in the program where the debugger should be able to inspect the |
| program (these correspond to places the debugger stops when you "<tt>step</tt>" |
| through it). The front-end can choose to place these as fine-grained as it |
| would like (for example, before every subexpression is evaluated), but it is |
| recommended to only put them after every source statement.</p> |
| |
| <p> |
| Using calls to this intrinsic function to demarcate legal points for the debugger |
| to inspect the program automatically disables any optimizations that could |
| potentially confuse debugging information. To non-debug-information-aware |
| transformations, these calls simply look like calls to an external function, |
| which they must assume to do anything (including reading or writing to any part |
| of reachable memory). On the other hand, it does not impact many optimizations, |
| such as code motion of non-trapping instructions, nor does it impact |
| optimization of subexpressions, or any other code between the stop points.</p> |
| |
| <p> |
| An important aspect of the calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic |
| is that the function-local debugging information is woven together with use-def |
| chains. This makes it easy for the debugger to, for example, locate the 'next' |
| stop point. For a concrete example of stop points, see <a |
| href="#impl_common_lifetime">the next section</a>.</p> |
| |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="impl_common_lifetime">Object lifetimes and scoping</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| In many languages, the local variables in functions can have their lifetime or |
| scope limited to a subset of a function. In the C family of languages, for |
| example, variables are only live (readable and writable) within the source block |
| that they are defined in. In functional languages, values are only readable |
| after they have been defined. Though this is a very obvious concept, it is also |
| non-trivial to model in LLVM, because it has no notion of scoping in this sense, |
| and does not want to be tied to a language's scoping rules. |
| </p> |
| |
| <p> |
| In order to handle this, the LLVM debug format uses the notion of "regions" of a |
| function, delineated by calls to intrinsic functions. These intrinsic functions |
| define new regions of the program and indicate when the region lifetime expires. |
| Consider the following C fragment, for example: |
| </p> |
| |
| <p><pre> |
| 1. void foo() { |
| 2. int X = ...; |
| 3. int Y = ...; |
| 4. { |
| 5. int Z = ...; |
| 6. ... |
| 7. } |
| 8. ... |
| 9. } |
| </pre></p> |
| |
| <p> |
| Compiled to LLVM, this function would be represented like this (FIXME: CHECK AND |
| UPDATE THIS): |
| </p> |
| |
| <p><pre> |
| void %foo() { |
| %X = alloca int |
| %Y = alloca int |
| %Z = alloca int |
| <a name="icl_ex_D1">%D1</a> = call {}* %llvm.dbg.func.start(<a href="#impl_common_globals">%lldb.global</a>* %d.foo) |
| %D2 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D1, uint 2, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| |
| %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...) |
| <i>;; Evaluate expression on line 2, assigning to X.</i> |
| %D4 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D3, uint 3, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| |
| %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...) |
| <i>;; Evaluate expression on line 3, assigning to Y.</i> |
| %D6 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D5, uint 5, uint 4, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| |
| <a name="icl_ex_D7">%D7</a> = call {}* %llvm.region.start({}* %D6) |
| %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...) |
| <i>;; Evaluate expression on line 5, assigning to Z.</i> |
| %D9 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D8, uint 6, uint 4, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| |
| <i>;; Code for line 6.</i> |
| %D10 = call {}* %llvm.region.end({}* %D9) |
| %D11 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D10, uint 8, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| |
| <i>;; Code for line 8.</i> |
| <a name="icl_ex_D12">%D12</a> = call {}* %llvm.region.end({}* %D11) |
| ret void |
| } |
| </pre></p> |
| |
| <p> |
| This example illustrates a few important details about the LLVM debugging |
| information. In particular, it shows how the various intrinsics used are woven |
| together with def-use and use-def chains, similar to how <a |
| href="#impl_common_anchors">anchors</a> are used with globals. This allows the |
| debugger to analyze the relationship between statements, variable definitions, |
| and the code used to implement the function.</p> |
| |
| <p> |
| In this example, two explicit regions are defined, one with the <a |
| href="#icl_ex_D1">definition of the <tt>%D1</tt> variable</a> and one with the |
| <a href="#icl_ex_D7">definition of <tt>%D7</tt></a>. In the case of |
| <tt>%D1</tt>, the debug information identifies the function whose <a |
| href="#impl_common_globals">descriptor</a> is specified as an argument to the |
| intrinsic. This defines a new stack frame whose lifetime ends when the region |
| is ended by <a href="#icl_ex_D12">the <tt>%D12</tt> call</a>.</p> |
| |
| <p> |
| Representing the boundaries of functions with regions allows normal LLVM |
| interprocedural optimizations to change the boundaries of functions without |
| having to worry about breaking mapping information between LLVM and source-level |
| functions. In particular, the inlining optimization requires no modification to |
| support inlining with debugging information: there is no correlation drawn |
| between LLVM functions and their source-level counterparts.</p> |
| |
| <p> |
| Once the function has been defined, the <a |
| href="#impl_common_stoppoint">stopping point</a> corresponding to line #2 of the |
| function is encountered. At this point in the function, <b>no</b> local |
| variables are live. As lines 2 and 3 of the example are executed, their |
| variable definitions are automatically introduced into the program, without the |
| need to specify a new region. These variables do not require new regions to be |
| introduced because they go out of scope at the same point in the program: line |
| 9. |
| </p> |
| |
| <p> |
| In contrast, the <tt>Z</tt> variable goes out of scope at a different time, on |
| line 7. For this reason, it is defined within <a href="#icl_ex_D7">the |
| <tt>%D7</tt> region</a>, which kills the availability of <tt>Z</tt> before the |
| code for line 8 is executed. Through the use of LLVM debugger regions, |
| arbitrary source-language scoping rules can be supported, as long as they can |
| only be nested (i.e., one scope cannot partially overlap with a part of another |
| scope). |
| </p> |
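<p>
To make the nesting rule concrete, the sketch below replays the intrinsic
calls from the <tt>foo</tt> example and records which variables are in scope
at each stop point. It is an illustrative Python model, not debugger code: the
flat <tt>events</tt> list is a hypothetical simplification of the def-use
chain from <tt>%D1</tt> to <tt>%D12</tt>, and the event names are made up for
this sketch.
</p>

```python
def live_variables_at_stoppoints(events):
    """Map each stop-point line to the variables in scope when it is reached."""
    scopes = []   # stack of open regions; each region is a list of names
    result = {}
    for kind, value in events:
        if kind in ("func.start", "region.start"):
            scopes.append([])              # open a new, empty region
        elif kind == "region.end":
            scopes.pop()                   # everything defined in it dies
        elif kind == "define":
            scopes[-1].append(value)       # variable joins the innermost region
        elif kind == "stoppoint":
            result[value] = [name for region in scopes for name in region]
    return result


# Same order as the calls %D1 .. %D12 in the example above.
events = [
    ("func.start", "foo"),
    ("stoppoint", 2),
    ("define", "X"),
    ("stoppoint", 3),
    ("define", "Y"),
    ("stoppoint", 5),
    ("region.start", None),
    ("define", "Z"),
    ("stoppoint", 6),
    ("region.end", None),
    ("stoppoint", 8),
    ("region.end", None),
]

for line, names in live_variables_at_stoppoints(events).items():
    print(line, names)
```

<p>
Because regions form a stack, ending the inner region removes <tt>Z</tt>
before the stop point for line 8, while <tt>X</tt> and <tt>Y</tt> stay
available until the function's own region ends.
</p>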
| |
| <p> |
| It is worth noting that this scoping mechanism is used to control scoping of all |
| declarations, not just variable declarations. For example, the scope of a C++ |
| using declaration is controlled with this, and the <tt>llvm-db</tt> C++ support |
| routines could use this to change how name lookup is performed (though this is |
| not yet implemented). |
| </p> |
| |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="impl_common_descriptors">Object descriptor formats</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| The LLVM debugger expects the descriptors for global objects to start in a |
| canonical format, but the descriptors can include additional information |
| appended at the end. All LLVM debugging information is versioned, allowing |
| backwards compatibility in the case that the core structures need to change in |
| some way. The lowest-level descriptors are those describing <a |
| href="#impl_common_source_files">the files containing the program source |
| code</a>; all other descriptors refer to them. |
| </p> |
| </div> |
| |
| |
| <!-----------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="impl_common_source_files">Representation of source files</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| Source file descriptors were roughly patterned after the Dwarf "compile_unit" |
| object. The descriptor currently is defined to have the following LLVM |
| type:</p> |
| |
| <p><pre> |
| %lldb.compile_unit = type { |
| ushort, <i>;; LLVM debug version number</i> |
| ushort, <i>;; Dwarf language identifier</i> |
| sbyte*, <i>;; Filename</i> |
| sbyte*, <i>;; Working directory when compiled</i> |
| sbyte*, <i>;; Producer of the debug information</i> |
| {}* <i>;; Anchor for llvm.dbg.translation_units</i> |
| } |
| </pre></p> |
| |
| <p> |
| Each descriptor contains the version number of the debug info, a source |
| language ID for the file (we use the Dwarf 3.0 ID numbers, such as |
| <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>, <tt>DW_LANG_Cobol74</tt>, |
| etc.), three strings (the filename, the working directory of the compiler, and |
| an identifier string for the producer of the debug information), and the <a |
| href="#impl_common_anchors">anchor</a> for the descriptor. Here is an example |
| descriptor: |
| </p> |
| |
| <p><pre> |
| %arraytest_source_file = internal constant %lldb.compile_unit { |
| ushort 0, ; Version #0 |
| ushort 1, ; DW_LANG_C89 |
| sbyte* getelementptr ([12 x sbyte]* %.str_1, long 0, long 0), ; filename |
| sbyte* getelementptr ([12 x sbyte]* %.str_2, long 0, long 0), ; working dir |
| sbyte* getelementptr ([12 x sbyte]* %.str_3, long 0, long 0), ; producer |
| {}* %llvm.dbg.translation_units ; Anchor |
| } |
| %.str_1 = internal constant [12 x sbyte] c"arraytest.c\00" |
| %.str_2 = internal constant [12 x sbyte] c"/home/sabre\00" |
| %.str_3 = internal constant [12 x sbyte] c"llvmgcc 3.4\00" |
| </pre></p> |
| |
| |
| </div> |
| |
| |
| <!-----------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="impl_common_globals">Representation of global objects</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| The LLVM debugger needs to know what the source-language global objects are, in |
| order to build stack traces and perform other related activities. Because |
| source languages have widely varying forms of global objects, the LLVM debugger |
| only expects the following fields in the descriptor for each global: |
| </p> |
| |
| <p><pre> |
| %lldb.global = type { |
| <a href="#impl_common_source_files">%lldb.compile_unit</a>*, <i>;; The translation unit containing the global</i> |
| sbyte*, <i>;; The global object 'name'</i> |
| [type]*, <i>;; Source-language type descriptor for global</i> |
| {}* <i>;; The anchor for llvm.dbg.globals</i> |
| } |
| </pre></p> |
| |
| <p> |
| The first field contains a pointer to the translation unit the global is |
| defined in. This pointer allows the debugger to find out which version of debug |
| information the global corresponds to. The second field contains a string |
| that the debugger can use to identify the global if it does not contain |
| explicit support for the source language in use. This should be some sort of |
| unmangled string that corresponds to the global somehow. The third field points |
| to the source-language type descriptor for the global, and the fourth is the <a |
| href="#impl_common_anchors">anchor</a> for the descriptor. |
| </p> |
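| |
| <p> |
| As an illustrative sketch only (it is not output from any front-end), a |
| descriptor for a hypothetical global named <tt>MyGlobal</tt> defined in |
| <tt>arraytest.c</tt> might look something like the following. It reuses the |
| <tt>%arraytest_source_file</tt> descriptor from the source file example. The |
| names <tt>%MyGlobal_desc</tt> and <tt>%.str_4</tt> are made up for this example, |
| and the <tt>[type]*</tt> field is assumed to be written as a null <tt>{}*</tt> |
| placeholder, because the format of type descriptors is not specified here. |
| </p> |
| |
| <p><pre> |
| <i>;; Assumption: [type]* is spelled {}* so that the type is concrete</i> |
| %lldb.global = type { %lldb.compile_unit*, sbyte*, {}*, {}* } |
| |
| %MyGlobal_desc = internal constant %lldb.global { |
|     %lldb.compile_unit* %arraytest_source_file,                  ; translation unit |
|     sbyte* getelementptr ([9 x sbyte]* %.str_4, long 0, long 0), ; name |
|     {}* null,                                                    ; type descriptor (placeholder) |
|     {}* %llvm.dbg.globals                                        ; Anchor |
| } |
| %.str_4 = internal constant [9 x sbyte] c"MyGlobal\00" |
| </pre></p> |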
| |
| <p> |
| Note again that descriptors can be extended to include source-language-specific |
| information in addition to the fields required by the LLVM debugger. See the <a |
| href="#impl_ccxx_descriptors">section on the C/C++ front-end</a> for more |
| information. |
| </p> |
| </div> |
| |
| |
| |
| <!-----------------------------------------------------------------------------> |
| <div class="doc_subsubsection"> |
| <a name="impl_common_localvars">Representation of local variables</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| </p> |
| </div> |
| |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="impl_common_intrinsics">Other intrinsic functions</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| |
| </p> |
| </div> |
| |
| |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"> |
| <a name="impl_ccxx">C/C++ front-end specific debug information</a> |
| </div> |
| |
| <div class="doc_text"> |
| |
| <p> |
| The C and C++ front-ends represent information about the program in a format |
| that is effectively identical to <a |
| href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3.0</a> in terms of |
| information content. This allows code generators to trivially support native |
| debuggers by generating standard Dwarf information, and the representation |
| contains enough information for non-Dwarf targets to translate it as needed.</p> |
| |
| <p> |
| TODO: document extensions to standard debugging objects, document how we |
| represent source types, etc. |
| </p> |
| |
| </div> |
| |
| <!-- ======================================================================= --> |
| <div class="doc_subsection"> |
| <a name="impl_ccxx_descriptors">Object Descriptor Formats</a> |
| </div> |
| |
| <div class="doc_text"> |
| <p> |
| |
| </p> |
| </div> |
| |
| |
| |
| <!-- *********************************************************************** --> |
| <hr> |
| <div class="doc_footer"> |
| <address><a href="mailto:sabre@nondot.org">Chris Lattner</a></address> |
| <a href="http://llvm.cs.uiuc.edu">The LLVM Compiler Infrastructure</a> |
| <br> |
| Last modified: $Date$ |
| </div> |
| |
| </body> |
| </html> |