
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                       "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
   <title>Source Level Debugging with LLVM</title>
   <link rel="stylesheet" href="llvm.css" type="text/css">
 </head>
 <body>

 <div class="doc_title">Source Level Debugging with LLVM</div>

 <ul>

 <img src="venusflytrap.jpg" width=247 height=369 align=right>

   <li><a href="#introduction">Introduction</a></li>
   <ol>
     <li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
     <li><a href="#debugopt">Debugging optimized code</a></li>
     <li><a href="#future">Future work</a></li>
   </ol>
   <li><a href="#llvm-db">Using the <tt>llvm-db</tt> tool</a>
   <ol>
     <li><a href="#limitations">Limitations of <tt>llvm-db</tt></a></li>
     <li><a href="#sample">A sample <tt>llvm-db</tt> session</a></li>
     <li><a href="#startup">Starting the debugger</a></li>
     <li><a href="#commands">Commands recognized by the debugger</a></li>
   </ol></li>

   <li><a href="#architecture">Architecture of the LLVM debugger</a></li>
   <ol>
     <li><a href="#arch_todo">Short-term TODO list</a></li>
   </ol>

   <li><a href="#implementation">Debugging information implementation</a></li>
   <ol>
     <li><a href="#impl_common_anchors">Anchors for global objects</a></li>
     <li><a href="#impl_common_stoppoint">Representing stopping points in the source program</a></li>
     <li><a href="#impl_common_lifetime">Object lifetimes and scoping</a></li>
     <li><a href="#impl_common_descriptors">Object descriptor formats</a></li>
     <ul>
       <li><a href="#impl_common_source_files">Representation of source files</a></li>
       <li><a href="#impl_common_globals">Representation of global objects</a></li>
       <li><a href="#impl_common_localvars">Representation of local variables</a></li>
     </ul>
     <li><a href="#impl_common_intrinsics">Other intrinsic functions</a></li>
   </ol>
   <li><a href="#impl_ccxx">C/C++ front-end specific debug information</a></li>
   <ol>
     <li><a href="#impl_ccxx_descriptors">Object descriptor formats</a></li>
   </ol>
 </ul>

 <!-- *********************************************************************** -->
 <div class="doc_section"><a name="introduction">Introduction</a></div>
 <!-- *********************************************************************** -->

 <div class="doc_text">

 <p>This document is the central repository for all information pertaining to
 debug information in LLVM.  It describes how to use the <a
 href="CommandGuide/llvm-db.html"><tt>llvm-db</tt> tool</a>, which provides a
 powerful <a href="#llvm-db">source-level debugger</a> to users of LLVM-based
 compilers.  When compiling a program in debug mode, the front-end in use adds
 LLVM debugging information to the program in the form of normal <a
 href="LangRef.html">LLVM program objects</a> as well as a small set of LLVM <a
 href="#implementation">intrinsic functions</a>, which specify the mapping of the
 program in LLVM form to the program in the source language.
 </p>

 </div>

 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="phil">Philosophy behind LLVM debugging information</a>
 </div>

 <div class="doc_text">

 <p>
 The idea of the LLVM debugging information is to capture how the important
 pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
 Several design aspects have shaped the solution that appears here.  The
 important ones are:</p>

 <p><ul>
 <li>Debugging information should have very little impact on the rest of the
 compiler.  No transformations, analyses, or code generators should need to be
 modified because of debugging information.</li>

 <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
 easily described ways</a> with the debugging information.</li>

 <li>Because LLVM is designed to support arbitrary programming languages,
 LLVM-to-LLVM tools should not need to know anything about the semantics of the
 source-level-language.</li>

 <li>Source-level languages are often <b>widely</b> different from one another.
 LLVM should not put any restrictions on the flavor of the source-language, and
 the debugging information should work with any language.</li>

 <li>With code generator support, it should be possible to use an LLVM compiler
 to compile a program to native machine code with standard debugging formats.
 This allows compatibility with traditional machine-code level debuggers, like
 GDB or DBX.</li>

 </ul></p>

 <p>
 The approach used by the LLVM implementation is to use a small set of <a
 href="#impl_common_intrinsics">intrinsic functions</a> to define a mapping
 between LLVM program objects and the source-level objects.  The description of
 the source-level program is maintained in LLVM global variables in an <a
 href="#impl_ccxx">implementation-defined format</a> (the C/C++ front-end
 currently uses working draft 7 of the <a
 href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3 standard</a>).</p>

 <p>
 When a program is debugged, the debugger interacts with the user and turns the
 stored debug information into source-language specific information.  As such,
 the debugger must be aware of the source-language, and is thus tied to a
 specific language or family of languages.  The <a href="#llvm-db">LLVM
 debugger</a> is designed to be modular in its support for source-languages.
 </p>

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="debugopt">Debugging optimized code</a>
 </div>

 <div class="doc_text">
 <p>
 An extremely high priority of LLVM debugging information is to make it interact
 well with optimizations and analysis.  In particular, the LLVM debug information
 provides the following guarantees:</p>

 <p><ul>

 <li>LLVM debug information <b>always provides information to accurately read the
 source-level state of the program</b>, regardless of which LLVM optimizations
 have been run, and without any modification to the optimizations themselves.
 However, some optimizations may impact the ability to modify the current state
 of the program with a debugger, such as setting program variables, or calling
 functions that have been deleted.</li>

 <li>LLVM optimizations gracefully interact with debugging information.  If they
 are not aware of debug information, they are automatically disabled as necessary
 in the cases that would invalidate the debug info.  This retains the LLVM
 features making it easy to write new transformations.</li>

 <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
 debugging information, allowing them to update the debugging information as they
 perform aggressive optimizations.  This means that, with effort, the LLVM
 optimizers could optimize debug code just as well as non-debug code.</li>

 <li>LLVM debug information does not prevent many important optimizations from
 happening (for example inlining, basic block reordering/merging/cleanup, tail
 duplication, etc), further reducing the amount of the compiler that eventually
 is "aware" of debugging information.</li>

 <li>LLVM debug information is automatically optimized along with the rest of the
 program, using existing facilities.  For example, duplicate information is
 automatically merged by the linker, and unused information is automatically
 removed.</li>

 </ul></p>

 <p>
 Basically, the debug information allows you to compile a program with "<tt>-O0
 -g</tt>" and get full debug information, allowing you to arbitrarily modify the
 program as it executes from the debugger.  Compiling a program with "<tt>-O3
 -g</tt>" gives you full debug information that is always available and accurate
 for reading (e.g., you get accurate stack traces despite tail call elimination
 and inlining), but you might lose the ability to modify the program and call
 functions which were optimized out of the program, or inlined away completely.
 </p>

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="future">Future work</a>
 </div>

 <div class="doc_text">
 <p>
 There are several important extensions that could eventually be added to the
 LLVM debugger.  The most important extension would be to upgrade the LLVM code
 generators to support debugging information.  This would also allow, for
 example, the X86 code generator to emit native objects that contain debugging
 information consumable by traditional source-level debuggers like GDB or
 DBX.</p>

 <p>
 Additionally, LLVM optimizations can be upgraded to incrementally update the
 debugging information, <a href="#commands">new commands</a> can be added to the
 debugger, and thread support could be added to the debugger.</p>

 <p>
 The "SourceLanguage" modules provided by <tt>llvm-db</tt> could be substantially
 improved to provide good support for C++ language features like namespaces and
 scoping rules.</p>

 <p>
 After working with the debugger for a while, perhaps the nicest improvement
 would be to add some sort of line editor, such as GNU readline (but one that is
 compatible with the LLVM license).</p>

 <p>
 For someone so inclined, it should be straight-forward to write different
 front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly
 separated from the <tt>llvm-db</tt> front-end.  A GUI debugger or IDE would be
 an interesting project.
 </p>

 </div>


 <!-- *********************************************************************** -->
 <div class="doc_section">
   <a name="llvm-db">Using the <tt>llvm-db</tt> tool</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="doc_text">

 <p>
 The <tt>llvm-db</tt> tool provides a GDB-like interface for source-level
 debugging of programs.  This tool provides many standard commands for inspecting
 and modifying the program as it executes, loading new programs, single stepping,
 placing breakpoints, etc.  This section describes how to use the debugger.
 </p>

 <p><tt>llvm-db</tt> has been designed to be as similar to GDB in its user
 interface as possible.  This should make it extremely easy to learn
 <tt>llvm-db</tt> if you already know <tt>GDB</tt>.  In general, <tt>llvm-db</tt>
 provides the subset of GDB commands that are applicable to LLVM debugging users.
 If there is a command missing that makes a reasonable amount of sense within the
 <a href="#limitations">limitations of <tt>llvm-db</tt></a>, please report it as
 a bug or, better yet, submit a patch to add it. :)</p>

 </div>

 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="limitations">Limitations of <tt>llvm-db</tt></a>
 </div>

 <div class="doc_text">

 <p><tt>llvm-db</tt> is the first LLVM debugger, and as such was designed to be
 quick to prototype and build, and simple to extend.  It is missing many
 features, though they should be easy to add over time (patches welcomed!).
 Because the (currently only) debugger backend (implemented in
 "lib/Debugger/UnixLocalInferiorProcess.cpp") was designed to work without any
 cooperation from the code generators, it suffers from the following inherent
 limitations:</p>

 <p><ul>

 <li>Running a program in <tt>llvm-db</tt> is a bit slower than running it with
 <tt>lli</tt>.</li>

 <li>Inspection of the target hardware is not supported.  This means that you
 cannot, for example, print the contents of X86 registers.</li>

 <li>Inspection of LLVM code is not supported.  This means that you cannot print
 the contents of arbitrary LLVM values, or use commands such as <tt>stepi</tt>.
 This also means that you cannot debug code without debug information.</li>

 <li>Portions of the debugger run in the same address space as the program being
 debugged.  This means that memory corruption by the program could trample on
 portions of the debugger.</li>

 <li>Attaching to existing processes and core files is not currently
 supported.</li>

 </ul></p>

 <p>That said, it is still quite useful, and all of these limitations can be
 eliminated by integrating support for the debugger into the code generators.
 See the <a href="#future">future work</a> section for ideas of how to extend
 the LLVM debugger despite these limitations.</p>

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="sample">A sample <tt>llvm-db</tt> session</a>
 </div>

 <div class="doc_text">

 <p>
 TODO
 </p>

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="startup">Starting the debugger</a>
 </div>

 <div class="doc_text">

 <p>There are three ways to start up the <tt>llvm-db</tt> debugger:</p>

 <p>When run with no options, just <tt>llvm-db</tt>, the debugger starts up
 without a program loaded at all.  You must use the <a
 href="#c_file"><tt>file</tt> command</a> to load a program, and the <a
 href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a>
 commands to specify the arguments for the program.</p>

 <p>If you start the debugger with one argument, as <tt>llvm-db
 &lt;program&gt;</tt>, the debugger will start up and load in the specified
 program.  You can then optionally specify arguments to the program with the <a
 href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a>
 commands.</p>

 <p>The third way to start the debugger is with the <tt>--args</tt> option.  This
 option allows you to specify the program to load and the arguments to start out
 with.  <!-- No options to <tt>llvm-db</tt> may be specified after the
 <tt>-args</tt> option. --> Example use: <tt>llvm-db --args ls /home</tt></p>
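
 <p>The three start-up forms above can be modeled with a short Python sketch.
 This is an illustration only, not the argument parser that <tt>llvm-db</tt>
 actually uses: the function name <tt>parse_invocation</tt> is hypothetical, and
 rejecting extra arguments in the one-argument form is a choice made for the
 example, since the text above does not say what happens in that case.</p>

 <p><pre>
```python
def parse_invocation(argv):
    """Return (program, args) for the three start-up forms described above.

    'argv' excludes the name of the debugger executable itself.
    """
    if not argv:
        # Form 1: "llvm-db" alone, no program loaded yet.
        return None, []
    if argv[0] == "--args":
        # Form 3: "llvm-db --args program arg..."
        rest = argv[1:]
        if not rest:
            raise ValueError("--args requires a program name")
        return rest[0], rest[1:]
    if len(argv) == 1:
        # Form 2: "llvm-db program", arguments are supplied later.
        return argv[0], []
    # Assumption of this sketch: anything else is treated as an error.
    raise ValueError("unexpected extra arguments: " + " ".join(argv[1:]))


print(parse_invocation(["--args", "ls", "/home"]))  # ('ls', ['/home'])
```
 </pre></p>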

 </div>

 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="commands">Commands recognized by the debugger</a>
 </div>

 <div class="doc_text">

 <p>FIXME: this needs work obviously.  See the <a
 href="http://sources.redhat.com/gdb/documentation/">GDB documentation</a> for
 information about what these do, or try '<tt>help [command]</tt>' within
 <tt>llvm-db</tt> to get information.</p>

 <p>
 <h2>General usage:</h2>
 <ul>
 <li>help [command]</li>
 <li>quit</li>
 <li><a name="c_file">file</a> [program]</li>
 </ul>

 <h2>Program inspection and interaction:</h2>
 <ul>
 <li>create (start the program, stopping it ASAP in <tt>main</tt>)</li>
 <li>kill</li>
 <li>run [args]</li>
 <li>step [num]</li>
 <li>next [num]</li>
 <li>cont</li>
 <li>finish</li>

 <li>list [start[, end]]</li>
 <li>info source</li>
 <li>info sources</li>
 <li>info functions</li>
 </ul>

 <h2>Call stack inspection:</h2>
 <ul>
 <li>backtrace</li>
 <li>up [n]</li>
 <li>down [n]</li>
 <li>frame [n]</li>
 </ul>


 <h2>Debugger inspection and interaction:</h2>
 <ul>
 <li>info target</li>
 <li>show prompt</li>
 <li>set prompt</li>
 <li>show listsize</li>
 <li>set listsize</li>
 <li>show language</li>
 <li>set language</li>
 </ul>

 <h2>TODO:</h2>
 <ul>
 <li>info frame</li>
 <li>break</li>
 <li>print</li>
 <li>ptype</li>

 <li>info types</li>
 <li>info variables</li>
 <li>info program</li>

 <li>info args</li>
 <li>info locals</li>
 <li>info catch</li>
 <li>... many others</li>
 </ul>
 </p>
 </div>

 <!-- *********************************************************************** -->
 <div class="doc_section">
   <a name="architecture">Architecture of the LLVM debugger</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="doc_text">

 <p><pre>
 lib/Debugger
   - UnixLocalInferiorProcess.cpp

 tools/llvm-db
   - SourceLanguage interfaces
   - ProgramInfo/RuntimeInfo
   - Commands

 </pre></p>

 </div>

 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="arch_todo">Short-term TODO list</a>
 </div>

 <div class="doc_text">

 <p>
 FIXME: this section will eventually go away.  These are notes to myself of
 things that should be implemented, but haven't yet.
 </p>

 <p>
 <b>Breakpoints:</b> Support is already implemented in the 'InferiorProcess'
 class, though it hasn't been tested yet.  To finish breakpoint support, we need
 to implement breakCommand (which should reuse the linespec parser from the list
 command), and handle the fact that 'break foo' or 'break file.c:53' may insert
 multiple breakpoints.  Also, if you say 'break file.c:53' and there is no
 stoppoint on line 53, the breakpoint should go on the next available line.  My
 idea was to have the Debugger class provide a "Breakpoint" class which
 encapsulated this messiness, giving the debugger front-end a simple interface.
 The debugger front-end would have to map the really complex semantics of
 temporary breakpoints and 'conditional' breakpoints onto this intermediate
 level. Also, breakpoints should survive as much as possible across program
 reloads.
 </p>
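
 <p>
 The "next available line" rule mentioned above could be sketched as a binary
 search over the sorted line numbers that carry stop points.  The helper name
 <tt>resolve_breakpoint_line</tt> and the sample line numbers are hypothetical;
 this is a minimal Python illustration, not code from the debugger.
 </p>

 <p><pre>
```python
import bisect


def resolve_breakpoint_line(stoppoint_lines, requested):
    """Return the first line at or after 'requested' that has a stop point.

    'stoppoint_lines' must be sorted in ascending order.  Returns None when
    no stop point exists at or after the requested line.
    """
    index = bisect.bisect_left(stoppoint_lines, requested)
    if index == len(stoppoint_lines):
        return None
    return stoppoint_lines[index]


lines = [2, 3, 5, 6, 8]  # stop point lines from a hypothetical file.c
print(resolve_breakpoint_line(lines, 5))  # 5
print(resolve_breakpoint_line(lines, 7))  # 8
print(resolve_breakpoint_line(lines, 9))  # None
```
 </pre></p>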

 <p>
 <b>run (with args)</b> &amp; <b>set args</b>: These need to be implemented.
 Currently run doesn't support setting arguments as part of the command.  The
 only tricky thing is handling quotes right and stuff.</p>

 <p>
 <b>UnixLocalInferiorProcess.cpp speedup</b>: There is no reason for the debugged
 process to code gen the globals corresponding to debug information.  The
 IntrinsicLowering object could instead change descriptors into constant expr
 casts of the constant address of the LLVM objects for the descriptors.  This
 would also allow us to eliminate the mapping back and forth between physical
 addresses that must be done.</p>

 </div>

 <!-- *********************************************************************** -->
 <div class="doc_section">
   <a name="implementation">Debugging information implementation</a>
 </div>
 <!-- *********************************************************************** -->

 <div class="doc_text">

 <p>LLVM debugging information has been carefully designed to make it possible
 for the optimizer to optimize the program and debugging information without
 necessarily having to know anything about debugging information.  In particular,
 the global constant merging pass automatically eliminates duplicated debugging
 information (often caused by header files), the global dead code elimination
 pass automatically deletes debugging information for a function if it decides to
 delete the function, and the linker eliminates debug information when it merges
 <tt>linkonce</tt> functions.</p>

 <p>To do this, most of the debugging information (descriptors for types,
 variables, functions, source files, etc) is inserted by the language front-end
 in the form of LLVM global variables.  These LLVM global variables are no
 different from any other global variables, except that they have a web of LLVM
 intrinsic functions that point to them.  If the last references to a particular
 piece of debugging information are deleted (for example, by the
 <tt>-globaldce</tt> pass), the extraneous debug information will automatically
 become dead and be removed by the optimizer.</p>

 <p>The debugger is designed to be agnostic about the contents of most of the
 debugging information.  It uses a source-language-specific module to decode the
 information that represents variables, types, functions, namespaces, etc: this
 allows for arbitrary source-language semantics and type-systems to be used, as
 long as there is a module written for the debugger to interpret the information.
 </p>

 <p>
 To provide basic functionality, the LLVM debugger does have to make some
 assumptions about the source-level language being debugged, though it keeps
 these to a minimum.  The only common features that the LLVM debugger assumes
 exist are <a href="#impl_common_source_files">source files</a>, <a
 href="#impl_common_globals">global objects</a> (aka methods, messages, global
 variables, etc), and <a href="#impl_common_localvars">local variables</a>.
 These abstract objects are used by the debugger to form stack traces, show
 information about local variables, etc.</p>

 <p>This section of the documentation first describes the representation aspects
 <a href="#impl_common">common to any source-language</a>.  The next section
 describes the data layout conventions used by the <a href="#impl_ccxx">C and C++
 front-ends</a>.</p>

 </div>

 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="impl_common_anchors">Anchors for global objects</a>
 </div>

 <div class="doc_text">
 <p>
 One important aspect of the LLVM debug representation is that it allows the LLVM
 debugger to efficiently index all of the global objects without having to scan
 the program.  To do this, all of the global objects use "anchor" globals of type
 "<tt>{}</tt>", with designated names.  These anchor objects obviously do not
 contain any content or meaning by themselves, but all of the global objects of a
 particular type (e.g., source file descriptors) contain a pointer to the anchor.
 This pointer allows the debugger to use def-use chains to find all global
 objects of that type.
 </p>

 <p>
 So far, the following names are recognized as anchors by the LLVM debugger:
 </p>

 <p><pre>
   %<a href="#impl_common_source_files">llvm.dbg.translation_units</a> = linkonce global {} {}
   %<a href="#impl_common_globals">llvm.dbg.globals</a>         = linkonce global {} {}
 </pre></p>

 <p>
 Using anchors in this way (where the source file descriptor points to the
 anchors, as opposed to having a list of source file descriptors) allows for the
 standard dead global elimination and merging passes to automatically remove
 unused debugging information.  If the globals were kept track of through lists,
 there would always be an object pointing to the descriptors, and thus they would
 never be deleted.
 </p>
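
 <p>
 The effect of pointing descriptors at an anchor can be illustrated with a toy
 model in Python.  Everything here is hypothetical (the <tt>Global</tt> class,
 <tt>users_of</tt>, <tt>global_dce</tt>, and the descriptor names); it only
 mimics def-use lookup and dead global elimination, and for simplicity it
 treats the anchor itself as a root that is never removed.
 </p>

 <p><pre>
```python
from dataclasses import dataclass, field


@dataclass
class Global:
    """Toy stand-in for an LLVM global; 'operands' are the globals it references."""
    name: str
    operands: list = field(default_factory=list)
    is_root: bool = False  # assumption: roots are never considered dead


def users_of(module, target):
    """Def-use lookup: names of all globals in the module that reference 'target'."""
    return [g.name for g in module.values() if target in g.operands]


def global_dce(module):
    """Repeatedly drop non-root globals that nothing else references."""
    changed = True
    while changed:
        changed = False
        for name in list(module):
            if not module[name].is_root and not users_of(module, name):
                del module[name]
                changed = True
    return module


def build_module():
    anchor = "llvm.dbg.translation_units"
    return {
        anchor: Global(anchor, is_root=True),
        # Descriptors point at the anchor; the anchor holds no list of them.
        "a_c_file": Global("a_c_file", [anchor]),
        "b_c_file": Global("b_c_file", [anchor]),
        # Only code in 'main' still refers to the a.c descriptor.
        "main": Global("main", ["a_c_file"], is_root=True),
    }


module = build_module()
print(users_of(module, "llvm.dbg.translation_units"))  # ['a_c_file', 'b_c_file']
global_dce(module)
print(sorted(module))  # the unused b.c descriptor is gone
```
 </pre></p>

 <p>
 Had the anchor instead held a list of both descriptors, <tt>b_c_file</tt> would
 always have a user and the elimination loop could never remove it.
 </p>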

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="impl_common_stoppoint">
      Representing stopping points in the source program
   </a>
 </div>

 <div class="doc_text">

 <p>LLVM debugger "stop points" are a key part of the debugging representation
 that allows LLVM to maintain simple semantics for <a
 href="#debugopt">debugging optimized code</a>.  The basic idea is that the
 front-end inserts calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic function
 at every point in the program where the debugger should be able to inspect the
 program (these correspond to places the debugger stops when you "<tt>step</tt>"
 through it).  The front-end can choose to place these as fine-grained as it
 would like (for example, before every subexpression is evaluated), but it is
 recommended to only put them after every source statement.</p>

 <p>
 Using calls to this intrinsic function to demark legal points for the debugger
 to inspect the program automatically disables any optimizations that could
 potentially confuse debugging information.  To non-debug-information-aware
 transformations, these calls simply look like calls to an external function,
 which they must assume to do anything (including reading or writing to any part
 of reachable memory).  On the other hand, it does not impact many optimizations,
 such as code motion of non-trapping instructions, nor does it impact
 optimization of subexpressions, or any other code between the stop points.</p>

 <p>
 An important aspect of the calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic
 is that the function-local debugging information is woven together with use-def
 chains.  This makes it easy for the debugger to, for example, locate the 'next'
 stop point.  For a concrete example of stop points, see <a
 href="#impl_common_lifetime">the next section</a>.</p>

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="impl_common_lifetime">Object lifetimes and scoping</a>
 </div>

 <div class="doc_text">
 <p>
 In many languages, the local variables in functions can have their lifetime or
 scope limited to a subset of a function.  In the C family of languages, for
 example, variables are only live (readable and writable) within the source block
 that they are defined in.  In functional languages, values are only readable
 after they have been defined.  Though this is a very obvious concept, it is also
 non-trivial to model in LLVM, because it has no notion of scoping in this sense,
 and does not want to be tied to a language's scoping rules.
 </p>

 <p>
 In order to handle this, the LLVM debug format uses the notion of "regions" of a
 function, delineated by calls to intrinsic functions.  These intrinsic functions
 define new regions of the program and indicate when the region lifetime expires.
 Consider the following C fragment, for example:
 </p>

 <p><pre>
 1.  void foo() {
 2.    int X = ...;
 3.    int Y = ...;
 4.    {
 5.      int Z = ...;
 6.      ...
 7.    }
 8.    ...
 9.  }
 </pre></p>

 <p>
 Compiled to LLVM, this function would be represented like this (FIXME: CHECK AND
 UPDATE THIS):
 </p>

 <p><pre>
 void %foo() {
     %X = alloca int
     %Y = alloca int
     %Z = alloca int
     <a name="icl_ex_D1">%D1</a> = call {}* %llvm.dbg.func.start(<a href="#impl_common_globals">%lldb.global</a>* %d.foo)
     %D2 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D1, uint 2, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file)

     %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...)
     <i>;; Evaluate expression on line 2, assigning to X.</i>
     %D4 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D3, uint 3, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file)

     %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...)
     <i>;; Evaluate expression on line 3, assigning to Y.</i>
     %D6 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D5, uint 5, uint 4, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file)

     <a name="icl_ex_D7">%D7</a> = call {}* %llvm.region.start({}* %D6)
     %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...)
     <i>;; Evaluate expression on line 5, assigning to Z.</i>
     %D9 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D8, uint 6, uint 4, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file)

     <i>;; Code for line 6.</i>
     %D10 = call {}* %llvm.region.end({}* %D9)
     %D11 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D10, uint 8, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file)

     <i>;; Code for line 8.</i>
     <a name="icl_ex_D12">%D12</a> = call {}* %llvm.region.end({}* %D11)
     ret void
 }
 </pre></p>

 <p>
 This example illustrates a few important details about the LLVM debugging
 information.  In particular, it shows how the various intrinsics used are woven
 together with def-use and use-def chains, similar to how <a
 href="#impl_common_anchors">anchors</a> are used with globals.  This allows the
 debugger to analyze the relationship between statements, variable definitions,
 and the code used to implement the function.</p>

 <p>
 In this example, two explicit regions are defined, one with the <a
 href="#icl_ex_D1">definition of the <tt>%D1</tt> variable</a> and one with the
 <a href="#icl_ex_D7">definition of <tt>%D7</tt></a>.  In the case of
 <tt>%D1</tt>, the debug information indicates that the region corresponds to
 the function whose <a href="#impl_common_globals">descriptor</a> is specified
 as an argument to the intrinsic.  This defines a new stack frame whose
 lifetime ends when the region
 is ended by <a href="#icl_ex_D12">the <tt>%D12</tt> call</a>.</p>

 <p>
 Representing the boundaries of functions with regions allows normal LLVM
 interprocedural optimizations to change the boundaries of functions without
 having to worry about breaking mapping information between LLVM and source-level
 functions.  In particular, the inlining optimization requires no modification to
 support inlining with debugging information: there is no correlation drawn
 between LLVM functions and their source-level counterparts.</p>

 <p>
 Once the function has been defined, the <a
 href="#impl_common_stoppoint">stopping point</a> corresponding to line #2 of the
 function is encountered.  At this point in the function, <b>no</b> local
 variables are live.  As lines 2 and 3 of the example are executed, their
 variable definitions are automatically introduced into the program, without the
 need to specify a new region.  These variables do not require new regions to be
 introduced because they go out of scope at the same point in the program: line
 9.
 </p>

 <p>
 In contrast, the <tt>Z</tt> variable goes out of scope at a different time, on
 line 7.  For this reason, it is defined within <a href="#icl_ex_D7">the
 <tt>%D7</tt> region</a>, which kills the availability of <tt>Z</tt> before the
 code for line 8 is executed.  Through the use of LLVM debugger regions,
 arbitrary source-language scoping rules can be supported, as long as they can
 only be nested (i.e., one scope cannot partially overlap with a part of another
 scope).
 </p>
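
 <p>
 To make the scoping rules concrete, the following Python sketch rebuilds the
 <tt>%D1</tt> through <tt>%D12</tt> chain from the <tt>foo</tt> listing and
 walks it backwards from each stop point to find the variables that are still
 in scope.  The <tt>DebugCall</tt> class and <tt>visible_variables</tt> function
 are hypothetical names invented for this illustration, and the sketch assumes
 the call ordering shown in the listing (which is itself marked as needing an
 update); it is not how the debugger is implemented.
 </p>

 <p><pre>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebugCall:
    """One debug intrinsic call; 'prev' is the {}* operand it consumes."""
    kind: str  # "func.start", "region.start", "region.end", "stoppoint", "variable"
    prev: Optional["DebugCall"] = None
    info: object = None  # line number or variable name


def visible_variables(stop):
    """Walk the use-def chain backwards from a stop point, collecting the
    variables whose enclosing region is still open."""
    found = []
    closed = 0  # regions that were already ended behind this point
    node = stop.prev
    while node is not None:
        if node.kind == "region.end":
            closed += 1
        elif node.kind in ("region.start", "func.start"):
            if closed == 0 and node.kind == "func.start":
                break  # reached the start of the enclosing function
            closed = max(closed - 1, 0)
        elif node.kind == "variable" and closed == 0:
            found.append(node.info)
        node = node.prev
    return sorted(found)


def build_foo():
    """Rebuild the chain from the foo() listing; returns stop points by line."""
    d1 = DebugCall("func.start")
    d2 = DebugCall("stoppoint", d1, 2)
    d3 = DebugCall("variable", d2, "X")
    d4 = DebugCall("stoppoint", d3, 3)
    d5 = DebugCall("variable", d4, "Y")
    d6 = DebugCall("stoppoint", d5, 5)
    d7 = DebugCall("region.start", d6)
    d8 = DebugCall("variable", d7, "Z")
    d9 = DebugCall("stoppoint", d8, 6)
    d10 = DebugCall("region.end", d9)
    d11 = DebugCall("stoppoint", d10, 8)
    return {node.info: node for node in (d2, d4, d6, d9, d11)}


stops = build_foo()
for line in sorted(stops):
    print(line, visible_variables(stops[line]))
```
 </pre></p>

 <p>
 At the stop point for line 2 the sketch reports no variables, at line 6 it
 reports <tt>X</tt>, <tt>Y</tt>, and <tt>Z</tt>, and at line 8 it reports only
 <tt>X</tt> and <tt>Y</tt>, because the inner region has already ended.
 </p>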

 <p>
 It is worth noting that this scoping mechanism is used to control scoping of all
 declarations, not just variable declarations.  For example, the scope of a C++
 using declaration is controlled with this, and the <tt>llvm-db</tt> C++ support
 routines could use this to change how name lookup is performed (though this is
 not yet implemented).
 </p>

 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="impl_common_descriptors">Object descriptor formats</a>
 </div>

 <div class="doc_text">
 <p>
 The LLVM debugger expects the descriptors for global objects to start in a
 canonical format, but the descriptors can include additional information
 appended at the end.  All LLVM debugging information is versioned, allowing
 backwards compatibility in the case that the core structures need to change in
 some way.  The lowest-level descriptors are those describing <a
 href="#impl_common_source_files">the files containing the program source
 code</a>; all other descriptors refer to them.
 </p>
 </div>


 <!----------------------------------------------------------------------------->
 <div class="doc_subsubsection">
   <a name="impl_common_source_files">Representation of source files</a>
 </div>

 <div class="doc_text">
 <p>
 Source file descriptors were roughly patterned after the Dwarf "compile_unit"
 object.  The descriptor currently is defined to have the following LLVM
 type:</p>

 <p><pre>
 %lldb.compile_unit = type {
        ushort,               <i>;; LLVM debug version number</i>
        ushort,               <i>;; Dwarf language identifier</i>
        sbyte*,               <i>;; Filename</i>
        sbyte*,               <i>;; Working directory when compiled</i>
        sbyte*,               <i>;; Producer of the debug information</i>
        {}*                   <i>;; Anchor for llvm.dbg.translation_units</i>
 }
 </pre></p>

 <p>
 These descriptors contain the version number for the debug info, a source
 language ID for the file (we use the Dwarf 3.0 ID numbers, such as
 <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>, <tt>DW_LANG_Cobol74</tt>,
 etc), three strings describing the filename, working directory of the compiler,
 and an identifier string for the compiler that produced it, and the <a
 href="#impl_common_anchors">anchor</a> for the descriptor.  Here is an example
 descriptor:
 </p>

 <p><pre>
 %arraytest_source_file = internal constant %lldb.compile_unit {
     ushort 0,                                                     ; Version #0
     ushort 1,                                                     ; DW_LANG_C89
     sbyte* getelementptr ([12 x sbyte]* %.str_1, long 0, long 0), ; filename
     sbyte* getelementptr ([12 x sbyte]* %.str_2, long 0, long 0), ; working dir
     sbyte* getelementptr ([12 x sbyte]* %.str_3, long 0, long 0), ; producer
     {}* %llvm.dbg.translation_units                               ; Anchor
 }
 %.str_1 = internal constant [12 x sbyte] c"arraytest.c\00"
 %.str_2 = internal constant [12 x sbyte] c"/home/sabre\00"
 %.str_3 = internal constant [12 x sbyte] c"llvmgcc 3.4\00"
 </pre></p>


 </div>


 <!----------------------------------------------------------------------------->
 <div class="doc_subsubsection">
   <a name="impl_common_globals">Representation of global objects</a>
 </div>

 <div class="doc_text">
 <p>
 The LLVM debugger needs to know what the source-language global objects are, in
 order to build stack traces and perform other related activities.  Because
 source languages have widely varying forms of global objects, the LLVM debugger
 only expects the following fields in the descriptor for each global:
 </p>

 <p><pre>
 %lldb.global = type {
        <a href="#impl_common_source_files">%lldb.compile_unit</a>*,   <i>;; The translation unit containing the global</i>
        sbyte*,                <i>;; The global object 'name'</i>
        [type]*,               <i>;; Source-language type descriptor for global</i>
        {}*                    <i>;; The anchor for llvm.dbg.globals</i>
 }
 </pre></p>

 <p>
 The first field contains a pointer to the translation unit the global object is
 defined in.  This pointer allows the debugger to find out which version of debug
 information the object corresponds to.  The second field contains a string
 that the debugger can use to identify the object if it does not contain
 explicit support for the source-language in use.  This should be some sort of
 unmangled string that corresponds to the object somehow.  The third field
 points to the source-language type descriptor for the global, and the fourth
 field is the <a href="#impl_common_anchors">anchor</a> for the descriptor.
 </p>
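
 <p>
 As an illustration only, a descriptor for a hypothetical global variable named
 <tt>X</tt> in <tt>arraytest.c</tt> might look roughly like the sketch below.
 The names <tt>%arraytest_global_X</tt> and <tt>%.str_4</tt> are invented for
 this example.  The type descriptor slot is assumed to be an opaque
 <tt>{}*</tt> and is left null, since source type descriptors are not documented
 here, and <tt>%llvm.dbg.globals</tt> is assumed to be declared with type
 <tt>{}</tt>, like the translation unit anchor.  A real front-end may emit
 something different:
 </p>

 <p><pre>
 %arraytest_global_X = internal constant %lldb.global {
     %lldb.compile_unit* %arraytest_source_file,                  ; translation unit
     sbyte* getelementptr ([2 x sbyte]* %.str_4, long 0, long 0), ; name
     {}* null,                                                    ; type descriptor (assumed placeholder)
     {}* %llvm.dbg.globals                                        ; Anchor
 }
 %.str_4 = internal constant [2 x sbyte] c"X\00"
 </pre></p>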

 <p>
 Note again that descriptors can be extended to include source-language-specific
 information in addition to the fields required by the LLVM debugger.  See the <a
 href="#impl_ccxx_descriptors">section on the C/C++ front-end</a> for more
 information.
 </p>
 </div>


 <!----------------------------------------------------------------------------->
 <div class="doc_subsubsection">
   <a name="impl_common_localvars">Representation of local variables</a>
 </div>

 <div class="doc_text">
 <p>
 </p>
 </div>


 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="impl_common_intrinsics">Other intrinsic functions</a>
 </div>

 <div class="doc_text">
 <p>

 </p>
 </div>


 <!-- *********************************************************************** -->
 <div class="doc_section">
   <a name="impl_ccxx">C/C++ front-end specific debug information</a>
 </div>

 <div class="doc_text">

 <p>
 The C and C++ front-ends represent information about the program in a format
 that is effectively identical to <a
 href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3.0</a> in terms of
 information content.  This allows code generators to trivially support native
 debuggers by generating standard dwarf information, and contains enough
 information for non-dwarf targets to translate it as needed.</p>

 <p>
 TODO: document extensions to standard debugging objects, document how we
 represent source types, etc.
 </p>

 </div>

 <!-- ======================================================================= -->
 <div class="doc_subsection">
   <a name="impl_ccxx_descriptors">Object Descriptor Formats</a>
 </div>

 <div class="doc_text">
 <p>

 </p>
 </div>


 <!-- *********************************************************************** -->
 <hr>
 <div class="doc_footer">
   <address><a href="mailto:sabre@nondot.org">Chris Lattner</a></address>
   <a href="http://llvm.cs.uiuc.edu">The LLVM Compiler Infrastructure</a>
   <br>
   Last modified: $Date$
 </div>

 </body>
 </html>