Chris Lattner | bdfb339 | 2004-01-05 05:06:33 +0000 | [diff] [blame^] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| 2 | "http://www.w3.org/TR/html4/strict.dtd"> |
| 3 | <html> |
| 4 | <head> |
| 5 | <title>Source Level Debugging with LLVM</title> |
| 6 | <link rel="stylesheet" href="llvm.css" type="text/css"> |
| 7 | </head> |
| 8 | <body> |
| 9 | |
| 10 | <div class="doc_title">Source Level Debugging with LLVM</div> |
| 11 | |
| 12 | <ul> |
| 13 | |
| 14 | <img src="venusflytrap.jpg" width=247 height=369 align=right> |
| 15 | |
| 16 | <li><a href="#introduction">Introduction</a></li> |
| 17 | <ol> |
| 18 | <li><a href="#phil">Philosophy behind LLVM debugging information</a></li> |
| 19 | <li><a href="#debugopt">Debugging optimized code</a></li> |
| 20 | <li><a href="#future">Future work</a></li> |
| 21 | </ol> |
| 22 | <li><a href="#llvm-db">Using the <tt>llvm-db</tt> tool</a> |
| 23 | <ol> |
| 24 | <li><a href="#limitations">Limitations of <tt>llvm-db</tt></a></li> |
| 25 | <li><a href="#sample">A sample <tt>llvm-db</tt> session</a></li> |
| 26 | <li><a href="#startup">Starting the debugger</a></li> |
| 27 | <li><a href="#commands">Commands recognized by the debugger</a></li> |
| 28 | </ol></li> |
| 29 | |
| 30 | <li><a href="#architecture">Architecture of the LLVM debugger</a></li> |
| 31 | <ol> |
| 32 | <li><a href="#arch_todo">Short-term TODO list</a></li> |
| 33 | </ol> |
| 34 | |
| 35 | <li><a href="#implementation">Debugging information implementation</a></li> |
| 36 | <ol> |
| 37 | <li><a href="#impl_common_anchors">Anchors for global objects</a></li> |
| 38 | <li><a href="#impl_common_stoppoint">Representing stopping points in the source program</a></li> |
| 39 | <li><a href="#impl_common_lifetime">Object lifetimes and scoping</a></li> |
| 40 | <li><a href="#impl_common_descriptors">Object descriptor formats</a></li> |
| 41 | <ul> |
| 42 | <li><a href="#impl_common_source_files">Representation of source files</a></li> |
| 43 | <li><a href="#impl_common_globals">Representation of global objects</a></li> |
| 44 | <li><a href="#impl_common_localvars">Representation of local variables</a></li> |
| 45 | </ul> |
| 46 | <li><a href="#impl_common_intrinsics">Other intrinsic functions</a></li> |
| 47 | </ol> |
| 48 | <li><a href="#impl_ccxx">C/C++ front-end specific debug information</a></li> |
| 49 | <ol> |
| 50 | <li><a href="#impl_ccxx_descriptors">Object descriptor formats</a></li> |
| 51 | </ol> |
| 52 | </ul> |
| 53 | |
| 54 | <!-- *********************************************************************** --> |
| 55 | <div class="doc_section"><a name="introduction">Introduction</a></div> |
| 56 | <!-- *********************************************************************** --> |
| 57 | |
| 58 | <div class="doc_text"> |
| 59 | |
| 60 | <p>This document is the central repository for all information pertaining to |
| 61 | debug information in LLVM. It describes how to use the <a |
| 62 | href="CommandGuide/llvm-db.html"><tt>llvm-db</tt> tool</a>, which provides a |
| 63 | powerful <a href="#llvm-db">source-level debugger</a> to users of LLVM-based |
| 64 | compilers. When compiling a program in debug mode, the front-end in use adds |
| 65 | LLVM debugging information to the program in the form of normal <a |
| 66 | href="LangRef.html">LLVM program objects</a> as well as a small set of LLVM <a |
| 67 | href="#implementation">intrinsic functions</a>, which specify the mapping of the |
| 68 | program in LLVM form to the program in the source language. |
| 69 | </p> |
| 70 | |
| 71 | </div> |
| 72 | |
| 73 | <!-- ======================================================================= --> |
| 74 | <div class="doc_subsection"> |
| 75 | <a name="phil">Philosophy behind LLVM debugging information</a> |
| 76 | </div> |
| 77 | |
| 78 | <div class="doc_text"> |
| 79 | |
| 80 | <p> |
| 81 | The idea of the LLVM debugging information is to capture how the important |
| 82 | pieces of the source-language's Abstract Syntax Tree map onto LLVM code. |
| 83 | Several design aspects have shaped the solution that appears here. The |
| 84 | important ones are:</p> |
| 85 | |
| 86 | <p><ul> |
| 87 | <li>Debugging information should have very little impact on the rest of the |
| 88 | compiler. No transformations, analyses, or code generators should need to be |
| 89 | modified because of debugging information.</li> |
| 90 | |
| 91 | <li>LLVM optimizations should interact in <a href="#debugopt">well-defined and |
| 92 | easily described ways</a> with the debugging information.</li> |
| 93 | |
| 94 | <li>Because LLVM is designed to support arbitrary programming languages, |
| 95 | LLVM-to-LLVM tools should not need to know anything about the semantics of the |
| 96 | source-level language.</li> |
| 97 | |
| 98 | <li>Source-level languages are often <b>widely</b> different from one another. |
| 99 | LLVM should not put any restrictions on the flavor of the source-language, and |
| 100 | the debugging information should work with any language.</li> |
| 101 | |
| 102 | <li>With code generator support, it should be possible to use an LLVM compiler |
| 103 | to compile a program to native machine code with standard debugging formats. |
| 104 | This allows compatibility with traditional machine-code level debuggers, like |
| 105 | GDB or DBX.</li> |
| 106 | |
| 107 | </ul></p> |
| 108 | |
| 109 | <p> |
| 110 | The approach used by the LLVM implementation is to use a small set of <a |
| 111 | href="#impl_common_intrinsics">intrinsic functions</a> to define a mapping |
| 112 | between LLVM program objects and the source-level objects. The description of |
| 113 | the source-level program is maintained in LLVM global variables in an <a |
| 114 | href="#impl_ccxx">implementation-defined format</a> (the C/C++ front-end |
| 115 | currently uses working draft 7 of the <a |
| 116 | href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3 standard</a>).</p> |
| 117 | |
| 118 | <p> |
| 119 | When a program is debugged, the debugger interacts with the user and turns the |
| 120 | stored debug information into source-language specific information. As such, |
| 121 | the debugger must be aware of the source-language, and is thus tied to a |
| 122 | specific language or family of languages. The <a href="#llvm-db">LLVM |
| 123 | debugger</a> is designed to be modular in its support for source-languages. |
| 124 | </p> |
| 125 | |
| 126 | </div> |
| 127 | |
| 128 | |
| 129 | <!-- ======================================================================= --> |
| 130 | <div class="doc_subsection"> |
| 131 | <a name="debugopt">Debugging optimized code</a> |
| 132 | </div> |
| 133 | |
| 134 | <div class="doc_text"> |
| 135 | <p> |
| 136 | An extremely high priority of LLVM debugging information is to make it interact |
| 137 | well with optimizations and analysis. In particular, the LLVM debug information |
| 138 | provides the following guarantees:</p> |
| 139 | |
| 140 | <p><ul> |
| 141 | |
| 142 | <li>LLVM debug information <b>always provides information to accurately read the |
| 143 | source-level state of the program</b>, regardless of which LLVM optimizations |
| 144 | have been run, and without any modification to the optimizations themselves. |
| 145 | However, some optimizations may impact the ability to modify the current state |
| 146 | of the program with a debugger, such as setting program variables, or calling |
| 147 | functions that have been deleted.</li> |
| 148 | |
| 149 | <li>LLVM optimizations gracefully interact with debugging information. If they |
| 150 | are not aware of debug information, they are automatically disabled as necessary |
| 151 | in the cases that would invalidate the debug info. This retains the LLVM |
| 152 | features making it easy to write new transformations.</li> |
| 153 | |
| 154 | <li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM |
| 155 | debugging information, allowing them to update the debugging information as they |
| 156 | perform aggressive optimizations. This means that, with effort, the LLVM |
| 157 | optimizers could optimize debug code just as well as non-debug code.</li> |
| 158 | |
| 159 | <li>LLVM debug information does not prevent many important optimizations from |
| 160 | happening (for example inlining, basic block reordering/merging/cleanup, tail |
| 161 | duplication, etc), further reducing the amount of the compiler that eventually |
| 162 | is "aware" of debugging information.</li> |
| 163 | |
| 164 | <li>LLVM debug information is automatically optimized along with the rest of the |
| 165 | program, using existing facilities. For example, duplicate information is |
| 166 | automatically merged by the linker, and unused information is automatically |
| 167 | removed.</li> |
| 168 | |
| 169 | </ul></p> |
| 170 | |
| 171 | <p> |
| 172 | Basically, the debug information allows you to compile a program with "<tt>-O0 |
| 173 | -g</tt>" and get full debug information, allowing you to arbitrarily modify the |
| 174 | program as it executes from the debugger. Compiling a program with "<tt>-O3 |
| 175 | -g</tt>" gives you full debug information that is always available and accurate |
| 176 | for reading (e.g., you get accurate stack traces despite tail call elimination |
| 177 | and inlining), but you might lose the ability to modify the program and call |
| 178 | functions which were optimized out of the program, or inlined away completely. |
| 179 | </p> |
| 180 | |
| 181 | </div> |
| 182 | |
| 183 | |
| 184 | <!-- ======================================================================= --> |
| 185 | <div class="doc_subsection"> |
| 186 | <a name="future">Future work</a> |
| 187 | </div> |
| 188 | |
| 189 | <div class="doc_text"> |
| 190 | <p> |
| 191 | There are several important extensions that could eventually be added to the |
| 192 | LLVM debugger. The most important extension would be to upgrade the LLVM code |
| 193 | generators to support debugging information. This would also allow, for |
| 194 | example, the X86 code generator to emit native objects that contain debugging |
| 195 | information consumable by traditional source-level debuggers like GDB or |
| 196 | DBX.</p> |
| 197 | |
| 198 | <p> |
| 199 | Additionally, LLVM optimizations can be upgraded to incrementally update the |
| 200 | debugging information, <a href="#commands">new commands</a> can be added to the |
| 201 | debugger, and thread support could be added to the debugger.</p> |
| 202 | |
| 203 | <p> |
| 204 | The "SourceLanguage" modules provided by <tt>llvm-db</tt> could be substantially |
| 205 | improved to provide good support for C++ language features like namespaces and |
| 206 | scoping rules.</p> |
| 207 | |
| 208 | <p> |
| 209 | After working with the debugger for a while, perhaps the nicest improvement |
| 210 | would be to add some sort of line editor, such as GNU readline (but one that is |
| 211 | compatible with the LLVM license).</p> |
| 212 | |
| 213 | <p> |
| 214 | For someone so inclined, it should be straightforward to write different |
| 215 | front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly |
| 216 | separated from the <tt>llvm-db</tt> front-end. A GUI debugger or IDE would be |
| 217 | an interesting project. |
| 218 | </p> |
| 219 | |
| 220 | </div> |
| 221 | |
| 222 | |
| 223 | <!-- *********************************************************************** --> |
| 224 | <div class="doc_section"> |
| 225 | <a name="llvm-db">Using the <tt>llvm-db</tt> tool</a> |
| 226 | </div> |
| 227 | <!-- *********************************************************************** --> |
| 228 | |
| 229 | <div class="doc_text"> |
| 230 | |
| 231 | <p> |
| 232 | The <tt>llvm-db</tt> tool provides a GDB-like interface for source-level |
| 233 | debugging of programs. This tool provides many standard commands for inspecting |
| 234 | and modifying the program as it executes, loading new programs, single stepping, |
| 235 | placing breakpoints, etc. This section describes how to use the debugger. |
| 236 | </p> |
| 237 | |
| 238 | <p><tt>llvm-db</tt> has been designed to be as similar to GDB in its user |
| 239 | interface as possible. This should make it extremely easy to learn |
| 240 | <tt>llvm-db</tt> if you already know <tt>GDB</tt>. In general, <tt>llvm-db</tt> |
| 241 | provides the subset of GDB commands that are applicable to LLVM debugging users. |
| 242 | If there is a command missing that makes a reasonable amount of sense within the |
| 243 | <a href="#limitations">limitations of <tt>llvm-db</tt></a>, please report it as |
| 244 | a bug or, better yet, submit a patch to add it. :)</p> |
| 245 | |
| 246 | </div> |
| 247 | |
| 248 | <!-- ======================================================================= --> |
| 249 | <div class="doc_subsection"> |
| 250 | <a name="limitations">Limitations of <tt>llvm-db</tt></a> |
| 251 | </div> |
| 252 | |
| 253 | <div class="doc_text"> |
| 254 | |
| 255 | <p><tt>llvm-db</tt> is the first LLVM debugger, and as such was designed to be |
| 256 | quick to prototype and build, and simple to extend. It is missing many |
| 257 | features, though they should be easy to add over time (patches welcomed!). |
| 258 | Because the (currently only) debugger backend (implemented in |
| 259 | "lib/Debugger/UnixLocalInferiorProcess.cpp") was designed to work without any |
| 260 | cooperation from the code generators, it suffers from the following inherent |
| 261 | limitations:</p> |
| 262 | |
| 263 | <p><ul> |
| 264 | |
| 265 | <li>Running a program in <tt>llvm-db</tt> is a bit slower than running it with |
| 266 | <tt>lli</tt>.</li> |
| 267 | |
| 268 | <li>Inspection of the target hardware is not supported. This means that you |
| 269 | cannot, for example, print the contents of X86 registers.</li> |
| 270 | |
| 271 | <li>Inspection of LLVM code is not supported. This means that you cannot print |
| 272 | the contents of arbitrary LLVM values, or use commands such as <tt>stepi</tt>. |
| 273 | This also means that you cannot debug code without debug information.</li> |
| 274 | |
| 275 | <li>Portions of the debugger run in the same address space as the program being |
| 276 | debugged. This means that memory corruption by the program could trample on |
| 277 | portions of the debugger.</li> |
| 278 | |
| 279 | <li>Attaching to existing processes and core files is not currently |
| 280 | supported.</li> |
| 281 | |
| 282 | </ul></p> |
| 283 | |
| 284 | <p>That said, it is still quite useful, and all of these limitations can be |
| 285 | eliminated by integrating support for the debugger into the code generators. |
| 286 | See the <a href="#future">future work</a> section for ideas of how to extend |
| 287 | the LLVM debugger despite these limitations.</p> |
| 288 | |
| 289 | </div> |
| 290 | |
| 291 | |
| 292 | <!-- ======================================================================= --> |
| 293 | <div class="doc_subsection"> |
| 294 | <a name="sample">A sample <tt>llvm-db</tt> session</a> |
| 295 | </div> |
| 296 | |
| 297 | <div class="doc_text"> |
| 298 | |
| 299 | <p> |
| 300 | TODO |
| 301 | </p> |
| 302 | |
| 303 | </div> |
| 304 | |
| 305 | |
| 306 | |
| 307 | <!-- ======================================================================= --> |
| 308 | <div class="doc_subsection"> |
| 309 | <a name="startup">Starting the debugger</a> |
| 310 | </div> |
| 311 | |
| 312 | <div class="doc_text"> |
| 313 | |
| 314 | <p>There are three ways to start up the <tt>llvm-db</tt> debugger:</p> |
| 315 | |
| 316 | <p>When run with no options, just <tt>llvm-db</tt>, the debugger starts up |
| 317 | without a program loaded at all. You must use the <a |
| 318 | href="#c_file"><tt>file</tt> command</a> to load a program, and the <a |
| 319 | href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a> |
| 320 | commands to specify the arguments for the program.</p> |
| 321 | |
| 322 | <p>If you start the debugger with one argument, as <tt>llvm-db |
| 323 | &lt;program&gt;</tt>, the debugger will start up and load in the specified |
| 324 | program. You can then optionally specify arguments to the program with the <a |
| 325 | href="#c_set_args"><tt>set args</tt></a> or <a href="#c_run"><tt>run</tt></a> |
| 326 | commands.</p> |
| 327 | |
| 328 | <p>The third way to start the debugger is with the <tt>--args</tt> option. This |
| 329 | option allows you to specify the program to load and the arguments to start out |
| 330 | with. <!-- No options to <tt>llvm-db</tt> may be specified after the |
| 331 | <tt>-args</tt> option. --> Example use: <tt>llvm-db --args ls /home</tt></p> |
| 332 | |
| 333 | </div> |
| 334 | |
| 335 | <!-- ======================================================================= --> |
| 336 | <div class="doc_subsection"> |
| 337 | <a name="commands">Commands recognized by the debugger</a> |
| 338 | </div> |
| 339 | |
| 340 | <div class="doc_text"> |
| 341 | |
| 342 | <p>FIXME: this needs work obviously. See the <a |
| 343 | href="http://sources.redhat.com/gdb/documentation/">GDB documentation</a> for |
| 344 | information about what these do, or try '<tt>help [command]</tt>' within |
| 345 | <tt>llvm-db</tt> to get information.</p> |
| 346 | |
| 347 | <p> |
| 348 | <h2>General usage:</h2> |
| 349 | <ul> |
| 350 | <li>help [command]</li> |
| 351 | <li>quit</li> |
| 352 | <li><a name="c_file">file</a> [program]</li> |
| 353 | </ul> |
| 354 | |
| 355 | <h2>Program inspection and interaction:</h2> |
| 356 | <ul> |
| 357 | <li>create (start the program, stopping it ASAP in <tt>main</tt>)</li> |
| 358 | <li>kill</li> |
| 359 | <li>run [args]</li> |
| 360 | <li>step [num]</li> |
| 361 | <li>next [num]</li> |
| 362 | <li>cont</li> |
| 363 | <li>finish</li> |
| 364 | |
| 365 | <li>list [start[, end]]</li> |
| 366 | <li>info source</li> |
| 367 | <li>info sources</li> |
| 368 | <li>info functions</li> |
| 369 | </ul> |
| 370 | |
| 371 | <h2>Call stack inspection:</h2> |
| 372 | <ul> |
| 373 | <li>backtrace</li> |
| 374 | <li>up [n]</li> |
| 375 | <li>down [n]</li> |
| 376 | <li>frame [n]</li> |
| 377 | </ul> |
| 378 | |
| 379 | |
| 380 | <h2>Debugger inspection and interaction:</h2> |
| 381 | <ul> |
| 382 | <li>info target</li> |
| 383 | <li>show prompt</li> |
| 384 | <li>set prompt</li> |
| 385 | <li>show listsize</li> |
| 386 | <li>set listsize</li> |
| 387 | <li>show language</li> |
| 388 | <li>set language</li> |
| 389 | </ul> |
| 390 | |
| 391 | <h2>TODO:</h2> |
| 392 | <ul> |
| 393 | <li>info frame</li> |
| 394 | <li>break</li> |
| 395 | <li>print</li> |
| 396 | <li>ptype</li> |
| 397 | |
| 398 | <li>info types</li> |
| 399 | <li>info variables</li> |
| 400 | <li>info program</li> |
| 401 | |
| 402 | <li>info args</li> |
| 403 | <li>info locals</li> |
| 404 | <li>info catch</li> |
| 405 | <li>... many others</li> |
| 406 | </ul> |
| 407 | </p> |
| 408 | </div> |
| 409 | |
| 410 | <!-- *********************************************************************** --> |
| 411 | <div class="doc_section"> |
| 412 | <a name="architecture">Architecture of the LLVM debugger</a> |
| 413 | </div> |
| 414 | <!-- *********************************************************************** --> |
| 415 | |
| 416 | <div class="doc_text"> |
| 417 | |
| 418 | <p><pre> |
| 419 | lib/Debugger |
| 420 | - UnixLocalInferiorProcess.cpp |
| 421 | |
| 422 | tools/llvm-db |
| 423 | - SourceLanguage interfaces |
| 424 | - ProgramInfo/RuntimeInfo |
| 425 | - Commands |
| 426 | |
| 427 | </pre></p> |
| 428 | |
| 429 | </div> |
| 430 | |
| 431 | <!-- ======================================================================= --> |
| 432 | <div class="doc_subsection"> |
| 433 | <a name="arch_todo">Short-term TODO list</a> |
| 434 | </div> |
| 435 | |
| 436 | <div class="doc_text"> |
| 437 | |
| 438 | <p> |
| 439 | FIXME: this section will eventually go away. These are notes to myself of |
| 440 | things that should be implemented, but haven't been yet. |
| 441 | </p> |
| 442 | |
| 443 | <p> |
| 444 | <b>Breakpoints:</b> Support is already implemented in the 'InferiorProcess' |
| 445 | class, though it hasn't been tested yet. To finish breakpoint support, we need |
| 446 | to implement breakCommand (which should reuse the linespec parser from the list |
| 447 | command), and handle the fact that 'break foo' or 'break file.c:53' may insert |
| 448 | multiple breakpoints. Also, if you say 'break file.c:53' and there is no |
| 449 | stoppoint on line 53, the breakpoint should go on the next available line. My |
| 450 | idea was to have the Debugger class provide a "Breakpoint" class which |
| 451 | encapsulated this messiness, giving the debugger front-end a simple interface. |
| 452 | The debugger front-end would have to map the really complex semantics of |
| 453 | temporary breakpoints and 'conditional' breakpoints onto this intermediate |
| 454 | level. Also, breakpoints should survive as much as possible across program |
| 455 | reloads. |
| 456 | </p> |
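<p>
As an illustration of the "next available line" rule only, here is a minimal
Python sketch. It is not part of <tt>llvm-db</tt>: the function name
<tt>resolve_breakpoint_line</tt> and the sample list of stop-point lines are
hypothetical, and the sketch assumes the debugger has already collected the line
numbers that carry stop points in a single source file.
</p>

```python
import bisect

def resolve_breakpoint_line(stoppoint_lines, requested):
    """Return the first line >= requested that has a stop point, or None.

    stoppoint_lines: line numbers with stop points in one file (assumed input).
    """
    lines = sorted(set(stoppoint_lines))
    index = bisect.bisect_left(lines, requested)
    return lines[index] if index < len(lines) else None

# Hypothetical file whose stop points sit on lines 2, 3, 5, 6 and 8.
lines_with_stoppoints = [2, 3, 5, 6, 8]
print(resolve_breakpoint_line(lines_with_stoppoints, 4))  # no stop point on 4, so 5 is chosen
print(resolve_breakpoint_line(lines_with_stoppoints, 9))  # past the last stop point
```

<p>
Returning <tt>None</tt> when no later stop point exists leaves the decision of
how to report the failure to the debugger front-end.
</p>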
| 457 | |
| 458 | <p> |
| 459 | <b>run (with args)</b> &amp; <b>set args</b>: These need to be implemented. |
| 460 | Currently run doesn't support setting arguments as part of the command. The |
| 461 | only tricky part is handling quoted arguments correctly.</p> |
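<p>
One possible approach to the quoting problem, sketched in Python rather than in
the debugger's own C++: the standard <tt>shlex</tt> module splits a command line
using POSIX shell quoting rules. The helper name <tt>parse_run_args</tt> is
hypothetical, and the real debugger may well choose different quoting semantics.
</p>

```python
import shlex

def parse_run_args(argument_text):
    """Split the text following 'run' or 'set args' into an argument list.

    Quoted strings stay together as one argument. An unterminated quote
    raises ValueError, which a front-end could report to the user.
    """
    return shlex.split(argument_text)

print(parse_run_args('-l "/home/my dir" plain'))
```

<p>
Delegating to a shell-style splitter keeps the behavior close to what users of
GDB-like tools expect, at the cost of inheriting that splitter's escaping rules.
</p>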
| 462 | |
| 463 | <p> |
| 464 | <b>UnixLocalInferiorProcess.cpp speedup</b>: There is no reason for the debugged |
| 465 | process to code gen the globals corresponding to debug information. The |
| 466 | IntrinsicLowering object could instead change descriptors into constant expr |
| 467 | casts of the constant address of the LLVM objects for the descriptors. This |
| 468 | would also allow us to eliminate the mapping back and forth between physical |
| 469 | addresses that must be done.</p> |
| 470 | |
| 471 | </div> |
| 472 | |
| 473 | <!-- *********************************************************************** --> |
| 474 | <div class="doc_section"> |
| 475 | <a name="implementation">Debugging information implementation</a> |
| 476 | </div> |
| 477 | <!-- *********************************************************************** --> |
| 478 | |
| 479 | <div class="doc_text"> |
| 480 | |
| 481 | <p>LLVM debugging information has been carefully designed to make it possible |
| 482 | for the optimizer to optimize the program and debugging information without |
| 483 | necessarily having to know anything about debugging information. In particular, |
| 484 | the global constant merging pass automatically eliminates duplicated debugging |
| 485 | information (often caused by header files), the global dead code elimination |
| 486 | pass automatically deletes debugging information for a function if it decides to |
| 487 | delete the function, and the linker eliminates debug information when it merges |
| 488 | <tt>linkonce</tt> functions.</p> |
| 489 | |
| 490 | <p>To do this, most of the debugging information (descriptors for types, |
| 491 | variables, functions, source files, etc) is inserted by the language front-end |
| 492 | in the form of LLVM global variables. These LLVM global variables are no |
| 493 | different from any other global variables, except that they have a web of LLVM |
| 494 | intrinsic functions that point to them. If the last references to a particular |
| 495 | piece of debugging information are deleted (for example, by the |
| 496 | <tt>-globaldce</tt> pass), the extraneous debug information will automatically |
| 497 | become dead and be removed by the optimizer.</p> |
| 498 | |
| 499 | <p>The debugger is designed to be agnostic about the contents of most of the |
| 500 | debugging information. It uses a source-language-specific module to decode the |
| 501 | information that represents variables, types, functions, namespaces, etc: this |
| 502 | allows for arbitrary source-language semantics and type-systems to be used, as |
| 503 | long as there is a module written for the debugger to interpret the information. |
| 504 | </p> |
| 505 | |
| 506 | <p> |
| 507 | To provide basic functionality, the LLVM debugger does have to make some |
| 508 | assumptions about the source-level language being debugged, though it keeps |
| 509 | these to a minimum. The only common features that the LLVM debugger assumes |
| 510 | exist are <a href="#impl_common_source_files">source files</a>, <a |
| 511 | href="#impl_common_globals">global objects</a> (aka methods, messages, global |
| 512 | variables, etc), and <a href="#impl_common_localvars">local variables</a>. |
| 513 | These abstract objects are used by the debugger to form stack traces, show |
| 514 | information about local variables, etc.</p> |
| 515 | |
| 516 | <p>This section of the documentation first describes the representation aspects |
| 517 | <a href="#impl_common">common to any source-language</a>. The next section |
| 518 | describes the data layout conventions used by the <a href="#impl_ccxx">C and C++ |
| 519 | front-ends</a>.</p> |
| 520 | |
| 521 | </div> |
| 522 | |
| 523 | <!-- ======================================================================= --> |
| 524 | <div class="doc_subsection"> |
| 525 | <a name="impl_common_anchors">Anchors for global objects</a> |
| 526 | </div> |
| 527 | |
| 528 | <div class="doc_text"> |
| 529 | <p> |
| 530 | One important aspect of the LLVM debug representation is that it allows the LLVM |
| 531 | debugger to efficiently index all of the global objects without having to scan |
| 532 | the program. To do this, all of the global objects use "anchor" globals of type |
| 533 | "<tt>{}</tt>", with designated names. These anchor objects obviously do not |
| 534 | contain any content or meaning by themselves, but all of the global objects of a |
| 535 | particular type (e.g., source file descriptors) contain a pointer to the anchor. |
| 536 | This pointer allows the debugger to use def-use chains to find all global |
| 537 | objects of that type. |
| 538 | </p> |
| 539 | |
| 540 | <p> |
| 541 | So far, the following names are recognized as anchors by the LLVM debugger: |
| 542 | </p> |
| 543 | |
| 544 | <p><pre> |
| 545 | %<a href="#impl_common_source_files">llvm.dbg.translation_units</a> = linkonce global {} {} |
| 546 | %<a href="#impl_common_globals">llvm.dbg.globals</a> = linkonce global {} {} |
| 547 | </pre></p> |
| 548 | |
| 549 | <p> |
| 550 | Using anchors in this way (where the source file descriptor points to the |
| 551 | anchors, as opposed to having a list of source file descriptors) allows for the |
| 552 | standard dead global elimination and merging passes to automatically remove |
| 553 | unused debugging information. If the globals were kept track of through lists, |
| 554 | there would always be an object pointing to the descriptors, so they would never be |
| 555 | deleted. |
| 556 | </p> |
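<p>
The following Python sketch is a toy model of this idea, not LLVM's API. The
<tt>Global</tt> class and the functions <tt>users_of</tt> and
<tt>eliminate_dead</tt> are hypothetical stand-ins for LLVM globals, def-use
chains, and dead global elimination. It shows that descriptors can be found
through the anchor's users, and that a descriptor nothing else references
disappears because the anchor does not point back at it.
</p>

```python
class Global:
    """A named global that points at other globals (its operands)."""
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)

def users_of(target, module):
    """Def-use lookup: names of globals in the module that point at target."""
    return [g.name for g in module if target in g.operands]

def eliminate_dead(module, roots):
    """Keep only the globals reachable from the roots through operands."""
    live, worklist = [], list(roots)
    while worklist:
        g = worklist.pop()
        if g not in live:
            live.append(g)
            worklist.extend(g.operands)
    return [g for g in module if g in live]

anchor = Global("llvm.dbg.translation_units")
used_file = Global("d.used_file", [anchor])      # descriptor points at the anchor
unused_file = Global("d.unused_file", [anchor])  # nothing refers to this one
code = Global("foo", [used_file])                # stands in for code using a descriptor
module = [anchor, used_file, unused_file, code]

print(users_of(anchor, module))                          # both descriptors are indexed
print(users_of(anchor, eliminate_dead(module, [code])))  # only the referenced one survives
```

<p>
Had the anchor held a list of descriptors instead, it would appear in the
operands of a live global and <tt>d.unused_file</tt> would remain reachable.
</p>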
| 557 | |
| 558 | </div> |
| 559 | |
| 560 | |
| 561 | <!-- ======================================================================= --> |
| 562 | <div class="doc_subsection"> |
| 563 | <a name="impl_common_stoppoint"> |
| 564 | Representing stopping points in the source program |
| 565 | </a> |
| 566 | </div> |
| 567 | |
| 568 | <div class="doc_text"> |
| 569 | |
| 570 | <p>LLVM debugger "stop points" are a key part of the debugging representation |
| 571 | that allows LLVM to maintain simple semantics for <a |
| 572 | href="#debugopt">debugging optimized code</a>. The basic idea is that the |
| 573 | front-end inserts calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic function |
| 574 | at every point in the program where the debugger should be able to inspect the |
| 575 | program (these correspond to places the debugger stops when you "<tt>step</tt>" |
| 576 | through it). The front-end can choose to place these as fine-grained as it |
| 577 | would like (for example, before every subexpression is evaluated), but it is |
| 578 | recommended to only put them after every source statement.</p> |
| 579 | |
| 580 | <p> |
| 581 | Using calls to this intrinsic function to demark legal points for the debugger |
| 582 | to inspect the program automatically disables any optimizations that could |
| 583 | potentially confuse debugging information. To non-debug-information-aware |
| 584 | transformations, these calls simply look like calls to an external function, |
| 585 | which they must assume can do anything (including reading or writing to any part |
| 586 | of reachable memory). On the other hand, it does not impact many optimizations, |
| 587 | such as code motion of non-trapping instructions, nor does it impact |
| 588 | optimization of subexpressions, or any other code between the stop points.</p> |
| 589 | |
| 590 | <p> |
| 591 | An important aspect of the calls to the <tt>%llvm.dbg.stoppoint</tt> intrinsic |
| 592 | is that the function-local debugging information is woven together with use-def |
| 593 | chains. This makes it easy for the debugger to, for example, locate the 'next' |
| 594 | stop point. For a concrete example of stop points, see <a |
| 595 | href="#impl_common_lifetime">the next section</a>.</p> |
| 596 | |
| 597 | </div> |
| 598 | |
| 599 | |
| 600 | <!-- ======================================================================= --> |
| 601 | <div class="doc_subsection"> |
| 602 | <a name="impl_common_lifetime">Object lifetimes and scoping</a> |
| 603 | </div> |
| 604 | |
| 605 | <div class="doc_text"> |
| 606 | <p> |
| 607 | In many languages, the local variables in functions can have their lifetime or |
| 608 | scope limited to a subset of a function. In the C family of languages, for |
| 609 | example, variables are only live (readable and writable) within the source block |
| 610 | that they are defined in. In functional languages, values are only readable |
| 611 | after they have been defined. Though this is a very obvious concept, it is also |
| 612 | non-trivial to model in LLVM, because it has no notion of scoping in this sense, |
| 613 | and does not want to be tied to a language's scoping rules. |
| 614 | </p> |
| 615 | |
| 616 | <p> |
| 617 | In order to handle this, the LLVM debug format uses the notion of "regions" of a |
| 618 | function, delineated by calls to intrinsic functions. These intrinsic functions |
| 619 | define new regions of the program and indicate when the region lifetime expires. |
| 620 | Consider the following C fragment, for example: |
| 621 | </p> |
| 622 | |
| 623 | <p><pre> |
| 624 | 1. void foo() { |
| 625 | 2. int X = ...; |
| 626 | 3. int Y = ...; |
| 627 | 4. { |
| 628 | 5. int Z = ...; |
| 629 | 6. ... |
| 630 | 7. } |
| 631 | 8. ... |
| 632 | 9. } |
| 633 | </pre></p> |
| 634 | |
| 635 | <p> |
| 636 | Compiled to LLVM, this function would be represented like this (FIXME: CHECK AND |
| 637 | UPDATE THIS): |
| 638 | </p> |
| 639 | |
| 640 | <p><pre> |
| 641 | void %foo() { |
| 642 | %X = alloca int |
| 643 | %Y = alloca int |
| 644 | %Z = alloca int |
| 645 | <a name="icl_ex_D1">%D1</a> = call {}* %llvm.dbg.func.start(<a href="#impl_common_globals">%lldb.global</a>* %d.foo) |
| 646 | %D2 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D1, uint 2, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| 647 | |
| 648 | %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...) |
| 649 | <i>;; Evaluate expression on line 2, assigning to X.</i> |
| 650 | %D4 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D3, uint 3, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| 651 | |
| 652 | %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...) |
| 653 | <i>;; Evaluate expression on line 3, assigning to Y.</i> |
| 654 | %D6 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D5, uint 5, uint 4, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| 655 | |
| 656 | <a name="icl_ex_D7">%D7</a> = call {}* %llvm.region.start({}* %D6) |
| 657 | %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...) |
| 658 | <i>;; Evaluate expression on line 5, assigning to Z.</i> |
| 659 | %D9 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D8, uint 6, uint 4, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| 660 | |
| 661 | <i>;; Code for line 6.</i> |
| 662 | %D10 = call {}* %llvm.region.end({}* %D9) |
| 663 | %D11 = call {}* <a href="#impl_common_stoppoint">%llvm.dbg.stoppoint</a>({}* %D10, uint 8, uint 2, <a href="#impl_common_source_files">%lldb.compile_unit</a>* %file) |
| 664 | |
| 665 | <i>;; Code for line 8.</i> |
| 666 | <a name="icl_ex_D12">%D12</a> = call {}* %llvm.region.end({}* %D11) |
| 667 | ret void |
| 668 | } |
| 669 | </pre></p> |
| 670 | |
| 671 | <p> |
| 672 | This example illustrates a few important details about the LLVM debugging |
| 673 | information. In particular, it shows how the various intrinsics used are woven |
| 674 | together with def-use and use-def chains, similar to how <a |
| 675 | href="#impl_common_anchors">anchors</a> are used with globals. This allows the |
| 676 | debugger to analyze the relationship between statements, variable definitions, |
| 677 | and the code used to implement the function.</p> |
| 678 | |
| 679 | <p> |
| 680 | In this example, two explicit regions are defined, one with the <a |
| 681 | href="#icl_ex_D1">definition of the <tt>%D1</tt> variable</a> and one with the |
| 682 | <a href="#icl_ex_D7">definition of <tt>%D7</tt></a>. In the case of |
| 683 | <tt>%D1</tt>, the debug information indicates that the region corresponds to the function whose <a |
| 684 | href="#impl_common_globals">descriptor</a> is specified as an argument to the |
| 685 | intrinsic. This defines a new stack frame whose lifetime ends when the region |
| 686 | is ended by <a href="#icl_ex_D12">the <tt>%D12</tt> call</a>.</p> |
| 687 | |
| 688 | <p> |
| 689 | Representing the boundaries of functions with regions allows normal LLVM |
| 690 | interprocedural optimizations to change the boundaries of functions without |
| 691 | having to worry about breaking mapping information between LLVM and source-level |
| 692 | functions. In particular, the inlining optimization requires no modification to |
| 693 | support inlining with debugging information: there is no correlation drawn |
| 694 | between LLVM functions and their source-level counterparts.</p> |
| 695 | |
| 696 | <p> |
| 697 | Once the function has been defined, the <a |
| 698 | href="#impl_common_stoppoint">stopping point</a> corresponding to line #2 of the |
| 699 | function is encountered. At this point in the function, <b>no</b> local |
| 700 | variables are live. As lines 2 and 3 of the example are executed, their |
| 701 | variable definitions are automatically introduced into the program, without the |
| 702 | need to specify a new region. These variables do not require new regions to be |
| 703 | introduced because they go out of scope at the same point in the program: line |
| 704 | 9. |
| 705 | </p> |
| 706 | |
| 707 | <p> |
| 708 | In contrast, the <tt>Z</tt> variable goes out of scope at a different time, on |
| 709 | line 7. For this reason, it is defined within <a href="#icl_ex_D7">the |
| 710 | <tt>%D7</tt> region</a>, which kills the availability of <tt>Z</tt> before the |
| 711 | code for line 8 is executed. Through the use of LLVM debugger regions, |
| 712 | arbitrary source-language scoping rules can be supported, as long as they can |
| 713 | only be nested (i.e., one scope cannot partially overlap with a part of another |
| 714 | scope). |
| 715 | </p> |
| 716 | |
| 717 | <p> |
| 718 | It is worth noting that this scoping mechanism is used to control scoping of all |
| 719 | declarations, not just variable declarations. For example, the scope of a C++ |
| 720 | using declaration is controlled with this, and the <tt>llvm-db</tt> C++ support |
| 721 | routines could use this to change how name lookup is performed (though this is |
| 722 | not yet implemented). |
| 723 | </p> |
| 724 | |
| 725 | </div> |
| 726 | |
| 727 | |
| 728 | <!-- ======================================================================= --> |
| 729 | <div class="doc_subsection"> |
| 730 | <a name="impl_common_descriptors">Object descriptor formats</a> |
| 731 | </div> |
| 732 | |
| 733 | <div class="doc_text"> |
| 734 | <p> |
| 735 | The LLVM debugger expects the descriptors for global objects to start in a |
| 736 | canonical format, but the descriptors can include additional information |
| 737 | appended at the end. All LLVM debugging information is versioned, allowing |
| 738 | backwards compatibility in the case that the core structures need to change in |
| 739 | some way. The lowest-level descriptors are those describing <a |
| 740 | href="#impl_common_source_files">the files containing the program source |
| 741 | code</a>; all other descriptors refer to them. |
| 742 | </p> |
| 743 | </div> |
| 744 | |
| 745 | |
| 746 | <!-----------------------------------------------------------------------------> |
| 747 | <div class="doc_subsubsection"> |
| 748 | <a name="impl_common_source_files">Representation of source files</a> |
| 749 | </div> |
| 750 | |
| 751 | <div class="doc_text"> |
| 752 | <p> |
| 753 | Source file descriptors were roughly patterned after the Dwarf "compile_unit" |
| 754 | object. The descriptor currently is defined to have the following LLVM |
| 755 | type:</p> |
| 756 | |
| 757 | <p><pre> |
| 758 | %lldb.compile_unit = type { |
| 759 | ushort, <i>;; LLVM debug version number</i> |
| 760 | ushort, <i>;; Dwarf language identifier</i> |
| 761 | sbyte*, <i>;; Filename</i> |
| 762 | sbyte*, <i>;; Working directory when compiled</i> |
| 763 | sbyte*, <i>;; Producer of the debug information</i> |
| 764 | {}* <i>;; Anchor for llvm.dbg.translation_units</i> |
| 765 | } |
| 766 | </pre></p> |
| 767 | |
| 768 | <p> |
| 769 | These descriptors contain the version number for the debug info, a source |
| 770 | language ID for the file (we use the Dwarf 3.0 ID numbers, such as |
| 771 | <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>, <tt>DW_LANG_Cobol74</tt>, |
| 772 | etc.), three strings (the filename, the working directory of the compiler, |
| 773 | and an identifier string for the compiler that produced it), and the <a |
| 774 | href="#impl_common_anchors">anchor</a> for the descriptor. Here is an example |
| 775 | descriptor: |
| 776 | </p> |
| 777 | |
| 778 | <p><pre> |
| 779 | %arraytest_source_file = internal constant %lldb.compile_unit { |
| 780 | ushort 0, ; Version #0 |
| 781 | ushort 1, ; DW_LANG_C89 |
| 782 | sbyte* getelementptr ([12 x sbyte]* %.str_1, long 0, long 0), ; filename |
| 783 | sbyte* getelementptr ([12 x sbyte]* %.str_2, long 0, long 0), ; working dir |
| 784 | sbyte* getelementptr ([12 x sbyte]* %.str_3, long 0, long 0), ; producer |
| 785 | {}* %llvm.dbg.translation_units ; Anchor |
| 786 | } |
| 787 | %.str_1 = internal constant [12 x sbyte] c"arraytest.c\00" |
| 788 | %.str_2 = internal constant [12 x sbyte] c"/home/sabre\00" |
| 789 | %.str_3 = internal constant [12 x sbyte] c"llvmgcc 3.4\00" |
| 790 | </pre></p> |
| 791 | |
| 792 | |
| 793 | </div> |
| 794 | |
| 795 | |
| 796 | <!-----------------------------------------------------------------------------> |
| 797 | <div class="doc_subsubsection"> |
| 798 | <a name="impl_common_globals">Representation of global objects</a> |
| 799 | </div> |
| 800 | |
| 801 | <div class="doc_text"> |
| 802 | <p> |
| 803 | The LLVM debugger needs to know about the source-language global objects in |
| 804 | order to build stack traces and perform other related activities. Because |
| 805 | source languages have widely varying forms of global objects, the LLVM debugger |
| 806 | only expects the following fields in the descriptor for each global: |
| 807 | </p> |
| 808 | |
| 809 | <p><pre> |
| 810 | %lldb.global = type { |
| 811 | <a href="#impl_common_source_files">%lldb.compile_unit</a>*, <i>;; The translation unit containing the global</i> |
| 812 | sbyte*, <i>;; The global object 'name'</i> |
| 813 | [type]*, <i>;; Source-language type descriptor for global</i> |
| 814 | {}* <i>;; The anchor for llvm.dbg.globals</i> |
| 815 | } |
| 816 | </pre></p> |
| 817 | |
| 818 | <p> |
| 819 | The first field contains a pointer to the translation unit the function is |
| 820 | defined in. This pointer allows the debugger to find out which version of debug |
| 821 | information the function corresponds to. The second field contains a string |
| 822 | that the debugger can use to identify the subprogram if it does not contain |
| 823 | explicit support for the source language in use. This should be some sort of |
| 824 | unmangled string that corresponds to the function somehow. |
| 825 | </p> |
| 826 | |
| 827 | <p> |
| 828 | Note again that descriptors can be extended to include source-language-specific |
| 829 | information in addition to the fields required by the LLVM debugger. See the <a |
| 830 | href="#impl_ccxx_descriptors">section on the C/C++ front-end</a> for more |
| 831 | information. |
| 832 | </p> |
| 833 | </div> |
| 834 | |
| 835 | |
| 836 | |
| 837 | <!-----------------------------------------------------------------------------> |
| 838 | <div class="doc_subsubsection"> |
| 839 | <a name="impl_common_localvars">Representation of local variables</a> |
| 840 | </div> |
| 841 | |
| 842 | <div class="doc_text"> |
| 843 | <p> |
| 844 | </p> |
| 845 | </div> |
| 846 | |
| 847 | |
| 848 | <!-- ======================================================================= --> |
| 849 | <div class="doc_subsection"> |
| 850 | <a name="impl_common_intrinsics">Other intrinsic functions</a> |
| 851 | </div> |
| 852 | |
| 853 | <div class="doc_text"> |
| 854 | <p> |
| 855 | |
| 856 | </p> |
| 857 | </div> |
| 858 | |
| 859 | |
| 860 | |
| 861 | <!-- *********************************************************************** --> |
| 862 | <div class="doc_section"> |
| 863 | <a name="impl_ccxx">C/C++ front-end specific debug information</a> |
| 864 | </div> |
| 865 | |
| 866 | <div class="doc_text"> |
| 867 | |
| 868 | <p> |
| 869 | The C and C++ front-ends represent information about the program in a format |
| 870 | that is effectively identical to <a |
| 871 | href="http://www.eagercon.com/dwarf/dwarf3std.htm">Dwarf 3.0</a> in terms of |
| 872 | information content. This allows code generators to trivially support native |
| 873 | debuggers by generating standard Dwarf information, and contains enough |
| 874 | information for non-Dwarf targets to translate it as needed.</p> |
| 875 | |
| 876 | <p> |
| 877 | TODO: document extensions to standard debugging objects, document how we |
| 878 | represent source types, etc. |
| 879 | </p> |
| 880 | |
| 881 | </div> |
| 882 | |
| 883 | <!-- ======================================================================= --> |
| 884 | <div class="doc_subsection"> |
| 885 | <a name="impl_ccxx_descriptors">Object Descriptor Formats</a> |
| 886 | </div> |
| 887 | |
| 888 | <div class="doc_text"> |
| 889 | <p> |
| 890 | |
| 891 | </p> |
| 892 | </div> |
| 893 | |
| 894 | |
| 895 | |
| 896 | <!-- *********************************************************************** --> |
| 897 | <hr> |
| 898 | <div class="doc_footer"> |
| 899 | <address><a href="mailto:sabre@nondot.org">Chris Lattner</a></address> |
| 900 | <a href="http://llvm.cs.uiuc.edu">The LLVM Compiler Infrastructure</a> |
| 901 | <br> |
| 902 | Last modified: $Date$ |
| 903 | </div> |
| 904 | |
| 905 | </body> |
| 906 | </html> |