======================================
Kaleidoscope: Adding Debug Information
======================================

.. contents::
   :local:

Chapter 8 Introduction
======================

Welcome to Chapter 8 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a
decent little programming language with functions and variables.
What happens if something goes wrong, though? How do you debug your
program?

Source level debugging uses formatted data that helps a debugger
translate from binary and the state of the machine back to the
source that the programmer wrote. In LLVM we generally use a format
called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
that represents types, source locations, and variable locations.

The short summary of this chapter is that we'll go through the
various things you have to add to a programming language to
support debug info, and how you translate that into DWARF.

Caveat: For now we can't debug via the JIT, so we'll need to compile
our program down to something small and standalone. As part of this
we'll make a few modifications to the running of the language and
how programs are compiled. This means that we'll have a source file
with a simple program written in Kaleidoscope rather than the
interactive JIT. To keep the number of changes small, this setup has
one limitation: we can only have one "top level" command at a time.

Here's the sample program we'll be compiling:

.. code-block:: python

    def fib(x)
      if x < 3 then
        1
      else
        fib(x-1)+fib(x-2);

    fib(10)

Why is this a hard problem?
===========================

Debug information is a hard problem for a few different reasons, mostly
centered around optimized code. First, optimization makes keeping source
locations more difficult. In LLVM IR we keep the original source location
for each IR level instruction on the instruction. Optimization passes
should keep the source locations for newly created instructions, but merged
instructions only get to keep a single location, which can cause jumping
around when stepping through optimized programs. Secondly, optimization
can move variables in ways that leave them optimized out, shared in memory
with other variables, or difficult to track. For the purposes of this
tutorial we're going to avoid optimization (as you'll see with one of the
next sets of patches).
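
To make the "jumping around" concrete, here is a deliberately simplified
model rather than LLVM's actual data structures. It uses a list of the
source lines attached to each instruction in execution order and a
hypothetical ``steppedLines`` helper. The helper collapses consecutive
repeats, roughly the way a line-stepping debugger stops only when the
current line changes.

.. code-block:: c++

  #include <vector>

  // Each element is the source line attached to one instruction, listed in
  // execution order. A stop is recorded whenever the line differs from the
  // previous stop, so consecutive duplicates collapse into a single step.
  std::vector<int> steppedLines(const std::vector<int> &InstrLines) {
    std::vector<int> Stops;
    for (int Line : InstrLines)
      if (Stops.empty() || Stops.back() != Line)
        Stops.push_back(Line);
    return Stops;
  }

With the unoptimized order ``{1, 1, 2, 2, 3}`` the user steps through lines
1, 2 and 3. Suppose the optimizer interleaves or merges instructions so the
sequence becomes ``{1, 2, 1, 3, 2}``. The same helper then yields
``{1, 2, 1, 3, 2}``, and the debugger appears to jump back and forth.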

Ahead-of-Time Compilation Mode
==============================

To highlight only the aspects of adding debug information to a source
language without needing to worry about the complexities of JIT debugging,
we're going to make a few changes to Kaleidoscope to support compiling
the IR emitted by the front end into a simple standalone program that
you can execute, debug, and see results.

First we make our anonymous function that contains our top level
statement be our "main":

.. code-block:: udiff

  -    PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
  +    PrototypeAST *Proto = new PrototypeAST("main", std::vector<std::string>());

just with the simple change of giving it a name.

Then we're going to remove the interactive ``ready>`` prompt wherever it is
printed:

.. code-block:: udiff

  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
   /// top ::= definition | external | expression | ';'
   static void MainLoop() {
     while (1) {
  -    fprintf(stderr, "ready> ");
       switch (CurTok) {
       case tok_eof:
         return;
  @@ -1184,7 +1183,6 @@ int main() {
     BinopPrecedence['*'] = 40; // highest.

     // Prime the first token.
  -  fprintf(stderr, "ready> ");
     getNextToken();

Lastly we're going to disable all of the optimization passes and the JIT so
that the only thing that happens after we're done parsing and generating
code is that the LLVM IR goes to standard error:

.. code-block:: udiff

  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
   static void HandleTopLevelExpression() {
     // Evaluate a top-level expression into an anonymous function.
     if (FunctionAST *F = ParseTopLevelExpr()) {
  -    if (Function *LF = F->Codegen()) {
  -      // We're just doing this to make sure it executes.
  -      TheExecutionEngine->finalizeObject();
  -      // JIT the function, returning a function pointer.
  -      void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
  -
  -      // Cast it to the right type (takes no arguments, returns a double) so we
  -      // can call it as a native function.
  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
  -      // Ignore the return value for this.
  -      (void)FP;
  +    if (!F->Codegen()) {
  +      fprintf(stderr, "Error generating code for top level expr");
       }
     } else {
       // Skip token for error recovery.
  @@ -1439,11 +1459,11 @@ int main() {
     // target lays out data structures.
     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
     OurFPM.add(new DataLayoutPass());
  +#if 0
     OurFPM.add(createBasicAliasAnalysisPass());
     // Promote allocas to registers.
     OurFPM.add(createPromoteMemoryToRegisterPass());
  @@ -1218,7 +1210,7 @@ int main() {
     OurFPM.add(createGVNPass());
     // Simplify the control flow graph (deleting unreachable blocks, etc).
     OurFPM.add(createCFGSimplificationPass());
  -
  +  #endif
     OurFPM.doInitialization();

     // Set the global so the code gen can use this.

This relatively small set of changes gets us to the point that we can compile
our piece of Kaleidoscope language down to an executable program via this
command line. The IR is printed to standard error, so we merge standard error
into the pipe that feeds ``clang``:

.. code-block:: bash

  Kaleidoscope-Ch8 < fib.ks 2>&1 | clang -x ir -

which gives an a.out/a.exe in the current working directory.

Compile Unit
============

The top level container for a section of code in DWARF is a compile unit.
This contains the type and function data for an individual translation unit
(read: one file of source code). So the first thing we need to do is
construct one for our fib.ks file.

DWARF Emission Setup
====================

Similar to the ``IRBuilder`` class we have a
`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
that helps in constructing debug metadata for an LLVM IR file. It
corresponds 1:1 to debug metadata in the same way that ``IRBuilder``
corresponds to LLVM IR, but with nicer names. Using it does require that
you be more familiar with DWARF terminology than you needed to be with
``IRBuilder`` and ``Instruction`` names. If you read through the
general documentation on the
`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_, it
should become a little clearer. We'll be using this class to construct all
of our IR level descriptions. Its constructor takes a module, so we
need to construct it shortly after we construct our module. We've left it
as a global static variable to make it a bit easier to use.

Next we're going to create a small container to cache some of our frequent
data. The first item will be our compile unit, but we'll also write a bit
of code for our one type since we won't have to worry about multiple typed
expressions:

.. code-block:: c++

  static DIBuilder *DBuilder;

  struct DebugInfo {
    MDCompileUnit *TheCU;
    MDType *DblTy;

    MDType *getDoubleTy();
  } KSDbgInfo;

  MDType *DebugInfo::getDoubleTy() {
    // KSDbgInfo is a global, so DblTy starts out null. Create the type node
    // on first use and reuse it afterwards.
    if (DblTy)
      return DblTy;

    DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
    return DblTy;
  }
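
The caching in ``getDoubleTy`` is ordinary lazy initialization. Every
Kaleidoscope value is a double, so one type node can be created the first
time it is needed and shared by everything afterwards. The sketch below
shows the same pattern without LLVM. ``BasicTypeDesc``, ``ToyBuilder`` and
``CachedDebugInfo`` are hypothetical stand-ins for the metadata node,
``DIBuilder`` and ``DebugInfo``. The builder keeps every descriptor it
creates, so the reuse is observable.

.. code-block:: c++

  #include <memory>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for a DWARF base-type descriptor.
  struct BasicTypeDesc {
    std::string Name;
    unsigned SizeInBits;
    unsigned AlignInBits;
  };

  // Hypothetical builder that owns every descriptor it creates.
  struct ToyBuilder {
    std::vector<std::unique_ptr<BasicTypeDesc>> Owned;

    BasicTypeDesc *createBasicType(const std::string &Name, unsigned Size,
                                   unsigned Align) {
      Owned.push_back(
          std::make_unique<BasicTypeDesc>(BasicTypeDesc{Name, Size, Align}));
      return Owned.back().get();
    }
  };

  // Mirrors DebugInfo::getDoubleTy: create on first use, then return the
  // cached pointer on every later call.
  struct CachedDebugInfo {
    ToyBuilder *Builder = nullptr;
    BasicTypeDesc *DblTy = nullptr;

    BasicTypeDesc *getDoubleTy() {
      if (DblTy)
        return DblTy;
      DblTy = Builder->createBasicType("double", 64, 64);
      return DblTy;
    }
  };

Calling ``getDoubleTy()`` twice on one ``CachedDebugInfo`` returns the same
pointer, and the builder's ``Owned`` vector holds a single descriptor.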

And then later on in ``main`` when we're constructing our module:

.. code-block:: c++

  DBuilder = new DIBuilder(*TheModule);

  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);

There are a couple of things to note here. First, while we're producing a
compile unit for a language called Kaleidoscope, we used the language
constant for C. A debugger wouldn't necessarily understand the calling
conventions or default ABI for a language it doesn't recognize. We follow
the C ABI in our LLVM code generation, so C is the closest thing to
accurate. This ensures we can actually call functions from the debugger and
have them execute. Secondly, you'll see the "fib.ks" in the call to
``createCompileUnit``. This is a hard-coded default value, since we're using
shell redirection to put our source into the Kaleidoscope compiler. In a
usual front end you'd have an input file name and it would go there.

One last thing about emitting debug information via DIBuilder is that
we need to "finalize" the debug information. The reasons are part of the
underlying API for DIBuilder, but make sure you do this near the end of
main:

.. code-block:: c++

  DBuilder->finalize();

before you dump out the module.

Functions
=========

Now that we have our ``Compile Unit``, we can add function definitions to
the debug info. So in ``PrototypeAST::Codegen`` we add a few lines of code
to describe a context for our subprogram, in this case the "File", and the
actual definition of the function itself.

So the context:

.. code-block:: c++

  MDFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                      KSDbgInfo.TheCU->getDirectory());

giving us an MDFile and asking the ``Compile Unit`` we created above for the
directory and filename where we are currently. Then, for now, we use some
source locations of 0 (since our AST doesn't currently have source location
information) and construct our function definition:

.. code-block:: c++

  MDScope *FContext = Unit;
  unsigned LineNo = 0;
  unsigned ScopeLine = 0;
  MDSubprogram *SP = DBuilder->createFunction(
      FContext, Name, StringRef(), Unit, LineNo,
      CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
      true /* definition */, ScopeLine, DebugNode::FlagPrototyped, false, F);

and we now have an MDSubprogram that contains a reference to all of our
metadata for the function.

Source Locations
================

The most important thing for debug information is accurate source locations,
which make it possible to map your program back to its source code. We have
a problem, though: Kaleidoscope really doesn't have any source location
information in the lexer or parser, so we'll need to add it.

.. code-block:: c++

  struct SourceLocation {
    int Line;
    int Col;
  };
  static SourceLocation CurLoc;
  static SourceLocation LexLoc = {1, 0};

  static int advance() {
    int LastChar = getchar();

    if (LastChar == '\n' || LastChar == '\r') {
      LexLoc.Line++;
      LexLoc.Col = 0;
    } else
      LexLoc.Col++;
    return LastChar;
  }

In this set of code we've added some functionality to keep track of the
line and column of the "source file". As we lex every token we set our
current "lexical location" to the line and column of the beginning of the
token. We do this by replacing all of the previous calls to ``getchar()``
with our new ``advance()``, which keeps track of this information. We have
also added a source location to all of our AST classes:

.. code-block:: c++

  class ExprAST {
    SourceLocation Loc;

  public:
    int getLine() const { return Loc.Line; }
    int getCol() const { return Loc.Col; }
    ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
    virtual std::ostream &dump(std::ostream &out, int ind) {
      return out << ':' << getLine() << ':' << getCol() << '\n';
    }
    // ... remaining members as in the previous chapters.
  };

that we pass down through when we create a new expression:

.. code-block:: c++

  LHS = new BinaryExprAST(BinLoc, BinOp, LHS, RHS);

giving us locations for each of our expressions and variables.
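
The lexer-side bookkeeping can be tried out in isolation. The sketch below
is a hypothetical harness, not part of the tutorial's code.
``LocTrackingReader::advance()`` applies the same line and column updates as
``advance()`` above but reads from a ``std::string`` instead of calling
``getchar()``. ``wordStarts`` snapshots ``LexLoc`` once it holds the first
character of a whitespace-separated word, much as the lexer sets its current
location at the beginning of each token. Real Kaleidoscope tokens are split
by the lexer's own rules, not just by whitespace.

.. code-block:: c++

  #include <cctype>
  #include <cstddef>
  #include <cstdio>
  #include <string>
  #include <utility>
  #include <vector>

  struct SourceLocation {
    int Line;
    int Col;
  };

  // Hypothetical string-backed reader with the same bookkeeping as advance().
  class LocTrackingReader {
    std::string Text;
    std::size_t Pos = 0;

  public:
    SourceLocation LexLoc = {1, 0}; // updated by every advance() call

    explicit LocTrackingReader(std::string T) : Text(std::move(T)) {}

    int advance() {
      if (Pos >= Text.size())
        return EOF;
      int LastChar = static_cast<unsigned char>(Text[Pos++]);
      if (LastChar == '\n' || LastChar == '\r') {
        LexLoc.Line++;
        LexLoc.Col = 0;
      } else
        LexLoc.Col++;
      return LastChar;
    }
  };

  // Hypothetical mini-lexer: records the location of the first character of
  // each whitespace-separated word. The snapshot is taken after that first
  // character has been consumed, so columns come out 1-based.
  std::vector<SourceLocation> wordStarts(const std::string &Src) {
    LocTrackingReader R(Src);
    std::vector<SourceLocation> Starts;
    int C = R.advance();
    while (C != EOF) {
      if (std::isspace(C)) {
        C = R.advance();
        continue;
      }
      Starts.push_back(R.LexLoc);
      while (C != EOF && !std::isspace(C))
        C = R.advance();
    }
    return Starts;
  }

For the input ``"def fib(x)\n  x"`` this yields ``{1, 1}``, ``{1, 5}`` and
``{2, 3}``. As with the original ``advance()``, a ``"\r\n"`` pair bumps the
line counter twice.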

From this we can make sure to tell our ``IRBuilder`` when we're at a new
source location, so it can attach that location to the rest of the code we
generate and make sure that each instruction has source location
information. We do this by constructing another small function:

.. code-block:: c++

  void DebugInfo::emitLocation(ExprAST *AST) {
    // A null AST clears the current location; this is how the function
    // prologue is left without line information later on.
    if (!AST)
      return Builder.SetCurrentDebugLocation(DebugLoc());
    MDScope *Scope;
    if (LexicalBlocks.empty())
      Scope = TheCU;
    else
      Scope = LexicalBlocks.back();
    Builder.SetCurrentDebugLocation(
        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
  }

that tells the main ``IRBuilder`` both where we are and what scope we're
in. Since we've just created a function above, we can either be in the main
file scope (like when we created our function), or in the function scope we
just created. To represent this we create a stack of scopes:

.. code-block:: c++

  std::vector<MDScope *> LexicalBlocks;
  std::map<const PrototypeAST *, MDScope *> FnScopeMap;

and keep a map of each function to the scope that it represents (an
MDSubprogram is also an MDScope).

Then we make sure to emit the location every time we start to generate code
for a new AST:

.. code-block:: c++

  KSDbgInfo.emitLocation(this);

We also store the scope (function) in the map when we create it:

.. code-block:: c++

  KSDbgInfo.FnScopeMap[this] = SP;

and push it onto the scope stack when we start generating the code for each
function:

.. code-block:: c++

  KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);

Also, don't forget to pop the scope back off of your scope stack at the
end of the code generation for the function:

.. code-block:: c++

  // Pop off the lexical block for the function since we added it
  // unconditionally.
  KSDbgInfo.LexicalBlocks.pop_back();
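
The explicit push/pop pairing above is easy to get wrong if a code
generation routine has several exit paths. One alternative is an RAII guard
that pushes the function's scope when it is constructed and pops it when it
goes out of scope. This alternative is swapped in here for illustration and
is not taken from the tutorial. The sketch models the bookkeeping without
LLVM. ``ToyScope``, ``ScopeTracker`` and ``FunctionScopeGuard`` are
hypothetical, and the map is keyed by function name instead of by
``const PrototypeAST *``.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for MDScope (compile units and subprograms alike).
  struct ToyScope {
    std::string Name;
  };

  // Hypothetical mirror of the scope bookkeeping kept in DebugInfo.
  struct ScopeTracker {
    ToyScope *TheCU = nullptr;                    // file-level scope
    std::vector<ToyScope *> LexicalBlocks;        // innermost scope is last
    std::map<std::string, ToyScope *> FnScopeMap; // keyed by name here

    // The same choice emitLocation makes: innermost open scope, else the CU.
    ToyScope *currentScope() const {
      return LexicalBlocks.empty() ? TheCU : LexicalBlocks.back();
    }
  };

  // Pushes a function's scope on construction and pops it on destruction,
  // so every exit path from a codegen routine restores the stack.
  class FunctionScopeGuard {
    ScopeTracker &Tracker;

  public:
    FunctionScopeGuard(ScopeTracker &T, ToyScope *FnScope) : Tracker(T) {
      Tracker.LexicalBlocks.push_back(FnScope);
    }
    ~FunctionScopeGuard() {
      assert(!Tracker.LexicalBlocks.empty() && "scope stack underflow");
      Tracker.LexicalBlocks.pop_back();
    }
    FunctionScopeGuard(const FunctionScopeGuard &) = delete;
    FunctionScopeGuard &operator=(const FunctionScopeGuard &) = delete;
  };

While a guard for ``fib`` is alive, ``currentScope()`` returns the function's
scope. Once the guard is destroyed, the tracker falls back to the compile
unit, just as ``emitLocation`` does when ``LexicalBlocks`` is empty.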

Variables
=========

Now that we have functions, we need to be able to print out the variables
we have in scope. Let's get our function arguments set up so we can get
decent backtraces and see how our functions are being called. It isn't
a lot of code, and we generally handle it when we're creating the
argument allocas in ``PrototypeAST::CreateArgumentAllocas``.

.. code-block:: c++

  MDScope *Scope = KSDbgInfo.LexicalBlocks.back();
  MDFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                      KSDbgInfo.TheCU->getDirectory());
  MDLocalVariable *D = DBuilder->createLocalVariable(
      dwarf::DW_TAG_arg_variable, Scope, Args[Idx], Unit, Line,
      KSDbgInfo.getDoubleTy(), Idx);

  Instruction *Call = DBuilder->insertDeclare(
      Alloca, D, DBuilder->createExpression(), Builder.GetInsertBlock());
  Call->setDebugLoc(DebugLoc::get(Line, 0, Scope));

Here we're doing a few things. First, we're grabbing our current scope
for the variable so we can say what range of code our variable is valid
for. Second, we're creating the variable, giving it the scope, the name,
source location, type, and, since it's an argument, the argument index.
Third, we create an ``llvm.dbg.declare`` call to indicate at the IR level
that we've got a variable in an alloca (and it gives a starting location
for the variable). Lastly, we set a source location for the beginning of
the scope on the declare.

One interesting thing to note at this point is that various debuggers make
assumptions based on how code and debug information were generated for them
in the past. In this case we need a small hack to avoid generating line
information for the function prologue, so that the debugger knows to skip
over those instructions when setting a breakpoint. So in
``FunctionAST::Codegen`` we add a couple of lines:

.. code-block:: c++

  // Unset the location for the prologue emission (leading instructions with no
  // location in a function are considered part of the prologue and the debugger
  // will run past them when breaking on a function)
  KSDbgInfo.emitLocation(nullptr);

and then emit a new location when we actually start generating code for the
body of the function:

.. code-block:: c++

  KSDbgInfo.emitLocation(Body);
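
The prologue trick rests on a simple rule that can be modeled without LLVM.
The leading run of instructions with no location is treated as prologue,
and the function breakpoint goes on the first instruction that has one. The
sketch below is a deliberately simplified, hypothetical model (``ToyInstr``
and ``breakpointIndex`` are made-up names). Real debuggers derive this from
the DWARF line table and apply their own heuristics.

.. code-block:: c++

  #include <cstddef>
  #include <optional>
  #include <string>
  #include <vector>

  // Hypothetical instruction record: an opcode name plus an optional source
  // line. std::nullopt models an instruction emitted with no debug location.
  struct ToyInstr {
    std::string Op;
    std::optional<int> Line;
  };

  // Simplified breakpoint placement: skip the leading run of instructions
  // that carry no location (treated as prologue) and return the index of the
  // first one that does. Returns Instrs.size() if no instruction has a line.
  std::size_t breakpointIndex(const std::vector<ToyInstr> &Instrs) {
    std::size_t I = 0;
    while (I < Instrs.size() && !Instrs[I].Line)
      ++I;
    return I;
  }

For a function whose argument ``alloca`` and ``store`` were emitted after
``emitLocation(nullptr)``, followed by body instructions on line 2, the
model places the breakpoint at index 2. That index is the first instruction
of the body.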

With this we have enough debug information to set breakpoints in functions,
print out argument variables, and call functions. Not too bad for just a
few simple lines of code!

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
debug information. To build this example, use:

.. code-block:: bash

  # Compile
  clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
  # Run
  ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
   :language: c++

`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_