======================================
Kaleidoscope: Adding Debug Information
======================================

.. contents::
   :local:

Chapter 9 Introduction
======================

Welcome to Chapter 9 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a
decent little programming language with functions and variables.
But what happens if something goes wrong? How do you debug your
program?

Source level debugging uses formatted data that helps a debugger
translate the binary and the state of the machine back to the
source that the programmer wrote. In LLVM we generally use a format
called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
that represents types, source locations, and variable locations.

The short summary of this chapter is that we'll go through the
various things you have to add to a programming language to
support debug info, and how you translate that into DWARF.

Caveat: For now we can't debug via the JIT, so we'll need to compile
our program down to something small and standalone. As part of this
we'll make a few modifications to the running of the language and
how programs are compiled. This means that we'll have a source file
with a simple program written in Kaleidoscope rather than the
interactive JIT. To keep the number of necessary changes small, this
comes with a limitation: we can only have one "top level" command at
a time.

Here's the sample program we'll be compiling:

.. code-block:: python

  def fib(x)
    if x < 3 then
      1
    else
      fib(x-1)+fib(x-2);

  fib(10)

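
For reference when you step through the compiled program later, the
following is a plain C++ sketch of the same computation. It is not part of
the compiler and only shows the value to expect; like Kaleidoscope, it uses
``double`` for every value.

.. code-block:: c++

  #include <cassert>

  // Same recursion as the Kaleidoscope fib above: every value is a double.
  double fib(double x) {
    if (x < 3)
      return 1;
    return fib(x - 1) + fib(x - 2);
  }

With this definition ``fib(10)`` evaluates to 55, which is the value the
top-level expression computes.
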

Why is this a hard problem?
===========================

Debug information is a hard problem for a few different reasons, mostly
centered around optimized code. First, optimization makes keeping source
locations more difficult. In LLVM IR we keep the original source location
for each IR level instruction on the instruction. Optimization passes
should keep the source locations for newly created instructions, but merged
instructions only get to keep a single location; this can cause jumping
around when stepping through optimized programs. Secondly, optimization
can transform variables so that they are optimized out entirely, share
memory with other variables, or are otherwise difficult to track. For the
purposes of this tutorial we're going to avoid optimization (as you'll see
with one of the next sets of patches).

Ahead-of-Time Compilation Mode
==============================

To highlight only the aspects of adding debug information to a source
language without needing to worry about the complexities of JIT debugging,
we're going to make a few changes to Kaleidoscope to support compiling
the IR emitted by the front end into a simple standalone program that
you can execute and debug to see the results.

First, we make the anonymous function that contains our top-level
statement be our "main":

.. code-block:: udiff

  -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
  +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());

just with the simple change of giving it a name.

Then we're going to remove the interactive prompt code wherever it exists:

.. code-block:: udiff

  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
   /// top ::= definition | external | expression | ';'
   static void MainLoop() {
     while (1) {
  -    fprintf(stderr, "ready> ");
       switch (CurTok) {
       case tok_eof:
         return;
  @@ -1184,7 +1183,6 @@ int main() {
     BinopPrecedence['*'] = 40; // highest.

     // Prime the first token.
  -  fprintf(stderr, "ready> ");
     getNextToken();

Lastly we're going to disable all of the optimization passes and the JIT so
that the only thing that happens after we're done parsing and generating
code is that the LLVM IR goes to standard error:

.. code-block:: udiff

  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
   static void HandleTopLevelExpression() {
     // Evaluate a top-level expression into an anonymous function.
     if (auto FnAST = ParseTopLevelExpr()) {
  -    if (auto *FnIR = FnAST->codegen()) {
  -      // We're just doing this to make sure it executes.
  -      TheExecutionEngine->finalizeObject();
  -      // JIT the function, returning a function pointer.
  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);
  -
  -      // Cast it to the right type (takes no arguments, returns a double) so we
  -      // can call it as a native function.
  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
  -      // Ignore the return value for this.
  -      (void)FP;
  +    if (!FnAST->codegen()) {
  +      fprintf(stderr, "Error generating code for top level expr\n");
       }
     } else {
       // Skip token for error recovery.
  @@ -1439,11 +1459,11 @@ int main() {
     // target lays out data structures.
     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
     OurFPM.add(new DataLayoutPass());
  +#if 0
     OurFPM.add(createBasicAliasAnalysisPass());
     // Promote allocas to registers.
     OurFPM.add(createPromoteMemoryToRegisterPass());
  @@ -1218,7 +1210,7 @@ int main() {
     OurFPM.add(createGVNPass());
     // Simplify the control flow graph (deleting unreachable blocks, etc).
     OurFPM.add(createCFGSimplificationPass());
  -
  +#endif
     OurFPM.doInitialization();

     // Set the global so the code gen can use this.

This relatively small set of changes gets us to the point that we can compile
our piece of Kaleidoscope language down to an executable program via this
command line (the IR is written to standard error, so ``2>&1`` sends it into
the pipe):

.. code-block:: bash

  Kaleidoscope-Ch9 < fib.ks 2>&1 | clang -x ir -

which gives an a.out/a.exe in the current working directory.

Compile Unit
============

The top level container for a section of code in DWARF is a compile unit.
This contains the type and function data for an individual translation unit
(read: one file of source code). So the first thing we need to do is
construct one for our ``fib.ks`` file.

DWARF Emission Setup
====================

Similar to the ``IRBuilder`` class we have a
`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
that helps in constructing debug metadata for an LLVM IR file. It
corresponds 1:1 to debug metadata in the same way that ``IRBuilder``
corresponds to LLVM IR, but with nicer names.
Using it does require that you be more familiar with DWARF terminology than
you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
read through the general documentation on the
`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
should be a little clearer. We'll be using this class to construct all
of our IR level descriptions. Its constructor takes a module, so we
need to construct it shortly after we construct our module. We've left it
as a global static variable to make it a bit easier to use.

Next we're going to create a small container to cache some of our frequent
data. The first will be our compile unit, but we'll also write a bit of
code for our one type since we won't have to worry about multiple typed
expressions:

.. code-block:: c++

  static DIBuilder *DBuilder;

  struct DebugInfo {
    DICompileUnit *TheCU;
    DIType *DblTy;

    DIType *getDoubleTy();
  } KSDbgInfo;

  DIType *DebugInfo::getDoubleTy() {
    if (DblTy)
      return DblTy;

    DblTy = DBuilder->createBasicType("double", 64, dwarf::DW_ATE_float);
    return DblTy;
  }

And then later on in ``main`` when we're constructing our module:

.. code-block:: c++

  DBuilder = new DIBuilder(*TheModule);

  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
      dwarf::DW_LANG_C, DBuilder->createFile("fib.ks", "."),
      "Kaleidoscope Compiler", 0, "", 0);

There are a couple of things to note here. First, while we're producing a
compile unit for a language called Kaleidoscope, we used the language
constant for C. This is because a debugger wouldn't necessarily understand
the calling conventions or default ABI for a language it doesn't recognize,
and we follow the C ABI in our LLVM code generation, so it's the closest
thing to accurate. This ensures we can actually call functions from the
debugger and have them execute. Secondly, you'll see "fib.ks" in the
``createFile`` call passed to ``createCompileUnit``. This is a hard-coded
default since we're using shell redirection to put our source into the
Kaleidoscope compiler. In a usual front end you'd have an input file name
and it would go there.
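
In a front end that reads a named input file, the file name and the directory
have to be separated before they are handed to ``createFile``. The following is
a minimal sketch of that split, assuming POSIX-style ``/`` separators only;
``splitSourcePath`` and ``SourceFileName`` are hypothetical helpers, not LLVM
APIs:

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical result type: the two strings createFile() is given.
  struct SourceFileName {
    std::string Filename;
    std::string Directory;
  };

  // Split "dir/sub/fib.ks" into {"fib.ks", "dir/sub"}. A bare file name gets
  // "." as its directory, matching the hard-coded call above.
  // Assumption: '/' is the only separator (no Windows backslashes).
  SourceFileName splitSourcePath(const std::string &Path) {
    std::string::size_type Slash = Path.find_last_of('/');
    if (Slash == std::string::npos)
      return {Path, "."};
    if (Slash == 0) // file directly under the root, e.g. "/fib.ks"
      return {Path.substr(1), "/"};
    return {Path.substr(Slash + 1), Path.substr(0, Slash)};
  }

The two fields could then replace the hard-coded arguments, as in
``DBuilder->createFile(Name.Filename, Name.Directory)``.
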

One last thing as part of emitting debug information via ``DIBuilder`` is
that we need to "finalize" the debug information. ``finalize()`` constructs
any debug info descriptors that ``DIBuilder`` has deferred, so make sure you
call it near the end of ``main``:

.. code-block:: c++

  DBuilder->finalize();

before you dump out the module.

Functions
=========

Now that we have our compile unit, we can add function definitions to the
debug info. So in ``FunctionAST::codegen()`` we add a few lines of code to
describe a context for our subprogram, in this case the "File", and the
actual definition of the function itself.

So the context:

.. code-block:: c++

  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                      KSDbgInfo.TheCU->getDirectory());

giving us a ``DIFile`` and asking the compile unit we created above for the
directory and filename where we are currently. Then, for now, we use some
source locations of 0 (since our AST doesn't currently have source location
information) and construct our function definition:

.. code-block:: c++

  DIScope *FContext = Unit;
  unsigned LineNo = 0;
  unsigned ScopeLine = 0;
  DISubprogram *SP = DBuilder->createFunction(
      FContext, P.getName(), StringRef(), Unit, LineNo,
      CreateFunctionType(TheFunction->arg_size(), Unit),
      false /* internal linkage */, true /* definition */, ScopeLine,
      DINode::FlagPrototyped, false);
  TheFunction->setSubprogram(SP);

and we now have a ``DISubprogram`` that contains a reference to all of our
metadata for the function.
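
The ``CreateFunctionType`` helper is not shown above. The type it has to
describe is a DWARF subroutine type, whose element list starts with the return
type and then has one entry per parameter. Since every Kaleidoscope value is a
``double``, that list is just ``NumArgs + 1`` copies of the double type. A
sketch of that layout only, using type names as strings in place of
``DIType *`` (``subroutineTypeElements`` is a hypothetical name):

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Stand-in for the element list of a DWARF subroutine type.
  // Element 0 is the return type; elements 1..NumArgs are the parameters.
  std::vector<std::string> subroutineTypeElements(unsigned NumArgs) {
    std::vector<std::string> Elts;
    Elts.push_back("double"); // return type
    for (unsigned I = 0; I != NumArgs; ++I)
      Elts.push_back("double"); // one entry per argument
    return Elts;
  }

With real metadata, a list like this would be built from ``DIType *`` values
(for example from ``KSDbgInfo.getDoubleTy()``) and wrapped with
``DBuilder->getOrCreateTypeArray`` and ``DBuilder->createSubroutineType``.
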

Source Locations
================

The most important thing for debug information is accurate source location,
which makes it possible to map the generated code back to your source code.
We have a problem, though: Kaleidoscope really doesn't have any source
location information in the lexer or parser, so we'll need to add it.

.. code-block:: c++

  struct SourceLocation {
    int Line;
    int Col;
  };
  static SourceLocation CurLoc;
  static SourceLocation LexLoc = {1, 0};

  static int advance() {
    int LastChar = getchar();

    if (LastChar == '\n' || LastChar == '\r') {
      LexLoc.Line++;
      LexLoc.Col = 0;
    } else
      LexLoc.Col++;
    return LastChar;
  }

In this set of code we've added some functionality to keep track of the
line and column of the "source file". As we lex every token, we set our
current location (``CurLoc``) to the line and column of the beginning of the
token, taken from the lexer's running position (``LexLoc``). We do this by
replacing all of the previous calls to ``getchar()`` with our new
``advance()``, which keeps track of the information, and then we have added a
source location to all of our AST classes:

.. code-block:: c++

  class ExprAST {
    SourceLocation Loc;

  public:
    ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
    virtual ~ExprAST() {}
    virtual Value* codegen() = 0;
    int getLine() const { return Loc.Line; }
    int getCol() const { return Loc.Col; }
    virtual raw_ostream &dump(raw_ostream &out, int ind) {
      return out << ':' << getLine() << ':' << getCol() << '\n';
    }
  };

that we pass down through when we create a new expression:

.. code-block:: c++

  LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
                                         std::move(RHS));

giving us locations for each of our expressions and variables.
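
To see the bookkeeping in ``advance()`` in isolation, here is a small
self-contained sketch that reads from a string instead of ``stdin`` and
records ``CurLoc`` at the first character of each identifier, the point at
which the lexer captures a token's location. The in-memory ``Input`` buffer
and the ``identifierLocations`` helper are assumptions made for this example,
not part of the tutorial's lexer:

.. code-block:: c++

  #include <cassert>
  #include <cctype>
  #include <cstdio>
  #include <string>
  #include <vector>

  struct SourceLocation {
    int Line;
    int Col;
  };

  // Assumption for this sketch: an in-memory buffer replaces getchar()/stdin.
  static std::string Input;
  static std::string::size_type Pos = 0;
  static SourceLocation CurLoc;
  static SourceLocation LexLoc = {1, 0};

  // Same line/column bookkeeping as the tutorial's advance().
  static int advance() {
    if (Pos >= Input.size())
      return EOF;
    int LastChar = static_cast<unsigned char>(Input[Pos++]);
    if (LastChar == '\n' || LastChar == '\r') {
      LexLoc.Line++;
      LexLoc.Col = 0;
    } else
      LexLoc.Col++;
    return LastChar;
  }

  // Hypothetical helper: lex Src and return the location of every identifier.
  static std::vector<SourceLocation> identifierLocations(const std::string &Src) {
    Input = Src;
    Pos = 0;
    LexLoc = {1, 0};
    std::vector<SourceLocation> Locs;
    int LastChar = advance();
    while (LastChar != EOF) {
      if (std::isalpha(LastChar)) {
        // Token start: LexLoc already counts this first character.
        CurLoc = LexLoc;
        Locs.push_back(CurLoc);
        while (LastChar != EOF && std::isalnum(LastChar))
          LastChar = advance();
      } else {
        LastChar = advance();
      }
    }
    return Locs;
  }

For ``"def fib(x)\n  x"`` this yields 1:1, 1:5, 1:9 and 2:3; columns count
from 1 because ``advance()`` has already incremented ``Col`` for the token's
first character. Note also that, as written, ``advance()`` treats ``'\r'`` and
``'\n'`` as separate line breaks, so a file with ``\r\n`` line endings would
advance ``Line`` twice per line.
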

To make sure that every instruction gets proper source location information,
we have to tell ``Builder`` whenever we're at a new source location.
We use a small helper function for this:

.. code-block:: c++

  void DebugInfo::emitLocation(ExprAST *AST) {
    // A null AST clears the location (used for the function prologue below).
    if (!AST) {
      Builder.SetCurrentDebugLocation(DebugLoc());
      return;
    }
    DIScope *Scope;
    if (LexicalBlocks.empty())
      Scope = TheCU;
    else
      Scope = LexicalBlocks.back();
    Builder.SetCurrentDebugLocation(
        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
  }

This tells the main ``IRBuilder`` both where we are and what scope
we're in. The scope is either the compile unit or the nearest
enclosing lexical block, such as the current function.
To represent this we create a stack of scopes:

.. code-block:: c++

  std::vector<DIScope *> LexicalBlocks;

and push the scope (function) to the top of the stack when we start
generating the code for each function:

.. code-block:: c++

  KSDbgInfo.LexicalBlocks.push_back(SP);

We must also not forget to pop the scope back off of the scope stack at the
end of the code generation for the function:

.. code-block:: c++

  // Pop off the lexical block for the function since we added it
  // unconditionally.
  KSDbgInfo.LexicalBlocks.pop_back();

Then we make sure to emit the location every time we start to generate code
for a new AST object:

.. code-block:: c++

  KSDbgInfo.emitLocation(this);
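
The push and the pop have to stay balanced on every path out of a function's
code generation, which is what the comment "since we added it
unconditionally" is guarding. One common C++ way to guarantee that is an RAII
guard. The sketch below models the scope stack with strings standing in for
``DIScope *``; ``ScopeTracker`` and ``ScopeGuard`` are hypothetical names, not
part of the tutorial or of LLVM:

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <utility>
  #include <vector>

  // Stand-in for DIScope *: a scope is identified by its name here.
  using Scope = std::string;

  struct ScopeTracker {
    Scope CompileUnit = "compile-unit";
    std::vector<Scope> LexicalBlocks;

    // Same choice as emitLocation(): innermost block, else the compile unit.
    const Scope &current() const {
      return LexicalBlocks.empty() ? CompileUnit : LexicalBlocks.back();
    }
  };

  // Pushes a scope on construction and pops it on destruction, so early
  // returns (such as a failed body codegen) cannot leave the stack unbalanced.
  class ScopeGuard {
    ScopeTracker &Tracker;

  public:
    ScopeGuard(ScopeTracker &T, Scope S) : Tracker(T) {
      Tracker.LexicalBlocks.push_back(std::move(S));
    }
    ~ScopeGuard() {
      assert(!Tracker.LexicalBlocks.empty() && "unbalanced scope stack");
      Tracker.LexicalBlocks.pop_back();
    }
    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard &operator=(const ScopeGuard &) = delete;
  };

The tutorial's explicit ``push_back``/``pop_back`` calls work equally well as
long as every return path pops; the guard simply moves that obligation into a
destructor.
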

Variables
=========

Now that we have functions, we need to be able to print out the variables
we have in scope. Let's get our function arguments set up so we can get
decent backtraces and see how our functions are being called. It isn't
a lot of code, and we generally handle it when we're creating the
argument allocas in ``FunctionAST::codegen``.

.. code-block:: c++

  // Record the function arguments in the NamedValues map.
  NamedValues.clear();
  unsigned ArgIdx = 0;
  for (auto &Arg : TheFunction->args()) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());

    // Create a debug descriptor for the variable.
    DILocalVariable *D = DBuilder->createParameterVariable(
        SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),
        true);

    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
                            DebugLoc::get(LineNo, 0, SP),
                            Builder.GetInsertBlock());

    // Store the initial value into the alloca.
    Builder.CreateStore(&Arg, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Arg.getName()] = Alloca;
  }

Here we're first creating the variable, giving it the scope (``SP``),
the name, source location, type, and since it's an argument, the argument
index. Next, we create an ``llvm.dbg.declare`` call to indicate at the IR
level that we've got a variable in an alloca (and it gives a starting
location for the variable), and set a source location for the
beginning of the scope on the declare.
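
The argument number passed to ``createParameterVariable`` counts from 1 (LLVM
uses 0 for variables that are not parameters), which is why the loop starts
``ArgIdx`` at 0 and uses the pre-increment ``++ArgIdx``. A small sketch of that
numbering, with a hypothetical ``ParamInfo`` record standing in for the fields
passed per parameter:

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical record of what the loop above passes per parameter.
  struct ParamInfo {
    std::string Name;
    unsigned ArgNo; // 1-based; 0 would mean "not a parameter"
  };

  std::vector<ParamInfo> numberParameters(const std::vector<std::string> &Names) {
    std::vector<ParamInfo> Params;
    unsigned ArgIdx = 0;
    for (const std::string &N : Names)
      Params.push_back({N, ++ArgIdx}); // pre-increment: first argument gets 1
    return Params;
  }
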

One interesting thing to note at this point is that various debuggers have
assumptions based on how code and debug information were generated for them
in the past. In this case we need to do a little bit of a hack to avoid
generating line information for the function prologue so that the debugger
knows to skip over those instructions when setting a breakpoint. So in
``FunctionAST::codegen`` we add some more lines:

.. code-block:: c++

  // Unset the location for the prologue emission (leading instructions with no
  // location in a function are considered part of the prologue and the debugger
  // will run past them when breaking on a function)
  KSDbgInfo.emitLocation(nullptr);

and then emit a new location when we actually start generating code for the
body of the function:

.. code-block:: c++

  KSDbgInfo.emitLocation(Body.get());

With this we have enough debug information to set breakpoints in functions,
print out argument variables, and call functions. Not too bad for just a
few simple lines of code!
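
The rule stated in the prologue comment (leading instructions with no location
are treated as prologue) can be modelled in a few lines. This is a deliberately
simplified sketch of that heuristic, not how any particular debugger is
implemented: each instruction either has a line number or has none, and a
breakpoint on the function lands on the first instruction that has one.
``LineTable`` and ``breakpointIndex`` are hypothetical names, and the sketch
needs C++17 for ``std::optional``:

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <optional>
  #include <vector>

  // One entry per instruction: its source line, or std::nullopt when it was
  // emitted with no debug location (as after emitLocation(nullptr)).
  using LineTable = std::vector<std::optional<unsigned>>;

  // Index of the first instruction that carries a location, i.e. where a
  // "break on function" would stop under this simplified model.
  std::optional<std::size_t> breakpointIndex(const LineTable &Insts) {
    for (std::size_t I = 0; I != Insts.size(); ++I)
      if (Insts[I])
        return I;
    return std::nullopt;
  }
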

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
debug information. To build this example, use:

.. code-block:: bash

  # Compile
  clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
  # Run
  ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp
   :language: c++

`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_
