| Hans Wennborg | 74e4f8a | 2019-04-11 07:46:25 +0000 | [diff] [blame] | 1 | :orphan: | 
|  | 2 |  | 
| Chris Lattner | d80f118 | 2019-04-07 13:14:23 +0000 | [diff] [blame] | 3 | ======================================== | 
|  | 4 | Kaleidoscope: Code generation to LLVM IR | 
|  | 5 | ======================================== | 
|  | 6 |  | 
|  | 7 | .. contents:: | 
|  | 8 | :local: | 
|  | 9 |  | 
|  | 10 | Chapter 3 Introduction | 
|  | 11 | ====================== | 
|  | 12 |  | 
|  | 13 | Welcome to Chapter 3 of the "`Implementing a language with | 
|  | 14 | LLVM <index.html>`_" tutorial. This chapter shows you how to transform | 
|  | 15 | the `Abstract Syntax Tree <LangImpl02.html>`_, built in Chapter 2, into | 
|  | 16 | LLVM IR. This will teach you a little bit about how LLVM does things, as | 
|  | 17 | well as demonstrate how easy it is to use. It's much more work to build | 
|  | 18 | a lexer and parser than it is to generate LLVM IR code. :) | 
|  | 19 |  | 
|  | 20 | **Please note**: the code in this chapter and later require LLVM 3.7 or | 
|  | 21 | later. LLVM 3.6 and before will not work with it. Also note that you | 
|  | 22 | need to use a version of this tutorial that matches your LLVM release: | 
|  | 23 | If you are using an official LLVM release, use the version of the | 
|  | 24 | documentation included with your release or on the `llvm.org releases | 
|  | 25 | page <http://llvm.org/releases/>`_. | 
|  | 26 |  | 
|  | 27 | Code Generation Setup | 
|  | 28 | ===================== | 
|  | 29 |  | 
|  | 30 | In order to generate LLVM IR, we want some simple setup to get started. | 
|  | 31 | First we define virtual code generation (codegen) methods in each AST | 
|  | 32 | class: | 
|  | 33 |  | 
|  | 34 | .. code-block:: c++ | 
|  | 35 |  | 
|  | 36 | /// ExprAST - Base class for all expression nodes. | 
|  | 37 | class ExprAST { | 
|  | 38 | public: | 
|  | 39 | virtual ~ExprAST() {} | 
|  | 40 | virtual Value *codegen() = 0; | 
|  | 41 | }; | 
|  | 42 |  | 
|  | 43 | /// NumberExprAST - Expression class for numeric literals like "1.0". | 
|  | 44 | class NumberExprAST : public ExprAST { | 
|  | 45 | double Val; | 
|  | 46 |  | 
|  | 47 | public: | 
|  | 48 | NumberExprAST(double Val) : Val(Val) {} | 
|  | 49 | virtual Value *codegen(); | 
|  | 50 | }; | 
|  | 51 | ... | 
|  | 52 |  | 
|  | 53 | The codegen() method says to emit IR for that AST node along with all | 
|  | 54 | the things it depends on, and they all return an LLVM Value object. | 
|  | 55 | "Value" is the class used to represent a "`Static Single Assignment | 
|  | 56 | (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ | 
|  | 57 | register" or "SSA value" in LLVM. The most distinct aspect of SSA values | 
|  | 58 | is that their value is computed as the related instruction executes, and | 
|  | 59 | it does not get a new value until (and if) the instruction re-executes. | 
|  | 60 | In other words, there is no way to "change" an SSA value. For more | 
|  | 61 | information, please read up on `Static Single | 
|  | 62 | Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ | 
|  | 63 | - the concepts are really quite natural once you grok them. | 
|  | 64 |  | 
|  | 65 | Note that instead of adding virtual methods to the ExprAST class | 
|  | 66 | hierarchy, it could also make sense to use a `visitor | 
|  | 67 | pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other | 
|  | 68 | way to model this. Again, this tutorial won't dwell on good software | 
|  | 69 | engineering practices: for our purposes, adding a virtual method is | 
|  | 70 | simplest. | 
|  | 71 |  | 
|  | 72 | The second thing we want is an "LogError" method like we used for the | 
|  | 73 | parser, which will be used to report errors found during code generation | 
|  | 74 | (for example, use of an undeclared parameter): | 
|  | 75 |  | 
|  | 76 | .. code-block:: c++ | 
|  | 77 |  | 
|  | 78 | static LLVMContext TheContext; | 
|  | 79 | static IRBuilder<> Builder(TheContext); | 
|  | 80 | static std::unique_ptr<Module> TheModule; | 
|  | 81 | static std::map<std::string, Value *> NamedValues; | 
|  | 82 |  | 
|  | 83 | Value *LogErrorV(const char *Str) { | 
|  | 84 | LogError(Str); | 
|  | 85 | return nullptr; | 
|  | 86 | } | 
|  | 87 |  | 
|  | 88 | The static variables will be used during code generation. ``TheContext`` | 
|  | 89 | is an opaque object that owns a lot of core LLVM data structures, such as | 
|  | 90 | the type and constant value tables. We don't need to understand it in | 
|  | 91 | detail, we just need a single instance to pass into APIs that require it. | 
|  | 92 |  | 
|  | 93 | The ``Builder`` object is a helper object that makes it easy to generate | 
|  | 94 | LLVM instructions. Instances of the | 
|  | 95 | `IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_ | 
|  | 96 | class template keep track of the current place to insert instructions | 
|  | 97 | and has methods to create new instructions. | 
|  | 98 |  | 
|  | 99 | ``TheModule`` is an LLVM construct that contains functions and global | 
|  | 100 | variables. In many ways, it is the top-level structure that the LLVM IR | 
|  | 101 | uses to contain code. It will own the memory for all of the IR that we | 
|  | 102 | generate, which is why the codegen() method returns a raw Value\*, | 
|  | 103 | rather than a unique_ptr<Value>. | 
|  | 104 |  | 
|  | 105 | The ``NamedValues`` map keeps track of which values are defined in the | 
|  | 106 | current scope and what their LLVM representation is. (In other words, it | 
|  | 107 | is a symbol table for the code). In this form of Kaleidoscope, the only | 
|  | 108 | things that can be referenced are function parameters. As such, function | 
|  | 109 | parameters will be in this map when generating code for their function | 
|  | 110 | body. | 
|  | 111 |  | 
|  | 112 | With these basics in place, we can start talking about how to generate | 
|  | 113 | code for each expression. Note that this assumes that the ``Builder`` | 
|  | 114 | has been set up to generate code *into* something. For now, we'll assume | 
|  | 115 | that this has already been done, and we'll just use it to emit code. | 
|  | 116 |  | 
|  | 117 | Expression Code Generation | 
|  | 118 | ========================== | 
|  | 119 |  | 
|  | 120 | Generating LLVM code for expression nodes is very straightforward: less | 
|  | 121 | than 45 lines of commented code for all four of our expression nodes. | 
|  | 122 | First we'll do numeric literals: | 
|  | 123 |  | 
|  | 124 | .. code-block:: c++ | 
|  | 125 |  | 
|  | 126 | Value *NumberExprAST::codegen() { | 
|  | 127 | return ConstantFP::get(TheContext, APFloat(Val)); | 
|  | 128 | } | 
|  | 129 |  | 
|  | 130 | In the LLVM IR, numeric constants are represented with the | 
|  | 131 | ``ConstantFP`` class, which holds the numeric value in an ``APFloat`` | 
|  | 132 | internally (``APFloat`` has the capability of holding floating point | 
|  | 133 | constants of Arbitrary Precision). This code basically just creates | 
|  | 134 | and returns a ``ConstantFP``. Note that in the LLVM IR that constants | 
|  | 135 | are all uniqued together and shared. For this reason, the API uses the | 
|  | 136 | "foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)". | 
|  | 137 |  | 
|  | 138 | .. code-block:: c++ | 
|  | 139 |  | 
|  | 140 | Value *VariableExprAST::codegen() { | 
|  | 141 | // Look this variable up in the function. | 
|  | 142 | Value *V = NamedValues[Name]; | 
|  | 143 | if (!V) | 
|  | 144 | LogErrorV("Unknown variable name"); | 
|  | 145 | return V; | 
|  | 146 | } | 
|  | 147 |  | 
|  | 148 | References to variables are also quite simple using LLVM. In the simple | 
|  | 149 | version of Kaleidoscope, we assume that the variable has already been | 
|  | 150 | emitted somewhere and its value is available. In practice, the only | 
|  | 151 | values that can be in the ``NamedValues`` map are function arguments. | 
|  | 152 | This code simply checks to see that the specified name is in the map (if | 
|  | 153 | not, an unknown variable is being referenced) and returns the value for | 
|  | 154 | it. In future chapters, we'll add support for `loop induction | 
|  | 155 | variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for `local | 
|  | 156 | variables <LangImpl7.html#user-defined-local-variables>`_. | 
|  | 157 |  | 
|  | 158 | .. code-block:: c++ | 
|  | 159 |  | 
|  | 160 | Value *BinaryExprAST::codegen() { | 
|  | 161 | Value *L = LHS->codegen(); | 
|  | 162 | Value *R = RHS->codegen(); | 
|  | 163 | if (!L || !R) | 
|  | 164 | return nullptr; | 
|  | 165 |  | 
|  | 166 | switch (Op) { | 
|  | 167 | case '+': | 
|  | 168 | return Builder.CreateFAdd(L, R, "addtmp"); | 
|  | 169 | case '-': | 
|  | 170 | return Builder.CreateFSub(L, R, "subtmp"); | 
|  | 171 | case '*': | 
|  | 172 | return Builder.CreateFMul(L, R, "multmp"); | 
|  | 173 | case '<': | 
|  | 174 | L = Builder.CreateFCmpULT(L, R, "cmptmp"); | 
|  | 175 | // Convert bool 0/1 to double 0.0 or 1.0 | 
|  | 176 | return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), | 
|  | 177 | "booltmp"); | 
|  | 178 | default: | 
|  | 179 | return LogErrorV("invalid binary operator"); | 
|  | 180 | } | 
|  | 181 | } | 
|  | 182 |  | 
|  | 183 | Binary operators start to get more interesting. The basic idea here is | 
|  | 184 | that we recursively emit code for the left-hand side of the expression, | 
|  | 185 | then the right-hand side, then we compute the result of the binary | 
|  | 186 | expression. In this code, we do a simple switch on the opcode to create | 
|  | 187 | the right LLVM instruction. | 
|  | 188 |  | 
|  | 189 | In the example above, the LLVM builder class is starting to show its | 
|  | 190 | value. IRBuilder knows where to insert the newly created instruction, | 
|  | 191 | all you have to do is specify what instruction to create (e.g. with | 
|  | 192 | ``CreateFAdd``), which operands to use (``L`` and ``R`` here) and | 
|  | 193 | optionally provide a name for the generated instruction. | 
|  | 194 |  | 
|  | 195 | One nice thing about LLVM is that the name is just a hint. For instance, | 
|  | 196 | if the code above emits multiple "addtmp" variables, LLVM will | 
|  | 197 | automatically provide each one with an increasing, unique numeric | 
|  | 198 | suffix. Local value names for instructions are purely optional, but it | 
|  | 199 | makes it much easier to read the IR dumps. | 
|  | 200 |  | 
|  | 201 | `LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict | 
|  | 202 | rules: for example, the Left and Right operators of an `add | 
|  | 203 | instruction <../LangRef.html#add-instruction>`_ must have the same type, and the | 
|  | 204 | result type of the add must match the operand types. Because all values | 
|  | 205 | in Kaleidoscope are doubles, this makes for very simple code for add, | 
|  | 206 | sub and mul. | 
|  | 207 |  | 
|  | 208 | On the other hand, LLVM specifies that the `fcmp | 
|  | 209 | instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a | 
|  | 210 | one bit integer). The problem with this is that Kaleidoscope wants the | 
|  | 211 | value to be a 0.0 or 1.0 value. In order to get these semantics, we | 
|  | 212 | combine the fcmp instruction with a `uitofp | 
|  | 213 | instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its | 
|  | 214 | input integer into a floating point value by treating the input as an | 
|  | 215 | unsigned value. In contrast, if we used the `sitofp | 
|  | 216 | instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator | 
|  | 217 | would return 0.0 and -1.0, depending on the input value. | 
|  | 218 |  | 
|  | 219 | .. code-block:: c++ | 
|  | 220 |  | 
|  | 221 | Value *CallExprAST::codegen() { | 
|  | 222 | // Look up the name in the global module table. | 
|  | 223 | Function *CalleeF = TheModule->getFunction(Callee); | 
|  | 224 | if (!CalleeF) | 
|  | 225 | return LogErrorV("Unknown function referenced"); | 
|  | 226 |  | 
|  | 227 | // If argument mismatch error. | 
|  | 228 | if (CalleeF->arg_size() != Args.size()) | 
|  | 229 | return LogErrorV("Incorrect # arguments passed"); | 
|  | 230 |  | 
|  | 231 | std::vector<Value *> ArgsV; | 
|  | 232 | for (unsigned i = 0, e = Args.size(); i != e; ++i) { | 
|  | 233 | ArgsV.push_back(Args[i]->codegen()); | 
|  | 234 | if (!ArgsV.back()) | 
|  | 235 | return nullptr; | 
|  | 236 | } | 
|  | 237 |  | 
|  | 238 | return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); | 
|  | 239 | } | 
|  | 240 |  | 
|  | 241 | Code generation for function calls is quite straightforward with LLVM. The code | 
|  | 242 | above initially does a function name lookup in the LLVM Module's symbol table. | 
|  | 243 | Recall that the LLVM Module is the container that holds the functions we are | 
|  | 244 | JIT'ing. By giving each function the same name as what the user specifies, we | 
|  | 245 | can use the LLVM symbol table to resolve function names for us. | 
|  | 246 |  | 
|  | 247 | Once we have the function to call, we recursively codegen each argument | 
|  | 248 | that is to be passed in, and create an LLVM `call | 
|  | 249 | instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C | 
|  | 250 | calling conventions by default, allowing these calls to also call into | 
|  | 251 | standard library functions like "sin" and "cos", with no additional | 
|  | 252 | effort. | 
|  | 253 |  | 
|  | 254 | This wraps up our handling of the four basic expressions that we have so | 
|  | 255 | far in Kaleidoscope. Feel free to go in and add some more. For example, | 
|  | 256 | by browsing the `LLVM language reference <../LangRef.html>`_ you'll find | 
|  | 257 | several other interesting instructions that are really easy to plug into | 
|  | 258 | our basic framework. | 
|  | 259 |  | 
|  | 260 | Function Code Generation | 
|  | 261 | ======================== | 
|  | 262 |  | 
|  | 263 | Code generation for prototypes and functions must handle a number of | 
|  | 264 | details, which make their code less beautiful than expression code | 
|  | 265 | generation, but allows us to illustrate some important points. First, | 
|  | 266 | let's talk about code generation for prototypes: they are used both for | 
|  | 267 | function bodies and external function declarations. The code starts | 
|  | 268 | with: | 
|  | 269 |  | 
|  | 270 | .. code-block:: c++ | 
|  | 271 |  | 
|  | 272 | Function *PrototypeAST::codegen() { | 
|  | 273 | // Make the function type:  double(double,double) etc. | 
|  | 274 | std::vector<Type*> Doubles(Args.size(), | 
|  | 275 | Type::getDoubleTy(TheContext)); | 
|  | 276 | FunctionType *FT = | 
|  | 277 | FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false); | 
|  | 278 |  | 
|  | 279 | Function *F = | 
|  | 280 | Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get()); | 
|  | 281 |  | 
|  | 282 | This code packs a lot of power into a few lines. Note first that this | 
|  | 283 | function returns a "Function\*" instead of a "Value\*". Because a | 
|  | 284 | "prototype" really talks about the external interface for a function | 
|  | 285 | (not the value computed by an expression), it makes sense for it to | 
|  | 286 | return the LLVM Function it corresponds to when codegen'd. | 
|  | 287 |  | 
|  | 288 | The call to ``FunctionType::get`` creates the ``FunctionType`` that | 
|  | 289 | should be used for a given Prototype. Since all function arguments in | 
|  | 290 | Kaleidoscope are of type double, the first line creates a vector of "N" | 
|  | 291 | LLVM double types. It then uses the ``Functiontype::get`` method to | 
|  | 292 | create a function type that takes "N" doubles as arguments, returns one | 
|  | 293 | double as a result, and that is not vararg (the false parameter | 
|  | 294 | indicates this). Note that Types in LLVM are uniqued just like Constants | 
|  | 295 | are, so you don't "new" a type, you "get" it. | 
|  | 296 |  | 
|  | 297 | The final line above actually creates the IR Function corresponding to | 
|  | 298 | the Prototype. This indicates the type, linkage and name to use, as | 
|  | 299 | well as which module to insert into. "`external | 
|  | 300 | linkage <../LangRef.html#linkage>`_" means that the function may be | 
|  | 301 | defined outside the current module and/or that it is callable by | 
|  | 302 | functions outside the module. The Name passed in is the name the user | 
|  | 303 | specified: since "``TheModule``" is specified, this name is registered | 
|  | 304 | in "``TheModule``"s symbol table. | 
|  | 305 |  | 
|  | 306 | .. code-block:: c++ | 
|  | 307 |  | 
|  | 308 | // Set names for all arguments. | 
|  | 309 | unsigned Idx = 0; | 
|  | 310 | for (auto &Arg : F->args()) | 
|  | 311 | Arg.setName(Args[Idx++]); | 
|  | 312 |  | 
|  | 313 | return F; | 
|  | 314 |  | 
|  | 315 | Finally, we set the name of each of the function's arguments according to the | 
|  | 316 | names given in the Prototype. This step isn't strictly necessary, but keeping | 
|  | 317 | the names consistent makes the IR more readable, and allows subsequent code to | 
|  | 318 | refer directly to the arguments for their names, rather than having to look up | 
|  | 319 | them up in the Prototype AST. | 
|  | 320 |  | 
|  | 321 | At this point we have a function prototype with no body. This is how LLVM IR | 
|  | 322 | represents function declarations. For extern statements in Kaleidoscope, this | 
|  | 323 | is as far as we need to go. For function definitions however, we need to | 
|  | 324 | codegen and attach a function body. | 
|  | 325 |  | 
|  | 326 | .. code-block:: c++ | 
|  | 327 |  | 
|  | 328 | Function *FunctionAST::codegen() { | 
|  | 329 | // First, check for an existing function from a previous 'extern' declaration. | 
|  | 330 | Function *TheFunction = TheModule->getFunction(Proto->getName()); | 
|  | 331 |  | 
|  | 332 | if (!TheFunction) | 
|  | 333 | TheFunction = Proto->codegen(); | 
|  | 334 |  | 
|  | 335 | if (!TheFunction) | 
|  | 336 | return nullptr; | 
|  | 337 |  | 
|  | 338 | if (!TheFunction->empty()) | 
|  | 339 | return (Function*)LogErrorV("Function cannot be redefined."); | 
|  | 340 |  | 
|  | 341 |  | 
|  | 342 | For function definitions, we start by searching TheModule's symbol table for an | 
|  | 343 | existing version of this function, in case one has already been created using an | 
|  | 344 | 'extern' statement. If Module::getFunction returns null then no previous version | 
|  | 345 | exists, so we'll codegen one from the Prototype. In either case, we want to | 
|  | 346 | assert that the function is empty (i.e. has no body yet) before we start. | 
|  | 347 |  | 
|  | 348 | .. code-block:: c++ | 
|  | 349 |  | 
|  | 350 | // Create a new basic block to start insertion into. | 
|  | 351 | BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); | 
|  | 352 | Builder.SetInsertPoint(BB); | 
|  | 353 |  | 
|  | 354 | // Record the function arguments in the NamedValues map. | 
|  | 355 | NamedValues.clear(); | 
|  | 356 | for (auto &Arg : TheFunction->args()) | 
|  | 357 | NamedValues[Arg.getName()] = &Arg; | 
|  | 358 |  | 
|  | 359 | Now we get to the point where the ``Builder`` is set up. The first line | 
|  | 360 | creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_ | 
|  | 361 | (named "entry"), which is inserted into ``TheFunction``. The second line | 
|  | 362 | then tells the builder that new instructions should be inserted into the | 
|  | 363 | end of the new basic block. Basic blocks in LLVM are an important part | 
|  | 364 | of functions that define the `Control Flow | 
|  | 365 | Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we | 
|  | 366 | don't have any control flow, our functions will only contain one block | 
|  | 367 | at this point. We'll fix this in `Chapter 5 <LangImpl05.html>`_ :). | 
|  | 368 |  | 
|  | 369 | Next we add the function arguments to the NamedValues map (after first clearing | 
|  | 370 | it out) so that they're accessible to ``VariableExprAST`` nodes. | 
|  | 371 |  | 
|  | 372 | .. code-block:: c++ | 
|  | 373 |  | 
|  | 374 | if (Value *RetVal = Body->codegen()) { | 
|  | 375 | // Finish off the function. | 
|  | 376 | Builder.CreateRet(RetVal); | 
|  | 377 |  | 
|  | 378 | // Validate the generated code, checking for consistency. | 
|  | 379 | verifyFunction(*TheFunction); | 
|  | 380 |  | 
|  | 381 | return TheFunction; | 
|  | 382 | } | 
|  | 383 |  | 
|  | 384 | Once the insertion point has been set up and the NamedValues map populated, | 
|  | 385 | we call the ``codegen()`` method for the root expression of the function. If no | 
|  | 386 | error happens, this emits code to compute the expression into the entry block | 
|  | 387 | and returns the value that was computed. Assuming no error, we then create an | 
|  | 388 | LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function. | 
|  | 389 | Once the function is built, we call ``verifyFunction``, which is | 
|  | 390 | provided by LLVM. This function does a variety of consistency checks on | 
|  | 391 | the generated code, to determine if our compiler is doing everything | 
|  | 392 | right. Using this is important: it can catch a lot of bugs. Once the | 
|  | 393 | function is finished and validated, we return it. | 
|  | 394 |  | 
|  | 395 | .. code-block:: c++ | 
|  | 396 |  | 
|  | 397 | // Error reading body, remove function. | 
|  | 398 | TheFunction->eraseFromParent(); | 
|  | 399 | return nullptr; | 
|  | 400 | } | 
|  | 401 |  | 
|  | 402 | The only piece left here is handling of the error case. For simplicity, | 
|  | 403 | we handle this by merely deleting the function we produced with the | 
|  | 404 | ``eraseFromParent`` method. This allows the user to redefine a function | 
|  | 405 | that they incorrectly typed in before: if we didn't delete it, it would | 
|  | 406 | live in the symbol table, with a body, preventing future redefinition. | 
|  | 407 |  | 
|  | 408 | This code does have a bug, though: If the ``FunctionAST::codegen()`` method | 
|  | 409 | finds an existing IR Function, it does not validate its signature against the | 
|  | 410 | definition's own prototype. This means that an earlier 'extern' declaration will | 
|  | 411 | take precedence over the function definition's signature, which can cause | 
|  | 412 | codegen to fail, for instance if the function arguments are named differently. | 
|  | 413 | There are a number of ways to fix this bug, see what you can come up with! Here | 
|  | 414 | is a testcase: | 
|  | 415 |  | 
|  | 416 | :: | 
|  | 417 |  | 
|  | 418 | extern foo(a);     # ok, defines foo. | 
|  | 419 | def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence). | 
|  | 420 |  | 
|  | 421 | Driver Changes and Closing Thoughts | 
|  | 422 | =================================== | 
|  | 423 |  | 
|  | 424 | For now, code generation to LLVM doesn't really get us much, except that | 
|  | 425 | we can look at the pretty IR calls. The sample code inserts calls to | 
|  | 426 | codegen into the "``HandleDefinition``", "``HandleExtern``" etc | 
|  | 427 | functions, and then dumps out the LLVM IR. This gives a nice way to look | 
|  | 428 | at the LLVM IR for simple functions. For example: | 
|  | 429 |  | 
|  | 430 | :: | 
|  | 431 |  | 
|  | 432 | ready> 4+5; | 
|  | 433 | Read top-level expression: | 
|  | 434 | define double @0() { | 
|  | 435 | entry: | 
|  | 436 | ret double 9.000000e+00 | 
|  | 437 | } | 
|  | 438 |  | 
|  | 439 | Note how the parser turns the top-level expression into anonymous | 
|  | 440 | functions for us. This will be handy when we add `JIT | 
|  | 441 | support <LangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the | 
|  | 442 | code is very literally transcribed, no optimizations are being performed | 
|  | 443 | except simple constant folding done by IRBuilder. We will `add | 
|  | 444 | optimizations <LangImpl4.html#trivial-constant-folding>`_ explicitly in the next | 
|  | 445 | chapter. | 
|  | 446 |  | 
|  | 447 | :: | 
|  | 448 |  | 
|  | 449 | ready> def foo(a b) a*a + 2*a*b + b*b; | 
|  | 450 | Read function definition: | 
|  | 451 | define double @foo(double %a, double %b) { | 
|  | 452 | entry: | 
|  | 453 | %multmp = fmul double %a, %a | 
|  | 454 | %multmp1 = fmul double 2.000000e+00, %a | 
|  | 455 | %multmp2 = fmul double %multmp1, %b | 
|  | 456 | %addtmp = fadd double %multmp, %multmp2 | 
|  | 457 | %multmp3 = fmul double %b, %b | 
|  | 458 | %addtmp4 = fadd double %addtmp, %multmp3 | 
|  | 459 | ret double %addtmp4 | 
|  | 460 | } | 
|  | 461 |  | 
|  | 462 | This shows some simple arithmetic. Notice the striking similarity to the | 
|  | 463 | LLVM builder calls that we use to create the instructions. | 
|  | 464 |  | 
|  | 465 | :: | 
|  | 466 |  | 
|  | 467 | ready> def bar(a) foo(a, 4.0) + bar(31337); | 
|  | 468 | Read function definition: | 
|  | 469 | define double @bar(double %a) { | 
|  | 470 | entry: | 
|  | 471 | %calltmp = call double @foo(double %a, double 4.000000e+00) | 
|  | 472 | %calltmp1 = call double @bar(double 3.133700e+04) | 
|  | 473 | %addtmp = fadd double %calltmp, %calltmp1 | 
|  | 474 | ret double %addtmp | 
|  | 475 | } | 
|  | 476 |  | 
|  | 477 | This shows some function calls. Note that this function will take a long | 
|  | 478 | time to execute if you call it. In the future we'll add conditional | 
|  | 479 | control flow to actually make recursion useful :). | 
|  | 480 |  | 
|  | 481 | :: | 
|  | 482 |  | 
|  | 483 | ready> extern cos(x); | 
|  | 484 | Read extern: | 
|  | 485 | declare double @cos(double) | 
|  | 486 |  | 
|  | 487 | ready> cos(1.234); | 
|  | 488 | Read top-level expression: | 
|  | 489 | define double @1() { | 
|  | 490 | entry: | 
|  | 491 | %calltmp = call double @cos(double 1.234000e+00) | 
|  | 492 | ret double %calltmp | 
|  | 493 | } | 
|  | 494 |  | 
|  | 495 | This shows an extern for the libm "cos" function, and a call to it. | 
|  | 496 |  | 
|  | 497 | .. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up | 
|  | 498 | on highlighting this due to the first line. | 
|  | 499 |  | 
|  | 500 | :: | 
|  | 501 |  | 
|  | 502 | ready> ^D | 
|  | 503 | ; ModuleID = 'my cool jit' | 
|  | 504 |  | 
|  | 505 | define double @0() { | 
|  | 506 | entry: | 
|  | 507 | %addtmp = fadd double 4.000000e+00, 5.000000e+00 | 
|  | 508 | ret double %addtmp | 
|  | 509 | } | 
|  | 510 |  | 
|  | 511 | define double @foo(double %a, double %b) { | 
|  | 512 | entry: | 
|  | 513 | %multmp = fmul double %a, %a | 
|  | 514 | %multmp1 = fmul double 2.000000e+00, %a | 
|  | 515 | %multmp2 = fmul double %multmp1, %b | 
|  | 516 | %addtmp = fadd double %multmp, %multmp2 | 
|  | 517 | %multmp3 = fmul double %b, %b | 
|  | 518 | %addtmp4 = fadd double %addtmp, %multmp3 | 
|  | 519 | ret double %addtmp4 | 
|  | 520 | } | 
|  | 521 |  | 
|  | 522 | define double @bar(double %a) { | 
|  | 523 | entry: | 
|  | 524 | %calltmp = call double @foo(double %a, double 4.000000e+00) | 
|  | 525 | %calltmp1 = call double @bar(double 3.133700e+04) | 
|  | 526 | %addtmp = fadd double %calltmp, %calltmp1 | 
|  | 527 | ret double %addtmp | 
|  | 528 | } | 
|  | 529 |  | 
|  | 530 | declare double @cos(double) | 
|  | 531 |  | 
|  | 532 | define double @1() { | 
|  | 533 | entry: | 
|  | 534 | %calltmp = call double @cos(double 1.234000e+00) | 
|  | 535 | ret double %calltmp | 
|  | 536 | } | 
|  | 537 |  | 
|  | 538 | When you quit the current demo (by sending an EOF via CTRL+D on Linux | 
|  | 539 | or CTRL+Z and ENTER on Windows), it dumps out the IR for the entire | 
|  | 540 | module generated. Here you can see the big picture with all the | 
|  | 541 | functions referencing each other. | 
|  | 542 |  | 
|  | 543 | This wraps up the third chapter of the Kaleidoscope tutorial. Up next, | 
|  | 544 | we'll describe how to `add JIT codegen and optimizer | 
|  | 545 | support <LangImpl04.html>`_ to this so we can actually start running | 
|  | 546 | code! | 
|  | 547 |  | 
|  | 548 | Full Code Listing | 
|  | 549 | ================= | 
|  | 550 |  | 
|  | 551 | Here is the complete code listing for our running example, enhanced with | 
|  | 552 | the LLVM code generator. Because this uses the LLVM libraries, we need | 
|  | 553 | to link them in. To do this, we use the | 
|  | 554 | `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform | 
|  | 555 | our makefile/command line about which options to use: | 
|  | 556 |  | 
|  | 557 | .. code-block:: bash | 
|  | 558 |  | 
|  | 559 | # Compile | 
|  | 560 | clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy | 
|  | 561 | # Run | 
|  | 562 | ./toy | 
|  | 563 |  | 
|  | 564 | Here is the code: | 
|  | 565 |  | 
| Hans Wennborg | 147e0dd | 2019-04-11 07:30:56 +0000 | [diff] [blame] | 566 | .. literalinclude:: ../../../examples/Kaleidoscope/Chapter3/toy.cpp | 
| Chris Lattner | d80f118 | 2019-04-07 13:14:23 +0000 | [diff] [blame] | 567 | :language: c++ | 
|  | 568 |  | 
|  | 569 | `Next: Adding JIT and Optimizer Support <LangImpl04.html>`_ | 
|  | 570 |  |