Sean Silva | ee47edf | 2012-12-05 00:26:32 +0000 | [diff] [blame^] | 1 | ======================================== |
| 2 | Kaleidoscope: Code generation to LLVM IR |
| 3 | ======================================== |
| 4 | |
| 5 | .. contents:: |
| 6 | :local: |
| 7 | |
| 8 | Written by `Chris Lattner <mailto:sabre@nondot.org>`_ |
| 9 | |
| 10 | Chapter 3 Introduction |
| 11 | ====================== |
| 12 | |
| 13 | Welcome to Chapter 3 of the "`Implementing a language with |
| 14 | LLVM <index.html>`_" tutorial. This chapter shows you how to transform |
| 15 | the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into |
| 16 | LLVM IR. This will teach you a little bit about how LLVM does things, as |
| 17 | well as demonstrate how easy it is to use. It's much more work to build |
| 18 | a lexer and parser than it is to generate LLVM IR code. :) |
| 19 | |
| 20 | **Please note**: the code in this chapter and later require LLVM 2.2 or |
| 21 | later. LLVM 2.1 and before will not work with it. Also note that you |
| 22 | need to use a version of this tutorial that matches your LLVM release: |
| 23 | If you are using an official LLVM release, use the version of the |
| 24 | documentation included with your release or on the `llvm.org releases |
| 25 | page <http://llvm.org/releases/>`_. |
| 26 | |
| 27 | Code Generation Setup |
| 28 | ===================== |
| 29 | |
| 30 | In order to generate LLVM IR, we want some simple setup to get started. |
| 31 | First we define virtual code generation (codegen) methods in each AST |
| 32 | class: |
| 33 | |
| 34 | .. code-block:: c++ |
| 35 | |
| 36 | /// ExprAST - Base class for all expression nodes. |
| 37 | class ExprAST { |
| 38 | public: |
| 39 | virtual ~ExprAST() {} |
| 40 | virtual Value *Codegen() = 0; |
| 41 | }; |
| 42 | |
| 43 | /// NumberExprAST - Expression class for numeric literals like "1.0". |
| 44 | class NumberExprAST : public ExprAST { |
| 45 | double Val; |
| 46 | public: |
| 47 | NumberExprAST(double val) : Val(val) {} |
| 48 | virtual Value *Codegen(); |
| 49 | }; |
| 50 | ... |
| 51 | |
| 52 | The Codegen() method says to emit IR for that AST node along with all |
| 53 | the things it depends on, and they all return an LLVM Value object. |
| 54 | "Value" is the class used to represent a "`Static Single Assignment |
| 55 | (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ |
| 56 | register" or "SSA value" in LLVM. The most distinct aspect of SSA values |
| 57 | is that their value is computed as the related instruction executes, and |
| 58 | it does not get a new value until (and if) the instruction re-executes. |
| 59 | In other words, there is no way to "change" an SSA value. For more |
| 60 | information, please read up on `Static Single |
| 61 | Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ |
| 62 | - the concepts are really quite natural once you grok them. |
| 63 | |
| 64 | Note that instead of adding virtual methods to the ExprAST class |
| 65 | hierarchy, it could also make sense to use a `visitor |
| 66 | pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other |
| 67 | way to model this. Again, this tutorial won't dwell on good software |
| 68 | engineering practices: for our purposes, adding a virtual method is |
| 69 | simplest. |
| 70 | |
| 71 | The second thing we want is an "Error" method like we used for the |
| 72 | parser, which will be used to report errors found during code generation |
| 73 | (for example, use of an undeclared parameter): |
| 74 | |
| 75 | .. code-block:: c++ |
| 76 | |
| 77 | Value *ErrorV(const char *Str) { Error(Str); return 0; } |
| 78 | |
| 79 | static Module *TheModule; |
| 80 | static IRBuilder<> Builder(getGlobalContext()); |
| 81 | static std::map<std::string, Value*> NamedValues; |
| 82 | |
| 83 | The static variables will be used during code generation. ``TheModule`` |
| 84 | is the LLVM construct that contains all of the functions and global |
| 85 | variables in a chunk of code. In many ways, it is the top-level |
| 86 | structure that the LLVM IR uses to contain code. |
| 87 | |
| 88 | The ``Builder`` object is a helper object that makes it easy to generate |
| 89 | LLVM instructions. Instances of the |
| 90 | ```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_ |
| 91 | class template keep track of the current place to insert instructions |
| 92 | and has methods to create new instructions. |
| 93 | |
| 94 | The ``NamedValues`` map keeps track of which values are defined in the |
| 95 | current scope and what their LLVM representation is. (In other words, it |
| 96 | is a symbol table for the code). In this form of Kaleidoscope, the only |
| 97 | things that can be referenced are function parameters. As such, function |
| 98 | parameters will be in this map when generating code for their function |
| 99 | body. |
| 100 | |
| 101 | With these basics in place, we can start talking about how to generate |
| 102 | code for each expression. Note that this assumes that the ``Builder`` |
| 103 | has been set up to generate code *into* something. For now, we'll assume |
| 104 | that this has already been done, and we'll just use it to emit code. |
| 105 | |
| 106 | Expression Code Generation |
| 107 | ========================== |
| 108 | |
| 109 | Generating LLVM code for expression nodes is very straightforward: less |
| 110 | than 45 lines of commented code for all four of our expression nodes. |
| 111 | First we'll do numeric literals: |
| 112 | |
| 113 | .. code-block:: c++ |
| 114 | |
| 115 | Value *NumberExprAST::Codegen() { |
| 116 | return ConstantFP::get(getGlobalContext(), APFloat(Val)); |
| 117 | } |
| 118 | |
| 119 | In the LLVM IR, numeric constants are represented with the |
| 120 | ``ConstantFP`` class, which holds the numeric value in an ``APFloat`` |
| 121 | internally (``APFloat`` has the capability of holding floating point |
| 122 | constants of Arbitrary Precision). This code basically just creates |
| 123 | and returns a ``ConstantFP``. Note that in the LLVM IR that constants |
| 124 | are all uniqued together and shared. For this reason, the API uses the |
| 125 | "foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)". |
| 126 | |
| 127 | .. code-block:: c++ |
| 128 | |
| 129 | Value *VariableExprAST::Codegen() { |
| 130 | // Look this variable up in the function. |
| 131 | Value *V = NamedValues[Name]; |
| 132 | return V ? V : ErrorV("Unknown variable name"); |
| 133 | } |
| 134 | |
| 135 | References to variables are also quite simple using LLVM. In the simple |
| 136 | version of Kaleidoscope, we assume that the variable has already been |
| 137 | emitted somewhere and its value is available. In practice, the only |
| 138 | values that can be in the ``NamedValues`` map are function arguments. |
| 139 | This code simply checks to see that the specified name is in the map (if |
| 140 | not, an unknown variable is being referenced) and returns the value for |
| 141 | it. In future chapters, we'll add support for `loop induction |
| 142 | variables <LangImpl5.html#for>`_ in the symbol table, and for `local |
| 143 | variables <LangImpl7.html#localvars>`_. |
| 144 | |
| 145 | .. code-block:: c++ |
| 146 | |
| 147 | Value *BinaryExprAST::Codegen() { |
| 148 | Value *L = LHS->Codegen(); |
| 149 | Value *R = RHS->Codegen(); |
| 150 | if (L == 0 || R == 0) return 0; |
| 151 | |
| 152 | switch (Op) { |
| 153 | case '+': return Builder.CreateFAdd(L, R, "addtmp"); |
| 154 | case '-': return Builder.CreateFSub(L, R, "subtmp"); |
| 155 | case '*': return Builder.CreateFMul(L, R, "multmp"); |
| 156 | case '<': |
| 157 | L = Builder.CreateFCmpULT(L, R, "cmptmp"); |
| 158 | // Convert bool 0/1 to double 0.0 or 1.0 |
| 159 | return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), |
| 160 | "booltmp"); |
| 161 | default: return ErrorV("invalid binary operator"); |
| 162 | } |
| 163 | } |
| 164 | |
| 165 | Binary operators start to get more interesting. The basic idea here is |
| 166 | that we recursively emit code for the left-hand side of the expression, |
| 167 | then the right-hand side, then we compute the result of the binary |
| 168 | expression. In this code, we do a simple switch on the opcode to create |
| 169 | the right LLVM instruction. |
| 170 | |
| 171 | In the example above, the LLVM builder class is starting to show its |
| 172 | value. IRBuilder knows where to insert the newly created instruction, |
| 173 | all you have to do is specify what instruction to create (e.g. with |
| 174 | ``CreateFAdd``), which operands to use (``L`` and ``R`` here) and |
| 175 | optionally provide a name for the generated instruction. |
| 176 | |
| 177 | One nice thing about LLVM is that the name is just a hint. For instance, |
| 178 | if the code above emits multiple "addtmp" variables, LLVM will |
| 179 | automatically provide each one with an increasing, unique numeric |
| 180 | suffix. Local value names for instructions are purely optional, but it |
| 181 | makes it much easier to read the IR dumps. |
| 182 | |
| 183 | `LLVM instructions <../LangRef.html#instref>`_ are constrained by strict |
| 184 | rules: for example, the Left and Right operators of an `add |
| 185 | instruction <../LangRef.html#i_add>`_ must have the same type, and the |
| 186 | result type of the add must match the operand types. Because all values |
| 187 | in Kaleidoscope are doubles, this makes for very simple code for add, |
| 188 | sub and mul. |
| 189 | |
| 190 | On the other hand, LLVM specifies that the `fcmp |
| 191 | instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a |
| 192 | one bit integer). The problem with this is that Kaleidoscope wants the |
| 193 | value to be a 0.0 or 1.0 value. In order to get these semantics, we |
| 194 | combine the fcmp instruction with a `uitofp |
| 195 | instruction <../LangRef.html#i_uitofp>`_. This instruction converts its |
| 196 | input integer into a floating point value by treating the input as an |
| 197 | unsigned value. In contrast, if we used the `sitofp |
| 198 | instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator |
| 199 | would return 0.0 and -1.0, depending on the input value. |
| 200 | |
| 201 | .. code-block:: c++ |
| 202 | |
| 203 | Value *CallExprAST::Codegen() { |
| 204 | // Look up the name in the global module table. |
| 205 | Function *CalleeF = TheModule->getFunction(Callee); |
| 206 | if (CalleeF == 0) |
| 207 | return ErrorV("Unknown function referenced"); |
| 208 | |
| 209 | // If argument mismatch error. |
| 210 | if (CalleeF->arg_size() != Args.size()) |
| 211 | return ErrorV("Incorrect # arguments passed"); |
| 212 | |
| 213 | std::vector<Value*> ArgsV; |
| 214 | for (unsigned i = 0, e = Args.size(); i != e; ++i) { |
| 215 | ArgsV.push_back(Args[i]->Codegen()); |
| 216 | if (ArgsV.back() == 0) return 0; |
| 217 | } |
| 218 | |
| 219 | return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); |
| 220 | } |
| 221 | |
| 222 | Code generation for function calls is quite straightforward with LLVM. |
| 223 | The code above initially does a function name lookup in the LLVM |
| 224 | Module's symbol table. Recall that the LLVM Module is the container that |
| 225 | holds all of the functions we are JIT'ing. By giving each function the |
| 226 | same name as what the user specifies, we can use the LLVM symbol table |
| 227 | to resolve function names for us. |
| 228 | |
| 229 | Once we have the function to call, we recursively codegen each argument |
| 230 | that is to be passed in, and create an LLVM `call |
| 231 | instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C |
| 232 | calling conventions by default, allowing these calls to also call into |
| 233 | standard library functions like "sin" and "cos", with no additional |
| 234 | effort. |
| 235 | |
| 236 | This wraps up our handling of the four basic expressions that we have so |
| 237 | far in Kaleidoscope. Feel free to go in and add some more. For example, |
| 238 | by browsing the `LLVM language reference <../LangRef.html>`_ you'll find |
| 239 | several other interesting instructions that are really easy to plug into |
| 240 | our basic framework. |
| 241 | |
| 242 | Function Code Generation |
| 243 | ======================== |
| 244 | |
| 245 | Code generation for prototypes and functions must handle a number of |
| 246 | details, which make their code less beautiful than expression code |
| 247 | generation, but allows us to illustrate some important points. First, |
| 248 | lets talk about code generation for prototypes: they are used both for |
| 249 | function bodies and external function declarations. The code starts |
| 250 | with: |
| 251 | |
| 252 | .. code-block:: c++ |
| 253 | |
| 254 | Function *PrototypeAST::Codegen() { |
| 255 | // Make the function type: double(double,double) etc. |
| 256 | std::vector<Type*> Doubles(Args.size(), |
| 257 | Type::getDoubleTy(getGlobalContext())); |
| 258 | FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), |
| 259 | Doubles, false); |
| 260 | |
| 261 | Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); |
| 262 | |
| 263 | This code packs a lot of power into a few lines. Note first that this |
| 264 | function returns a "Function\*" instead of a "Value\*". Because a |
| 265 | "prototype" really talks about the external interface for a function |
| 266 | (not the value computed by an expression), it makes sense for it to |
| 267 | return the LLVM Function it corresponds to when codegen'd. |
| 268 | |
| 269 | The call to ``FunctionType::get`` creates the ``FunctionType`` that |
| 270 | should be used for a given Prototype. Since all function arguments in |
| 271 | Kaleidoscope are of type double, the first line creates a vector of "N" |
| 272 | LLVM double types. It then uses the ``Functiontype::get`` method to |
| 273 | create a function type that takes "N" doubles as arguments, returns one |
| 274 | double as a result, and that is not vararg (the false parameter |
| 275 | indicates this). Note that Types in LLVM are uniqued just like Constants |
| 276 | are, so you don't "new" a type, you "get" it. |
| 277 | |
| 278 | The final line above actually creates the function that the prototype |
| 279 | will correspond to. This indicates the type, linkage and name to use, as |
| 280 | well as which module to insert into. "`external |
| 281 | linkage <../LangRef.html#linkage>`_" means that the function may be |
| 282 | defined outside the current module and/or that it is callable by |
| 283 | functions outside the module. The Name passed in is the name the user |
| 284 | specified: since "``TheModule``" is specified, this name is registered |
| 285 | in "``TheModule``"s symbol table, which is used by the function call |
| 286 | code above. |
| 287 | |
| 288 | .. code-block:: c++ |
| 289 | |
| 290 | // If F conflicted, there was already something named 'Name'. If it has a |
| 291 | // body, don't allow redefinition or reextern. |
| 292 | if (F->getName() != Name) { |
| 293 | // Delete the one we just made and get the existing one. |
| 294 | F->eraseFromParent(); |
| 295 | F = TheModule->getFunction(Name); |
| 296 | |
| 297 | The Module symbol table works just like the Function symbol table when |
| 298 | it comes to name conflicts: if a new function is created with a name |
| 299 | that was previously added to the symbol table, the new function will get |
| 300 | implicitly renamed when added to the Module. The code above exploits |
| 301 | this fact to determine if there was a previous definition of this |
| 302 | function. |
| 303 | |
| 304 | In Kaleidoscope, I choose to allow redefinitions of functions in two |
| 305 | cases: first, we want to allow 'extern'ing a function more than once, as |
| 306 | long as the prototypes for the externs match (since all arguments have |
| 307 | the same type, we just have to check that the number of arguments |
| 308 | match). Second, we want to allow 'extern'ing a function and then |
| 309 | defining a body for it. This is useful when defining mutually recursive |
| 310 | functions. |
| 311 | |
| 312 | In order to implement this, the code above first checks to see if there |
| 313 | is a collision on the name of the function. If so, it deletes the |
| 314 | function we just created (by calling ``eraseFromParent``) and then |
| 315 | calling ``getFunction`` to get the existing function with the specified |
| 316 | name. Note that many APIs in LLVM have "erase" forms and "remove" forms. |
| 317 | The "remove" form unlinks the object from its parent (e.g. a Function |
| 318 | from a Module) and returns it. The "erase" form unlinks the object and |
| 319 | then deletes it. |
| 320 | |
| 321 | .. code-block:: c++ |
| 322 | |
| 323 | // If F already has a body, reject this. |
| 324 | if (!F->empty()) { |
| 325 | ErrorF("redefinition of function"); |
| 326 | return 0; |
| 327 | } |
| 328 | |
| 329 | // If F took a different number of args, reject. |
| 330 | if (F->arg_size() != Args.size()) { |
| 331 | ErrorF("redefinition of function with different # args"); |
| 332 | return 0; |
| 333 | } |
| 334 | } |
| 335 | |
| 336 | In order to verify the logic above, we first check to see if the |
| 337 | pre-existing function is "empty". In this case, empty means that it has |
| 338 | no basic blocks in it, which means it has no body. If it has no body, it |
| 339 | is a forward declaration. Since we don't allow anything after a full |
| 340 | definition of the function, the code rejects this case. If the previous |
| 341 | reference to a function was an 'extern', we simply verify that the |
| 342 | number of arguments for that definition and this one match up. If not, |
| 343 | we emit an error. |
| 344 | |
| 345 | .. code-block:: c++ |
| 346 | |
| 347 | // Set names for all arguments. |
| 348 | unsigned Idx = 0; |
| 349 | for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); |
| 350 | ++AI, ++Idx) { |
| 351 | AI->setName(Args[Idx]); |
| 352 | |
| 353 | // Add arguments to variable symbol table. |
| 354 | NamedValues[Args[Idx]] = AI; |
| 355 | } |
| 356 | return F; |
| 357 | } |
| 358 | |
| 359 | The last bit of code for prototypes loops over all of the arguments in |
| 360 | the function, setting the name of the LLVM Argument objects to match, |
| 361 | and registering the arguments in the ``NamedValues`` map for future use |
| 362 | by the ``VariableExprAST`` AST node. Once this is set up, it returns the |
| 363 | Function object to the caller. Note that we don't check for conflicting |
| 364 | argument names here (e.g. "extern foo(a b a)"). Doing so would be very |
| 365 | straight-forward with the mechanics we have already used above. |
| 366 | |
| 367 | .. code-block:: c++ |
| 368 | |
| 369 | Function *FunctionAST::Codegen() { |
| 370 | NamedValues.clear(); |
| 371 | |
| 372 | Function *TheFunction = Proto->Codegen(); |
| 373 | if (TheFunction == 0) |
| 374 | return 0; |
| 375 | |
| 376 | Code generation for function definitions starts out simply enough: we |
| 377 | just codegen the prototype (Proto) and verify that it is ok. We then |
| 378 | clear out the ``NamedValues`` map to make sure that there isn't anything |
| 379 | in it from the last function we compiled. Code generation of the |
| 380 | prototype ensures that there is an LLVM Function object that is ready to |
| 381 | go for us. |
| 382 | |
| 383 | .. code-block:: c++ |
| 384 | |
| 385 | // Create a new basic block to start insertion into. |
| 386 | BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); |
| 387 | Builder.SetInsertPoint(BB); |
| 388 | |
| 389 | if (Value *RetVal = Body->Codegen()) { |
| 390 | |
| 391 | Now we get to the point where the ``Builder`` is set up. The first line |
| 392 | creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_ |
| 393 | (named "entry"), which is inserted into ``TheFunction``. The second line |
| 394 | then tells the builder that new instructions should be inserted into the |
| 395 | end of the new basic block. Basic blocks in LLVM are an important part |
| 396 | of functions that define the `Control Flow |
| 397 | Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we |
| 398 | don't have any control flow, our functions will only contain one block |
| 399 | at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :). |
| 400 | |
| 401 | .. code-block:: c++ |
| 402 | |
| 403 | if (Value *RetVal = Body->Codegen()) { |
| 404 | // Finish off the function. |
| 405 | Builder.CreateRet(RetVal); |
| 406 | |
| 407 | // Validate the generated code, checking for consistency. |
| 408 | verifyFunction(*TheFunction); |
| 409 | |
| 410 | return TheFunction; |
| 411 | } |
| 412 | |
| 413 | Once the insertion point is set up, we call the ``CodeGen()`` method for |
| 414 | the root expression of the function. If no error happens, this emits |
| 415 | code to compute the expression into the entry block and returns the |
| 416 | value that was computed. Assuming no error, we then create an LLVM `ret |
| 417 | instruction <../LangRef.html#i_ret>`_, which completes the function. |
| 418 | Once the function is built, we call ``verifyFunction``, which is |
| 419 | provided by LLVM. This function does a variety of consistency checks on |
| 420 | the generated code, to determine if our compiler is doing everything |
| 421 | right. Using this is important: it can catch a lot of bugs. Once the |
| 422 | function is finished and validated, we return it. |
| 423 | |
| 424 | .. code-block:: c++ |
| 425 | |
| 426 | // Error reading body, remove function. |
| 427 | TheFunction->eraseFromParent(); |
| 428 | return 0; |
| 429 | } |
| 430 | |
| 431 | The only piece left here is handling of the error case. For simplicity, |
| 432 | we handle this by merely deleting the function we produced with the |
| 433 | ``eraseFromParent`` method. This allows the user to redefine a function |
| 434 | that they incorrectly typed in before: if we didn't delete it, it would |
| 435 | live in the symbol table, with a body, preventing future redefinition. |
| 436 | |
| 437 | This code does have a bug, though. Since the ``PrototypeAST::Codegen`` |
| 438 | can return a previously defined forward declaration, our code can |
| 439 | actually delete a forward declaration. There are a number of ways to fix |
| 440 | this bug, see what you can come up with! Here is a testcase: |
| 441 | |
| 442 | :: |
| 443 | |
| 444 | extern foo(a b); # ok, defines foo. |
| 445 | def foo(a b) c; # error, 'c' is invalid. |
| 446 | def bar() foo(1, 2); # error, unknown function "foo" |
| 447 | |
| 448 | Driver Changes and Closing Thoughts |
| 449 | =================================== |
| 450 | |
| 451 | For now, code generation to LLVM doesn't really get us much, except that |
| 452 | we can look at the pretty IR calls. The sample code inserts calls to |
| 453 | Codegen into the "``HandleDefinition``", "``HandleExtern``" etc |
| 454 | functions, and then dumps out the LLVM IR. This gives a nice way to look |
| 455 | at the LLVM IR for simple functions. For example: |
| 456 | |
| 457 | :: |
| 458 | |
| 459 | ready> 4+5; |
| 460 | Read top-level expression: |
| 461 | define double @0() { |
| 462 | entry: |
| 463 | ret double 9.000000e+00 |
| 464 | } |
| 465 | |
| 466 | Note how the parser turns the top-level expression into anonymous |
| 467 | functions for us. This will be handy when we add `JIT |
| 468 | support <LangImpl4.html#jit>`_ in the next chapter. Also note that the |
| 469 | code is very literally transcribed, no optimizations are being performed |
| 470 | except simple constant folding done by IRBuilder. We will `add |
| 471 | optimizations <LangImpl4.html#trivialconstfold>`_ explicitly in the next |
| 472 | chapter. |
| 473 | |
| 474 | :: |
| 475 | |
| 476 | ready> def foo(a b) a*a + 2*a*b + b*b; |
| 477 | Read function definition: |
| 478 | define double @foo(double %a, double %b) { |
| 479 | entry: |
| 480 | %multmp = fmul double %a, %a |
| 481 | %multmp1 = fmul double 2.000000e+00, %a |
| 482 | %multmp2 = fmul double %multmp1, %b |
| 483 | %addtmp = fadd double %multmp, %multmp2 |
| 484 | %multmp3 = fmul double %b, %b |
| 485 | %addtmp4 = fadd double %addtmp, %multmp3 |
| 486 | ret double %addtmp4 |
| 487 | } |
| 488 | |
| 489 | This shows some simple arithmetic. Notice the striking similarity to the |
| 490 | LLVM builder calls that we use to create the instructions. |
| 491 | |
| 492 | :: |
| 493 | |
| 494 | ready> def bar(a) foo(a, 4.0) + bar(31337); |
| 495 | Read function definition: |
| 496 | define double @bar(double %a) { |
| 497 | entry: |
| 498 | %calltmp = call double @foo(double %a, double 4.000000e+00) |
| 499 | %calltmp1 = call double @bar(double 3.133700e+04) |
| 500 | %addtmp = fadd double %calltmp, %calltmp1 |
| 501 | ret double %addtmp |
| 502 | } |
| 503 | |
| 504 | This shows some function calls. Note that this function will take a long |
| 505 | time to execute if you call it. In the future we'll add conditional |
| 506 | control flow to actually make recursion useful :). |
| 507 | |
| 508 | :: |
| 509 | |
| 510 | ready> extern cos(x); |
| 511 | Read extern: |
| 512 | declare double @cos(double) |
| 513 | |
| 514 | ready> cos(1.234); |
| 515 | Read top-level expression: |
| 516 | define double @1() { |
| 517 | entry: |
| 518 | %calltmp = call double @cos(double 1.234000e+00) |
| 519 | ret double %calltmp |
| 520 | } |
| 521 | |
| 522 | This shows an extern for the libm "cos" function, and a call to it. |
| 523 | |
| 524 | .. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up |
| 525 | on highlighting this due to the first line. |
| 526 | |
| 527 | :: |
| 528 | |
| 529 | ready> ^D |
| 530 | ; ModuleID = 'my cool jit' |
| 531 | |
| 532 | define double @0() { |
| 533 | entry: |
| 534 | %addtmp = fadd double 4.000000e+00, 5.000000e+00 |
| 535 | ret double %addtmp |
| 536 | } |
| 537 | |
| 538 | define double @foo(double %a, double %b) { |
| 539 | entry: |
| 540 | %multmp = fmul double %a, %a |
| 541 | %multmp1 = fmul double 2.000000e+00, %a |
| 542 | %multmp2 = fmul double %multmp1, %b |
| 543 | %addtmp = fadd double %multmp, %multmp2 |
| 544 | %multmp3 = fmul double %b, %b |
| 545 | %addtmp4 = fadd double %addtmp, %multmp3 |
| 546 | ret double %addtmp4 |
| 547 | } |
| 548 | |
| 549 | define double @bar(double %a) { |
| 550 | entry: |
| 551 | %calltmp = call double @foo(double %a, double 4.000000e+00) |
| 552 | %calltmp1 = call double @bar(double 3.133700e+04) |
| 553 | %addtmp = fadd double %calltmp, %calltmp1 |
| 554 | ret double %addtmp |
| 555 | } |
| 556 | |
| 557 | declare double @cos(double) |
| 558 | |
| 559 | define double @1() { |
| 560 | entry: |
| 561 | %calltmp = call double @cos(double 1.234000e+00) |
| 562 | ret double %calltmp |
| 563 | } |
| 564 | |
| 565 | When you quit the current demo, it dumps out the IR for the entire |
| 566 | module generated. Here you can see the big picture with all the |
| 567 | functions referencing each other. |
| 568 | |
| 569 | This wraps up the third chapter of the Kaleidoscope tutorial. Up next, |
| 570 | we'll describe how to `add JIT codegen and optimizer |
| 571 | support <LangImpl4.html>`_ to this so we can actually start running |
| 572 | code! |
| 573 | |
| 574 | Full Code Listing |
| 575 | ================= |
| 576 | |
| 577 | Here is the complete code listing for our running example, enhanced with |
| 578 | the LLVM code generator. Because this uses the LLVM libraries, we need |
| 579 | to link them in. To do this, we use the |
| 580 | `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform |
| 581 | our makefile/command line about which options to use: |
| 582 | |
| 583 | .. code-block:: bash |
| 584 | |
| 585 | # Compile |
| 586 | clang++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy |
| 587 | # Run |
| 588 | ./toy |
| 589 | |
| 590 | Here is the code: |
| 591 | |
| 592 | .. code-block:: c++ |
| 593 | |
| 594 | // To build this: |
| 595 | // See example below. |
| 596 | |
| 597 | #include "llvm/DerivedTypes.h" |
| 598 | #include "llvm/IRBuilder.h" |
| 599 | #include "llvm/LLVMContext.h" |
| 600 | #include "llvm/Module.h" |
| 601 | #include "llvm/Analysis/Verifier.h" |
| 602 | #include <cstdio> |
| 603 | #include <string> |
| 604 | #include <map> |
| 605 | #include <vector> |
| 606 | using namespace llvm; |
| 607 | |
| 608 | //===----------------------------------------------------------------------===// |
| 609 | // Lexer |
| 610 | //===----------------------------------------------------------------------===// |
| 611 | |
| 612 | // The lexer returns tokens [0-255] if it is an unknown character, otherwise one |
| 613 | // of these for known things. |
| 614 | enum Token { |
| 615 | tok_eof = -1, |
| 616 | |
| 617 | // commands |
| 618 | tok_def = -2, tok_extern = -3, |
| 619 | |
| 620 | // primary |
| 621 | tok_identifier = -4, tok_number = -5 |
| 622 | }; |
| 623 | |
| 624 | static std::string IdentifierStr; // Filled in if tok_identifier |
| 625 | static double NumVal; // Filled in if tok_number |
| 626 | |
| 627 | /// gettok - Return the next token from standard input. |
| 628 | static int gettok() { |
| 629 | static int LastChar = ' '; |
| 630 | |
| 631 | // Skip any whitespace. |
| 632 | while (isspace(LastChar)) |
| 633 | LastChar = getchar(); |
| 634 | |
| 635 | if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* |
| 636 | IdentifierStr = LastChar; |
| 637 | while (isalnum((LastChar = getchar()))) |
| 638 | IdentifierStr += LastChar; |
| 639 | |
| 640 | if (IdentifierStr == "def") return tok_def; |
| 641 | if (IdentifierStr == "extern") return tok_extern; |
| 642 | return tok_identifier; |
| 643 | } |
| 644 | |
| 645 | if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ |
| 646 | std::string NumStr; |
| 647 | do { |
| 648 | NumStr += LastChar; |
| 649 | LastChar = getchar(); |
| 650 | } while (isdigit(LastChar) || LastChar == '.'); |
| 651 | |
| 652 | NumVal = strtod(NumStr.c_str(), 0); |
| 653 | return tok_number; |
| 654 | } |
| 655 | |
| 656 | if (LastChar == '#') { |
| 657 | // Comment until end of line. |
| 658 | do LastChar = getchar(); |
| 659 | while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); |
| 660 | |
| 661 | if (LastChar != EOF) |
| 662 | return gettok(); |
| 663 | } |
| 664 | |
| 665 | // Check for end of file. Don't eat the EOF. |
| 666 | if (LastChar == EOF) |
| 667 | return tok_eof; |
| 668 | |
| 669 | // Otherwise, just return the character as its ascii value. |
| 670 | int ThisChar = LastChar; |
| 671 | LastChar = getchar(); |
| 672 | return ThisChar; |
| 673 | } |
| 674 | |
| 675 | //===----------------------------------------------------------------------===// |
| 676 | // Abstract Syntax Tree (aka Parse Tree) |
| 677 | //===----------------------------------------------------------------------===// |
| 678 | |
| 679 | /// ExprAST - Base class for all expression nodes. |
| 680 | class ExprAST { |
| 681 | public: |
| 682 | virtual ~ExprAST() {} |
| 683 | virtual Value *Codegen() = 0; |
| 684 | }; |
| 685 | |
| 686 | /// NumberExprAST - Expression class for numeric literals like "1.0". |
| 687 | class NumberExprAST : public ExprAST { |
| 688 | double Val; |
| 689 | public: |
| 690 | NumberExprAST(double val) : Val(val) {} |
| 691 | virtual Value *Codegen(); |
| 692 | }; |
| 693 | |
| 694 | /// VariableExprAST - Expression class for referencing a variable, like "a". |
| 695 | class VariableExprAST : public ExprAST { |
| 696 | std::string Name; |
| 697 | public: |
| 698 | VariableExprAST(const std::string &name) : Name(name) {} |
| 699 | virtual Value *Codegen(); |
| 700 | }; |
| 701 | |
| 702 | /// BinaryExprAST - Expression class for a binary operator. |
| 703 | class BinaryExprAST : public ExprAST { |
| 704 | char Op; |
| 705 | ExprAST *LHS, *RHS; |
| 706 | public: |
| 707 | BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) |
| 708 | : Op(op), LHS(lhs), RHS(rhs) {} |
| 709 | virtual Value *Codegen(); |
| 710 | }; |
| 711 | |
| 712 | /// CallExprAST - Expression class for function calls. |
| 713 | class CallExprAST : public ExprAST { |
| 714 | std::string Callee; |
| 715 | std::vector<ExprAST*> Args; |
| 716 | public: |
| 717 | CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) |
| 718 | : Callee(callee), Args(args) {} |
| 719 | virtual Value *Codegen(); |
| 720 | }; |
| 721 | |
| 722 | /// PrototypeAST - This class represents the "prototype" for a function, |
| 723 | /// which captures its name, and its argument names (thus implicitly the number |
| 724 | /// of arguments the function takes). |
| 725 | class PrototypeAST { |
| 726 | std::string Name; |
| 727 | std::vector<std::string> Args; |
| 728 | public: |
| 729 | PrototypeAST(const std::string &name, const std::vector<std::string> &args) |
| 730 | : Name(name), Args(args) {} |
| 731 | |
| 732 | Function *Codegen(); |
| 733 | }; |
| 734 | |
| 735 | /// FunctionAST - This class represents a function definition itself. |
| 736 | class FunctionAST { |
| 737 | PrototypeAST *Proto; |
| 738 | ExprAST *Body; |
| 739 | public: |
| 740 | FunctionAST(PrototypeAST *proto, ExprAST *body) |
| 741 | : Proto(proto), Body(body) {} |
| 742 | |
| 743 | Function *Codegen(); |
| 744 | }; |
| 745 | |
| 746 | //===----------------------------------------------------------------------===// |
| 747 | // Parser |
| 748 | //===----------------------------------------------------------------------===// |
| 749 | |
| 750 | /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current |
| 751 | /// token the parser is looking at. getNextToken reads another token from the |
| 752 | /// lexer and updates CurTok with its results. |
| 753 | static int CurTok; |
| 754 | static int getNextToken() { |
| 755 | return CurTok = gettok(); |
| 756 | } |
| 757 | |
| 758 | /// BinopPrecedence - This holds the precedence for each binary operator that is |
| 759 | /// defined. |
| 760 | static std::map<char, int> BinopPrecedence; |
| 761 | |
| 762 | /// GetTokPrecedence - Get the precedence of the pending binary operator token. |
| 763 | static int GetTokPrecedence() { |
| 764 | if (!isascii(CurTok)) |
| 765 | return -1; |
| 766 | |
| 767 | // Make sure it's a declared binop. |
| 768 | int TokPrec = BinopPrecedence[CurTok]; |
| 769 | if (TokPrec <= 0) return -1; |
| 770 | return TokPrec; |
| 771 | } |
| 772 | |
| 773 | /// Error* - These are little helper functions for error handling. |
| 774 | ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} |
| 775 | PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } |
| 776 | FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } |
| 777 | |
| 778 | static ExprAST *ParseExpression(); |
| 779 | |
| 780 | /// identifierexpr |
| 781 | /// ::= identifier |
| 782 | /// ::= identifier '(' expression* ')' |
| 783 | static ExprAST *ParseIdentifierExpr() { |
| 784 | std::string IdName = IdentifierStr; |
| 785 | |
| 786 | getNextToken(); // eat identifier. |
| 787 | |
| 788 | if (CurTok != '(') // Simple variable ref. |
| 789 | return new VariableExprAST(IdName); |
| 790 | |
| 791 | // Call. |
| 792 | getNextToken(); // eat ( |
| 793 | std::vector<ExprAST*> Args; |
| 794 | if (CurTok != ')') { |
| 795 | while (1) { |
| 796 | ExprAST *Arg = ParseExpression(); |
| 797 | if (!Arg) return 0; |
| 798 | Args.push_back(Arg); |
| 799 | |
| 800 | if (CurTok == ')') break; |
| 801 | |
| 802 | if (CurTok != ',') |
| 803 | return Error("Expected ')' or ',' in argument list"); |
| 804 | getNextToken(); |
| 805 | } |
| 806 | } |
| 807 | |
| 808 | // Eat the ')'. |
| 809 | getNextToken(); |
| 810 | |
| 811 | return new CallExprAST(IdName, Args); |
| 812 | } |
| 813 | |
| 814 | /// numberexpr ::= number |
| 815 | static ExprAST *ParseNumberExpr() { |
| 816 | ExprAST *Result = new NumberExprAST(NumVal); |
| 817 | getNextToken(); // consume the number |
| 818 | return Result; |
| 819 | } |
| 820 | |
| 821 | /// parenexpr ::= '(' expression ')' |
| 822 | static ExprAST *ParseParenExpr() { |
| 823 | getNextToken(); // eat (. |
| 824 | ExprAST *V = ParseExpression(); |
| 825 | if (!V) return 0; |
| 826 | |
| 827 | if (CurTok != ')') |
| 828 | return Error("expected ')'"); |
| 829 | getNextToken(); // eat ). |
| 830 | return V; |
| 831 | } |
| 832 | |
| 833 | /// primary |
| 834 | /// ::= identifierexpr |
| 835 | /// ::= numberexpr |
| 836 | /// ::= parenexpr |
| 837 | static ExprAST *ParsePrimary() { |
| 838 | switch (CurTok) { |
| 839 | default: return Error("unknown token when expecting an expression"); |
| 840 | case tok_identifier: return ParseIdentifierExpr(); |
| 841 | case tok_number: return ParseNumberExpr(); |
| 842 | case '(': return ParseParenExpr(); |
| 843 | } |
| 844 | } |
| 845 | |
| 846 | /// binoprhs |
| 847 | /// ::= ('+' primary)* |
| 848 | static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { |
| 849 | // If this is a binop, find its precedence. |
| 850 | while (1) { |
| 851 | int TokPrec = GetTokPrecedence(); |
| 852 | |
| 853 | // If this is a binop that binds at least as tightly as the current binop, |
| 854 | // consume it, otherwise we are done. |
| 855 | if (TokPrec < ExprPrec) |
| 856 | return LHS; |
| 857 | |
| 858 | // Okay, we know this is a binop. |
| 859 | int BinOp = CurTok; |
| 860 | getNextToken(); // eat binop |
| 861 | |
| 862 | // Parse the primary expression after the binary operator. |
| 863 | ExprAST *RHS = ParsePrimary(); |
| 864 | if (!RHS) return 0; |
| 865 | |
| 866 | // If BinOp binds less tightly with RHS than the operator after RHS, let |
| 867 | // the pending operator take RHS as its LHS. |
| 868 | int NextPrec = GetTokPrecedence(); |
| 869 | if (TokPrec < NextPrec) { |
| 870 | RHS = ParseBinOpRHS(TokPrec+1, RHS); |
| 871 | if (RHS == 0) return 0; |
| 872 | } |
| 873 | |
| 874 | // Merge LHS/RHS. |
| 875 | LHS = new BinaryExprAST(BinOp, LHS, RHS); |
| 876 | } |
| 877 | } |
| 878 | |
| 879 | /// expression |
| 880 | /// ::= primary binoprhs |
| 881 | /// |
| 882 | static ExprAST *ParseExpression() { |
| 883 | ExprAST *LHS = ParsePrimary(); |
| 884 | if (!LHS) return 0; |
| 885 | |
| 886 | return ParseBinOpRHS(0, LHS); |
| 887 | } |
| 888 | |
| 889 | /// prototype |
| 890 | /// ::= id '(' id* ')' |
| 891 | static PrototypeAST *ParsePrototype() { |
| 892 | if (CurTok != tok_identifier) |
| 893 | return ErrorP("Expected function name in prototype"); |
| 894 | |
| 895 | std::string FnName = IdentifierStr; |
| 896 | getNextToken(); |
| 897 | |
| 898 | if (CurTok != '(') |
| 899 | return ErrorP("Expected '(' in prototype"); |
| 900 | |
| 901 | std::vector<std::string> ArgNames; |
| 902 | while (getNextToken() == tok_identifier) |
| 903 | ArgNames.push_back(IdentifierStr); |
| 904 | if (CurTok != ')') |
| 905 | return ErrorP("Expected ')' in prototype"); |
| 906 | |
| 907 | // success. |
| 908 | getNextToken(); // eat ')'. |
| 909 | |
| 910 | return new PrototypeAST(FnName, ArgNames); |
| 911 | } |
| 912 | |
| 913 | /// definition ::= 'def' prototype expression |
| 914 | static FunctionAST *ParseDefinition() { |
| 915 | getNextToken(); // eat def. |
| 916 | PrototypeAST *Proto = ParsePrototype(); |
| 917 | if (Proto == 0) return 0; |
| 918 | |
| 919 | if (ExprAST *E = ParseExpression()) |
| 920 | return new FunctionAST(Proto, E); |
| 921 | return 0; |
| 922 | } |
| 923 | |
| 924 | /// toplevelexpr ::= expression |
| 925 | static FunctionAST *ParseTopLevelExpr() { |
| 926 | if (ExprAST *E = ParseExpression()) { |
| 927 | // Make an anonymous proto. |
| 928 | PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); |
| 929 | return new FunctionAST(Proto, E); |
| 930 | } |
| 931 | return 0; |
| 932 | } |
| 933 | |
| 934 | /// external ::= 'extern' prototype |
| 935 | static PrototypeAST *ParseExtern() { |
| 936 | getNextToken(); // eat extern. |
| 937 | return ParsePrototype(); |
| 938 | } |
| 939 | |
| 940 | //===----------------------------------------------------------------------===// |
| 941 | // Code Generation |
| 942 | //===----------------------------------------------------------------------===// |
| 943 | |
| 944 | static Module *TheModule; |
| 945 | static IRBuilder<> Builder(getGlobalContext()); |
| 946 | static std::map<std::string, Value*> NamedValues; |
| 947 | |
| 948 | Value *ErrorV(const char *Str) { Error(Str); return 0; } |
| 949 | |
| 950 | Value *NumberExprAST::Codegen() { |
| 951 | return ConstantFP::get(getGlobalContext(), APFloat(Val)); |
| 952 | } |
| 953 | |
| 954 | Value *VariableExprAST::Codegen() { |
| 955 | // Look this variable up in the function. |
| 956 | Value *V = NamedValues[Name]; |
| 957 | return V ? V : ErrorV("Unknown variable name"); |
| 958 | } |
| 959 | |
| 960 | Value *BinaryExprAST::Codegen() { |
| 961 | Value *L = LHS->Codegen(); |
| 962 | Value *R = RHS->Codegen(); |
| 963 | if (L == 0 || R == 0) return 0; |
| 964 | |
| 965 | switch (Op) { |
| 966 | case '+': return Builder.CreateFAdd(L, R, "addtmp"); |
| 967 | case '-': return Builder.CreateFSub(L, R, "subtmp"); |
| 968 | case '*': return Builder.CreateFMul(L, R, "multmp"); |
| 969 | case '<': |
| 970 | L = Builder.CreateFCmpULT(L, R, "cmptmp"); |
| 971 | // Convert bool 0/1 to double 0.0 or 1.0 |
| 972 | return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), |
| 973 | "booltmp"); |
| 974 | default: return ErrorV("invalid binary operator"); |
| 975 | } |
| 976 | } |
| 977 | |
| 978 | Value *CallExprAST::Codegen() { |
| 979 | // Look up the name in the global module table. |
| 980 | Function *CalleeF = TheModule->getFunction(Callee); |
| 981 | if (CalleeF == 0) |
| 982 | return ErrorV("Unknown function referenced"); |
| 983 | |
| 984 | // If argument mismatch error. |
| 985 | if (CalleeF->arg_size() != Args.size()) |
| 986 | return ErrorV("Incorrect # arguments passed"); |
| 987 | |
| 988 | std::vector<Value*> ArgsV; |
| 989 | for (unsigned i = 0, e = Args.size(); i != e; ++i) { |
| 990 | ArgsV.push_back(Args[i]->Codegen()); |
| 991 | if (ArgsV.back() == 0) return 0; |
| 992 | } |
| 993 | |
| 994 | return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); |
| 995 | } |
| 996 | |
| 997 | Function *PrototypeAST::Codegen() { |
| 998 | // Make the function type: double(double,double) etc. |
| 999 | std::vector<Type*> Doubles(Args.size(), |
| 1000 | Type::getDoubleTy(getGlobalContext())); |
| 1001 | FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), |
| 1002 | Doubles, false); |
| 1003 | |
| 1004 | Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); |
| 1005 | |
| 1006 | // If F conflicted, there was already something named 'Name'. If it has a |
| 1007 | // body, don't allow redefinition or reextern. |
| 1008 | if (F->getName() != Name) { |
| 1009 | // Delete the one we just made and get the existing one. |
| 1010 | F->eraseFromParent(); |
| 1011 | F = TheModule->getFunction(Name); |
| 1012 | |
| 1013 | // If F already has a body, reject this. |
| 1014 | if (!F->empty()) { |
| 1015 | ErrorF("redefinition of function"); |
| 1016 | return 0; |
| 1017 | } |
| 1018 | |
| 1019 | // If F took a different number of args, reject. |
| 1020 | if (F->arg_size() != Args.size()) { |
| 1021 | ErrorF("redefinition of function with different # args"); |
| 1022 | return 0; |
| 1023 | } |
| 1024 | } |
| 1025 | |
| 1026 | // Set names for all arguments. |
| 1027 | unsigned Idx = 0; |
| 1028 | for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); |
| 1029 | ++AI, ++Idx) { |
| 1030 | AI->setName(Args[Idx]); |
| 1031 | |
| 1032 | // Add arguments to variable symbol table. |
| 1033 | NamedValues[Args[Idx]] = AI; |
| 1034 | } |
| 1035 | |
| 1036 | return F; |
| 1037 | } |
| 1038 | |
| 1039 | Function *FunctionAST::Codegen() { |
| 1040 | NamedValues.clear(); |
| 1041 | |
| 1042 | Function *TheFunction = Proto->Codegen(); |
| 1043 | if (TheFunction == 0) |
| 1044 | return 0; |
| 1045 | |
| 1046 | // Create a new basic block to start insertion into. |
| 1047 | BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); |
| 1048 | Builder.SetInsertPoint(BB); |
| 1049 | |
| 1050 | if (Value *RetVal = Body->Codegen()) { |
| 1051 | // Finish off the function. |
| 1052 | Builder.CreateRet(RetVal); |
| 1053 | |
| 1054 | // Validate the generated code, checking for consistency. |
| 1055 | verifyFunction(*TheFunction); |
| 1056 | |
| 1057 | return TheFunction; |
| 1058 | } |
| 1059 | |
| 1060 | // Error reading body, remove function. |
| 1061 | TheFunction->eraseFromParent(); |
| 1062 | return 0; |
| 1063 | } |
| 1064 | |
| 1065 | //===----------------------------------------------------------------------===// |
| 1066 | // Top-Level parsing and JIT Driver |
| 1067 | //===----------------------------------------------------------------------===// |
| 1068 | |
| 1069 | static void HandleDefinition() { |
| 1070 | if (FunctionAST *F = ParseDefinition()) { |
| 1071 | if (Function *LF = F->Codegen()) { |
| 1072 | fprintf(stderr, "Read function definition:"); |
| 1073 | LF->dump(); |
| 1074 | } |
| 1075 | } else { |
| 1076 | // Skip token for error recovery. |
| 1077 | getNextToken(); |
| 1078 | } |
| 1079 | } |
| 1080 | |
| 1081 | static void HandleExtern() { |
| 1082 | if (PrototypeAST *P = ParseExtern()) { |
| 1083 | if (Function *F = P->Codegen()) { |
| 1084 | fprintf(stderr, "Read extern: "); |
| 1085 | F->dump(); |
| 1086 | } |
| 1087 | } else { |
| 1088 | // Skip token for error recovery. |
| 1089 | getNextToken(); |
| 1090 | } |
| 1091 | } |
| 1092 | |
| 1093 | static void HandleTopLevelExpression() { |
| 1094 | // Evaluate a top-level expression into an anonymous function. |
| 1095 | if (FunctionAST *F = ParseTopLevelExpr()) { |
| 1096 | if (Function *LF = F->Codegen()) { |
| 1097 | fprintf(stderr, "Read top-level expression:"); |
| 1098 | LF->dump(); |
| 1099 | } |
| 1100 | } else { |
| 1101 | // Skip token for error recovery. |
| 1102 | getNextToken(); |
| 1103 | } |
| 1104 | } |
| 1105 | |
| 1106 | /// top ::= definition | external | expression | ';' |
| 1107 | static void MainLoop() { |
| 1108 | while (1) { |
| 1109 | fprintf(stderr, "ready> "); |
| 1110 | switch (CurTok) { |
| 1111 | case tok_eof: return; |
| 1112 | case ';': getNextToken(); break; // ignore top-level semicolons. |
| 1113 | case tok_def: HandleDefinition(); break; |
| 1114 | case tok_extern: HandleExtern(); break; |
| 1115 | default: HandleTopLevelExpression(); break; |
| 1116 | } |
| 1117 | } |
| 1118 | } |
| 1119 | |
| 1120 | //===----------------------------------------------------------------------===// |
| 1121 | // "Library" functions that can be "extern'd" from user code. |
| 1122 | //===----------------------------------------------------------------------===// |
| 1123 | |
| 1124 | /// putchard - putchar that takes a double and returns 0. |
| 1125 | extern "C" |
| 1126 | double putchard(double X) { |
| 1127 | putchar((char)X); |
| 1128 | return 0; |
| 1129 | } |
| 1130 | |
| 1131 | //===----------------------------------------------------------------------===// |
| 1132 | // Main driver code. |
| 1133 | //===----------------------------------------------------------------------===// |
| 1134 | |
| 1135 | int main() { |
| 1136 | LLVMContext &Context = getGlobalContext(); |
| 1137 | |
| 1138 | // Install standard binary operators. |
| 1139 | // 1 is lowest precedence. |
| 1140 | BinopPrecedence['<'] = 10; |
| 1141 | BinopPrecedence['+'] = 20; |
| 1142 | BinopPrecedence['-'] = 20; |
| 1143 | BinopPrecedence['*'] = 40; // highest. |
| 1144 | |
| 1145 | // Prime the first token. |
| 1146 | fprintf(stderr, "ready> "); |
| 1147 | getNextToken(); |
| 1148 | |
| 1149 | // Make the module, which holds all the code. |
| 1150 | TheModule = new Module("my cool jit", Context); |
| 1151 | |
| 1152 | // Run the main "interpreter loop" now. |
| 1153 | MainLoop(); |
| 1154 | |
| 1155 | // Print out all of the generated code. |
| 1156 | TheModule->dump(); |
| 1157 | |
| 1158 | return 0; |
| 1159 | } |
| 1160 | |
| 1161 | `Next: Adding JIT and Optimizer Support <LangImpl4.html>`_ |
| 1162 | |