blob: 01935a443b455ad4f2625853d87c608266151abb [file] [log] [blame]
Sean Silvaee47edf2012-12-05 00:26:32 +00001========================================
2Kaleidoscope: Code generation to LLVM IR
3========================================
4
5.. contents::
6 :local:
7
8Written by `Chris Lattner <mailto:sabre@nondot.org>`_
9
10Chapter 3 Introduction
11======================
12
13Welcome to Chapter 3 of the "`Implementing a language with
14LLVM <index.html>`_" tutorial. This chapter shows you how to transform
15the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into
16LLVM IR. This will teach you a little bit about how LLVM does things, as
17well as demonstrate how easy it is to use. It's much more work to build
18a lexer and parser than it is to generate LLVM IR code. :)
19
20**Please note**: the code in this chapter and later require LLVM 2.2 or
21later. LLVM 2.1 and before will not work with it. Also note that you
22need to use a version of this tutorial that matches your LLVM release:
23If you are using an official LLVM release, use the version of the
24documentation included with your release or on the `llvm.org releases
25page <http://llvm.org/releases/>`_.
26
27Code Generation Setup
28=====================
29
30In order to generate LLVM IR, we want some simple setup to get started.
31First we define virtual code generation (codegen) methods in each AST
32class:
33
34.. code-block:: c++
35
36 /// ExprAST - Base class for all expression nodes.
37 class ExprAST {
38 public:
39 virtual ~ExprAST() {}
40 virtual Value *Codegen() = 0;
41 };
42
43 /// NumberExprAST - Expression class for numeric literals like "1.0".
44 class NumberExprAST : public ExprAST {
45 double Val;
46 public:
47 NumberExprAST(double val) : Val(val) {}
48 virtual Value *Codegen();
49 };
50 ...
51
52The Codegen() method says to emit IR for that AST node along with all
53the things it depends on, and they all return an LLVM Value object.
54"Value" is the class used to represent a "`Static Single Assignment
55(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
56register" or "SSA value" in LLVM. The most distinct aspect of SSA values
57is that their value is computed as the related instruction executes, and
58it does not get a new value until (and if) the instruction re-executes.
59In other words, there is no way to "change" an SSA value. For more
60information, please read up on `Static Single
61Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
62- the concepts are really quite natural once you grok them.
63
64Note that instead of adding virtual methods to the ExprAST class
65hierarchy, it could also make sense to use a `visitor
66pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other
67way to model this. Again, this tutorial won't dwell on good software
68engineering practices: for our purposes, adding a virtual method is
69simplest.
70
71The second thing we want is an "Error" method like we used for the
72parser, which will be used to report errors found during code generation
73(for example, use of an undeclared parameter):
74
75.. code-block:: c++
76
77 Value *ErrorV(const char *Str) { Error(Str); return 0; }
78
79 static Module *TheModule;
80 static IRBuilder<> Builder(getGlobalContext());
81 static std::map<std::string, Value*> NamedValues;
82
83The static variables will be used during code generation. ``TheModule``
84is the LLVM construct that contains all of the functions and global
85variables in a chunk of code. In many ways, it is the top-level
86structure that the LLVM IR uses to contain code.
87
88The ``Builder`` object is a helper object that makes it easy to generate
89LLVM instructions. Instances of the
90```IRBuilder`` <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
91class template keep track of the current place to insert instructions
92and has methods to create new instructions.
93
94The ``NamedValues`` map keeps track of which values are defined in the
95current scope and what their LLVM representation is. (In other words, it
96is a symbol table for the code). In this form of Kaleidoscope, the only
97things that can be referenced are function parameters. As such, function
98parameters will be in this map when generating code for their function
99body.
100
101With these basics in place, we can start talking about how to generate
102code for each expression. Note that this assumes that the ``Builder``
103has been set up to generate code *into* something. For now, we'll assume
104that this has already been done, and we'll just use it to emit code.
105
106Expression Code Generation
107==========================
108
109Generating LLVM code for expression nodes is very straightforward: less
110than 45 lines of commented code for all four of our expression nodes.
111First we'll do numeric literals:
112
113.. code-block:: c++
114
115 Value *NumberExprAST::Codegen() {
116 return ConstantFP::get(getGlobalContext(), APFloat(Val));
117 }
118
119In the LLVM IR, numeric constants are represented with the
120``ConstantFP`` class, which holds the numeric value in an ``APFloat``
121internally (``APFloat`` has the capability of holding floating point
122constants of Arbitrary Precision). This code basically just creates
123and returns a ``ConstantFP``. Note that in the LLVM IR that constants
124are all uniqued together and shared. For this reason, the API uses the
125"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".
126
127.. code-block:: c++
128
129 Value *VariableExprAST::Codegen() {
130 // Look this variable up in the function.
131 Value *V = NamedValues[Name];
132 return V ? V : ErrorV("Unknown variable name");
133 }
134
135References to variables are also quite simple using LLVM. In the simple
136version of Kaleidoscope, we assume that the variable has already been
137emitted somewhere and its value is available. In practice, the only
138values that can be in the ``NamedValues`` map are function arguments.
139This code simply checks to see that the specified name is in the map (if
140not, an unknown variable is being referenced) and returns the value for
141it. In future chapters, we'll add support for `loop induction
142variables <LangImpl5.html#for>`_ in the symbol table, and for `local
143variables <LangImpl7.html#localvars>`_.
144
145.. code-block:: c++
146
147 Value *BinaryExprAST::Codegen() {
148 Value *L = LHS->Codegen();
149 Value *R = RHS->Codegen();
150 if (L == 0 || R == 0) return 0;
151
152 switch (Op) {
153 case '+': return Builder.CreateFAdd(L, R, "addtmp");
154 case '-': return Builder.CreateFSub(L, R, "subtmp");
155 case '*': return Builder.CreateFMul(L, R, "multmp");
156 case '<':
157 L = Builder.CreateFCmpULT(L, R, "cmptmp");
158 // Convert bool 0/1 to double 0.0 or 1.0
159 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
160 "booltmp");
161 default: return ErrorV("invalid binary operator");
162 }
163 }
164
165Binary operators start to get more interesting. The basic idea here is
166that we recursively emit code for the left-hand side of the expression,
167then the right-hand side, then we compute the result of the binary
168expression. In this code, we do a simple switch on the opcode to create
169the right LLVM instruction.
170
171In the example above, the LLVM builder class is starting to show its
172value. IRBuilder knows where to insert the newly created instruction,
173all you have to do is specify what instruction to create (e.g. with
174``CreateFAdd``), which operands to use (``L`` and ``R`` here) and
175optionally provide a name for the generated instruction.
176
177One nice thing about LLVM is that the name is just a hint. For instance,
178if the code above emits multiple "addtmp" variables, LLVM will
179automatically provide each one with an increasing, unique numeric
180suffix. Local value names for instructions are purely optional, but it
181makes it much easier to read the IR dumps.
182
183`LLVM instructions <../LangRef.html#instref>`_ are constrained by strict
184rules: for example, the Left and Right operators of an `add
185instruction <../LangRef.html#i_add>`_ must have the same type, and the
186result type of the add must match the operand types. Because all values
187in Kaleidoscope are doubles, this makes for very simple code for add,
188sub and mul.
189
190On the other hand, LLVM specifies that the `fcmp
191instruction <../LangRef.html#i_fcmp>`_ always returns an 'i1' value (a
192one bit integer). The problem with this is that Kaleidoscope wants the
193value to be a 0.0 or 1.0 value. In order to get these semantics, we
194combine the fcmp instruction with a `uitofp
195instruction <../LangRef.html#i_uitofp>`_. This instruction converts its
196input integer into a floating point value by treating the input as an
197unsigned value. In contrast, if we used the `sitofp
198instruction <../LangRef.html#i_sitofp>`_, the Kaleidoscope '<' operator
199would return 0.0 and -1.0, depending on the input value.
200
201.. code-block:: c++
202
203 Value *CallExprAST::Codegen() {
204 // Look up the name in the global module table.
205 Function *CalleeF = TheModule->getFunction(Callee);
206 if (CalleeF == 0)
207 return ErrorV("Unknown function referenced");
208
209 // If argument mismatch error.
210 if (CalleeF->arg_size() != Args.size())
211 return ErrorV("Incorrect # arguments passed");
212
213 std::vector<Value*> ArgsV;
214 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
215 ArgsV.push_back(Args[i]->Codegen());
216 if (ArgsV.back() == 0) return 0;
217 }
218
219 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
220 }
221
222Code generation for function calls is quite straightforward with LLVM.
223The code above initially does a function name lookup in the LLVM
224Module's symbol table. Recall that the LLVM Module is the container that
225holds all of the functions we are JIT'ing. By giving each function the
226same name as what the user specifies, we can use the LLVM symbol table
227to resolve function names for us.
228
229Once we have the function to call, we recursively codegen each argument
230that is to be passed in, and create an LLVM `call
231instruction <../LangRef.html#i_call>`_. Note that LLVM uses the native C
232calling conventions by default, allowing these calls to also call into
233standard library functions like "sin" and "cos", with no additional
234effort.
235
236This wraps up our handling of the four basic expressions that we have so
237far in Kaleidoscope. Feel free to go in and add some more. For example,
238by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
239several other interesting instructions that are really easy to plug into
240our basic framework.
241
242Function Code Generation
243========================
244
245Code generation for prototypes and functions must handle a number of
246details, which make their code less beautiful than expression code
247generation, but allows us to illustrate some important points. First,
248lets talk about code generation for prototypes: they are used both for
249function bodies and external function declarations. The code starts
250with:
251
252.. code-block:: c++
253
254 Function *PrototypeAST::Codegen() {
255 // Make the function type: double(double,double) etc.
256 std::vector<Type*> Doubles(Args.size(),
257 Type::getDoubleTy(getGlobalContext()));
258 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
259 Doubles, false);
260
261 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
262
263This code packs a lot of power into a few lines. Note first that this
264function returns a "Function\*" instead of a "Value\*". Because a
265"prototype" really talks about the external interface for a function
266(not the value computed by an expression), it makes sense for it to
267return the LLVM Function it corresponds to when codegen'd.
268
269The call to ``FunctionType::get`` creates the ``FunctionType`` that
270should be used for a given Prototype. Since all function arguments in
271Kaleidoscope are of type double, the first line creates a vector of "N"
272LLVM double types. It then uses the ``Functiontype::get`` method to
273create a function type that takes "N" doubles as arguments, returns one
274double as a result, and that is not vararg (the false parameter
275indicates this). Note that Types in LLVM are uniqued just like Constants
276are, so you don't "new" a type, you "get" it.
277
278The final line above actually creates the function that the prototype
279will correspond to. This indicates the type, linkage and name to use, as
280well as which module to insert into. "`external
281linkage <../LangRef.html#linkage>`_" means that the function may be
282defined outside the current module and/or that it is callable by
283functions outside the module. The Name passed in is the name the user
284specified: since "``TheModule``" is specified, this name is registered
285in "``TheModule``"s symbol table, which is used by the function call
286code above.
287
288.. code-block:: c++
289
290 // If F conflicted, there was already something named 'Name'. If it has a
291 // body, don't allow redefinition or reextern.
292 if (F->getName() != Name) {
293 // Delete the one we just made and get the existing one.
294 F->eraseFromParent();
295 F = TheModule->getFunction(Name);
296
297The Module symbol table works just like the Function symbol table when
298it comes to name conflicts: if a new function is created with a name
299that was previously added to the symbol table, the new function will get
300implicitly renamed when added to the Module. The code above exploits
301this fact to determine if there was a previous definition of this
302function.
303
304In Kaleidoscope, I choose to allow redefinitions of functions in two
305cases: first, we want to allow 'extern'ing a function more than once, as
306long as the prototypes for the externs match (since all arguments have
307the same type, we just have to check that the number of arguments
308match). Second, we want to allow 'extern'ing a function and then
309defining a body for it. This is useful when defining mutually recursive
310functions.
311
312In order to implement this, the code above first checks to see if there
313is a collision on the name of the function. If so, it deletes the
314function we just created (by calling ``eraseFromParent``) and then
315calling ``getFunction`` to get the existing function with the specified
316name. Note that many APIs in LLVM have "erase" forms and "remove" forms.
317The "remove" form unlinks the object from its parent (e.g. a Function
318from a Module) and returns it. The "erase" form unlinks the object and
319then deletes it.
320
321.. code-block:: c++
322
323 // If F already has a body, reject this.
324 if (!F->empty()) {
325 ErrorF("redefinition of function");
326 return 0;
327 }
328
329 // If F took a different number of args, reject.
330 if (F->arg_size() != Args.size()) {
331 ErrorF("redefinition of function with different # args");
332 return 0;
333 }
334 }
335
336In order to verify the logic above, we first check to see if the
337pre-existing function is "empty". In this case, empty means that it has
338no basic blocks in it, which means it has no body. If it has no body, it
339is a forward declaration. Since we don't allow anything after a full
340definition of the function, the code rejects this case. If the previous
341reference to a function was an 'extern', we simply verify that the
342number of arguments for that definition and this one match up. If not,
343we emit an error.
344
345.. code-block:: c++
346
347 // Set names for all arguments.
348 unsigned Idx = 0;
349 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
350 ++AI, ++Idx) {
351 AI->setName(Args[Idx]);
352
353 // Add arguments to variable symbol table.
354 NamedValues[Args[Idx]] = AI;
355 }
356 return F;
357 }
358
359The last bit of code for prototypes loops over all of the arguments in
360the function, setting the name of the LLVM Argument objects to match,
361and registering the arguments in the ``NamedValues`` map for future use
362by the ``VariableExprAST`` AST node. Once this is set up, it returns the
363Function object to the caller. Note that we don't check for conflicting
364argument names here (e.g. "extern foo(a b a)"). Doing so would be very
365straight-forward with the mechanics we have already used above.
366
367.. code-block:: c++
368
369 Function *FunctionAST::Codegen() {
370 NamedValues.clear();
371
372 Function *TheFunction = Proto->Codegen();
373 if (TheFunction == 0)
374 return 0;
375
376Code generation for function definitions starts out simply enough: we
377just codegen the prototype (Proto) and verify that it is ok. We then
378clear out the ``NamedValues`` map to make sure that there isn't anything
379in it from the last function we compiled. Code generation of the
380prototype ensures that there is an LLVM Function object that is ready to
381go for us.
382
383.. code-block:: c++
384
385 // Create a new basic block to start insertion into.
386 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
387 Builder.SetInsertPoint(BB);
388
389 if (Value *RetVal = Body->Codegen()) {
390
391Now we get to the point where the ``Builder`` is set up. The first line
392creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_
393(named "entry"), which is inserted into ``TheFunction``. The second line
394then tells the builder that new instructions should be inserted into the
395end of the new basic block. Basic blocks in LLVM are an important part
396of functions that define the `Control Flow
397Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
398don't have any control flow, our functions will only contain one block
399at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :).
400
401.. code-block:: c++
402
403 if (Value *RetVal = Body->Codegen()) {
404 // Finish off the function.
405 Builder.CreateRet(RetVal);
406
407 // Validate the generated code, checking for consistency.
408 verifyFunction(*TheFunction);
409
410 return TheFunction;
411 }
412
413Once the insertion point is set up, we call the ``CodeGen()`` method for
414the root expression of the function. If no error happens, this emits
415code to compute the expression into the entry block and returns the
416value that was computed. Assuming no error, we then create an LLVM `ret
417instruction <../LangRef.html#i_ret>`_, which completes the function.
418Once the function is built, we call ``verifyFunction``, which is
419provided by LLVM. This function does a variety of consistency checks on
420the generated code, to determine if our compiler is doing everything
421right. Using this is important: it can catch a lot of bugs. Once the
422function is finished and validated, we return it.
423
424.. code-block:: c++
425
426 // Error reading body, remove function.
427 TheFunction->eraseFromParent();
428 return 0;
429 }
430
431The only piece left here is handling of the error case. For simplicity,
432we handle this by merely deleting the function we produced with the
433``eraseFromParent`` method. This allows the user to redefine a function
434that they incorrectly typed in before: if we didn't delete it, it would
435live in the symbol table, with a body, preventing future redefinition.
436
437This code does have a bug, though. Since the ``PrototypeAST::Codegen``
438can return a previously defined forward declaration, our code can
439actually delete a forward declaration. There are a number of ways to fix
440this bug, see what you can come up with! Here is a testcase:
441
442::
443
444 extern foo(a b); # ok, defines foo.
445 def foo(a b) c; # error, 'c' is invalid.
446 def bar() foo(1, 2); # error, unknown function "foo"
447
448Driver Changes and Closing Thoughts
449===================================
450
451For now, code generation to LLVM doesn't really get us much, except that
452we can look at the pretty IR calls. The sample code inserts calls to
453Codegen into the "``HandleDefinition``", "``HandleExtern``" etc
454functions, and then dumps out the LLVM IR. This gives a nice way to look
455at the LLVM IR for simple functions. For example:
456
457::
458
459 ready> 4+5;
460 Read top-level expression:
461 define double @0() {
462 entry:
463 ret double 9.000000e+00
464 }
465
466Note how the parser turns the top-level expression into anonymous
467functions for us. This will be handy when we add `JIT
468support <LangImpl4.html#jit>`_ in the next chapter. Also note that the
469code is very literally transcribed, no optimizations are being performed
470except simple constant folding done by IRBuilder. We will `add
471optimizations <LangImpl4.html#trivialconstfold>`_ explicitly in the next
472chapter.
473
474::
475
476 ready> def foo(a b) a*a + 2*a*b + b*b;
477 Read function definition:
478 define double @foo(double %a, double %b) {
479 entry:
480 %multmp = fmul double %a, %a
481 %multmp1 = fmul double 2.000000e+00, %a
482 %multmp2 = fmul double %multmp1, %b
483 %addtmp = fadd double %multmp, %multmp2
484 %multmp3 = fmul double %b, %b
485 %addtmp4 = fadd double %addtmp, %multmp3
486 ret double %addtmp4
487 }
488
489This shows some simple arithmetic. Notice the striking similarity to the
490LLVM builder calls that we use to create the instructions.
491
492::
493
494 ready> def bar(a) foo(a, 4.0) + bar(31337);
495 Read function definition:
496 define double @bar(double %a) {
497 entry:
498 %calltmp = call double @foo(double %a, double 4.000000e+00)
499 %calltmp1 = call double @bar(double 3.133700e+04)
500 %addtmp = fadd double %calltmp, %calltmp1
501 ret double %addtmp
502 }
503
504This shows some function calls. Note that this function will take a long
505time to execute if you call it. In the future we'll add conditional
506control flow to actually make recursion useful :).
507
508::
509
510 ready> extern cos(x);
511 Read extern:
512 declare double @cos(double)
513
514 ready> cos(1.234);
515 Read top-level expression:
516 define double @1() {
517 entry:
518 %calltmp = call double @cos(double 1.234000e+00)
519 ret double %calltmp
520 }
521
522This shows an extern for the libm "cos" function, and a call to it.
523
524.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up
525 on highlighting this due to the first line.
526
527::
528
529 ready> ^D
530 ; ModuleID = 'my cool jit'
531
532 define double @0() {
533 entry:
534 %addtmp = fadd double 4.000000e+00, 5.000000e+00
535 ret double %addtmp
536 }
537
538 define double @foo(double %a, double %b) {
539 entry:
540 %multmp = fmul double %a, %a
541 %multmp1 = fmul double 2.000000e+00, %a
542 %multmp2 = fmul double %multmp1, %b
543 %addtmp = fadd double %multmp, %multmp2
544 %multmp3 = fmul double %b, %b
545 %addtmp4 = fadd double %addtmp, %multmp3
546 ret double %addtmp4
547 }
548
549 define double @bar(double %a) {
550 entry:
551 %calltmp = call double @foo(double %a, double 4.000000e+00)
552 %calltmp1 = call double @bar(double 3.133700e+04)
553 %addtmp = fadd double %calltmp, %calltmp1
554 ret double %addtmp
555 }
556
557 declare double @cos(double)
558
559 define double @1() {
560 entry:
561 %calltmp = call double @cos(double 1.234000e+00)
562 ret double %calltmp
563 }
564
565When you quit the current demo, it dumps out the IR for the entire
566module generated. Here you can see the big picture with all the
567functions referencing each other.
568
569This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
570we'll describe how to `add JIT codegen and optimizer
571support <LangImpl4.html>`_ to this so we can actually start running
572code!
573
574Full Code Listing
575=================
576
577Here is the complete code listing for our running example, enhanced with
578the LLVM code generator. Because this uses the LLVM libraries, we need
579to link them in. To do this, we use the
580`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
581our makefile/command line about which options to use:
582
583.. code-block:: bash
584
585 # Compile
586 clang++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
587 # Run
588 ./toy
589
590Here is the code:
591
592.. code-block:: c++
593
594 // To build this:
595 // See example below.
596
597 #include "llvm/DerivedTypes.h"
598 #include "llvm/IRBuilder.h"
599 #include "llvm/LLVMContext.h"
600 #include "llvm/Module.h"
601 #include "llvm/Analysis/Verifier.h"
602 #include <cstdio>
603 #include <string>
604 #include <map>
605 #include <vector>
606 using namespace llvm;
607
608 //===----------------------------------------------------------------------===//
609 // Lexer
610 //===----------------------------------------------------------------------===//
611
612 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
613 // of these for known things.
614 enum Token {
615 tok_eof = -1,
616
617 // commands
618 tok_def = -2, tok_extern = -3,
619
620 // primary
621 tok_identifier = -4, tok_number = -5
622 };
623
624 static std::string IdentifierStr; // Filled in if tok_identifier
625 static double NumVal; // Filled in if tok_number
626
627 /// gettok - Return the next token from standard input.
628 static int gettok() {
629 static int LastChar = ' ';
630
631 // Skip any whitespace.
632 while (isspace(LastChar))
633 LastChar = getchar();
634
635 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
636 IdentifierStr = LastChar;
637 while (isalnum((LastChar = getchar())))
638 IdentifierStr += LastChar;
639
640 if (IdentifierStr == "def") return tok_def;
641 if (IdentifierStr == "extern") return tok_extern;
642 return tok_identifier;
643 }
644
645 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
646 std::string NumStr;
647 do {
648 NumStr += LastChar;
649 LastChar = getchar();
650 } while (isdigit(LastChar) || LastChar == '.');
651
652 NumVal = strtod(NumStr.c_str(), 0);
653 return tok_number;
654 }
655
656 if (LastChar == '#') {
657 // Comment until end of line.
658 do LastChar = getchar();
659 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
660
661 if (LastChar != EOF)
662 return gettok();
663 }
664
665 // Check for end of file. Don't eat the EOF.
666 if (LastChar == EOF)
667 return tok_eof;
668
669 // Otherwise, just return the character as its ascii value.
670 int ThisChar = LastChar;
671 LastChar = getchar();
672 return ThisChar;
673 }
674
675 //===----------------------------------------------------------------------===//
676 // Abstract Syntax Tree (aka Parse Tree)
677 //===----------------------------------------------------------------------===//
678
679 /// ExprAST - Base class for all expression nodes.
680 class ExprAST {
681 public:
682 virtual ~ExprAST() {}
683 virtual Value *Codegen() = 0;
684 };
685
686 /// NumberExprAST - Expression class for numeric literals like "1.0".
687 class NumberExprAST : public ExprAST {
688 double Val;
689 public:
690 NumberExprAST(double val) : Val(val) {}
691 virtual Value *Codegen();
692 };
693
694 /// VariableExprAST - Expression class for referencing a variable, like "a".
695 class VariableExprAST : public ExprAST {
696 std::string Name;
697 public:
698 VariableExprAST(const std::string &name) : Name(name) {}
699 virtual Value *Codegen();
700 };
701
702 /// BinaryExprAST - Expression class for a binary operator.
703 class BinaryExprAST : public ExprAST {
704 char Op;
705 ExprAST *LHS, *RHS;
706 public:
707 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
708 : Op(op), LHS(lhs), RHS(rhs) {}
709 virtual Value *Codegen();
710 };
711
712 /// CallExprAST - Expression class for function calls.
713 class CallExprAST : public ExprAST {
714 std::string Callee;
715 std::vector<ExprAST*> Args;
716 public:
717 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
718 : Callee(callee), Args(args) {}
719 virtual Value *Codegen();
720 };
721
722 /// PrototypeAST - This class represents the "prototype" for a function,
723 /// which captures its name, and its argument names (thus implicitly the number
724 /// of arguments the function takes).
725 class PrototypeAST {
726 std::string Name;
727 std::vector<std::string> Args;
728 public:
729 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
730 : Name(name), Args(args) {}
731
732 Function *Codegen();
733 };
734
735 /// FunctionAST - This class represents a function definition itself.
736 class FunctionAST {
737 PrototypeAST *Proto;
738 ExprAST *Body;
739 public:
740 FunctionAST(PrototypeAST *proto, ExprAST *body)
741 : Proto(proto), Body(body) {}
742
743 Function *Codegen();
744 };
745
746 //===----------------------------------------------------------------------===//
747 // Parser
748 //===----------------------------------------------------------------------===//
749
750 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
751 /// token the parser is looking at. getNextToken reads another token from the
752 /// lexer and updates CurTok with its results.
753 static int CurTok;
754 static int getNextToken() {
755 return CurTok = gettok();
756 }
757
758 /// BinopPrecedence - This holds the precedence for each binary operator that is
759 /// defined.
760 static std::map<char, int> BinopPrecedence;
761
762 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
763 static int GetTokPrecedence() {
764 if (!isascii(CurTok))
765 return -1;
766
767 // Make sure it's a declared binop.
768 int TokPrec = BinopPrecedence[CurTok];
769 if (TokPrec <= 0) return -1;
770 return TokPrec;
771 }
772
773 /// Error* - These are little helper functions for error handling.
774 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
775 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
776 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
777
778 static ExprAST *ParseExpression();
779
780 /// identifierexpr
781 /// ::= identifier
782 /// ::= identifier '(' expression* ')'
783 static ExprAST *ParseIdentifierExpr() {
784 std::string IdName = IdentifierStr;
785
786 getNextToken(); // eat identifier.
787
788 if (CurTok != '(') // Simple variable ref.
789 return new VariableExprAST(IdName);
790
791 // Call.
792 getNextToken(); // eat (
793 std::vector<ExprAST*> Args;
794 if (CurTok != ')') {
795 while (1) {
796 ExprAST *Arg = ParseExpression();
797 if (!Arg) return 0;
798 Args.push_back(Arg);
799
800 if (CurTok == ')') break;
801
802 if (CurTok != ',')
803 return Error("Expected ')' or ',' in argument list");
804 getNextToken();
805 }
806 }
807
808 // Eat the ')'.
809 getNextToken();
810
811 return new CallExprAST(IdName, Args);
812 }
813
814 /// numberexpr ::= number
815 static ExprAST *ParseNumberExpr() {
816 ExprAST *Result = new NumberExprAST(NumVal);
817 getNextToken(); // consume the number
818 return Result;
819 }
820
821 /// parenexpr ::= '(' expression ')'
822 static ExprAST *ParseParenExpr() {
823 getNextToken(); // eat (.
824 ExprAST *V = ParseExpression();
825 if (!V) return 0;
826
827 if (CurTok != ')')
828 return Error("expected ')'");
829 getNextToken(); // eat ).
830 return V;
831 }
832
833 /// primary
834 /// ::= identifierexpr
835 /// ::= numberexpr
836 /// ::= parenexpr
837 static ExprAST *ParsePrimary() {
838 switch (CurTok) {
839 default: return Error("unknown token when expecting an expression");
840 case tok_identifier: return ParseIdentifierExpr();
841 case tok_number: return ParseNumberExpr();
842 case '(': return ParseParenExpr();
843 }
844 }
845
846 /// binoprhs
847 /// ::= ('+' primary)*
848 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
849 // If this is a binop, find its precedence.
850 while (1) {
851 int TokPrec = GetTokPrecedence();
852
853 // If this is a binop that binds at least as tightly as the current binop,
854 // consume it, otherwise we are done.
855 if (TokPrec < ExprPrec)
856 return LHS;
857
858 // Okay, we know this is a binop.
859 int BinOp = CurTok;
860 getNextToken(); // eat binop
861
862 // Parse the primary expression after the binary operator.
863 ExprAST *RHS = ParsePrimary();
864 if (!RHS) return 0;
865
866 // If BinOp binds less tightly with RHS than the operator after RHS, let
867 // the pending operator take RHS as its LHS.
868 int NextPrec = GetTokPrecedence();
869 if (TokPrec < NextPrec) {
870 RHS = ParseBinOpRHS(TokPrec+1, RHS);
871 if (RHS == 0) return 0;
872 }
873
874 // Merge LHS/RHS.
875 LHS = new BinaryExprAST(BinOp, LHS, RHS);
876 }
877 }
878
879 /// expression
880 /// ::= primary binoprhs
881 ///
882 static ExprAST *ParseExpression() {
883 ExprAST *LHS = ParsePrimary();
884 if (!LHS) return 0;
885
886 return ParseBinOpRHS(0, LHS);
887 }
888
889 /// prototype
890 /// ::= id '(' id* ')'
891 static PrototypeAST *ParsePrototype() {
892 if (CurTok != tok_identifier)
893 return ErrorP("Expected function name in prototype");
894
895 std::string FnName = IdentifierStr;
896 getNextToken();
897
898 if (CurTok != '(')
899 return ErrorP("Expected '(' in prototype");
900
901 std::vector<std::string> ArgNames;
902 while (getNextToken() == tok_identifier)
903 ArgNames.push_back(IdentifierStr);
904 if (CurTok != ')')
905 return ErrorP("Expected ')' in prototype");
906
907 // success.
908 getNextToken(); // eat ')'.
909
910 return new PrototypeAST(FnName, ArgNames);
911 }
912
913 /// definition ::= 'def' prototype expression
914 static FunctionAST *ParseDefinition() {
915 getNextToken(); // eat def.
916 PrototypeAST *Proto = ParsePrototype();
917 if (Proto == 0) return 0;
918
919 if (ExprAST *E = ParseExpression())
920 return new FunctionAST(Proto, E);
921 return 0;
922 }
923
924 /// toplevelexpr ::= expression
925 static FunctionAST *ParseTopLevelExpr() {
926 if (ExprAST *E = ParseExpression()) {
927 // Make an anonymous proto.
928 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
929 return new FunctionAST(Proto, E);
930 }
931 return 0;
932 }
933
934 /// external ::= 'extern' prototype
935 static PrototypeAST *ParseExtern() {
936 getNextToken(); // eat extern.
937 return ParsePrototype();
938 }
939
940 //===----------------------------------------------------------------------===//
941 // Code Generation
942 //===----------------------------------------------------------------------===//
943
944 static Module *TheModule;
945 static IRBuilder<> Builder(getGlobalContext());
946 static std::map<std::string, Value*> NamedValues;
947
948 Value *ErrorV(const char *Str) { Error(Str); return 0; }
949
950 Value *NumberExprAST::Codegen() {
951 return ConstantFP::get(getGlobalContext(), APFloat(Val));
952 }
953
954 Value *VariableExprAST::Codegen() {
955 // Look this variable up in the function.
956 Value *V = NamedValues[Name];
957 return V ? V : ErrorV("Unknown variable name");
958 }
959
960 Value *BinaryExprAST::Codegen() {
961 Value *L = LHS->Codegen();
962 Value *R = RHS->Codegen();
963 if (L == 0 || R == 0) return 0;
964
965 switch (Op) {
966 case '+': return Builder.CreateFAdd(L, R, "addtmp");
967 case '-': return Builder.CreateFSub(L, R, "subtmp");
968 case '*': return Builder.CreateFMul(L, R, "multmp");
969 case '<':
970 L = Builder.CreateFCmpULT(L, R, "cmptmp");
971 // Convert bool 0/1 to double 0.0 or 1.0
972 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
973 "booltmp");
974 default: return ErrorV("invalid binary operator");
975 }
976 }
977
978 Value *CallExprAST::Codegen() {
979 // Look up the name in the global module table.
980 Function *CalleeF = TheModule->getFunction(Callee);
981 if (CalleeF == 0)
982 return ErrorV("Unknown function referenced");
983
984 // If argument mismatch error.
985 if (CalleeF->arg_size() != Args.size())
986 return ErrorV("Incorrect # arguments passed");
987
988 std::vector<Value*> ArgsV;
989 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
990 ArgsV.push_back(Args[i]->Codegen());
991 if (ArgsV.back() == 0) return 0;
992 }
993
994 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
995 }
996
997 Function *PrototypeAST::Codegen() {
998 // Make the function type: double(double,double) etc.
999 std::vector<Type*> Doubles(Args.size(),
1000 Type::getDoubleTy(getGlobalContext()));
1001 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1002 Doubles, false);
1003
1004 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1005
1006 // If F conflicted, there was already something named 'Name'. If it has a
1007 // body, don't allow redefinition or reextern.
1008 if (F->getName() != Name) {
1009 // Delete the one we just made and get the existing one.
1010 F->eraseFromParent();
1011 F = TheModule->getFunction(Name);
1012
1013 // If F already has a body, reject this.
1014 if (!F->empty()) {
1015 ErrorF("redefinition of function");
1016 return 0;
1017 }
1018
1019 // If F took a different number of args, reject.
1020 if (F->arg_size() != Args.size()) {
1021 ErrorF("redefinition of function with different # args");
1022 return 0;
1023 }
1024 }
1025
1026 // Set names for all arguments.
1027 unsigned Idx = 0;
1028 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1029 ++AI, ++Idx) {
1030 AI->setName(Args[Idx]);
1031
1032 // Add arguments to variable symbol table.
1033 NamedValues[Args[Idx]] = AI;
1034 }
1035
1036 return F;
1037 }
1038
1039 Function *FunctionAST::Codegen() {
1040 NamedValues.clear();
1041
1042 Function *TheFunction = Proto->Codegen();
1043 if (TheFunction == 0)
1044 return 0;
1045
1046 // Create a new basic block to start insertion into.
1047 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1048 Builder.SetInsertPoint(BB);
1049
1050 if (Value *RetVal = Body->Codegen()) {
1051 // Finish off the function.
1052 Builder.CreateRet(RetVal);
1053
1054 // Validate the generated code, checking for consistency.
1055 verifyFunction(*TheFunction);
1056
1057 return TheFunction;
1058 }
1059
1060 // Error reading body, remove function.
1061 TheFunction->eraseFromParent();
1062 return 0;
1063 }
1064
1065 //===----------------------------------------------------------------------===//
1066 // Top-Level parsing and JIT Driver
1067 //===----------------------------------------------------------------------===//
1068
1069 static void HandleDefinition() {
1070 if (FunctionAST *F = ParseDefinition()) {
1071 if (Function *LF = F->Codegen()) {
1072 fprintf(stderr, "Read function definition:");
1073 LF->dump();
1074 }
1075 } else {
1076 // Skip token for error recovery.
1077 getNextToken();
1078 }
1079 }
1080
1081 static void HandleExtern() {
1082 if (PrototypeAST *P = ParseExtern()) {
1083 if (Function *F = P->Codegen()) {
1084 fprintf(stderr, "Read extern: ");
1085 F->dump();
1086 }
1087 } else {
1088 // Skip token for error recovery.
1089 getNextToken();
1090 }
1091 }
1092
1093 static void HandleTopLevelExpression() {
1094 // Evaluate a top-level expression into an anonymous function.
1095 if (FunctionAST *F = ParseTopLevelExpr()) {
1096 if (Function *LF = F->Codegen()) {
1097 fprintf(stderr, "Read top-level expression:");
1098 LF->dump();
1099 }
1100 } else {
1101 // Skip token for error recovery.
1102 getNextToken();
1103 }
1104 }
1105
1106 /// top ::= definition | external | expression | ';'
1107 static void MainLoop() {
1108 while (1) {
1109 fprintf(stderr, "ready> ");
1110 switch (CurTok) {
1111 case tok_eof: return;
1112 case ';': getNextToken(); break; // ignore top-level semicolons.
1113 case tok_def: HandleDefinition(); break;
1114 case tok_extern: HandleExtern(); break;
1115 default: HandleTopLevelExpression(); break;
1116 }
1117 }
1118 }
1119
1120 //===----------------------------------------------------------------------===//
1121 // "Library" functions that can be "extern'd" from user code.
1122 //===----------------------------------------------------------------------===//
1123
1124 /// putchard - putchar that takes a double and returns 0.
1125 extern "C"
1126 double putchard(double X) {
1127 putchar((char)X);
1128 return 0;
1129 }
1130
1131 //===----------------------------------------------------------------------===//
1132 // Main driver code.
1133 //===----------------------------------------------------------------------===//
1134
1135 int main() {
1136 LLVMContext &Context = getGlobalContext();
1137
1138 // Install standard binary operators.
1139 // 1 is lowest precedence.
1140 BinopPrecedence['<'] = 10;
1141 BinopPrecedence['+'] = 20;
1142 BinopPrecedence['-'] = 20;
1143 BinopPrecedence['*'] = 40; // highest.
1144
1145 // Prime the first token.
1146 fprintf(stderr, "ready> ");
1147 getNextToken();
1148
1149 // Make the module, which holds all the code.
1150 TheModule = new Module("my cool jit", Context);
1151
1152 // Run the main "interpreter loop" now.
1153 MainLoop();
1154
1155 // Print out all of the generated code.
1156 TheModule->dump();
1157
1158 return 0;
1159 }
1160
1161`Next: Adding JIT and Optimizer Support <LangImpl4.html>`_
1162