blob: a5a60bffe04a5afaf1870daa74345857adfb8d93 [file] [log] [blame]
Sean Silvaee47edf2012-12-05 00:26:32 +00001============================================================
2Kaleidoscope: Extending the Language: User-defined Operators
3============================================================
4
5.. contents::
6 :local:
7
Sean Silvaee47edf2012-12-05 00:26:32 +00008Chapter 6 Introduction
9======================
10
11Welcome to Chapter 6 of the "`Implementing a language with
12LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
13have a fully functional language that is fairly minimal, but also
14useful. There is still one big problem with it, however. Our language
15doesn't have many useful operators (like division, logical negation, or
16even any comparisons besides less-than).
17
18This chapter of the tutorial takes a wild digression into adding
19user-defined operators to the simple and beautiful Kaleidoscope
20language. This digression now gives us a simple and ugly language in
21some ways, but also a powerful one at the same time. One of the great
22things about creating your own language is that you get to decide what
23is good or bad. In this tutorial we'll assume that it is okay to use
24this as a way to show some interesting parsing techniques.
25
26At the end of this tutorial, we'll run through an example Kaleidoscope
27application that `renders the Mandelbrot set <#example>`_. This gives an
28example of what you can build with Kaleidoscope and its feature set.
29
30User-defined Operators: the Idea
31================================
32
33The "operator overloading" that we will add to Kaleidoscope is more
34general than languages like C++. In C++, you are only allowed to
35redefine existing operators: you can't programatically change the
36grammar, introduce new operators, change precedence levels, etc. In this
37chapter, we will add this capability to Kaleidoscope, which will let the
38user round out the set of operators that are supported.
39
40The point of going into user-defined operators in a tutorial like this
41is to show the power and flexibility of using a hand-written parser.
42Thus far, the parser we have been implementing uses recursive descent
43for most parts of the grammar and operator precedence parsing for the
44expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
45using operator precedence parsing, it would be very difficult to allow
46the programmer to introduce new operators into the grammar: the grammar
47is dynamically extensible as the JIT runs.
48
49The two specific features we'll add are programmable unary operators
50(right now, Kaleidoscope has no unary operators at all) as well as
51binary operators. An example of this is:
52
53::
54
55 # Logical unary not.
56 def unary!(v)
57 if v then
58 0
59 else
60 1;
61
62 # Define > with the same precedence as <.
63 def binary> 10 (LHS RHS)
64 RHS < LHS;
65
66 # Binary "logical or", (note that it does not "short circuit")
67 def binary| 5 (LHS RHS)
68 if LHS then
69 1
70 else if RHS then
71 1
72 else
73 0;
74
75 # Define = with slightly lower precedence than relationals.
76 def binary= 9 (LHS RHS)
77 !(LHS < RHS | LHS > RHS);
78
79Many languages aspire to being able to implement their standard runtime
80library in the language itself. In Kaleidoscope, we can implement
81significant parts of the language in the library!
82
83We will break down implementation of these features into two parts:
84implementing support for user-defined binary operators and adding unary
85operators.
86
87User-defined Binary Operators
88=============================
89
90Adding support for user-defined binary operators is pretty simple with
91our current framework. We'll first add support for the unary/binary
92keywords:
93
94.. code-block:: c++
95
96 enum Token {
97 ...
98 // operators
99 tok_binary = -11, tok_unary = -12
100 };
101 ...
102 static int gettok() {
103 ...
104 if (IdentifierStr == "for") return tok_for;
105 if (IdentifierStr == "in") return tok_in;
106 if (IdentifierStr == "binary") return tok_binary;
107 if (IdentifierStr == "unary") return tok_unary;
108 return tok_identifier;
109
110This just adds lexer support for the unary and binary keywords, like we
111did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing
112about our current AST, is that we represent binary operators with full
113generalisation by using their ASCII code as the opcode. For our extended
114operators, we'll use this same representation, so we don't need any new
115AST or parser support.
116
117On the other hand, we have to be able to represent the definitions of
118these new operators, in the "def binary\| 5" part of the function
119definition. In our grammar so far, the "name" for the function
120definition is parsed as the "prototype" production and into the
121``PrototypeAST`` AST node. To represent our new user-defined operators
122as prototypes, we have to extend the ``PrototypeAST`` AST node like
123this:
124
125.. code-block:: c++
126
127 /// PrototypeAST - This class represents the "prototype" for a function,
128 /// which captures its argument names as well as if it is an operator.
129 class PrototypeAST {
130 std::string Name;
131 std::vector<std::string> Args;
132 bool isOperator;
133 unsigned Precedence; // Precedence if a binary op.
134 public:
135 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
136 bool isoperator = false, unsigned prec = 0)
137 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
138
139 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
140 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
141
142 char getOperatorName() const {
143 assert(isUnaryOp() || isBinaryOp());
144 return Name[Name.size()-1];
145 }
146
147 unsigned getBinaryPrecedence() const { return Precedence; }
148
149 Function *Codegen();
150 };
151
152Basically, in addition to knowing a name for the prototype, we now keep
153track of whether it was an operator, and if it was, what precedence
154level the operator is at. The precedence is only used for binary
155operators (as you'll see below, it just doesn't apply for unary
156operators). Now that we have a way to represent the prototype for a
157user-defined operator, we need to parse it:
158
159.. code-block:: c++
160
161 /// prototype
162 /// ::= id '(' id* ')'
163 /// ::= binary LETTER number? (id, id)
164 static PrototypeAST *ParsePrototype() {
165 std::string FnName;
166
167 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
168 unsigned BinaryPrecedence = 30;
169
170 switch (CurTok) {
171 default:
172 return ErrorP("Expected function name in prototype");
173 case tok_identifier:
174 FnName = IdentifierStr;
175 Kind = 0;
176 getNextToken();
177 break;
178 case tok_binary:
179 getNextToken();
180 if (!isascii(CurTok))
181 return ErrorP("Expected binary operator");
182 FnName = "binary";
183 FnName += (char)CurTok;
184 Kind = 2;
185 getNextToken();
186
187 // Read the precedence if present.
188 if (CurTok == tok_number) {
189 if (NumVal < 1 || NumVal > 100)
190 return ErrorP("Invalid precedecnce: must be 1..100");
191 BinaryPrecedence = (unsigned)NumVal;
192 getNextToken();
193 }
194 break;
195 }
196
197 if (CurTok != '(')
198 return ErrorP("Expected '(' in prototype");
199
200 std::vector<std::string> ArgNames;
201 while (getNextToken() == tok_identifier)
202 ArgNames.push_back(IdentifierStr);
203 if (CurTok != ')')
204 return ErrorP("Expected ')' in prototype");
205
206 // success.
207 getNextToken(); // eat ')'.
208
209 // Verify right number of names for operator.
210 if (Kind && ArgNames.size() != Kind)
211 return ErrorP("Invalid number of operands for operator");
212
213 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
214 }
215
216This is all fairly straightforward parsing code, and we have already
217seen a lot of similar code in the past. One interesting part about the
218code above is the couple lines that set up ``FnName`` for binary
219operators. This builds names like "binary@" for a newly defined "@"
220operator. This then takes advantage of the fact that symbol names in the
221LLVM symbol table are allowed to have any character in them, including
222embedded nul characters.
223
224The next interesting thing to add, is codegen support for these binary
225operators. Given our current structure, this is a simple addition of a
226default case for our existing binary operator node:
227
228.. code-block:: c++
229
230 Value *BinaryExprAST::Codegen() {
231 Value *L = LHS->Codegen();
232 Value *R = RHS->Codegen();
233 if (L == 0 || R == 0) return 0;
234
235 switch (Op) {
236 case '+': return Builder.CreateFAdd(L, R, "addtmp");
237 case '-': return Builder.CreateFSub(L, R, "subtmp");
238 case '*': return Builder.CreateFMul(L, R, "multmp");
239 case '<':
240 L = Builder.CreateFCmpULT(L, R, "cmptmp");
241 // Convert bool 0/1 to double 0.0 or 1.0
242 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
243 "booltmp");
244 default: break;
245 }
246
247 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
248 // a call to it.
249 Function *F = TheModule->getFunction(std::string("binary")+Op);
250 assert(F && "binary operator not found!");
251
252 Value *Ops[2] = { L, R };
253 return Builder.CreateCall(F, Ops, "binop");
254 }
255
256As you can see above, the new code is actually really simple. It just
257does a lookup for the appropriate operator in the symbol table and
258generates a function call to it. Since user-defined operators are just
259built as normal functions (because the "prototype" boils down to a
260function with the right name) everything falls into place.
261
262The final piece of code we are missing, is a bit of top-level magic:
263
264.. code-block:: c++
265
266 Function *FunctionAST::Codegen() {
267 NamedValues.clear();
268
269 Function *TheFunction = Proto->Codegen();
270 if (TheFunction == 0)
271 return 0;
272
273 // If this is an operator, install it.
274 if (Proto->isBinaryOp())
275 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
276
277 // Create a new basic block to start insertion into.
278 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
279 Builder.SetInsertPoint(BB);
280
281 if (Value *RetVal = Body->Codegen()) {
282 ...
283
284Basically, before codegening a function, if it is a user-defined
285operator, we register it in the precedence table. This allows the binary
286operator parsing logic we already have in place to handle it. Since we
287are working on a fully-general operator precedence parser, this is all
288we need to do to "extend the grammar".
289
290Now we have useful user-defined binary operators. This builds a lot on
291the previous framework we built for other operators. Adding unary
292operators is a bit more challenging, because we don't have any framework
293for it yet - lets see what it takes.
294
295User-defined Unary Operators
296============================
297
298Since we don't currently support unary operators in the Kaleidoscope
299language, we'll need to add everything to support them. Above, we added
300simple support for the 'unary' keyword to the lexer. In addition to
301that, we need an AST node:
302
303.. code-block:: c++
304
305 /// UnaryExprAST - Expression class for a unary operator.
306 class UnaryExprAST : public ExprAST {
307 char Opcode;
308 ExprAST *Operand;
309 public:
310 UnaryExprAST(char opcode, ExprAST *operand)
311 : Opcode(opcode), Operand(operand) {}
312 virtual Value *Codegen();
313 };
314
315This AST node is very simple and obvious by now. It directly mirrors the
316binary operator AST node, except that it only has one child. With this,
317we need to add the parsing logic. Parsing a unary operator is pretty
318simple: we'll add a new function to do it:
319
320.. code-block:: c++
321
322 /// unary
323 /// ::= primary
324 /// ::= '!' unary
325 static ExprAST *ParseUnary() {
326 // If the current token is not an operator, it must be a primary expr.
327 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
328 return ParsePrimary();
329
330 // If this is a unary operator, read it.
331 int Opc = CurTok;
332 getNextToken();
333 if (ExprAST *Operand = ParseUnary())
334 return new UnaryExprAST(Opc, Operand);
335 return 0;
336 }
337
338The grammar we add is pretty straightforward here. If we see a unary
339operator when parsing a primary operator, we eat the operator as a
340prefix and parse the remaining piece as another unary operator. This
341allows us to handle multiple unary operators (e.g. "!!x"). Note that
342unary operators can't have ambiguous parses like binary operators can,
343so there is no need for precedence information.
344
345The problem with this function, is that we need to call ParseUnary from
346somewhere. To do this, we change previous callers of ParsePrimary to
347call ParseUnary instead:
348
349.. code-block:: c++
350
351 /// binoprhs
352 /// ::= ('+' unary)*
353 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
354 ...
355 // Parse the unary expression after the binary operator.
356 ExprAST *RHS = ParseUnary();
357 if (!RHS) return 0;
358 ...
359 }
360 /// expression
361 /// ::= unary binoprhs
362 ///
363 static ExprAST *ParseExpression() {
364 ExprAST *LHS = ParseUnary();
365 if (!LHS) return 0;
366
367 return ParseBinOpRHS(0, LHS);
368 }
369
370With these two simple changes, we are now able to parse unary operators
371and build the AST for them. Next up, we need to add parser support for
372prototypes, to parse the unary operator prototype. We extend the binary
373operator code above with:
374
375.. code-block:: c++
376
377 /// prototype
378 /// ::= id '(' id* ')'
379 /// ::= binary LETTER number? (id, id)
380 /// ::= unary LETTER (id)
381 static PrototypeAST *ParsePrototype() {
382 std::string FnName;
383
384 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
385 unsigned BinaryPrecedence = 30;
386
387 switch (CurTok) {
388 default:
389 return ErrorP("Expected function name in prototype");
390 case tok_identifier:
391 FnName = IdentifierStr;
392 Kind = 0;
393 getNextToken();
394 break;
395 case tok_unary:
396 getNextToken();
397 if (!isascii(CurTok))
398 return ErrorP("Expected unary operator");
399 FnName = "unary";
400 FnName += (char)CurTok;
401 Kind = 1;
402 getNextToken();
403 break;
404 case tok_binary:
405 ...
406
407As with binary operators, we name unary operators with a name that
408includes the operator character. This assists us at code generation
409time. Speaking of, the final piece we need to add is codegen support for
410unary operators. It looks like this:
411
412.. code-block:: c++
413
414 Value *UnaryExprAST::Codegen() {
415 Value *OperandV = Operand->Codegen();
416 if (OperandV == 0) return 0;
417
418 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
419 if (F == 0)
420 return ErrorV("Unknown unary operator");
421
422 return Builder.CreateCall(F, OperandV, "unop");
423 }
424
425This code is similar to, but simpler than, the code for binary
426operators. It is simpler primarily because it doesn't need to handle any
427predefined operators.
428
429Kicking the Tires
430=================
431
432It is somewhat hard to believe, but with a few simple extensions we've
433covered in the last chapters, we have grown a real-ish language. With
434this, we can do a lot of interesting things, including I/O, math, and a
435bunch of other things. For example, we can now add a nice sequencing
436operator (printd is defined to print out the specified value and a
437newline):
438
439::
440
441 ready> extern printd(x);
442 Read extern:
443 declare double @printd(double)
444
445 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
446 ..
447 ready> printd(123) : printd(456) : printd(789);
448 123.000000
449 456.000000
450 789.000000
451 Evaluated to 0.000000
452
453We can also define a bunch of other "primitive" operations, such as:
454
455::
456
457 # Logical unary not.
458 def unary!(v)
459 if v then
460 0
461 else
462 1;
463
464 # Unary negate.
465 def unary-(v)
466 0-v;
467
468 # Define > with the same precedence as <.
469 def binary> 10 (LHS RHS)
470 RHS < LHS;
471
472 # Binary logical or, which does not short circuit.
473 def binary| 5 (LHS RHS)
474 if LHS then
475 1
476 else if RHS then
477 1
478 else
479 0;
480
481 # Binary logical and, which does not short circuit.
482 def binary& 6 (LHS RHS)
483 if !LHS then
484 0
485 else
486 !!RHS;
487
488 # Define = with slightly lower precedence than relationals.
489 def binary = 9 (LHS RHS)
490 !(LHS < RHS | LHS > RHS);
491
492 # Define ':' for sequencing: as a low-precedence operator that ignores operands
493 # and just returns the RHS.
494 def binary : 1 (x y) y;
495
496Given the previous if/then/else support, we can also define interesting
497functions for I/O. For example, the following prints out a character
498whose "density" reflects the value passed in: the lower the value, the
499denser the character:
500
501::
502
503 ready>
504
505 extern putchard(char)
506 def printdensity(d)
507 if d > 8 then
508 putchard(32) # ' '
509 else if d > 4 then
510 putchard(46) # '.'
511 else if d > 2 then
512 putchard(43) # '+'
513 else
514 putchard(42); # '*'
515 ...
516 ready> printdensity(1): printdensity(2): printdensity(3):
517 printdensity(4): printdensity(5): printdensity(9):
518 putchard(10);
519 **++.
520 Evaluated to 0.000000
521
522Based on these simple primitive operations, we can start to define more
523interesting things. For example, here's a little function that solves
524for the number of iterations it takes a function in the complex plane to
525converge:
526
527::
528
529 # Determine whether the specific location diverges.
530 # Solve for z = z^2 + c in the complex plane.
531 def mandleconverger(real imag iters creal cimag)
532 if iters > 255 | (real*real + imag*imag > 4) then
533 iters
534 else
535 mandleconverger(real*real - imag*imag + creal,
536 2*real*imag + cimag,
537 iters+1, creal, cimag);
538
539 # Return the number of iterations required for the iteration to escape
540 def mandleconverge(real imag)
541 mandleconverger(real, imag, 0, real, imag);
542
543This "``z = z2 + c``" function is a beautiful little creature that is
544the basis for computation of the `Mandelbrot
545Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
546``mandelconverge`` function returns the number of iterations that it
547takes for a complex orbit to escape, saturating to 255. This is not a
548very useful function by itself, but if you plot its value over a
549two-dimensional plane, you can see the Mandelbrot set. Given that we are
550limited to using putchard here, our amazing graphical output is limited,
551but we can whip together something using the density plotter above:
552
553::
554
555 # Compute and plot the mandlebrot set with the specified 2 dimensional range
556 # info.
557 def mandelhelp(xmin xmax xstep ymin ymax ystep)
558 for y = ymin, y < ymax, ystep in (
559 (for x = xmin, x < xmax, xstep in
560 printdensity(mandleconverge(x,y)))
561 : putchard(10)
562 )
563
564 # mandel - This is a convenient helper function for plotting the mandelbrot set
565 # from the specified position with the specified Magnification.
566 def mandel(realstart imagstart realmag imagmag)
567 mandelhelp(realstart, realstart+realmag*78, realmag,
568 imagstart, imagstart+imagmag*40, imagmag);
569
570Given this, we can try plotting out the mandlebrot set! Lets try it out:
571
572::
573
574 ready> mandel(-2.3, -1.3, 0.05, 0.07);
575 *******************************+++++++++++*************************************
576 *************************+++++++++++++++++++++++*******************************
577 **********************+++++++++++++++++++++++++++++****************************
578 *******************+++++++++++++++++++++.. ...++++++++*************************
579 *****************++++++++++++++++++++++.... ...+++++++++***********************
580 ***************+++++++++++++++++++++++..... ...+++++++++*********************
581 **************+++++++++++++++++++++++.... ....+++++++++********************
582 *************++++++++++++++++++++++...... .....++++++++*******************
583 ************+++++++++++++++++++++....... .......+++++++******************
584 ***********+++++++++++++++++++.... ... .+++++++*****************
585 **********+++++++++++++++++....... .+++++++****************
586 *********++++++++++++++........... ...+++++++***************
587 ********++++++++++++............ ...++++++++**************
588 ********++++++++++... .......... .++++++++**************
589 *******+++++++++..... .+++++++++*************
590 *******++++++++...... ..+++++++++*************
591 *******++++++....... ..+++++++++*************
592 *******+++++...... ..+++++++++*************
593 *******.... .... ...+++++++++*************
594 *******.... . ...+++++++++*************
595 *******+++++...... ...+++++++++*************
596 *******++++++....... ..+++++++++*************
597 *******++++++++...... .+++++++++*************
598 *******+++++++++..... ..+++++++++*************
599 ********++++++++++... .......... .++++++++**************
600 ********++++++++++++............ ...++++++++**************
601 *********++++++++++++++.......... ...+++++++***************
602 **********++++++++++++++++........ .+++++++****************
603 **********++++++++++++++++++++.... ... ..+++++++****************
604 ***********++++++++++++++++++++++....... .......++++++++*****************
605 ************+++++++++++++++++++++++...... ......++++++++******************
606 **************+++++++++++++++++++++++.... ....++++++++********************
607 ***************+++++++++++++++++++++++..... ...+++++++++*********************
608 *****************++++++++++++++++++++++.... ...++++++++***********************
609 *******************+++++++++++++++++++++......++++++++*************************
610 *********************++++++++++++++++++++++.++++++++***************************
611 *************************+++++++++++++++++++++++*******************************
612 ******************************+++++++++++++************************************
613 *******************************************************************************
614 *******************************************************************************
615 *******************************************************************************
616 Evaluated to 0.000000
617 ready> mandel(-2, -1, 0.02, 0.04);
618 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
619 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
620 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
621 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
622 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
623 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
624 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
625 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
626 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
627 **********++++++++++++++++++++++++++++++++++++++++++++++.............
628 ********+++++++++++++++++++++++++++++++++++++++++++..................
629 *******+++++++++++++++++++++++++++++++++++++++.......................
630 ******+++++++++++++++++++++++++++++++++++...........................
631 *****++++++++++++++++++++++++++++++++............................
632 *****++++++++++++++++++++++++++++...............................
633 ****++++++++++++++++++++++++++...... .........................
634 ***++++++++++++++++++++++++......... ...... ...........
635 ***++++++++++++++++++++++............
636 **+++++++++++++++++++++..............
637 **+++++++++++++++++++................
638 *++++++++++++++++++.................
639 *++++++++++++++++............ ...
640 *++++++++++++++..............
641 *+++....++++................
642 *.......... ...........
643 *
644 *.......... ...........
645 *+++....++++................
646 *++++++++++++++..............
647 *++++++++++++++++............ ...
648 *++++++++++++++++++.................
649 **+++++++++++++++++++................
650 **+++++++++++++++++++++..............
651 ***++++++++++++++++++++++............
652 ***++++++++++++++++++++++++......... ...... ...........
653 ****++++++++++++++++++++++++++...... .........................
654 *****++++++++++++++++++++++++++++...............................
655 *****++++++++++++++++++++++++++++++++............................
656 ******+++++++++++++++++++++++++++++++++++...........................
657 *******+++++++++++++++++++++++++++++++++++++++.......................
658 ********+++++++++++++++++++++++++++++++++++++++++++..................
659 Evaluated to 0.000000
660 ready> mandel(-0.9, -1.4, 0.02, 0.03);
661 *******************************************************************************
662 *******************************************************************************
663 *******************************************************************************
664 **********+++++++++++++++++++++************************************************
665 *+++++++++++++++++++++++++++++++++++++++***************************************
666 +++++++++++++++++++++++++++++++++++++++++++++**********************************
667 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
668 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
669 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
670 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
671 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
672 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
673 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
674 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
675 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
676 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
677 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
678 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
679 ++++++++++++++++++++........... .........++++++++++++++++++++++*
680 ++++++++++++++++++............ ...........++++++++++++++++++++
681 ++++++++++++++++............... .............++++++++++++++++++
682 ++++++++++++++................. ...............++++++++++++++++
683 ++++++++++++.................. .................++++++++++++++
684 +++++++++.................. .................+++++++++++++
685 ++++++........ . ......... ..++++++++++++
686 ++............ ...... ....++++++++++
687 .............. ...++++++++++
688 .............. ....+++++++++
689 .............. .....++++++++
690 ............. ......++++++++
691 ........... .......++++++++
692 ......... ........+++++++
693 ......... ........+++++++
694 ......... ....+++++++
695 ........ ...+++++++
696 ....... ...+++++++
697 ....+++++++
698 .....+++++++
699 ....+++++++
700 ....+++++++
701 ....+++++++
702 Evaluated to 0.000000
703 ready> ^D
704
705At this point, you may be starting to realize that Kaleidoscope is a
706real and powerful language. It may not be self-similar :), but it can be
707used to plot things that are!
708
709With this, we conclude the "adding user-defined operators" chapter of
710the tutorial. We have successfully augmented our language, adding the
711ability to extend the language in the library, and we have shown how
712this can be used to build a simple but interesting end-user application
713in Kaleidoscope. At this point, Kaleidoscope can build a variety of
714applications that are functional and can call functions with
715side-effects, but it can't actually define and mutate a variable itself.
716
717Strikingly, variable mutation is an important feature of some languages,
718and it is not at all obvious how to `add support for mutable
719variables <LangImpl7.html>`_ without having to add an "SSA construction"
720phase to your front-end. In the next chapter, we will describe how you
721can add variable mutation without building SSA in your front-end.
722
723Full Code Listing
724=================
725
726Here is the complete code listing for our running example, enhanced with
727the if/then/else and for expressions.. To build this example, use:
728
729.. code-block:: bash
730
731 # Compile
732 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
733 # Run
734 ./toy
735
736On some platforms, you will need to specify -rdynamic or
737-Wl,--export-dynamic when linking. This ensures that symbols defined in
738the main executable are exported to the dynamic linker and so are
739available for symbol resolution at run time. This is not needed if you
740compile your support code into a shared library, although doing that
741will cause problems on Windows.
742
743Here is the code:
744
745.. code-block:: c++
746
747 #include "llvm/DerivedTypes.h"
748 #include "llvm/ExecutionEngine/ExecutionEngine.h"
749 #include "llvm/ExecutionEngine/JIT.h"
750 #include "llvm/IRBuilder.h"
751 #include "llvm/LLVMContext.h"
752 #include "llvm/Module.h"
753 #include "llvm/PassManager.h"
754 #include "llvm/Analysis/Verifier.h"
755 #include "llvm/Analysis/Passes.h"
756 #include "llvm/DataLayout.h"
757 #include "llvm/Transforms/Scalar.h"
758 #include "llvm/Support/TargetSelect.h"
759 #include <cstdio>
760 #include <string>
761 #include <map>
762 #include <vector>
763 using namespace llvm;
764
765 //===----------------------------------------------------------------------===//
766 // Lexer
767 //===----------------------------------------------------------------------===//
768
769 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
770 // of these for known things.
771 enum Token {
772 tok_eof = -1,
773
774 // commands
775 tok_def = -2, tok_extern = -3,
776
777 // primary
778 tok_identifier = -4, tok_number = -5,
779
780 // control
781 tok_if = -6, tok_then = -7, tok_else = -8,
782 tok_for = -9, tok_in = -10,
783
784 // operators
785 tok_binary = -11, tok_unary = -12
786 };
787
788 static std::string IdentifierStr; // Filled in if tok_identifier
789 static double NumVal; // Filled in if tok_number
790
791 /// gettok - Return the next token from standard input.
792 static int gettok() {
793 static int LastChar = ' ';
794
795 // Skip any whitespace.
796 while (isspace(LastChar))
797 LastChar = getchar();
798
799 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
800 IdentifierStr = LastChar;
801 while (isalnum((LastChar = getchar())))
802 IdentifierStr += LastChar;
803
804 if (IdentifierStr == "def") return tok_def;
805 if (IdentifierStr == "extern") return tok_extern;
806 if (IdentifierStr == "if") return tok_if;
807 if (IdentifierStr == "then") return tok_then;
808 if (IdentifierStr == "else") return tok_else;
809 if (IdentifierStr == "for") return tok_for;
810 if (IdentifierStr == "in") return tok_in;
811 if (IdentifierStr == "binary") return tok_binary;
812 if (IdentifierStr == "unary") return tok_unary;
813 return tok_identifier;
814 }
815
816 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
817 std::string NumStr;
818 do {
819 NumStr += LastChar;
820 LastChar = getchar();
821 } while (isdigit(LastChar) || LastChar == '.');
822
823 NumVal = strtod(NumStr.c_str(), 0);
824 return tok_number;
825 }
826
827 if (LastChar == '#') {
828 // Comment until end of line.
829 do LastChar = getchar();
830 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
831
832 if (LastChar != EOF)
833 return gettok();
834 }
835
836 // Check for end of file. Don't eat the EOF.
837 if (LastChar == EOF)
838 return tok_eof;
839
840 // Otherwise, just return the character as its ascii value.
841 int ThisChar = LastChar;
842 LastChar = getchar();
843 return ThisChar;
844 }
845
846 //===----------------------------------------------------------------------===//
847 // Abstract Syntax Tree (aka Parse Tree)
848 //===----------------------------------------------------------------------===//
849
850 /// ExprAST - Base class for all expression nodes.
851 class ExprAST {
852 public:
853 virtual ~ExprAST() {}
854 virtual Value *Codegen() = 0;
855 };
856
857 /// NumberExprAST - Expression class for numeric literals like "1.0".
858 class NumberExprAST : public ExprAST {
859 double Val;
860 public:
861 NumberExprAST(double val) : Val(val) {}
862 virtual Value *Codegen();
863 };
864
865 /// VariableExprAST - Expression class for referencing a variable, like "a".
866 class VariableExprAST : public ExprAST {
867 std::string Name;
868 public:
869 VariableExprAST(const std::string &name) : Name(name) {}
870 virtual Value *Codegen();
871 };
872
873 /// UnaryExprAST - Expression class for a unary operator.
874 class UnaryExprAST : public ExprAST {
875 char Opcode;
876 ExprAST *Operand;
877 public:
878 UnaryExprAST(char opcode, ExprAST *operand)
879 : Opcode(opcode), Operand(operand) {}
880 virtual Value *Codegen();
881 };
882
883 /// BinaryExprAST - Expression class for a binary operator.
884 class BinaryExprAST : public ExprAST {
885 char Op;
886 ExprAST *LHS, *RHS;
887 public:
888 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
889 : Op(op), LHS(lhs), RHS(rhs) {}
890 virtual Value *Codegen();
891 };
892
893 /// CallExprAST - Expression class for function calls.
894 class CallExprAST : public ExprAST {
895 std::string Callee;
896 std::vector<ExprAST*> Args;
897 public:
898 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
899 : Callee(callee), Args(args) {}
900 virtual Value *Codegen();
901 };
902
903 /// IfExprAST - Expression class for if/then/else.
904 class IfExprAST : public ExprAST {
905 ExprAST *Cond, *Then, *Else;
906 public:
907 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
908 : Cond(cond), Then(then), Else(_else) {}
909 virtual Value *Codegen();
910 };
911
912 /// ForExprAST - Expression class for for/in.
913 class ForExprAST : public ExprAST {
914 std::string VarName;
915 ExprAST *Start, *End, *Step, *Body;
916 public:
917 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
918 ExprAST *step, ExprAST *body)
919 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
920 virtual Value *Codegen();
921 };
922
923 /// PrototypeAST - This class represents the "prototype" for a function,
924 /// which captures its name, and its argument names (thus implicitly the number
925 /// of arguments the function takes), as well as if it is an operator.
926 class PrototypeAST {
927 std::string Name;
928 std::vector<std::string> Args;
929 bool isOperator;
930 unsigned Precedence; // Precedence if a binary op.
931 public:
932 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
933 bool isoperator = false, unsigned prec = 0)
934 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
935
936 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
937 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
938
939 char getOperatorName() const {
940 assert(isUnaryOp() || isBinaryOp());
941 return Name[Name.size()-1];
942 }
943
944 unsigned getBinaryPrecedence() const { return Precedence; }
945
946 Function *Codegen();
947 };
948
949 /// FunctionAST - This class represents a function definition itself.
950 class FunctionAST {
951 PrototypeAST *Proto;
952 ExprAST *Body;
953 public:
954 FunctionAST(PrototypeAST *proto, ExprAST *body)
955 : Proto(proto), Body(body) {}
956
957 Function *Codegen();
958 };
959
960 //===----------------------------------------------------------------------===//
961 // Parser
962 //===----------------------------------------------------------------------===//
963
964 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
965 /// token the parser is looking at. getNextToken reads another token from the
966 /// lexer and updates CurTok with its results.
967 static int CurTok;
968 static int getNextToken() {
969 return CurTok = gettok();
970 }
971
972 /// BinopPrecedence - This holds the precedence for each binary operator that is
973 /// defined.
974 static std::map<char, int> BinopPrecedence;
975
976 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
977 static int GetTokPrecedence() {
978 if (!isascii(CurTok))
979 return -1;
980
981 // Make sure it's a declared binop.
982 int TokPrec = BinopPrecedence[CurTok];
983 if (TokPrec <= 0) return -1;
984 return TokPrec;
985 }
986
987 /// Error* - These are little helper functions for error handling.
988 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
989 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
990 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
991
992 static ExprAST *ParseExpression();
993
994 /// identifierexpr
995 /// ::= identifier
996 /// ::= identifier '(' expression* ')'
997 static ExprAST *ParseIdentifierExpr() {
998 std::string IdName = IdentifierStr;
999
1000 getNextToken(); // eat identifier.
1001
1002 if (CurTok != '(') // Simple variable ref.
1003 return new VariableExprAST(IdName);
1004
1005 // Call.
1006 getNextToken(); // eat (
1007 std::vector<ExprAST*> Args;
1008 if (CurTok != ')') {
1009 while (1) {
1010 ExprAST *Arg = ParseExpression();
1011 if (!Arg) return 0;
1012 Args.push_back(Arg);
1013
1014 if (CurTok == ')') break;
1015
1016 if (CurTok != ',')
1017 return Error("Expected ')' or ',' in argument list");
1018 getNextToken();
1019 }
1020 }
1021
1022 // Eat the ')'.
1023 getNextToken();
1024
1025 return new CallExprAST(IdName, Args);
1026 }
1027
1028 /// numberexpr ::= number
1029 static ExprAST *ParseNumberExpr() {
1030 ExprAST *Result = new NumberExprAST(NumVal);
1031 getNextToken(); // consume the number
1032 return Result;
1033 }
1034
1035 /// parenexpr ::= '(' expression ')'
1036 static ExprAST *ParseParenExpr() {
1037 getNextToken(); // eat (.
1038 ExprAST *V = ParseExpression();
1039 if (!V) return 0;
1040
1041 if (CurTok != ')')
1042 return Error("expected ')'");
1043 getNextToken(); // eat ).
1044 return V;
1045 }
1046
1047 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1048 static ExprAST *ParseIfExpr() {
1049 getNextToken(); // eat the if.
1050
1051 // condition.
1052 ExprAST *Cond = ParseExpression();
1053 if (!Cond) return 0;
1054
1055 if (CurTok != tok_then)
1056 return Error("expected then");
1057 getNextToken(); // eat the then
1058
1059 ExprAST *Then = ParseExpression();
1060 if (Then == 0) return 0;
1061
1062 if (CurTok != tok_else)
1063 return Error("expected else");
1064
1065 getNextToken();
1066
1067 ExprAST *Else = ParseExpression();
1068 if (!Else) return 0;
1069
1070 return new IfExprAST(Cond, Then, Else);
1071 }
1072
1073 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1074 static ExprAST *ParseForExpr() {
1075 getNextToken(); // eat the for.
1076
1077 if (CurTok != tok_identifier)
1078 return Error("expected identifier after for");
1079
1080 std::string IdName = IdentifierStr;
1081 getNextToken(); // eat identifier.
1082
1083 if (CurTok != '=')
1084 return Error("expected '=' after for");
1085 getNextToken(); // eat '='.
1086
1087
1088 ExprAST *Start = ParseExpression();
1089 if (Start == 0) return 0;
1090 if (CurTok != ',')
1091 return Error("expected ',' after for start value");
1092 getNextToken();
1093
1094 ExprAST *End = ParseExpression();
1095 if (End == 0) return 0;
1096
1097 // The step value is optional.
1098 ExprAST *Step = 0;
1099 if (CurTok == ',') {
1100 getNextToken();
1101 Step = ParseExpression();
1102 if (Step == 0) return 0;
1103 }
1104
1105 if (CurTok != tok_in)
1106 return Error("expected 'in' after for");
1107 getNextToken(); // eat 'in'.
1108
1109 ExprAST *Body = ParseExpression();
1110 if (Body == 0) return 0;
1111
1112 return new ForExprAST(IdName, Start, End, Step, Body);
1113 }
1114
1115 /// primary
1116 /// ::= identifierexpr
1117 /// ::= numberexpr
1118 /// ::= parenexpr
1119 /// ::= ifexpr
1120 /// ::= forexpr
1121 static ExprAST *ParsePrimary() {
1122 switch (CurTok) {
1123 default: return Error("unknown token when expecting an expression");
1124 case tok_identifier: return ParseIdentifierExpr();
1125 case tok_number: return ParseNumberExpr();
1126 case '(': return ParseParenExpr();
1127 case tok_if: return ParseIfExpr();
1128 case tok_for: return ParseForExpr();
1129 }
1130 }
1131
1132 /// unary
1133 /// ::= primary
1134 /// ::= '!' unary
1135 static ExprAST *ParseUnary() {
1136 // If the current token is not an operator, it must be a primary expr.
1137 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1138 return ParsePrimary();
1139
1140 // If this is a unary operator, read it.
1141 int Opc = CurTok;
1142 getNextToken();
1143 if (ExprAST *Operand = ParseUnary())
1144 return new UnaryExprAST(Opc, Operand);
1145 return 0;
1146 }
1147
1148 /// binoprhs
1149 /// ::= ('+' unary)*
1150 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1151 // If this is a binop, find its precedence.
1152 while (1) {
1153 int TokPrec = GetTokPrecedence();
1154
1155 // If this is a binop that binds at least as tightly as the current binop,
1156 // consume it, otherwise we are done.
1157 if (TokPrec < ExprPrec)
1158 return LHS;
1159
1160 // Okay, we know this is a binop.
1161 int BinOp = CurTok;
1162 getNextToken(); // eat binop
1163
1164 // Parse the unary expression after the binary operator.
1165 ExprAST *RHS = ParseUnary();
1166 if (!RHS) return 0;
1167
1168 // If BinOp binds less tightly with RHS than the operator after RHS, let
1169 // the pending operator take RHS as its LHS.
1170 int NextPrec = GetTokPrecedence();
1171 if (TokPrec < NextPrec) {
1172 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1173 if (RHS == 0) return 0;
1174 }
1175
1176 // Merge LHS/RHS.
1177 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1178 }
1179 }
1180
1181 /// expression
1182 /// ::= unary binoprhs
1183 ///
1184 static ExprAST *ParseExpression() {
1185 ExprAST *LHS = ParseUnary();
1186 if (!LHS) return 0;
1187
1188 return ParseBinOpRHS(0, LHS);
1189 }
1190
1191 /// prototype
1192 /// ::= id '(' id* ')'
1193 /// ::= binary LETTER number? (id, id)
1194 /// ::= unary LETTER (id)
1195 static PrototypeAST *ParsePrototype() {
1196 std::string FnName;
1197
1198 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1199 unsigned BinaryPrecedence = 30;
1200
1201 switch (CurTok) {
1202 default:
1203 return ErrorP("Expected function name in prototype");
1204 case tok_identifier:
1205 FnName = IdentifierStr;
1206 Kind = 0;
1207 getNextToken();
1208 break;
1209 case tok_unary:
1210 getNextToken();
1211 if (!isascii(CurTok))
1212 return ErrorP("Expected unary operator");
1213 FnName = "unary";
1214 FnName += (char)CurTok;
1215 Kind = 1;
1216 getNextToken();
1217 break;
1218 case tok_binary:
1219 getNextToken();
1220 if (!isascii(CurTok))
1221 return ErrorP("Expected binary operator");
1222 FnName = "binary";
1223 FnName += (char)CurTok;
1224 Kind = 2;
1225 getNextToken();
1226
1227 // Read the precedence if present.
1228 if (CurTok == tok_number) {
1229 if (NumVal < 1 || NumVal > 100)
1230 return ErrorP("Invalid precedecnce: must be 1..100");
1231 BinaryPrecedence = (unsigned)NumVal;
1232 getNextToken();
1233 }
1234 break;
1235 }
1236
1237 if (CurTok != '(')
1238 return ErrorP("Expected '(' in prototype");
1239
1240 std::vector<std::string> ArgNames;
1241 while (getNextToken() == tok_identifier)
1242 ArgNames.push_back(IdentifierStr);
1243 if (CurTok != ')')
1244 return ErrorP("Expected ')' in prototype");
1245
1246 // success.
1247 getNextToken(); // eat ')'.
1248
1249 // Verify right number of names for operator.
1250 if (Kind && ArgNames.size() != Kind)
1251 return ErrorP("Invalid number of operands for operator");
1252
1253 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1254 }
1255
1256 /// definition ::= 'def' prototype expression
1257 static FunctionAST *ParseDefinition() {
1258 getNextToken(); // eat def.
1259 PrototypeAST *Proto = ParsePrototype();
1260 if (Proto == 0) return 0;
1261
1262 if (ExprAST *E = ParseExpression())
1263 return new FunctionAST(Proto, E);
1264 return 0;
1265 }
1266
1267 /// toplevelexpr ::= expression
1268 static FunctionAST *ParseTopLevelExpr() {
1269 if (ExprAST *E = ParseExpression()) {
1270 // Make an anonymous proto.
1271 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1272 return new FunctionAST(Proto, E);
1273 }
1274 return 0;
1275 }
1276
1277 /// external ::= 'extern' prototype
1278 static PrototypeAST *ParseExtern() {
1279 getNextToken(); // eat extern.
1280 return ParsePrototype();
1281 }
1282
1283 //===----------------------------------------------------------------------===//
1284 // Code Generation
1285 //===----------------------------------------------------------------------===//
1286
1287 static Module *TheModule;
1288 static IRBuilder<> Builder(getGlobalContext());
1289 static std::map<std::string, Value*> NamedValues;
1290 static FunctionPassManager *TheFPM;
1291
1292 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1293
1294 Value *NumberExprAST::Codegen() {
1295 return ConstantFP::get(getGlobalContext(), APFloat(Val));
1296 }
1297
1298 Value *VariableExprAST::Codegen() {
1299 // Look this variable up in the function.
1300 Value *V = NamedValues[Name];
1301 return V ? V : ErrorV("Unknown variable name");
1302 }
1303
1304 Value *UnaryExprAST::Codegen() {
1305 Value *OperandV = Operand->Codegen();
1306 if (OperandV == 0) return 0;
1307
1308 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
1309 if (F == 0)
1310 return ErrorV("Unknown unary operator");
1311
1312 return Builder.CreateCall(F, OperandV, "unop");
1313 }
1314
1315 Value *BinaryExprAST::Codegen() {
1316 Value *L = LHS->Codegen();
1317 Value *R = RHS->Codegen();
1318 if (L == 0 || R == 0) return 0;
1319
1320 switch (Op) {
1321 case '+': return Builder.CreateFAdd(L, R, "addtmp");
1322 case '-': return Builder.CreateFSub(L, R, "subtmp");
1323 case '*': return Builder.CreateFMul(L, R, "multmp");
1324 case '<':
1325 L = Builder.CreateFCmpULT(L, R, "cmptmp");
1326 // Convert bool 0/1 to double 0.0 or 1.0
1327 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
1328 "booltmp");
1329 default: break;
1330 }
1331
1332 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1333 // a call to it.
1334 Function *F = TheModule->getFunction(std::string("binary")+Op);
1335 assert(F && "binary operator not found!");
1336
1337 Value *Ops[2] = { L, R };
1338 return Builder.CreateCall(F, Ops, "binop");
1339 }
1340
1341 Value *CallExprAST::Codegen() {
1342 // Look up the name in the global module table.
1343 Function *CalleeF = TheModule->getFunction(Callee);
1344 if (CalleeF == 0)
1345 return ErrorV("Unknown function referenced");
1346
1347 // If argument mismatch error.
1348 if (CalleeF->arg_size() != Args.size())
1349 return ErrorV("Incorrect # arguments passed");
1350
1351 std::vector<Value*> ArgsV;
1352 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1353 ArgsV.push_back(Args[i]->Codegen());
1354 if (ArgsV.back() == 0) return 0;
1355 }
1356
1357 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
1358 }
1359
1360 Value *IfExprAST::Codegen() {
1361 Value *CondV = Cond->Codegen();
1362 if (CondV == 0) return 0;
1363
1364 // Convert condition to a bool by comparing equal to 0.0.
1365 CondV = Builder.CreateFCmpONE(CondV,
1366 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1367 "ifcond");
1368
1369 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1370
1371 // Create blocks for the then and else cases. Insert the 'then' block at the
1372 // end of the function.
1373 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
1374 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
1375 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
1376
1377 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1378
1379 // Emit then value.
1380 Builder.SetInsertPoint(ThenBB);
1381
1382 Value *ThenV = Then->Codegen();
1383 if (ThenV == 0) return 0;
1384
1385 Builder.CreateBr(MergeBB);
1386 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1387 ThenBB = Builder.GetInsertBlock();
1388
1389 // Emit else block.
1390 TheFunction->getBasicBlockList().push_back(ElseBB);
1391 Builder.SetInsertPoint(ElseBB);
1392
1393 Value *ElseV = Else->Codegen();
1394 if (ElseV == 0) return 0;
1395
1396 Builder.CreateBr(MergeBB);
1397 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1398 ElseBB = Builder.GetInsertBlock();
1399
1400 // Emit merge block.
1401 TheFunction->getBasicBlockList().push_back(MergeBB);
1402 Builder.SetInsertPoint(MergeBB);
1403 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
1404 "iftmp");
1405
1406 PN->addIncoming(ThenV, ThenBB);
1407 PN->addIncoming(ElseV, ElseBB);
1408 return PN;
1409 }
1410
1411 Value *ForExprAST::Codegen() {
1412 // Output this as:
1413 // ...
1414 // start = startexpr
1415 // goto loop
1416 // loop:
1417 // variable = phi [start, loopheader], [nextvariable, loopend]
1418 // ...
1419 // bodyexpr
1420 // ...
1421 // loopend:
1422 // step = stepexpr
1423 // nextvariable = variable + step
1424 // endcond = endexpr
1425 // br endcond, loop, endloop
1426 // outloop:
1427
1428 // Emit the start code first, without 'variable' in scope.
1429 Value *StartVal = Start->Codegen();
1430 if (StartVal == 0) return 0;
1431
1432 // Make the new basic block for the loop header, inserting after current
1433 // block.
1434 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1435 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1436 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
1437
1438 // Insert an explicit fall through from the current block to the LoopBB.
1439 Builder.CreateBr(LoopBB);
1440
1441 // Start insertion in LoopBB.
1442 Builder.SetInsertPoint(LoopBB);
1443
1444 // Start the PHI node with an entry for Start.
1445 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str());
1446 Variable->addIncoming(StartVal, PreheaderBB);
1447
1448 // Within the loop, the variable is defined equal to the PHI node. If it
1449 // shadows an existing variable, we have to restore it, so save it now.
1450 Value *OldVal = NamedValues[VarName];
1451 NamedValues[VarName] = Variable;
1452
1453 // Emit the body of the loop. This, like any other expr, can change the
1454 // current BB. Note that we ignore the value computed by the body, but don't
1455 // allow an error.
1456 if (Body->Codegen() == 0)
1457 return 0;
1458
1459 // Emit the step value.
1460 Value *StepVal;
1461 if (Step) {
1462 StepVal = Step->Codegen();
1463 if (StepVal == 0) return 0;
1464 } else {
1465 // If not specified, use 1.0.
1466 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
1467 }
1468
1469 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
1470
1471 // Compute the end condition.
1472 Value *EndCond = End->Codegen();
1473 if (EndCond == 0) return EndCond;
1474
1475 // Convert condition to a bool by comparing equal to 0.0.
1476 EndCond = Builder.CreateFCmpONE(EndCond,
1477 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1478 "loopcond");
1479
1480 // Create the "after loop" block and insert it.
1481 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1482 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
1483
1484 // Insert the conditional branch into the end of LoopEndBB.
1485 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1486
1487 // Any new code will be inserted in AfterBB.
1488 Builder.SetInsertPoint(AfterBB);
1489
1490 // Add a new entry to the PHI node for the backedge.
1491 Variable->addIncoming(NextVar, LoopEndBB);
1492
1493 // Restore the unshadowed variable.
1494 if (OldVal)
1495 NamedValues[VarName] = OldVal;
1496 else
1497 NamedValues.erase(VarName);
1498
1499
1500 // for expr always returns 0.0.
1501 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
1502 }
1503
1504 Function *PrototypeAST::Codegen() {
1505 // Make the function type: double(double,double) etc.
1506 std::vector<Type*> Doubles(Args.size(),
1507 Type::getDoubleTy(getGlobalContext()));
1508 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1509 Doubles, false);
1510
1511 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1512
1513 // If F conflicted, there was already something named 'Name'. If it has a
1514 // body, don't allow redefinition or reextern.
1515 if (F->getName() != Name) {
1516 // Delete the one we just made and get the existing one.
1517 F->eraseFromParent();
1518 F = TheModule->getFunction(Name);
1519
1520 // If F already has a body, reject this.
1521 if (!F->empty()) {
1522 ErrorF("redefinition of function");
1523 return 0;
1524 }
1525
1526 // If F took a different number of args, reject.
1527 if (F->arg_size() != Args.size()) {
1528 ErrorF("redefinition of function with different # args");
1529 return 0;
1530 }
1531 }
1532
1533 // Set names for all arguments.
1534 unsigned Idx = 0;
1535 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1536 ++AI, ++Idx) {
1537 AI->setName(Args[Idx]);
1538
1539 // Add arguments to variable symbol table.
1540 NamedValues[Args[Idx]] = AI;
1541 }
1542
1543 return F;
1544 }
1545
1546 Function *FunctionAST::Codegen() {
1547 NamedValues.clear();
1548
1549 Function *TheFunction = Proto->Codegen();
1550 if (TheFunction == 0)
1551 return 0;
1552
1553 // If this is an operator, install it.
1554 if (Proto->isBinaryOp())
1555 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1556
1557 // Create a new basic block to start insertion into.
1558 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1559 Builder.SetInsertPoint(BB);
1560
1561 if (Value *RetVal = Body->Codegen()) {
1562 // Finish off the function.
1563 Builder.CreateRet(RetVal);
1564
1565 // Validate the generated code, checking for consistency.
1566 verifyFunction(*TheFunction);
1567
1568 // Optimize the function.
1569 TheFPM->run(*TheFunction);
1570
1571 return TheFunction;
1572 }
1573
1574 // Error reading body, remove function.
1575 TheFunction->eraseFromParent();
1576
1577 if (Proto->isBinaryOp())
1578 BinopPrecedence.erase(Proto->getOperatorName());
1579 return 0;
1580 }
1581
1582 //===----------------------------------------------------------------------===//
1583 // Top-Level parsing and JIT Driver
1584 //===----------------------------------------------------------------------===//
1585
1586 static ExecutionEngine *TheExecutionEngine;
1587
1588 static void HandleDefinition() {
1589 if (FunctionAST *F = ParseDefinition()) {
1590 if (Function *LF = F->Codegen()) {
1591 fprintf(stderr, "Read function definition:");
1592 LF->dump();
1593 }
1594 } else {
1595 // Skip token for error recovery.
1596 getNextToken();
1597 }
1598 }
1599
1600 static void HandleExtern() {
1601 if (PrototypeAST *P = ParseExtern()) {
1602 if (Function *F = P->Codegen()) {
1603 fprintf(stderr, "Read extern: ");
1604 F->dump();
1605 }
1606 } else {
1607 // Skip token for error recovery.
1608 getNextToken();
1609 }
1610 }
1611
1612 static void HandleTopLevelExpression() {
1613 // Evaluate a top-level expression into an anonymous function.
1614 if (FunctionAST *F = ParseTopLevelExpr()) {
1615 if (Function *LF = F->Codegen()) {
1616 // JIT the function, returning a function pointer.
1617 void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
1618
1619 // Cast it to the right type (takes no arguments, returns a double) so we
1620 // can call it as a native function.
1621 double (*FP)() = (double (*)())(intptr_t)FPtr;
1622 fprintf(stderr, "Evaluated to %f\n", FP());
1623 }
1624 } else {
1625 // Skip token for error recovery.
1626 getNextToken();
1627 }
1628 }
1629
1630 /// top ::= definition | external | expression | ';'
1631 static void MainLoop() {
1632 while (1) {
1633 fprintf(stderr, "ready> ");
1634 switch (CurTok) {
1635 case tok_eof: return;
1636 case ';': getNextToken(); break; // ignore top-level semicolons.
1637 case tok_def: HandleDefinition(); break;
1638 case tok_extern: HandleExtern(); break;
1639 default: HandleTopLevelExpression(); break;
1640 }
1641 }
1642 }
1643
1644 //===----------------------------------------------------------------------===//
1645 // "Library" functions that can be "extern'd" from user code.
1646 //===----------------------------------------------------------------------===//
1647
1648 /// putchard - putchar that takes a double and returns 0.
1649 extern "C"
1650 double putchard(double X) {
1651 putchar((char)X);
1652 return 0;
1653 }
1654
1655 /// printd - printf that takes a double prints it as "%f\n", returning 0.
1656 extern "C"
1657 double printd(double X) {
1658 printf("%f\n", X);
1659 return 0;
1660 }
1661
1662 //===----------------------------------------------------------------------===//
1663 // Main driver code.
1664 //===----------------------------------------------------------------------===//
1665
1666 int main() {
1667 InitializeNativeTarget();
1668 LLVMContext &Context = getGlobalContext();
1669
1670 // Install standard binary operators.
1671 // 1 is lowest precedence.
1672 BinopPrecedence['<'] = 10;
1673 BinopPrecedence['+'] = 20;
1674 BinopPrecedence['-'] = 20;
1675 BinopPrecedence['*'] = 40; // highest.
1676
1677 // Prime the first token.
1678 fprintf(stderr, "ready> ");
1679 getNextToken();
1680
1681 // Make the module, which holds all the code.
1682 TheModule = new Module("my cool jit", Context);
1683
1684 // Create the JIT. This takes ownership of the module.
1685 std::string ErrStr;
1686 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
1687 if (!TheExecutionEngine) {
1688 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
1689 exit(1);
1690 }
1691
1692 FunctionPassManager OurFPM(TheModule);
1693
1694 // Set up the optimizer pipeline. Start with registering info about how the
1695 // target lays out data structures.
1696 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
1697 // Provide basic AliasAnalysis support for GVN.
1698 OurFPM.add(createBasicAliasAnalysisPass());
1699 // Do simple "peephole" optimizations and bit-twiddling optzns.
1700 OurFPM.add(createInstructionCombiningPass());
1701 // Reassociate expressions.
1702 OurFPM.add(createReassociatePass());
1703 // Eliminate Common SubExpressions.
1704 OurFPM.add(createGVNPass());
1705 // Simplify the control flow graph (deleting unreachable blocks, etc).
1706 OurFPM.add(createCFGSimplificationPass());
1707
1708 OurFPM.doInitialization();
1709
1710 // Set the global so the code gen can use this.
1711 TheFPM = &OurFPM;
1712
1713 // Run the main "interpreter loop" now.
1714 MainLoop();
1715
1716 TheFPM = 0;
1717
1718 // Print out all of the generated code.
1719 TheModule->dump();
1720
1721 return 0;
1722 }
1723
1724`Next: Extending the language: mutable variables / SSA
1725construction <LangImpl7.html>`_
1726