blob: 30f4e90d033befd70a4b13fef5bdf28dfd75516a [file] [log] [blame]
Sean Silvaee47edf2012-12-05 00:26:32 +00001============================================================
2Kaleidoscope: Extending the Language: User-defined Operators
3============================================================
4
5.. contents::
6 :local:
7
8Written by `Chris Lattner <mailto:sabre@nondot.org>`_
9
10Chapter 6 Introduction
11======================
12
13Welcome to Chapter 6 of the "`Implementing a language with
14LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
15have a fully functional language that is fairly minimal, but also
16useful. There is still one big problem with it, however. Our language
17doesn't have many useful operators (like division, logical negation, or
18even any comparisons besides less-than).
19
20This chapter of the tutorial takes a wild digression into adding
21user-defined operators to the simple and beautiful Kaleidoscope
22language. This digression now gives us a simple and ugly language in
23some ways, but also a powerful one at the same time. One of the great
24things about creating your own language is that you get to decide what
25is good or bad. In this tutorial we'll assume that it is okay to use
26this as a way to show some interesting parsing techniques.
27
28At the end of this tutorial, we'll run through an example Kaleidoscope
29application that `renders the Mandelbrot set <#example>`_. This gives an
30example of what you can build with Kaleidoscope and its feature set.
31
32User-defined Operators: the Idea
33================================
34
35The "operator overloading" that we will add to Kaleidoscope is more
36general than languages like C++. In C++, you are only allowed to
37redefine existing operators: you can't programatically change the
38grammar, introduce new operators, change precedence levels, etc. In this
39chapter, we will add this capability to Kaleidoscope, which will let the
40user round out the set of operators that are supported.
41
42The point of going into user-defined operators in a tutorial like this
43is to show the power and flexibility of using a hand-written parser.
44Thus far, the parser we have been implementing uses recursive descent
45for most parts of the grammar and operator precedence parsing for the
46expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
47using operator precedence parsing, it would be very difficult to allow
48the programmer to introduce new operators into the grammar: the grammar
49is dynamically extensible as the JIT runs.
50
51The two specific features we'll add are programmable unary operators
52(right now, Kaleidoscope has no unary operators at all) as well as
53binary operators. An example of this is:
54
55::
56
57 # Logical unary not.
58 def unary!(v)
59 if v then
60 0
61 else
62 1;
63
64 # Define > with the same precedence as <.
65 def binary> 10 (LHS RHS)
66 RHS < LHS;
67
68 # Binary "logical or", (note that it does not "short circuit")
69 def binary| 5 (LHS RHS)
70 if LHS then
71 1
72 else if RHS then
73 1
74 else
75 0;
76
77 # Define = with slightly lower precedence than relationals.
78 def binary= 9 (LHS RHS)
79 !(LHS < RHS | LHS > RHS);
80
81Many languages aspire to being able to implement their standard runtime
82library in the language itself. In Kaleidoscope, we can implement
83significant parts of the language in the library!
84
85We will break down implementation of these features into two parts:
86implementing support for user-defined binary operators and adding unary
87operators.
88
89User-defined Binary Operators
90=============================
91
92Adding support for user-defined binary operators is pretty simple with
93our current framework. We'll first add support for the unary/binary
94keywords:
95
96.. code-block:: c++
97
98 enum Token {
99 ...
100 // operators
101 tok_binary = -11, tok_unary = -12
102 };
103 ...
104 static int gettok() {
105 ...
106 if (IdentifierStr == "for") return tok_for;
107 if (IdentifierStr == "in") return tok_in;
108 if (IdentifierStr == "binary") return tok_binary;
109 if (IdentifierStr == "unary") return tok_unary;
110 return tok_identifier;
111
112This just adds lexer support for the unary and binary keywords, like we
113did in `previous chapters <LangImpl5.html#iflexer>`_. One nice thing
114about our current AST, is that we represent binary operators with full
115generalisation by using their ASCII code as the opcode. For our extended
116operators, we'll use this same representation, so we don't need any new
117AST or parser support.
118
119On the other hand, we have to be able to represent the definitions of
120these new operators, in the "def binary\| 5" part of the function
121definition. In our grammar so far, the "name" for the function
122definition is parsed as the "prototype" production and into the
123``PrototypeAST`` AST node. To represent our new user-defined operators
124as prototypes, we have to extend the ``PrototypeAST`` AST node like
125this:
126
127.. code-block:: c++
128
129 /// PrototypeAST - This class represents the "prototype" for a function,
130 /// which captures its argument names as well as if it is an operator.
131 class PrototypeAST {
132 std::string Name;
133 std::vector<std::string> Args;
134 bool isOperator;
135 unsigned Precedence; // Precedence if a binary op.
136 public:
137 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
138 bool isoperator = false, unsigned prec = 0)
139 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
140
141 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
142 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
143
144 char getOperatorName() const {
145 assert(isUnaryOp() || isBinaryOp());
146 return Name[Name.size()-1];
147 }
148
149 unsigned getBinaryPrecedence() const { return Precedence; }
150
151 Function *Codegen();
152 };
153
154Basically, in addition to knowing a name for the prototype, we now keep
155track of whether it was an operator, and if it was, what precedence
156level the operator is at. The precedence is only used for binary
157operators (as you'll see below, it just doesn't apply for unary
158operators). Now that we have a way to represent the prototype for a
159user-defined operator, we need to parse it:
160
161.. code-block:: c++
162
163 /// prototype
164 /// ::= id '(' id* ')'
165 /// ::= binary LETTER number? (id, id)
166 static PrototypeAST *ParsePrototype() {
167 std::string FnName;
168
169 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
170 unsigned BinaryPrecedence = 30;
171
172 switch (CurTok) {
173 default:
174 return ErrorP("Expected function name in prototype");
175 case tok_identifier:
176 FnName = IdentifierStr;
177 Kind = 0;
178 getNextToken();
179 break;
180 case tok_binary:
181 getNextToken();
182 if (!isascii(CurTok))
183 return ErrorP("Expected binary operator");
184 FnName = "binary";
185 FnName += (char)CurTok;
186 Kind = 2;
187 getNextToken();
188
189 // Read the precedence if present.
190 if (CurTok == tok_number) {
191 if (NumVal < 1 || NumVal > 100)
192 return ErrorP("Invalid precedecnce: must be 1..100");
193 BinaryPrecedence = (unsigned)NumVal;
194 getNextToken();
195 }
196 break;
197 }
198
199 if (CurTok != '(')
200 return ErrorP("Expected '(' in prototype");
201
202 std::vector<std::string> ArgNames;
203 while (getNextToken() == tok_identifier)
204 ArgNames.push_back(IdentifierStr);
205 if (CurTok != ')')
206 return ErrorP("Expected ')' in prototype");
207
208 // success.
209 getNextToken(); // eat ')'.
210
211 // Verify right number of names for operator.
212 if (Kind && ArgNames.size() != Kind)
213 return ErrorP("Invalid number of operands for operator");
214
215 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
216 }
217
218This is all fairly straightforward parsing code, and we have already
219seen a lot of similar code in the past. One interesting part about the
220code above is the couple lines that set up ``FnName`` for binary
221operators. This builds names like "binary@" for a newly defined "@"
222operator. This then takes advantage of the fact that symbol names in the
223LLVM symbol table are allowed to have any character in them, including
224embedded nul characters.
225
226The next interesting thing to add, is codegen support for these binary
227operators. Given our current structure, this is a simple addition of a
228default case for our existing binary operator node:
229
230.. code-block:: c++
231
232 Value *BinaryExprAST::Codegen() {
233 Value *L = LHS->Codegen();
234 Value *R = RHS->Codegen();
235 if (L == 0 || R == 0) return 0;
236
237 switch (Op) {
238 case '+': return Builder.CreateFAdd(L, R, "addtmp");
239 case '-': return Builder.CreateFSub(L, R, "subtmp");
240 case '*': return Builder.CreateFMul(L, R, "multmp");
241 case '<':
242 L = Builder.CreateFCmpULT(L, R, "cmptmp");
243 // Convert bool 0/1 to double 0.0 or 1.0
244 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
245 "booltmp");
246 default: break;
247 }
248
249 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
250 // a call to it.
251 Function *F = TheModule->getFunction(std::string("binary")+Op);
252 assert(F && "binary operator not found!");
253
254 Value *Ops[2] = { L, R };
255 return Builder.CreateCall(F, Ops, "binop");
256 }
257
258As you can see above, the new code is actually really simple. It just
259does a lookup for the appropriate operator in the symbol table and
260generates a function call to it. Since user-defined operators are just
261built as normal functions (because the "prototype" boils down to a
262function with the right name) everything falls into place.
263
264The final piece of code we are missing, is a bit of top-level magic:
265
266.. code-block:: c++
267
268 Function *FunctionAST::Codegen() {
269 NamedValues.clear();
270
271 Function *TheFunction = Proto->Codegen();
272 if (TheFunction == 0)
273 return 0;
274
275 // If this is an operator, install it.
276 if (Proto->isBinaryOp())
277 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
278
279 // Create a new basic block to start insertion into.
280 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
281 Builder.SetInsertPoint(BB);
282
283 if (Value *RetVal = Body->Codegen()) {
284 ...
285
286Basically, before codegening a function, if it is a user-defined
287operator, we register it in the precedence table. This allows the binary
288operator parsing logic we already have in place to handle it. Since we
289are working on a fully-general operator precedence parser, this is all
290we need to do to "extend the grammar".
291
292Now we have useful user-defined binary operators. This builds a lot on
293the previous framework we built for other operators. Adding unary
294operators is a bit more challenging, because we don't have any framework
295for it yet - lets see what it takes.
296
297User-defined Unary Operators
298============================
299
300Since we don't currently support unary operators in the Kaleidoscope
301language, we'll need to add everything to support them. Above, we added
302simple support for the 'unary' keyword to the lexer. In addition to
303that, we need an AST node:
304
305.. code-block:: c++
306
307 /// UnaryExprAST - Expression class for a unary operator.
308 class UnaryExprAST : public ExprAST {
309 char Opcode;
310 ExprAST *Operand;
311 public:
312 UnaryExprAST(char opcode, ExprAST *operand)
313 : Opcode(opcode), Operand(operand) {}
314 virtual Value *Codegen();
315 };
316
317This AST node is very simple and obvious by now. It directly mirrors the
318binary operator AST node, except that it only has one child. With this,
319we need to add the parsing logic. Parsing a unary operator is pretty
320simple: we'll add a new function to do it:
321
322.. code-block:: c++
323
324 /// unary
325 /// ::= primary
326 /// ::= '!' unary
327 static ExprAST *ParseUnary() {
328 // If the current token is not an operator, it must be a primary expr.
329 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
330 return ParsePrimary();
331
332 // If this is a unary operator, read it.
333 int Opc = CurTok;
334 getNextToken();
335 if (ExprAST *Operand = ParseUnary())
336 return new UnaryExprAST(Opc, Operand);
337 return 0;
338 }
339
340The grammar we add is pretty straightforward here. If we see a unary
341operator when parsing a primary operator, we eat the operator as a
342prefix and parse the remaining piece as another unary operator. This
343allows us to handle multiple unary operators (e.g. "!!x"). Note that
344unary operators can't have ambiguous parses like binary operators can,
345so there is no need for precedence information.
346
347The problem with this function, is that we need to call ParseUnary from
348somewhere. To do this, we change previous callers of ParsePrimary to
349call ParseUnary instead:
350
351.. code-block:: c++
352
353 /// binoprhs
354 /// ::= ('+' unary)*
355 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
356 ...
357 // Parse the unary expression after the binary operator.
358 ExprAST *RHS = ParseUnary();
359 if (!RHS) return 0;
360 ...
361 }
362 /// expression
363 /// ::= unary binoprhs
364 ///
365 static ExprAST *ParseExpression() {
366 ExprAST *LHS = ParseUnary();
367 if (!LHS) return 0;
368
369 return ParseBinOpRHS(0, LHS);
370 }
371
372With these two simple changes, we are now able to parse unary operators
373and build the AST for them. Next up, we need to add parser support for
374prototypes, to parse the unary operator prototype. We extend the binary
375operator code above with:
376
377.. code-block:: c++
378
379 /// prototype
380 /// ::= id '(' id* ')'
381 /// ::= binary LETTER number? (id, id)
382 /// ::= unary LETTER (id)
383 static PrototypeAST *ParsePrototype() {
384 std::string FnName;
385
386 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
387 unsigned BinaryPrecedence = 30;
388
389 switch (CurTok) {
390 default:
391 return ErrorP("Expected function name in prototype");
392 case tok_identifier:
393 FnName = IdentifierStr;
394 Kind = 0;
395 getNextToken();
396 break;
397 case tok_unary:
398 getNextToken();
399 if (!isascii(CurTok))
400 return ErrorP("Expected unary operator");
401 FnName = "unary";
402 FnName += (char)CurTok;
403 Kind = 1;
404 getNextToken();
405 break;
406 case tok_binary:
407 ...
408
409As with binary operators, we name unary operators with a name that
410includes the operator character. This assists us at code generation
411time. Speaking of, the final piece we need to add is codegen support for
412unary operators. It looks like this:
413
414.. code-block:: c++
415
416 Value *UnaryExprAST::Codegen() {
417 Value *OperandV = Operand->Codegen();
418 if (OperandV == 0) return 0;
419
420 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
421 if (F == 0)
422 return ErrorV("Unknown unary operator");
423
424 return Builder.CreateCall(F, OperandV, "unop");
425 }
426
427This code is similar to, but simpler than, the code for binary
428operators. It is simpler primarily because it doesn't need to handle any
429predefined operators.
430
431Kicking the Tires
432=================
433
434It is somewhat hard to believe, but with a few simple extensions we've
435covered in the last chapters, we have grown a real-ish language. With
436this, we can do a lot of interesting things, including I/O, math, and a
437bunch of other things. For example, we can now add a nice sequencing
438operator (printd is defined to print out the specified value and a
439newline):
440
441::
442
443 ready> extern printd(x);
444 Read extern:
445 declare double @printd(double)
446
447 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
448 ..
449 ready> printd(123) : printd(456) : printd(789);
450 123.000000
451 456.000000
452 789.000000
453 Evaluated to 0.000000
454
455We can also define a bunch of other "primitive" operations, such as:
456
457::
458
459 # Logical unary not.
460 def unary!(v)
461 if v then
462 0
463 else
464 1;
465
466 # Unary negate.
467 def unary-(v)
468 0-v;
469
470 # Define > with the same precedence as <.
471 def binary> 10 (LHS RHS)
472 RHS < LHS;
473
474 # Binary logical or, which does not short circuit.
475 def binary| 5 (LHS RHS)
476 if LHS then
477 1
478 else if RHS then
479 1
480 else
481 0;
482
483 # Binary logical and, which does not short circuit.
484 def binary& 6 (LHS RHS)
485 if !LHS then
486 0
487 else
488 !!RHS;
489
490 # Define = with slightly lower precedence than relationals.
491 def binary = 9 (LHS RHS)
492 !(LHS < RHS | LHS > RHS);
493
494 # Define ':' for sequencing: as a low-precedence operator that ignores operands
495 # and just returns the RHS.
496 def binary : 1 (x y) y;
497
498Given the previous if/then/else support, we can also define interesting
499functions for I/O. For example, the following prints out a character
500whose "density" reflects the value passed in: the lower the value, the
501denser the character:
502
503::
504
505 ready>
506
507 extern putchard(char)
508 def printdensity(d)
509 if d > 8 then
510 putchard(32) # ' '
511 else if d > 4 then
512 putchard(46) # '.'
513 else if d > 2 then
514 putchard(43) # '+'
515 else
516 putchard(42); # '*'
517 ...
518 ready> printdensity(1): printdensity(2): printdensity(3):
519 printdensity(4): printdensity(5): printdensity(9):
520 putchard(10);
521 **++.
522 Evaluated to 0.000000
523
524Based on these simple primitive operations, we can start to define more
525interesting things. For example, here's a little function that solves
526for the number of iterations it takes a function in the complex plane to
527converge:
528
529::
530
531 # Determine whether the specific location diverges.
532 # Solve for z = z^2 + c in the complex plane.
533 def mandleconverger(real imag iters creal cimag)
534 if iters > 255 | (real*real + imag*imag > 4) then
535 iters
536 else
537 mandleconverger(real*real - imag*imag + creal,
538 2*real*imag + cimag,
539 iters+1, creal, cimag);
540
541 # Return the number of iterations required for the iteration to escape
542 def mandleconverge(real imag)
543 mandleconverger(real, imag, 0, real, imag);
544
545This "``z = z2 + c``" function is a beautiful little creature that is
546the basis for computation of the `Mandelbrot
547Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
548``mandelconverge`` function returns the number of iterations that it
549takes for a complex orbit to escape, saturating to 255. This is not a
550very useful function by itself, but if you plot its value over a
551two-dimensional plane, you can see the Mandelbrot set. Given that we are
552limited to using putchard here, our amazing graphical output is limited,
553but we can whip together something using the density plotter above:
554
555::
556
557 # Compute and plot the mandlebrot set with the specified 2 dimensional range
558 # info.
559 def mandelhelp(xmin xmax xstep ymin ymax ystep)
560 for y = ymin, y < ymax, ystep in (
561 (for x = xmin, x < xmax, xstep in
562 printdensity(mandleconverge(x,y)))
563 : putchard(10)
564 )
565
566 # mandel - This is a convenient helper function for plotting the mandelbrot set
567 # from the specified position with the specified Magnification.
568 def mandel(realstart imagstart realmag imagmag)
569 mandelhelp(realstart, realstart+realmag*78, realmag,
570 imagstart, imagstart+imagmag*40, imagmag);
571
572Given this, we can try plotting out the mandlebrot set! Lets try it out:
573
574::
575
576 ready> mandel(-2.3, -1.3, 0.05, 0.07);
577 *******************************+++++++++++*************************************
578 *************************+++++++++++++++++++++++*******************************
579 **********************+++++++++++++++++++++++++++++****************************
580 *******************+++++++++++++++++++++.. ...++++++++*************************
581 *****************++++++++++++++++++++++.... ...+++++++++***********************
582 ***************+++++++++++++++++++++++..... ...+++++++++*********************
583 **************+++++++++++++++++++++++.... ....+++++++++********************
584 *************++++++++++++++++++++++...... .....++++++++*******************
585 ************+++++++++++++++++++++....... .......+++++++******************
586 ***********+++++++++++++++++++.... ... .+++++++*****************
587 **********+++++++++++++++++....... .+++++++****************
588 *********++++++++++++++........... ...+++++++***************
589 ********++++++++++++............ ...++++++++**************
590 ********++++++++++... .......... .++++++++**************
591 *******+++++++++..... .+++++++++*************
592 *******++++++++...... ..+++++++++*************
593 *******++++++....... ..+++++++++*************
594 *******+++++...... ..+++++++++*************
595 *******.... .... ...+++++++++*************
596 *******.... . ...+++++++++*************
597 *******+++++...... ...+++++++++*************
598 *******++++++....... ..+++++++++*************
599 *******++++++++...... .+++++++++*************
600 *******+++++++++..... ..+++++++++*************
601 ********++++++++++... .......... .++++++++**************
602 ********++++++++++++............ ...++++++++**************
603 *********++++++++++++++.......... ...+++++++***************
604 **********++++++++++++++++........ .+++++++****************
605 **********++++++++++++++++++++.... ... ..+++++++****************
606 ***********++++++++++++++++++++++....... .......++++++++*****************
607 ************+++++++++++++++++++++++...... ......++++++++******************
608 **************+++++++++++++++++++++++.... ....++++++++********************
609 ***************+++++++++++++++++++++++..... ...+++++++++*********************
610 *****************++++++++++++++++++++++.... ...++++++++***********************
611 *******************+++++++++++++++++++++......++++++++*************************
612 *********************++++++++++++++++++++++.++++++++***************************
613 *************************+++++++++++++++++++++++*******************************
614 ******************************+++++++++++++************************************
615 *******************************************************************************
616 *******************************************************************************
617 *******************************************************************************
618 Evaluated to 0.000000
619 ready> mandel(-2, -1, 0.02, 0.04);
620 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
621 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
622 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
623 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
624 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
625 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
626 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
627 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
628 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
629 **********++++++++++++++++++++++++++++++++++++++++++++++.............
630 ********+++++++++++++++++++++++++++++++++++++++++++..................
631 *******+++++++++++++++++++++++++++++++++++++++.......................
632 ******+++++++++++++++++++++++++++++++++++...........................
633 *****++++++++++++++++++++++++++++++++............................
634 *****++++++++++++++++++++++++++++...............................
635 ****++++++++++++++++++++++++++...... .........................
636 ***++++++++++++++++++++++++......... ...... ...........
637 ***++++++++++++++++++++++............
638 **+++++++++++++++++++++..............
639 **+++++++++++++++++++................
640 *++++++++++++++++++.................
641 *++++++++++++++++............ ...
642 *++++++++++++++..............
643 *+++....++++................
644 *.......... ...........
645 *
646 *.......... ...........
647 *+++....++++................
648 *++++++++++++++..............
649 *++++++++++++++++............ ...
650 *++++++++++++++++++.................
651 **+++++++++++++++++++................
652 **+++++++++++++++++++++..............
653 ***++++++++++++++++++++++............
654 ***++++++++++++++++++++++++......... ...... ...........
655 ****++++++++++++++++++++++++++...... .........................
656 *****++++++++++++++++++++++++++++...............................
657 *****++++++++++++++++++++++++++++++++............................
658 ******+++++++++++++++++++++++++++++++++++...........................
659 *******+++++++++++++++++++++++++++++++++++++++.......................
660 ********+++++++++++++++++++++++++++++++++++++++++++..................
661 Evaluated to 0.000000
662 ready> mandel(-0.9, -1.4, 0.02, 0.03);
663 *******************************************************************************
664 *******************************************************************************
665 *******************************************************************************
666 **********+++++++++++++++++++++************************************************
667 *+++++++++++++++++++++++++++++++++++++++***************************************
668 +++++++++++++++++++++++++++++++++++++++++++++**********************************
669 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
670 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
671 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
672 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
673 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
674 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
675 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
676 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
677 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
678 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
679 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
680 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
681 ++++++++++++++++++++........... .........++++++++++++++++++++++*
682 ++++++++++++++++++............ ...........++++++++++++++++++++
683 ++++++++++++++++............... .............++++++++++++++++++
684 ++++++++++++++................. ...............++++++++++++++++
685 ++++++++++++.................. .................++++++++++++++
686 +++++++++.................. .................+++++++++++++
687 ++++++........ . ......... ..++++++++++++
688 ++............ ...... ....++++++++++
689 .............. ...++++++++++
690 .............. ....+++++++++
691 .............. .....++++++++
692 ............. ......++++++++
693 ........... .......++++++++
694 ......... ........+++++++
695 ......... ........+++++++
696 ......... ....+++++++
697 ........ ...+++++++
698 ....... ...+++++++
699 ....+++++++
700 .....+++++++
701 ....+++++++
702 ....+++++++
703 ....+++++++
704 Evaluated to 0.000000
705 ready> ^D
706
707At this point, you may be starting to realize that Kaleidoscope is a
708real and powerful language. It may not be self-similar :), but it can be
709used to plot things that are!
710
711With this, we conclude the "adding user-defined operators" chapter of
712the tutorial. We have successfully augmented our language, adding the
713ability to extend the language in the library, and we have shown how
714this can be used to build a simple but interesting end-user application
715in Kaleidoscope. At this point, Kaleidoscope can build a variety of
716applications that are functional and can call functions with
717side-effects, but it can't actually define and mutate a variable itself.
718
719Strikingly, variable mutation is an important feature of some languages,
720and it is not at all obvious how to `add support for mutable
721variables <LangImpl7.html>`_ without having to add an "SSA construction"
722phase to your front-end. In the next chapter, we will describe how you
723can add variable mutation without building SSA in your front-end.
724
725Full Code Listing
726=================
727
728Here is the complete code listing for our running example, enhanced with
729the if/then/else and for expressions.. To build this example, use:
730
731.. code-block:: bash
732
733 # Compile
734 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
735 # Run
736 ./toy
737
738On some platforms, you will need to specify -rdynamic or
739-Wl,--export-dynamic when linking. This ensures that symbols defined in
740the main executable are exported to the dynamic linker and so are
741available for symbol resolution at run time. This is not needed if you
742compile your support code into a shared library, although doing that
743will cause problems on Windows.
744
745Here is the code:
746
747.. code-block:: c++
748
749 #include "llvm/DerivedTypes.h"
750 #include "llvm/ExecutionEngine/ExecutionEngine.h"
751 #include "llvm/ExecutionEngine/JIT.h"
752 #include "llvm/IRBuilder.h"
753 #include "llvm/LLVMContext.h"
754 #include "llvm/Module.h"
755 #include "llvm/PassManager.h"
756 #include "llvm/Analysis/Verifier.h"
757 #include "llvm/Analysis/Passes.h"
758 #include "llvm/DataLayout.h"
759 #include "llvm/Transforms/Scalar.h"
760 #include "llvm/Support/TargetSelect.h"
761 #include <cstdio>
762 #include <string>
763 #include <map>
764 #include <vector>
765 using namespace llvm;
766
767 //===----------------------------------------------------------------------===//
768 // Lexer
769 //===----------------------------------------------------------------------===//
770
771 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
772 // of these for known things.
773 enum Token {
774 tok_eof = -1,
775
776 // commands
777 tok_def = -2, tok_extern = -3,
778
779 // primary
780 tok_identifier = -4, tok_number = -5,
781
782 // control
783 tok_if = -6, tok_then = -7, tok_else = -8,
784 tok_for = -9, tok_in = -10,
785
786 // operators
787 tok_binary = -11, tok_unary = -12
788 };
789
790 static std::string IdentifierStr; // Filled in if tok_identifier
791 static double NumVal; // Filled in if tok_number
792
793 /// gettok - Return the next token from standard input.
794 static int gettok() {
795 static int LastChar = ' ';
796
797 // Skip any whitespace.
798 while (isspace(LastChar))
799 LastChar = getchar();
800
801 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
802 IdentifierStr = LastChar;
803 while (isalnum((LastChar = getchar())))
804 IdentifierStr += LastChar;
805
806 if (IdentifierStr == "def") return tok_def;
807 if (IdentifierStr == "extern") return tok_extern;
808 if (IdentifierStr == "if") return tok_if;
809 if (IdentifierStr == "then") return tok_then;
810 if (IdentifierStr == "else") return tok_else;
811 if (IdentifierStr == "for") return tok_for;
812 if (IdentifierStr == "in") return tok_in;
813 if (IdentifierStr == "binary") return tok_binary;
814 if (IdentifierStr == "unary") return tok_unary;
815 return tok_identifier;
816 }
817
818 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
819 std::string NumStr;
820 do {
821 NumStr += LastChar;
822 LastChar = getchar();
823 } while (isdigit(LastChar) || LastChar == '.');
824
825 NumVal = strtod(NumStr.c_str(), 0);
826 return tok_number;
827 }
828
829 if (LastChar == '#') {
830 // Comment until end of line.
831 do LastChar = getchar();
832 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
833
834 if (LastChar != EOF)
835 return gettok();
836 }
837
838 // Check for end of file. Don't eat the EOF.
839 if (LastChar == EOF)
840 return tok_eof;
841
842 // Otherwise, just return the character as its ascii value.
843 int ThisChar = LastChar;
844 LastChar = getchar();
845 return ThisChar;
846 }
847
848 //===----------------------------------------------------------------------===//
849 // Abstract Syntax Tree (aka Parse Tree)
850 //===----------------------------------------------------------------------===//
851
852 /// ExprAST - Base class for all expression nodes.
853 class ExprAST {
854 public:
855 virtual ~ExprAST() {}
856 virtual Value *Codegen() = 0;
857 };
858
859 /// NumberExprAST - Expression class for numeric literals like "1.0".
860 class NumberExprAST : public ExprAST {
861 double Val;
862 public:
863 NumberExprAST(double val) : Val(val) {}
864 virtual Value *Codegen();
865 };
866
867 /// VariableExprAST - Expression class for referencing a variable, like "a".
868 class VariableExprAST : public ExprAST {
869 std::string Name;
870 public:
871 VariableExprAST(const std::string &name) : Name(name) {}
872 virtual Value *Codegen();
873 };
874
875 /// UnaryExprAST - Expression class for a unary operator.
876 class UnaryExprAST : public ExprAST {
877 char Opcode;
878 ExprAST *Operand;
879 public:
880 UnaryExprAST(char opcode, ExprAST *operand)
881 : Opcode(opcode), Operand(operand) {}
882 virtual Value *Codegen();
883 };
884
885 /// BinaryExprAST - Expression class for a binary operator.
886 class BinaryExprAST : public ExprAST {
887 char Op;
888 ExprAST *LHS, *RHS;
889 public:
890 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
891 : Op(op), LHS(lhs), RHS(rhs) {}
892 virtual Value *Codegen();
893 };
894
895 /// CallExprAST - Expression class for function calls.
896 class CallExprAST : public ExprAST {
897 std::string Callee;
898 std::vector<ExprAST*> Args;
899 public:
900 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
901 : Callee(callee), Args(args) {}
902 virtual Value *Codegen();
903 };
904
905 /// IfExprAST - Expression class for if/then/else.
906 class IfExprAST : public ExprAST {
907 ExprAST *Cond, *Then, *Else;
908 public:
909 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
910 : Cond(cond), Then(then), Else(_else) {}
911 virtual Value *Codegen();
912 };
913
914 /// ForExprAST - Expression class for for/in.
915 class ForExprAST : public ExprAST {
916 std::string VarName;
917 ExprAST *Start, *End, *Step, *Body;
918 public:
919 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
920 ExprAST *step, ExprAST *body)
921 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
922 virtual Value *Codegen();
923 };
924
925 /// PrototypeAST - This class represents the "prototype" for a function,
926 /// which captures its name, and its argument names (thus implicitly the number
927 /// of arguments the function takes), as well as if it is an operator.
928 class PrototypeAST {
929 std::string Name;
930 std::vector<std::string> Args;
931 bool isOperator;
932 unsigned Precedence; // Precedence if a binary op.
933 public:
934 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
935 bool isoperator = false, unsigned prec = 0)
936 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
937
938 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
939 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
940
941 char getOperatorName() const {
942 assert(isUnaryOp() || isBinaryOp());
943 return Name[Name.size()-1];
944 }
945
946 unsigned getBinaryPrecedence() const { return Precedence; }
947
948 Function *Codegen();
949 };
950
951 /// FunctionAST - This class represents a function definition itself.
952 class FunctionAST {
953 PrototypeAST *Proto;
954 ExprAST *Body;
955 public:
956 FunctionAST(PrototypeAST *proto, ExprAST *body)
957 : Proto(proto), Body(body) {}
958
959 Function *Codegen();
960 };
961
962 //===----------------------------------------------------------------------===//
963 // Parser
964 //===----------------------------------------------------------------------===//
965
966 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
967 /// token the parser is looking at. getNextToken reads another token from the
968 /// lexer and updates CurTok with its results.
969 static int CurTok;
970 static int getNextToken() {
971 return CurTok = gettok();
972 }
973
974 /// BinopPrecedence - This holds the precedence for each binary operator that is
975 /// defined.
976 static std::map<char, int> BinopPrecedence;
977
978 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
979 static int GetTokPrecedence() {
980 if (!isascii(CurTok))
981 return -1;
982
983 // Make sure it's a declared binop.
984 int TokPrec = BinopPrecedence[CurTok];
985 if (TokPrec <= 0) return -1;
986 return TokPrec;
987 }
988
989 /// Error* - These are little helper functions for error handling.
990 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
991 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
992 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
993
994 static ExprAST *ParseExpression();
995
996 /// identifierexpr
997 /// ::= identifier
998 /// ::= identifier '(' expression* ')'
999 static ExprAST *ParseIdentifierExpr() {
1000 std::string IdName = IdentifierStr;
1001
1002 getNextToken(); // eat identifier.
1003
1004 if (CurTok != '(') // Simple variable ref.
1005 return new VariableExprAST(IdName);
1006
1007 // Call.
1008 getNextToken(); // eat (
1009 std::vector<ExprAST*> Args;
1010 if (CurTok != ')') {
1011 while (1) {
1012 ExprAST *Arg = ParseExpression();
1013 if (!Arg) return 0;
1014 Args.push_back(Arg);
1015
1016 if (CurTok == ')') break;
1017
1018 if (CurTok != ',')
1019 return Error("Expected ')' or ',' in argument list");
1020 getNextToken();
1021 }
1022 }
1023
1024 // Eat the ')'.
1025 getNextToken();
1026
1027 return new CallExprAST(IdName, Args);
1028 }
1029
1030 /// numberexpr ::= number
1031 static ExprAST *ParseNumberExpr() {
1032 ExprAST *Result = new NumberExprAST(NumVal);
1033 getNextToken(); // consume the number
1034 return Result;
1035 }
1036
1037 /// parenexpr ::= '(' expression ')'
1038 static ExprAST *ParseParenExpr() {
1039 getNextToken(); // eat (.
1040 ExprAST *V = ParseExpression();
1041 if (!V) return 0;
1042
1043 if (CurTok != ')')
1044 return Error("expected ')'");
1045 getNextToken(); // eat ).
1046 return V;
1047 }
1048
1049 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1050 static ExprAST *ParseIfExpr() {
1051 getNextToken(); // eat the if.
1052
1053 // condition.
1054 ExprAST *Cond = ParseExpression();
1055 if (!Cond) return 0;
1056
1057 if (CurTok != tok_then)
1058 return Error("expected then");
1059 getNextToken(); // eat the then
1060
1061 ExprAST *Then = ParseExpression();
1062 if (Then == 0) return 0;
1063
1064 if (CurTok != tok_else)
1065 return Error("expected else");
1066
1067 getNextToken();
1068
1069 ExprAST *Else = ParseExpression();
1070 if (!Else) return 0;
1071
1072 return new IfExprAST(Cond, Then, Else);
1073 }
1074
1075 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1076 static ExprAST *ParseForExpr() {
1077 getNextToken(); // eat the for.
1078
1079 if (CurTok != tok_identifier)
1080 return Error("expected identifier after for");
1081
1082 std::string IdName = IdentifierStr;
1083 getNextToken(); // eat identifier.
1084
1085 if (CurTok != '=')
1086 return Error("expected '=' after for");
1087 getNextToken(); // eat '='.
1088
1089
1090 ExprAST *Start = ParseExpression();
1091 if (Start == 0) return 0;
1092 if (CurTok != ',')
1093 return Error("expected ',' after for start value");
1094 getNextToken();
1095
1096 ExprAST *End = ParseExpression();
1097 if (End == 0) return 0;
1098
1099 // The step value is optional.
1100 ExprAST *Step = 0;
1101 if (CurTok == ',') {
1102 getNextToken();
1103 Step = ParseExpression();
1104 if (Step == 0) return 0;
1105 }
1106
1107 if (CurTok != tok_in)
1108 return Error("expected 'in' after for");
1109 getNextToken(); // eat 'in'.
1110
1111 ExprAST *Body = ParseExpression();
1112 if (Body == 0) return 0;
1113
1114 return new ForExprAST(IdName, Start, End, Step, Body);
1115 }
1116
1117 /// primary
1118 /// ::= identifierexpr
1119 /// ::= numberexpr
1120 /// ::= parenexpr
1121 /// ::= ifexpr
1122 /// ::= forexpr
1123 static ExprAST *ParsePrimary() {
1124 switch (CurTok) {
1125 default: return Error("unknown token when expecting an expression");
1126 case tok_identifier: return ParseIdentifierExpr();
1127 case tok_number: return ParseNumberExpr();
1128 case '(': return ParseParenExpr();
1129 case tok_if: return ParseIfExpr();
1130 case tok_for: return ParseForExpr();
1131 }
1132 }
1133
1134 /// unary
1135 /// ::= primary
1136 /// ::= '!' unary
1137 static ExprAST *ParseUnary() {
1138 // If the current token is not an operator, it must be a primary expr.
1139 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1140 return ParsePrimary();
1141
1142 // If this is a unary operator, read it.
1143 int Opc = CurTok;
1144 getNextToken();
1145 if (ExprAST *Operand = ParseUnary())
1146 return new UnaryExprAST(Opc, Operand);
1147 return 0;
1148 }
1149
1150 /// binoprhs
1151 /// ::= ('+' unary)*
1152 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1153 // If this is a binop, find its precedence.
1154 while (1) {
1155 int TokPrec = GetTokPrecedence();
1156
1157 // If this is a binop that binds at least as tightly as the current binop,
1158 // consume it, otherwise we are done.
1159 if (TokPrec < ExprPrec)
1160 return LHS;
1161
1162 // Okay, we know this is a binop.
1163 int BinOp = CurTok;
1164 getNextToken(); // eat binop
1165
1166 // Parse the unary expression after the binary operator.
1167 ExprAST *RHS = ParseUnary();
1168 if (!RHS) return 0;
1169
1170 // If BinOp binds less tightly with RHS than the operator after RHS, let
1171 // the pending operator take RHS as its LHS.
1172 int NextPrec = GetTokPrecedence();
1173 if (TokPrec < NextPrec) {
1174 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1175 if (RHS == 0) return 0;
1176 }
1177
1178 // Merge LHS/RHS.
1179 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1180 }
1181 }
1182
1183 /// expression
1184 /// ::= unary binoprhs
1185 ///
1186 static ExprAST *ParseExpression() {
1187 ExprAST *LHS = ParseUnary();
1188 if (!LHS) return 0;
1189
1190 return ParseBinOpRHS(0, LHS);
1191 }
1192
1193 /// prototype
1194 /// ::= id '(' id* ')'
1195 /// ::= binary LETTER number? (id, id)
1196 /// ::= unary LETTER (id)
1197 static PrototypeAST *ParsePrototype() {
1198 std::string FnName;
1199
1200 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1201 unsigned BinaryPrecedence = 30;
1202
1203 switch (CurTok) {
1204 default:
1205 return ErrorP("Expected function name in prototype");
1206 case tok_identifier:
1207 FnName = IdentifierStr;
1208 Kind = 0;
1209 getNextToken();
1210 break;
1211 case tok_unary:
1212 getNextToken();
1213 if (!isascii(CurTok))
1214 return ErrorP("Expected unary operator");
1215 FnName = "unary";
1216 FnName += (char)CurTok;
1217 Kind = 1;
1218 getNextToken();
1219 break;
1220 case tok_binary:
1221 getNextToken();
1222 if (!isascii(CurTok))
1223 return ErrorP("Expected binary operator");
1224 FnName = "binary";
1225 FnName += (char)CurTok;
1226 Kind = 2;
1227 getNextToken();
1228
1229 // Read the precedence if present.
1230 if (CurTok == tok_number) {
1231 if (NumVal < 1 || NumVal > 100)
1232 return ErrorP("Invalid precedecnce: must be 1..100");
1233 BinaryPrecedence = (unsigned)NumVal;
1234 getNextToken();
1235 }
1236 break;
1237 }
1238
1239 if (CurTok != '(')
1240 return ErrorP("Expected '(' in prototype");
1241
1242 std::vector<std::string> ArgNames;
1243 while (getNextToken() == tok_identifier)
1244 ArgNames.push_back(IdentifierStr);
1245 if (CurTok != ')')
1246 return ErrorP("Expected ')' in prototype");
1247
1248 // success.
1249 getNextToken(); // eat ')'.
1250
1251 // Verify right number of names for operator.
1252 if (Kind && ArgNames.size() != Kind)
1253 return ErrorP("Invalid number of operands for operator");
1254
1255 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1256 }
1257
1258 /// definition ::= 'def' prototype expression
1259 static FunctionAST *ParseDefinition() {
1260 getNextToken(); // eat def.
1261 PrototypeAST *Proto = ParsePrototype();
1262 if (Proto == 0) return 0;
1263
1264 if (ExprAST *E = ParseExpression())
1265 return new FunctionAST(Proto, E);
1266 return 0;
1267 }
1268
1269 /// toplevelexpr ::= expression
1270 static FunctionAST *ParseTopLevelExpr() {
1271 if (ExprAST *E = ParseExpression()) {
1272 // Make an anonymous proto.
1273 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1274 return new FunctionAST(Proto, E);
1275 }
1276 return 0;
1277 }
1278
1279 /// external ::= 'extern' prototype
1280 static PrototypeAST *ParseExtern() {
1281 getNextToken(); // eat extern.
1282 return ParsePrototype();
1283 }
1284
1285 //===----------------------------------------------------------------------===//
1286 // Code Generation
1287 //===----------------------------------------------------------------------===//
1288
1289 static Module *TheModule;
1290 static IRBuilder<> Builder(getGlobalContext());
1291 static std::map<std::string, Value*> NamedValues;
1292 static FunctionPassManager *TheFPM;
1293
1294 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1295
1296 Value *NumberExprAST::Codegen() {
1297 return ConstantFP::get(getGlobalContext(), APFloat(Val));
1298 }
1299
1300 Value *VariableExprAST::Codegen() {
1301 // Look this variable up in the function.
1302 Value *V = NamedValues[Name];
1303 return V ? V : ErrorV("Unknown variable name");
1304 }
1305
1306 Value *UnaryExprAST::Codegen() {
1307 Value *OperandV = Operand->Codegen();
1308 if (OperandV == 0) return 0;
1309
1310 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
1311 if (F == 0)
1312 return ErrorV("Unknown unary operator");
1313
1314 return Builder.CreateCall(F, OperandV, "unop");
1315 }
1316
1317 Value *BinaryExprAST::Codegen() {
1318 Value *L = LHS->Codegen();
1319 Value *R = RHS->Codegen();
1320 if (L == 0 || R == 0) return 0;
1321
1322 switch (Op) {
1323 case '+': return Builder.CreateFAdd(L, R, "addtmp");
1324 case '-': return Builder.CreateFSub(L, R, "subtmp");
1325 case '*': return Builder.CreateFMul(L, R, "multmp");
1326 case '<':
1327 L = Builder.CreateFCmpULT(L, R, "cmptmp");
1328 // Convert bool 0/1 to double 0.0 or 1.0
1329 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
1330 "booltmp");
1331 default: break;
1332 }
1333
1334 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1335 // a call to it.
1336 Function *F = TheModule->getFunction(std::string("binary")+Op);
1337 assert(F && "binary operator not found!");
1338
1339 Value *Ops[2] = { L, R };
1340 return Builder.CreateCall(F, Ops, "binop");
1341 }
1342
1343 Value *CallExprAST::Codegen() {
1344 // Look up the name in the global module table.
1345 Function *CalleeF = TheModule->getFunction(Callee);
1346 if (CalleeF == 0)
1347 return ErrorV("Unknown function referenced");
1348
1349 // If argument mismatch error.
1350 if (CalleeF->arg_size() != Args.size())
1351 return ErrorV("Incorrect # arguments passed");
1352
1353 std::vector<Value*> ArgsV;
1354 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1355 ArgsV.push_back(Args[i]->Codegen());
1356 if (ArgsV.back() == 0) return 0;
1357 }
1358
1359 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
1360 }
1361
1362 Value *IfExprAST::Codegen() {
1363 Value *CondV = Cond->Codegen();
1364 if (CondV == 0) return 0;
1365
1366 // Convert condition to a bool by comparing equal to 0.0.
1367 CondV = Builder.CreateFCmpONE(CondV,
1368 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1369 "ifcond");
1370
1371 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1372
1373 // Create blocks for the then and else cases. Insert the 'then' block at the
1374 // end of the function.
1375 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
1376 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
1377 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
1378
1379 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1380
1381 // Emit then value.
1382 Builder.SetInsertPoint(ThenBB);
1383
1384 Value *ThenV = Then->Codegen();
1385 if (ThenV == 0) return 0;
1386
1387 Builder.CreateBr(MergeBB);
1388 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1389 ThenBB = Builder.GetInsertBlock();
1390
1391 // Emit else block.
1392 TheFunction->getBasicBlockList().push_back(ElseBB);
1393 Builder.SetInsertPoint(ElseBB);
1394
1395 Value *ElseV = Else->Codegen();
1396 if (ElseV == 0) return 0;
1397
1398 Builder.CreateBr(MergeBB);
1399 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1400 ElseBB = Builder.GetInsertBlock();
1401
1402 // Emit merge block.
1403 TheFunction->getBasicBlockList().push_back(MergeBB);
1404 Builder.SetInsertPoint(MergeBB);
1405 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
1406 "iftmp");
1407
1408 PN->addIncoming(ThenV, ThenBB);
1409 PN->addIncoming(ElseV, ElseBB);
1410 return PN;
1411 }
1412
1413 Value *ForExprAST::Codegen() {
1414 // Output this as:
1415 // ...
1416 // start = startexpr
1417 // goto loop
1418 // loop:
1419 // variable = phi [start, loopheader], [nextvariable, loopend]
1420 // ...
1421 // bodyexpr
1422 // ...
1423 // loopend:
1424 // step = stepexpr
1425 // nextvariable = variable + step
1426 // endcond = endexpr
1427 // br endcond, loop, endloop
1428 // outloop:
1429
1430 // Emit the start code first, without 'variable' in scope.
1431 Value *StartVal = Start->Codegen();
1432 if (StartVal == 0) return 0;
1433
1434 // Make the new basic block for the loop header, inserting after current
1435 // block.
1436 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1437 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1438 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
1439
1440 // Insert an explicit fall through from the current block to the LoopBB.
1441 Builder.CreateBr(LoopBB);
1442
1443 // Start insertion in LoopBB.
1444 Builder.SetInsertPoint(LoopBB);
1445
1446 // Start the PHI node with an entry for Start.
1447 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, VarName.c_str());
1448 Variable->addIncoming(StartVal, PreheaderBB);
1449
1450 // Within the loop, the variable is defined equal to the PHI node. If it
1451 // shadows an existing variable, we have to restore it, so save it now.
1452 Value *OldVal = NamedValues[VarName];
1453 NamedValues[VarName] = Variable;
1454
1455 // Emit the body of the loop. This, like any other expr, can change the
1456 // current BB. Note that we ignore the value computed by the body, but don't
1457 // allow an error.
1458 if (Body->Codegen() == 0)
1459 return 0;
1460
1461 // Emit the step value.
1462 Value *StepVal;
1463 if (Step) {
1464 StepVal = Step->Codegen();
1465 if (StepVal == 0) return 0;
1466 } else {
1467 // If not specified, use 1.0.
1468 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
1469 }
1470
1471 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
1472
1473 // Compute the end condition.
1474 Value *EndCond = End->Codegen();
1475 if (EndCond == 0) return EndCond;
1476
1477 // Convert condition to a bool by comparing equal to 0.0.
1478 EndCond = Builder.CreateFCmpONE(EndCond,
1479 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1480 "loopcond");
1481
1482 // Create the "after loop" block and insert it.
1483 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1484 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
1485
1486 // Insert the conditional branch into the end of LoopEndBB.
1487 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1488
1489 // Any new code will be inserted in AfterBB.
1490 Builder.SetInsertPoint(AfterBB);
1491
1492 // Add a new entry to the PHI node for the backedge.
1493 Variable->addIncoming(NextVar, LoopEndBB);
1494
1495 // Restore the unshadowed variable.
1496 if (OldVal)
1497 NamedValues[VarName] = OldVal;
1498 else
1499 NamedValues.erase(VarName);
1500
1501
1502 // for expr always returns 0.0.
1503 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
1504 }
1505
1506 Function *PrototypeAST::Codegen() {
1507 // Make the function type: double(double,double) etc.
1508 std::vector<Type*> Doubles(Args.size(),
1509 Type::getDoubleTy(getGlobalContext()));
1510 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1511 Doubles, false);
1512
1513 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1514
1515 // If F conflicted, there was already something named 'Name'. If it has a
1516 // body, don't allow redefinition or reextern.
1517 if (F->getName() != Name) {
1518 // Delete the one we just made and get the existing one.
1519 F->eraseFromParent();
1520 F = TheModule->getFunction(Name);
1521
1522 // If F already has a body, reject this.
1523 if (!F->empty()) {
1524 ErrorF("redefinition of function");
1525 return 0;
1526 }
1527
1528 // If F took a different number of args, reject.
1529 if (F->arg_size() != Args.size()) {
1530 ErrorF("redefinition of function with different # args");
1531 return 0;
1532 }
1533 }
1534
1535 // Set names for all arguments.
1536 unsigned Idx = 0;
1537 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1538 ++AI, ++Idx) {
1539 AI->setName(Args[Idx]);
1540
1541 // Add arguments to variable symbol table.
1542 NamedValues[Args[Idx]] = AI;
1543 }
1544
1545 return F;
1546 }
1547
1548 Function *FunctionAST::Codegen() {
1549 NamedValues.clear();
1550
1551 Function *TheFunction = Proto->Codegen();
1552 if (TheFunction == 0)
1553 return 0;
1554
1555 // If this is an operator, install it.
1556 if (Proto->isBinaryOp())
1557 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1558
1559 // Create a new basic block to start insertion into.
1560 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1561 Builder.SetInsertPoint(BB);
1562
1563 if (Value *RetVal = Body->Codegen()) {
1564 // Finish off the function.
1565 Builder.CreateRet(RetVal);
1566
1567 // Validate the generated code, checking for consistency.
1568 verifyFunction(*TheFunction);
1569
1570 // Optimize the function.
1571 TheFPM->run(*TheFunction);
1572
1573 return TheFunction;
1574 }
1575
1576 // Error reading body, remove function.
1577 TheFunction->eraseFromParent();
1578
1579 if (Proto->isBinaryOp())
1580 BinopPrecedence.erase(Proto->getOperatorName());
1581 return 0;
1582 }
1583
1584 //===----------------------------------------------------------------------===//
1585 // Top-Level parsing and JIT Driver
1586 //===----------------------------------------------------------------------===//
1587
1588 static ExecutionEngine *TheExecutionEngine;
1589
1590 static void HandleDefinition() {
1591 if (FunctionAST *F = ParseDefinition()) {
1592 if (Function *LF = F->Codegen()) {
1593 fprintf(stderr, "Read function definition:");
1594 LF->dump();
1595 }
1596 } else {
1597 // Skip token for error recovery.
1598 getNextToken();
1599 }
1600 }
1601
1602 static void HandleExtern() {
1603 if (PrototypeAST *P = ParseExtern()) {
1604 if (Function *F = P->Codegen()) {
1605 fprintf(stderr, "Read extern: ");
1606 F->dump();
1607 }
1608 } else {
1609 // Skip token for error recovery.
1610 getNextToken();
1611 }
1612 }
1613
1614 static void HandleTopLevelExpression() {
1615 // Evaluate a top-level expression into an anonymous function.
1616 if (FunctionAST *F = ParseTopLevelExpr()) {
1617 if (Function *LF = F->Codegen()) {
1618 // JIT the function, returning a function pointer.
1619 void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
1620
1621 // Cast it to the right type (takes no arguments, returns a double) so we
1622 // can call it as a native function.
1623 double (*FP)() = (double (*)())(intptr_t)FPtr;
1624 fprintf(stderr, "Evaluated to %f\n", FP());
1625 }
1626 } else {
1627 // Skip token for error recovery.
1628 getNextToken();
1629 }
1630 }
1631
1632 /// top ::= definition | external | expression | ';'
1633 static void MainLoop() {
1634 while (1) {
1635 fprintf(stderr, "ready> ");
1636 switch (CurTok) {
1637 case tok_eof: return;
1638 case ';': getNextToken(); break; // ignore top-level semicolons.
1639 case tok_def: HandleDefinition(); break;
1640 case tok_extern: HandleExtern(); break;
1641 default: HandleTopLevelExpression(); break;
1642 }
1643 }
1644 }
1645
1646 //===----------------------------------------------------------------------===//
1647 // "Library" functions that can be "extern'd" from user code.
1648 //===----------------------------------------------------------------------===//
1649
1650 /// putchard - putchar that takes a double and returns 0.
1651 extern "C"
1652 double putchard(double X) {
1653 putchar((char)X);
1654 return 0;
1655 }
1656
1657 /// printd - printf that takes a double prints it as "%f\n", returning 0.
1658 extern "C"
1659 double printd(double X) {
1660 printf("%f\n", X);
1661 return 0;
1662 }
1663
1664 //===----------------------------------------------------------------------===//
1665 // Main driver code.
1666 //===----------------------------------------------------------------------===//
1667
1668 int main() {
1669 InitializeNativeTarget();
1670 LLVMContext &Context = getGlobalContext();
1671
1672 // Install standard binary operators.
1673 // 1 is lowest precedence.
1674 BinopPrecedence['<'] = 10;
1675 BinopPrecedence['+'] = 20;
1676 BinopPrecedence['-'] = 20;
1677 BinopPrecedence['*'] = 40; // highest.
1678
1679 // Prime the first token.
1680 fprintf(stderr, "ready> ");
1681 getNextToken();
1682
1683 // Make the module, which holds all the code.
1684 TheModule = new Module("my cool jit", Context);
1685
1686 // Create the JIT. This takes ownership of the module.
1687 std::string ErrStr;
1688 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
1689 if (!TheExecutionEngine) {
1690 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
1691 exit(1);
1692 }
1693
1694 FunctionPassManager OurFPM(TheModule);
1695
1696 // Set up the optimizer pipeline. Start with registering info about how the
1697 // target lays out data structures.
1698 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
1699 // Provide basic AliasAnalysis support for GVN.
1700 OurFPM.add(createBasicAliasAnalysisPass());
1701 // Do simple "peephole" optimizations and bit-twiddling optzns.
1702 OurFPM.add(createInstructionCombiningPass());
1703 // Reassociate expressions.
1704 OurFPM.add(createReassociatePass());
1705 // Eliminate Common SubExpressions.
1706 OurFPM.add(createGVNPass());
1707 // Simplify the control flow graph (deleting unreachable blocks, etc).
1708 OurFPM.add(createCFGSimplificationPass());
1709
1710 OurFPM.doInitialization();
1711
1712 // Set the global so the code gen can use this.
1713 TheFPM = &OurFPM;
1714
1715 // Run the main "interpreter loop" now.
1716 MainLoop();
1717
1718 TheFPM = 0;
1719
1720 // Print out all of the generated code.
1721 TheModule->dump();
1722
1723 return 0;
1724 }
1725
1726`Next: Extending the language: mutable variables / SSA
1727construction <LangImpl7.html>`_
1728