blob: 8650892e8f8bdfddb73ff122d26d80ed9e39abcb [file] [log] [blame]
Sean Silvad7fb3962012-12-05 00:26:32 +00001==================================================
2Kaleidoscope: Extending the Language: Control Flow
3==================================================
4
5.. contents::
6 :local:
7
Sean Silvad7fb3962012-12-05 00:26:32 +00008Chapter 5 Introduction
9======================
10
11Welcome to Chapter 5 of the "`Implementing a language with
12LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of
13the simple Kaleidoscope language and included support for generating
14LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as
15presented, Kaleidoscope is mostly useless: it has no control flow other
16than call and return. This means that you can't have conditional
17branches in the code, significantly limiting its power. In this episode
18of "build that compiler", we'll extend Kaleidoscope to have an
19if/then/else expression plus a simple 'for' loop.
20
21If/Then/Else
22============
23
24Extending Kaleidoscope to support if/then/else is quite straightforward.
25It basically requires adding support for this "new" concept to the
26lexer, parser, AST, and LLVM code emitter. This example is nice, because
27it shows how easy it is to "grow" a language over time, incrementally
28extending it as new ideas are discovered.
29
30Before we get going on "how" we add this extension, lets talk about
31"what" we want. The basic idea is that we want to be able to write this
32sort of thing:
33
34::
35
36 def fib(x)
37 if x < 3 then
38 1
39 else
40 fib(x-1)+fib(x-2);
41
42In Kaleidoscope, every construct is an expression: there are no
43statements. As such, the if/then/else expression needs to return a value
44like any other. Since we're using a mostly functional form, we'll have
45it evaluate its conditional, then return the 'then' or 'else' value
46based on how the condition was resolved. This is very similar to the C
47"?:" expression.
48
49The semantics of the if/then/else expression is that it evaluates the
50condition to a boolean equality value: 0.0 is considered to be false and
51everything else is considered to be true. If the condition is true, the
52first subexpression is evaluated and returned, if the condition is
53false, the second subexpression is evaluated and returned. Since
54Kaleidoscope allows side-effects, this behavior is important to nail
55down.
56
57Now that we know what we "want", lets break this down into its
58constituent pieces.
59
60Lexer Extensions for If/Then/Else
61---------------------------------
62
63The lexer extensions are straightforward. First we add new enum values
64for the relevant tokens:
65
66.. code-block:: c++
67
68 // control
Lang Hames59b0da82015-08-19 18:15:58 +000069 tok_if = -6,
70 tok_then = -7,
71 tok_else = -8,
Sean Silvad7fb3962012-12-05 00:26:32 +000072
73Once we have that, we recognize the new keywords in the lexer. This is
74pretty simple stuff:
75
76.. code-block:: c++
77
78 ...
Lang Hames59b0da82015-08-19 18:15:58 +000079 if (IdentifierStr == "def")
80 return tok_def;
81 if (IdentifierStr == "extern")
82 return tok_extern;
83 if (IdentifierStr == "if")
84 return tok_if;
85 if (IdentifierStr == "then")
86 return tok_then;
87 if (IdentifierStr == "else")
88 return tok_else;
Sean Silvad7fb3962012-12-05 00:26:32 +000089 return tok_identifier;
90
91AST Extensions for If/Then/Else
92-------------------------------
93
94To represent the new expression we add a new AST node for it:
95
96.. code-block:: c++
97
98 /// IfExprAST - Expression class for if/then/else.
99 class IfExprAST : public ExprAST {
Lang Hames9d7593f2015-08-26 20:57:03 +0000100 std::unique_ptr<ExprAST> Cond, Then, Else;
Lang Hames59b0da82015-08-19 18:15:58 +0000101
Sean Silvad7fb3962012-12-05 00:26:32 +0000102 public:
Lang Hames09bf4c12015-08-18 18:11:06 +0000103 IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,
104 std::unique_ptr<ExprAST> Else)
105 : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000106
107 Value *codegen() override;
Sean Silvad7fb3962012-12-05 00:26:32 +0000108 };
109
110The AST node just has pointers to the various subexpressions.
111
112Parser Extensions for If/Then/Else
113----------------------------------
114
115Now that we have the relevant tokens coming from the lexer and we have
116the AST node to build, our parsing logic is relatively straightforward.
117First we define a new parsing function:
118
119.. code-block:: c++
120
121 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
Lang Hames09bf4c12015-08-18 18:11:06 +0000122 static std::unique_ptr<ExprAST> ParseIfExpr() {
Sean Silvad7fb3962012-12-05 00:26:32 +0000123 getNextToken(); // eat the if.
124
125 // condition.
Lang Hames09bf4c12015-08-18 18:11:06 +0000126 auto Cond = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000127 if (!Cond)
128 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000129
130 if (CurTok != tok_then)
Lang Hames5d045a92016-03-25 17:41:26 +0000131 return LogError("expected then");
Sean Silvad7fb3962012-12-05 00:26:32 +0000132 getNextToken(); // eat the then
133
Lang Hames09bf4c12015-08-18 18:11:06 +0000134 auto Then = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000135 if (!Then)
136 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000137
138 if (CurTok != tok_else)
Lang Hames5d045a92016-03-25 17:41:26 +0000139 return LogError("expected else");
Sean Silvad7fb3962012-12-05 00:26:32 +0000140
141 getNextToken();
142
Lang Hames09bf4c12015-08-18 18:11:06 +0000143 auto Else = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000144 if (!Else)
145 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000146
Lang Hames09bf4c12015-08-18 18:11:06 +0000147 return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),
148 std::move(Else));
Sean Silvad7fb3962012-12-05 00:26:32 +0000149 }
150
151Next we hook it up as a primary expression:
152
153.. code-block:: c++
154
Lang Hames09bf4c12015-08-18 18:11:06 +0000155 static std::unique_ptr<ExprAST> ParsePrimary() {
Sean Silvad7fb3962012-12-05 00:26:32 +0000156 switch (CurTok) {
Lang Hames59b0da82015-08-19 18:15:58 +0000157 default:
Lang Hames5d045a92016-03-25 17:41:26 +0000158 return LogError("unknown token when expecting an expression");
Lang Hames59b0da82015-08-19 18:15:58 +0000159 case tok_identifier:
160 return ParseIdentifierExpr();
161 case tok_number:
162 return ParseNumberExpr();
163 case '(':
164 return ParseParenExpr();
165 case tok_if:
166 return ParseIfExpr();
Sean Silvad7fb3962012-12-05 00:26:32 +0000167 }
168 }
169
170LLVM IR for If/Then/Else
171------------------------
172
173Now that we have it parsing and building the AST, the final piece is
174adding LLVM code generation support. This is the most interesting part
175of the if/then/else example, because this is where it starts to
176introduce new concepts. All of the code above has been thoroughly
177described in previous chapters.
178
179To motivate the code we want to produce, lets take a look at a simple
180example. Consider:
181
182::
183
184 extern foo();
185 extern bar();
186 def baz(x) if x then foo() else bar();
187
188If you disable optimizations, the code you'll (soon) get from
189Kaleidoscope looks like this:
190
191.. code-block:: llvm
192
193 declare double @foo()
194
195 declare double @bar()
196
197 define double @baz(double %x) {
198 entry:
199 %ifcond = fcmp one double %x, 0.000000e+00
200 br i1 %ifcond, label %then, label %else
201
202 then: ; preds = %entry
203 %calltmp = call double @foo()
204 br label %ifcont
205
206 else: ; preds = %entry
207 %calltmp1 = call double @bar()
208 br label %ifcont
209
210 ifcont: ; preds = %else, %then
211 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
212 ret double %iftmp
213 }
214
215To visualize the control flow graph, you can use a nifty feature of the
216LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM
217IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a
Alex Denisov596e9792015-12-15 20:50:29 +0000218window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll
Sean Silvad7fb3962012-12-05 00:26:32 +0000219see this graph:
220
Wilfred Hughes945f43e2016-07-02 17:01:59 +0000221.. figure:: LangImpl05-cfg.png
Sean Silvad7fb3962012-12-05 00:26:32 +0000222 :align: center
223 :alt: Example CFG
224
225 Example CFG
226
227Another way to get this is to call "``F->viewCFG()``" or
228"``F->viewCFGOnly()``" (where F is a "``Function*``") either by
229inserting actual calls into the code and recompiling or by calling these
230in the debugger. LLVM has many nice features for visualizing various
231graphs.
232
233Getting back to the generated code, it is fairly simple: the entry block
234evaluates the conditional expression ("x" in our case here) and compares
235the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered
236and Not Equal"). Based on the result of this expression, the code jumps
237to either the "then" or "else" blocks, which contain the expressions for
238the true/false cases.
239
240Once the then/else blocks are finished executing, they both branch back
241to the 'ifcont' block to execute the code that happens after the
242if/then/else. In this case the only thing left to do is to return to the
243caller of the function. The question then becomes: how does the code
244know which expression to return?
245
246The answer to this question involves an important SSA operation: the
247`Phi
248operation <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
249If you're not familiar with SSA, `the wikipedia
250article <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
251is a good introduction and there are various other introductions to it
252available on your favorite search engine. The short version is that
253"execution" of the Phi operation requires "remembering" which block
254control came from. The Phi operation takes on the value corresponding to
255the input control block. In this case, if control comes in from the
256"then" block, it gets the value of "calltmp". If control comes from the
257"else" block, it gets the value of "calltmp1".
258
259At this point, you are probably starting to think "Oh no! This means my
260simple and elegant front-end will have to start generating SSA form in
261order to use LLVM!". Fortunately, this is not the case, and we strongly
262advise *not* implementing an SSA construction algorithm in your
263front-end unless there is an amazingly good reason to do so. In
264practice, there are two sorts of values that float around in code
265written for your average imperative programming language that might need
266Phi nodes:
267
268#. Code that involves user variables: ``x = 1; x = x + 1;``
269#. Values that are implicit in the structure of your AST, such as the
270 Phi node in this case.
271
Kirill Bobyreve4364832017-07-10 09:07:23 +0000272In `Chapter 7 <LangImpl07.html>`_ of this tutorial ("mutable variables"),
Sean Silvad7fb3962012-12-05 00:26:32 +0000273we'll talk about #1 in depth. For now, just believe me that you don't
274need SSA construction to handle this case. For #2, you have the choice
275of using the techniques that we will describe for #1, or you can insert
Ed Maste8ed40ce2015-04-14 20:52:58 +0000276Phi nodes directly, if convenient. In this case, it is really
Sean Silvad7fb3962012-12-05 00:26:32 +0000277easy to generate the Phi node, so we choose to do it directly.
278
279Okay, enough of the motivation and overview, lets generate code!
280
281Code Generation for If/Then/Else
282--------------------------------
283
Lang Hames2d789c32015-08-26 03:07:41 +0000284In order to generate code for this, we implement the ``codegen`` method
Sean Silvad7fb3962012-12-05 00:26:32 +0000285for ``IfExprAST``:
286
287.. code-block:: c++
288
Lang Hames2d789c32015-08-26 03:07:41 +0000289 Value *IfExprAST::codegen() {
290 Value *CondV = Cond->codegen();
Lang Hames59b0da82015-08-19 18:15:58 +0000291 if (!CondV)
292 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000293
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000294 // Convert condition to a bool by comparing non-equal to 0.0.
Lang Hames59b0da82015-08-19 18:15:58 +0000295 CondV = Builder.CreateFCmpONE(
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000296 CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond");
Sean Silvad7fb3962012-12-05 00:26:32 +0000297
298This code is straightforward and similar to what we saw before. We emit
299the expression for the condition, then compare that value to zero to get
300a truth value as a 1-bit (bool) value.
301
302.. code-block:: c++
303
304 Function *TheFunction = Builder.GetInsertBlock()->getParent();
305
306 // Create blocks for the then and else cases. Insert the 'then' block at the
307 // end of the function.
Lang Hames59b0da82015-08-19 18:15:58 +0000308 BasicBlock *ThenBB =
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000309 BasicBlock::Create(TheContext, "then", TheFunction);
310 BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else");
311 BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont");
Sean Silvad7fb3962012-12-05 00:26:32 +0000312
313 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
314
315This code creates the basic blocks that are related to the if/then/else
316statement, and correspond directly to the blocks in the example above.
317The first line gets the current Function object that is being built. It
318gets this by asking the builder for the current BasicBlock, and asking
319that block for its "parent" (the function it is currently embedded
320into).
321
322Once it has that, it creates three blocks. Note that it passes
323"TheFunction" into the constructor for the "then" block. This causes the
324constructor to automatically insert the new block into the end of the
325specified function. The other two blocks are created, but aren't yet
326inserted into the function.
327
328Once the blocks are created, we can emit the conditional branch that
329chooses between them. Note that creating new blocks does not implicitly
330affect the IRBuilder, so it is still inserting into the block that the
331condition went into. Also note that it is creating a branch to the
332"then" block and the "else" block, even though the "else" block isn't
333inserted into the function yet. This is all ok: it is the standard way
334that LLVM supports forward references.
335
336.. code-block:: c++
337
338 // Emit then value.
339 Builder.SetInsertPoint(ThenBB);
340
Lang Hames2d789c32015-08-26 03:07:41 +0000341 Value *ThenV = Then->codegen();
Lang Hames59b0da82015-08-19 18:15:58 +0000342 if (!ThenV)
343 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000344
345 Builder.CreateBr(MergeBB);
346 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
347 ThenBB = Builder.GetInsertBlock();
348
349After the conditional branch is inserted, we move the builder to start
350inserting into the "then" block. Strictly speaking, this call moves the
351insertion point to be at the end of the specified block. However, since
352the "then" block is empty, it also starts out by inserting at the
353beginning of the block. :)
354
355Once the insertion point is set, we recursively codegen the "then"
356expression from the AST. To finish off the "then" block, we create an
357unconditional branch to the merge block. One interesting (and very
358important) aspect of the LLVM IR is that it `requires all basic blocks
359to be "terminated" <../LangRef.html#functionstructure>`_ with a `control
360flow instruction <../LangRef.html#terminators>`_ such as return or
361branch. This means that all control flow, *including fall throughs* must
362be made explicit in the LLVM IR. If you violate this rule, the verifier
363will emit an error.
364
365The final line here is quite subtle, but is very important. The basic
366issue is that when we create the Phi node in the merge block, we need to
367set up the block/value pairs that indicate how the Phi will work.
368Importantly, the Phi node expects to have an entry for each predecessor
369of the block in the CFG. Why then, are we getting the current block when
370we just set it to ThenBB 5 lines above? The problem is that the "Then"
371expression may actually itself change the block that the Builder is
372emitting into if, for example, it contains a nested "if/then/else"
Lang Hames2d789c32015-08-26 03:07:41 +0000373expression. Because calling ``codegen()`` recursively could arbitrarily change
Sean Silvad7fb3962012-12-05 00:26:32 +0000374the notion of the current block, we are required to get an up-to-date
375value for code that will set up the Phi node.
376
377.. code-block:: c++
378
379 // Emit else block.
380 TheFunction->getBasicBlockList().push_back(ElseBB);
381 Builder.SetInsertPoint(ElseBB);
382
Lang Hames2d789c32015-08-26 03:07:41 +0000383 Value *ElseV = Else->codegen();
Lang Hames59b0da82015-08-19 18:15:58 +0000384 if (!ElseV)
385 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000386
387 Builder.CreateBr(MergeBB);
Lang Hames2d789c32015-08-26 03:07:41 +0000388 // codegen of 'Else' can change the current block, update ElseBB for the PHI.
Sean Silvad7fb3962012-12-05 00:26:32 +0000389 ElseBB = Builder.GetInsertBlock();
390
391Code generation for the 'else' block is basically identical to codegen
392for the 'then' block. The only significant difference is the first line,
393which adds the 'else' block to the function. Recall previously that the
394'else' block was created, but not added to the function. Now that the
395'then' and 'else' blocks are emitted, we can finish up with the merge
396code:
397
398.. code-block:: c++
399
400 // Emit merge block.
401 TheFunction->getBasicBlockList().push_back(MergeBB);
402 Builder.SetInsertPoint(MergeBB);
Lang Hames59b0da82015-08-19 18:15:58 +0000403 PHINode *PN =
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000404 Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp");
Sean Silvad7fb3962012-12-05 00:26:32 +0000405
406 PN->addIncoming(ThenV, ThenBB);
407 PN->addIncoming(ElseV, ElseBB);
408 return PN;
409 }
410
411The first two lines here are now familiar: the first adds the "merge"
412block to the Function object (it was previously floating, like the else
Sean Silvafb8908c2015-03-31 22:48:45 +0000413block above). The second changes the insertion point so that newly
Sean Silvad7fb3962012-12-05 00:26:32 +0000414created code will go into the "merge" block. Once that is done, we need
415to create the PHI node and set up the block/value pairs for the PHI.
416
417Finally, the CodeGen function returns the phi node as the value computed
418by the if/then/else expression. In our example above, this returned
419value will feed into the code for the top-level function, which will
420create the return instruction.
421
422Overall, we now have the ability to execute conditional code in
423Kaleidoscope. With this extension, Kaleidoscope is a fairly complete
424language that can calculate a wide variety of numeric functions. Next up
425we'll add another useful expression that is familiar from non-functional
426languages...
427
428'for' Loop Expression
429=====================
430
431Now that we know how to add basic control flow constructs to the
432language, we have the tools to add more powerful things. Lets add
433something more aggressive, a 'for' expression:
434
435::
436
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000437 extern putchard(char);
Sean Silvad7fb3962012-12-05 00:26:32 +0000438 def printstar(n)
439 for i = 1, i < n, 1.0 in
440 putchard(42); # ascii 42 = '*'
441
442 # print 100 '*' characters
443 printstar(100);
444
445This expression defines a new variable ("i" in this case) which iterates
446from a starting value, while the condition ("i < n" in this case) is
447true, incrementing by an optional step value ("1.0" in this case). If
448the step value is omitted, it defaults to 1.0. While the loop is true,
449it executes its body expression. Because we don't have anything better
450to return, we'll just define the loop as always returning 0.0. In the
451future when we have mutable variables, it will get more useful.
452
453As before, lets talk about the changes that we need to Kaleidoscope to
454support this.
455
456Lexer Extensions for the 'for' Loop
457-----------------------------------
458
459The lexer extensions are the same sort of thing as for if/then/else:
460
461.. code-block:: c++
462
463 ... in enum Token ...
464 // control
465 tok_if = -6, tok_then = -7, tok_else = -8,
466 tok_for = -9, tok_in = -10
467
468 ... in gettok ...
Lang Hames59b0da82015-08-19 18:15:58 +0000469 if (IdentifierStr == "def")
470 return tok_def;
471 if (IdentifierStr == "extern")
472 return tok_extern;
473 if (IdentifierStr == "if")
474 return tok_if;
475 if (IdentifierStr == "then")
476 return tok_then;
477 if (IdentifierStr == "else")
478 return tok_else;
479 if (IdentifierStr == "for")
480 return tok_for;
481 if (IdentifierStr == "in")
482 return tok_in;
Sean Silvad7fb3962012-12-05 00:26:32 +0000483 return tok_identifier;
484
485AST Extensions for the 'for' Loop
486---------------------------------
487
488The AST node is just as simple. It basically boils down to capturing the
489variable name and the constituent expressions in the node.
490
491.. code-block:: c++
492
493 /// ForExprAST - Expression class for for/in.
494 class ForExprAST : public ExprAST {
495 std::string VarName;
Lang Hames09bf4c12015-08-18 18:11:06 +0000496 std::unique_ptr<ExprAST> Start, End, Step, Body;
Lang Hames59b0da82015-08-19 18:15:58 +0000497
Sean Silvad7fb3962012-12-05 00:26:32 +0000498 public:
Lang Hames09bf4c12015-08-18 18:11:06 +0000499 ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,
500 std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,
501 std::unique_ptr<ExprAST> Body)
502 : VarName(VarName), Start(std::move(Start)), End(std::move(End)),
503 Step(std::move(Step)), Body(std::move(Body)) {}
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000504
505 Value *codegen() override;
Sean Silvad7fb3962012-12-05 00:26:32 +0000506 };
507
508Parser Extensions for the 'for' Loop
509------------------------------------
510
511The parser code is also fairly standard. The only interesting thing here
512is handling of the optional step value. The parser code handles it by
513checking to see if the second comma is present. If not, it sets the step
514value to null in the AST node:
515
516.. code-block:: c++
517
518 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Lang Hames09bf4c12015-08-18 18:11:06 +0000519 static std::unique_ptr<ExprAST> ParseForExpr() {
Sean Silvad7fb3962012-12-05 00:26:32 +0000520 getNextToken(); // eat the for.
521
522 if (CurTok != tok_identifier)
Lang Hames5d045a92016-03-25 17:41:26 +0000523 return LogError("expected identifier after for");
Sean Silvad7fb3962012-12-05 00:26:32 +0000524
525 std::string IdName = IdentifierStr;
526 getNextToken(); // eat identifier.
527
528 if (CurTok != '=')
Lang Hames5d045a92016-03-25 17:41:26 +0000529 return LogError("expected '=' after for");
Sean Silvad7fb3962012-12-05 00:26:32 +0000530 getNextToken(); // eat '='.
531
532
Lang Hames09bf4c12015-08-18 18:11:06 +0000533 auto Start = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000534 if (!Start)
535 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000536 if (CurTok != ',')
Lang Hames5d045a92016-03-25 17:41:26 +0000537 return LogError("expected ',' after for start value");
Sean Silvad7fb3962012-12-05 00:26:32 +0000538 getNextToken();
539
Lang Hames09bf4c12015-08-18 18:11:06 +0000540 auto End = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000541 if (!End)
542 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000543
544 // The step value is optional.
Lang Hames09bf4c12015-08-18 18:11:06 +0000545 std::unique_ptr<ExprAST> Step;
Sean Silvad7fb3962012-12-05 00:26:32 +0000546 if (CurTok == ',') {
547 getNextToken();
548 Step = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000549 if (!Step)
550 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000551 }
552
553 if (CurTok != tok_in)
Lang Hames5d045a92016-03-25 17:41:26 +0000554 return LogError("expected 'in' after for");
Sean Silvad7fb3962012-12-05 00:26:32 +0000555 getNextToken(); // eat 'in'.
556
Lang Hames09bf4c12015-08-18 18:11:06 +0000557 auto Body = ParseExpression();
Lang Hames59b0da82015-08-19 18:15:58 +0000558 if (!Body)
559 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000560
Lang Hames09bf4c12015-08-18 18:11:06 +0000561 return llvm::make_unique<ForExprAST>(IdName, std::move(Start),
562 std::move(End), std::move(Step),
563 std::move(Body));
Sean Silvad7fb3962012-12-05 00:26:32 +0000564 }
565
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000566And again we hook it up as a primary expression:
567
568.. code-block:: c++
569
570 static std::unique_ptr<ExprAST> ParsePrimary() {
571 switch (CurTok) {
572 default:
573 return LogError("unknown token when expecting an expression");
574 case tok_identifier:
575 return ParseIdentifierExpr();
576 case tok_number:
577 return ParseNumberExpr();
578 case '(':
579 return ParseParenExpr();
580 case tok_if:
581 return ParseIfExpr();
582 case tok_for:
583 return ParseForExpr();
584 }
585 }
586
Sean Silvad7fb3962012-12-05 00:26:32 +0000587LLVM IR for the 'for' Loop
588--------------------------
589
590Now we get to the good part: the LLVM IR we want to generate for this
591thing. With the simple example above, we get this LLVM IR (note that
592this dump is generated with optimizations disabled for clarity):
593
594.. code-block:: llvm
595
596 declare double @putchard(double)
597
598 define double @printstar(double %n) {
599 entry:
600 ; initial value = 1.0 (inlined into phi)
601 br label %loop
602
603 loop: ; preds = %loop, %entry
604 %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
605 ; body
606 %calltmp = call double @putchard(double 4.200000e+01)
607 ; increment
608 %nextvar = fadd double %i, 1.000000e+00
609
610 ; termination test
611 %cmptmp = fcmp ult double %i, %n
612 %booltmp = uitofp i1 %cmptmp to double
613 %loopcond = fcmp one double %booltmp, 0.000000e+00
614 br i1 %loopcond, label %loop, label %afterloop
615
616 afterloop: ; preds = %loop
617 ; loop always returns 0.0
618 ret double 0.000000e+00
619 }
620
621This loop contains all the same constructs we saw before: a phi node,
622several expressions, and some basic blocks. Lets see how this fits
623together.
624
625Code Generation for the 'for' Loop
626----------------------------------
627
Lang Hames2d789c32015-08-26 03:07:41 +0000628The first part of codegen is very simple: we just output the start
Sean Silvad7fb3962012-12-05 00:26:32 +0000629expression for the loop value:
630
631.. code-block:: c++
632
Lang Hames2d789c32015-08-26 03:07:41 +0000633 Value *ForExprAST::codegen() {
Sean Silvad7fb3962012-12-05 00:26:32 +0000634 // Emit the start code first, without 'variable' in scope.
Lang Hames2d789c32015-08-26 03:07:41 +0000635 Value *StartVal = Start->codegen();
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000636 if (!StartVal)
637 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000638
639With this out of the way, the next step is to set up the LLVM basic
640block for the start of the loop body. In the case above, the whole loop
641body is one block, but remember that the body code itself could consist
642of multiple blocks (e.g. if it contains an if/then/else or a for/in
643expression).
644
645.. code-block:: c++
646
647 // Make the new basic block for the loop header, inserting after current
648 // block.
649 Function *TheFunction = Builder.GetInsertBlock()->getParent();
650 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
Lang Hames59b0da82015-08-19 18:15:58 +0000651 BasicBlock *LoopBB =
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000652 BasicBlock::Create(TheContext, "loop", TheFunction);
Sean Silvad7fb3962012-12-05 00:26:32 +0000653
654 // Insert an explicit fall through from the current block to the LoopBB.
655 Builder.CreateBr(LoopBB);
656
657This code is similar to what we saw for if/then/else. Because we will
658need it to create the Phi node, we remember the block that falls through
659into the loop. Once we have that, we create the actual block that starts
660the loop and create an unconditional branch for the fall-through between
661the two blocks.
662
663.. code-block:: c++
664
665 // Start insertion in LoopBB.
666 Builder.SetInsertPoint(LoopBB);
667
668 // Start the PHI node with an entry for Start.
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000669 PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(TheContext),
Lang Hames59b0da82015-08-19 18:15:58 +0000670 2, VarName.c_str());
Sean Silvad7fb3962012-12-05 00:26:32 +0000671 Variable->addIncoming(StartVal, PreheaderBB);
672
673Now that the "preheader" for the loop is set up, we switch to emitting
674code for the loop body. To begin with, we move the insertion point and
675create the PHI node for the loop induction variable. Since we already
676know the incoming value for the starting value, we add it to the Phi
677node. Note that the Phi will eventually get a second value for the
678backedge, but we can't set it up yet (because it doesn't exist!).
679
680.. code-block:: c++
681
682 // Within the loop, the variable is defined equal to the PHI node. If it
683 // shadows an existing variable, we have to restore it, so save it now.
684 Value *OldVal = NamedValues[VarName];
685 NamedValues[VarName] = Variable;
686
687 // Emit the body of the loop. This, like any other expr, can change the
688 // current BB. Note that we ignore the value computed by the body, but don't
689 // allow an error.
Lang Hames2d789c32015-08-26 03:07:41 +0000690 if (!Body->codegen())
Lang Hames59b0da82015-08-19 18:15:58 +0000691 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000692
693Now the code starts to get more interesting. Our 'for' loop introduces a
694new variable to the symbol table. This means that our symbol table can
695now contain either function arguments or loop variables. To handle this,
696before we codegen the body of the loop, we add the loop variable as the
697current value for its name. Note that it is possible that there is a
698variable of the same name in the outer scope. It would be easy to make
699this an error (emit an error and return null if there is already an
700entry for VarName) but we choose to allow shadowing of variables. In
701order to handle this correctly, we remember the Value that we are
702potentially shadowing in ``OldVal`` (which will be null if there is no
703shadowed variable).
704
705Once the loop variable is set into the symbol table, the code
706recursively codegen's the body. This allows the body to use the loop
707variable: any references to it will naturally find it in the symbol
708table.
709
710.. code-block:: c++
711
712 // Emit the step value.
Lang Hames59b0da82015-08-19 18:15:58 +0000713 Value *StepVal = nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000714 if (Step) {
Lang Hames2d789c32015-08-26 03:07:41 +0000715 StepVal = Step->codegen();
Lang Hames59b0da82015-08-19 18:15:58 +0000716 if (!StepVal)
717 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000718 } else {
719 // If not specified, use 1.0.
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000720 StepVal = ConstantFP::get(TheContext, APFloat(1.0));
Sean Silvad7fb3962012-12-05 00:26:32 +0000721 }
722
723 Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
724
725Now that the body is emitted, we compute the next value of the iteration
726variable by adding the step value, or 1.0 if it isn't present.
727'``NextVar``' will be the value of the loop variable on the next
728iteration of the loop.
729
730.. code-block:: c++
731
732 // Compute the end condition.
Lang Hames2d789c32015-08-26 03:07:41 +0000733 Value *EndCond = End->codegen();
Lang Hames59b0da82015-08-19 18:15:58 +0000734 if (!EndCond)
735 return nullptr;
Sean Silvad7fb3962012-12-05 00:26:32 +0000736
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000737 // Convert condition to a bool by comparing non-equal to 0.0.
Lang Hames59b0da82015-08-19 18:15:58 +0000738 EndCond = Builder.CreateFCmpONE(
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000739 EndCond, ConstantFP::get(TheContext, APFloat(0.0)), "loopcond");
Sean Silvad7fb3962012-12-05 00:26:32 +0000740
741Finally, we evaluate the exit value of the loop, to determine whether
742the loop should exit. This mirrors the condition evaluation for the
743if/then/else statement.
744
745.. code-block:: c++
746
747 // Create the "after loop" block and insert it.
748 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
Lang Hames59b0da82015-08-19 18:15:58 +0000749 BasicBlock *AfterBB =
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000750 BasicBlock::Create(TheContext, "afterloop", TheFunction);
Sean Silvad7fb3962012-12-05 00:26:32 +0000751
752 // Insert the conditional branch into the end of LoopEndBB.
753 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
754
755 // Any new code will be inserted in AfterBB.
756 Builder.SetInsertPoint(AfterBB);
757
758With the code for the body of the loop complete, we just need to finish
759up the control flow for it. This code remembers the end block (for the
760phi node), then creates the block for the loop exit ("afterloop"). Based
761on the value of the exit condition, it creates a conditional branch that
762chooses between executing the loop again and exiting the loop. Any
763future code is emitted in the "afterloop" block, so it sets the
764insertion position to it.
765
766.. code-block:: c++
767
768 // Add a new entry to the PHI node for the backedge.
769 Variable->addIncoming(NextVar, LoopEndBB);
770
771 // Restore the unshadowed variable.
772 if (OldVal)
773 NamedValues[VarName] = OldVal;
774 else
775 NamedValues.erase(VarName);
776
777 // for expr always returns 0.0.
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000778 return Constant::getNullValue(Type::getDoubleTy(TheContext));
Sean Silvad7fb3962012-12-05 00:26:32 +0000779 }
780
781The final code handles various cleanups: now that we have the "NextVar"
782value, we can add the incoming value to the loop PHI node. After that,
783we remove the loop variable from the symbol table, so that it isn't in
784scope after the for loop. Finally, code generation of the for loop
785always returns 0.0, so that is what we return from
Lang Hames2d789c32015-08-26 03:07:41 +0000786``ForExprAST::codegen()``.
Sean Silvad7fb3962012-12-05 00:26:32 +0000787
788With this, we conclude the "adding control flow to Kaleidoscope" chapter
789of the tutorial. In this chapter we added two control flow constructs,
790and used them to motivate a couple of aspects of the LLVM IR that are
791important for front-end implementors to know. In the next chapter of our
792saga, we will get a bit crazier and add `user-defined
Kirill Bobyreve4364832017-07-10 09:07:23 +0000793operators <LangImpl06.html>`_ to our poor innocent language.
Sean Silvad7fb3962012-12-05 00:26:32 +0000794
795Full Code Listing
796=================
797
798Here is the complete code listing for our running example, enhanced with
Mehdi Aminibb6805d2017-02-11 21:26:52 +0000799the if/then/else and for expressions. To build this example, use:
Sean Silvad7fb3962012-12-05 00:26:32 +0000800
801.. code-block:: bash
802
803 # Compile
Eric Christophera8c6a0a2015-01-08 19:07:01 +0000804 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
Sean Silvad7fb3962012-12-05 00:26:32 +0000805 # Run
806 ./toy
807
808Here is the code:
809
Logan Chien855b17d2013-06-08 09:03:03 +0000810.. literalinclude:: ../../examples/Kaleidoscope/Chapter5/toy.cpp
811 :language: c++
Sean Silvad7fb3962012-12-05 00:26:32 +0000812
Wilfred Hughes945f43e2016-07-02 17:01:59 +0000813`Next: Extending the language: user-defined operators <LangImpl06.html>`_
Sean Silvad7fb3962012-12-05 00:26:32 +0000814