blob: e95aadd61b2b97b7ee7972138036b2e1e3265848 [file] [log] [blame]
Chris Lattner602c832c2007-10-31 06:30:21 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Control Flow</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
15
16<div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
18</div>
19
20<!-- *********************************************************************** -->
21<div class="doc_section"><a name="intro">Part 5 Introduction</a></div>
22<!-- *********************************************************************** -->
23
24<div class="doc_text">
25
26<p>Welcome to Part 5 of the "<a href="index.html">Implementing a language with
27LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
28Kaleidoscope language and included support for generating LLVM IR, following by
29optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
30mostly useless: it has no control flow other than call and return. This means
31that you can't have conditional branches in the code, significantly limiting its
32power. In this episode of "build that compiler", we'll extend Kaleidoscope to
33have an if/then/else expression plus a simple looping construct.</p>
34
35</div>
36
37<!-- *********************************************************************** -->
38<div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
39<!-- *********************************************************************** -->
40
41<div class="doc_text">
42
43<p>
44Extending Kaleidoscope to support if/then/else is quite straight-forward. It
45basically requires adding lexer support for this "new" concept to the lexer,
46parser, AST, and LLVM code emitter. This example is nice, because it shows how
47easy it is to "grow" a language over time, incrementally extending it as new
48ideas are discovered.</p>
49
50<p>Before we get going on "how" we do this extension, lets talk about what we
51want. The basic idea is that we want to be able to write this sort of thing:
52</p>
53
54<div class="doc_code">
55<pre>
56def fib(x)
57 if x &lt; 3 then
58 1
59 else
60 fib(x-1)+fib(x-2);
61</pre>
62</div>
63
64<p>In Kaleidoscope, every construct is an expression: there are no statements.
65As such, the if/then/else expression needs to return a value like any other.
66Since we're using a mostly functional form, we'll have it evaluate its
67conditional, then return the 'then' or 'else' value based on how the condition
68was resolved. This is very similar to the C "?:" expression.</p>
69
70<p>The semantics of the if/then/else expression is that it first evaluates the
71condition to a boolean equality value: 0.0 is false and everything else is true.
72If the condition is true, the first subexpression is evaluated and returned, if
73the condition is false, the second subexpression is evaluated and returned.
74Since Kaleidoscope allows side-effects, this behavior is important to nail down.
75</p>
76
77<p>Now that we know what we want, lets break this down into its constituent
78pieces.</p>
79
80</div>
81
82<!-- ======================================================================= -->
83<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
84If/Then/Else</a></div>
85<!-- ======================================================================= -->
86
87
88<div class="doc_text">
89
90<p>The lexer extensions are straight-forward. First we add new enum values
91for the relevant tokens:</p>
92
93<div class="doc_code">
94<pre>
95 // control
96 tok_if = -6, tok_then = -7, tok_else = -8,
97</pre>
98</div>
99
100<p>Once we have that, we recognize the new keywords in the lexer, pretty simple
101stuff:</p>
102
103<div class="doc_code">
104<pre>
105 ...
106 if (IdentifierStr == "def") return tok_def;
107 if (IdentifierStr == "extern") return tok_extern;
108 <b>if (IdentifierStr == "if") return tok_if;
109 if (IdentifierStr == "then") return tok_then;
110 if (IdentifierStr == "else") return tok_else;</b>
111 return tok_identifier;
112</pre>
113</div>
114
115</div>
116
117<!-- ======================================================================= -->
118<div class="doc_subsubsection"><a name="ifast">AST Extensions for
119 If/Then/Else </a></div>
120<!-- ======================================================================= -->
121
122<div class="doc_text">
123
124<p>To represent the new expression we add a new AST node for it:</p>
125
126<div class="doc_code">
127<pre>
128/// IfExprAST - Expression class for if/then/else.
129class IfExprAST : public ExprAST {
130 ExprAST *Cond, *Then, *Else;
131public:
132 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
133 : Cond(cond), Then(then), Else(_else) {}
134 virtual Value *Codegen();
135};
136</pre>
137</div>
138
139<p>The AST node just has pointers to the various subexpressions.</p>
140
141</div>
142
143<!-- ======================================================================= -->
144<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
145If/Then/Else </a></div>
146<!-- ======================================================================= -->
147
148<div class="doc_text">
149
150<p>Now that we have the relevant tokens coming from the lexer and we have the
151AST node to build, our parsing logic is relatively straight-forward. First we
152define a new parsing function:</p>
153
154<div class="doc_code">
155<pre>
156/// ifexpr ::= 'if' expression 'then' expression 'else' expression
157static ExprAST *ParseIfExpr() {
158 getNextToken(); // eat the if.
159
160 // condition.
161 ExprAST *Cond = ParseExpression();
162 if (!Cond) return 0;
163
164 if (CurTok != tok_then)
165 return Error("expected then");
166 getNextToken(); // eat the then
167
168 ExprAST *Then = ParseExpression();
169 if (Then == 0) return 0;
170
171 if (CurTok != tok_else)
172 return Error("expected else");
173
174 getNextToken();
175
176 ExprAST *Else = ParseExpression();
177 if (!Else) return 0;
178
179 return new IfExprAST(Cond, Then, Else);
180}
181</pre>
182</div>
183
184<p>Next we hook it up as a primary expression:</p>
185
186<div class="doc_code">
187<pre>
188static ExprAST *ParsePrimary() {
189 switch (CurTok) {
190 default: return Error("unknown token when expecting an expression");
191 case tok_identifier: return ParseIdentifierExpr();
192 case tok_number: return ParseNumberExpr();
193 case '(': return ParseParenExpr();
194 <b>case tok_if: return ParseIfExpr();</b>
195 }
196}
197</pre>
198</div>
199
200</div>
201
202<!-- ======================================================================= -->
203<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
204<!-- ======================================================================= -->
205
206<div class="doc_text">
207
208<p>Now that we have it parsing and building the AST, the final piece is adding
209LLVM code generation support. This is the most interesting part of the
210if/then/else example, because this is where it starts to introduce new concepts.
211All of the code above has been described in previous chapters fairly thoroughly.
212</p>
213
214<p>To motivate the code we want to produce, lets take a look at a simple
215example. Consider:</p>
216
217<div class="doc_code">
218<pre>
219extern foo();
220extern bar();
221def baz(x) if x then foo() else bar();
222</pre>
223</div>
224
225<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
226looks like this:</p>
227
228<div class="doc_code">
229<pre>
230declare double @foo()
231
232declare double @bar()
233
234define double @baz(double %x) {
235entry:
236 %ifcond = fcmp one double %x, 0.000000e+00
237 br i1 %ifcond, label %then, label %else
238
239then: ; preds = %entry
240 %calltmp = call double @foo()
241 br label %ifcont
242
243else: ; preds = %entry
244 %calltmp1 = call double @bar()
245 br label %ifcont
246
247ifcont: ; preds = %else, %then
248 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
249 ret double %iftmp
250}
251</pre>
252</div>
253
254<p>To visualize the control flow graph, you can use a nifty feature of the LLVM
255'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR
256into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
257href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
258see this graph:</p>
259
260<center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
261height="315"></center>
262
263<p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
264"<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
265inserting actual calls into the code and recompiling or by calling these in the
266debugger. LLVM has many nice features for visualizing various graphs.</p>
267
268<p>Coming back to the generated code, it is fairly simple: the entry block
269evaluates the conditional expression ("x" in our case here) and compares the
270result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
271instruction ('one' is "ordered and not equal"). Based on the result of this
272expression, the code jumps to either the "then" or "else" blocks, which contain
273the expressions for the true/false case.</p>
274
275<p>Once the then/else blocks is finished executing, they both branch back to the
276else block to execute the code that happens after the if/then/else. In this
277case the only thing left to do is to return to the caller of the function. The
278question then becomes: how does the code know which expression to return?</p>
279
280<p>The answer to this question involves an important SSA operation: the
281<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
282operation</a>. If you're not familiar with SSA, <a
283href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
284article</a> is a good introduction and there are various other introductions to
285it available on your favorite search engine. The short version is that
286"execution" of the Phi operation requires "remembering" which block control came
287from. The Phi operation takes on the value corresponding to the input control
288block. In this case, if control comes in from the "then" block, it gets the
289value of "calltmp". If control comes from the "else" block, it gets the value
290of "calltmp1".</p>
291
292<p>At this point, you are probably starting to think "on no! this means my
293simple and elegant front-end will have to start generating SSA form in order to
294use LLVM!". Fortunately, this is not the case, and we strongly advise
295<em>not</em> implementing an SSA construction algorithm in your front-end
296unless there is an amazingly good reason to do so. In practice, there are two
297sorts of values that float around in code written in your average imperative
298programming language that might need Phi nodes:</p>
299
300<ol>
301<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
302<li>Values that are implicit in the structure of your AST, such as the phi node
303in this case.</li>
304</ol>
305
306<p>At a future point in this tutorial ("mutable variables"), we'll talk about #1
307in depth. For now, just believe me that you don't need SSA construction to
308handle them. For #2, you have the choice of using the techniques that we will
309describe for #1, or you can insert Phi nodes directly if convenient. In this
310case, it is really really easy to generate the Phi node, so we choose to do it
311directly.</p>
312
313<p>Okay, enough of the motivation and overview, lets generate code!</p>
314
315</div>
316
317<!-- ======================================================================= -->
318<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
319If/Then/Else</a></div>
320<!-- ======================================================================= -->
321
322<div class="doc_text">
323
324<p>In order to generate code for this, we implement the <tt>Codegen</tt> method
325for <tt>IfExprAST</tt>:</p>
326
327<div class="doc_code">
328<pre>
329Value *IfExprAST::Codegen() {
330 Value *CondV = Cond-&gt;Codegen();
331 if (CondV == 0) return 0;
332
333 // Convert condition to a bool by comparing equal to 0.0.
334 CondV = Builder.CreateFCmpONE(CondV,
335 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
336 "ifcond");
337</pre>
338</div>
339
340<p>This code is straight-forward and similar to what we saw before. We emit the
341expression for the condition, then compare that value to zero to get a truth
342value as a 1-bit (bool) value.</p>
343
344<div class="doc_code">
345<pre>
346 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
347
348 // Create blocks for the then and else cases. Insert the 'then' block at the
349 // end of the function.
350 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
351 BasicBlock *ElseBB = new BasicBlock("else");
352 BasicBlock *MergeBB = new BasicBlock("ifcont");
353
354 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
355</pre>
356</div>
357
358<p>This code creates the basic blocks that are related to the if/then/else
359statement, and correspond directly to the blocks in the example above. The
360first line of this gets the current Function object that is being built. It
361gets this by asking the builder for the current BasicBlock, and asking that
362block for its "parent" (the function it is currently embedded into).</p>
363
364<p>Once it has that, it creates three blocks. Note that it passes "TheFunction"
365into the constructor for the "then" block. This causes the constructor to
366automatically insert the new block onto the end of the specified function. The
367other two blocks are created, but aren't yet inserted into the function.</p>
368
369<p>Once the blocks are created, we can emit the conditional branch that chooses
370between them. Note that creating new blocks does not implicitly affect the
371LLVMBuilder, so it is still inserting into the block that the condition
372went into. Also note that it is creating a branch to the "then" block and the
373"else" block, even though the "else" block isn't inserted into the function yet.
374This is all ok: it is the standard way that LLVM supports forward
375references.</p>
376
377<div class="doc_code">
378<pre>
379 // Emit then value.
380 Builder.SetInsertPoint(ThenBB);
381
382 Value *ThenV = Then-&gt;Codegen();
383 if (ThenV == 0) return 0;
384
385 Builder.CreateBr(MergeBB);
386 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
387 ThenBB = Builder.GetInsertBlock();
388</pre>
389</div>
390
391<p>After the conditional branch is inserted, we move the builder to start
392inserting into the "then" block. Strictly speaking, this call moves the
393insertion point to be at the end of the specified block. However, since the
394"then" block is empty, it also starts out by inserting at the beginning of the
395block. :)</p>
396
397<p>Once the insertion point is set, we recursively codegen the "then" expression
398from the AST. To finish off the then block, we create an unconditional branch
399to the merge block. One interesting (and very important) aspect of the LLVM IR
400is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
401to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
402instruction</a> such as return or branch. This means that all control flow,
403<em>including fall throughs</em> must be made explicit in the LLVM IR. If you
404violate this rule, the verifier will emit an error.</p>
405
406<p>The final line here is quite subtle, but is very important. The basic issue
407is that when we create the Phi node in the merge block, we need to set up the
408block/value pairs that indicate how the Phi will work. Importantly, the Phi
409node expects to have an extry for each predecessor of the block in the CFG. Why
410then are we getting the current block when we just set it to ThenBB 5 lines
411above? The problem is that the "Then" expression may actually itself change the
412block that the Builder is emitting into if, for example, it contains a nested
413"if/then/else" expression. Because calling Codegen recursively could
414arbitrarily change the notion of the current block, we are required to get an
415up-to-date value for code that will set up the Phi node.</p>
416
417<div class="doc_code">
418<pre>
419 // Emit else block.
420 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
421 Builder.SetInsertPoint(ElseBB);
422
423 Value *ElseV = Else-&gt;Codegen();
424 if (ElseV == 0) return 0;
425
426 Builder.CreateBr(MergeBB);
427 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
428 ElseBB = Builder.GetInsertBlock();
429</pre>
430</div>
431
432<p>Code generation for the 'else' block is basically identical to codegen for
433the 'then' block. The only significant difference is the first line, which adds
434the 'else' block to the function. Recall previously that the 'else' block was
435created, but not added to the function. Now that the 'then' and 'else' blocks
436are emitted, we can finish up with the merge code:</p>
437
438<div class="doc_code">
439<pre>
440 // Emit merge block.
441 TheFunction->getBasicBlockList().push_back(MergeBB);
442 Builder.SetInsertPoint(MergeBB);
443 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
444
445 PN->addIncoming(ThenV, ThenBB);
446 PN->addIncoming(ElseV, ElseBB);
447 return PN;
448}
449</pre>
450</div>
451
452<p>The first two lines here are now familiar: the first adds the "merge" block
453to the Function object (it was previously floating, like the else block above).
454The second block changes the insertion point so that newly created code will go
455into the "merge" block. Once that is done, we need to create the PHI node and
456set up the block/value pairs for the PHI.</p>
457
458<p>Finally, the CodeGen function returns the phi node as the value computed by
459the if/then/else expression. In our example above, this returned value will
460feed into the code for the top-level function, which will create the return
461instruction.</p>
462
463<p>Overall, we now have the ability to execution conditional code in
464Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
465that can calculate a wide variety of numeric functions. Next up we'll add
466another useful expression that is familiar from non-functional languages...</p>
467
468</div>
469
470<!-- *********************************************************************** -->
471<div class="doc_section"><a name="for">'for' Loop Expression</a></div>
472<!-- *********************************************************************** -->
473
474<div class="doc_text">
475
Chris Lattnerf5234802007-10-31 06:47:39 +0000476<p>Now that we know how to add basic control flow constructs to the language,
477we have the tools to add more powerful things. Lets add something more
478aggressive, a 'for' expression:</p>
479
480<div class="doc_code">
481<pre>
Chris Lattnerf5234802007-10-31 06:47:39 +0000482 extern putchard(char)
Chris Lattner6093bd52007-10-31 07:29:43 +0000483 def printstar(n)
484 for i = 1, i &lt; n, 1.0 in
485 putchard(42); # ascii 42 = '*'
486
487 # print 100 '*' characters
488 printstar(100);
Chris Lattnerf5234802007-10-31 06:47:39 +0000489</pre>
490</div>
491
Chris Lattner6093bd52007-10-31 07:29:43 +0000492<p>This expression defines a new variable ("i" in this case) which iterates from
493a starting value, while the condition ("i &lt; n" in this case) is true,
Chris Lattnerf5234802007-10-31 06:47:39 +0000494incrementing by an optional step value ("1.0" in this case). If the step value
495is omitted, it defaults to 1.0. While the loop is true, it executes its
496body expression. Because we don't have anything better to return, we'll just
497define the loop as always returning 0.0. In the future when we have mutable
498variables, it will get more useful.</p>
499
500<p>As before, lets talk about the changes that we need to Kaleidoscope to
501support this.</p>
502
503</div>
504
505<!-- ======================================================================= -->
506<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
507the 'for' Loop</a></div>
508<!-- ======================================================================= -->
509
510<div class="doc_text">
511
512<p>The lexer extensions are the same sort of thing as for if/then/else:</p>
513
514<div class="doc_code">
515<pre>
516 ... in enum Token ...
517 // control
518 tok_if = -6, tok_then = -7, tok_else = -8,
519<b> tok_for = -9, tok_in = -10</b>
520
521 ... in gettok ...
522 if (IdentifierStr == "def") return tok_def;
523 if (IdentifierStr == "extern") return tok_extern;
524 if (IdentifierStr == "if") return tok_if;
525 if (IdentifierStr == "then") return tok_then;
526 if (IdentifierStr == "else") return tok_else;
527 <b>if (IdentifierStr == "for") return tok_for;
528 if (IdentifierStr == "in") return tok_in;</b>
529 return tok_identifier;
530</pre>
531</div>
532
533</div>
534
535<!-- ======================================================================= -->
536<div class="doc_subsubsection"><a name="forast">AST Extensions for
537the 'for' Loop</a></div>
538<!-- ======================================================================= -->
539
540<div class="doc_text">
541
542<p>The AST node is similarly simple. It basically boils down to capturing
543the variable name and the consituent expressions in the node.</p>
544
545<div class="doc_code">
546<pre>
547/// ForExprAST - Expression class for for/in.
548class ForExprAST : public ExprAST {
549 std::string VarName;
550 ExprAST *Start, *End, *Step, *Body;
551public:
552 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
553 ExprAST *step, ExprAST *body)
554 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
555 virtual Value *Codegen();
556};
557</pre>
558</div>
559
560</div>
561
562<!-- ======================================================================= -->
563<div class="doc_subsubsection"><a name="forparser">Parser Extensions for
564the 'for' Loop</a></div>
565<!-- ======================================================================= -->
566
567<div class="doc_text">
568
569<p>The parser code is also fairly standard. The only interesting thing here is
570handling of the optional step value. The parser code handles it by checking to
571see if the second comma is present. If not, it sets the step value to null in
572the AST node:</p>
573
574<div class="doc_code">
575<pre>
576/// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
577static ExprAST *ParseForExpr() {
578 getNextToken(); // eat the for.
579
580 if (CurTok != tok_identifier)
581 return Error("expected identifier after for");
582
583 std::string IdName = IdentifierStr;
584 getNextToken(); // eat identifer.
585
586 if (CurTok != '=')
587 return Error("expected '=' after for");
588 getNextToken(); // eat '='.
589
590
591 ExprAST *Start = ParseExpression();
592 if (Start == 0) return 0;
593 if (CurTok != ',')
594 return Error("expected ',' after for start value");
595 getNextToken();
596
597 ExprAST *End = ParseExpression();
598 if (End == 0) return 0;
599
600 // The step value is optional.
601 ExprAST *Step = 0;
602 if (CurTok == ',') {
603 getNextToken();
604 Step = ParseExpression();
605 if (Step == 0) return 0;
606 }
607
608 if (CurTok != tok_in)
609 return Error("expected 'in' after for");
610 getNextToken(); // eat 'in'.
611
612 ExprAST *Body = ParseExpression();
613 if (Body == 0) return 0;
614
615 return new ForExprAST(IdName, Start, End, Step, Body);
616}
617</pre>
618</div>
619
620</div>
621
622<!-- ======================================================================= -->
623<div class="doc_subsubsection"><a name="forir">LLVM IR for
624the 'for' Loop</a></div>
625<!-- ======================================================================= -->
626
627<div class="doc_text">
628
629<p>Now we get to the good part: the LLVM IR we want to generate for this thing.
Chris Lattner6093bd52007-10-31 07:29:43 +0000630With the simple example above, we get this LLVM IR (note that this dump is
631generated with optimizations disabled):
Chris Lattnerf5234802007-10-31 06:47:39 +0000632</p>
633
Chris Lattner6093bd52007-10-31 07:29:43 +0000634<div class="doc_code">
635<pre>
636declare double @putchard(double)
Chris Lattnerf5234802007-10-31 06:47:39 +0000637
Chris Lattner6093bd52007-10-31 07:29:43 +0000638define double @printstar(double %n) {
639entry:
640 ; initial value = 1.0 (inlined into phi)
641 br label %loop
642
643loop: ; preds = %loop, %entry
644 %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
645 ; body
646 %calltmp = call double @putchard( double 4.200000e+01 )
647 ; increment
648 %nextvar = add double %i, 1.000000e+00
649
650 ; termination test
651 %multmp = fcmp ult double %i, %n
652 %booltmp = uitofp i1 %multmp to double
653 %loopcond = fcmp one double %booltmp, 0.000000e+00
654 br i1 %loopcond, label %loop, label %afterloop
655
656afterloop: ; preds = %loop
657 ; loop always returns 0.0
658 ret double 0.000000e+00
659}
660</pre>
661</div>
662
663<p>This loop contains all the same constructs we saw before: a phi node, several
664expressions, and some basic blocks. Lets see how this fits together.</p>
Chris Lattnerf5234802007-10-31 06:47:39 +0000665
666</div>
667
668<!-- ======================================================================= -->
669<div class="doc_subsubsection"><a name="forcodegen">Code Generation for
670the 'for' Loop</a></div>
671<!-- ======================================================================= -->
672
673<div class="doc_text">
674
Chris Lattner6093bd52007-10-31 07:29:43 +0000675<p>The first part of codegen is very simple: we just output the start expression
676for the loop value:</p>
Chris Lattnerf5234802007-10-31 06:47:39 +0000677
678<div class="doc_code">
679<pre>
680Value *ForExprAST::Codegen() {
Chris Lattner6093bd52007-10-31 07:29:43 +0000681 // Emit the start code first, without 'variable' in scope.
682 Value *StartVal = Start-&gt;Codegen();
683 if (StartVal == 0) return 0;
684</pre>
685</div>
686
687<p>With this out of the way, the next step is to set up the LLVM basic block
688for the start of the loop body. In the case above, the whole loop body is one
689block, but remember that the body code itself could consist of multiple blocks
690(e.g. if it is a if/then/else expression).</p>
691
692<div class="doc_code">
693<pre>
694 // Make the new basic block for the loop header, inserting after current
695 // block.
696 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
697 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
698 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
699
700 // Insert an explicit fall through from the current block to the LoopBB.
701 Builder.CreateBr(LoopBB);
702</pre>
703</div>
704
705<p>This code is similar to what we saw for if/then/else. Because we will need
706it to create the Phi node, we remember the block that falls through into the
707loop. Once we have that, we create the actual block that starts the loop and
708create an unconditional branch for the fall-through between the two blocks.</p>
709
710<div class="doc_code">
711<pre>
712 // Start insertion in LoopBB.
713 Builder.SetInsertPoint(LoopBB);
714
715 // Start the PHI node with an entry for Start.
716 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
717 Variable-&gt;addIncoming(StartVal, PreheaderBB);
718</pre>
719</div>
720
721<p>Now that the "preheader" for the loop is set up, we switch to emitting code
722for the loop body. To begin with, we move the insertion point and create the
723PHI node for the loop induction variable. SInce we already know the incoming
724value for the starting value, we add it to the Phi node. Note that the Phi will
725eventually get a second value for the backedge, but we can't set it up yet
726(because it doesn't exist!).</p>
727
728<div class="doc_code">
729<pre>
730 // Within the loop, the variable is defined equal to the PHI node. If it
731 // shadows an existing variable, we have to restore it, so save it now.
732 Value *OldVal = NamedValues[VarName];
733 NamedValues[VarName] = Variable;
734
735 // Emit the body of the loop. This, like any other expr, can change the
736 // current BB. Note that we ignore the value computed by the body, but don't
737 // allow an error.
738 if (Body-&gt;Codegen() == 0)
739 return 0;
740</pre>
741</div>
742
743<p>Now the code starts to get more interesting. Our 'for' loop introduces a new
744variable to the symbol table. This means that our symbol table can now contain
745either function arguments or loop variables. To handle this, before we codegen
746the body of the loop, we add the loop variable as the current value for its
747name. Note that it is possible that there is a variable of the same name in the
748outer scope. It would be easy to make this an error (emit an error and return
749null if there is already an entry for VarName) but we choose to allow shadowing
750of variables. In order to handle this correctly, we remember the Value that
751we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
752no shadowed variable).</p>
753
754<p>Once the loop variable is set into the symbol table, the code recursively
755codegen's the body. This allows the body to use the loop variable: any
756references to it will naturally find it in the symbol table.</p>
757
758<div class="doc_code">
759<pre>
760 // Emit the step value.
761 Value *StepVal;
762 if (Step) {
763 StepVal = Step-&gt;Codegen();
764 if (StepVal == 0) return 0;
765 } else {
766 // If not specified, use 1.0.
767 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
768 }
769
770 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
771</pre>
772</div>
773
774<p>Now that the body is emitted, we compute the next value of the iteration
775variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
776will be the value of the loop variable on the next iteration of the loop.</p>
777
778<div class="doc_code">
779<pre>
780 // Compute the end condition.
781 Value *EndCond = End-&gt;Codegen();
782 if (EndCond == 0) return EndCond;
783
784 // Convert condition to a bool by comparing equal to 0.0.
785 EndCond = Builder.CreateFCmpONE(EndCond,
786 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
787 "loopcond");
788</pre>
789</div>
790
791<p>Finally, we evaluate the exit value of the loop, to determine whether the
792loop should exit. This mirrors the condition evaluation for the if/then/else
793statement.</p>
794
795<div class="doc_code">
796<pre>
797 // Create the "after loop" block and insert it.
798 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
799 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
800
801 // Insert the conditional branch into the end of LoopEndBB.
802 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
803
804 // Any new code will be inserted in AfterBB.
805 Builder.SetInsertPoint(AfterBB);
806</pre>
807</div>
808
809<p>With the code for the body of the loop complete, we just need to finish up
810the control flow for it. This remembers the end block (for the phi node), then
811creates the block for the loop exit ("afterloop"). Based on the value of the
812exit condition, it creates a conditional branch that chooses between executing
813the loop again and exiting the loop. Any future code is emitted in the
814"afterloop" block, so it sets the insertion position to it.</p>
815
816<div class="doc_code">
817<pre>
818 // Add a new entry to the PHI node for the backedge.
819 Variable-&gt;addIncoming(NextVar, LoopEndBB);
820
821 // Restore the unshadowed variable.
822 if (OldVal)
823 NamedValues[VarName] = OldVal;
824 else
825 NamedValues.erase(VarName);
826
827 // for expr always returns 0.0.
828 return Constant::getNullValue(Type::DoubleTy);
829}
830</pre>
831</div>
832
833<p>The final code handles various cleanups: now that we have the "NextVar"
834value, we can add the incoming value to the loop PHI node. After that, we
835remove the loop variable from the symbol table, so that it isn't in scope after
836the for loop. Finally, code generation of the for loop always returns 0.0, so
837that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
838
839<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
840the tutorial. We added two control flow constructs, and used them to motivate
841a couple of aspects of the LLVM IR that are important for front-end implementors
842to know. In the next chapter of our saga, we will get a bit crazier and add
843operator overloading to our poor innocent language.</p>
844
845</div>
846
847<!-- *********************************************************************** -->
848<div class="doc_section"><a name="code">Full Code Listing</a></div>
849<!-- *********************************************************************** -->
850
851<div class="doc_text">
852
853<p>
854Here is the complete code listing for our running example, enhanced with the
855if/then/else and for expressions.. To build this example, use:
856</p>
857
858<div class="doc_code">
859<pre>
860 # Compile
861 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
862 # Run
863 ./toy
864</pre>
865</div>
866
867<p>Here is the code:</p>
868
869<div class="doc_code">
870<pre>
871#include "llvm/DerivedTypes.h"
872#include "llvm/ExecutionEngine/ExecutionEngine.h"
873#include "llvm/Module.h"
874#include "llvm/ModuleProvider.h"
875#include "llvm/PassManager.h"
876#include "llvm/Analysis/Verifier.h"
877#include "llvm/Target/TargetData.h"
878#include "llvm/Transforms/Scalar.h"
879#include "llvm/Support/LLVMBuilder.h"
880#include &lt;cstdio&gt;
881#include &lt;string&gt;
882#include &lt;map&gt;
883#include &lt;vector&gt;
884using namespace llvm;
885
886//===----------------------------------------------------------------------===//
887// Lexer
888//===----------------------------------------------------------------------===//
889
890// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
891// of these for known things.
892enum Token {
893 tok_eof = -1,
894
895 // commands
896 tok_def = -2, tok_extern = -3,
897
898 // primary
899 tok_identifier = -4, tok_number = -5,
900
901 // control
902 tok_if = -6, tok_then = -7, tok_else = -8,
903 tok_for = -9, tok_in = -10
904};
905
906static std::string IdentifierStr; // Filled in if tok_identifier
907static double NumVal; // Filled in if tok_number
908
909/// gettok - Return the next token from standard input.
910static int gettok() {
911 static int LastChar = ' ';
912
913 // Skip any whitespace.
914 while (isspace(LastChar))
915 LastChar = getchar();
916
917 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
918 IdentifierStr = LastChar;
919 while (isalnum((LastChar = getchar())))
920 IdentifierStr += LastChar;
921
922 if (IdentifierStr == "def") return tok_def;
923 if (IdentifierStr == "extern") return tok_extern;
924 if (IdentifierStr == "if") return tok_if;
925 if (IdentifierStr == "then") return tok_then;
926 if (IdentifierStr == "else") return tok_else;
927 if (IdentifierStr == "for") return tok_for;
928 if (IdentifierStr == "in") return tok_in;
929 return tok_identifier;
930 }
931
932 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
933 std::string NumStr;
934 do {
935 NumStr += LastChar;
936 LastChar = getchar();
937 } while (isdigit(LastChar) || LastChar == '.');
938
939 NumVal = strtod(NumStr.c_str(), 0);
940 return tok_number;
941 }
942
943 if (LastChar == '#') {
944 // Comment until end of line.
945 do LastChar = getchar();
946 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
947
948 if (LastChar != EOF)
949 return gettok();
950 }
951
952 // Check for end of file. Don't eat the EOF.
953 if (LastChar == EOF)
954 return tok_eof;
955
956 // Otherwise, just return the character as its ascii value.
957 int ThisChar = LastChar;
958 LastChar = getchar();
959 return ThisChar;
960}
961
962//===----------------------------------------------------------------------===//
963// Abstract Syntax Tree (aka Parse Tree)
964//===----------------------------------------------------------------------===//
965
966/// ExprAST - Base class for all expression nodes.
967class ExprAST {
968public:
969 virtual ~ExprAST() {}
970 virtual Value *Codegen() = 0;
971};
972
973/// NumberExprAST - Expression class for numeric literals like "1.0".
974class NumberExprAST : public ExprAST {
975 double Val;
976public:
977 NumberExprAST(double val) : Val(val) {}
978 virtual Value *Codegen();
979};
980
981/// VariableExprAST - Expression class for referencing a variable, like "a".
982class VariableExprAST : public ExprAST {
983 std::string Name;
984public:
985 VariableExprAST(const std::string &amp;name) : Name(name) {}
986 virtual Value *Codegen();
987};
988
989/// BinaryExprAST - Expression class for a binary operator.
990class BinaryExprAST : public ExprAST {
991 char Op;
992 ExprAST *LHS, *RHS;
993public:
994 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
995 : Op(op), LHS(lhs), RHS(rhs) {}
996 virtual Value *Codegen();
997};
998
999/// CallExprAST - Expression class for function calls.
1000class CallExprAST : public ExprAST {
1001 std::string Callee;
1002 std::vector&lt;ExprAST*&gt; Args;
1003public:
1004 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1005 : Callee(callee), Args(args) {}
1006 virtual Value *Codegen();
1007};
1008
1009/// IfExprAST - Expression class for if/then/else.
1010class IfExprAST : public ExprAST {
1011 ExprAST *Cond, *Then, *Else;
1012public:
1013 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1014 : Cond(cond), Then(then), Else(_else) {}
1015 virtual Value *Codegen();
1016};
1017
1018/// ForExprAST - Expression class for for/in.
1019class ForExprAST : public ExprAST {
1020 std::string VarName;
1021 ExprAST *Start, *End, *Step, *Body;
1022public:
1023 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1024 ExprAST *step, ExprAST *body)
1025 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1026 virtual Value *Codegen();
1027};
1028
1029/// PrototypeAST - This class represents the "prototype" for a function,
1030/// which captures its argument names as well as if it is an operator.
1031class PrototypeAST {
1032 std::string Name;
1033 std::vector&lt;std::string&gt; Args;
1034public:
1035 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1036 : Name(name), Args(args) {}
1037
1038 Function *Codegen();
1039};
1040
1041/// FunctionAST - This class represents a function definition itself.
1042class FunctionAST {
1043 PrototypeAST *Proto;
1044 ExprAST *Body;
1045public:
1046 FunctionAST(PrototypeAST *proto, ExprAST *body)
1047 : Proto(proto), Body(body) {}
1048
1049 Function *Codegen();
1050};
1051
1052//===----------------------------------------------------------------------===//
1053// Parser
1054//===----------------------------------------------------------------------===//
1055
1056/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1057/// token the parser it looking at. getNextToken reads another token from the
1058/// lexer and updates CurTok with its results.
1059static int CurTok;
1060static int getNextToken() {
1061 return CurTok = gettok();
1062}
1063
1064/// BinopPrecedence - This holds the precedence for each binary operator that is
1065/// defined.
1066static std::map&lt;char, int&gt; BinopPrecedence;
1067
1068/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1069static int GetTokPrecedence() {
1070 if (!isascii(CurTok))
1071 return -1;
1072
1073 // Make sure it's a declared binop.
1074 int TokPrec = BinopPrecedence[CurTok];
1075 if (TokPrec &lt;= 0) return -1;
1076 return TokPrec;
1077}
1078
1079/// Error* - These are little helper functions for error handling.
1080ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1081PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1082FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1083
1084static ExprAST *ParseExpression();
1085
1086/// identifierexpr
1087/// ::= identifer
1088/// ::= identifer '(' expression* ')'
1089static ExprAST *ParseIdentifierExpr() {
1090 std::string IdName = IdentifierStr;
1091
1092 getNextToken(); // eat identifer.
1093
1094 if (CurTok != '(') // Simple variable ref.
1095 return new VariableExprAST(IdName);
1096
1097 // Call.
1098 getNextToken(); // eat (
1099 std::vector&lt;ExprAST*&gt; Args;
1100 if (CurTok != ')') {
1101 while (1) {
1102 ExprAST *Arg = ParseExpression();
1103 if (!Arg) return 0;
1104 Args.push_back(Arg);
1105
1106 if (CurTok == ')') break;
1107
1108 if (CurTok != ',')
1109 return Error("Expected ')'");
1110 getNextToken();
1111 }
1112 }
1113
1114 // Eat the ')'.
1115 getNextToken();
1116
1117 return new CallExprAST(IdName, Args);
1118}
1119
1120/// numberexpr ::= number
1121static ExprAST *ParseNumberExpr() {
1122 ExprAST *Result = new NumberExprAST(NumVal);
1123 getNextToken(); // consume the number
1124 return Result;
1125}
1126
1127/// parenexpr ::= '(' expression ')'
1128static ExprAST *ParseParenExpr() {
1129 getNextToken(); // eat (.
1130 ExprAST *V = ParseExpression();
1131 if (!V) return 0;
1132
1133 if (CurTok != ')')
1134 return Error("expected ')'");
1135 getNextToken(); // eat ).
1136 return V;
1137}
1138
1139/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1140static ExprAST *ParseIfExpr() {
1141 getNextToken(); // eat the if.
1142
1143 // condition.
1144 ExprAST *Cond = ParseExpression();
1145 if (!Cond) return 0;
1146
1147 if (CurTok != tok_then)
1148 return Error("expected then");
1149 getNextToken(); // eat the then
1150
1151 ExprAST *Then = ParseExpression();
1152 if (Then == 0) return 0;
1153
1154 if (CurTok != tok_else)
1155 return Error("expected else");
1156
1157 getNextToken();
1158
1159 ExprAST *Else = ParseExpression();
1160 if (!Else) return 0;
1161
1162 return new IfExprAST(Cond, Then, Else);
1163}
1164
1165/// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
1166static ExprAST *ParseForExpr() {
1167 getNextToken(); // eat the for.
1168
1169 if (CurTok != tok_identifier)
1170 return Error("expected identifier after for");
1171
1172 std::string IdName = IdentifierStr;
1173 getNextToken(); // eat identifer.
1174
1175 if (CurTok != '=')
1176 return Error("expected '=' after for");
1177 getNextToken(); // eat '='.
1178
1179
1180 ExprAST *Start = ParseExpression();
1181 if (Start == 0) return 0;
1182 if (CurTok != ',')
1183 return Error("expected ',' after for start value");
1184 getNextToken();
1185
1186 ExprAST *End = ParseExpression();
1187 if (End == 0) return 0;
1188
1189 // The step value is optional.
1190 ExprAST *Step = 0;
1191 if (CurTok == ',') {
1192 getNextToken();
1193 Step = ParseExpression();
1194 if (Step == 0) return 0;
1195 }
1196
1197 if (CurTok != tok_in)
1198 return Error("expected 'in' after for");
1199 getNextToken(); // eat 'in'.
1200
1201 ExprAST *Body = ParseExpression();
1202 if (Body == 0) return 0;
1203
1204 return new ForExprAST(IdName, Start, End, Step, Body);
1205}
1206
1207
1208/// primary
1209/// ::= identifierexpr
1210/// ::= numberexpr
1211/// ::= parenexpr
1212/// ::= ifexpr
1213/// ::= forexpr
1214static ExprAST *ParsePrimary() {
1215 switch (CurTok) {
1216 default: return Error("unknown token when expecting an expression");
1217 case tok_identifier: return ParseIdentifierExpr();
1218 case tok_number: return ParseNumberExpr();
1219 case '(': return ParseParenExpr();
1220 case tok_if: return ParseIfExpr();
1221 case tok_for: return ParseForExpr();
1222 }
1223}
1224
1225/// binoprhs
1226/// ::= ('+' primary)*
1227static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1228 // If this is a binop, find its precedence.
1229 while (1) {
1230 int TokPrec = GetTokPrecedence();
1231
1232 // If this is a binop that binds at least as tightly as the current binop,
1233 // consume it, otherwise we are done.
1234 if (TokPrec &lt; ExprPrec)
1235 return LHS;
1236
1237 // Okay, we know this is a binop.
1238 int BinOp = CurTok;
1239 getNextToken(); // eat binop
1240
1241 // Parse the primary expression after the binary operator.
1242 ExprAST *RHS = ParsePrimary();
1243 if (!RHS) return 0;
1244
1245 // If BinOp binds less tightly with RHS than the operator after RHS, let
1246 // the pending operator take RHS as its LHS.
1247 int NextPrec = GetTokPrecedence();
1248 if (TokPrec &lt; NextPrec) {
1249 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1250 if (RHS == 0) return 0;
1251 }
1252
1253 // Merge LHS/RHS.
1254 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1255 }
1256}
1257
1258/// expression
1259/// ::= primary binoprhs
1260///
1261static ExprAST *ParseExpression() {
1262 ExprAST *LHS = ParsePrimary();
1263 if (!LHS) return 0;
1264
1265 return ParseBinOpRHS(0, LHS);
1266}
1267
1268/// prototype
1269/// ::= id '(' id* ')'
1270static PrototypeAST *ParsePrototype() {
1271 if (CurTok != tok_identifier)
1272 return ErrorP("Expected function name in prototype");
1273
1274 std::string FnName = IdentifierStr;
1275 getNextToken();
1276
1277 if (CurTok != '(')
1278 return ErrorP("Expected '(' in prototype");
1279
1280 std::vector&lt;std::string&gt; ArgNames;
1281 while (getNextToken() == tok_identifier)
1282 ArgNames.push_back(IdentifierStr);
1283 if (CurTok != ')')
1284 return ErrorP("Expected ')' in prototype");
1285
1286 // success.
1287 getNextToken(); // eat ')'.
1288
1289 return new PrototypeAST(FnName, ArgNames);
1290}
1291
1292/// definition ::= 'def' prototype expression
1293static FunctionAST *ParseDefinition() {
1294 getNextToken(); // eat def.
1295 PrototypeAST *Proto = ParsePrototype();
1296 if (Proto == 0) return 0;
1297
1298 if (ExprAST *E = ParseExpression())
1299 return new FunctionAST(Proto, E);
1300 return 0;
1301}
1302
1303/// toplevelexpr ::= expression
1304static FunctionAST *ParseTopLevelExpr() {
1305 if (ExprAST *E = ParseExpression()) {
1306 // Make an anonymous proto.
1307 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1308 return new FunctionAST(Proto, E);
1309 }
1310 return 0;
1311}
1312
1313/// external ::= 'extern' prototype
1314static PrototypeAST *ParseExtern() {
1315 getNextToken(); // eat extern.
1316 return ParsePrototype();
1317}
1318
1319//===----------------------------------------------------------------------===//
1320// Code Generation
1321//===----------------------------------------------------------------------===//
1322
1323static Module *TheModule;
1324static LLVMFoldingBuilder Builder;
1325static std::map&lt;std::string, Value*&gt; NamedValues;
1326static FunctionPassManager *TheFPM;
1327
1328Value *ErrorV(const char *Str) { Error(Str); return 0; }
1329
1330Value *NumberExprAST::Codegen() {
1331 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1332}
1333
1334Value *VariableExprAST::Codegen() {
1335 // Look this variable up in the function.
1336 Value *V = NamedValues[Name];
1337 return V ? V : ErrorV("Unknown variable name");
1338}
1339
1340Value *BinaryExprAST::Codegen() {
1341 Value *L = LHS-&gt;Codegen();
1342 Value *R = RHS-&gt;Codegen();
1343 if (L == 0 || R == 0) return 0;
1344
1345 switch (Op) {
1346 case '+': return Builder.CreateAdd(L, R, "addtmp");
1347 case '-': return Builder.CreateSub(L, R, "subtmp");
1348 case '*': return Builder.CreateMul(L, R, "multmp");
1349 case '&lt;':
1350 L = Builder.CreateFCmpULT(L, R, "multmp");
1351 // Convert bool 0/1 to double 0.0 or 1.0
1352 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1353 default: return ErrorV("invalid binary operator");
1354 }
1355}
1356
1357Value *CallExprAST::Codegen() {
1358 // Look up the name in the global module table.
1359 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1360 if (CalleeF == 0)
1361 return ErrorV("Unknown function referenced");
1362
1363 // If argument mismatch error.
1364 if (CalleeF-&gt;arg_size() != Args.size())
1365 return ErrorV("Incorrect # arguments passed");
1366
1367 std::vector&lt;Value*&gt; ArgsV;
1368 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1369 ArgsV.push_back(Args[i]-&gt;Codegen());
1370 if (ArgsV.back() == 0) return 0;
1371 }
1372
1373 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1374}
1375
1376Value *IfExprAST::Codegen() {
1377 Value *CondV = Cond-&gt;Codegen();
1378 if (CondV == 0) return 0;
1379
1380 // Convert condition to a bool by comparing equal to 0.0.
1381 CondV = Builder.CreateFCmpONE(CondV,
1382 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1383 "ifcond");
1384
1385 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1386
1387 // Create blocks for the then and else cases. Insert the 'then' block at the
1388 // end of the function.
1389 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1390 BasicBlock *ElseBB = new BasicBlock("else");
1391 BasicBlock *MergeBB = new BasicBlock("ifcont");
1392
1393 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1394
1395 // Emit then value.
1396 Builder.SetInsertPoint(ThenBB);
1397
1398 Value *ThenV = Then-&gt;Codegen();
1399 if (ThenV == 0) return 0;
1400
1401 Builder.CreateBr(MergeBB);
1402 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1403 ThenBB = Builder.GetInsertBlock();
1404
1405 // Emit else block.
1406 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1407 Builder.SetInsertPoint(ElseBB);
1408
1409 Value *ElseV = Else-&gt;Codegen();
1410 if (ElseV == 0) return 0;
1411
1412 Builder.CreateBr(MergeBB);
1413 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1414 ElseBB = Builder.GetInsertBlock();
1415
1416 // Emit merge block.
1417 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1418 Builder.SetInsertPoint(MergeBB);
1419 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1420
1421 PN-&gt;addIncoming(ThenV, ThenBB);
1422 PN-&gt;addIncoming(ElseV, ElseBB);
1423 return PN;
1424}
1425
1426Value *ForExprAST::Codegen() {
Chris Lattnerf5234802007-10-31 06:47:39 +00001427 // Output this as:
1428 // ...
1429 // start = startexpr
1430 // goto loop
1431 // loop:
1432 // variable = phi [start, loopheader], [nextvariable, loopend]
1433 // ...
1434 // bodyexpr
1435 // ...
1436 // loopend:
1437 // step = stepexpr
1438 // nextvariable = variable + step
1439 // endcond = endexpr
1440 // br endcond, loop, endloop
1441 // outloop:
1442
1443 // Emit the start code first, without 'variable' in scope.
1444 Value *StartVal = Start-&gt;Codegen();
1445 if (StartVal == 0) return 0;
1446
1447 // Make the new basic block for the loop header, inserting after current
1448 // block.
1449 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1450 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1451 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1452
1453 // Insert an explicit fall through from the current block to the LoopBB.
Chris Lattnerf5234802007-10-31 06:47:39 +00001454 Builder.CreateBr(LoopBB);
Chris Lattner6093bd52007-10-31 07:29:43 +00001455
1456 // Start insertion in LoopBB.
Chris Lattnerf5234802007-10-31 06:47:39 +00001457 Builder.SetInsertPoint(LoopBB);
1458
1459 // Start the PHI node with an entry for Start.
1460 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1461 Variable-&gt;addIncoming(StartVal, PreheaderBB);
1462
1463 // Within the loop, the variable is defined equal to the PHI node. If it
1464 // shadows an existing variable, we have to restore it, so save it now.
1465 Value *OldVal = NamedValues[VarName];
1466 NamedValues[VarName] = Variable;
1467
1468 // Emit the body of the loop. This, like any other expr, can change the
1469 // current BB. Note that we ignore the value computed by the body, but don't
1470 // allow an error.
1471 if (Body-&gt;Codegen() == 0)
1472 return 0;
1473
1474 // Emit the step value.
1475 Value *StepVal;
1476 if (Step) {
1477 StepVal = Step-&gt;Codegen();
1478 if (StepVal == 0) return 0;
1479 } else {
1480 // If not specified, use 1.0.
1481 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1482 }
1483
1484 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1485
Chris Lattnerf5234802007-10-31 06:47:39 +00001486 // Compute the end condition.
1487 Value *EndCond = End-&gt;Codegen();
1488 if (EndCond == 0) return EndCond;
1489
1490 // Convert condition to a bool by comparing equal to 0.0.
1491 EndCond = Builder.CreateFCmpONE(EndCond,
1492 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1493 "loopcond");
1494
1495 // Create the "after loop" block and insert it.
1496 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1497 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1498
1499 // Insert the conditional branch into the end of LoopEndBB.
1500 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1501
1502 // Any new code will be inserted in AfterBB.
1503 Builder.SetInsertPoint(AfterBB);
1504
1505 // Add a new entry to the PHI node for the backedge.
1506 Variable-&gt;addIncoming(NextVar, LoopEndBB);
1507
1508 // Restore the unshadowed variable.
1509 if (OldVal)
1510 NamedValues[VarName] = OldVal;
1511 else
1512 NamedValues.erase(VarName);
1513
1514
1515 // for expr always returns 0.0.
1516 return Constant::getNullValue(Type::DoubleTy);
1517}
Chris Lattner6093bd52007-10-31 07:29:43 +00001518
1519Function *PrototypeAST::Codegen() {
1520 // Make the function type: double(double,double) etc.
1521 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1522 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1523
1524 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1525
1526 // If F conflicted, there was already something named 'Name'. If it has a
1527 // body, don't allow redefinition or reextern.
1528 if (F-&gt;getName() != Name) {
1529 // Delete the one we just made and get the existing one.
1530 F-&gt;eraseFromParent();
1531 F = TheModule-&gt;getFunction(Name);
1532
1533 // If F already has a body, reject this.
1534 if (!F-&gt;empty()) {
1535 ErrorF("redefinition of function");
1536 return 0;
1537 }
1538
1539 // If F took a different number of args, reject.
1540 if (F-&gt;arg_size() != Args.size()) {
1541 ErrorF("redefinition of function with different # args");
1542 return 0;
1543 }
1544 }
1545
1546 // Set names for all arguments.
1547 unsigned Idx = 0;
1548 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1549 ++AI, ++Idx) {
1550 AI-&gt;setName(Args[Idx]);
1551
1552 // Add arguments to variable symbol table.
1553 NamedValues[Args[Idx]] = AI;
1554 }
1555
1556 return F;
1557}
1558
1559Function *FunctionAST::Codegen() {
1560 NamedValues.clear();
1561
1562 Function *TheFunction = Proto-&gt;Codegen();
1563 if (TheFunction == 0)
1564 return 0;
1565
1566 // Create a new basic block to start insertion into.
1567 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1568 Builder.SetInsertPoint(BB);
1569
1570 if (Value *RetVal = Body-&gt;Codegen()) {
1571 // Finish off the function.
1572 Builder.CreateRet(RetVal);
1573
1574 // Validate the generated code, checking for consistency.
1575 verifyFunction(*TheFunction);
1576
1577 // Optimize the function.
1578 TheFPM-&gt;run(*TheFunction);
1579
1580 return TheFunction;
1581 }
1582
1583 // Error reading body, remove function.
1584 TheFunction-&gt;eraseFromParent();
1585 return 0;
1586}
1587
1588//===----------------------------------------------------------------------===//
1589// Top-Level parsing and JIT Driver
1590//===----------------------------------------------------------------------===//
1591
1592static ExecutionEngine *TheExecutionEngine;
1593
1594static void HandleDefinition() {
1595 if (FunctionAST *F = ParseDefinition()) {
1596 if (Function *LF = F-&gt;Codegen()) {
1597 fprintf(stderr, "Read function definition:");
1598 LF-&gt;dump();
1599 }
1600 } else {
1601 // Skip token for error recovery.
1602 getNextToken();
1603 }
1604}
1605
1606static void HandleExtern() {
1607 if (PrototypeAST *P = ParseExtern()) {
1608 if (Function *F = P-&gt;Codegen()) {
1609 fprintf(stderr, "Read extern: ");
1610 F-&gt;dump();
1611 }
1612 } else {
1613 // Skip token for error recovery.
1614 getNextToken();
1615 }
1616}
1617
1618static void HandleTopLevelExpression() {
1619 // Evaluate a top level expression into an anonymous function.
1620 if (FunctionAST *F = ParseTopLevelExpr()) {
1621 if (Function *LF = F-&gt;Codegen()) {
1622 // JIT the function, returning a function pointer.
1623 void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1624
1625 // Cast it to the right type (takes no arguments, returns a double) so we
1626 // can call it as a native function.
1627 double (*FP)() = (double (*)())FPtr;
1628 fprintf(stderr, "Evaluated to %f\n", FP());
1629 }
1630 } else {
1631 // Skip token for error recovery.
1632 getNextToken();
1633 }
1634}
1635
1636/// top ::= definition | external | expression | ';'
1637static void MainLoop() {
1638 while (1) {
1639 fprintf(stderr, "ready&gt; ");
1640 switch (CurTok) {
1641 case tok_eof: return;
1642 case ';': getNextToken(); break; // ignore top level semicolons.
1643 case tok_def: HandleDefinition(); break;
1644 case tok_extern: HandleExtern(); break;
1645 default: HandleTopLevelExpression(); break;
1646 }
1647 }
1648}
Chris Lattnerf5234802007-10-31 06:47:39 +00001649
Chris Lattner602c832c2007-10-31 06:30:21 +00001650
Chris Lattner602c832c2007-10-31 06:30:21 +00001651
Chris Lattner6093bd52007-10-31 07:29:43 +00001652//===----------------------------------------------------------------------===//
1653// "Library" functions that can be "extern'd" from user code.
1654//===----------------------------------------------------------------------===//
Chris Lattner602c832c2007-10-31 06:30:21 +00001655
Chris Lattner6093bd52007-10-31 07:29:43 +00001656/// putchard - putchar that takes a double and returns 0.
1657extern "C"
1658double putchard(double X) {
1659 putchar((char)X);
1660 return 0;
1661}
Chris Lattner602c832c2007-10-31 06:30:21 +00001662
Chris Lattner6093bd52007-10-31 07:29:43 +00001663//===----------------------------------------------------------------------===//
1664// Main driver code.
1665//===----------------------------------------------------------------------===//
Chris Lattner602c832c2007-10-31 06:30:21 +00001666
Chris Lattner6093bd52007-10-31 07:29:43 +00001667int main() {
1668 // Install standard binary operators.
1669 // 1 is lowest precedence.
1670 BinopPrecedence['&lt;'] = 10;
1671 BinopPrecedence['+'] = 20;
1672 BinopPrecedence['-'] = 20;
1673 BinopPrecedence['*'] = 40; // highest.
Chris Lattner602c832c2007-10-31 06:30:21 +00001674
Chris Lattner6093bd52007-10-31 07:29:43 +00001675 // Prime the first token.
1676 fprintf(stderr, "ready&gt; ");
1677 getNextToken();
Chris Lattner602c832c2007-10-31 06:30:21 +00001678
Chris Lattner6093bd52007-10-31 07:29:43 +00001679 // Make the module, which holds all the code.
1680 TheModule = new Module("my cool jit");
1681
1682 // Create the JIT.
1683 TheExecutionEngine = ExecutionEngine::create(TheModule);
1684
1685 {
1686 ExistingModuleProvider OurModuleProvider(TheModule);
1687 FunctionPassManager OurFPM(&amp;OurModuleProvider);
1688
1689 // Set up the optimizer pipeline. Start with registering info about how the
1690 // target lays out data structures.
1691 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1692 // Do simple "peephole" optimizations and bit-twiddling optzns.
1693 OurFPM.add(createInstructionCombiningPass());
1694 // Reassociate expressions.
1695 OurFPM.add(createReassociatePass());
1696 // Eliminate Common SubExpressions.
1697 OurFPM.add(createGVNPass());
1698 // Simplify the control flow graph (deleting unreachable blocks, etc).
1699 OurFPM.add(createCFGSimplificationPass());
1700 // Set the global so the code gen can use this.
1701 TheFPM = &amp;OurFPM;
1702
1703 // Run the main "interpreter loop" now.
1704 MainLoop();
1705
1706 TheFPM = 0;
1707 } // Free module provider and pass manager.
1708
1709
1710 // Print out all of the generated code.
1711 TheModule-&gt;dump();
1712 return 0;
1713}
Chris Lattner602c832c2007-10-31 06:30:21 +00001714</pre>
1715</div>
1716
1717</div>
1718
1719<!-- *********************************************************************** -->
1720<hr>
1721<address>
1722 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1723 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1724 <a href="http://validator.w3.org/check/referer"><img
1725 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1726
1727 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1728 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1729 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1730</address>
1731</body>
1732</html>