blob: a57975801425c938700790c95597276177c7ec5a [file] [log] [blame]
Chris Lattner602c832c2007-10-31 06:30:21 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Control Flow</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
15
Chris Lattner128eb862007-11-05 19:06:59 +000016<ul>
17<li>Chapter 5
18 <ol>
19 <li><a href="#intro">Chapter 5 Introduction</a></li>
20 <li><a href="#ifthen">If/Then/Else</a>
21 <ol>
22 <li><a href="#iflexer">Lexer Extensions</a></li>
23 <li><a href="#ifast">AST Extensions</a></li>
24 <li><a href="#ifparser">Parser Extensions</a></li>
25 <li><a href="#ifir">LLVM IR</a></li>
26 <li><a href="#ifcodegen">Code Generation</a></li>
27 </ol>
28 </li>
29 <li><a href="#for">'for' Loop Expression</a>
30 <ol>
31 <li><a href="#forlexer">Lexer Extensions</a></li>
32 <li><a href="#forast">AST Extensions</a></li>
33 <li><a href="#forparser">Parser Extensions</a></li>
34 <li><a href="#forir">LLVM IR</a></li>
35 <li><a href="#forcodegen">Code Generation</a></li>
36 </ol>
37 </li>
38 <li><a href="#code">Full Code Listing</a></li>
39 </ol>
40</li>
41</ul>
42
Chris Lattner602c832c2007-10-31 06:30:21 +000043<div class="doc_author">
44 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
45</div>
46
47<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +000048<div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div>
Chris Lattner602c832c2007-10-31 06:30:21 +000049<!-- *********************************************************************** -->
50
51<div class="doc_text">
52
Chris Lattner128eb862007-11-05 19:06:59 +000053<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
54with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
Chris Lattner602c832c2007-10-31 06:30:21 +000055Kaleidoscope language and included support for generating LLVM IR, following by
56optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
57mostly useless: it has no control flow other than call and return. This means
58that you can't have conditional branches in the code, significantly limiting its
59power. In this episode of "build that compiler", we'll extend Kaleidoscope to
60have an if/then/else expression plus a simple looping construct.</p>
61
62</div>
63
64<!-- *********************************************************************** -->
65<div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
66<!-- *********************************************************************** -->
67
68<div class="doc_text">
69
70<p>
71Extending Kaleidoscope to support if/then/else is quite straight-forward. It
72basically requires adding lexer support for this "new" concept to the lexer,
73parser, AST, and LLVM code emitter. This example is nice, because it shows how
74easy it is to "grow" a language over time, incrementally extending it as new
75ideas are discovered.</p>
76
77<p>Before we get going on "how" we do this extension, lets talk about what we
78want. The basic idea is that we want to be able to write this sort of thing:
79</p>
80
81<div class="doc_code">
82<pre>
83def fib(x)
84 if x &lt; 3 then
85 1
86 else
87 fib(x-1)+fib(x-2);
88</pre>
89</div>
90
91<p>In Kaleidoscope, every construct is an expression: there are no statements.
92As such, the if/then/else expression needs to return a value like any other.
93Since we're using a mostly functional form, we'll have it evaluate its
94conditional, then return the 'then' or 'else' value based on how the condition
95was resolved. This is very similar to the C "?:" expression.</p>
96
97<p>The semantics of the if/then/else expression is that it first evaluates the
98condition to a boolean equality value: 0.0 is false and everything else is true.
99If the condition is true, the first subexpression is evaluated and returned, if
100the condition is false, the second subexpression is evaluated and returned.
101Since Kaleidoscope allows side-effects, this behavior is important to nail down.
102</p>
103
104<p>Now that we know what we want, lets break this down into its constituent
105pieces.</p>
106
107</div>
108
109<!-- ======================================================================= -->
110<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
111If/Then/Else</a></div>
112<!-- ======================================================================= -->
113
114
115<div class="doc_text">
116
117<p>The lexer extensions are straight-forward. First we add new enum values
118for the relevant tokens:</p>
119
120<div class="doc_code">
121<pre>
122 // control
123 tok_if = -6, tok_then = -7, tok_else = -8,
124</pre>
125</div>
126
127<p>Once we have that, we recognize the new keywords in the lexer, pretty simple
128stuff:</p>
129
130<div class="doc_code">
131<pre>
132 ...
133 if (IdentifierStr == "def") return tok_def;
134 if (IdentifierStr == "extern") return tok_extern;
135 <b>if (IdentifierStr == "if") return tok_if;
136 if (IdentifierStr == "then") return tok_then;
137 if (IdentifierStr == "else") return tok_else;</b>
138 return tok_identifier;
139</pre>
140</div>
141
142</div>
143
144<!-- ======================================================================= -->
145<div class="doc_subsubsection"><a name="ifast">AST Extensions for
Chris Lattner128eb862007-11-05 19:06:59 +0000146 If/Then/Else</a></div>
Chris Lattner602c832c2007-10-31 06:30:21 +0000147<!-- ======================================================================= -->
148
149<div class="doc_text">
150
151<p>To represent the new expression we add a new AST node for it:</p>
152
153<div class="doc_code">
154<pre>
155/// IfExprAST - Expression class for if/then/else.
156class IfExprAST : public ExprAST {
157 ExprAST *Cond, *Then, *Else;
158public:
159 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
160 : Cond(cond), Then(then), Else(_else) {}
161 virtual Value *Codegen();
162};
163</pre>
164</div>
165
166<p>The AST node just has pointers to the various subexpressions.</p>
167
168</div>
169
170<!-- ======================================================================= -->
171<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
Chris Lattner128eb862007-11-05 19:06:59 +0000172If/Then/Else</a></div>
Chris Lattner602c832c2007-10-31 06:30:21 +0000173<!-- ======================================================================= -->
174
175<div class="doc_text">
176
177<p>Now that we have the relevant tokens coming from the lexer and we have the
178AST node to build, our parsing logic is relatively straight-forward. First we
179define a new parsing function:</p>
180
181<div class="doc_code">
182<pre>
183/// ifexpr ::= 'if' expression 'then' expression 'else' expression
184static ExprAST *ParseIfExpr() {
185 getNextToken(); // eat the if.
186
187 // condition.
188 ExprAST *Cond = ParseExpression();
189 if (!Cond) return 0;
190
191 if (CurTok != tok_then)
192 return Error("expected then");
193 getNextToken(); // eat the then
194
195 ExprAST *Then = ParseExpression();
196 if (Then == 0) return 0;
197
198 if (CurTok != tok_else)
199 return Error("expected else");
200
201 getNextToken();
202
203 ExprAST *Else = ParseExpression();
204 if (!Else) return 0;
205
206 return new IfExprAST(Cond, Then, Else);
207}
208</pre>
209</div>
210
211<p>Next we hook it up as a primary expression:</p>
212
213<div class="doc_code">
214<pre>
215static ExprAST *ParsePrimary() {
216 switch (CurTok) {
217 default: return Error("unknown token when expecting an expression");
218 case tok_identifier: return ParseIdentifierExpr();
219 case tok_number: return ParseNumberExpr();
220 case '(': return ParseParenExpr();
221 <b>case tok_if: return ParseIfExpr();</b>
222 }
223}
224</pre>
225</div>
226
227</div>
228
229<!-- ======================================================================= -->
230<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
231<!-- ======================================================================= -->
232
233<div class="doc_text">
234
235<p>Now that we have it parsing and building the AST, the final piece is adding
236LLVM code generation support. This is the most interesting part of the
237if/then/else example, because this is where it starts to introduce new concepts.
238All of the code above has been described in previous chapters fairly thoroughly.
239</p>
240
241<p>To motivate the code we want to produce, lets take a look at a simple
242example. Consider:</p>
243
244<div class="doc_code">
245<pre>
246extern foo();
247extern bar();
248def baz(x) if x then foo() else bar();
249</pre>
250</div>
251
252<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
253looks like this:</p>
254
255<div class="doc_code">
256<pre>
257declare double @foo()
258
259declare double @bar()
260
261define double @baz(double %x) {
262entry:
263 %ifcond = fcmp one double %x, 0.000000e+00
264 br i1 %ifcond, label %then, label %else
265
266then: ; preds = %entry
267 %calltmp = call double @foo()
268 br label %ifcont
269
270else: ; preds = %entry
271 %calltmp1 = call double @bar()
272 br label %ifcont
273
274ifcont: ; preds = %else, %then
275 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
276 ret double %iftmp
277}
278</pre>
279</div>
280
281<p>To visualize the control flow graph, you can use a nifty feature of the LLVM
282'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR
283into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
284href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
285see this graph:</p>
286
287<center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
288height="315"></center>
289
290<p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
291"<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
292inserting actual calls into the code and recompiling or by calling these in the
293debugger. LLVM has many nice features for visualizing various graphs.</p>
294
295<p>Coming back to the generated code, it is fairly simple: the entry block
296evaluates the conditional expression ("x" in our case here) and compares the
297result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
298instruction ('one' is "ordered and not equal"). Based on the result of this
299expression, the code jumps to either the "then" or "else" blocks, which contain
300the expressions for the true/false case.</p>
301
302<p>Once the then/else blocks is finished executing, they both branch back to the
303else block to execute the code that happens after the if/then/else. In this
304case the only thing left to do is to return to the caller of the function. The
305question then becomes: how does the code know which expression to return?</p>
306
307<p>The answer to this question involves an important SSA operation: the
308<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
309operation</a>. If you're not familiar with SSA, <a
310href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
311article</a> is a good introduction and there are various other introductions to
312it available on your favorite search engine. The short version is that
313"execution" of the Phi operation requires "remembering" which block control came
314from. The Phi operation takes on the value corresponding to the input control
315block. In this case, if control comes in from the "then" block, it gets the
316value of "calltmp". If control comes from the "else" block, it gets the value
317of "calltmp1".</p>
318
319<p>At this point, you are probably starting to think "on no! this means my
320simple and elegant front-end will have to start generating SSA form in order to
321use LLVM!". Fortunately, this is not the case, and we strongly advise
322<em>not</em> implementing an SSA construction algorithm in your front-end
323unless there is an amazingly good reason to do so. In practice, there are two
324sorts of values that float around in code written in your average imperative
325programming language that might need Phi nodes:</p>
326
327<ol>
328<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
329<li>Values that are implicit in the structure of your AST, such as the phi node
330in this case.</li>
331</ol>
332
Chris Lattnerb0f0deb2007-11-05 07:02:49 +0000333<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
334variables"), we'll talk about #1
Chris Lattner602c832c2007-10-31 06:30:21 +0000335in depth. For now, just believe me that you don't need SSA construction to
336handle them. For #2, you have the choice of using the techniques that we will
337describe for #1, or you can insert Phi nodes directly if convenient. In this
338case, it is really really easy to generate the Phi node, so we choose to do it
339directly.</p>
340
341<p>Okay, enough of the motivation and overview, lets generate code!</p>
342
343</div>
344
345<!-- ======================================================================= -->
346<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
347If/Then/Else</a></div>
348<!-- ======================================================================= -->
349
350<div class="doc_text">
351
352<p>In order to generate code for this, we implement the <tt>Codegen</tt> method
353for <tt>IfExprAST</tt>:</p>
354
355<div class="doc_code">
356<pre>
357Value *IfExprAST::Codegen() {
358 Value *CondV = Cond-&gt;Codegen();
359 if (CondV == 0) return 0;
360
361 // Convert condition to a bool by comparing equal to 0.0.
362 CondV = Builder.CreateFCmpONE(CondV,
363 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
364 "ifcond");
365</pre>
366</div>
367
368<p>This code is straight-forward and similar to what we saw before. We emit the
369expression for the condition, then compare that value to zero to get a truth
370value as a 1-bit (bool) value.</p>
371
372<div class="doc_code">
373<pre>
374 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
375
376 // Create blocks for the then and else cases. Insert the 'then' block at the
377 // end of the function.
378 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
379 BasicBlock *ElseBB = new BasicBlock("else");
380 BasicBlock *MergeBB = new BasicBlock("ifcont");
381
382 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
383</pre>
384</div>
385
386<p>This code creates the basic blocks that are related to the if/then/else
387statement, and correspond directly to the blocks in the example above. The
388first line of this gets the current Function object that is being built. It
389gets this by asking the builder for the current BasicBlock, and asking that
390block for its "parent" (the function it is currently embedded into).</p>
391
392<p>Once it has that, it creates three blocks. Note that it passes "TheFunction"
393into the constructor for the "then" block. This causes the constructor to
394automatically insert the new block onto the end of the specified function. The
395other two blocks are created, but aren't yet inserted into the function.</p>
396
397<p>Once the blocks are created, we can emit the conditional branch that chooses
398between them. Note that creating new blocks does not implicitly affect the
399LLVMBuilder, so it is still inserting into the block that the condition
400went into. Also note that it is creating a branch to the "then" block and the
401"else" block, even though the "else" block isn't inserted into the function yet.
402This is all ok: it is the standard way that LLVM supports forward
403references.</p>
404
405<div class="doc_code">
406<pre>
407 // Emit then value.
408 Builder.SetInsertPoint(ThenBB);
409
410 Value *ThenV = Then-&gt;Codegen();
411 if (ThenV == 0) return 0;
412
413 Builder.CreateBr(MergeBB);
414 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
415 ThenBB = Builder.GetInsertBlock();
416</pre>
417</div>
418
419<p>After the conditional branch is inserted, we move the builder to start
420inserting into the "then" block. Strictly speaking, this call moves the
421insertion point to be at the end of the specified block. However, since the
422"then" block is empty, it also starts out by inserting at the beginning of the
423block. :)</p>
424
425<p>Once the insertion point is set, we recursively codegen the "then" expression
426from the AST. To finish off the then block, we create an unconditional branch
427to the merge block. One interesting (and very important) aspect of the LLVM IR
428is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
429to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
430instruction</a> such as return or branch. This means that all control flow,
431<em>including fall throughs</em> must be made explicit in the LLVM IR. If you
432violate this rule, the verifier will emit an error.</p>
433
434<p>The final line here is quite subtle, but is very important. The basic issue
435is that when we create the Phi node in the merge block, we need to set up the
436block/value pairs that indicate how the Phi will work. Importantly, the Phi
Chris Lattnerb5019642007-11-05 17:52:04 +0000437node expects to have an entry for each predecessor of the block in the CFG. Why
Chris Lattner602c832c2007-10-31 06:30:21 +0000438then are we getting the current block when we just set it to ThenBB 5 lines
439above? The problem is that the "Then" expression may actually itself change the
440block that the Builder is emitting into if, for example, it contains a nested
441"if/then/else" expression. Because calling Codegen recursively could
442arbitrarily change the notion of the current block, we are required to get an
443up-to-date value for code that will set up the Phi node.</p>
444
445<div class="doc_code">
446<pre>
447 // Emit else block.
448 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
449 Builder.SetInsertPoint(ElseBB);
450
451 Value *ElseV = Else-&gt;Codegen();
452 if (ElseV == 0) return 0;
453
454 Builder.CreateBr(MergeBB);
455 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
456 ElseBB = Builder.GetInsertBlock();
457</pre>
458</div>
459
460<p>Code generation for the 'else' block is basically identical to codegen for
461the 'then' block. The only significant difference is the first line, which adds
462the 'else' block to the function. Recall previously that the 'else' block was
463created, but not added to the function. Now that the 'then' and 'else' blocks
464are emitted, we can finish up with the merge code:</p>
465
466<div class="doc_code">
467<pre>
468 // Emit merge block.
469 TheFunction->getBasicBlockList().push_back(MergeBB);
470 Builder.SetInsertPoint(MergeBB);
471 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
472
473 PN->addIncoming(ThenV, ThenBB);
474 PN->addIncoming(ElseV, ElseBB);
475 return PN;
476}
477</pre>
478</div>
479
480<p>The first two lines here are now familiar: the first adds the "merge" block
481to the Function object (it was previously floating, like the else block above).
482The second block changes the insertion point so that newly created code will go
483into the "merge" block. Once that is done, we need to create the PHI node and
484set up the block/value pairs for the PHI.</p>
485
486<p>Finally, the CodeGen function returns the phi node as the value computed by
487the if/then/else expression. In our example above, this returned value will
488feed into the code for the top-level function, which will create the return
489instruction.</p>
490
491<p>Overall, we now have the ability to execution conditional code in
492Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
493that can calculate a wide variety of numeric functions. Next up we'll add
494another useful expression that is familiar from non-functional languages...</p>
495
496</div>
497
498<!-- *********************************************************************** -->
499<div class="doc_section"><a name="for">'for' Loop Expression</a></div>
500<!-- *********************************************************************** -->
501
502<div class="doc_text">
503
Chris Lattnerf5234802007-10-31 06:47:39 +0000504<p>Now that we know how to add basic control flow constructs to the language,
505we have the tools to add more powerful things. Lets add something more
506aggressive, a 'for' expression:</p>
507
508<div class="doc_code">
509<pre>
Chris Lattnerf5234802007-10-31 06:47:39 +0000510 extern putchard(char)
Chris Lattner6093bd52007-10-31 07:29:43 +0000511 def printstar(n)
512 for i = 1, i &lt; n, 1.0 in
513 putchard(42); # ascii 42 = '*'
514
515 # print 100 '*' characters
516 printstar(100);
Chris Lattnerf5234802007-10-31 06:47:39 +0000517</pre>
518</div>
519
Chris Lattner6093bd52007-10-31 07:29:43 +0000520<p>This expression defines a new variable ("i" in this case) which iterates from
521a starting value, while the condition ("i &lt; n" in this case) is true,
Chris Lattnerf5234802007-10-31 06:47:39 +0000522incrementing by an optional step value ("1.0" in this case). If the step value
523is omitted, it defaults to 1.0. While the loop is true, it executes its
524body expression. Because we don't have anything better to return, we'll just
525define the loop as always returning 0.0. In the future when we have mutable
526variables, it will get more useful.</p>
527
528<p>As before, lets talk about the changes that we need to Kaleidoscope to
529support this.</p>
530
531</div>
532
533<!-- ======================================================================= -->
534<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
535the 'for' Loop</a></div>
536<!-- ======================================================================= -->
537
538<div class="doc_text">
539
540<p>The lexer extensions are the same sort of thing as for if/then/else:</p>
541
542<div class="doc_code">
543<pre>
544 ... in enum Token ...
545 // control
546 tok_if = -6, tok_then = -7, tok_else = -8,
547<b> tok_for = -9, tok_in = -10</b>
548
549 ... in gettok ...
550 if (IdentifierStr == "def") return tok_def;
551 if (IdentifierStr == "extern") return tok_extern;
552 if (IdentifierStr == "if") return tok_if;
553 if (IdentifierStr == "then") return tok_then;
554 if (IdentifierStr == "else") return tok_else;
555 <b>if (IdentifierStr == "for") return tok_for;
556 if (IdentifierStr == "in") return tok_in;</b>
557 return tok_identifier;
558</pre>
559</div>
560
561</div>
562
563<!-- ======================================================================= -->
564<div class="doc_subsubsection"><a name="forast">AST Extensions for
565the 'for' Loop</a></div>
566<!-- ======================================================================= -->
567
568<div class="doc_text">
569
570<p>The AST node is similarly simple. It basically boils down to capturing
571the variable name and the consituent expressions in the node.</p>
572
573<div class="doc_code">
574<pre>
575/// ForExprAST - Expression class for for/in.
576class ForExprAST : public ExprAST {
577 std::string VarName;
578 ExprAST *Start, *End, *Step, *Body;
579public:
580 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
581 ExprAST *step, ExprAST *body)
582 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
583 virtual Value *Codegen();
584};
585</pre>
586</div>
587
588</div>
589
590<!-- ======================================================================= -->
591<div class="doc_subsubsection"><a name="forparser">Parser Extensions for
592the 'for' Loop</a></div>
593<!-- ======================================================================= -->
594
595<div class="doc_text">
596
597<p>The parser code is also fairly standard. The only interesting thing here is
598handling of the optional step value. The parser code handles it by checking to
599see if the second comma is present. If not, it sets the step value to null in
600the AST node:</p>
601
602<div class="doc_code">
603<pre>
Chris Lattner20a0c802007-11-05 17:54:34 +0000604/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattnerf5234802007-10-31 06:47:39 +0000605static ExprAST *ParseForExpr() {
606 getNextToken(); // eat the for.
607
608 if (CurTok != tok_identifier)
609 return Error("expected identifier after for");
610
611 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +0000612 getNextToken(); // eat identifier.
Chris Lattnerf5234802007-10-31 06:47:39 +0000613
614 if (CurTok != '=')
615 return Error("expected '=' after for");
616 getNextToken(); // eat '='.
617
618
619 ExprAST *Start = ParseExpression();
620 if (Start == 0) return 0;
621 if (CurTok != ',')
622 return Error("expected ',' after for start value");
623 getNextToken();
624
625 ExprAST *End = ParseExpression();
626 if (End == 0) return 0;
627
628 // The step value is optional.
629 ExprAST *Step = 0;
630 if (CurTok == ',') {
631 getNextToken();
632 Step = ParseExpression();
633 if (Step == 0) return 0;
634 }
635
636 if (CurTok != tok_in)
637 return Error("expected 'in' after for");
638 getNextToken(); // eat 'in'.
639
640 ExprAST *Body = ParseExpression();
641 if (Body == 0) return 0;
642
643 return new ForExprAST(IdName, Start, End, Step, Body);
644}
645</pre>
646</div>
647
648</div>
649
650<!-- ======================================================================= -->
651<div class="doc_subsubsection"><a name="forir">LLVM IR for
652the 'for' Loop</a></div>
653<!-- ======================================================================= -->
654
655<div class="doc_text">
656
657<p>Now we get to the good part: the LLVM IR we want to generate for this thing.
Chris Lattner6093bd52007-10-31 07:29:43 +0000658With the simple example above, we get this LLVM IR (note that this dump is
659generated with optimizations disabled):
Chris Lattnerf5234802007-10-31 06:47:39 +0000660</p>
661
Chris Lattner6093bd52007-10-31 07:29:43 +0000662<div class="doc_code">
663<pre>
664declare double @putchard(double)
Chris Lattnerf5234802007-10-31 06:47:39 +0000665
Chris Lattner6093bd52007-10-31 07:29:43 +0000666define double @printstar(double %n) {
667entry:
668 ; initial value = 1.0 (inlined into phi)
669 br label %loop
670
671loop: ; preds = %loop, %entry
672 %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
673 ; body
674 %calltmp = call double @putchard( double 4.200000e+01 )
675 ; increment
676 %nextvar = add double %i, 1.000000e+00
677
678 ; termination test
679 %multmp = fcmp ult double %i, %n
680 %booltmp = uitofp i1 %multmp to double
681 %loopcond = fcmp one double %booltmp, 0.000000e+00
682 br i1 %loopcond, label %loop, label %afterloop
683
684afterloop: ; preds = %loop
685 ; loop always returns 0.0
686 ret double 0.000000e+00
687}
688</pre>
689</div>
690
691<p>This loop contains all the same constructs we saw before: a phi node, several
692expressions, and some basic blocks. Lets see how this fits together.</p>
Chris Lattnerf5234802007-10-31 06:47:39 +0000693
694</div>
695
696<!-- ======================================================================= -->
697<div class="doc_subsubsection"><a name="forcodegen">Code Generation for
698the 'for' Loop</a></div>
699<!-- ======================================================================= -->
700
701<div class="doc_text">
702
Chris Lattner6093bd52007-10-31 07:29:43 +0000703<p>The first part of codegen is very simple: we just output the start expression
704for the loop value:</p>
Chris Lattnerf5234802007-10-31 06:47:39 +0000705
706<div class="doc_code">
707<pre>
708Value *ForExprAST::Codegen() {
Chris Lattner6093bd52007-10-31 07:29:43 +0000709 // Emit the start code first, without 'variable' in scope.
710 Value *StartVal = Start-&gt;Codegen();
711 if (StartVal == 0) return 0;
712</pre>
713</div>
714
715<p>With this out of the way, the next step is to set up the LLVM basic block
716for the start of the loop body. In the case above, the whole loop body is one
717block, but remember that the body code itself could consist of multiple blocks
718(e.g. if it is a if/then/else expression).</p>
719
720<div class="doc_code">
721<pre>
722 // Make the new basic block for the loop header, inserting after current
723 // block.
724 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
725 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
726 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
727
728 // Insert an explicit fall through from the current block to the LoopBB.
729 Builder.CreateBr(LoopBB);
730</pre>
731</div>
732
733<p>This code is similar to what we saw for if/then/else. Because we will need
734it to create the Phi node, we remember the block that falls through into the
735loop. Once we have that, we create the actual block that starts the loop and
736create an unconditional branch for the fall-through between the two blocks.</p>
737
738<div class="doc_code">
739<pre>
740 // Start insertion in LoopBB.
741 Builder.SetInsertPoint(LoopBB);
742
743 // Start the PHI node with an entry for Start.
744 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
745 Variable-&gt;addIncoming(StartVal, PreheaderBB);
746</pre>
747</div>
748
749<p>Now that the "preheader" for the loop is set up, we switch to emitting code
750for the loop body. To begin with, we move the insertion point and create the
751PHI node for the loop induction variable. SInce we already know the incoming
752value for the starting value, we add it to the Phi node. Note that the Phi will
753eventually get a second value for the backedge, but we can't set it up yet
754(because it doesn't exist!).</p>
755
756<div class="doc_code">
757<pre>
758 // Within the loop, the variable is defined equal to the PHI node. If it
759 // shadows an existing variable, we have to restore it, so save it now.
760 Value *OldVal = NamedValues[VarName];
761 NamedValues[VarName] = Variable;
762
763 // Emit the body of the loop. This, like any other expr, can change the
764 // current BB. Note that we ignore the value computed by the body, but don't
765 // allow an error.
766 if (Body-&gt;Codegen() == 0)
767 return 0;
768</pre>
769</div>
770
771<p>Now the code starts to get more interesting. Our 'for' loop introduces a new
772variable to the symbol table. This means that our symbol table can now contain
773either function arguments or loop variables. To handle this, before we codegen
774the body of the loop, we add the loop variable as the current value for its
775name. Note that it is possible that there is a variable of the same name in the
776outer scope. It would be easy to make this an error (emit an error and return
777null if there is already an entry for VarName) but we choose to allow shadowing
778of variables. In order to handle this correctly, we remember the Value that
779we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
780no shadowed variable).</p>
781
782<p>Once the loop variable is set into the symbol table, the code recursively
783codegen's the body. This allows the body to use the loop variable: any
784references to it will naturally find it in the symbol table.</p>
785
786<div class="doc_code">
787<pre>
788 // Emit the step value.
789 Value *StepVal;
790 if (Step) {
791 StepVal = Step-&gt;Codegen();
792 if (StepVal == 0) return 0;
793 } else {
794 // If not specified, use 1.0.
795 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
796 }
797
798 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
799</pre>
800</div>
801
802<p>Now that the body is emitted, we compute the next value of the iteration
803variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
804will be the value of the loop variable on the next iteration of the loop.</p>
805
806<div class="doc_code">
807<pre>
808 // Compute the end condition.
809 Value *EndCond = End-&gt;Codegen();
810 if (EndCond == 0) return EndCond;
811
812 // Convert condition to a bool by comparing equal to 0.0.
813 EndCond = Builder.CreateFCmpONE(EndCond,
814 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
815 "loopcond");
816</pre>
817</div>
818
819<p>Finally, we evaluate the exit value of the loop, to determine whether the
820loop should exit. This mirrors the condition evaluation for the if/then/else
821statement.</p>
822
823<div class="doc_code">
824<pre>
825 // Create the "after loop" block and insert it.
826 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
827 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
828
829 // Insert the conditional branch into the end of LoopEndBB.
830 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
831
832 // Any new code will be inserted in AfterBB.
833 Builder.SetInsertPoint(AfterBB);
834</pre>
835</div>
836
837<p>With the code for the body of the loop complete, we just need to finish up
838the control flow for it. This remembers the end block (for the phi node), then
839creates the block for the loop exit ("afterloop"). Based on the value of the
840exit condition, it creates a conditional branch that chooses between executing
841the loop again and exiting the loop. Any future code is emitted in the
842"afterloop" block, so it sets the insertion position to it.</p>
843
844<div class="doc_code">
845<pre>
846 // Add a new entry to the PHI node for the backedge.
847 Variable-&gt;addIncoming(NextVar, LoopEndBB);
848
849 // Restore the unshadowed variable.
850 if (OldVal)
851 NamedValues[VarName] = OldVal;
852 else
853 NamedValues.erase(VarName);
854
855 // for expr always returns 0.0.
856 return Constant::getNullValue(Type::DoubleTy);
857}
858</pre>
859</div>
860
861<p>The final code handles various cleanups: now that we have the "NextVar"
862value, we can add the incoming value to the loop PHI node. After that, we
863remove the loop variable from the symbol table, so that it isn't in scope after
864the for loop. Finally, code generation of the for loop always returns 0.0, so
865that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
866
867<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
868the tutorial. We added two control flow constructs, and used them to motivate
869a couple of aspects of the LLVM IR that are important for front-end implementors
870to know. In the next chapter of our saga, we will get a bit crazier and add
871operator overloading to our poor innocent language.</p>
872
873</div>
874
875<!-- *********************************************************************** -->
876<div class="doc_section"><a name="code">Full Code Listing</a></div>
877<!-- *********************************************************************** -->
878
879<div class="doc_text">
880
881<p>
882Here is the complete code listing for our running example, enhanced with the
883if/then/else and for expressions.. To build this example, use:
884</p>
885
886<div class="doc_code">
887<pre>
888 # Compile
889 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
890 # Run
891 ./toy
892</pre>
893</div>
894
895<p>Here is the code:</p>
896
897<div class="doc_code">
898<pre>
899#include "llvm/DerivedTypes.h"
900#include "llvm/ExecutionEngine/ExecutionEngine.h"
901#include "llvm/Module.h"
902#include "llvm/ModuleProvider.h"
903#include "llvm/PassManager.h"
904#include "llvm/Analysis/Verifier.h"
905#include "llvm/Target/TargetData.h"
906#include "llvm/Transforms/Scalar.h"
907#include "llvm/Support/LLVMBuilder.h"
908#include &lt;cstdio&gt;
909#include &lt;string&gt;
910#include &lt;map&gt;
911#include &lt;vector&gt;
912using namespace llvm;
913
914//===----------------------------------------------------------------------===//
915// Lexer
916//===----------------------------------------------------------------------===//
917
918// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
919// of these for known things.
920enum Token {
921 tok_eof = -1,
922
923 // commands
924 tok_def = -2, tok_extern = -3,
925
926 // primary
927 tok_identifier = -4, tok_number = -5,
928
929 // control
930 tok_if = -6, tok_then = -7, tok_else = -8,
931 tok_for = -9, tok_in = -10
932};
933
934static std::string IdentifierStr; // Filled in if tok_identifier
935static double NumVal; // Filled in if tok_number
936
937/// gettok - Return the next token from standard input.
938static int gettok() {
939 static int LastChar = ' ';
940
941 // Skip any whitespace.
942 while (isspace(LastChar))
943 LastChar = getchar();
944
945 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
946 IdentifierStr = LastChar;
947 while (isalnum((LastChar = getchar())))
948 IdentifierStr += LastChar;
949
950 if (IdentifierStr == "def") return tok_def;
951 if (IdentifierStr == "extern") return tok_extern;
952 if (IdentifierStr == "if") return tok_if;
953 if (IdentifierStr == "then") return tok_then;
954 if (IdentifierStr == "else") return tok_else;
955 if (IdentifierStr == "for") return tok_for;
956 if (IdentifierStr == "in") return tok_in;
957 return tok_identifier;
958 }
959
960 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
961 std::string NumStr;
962 do {
963 NumStr += LastChar;
964 LastChar = getchar();
965 } while (isdigit(LastChar) || LastChar == '.');
966
967 NumVal = strtod(NumStr.c_str(), 0);
968 return tok_number;
969 }
970
971 if (LastChar == '#') {
972 // Comment until end of line.
973 do LastChar = getchar();
974 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
975
976 if (LastChar != EOF)
977 return gettok();
978 }
979
980 // Check for end of file. Don't eat the EOF.
981 if (LastChar == EOF)
982 return tok_eof;
983
984 // Otherwise, just return the character as its ascii value.
985 int ThisChar = LastChar;
986 LastChar = getchar();
987 return ThisChar;
988}
989
990//===----------------------------------------------------------------------===//
991// Abstract Syntax Tree (aka Parse Tree)
992//===----------------------------------------------------------------------===//
993
994/// ExprAST - Base class for all expression nodes.
995class ExprAST {
996public:
997 virtual ~ExprAST() {}
998 virtual Value *Codegen() = 0;
999};
1000
1001/// NumberExprAST - Expression class for numeric literals like "1.0".
1002class NumberExprAST : public ExprAST {
1003 double Val;
1004public:
1005 NumberExprAST(double val) : Val(val) {}
1006 virtual Value *Codegen();
1007};
1008
1009/// VariableExprAST - Expression class for referencing a variable, like "a".
1010class VariableExprAST : public ExprAST {
1011 std::string Name;
1012public:
1013 VariableExprAST(const std::string &amp;name) : Name(name) {}
1014 virtual Value *Codegen();
1015};
1016
1017/// BinaryExprAST - Expression class for a binary operator.
1018class BinaryExprAST : public ExprAST {
1019 char Op;
1020 ExprAST *LHS, *RHS;
1021public:
1022 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1023 : Op(op), LHS(lhs), RHS(rhs) {}
1024 virtual Value *Codegen();
1025};
1026
1027/// CallExprAST - Expression class for function calls.
1028class CallExprAST : public ExprAST {
1029 std::string Callee;
1030 std::vector&lt;ExprAST*&gt; Args;
1031public:
1032 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1033 : Callee(callee), Args(args) {}
1034 virtual Value *Codegen();
1035};
1036
1037/// IfExprAST - Expression class for if/then/else.
1038class IfExprAST : public ExprAST {
1039 ExprAST *Cond, *Then, *Else;
1040public:
1041 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1042 : Cond(cond), Then(then), Else(_else) {}
1043 virtual Value *Codegen();
1044};
1045
1046/// ForExprAST - Expression class for for/in.
1047class ForExprAST : public ExprAST {
1048 std::string VarName;
1049 ExprAST *Start, *End, *Step, *Body;
1050public:
1051 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1052 ExprAST *step, ExprAST *body)
1053 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1054 virtual Value *Codegen();
1055};
1056
1057/// PrototypeAST - This class represents the "prototype" for a function,
1058/// which captures its argument names as well as if it is an operator.
1059class PrototypeAST {
1060 std::string Name;
1061 std::vector&lt;std::string&gt; Args;
1062public:
1063 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1064 : Name(name), Args(args) {}
1065
1066 Function *Codegen();
1067};
1068
1069/// FunctionAST - This class represents a function definition itself.
1070class FunctionAST {
1071 PrototypeAST *Proto;
1072 ExprAST *Body;
1073public:
1074 FunctionAST(PrototypeAST *proto, ExprAST *body)
1075 : Proto(proto), Body(body) {}
1076
1077 Function *Codegen();
1078};
1079
1080//===----------------------------------------------------------------------===//
1081// Parser
1082//===----------------------------------------------------------------------===//
1083
1084/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1085/// token the parser it looking at. getNextToken reads another token from the
1086/// lexer and updates CurTok with its results.
1087static int CurTok;
1088static int getNextToken() {
1089 return CurTok = gettok();
1090}
1091
1092/// BinopPrecedence - This holds the precedence for each binary operator that is
1093/// defined.
1094static std::map&lt;char, int&gt; BinopPrecedence;
1095
1096/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1097static int GetTokPrecedence() {
1098 if (!isascii(CurTok))
1099 return -1;
1100
1101 // Make sure it's a declared binop.
1102 int TokPrec = BinopPrecedence[CurTok];
1103 if (TokPrec &lt;= 0) return -1;
1104 return TokPrec;
1105}
1106
1107/// Error* - These are little helper functions for error handling.
1108ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1109PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1110FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1111
1112static ExprAST *ParseExpression();
1113
1114/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +00001115/// ::= identifier
1116/// ::= identifier '(' expression* ')'
Chris Lattner6093bd52007-10-31 07:29:43 +00001117static ExprAST *ParseIdentifierExpr() {
1118 std::string IdName = IdentifierStr;
1119
Chris Lattner20a0c802007-11-05 17:54:34 +00001120 getNextToken(); // eat identifier.
Chris Lattner6093bd52007-10-31 07:29:43 +00001121
1122 if (CurTok != '(') // Simple variable ref.
1123 return new VariableExprAST(IdName);
1124
1125 // Call.
1126 getNextToken(); // eat (
1127 std::vector&lt;ExprAST*&gt; Args;
1128 if (CurTok != ')') {
1129 while (1) {
1130 ExprAST *Arg = ParseExpression();
1131 if (!Arg) return 0;
1132 Args.push_back(Arg);
1133
1134 if (CurTok == ')') break;
1135
1136 if (CurTok != ',')
1137 return Error("Expected ')'");
1138 getNextToken();
1139 }
1140 }
1141
1142 // Eat the ')'.
1143 getNextToken();
1144
1145 return new CallExprAST(IdName, Args);
1146}
1147
1148/// numberexpr ::= number
1149static ExprAST *ParseNumberExpr() {
1150 ExprAST *Result = new NumberExprAST(NumVal);
1151 getNextToken(); // consume the number
1152 return Result;
1153}
1154
1155/// parenexpr ::= '(' expression ')'
1156static ExprAST *ParseParenExpr() {
1157 getNextToken(); // eat (.
1158 ExprAST *V = ParseExpression();
1159 if (!V) return 0;
1160
1161 if (CurTok != ')')
1162 return Error("expected ')'");
1163 getNextToken(); // eat ).
1164 return V;
1165}
1166
1167/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1168static ExprAST *ParseIfExpr() {
1169 getNextToken(); // eat the if.
1170
1171 // condition.
1172 ExprAST *Cond = ParseExpression();
1173 if (!Cond) return 0;
1174
1175 if (CurTok != tok_then)
1176 return Error("expected then");
1177 getNextToken(); // eat the then
1178
1179 ExprAST *Then = ParseExpression();
1180 if (Then == 0) return 0;
1181
1182 if (CurTok != tok_else)
1183 return Error("expected else");
1184
1185 getNextToken();
1186
1187 ExprAST *Else = ParseExpression();
1188 if (!Else) return 0;
1189
1190 return new IfExprAST(Cond, Then, Else);
1191}
1192
Chris Lattner20a0c802007-11-05 17:54:34 +00001193/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner6093bd52007-10-31 07:29:43 +00001194static ExprAST *ParseForExpr() {
1195 getNextToken(); // eat the for.
1196
1197 if (CurTok != tok_identifier)
1198 return Error("expected identifier after for");
1199
1200 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001201 getNextToken(); // eat identifier.
Chris Lattner6093bd52007-10-31 07:29:43 +00001202
1203 if (CurTok != '=')
1204 return Error("expected '=' after for");
1205 getNextToken(); // eat '='.
1206
1207
1208 ExprAST *Start = ParseExpression();
1209 if (Start == 0) return 0;
1210 if (CurTok != ',')
1211 return Error("expected ',' after for start value");
1212 getNextToken();
1213
1214 ExprAST *End = ParseExpression();
1215 if (End == 0) return 0;
1216
1217 // The step value is optional.
1218 ExprAST *Step = 0;
1219 if (CurTok == ',') {
1220 getNextToken();
1221 Step = ParseExpression();
1222 if (Step == 0) return 0;
1223 }
1224
1225 if (CurTok != tok_in)
1226 return Error("expected 'in' after for");
1227 getNextToken(); // eat 'in'.
1228
1229 ExprAST *Body = ParseExpression();
1230 if (Body == 0) return 0;
1231
1232 return new ForExprAST(IdName, Start, End, Step, Body);
1233}
1234
1235
1236/// primary
1237/// ::= identifierexpr
1238/// ::= numberexpr
1239/// ::= parenexpr
1240/// ::= ifexpr
1241/// ::= forexpr
1242static ExprAST *ParsePrimary() {
1243 switch (CurTok) {
1244 default: return Error("unknown token when expecting an expression");
1245 case tok_identifier: return ParseIdentifierExpr();
1246 case tok_number: return ParseNumberExpr();
1247 case '(': return ParseParenExpr();
1248 case tok_if: return ParseIfExpr();
1249 case tok_for: return ParseForExpr();
1250 }
1251}
1252
1253/// binoprhs
1254/// ::= ('+' primary)*
1255static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1256 // If this is a binop, find its precedence.
1257 while (1) {
1258 int TokPrec = GetTokPrecedence();
1259
1260 // If this is a binop that binds at least as tightly as the current binop,
1261 // consume it, otherwise we are done.
1262 if (TokPrec &lt; ExprPrec)
1263 return LHS;
1264
1265 // Okay, we know this is a binop.
1266 int BinOp = CurTok;
1267 getNextToken(); // eat binop
1268
1269 // Parse the primary expression after the binary operator.
1270 ExprAST *RHS = ParsePrimary();
1271 if (!RHS) return 0;
1272
1273 // If BinOp binds less tightly with RHS than the operator after RHS, let
1274 // the pending operator take RHS as its LHS.
1275 int NextPrec = GetTokPrecedence();
1276 if (TokPrec &lt; NextPrec) {
1277 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1278 if (RHS == 0) return 0;
1279 }
1280
1281 // Merge LHS/RHS.
1282 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1283 }
1284}
1285
1286/// expression
1287/// ::= primary binoprhs
1288///
1289static ExprAST *ParseExpression() {
1290 ExprAST *LHS = ParsePrimary();
1291 if (!LHS) return 0;
1292
1293 return ParseBinOpRHS(0, LHS);
1294}
1295
1296/// prototype
1297/// ::= id '(' id* ')'
1298static PrototypeAST *ParsePrototype() {
1299 if (CurTok != tok_identifier)
1300 return ErrorP("Expected function name in prototype");
1301
1302 std::string FnName = IdentifierStr;
1303 getNextToken();
1304
1305 if (CurTok != '(')
1306 return ErrorP("Expected '(' in prototype");
1307
1308 std::vector&lt;std::string&gt; ArgNames;
1309 while (getNextToken() == tok_identifier)
1310 ArgNames.push_back(IdentifierStr);
1311 if (CurTok != ')')
1312 return ErrorP("Expected ')' in prototype");
1313
1314 // success.
1315 getNextToken(); // eat ')'.
1316
1317 return new PrototypeAST(FnName, ArgNames);
1318}
1319
1320/// definition ::= 'def' prototype expression
1321static FunctionAST *ParseDefinition() {
1322 getNextToken(); // eat def.
1323 PrototypeAST *Proto = ParsePrototype();
1324 if (Proto == 0) return 0;
1325
1326 if (ExprAST *E = ParseExpression())
1327 return new FunctionAST(Proto, E);
1328 return 0;
1329}
1330
1331/// toplevelexpr ::= expression
1332static FunctionAST *ParseTopLevelExpr() {
1333 if (ExprAST *E = ParseExpression()) {
1334 // Make an anonymous proto.
1335 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1336 return new FunctionAST(Proto, E);
1337 }
1338 return 0;
1339}
1340
1341/// external ::= 'extern' prototype
1342static PrototypeAST *ParseExtern() {
1343 getNextToken(); // eat extern.
1344 return ParsePrototype();
1345}
1346
1347//===----------------------------------------------------------------------===//
1348// Code Generation
1349//===----------------------------------------------------------------------===//
1350
1351static Module *TheModule;
1352static LLVMFoldingBuilder Builder;
1353static std::map&lt;std::string, Value*&gt; NamedValues;
1354static FunctionPassManager *TheFPM;
1355
1356Value *ErrorV(const char *Str) { Error(Str); return 0; }
1357
1358Value *NumberExprAST::Codegen() {
1359 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1360}
1361
1362Value *VariableExprAST::Codegen() {
1363 // Look this variable up in the function.
1364 Value *V = NamedValues[Name];
1365 return V ? V : ErrorV("Unknown variable name");
1366}
1367
1368Value *BinaryExprAST::Codegen() {
1369 Value *L = LHS-&gt;Codegen();
1370 Value *R = RHS-&gt;Codegen();
1371 if (L == 0 || R == 0) return 0;
1372
1373 switch (Op) {
1374 case '+': return Builder.CreateAdd(L, R, "addtmp");
1375 case '-': return Builder.CreateSub(L, R, "subtmp");
1376 case '*': return Builder.CreateMul(L, R, "multmp");
1377 case '&lt;':
1378 L = Builder.CreateFCmpULT(L, R, "multmp");
1379 // Convert bool 0/1 to double 0.0 or 1.0
1380 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1381 default: return ErrorV("invalid binary operator");
1382 }
1383}
1384
1385Value *CallExprAST::Codegen() {
1386 // Look up the name in the global module table.
1387 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1388 if (CalleeF == 0)
1389 return ErrorV("Unknown function referenced");
1390
1391 // If argument mismatch error.
1392 if (CalleeF-&gt;arg_size() != Args.size())
1393 return ErrorV("Incorrect # arguments passed");
1394
1395 std::vector&lt;Value*&gt; ArgsV;
1396 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1397 ArgsV.push_back(Args[i]-&gt;Codegen());
1398 if (ArgsV.back() == 0) return 0;
1399 }
1400
1401 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1402}
1403
1404Value *IfExprAST::Codegen() {
1405 Value *CondV = Cond-&gt;Codegen();
1406 if (CondV == 0) return 0;
1407
1408 // Convert condition to a bool by comparing equal to 0.0.
1409 CondV = Builder.CreateFCmpONE(CondV,
1410 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1411 "ifcond");
1412
1413 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1414
1415 // Create blocks for the then and else cases. Insert the 'then' block at the
1416 // end of the function.
1417 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1418 BasicBlock *ElseBB = new BasicBlock("else");
1419 BasicBlock *MergeBB = new BasicBlock("ifcont");
1420
1421 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1422
1423 // Emit then value.
1424 Builder.SetInsertPoint(ThenBB);
1425
1426 Value *ThenV = Then-&gt;Codegen();
1427 if (ThenV == 0) return 0;
1428
1429 Builder.CreateBr(MergeBB);
1430 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1431 ThenBB = Builder.GetInsertBlock();
1432
1433 // Emit else block.
1434 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1435 Builder.SetInsertPoint(ElseBB);
1436
1437 Value *ElseV = Else-&gt;Codegen();
1438 if (ElseV == 0) return 0;
1439
1440 Builder.CreateBr(MergeBB);
1441 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1442 ElseBB = Builder.GetInsertBlock();
1443
1444 // Emit merge block.
1445 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1446 Builder.SetInsertPoint(MergeBB);
1447 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1448
1449 PN-&gt;addIncoming(ThenV, ThenBB);
1450 PN-&gt;addIncoming(ElseV, ElseBB);
1451 return PN;
1452}
1453
1454Value *ForExprAST::Codegen() {
Chris Lattnerf5234802007-10-31 06:47:39 +00001455 // Output this as:
1456 // ...
1457 // start = startexpr
1458 // goto loop
1459 // loop:
1460 // variable = phi [start, loopheader], [nextvariable, loopend]
1461 // ...
1462 // bodyexpr
1463 // ...
1464 // loopend:
1465 // step = stepexpr
1466 // nextvariable = variable + step
1467 // endcond = endexpr
1468 // br endcond, loop, endloop
1469 // outloop:
1470
1471 // Emit the start code first, without 'variable' in scope.
1472 Value *StartVal = Start-&gt;Codegen();
1473 if (StartVal == 0) return 0;
1474
1475 // Make the new basic block for the loop header, inserting after current
1476 // block.
1477 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1478 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1479 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1480
1481 // Insert an explicit fall through from the current block to the LoopBB.
Chris Lattnerf5234802007-10-31 06:47:39 +00001482 Builder.CreateBr(LoopBB);
Chris Lattner6093bd52007-10-31 07:29:43 +00001483
1484 // Start insertion in LoopBB.
Chris Lattnerf5234802007-10-31 06:47:39 +00001485 Builder.SetInsertPoint(LoopBB);
1486
1487 // Start the PHI node with an entry for Start.
1488 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1489 Variable-&gt;addIncoming(StartVal, PreheaderBB);
1490
1491 // Within the loop, the variable is defined equal to the PHI node. If it
1492 // shadows an existing variable, we have to restore it, so save it now.
1493 Value *OldVal = NamedValues[VarName];
1494 NamedValues[VarName] = Variable;
1495
1496 // Emit the body of the loop. This, like any other expr, can change the
1497 // current BB. Note that we ignore the value computed by the body, but don't
1498 // allow an error.
1499 if (Body-&gt;Codegen() == 0)
1500 return 0;
1501
1502 // Emit the step value.
1503 Value *StepVal;
1504 if (Step) {
1505 StepVal = Step-&gt;Codegen();
1506 if (StepVal == 0) return 0;
1507 } else {
1508 // If not specified, use 1.0.
1509 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1510 }
1511
1512 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1513
Chris Lattnerf5234802007-10-31 06:47:39 +00001514 // Compute the end condition.
1515 Value *EndCond = End-&gt;Codegen();
1516 if (EndCond == 0) return EndCond;
1517
1518 // Convert condition to a bool by comparing equal to 0.0.
1519 EndCond = Builder.CreateFCmpONE(EndCond,
1520 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1521 "loopcond");
1522
1523 // Create the "after loop" block and insert it.
1524 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1525 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1526
1527 // Insert the conditional branch into the end of LoopEndBB.
1528 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1529
1530 // Any new code will be inserted in AfterBB.
1531 Builder.SetInsertPoint(AfterBB);
1532
1533 // Add a new entry to the PHI node for the backedge.
1534 Variable-&gt;addIncoming(NextVar, LoopEndBB);
1535
1536 // Restore the unshadowed variable.
1537 if (OldVal)
1538 NamedValues[VarName] = OldVal;
1539 else
1540 NamedValues.erase(VarName);
1541
1542
1543 // for expr always returns 0.0.
1544 return Constant::getNullValue(Type::DoubleTy);
1545}
Chris Lattner6093bd52007-10-31 07:29:43 +00001546
1547Function *PrototypeAST::Codegen() {
1548 // Make the function type: double(double,double) etc.
1549 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1550 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1551
1552 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1553
1554 // If F conflicted, there was already something named 'Name'. If it has a
1555 // body, don't allow redefinition or reextern.
1556 if (F-&gt;getName() != Name) {
1557 // Delete the one we just made and get the existing one.
1558 F-&gt;eraseFromParent();
1559 F = TheModule-&gt;getFunction(Name);
1560
1561 // If F already has a body, reject this.
1562 if (!F-&gt;empty()) {
1563 ErrorF("redefinition of function");
1564 return 0;
1565 }
1566
1567 // If F took a different number of args, reject.
1568 if (F-&gt;arg_size() != Args.size()) {
1569 ErrorF("redefinition of function with different # args");
1570 return 0;
1571 }
1572 }
1573
1574 // Set names for all arguments.
1575 unsigned Idx = 0;
1576 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1577 ++AI, ++Idx) {
1578 AI-&gt;setName(Args[Idx]);
1579
1580 // Add arguments to variable symbol table.
1581 NamedValues[Args[Idx]] = AI;
1582 }
1583
1584 return F;
1585}
1586
1587Function *FunctionAST::Codegen() {
1588 NamedValues.clear();
1589
1590 Function *TheFunction = Proto-&gt;Codegen();
1591 if (TheFunction == 0)
1592 return 0;
1593
1594 // Create a new basic block to start insertion into.
1595 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1596 Builder.SetInsertPoint(BB);
1597
1598 if (Value *RetVal = Body-&gt;Codegen()) {
1599 // Finish off the function.
1600 Builder.CreateRet(RetVal);
1601
1602 // Validate the generated code, checking for consistency.
1603 verifyFunction(*TheFunction);
1604
1605 // Optimize the function.
1606 TheFPM-&gt;run(*TheFunction);
1607
1608 return TheFunction;
1609 }
1610
1611 // Error reading body, remove function.
1612 TheFunction-&gt;eraseFromParent();
1613 return 0;
1614}
1615
1616//===----------------------------------------------------------------------===//
1617// Top-Level parsing and JIT Driver
1618//===----------------------------------------------------------------------===//
1619
1620static ExecutionEngine *TheExecutionEngine;
1621
1622static void HandleDefinition() {
1623 if (FunctionAST *F = ParseDefinition()) {
1624 if (Function *LF = F-&gt;Codegen()) {
1625 fprintf(stderr, "Read function definition:");
1626 LF-&gt;dump();
1627 }
1628 } else {
1629 // Skip token for error recovery.
1630 getNextToken();
1631 }
1632}
1633
1634static void HandleExtern() {
1635 if (PrototypeAST *P = ParseExtern()) {
1636 if (Function *F = P-&gt;Codegen()) {
1637 fprintf(stderr, "Read extern: ");
1638 F-&gt;dump();
1639 }
1640 } else {
1641 // Skip token for error recovery.
1642 getNextToken();
1643 }
1644}
1645
1646static void HandleTopLevelExpression() {
1647 // Evaluate a top level expression into an anonymous function.
1648 if (FunctionAST *F = ParseTopLevelExpr()) {
1649 if (Function *LF = F-&gt;Codegen()) {
1650 // JIT the function, returning a function pointer.
1651 void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1652
1653 // Cast it to the right type (takes no arguments, returns a double) so we
1654 // can call it as a native function.
1655 double (*FP)() = (double (*)())FPtr;
1656 fprintf(stderr, "Evaluated to %f\n", FP());
1657 }
1658 } else {
1659 // Skip token for error recovery.
1660 getNextToken();
1661 }
1662}
1663
1664/// top ::= definition | external | expression | ';'
1665static void MainLoop() {
1666 while (1) {
1667 fprintf(stderr, "ready&gt; ");
1668 switch (CurTok) {
1669 case tok_eof: return;
1670 case ';': getNextToken(); break; // ignore top level semicolons.
1671 case tok_def: HandleDefinition(); break;
1672 case tok_extern: HandleExtern(); break;
1673 default: HandleTopLevelExpression(); break;
1674 }
1675 }
1676}
Chris Lattnerf5234802007-10-31 06:47:39 +00001677
Chris Lattner602c832c2007-10-31 06:30:21 +00001678
Chris Lattner602c832c2007-10-31 06:30:21 +00001679
Chris Lattner6093bd52007-10-31 07:29:43 +00001680//===----------------------------------------------------------------------===//
1681// "Library" functions that can be "extern'd" from user code.
1682//===----------------------------------------------------------------------===//
Chris Lattner602c832c2007-10-31 06:30:21 +00001683
Chris Lattner6093bd52007-10-31 07:29:43 +00001684/// putchard - putchar that takes a double and returns 0.
1685extern "C"
1686double putchard(double X) {
1687 putchar((char)X);
1688 return 0;
1689}
Chris Lattner602c832c2007-10-31 06:30:21 +00001690
Chris Lattner6093bd52007-10-31 07:29:43 +00001691//===----------------------------------------------------------------------===//
1692// Main driver code.
1693//===----------------------------------------------------------------------===//
Chris Lattner602c832c2007-10-31 06:30:21 +00001694
Chris Lattner6093bd52007-10-31 07:29:43 +00001695int main() {
1696 // Install standard binary operators.
1697 // 1 is lowest precedence.
1698 BinopPrecedence['&lt;'] = 10;
1699 BinopPrecedence['+'] = 20;
1700 BinopPrecedence['-'] = 20;
1701 BinopPrecedence['*'] = 40; // highest.
Chris Lattner602c832c2007-10-31 06:30:21 +00001702
Chris Lattner6093bd52007-10-31 07:29:43 +00001703 // Prime the first token.
1704 fprintf(stderr, "ready&gt; ");
1705 getNextToken();
Chris Lattner602c832c2007-10-31 06:30:21 +00001706
Chris Lattner6093bd52007-10-31 07:29:43 +00001707 // Make the module, which holds all the code.
1708 TheModule = new Module("my cool jit");
1709
1710 // Create the JIT.
1711 TheExecutionEngine = ExecutionEngine::create(TheModule);
1712
1713 {
1714 ExistingModuleProvider OurModuleProvider(TheModule);
1715 FunctionPassManager OurFPM(&amp;OurModuleProvider);
1716
1717 // Set up the optimizer pipeline. Start with registering info about how the
1718 // target lays out data structures.
1719 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1720 // Do simple "peephole" optimizations and bit-twiddling optzns.
1721 OurFPM.add(createInstructionCombiningPass());
1722 // Reassociate expressions.
1723 OurFPM.add(createReassociatePass());
1724 // Eliminate Common SubExpressions.
1725 OurFPM.add(createGVNPass());
1726 // Simplify the control flow graph (deleting unreachable blocks, etc).
1727 OurFPM.add(createCFGSimplificationPass());
1728 // Set the global so the code gen can use this.
1729 TheFPM = &amp;OurFPM;
1730
1731 // Run the main "interpreter loop" now.
1732 MainLoop();
1733
1734 TheFPM = 0;
1735 } // Free module provider and pass manager.
1736
1737
1738 // Print out all of the generated code.
1739 TheModule-&gt;dump();
1740 return 0;
1741}
Chris Lattner602c832c2007-10-31 06:30:21 +00001742</pre>
1743</div>
1744
1745</div>
1746
1747<!-- *********************************************************************** -->
1748<hr>
1749<address>
1750 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1751 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1752 <a href="http://validator.w3.org/check/referer"><img
1753 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1754
1755 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1756 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1757 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1758</address>
1759</body>
1760</html>