blob: acd60456c8164b37476a5d3a3e77bda3aba00574 [file] [log] [blame]
Chris Lattner602c832c2007-10-31 06:30:21 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Control Flow</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
15
Chris Lattner128eb862007-11-05 19:06:59 +000016<ul>
Chris Lattner0e555b12007-11-05 20:04:56 +000017<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner128eb862007-11-05 19:06:59 +000018<li>Chapter 5
19 <ol>
20 <li><a href="#intro">Chapter 5 Introduction</a></li>
21 <li><a href="#ifthen">If/Then/Else</a>
22 <ol>
23 <li><a href="#iflexer">Lexer Extensions</a></li>
24 <li><a href="#ifast">AST Extensions</a></li>
25 <li><a href="#ifparser">Parser Extensions</a></li>
26 <li><a href="#ifir">LLVM IR</a></li>
27 <li><a href="#ifcodegen">Code Generation</a></li>
28 </ol>
29 </li>
30 <li><a href="#for">'for' Loop Expression</a>
31 <ol>
32 <li><a href="#forlexer">Lexer Extensions</a></li>
33 <li><a href="#forast">AST Extensions</a></li>
34 <li><a href="#forparser">Parser Extensions</a></li>
35 <li><a href="#forir">LLVM IR</a></li>
36 <li><a href="#forcodegen">Code Generation</a></li>
37 </ol>
38 </li>
39 <li><a href="#code">Full Code Listing</a></li>
40 </ol>
41</li>
Chris Lattner0e555b12007-11-05 20:04:56 +000042<li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language:
43User-defined Operators</li>
Chris Lattner128eb862007-11-05 19:06:59 +000044</ul>
45
Chris Lattner602c832c2007-10-31 06:30:21 +000046<div class="doc_author">
47 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
48</div>
49
50<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +000051<div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div>
Chris Lattner602c832c2007-10-31 06:30:21 +000052<!-- *********************************************************************** -->
53
54<div class="doc_text">
55
Chris Lattner128eb862007-11-05 19:06:59 +000056<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
57with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
Chris Lattner602c832c2007-10-31 06:30:21 +000058Kaleidoscope language and included support for generating LLVM IR, following by
59optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
60mostly useless: it has no control flow other than call and return. This means
61that you can't have conditional branches in the code, significantly limiting its
62power. In this episode of "build that compiler", we'll extend Kaleidoscope to
63have an if/then/else expression plus a simple looping construct.</p>
64
65</div>
66
67<!-- *********************************************************************** -->
68<div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
69<!-- *********************************************************************** -->
70
71<div class="doc_text">
72
73<p>
74Extending Kaleidoscope to support if/then/else is quite straight-forward. It
75basically requires adding lexer support for this "new" concept to the lexer,
76parser, AST, and LLVM code emitter. This example is nice, because it shows how
77easy it is to "grow" a language over time, incrementally extending it as new
78ideas are discovered.</p>
79
80<p>Before we get going on "how" we do this extension, lets talk about what we
81want. The basic idea is that we want to be able to write this sort of thing:
82</p>
83
84<div class="doc_code">
85<pre>
86def fib(x)
87 if x &lt; 3 then
88 1
89 else
90 fib(x-1)+fib(x-2);
91</pre>
92</div>
93
94<p>In Kaleidoscope, every construct is an expression: there are no statements.
95As such, the if/then/else expression needs to return a value like any other.
96Since we're using a mostly functional form, we'll have it evaluate its
97conditional, then return the 'then' or 'else' value based on how the condition
98was resolved. This is very similar to the C "?:" expression.</p>
99
100<p>The semantics of the if/then/else expression is that it first evaluates the
101condition to a boolean equality value: 0.0 is false and everything else is true.
102If the condition is true, the first subexpression is evaluated and returned, if
103the condition is false, the second subexpression is evaluated and returned.
104Since Kaleidoscope allows side-effects, this behavior is important to nail down.
105</p>
106
107<p>Now that we know what we want, lets break this down into its constituent
108pieces.</p>
109
110</div>
111
112<!-- ======================================================================= -->
113<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
114If/Then/Else</a></div>
115<!-- ======================================================================= -->
116
117
118<div class="doc_text">
119
120<p>The lexer extensions are straight-forward. First we add new enum values
121for the relevant tokens:</p>
122
123<div class="doc_code">
124<pre>
125 // control
126 tok_if = -6, tok_then = -7, tok_else = -8,
127</pre>
128</div>
129
130<p>Once we have that, we recognize the new keywords in the lexer, pretty simple
131stuff:</p>
132
133<div class="doc_code">
134<pre>
135 ...
136 if (IdentifierStr == "def") return tok_def;
137 if (IdentifierStr == "extern") return tok_extern;
138 <b>if (IdentifierStr == "if") return tok_if;
139 if (IdentifierStr == "then") return tok_then;
140 if (IdentifierStr == "else") return tok_else;</b>
141 return tok_identifier;
142</pre>
143</div>
144
145</div>
146
147<!-- ======================================================================= -->
148<div class="doc_subsubsection"><a name="ifast">AST Extensions for
Chris Lattner128eb862007-11-05 19:06:59 +0000149 If/Then/Else</a></div>
Chris Lattner602c832c2007-10-31 06:30:21 +0000150<!-- ======================================================================= -->
151
152<div class="doc_text">
153
154<p>To represent the new expression we add a new AST node for it:</p>
155
156<div class="doc_code">
157<pre>
158/// IfExprAST - Expression class for if/then/else.
159class IfExprAST : public ExprAST {
160 ExprAST *Cond, *Then, *Else;
161public:
162 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
163 : Cond(cond), Then(then), Else(_else) {}
164 virtual Value *Codegen();
165};
166</pre>
167</div>
168
169<p>The AST node just has pointers to the various subexpressions.</p>
170
171</div>
172
173<!-- ======================================================================= -->
174<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
Chris Lattner128eb862007-11-05 19:06:59 +0000175If/Then/Else</a></div>
Chris Lattner602c832c2007-10-31 06:30:21 +0000176<!-- ======================================================================= -->
177
178<div class="doc_text">
179
180<p>Now that we have the relevant tokens coming from the lexer and we have the
181AST node to build, our parsing logic is relatively straight-forward. First we
182define a new parsing function:</p>
183
184<div class="doc_code">
185<pre>
186/// ifexpr ::= 'if' expression 'then' expression 'else' expression
187static ExprAST *ParseIfExpr() {
188 getNextToken(); // eat the if.
189
190 // condition.
191 ExprAST *Cond = ParseExpression();
192 if (!Cond) return 0;
193
194 if (CurTok != tok_then)
195 return Error("expected then");
196 getNextToken(); // eat the then
197
198 ExprAST *Then = ParseExpression();
199 if (Then == 0) return 0;
200
201 if (CurTok != tok_else)
202 return Error("expected else");
203
204 getNextToken();
205
206 ExprAST *Else = ParseExpression();
207 if (!Else) return 0;
208
209 return new IfExprAST(Cond, Then, Else);
210}
211</pre>
212</div>
213
214<p>Next we hook it up as a primary expression:</p>
215
216<div class="doc_code">
217<pre>
218static ExprAST *ParsePrimary() {
219 switch (CurTok) {
220 default: return Error("unknown token when expecting an expression");
221 case tok_identifier: return ParseIdentifierExpr();
222 case tok_number: return ParseNumberExpr();
223 case '(': return ParseParenExpr();
224 <b>case tok_if: return ParseIfExpr();</b>
225 }
226}
227</pre>
228</div>
229
230</div>
231
232<!-- ======================================================================= -->
233<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
234<!-- ======================================================================= -->
235
236<div class="doc_text">
237
238<p>Now that we have it parsing and building the AST, the final piece is adding
239LLVM code generation support. This is the most interesting part of the
240if/then/else example, because this is where it starts to introduce new concepts.
241All of the code above has been described in previous chapters fairly thoroughly.
242</p>
243
244<p>To motivate the code we want to produce, lets take a look at a simple
245example. Consider:</p>
246
247<div class="doc_code">
248<pre>
249extern foo();
250extern bar();
251def baz(x) if x then foo() else bar();
252</pre>
253</div>
254
255<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
256looks like this:</p>
257
258<div class="doc_code">
259<pre>
260declare double @foo()
261
262declare double @bar()
263
264define double @baz(double %x) {
265entry:
266 %ifcond = fcmp one double %x, 0.000000e+00
267 br i1 %ifcond, label %then, label %else
268
269then: ; preds = %entry
270 %calltmp = call double @foo()
271 br label %ifcont
272
273else: ; preds = %entry
274 %calltmp1 = call double @bar()
275 br label %ifcont
276
277ifcont: ; preds = %else, %then
278 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
279 ret double %iftmp
280}
281</pre>
282</div>
283
284<p>To visualize the control flow graph, you can use a nifty feature of the LLVM
285'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR
286into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
287href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
288see this graph:</p>
289
290<center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
291height="315"></center>
292
293<p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
294"<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
295inserting actual calls into the code and recompiling or by calling these in the
296debugger. LLVM has many nice features for visualizing various graphs.</p>
297
298<p>Coming back to the generated code, it is fairly simple: the entry block
299evaluates the conditional expression ("x" in our case here) and compares the
300result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
301instruction ('one' is "ordered and not equal"). Based on the result of this
302expression, the code jumps to either the "then" or "else" blocks, which contain
303the expressions for the true/false case.</p>
304
305<p>Once the then/else blocks is finished executing, they both branch back to the
306else block to execute the code that happens after the if/then/else. In this
307case the only thing left to do is to return to the caller of the function. The
308question then becomes: how does the code know which expression to return?</p>
309
310<p>The answer to this question involves an important SSA operation: the
311<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
312operation</a>. If you're not familiar with SSA, <a
313href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
314article</a> is a good introduction and there are various other introductions to
315it available on your favorite search engine. The short version is that
316"execution" of the Phi operation requires "remembering" which block control came
317from. The Phi operation takes on the value corresponding to the input control
318block. In this case, if control comes in from the "then" block, it gets the
319value of "calltmp". If control comes from the "else" block, it gets the value
320of "calltmp1".</p>
321
322<p>At this point, you are probably starting to think "on no! this means my
323simple and elegant front-end will have to start generating SSA form in order to
324use LLVM!". Fortunately, this is not the case, and we strongly advise
325<em>not</em> implementing an SSA construction algorithm in your front-end
326unless there is an amazingly good reason to do so. In practice, there are two
327sorts of values that float around in code written in your average imperative
328programming language that might need Phi nodes:</p>
329
330<ol>
331<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
332<li>Values that are implicit in the structure of your AST, such as the phi node
333in this case.</li>
334</ol>
335
Chris Lattnerb0f0deb2007-11-05 07:02:49 +0000336<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
337variables"), we'll talk about #1
Chris Lattner602c832c2007-10-31 06:30:21 +0000338in depth. For now, just believe me that you don't need SSA construction to
339handle them. For #2, you have the choice of using the techniques that we will
340describe for #1, or you can insert Phi nodes directly if convenient. In this
341case, it is really really easy to generate the Phi node, so we choose to do it
342directly.</p>
343
344<p>Okay, enough of the motivation and overview, lets generate code!</p>
345
346</div>
347
348<!-- ======================================================================= -->
349<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
350If/Then/Else</a></div>
351<!-- ======================================================================= -->
352
353<div class="doc_text">
354
355<p>In order to generate code for this, we implement the <tt>Codegen</tt> method
356for <tt>IfExprAST</tt>:</p>
357
358<div class="doc_code">
359<pre>
360Value *IfExprAST::Codegen() {
361 Value *CondV = Cond-&gt;Codegen();
362 if (CondV == 0) return 0;
363
364 // Convert condition to a bool by comparing equal to 0.0.
365 CondV = Builder.CreateFCmpONE(CondV,
366 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
367 "ifcond");
368</pre>
369</div>
370
371<p>This code is straight-forward and similar to what we saw before. We emit the
372expression for the condition, then compare that value to zero to get a truth
373value as a 1-bit (bool) value.</p>
374
375<div class="doc_code">
376<pre>
377 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
378
379 // Create blocks for the then and else cases. Insert the 'then' block at the
380 // end of the function.
381 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
382 BasicBlock *ElseBB = new BasicBlock("else");
383 BasicBlock *MergeBB = new BasicBlock("ifcont");
384
385 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
386</pre>
387</div>
388
389<p>This code creates the basic blocks that are related to the if/then/else
390statement, and correspond directly to the blocks in the example above. The
391first line of this gets the current Function object that is being built. It
392gets this by asking the builder for the current BasicBlock, and asking that
393block for its "parent" (the function it is currently embedded into).</p>
394
395<p>Once it has that, it creates three blocks. Note that it passes "TheFunction"
396into the constructor for the "then" block. This causes the constructor to
397automatically insert the new block onto the end of the specified function. The
398other two blocks are created, but aren't yet inserted into the function.</p>
399
400<p>Once the blocks are created, we can emit the conditional branch that chooses
401between them. Note that creating new blocks does not implicitly affect the
402LLVMBuilder, so it is still inserting into the block that the condition
403went into. Also note that it is creating a branch to the "then" block and the
404"else" block, even though the "else" block isn't inserted into the function yet.
405This is all ok: it is the standard way that LLVM supports forward
406references.</p>
407
408<div class="doc_code">
409<pre>
410 // Emit then value.
411 Builder.SetInsertPoint(ThenBB);
412
413 Value *ThenV = Then-&gt;Codegen();
414 if (ThenV == 0) return 0;
415
416 Builder.CreateBr(MergeBB);
417 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
418 ThenBB = Builder.GetInsertBlock();
419</pre>
420</div>
421
422<p>After the conditional branch is inserted, we move the builder to start
423inserting into the "then" block. Strictly speaking, this call moves the
424insertion point to be at the end of the specified block. However, since the
425"then" block is empty, it also starts out by inserting at the beginning of the
426block. :)</p>
427
428<p>Once the insertion point is set, we recursively codegen the "then" expression
429from the AST. To finish off the then block, we create an unconditional branch
430to the merge block. One interesting (and very important) aspect of the LLVM IR
431is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
432to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
433instruction</a> such as return or branch. This means that all control flow,
434<em>including fall throughs</em> must be made explicit in the LLVM IR. If you
435violate this rule, the verifier will emit an error.</p>
436
437<p>The final line here is quite subtle, but is very important. The basic issue
438is that when we create the Phi node in the merge block, we need to set up the
439block/value pairs that indicate how the Phi will work. Importantly, the Phi
Chris Lattnerb5019642007-11-05 17:52:04 +0000440node expects to have an entry for each predecessor of the block in the CFG. Why
Chris Lattner602c832c2007-10-31 06:30:21 +0000441then are we getting the current block when we just set it to ThenBB 5 lines
442above? The problem is that the "Then" expression may actually itself change the
443block that the Builder is emitting into if, for example, it contains a nested
444"if/then/else" expression. Because calling Codegen recursively could
445arbitrarily change the notion of the current block, we are required to get an
446up-to-date value for code that will set up the Phi node.</p>
447
448<div class="doc_code">
449<pre>
450 // Emit else block.
451 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
452 Builder.SetInsertPoint(ElseBB);
453
454 Value *ElseV = Else-&gt;Codegen();
455 if (ElseV == 0) return 0;
456
457 Builder.CreateBr(MergeBB);
458 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
459 ElseBB = Builder.GetInsertBlock();
460</pre>
461</div>
462
463<p>Code generation for the 'else' block is basically identical to codegen for
464the 'then' block. The only significant difference is the first line, which adds
465the 'else' block to the function. Recall previously that the 'else' block was
466created, but not added to the function. Now that the 'then' and 'else' blocks
467are emitted, we can finish up with the merge code:</p>
468
469<div class="doc_code">
470<pre>
471 // Emit merge block.
472 TheFunction->getBasicBlockList().push_back(MergeBB);
473 Builder.SetInsertPoint(MergeBB);
474 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
475
476 PN->addIncoming(ThenV, ThenBB);
477 PN->addIncoming(ElseV, ElseBB);
478 return PN;
479}
480</pre>
481</div>
482
483<p>The first two lines here are now familiar: the first adds the "merge" block
484to the Function object (it was previously floating, like the else block above).
485The second block changes the insertion point so that newly created code will go
486into the "merge" block. Once that is done, we need to create the PHI node and
487set up the block/value pairs for the PHI.</p>
488
489<p>Finally, the CodeGen function returns the phi node as the value computed by
490the if/then/else expression. In our example above, this returned value will
491feed into the code for the top-level function, which will create the return
492instruction.</p>
493
494<p>Overall, we now have the ability to execution conditional code in
495Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
496that can calculate a wide variety of numeric functions. Next up we'll add
497another useful expression that is familiar from non-functional languages...</p>
498
499</div>
500
501<!-- *********************************************************************** -->
502<div class="doc_section"><a name="for">'for' Loop Expression</a></div>
503<!-- *********************************************************************** -->
504
505<div class="doc_text">
506
Chris Lattnerf5234802007-10-31 06:47:39 +0000507<p>Now that we know how to add basic control flow constructs to the language,
508we have the tools to add more powerful things. Lets add something more
509aggressive, a 'for' expression:</p>
510
511<div class="doc_code">
512<pre>
Chris Lattnerf5234802007-10-31 06:47:39 +0000513 extern putchard(char)
Chris Lattner6093bd52007-10-31 07:29:43 +0000514 def printstar(n)
515 for i = 1, i &lt; n, 1.0 in
516 putchard(42); # ascii 42 = '*'
517
518 # print 100 '*' characters
519 printstar(100);
Chris Lattnerf5234802007-10-31 06:47:39 +0000520</pre>
521</div>
522
Chris Lattner6093bd52007-10-31 07:29:43 +0000523<p>This expression defines a new variable ("i" in this case) which iterates from
524a starting value, while the condition ("i &lt; n" in this case) is true,
Chris Lattnerf5234802007-10-31 06:47:39 +0000525incrementing by an optional step value ("1.0" in this case). If the step value
526is omitted, it defaults to 1.0. While the loop is true, it executes its
527body expression. Because we don't have anything better to return, we'll just
528define the loop as always returning 0.0. In the future when we have mutable
529variables, it will get more useful.</p>
530
531<p>As before, lets talk about the changes that we need to Kaleidoscope to
532support this.</p>
533
534</div>
535
536<!-- ======================================================================= -->
537<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
538the 'for' Loop</a></div>
539<!-- ======================================================================= -->
540
541<div class="doc_text">
542
543<p>The lexer extensions are the same sort of thing as for if/then/else:</p>
544
545<div class="doc_code">
546<pre>
547 ... in enum Token ...
548 // control
549 tok_if = -6, tok_then = -7, tok_else = -8,
550<b> tok_for = -9, tok_in = -10</b>
551
552 ... in gettok ...
553 if (IdentifierStr == "def") return tok_def;
554 if (IdentifierStr == "extern") return tok_extern;
555 if (IdentifierStr == "if") return tok_if;
556 if (IdentifierStr == "then") return tok_then;
557 if (IdentifierStr == "else") return tok_else;
558 <b>if (IdentifierStr == "for") return tok_for;
559 if (IdentifierStr == "in") return tok_in;</b>
560 return tok_identifier;
561</pre>
562</div>
563
564</div>
565
566<!-- ======================================================================= -->
567<div class="doc_subsubsection"><a name="forast">AST Extensions for
568the 'for' Loop</a></div>
569<!-- ======================================================================= -->
570
571<div class="doc_text">
572
573<p>The AST node is similarly simple. It basically boils down to capturing
574the variable name and the consituent expressions in the node.</p>
575
576<div class="doc_code">
577<pre>
578/// ForExprAST - Expression class for for/in.
579class ForExprAST : public ExprAST {
580 std::string VarName;
581 ExprAST *Start, *End, *Step, *Body;
582public:
583 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
584 ExprAST *step, ExprAST *body)
585 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
586 virtual Value *Codegen();
587};
588</pre>
589</div>
590
591</div>
592
593<!-- ======================================================================= -->
594<div class="doc_subsubsection"><a name="forparser">Parser Extensions for
595the 'for' Loop</a></div>
596<!-- ======================================================================= -->
597
598<div class="doc_text">
599
600<p>The parser code is also fairly standard. The only interesting thing here is
601handling of the optional step value. The parser code handles it by checking to
602see if the second comma is present. If not, it sets the step value to null in
603the AST node:</p>
604
605<div class="doc_code">
606<pre>
Chris Lattner20a0c802007-11-05 17:54:34 +0000607/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattnerf5234802007-10-31 06:47:39 +0000608static ExprAST *ParseForExpr() {
609 getNextToken(); // eat the for.
610
611 if (CurTok != tok_identifier)
612 return Error("expected identifier after for");
613
614 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +0000615 getNextToken(); // eat identifier.
Chris Lattnerf5234802007-10-31 06:47:39 +0000616
617 if (CurTok != '=')
618 return Error("expected '=' after for");
619 getNextToken(); // eat '='.
620
621
622 ExprAST *Start = ParseExpression();
623 if (Start == 0) return 0;
624 if (CurTok != ',')
625 return Error("expected ',' after for start value");
626 getNextToken();
627
628 ExprAST *End = ParseExpression();
629 if (End == 0) return 0;
630
631 // The step value is optional.
632 ExprAST *Step = 0;
633 if (CurTok == ',') {
634 getNextToken();
635 Step = ParseExpression();
636 if (Step == 0) return 0;
637 }
638
639 if (CurTok != tok_in)
640 return Error("expected 'in' after for");
641 getNextToken(); // eat 'in'.
642
643 ExprAST *Body = ParseExpression();
644 if (Body == 0) return 0;
645
646 return new ForExprAST(IdName, Start, End, Step, Body);
647}
648</pre>
649</div>
650
651</div>
652
653<!-- ======================================================================= -->
654<div class="doc_subsubsection"><a name="forir">LLVM IR for
655the 'for' Loop</a></div>
656<!-- ======================================================================= -->
657
658<div class="doc_text">
659
660<p>Now we get to the good part: the LLVM IR we want to generate for this thing.
Chris Lattner6093bd52007-10-31 07:29:43 +0000661With the simple example above, we get this LLVM IR (note that this dump is
662generated with optimizations disabled):
Chris Lattnerf5234802007-10-31 06:47:39 +0000663</p>
664
Chris Lattner6093bd52007-10-31 07:29:43 +0000665<div class="doc_code">
666<pre>
667declare double @putchard(double)
Chris Lattnerf5234802007-10-31 06:47:39 +0000668
Chris Lattner6093bd52007-10-31 07:29:43 +0000669define double @printstar(double %n) {
670entry:
671 ; initial value = 1.0 (inlined into phi)
672 br label %loop
673
674loop: ; preds = %loop, %entry
675 %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
676 ; body
677 %calltmp = call double @putchard( double 4.200000e+01 )
678 ; increment
679 %nextvar = add double %i, 1.000000e+00
680
681 ; termination test
682 %multmp = fcmp ult double %i, %n
683 %booltmp = uitofp i1 %multmp to double
684 %loopcond = fcmp one double %booltmp, 0.000000e+00
685 br i1 %loopcond, label %loop, label %afterloop
686
687afterloop: ; preds = %loop
688 ; loop always returns 0.0
689 ret double 0.000000e+00
690}
691</pre>
692</div>
693
694<p>This loop contains all the same constructs we saw before: a phi node, several
695expressions, and some basic blocks. Lets see how this fits together.</p>
Chris Lattnerf5234802007-10-31 06:47:39 +0000696
697</div>
698
699<!-- ======================================================================= -->
700<div class="doc_subsubsection"><a name="forcodegen">Code Generation for
701the 'for' Loop</a></div>
702<!-- ======================================================================= -->
703
704<div class="doc_text">
705
Chris Lattner6093bd52007-10-31 07:29:43 +0000706<p>The first part of codegen is very simple: we just output the start expression
707for the loop value:</p>
Chris Lattnerf5234802007-10-31 06:47:39 +0000708
709<div class="doc_code">
710<pre>
711Value *ForExprAST::Codegen() {
Chris Lattner6093bd52007-10-31 07:29:43 +0000712 // Emit the start code first, without 'variable' in scope.
713 Value *StartVal = Start-&gt;Codegen();
714 if (StartVal == 0) return 0;
715</pre>
716</div>
717
718<p>With this out of the way, the next step is to set up the LLVM basic block
719for the start of the loop body. In the case above, the whole loop body is one
720block, but remember that the body code itself could consist of multiple blocks
721(e.g. if it is a if/then/else expression).</p>
722
723<div class="doc_code">
724<pre>
725 // Make the new basic block for the loop header, inserting after current
726 // block.
727 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
728 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
729 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
730
731 // Insert an explicit fall through from the current block to the LoopBB.
732 Builder.CreateBr(LoopBB);
733</pre>
734</div>
735
736<p>This code is similar to what we saw for if/then/else. Because we will need
737it to create the Phi node, we remember the block that falls through into the
738loop. Once we have that, we create the actual block that starts the loop and
739create an unconditional branch for the fall-through between the two blocks.</p>
740
741<div class="doc_code">
742<pre>
743 // Start insertion in LoopBB.
744 Builder.SetInsertPoint(LoopBB);
745
746 // Start the PHI node with an entry for Start.
747 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
748 Variable-&gt;addIncoming(StartVal, PreheaderBB);
749</pre>
750</div>
751
752<p>Now that the "preheader" for the loop is set up, we switch to emitting code
753for the loop body. To begin with, we move the insertion point and create the
754PHI node for the loop induction variable. SInce we already know the incoming
755value for the starting value, we add it to the Phi node. Note that the Phi will
756eventually get a second value for the backedge, but we can't set it up yet
757(because it doesn't exist!).</p>
758
759<div class="doc_code">
760<pre>
761 // Within the loop, the variable is defined equal to the PHI node. If it
762 // shadows an existing variable, we have to restore it, so save it now.
763 Value *OldVal = NamedValues[VarName];
764 NamedValues[VarName] = Variable;
765
766 // Emit the body of the loop. This, like any other expr, can change the
767 // current BB. Note that we ignore the value computed by the body, but don't
768 // allow an error.
769 if (Body-&gt;Codegen() == 0)
770 return 0;
771</pre>
772</div>
773
774<p>Now the code starts to get more interesting. Our 'for' loop introduces a new
775variable to the symbol table. This means that our symbol table can now contain
776either function arguments or loop variables. To handle this, before we codegen
777the body of the loop, we add the loop variable as the current value for its
778name. Note that it is possible that there is a variable of the same name in the
779outer scope. It would be easy to make this an error (emit an error and return
780null if there is already an entry for VarName) but we choose to allow shadowing
781of variables. In order to handle this correctly, we remember the Value that
782we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
783no shadowed variable).</p>
784
785<p>Once the loop variable is set into the symbol table, the code recursively
786codegen's the body. This allows the body to use the loop variable: any
787references to it will naturally find it in the symbol table.</p>
788
789<div class="doc_code">
790<pre>
791 // Emit the step value.
792 Value *StepVal;
793 if (Step) {
794 StepVal = Step-&gt;Codegen();
795 if (StepVal == 0) return 0;
796 } else {
797 // If not specified, use 1.0.
798 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
799 }
800
801 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
802</pre>
803</div>
804
805<p>Now that the body is emitted, we compute the next value of the iteration
806variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
807will be the value of the loop variable on the next iteration of the loop.</p>
808
809<div class="doc_code">
810<pre>
811 // Compute the end condition.
812 Value *EndCond = End-&gt;Codegen();
813 if (EndCond == 0) return EndCond;
814
815 // Convert condition to a bool by comparing equal to 0.0.
816 EndCond = Builder.CreateFCmpONE(EndCond,
817 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
818 "loopcond");
819</pre>
820</div>
821
822<p>Finally, we evaluate the exit value of the loop, to determine whether the
823loop should exit. This mirrors the condition evaluation for the if/then/else
824statement.</p>
825
826<div class="doc_code">
827<pre>
828 // Create the "after loop" block and insert it.
829 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
830 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
831
832 // Insert the conditional branch into the end of LoopEndBB.
833 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
834
835 // Any new code will be inserted in AfterBB.
836 Builder.SetInsertPoint(AfterBB);
837</pre>
838</div>
839
840<p>With the code for the body of the loop complete, we just need to finish up
841the control flow for it. This remembers the end block (for the phi node), then
842creates the block for the loop exit ("afterloop"). Based on the value of the
843exit condition, it creates a conditional branch that chooses between executing
844the loop again and exiting the loop. Any future code is emitted in the
845"afterloop" block, so it sets the insertion position to it.</p>
846
847<div class="doc_code">
848<pre>
849 // Add a new entry to the PHI node for the backedge.
850 Variable-&gt;addIncoming(NextVar, LoopEndBB);
851
852 // Restore the unshadowed variable.
853 if (OldVal)
854 NamedValues[VarName] = OldVal;
855 else
856 NamedValues.erase(VarName);
857
858 // for expr always returns 0.0.
859 return Constant::getNullValue(Type::DoubleTy);
860}
861</pre>
862</div>
863
864<p>The final code handles various cleanups: now that we have the "NextVar"
865value, we can add the incoming value to the loop PHI node. After that, we
866remove the loop variable from the symbol table, so that it isn't in scope after
867the for loop. Finally, code generation of the for loop always returns 0.0, so
868that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
869
870<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
871the tutorial. We added two control flow constructs, and used them to motivate
872a couple of aspects of the LLVM IR that are important for front-end implementors
873to know. In the next chapter of our saga, we will get a bit crazier and add
874operator overloading to our poor innocent language.</p>
875
876</div>
877
878<!-- *********************************************************************** -->
879<div class="doc_section"><a name="code">Full Code Listing</a></div>
880<!-- *********************************************************************** -->
881
882<div class="doc_text">
883
884<p>
885Here is the complete code listing for our running example, enhanced with the
886if/then/else and for expressions.. To build this example, use:
887</p>
888
889<div class="doc_code">
890<pre>
891 # Compile
892 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
893 # Run
894 ./toy
895</pre>
896</div>
897
898<p>Here is the code:</p>
899
900<div class="doc_code">
901<pre>
902#include "llvm/DerivedTypes.h"
903#include "llvm/ExecutionEngine/ExecutionEngine.h"
904#include "llvm/Module.h"
905#include "llvm/ModuleProvider.h"
906#include "llvm/PassManager.h"
907#include "llvm/Analysis/Verifier.h"
908#include "llvm/Target/TargetData.h"
909#include "llvm/Transforms/Scalar.h"
910#include "llvm/Support/LLVMBuilder.h"
911#include &lt;cstdio&gt;
912#include &lt;string&gt;
913#include &lt;map&gt;
914#include &lt;vector&gt;
915using namespace llvm;
916
917//===----------------------------------------------------------------------===//
918// Lexer
919//===----------------------------------------------------------------------===//
920
921// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
922// of these for known things.
923enum Token {
924 tok_eof = -1,
925
926 // commands
927 tok_def = -2, tok_extern = -3,
928
929 // primary
930 tok_identifier = -4, tok_number = -5,
931
932 // control
933 tok_if = -6, tok_then = -7, tok_else = -8,
934 tok_for = -9, tok_in = -10
935};
936
937static std::string IdentifierStr; // Filled in if tok_identifier
938static double NumVal; // Filled in if tok_number
939
940/// gettok - Return the next token from standard input.
941static int gettok() {
942 static int LastChar = ' ';
943
944 // Skip any whitespace.
945 while (isspace(LastChar))
946 LastChar = getchar();
947
948 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
949 IdentifierStr = LastChar;
950 while (isalnum((LastChar = getchar())))
951 IdentifierStr += LastChar;
952
953 if (IdentifierStr == "def") return tok_def;
954 if (IdentifierStr == "extern") return tok_extern;
955 if (IdentifierStr == "if") return tok_if;
956 if (IdentifierStr == "then") return tok_then;
957 if (IdentifierStr == "else") return tok_else;
958 if (IdentifierStr == "for") return tok_for;
959 if (IdentifierStr == "in") return tok_in;
960 return tok_identifier;
961 }
962
963 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
964 std::string NumStr;
965 do {
966 NumStr += LastChar;
967 LastChar = getchar();
968 } while (isdigit(LastChar) || LastChar == '.');
969
970 NumVal = strtod(NumStr.c_str(), 0);
971 return tok_number;
972 }
973
974 if (LastChar == '#') {
975 // Comment until end of line.
976 do LastChar = getchar();
977 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
978
979 if (LastChar != EOF)
980 return gettok();
981 }
982
983 // Check for end of file. Don't eat the EOF.
984 if (LastChar == EOF)
985 return tok_eof;
986
987 // Otherwise, just return the character as its ascii value.
988 int ThisChar = LastChar;
989 LastChar = getchar();
990 return ThisChar;
991}
992
993//===----------------------------------------------------------------------===//
994// Abstract Syntax Tree (aka Parse Tree)
995//===----------------------------------------------------------------------===//
996
997/// ExprAST - Base class for all expression nodes.
998class ExprAST {
999public:
1000 virtual ~ExprAST() {}
1001 virtual Value *Codegen() = 0;
1002};
1003
1004/// NumberExprAST - Expression class for numeric literals like "1.0".
1005class NumberExprAST : public ExprAST {
1006 double Val;
1007public:
1008 NumberExprAST(double val) : Val(val) {}
1009 virtual Value *Codegen();
1010};
1011
1012/// VariableExprAST - Expression class for referencing a variable, like "a".
1013class VariableExprAST : public ExprAST {
1014 std::string Name;
1015public:
1016 VariableExprAST(const std::string &amp;name) : Name(name) {}
1017 virtual Value *Codegen();
1018};
1019
1020/// BinaryExprAST - Expression class for a binary operator.
1021class BinaryExprAST : public ExprAST {
1022 char Op;
1023 ExprAST *LHS, *RHS;
1024public:
1025 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1026 : Op(op), LHS(lhs), RHS(rhs) {}
1027 virtual Value *Codegen();
1028};
1029
1030/// CallExprAST - Expression class for function calls.
1031class CallExprAST : public ExprAST {
1032 std::string Callee;
1033 std::vector&lt;ExprAST*&gt; Args;
1034public:
1035 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1036 : Callee(callee), Args(args) {}
1037 virtual Value *Codegen();
1038};
1039
1040/// IfExprAST - Expression class for if/then/else.
1041class IfExprAST : public ExprAST {
1042 ExprAST *Cond, *Then, *Else;
1043public:
1044 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1045 : Cond(cond), Then(then), Else(_else) {}
1046 virtual Value *Codegen();
1047};
1048
1049/// ForExprAST - Expression class for for/in.
1050class ForExprAST : public ExprAST {
1051 std::string VarName;
1052 ExprAST *Start, *End, *Step, *Body;
1053public:
1054 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1055 ExprAST *step, ExprAST *body)
1056 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1057 virtual Value *Codegen();
1058};
1059
1060/// PrototypeAST - This class represents the "prototype" for a function,
1061/// which captures its argument names as well as if it is an operator.
1062class PrototypeAST {
1063 std::string Name;
1064 std::vector&lt;std::string&gt; Args;
1065public:
1066 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1067 : Name(name), Args(args) {}
1068
1069 Function *Codegen();
1070};
1071
1072/// FunctionAST - This class represents a function definition itself.
1073class FunctionAST {
1074 PrototypeAST *Proto;
1075 ExprAST *Body;
1076public:
1077 FunctionAST(PrototypeAST *proto, ExprAST *body)
1078 : Proto(proto), Body(body) {}
1079
1080 Function *Codegen();
1081};
1082
1083//===----------------------------------------------------------------------===//
1084// Parser
1085//===----------------------------------------------------------------------===//
1086
1087/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1088/// token the parser it looking at. getNextToken reads another token from the
1089/// lexer and updates CurTok with its results.
1090static int CurTok;
1091static int getNextToken() {
1092 return CurTok = gettok();
1093}
1094
1095/// BinopPrecedence - This holds the precedence for each binary operator that is
1096/// defined.
1097static std::map&lt;char, int&gt; BinopPrecedence;
1098
1099/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1100static int GetTokPrecedence() {
1101 if (!isascii(CurTok))
1102 return -1;
1103
1104 // Make sure it's a declared binop.
1105 int TokPrec = BinopPrecedence[CurTok];
1106 if (TokPrec &lt;= 0) return -1;
1107 return TokPrec;
1108}
1109
1110/// Error* - These are little helper functions for error handling.
1111ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1112PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1113FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1114
1115static ExprAST *ParseExpression();
1116
1117/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +00001118/// ::= identifier
1119/// ::= identifier '(' expression* ')'
Chris Lattner6093bd52007-10-31 07:29:43 +00001120static ExprAST *ParseIdentifierExpr() {
1121 std::string IdName = IdentifierStr;
1122
Chris Lattner20a0c802007-11-05 17:54:34 +00001123 getNextToken(); // eat identifier.
Chris Lattner6093bd52007-10-31 07:29:43 +00001124
1125 if (CurTok != '(') // Simple variable ref.
1126 return new VariableExprAST(IdName);
1127
1128 // Call.
1129 getNextToken(); // eat (
1130 std::vector&lt;ExprAST*&gt; Args;
1131 if (CurTok != ')') {
1132 while (1) {
1133 ExprAST *Arg = ParseExpression();
1134 if (!Arg) return 0;
1135 Args.push_back(Arg);
1136
1137 if (CurTok == ')') break;
1138
1139 if (CurTok != ',')
1140 return Error("Expected ')'");
1141 getNextToken();
1142 }
1143 }
1144
1145 // Eat the ')'.
1146 getNextToken();
1147
1148 return new CallExprAST(IdName, Args);
1149}
1150
1151/// numberexpr ::= number
1152static ExprAST *ParseNumberExpr() {
1153 ExprAST *Result = new NumberExprAST(NumVal);
1154 getNextToken(); // consume the number
1155 return Result;
1156}
1157
1158/// parenexpr ::= '(' expression ')'
1159static ExprAST *ParseParenExpr() {
1160 getNextToken(); // eat (.
1161 ExprAST *V = ParseExpression();
1162 if (!V) return 0;
1163
1164 if (CurTok != ')')
1165 return Error("expected ')'");
1166 getNextToken(); // eat ).
1167 return V;
1168}
1169
1170/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1171static ExprAST *ParseIfExpr() {
1172 getNextToken(); // eat the if.
1173
1174 // condition.
1175 ExprAST *Cond = ParseExpression();
1176 if (!Cond) return 0;
1177
1178 if (CurTok != tok_then)
1179 return Error("expected then");
1180 getNextToken(); // eat the then
1181
1182 ExprAST *Then = ParseExpression();
1183 if (Then == 0) return 0;
1184
1185 if (CurTok != tok_else)
1186 return Error("expected else");
1187
1188 getNextToken();
1189
1190 ExprAST *Else = ParseExpression();
1191 if (!Else) return 0;
1192
1193 return new IfExprAST(Cond, Then, Else);
1194}
1195
Chris Lattner20a0c802007-11-05 17:54:34 +00001196/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner6093bd52007-10-31 07:29:43 +00001197static ExprAST *ParseForExpr() {
1198 getNextToken(); // eat the for.
1199
1200 if (CurTok != tok_identifier)
1201 return Error("expected identifier after for");
1202
1203 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001204 getNextToken(); // eat identifier.
Chris Lattner6093bd52007-10-31 07:29:43 +00001205
1206 if (CurTok != '=')
1207 return Error("expected '=' after for");
1208 getNextToken(); // eat '='.
1209
1210
1211 ExprAST *Start = ParseExpression();
1212 if (Start == 0) return 0;
1213 if (CurTok != ',')
1214 return Error("expected ',' after for start value");
1215 getNextToken();
1216
1217 ExprAST *End = ParseExpression();
1218 if (End == 0) return 0;
1219
1220 // The step value is optional.
1221 ExprAST *Step = 0;
1222 if (CurTok == ',') {
1223 getNextToken();
1224 Step = ParseExpression();
1225 if (Step == 0) return 0;
1226 }
1227
1228 if (CurTok != tok_in)
1229 return Error("expected 'in' after for");
1230 getNextToken(); // eat 'in'.
1231
1232 ExprAST *Body = ParseExpression();
1233 if (Body == 0) return 0;
1234
1235 return new ForExprAST(IdName, Start, End, Step, Body);
1236}
1237
1238
1239/// primary
1240/// ::= identifierexpr
1241/// ::= numberexpr
1242/// ::= parenexpr
1243/// ::= ifexpr
1244/// ::= forexpr
1245static ExprAST *ParsePrimary() {
1246 switch (CurTok) {
1247 default: return Error("unknown token when expecting an expression");
1248 case tok_identifier: return ParseIdentifierExpr();
1249 case tok_number: return ParseNumberExpr();
1250 case '(': return ParseParenExpr();
1251 case tok_if: return ParseIfExpr();
1252 case tok_for: return ParseForExpr();
1253 }
1254}
1255
1256/// binoprhs
1257/// ::= ('+' primary)*
1258static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1259 // If this is a binop, find its precedence.
1260 while (1) {
1261 int TokPrec = GetTokPrecedence();
1262
1263 // If this is a binop that binds at least as tightly as the current binop,
1264 // consume it, otherwise we are done.
1265 if (TokPrec &lt; ExprPrec)
1266 return LHS;
1267
1268 // Okay, we know this is a binop.
1269 int BinOp = CurTok;
1270 getNextToken(); // eat binop
1271
1272 // Parse the primary expression after the binary operator.
1273 ExprAST *RHS = ParsePrimary();
1274 if (!RHS) return 0;
1275
1276 // If BinOp binds less tightly with RHS than the operator after RHS, let
1277 // the pending operator take RHS as its LHS.
1278 int NextPrec = GetTokPrecedence();
1279 if (TokPrec &lt; NextPrec) {
1280 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1281 if (RHS == 0) return 0;
1282 }
1283
1284 // Merge LHS/RHS.
1285 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1286 }
1287}
1288
1289/// expression
1290/// ::= primary binoprhs
1291///
1292static ExprAST *ParseExpression() {
1293 ExprAST *LHS = ParsePrimary();
1294 if (!LHS) return 0;
1295
1296 return ParseBinOpRHS(0, LHS);
1297}
1298
1299/// prototype
1300/// ::= id '(' id* ')'
1301static PrototypeAST *ParsePrototype() {
1302 if (CurTok != tok_identifier)
1303 return ErrorP("Expected function name in prototype");
1304
1305 std::string FnName = IdentifierStr;
1306 getNextToken();
1307
1308 if (CurTok != '(')
1309 return ErrorP("Expected '(' in prototype");
1310
1311 std::vector&lt;std::string&gt; ArgNames;
1312 while (getNextToken() == tok_identifier)
1313 ArgNames.push_back(IdentifierStr);
1314 if (CurTok != ')')
1315 return ErrorP("Expected ')' in prototype");
1316
1317 // success.
1318 getNextToken(); // eat ')'.
1319
1320 return new PrototypeAST(FnName, ArgNames);
1321}
1322
1323/// definition ::= 'def' prototype expression
1324static FunctionAST *ParseDefinition() {
1325 getNextToken(); // eat def.
1326 PrototypeAST *Proto = ParsePrototype();
1327 if (Proto == 0) return 0;
1328
1329 if (ExprAST *E = ParseExpression())
1330 return new FunctionAST(Proto, E);
1331 return 0;
1332}
1333
1334/// toplevelexpr ::= expression
1335static FunctionAST *ParseTopLevelExpr() {
1336 if (ExprAST *E = ParseExpression()) {
1337 // Make an anonymous proto.
1338 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1339 return new FunctionAST(Proto, E);
1340 }
1341 return 0;
1342}
1343
1344/// external ::= 'extern' prototype
1345static PrototypeAST *ParseExtern() {
1346 getNextToken(); // eat extern.
1347 return ParsePrototype();
1348}
1349
1350//===----------------------------------------------------------------------===//
1351// Code Generation
1352//===----------------------------------------------------------------------===//
1353
1354static Module *TheModule;
1355static LLVMFoldingBuilder Builder;
1356static std::map&lt;std::string, Value*&gt; NamedValues;
1357static FunctionPassManager *TheFPM;
1358
1359Value *ErrorV(const char *Str) { Error(Str); return 0; }
1360
1361Value *NumberExprAST::Codegen() {
1362 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1363}
1364
1365Value *VariableExprAST::Codegen() {
1366 // Look this variable up in the function.
1367 Value *V = NamedValues[Name];
1368 return V ? V : ErrorV("Unknown variable name");
1369}
1370
1371Value *BinaryExprAST::Codegen() {
1372 Value *L = LHS-&gt;Codegen();
1373 Value *R = RHS-&gt;Codegen();
1374 if (L == 0 || R == 0) return 0;
1375
1376 switch (Op) {
1377 case '+': return Builder.CreateAdd(L, R, "addtmp");
1378 case '-': return Builder.CreateSub(L, R, "subtmp");
1379 case '*': return Builder.CreateMul(L, R, "multmp");
1380 case '&lt;':
1381 L = Builder.CreateFCmpULT(L, R, "multmp");
1382 // Convert bool 0/1 to double 0.0 or 1.0
1383 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1384 default: return ErrorV("invalid binary operator");
1385 }
1386}
1387
1388Value *CallExprAST::Codegen() {
1389 // Look up the name in the global module table.
1390 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1391 if (CalleeF == 0)
1392 return ErrorV("Unknown function referenced");
1393
1394 // If argument mismatch error.
1395 if (CalleeF-&gt;arg_size() != Args.size())
1396 return ErrorV("Incorrect # arguments passed");
1397
1398 std::vector&lt;Value*&gt; ArgsV;
1399 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1400 ArgsV.push_back(Args[i]-&gt;Codegen());
1401 if (ArgsV.back() == 0) return 0;
1402 }
1403
1404 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1405}
1406
1407Value *IfExprAST::Codegen() {
1408 Value *CondV = Cond-&gt;Codegen();
1409 if (CondV == 0) return 0;
1410
1411 // Convert condition to a bool by comparing equal to 0.0.
1412 CondV = Builder.CreateFCmpONE(CondV,
1413 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1414 "ifcond");
1415
1416 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1417
1418 // Create blocks for the then and else cases. Insert the 'then' block at the
1419 // end of the function.
1420 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1421 BasicBlock *ElseBB = new BasicBlock("else");
1422 BasicBlock *MergeBB = new BasicBlock("ifcont");
1423
1424 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1425
1426 // Emit then value.
1427 Builder.SetInsertPoint(ThenBB);
1428
1429 Value *ThenV = Then-&gt;Codegen();
1430 if (ThenV == 0) return 0;
1431
1432 Builder.CreateBr(MergeBB);
1433 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1434 ThenBB = Builder.GetInsertBlock();
1435
1436 // Emit else block.
1437 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1438 Builder.SetInsertPoint(ElseBB);
1439
1440 Value *ElseV = Else-&gt;Codegen();
1441 if (ElseV == 0) return 0;
1442
1443 Builder.CreateBr(MergeBB);
1444 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1445 ElseBB = Builder.GetInsertBlock();
1446
1447 // Emit merge block.
1448 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1449 Builder.SetInsertPoint(MergeBB);
1450 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1451
1452 PN-&gt;addIncoming(ThenV, ThenBB);
1453 PN-&gt;addIncoming(ElseV, ElseBB);
1454 return PN;
1455}
1456
1457Value *ForExprAST::Codegen() {
Chris Lattnerf5234802007-10-31 06:47:39 +00001458 // Output this as:
1459 // ...
1460 // start = startexpr
1461 // goto loop
1462 // loop:
1463 // variable = phi [start, loopheader], [nextvariable, loopend]
1464 // ...
1465 // bodyexpr
1466 // ...
1467 // loopend:
1468 // step = stepexpr
1469 // nextvariable = variable + step
1470 // endcond = endexpr
1471 // br endcond, loop, endloop
1472 // outloop:
1473
1474 // Emit the start code first, without 'variable' in scope.
1475 Value *StartVal = Start-&gt;Codegen();
1476 if (StartVal == 0) return 0;
1477
1478 // Make the new basic block for the loop header, inserting after current
1479 // block.
1480 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1481 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1482 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1483
1484 // Insert an explicit fall through from the current block to the LoopBB.
Chris Lattnerf5234802007-10-31 06:47:39 +00001485 Builder.CreateBr(LoopBB);
Chris Lattner6093bd52007-10-31 07:29:43 +00001486
1487 // Start insertion in LoopBB.
Chris Lattnerf5234802007-10-31 06:47:39 +00001488 Builder.SetInsertPoint(LoopBB);
1489
1490 // Start the PHI node with an entry for Start.
1491 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1492 Variable-&gt;addIncoming(StartVal, PreheaderBB);
1493
1494 // Within the loop, the variable is defined equal to the PHI node. If it
1495 // shadows an existing variable, we have to restore it, so save it now.
1496 Value *OldVal = NamedValues[VarName];
1497 NamedValues[VarName] = Variable;
1498
1499 // Emit the body of the loop. This, like any other expr, can change the
1500 // current BB. Note that we ignore the value computed by the body, but don't
1501 // allow an error.
1502 if (Body-&gt;Codegen() == 0)
1503 return 0;
1504
1505 // Emit the step value.
1506 Value *StepVal;
1507 if (Step) {
1508 StepVal = Step-&gt;Codegen();
1509 if (StepVal == 0) return 0;
1510 } else {
1511 // If not specified, use 1.0.
1512 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1513 }
1514
1515 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1516
Chris Lattnerf5234802007-10-31 06:47:39 +00001517 // Compute the end condition.
1518 Value *EndCond = End-&gt;Codegen();
1519 if (EndCond == 0) return EndCond;
1520
1521 // Convert condition to a bool by comparing equal to 0.0.
1522 EndCond = Builder.CreateFCmpONE(EndCond,
1523 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1524 "loopcond");
1525
1526 // Create the "after loop" block and insert it.
1527 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1528 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1529
1530 // Insert the conditional branch into the end of LoopEndBB.
1531 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1532
1533 // Any new code will be inserted in AfterBB.
1534 Builder.SetInsertPoint(AfterBB);
1535
1536 // Add a new entry to the PHI node for the backedge.
1537 Variable-&gt;addIncoming(NextVar, LoopEndBB);
1538
1539 // Restore the unshadowed variable.
1540 if (OldVal)
1541 NamedValues[VarName] = OldVal;
1542 else
1543 NamedValues.erase(VarName);
1544
1545
1546 // for expr always returns 0.0.
1547 return Constant::getNullValue(Type::DoubleTy);
1548}
Chris Lattner6093bd52007-10-31 07:29:43 +00001549
1550Function *PrototypeAST::Codegen() {
1551 // Make the function type: double(double,double) etc.
1552 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1553 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1554
1555 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1556
1557 // If F conflicted, there was already something named 'Name'. If it has a
1558 // body, don't allow redefinition or reextern.
1559 if (F-&gt;getName() != Name) {
1560 // Delete the one we just made and get the existing one.
1561 F-&gt;eraseFromParent();
1562 F = TheModule-&gt;getFunction(Name);
1563
1564 // If F already has a body, reject this.
1565 if (!F-&gt;empty()) {
1566 ErrorF("redefinition of function");
1567 return 0;
1568 }
1569
1570 // If F took a different number of args, reject.
1571 if (F-&gt;arg_size() != Args.size()) {
1572 ErrorF("redefinition of function with different # args");
1573 return 0;
1574 }
1575 }
1576
1577 // Set names for all arguments.
1578 unsigned Idx = 0;
1579 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1580 ++AI, ++Idx) {
1581 AI-&gt;setName(Args[Idx]);
1582
1583 // Add arguments to variable symbol table.
1584 NamedValues[Args[Idx]] = AI;
1585 }
1586
1587 return F;
1588}
1589
1590Function *FunctionAST::Codegen() {
1591 NamedValues.clear();
1592
1593 Function *TheFunction = Proto-&gt;Codegen();
1594 if (TheFunction == 0)
1595 return 0;
1596
1597 // Create a new basic block to start insertion into.
1598 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1599 Builder.SetInsertPoint(BB);
1600
1601 if (Value *RetVal = Body-&gt;Codegen()) {
1602 // Finish off the function.
1603 Builder.CreateRet(RetVal);
1604
1605 // Validate the generated code, checking for consistency.
1606 verifyFunction(*TheFunction);
1607
1608 // Optimize the function.
1609 TheFPM-&gt;run(*TheFunction);
1610
1611 return TheFunction;
1612 }
1613
1614 // Error reading body, remove function.
1615 TheFunction-&gt;eraseFromParent();
1616 return 0;
1617}
1618
1619//===----------------------------------------------------------------------===//
1620// Top-Level parsing and JIT Driver
1621//===----------------------------------------------------------------------===//
1622
1623static ExecutionEngine *TheExecutionEngine;
1624
1625static void HandleDefinition() {
1626 if (FunctionAST *F = ParseDefinition()) {
1627 if (Function *LF = F-&gt;Codegen()) {
1628 fprintf(stderr, "Read function definition:");
1629 LF-&gt;dump();
1630 }
1631 } else {
1632 // Skip token for error recovery.
1633 getNextToken();
1634 }
1635}
1636
1637static void HandleExtern() {
1638 if (PrototypeAST *P = ParseExtern()) {
1639 if (Function *F = P-&gt;Codegen()) {
1640 fprintf(stderr, "Read extern: ");
1641 F-&gt;dump();
1642 }
1643 } else {
1644 // Skip token for error recovery.
1645 getNextToken();
1646 }
1647}
1648
1649static void HandleTopLevelExpression() {
1650 // Evaluate a top level expression into an anonymous function.
1651 if (FunctionAST *F = ParseTopLevelExpr()) {
1652 if (Function *LF = F-&gt;Codegen()) {
1653 // JIT the function, returning a function pointer.
1654 void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1655
1656 // Cast it to the right type (takes no arguments, returns a double) so we
1657 // can call it as a native function.
1658 double (*FP)() = (double (*)())FPtr;
1659 fprintf(stderr, "Evaluated to %f\n", FP());
1660 }
1661 } else {
1662 // Skip token for error recovery.
1663 getNextToken();
1664 }
1665}
1666
1667/// top ::= definition | external | expression | ';'
1668static void MainLoop() {
1669 while (1) {
1670 fprintf(stderr, "ready&gt; ");
1671 switch (CurTok) {
1672 case tok_eof: return;
1673 case ';': getNextToken(); break; // ignore top level semicolons.
1674 case tok_def: HandleDefinition(); break;
1675 case tok_extern: HandleExtern(); break;
1676 default: HandleTopLevelExpression(); break;
1677 }
1678 }
1679}
Chris Lattnerf5234802007-10-31 06:47:39 +00001680
Chris Lattner602c832c2007-10-31 06:30:21 +00001681
Chris Lattner602c832c2007-10-31 06:30:21 +00001682
Chris Lattner6093bd52007-10-31 07:29:43 +00001683//===----------------------------------------------------------------------===//
1684// "Library" functions that can be "extern'd" from user code.
1685//===----------------------------------------------------------------------===//
Chris Lattner602c832c2007-10-31 06:30:21 +00001686
Chris Lattner6093bd52007-10-31 07:29:43 +00001687/// putchard - putchar that takes a double and returns 0.
1688extern "C"
1689double putchard(double X) {
1690 putchar((char)X);
1691 return 0;
1692}
Chris Lattner602c832c2007-10-31 06:30:21 +00001693
Chris Lattner6093bd52007-10-31 07:29:43 +00001694//===----------------------------------------------------------------------===//
1695// Main driver code.
1696//===----------------------------------------------------------------------===//
Chris Lattner602c832c2007-10-31 06:30:21 +00001697
Chris Lattner6093bd52007-10-31 07:29:43 +00001698int main() {
1699 // Install standard binary operators.
1700 // 1 is lowest precedence.
1701 BinopPrecedence['&lt;'] = 10;
1702 BinopPrecedence['+'] = 20;
1703 BinopPrecedence['-'] = 20;
1704 BinopPrecedence['*'] = 40; // highest.
Chris Lattner602c832c2007-10-31 06:30:21 +00001705
Chris Lattner6093bd52007-10-31 07:29:43 +00001706 // Prime the first token.
1707 fprintf(stderr, "ready&gt; ");
1708 getNextToken();
Chris Lattner602c832c2007-10-31 06:30:21 +00001709
Chris Lattner6093bd52007-10-31 07:29:43 +00001710 // Make the module, which holds all the code.
1711 TheModule = new Module("my cool jit");
1712
1713 // Create the JIT.
1714 TheExecutionEngine = ExecutionEngine::create(TheModule);
1715
1716 {
1717 ExistingModuleProvider OurModuleProvider(TheModule);
1718 FunctionPassManager OurFPM(&amp;OurModuleProvider);
1719
1720 // Set up the optimizer pipeline. Start with registering info about how the
1721 // target lays out data structures.
1722 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1723 // Do simple "peephole" optimizations and bit-twiddling optzns.
1724 OurFPM.add(createInstructionCombiningPass());
1725 // Reassociate expressions.
1726 OurFPM.add(createReassociatePass());
1727 // Eliminate Common SubExpressions.
1728 OurFPM.add(createGVNPass());
1729 // Simplify the control flow graph (deleting unreachable blocks, etc).
1730 OurFPM.add(createCFGSimplificationPass());
1731 // Set the global so the code gen can use this.
1732 TheFPM = &amp;OurFPM;
1733
1734 // Run the main "interpreter loop" now.
1735 MainLoop();
1736
1737 TheFPM = 0;
1738 } // Free module provider and pass manager.
1739
1740
1741 // Print out all of the generated code.
1742 TheModule-&gt;dump();
1743 return 0;
1744}
Chris Lattner602c832c2007-10-31 06:30:21 +00001745</pre>
1746</div>
1747
1748</div>
1749
1750<!-- *********************************************************************** -->
1751<hr>
1752<address>
1753 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1754 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1755 <a href="http://validator.w3.org/check/referer"><img
1756 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1757
1758 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1759 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1760 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1761</address>
1762</body>
1763</html>