blob: fc49d070476495817962f1bd98054505d12e4717 [file] [log] [blame]
Chris Lattner00c992d2007-11-03 08:55:29 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
7 construction</title>
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
11</head>
12
13<body>
14
NAKAMURA Takumi05d02652011-04-18 23:59:50 +000015<h1>Kaleidoscope: Extending the Language: Mutable Variables</h1>
Chris Lattner00c992d2007-11-03 08:55:29 +000016
Chris Lattner128eb862007-11-05 19:06:59 +000017<ul>
Chris Lattner0e555b12007-11-05 20:04:56 +000018<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner128eb862007-11-05 19:06:59 +000019<li>Chapter 7
20 <ol>
21 <li><a href="#intro">Chapter 7 Introduction</a></li>
22 <li><a href="#why">Why is this a hard problem?</a></li>
23 <li><a href="#memory">Memory in LLVM</a></li>
24 <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
25 <li><a href="#adjustments">Adjusting Existing Variables for
26 Mutation</a></li>
27 <li><a href="#assignment">New Assignment Operator</a></li>
28 <li><a href="#localvars">User-defined Local Variables</a></li>
29 <li><a href="#code">Full Code Listing</a></li>
30 </ol>
31</li>
Chris Lattner0e555b12007-11-05 20:04:56 +000032<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
33 tidbits</li>
Chris Lattner128eb862007-11-05 19:06:59 +000034</ul>
35
Chris Lattner00c992d2007-11-03 08:55:29 +000036<div class="doc_author">
37 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
38</div>
39
40<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +000041<h2><a name="intro">Chapter 7 Introduction</a></h2>
Chris Lattner00c992d2007-11-03 08:55:29 +000042<!-- *********************************************************************** -->
43
44<div class="doc_text">
45
Chris Lattner128eb862007-11-05 19:06:59 +000046<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
47with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
48respectable, albeit simple, <a
Chris Lattner00c992d2007-11-03 08:55:29 +000049href="http://en.wikipedia.org/wiki/Functional_programming">functional
50programming language</a>. In our journey, we learned some parsing techniques,
51how to build and represent an AST, how to build LLVM IR, and how to optimize
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +000052the resultant code as well as JIT compile it.</p>
Chris Lattner00c992d2007-11-03 08:55:29 +000053
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +000054<p>While Kaleidoscope is interesting as a functional language, the fact that it
55is functional makes it "too easy" to generate LLVM IR for it. In particular, a
56functional language makes it very easy to build LLVM IR directly in <a
Chris Lattner00c992d2007-11-03 08:55:29 +000057href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
58Since LLVM requires that the input code be in SSA form, this is a very nice
59property and it is often unclear to newcomers how to generate code for an
60imperative language with mutable variables.</p>
61
62<p>The short (and happy) summary of this chapter is that there is no need for
63your front-end to build SSA form: LLVM provides highly tuned and well tested
64support for this, though the way it works is a bit unexpected for some.</p>
65
66</div>
67
68<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +000069<h2><a name="why">Why is this a hard problem?</a></h2>
Chris Lattner00c992d2007-11-03 08:55:29 +000070<!-- *********************************************************************** -->
71
72<div class="doc_text">
73
74<p>
75To understand why mutable variables cause complexities in SSA construction,
76consider this extremely simple C example:
77</p>
78
79<div class="doc_code">
80<pre>
81int G, H;
82int test(_Bool Condition) {
83 int X;
84 if (Condition)
85 X = G;
86 else
87 X = H;
88 return X;
89}
90</pre>
91</div>
92
93<p>In this case, we have the variable "X", whose value depends on the path
94executed in the program. Because there are two different possible values for X
95before the return instruction, a PHI node is inserted to merge the two values.
96The LLVM IR that we want for this example looks like this:</p>
97
98<div class="doc_code">
99<pre>
100@G = weak global i32 0 ; type of @G is i32*
101@H = weak global i32 0 ; type of @H is i32*
102
103define i32 @test(i1 %Condition) {
104entry:
105 br i1 %Condition, label %cond_true, label %cond_false
106
107cond_true:
108 %X.0 = load i32* @G
109 br label %cond_next
110
111cond_false:
112 %X.1 = load i32* @H
113 br label %cond_next
114
115cond_next:
116 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
117 ret i32 %X.2
118}
119</pre>
120</div>
121
122<p>In this example, the loads from the G and H global variables are explicit in
123the LLVM IR, and they live in the then/else branches of the if statement
124(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
125in the cond_next block selects the right value to use based on where control
126flow is coming from: if control flow comes from the cond_false block, X.2 gets
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000127the value of X.1. Alternatively, if control flow comes from cond_true, it gets
Chris Lattner00c992d2007-11-03 08:55:29 +0000128the value of X.0. The intent of this chapter is not to explain the details of
129SSA form. For more information, see one of the many <a
130href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
131references</a>.</p>
132
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000133<p>The question for this article is "who places the phi nodes when lowering
Chris Lattner00c992d2007-11-03 08:55:29 +0000134assignments to mutable variables?". The issue here is that LLVM
135<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
136However, SSA construction requires non-trivial algorithms and data structures,
137so it is inconvenient and wasteful for every front-end to have to reproduce this
138logic.</p>
139
140</div>
141
142<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000143<h2><a name="memory">Memory in LLVM</a></h2>
Chris Lattner00c992d2007-11-03 08:55:29 +0000144<!-- *********************************************************************** -->
145
146<div class="doc_text">
147
148<p>The 'trick' here is that while LLVM does require all register values to be
149in SSA form, it does not require (or permit) memory objects to be in SSA form.
150In the example above, note that the loads from G and H are direct accesses to
151G and H: they are not renamed or versioned. This differs from some other
Chris Lattner2e5d07e2007-11-04 19:42:13 +0000152compiler systems, which do try to version memory objects. In LLVM, instead of
Chris Lattner00c992d2007-11-03 08:55:29 +0000153encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
154href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
155demand.</p>
156
157<p>
158With this in mind, the high-level idea is that we want to make a stack variable
159(which lives in memory, because it is on the stack) for each mutable object in
160a function. To take advantage of this trick, we need to talk about how LLVM
161represents stack variables.
162</p>
163
164<p>In LLVM, all memory accesses are explicit with load/store instructions, and
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000165it is carefully designed not to have (or need) an "address-of" operator. Notice
Chris Lattner00c992d2007-11-03 08:55:29 +0000166how the type of the @G/@H global variables is actually "i32*" even though the
167variable is defined as "i32". What this means is that @G defines <em>space</em>
168for an i32 in the global data area, but its <em>name</em> actually refers to the
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000169address for that space. Stack variables work the same way, except that instead of
170being declared with global variable definitions, they are declared with the
Chris Lattner00c992d2007-11-03 08:55:29 +0000171<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
172
173<div class="doc_code">
174<pre>
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000175define i32 @example() {
Chris Lattner00c992d2007-11-03 08:55:29 +0000176entry:
177 %X = alloca i32 ; type of %X is i32*.
178 ...
179 %tmp = load i32* %X ; load the stack value %X from the stack.
180 %tmp2 = add i32 %tmp, 1 ; increment it
181 store i32 %tmp2, i32* %X ; store it back
182 ...
183</pre>
184</div>
185
186<p>This code shows an example of how you can declare and manipulate a stack
187variable in the LLVM IR. Stack memory allocated with the alloca instruction is
188fully general: you can pass the address of the stack slot to functions, you can
189store it in other variables, etc. In our example above, we could rewrite the
190example to use the alloca technique to avoid using a PHI node:</p>
191
192<div class="doc_code">
193<pre>
194@G = weak global i32 0 ; type of @G is i32*
195@H = weak global i32 0 ; type of @H is i32*
196
197define i32 @test(i1 %Condition) {
198entry:
199 %X = alloca i32 ; type of %X is i32*.
200 br i1 %Condition, label %cond_true, label %cond_false
201
202cond_true:
203 %X.0 = load i32* @G
204 store i32 %X.0, i32* %X ; Update X
205 br label %cond_next
206
207cond_false:
208 %X.1 = load i32* @H
209 store i32 %X.1, i32* %X ; Update X
210 br label %cond_next
211
212cond_next:
213 %X.2 = load i32* %X ; Read X
214 ret i32 %X.2
215}
216</pre>
217</div>
218
219<p>With this, we have discovered a way to handle arbitrary mutable variables
220without the need to create Phi nodes at all:</p>
221
222<ol>
223<li>Each mutable variable becomes a stack allocation.</li>
224<li>Each read of the variable becomes a load from the stack.</li>
225<li>Each update of the variable becomes a store to the stack.</li>
226<li>Taking the address of a variable just uses the stack address directly.</li>
227</ol>
228
229<p>While this solution has solved our immediate problem, it introduced another
230one: we have now apparently introduced a lot of stack traffic for very simple
231and common operations, a major performance problem. Fortunately for us, the
232LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
233this case, promoting allocas like this into SSA registers, inserting Phi nodes
234as appropriate. If you run this example through the pass, for example, you'll
235get:</p>
236
237<div class="doc_code">
238<pre>
239$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
240@G = weak global i32 0
241@H = weak global i32 0
242
243define i32 @test(i1 %Condition) {
244entry:
245 br i1 %Condition, label %cond_true, label %cond_false
246
247cond_true:
248 %X.0 = load i32* @G
249 br label %cond_next
250
251cond_false:
252 %X.1 = load i32* @H
253 br label %cond_next
254
255cond_next:
256 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
257 ret i32 %X.01
258}
259</pre>
Chris Lattnere7198312007-11-03 22:22:30 +0000260</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000261
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000262<p>The mem2reg pass implements the standard "iterated dominance frontier"
Chris Lattnere7198312007-11-03 22:22:30 +0000263algorithm for constructing SSA form and has a number of optimizations that speed
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000264up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
265with mutable variables, and we highly recommend that you depend on it. Note that
Chris Lattnere7198312007-11-03 22:22:30 +0000266mem2reg only works on variables in certain circumstances:</p>
Chris Lattner00c992d2007-11-03 08:55:29 +0000267
Chris Lattnere7198312007-11-03 22:22:30 +0000268<ol>
269<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
270promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner00c992d2007-11-03 08:55:29 +0000271
Chris Lattnere7198312007-11-03 22:22:30 +0000272<li>mem2reg only looks for alloca instructions in the entry block of the
273function. Being in the entry block guarantees that the alloca is only executed
274once, which makes analysis simpler.</li>
Chris Lattner00c992d2007-11-03 08:55:29 +0000275
Chris Lattnere7198312007-11-03 22:22:30 +0000276<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
277the address of the stack object is passed to a function, or if any funny pointer
278arithmetic is involved, the alloca will not be promoted.</li>
279
Chris Lattnera56b22d2007-11-05 17:45:54 +0000280<li>mem2reg only works on allocas of <a
281href="../LangRef.html#t_classifications">first class</a>
282values (such as pointers, scalars and vectors), and only if the array size
Chris Lattnere7198312007-11-03 22:22:30 +0000283of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
284promoting structs or arrays to registers. Note that the "scalarrepl" pass is
285more powerful and can promote structs, "unions", and arrays in many cases.</li>
286
287</ol>
288
289<p>
290All of these properties are easy to satisfy for most imperative languages, and
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000291we'll illustrate it below with Kaleidoscope. The final question you may be
Chris Lattnere7198312007-11-03 22:22:30 +0000292asking is: should I bother with this nonsense for my front-end? Wouldn't it be
293better if I just did SSA construction directly, avoiding use of the mem2reg
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000294optimization pass? In short, we strongly recommend that you use this technique
Chris Lattnere7198312007-11-03 22:22:30 +0000295for building SSA form, unless there is an extremely good reason not to. Using
296this technique is:</p>
297
298<ul>
299<li>Proven and well tested: llvm-gcc and clang both use this technique for local
300mutable variables. As such, the most common clients of LLVM are using this to
301handle a bulk of their variables. You can be sure that bugs are found fast and
302fixed early.</li>
303
304<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
305common cases as well as fully general. For example, it has fast-paths for
306variables that are only used in a single block, variables that only have one
307assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
308</li>
309
310<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
311Debug information in LLVM</a> relies on having the address of the variable
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000312exposed so that debug info can be attached to it. This technique dovetails
313very naturally with this style of debug info.</li>
Chris Lattnere7198312007-11-03 22:22:30 +0000314</ul>
315
316<p>If nothing else, this makes it much easier to get your front-end up and
317running, and is very simple to implement. Lets extend Kaleidoscope with mutable
318variables now!
Chris Lattner00c992d2007-11-03 08:55:29 +0000319</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000320
Chris Lattner00c992d2007-11-03 08:55:29 +0000321</div>
322
Chris Lattner62a709d2007-11-05 00:23:57 +0000323<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000324<h2><a name="kalvars">Mutable Variables in Kaleidoscope</a></h2>
Chris Lattner62a709d2007-11-05 00:23:57 +0000325<!-- *********************************************************************** -->
326
327<div class="doc_text">
328
329<p>Now that we know the sort of problem we want to tackle, lets see what this
330looks like in the context of our little Kaleidoscope language. We're going to
331add two features:</p>
332
333<ol>
334<li>The ability to mutate variables with the '=' operator.</li>
335<li>The ability to define new variables.</li>
336</ol>
337
338<p>While the first item is really what this is about, we only have variables
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000339for incoming arguments as well as for induction variables, and redefining those only
Chris Lattner62a709d2007-11-05 00:23:57 +0000340goes so far :). Also, the ability to define new variables is a
341useful thing regardless of whether you will be mutating them. Here's a
342motivating example that shows how we could use these:</p>
343
344<div class="doc_code">
345<pre>
346# Define ':' for sequencing: as a low-precedence operator that ignores operands
347# and just returns the RHS.
348def binary : 1 (x y) y;
349
350# Recursive fib, we could do this before.
351def fib(x)
352 if (x &lt; 3) then
353 1
354 else
355 fib(x-1)+fib(x-2);
356
357# Iterative fib.
358def fibi(x)
359 <b>var a = 1, b = 1, c in</b>
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000360 (for i = 3, i &lt; x in
Chris Lattner62a709d2007-11-05 00:23:57 +0000361 <b>c = a + b</b> :
362 <b>a = b</b> :
363 <b>b = c</b>) :
364 b;
365
366# Call it.
367fibi(10);
368</pre>
369</div>
370
371<p>
372In order to mutate variables, we have to change our existing variables to use
373the "alloca trick". Once we have that, we'll add our new operator, then extend
374Kaleidoscope to support new variable definitions.
375</p>
376
377</div>
378
379<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000380<h2><a name="adjustments">Adjusting Existing Variables for Mutation</a></h2>
Chris Lattner62a709d2007-11-05 00:23:57 +0000381<!-- *********************************************************************** -->
382
383<div class="doc_text">
384
385<p>
386The symbol table in Kaleidoscope is managed at code generation time by the
387'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
388that holds the double value for the named variable. In order to support
389mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
390the <em>memory location</em> of the variable in question. Note that this
391change is a refactoring: it changes the structure of the code, but does not
392(by itself) change the behavior of the compiler. All of these changes are
393isolated in the Kaleidoscope code generator.</p>
394
395<p>
396At this point in Kaleidoscope's development, it only supports variables for two
397things: incoming arguments to functions and the induction variable of 'for'
398loops. For consistency, we'll allow mutation of these variables in addition to
399other user-defined variables. This means that these will both need memory
400locations.
401</p>
402
403<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000404map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++
405compiler will tell us what parts of the code we need to update:</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000406
407<div class="doc_code">
408<pre>
409static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
410</pre>
411</div>
412
413<p>Also, since we will need to create these alloca's, we'll use a helper
414function that ensures that the allocas are created in the entry block of the
415function:</p>
416
417<div class="doc_code">
418<pre>
419/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
420/// the function. This is used for mutable variables etc.
421static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
422 const std::string &amp;VarName) {
Gabor Greifd6c1ed02009-03-11 19:51:07 +0000423 IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
Duncan Sands89f6d882008-04-13 06:22:09 +0000424 TheFunction-&gt;getEntryBlock().begin());
Nick Lewycky422094c2009-09-13 21:38:54 +0000425 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
426 VarName.c_str());
Chris Lattner62a709d2007-11-05 00:23:57 +0000427}
428</pre>
429</div>
430
Duncan Sands89f6d882008-04-13 06:22:09 +0000431<p>This funny looking code creates an IRBuilder object that is pointing at
Chris Lattner62a709d2007-11-05 00:23:57 +0000432the first instruction (.begin()) of the entry block. It then creates an alloca
433with the expected name and returns it. Because all values in Kaleidoscope are
434doubles, there is no need to pass in a type to use.</p>
435
436<p>With this in place, the first functionality change we want to make is to
437variable references. In our new scheme, variables live on the stack, so code
438generating a reference to them actually needs to produce a load from the stack
439slot:</p>
440
441<div class="doc_code">
442<pre>
443Value *VariableExprAST::Codegen() {
444 // Look this variable up in the function.
445 Value *V = NamedValues[Name];
446 if (V == 0) return ErrorV("Unknown variable name");
447
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000448 <b>// Load the value.
449 return Builder.CreateLoad(V, Name.c_str());</b>
Chris Lattner62a709d2007-11-05 00:23:57 +0000450}
451</pre>
452</div>
453
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000454<p>As you can see, this is pretty straightforward. Now we need to update the
Chris Lattner62a709d2007-11-05 00:23:57 +0000455things that define the variables to set up the alloca. We'll start with
456<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
457the unabridged code):</p>
458
459<div class="doc_code">
460<pre>
461 Function *TheFunction = Builder.GetInsertBlock()->getParent();
462
463 <b>// Create an alloca for the variable in the entry block.
464 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
465
466 // Emit the start code first, without 'variable' in scope.
467 Value *StartVal = Start-&gt;Codegen();
468 if (StartVal == 0) return 0;
469
470 <b>// Store the value into the alloca.
471 Builder.CreateStore(StartVal, Alloca);</b>
472 ...
473
474 // Compute the end condition.
475 Value *EndCond = End-&gt;Codegen();
476 if (EndCond == 0) return EndCond;
477
478 <b>// Reload, increment, and restore the alloca. This handles the case where
479 // the body of the loop mutates the variable.
480 Value *CurVar = Builder.CreateLoad(Alloca);
Chris Lattnerf57fcf72010-09-01 20:09:20 +0000481 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
Chris Lattner62a709d2007-11-05 00:23:57 +0000482 Builder.CreateStore(NextVar, Alloca);</b>
483 ...
484</pre>
485</div>
486
487<p>This code is virtually identical to the code <a
488href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
489big difference is that we no longer have to construct a PHI node, and we use
490load/store to access the variable as needed.</p>
491
492<p>To support mutable argument variables, we need to also make allocas for them.
493The code for this is also pretty simple:</p>
494
495<div class="doc_code">
496<pre>
497/// CreateArgumentAllocas - Create an alloca for each argument and register the
498/// argument in the symbol table so that references to it will succeed.
499void PrototypeAST::CreateArgumentAllocas(Function *F) {
500 Function::arg_iterator AI = F-&gt;arg_begin();
501 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
502 // Create an alloca for this variable.
503 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
504
505 // Store the initial value into the alloca.
506 Builder.CreateStore(AI, Alloca);
507
508 // Add arguments to variable symbol table.
509 NamedValues[Args[Idx]] = Alloca;
510 }
511}
512</pre>
513</div>
514
515<p>For each argument, we make an alloca, store the input value to the function
516into the alloca, and register the alloca as the memory location for the
517argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
518it sets up the entry block for the function.</p>
519
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000520<p>The final missing piece is adding the mem2reg pass, which allows us to get
Chris Lattner62a709d2007-11-05 00:23:57 +0000521good codegen once again:</p>
522
523<div class="doc_code">
524<pre>
525 // Set up the optimizer pipeline. Start with registering info about how the
526 // target lays out data structures.
527 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
528 <b>// Promote allocas to registers.
529 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
530 // Do simple "peephole" optimizations and bit-twiddling optzns.
531 OurFPM.add(createInstructionCombiningPass());
532 // Reassociate expressions.
533 OurFPM.add(createReassociatePass());
534</pre>
535</div>
536
537<p>It is interesting to see what the code looks like before and after the
538mem2reg optimization runs. For example, this is the before/after code for our
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000539recursive fib function. Before the optimization:</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000540
541<div class="doc_code">
542<pre>
543define double @fib(double %x) {
544entry:
545 <b>%x1 = alloca double
546 store double %x, double* %x1
547 %x2 = load double* %x1</b>
Chris Lattner71155212007-11-06 01:39:12 +0000548 %cmptmp = fcmp ult double %x2, 3.000000e+00
549 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000550 %ifcond = fcmp one double %booltmp, 0.000000e+00
551 br i1 %ifcond, label %then, label %else
552
553then: ; preds = %entry
554 br label %ifcont
555
556else: ; preds = %entry
557 <b>%x3 = load double* %x1</b>
Dan Gohmana9445e12010-03-02 01:11:08 +0000558 %subtmp = fsub double %x3, 1.000000e+00
Dan Gohman3dfb3cf2010-05-28 17:07:41 +0000559 %calltmp = call double @fib(double %subtmp)
Chris Lattner62a709d2007-11-05 00:23:57 +0000560 <b>%x4 = load double* %x1</b>
Dan Gohmana9445e12010-03-02 01:11:08 +0000561 %subtmp5 = fsub double %x4, 2.000000e+00
Dan Gohman3dfb3cf2010-05-28 17:07:41 +0000562 %calltmp6 = call double @fib(double %subtmp5)
Dan Gohmana9445e12010-03-02 01:11:08 +0000563 %addtmp = fadd double %calltmp, %calltmp6
Chris Lattner62a709d2007-11-05 00:23:57 +0000564 br label %ifcont
565
566ifcont: ; preds = %else, %then
567 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
568 ret double %iftmp
569}
570</pre>
571</div>
572
573<p>Here there is only one variable (x, the input argument) but you can still
574see the extremely simple-minded code generation strategy we are using. In the
575entry block, an alloca is created, and the initial input value is stored into
576it. Each reference to the variable does a reload from the stack. Also, note
577that we didn't modify the if/then/else expression, so it still inserts a PHI
578node. While we could make an alloca for it, it is actually easier to create a
579PHI node for it, so we still just make the PHI.</p>
580
581<p>Here is the code after the mem2reg pass runs:</p>
582
583<div class="doc_code">
584<pre>
585define double @fib(double %x) {
586entry:
Chris Lattner71155212007-11-06 01:39:12 +0000587 %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
588 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000589 %ifcond = fcmp one double %booltmp, 0.000000e+00
590 br i1 %ifcond, label %then, label %else
591
592then:
593 br label %ifcont
594
595else:
Dan Gohmana9445e12010-03-02 01:11:08 +0000596 %subtmp = fsub double <b>%x</b>, 1.000000e+00
Dan Gohman3dfb3cf2010-05-28 17:07:41 +0000597 %calltmp = call double @fib(double %subtmp)
Dan Gohmana9445e12010-03-02 01:11:08 +0000598 %subtmp5 = fsub double <b>%x</b>, 2.000000e+00
Dan Gohman3dfb3cf2010-05-28 17:07:41 +0000599 %calltmp6 = call double @fib(double %subtmp5)
Dan Gohmana9445e12010-03-02 01:11:08 +0000600 %addtmp = fadd double %calltmp, %calltmp6
Chris Lattner62a709d2007-11-05 00:23:57 +0000601 br label %ifcont
602
603ifcont: ; preds = %else, %then
604 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
605 ret double %iftmp
606}
607</pre>
608</div>
609
610<p>This is a trivial case for mem2reg, since there are no redefinitions of the
611variable. The point of showing this is to calm your tension about inserting
612such blatent inefficiencies :).</p>
613
614<p>After the rest of the optimizers run, we get:</p>
615
616<div class="doc_code">
617<pre>
618define double @fib(double %x) {
619entry:
Chris Lattner71155212007-11-06 01:39:12 +0000620 %cmptmp = fcmp ult double %x, 3.000000e+00
621 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000622 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
623 br i1 %ifcond, label %else, label %ifcont
624
625else:
Dan Gohmana9445e12010-03-02 01:11:08 +0000626 %subtmp = fsub double %x, 1.000000e+00
Dan Gohman3dfb3cf2010-05-28 17:07:41 +0000627 %calltmp = call double @fib(double %subtmp)
Dan Gohmana9445e12010-03-02 01:11:08 +0000628 %subtmp5 = fsub double %x, 2.000000e+00
Dan Gohman3dfb3cf2010-05-28 17:07:41 +0000629 %calltmp6 = call double @fib(double %subtmp5)
Dan Gohmana9445e12010-03-02 01:11:08 +0000630 %addtmp = fadd double %calltmp, %calltmp6
Chris Lattner62a709d2007-11-05 00:23:57 +0000631 ret double %addtmp
632
633ifcont:
634 ret double 1.000000e+00
635}
636</pre>
637</div>
638
639<p>Here we see that the simplifycfg pass decided to clone the return instruction
640into the end of the 'else' block. This allowed it to eliminate some branches
641and the PHI node.</p>
642
643<p>Now that all symbol table references are updated to use stack variables,
644we'll add the assignment operator.</p>
645
646</div>
647
648<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000649<h2><a name="assignment">New Assignment Operator</a></h2>
Chris Lattner62a709d2007-11-05 00:23:57 +0000650<!-- *********************************************************************** -->
651
652<div class="doc_text">
653
654<p>With our current framework, adding a new assignment operator is really
655simple. We will parse it just like any other binary operator, but handle it
656internally (instead of allowing the user to define it). The first step is to
657set a precedence:</p>
658
659<div class="doc_code">
660<pre>
661 int main() {
662 // Install standard binary operators.
663 // 1 is lowest precedence.
664 <b>BinopPrecedence['='] = 2;</b>
665 BinopPrecedence['&lt;'] = 10;
666 BinopPrecedence['+'] = 20;
667 BinopPrecedence['-'] = 20;
668</pre>
669</div>
670
671<p>Now that the parser knows the precedence of the binary operator, it takes
672care of all the parsing and AST generation. We just need to implement codegen
673for the assignment operator. This looks like:</p>
674
675<div class="doc_code">
676<pre>
677Value *BinaryExprAST::Codegen() {
678 // Special case '=' because we don't want to emit the LHS as an expression.
679 if (Op == '=') {
680 // Assignment requires the LHS to be an identifier.
681 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
682 if (!LHSE)
683 return ErrorV("destination of '=' must be a variable");
684</pre>
685</div>
686
687<p>Unlike the rest of the binary operators, our assignment operator doesn't
688follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
689as a special case before the other binary operators are handled. The other
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000690strange thing is that it requires the LHS to be a variable. It is invalid to
691have "(x+1) = expr" - only things like "x = expr" are allowed.
Chris Lattner62a709d2007-11-05 00:23:57 +0000692</p>
693
694<div class="doc_code">
695<pre>
696 // Codegen the RHS.
697 Value *Val = RHS-&gt;Codegen();
698 if (Val == 0) return 0;
699
700 // Look up the name.
701 Value *Variable = NamedValues[LHSE-&gt;getName()];
702 if (Variable == 0) return ErrorV("Unknown variable name");
703
704 Builder.CreateStore(Val, Variable);
705 return Val;
706 }
707 ...
708</pre>
709</div>
710
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000711<p>Once we have the variable, codegen'ing the assignment is straightforward:
Chris Lattner62a709d2007-11-05 00:23:57 +0000712we emit the RHS of the assignment, create a store, and return the computed
713value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
714
715<p>Now that we have an assignment operator, we can mutate loop variables and
716arguments. For example, we can now run code like this:</p>
717
718<div class="doc_code">
719<pre>
720# Function to print a double.
721extern printd(x);
722
723# Define ':' for sequencing: as a low-precedence operator that ignores operands
724# and just returns the RHS.
725def binary : 1 (x y) y;
726
727def test(x)
728 printd(x) :
729 x = 4 :
730 printd(x);
731
732test(123);
733</pre>
734</div>
735
736<p>When run, this example prints "123" and then "4", showing that we did
737actually mutate the value! Okay, we have now officially implemented our goal:
738getting this to work requires SSA construction in the general case. However,
739to be really useful, we want the ability to define our own local variables, lets
740add this next!
741</p>
742
743</div>
744
745<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000746<h2><a name="localvars">User-defined Local Variables</a></h2>
Chris Lattner62a709d2007-11-05 00:23:57 +0000747<!-- *********************************************************************** -->
748
749<div class="doc_text">
750
751<p>Adding var/in is just like any other other extensions we made to
752Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
753The first step for adding our new 'var/in' construct is to extend the lexer.
754As before, this is pretty trivial, the code looks like this:</p>
755
756<div class="doc_code">
757<pre>
758enum Token {
759 ...
760 <b>// var definition
761 tok_var = -13</b>
762...
763}
764...
765static int gettok() {
766...
767 if (IdentifierStr == "in") return tok_in;
768 if (IdentifierStr == "binary") return tok_binary;
769 if (IdentifierStr == "unary") return tok_unary;
770 <b>if (IdentifierStr == "var") return tok_var;</b>
771 return tok_identifier;
772...
773</pre>
774</div>
775
776<p>The next step is to define the AST node that we will construct. For var/in,
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000777it looks like this:</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000778
779<div class="doc_code">
780<pre>
781/// VarExprAST - Expression class for var/in
782class VarExprAST : public ExprAST {
783 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
784 ExprAST *Body;
785public:
786 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
787 ExprAST *body)
788 : VarNames(varnames), Body(body) {}
789
790 virtual Value *Codegen();
791};
792</pre>
793</div>
794
795<p>var/in allows a list of names to be defined all at once, and each name can
796optionally have an initializer value. As such, we capture this information in
797the VarNames vector. Also, var/in has a body, this body is allowed to access
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000798the variables defined by the var/in.</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000799
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000800<p>With this in place, we can define the parser pieces. The first thing we do is add
Chris Lattner62a709d2007-11-05 00:23:57 +0000801it as a primary expression:</p>
802
803<div class="doc_code">
804<pre>
805/// primary
806/// ::= identifierexpr
807/// ::= numberexpr
808/// ::= parenexpr
809/// ::= ifexpr
810/// ::= forexpr
811<b>/// ::= varexpr</b>
812static ExprAST *ParsePrimary() {
813 switch (CurTok) {
814 default: return Error("unknown token when expecting an expression");
815 case tok_identifier: return ParseIdentifierExpr();
816 case tok_number: return ParseNumberExpr();
817 case '(': return ParseParenExpr();
818 case tok_if: return ParseIfExpr();
819 case tok_for: return ParseForExpr();
820 <b>case tok_var: return ParseVarExpr();</b>
821 }
822}
823</pre>
824</div>
825
826<p>Next we define ParseVarExpr:</p>
827
828<div class="doc_code">
829<pre>
Chris Lattner20a0c802007-11-05 17:54:34 +0000830/// varexpr ::= 'var' identifier ('=' expression)?
831// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +0000832static ExprAST *ParseVarExpr() {
833 getNextToken(); // eat the var.
834
835 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
836
837 // At least one variable name is required.
838 if (CurTok != tok_identifier)
839 return Error("expected identifier after var");
840</pre>
841</div>
842
843<p>The first part of this code parses the list of identifier/expr pairs into the
844local <tt>VarNames</tt> vector.
845
846<div class="doc_code">
847<pre>
848 while (1) {
849 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +0000850 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +0000851
852 // Read the optional initializer.
853 ExprAST *Init = 0;
854 if (CurTok == '=') {
855 getNextToken(); // eat the '='.
856
857 Init = ParseExpression();
858 if (Init == 0) return 0;
859 }
860
861 VarNames.push_back(std::make_pair(Name, Init));
862
863 // End of var list, exit loop.
864 if (CurTok != ',') break;
865 getNextToken(); // eat the ','.
866
867 if (CurTok != tok_identifier)
868 return Error("expected identifier list after var");
869 }
870</pre>
871</div>
872
873<p>Once all the variables are parsed, we then parse the body and create the
874AST node:</p>
875
876<div class="doc_code">
877<pre>
878 // At this point, we have to have 'in'.
879 if (CurTok != tok_in)
880 return Error("expected 'in' keyword after 'var'");
881 getNextToken(); // eat 'in'.
882
883 ExprAST *Body = ParseExpression();
884 if (Body == 0) return 0;
885
886 return new VarExprAST(VarNames, Body);
887}
888</pre>
889</div>
890
891<p>Now that we can parse and represent the code, we need to support emission of
892LLVM IR for it. This code starts out with:</p>
893
894<div class="doc_code">
895<pre>
896Value *VarExprAST::Codegen() {
897 std::vector&lt;AllocaInst *&gt; OldBindings;
898
899 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
900
901 // Register all variables and emit their initializer.
902 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
903 const std::string &amp;VarName = VarNames[i].first;
904 ExprAST *Init = VarNames[i].second;
905</pre>
906</div>
907
908<p>Basically it loops over all the variables, installing them one at a time.
909For each variable we put into the symbol table, we remember the previous value
910that we replace in OldBindings.</p>
911
912<div class="doc_code">
913<pre>
914 // Emit the initializer before adding the variable to scope, this prevents
915 // the initializer from referencing the variable itself, and permits stuff
916 // like this:
917 // var a = 1 in
918 // var a = a in ... # refers to outer 'a'.
919 Value *InitVal;
920 if (Init) {
921 InitVal = Init-&gt;Codegen();
922 if (InitVal == 0) return 0;
923 } else { // If not specified, use 0.0.
Owen Anderson6f83c9c2009-07-27 20:59:43 +0000924 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
Chris Lattner62a709d2007-11-05 00:23:57 +0000925 }
926
927 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
928 Builder.CreateStore(InitVal, Alloca);
929
930 // Remember the old variable binding so that we can restore the binding when
931 // we unrecurse.
932 OldBindings.push_back(NamedValues[VarName]);
933
934 // Remember this binding.
935 NamedValues[VarName] = Alloca;
936 }
937</pre>
938</div>
939
940<p>There are more comments here than code. The basic idea is that we emit the
941initializer, create the alloca, then update the symbol table to point to it.
942Once all the variables are installed in the symbol table, we evaluate the body
943of the var/in expression:</p>
944
945<div class="doc_code">
946<pre>
947 // Codegen the body, now that all vars are in scope.
948 Value *BodyVal = Body-&gt;Codegen();
949 if (BodyVal == 0) return 0;
950</pre>
951</div>
952
953<p>Finally, before returning, we restore the previous variable bindings:</p>
954
955<div class="doc_code">
956<pre>
957 // Pop all our variables from scope.
958 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
959 NamedValues[VarNames[i].first] = OldBindings[i];
960
961 // Return the body computation.
962 return BodyVal;
963}
964</pre>
965</div>
966
967<p>The end result of all of this is that we get properly scoped variable
968definitions, and we even (trivially) allow mutation of them :).</p>
969
970<p>With this, we completed what we set out to do. Our nice iterative fib
971example from the intro compiles and runs just fine. The mem2reg pass optimizes
972all of our stack variables into SSA registers, inserting PHI nodes where needed,
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000973and our front-end remains simple: no "iterated dominance frontier" computation
Chris Lattner62a709d2007-11-05 00:23:57 +0000974anywhere in sight.</p>
975
976</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000977
978<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000979<h2><a name="code">Full Code Listing</a></h2>
Chris Lattner00c992d2007-11-03 08:55:29 +0000980<!-- *********************************************************************** -->
981
982<div class="doc_text">
983
984<p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000985Here is the complete code listing for our running example, enhanced with mutable
986variables and var/in support. To build this example, use:
Chris Lattner00c992d2007-11-03 08:55:29 +0000987</p>
988
989<div class="doc_code">
990<pre>
991 # Compile
992 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
993 # Run
994 ./toy
995</pre>
996</div>
997
998<p>Here is the code:</p>
999
1000<div class="doc_code">
1001<pre>
Chris Lattner62a709d2007-11-05 00:23:57 +00001002#include "llvm/DerivedTypes.h"
1003#include "llvm/ExecutionEngine/ExecutionEngine.h"
Nick Lewycky422094c2009-09-13 21:38:54 +00001004#include "llvm/ExecutionEngine/JIT.h"
Owen Andersond1fbd142009-07-08 20:50:47 +00001005#include "llvm/LLVMContext.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001006#include "llvm/Module.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001007#include "llvm/PassManager.h"
1008#include "llvm/Analysis/Verifier.h"
Dan Gohmanab7fa082010-11-16 17:28:22 +00001009#include "llvm/Analysis/Passes.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001010#include "llvm/Target/TargetData.h"
Nick Lewycky422094c2009-09-13 21:38:54 +00001011#include "llvm/Target/TargetSelect.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001012#include "llvm/Transforms/Scalar.h"
Duncan Sands89f6d882008-04-13 06:22:09 +00001013#include "llvm/Support/IRBuilder.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001014#include &lt;cstdio&gt;
1015#include &lt;string&gt;
1016#include &lt;map&gt;
1017#include &lt;vector&gt;
1018using namespace llvm;
1019
1020//===----------------------------------------------------------------------===//
1021// Lexer
1022//===----------------------------------------------------------------------===//
1023
1024// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1025// of these for known things.
1026enum Token {
1027 tok_eof = -1,
1028
1029 // commands
1030 tok_def = -2, tok_extern = -3,
1031
1032 // primary
1033 tok_identifier = -4, tok_number = -5,
1034
1035 // control
1036 tok_if = -6, tok_then = -7, tok_else = -8,
1037 tok_for = -9, tok_in = -10,
1038
1039 // operators
1040 tok_binary = -11, tok_unary = -12,
1041
1042 // var definition
1043 tok_var = -13
1044};
1045
1046static std::string IdentifierStr; // Filled in if tok_identifier
1047static double NumVal; // Filled in if tok_number
1048
1049/// gettok - Return the next token from standard input.
1050static int gettok() {
1051 static int LastChar = ' ';
1052
1053 // Skip any whitespace.
1054 while (isspace(LastChar))
1055 LastChar = getchar();
1056
1057 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1058 IdentifierStr = LastChar;
1059 while (isalnum((LastChar = getchar())))
1060 IdentifierStr += LastChar;
1061
1062 if (IdentifierStr == "def") return tok_def;
1063 if (IdentifierStr == "extern") return tok_extern;
1064 if (IdentifierStr == "if") return tok_if;
1065 if (IdentifierStr == "then") return tok_then;
1066 if (IdentifierStr == "else") return tok_else;
1067 if (IdentifierStr == "for") return tok_for;
1068 if (IdentifierStr == "in") return tok_in;
1069 if (IdentifierStr == "binary") return tok_binary;
1070 if (IdentifierStr == "unary") return tok_unary;
1071 if (IdentifierStr == "var") return tok_var;
1072 return tok_identifier;
1073 }
1074
1075 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1076 std::string NumStr;
1077 do {
1078 NumStr += LastChar;
1079 LastChar = getchar();
1080 } while (isdigit(LastChar) || LastChar == '.');
1081
1082 NumVal = strtod(NumStr.c_str(), 0);
1083 return tok_number;
1084 }
1085
1086 if (LastChar == '#') {
1087 // Comment until end of line.
1088 do LastChar = getchar();
Chris Lattnerc80c23f2007-12-02 22:46:01 +00001089 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
Chris Lattner62a709d2007-11-05 00:23:57 +00001090
1091 if (LastChar != EOF)
1092 return gettok();
1093 }
1094
1095 // Check for end of file. Don't eat the EOF.
1096 if (LastChar == EOF)
1097 return tok_eof;
1098
1099 // Otherwise, just return the character as its ascii value.
1100 int ThisChar = LastChar;
1101 LastChar = getchar();
1102 return ThisChar;
1103}
1104
1105//===----------------------------------------------------------------------===//
1106// Abstract Syntax Tree (aka Parse Tree)
1107//===----------------------------------------------------------------------===//
1108
1109/// ExprAST - Base class for all expression nodes.
1110class ExprAST {
1111public:
1112 virtual ~ExprAST() {}
1113 virtual Value *Codegen() = 0;
1114};
1115
1116/// NumberExprAST - Expression class for numeric literals like "1.0".
1117class NumberExprAST : public ExprAST {
1118 double Val;
1119public:
1120 NumberExprAST(double val) : Val(val) {}
1121 virtual Value *Codegen();
1122};
1123
1124/// VariableExprAST - Expression class for referencing a variable, like "a".
1125class VariableExprAST : public ExprAST {
1126 std::string Name;
1127public:
1128 VariableExprAST(const std::string &amp;name) : Name(name) {}
1129 const std::string &amp;getName() const { return Name; }
1130 virtual Value *Codegen();
1131};
1132
1133/// UnaryExprAST - Expression class for a unary operator.
1134class UnaryExprAST : public ExprAST {
1135 char Opcode;
1136 ExprAST *Operand;
1137public:
1138 UnaryExprAST(char opcode, ExprAST *operand)
1139 : Opcode(opcode), Operand(operand) {}
1140 virtual Value *Codegen();
1141};
1142
1143/// BinaryExprAST - Expression class for a binary operator.
1144class BinaryExprAST : public ExprAST {
1145 char Op;
1146 ExprAST *LHS, *RHS;
1147public:
1148 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1149 : Op(op), LHS(lhs), RHS(rhs) {}
1150 virtual Value *Codegen();
1151};
1152
1153/// CallExprAST - Expression class for function calls.
1154class CallExprAST : public ExprAST {
1155 std::string Callee;
1156 std::vector&lt;ExprAST*&gt; Args;
1157public:
1158 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1159 : Callee(callee), Args(args) {}
1160 virtual Value *Codegen();
1161};
1162
1163/// IfExprAST - Expression class for if/then/else.
1164class IfExprAST : public ExprAST {
1165 ExprAST *Cond, *Then, *Else;
1166public:
1167 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1168 : Cond(cond), Then(then), Else(_else) {}
1169 virtual Value *Codegen();
1170};
1171
1172/// ForExprAST - Expression class for for/in.
1173class ForExprAST : public ExprAST {
1174 std::string VarName;
1175 ExprAST *Start, *End, *Step, *Body;
1176public:
1177 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1178 ExprAST *step, ExprAST *body)
1179 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1180 virtual Value *Codegen();
1181};
1182
1183/// VarExprAST - Expression class for var/in
1184class VarExprAST : public ExprAST {
1185 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1186 ExprAST *Body;
1187public:
1188 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1189 ExprAST *body)
1190 : VarNames(varnames), Body(body) {}
1191
1192 virtual Value *Codegen();
1193};
1194
1195/// PrototypeAST - This class represents the "prototype" for a function,
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001196/// which captures its name, and its argument names (thus implicitly the number
1197/// of arguments the function takes), as well as if it is an operator.
Chris Lattner62a709d2007-11-05 00:23:57 +00001198class PrototypeAST {
1199 std::string Name;
1200 std::vector&lt;std::string&gt; Args;
1201 bool isOperator;
1202 unsigned Precedence; // Precedence if a binary op.
1203public:
1204 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1205 bool isoperator = false, unsigned prec = 0)
1206 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1207
1208 bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1209 bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1210
1211 char getOperatorName() const {
1212 assert(isUnaryOp() || isBinaryOp());
1213 return Name[Name.size()-1];
1214 }
1215
1216 unsigned getBinaryPrecedence() const { return Precedence; }
1217
1218 Function *Codegen();
1219
1220 void CreateArgumentAllocas(Function *F);
1221};
1222
1223/// FunctionAST - This class represents a function definition itself.
1224class FunctionAST {
1225 PrototypeAST *Proto;
1226 ExprAST *Body;
1227public:
1228 FunctionAST(PrototypeAST *proto, ExprAST *body)
1229 : Proto(proto), Body(body) {}
1230
1231 Function *Codegen();
1232};
1233
1234//===----------------------------------------------------------------------===//
1235// Parser
1236//===----------------------------------------------------------------------===//
1237
1238/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001239/// token the parser is looking at. getNextToken reads another token from the
Chris Lattner62a709d2007-11-05 00:23:57 +00001240/// lexer and updates CurTok with its results.
1241static int CurTok;
1242static int getNextToken() {
1243 return CurTok = gettok();
1244}
1245
1246/// BinopPrecedence - This holds the precedence for each binary operator that is
1247/// defined.
1248static std::map&lt;char, int&gt; BinopPrecedence;
1249
1250/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1251static int GetTokPrecedence() {
1252 if (!isascii(CurTok))
1253 return -1;
1254
1255 // Make sure it's a declared binop.
1256 int TokPrec = BinopPrecedence[CurTok];
1257 if (TokPrec &lt;= 0) return -1;
1258 return TokPrec;
1259}
1260
1261/// Error* - These are little helper functions for error handling.
1262ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1263PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1264FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1265
1266static ExprAST *ParseExpression();
1267
1268/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +00001269/// ::= identifier
1270/// ::= identifier '(' expression* ')'
Chris Lattner62a709d2007-11-05 00:23:57 +00001271static ExprAST *ParseIdentifierExpr() {
1272 std::string IdName = IdentifierStr;
1273
Chris Lattner20a0c802007-11-05 17:54:34 +00001274 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001275
1276 if (CurTok != '(') // Simple variable ref.
1277 return new VariableExprAST(IdName);
1278
1279 // Call.
1280 getNextToken(); // eat (
1281 std::vector&lt;ExprAST*&gt; Args;
1282 if (CurTok != ')') {
1283 while (1) {
1284 ExprAST *Arg = ParseExpression();
1285 if (!Arg) return 0;
1286 Args.push_back(Arg);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001287
Chris Lattner62a709d2007-11-05 00:23:57 +00001288 if (CurTok == ')') break;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001289
Chris Lattner62a709d2007-11-05 00:23:57 +00001290 if (CurTok != ',')
Chris Lattner6c4be9c2008-04-14 16:44:41 +00001291 return Error("Expected ')' or ',' in argument list");
Chris Lattner62a709d2007-11-05 00:23:57 +00001292 getNextToken();
1293 }
1294 }
1295
1296 // Eat the ')'.
1297 getNextToken();
1298
1299 return new CallExprAST(IdName, Args);
1300}
1301
1302/// numberexpr ::= number
1303static ExprAST *ParseNumberExpr() {
1304 ExprAST *Result = new NumberExprAST(NumVal);
1305 getNextToken(); // consume the number
1306 return Result;
1307}
1308
1309/// parenexpr ::= '(' expression ')'
1310static ExprAST *ParseParenExpr() {
1311 getNextToken(); // eat (.
1312 ExprAST *V = ParseExpression();
1313 if (!V) return 0;
1314
1315 if (CurTok != ')')
1316 return Error("expected ')'");
1317 getNextToken(); // eat ).
1318 return V;
1319}
1320
1321/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1322static ExprAST *ParseIfExpr() {
1323 getNextToken(); // eat the if.
1324
1325 // condition.
1326 ExprAST *Cond = ParseExpression();
1327 if (!Cond) return 0;
1328
1329 if (CurTok != tok_then)
1330 return Error("expected then");
1331 getNextToken(); // eat the then
1332
1333 ExprAST *Then = ParseExpression();
1334 if (Then == 0) return 0;
1335
1336 if (CurTok != tok_else)
1337 return Error("expected else");
1338
1339 getNextToken();
1340
1341 ExprAST *Else = ParseExpression();
1342 if (!Else) return 0;
1343
1344 return new IfExprAST(Cond, Then, Else);
1345}
1346
Chris Lattner20a0c802007-11-05 17:54:34 +00001347/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001348static ExprAST *ParseForExpr() {
1349 getNextToken(); // eat the for.
1350
1351 if (CurTok != tok_identifier)
1352 return Error("expected identifier after for");
1353
1354 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001355 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001356
1357 if (CurTok != '=')
1358 return Error("expected '=' after for");
1359 getNextToken(); // eat '='.
1360
1361
1362 ExprAST *Start = ParseExpression();
1363 if (Start == 0) return 0;
1364 if (CurTok != ',')
1365 return Error("expected ',' after for start value");
1366 getNextToken();
1367
1368 ExprAST *End = ParseExpression();
1369 if (End == 0) return 0;
1370
1371 // The step value is optional.
1372 ExprAST *Step = 0;
1373 if (CurTok == ',') {
1374 getNextToken();
1375 Step = ParseExpression();
1376 if (Step == 0) return 0;
1377 }
1378
1379 if (CurTok != tok_in)
1380 return Error("expected 'in' after for");
1381 getNextToken(); // eat 'in'.
1382
1383 ExprAST *Body = ParseExpression();
1384 if (Body == 0) return 0;
1385
1386 return new ForExprAST(IdName, Start, End, Step, Body);
1387}
1388
Chris Lattner20a0c802007-11-05 17:54:34 +00001389/// varexpr ::= 'var' identifier ('=' expression)?
1390// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001391static ExprAST *ParseVarExpr() {
1392 getNextToken(); // eat the var.
1393
1394 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1395
1396 // At least one variable name is required.
1397 if (CurTok != tok_identifier)
1398 return Error("expected identifier after var");
1399
1400 while (1) {
1401 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001402 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001403
1404 // Read the optional initializer.
1405 ExprAST *Init = 0;
1406 if (CurTok == '=') {
1407 getNextToken(); // eat the '='.
1408
1409 Init = ParseExpression();
1410 if (Init == 0) return 0;
1411 }
1412
1413 VarNames.push_back(std::make_pair(Name, Init));
1414
1415 // End of var list, exit loop.
1416 if (CurTok != ',') break;
1417 getNextToken(); // eat the ','.
1418
1419 if (CurTok != tok_identifier)
1420 return Error("expected identifier list after var");
1421 }
1422
1423 // At this point, we have to have 'in'.
1424 if (CurTok != tok_in)
1425 return Error("expected 'in' keyword after 'var'");
1426 getNextToken(); // eat 'in'.
1427
1428 ExprAST *Body = ParseExpression();
1429 if (Body == 0) return 0;
1430
1431 return new VarExprAST(VarNames, Body);
1432}
1433
Chris Lattner62a709d2007-11-05 00:23:57 +00001434/// primary
1435/// ::= identifierexpr
1436/// ::= numberexpr
1437/// ::= parenexpr
1438/// ::= ifexpr
1439/// ::= forexpr
1440/// ::= varexpr
1441static ExprAST *ParsePrimary() {
1442 switch (CurTok) {
1443 default: return Error("unknown token when expecting an expression");
1444 case tok_identifier: return ParseIdentifierExpr();
1445 case tok_number: return ParseNumberExpr();
1446 case '(': return ParseParenExpr();
1447 case tok_if: return ParseIfExpr();
1448 case tok_for: return ParseForExpr();
1449 case tok_var: return ParseVarExpr();
1450 }
1451}
1452
1453/// unary
1454/// ::= primary
1455/// ::= '!' unary
1456static ExprAST *ParseUnary() {
1457 // If the current token is not an operator, it must be a primary expr.
1458 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1459 return ParsePrimary();
1460
1461 // If this is a unary operator, read it.
1462 int Opc = CurTok;
1463 getNextToken();
1464 if (ExprAST *Operand = ParseUnary())
1465 return new UnaryExprAST(Opc, Operand);
1466 return 0;
1467}
1468
1469/// binoprhs
1470/// ::= ('+' unary)*
1471static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1472 // If this is a binop, find its precedence.
1473 while (1) {
1474 int TokPrec = GetTokPrecedence();
1475
1476 // If this is a binop that binds at least as tightly as the current binop,
1477 // consume it, otherwise we are done.
1478 if (TokPrec &lt; ExprPrec)
1479 return LHS;
1480
1481 // Okay, we know this is a binop.
1482 int BinOp = CurTok;
1483 getNextToken(); // eat binop
1484
1485 // Parse the unary expression after the binary operator.
1486 ExprAST *RHS = ParseUnary();
1487 if (!RHS) return 0;
1488
1489 // If BinOp binds less tightly with RHS than the operator after RHS, let
1490 // the pending operator take RHS as its LHS.
1491 int NextPrec = GetTokPrecedence();
1492 if (TokPrec &lt; NextPrec) {
1493 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1494 if (RHS == 0) return 0;
1495 }
1496
1497 // Merge LHS/RHS.
1498 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1499 }
1500}
1501
1502/// expression
1503/// ::= unary binoprhs
1504///
1505static ExprAST *ParseExpression() {
1506 ExprAST *LHS = ParseUnary();
1507 if (!LHS) return 0;
1508
1509 return ParseBinOpRHS(0, LHS);
1510}
1511
1512/// prototype
1513/// ::= id '(' id* ')'
1514/// ::= binary LETTER number? (id, id)
1515/// ::= unary LETTER (id)
1516static PrototypeAST *ParsePrototype() {
1517 std::string FnName;
1518
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001519 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
Chris Lattner62a709d2007-11-05 00:23:57 +00001520 unsigned BinaryPrecedence = 30;
1521
1522 switch (CurTok) {
1523 default:
1524 return ErrorP("Expected function name in prototype");
1525 case tok_identifier:
1526 FnName = IdentifierStr;
1527 Kind = 0;
1528 getNextToken();
1529 break;
1530 case tok_unary:
1531 getNextToken();
1532 if (!isascii(CurTok))
1533 return ErrorP("Expected unary operator");
1534 FnName = "unary";
1535 FnName += (char)CurTok;
1536 Kind = 1;
1537 getNextToken();
1538 break;
1539 case tok_binary:
1540 getNextToken();
1541 if (!isascii(CurTok))
1542 return ErrorP("Expected binary operator");
1543 FnName = "binary";
1544 FnName += (char)CurTok;
1545 Kind = 2;
1546 getNextToken();
1547
1548 // Read the precedence if present.
1549 if (CurTok == tok_number) {
1550 if (NumVal &lt; 1 || NumVal &gt; 100)
1551 return ErrorP("Invalid precedecnce: must be 1..100");
1552 BinaryPrecedence = (unsigned)NumVal;
1553 getNextToken();
1554 }
1555 break;
1556 }
1557
1558 if (CurTok != '(')
1559 return ErrorP("Expected '(' in prototype");
1560
1561 std::vector&lt;std::string&gt; ArgNames;
1562 while (getNextToken() == tok_identifier)
1563 ArgNames.push_back(IdentifierStr);
1564 if (CurTok != ')')
1565 return ErrorP("Expected ')' in prototype");
1566
1567 // success.
1568 getNextToken(); // eat ')'.
1569
1570 // Verify right number of names for operator.
1571 if (Kind &amp;&amp; ArgNames.size() != Kind)
1572 return ErrorP("Invalid number of operands for operator");
1573
1574 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1575}
1576
1577/// definition ::= 'def' prototype expression
1578static FunctionAST *ParseDefinition() {
1579 getNextToken(); // eat def.
1580 PrototypeAST *Proto = ParsePrototype();
1581 if (Proto == 0) return 0;
1582
1583 if (ExprAST *E = ParseExpression())
1584 return new FunctionAST(Proto, E);
1585 return 0;
1586}
1587
1588/// toplevelexpr ::= expression
1589static FunctionAST *ParseTopLevelExpr() {
1590 if (ExprAST *E = ParseExpression()) {
1591 // Make an anonymous proto.
1592 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1593 return new FunctionAST(Proto, E);
1594 }
1595 return 0;
1596}
1597
1598/// external ::= 'extern' prototype
1599static PrototypeAST *ParseExtern() {
1600 getNextToken(); // eat extern.
1601 return ParsePrototype();
1602}
1603
1604//===----------------------------------------------------------------------===//
1605// Code Generation
1606//===----------------------------------------------------------------------===//
1607
1608static Module *TheModule;
Owen Andersond1fbd142009-07-08 20:50:47 +00001609static IRBuilder&lt;&gt; Builder(getGlobalContext());
Chris Lattner62a709d2007-11-05 00:23:57 +00001610static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1611static FunctionPassManager *TheFPM;
1612
1613Value *ErrorV(const char *Str) { Error(Str); return 0; }
1614
1615/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1616/// the function. This is used for mutable variables etc.
1617static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1618 const std::string &amp;VarName) {
Gabor Greifd6c1ed02009-03-11 19:51:07 +00001619 IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
Duncan Sands89f6d882008-04-13 06:22:09 +00001620 TheFunction-&gt;getEntryBlock().begin());
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001621 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
1622 VarName.c_str());
Chris Lattner62a709d2007-11-05 00:23:57 +00001623}
1624
Chris Lattner62a709d2007-11-05 00:23:57 +00001625Value *NumberExprAST::Codegen() {
Owen Anderson6f83c9c2009-07-27 20:59:43 +00001626 return ConstantFP::get(getGlobalContext(), APFloat(Val));
Chris Lattner62a709d2007-11-05 00:23:57 +00001627}
1628
1629Value *VariableExprAST::Codegen() {
1630 // Look this variable up in the function.
1631 Value *V = NamedValues[Name];
1632 if (V == 0) return ErrorV("Unknown variable name");
1633
1634 // Load the value.
1635 return Builder.CreateLoad(V, Name.c_str());
1636}
1637
1638Value *UnaryExprAST::Codegen() {
1639 Value *OperandV = Operand-&gt;Codegen();
1640 if (OperandV == 0) return 0;
1641
1642 Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1643 if (F == 0)
1644 return ErrorV("Unknown unary operator");
1645
1646 return Builder.CreateCall(F, OperandV, "unop");
1647}
1648
Chris Lattner62a709d2007-11-05 00:23:57 +00001649Value *BinaryExprAST::Codegen() {
1650 // Special case '=' because we don't want to emit the LHS as an expression.
1651 if (Op == '=') {
1652 // Assignment requires the LHS to be an identifier.
1653 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1654 if (!LHSE)
1655 return ErrorV("destination of '=' must be a variable");
1656 // Codegen the RHS.
1657 Value *Val = RHS-&gt;Codegen();
1658 if (Val == 0) return 0;
1659
1660 // Look up the name.
1661 Value *Variable = NamedValues[LHSE-&gt;getName()];
1662 if (Variable == 0) return ErrorV("Unknown variable name");
1663
1664 Builder.CreateStore(Val, Variable);
1665 return Val;
1666 }
1667
Chris Lattner62a709d2007-11-05 00:23:57 +00001668 Value *L = LHS-&gt;Codegen();
1669 Value *R = RHS-&gt;Codegen();
1670 if (L == 0 || R == 0) return 0;
1671
1672 switch (Op) {
Eric Christopher2214b812010-06-14 06:09:39 +00001673 case '+': return Builder.CreateFAdd(L, R, "addtmp");
1674 case '-': return Builder.CreateFSub(L, R, "subtmp");
1675 case '*': return Builder.CreateFMul(L, R, "multmp");
Chris Lattner62a709d2007-11-05 00:23:57 +00001676 case '&lt;':
Chris Lattner71155212007-11-06 01:39:12 +00001677 L = Builder.CreateFCmpULT(L, R, "cmptmp");
Chris Lattner62a709d2007-11-05 00:23:57 +00001678 // Convert bool 0/1 to double 0.0 or 1.0
Nick Lewycky422094c2009-09-13 21:38:54 +00001679 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
1680 "booltmp");
Chris Lattner62a709d2007-11-05 00:23:57 +00001681 default: break;
1682 }
1683
1684 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1685 // a call to it.
1686 Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1687 assert(F &amp;&amp; "binary operator not found!");
1688
1689 Value *Ops[] = { L, R };
1690 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1691}
1692
1693Value *CallExprAST::Codegen() {
1694 // Look up the name in the global module table.
1695 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1696 if (CalleeF == 0)
1697 return ErrorV("Unknown function referenced");
1698
1699 // If argument mismatch error.
1700 if (CalleeF-&gt;arg_size() != Args.size())
1701 return ErrorV("Incorrect # arguments passed");
1702
1703 std::vector&lt;Value*&gt; ArgsV;
1704 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1705 ArgsV.push_back(Args[i]-&gt;Codegen());
1706 if (ArgsV.back() == 0) return 0;
1707 }
1708
1709 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1710}
1711
1712Value *IfExprAST::Codegen() {
1713 Value *CondV = Cond-&gt;Codegen();
1714 if (CondV == 0) return 0;
1715
1716 // Convert condition to a bool by comparing equal to 0.0.
1717 CondV = Builder.CreateFCmpONE(CondV,
Owen Anderson6f83c9c2009-07-27 20:59:43 +00001718 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
Chris Lattner62a709d2007-11-05 00:23:57 +00001719 "ifcond");
1720
1721 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1722
1723 // Create blocks for the then and else cases. Insert the 'then' block at the
1724 // end of the function.
Owen Anderson1d0be152009-08-13 21:58:54 +00001725 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
1726 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
1727 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
Chris Lattner62a709d2007-11-05 00:23:57 +00001728
1729 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1730
1731 // Emit then value.
1732 Builder.SetInsertPoint(ThenBB);
1733
1734 Value *ThenV = Then-&gt;Codegen();
1735 if (ThenV == 0) return 0;
1736
1737 Builder.CreateBr(MergeBB);
1738 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1739 ThenBB = Builder.GetInsertBlock();
1740
1741 // Emit else block.
1742 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1743 Builder.SetInsertPoint(ElseBB);
1744
1745 Value *ElseV = Else-&gt;Codegen();
1746 if (ElseV == 0) return 0;
1747
1748 Builder.CreateBr(MergeBB);
1749 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1750 ElseBB = Builder.GetInsertBlock();
1751
1752 // Emit merge block.
1753 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1754 Builder.SetInsertPoint(MergeBB);
Jay Foad3ecfc862011-03-30 11:28:46 +00001755 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
Nick Lewycky422094c2009-09-13 21:38:54 +00001756 "iftmp");
Chris Lattner62a709d2007-11-05 00:23:57 +00001757
1758 PN-&gt;addIncoming(ThenV, ThenBB);
1759 PN-&gt;addIncoming(ElseV, ElseBB);
1760 return PN;
1761}
1762
1763Value *ForExprAST::Codegen() {
1764 // Output this as:
1765 // var = alloca double
1766 // ...
1767 // start = startexpr
1768 // store start -&gt; var
1769 // goto loop
1770 // loop:
1771 // ...
1772 // bodyexpr
1773 // ...
1774 // loopend:
1775 // step = stepexpr
1776 // endcond = endexpr
1777 //
1778 // curvar = load var
1779 // nextvar = curvar + step
1780 // store nextvar -&gt; var
1781 // br endcond, loop, endloop
1782 // outloop:
1783
1784 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1785
1786 // Create an alloca for the variable in the entry block.
1787 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1788
1789 // Emit the start code first, without 'variable' in scope.
1790 Value *StartVal = Start-&gt;Codegen();
1791 if (StartVal == 0) return 0;
1792
1793 // Store the value into the alloca.
1794 Builder.CreateStore(StartVal, Alloca);
1795
1796 // Make the new basic block for the loop header, inserting after current
1797 // block.
Owen Anderson1d0be152009-08-13 21:58:54 +00001798 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
Chris Lattner62a709d2007-11-05 00:23:57 +00001799
1800 // Insert an explicit fall through from the current block to the LoopBB.
1801 Builder.CreateBr(LoopBB);
1802
1803 // Start insertion in LoopBB.
1804 Builder.SetInsertPoint(LoopBB);
1805
1806 // Within the loop, the variable is defined equal to the PHI node. If it
1807 // shadows an existing variable, we have to restore it, so save it now.
1808 AllocaInst *OldVal = NamedValues[VarName];
1809 NamedValues[VarName] = Alloca;
1810
1811 // Emit the body of the loop. This, like any other expr, can change the
1812 // current BB. Note that we ignore the value computed by the body, but don't
1813 // allow an error.
1814 if (Body-&gt;Codegen() == 0)
1815 return 0;
1816
1817 // Emit the step value.
1818 Value *StepVal;
1819 if (Step) {
1820 StepVal = Step-&gt;Codegen();
1821 if (StepVal == 0) return 0;
1822 } else {
1823 // If not specified, use 1.0.
Owen Anderson6f83c9c2009-07-27 20:59:43 +00001824 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
Chris Lattner62a709d2007-11-05 00:23:57 +00001825 }
1826
1827 // Compute the end condition.
1828 Value *EndCond = End-&gt;Codegen();
1829 if (EndCond == 0) return EndCond;
1830
1831 // Reload, increment, and restore the alloca. This handles the case where
1832 // the body of the loop mutates the variable.
1833 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
Chris Lattnerf57fcf72010-09-01 20:09:20 +00001834 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
Chris Lattner62a709d2007-11-05 00:23:57 +00001835 Builder.CreateStore(NextVar, Alloca);
1836
1837 // Convert condition to a bool by comparing equal to 0.0.
1838 EndCond = Builder.CreateFCmpONE(EndCond,
Owen Anderson6f83c9c2009-07-27 20:59:43 +00001839 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
Chris Lattner62a709d2007-11-05 00:23:57 +00001840 "loopcond");
1841
1842 // Create the "after loop" block and insert it.
Owen Anderson1d0be152009-08-13 21:58:54 +00001843 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
Chris Lattner62a709d2007-11-05 00:23:57 +00001844
1845 // Insert the conditional branch into the end of LoopEndBB.
1846 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1847
1848 // Any new code will be inserted in AfterBB.
1849 Builder.SetInsertPoint(AfterBB);
1850
1851 // Restore the unshadowed variable.
1852 if (OldVal)
1853 NamedValues[VarName] = OldVal;
1854 else
1855 NamedValues.erase(VarName);
1856
1857
1858 // for expr always returns 0.0.
Owen Anderson1d0be152009-08-13 21:58:54 +00001859 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
Chris Lattner62a709d2007-11-05 00:23:57 +00001860}
1861
1862Value *VarExprAST::Codegen() {
1863 std::vector&lt;AllocaInst *&gt; OldBindings;
1864
1865 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1866
1867 // Register all variables and emit their initializer.
1868 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1869 const std::string &amp;VarName = VarNames[i].first;
1870 ExprAST *Init = VarNames[i].second;
1871
1872 // Emit the initializer before adding the variable to scope, this prevents
1873 // the initializer from referencing the variable itself, and permits stuff
1874 // like this:
1875 // var a = 1 in
1876 // var a = a in ... # refers to outer 'a'.
1877 Value *InitVal;
1878 if (Init) {
1879 InitVal = Init-&gt;Codegen();
1880 if (InitVal == 0) return 0;
1881 } else { // If not specified, use 0.0.
Owen Anderson6f83c9c2009-07-27 20:59:43 +00001882 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
Chris Lattner62a709d2007-11-05 00:23:57 +00001883 }
1884
1885 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1886 Builder.CreateStore(InitVal, Alloca);
1887
1888 // Remember the old variable binding so that we can restore the binding when
1889 // we unrecurse.
1890 OldBindings.push_back(NamedValues[VarName]);
1891
1892 // Remember this binding.
1893 NamedValues[VarName] = Alloca;
1894 }
1895
1896 // Codegen the body, now that all vars are in scope.
1897 Value *BodyVal = Body-&gt;Codegen();
1898 if (BodyVal == 0) return 0;
1899
1900 // Pop all our variables from scope.
1901 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1902 NamedValues[VarNames[i].first] = OldBindings[i];
1903
1904 // Return the body computation.
1905 return BodyVal;
1906}
1907
Chris Lattner62a709d2007-11-05 00:23:57 +00001908Function *PrototypeAST::Codegen() {
1909 // Make the function type: double(double,double) etc.
Nick Lewycky422094c2009-09-13 21:38:54 +00001910 std::vector&lt;const Type*&gt; Doubles(Args.size(),
1911 Type::getDoubleTy(getGlobalContext()));
1912 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1913 Doubles, false);
Chris Lattner62a709d2007-11-05 00:23:57 +00001914
Gabor Greifdf7d2b42008-04-19 22:25:09 +00001915 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
Chris Lattner62a709d2007-11-05 00:23:57 +00001916
1917 // If F conflicted, there was already something named 'Name'. If it has a
1918 // body, don't allow redefinition or reextern.
1919 if (F-&gt;getName() != Name) {
1920 // Delete the one we just made and get the existing one.
1921 F-&gt;eraseFromParent();
1922 F = TheModule-&gt;getFunction(Name);
1923
1924 // If F already has a body, reject this.
1925 if (!F-&gt;empty()) {
1926 ErrorF("redefinition of function");
1927 return 0;
1928 }
1929
1930 // If F took a different number of args, reject.
1931 if (F-&gt;arg_size() != Args.size()) {
1932 ErrorF("redefinition of function with different # args");
1933 return 0;
1934 }
1935 }
1936
1937 // Set names for all arguments.
1938 unsigned Idx = 0;
1939 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1940 ++AI, ++Idx)
1941 AI-&gt;setName(Args[Idx]);
1942
1943 return F;
1944}
1945
1946/// CreateArgumentAllocas - Create an alloca for each argument and register the
1947/// argument in the symbol table so that references to it will succeed.
1948void PrototypeAST::CreateArgumentAllocas(Function *F) {
1949 Function::arg_iterator AI = F-&gt;arg_begin();
1950 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1951 // Create an alloca for this variable.
1952 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1953
1954 // Store the initial value into the alloca.
1955 Builder.CreateStore(AI, Alloca);
1956
1957 // Add arguments to variable symbol table.
1958 NamedValues[Args[Idx]] = Alloca;
1959 }
1960}
1961
Chris Lattner62a709d2007-11-05 00:23:57 +00001962Function *FunctionAST::Codegen() {
1963 NamedValues.clear();
1964
1965 Function *TheFunction = Proto-&gt;Codegen();
1966 if (TheFunction == 0)
1967 return 0;
1968
1969 // If this is an operator, install it.
1970 if (Proto-&gt;isBinaryOp())
1971 BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
1972
1973 // Create a new basic block to start insertion into.
Owen Anderson1d0be152009-08-13 21:58:54 +00001974 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
Chris Lattner62a709d2007-11-05 00:23:57 +00001975 Builder.SetInsertPoint(BB);
1976
1977 // Add all arguments to the symbol table and create their allocas.
1978 Proto-&gt;CreateArgumentAllocas(TheFunction);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001979
Chris Lattner62a709d2007-11-05 00:23:57 +00001980 if (Value *RetVal = Body-&gt;Codegen()) {
1981 // Finish off the function.
1982 Builder.CreateRet(RetVal);
1983
1984 // Validate the generated code, checking for consistency.
1985 verifyFunction(*TheFunction);
1986
1987 // Optimize the function.
1988 TheFPM-&gt;run(*TheFunction);
1989
1990 return TheFunction;
1991 }
1992
1993 // Error reading body, remove function.
1994 TheFunction-&gt;eraseFromParent();
1995
1996 if (Proto-&gt;isBinaryOp())
1997 BinopPrecedence.erase(Proto-&gt;getOperatorName());
1998 return 0;
1999}
2000
2001//===----------------------------------------------------------------------===//
2002// Top-Level parsing and JIT Driver
2003//===----------------------------------------------------------------------===//
2004
2005static ExecutionEngine *TheExecutionEngine;
2006
2007static void HandleDefinition() {
2008 if (FunctionAST *F = ParseDefinition()) {
2009 if (Function *LF = F-&gt;Codegen()) {
2010 fprintf(stderr, "Read function definition:");
2011 LF-&gt;dump();
2012 }
2013 } else {
2014 // Skip token for error recovery.
2015 getNextToken();
2016 }
2017}
2018
2019static void HandleExtern() {
2020 if (PrototypeAST *P = ParseExtern()) {
2021 if (Function *F = P-&gt;Codegen()) {
2022 fprintf(stderr, "Read extern: ");
2023 F-&gt;dump();
2024 }
2025 } else {
2026 // Skip token for error recovery.
2027 getNextToken();
2028 }
2029}
2030
2031static void HandleTopLevelExpression() {
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00002032 // Evaluate a top-level expression into an anonymous function.
Chris Lattner62a709d2007-11-05 00:23:57 +00002033 if (FunctionAST *F = ParseTopLevelExpr()) {
2034 if (Function *LF = F-&gt;Codegen()) {
2035 // JIT the function, returning a function pointer.
2036 void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
2037
2038 // Cast it to the right type (takes no arguments, returns a double) so we
2039 // can call it as a native function.
Nick Lewycky422094c2009-09-13 21:38:54 +00002040 double (*FP)() = (double (*)())(intptr_t)FPtr;
Chris Lattner62a709d2007-11-05 00:23:57 +00002041 fprintf(stderr, "Evaluated to %f\n", FP());
2042 }
2043 } else {
2044 // Skip token for error recovery.
2045 getNextToken();
2046 }
2047}
2048
2049/// top ::= definition | external | expression | ';'
2050static void MainLoop() {
2051 while (1) {
2052 fprintf(stderr, "ready&gt; ");
2053 switch (CurTok) {
2054 case tok_eof: return;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00002055 case ';': getNextToken(); break; // ignore top-level semicolons.
Chris Lattner62a709d2007-11-05 00:23:57 +00002056 case tok_def: HandleDefinition(); break;
2057 case tok_extern: HandleExtern(); break;
2058 default: HandleTopLevelExpression(); break;
2059 }
2060 }
2061}
2062
Chris Lattner62a709d2007-11-05 00:23:57 +00002063//===----------------------------------------------------------------------===//
2064// "Library" functions that can be "extern'd" from user code.
2065//===----------------------------------------------------------------------===//
2066
2067/// putchard - putchar that takes a double and returns 0.
2068extern "C"
2069double putchard(double X) {
2070 putchar((char)X);
2071 return 0;
2072}
2073
2074/// printd - printf that takes a double prints it as "%f\n", returning 0.
2075extern "C"
2076double printd(double X) {
2077 printf("%f\n", X);
2078 return 0;
2079}
2080
2081//===----------------------------------------------------------------------===//
2082// Main driver code.
2083//===----------------------------------------------------------------------===//
2084
2085int main() {
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00002086 InitializeNativeTarget();
2087 LLVMContext &amp;Context = getGlobalContext();
2088
Chris Lattner62a709d2007-11-05 00:23:57 +00002089 // Install standard binary operators.
2090 // 1 is lowest precedence.
2091 BinopPrecedence['='] = 2;
2092 BinopPrecedence['&lt;'] = 10;
2093 BinopPrecedence['+'] = 20;
2094 BinopPrecedence['-'] = 20;
2095 BinopPrecedence['*'] = 40; // highest.
2096
2097 // Prime the first token.
2098 fprintf(stderr, "ready&gt; ");
2099 getNextToken();
2100
2101 // Make the module, which holds all the code.
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00002102 TheModule = new Module("my cool jit", Context);
Chris Lattner62a709d2007-11-05 00:23:57 +00002103
Jeffrey Yasskinf0356fe2010-01-27 20:34:15 +00002104 // Create the JIT. This takes ownership of the module.
Jeffrey Yasskin42fc5582010-02-11 19:15:20 +00002105 std::string ErrStr;
2106 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&amp;ErrStr).create();
2107 if (!TheExecutionEngine) {
2108 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
2109 exit(1);
2110 }
Chris Lattner62a709d2007-11-05 00:23:57 +00002111
Jeffrey Yasskinf0356fe2010-01-27 20:34:15 +00002112 FunctionPassManager OurFPM(TheModule);
Reid Kleckner60130f02009-08-26 20:58:25 +00002113
2114 // Set up the optimizer pipeline. Start with registering info about how the
2115 // target lays out data structures.
2116 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
Dan Gohmandfa1a792010-11-15 18:41:10 +00002117 // Provide basic AliasAnalysis support for GVN.
2118 OurFPM.add(createBasicAliasAnalysisPass());
Nick Lewycky422094c2009-09-13 21:38:54 +00002119 // Promote allocas to registers.
2120 OurFPM.add(createPromoteMemoryToRegisterPass());
Reid Kleckner60130f02009-08-26 20:58:25 +00002121 // Do simple "peephole" optimizations and bit-twiddling optzns.
2122 OurFPM.add(createInstructionCombiningPass());
2123 // Reassociate expressions.
2124 OurFPM.add(createReassociatePass());
2125 // Eliminate Common SubExpressions.
2126 OurFPM.add(createGVNPass());
2127 // Simplify the control flow graph (deleting unreachable blocks, etc).
2128 OurFPM.add(createCFGSimplificationPass());
2129
Nick Lewycky422094c2009-09-13 21:38:54 +00002130 OurFPM.doInitialization();
2131
Reid Kleckner60130f02009-08-26 20:58:25 +00002132 // Set the global so the code gen can use this.
2133 TheFPM = &amp;OurFPM;
2134
2135 // Run the main "interpreter loop" now.
2136 MainLoop();
2137
2138 TheFPM = 0;
2139
2140 // Print out all of the generated code.
2141 TheModule-&gt;dump();
2142
Chris Lattner62a709d2007-11-05 00:23:57 +00002143 return 0;
2144}
Chris Lattner00c992d2007-11-03 08:55:29 +00002145</pre>
2146</div>
2147
Chris Lattner729eb142008-02-10 19:11:04 +00002148<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
Chris Lattner00c992d2007-11-03 08:55:29 +00002149</div>
2150
2151<!-- *********************************************************************** -->
2152<hr>
2153<address>
2154 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2155 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
2156 <a href="http://validator.w3.org/check/referer"><img
2157 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
2158
2159 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
NAKAMURA Takumib9a33632011-04-09 02:13:37 +00002160 <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
Dan Gohman523e3922010-02-03 17:27:31 +00002161 Last modified: $Date$
Chris Lattner00c992d2007-11-03 08:55:29 +00002162</address>
2163</body>
2164</html>