<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
  <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
         construction</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <link rel="stylesheet" href="../llvm.css" type="text/css">
</head>

<body>

<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>

<ul>
<li><a href="index.html">Up to Tutorial Index</a></li>
<li>Chapter 7
  <ol>
    <li><a href="#intro">Chapter 7 Introduction</a></li>
    <li><a href="#why">Why is this a hard problem?</a></li>
    <li><a href="#memory">Memory in LLVM</a></li>
    <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
    <li><a href="#adjustments">Adjusting Existing Variables for
     Mutation</a></li>
    <li><a href="#assignment">New Assignment Operator</a></li>
    <li><a href="#localvars">User-defined Local Variables</a></li>
    <li><a href="#code">Full Code Listing</a></li>
  </ol>
</li>
<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
 tidbits</li>
</ul>

<div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
respectable, albeit simple, <a
href="http://en.wikipedia.org/wiki/Functional_programming">functional
programming language</a>. In our journey, we learned some parsing techniques,
how to build and represent an AST, how to build LLVM IR, and how to optimize
the resultant code as well as JIT compile it.</p>

<p>While Kaleidoscope is interesting as a functional language, the fact that it
is functional makes it "too easy" to generate LLVM IR for it. In particular, a
functional language makes it very easy to build LLVM IR directly in <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
Since LLVM requires that the input code be in SSA form, this is a very nice
property, and it is often unclear to newcomers how to generate code for an
imperative language with mutable variables.</p>

<p>The short (and happy) summary of this chapter is that there is no need for
your front-end to build SSA form: LLVM provides highly tuned and well tested
support for this, though the way it works is a bit unexpected for some.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
To understand why mutable variables cause complexities in SSA construction,
consider this extremely simple C example:
</p>

<div class="doc_code">
<pre>
int G, H;
int test(_Bool Condition) {
  int X;
  if (Condition)
    X = G;
  else
    X = H;
  return X;
}
</pre>
</div>

<p>In this case, we have the variable "X", whose value depends on the path
executed in the program. Because there are two different possible values for X
before the return instruction, a PHI node is inserted to merge the two values.
The LLVM IR that we want for this example looks like this:</p>

<div class="doc_code">
<pre>
@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.2
}
</pre>
</div>

<p>In this example, the loads from the G and H global variables are explicit in
the LLVM IR, and they live in the then/else branches of the if statement
(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
in the cond_next block selects the right value to use based on where control
flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1. Alternatively, if control flow comes from cond_true, it gets
the value of X.0. The intent of this chapter is not to explain the details of
SSA form. For more information, see one of the many <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
references</a>.</p>
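<p>If it helps to see the phi semantics executed, the following C++ sketch models them directly. It is not LLVM code: <tt>PhiNode</tt> and <tt>simulateTest</tt> are hypothetical names, and block labels are plain strings. The modeled phi simply holds (value, predecessor) pairs, like the operands of <tt>%X.2</tt> above, and picks the pair for the block that control actually came from.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an SSA phi node (not LLVM's PHINode class): each
// incoming entry pairs a value with the predecessor block it arrives from,
// mirroring "phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]".
struct PhiNode {
  std::vector<std::pair<int, std::string>> Incoming;

  // Return the value associated with the block control arrived from.
  int select(const std::string &PredBlock) const {
    for (const auto &Entry : Incoming)
      if (Entry.second == PredBlock)
        return Entry.first;
    throw std::invalid_argument("no incoming value for block " + PredBlock);
  }
};

// Simulates @test from the listing: each branch loads one global, and the
// phi in cond_next picks the value from whichever block branched to it.
int simulateTest(bool Condition, int G, int H) {
  int X0 = 0, X1 = 0;  // only the value on the taken path is meaningful
  std::string Pred;
  if (Condition) {
    X0 = G;  // cond_true:  %X.0 = load i32* @G
    Pred = "cond_true";
  } else {
    X1 = H;  // cond_false: %X.1 = load i32* @H
    Pred = "cond_false";
  }
  PhiNode X2;  // cond_next: %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  X2.Incoming = {{X1, "cond_false"}, {X0, "cond_true"}};
  return X2.select(Pred);
}
```

<p>For example, <tt>simulateTest(false, 7, 9)</tt> arrives from <tt>cond_false</tt> and so yields 9, the value loaded from H.</p>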

<p>The question for this article is "who places the phi nodes when lowering
assignments to mutable variables?". The issue here is that LLVM
<em>requires</em> that its IR be in SSA form: there is no "non-SSA" mode for it.
However, SSA construction requires non-trivial algorithms and data structures,
so it is inconvenient and wasteful for every front-end to have to reproduce this
logic.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>The 'trick' here is that while LLVM does require all register values to be
in SSA form, it does not require (or permit) memory objects to be in SSA form.
In the example above, note that the loads from G and H are direct accesses to
G and H: they are not renamed or versioned. This differs from some other
compiler systems, which do try to version memory objects. In LLVM, dataflow
analysis of memory is not encoded into the LLVM IR; instead, it is handled with <a
href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
demand.</p>

<p>
With this in mind, the high-level idea is that we want to make a stack variable
(which lives in memory, because it is on the stack) for each mutable object in
a function. To take advantage of this trick, we need to talk about how LLVM
represents stack variables.
</p>

<p>In LLVM, all memory accesses are explicit with load/store instructions, and
it is carefully designed not to have (or need) an "address-of" operator. Notice
how the type of the @G/@H global variables is actually "i32*" even though the
variable is defined as "i32". What this means is that @G defines <em>space</em>
for an i32 in the global data area, but its <em>name</em> actually refers to the
address for that space. Stack variables work the same way, except that instead of
being declared with global variable definitions, they are declared with the
<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>

<div class="doc_code">
<pre>
define i32 @example() {
entry:
  %X = alloca i32           ; type of %X is i32*.
  ...
  %tmp = load i32* %X       ; load the stack value %X from the stack.
  %tmp2 = add i32 %tmp, 1   ; increment it
  store i32 %tmp2, i32* %X  ; store it back
  ...
</pre>
</div>

<p>This code shows an example of how you can declare and manipulate a stack
variable in the LLVM IR. Stack memory allocated with the alloca instruction is
fully general: you can pass the address of the stack slot to functions, you can
store it in other variables, etc. Using the alloca technique, we could rewrite
the example above to avoid the PHI node entirely:</p>

<div class="doc_code">
<pre>
@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  %X = alloca i32           ; type of %X is i32*.
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  store i32 %X.0, i32* %X   ; Update X
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  store i32 %X.1, i32* %X   ; Update X
  br label %cond_next

cond_next:
  %X.2 = load i32* %X       ; Read X
  ret i32 %X.2
}
</pre>
</div>

<p>With this, we have discovered a way to handle arbitrary mutable variables
without the need to create Phi nodes at all:</p>

<ol>
<li>Each mutable variable becomes a stack allocation.</li>
<li>Each read of the variable becomes a load from the stack.</li>
<li>Each update of the variable becomes a store to the stack.</li>
<li>Taking the address of a variable just uses the stack address directly.</li>
</ol>
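<p>As a rough illustration of these four rules, here is a hypothetical C++ helper (<tt>MutableVarLowering</tt> is not part of LLVM or of Kaleidoscope) that emits textual IR in the same style as the listings above. Note that it never has to decide where a phi goes: every read and write is just a load or store through the variable's stack slot.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API) that applies the four rules above,
// emitting textual IR in the same syntax as the listings in this chapter.
class MutableVarLowering {
  std::vector<std::string> Entry;            // allocas (rule 1)
  std::vector<std::string> Body;             // loads and stores
  std::map<std::string, std::string> Slots;  // variable -> stack slot
  unsigned NextTmp = 0;

public:
  // Rule 1: each mutable variable becomes a stack allocation.
  void declare(const std::string &Var) {
    Slots[Var] = "%" + Var;
    Entry.push_back("%" + Var + " = alloca i32");
  }
  // Rule 4: the variable's address is simply its stack slot.
  const std::string &addressOf(const std::string &Var) const {
    return Slots.at(Var);
  }
  // Rule 2: each read becomes a load from the slot.
  std::string read(const std::string &Var) {
    std::string Tmp = "%tmp" + std::to_string(NextTmp++);
    Body.push_back(Tmp + " = load i32* " + addressOf(Var));
    return Tmp;
  }
  // Rule 3: each update becomes a store to the slot.
  void write(const std::string &Var, const std::string &Value) {
    Body.push_back("store i32 " + Value + ", i32* " + addressOf(Var));
  }
  // Allocas first, then the loads and stores in emission order.
  std::vector<std::string> instructions() const {
    std::vector<std::string> All = Entry;
    All.insert(All.end(), Body.begin(), Body.end());
    return All;
  }
};
```

<p>Declaring <tt>X</tt>, writing 1 to it, and then reading it yields <tt>%X = alloca i32</tt>, <tt>store i32 1, i32* %X</tt>, and <tt>%tmp0 = load i32* %X</tt>, in that order.</p>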

<p>This solves our immediate problem, but it introduces another one: we now
have a lot of stack traffic for very simple and common operations, which is a
major performance problem. Fortunately for us, the LLVM optimizer has a
highly-tuned optimization pass named "mem2reg" that handles this case, promoting
allocas like this into SSA registers and inserting Phi nodes as appropriate. If
you run this example through the pass, for example, you'll get:</p>

<div class="doc_code">
<pre>
$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
@G = weak global i32 0
@H = weak global i32 0

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.01
}
</pre>
</div>
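<p>To see why mem2reg puts the phi in <tt>cond_next</tt>, the following C++ sketch computes the "iterated dominance frontier" of the blocks that store to a variable, which is where minimal SSA form needs phi nodes. It is a textbook formulation (iterative dominator sets plus the dominance-frontier walk described by Cooper, Harvey, and Kennedy), not the code mem2reg actually uses; the real pass also applies heuristics to avoid inserting unneeded phi nodes. Blocks are numbered by assumption, with 0 as the entry block, and all function names are hypothetical.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

using Blocks = std::set<int>;
using PredList = std::vector<std::vector<int>>;  // Preds[b] = predecessors of b

// Dominator sets via the classic iterative data-flow equations, with block 0
// as the entry: Dom(0) = {0}, Dom(b) = {b} + intersection of Dom(p) over preds.
std::vector<Blocks> dominators(const PredList &Preds) {
  const int N = static_cast<int>(Preds.size());
  Blocks All;
  for (int B = 0; B < N; ++B) All.insert(B);
  std::vector<Blocks> Dom(N, All);
  Dom[0] = {0};
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int B = 1; B < N; ++B) {
      Blocks New = All;
      for (int P : Preds[B]) {
        Blocks Common;
        for (int D : New)
          if (Dom[P].count(D)) Common.insert(D);
        New = Common;
      }
      New.insert(B);
      if (New != Dom[B]) { Dom[B] = New; Changed = true; }
    }
  }
  return Dom;
}

// A block's strict dominators form a chain; the immediate dominator is the
// one closest to the block, i.e. the one with the largest dominator set.
std::vector<int> immediateDominators(const std::vector<Blocks> &Dom) {
  std::vector<int> IDom(Dom.size(), -1);  // the entry block has no idom
  for (int B = 1; B < static_cast<int>(Dom.size()); ++B)
    for (int D : Dom[B])
      if (D != B && (IDom[B] == -1 || Dom[D].size() > Dom[IDom[B]].size()))
        IDom[B] = D;
  return IDom;
}

// Dominance frontiers: from each predecessor of a join block, walk up the
// dominator tree until reaching the join block's idom. Assumes the entry
// block has no predecessors, as LLVM requires.
std::vector<Blocks> dominanceFrontiers(const PredList &Preds,
                                       const std::vector<int> &IDom) {
  std::vector<Blocks> DF(Preds.size());
  for (int B = 0; B < static_cast<int>(Preds.size()); ++B) {
    if (Preds[B].size() < 2) continue;  // only join points are frontiers
    for (int Runner : Preds[B])
      for (; Runner != IDom[B]; Runner = IDom[Runner])
        DF[Runner].insert(B);
  }
  return DF;
}

// Iterated dominance frontier of the storing blocks. A phi is itself a new
// definition, so the frontier of each phi block is processed as well.
Blocks phiBlocks(const std::vector<Blocks> &DF, const Blocks &DefBlocks) {
  Blocks Phis, Queued = DefBlocks;
  std::vector<int> Work(DefBlocks.begin(), DefBlocks.end());
  while (!Work.empty()) {
    int X = Work.back();
    Work.pop_back();
    for (int Y : DF[X])
      if (Phis.insert(Y).second && Queued.insert(Y).second)
        Work.push_back(Y);
  }
  return Phis;
}
```

<p>For the <tt>@test</tt> function, numbering entry, cond_true, cond_false, and cond_next as 0 to 3 and passing the two storing blocks {1, 2} yields {3}, i.e. cond_next. For a simple loop whose variable is stored in the entry block and in the loop body, the same computation places the phi in the loop header.</p>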

<p>The mem2reg pass implements the standard "iterated dominance frontier"
algorithm for constructing SSA form and has a number of optimizations that speed
up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
with mutable variables, and we highly recommend that you depend on it. Note that
mem2reg only works on variables in certain circumstances:</p>

<ol>
<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
promotes them. It does not apply to global variables or heap allocations.</li>

<li>mem2reg only looks for alloca instructions in the entry block of the
function. Being in the entry block guarantees that the alloca is only executed
once, which makes analysis simpler.</li>

<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
the address of the stack object is passed to a function, or if any funny pointer
arithmetic is involved, the alloca will not be promoted.</li>

<li>mem2reg only works on allocas of <a
href="../LangRef.html#t_classifications">first class</a>
values (such as pointers, scalars and vectors), and only if the array size
of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
promoting structs or arrays to registers. Note that the "scalarrepl" pass is
more powerful and can promote structs, "unions", and arrays in many cases.</li>

</ol>
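<p>The conditions above can be summarized as a predicate. The sketch below is a deliberately simplified model: <tt>AllocaInfo</tt>, <tt>UseKind</tt>, and <tt>isPromotable</tt> are hypothetical, and the real pass inspects the <tt>AllocaInst</tt> and its users directly. The one subtle case is a store: storing <em>into</em> the alloca is fine, but storing the alloca's <em>address</em> somewhere lets it escape.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of the ways an alloca's address is used.
enum class UseKind {
  Load,         // load from the slot
  StoreTo,      // store into the slot (the alloca is the pointer operand)
  StoreOfAddr,  // the slot's address is the value being stored: it escapes
  CallArg,      // the slot's address is passed to a function: it escapes
  PointerArith  // the address feeds pointer arithmetic
};

struct AllocaInfo {
  bool InEntryBlock;    // condition 2
  bool FirstClassType;  // condition 4: pointer, scalar, or vector
  unsigned ArraySize;   // condition 4: must be 1
  std::vector<UseKind> Uses;
};

// Condition 1 (it is an alloca at all) is implied by the input type;
// conditions 2 to 4 are checked here.
bool isPromotable(const AllocaInfo &A) {
  if (!A.InEntryBlock || !A.FirstClassType || A.ArraySize != 1)
    return false;
  for (UseKind U : A.Uses)
    if (U != UseKind::Load && U != UseKind::StoreTo)
      return false;  // condition 3: only direct loads and stores are allowed
  return true;
}
```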

<p>
All of these properties are easy to satisfy for most imperative languages, and
we'll illustrate them below with Kaleidoscope. The final question you may be
asking is: should I bother with this nonsense for my front-end? Wouldn't it be
better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass? In short, we strongly recommend that you use this technique
for building SSA form, unless there is an extremely good reason not to. Using
this technique is:</p>

<ul>
<li>Proven and well tested: llvm-gcc and clang both use this technique for local
mutable variables. As such, the most common clients of LLVM are using this to
handle the bulk of their variables. You can be sure that bugs are found fast and
fixed early.</li>

<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
common cases as well as fully general. For example, it has fast-paths for
variables that are only used in a single block, variables that only have one
assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
</li>

<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
Debug information in LLVM</a> relies on having the address of the variable
exposed so that debug info can be attached to it. This technique dovetails
very naturally with this style of debug info.</li>
</ul>
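<p>The single-block fast path mentioned above is easy to picture. In the hypothetical sketch below (an illustration of the idea, not mem2reg's implementation), a block is walked once in order: each store records the current value of its slot, each load is replaced by that value, and the alloca, stores, and loads disappear. It assumes that every load is preceded by a store to the same slot in the same block and that the slots have no other uses.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical single-block instruction model (not LLVM's classes).
struct Inst {
  enum Kind { Alloca, Store, Load, Other } K;
  std::string Dest;              // name defined by Alloca, Load, or Other
  std::vector<std::string> Ops;  // Store: {value, slot}; Load: {slot}
};

// Walks the block once: each store records the current value of its slot,
// each load is replaced by that value, and the alloca, stores, and loads are
// dropped. Assumes every load follows a store to the same slot in this block
// and that the slots have no uses besides these loads and stores.
std::vector<Inst> promoteSingleBlock(const std::vector<Inst> &Block) {
  std::map<std::string, std::string> CurrentValue;  // slot -> value
  std::map<std::string, std::string> Replacement;   // load name -> value
  auto Resolve = [&](const std::string &Name) {
    auto It = Replacement.find(Name);
    return It == Replacement.end() ? Name : It->second;
  };
  std::vector<Inst> Out;
  for (const Inst &I : Block) {
    switch (I.K) {
    case Inst::Alloca:
      break;  // the stack slot disappears
    case Inst::Store:
      CurrentValue[I.Ops[1]] = Resolve(I.Ops[0]);
      break;
    case Inst::Load:
      Replacement[I.Dest] = CurrentValue.at(I.Ops[0]);
      break;
    case Inst::Other: {
      Inst Copy = I;
      for (std::string &Op : Copy.Ops) Op = Resolve(Op);
      Out.push_back(Copy);
      break;
    }
    }
  }
  return Out;
}
```

<p>When a slot is stored twice, each load sees the value of the most recent store, which is exactly the renaming that SSA form requires.</p>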

<p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with mutable
variables now!
</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="kalvars">Mutable Variables in
Kaleidoscope</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>Now that we know the sort of problem we want to tackle, let's see what this
looks like in the context of our little Kaleidoscope language. We're going to
add two features:</p>

<ol>
<li>The ability to mutate variables with the '=' operator.</li>
<li>The ability to define new variables.</li>
</ol>

<p>While the first item is really what this is about, we only have variables
for incoming arguments as well as for induction variables, and redefining those only
goes so far :). Also, the ability to define new variables is a
useful thing regardless of whether you will be mutating them. Here's a
motivating example that shows how we could use these:</p>

<div class="doc_code">
<pre>
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

# Recursive fib, we could do this before.
def fib(x)
  if (x &lt; 3) then
    1
  else
    fib(x-1)+fib(x-2);

# Iterative fib.
def fibi(x)
  <b>var a = 1, b = 1, c in</b>
  (for i = 3, i &lt; x in
     <b>c = a + b</b> :
     <b>a = b</b> :
     <b>b = c</b>) :
  b;

# Call it.
fibi(10);
</pre>
</div>

<p>
In order to mutate variables, we have to change our existing variables to use
the "alloca trick". Once we have that, we'll add our new operator, then extend
Kaleidoscope to support new variable definitions.
</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
Mutation</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
The symbol table in Kaleidoscope is managed at code generation time by the
'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
that holds the double value for the named variable. In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
the <em>memory location</em> of the variable in question. Note that this
change is a refactoring: it changes the structure of the code, but does not
(by itself) change the behavior of the compiler. All of these changes are
isolated in the Kaleidoscope code generator.</p>

<p>
At this point in Kaleidoscope's development, it only supports variables for two
things: incoming arguments to functions and the induction variable of 'for'
loops. For consistency, we'll allow mutation of these variables in addition to
other user-defined variables. This means that these will both need memory
locations.
</p>

<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++
compiler will tell us what parts of the code we need to update:</p>

<div class="doc_code">
<pre>
static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
</pre>
</div>

<p>Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of the
function:</p>

<div class="doc_code">
<pre>
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function.  This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const std::string &amp;VarName) {
  IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
                 TheFunction-&gt;getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
                           VarName.c_str());
}
</pre>
</div>

<p>This funny-looking code creates an IRBuilder object that is pointing at
the first instruction (.begin()) of the entry block. It then creates an alloca
with the expected name and returns it. Because all values in Kaleidoscope are
doubles, there is no need to pass in a type to use.</p>
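<p>The effect of inserting at <tt>.begin()</tt> can be shown without LLVM. In the hypothetical C++ model below, a function is just a list of blocks of instruction strings, and <tt>createEntryBlockAlloca</tt> mimics the helper above by always inserting at the front of block 0, no matter which block code is currently being emitted into.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function: Blocks[0] is the entry block and each
// block is a list of instruction strings.
struct ToyFunction {
  std::vector<std::vector<std::string>> Blocks;
};

// Mimics CreateEntryBlockAlloca: the alloca is inserted at the start
// (.begin()) of the entry block, independent of where other code is being
// emitted, so it executes exactly once per call of the function.
void createEntryBlockAlloca(ToyFunction &F, const std::string &VarName) {
  std::vector<std::string> &Entry = F.Blocks.at(0);
  Entry.insert(Entry.begin(), "%" + VarName + " = alloca double");
}
```

<p>One consequence worth knowing: because every call inserts at the very front, a second alloca lands <em>before</em> the first. The order of allocas in the entry block does not matter to mem2reg.</p>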

<p>With this in place, the first functionality change we want to make is to
variable references. In our new scheme, variables live on the stack, so code
generating a reference to them actually needs to produce a load from the stack
slot:</p>

<div class="doc_code">
<pre>
Value *VariableExprAST::Codegen() {
  // Look this variable up in the function.
  Value *V = NamedValues[Name];
  if (V == 0) return ErrorV("Unknown variable name");

  <b>// Load the value.
  return Builder.CreateLoad(V, Name.c_str());</b>
}
</pre>
</div>

<p>As you can see, this is pretty straightforward. Now we need to update the
things that define the variables to set up the alloca. We'll start with
<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
the unabridged code):</p>

<div class="doc_code">
<pre>
  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  <b>// Create an alloca for the variable in the entry block.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>

  // Emit the start code first, without 'variable' in scope.
  Value *StartVal = Start-&gt;Codegen();
  if (StartVal == 0) return 0;

  <b>// Store the value into the alloca.
  Builder.CreateStore(StartVal, Alloca);</b>
  ...

  // Compute the end condition.
  Value *EndCond = End-&gt;Codegen();
  if (EndCond == 0) return EndCond;

  <b>// Reload, increment, and restore the alloca.  This handles the case where
  // the body of the loop mutates the variable.
  Value *CurVar = Builder.CreateLoad(Alloca);
  Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
  Builder.CreateStore(NextVar, Alloca);</b>
  ...
</pre>
</div>

<p>This code is virtually identical to the code <a
href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
big difference is that we no longer have to construct a PHI node, and we use
load/store to access the variable as needed.</p>
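<p>The ordering in this code has a visible effect on the language's semantics. The hypothetical C++ model below (<tt>runFor</tt> is an illustration, not tutorial code) keeps the induction variable in a single memory slot, standing in for the alloca. It assumes, as in the Chapter 5 version of this code, that the loop body is emitted in the elided part before the end condition; the end condition is then evaluated before the reload, increment, and store.</p>

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of the 'for' lowering above: Slot stands in for the
// alloca that holds the induction variable. Returns the value the variable
// has at the start of each iteration, so the effect of mutation is visible.
std::vector<double> runFor(double Start, double Step,
                           const std::function<bool(double)> &EndCond,
                           const std::function<void(double &)> &Body) {
  std::vector<double> Trace;
  double Slot = Start;           // store StartVal into the alloca
  while (true) {
    Trace.push_back(Slot);
    Body(Slot);                  // the body may read or write the slot
    bool Again = EndCond(Slot);  // end condition, before the increment
    double CurVar = Slot;        // reload (sees any store made by the body)
    Slot = CurVar + Step;        // increment and store back
    if (!Again) break;           // conditional branch back to the loop
  }
  return Trace;
}
```

<p>With a start of 1, a step of 1, and the condition <tt>i &lt; 6</tt>, a body that executes <tt>i = i + 1</tt> sees the values 1, 3, and 5: each increment builds on the value the body stored, which is the case the reload is there to handle.</p>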

<p>To support mutable argument variables, we need to also make allocas for them.
The code for this is also pretty simple:</p>

<div class="doc_code">
<pre>
/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
  Function::arg_iterator AI = F-&gt;arg_begin();
  for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

    // Store the initial value into the alloca.
    Builder.CreateStore(AI, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = Alloca;
  }
}
</pre>
</div>

<p>For each argument, we make an alloca, store the input value to the function
into the alloca, and register the alloca as the memory location for the
argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
it sets up the entry block for the function.</p>
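<p>Here is a text-level C++ sketch of the same steps. Everything in it is hypothetical: <tt>NameTable</tt> is a simplified model of how names are made unique within a function (a counter shared by the whole function is appended when a name is already taken), which is why the slot for argument <tt>x</tt> comes out as <tt>%x1</tt>, as in the unoptimized <tt>fib</tt> listing later in this chapter. Allocas go to the front of the entry block, like <tt>CreateEntryBlockAlloca</tt>, while stores are appended where the main builder is.</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of per-function name uniquing: when a requested name is
// taken, a counter shared by the whole function is appended until it is free.
class NameTable {
  std::set<std::string> Used;
  unsigned LastUnique = 0;

public:
  std::string unique(const std::string &Base) {
    std::string Name = Base;
    while (!Used.insert(Name).second)
      Name = Base + std::to_string(++LastUnique);
    return Name;
  }
};

// Text-level sketch of CreateArgumentAllocas. Each alloca goes to the start
// of the entry block, each store is appended at the main builder's position,
// and NamedValues maps the source-level name to its stack slot.
std::vector<std::string> createArgumentAllocas(
    const std::vector<std::string> &Args, NameTable &Names,
    std::map<std::string, std::string> &NamedValues) {
  for (const std::string &A : Args)
    Names.unique(A);  // the incoming arguments already own their names
  std::vector<std::string> Allocas, Stores;
  for (const std::string &A : Args) {
    std::string Slot = "%" + Names.unique(A);
    Allocas.insert(Allocas.begin(), Slot + " = alloca double");
    Stores.push_back("store double %" + A + ", double* " + Slot);
    NamedValues[A] = Slot;
  }
  Allocas.insert(Allocas.end(), Stores.begin(), Stores.end());
  return Allocas;
}
```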

<p>The final missing piece is adding the mem2reg pass, which allows us to get
good codegen once again:</p>

<div class="doc_code">
<pre>
    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
    <b>// Promote allocas to registers.
    OurFPM.add(createPromoteMemoryToRegisterPass());</b>
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
</pre>
</div>

<p>It is interesting to see what the code looks like before and after the
mem2reg optimization runs. For example, this is the before/after code for our
recursive fib function. Before the optimization:</p>

<div class="doc_code">
<pre>
define double @fib(double %x) {
entry:
  <b>%x1 = alloca double
  store double %x, double* %x1
  %x2 = load double* %x1</b>
  %cmptmp = fcmp ult double %x2, 3.000000e+00
  %booltmp = uitofp i1 %cmptmp to double
  %ifcond = fcmp one double %booltmp, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:       ; preds = %entry
  br label %ifcont

else:       ; preds = %entry
  <b>%x3 = load double* %x1</b>
  %subtmp = sub double %x3, 1.000000e+00
  %calltmp = call double @fib( double %subtmp )
  <b>%x4 = load double* %x1</b>
  %subtmp5 = sub double %x4, 2.000000e+00
  %calltmp6 = call double @fib( double %subtmp5 )
  %addtmp = add double %calltmp, %calltmp6
  br label %ifcont

ifcont:     ; preds = %else, %then
  %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
  ret double %iftmp
}
</pre>
</div>

<p>Here there is only one variable (x, the input argument) but you can still
see the extremely simple-minded code generation strategy we are using. In the
entry block, an alloca is created, and the initial input value is stored into
it. Each reference to the variable does a reload from the stack. Also, note
that we didn't modify the if/then/else expression, so it still inserts a PHI
node. While we could make an alloca for it, it is actually easier to create a
PHI node for it, so we still just make the PHI.</p>

<p>Here is the code after the mem2reg pass runs:</p>

<div class="doc_code">
<pre>
define double @fib(double %x) {
entry:
  %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
  %booltmp = uitofp i1 %cmptmp to double
  %ifcond = fcmp one double %booltmp, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:
  br label %ifcont

else:
  %subtmp = sub double <b>%x</b>, 1.000000e+00
  %calltmp = call double @fib( double %subtmp )
  %subtmp5 = sub double <b>%x</b>, 2.000000e+00
  %calltmp6 = call double @fib( double %subtmp5 )
  %addtmp = add double %calltmp, %calltmp6
  br label %ifcont

ifcont:     ; preds = %else, %then
  %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
  ret double %iftmp
}
</pre>
</div>

<p>This is a trivial case for mem2reg, since there are no redefinitions of the
variable. The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>

<p>After the rest of the optimizers run, we get:</p>

<div class="doc_code">
<pre>
define double @fib(double %x) {
entry:
  %cmptmp = fcmp ult double %x, 3.000000e+00
  %booltmp = uitofp i1 %cmptmp to double
  %ifcond = fcmp ueq double %booltmp, 0.000000e+00
  br i1 %ifcond, label %else, label %ifcont

else:
  %subtmp = sub double %x, 1.000000e+00
  %calltmp = call double @fib( double %subtmp )
  %subtmp5 = sub double %x, 2.000000e+00
  %calltmp6 = call double @fib( double %subtmp5 )
  %addtmp = add double %calltmp, %calltmp6
  ret double %addtmp

ifcont:
  ret double 1.000000e+00
}
</pre>
</div>

<p>Here we see that the simplifycfg pass decided to clone the return instruction
into the end of the 'else' block. This allowed it to eliminate some branches
and the PHI node.</p>

<p>Now that all symbol table references are updated to use stack variables,
we'll add the assignment operator.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>With our current framework, adding a new assignment operator is really
simple. We will parse it just like any other binary operator, but handle it
internally (instead of allowing the user to define it). The first step is to
set a precedence:</p>

<div class="doc_code">
<pre>
 int main() {
   // Install standard binary operators.
   // 1 is lowest precedence.
   <b>BinopPrecedence['='] = 2;</b>
   BinopPrecedence['&lt;'] = 10;
   BinopPrecedence['+'] = 20;
   BinopPrecedence['-'] = 20;
</pre>
</div>

<p>Now that the parser knows the precedence of the binary operator, it takes
care of all the parsing and AST generation. We just need to implement codegen
for the assignment operator. This looks like:</p>

<div class="doc_code">
<pre>
Value *BinaryExprAST::Codegen() {
  // Special case '=' because we don't want to emit the LHS as an expression.
  if (Op == '=') {
    // Assignment requires the LHS to be an identifier.
    VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
    if (!LHSE)
      return ErrorV("destination of '=' must be a variable");
</pre>
</div>

<p>Unlike the rest of the binary operators, our assignment operator doesn't
follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
as a special case before the other binary operators are handled. The other
strange thing is that it requires the LHS to be a variable. It is invalid to
have "(x+1) = expr" - only things like "x = expr" are allowed.
</p>

<div class="doc_code">
<pre>
    // Codegen the RHS.
    Value *Val = RHS-&gt;Codegen();
    if (Val == 0) return 0;

    // Look up the name.
    Value *Variable = NamedValues[LHSE-&gt;getName()];
    if (Variable == 0) return ErrorV("Unknown variable name");

    Builder.CreateStore(Val, Variable);
    return Val;
  }
  ...
</pre>
</div>

<p>Once we have the variable, codegen'ing the assignment is straightforward:
we emit the RHS of the assignment, create a store, and return the computed
value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
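<p>Because '=' goes through the same precedence-climbing parser as every other binary operator, it is left-associative, which is why the chained example above is written with explicit parentheses. The following standalone C++ sketch recreates that parsing loop in simplified form (single-character operands and operators, no whitespace; <tt>ToyParser</tt> is a hypothetical name) and prints the resulting grouping.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Simplified, standalone version of the precedence-climbing loop used by the
// Kaleidoscope parser: operands and operators are single characters with no
// whitespace, and the result is a fully parenthesized string instead of an AST.
class ToyParser {
  std::string Src;
  std::size_t Pos = 0;
  std::map<char, int> Prec;  // binary operator -> precedence

  // -1 for end of input or any character that is not a known operator.
  int tokPrecedence() const {
    if (Pos >= Src.size()) return -1;
    auto It = Prec.find(Src[Pos]);
    return It == Prec.end() ? -1 : It->second;
  }

  std::string parsePrimary() { return std::string(1, Src[Pos++]); }

  std::string parseBinOpRHS(int ExprPrec, std::string LHS) {
    while (true) {
      int TokPrec = tokPrecedence();
      if (TokPrec < ExprPrec) return LHS;  // operator binds too loosely
      char BinOp = Src[Pos++];
      std::string RHS = parsePrimary();
      // If the next operator binds tighter, it takes RHS as its LHS first.
      if (TokPrec < tokPrecedence())
        RHS = parseBinOpRHS(TokPrec + 1, RHS);
      LHS = "(" + LHS + BinOp + RHS + ")";  // equal precedence groups left
    }
  }

public:
  ToyParser(std::string S, std::map<char, int> P)
      : Src(std::move(S)), Prec(std::move(P)) {}
  std::string parseExpression() { return parseBinOpRHS(0, parsePrimary()); }
};
```

<p>With ':' at precedence 1 and '=' at 2, <tt>a:x=4</tt> groups as <tt>(a:(x=4))</tt>, so sequencing wraps assignments as intended. <tt>x=y=z</tt>, however, groups as <tt>((x=y)=z)</tt>, whose outer assignment has a non-variable LHS and would be rejected by the check in <tt>BinaryExprAST::Codegen</tt>.</p>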
716
717<p>Now that we have an assignment operator, we can mutate loop variables and
718arguments. For example, we can now run code like this:</p>
719
720<div class="doc_code">
721<pre>
722# Function to print a double.
723extern printd(x);
724
725# Define ':' for sequencing: as a low-precedence operator that ignores operands
726# and just returns the RHS.
727def binary : 1 (x y) y;
728
729def test(x)
730 printd(x) :
731 x = 4 :
732 printd(x);
733
734test(123);
735</pre>
736</div>
737
738<p>When run, this example prints "123" and then "4", showing that we did
739actually mutate the value! Okay, we have now officially implemented our goal:
740getting this to work requires SSA construction in the general case. However,
741to be really useful, we want the ability to define our own local variables, lets
742add this next!
743</p>
744
745</div>
746
747<!-- *********************************************************************** -->
748<div class="doc_section"><a name="localvars">User-defined Local
749Variables</a></div>
750<!-- *********************************************************************** -->
751
752<div class="doc_text">
753
<p>Adding var/in is just like any other extension we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
The first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:</p>
758
759<div class="doc_code">
760<pre>
761enum Token {
762 ...
763 <b>// var definition
764 tok_var = -13</b>
765...
766}
767...
768static int gettok() {
769...
770 if (IdentifierStr == "in") return tok_in;
771 if (IdentifierStr == "binary") return tok_binary;
772 if (IdentifierStr == "unary") return tok_unary;
773 <b>if (IdentifierStr == "var") return tok_var;</b>
774 return tok_identifier;
775...
776</pre>
777</div>
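<p>The keyword comparisons in <tt>gettok</tt> only run after a whole identifier has been read, so "var" becomes <tt>tok_var</tt> while an identifier such as "variable" stays <tt>tok_identifier</tt>. As a sketch of the same logic, here is a table-driven variant that swaps the <tt>if</tt> chain for a <tt>std::map</tt> lookup. The token values are copied from the tutorial's <tt>Token</tt> enum; <tt>classifyIdentifier</tt> is a hypothetical helper name:</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Token values copied from the tutorial's Token enum.
enum Token {
  tok_eof = -1,
  tok_def = -2, tok_extern = -3,
  tok_identifier = -4, tok_number = -5,
  tok_if = -6, tok_then = -7, tok_else = -8,
  tok_for = -9, tok_in = -10,
  tok_binary = -11, tok_unary = -12,
  tok_var = -13
};

// Hypothetical table-driven replacement for the keyword checks in gettok().
// It is only called after a complete identifier has been scanned, so a
// longer name such as "variable" never matches the "var" entry.
int classifyIdentifier(const std::string &IdentifierStr) {
  static const std::map<std::string, int> Keywords = {
      {"def", tok_def},   {"extern", tok_extern}, {"if", tok_if},
      {"then", tok_then}, {"else", tok_else},     {"for", tok_for},
      {"in", tok_in},     {"binary", tok_binary}, {"unary", tok_unary},
      {"var", tok_var}};
  auto It = Keywords.find(IdentifierStr);
  return It == Keywords.end() ? tok_identifier : It->second;
}
```

<p>A table keeps the keyword list in one place; for ten keywords the tutorial's plain <tt>if</tt> chain does the same job and is arguably easier to read.</p>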
778
<p>The next step is to define the AST node that we will construct. For var/in,
it looks like this:</p>

782<div class="doc_code">
783<pre>
784/// VarExprAST - Expression class for var/in
785class VarExprAST : public ExprAST {
786 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
787 ExprAST *Body;
788public:
789 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
790 ExprAST *body)
791 : VarNames(varnames), Body(body) {}
792
793 virtual Value *Codegen();
794};
795</pre>
796</div>
797
<p>var/in allows a list of names to be defined all at once, and each name can
optionally have an initializer value. We capture this information in the
VarNames vector, where a name without an initializer is paired with a null
<tt>ExprAST*</tt>. var/in also has a body, which is allowed to access the
variables defined by the var/in.</p>

<p>With this in place, we can define the parser pieces. The first thing we do
is add var/in as a primary expression:</p>
805
806<div class="doc_code">
807<pre>
808/// primary
809/// ::= identifierexpr
810/// ::= numberexpr
811/// ::= parenexpr
812/// ::= ifexpr
813/// ::= forexpr
814<b>/// ::= varexpr</b>
815static ExprAST *ParsePrimary() {
816 switch (CurTok) {
817 default: return Error("unknown token when expecting an expression");
818 case tok_identifier: return ParseIdentifierExpr();
819 case tok_number: return ParseNumberExpr();
820 case '(': return ParseParenExpr();
821 case tok_if: return ParseIfExpr();
822 case tok_for: return ParseForExpr();
823 <b>case tok_var: return ParseVarExpr();</b>
824 }
825}
826</pre>
827</div>
828
829<p>Next we define ParseVarExpr:</p>
830
831<div class="doc_code">
832<pre>
/// varexpr ::= 'var' identifier ('=' expression)?
//                    (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
836 getNextToken(); // eat the var.
837
838 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
839
840 // At least one variable name is required.
841 if (CurTok != tok_identifier)
842 return Error("expected identifier after var");
843</pre>
844</div>
845
<p>The next part of this code parses the list of identifier/expr pairs into the
local <tt>VarNames</tt> vector:</p>
848
849<div class="doc_code">
850<pre>
851 while (1) {
852 std::string Name = IdentifierStr;
    getNextToken();  // eat identifier.

855 // Read the optional initializer.
856 ExprAST *Init = 0;
857 if (CurTok == '=') {
858 getNextToken(); // eat the '='.
859
860 Init = ParseExpression();
861 if (Init == 0) return 0;
862 }
863
864 VarNames.push_back(std::make_pair(Name, Init));
865
866 // End of var list, exit loop.
867 if (CurTok != ',') break;
868 getNextToken(); // eat the ','.
869
870 if (CurTok != tok_identifier)
871 return Error("expected identifier list after var");
872 }
873</pre>
874</div>
875
876<p>Once all the variables are parsed, we then parse the body and create the
877AST node:</p>
878
879<div class="doc_code">
880<pre>
881 // At this point, we have to have 'in'.
882 if (CurTok != tok_in)
883 return Error("expected 'in' keyword after 'var'");
884 getNextToken(); // eat 'in'.
885
886 ExprAST *Body = ParseExpression();
887 if (Body == 0) return 0;
888
889 return new VarExprAST(VarNames, Body);
890}
891</pre>
892</div>
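<p>To experiment with this grammar outside the tutorial, the following C++17 sketch parses the same var/in header from a list of pre-split string tokens. It is a simplification under stated assumptions: the hypothetical <tt>parseVarList</tt> accepts only numeric literals as initializers (where the real parser calls <tt>ParseExpression</tt>), reports errors with exceptions instead of returning null, and represents a missing initializer with an empty <tt>std::optional</tt> instead of a null <tt>ExprAST*</tt>:</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One parsed binding: a variable name plus its optional initializer.
using Binding = std::pair<std::string, std::optional<double>>;

// Parses  'var' ident ('=' number)? (',' ident ('=' number)?)* 'in'
// from Toks starting at Pos. On success, Pos points at the first token of
// the body, mirroring how ParseVarExpr leaves CurTok on the body.
std::vector<Binding> parseVarList(const std::vector<std::string> &Toks,
                                  std::size_t &Pos) {
  auto peek = [&]() -> std::string {
    return Pos < Toks.size() ? Toks[Pos] : std::string();
  };
  auto isIdent = [](const std::string &T) {
    return !T.empty() && std::isalpha(static_cast<unsigned char>(T[0])) &&
           T != "var" && T != "in";
  };

  if (peek() != "var") throw std::runtime_error("expected 'var'");
  ++Pos;  // eat the var.

  // At least one variable name is required.
  if (!isIdent(peek()))
    throw std::runtime_error("expected identifier after var");

  std::vector<Binding> VarNames;
  while (true) {
    std::string Name = peek();
    ++Pos;  // eat identifier.

    // Read the optional initializer (a numeric literal in this sketch;
    // std::stod throws std::invalid_argument if it is not a number).
    std::optional<double> Init;
    if (peek() == "=") {
      ++Pos;  // eat the '='.
      Init = std::stod(peek());
      ++Pos;  // eat the number.
    }
    VarNames.emplace_back(Name, Init);

    // End of var list, exit loop.
    if (peek() != ",") break;
    ++Pos;  // eat the ','.
    if (!isIdent(peek()))
      throw std::runtime_error("expected identifier list after var");
  }

  // At this point, we have to have 'in'.
  if (peek() != "in")
    throw std::runtime_error("expected 'in' keyword after 'var'");
  ++Pos;  // eat 'in'.
  return VarNames;
}
```

<p>For the tokens of "var a = 1, b in a", this returns the pairs (a, 1.0) and (b, no initializer) and leaves <tt>Pos</tt> on the body token "a".</p>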
893
894<p>Now that we can parse and represent the code, we need to support emission of
895LLVM IR for it. This code starts out with:</p>
896
897<div class="doc_code">
898<pre>
899Value *VarExprAST::Codegen() {
900 std::vector&lt;AllocaInst *&gt; OldBindings;
901
902 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
903
904 // Register all variables and emit their initializer.
905 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
906 const std::string &amp;VarName = VarNames[i].first;
907 ExprAST *Init = VarNames[i].second;
908</pre>
909</div>
910
<p>This code loops over all the variables, installing them one at a time. For
each variable we put into the symbol table, we remember the previous binding
that it replaces in OldBindings.</p>
914
915<div class="doc_code">
916<pre>
917 // Emit the initializer before adding the variable to scope, this prevents
918 // the initializer from referencing the variable itself, and permits stuff
919 // like this:
920 // var a = 1 in
921 // var a = a in ... # refers to outer 'a'.
922 Value *InitVal;
923 if (Init) {
924 InitVal = Init-&gt;Codegen();
925 if (InitVal == 0) return 0;
926 } else { // If not specified, use 0.0.
    InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
  }
929
930 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
931 Builder.CreateStore(InitVal, Alloca);
932
933 // Remember the old variable binding so that we can restore the binding when
934 // we unrecurse.
935 OldBindings.push_back(NamedValues[VarName]);
936
937 // Remember this binding.
938 NamedValues[VarName] = Alloca;
939 }
940</pre>
941</div>
942
<p>There are more comments here than code. The basic idea is that we emit the
initializer, create the alloca, store the initial value into it, and then
update the symbol table to point to it. Once all the variables are installed
in the symbol table, we evaluate the body of the var/in expression:</p>
947
948<div class="doc_code">
949<pre>
950 // Codegen the body, now that all vars are in scope.
951 Value *BodyVal = Body-&gt;Codegen();
952 if (BodyVal == 0) return 0;
953</pre>
954</div>
955
956<p>Finally, before returning, we restore the previous variable bindings:</p>
957
958<div class="doc_code">
959<pre>
960 // Pop all our variables from scope.
961 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
962 NamedValues[VarNames[i].first] = OldBindings[i];
963
964 // Return the body computation.
965 return BodyVal;
966}
967</pre>
968</div>
969
970<p>The end result of all of this is that we get properly scoped variable
971definitions, and we even (trivially) allow mutation of them :).</p>
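<p>The save-and-restore scheme can be modeled without LLVM. In the C++17 sketch below, a hypothetical <tt>Env</tt> map from names to values plays the role of <tt>NamedValues</tt>, initializers and the body are passed as callbacks, and a missing initializer defaults to 0.0 as in the tutorial. Each initializer runs before its name is bound, so "var a = a in ..." sees the outer <tt>a</tt>. One deliberate difference from the code above: the sketch restores the old bindings in reverse order, so that a name declared twice in one var list still gets its outermost binding back afterwards:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value-level stand-in for NamedValues (names map to values,
// not to allocas).
using Env = std::map<std::string, double>;
using Thunk = std::function<double(Env &)>;

// Models VarExprAST::Codegen: bind each variable, evaluate the body, then
// put the previous bindings back. An empty Thunk means "no initializer".
double evalVarIn(Env &env,
                 const std::vector<std::pair<std::string, Thunk>> &vars,
                 const Thunk &body) {
  std::vector<std::optional<double>> oldBindings;
  for (const auto &[name, init] : vars) {
    // Evaluate the initializer before binding the name, so an initializer
    // that mentions the same name reads the outer binding.
    double initVal = init ? init(env) : 0.0;
    auto it = env.find(name);
    oldBindings.push_back(it == env.end() ? std::nullopt
                                          : std::optional<double>(it->second));
    env[name] = initVal;
  }

  double result = body(env);

  // Pop the bindings in reverse order; an empty optional means the name
  // was unbound before the var/in.
  for (std::size_t i = vars.size(); i-- > 0;) {
    const std::string &name = vars[i].first;
    if (oldBindings[i])
      env[name] = *oldBindings[i];
    else
      env.erase(name);
  }
  return result;
}
```

<p>For example, with <tt>env</tt> holding a = 1, evaluating "var a = a + 1, b in a + b" through this function yields 2.0 and afterwards leaves <tt>a</tt> bound to 1 and <tt>b</tt> unbound.</p>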
972
<p>With this, we have completed what we set out to do. Our nice iterative fib
example from the intro compiles and runs just fine. The mem2reg pass optimizes
all of our stack variables into SSA registers, inserting PHI nodes where needed,
and our front-end remains simple: no "iterated dominance frontier" computation
anywhere in sight.</p>
978
979</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000980
981<!-- *********************************************************************** -->
982<div class="doc_section"><a name="code">Full Code Listing</a></div>
983<!-- *********************************************************************** -->
984
985<div class="doc_text">
986
<p>
Here is the complete code listing for our running example, enhanced with mutable
variables and var/in support. To build this example, use:
</p>
991
992<div class="doc_code">
993<pre>
994 # Compile
995 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
996 # Run
997 ./toy
998</pre>
999</div>
1000
1001<p>Here is the code:</p>
1002
1003<div class="doc_code">
1004<pre>
Chris Lattner62a709d2007-11-05 00:23:57 +00001005#include "llvm/DerivedTypes.h"
1006#include "llvm/ExecutionEngine/ExecutionEngine.h"
Nick Lewycky422094c2009-09-13 21:38:54 +00001007#include "llvm/ExecutionEngine/Interpreter.h"
1008#include "llvm/ExecutionEngine/JIT.h"
Owen Andersond1fbd142009-07-08 20:50:47 +00001009#include "llvm/LLVMContext.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001010#include "llvm/Module.h"
1011#include "llvm/ModuleProvider.h"
1012#include "llvm/PassManager.h"
1013#include "llvm/Analysis/Verifier.h"
1014#include "llvm/Target/TargetData.h"
Nick Lewycky422094c2009-09-13 21:38:54 +00001015#include "llvm/Target/TargetSelect.h"
Chris Lattner62a709d2007-11-05 00:23:57 +00001016#include "llvm/Transforms/Scalar.h"
Duncan Sands89f6d882008-04-13 06:22:09 +00001017#include "llvm/Support/IRBuilder.h"
#include &lt;cassert&gt;
#include &lt;cctype&gt;
#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;string&gt;
#include &lt;map&gt;
#include &lt;vector&gt;
1022using namespace llvm;
1023
1024//===----------------------------------------------------------------------===//
1025// Lexer
1026//===----------------------------------------------------------------------===//
1027
1028// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1029// of these for known things.
1030enum Token {
1031 tok_eof = -1,
1032
1033 // commands
1034 tok_def = -2, tok_extern = -3,
1035
1036 // primary
1037 tok_identifier = -4, tok_number = -5,
1038
1039 // control
1040 tok_if = -6, tok_then = -7, tok_else = -8,
1041 tok_for = -9, tok_in = -10,
1042
1043 // operators
1044 tok_binary = -11, tok_unary = -12,
1045
1046 // var definition
1047 tok_var = -13
1048};
1049
1050static std::string IdentifierStr; // Filled in if tok_identifier
1051static double NumVal; // Filled in if tok_number
1052
1053/// gettok - Return the next token from standard input.
1054static int gettok() {
1055 static int LastChar = ' ';
1056
1057 // Skip any whitespace.
1058 while (isspace(LastChar))
1059 LastChar = getchar();
1060
1061 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1062 IdentifierStr = LastChar;
1063 while (isalnum((LastChar = getchar())))
1064 IdentifierStr += LastChar;
1065
1066 if (IdentifierStr == "def") return tok_def;
1067 if (IdentifierStr == "extern") return tok_extern;
1068 if (IdentifierStr == "if") return tok_if;
1069 if (IdentifierStr == "then") return tok_then;
1070 if (IdentifierStr == "else") return tok_else;
1071 if (IdentifierStr == "for") return tok_for;
1072 if (IdentifierStr == "in") return tok_in;
1073 if (IdentifierStr == "binary") return tok_binary;
1074 if (IdentifierStr == "unary") return tok_unary;
1075 if (IdentifierStr == "var") return tok_var;
1076 return tok_identifier;
1077 }
1078
1079 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1080 std::string NumStr;
1081 do {
1082 NumStr += LastChar;
1083 LastChar = getchar();
1084 } while (isdigit(LastChar) || LastChar == '.');
1085
1086 NumVal = strtod(NumStr.c_str(), 0);
1087 return tok_number;
1088 }
1089
1090 if (LastChar == '#') {
1091 // Comment until end of line.
1092 do LastChar = getchar();
Chris Lattnerc80c23f2007-12-02 22:46:01 +00001093 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
Chris Lattner62a709d2007-11-05 00:23:57 +00001094
1095 if (LastChar != EOF)
1096 return gettok();
1097 }
1098
1099 // Check for end of file. Don't eat the EOF.
1100 if (LastChar == EOF)
1101 return tok_eof;
1102
1103 // Otherwise, just return the character as its ascii value.
1104 int ThisChar = LastChar;
1105 LastChar = getchar();
1106 return ThisChar;
1107}
1108
1109//===----------------------------------------------------------------------===//
1110// Abstract Syntax Tree (aka Parse Tree)
1111//===----------------------------------------------------------------------===//
1112
1113/// ExprAST - Base class for all expression nodes.
1114class ExprAST {
1115public:
1116 virtual ~ExprAST() {}
1117 virtual Value *Codegen() = 0;
1118};
1119
1120/// NumberExprAST - Expression class for numeric literals like "1.0".
1121class NumberExprAST : public ExprAST {
1122 double Val;
1123public:
1124 NumberExprAST(double val) : Val(val) {}
1125 virtual Value *Codegen();
1126};
1127
1128/// VariableExprAST - Expression class for referencing a variable, like "a".
1129class VariableExprAST : public ExprAST {
1130 std::string Name;
1131public:
1132 VariableExprAST(const std::string &amp;name) : Name(name) {}
1133 const std::string &amp;getName() const { return Name; }
1134 virtual Value *Codegen();
1135};
1136
1137/// UnaryExprAST - Expression class for a unary operator.
1138class UnaryExprAST : public ExprAST {
1139 char Opcode;
1140 ExprAST *Operand;
1141public:
1142 UnaryExprAST(char opcode, ExprAST *operand)
1143 : Opcode(opcode), Operand(operand) {}
1144 virtual Value *Codegen();
1145};
1146
1147/// BinaryExprAST - Expression class for a binary operator.
1148class BinaryExprAST : public ExprAST {
1149 char Op;
1150 ExprAST *LHS, *RHS;
1151public:
1152 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1153 : Op(op), LHS(lhs), RHS(rhs) {}
1154 virtual Value *Codegen();
1155};
1156
1157/// CallExprAST - Expression class for function calls.
1158class CallExprAST : public ExprAST {
1159 std::string Callee;
1160 std::vector&lt;ExprAST*&gt; Args;
1161public:
1162 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1163 : Callee(callee), Args(args) {}
1164 virtual Value *Codegen();
1165};
1166
1167/// IfExprAST - Expression class for if/then/else.
1168class IfExprAST : public ExprAST {
1169 ExprAST *Cond, *Then, *Else;
1170public:
1171 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1172 : Cond(cond), Then(then), Else(_else) {}
1173 virtual Value *Codegen();
1174};
1175
1176/// ForExprAST - Expression class for for/in.
1177class ForExprAST : public ExprAST {
1178 std::string VarName;
1179 ExprAST *Start, *End, *Step, *Body;
1180public:
1181 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1182 ExprAST *step, ExprAST *body)
1183 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1184 virtual Value *Codegen();
1185};
1186
1187/// VarExprAST - Expression class for var/in
1188class VarExprAST : public ExprAST {
1189 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1190 ExprAST *Body;
1191public:
1192 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1193 ExprAST *body)
1194 : VarNames(varnames), Body(body) {}
1195
1196 virtual Value *Codegen();
1197};
1198
1199/// PrototypeAST - This class represents the "prototype" for a function,
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001200/// which captures its name, and its argument names (thus implicitly the number
1201/// of arguments the function takes), as well as if it is an operator.
Chris Lattner62a709d2007-11-05 00:23:57 +00001202class PrototypeAST {
1203 std::string Name;
1204 std::vector&lt;std::string&gt; Args;
1205 bool isOperator;
1206 unsigned Precedence; // Precedence if a binary op.
1207public:
1208 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1209 bool isoperator = false, unsigned prec = 0)
1210 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1211
1212 bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1213 bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1214
1215 char getOperatorName() const {
1216 assert(isUnaryOp() || isBinaryOp());
1217 return Name[Name.size()-1];
1218 }
1219
1220 unsigned getBinaryPrecedence() const { return Precedence; }
1221
1222 Function *Codegen();
1223
1224 void CreateArgumentAllocas(Function *F);
1225};
1226
1227/// FunctionAST - This class represents a function definition itself.
1228class FunctionAST {
1229 PrototypeAST *Proto;
1230 ExprAST *Body;
1231public:
1232 FunctionAST(PrototypeAST *proto, ExprAST *body)
1233 : Proto(proto), Body(body) {}
1234
1235 Function *Codegen();
1236};
1237
1238//===----------------------------------------------------------------------===//
1239// Parser
1240//===----------------------------------------------------------------------===//
1241
1242/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001243/// token the parser is looking at. getNextToken reads another token from the
Chris Lattner62a709d2007-11-05 00:23:57 +00001244/// lexer and updates CurTok with its results.
1245static int CurTok;
1246static int getNextToken() {
1247 return CurTok = gettok();
1248}
1249
1250/// BinopPrecedence - This holds the precedence for each binary operator that is
1251/// defined.
1252static std::map&lt;char, int&gt; BinopPrecedence;
1253
1254/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1255static int GetTokPrecedence() {
1256 if (!isascii(CurTok))
1257 return -1;
1258
1259 // Make sure it's a declared binop.
1260 int TokPrec = BinopPrecedence[CurTok];
1261 if (TokPrec &lt;= 0) return -1;
1262 return TokPrec;
1263}
1264
1265/// Error* - These are little helper functions for error handling.
1266ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1267PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1268FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1269
1270static ExprAST *ParseExpression();
1271
1272/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +00001273/// ::= identifier
1274/// ::= identifier '(' expression* ')'
Chris Lattner62a709d2007-11-05 00:23:57 +00001275static ExprAST *ParseIdentifierExpr() {
1276 std::string IdName = IdentifierStr;
1277
Chris Lattner20a0c802007-11-05 17:54:34 +00001278 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001279
1280 if (CurTok != '(') // Simple variable ref.
1281 return new VariableExprAST(IdName);
1282
1283 // Call.
1284 getNextToken(); // eat (
1285 std::vector&lt;ExprAST*&gt; Args;
1286 if (CurTok != ')') {
1287 while (1) {
1288 ExprAST *Arg = ParseExpression();
1289 if (!Arg) return 0;
1290 Args.push_back(Arg);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001291
Chris Lattner62a709d2007-11-05 00:23:57 +00001292 if (CurTok == ')') break;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001293
Chris Lattner62a709d2007-11-05 00:23:57 +00001294 if (CurTok != ',')
Chris Lattner6c4be9c2008-04-14 16:44:41 +00001295 return Error("Expected ')' or ',' in argument list");
Chris Lattner62a709d2007-11-05 00:23:57 +00001296 getNextToken();
1297 }
1298 }
1299
1300 // Eat the ')'.
1301 getNextToken();
1302
1303 return new CallExprAST(IdName, Args);
1304}
1305
1306/// numberexpr ::= number
1307static ExprAST *ParseNumberExpr() {
1308 ExprAST *Result = new NumberExprAST(NumVal);
1309 getNextToken(); // consume the number
1310 return Result;
1311}
1312
1313/// parenexpr ::= '(' expression ')'
1314static ExprAST *ParseParenExpr() {
1315 getNextToken(); // eat (.
1316 ExprAST *V = ParseExpression();
1317 if (!V) return 0;
1318
1319 if (CurTok != ')')
1320 return Error("expected ')'");
1321 getNextToken(); // eat ).
1322 return V;
1323}
1324
1325/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1326static ExprAST *ParseIfExpr() {
1327 getNextToken(); // eat the if.
1328
1329 // condition.
1330 ExprAST *Cond = ParseExpression();
1331 if (!Cond) return 0;
1332
1333 if (CurTok != tok_then)
1334 return Error("expected then");
1335 getNextToken(); // eat the then
1336
1337 ExprAST *Then = ParseExpression();
1338 if (Then == 0) return 0;
1339
1340 if (CurTok != tok_else)
1341 return Error("expected else");
1342
1343 getNextToken();
1344
1345 ExprAST *Else = ParseExpression();
1346 if (!Else) return 0;
1347
1348 return new IfExprAST(Cond, Then, Else);
1349}
1350
Chris Lattner20a0c802007-11-05 17:54:34 +00001351/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001352static ExprAST *ParseForExpr() {
1353 getNextToken(); // eat the for.
1354
1355 if (CurTok != tok_identifier)
1356 return Error("expected identifier after for");
1357
1358 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001359 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001360
1361 if (CurTok != '=')
1362 return Error("expected '=' after for");
1363 getNextToken(); // eat '='.
1364
1365
1366 ExprAST *Start = ParseExpression();
1367 if (Start == 0) return 0;
1368 if (CurTok != ',')
1369 return Error("expected ',' after for start value");
1370 getNextToken();
1371
1372 ExprAST *End = ParseExpression();
1373 if (End == 0) return 0;
1374
1375 // The step value is optional.
1376 ExprAST *Step = 0;
1377 if (CurTok == ',') {
1378 getNextToken();
1379 Step = ParseExpression();
1380 if (Step == 0) return 0;
1381 }
1382
1383 if (CurTok != tok_in)
1384 return Error("expected 'in' after for");
1385 getNextToken(); // eat 'in'.
1386
1387 ExprAST *Body = ParseExpression();
1388 if (Body == 0) return 0;
1389
1390 return new ForExprAST(IdName, Start, End, Step, Body);
1391}
1392
Chris Lattner20a0c802007-11-05 17:54:34 +00001393/// varexpr ::= 'var' identifier ('=' expression)?
1394// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001395static ExprAST *ParseVarExpr() {
1396 getNextToken(); // eat the var.
1397
1398 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1399
1400 // At least one variable name is required.
1401 if (CurTok != tok_identifier)
1402 return Error("expected identifier after var");
1403
1404 while (1) {
1405 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001406 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001407
1408 // Read the optional initializer.
1409 ExprAST *Init = 0;
1410 if (CurTok == '=') {
1411 getNextToken(); // eat the '='.
1412
1413 Init = ParseExpression();
1414 if (Init == 0) return 0;
1415 }
1416
1417 VarNames.push_back(std::make_pair(Name, Init));
1418
1419 // End of var list, exit loop.
1420 if (CurTok != ',') break;
1421 getNextToken(); // eat the ','.
1422
1423 if (CurTok != tok_identifier)
1424 return Error("expected identifier list after var");
1425 }
1426
1427 // At this point, we have to have 'in'.
1428 if (CurTok != tok_in)
1429 return Error("expected 'in' keyword after 'var'");
1430 getNextToken(); // eat 'in'.
1431
1432 ExprAST *Body = ParseExpression();
1433 if (Body == 0) return 0;
1434
1435 return new VarExprAST(VarNames, Body);
1436}
1437
Chris Lattner62a709d2007-11-05 00:23:57 +00001438/// primary
1439/// ::= identifierexpr
1440/// ::= numberexpr
1441/// ::= parenexpr
1442/// ::= ifexpr
1443/// ::= forexpr
1444/// ::= varexpr
1445static ExprAST *ParsePrimary() {
1446 switch (CurTok) {
1447 default: return Error("unknown token when expecting an expression");
1448 case tok_identifier: return ParseIdentifierExpr();
1449 case tok_number: return ParseNumberExpr();
1450 case '(': return ParseParenExpr();
1451 case tok_if: return ParseIfExpr();
1452 case tok_for: return ParseForExpr();
1453 case tok_var: return ParseVarExpr();
1454 }
1455}
1456
1457/// unary
1458/// ::= primary
1459/// ::= '!' unary
1460static ExprAST *ParseUnary() {
1461 // If the current token is not an operator, it must be a primary expr.
1462 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1463 return ParsePrimary();
1464
1465 // If this is a unary operator, read it.
1466 int Opc = CurTok;
1467 getNextToken();
1468 if (ExprAST *Operand = ParseUnary())
1469 return new UnaryExprAST(Opc, Operand);
1470 return 0;
1471}
1472
1473/// binoprhs
1474/// ::= ('+' unary)*
1475static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1476 // If this is a binop, find its precedence.
1477 while (1) {
1478 int TokPrec = GetTokPrecedence();
1479
1480 // If this is a binop that binds at least as tightly as the current binop,
1481 // consume it, otherwise we are done.
1482 if (TokPrec &lt; ExprPrec)
1483 return LHS;
1484
1485 // Okay, we know this is a binop.
1486 int BinOp = CurTok;
1487 getNextToken(); // eat binop
1488
1489 // Parse the unary expression after the binary operator.
1490 ExprAST *RHS = ParseUnary();
1491 if (!RHS) return 0;
1492
1493 // If BinOp binds less tightly with RHS than the operator after RHS, let
1494 // the pending operator take RHS as its LHS.
1495 int NextPrec = GetTokPrecedence();
1496 if (TokPrec &lt; NextPrec) {
1497 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1498 if (RHS == 0) return 0;
1499 }
1500
1501 // Merge LHS/RHS.
1502 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1503 }
1504}
1505
1506/// expression
1507/// ::= unary binoprhs
1508///
1509static ExprAST *ParseExpression() {
1510 ExprAST *LHS = ParseUnary();
1511 if (!LHS) return 0;
1512
1513 return ParseBinOpRHS(0, LHS);
1514}
1515
1516/// prototype
1517/// ::= id '(' id* ')'
1518/// ::= binary LETTER number? (id, id)
1519/// ::= unary LETTER (id)
1520static PrototypeAST *ParsePrototype() {
1521 std::string FnName;
1522
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001523 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
Chris Lattner62a709d2007-11-05 00:23:57 +00001524 unsigned BinaryPrecedence = 30;
1525
1526 switch (CurTok) {
1527 default:
1528 return ErrorP("Expected function name in prototype");
1529 case tok_identifier:
1530 FnName = IdentifierStr;
1531 Kind = 0;
1532 getNextToken();
1533 break;
1534 case tok_unary:
1535 getNextToken();
1536 if (!isascii(CurTok))
1537 return ErrorP("Expected unary operator");
1538 FnName = "unary";
1539 FnName += (char)CurTok;
1540 Kind = 1;
1541 getNextToken();
1542 break;
1543 case tok_binary:
1544 getNextToken();
1545 if (!isascii(CurTok))
1546 return ErrorP("Expected binary operator");
1547 FnName = "binary";
1548 FnName += (char)CurTok;
1549 Kind = 2;
1550 getNextToken();
1551
1552 // Read the precedence if present.
1553 if (CurTok == tok_number) {
1554 if (NumVal &lt; 1 || NumVal &gt; 100)
        return ErrorP("Invalid precedence: must be 1..100");
1556 BinaryPrecedence = (unsigned)NumVal;
1557 getNextToken();
1558 }
1559 break;
1560 }
1561
1562 if (CurTok != '(')
1563 return ErrorP("Expected '(' in prototype");
1564
1565 std::vector&lt;std::string&gt; ArgNames;
1566 while (getNextToken() == tok_identifier)
1567 ArgNames.push_back(IdentifierStr);
1568 if (CurTok != ')')
1569 return ErrorP("Expected ')' in prototype");
1570
1571 // success.
1572 getNextToken(); // eat ')'.
1573
1574 // Verify right number of names for operator.
1575 if (Kind &amp;&amp; ArgNames.size() != Kind)
1576 return ErrorP("Invalid number of operands for operator");
1577
1578 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1579}
1580
1581/// definition ::= 'def' prototype expression
1582static FunctionAST *ParseDefinition() {
1583 getNextToken(); // eat def.
1584 PrototypeAST *Proto = ParsePrototype();
1585 if (Proto == 0) return 0;
1586
1587 if (ExprAST *E = ParseExpression())
1588 return new FunctionAST(Proto, E);
1589 return 0;
1590}
1591
1592/// toplevelexpr ::= expression
1593static FunctionAST *ParseTopLevelExpr() {
1594 if (ExprAST *E = ParseExpression()) {
1595 // Make an anonymous proto.
1596 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1597 return new FunctionAST(Proto, E);
1598 }
1599 return 0;
1600}
1601
1602/// external ::= 'extern' prototype
1603static PrototypeAST *ParseExtern() {
1604 getNextToken(); // eat extern.
1605 return ParsePrototype();
1606}
1607
1608//===----------------------------------------------------------------------===//
1609// Code Generation
1610//===----------------------------------------------------------------------===//
1611
1612static Module *TheModule;
Owen Andersond1fbd142009-07-08 20:50:47 +00001613static IRBuilder&lt;&gt; Builder(getGlobalContext());
Chris Lattner62a709d2007-11-05 00:23:57 +00001614static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1615static FunctionPassManager *TheFPM;
1616
1617Value *ErrorV(const char *Str) { Error(Str); return 0; }
1618
1619/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1620/// the function. This is used for mutable variables etc.
1621static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1622 const std::string &amp;VarName) {
Gabor Greifd6c1ed02009-03-11 19:51:07 +00001623 IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
Duncan Sands89f6d882008-04-13 06:22:09 +00001624 TheFunction-&gt;getEntryBlock().begin());
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001625 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
1626 VarName.c_str());
Chris Lattner62a709d2007-11-05 00:23:57 +00001627}
1628
Chris Lattner62a709d2007-11-05 00:23:57 +00001629Value *NumberExprAST::Codegen() {
Owen Anderson6f83c9c2009-07-27 20:59:43 +00001630 return ConstantFP::get(getGlobalContext(), APFloat(Val));
Chris Lattner62a709d2007-11-05 00:23:57 +00001631}
1632
1633Value *VariableExprAST::Codegen() {
1634 // Look this variable up in the function.
1635 Value *V = NamedValues[Name];
1636 if (V == 0) return ErrorV("Unknown variable name");
1637
1638 // Load the value.
1639 return Builder.CreateLoad(V, Name.c_str());
1640}
1641
1642Value *UnaryExprAST::Codegen() {
1643 Value *OperandV = Operand-&gt;Codegen();
1644 if (OperandV == 0) return 0;
1645
1646 Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1647 if (F == 0)
1648 return ErrorV("Unknown unary operator");
1649
1650 return Builder.CreateCall(F, OperandV, "unop");
1651}
1652
Value *BinaryExprAST::Codegen() {
  // Special case '=' because we don't want to emit the LHS as an expression.
  if (Op == '=') {
    // Assignment requires the LHS to be an identifier.
    VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
    if (!LHSE)
      return ErrorV("destination of '=' must be a variable");
    // Codegen the RHS.
    Value *Val = RHS-&gt;Codegen();
    if (Val == 0) return 0;

    // Look up the name.
    Value *Variable = NamedValues[LHSE-&gt;getName()];
    if (Variable == 0) return ErrorV("Unknown variable name");

    Builder.CreateStore(Val, Variable);
    return Val;
  }

  Value *L = LHS-&gt;Codegen();
  Value *R = RHS-&gt;Codegen();
  if (L == 0 || R == 0) return 0;

  switch (Op) {
  case '+': return Builder.CreateAdd(L, R, "addtmp");
  case '-': return Builder.CreateSub(L, R, "subtmp");
  case '*': return Builder.CreateMul(L, R, "multmp");
  case '&lt;':
    L = Builder.CreateFCmpULT(L, R, "cmptmp");
    // Convert bool 0/1 to double 0.0 or 1.0
    return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
                                "booltmp");
  default: break;
  }

  // If it wasn't a builtin binary operator, it must be a user defined one. Emit
  // a call to it.
  Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
  assert(F &amp;&amp; "binary operator not found!");

  Value *Ops[] = { L, R };
  return Builder.CreateCall(F, Ops, Ops+2, "binop");
}

Value *CallExprAST::Codegen() {
  // Look up the name in the global module table.
  Function *CalleeF = TheModule-&gt;getFunction(Callee);
  if (CalleeF == 0)
    return ErrorV("Unknown function referenced");

  // Report an error if the number of arguments does not match.
  if (CalleeF-&gt;arg_size() != Args.size())
    return ErrorV("Incorrect # arguments passed");

  std::vector&lt;Value*&gt; ArgsV;
  for (unsigned i = 0, e = Args.size(); i != e; ++i) {
    ArgsV.push_back(Args[i]-&gt;Codegen());
    if (ArgsV.back() == 0) return 0;
  }

  return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
}

Value *IfExprAST::Codegen() {
  Value *CondV = Cond-&gt;Codegen();
  if (CondV == 0) return 0;

  // Convert condition to a bool by comparing non-equal to 0.0.
  CondV = Builder.CreateFCmpONE(CondV,
                              ConstantFP::get(getGlobalContext(), APFloat(0.0)),
                                "ifcond");

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Create blocks for the then and else cases. Insert the 'then' block at the
  // end of the function.
  BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
  BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
  BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");

  Builder.CreateCondBr(CondV, ThenBB, ElseBB);

  // Emit then value.
  Builder.SetInsertPoint(ThenBB);

  Value *ThenV = Then-&gt;Codegen();
  if (ThenV == 0) return 0;

  Builder.CreateBr(MergeBB);
  // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
  ThenBB = Builder.GetInsertBlock();

  // Emit else block.
  TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
  Builder.SetInsertPoint(ElseBB);

  Value *ElseV = Else-&gt;Codegen();
  if (ElseV == 0) return 0;

  Builder.CreateBr(MergeBB);
  // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
  ElseBB = Builder.GetInsertBlock();

  // Emit merge block.
  TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
  Builder.SetInsertPoint(MergeBB);
  PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
                                  "iftmp");

  PN-&gt;addIncoming(ThenV, ThenBB);
  PN-&gt;addIncoming(ElseV, ElseBB);
  return PN;
}

Value *ForExprAST::Codegen() {
  // Output this as:
  //   var = alloca double
  //   ...
  //   start = startexpr
  //   store start -&gt; var
  //   goto loop
  // loop:
  //   ...
  //   bodyexpr
  //   ...
  // loopend:
  //   step = stepexpr
  //   endcond = endexpr
  //
  //   curvar = load var
  //   nextvar = curvar + step
  //   store nextvar -&gt; var
  //   br endcond, loop, afterloop
  // afterloop:

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Create an alloca for the variable in the entry block.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);

  // Emit the start code first, without 'variable' in scope.
  Value *StartVal = Start-&gt;Codegen();
  if (StartVal == 0) return 0;

  // Store the value into the alloca.
  Builder.CreateStore(StartVal, Alloca);

  // Make the new basic block for the loop header, inserting after current
  // block.
  BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);

  // Insert an explicit fall through from the current block to the LoopBB.
  Builder.CreateBr(LoopBB);

  // Start insertion in LoopBB.
  Builder.SetInsertPoint(LoopBB);

  // Within the loop, the variable is bound to its alloca. If it shadows an
  // existing variable, we have to restore it, so save it now.
  AllocaInst *OldVal = NamedValues[VarName];
  NamedValues[VarName] = Alloca;

  // Emit the body of the loop. This, like any other expr, can change the
  // current BB. Note that we ignore the value computed by the body, but don't
  // allow an error.
  if (Body-&gt;Codegen() == 0)
    return 0;

  // Emit the step value.
  Value *StepVal;
  if (Step) {
    StepVal = Step-&gt;Codegen();
    if (StepVal == 0) return 0;
  } else {
    // If not specified, use 1.0.
    StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
  }

  // Compute the end condition.
  Value *EndCond = End-&gt;Codegen();
  if (EndCond == 0) return EndCond;

  // Reload, increment, and restore the alloca. This handles the case where
  // the body of the loop mutates the variable.
  Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
  Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
  Builder.CreateStore(NextVar, Alloca);

  // Convert condition to a bool by comparing non-equal to 0.0.
  EndCond = Builder.CreateFCmpONE(EndCond,
                              ConstantFP::get(getGlobalContext(), APFloat(0.0)),
                                  "loopcond");

  // Create the "after loop" block and insert it.
  BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);

  // Insert the conditional branch into the end of the current block.
  Builder.CreateCondBr(EndCond, LoopBB, AfterBB);

  // Any new code will be inserted in AfterBB.
  Builder.SetInsertPoint(AfterBB);

  // Restore the unshadowed variable.
  if (OldVal)
    NamedValues[VarName] = OldVal;
  else
    NamedValues.erase(VarName);

  // for expr always returns 0.0.
  return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
}

Value *VarExprAST::Codegen() {
  std::vector&lt;AllocaInst *&gt; OldBindings;

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Register all variables and emit their initializer.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    const std::string &amp;VarName = VarNames[i].first;
    ExprAST *Init = VarNames[i].second;

    // Emit the initializer before adding the variable to scope; this prevents
    // the initializer from referencing the variable itself, and permits stuff
    // like this:
    //  var a = 1 in
    //    var a = a in ...   # refers to outer 'a'.
    Value *InitVal;
    if (Init) {
      InitVal = Init-&gt;Codegen();
      if (InitVal == 0) return 0;
    } else { // If not specified, use 0.0.
      InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
    }

    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    Builder.CreateStore(InitVal, Alloca);

    // Remember the old variable binding so that we can restore the binding when
    // we unrecurse.
    OldBindings.push_back(NamedValues[VarName]);

    // Remember this binding.
    NamedValues[VarName] = Alloca;
  }

  // Codegen the body, now that all vars are in scope.
  Value *BodyVal = Body-&gt;Codegen();
  if (BodyVal == 0) return 0;

  // Pop all our variables from scope.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    NamedValues[VarNames[i].first] = OldBindings[i];

  // Return the body computation.
  return BodyVal;
}

Function *PrototypeAST::Codegen() {
  // Make the function type: double(double,double) etc.
  std::vector&lt;const Type*&gt; Doubles(Args.size(),
                                   Type::getDoubleTy(getGlobalContext()));
  FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
                                       Doubles, false);

  Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);

  // If F conflicted, there was already something named 'Name'. If it has a
  // body, don't allow redefinition or reextern.
  if (F-&gt;getName() != Name) {
    // Delete the one we just made and get the existing one.
    F-&gt;eraseFromParent();
    F = TheModule-&gt;getFunction(Name);

    // If F already has a body, reject this.
    if (!F-&gt;empty()) {
      ErrorF("redefinition of function");
      return 0;
    }

    // If F took a different number of args, reject.
    if (F-&gt;arg_size() != Args.size()) {
      ErrorF("redefinition of function with different # args");
      return 0;
    }
  }

  // Set names for all arguments.
  unsigned Idx = 0;
  for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
       ++AI, ++Idx)
    AI-&gt;setName(Args[Idx]);

  return F;
}

/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
  Function::arg_iterator AI = F-&gt;arg_begin();
  for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

    // Store the initial value into the alloca.
    Builder.CreateStore(AI, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = Alloca;
  }
}

Function *FunctionAST::Codegen() {
  NamedValues.clear();

  Function *TheFunction = Proto-&gt;Codegen();
  if (TheFunction == 0)
    return 0;

  // If this is an operator, install it.
  if (Proto-&gt;isBinaryOp())
    BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();

  // Create a new basic block to start insertion into.
  BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
  Builder.SetInsertPoint(BB);

  // Add all arguments to the symbol table and create their allocas.
  Proto-&gt;CreateArgumentAllocas(TheFunction);

  if (Value *RetVal = Body-&gt;Codegen()) {
    // Finish off the function.
    Builder.CreateRet(RetVal);

    // Validate the generated code, checking for consistency.
    verifyFunction(*TheFunction);

    // Optimize the function.
    TheFPM-&gt;run(*TheFunction);

    return TheFunction;
  }

  // Error reading body, remove function.
  TheFunction-&gt;eraseFromParent();

  if (Proto-&gt;isBinaryOp())
    BinopPrecedence.erase(Proto-&gt;getOperatorName());
  return 0;
}

//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//

static ExecutionEngine *TheExecutionEngine;

static void HandleDefinition() {
  if (FunctionAST *F = ParseDefinition()) {
    if (Function *LF = F-&gt;Codegen()) {
      fprintf(stderr, "Read function definition:");
      LF-&gt;dump();
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

static void HandleExtern() {
  if (PrototypeAST *P = ParseExtern()) {
    if (Function *F = P-&gt;Codegen()) {
      fprintf(stderr, "Read extern: ");
      F-&gt;dump();
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

static void HandleTopLevelExpression() {
  // Evaluate a top-level expression into an anonymous function.
  if (FunctionAST *F = ParseTopLevelExpr()) {
    if (Function *LF = F-&gt;Codegen()) {
      // JIT the function, returning a function pointer.
      void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);

      // Cast it to the right type (takes no arguments, returns a double) so we
      // can call it as a native function.
      double (*FP)() = (double (*)())(intptr_t)FPtr;
      fprintf(stderr, "Evaluated to %f\n", FP());
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

/// top ::= definition | external | expression | ';'
static void MainLoop() {
  while (1) {
    fprintf(stderr, "ready&gt; ");
    switch (CurTok) {
    case tok_eof:    return;
    case ';':        getNextToken(); break;  // ignore top-level semicolons.
    case tok_def:    HandleDefinition(); break;
    case tok_extern: HandleExtern(); break;
    default:         HandleTopLevelExpression(); break;
    }
  }
}

//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//

/// putchard - putchar that takes a double and returns 0.
extern "C"
double putchard(double X) {
  putchar((char)X);
  return 0;
}

/// printd - printf that takes a double, prints it as "%f\n", and returns 0.
extern "C"
double printd(double X) {
  printf("%f\n", X);
  return 0;
}

//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//

int main() {
  InitializeNativeTarget();
  LLVMContext &amp;Context = getGlobalContext();

  // Install standard binary operators.
  // 1 is lowest precedence.
  BinopPrecedence['='] = 2;
  BinopPrecedence['&lt;'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
  BinopPrecedence['*'] = 40;  // highest.

  // Prime the first token.
  fprintf(stderr, "ready&gt; ");
  getNextToken();

  // Make the module, which holds all the code.
  TheModule = new Module("my cool jit", Context);

  ExistingModuleProvider *OurModuleProvider =
      new ExistingModuleProvider(TheModule);

  // Create the JIT. This takes ownership of the module and module provider.
  TheExecutionEngine = EngineBuilder(OurModuleProvider).create();

  FunctionPassManager OurFPM(OurModuleProvider);

  // Set up the optimizer pipeline. Start with registering info about how the
  // target lays out data structures.
  OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
  // Promote allocas to registers.
  OurFPM.add(createPromoteMemoryToRegisterPass());
  // Do simple "peephole" optimizations and bit-twiddling optzns.
  OurFPM.add(createInstructionCombiningPass());
  // Reassociate expressions.
  OurFPM.add(createReassociatePass());
  // Eliminate Common SubExpressions.
  OurFPM.add(createGVNPass());
  // Simplify the control flow graph (deleting unreachable blocks, etc).
  OurFPM.add(createCFGSimplificationPass());

  OurFPM.doInitialization();

  // Set the global so the code gen can use this.
  TheFPM = &amp;OurFPM;

  // Run the main "interpreter loop" now.
  MainLoop();

  TheFPM = 0;

  // Print out all of the generated code.
  TheModule-&gt;dump();

  return 0;
}
</pre>
</div>

<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
</div>

<!-- *********************************************************************** -->
<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>

  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
</address>
</body>
</html>