blob: 3cd02a750263be35ca5927813b48709f054fecec [file] [log] [blame]
Chris Lattner00c992d2007-11-03 08:55:29 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
7 construction</title>
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
11</head>
12
13<body>
14
15<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
16
Chris Lattner128eb862007-11-05 19:06:59 +000017<ul>
Chris Lattner0e555b12007-11-05 20:04:56 +000018<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner128eb862007-11-05 19:06:59 +000019<li>Chapter 7
20 <ol>
21 <li><a href="#intro">Chapter 7 Introduction</a></li>
22 <li><a href="#why">Why is this a hard problem?</a></li>
23 <li><a href="#memory">Memory in LLVM</a></li>
24 <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
25 <li><a href="#adjustments">Adjusting Existing Variables for
26 Mutation</a></li>
27 <li><a href="#assignment">New Assignment Operator</a></li>
28 <li><a href="#localvars">User-defined Local Variables</a></li>
29 <li><a href="#code">Full Code Listing</a></li>
30 </ol>
31</li>
Chris Lattner0e555b12007-11-05 20:04:56 +000032<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
33 tidbits</li>
Chris Lattner128eb862007-11-05 19:06:59 +000034</ul>
35
Chris Lattner00c992d2007-11-03 08:55:29 +000036<div class="doc_author">
37 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
38</div>
39
40<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +000041<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
Chris Lattner00c992d2007-11-03 08:55:29 +000042<!-- *********************************************************************** -->
43
44<div class="doc_text">
45
Chris Lattner128eb862007-11-05 19:06:59 +000046<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
47with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
48respectable, albeit simple, <a
Chris Lattner00c992d2007-11-03 08:55:29 +000049href="http://en.wikipedia.org/wiki/Functional_programming">functional
50programming language</a>. In our journey, we learned some parsing techniques,
51how to build and represent an AST, how to build LLVM IR, and how to optimize
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +000052the resultant code as well as JIT compile it.</p>
Chris Lattner00c992d2007-11-03 08:55:29 +000053
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +000054<p>While Kaleidoscope is interesting as a functional language, the fact that it
55is functional makes it "too easy" to generate LLVM IR for it. In particular, a
56functional language makes it very easy to build LLVM IR directly in <a
Chris Lattner00c992d2007-11-03 08:55:29 +000057href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
58Since LLVM requires that the input code be in SSA form, this is a very nice
59property and it is often unclear to newcomers how to generate code for an
60imperative language with mutable variables.</p>
61
62<p>The short (and happy) summary of this chapter is that there is no need for
63your front-end to build SSA form: LLVM provides highly tuned and well tested
64support for this, though the way it works is a bit unexpected for some.</p>
65
66</div>
67
68<!-- *********************************************************************** -->
69<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
70<!-- *********************************************************************** -->
71
72<div class="doc_text">
73
74<p>
75To understand why mutable variables cause complexities in SSA construction,
76consider this extremely simple C example:
77</p>
78
79<div class="doc_code">
80<pre>
81int G, H;
82int test(_Bool Condition) {
83 int X;
84 if (Condition)
85 X = G;
86 else
87 X = H;
88 return X;
89}
90</pre>
91</div>
92
93<p>In this case, we have the variable "X", whose value depends on the path
94executed in the program. Because there are two different possible values for X
95before the return instruction, a PHI node is inserted to merge the two values.
96The LLVM IR that we want for this example looks like this:</p>
97
98<div class="doc_code">
99<pre>
100@G = weak global i32 0 ; type of @G is i32*
101@H = weak global i32 0 ; type of @H is i32*
102
103define i32 @test(i1 %Condition) {
104entry:
105 br i1 %Condition, label %cond_true, label %cond_false
106
107cond_true:
108 %X.0 = load i32* @G
109 br label %cond_next
110
111cond_false:
112 %X.1 = load i32* @H
113 br label %cond_next
114
115cond_next:
116 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
117 ret i32 %X.2
118}
119</pre>
120</div>
121
122<p>In this example, the loads from the G and H global variables are explicit in
123the LLVM IR, and they live in the then/else branches of the if statement
124(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
125in the cond_next block selects the right value to use based on where control
126flow is coming from: if control flow comes from the cond_false block, X.2 gets
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000127the value of X.1. Alternatively, if control flow comes from cond_true, it gets
Chris Lattner00c992d2007-11-03 08:55:29 +0000128the value of X.0. The intent of this chapter is not to explain the details of
129SSA form. For more information, see one of the many <a
130href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
131references</a>.</p>
132
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000133<p>The question for this article is "who places the phi nodes when lowering
Chris Lattner00c992d2007-11-03 08:55:29 +0000134assignments to mutable variables?". The issue here is that LLVM
135<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
136However, SSA construction requires non-trivial algorithms and data structures,
137so it is inconvenient and wasteful for every front-end to have to reproduce this
138logic.</p>
139
140</div>
141
142<!-- *********************************************************************** -->
143<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
144<!-- *********************************************************************** -->
145
146<div class="doc_text">
147
148<p>The 'trick' here is that while LLVM does require all register values to be
149in SSA form, it does not require (or permit) memory objects to be in SSA form.
150In the example above, note that the loads from G and H are direct accesses to
151G and H: they are not renamed or versioned. This differs from some other
Chris Lattner2e5d07e2007-11-04 19:42:13 +0000152compiler systems, which do try to version memory objects. In LLVM, instead of
Chris Lattner00c992d2007-11-03 08:55:29 +0000153encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
154href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
155demand.</p>
156
157<p>
158With this in mind, the high-level idea is that we want to make a stack variable
159(which lives in memory, because it is on the stack) for each mutable object in
160a function. To take advantage of this trick, we need to talk about how LLVM
161represents stack variables.
162</p>
163
164<p>In LLVM, all memory accesses are explicit with load/store instructions, and
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000165it is carefully designed not to have (or need) an "address-of" operator. Notice
Chris Lattner00c992d2007-11-03 08:55:29 +0000166how the type of the @G/@H global variables is actually "i32*" even though the
167variable is defined as "i32". What this means is that @G defines <em>space</em>
168for an i32 in the global data area, but its <em>name</em> actually refers to the
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000169address for that space. Stack variables work the same way, except that instead of
170being declared with global variable definitions, they are declared with the
Chris Lattner00c992d2007-11-03 08:55:29 +0000171<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
172
173<div class="doc_code">
174<pre>
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000175define i32 @example() {
Chris Lattner00c992d2007-11-03 08:55:29 +0000176entry:
177 %X = alloca i32 ; type of %X is i32*.
178 ...
179 %tmp = load i32* %X ; load the stack value %X from the stack.
180 %tmp2 = add i32 %tmp, 1 ; increment it
181 store i32 %tmp2, i32* %X ; store it back
182 ...
183</pre>
184</div>
185
186<p>This code shows an example of how you can declare and manipulate a stack
187variable in the LLVM IR. Stack memory allocated with the alloca instruction is
188fully general: you can pass the address of the stack slot to functions, you can
189store it in other variables, etc. In our example above, we could rewrite the
190example to use the alloca technique to avoid using a PHI node:</p>
191
192<div class="doc_code">
193<pre>
194@G = weak global i32 0 ; type of @G is i32*
195@H = weak global i32 0 ; type of @H is i32*
196
197define i32 @test(i1 %Condition) {
198entry:
199 %X = alloca i32 ; type of %X is i32*.
200 br i1 %Condition, label %cond_true, label %cond_false
201
202cond_true:
203 %X.0 = load i32* @G
204 store i32 %X.0, i32* %X ; Update X
205 br label %cond_next
206
207cond_false:
208 %X.1 = load i32* @H
209 store i32 %X.1, i32* %X ; Update X
210 br label %cond_next
211
212cond_next:
213 %X.2 = load i32* %X ; Read X
214 ret i32 %X.2
215}
216</pre>
217</div>
218
219<p>With this, we have discovered a way to handle arbitrary mutable variables
220without the need to create Phi nodes at all:</p>
221
222<ol>
223<li>Each mutable variable becomes a stack allocation.</li>
224<li>Each read of the variable becomes a load from the stack.</li>
225<li>Each update of the variable becomes a store to the stack.</li>
226<li>Taking the address of a variable just uses the stack address directly.</li>
227</ol>
228
229<p>While this solution has solved our immediate problem, it introduced another
230one: we have now apparently introduced a lot of stack traffic for very simple
231and common operations, a major performance problem. Fortunately for us, the
232LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
233this case, promoting allocas like this into SSA registers, inserting Phi nodes
234as appropriate. If you run this example through the pass, for example, you'll
235get:</p>
236
237<div class="doc_code">
238<pre>
239$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
240@G = weak global i32 0
241@H = weak global i32 0
242
243define i32 @test(i1 %Condition) {
244entry:
245 br i1 %Condition, label %cond_true, label %cond_false
246
247cond_true:
248 %X.0 = load i32* @G
249 br label %cond_next
250
251cond_false:
252 %X.1 = load i32* @H
253 br label %cond_next
254
255cond_next:
256 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
257 ret i32 %X.01
258}
259</pre>
Chris Lattnere7198312007-11-03 22:22:30 +0000260</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000261
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000262<p>The mem2reg pass implements the standard "iterated dominance frontier"
Chris Lattnere7198312007-11-03 22:22:30 +0000263algorithm for constructing SSA form and has a number of optimizations that speed
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000264up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
265with mutable variables, and we highly recommend that you depend on it. Note that
Chris Lattnere7198312007-11-03 22:22:30 +0000266mem2reg only works on variables in certain circumstances:</p>
Chris Lattner00c992d2007-11-03 08:55:29 +0000267
Chris Lattnere7198312007-11-03 22:22:30 +0000268<ol>
269<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
270promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner00c992d2007-11-03 08:55:29 +0000271
Chris Lattnere7198312007-11-03 22:22:30 +0000272<li>mem2reg only looks for alloca instructions in the entry block of the
273function. Being in the entry block guarantees that the alloca is only executed
274once, which makes analysis simpler.</li>
Chris Lattner00c992d2007-11-03 08:55:29 +0000275
Chris Lattnere7198312007-11-03 22:22:30 +0000276<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
277the address of the stack object is passed to a function, or if any funny pointer
278arithmetic is involved, the alloca will not be promoted.</li>
279
Chris Lattnera56b22d2007-11-05 17:45:54 +0000280<li>mem2reg only works on allocas of <a
281href="../LangRef.html#t_classifications">first class</a>
282values (such as pointers, scalars and vectors), and only if the array size
Chris Lattnere7198312007-11-03 22:22:30 +0000283of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
284promoting structs or arrays to registers. Note that the "scalarrepl" pass is
285more powerful and can promote structs, "unions", and arrays in many cases.</li>
286
287</ol>
288
289<p>
290All of these properties are easy to satisfy for most imperative languages, and
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000291we'll illustrate it below with Kaleidoscope. The final question you may be
Chris Lattnere7198312007-11-03 22:22:30 +0000292asking is: should I bother with this nonsense for my front-end? Wouldn't it be
293better if I just did SSA construction directly, avoiding use of the mem2reg
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000294optimization pass? In short, we strongly recommend that you use this technique
Chris Lattnere7198312007-11-03 22:22:30 +0000295for building SSA form, unless there is an extremely good reason not to. Using
296this technique is:</p>
297
298<ul>
299<li>Proven and well tested: llvm-gcc and clang both use this technique for local
300mutable variables. As such, the most common clients of LLVM are using this to
301handle a bulk of their variables. You can be sure that bugs are found fast and
302fixed early.</li>
303
304<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
305common cases as well as fully general. For example, it has fast-paths for
306variables that are only used in a single block, variables that only have one
307assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
308</li>
309
310<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
311Debug information in LLVM</a> relies on having the address of the variable
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000312exposed so that debug info can be attached to it. This technique dovetails
313very naturally with this style of debug info.</li>
Chris Lattnere7198312007-11-03 22:22:30 +0000314</ul>
315
316<p>If nothing else, this makes it much easier to get your front-end up and
317running, and is very simple to implement. Lets extend Kaleidoscope with mutable
318variables now!
Chris Lattner00c992d2007-11-03 08:55:29 +0000319</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000320
Chris Lattner00c992d2007-11-03 08:55:29 +0000321</div>
322
Chris Lattner62a709d2007-11-05 00:23:57 +0000323<!-- *********************************************************************** -->
324<div class="doc_section"><a name="kalvars">Mutable Variables in
325Kaleidoscope</a></div>
326<!-- *********************************************************************** -->
327
328<div class="doc_text">
329
330<p>Now that we know the sort of problem we want to tackle, lets see what this
331looks like in the context of our little Kaleidoscope language. We're going to
332add two features:</p>
333
334<ol>
335<li>The ability to mutate variables with the '=' operator.</li>
336<li>The ability to define new variables.</li>
337</ol>
338
339<p>While the first item is really what this is about, we only have variables
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000340for incoming arguments as well as for induction variables, and redefining those only
Chris Lattner62a709d2007-11-05 00:23:57 +0000341goes so far :). Also, the ability to define new variables is a
342useful thing regardless of whether you will be mutating them. Here's a
343motivating example that shows how we could use these:</p>
344
345<div class="doc_code">
346<pre>
347# Define ':' for sequencing: as a low-precedence operator that ignores operands
348# and just returns the RHS.
349def binary : 1 (x y) y;
350
351# Recursive fib, we could do this before.
352def fib(x)
353 if (x &lt; 3) then
354 1
355 else
356 fib(x-1)+fib(x-2);
357
358# Iterative fib.
359def fibi(x)
360 <b>var a = 1, b = 1, c in</b>
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000361 (for i = 3, i &lt; x in
Chris Lattner62a709d2007-11-05 00:23:57 +0000362 <b>c = a + b</b> :
363 <b>a = b</b> :
364 <b>b = c</b>) :
365 b;
366
367# Call it.
368fibi(10);
369</pre>
370</div>
371
372<p>
373In order to mutate variables, we have to change our existing variables to use
374the "alloca trick". Once we have that, we'll add our new operator, then extend
375Kaleidoscope to support new variable definitions.
376</p>
377
378</div>
379
380<!-- *********************************************************************** -->
381<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
382Mutation</a></div>
383<!-- *********************************************************************** -->
384
385<div class="doc_text">
386
387<p>
388The symbol table in Kaleidoscope is managed at code generation time by the
389'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
390that holds the double value for the named variable. In order to support
391mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
392the <em>memory location</em> of the variable in question. Note that this
393change is a refactoring: it changes the structure of the code, but does not
394(by itself) change the behavior of the compiler. All of these changes are
395isolated in the Kaleidoscope code generator.</p>
396
397<p>
398At this point in Kaleidoscope's development, it only supports variables for two
399things: incoming arguments to functions and the induction variable of 'for'
400loops. For consistency, we'll allow mutation of these variables in addition to
401other user-defined variables. This means that these will both need memory
402locations.
403</p>
404
405<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000406map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++
407compiler will tell us what parts of the code we need to update:</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000408
409<div class="doc_code">
410<pre>
411static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
412</pre>
413</div>
414
415<p>Also, since we will need to create these alloca's, we'll use a helper
416function that ensures that the allocas are created in the entry block of the
417function:</p>
418
419<div class="doc_code">
420<pre>
421/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
422/// the function. This is used for mutable variables etc.
423static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
424 const std::string &amp;VarName) {
425 LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
426 TheFunction-&gt;getEntryBlock().begin());
427 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
428}
429</pre>
430</div>
431
432<p>This funny looking code creates an LLVMBuilder object that is pointing at
433the first instruction (.begin()) of the entry block. It then creates an alloca
434with the expected name and returns it. Because all values in Kaleidoscope are
435doubles, there is no need to pass in a type to use.</p>
436
437<p>With this in place, the first functionality change we want to make is to
438variable references. In our new scheme, variables live on the stack, so code
439generating a reference to them actually needs to produce a load from the stack
440slot:</p>
441
442<div class="doc_code">
443<pre>
444Value *VariableExprAST::Codegen() {
445 // Look this variable up in the function.
446 Value *V = NamedValues[Name];
447 if (V == 0) return ErrorV("Unknown variable name");
448
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000449 <b>// Load the value.
450 return Builder.CreateLoad(V, Name.c_str());</b>
Chris Lattner62a709d2007-11-05 00:23:57 +0000451}
452</pre>
453</div>
454
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000455<p>As you can see, this is pretty straightforward. Now we need to update the
Chris Lattner62a709d2007-11-05 00:23:57 +0000456things that define the variables to set up the alloca. We'll start with
457<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
458the unabridged code):</p>
459
460<div class="doc_code">
461<pre>
462 Function *TheFunction = Builder.GetInsertBlock()->getParent();
463
464 <b>// Create an alloca for the variable in the entry block.
465 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
466
467 // Emit the start code first, without 'variable' in scope.
468 Value *StartVal = Start-&gt;Codegen();
469 if (StartVal == 0) return 0;
470
471 <b>// Store the value into the alloca.
472 Builder.CreateStore(StartVal, Alloca);</b>
473 ...
474
475 // Compute the end condition.
476 Value *EndCond = End-&gt;Codegen();
477 if (EndCond == 0) return EndCond;
478
479 <b>// Reload, increment, and restore the alloca. This handles the case where
480 // the body of the loop mutates the variable.
481 Value *CurVar = Builder.CreateLoad(Alloca);
482 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
483 Builder.CreateStore(NextVar, Alloca);</b>
484 ...
485</pre>
486</div>
487
488<p>This code is virtually identical to the code <a
489href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
490big difference is that we no longer have to construct a PHI node, and we use
491load/store to access the variable as needed.</p>
492
493<p>To support mutable argument variables, we need to also make allocas for them.
494The code for this is also pretty simple:</p>
495
496<div class="doc_code">
497<pre>
498/// CreateArgumentAllocas - Create an alloca for each argument and register the
499/// argument in the symbol table so that references to it will succeed.
500void PrototypeAST::CreateArgumentAllocas(Function *F) {
501 Function::arg_iterator AI = F-&gt;arg_begin();
502 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
503 // Create an alloca for this variable.
504 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
505
506 // Store the initial value into the alloca.
507 Builder.CreateStore(AI, Alloca);
508
509 // Add arguments to variable symbol table.
510 NamedValues[Args[Idx]] = Alloca;
511 }
512}
513</pre>
514</div>
515
516<p>For each argument, we make an alloca, store the input value to the function
517into the alloca, and register the alloca as the memory location for the
518argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
519it sets up the entry block for the function.</p>
520
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000521<p>The final missing piece is adding the mem2reg pass, which allows us to get
Chris Lattner62a709d2007-11-05 00:23:57 +0000522good codegen once again:</p>
523
524<div class="doc_code">
525<pre>
526 // Set up the optimizer pipeline. Start with registering info about how the
527 // target lays out data structures.
528 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
529 <b>// Promote allocas to registers.
530 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
531 // Do simple "peephole" optimizations and bit-twiddling optzns.
532 OurFPM.add(createInstructionCombiningPass());
533 // Reassociate expressions.
534 OurFPM.add(createReassociatePass());
535</pre>
536</div>
537
538<p>It is interesting to see what the code looks like before and after the
539mem2reg optimization runs. For example, this is the before/after code for our
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000540recursive fib function. Before the optimization:</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000541
542<div class="doc_code">
543<pre>
544define double @fib(double %x) {
545entry:
546 <b>%x1 = alloca double
547 store double %x, double* %x1
548 %x2 = load double* %x1</b>
Chris Lattner71155212007-11-06 01:39:12 +0000549 %cmptmp = fcmp ult double %x2, 3.000000e+00
550 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000551 %ifcond = fcmp one double %booltmp, 0.000000e+00
552 br i1 %ifcond, label %then, label %else
553
554then: ; preds = %entry
555 br label %ifcont
556
557else: ; preds = %entry
558 <b>%x3 = load double* %x1</b>
559 %subtmp = sub double %x3, 1.000000e+00
560 %calltmp = call double @fib( double %subtmp )
561 <b>%x4 = load double* %x1</b>
562 %subtmp5 = sub double %x4, 2.000000e+00
563 %calltmp6 = call double @fib( double %subtmp5 )
564 %addtmp = add double %calltmp, %calltmp6
565 br label %ifcont
566
567ifcont: ; preds = %else, %then
568 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
569 ret double %iftmp
570}
571</pre>
572</div>
573
574<p>Here there is only one variable (x, the input argument) but you can still
575see the extremely simple-minded code generation strategy we are using. In the
576entry block, an alloca is created, and the initial input value is stored into
577it. Each reference to the variable does a reload from the stack. Also, note
578that we didn't modify the if/then/else expression, so it still inserts a PHI
579node. While we could make an alloca for it, it is actually easier to create a
580PHI node for it, so we still just make the PHI.</p>
581
582<p>Here is the code after the mem2reg pass runs:</p>
583
584<div class="doc_code">
585<pre>
586define double @fib(double %x) {
587entry:
Chris Lattner71155212007-11-06 01:39:12 +0000588 %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
589 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000590 %ifcond = fcmp one double %booltmp, 0.000000e+00
591 br i1 %ifcond, label %then, label %else
592
593then:
594 br label %ifcont
595
596else:
597 %subtmp = sub double <b>%x</b>, 1.000000e+00
598 %calltmp = call double @fib( double %subtmp )
599 %subtmp5 = sub double <b>%x</b>, 2.000000e+00
600 %calltmp6 = call double @fib( double %subtmp5 )
601 %addtmp = add double %calltmp, %calltmp6
602 br label %ifcont
603
604ifcont: ; preds = %else, %then
605 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
606 ret double %iftmp
607}
608</pre>
609</div>
610
611<p>This is a trivial case for mem2reg, since there are no redefinitions of the
612variable. The point of showing this is to calm your tension about inserting
613such blatent inefficiencies :).</p>
614
615<p>After the rest of the optimizers run, we get:</p>
616
617<div class="doc_code">
618<pre>
619define double @fib(double %x) {
620entry:
Chris Lattner71155212007-11-06 01:39:12 +0000621 %cmptmp = fcmp ult double %x, 3.000000e+00
622 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000623 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
624 br i1 %ifcond, label %else, label %ifcont
625
626else:
627 %subtmp = sub double %x, 1.000000e+00
628 %calltmp = call double @fib( double %subtmp )
629 %subtmp5 = sub double %x, 2.000000e+00
630 %calltmp6 = call double @fib( double %subtmp5 )
631 %addtmp = add double %calltmp, %calltmp6
632 ret double %addtmp
633
634ifcont:
635 ret double 1.000000e+00
636}
637</pre>
638</div>
639
640<p>Here we see that the simplifycfg pass decided to clone the return instruction
641into the end of the 'else' block. This allowed it to eliminate some branches
642and the PHI node.</p>
643
644<p>Now that all symbol table references are updated to use stack variables,
645we'll add the assignment operator.</p>
646
647</div>
648
649<!-- *********************************************************************** -->
650<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
651<!-- *********************************************************************** -->
652
653<div class="doc_text">
654
655<p>With our current framework, adding a new assignment operator is really
656simple. We will parse it just like any other binary operator, but handle it
657internally (instead of allowing the user to define it). The first step is to
658set a precedence:</p>
659
660<div class="doc_code">
661<pre>
662 int main() {
663 // Install standard binary operators.
664 // 1 is lowest precedence.
665 <b>BinopPrecedence['='] = 2;</b>
666 BinopPrecedence['&lt;'] = 10;
667 BinopPrecedence['+'] = 20;
668 BinopPrecedence['-'] = 20;
669</pre>
670</div>
671
672<p>Now that the parser knows the precedence of the binary operator, it takes
673care of all the parsing and AST generation. We just need to implement codegen
674for the assignment operator. This looks like:</p>
675
676<div class="doc_code">
677<pre>
678Value *BinaryExprAST::Codegen() {
679 // Special case '=' because we don't want to emit the LHS as an expression.
680 if (Op == '=') {
681 // Assignment requires the LHS to be an identifier.
682 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
683 if (!LHSE)
684 return ErrorV("destination of '=' must be a variable");
685</pre>
686</div>
687
688<p>Unlike the rest of the binary operators, our assignment operator doesn't
689follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
690as a special case before the other binary operators are handled. The other
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000691strange thing is that it requires the LHS to be a variable. It is invalid to
692have "(x+1) = expr" - only things like "x = expr" are allowed.
Chris Lattner62a709d2007-11-05 00:23:57 +0000693</p>
694
695<div class="doc_code">
696<pre>
697 // Codegen the RHS.
698 Value *Val = RHS-&gt;Codegen();
699 if (Val == 0) return 0;
700
701 // Look up the name.
702 Value *Variable = NamedValues[LHSE-&gt;getName()];
703 if (Variable == 0) return ErrorV("Unknown variable name");
704
705 Builder.CreateStore(Val, Variable);
706 return Val;
707 }
708 ...
709</pre>
710</div>
711
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000712<p>Once we have the variable, codegen'ing the assignment is straightforward:
Chris Lattner62a709d2007-11-05 00:23:57 +0000713we emit the RHS of the assignment, create a store, and return the computed
714value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
715
716<p>Now that we have an assignment operator, we can mutate loop variables and
717arguments. For example, we can now run code like this:</p>
718
719<div class="doc_code">
720<pre>
721# Function to print a double.
722extern printd(x);
723
724# Define ':' for sequencing: as a low-precedence operator that ignores operands
725# and just returns the RHS.
726def binary : 1 (x y) y;
727
728def test(x)
729 printd(x) :
730 x = 4 :
731 printd(x);
732
733test(123);
734</pre>
735</div>
736
737<p>When run, this example prints "123" and then "4", showing that we did
738actually mutate the value! Okay, we have now officially implemented our goal:
739getting this to work requires SSA construction in the general case. However,
740to be really useful, we want the ability to define our own local variables, lets
741add this next!
742</p>
743
744</div>
745
746<!-- *********************************************************************** -->
747<div class="doc_section"><a name="localvars">User-defined Local
748Variables</a></div>
749<!-- *********************************************************************** -->
750
751<div class="doc_text">
752
753<p>Adding var/in is just like any other other extensions we made to
754Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
755The first step for adding our new 'var/in' construct is to extend the lexer.
756As before, this is pretty trivial, the code looks like this:</p>
757
758<div class="doc_code">
759<pre>
760enum Token {
761 ...
762 <b>// var definition
763 tok_var = -13</b>
764...
765}
766...
767static int gettok() {
768...
769 if (IdentifierStr == "in") return tok_in;
770 if (IdentifierStr == "binary") return tok_binary;
771 if (IdentifierStr == "unary") return tok_unary;
772 <b>if (IdentifierStr == "var") return tok_var;</b>
773 return tok_identifier;
774...
775</pre>
776</div>
777
778<p>The next step is to define the AST node that we will construct. For var/in,
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000779it looks like this:</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000780
781<div class="doc_code">
782<pre>
783/// VarExprAST - Expression class for var/in
784class VarExprAST : public ExprAST {
785 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
786 ExprAST *Body;
787public:
788 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
789 ExprAST *body)
790 : VarNames(varnames), Body(body) {}
791
792 virtual Value *Codegen();
793};
794</pre>
795</div>
796
797<p>var/in allows a list of names to be defined all at once, and each name can
798optionally have an initializer value. As such, we capture this information in
799the VarNames vector. Also, var/in has a body, this body is allowed to access
Chris Lattner1e46a6c2007-11-07 06:34:39 +0000800the variables defined by the var/in.</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000801
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000802<p>With this in place, we can define the parser pieces. The first thing we do is add
Chris Lattner62a709d2007-11-05 00:23:57 +0000803it as a primary expression:</p>
804
805<div class="doc_code">
806<pre>
807/// primary
808/// ::= identifierexpr
809/// ::= numberexpr
810/// ::= parenexpr
811/// ::= ifexpr
812/// ::= forexpr
813<b>/// ::= varexpr</b>
814static ExprAST *ParsePrimary() {
815 switch (CurTok) {
816 default: return Error("unknown token when expecting an expression");
817 case tok_identifier: return ParseIdentifierExpr();
818 case tok_number: return ParseNumberExpr();
819 case '(': return ParseParenExpr();
820 case tok_if: return ParseIfExpr();
821 case tok_for: return ParseForExpr();
822 <b>case tok_var: return ParseVarExpr();</b>
823 }
824}
825</pre>
826</div>
827
828<p>Next we define ParseVarExpr:</p>
829
830<div class="doc_code">
831<pre>
Chris Lattner20a0c802007-11-05 17:54:34 +0000832/// varexpr ::= 'var' identifier ('=' expression)?
833// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +0000834static ExprAST *ParseVarExpr() {
835 getNextToken(); // eat the var.
836
837 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
838
839 // At least one variable name is required.
840 if (CurTok != tok_identifier)
841 return Error("expected identifier after var");
842</pre>
843</div>
844
845<p>The first part of this code parses the list of identifier/expr pairs into the
846local <tt>VarNames</tt> vector.
847
848<div class="doc_code">
849<pre>
850 while (1) {
851 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +0000852 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +0000853
854 // Read the optional initializer.
855 ExprAST *Init = 0;
856 if (CurTok == '=') {
857 getNextToken(); // eat the '='.
858
859 Init = ParseExpression();
860 if (Init == 0) return 0;
861 }
862
863 VarNames.push_back(std::make_pair(Name, Init));
864
865 // End of var list, exit loop.
866 if (CurTok != ',') break;
867 getNextToken(); // eat the ','.
868
869 if (CurTok != tok_identifier)
870 return Error("expected identifier list after var");
871 }
872</pre>
873</div>
874
875<p>Once all the variables are parsed, we then parse the body and create the
876AST node:</p>
877
878<div class="doc_code">
879<pre>
880 // At this point, we have to have 'in'.
881 if (CurTok != tok_in)
882 return Error("expected 'in' keyword after 'var'");
883 getNextToken(); // eat 'in'.
884
885 ExprAST *Body = ParseExpression();
886 if (Body == 0) return 0;
887
888 return new VarExprAST(VarNames, Body);
889}
890</pre>
891</div>
892
893<p>Now that we can parse and represent the code, we need to support emission of
894LLVM IR for it. This code starts out with:</p>
895
896<div class="doc_code">
897<pre>
898Value *VarExprAST::Codegen() {
899 std::vector&lt;AllocaInst *&gt; OldBindings;
900
901 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
902
903 // Register all variables and emit their initializer.
904 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
905 const std::string &amp;VarName = VarNames[i].first;
906 ExprAST *Init = VarNames[i].second;
907</pre>
908</div>
909
910<p>Basically it loops over all the variables, installing them one at a time.
911For each variable we put into the symbol table, we remember the previous value
912that we replace in OldBindings.</p>
913
914<div class="doc_code">
915<pre>
916 // Emit the initializer before adding the variable to scope, this prevents
917 // the initializer from referencing the variable itself, and permits stuff
918 // like this:
919 // var a = 1 in
920 // var a = a in ... # refers to outer 'a'.
921 Value *InitVal;
922 if (Init) {
923 InitVal = Init-&gt;Codegen();
924 if (InitVal == 0) return 0;
925 } else { // If not specified, use 0.0.
926 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
927 }
928
929 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
930 Builder.CreateStore(InitVal, Alloca);
931
932 // Remember the old variable binding so that we can restore the binding when
933 // we unrecurse.
934 OldBindings.push_back(NamedValues[VarName]);
935
936 // Remember this binding.
937 NamedValues[VarName] = Alloca;
938 }
939</pre>
940</div>
941
942<p>There are more comments here than code. The basic idea is that we emit the
943initializer, create the alloca, then update the symbol table to point to it.
944Once all the variables are installed in the symbol table, we evaluate the body
945of the var/in expression:</p>
946
947<div class="doc_code">
948<pre>
949 // Codegen the body, now that all vars are in scope.
950 Value *BodyVal = Body-&gt;Codegen();
951 if (BodyVal == 0) return 0;
952</pre>
953</div>
954
955<p>Finally, before returning, we restore the previous variable bindings:</p>
956
957<div class="doc_code">
958<pre>
959 // Pop all our variables from scope.
960 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
961 NamedValues[VarNames[i].first] = OldBindings[i];
962
963 // Return the body computation.
964 return BodyVal;
965}
966</pre>
967</div>
968
969<p>The end result of all of this is that we get properly scoped variable
970definitions, and we even (trivially) allow mutation of them :).</p>
971
972<p>With this, we completed what we set out to do. Our nice iterative fib
973example from the intro compiles and runs just fine. The mem2reg pass optimizes
974all of our stack variables into SSA registers, inserting PHI nodes where needed,
Chris Lattnerb7e6b1a2007-11-15 04:51:31 +0000975and our front-end remains simple: no "iterated dominance frontier" computation
Chris Lattner62a709d2007-11-05 00:23:57 +0000976anywhere in sight.</p>
977
978</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000979
980<!-- *********************************************************************** -->
981<div class="doc_section"><a name="code">Full Code Listing</a></div>
982<!-- *********************************************************************** -->
983
984<div class="doc_text">
985
986<p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000987Here is the complete code listing for our running example, enhanced with mutable
988variables and var/in support. To build this example, use:
Chris Lattner00c992d2007-11-03 08:55:29 +0000989</p>
990
991<div class="doc_code">
992<pre>
993 # Compile
994 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
995 # Run
996 ./toy
997</pre>
998</div>
999
1000<p>Here is the code:</p>
1001
1002<div class="doc_code">
1003<pre>
Chris Lattner62a709d2007-11-05 00:23:57 +00001004#include "llvm/DerivedTypes.h"
1005#include "llvm/ExecutionEngine/ExecutionEngine.h"
1006#include "llvm/Module.h"
1007#include "llvm/ModuleProvider.h"
1008#include "llvm/PassManager.h"
1009#include "llvm/Analysis/Verifier.h"
1010#include "llvm/Target/TargetData.h"
1011#include "llvm/Transforms/Scalar.h"
1012#include "llvm/Support/LLVMBuilder.h"
1013#include &lt;cstdio&gt;
1014#include &lt;string&gt;
1015#include &lt;map&gt;
1016#include &lt;vector&gt;
1017using namespace llvm;
1018
1019//===----------------------------------------------------------------------===//
1020// Lexer
1021//===----------------------------------------------------------------------===//
1022
1023// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1024// of these for known things.
1025enum Token {
1026 tok_eof = -1,
1027
1028 // commands
1029 tok_def = -2, tok_extern = -3,
1030
1031 // primary
1032 tok_identifier = -4, tok_number = -5,
1033
1034 // control
1035 tok_if = -6, tok_then = -7, tok_else = -8,
1036 tok_for = -9, tok_in = -10,
1037
1038 // operators
1039 tok_binary = -11, tok_unary = -12,
1040
1041 // var definition
1042 tok_var = -13
1043};
1044
1045static std::string IdentifierStr; // Filled in if tok_identifier
1046static double NumVal; // Filled in if tok_number
1047
1048/// gettok - Return the next token from standard input.
1049static int gettok() {
1050 static int LastChar = ' ';
1051
1052 // Skip any whitespace.
1053 while (isspace(LastChar))
1054 LastChar = getchar();
1055
1056 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1057 IdentifierStr = LastChar;
1058 while (isalnum((LastChar = getchar())))
1059 IdentifierStr += LastChar;
1060
1061 if (IdentifierStr == "def") return tok_def;
1062 if (IdentifierStr == "extern") return tok_extern;
1063 if (IdentifierStr == "if") return tok_if;
1064 if (IdentifierStr == "then") return tok_then;
1065 if (IdentifierStr == "else") return tok_else;
1066 if (IdentifierStr == "for") return tok_for;
1067 if (IdentifierStr == "in") return tok_in;
1068 if (IdentifierStr == "binary") return tok_binary;
1069 if (IdentifierStr == "unary") return tok_unary;
1070 if (IdentifierStr == "var") return tok_var;
1071 return tok_identifier;
1072 }
1073
1074 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1075 std::string NumStr;
1076 do {
1077 NumStr += LastChar;
1078 LastChar = getchar();
1079 } while (isdigit(LastChar) || LastChar == '.');
1080
1081 NumVal = strtod(NumStr.c_str(), 0);
1082 return tok_number;
1083 }
1084
1085 if (LastChar == '#') {
1086 // Comment until end of line.
1087 do LastChar = getchar();
Chris Lattnerc80c23f2007-12-02 22:46:01 +00001088 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
Chris Lattner62a709d2007-11-05 00:23:57 +00001089
1090 if (LastChar != EOF)
1091 return gettok();
1092 }
1093
1094 // Check for end of file. Don't eat the EOF.
1095 if (LastChar == EOF)
1096 return tok_eof;
1097
1098 // Otherwise, just return the character as its ascii value.
1099 int ThisChar = LastChar;
1100 LastChar = getchar();
1101 return ThisChar;
1102}
1103
1104//===----------------------------------------------------------------------===//
1105// Abstract Syntax Tree (aka Parse Tree)
1106//===----------------------------------------------------------------------===//
1107
1108/// ExprAST - Base class for all expression nodes.
1109class ExprAST {
1110public:
1111 virtual ~ExprAST() {}
1112 virtual Value *Codegen() = 0;
1113};
1114
1115/// NumberExprAST - Expression class for numeric literals like "1.0".
1116class NumberExprAST : public ExprAST {
1117 double Val;
1118public:
1119 NumberExprAST(double val) : Val(val) {}
1120 virtual Value *Codegen();
1121};
1122
1123/// VariableExprAST - Expression class for referencing a variable, like "a".
1124class VariableExprAST : public ExprAST {
1125 std::string Name;
1126public:
1127 VariableExprAST(const std::string &amp;name) : Name(name) {}
1128 const std::string &amp;getName() const { return Name; }
1129 virtual Value *Codegen();
1130};
1131
1132/// UnaryExprAST - Expression class for a unary operator.
1133class UnaryExprAST : public ExprAST {
1134 char Opcode;
1135 ExprAST *Operand;
1136public:
1137 UnaryExprAST(char opcode, ExprAST *operand)
1138 : Opcode(opcode), Operand(operand) {}
1139 virtual Value *Codegen();
1140};
1141
1142/// BinaryExprAST - Expression class for a binary operator.
1143class BinaryExprAST : public ExprAST {
1144 char Op;
1145 ExprAST *LHS, *RHS;
1146public:
1147 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1148 : Op(op), LHS(lhs), RHS(rhs) {}
1149 virtual Value *Codegen();
1150};
1151
1152/// CallExprAST - Expression class for function calls.
1153class CallExprAST : public ExprAST {
1154 std::string Callee;
1155 std::vector&lt;ExprAST*&gt; Args;
1156public:
1157 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1158 : Callee(callee), Args(args) {}
1159 virtual Value *Codegen();
1160};
1161
1162/// IfExprAST - Expression class for if/then/else.
1163class IfExprAST : public ExprAST {
1164 ExprAST *Cond, *Then, *Else;
1165public:
1166 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1167 : Cond(cond), Then(then), Else(_else) {}
1168 virtual Value *Codegen();
1169};
1170
1171/// ForExprAST - Expression class for for/in.
1172class ForExprAST : public ExprAST {
1173 std::string VarName;
1174 ExprAST *Start, *End, *Step, *Body;
1175public:
1176 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1177 ExprAST *step, ExprAST *body)
1178 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1179 virtual Value *Codegen();
1180};
1181
1182/// VarExprAST - Expression class for var/in
1183class VarExprAST : public ExprAST {
1184 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1185 ExprAST *Body;
1186public:
1187 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1188 ExprAST *body)
1189 : VarNames(varnames), Body(body) {}
1190
1191 virtual Value *Codegen();
1192};
1193
1194/// PrototypeAST - This class represents the "prototype" for a function,
1195/// which captures its argument names as well as if it is an operator.
1196class PrototypeAST {
1197 std::string Name;
1198 std::vector&lt;std::string&gt; Args;
1199 bool isOperator;
1200 unsigned Precedence; // Precedence if a binary op.
1201public:
1202 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1203 bool isoperator = false, unsigned prec = 0)
1204 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1205
1206 bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1207 bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1208
1209 char getOperatorName() const {
1210 assert(isUnaryOp() || isBinaryOp());
1211 return Name[Name.size()-1];
1212 }
1213
1214 unsigned getBinaryPrecedence() const { return Precedence; }
1215
1216 Function *Codegen();
1217
1218 void CreateArgumentAllocas(Function *F);
1219};
1220
1221/// FunctionAST - This class represents a function definition itself.
1222class FunctionAST {
1223 PrototypeAST *Proto;
1224 ExprAST *Body;
1225public:
1226 FunctionAST(PrototypeAST *proto, ExprAST *body)
1227 : Proto(proto), Body(body) {}
1228
1229 Function *Codegen();
1230};
1231
1232//===----------------------------------------------------------------------===//
1233// Parser
1234//===----------------------------------------------------------------------===//
1235
1236/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1237/// token the parser it looking at. getNextToken reads another token from the
1238/// lexer and updates CurTok with its results.
1239static int CurTok;
1240static int getNextToken() {
1241 return CurTok = gettok();
1242}
1243
1244/// BinopPrecedence - This holds the precedence for each binary operator that is
1245/// defined.
1246static std::map&lt;char, int&gt; BinopPrecedence;
1247
1248/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1249static int GetTokPrecedence() {
1250 if (!isascii(CurTok))
1251 return -1;
1252
1253 // Make sure it's a declared binop.
1254 int TokPrec = BinopPrecedence[CurTok];
1255 if (TokPrec &lt;= 0) return -1;
1256 return TokPrec;
1257}
1258
1259/// Error* - These are little helper functions for error handling.
1260ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1261PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1262FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1263
1264static ExprAST *ParseExpression();
1265
1266/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +00001267/// ::= identifier
1268/// ::= identifier '(' expression* ')'
Chris Lattner62a709d2007-11-05 00:23:57 +00001269static ExprAST *ParseIdentifierExpr() {
1270 std::string IdName = IdentifierStr;
1271
Chris Lattner20a0c802007-11-05 17:54:34 +00001272 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001273
1274 if (CurTok != '(') // Simple variable ref.
1275 return new VariableExprAST(IdName);
1276
1277 // Call.
1278 getNextToken(); // eat (
1279 std::vector&lt;ExprAST*&gt; Args;
1280 if (CurTok != ')') {
1281 while (1) {
1282 ExprAST *Arg = ParseExpression();
1283 if (!Arg) return 0;
1284 Args.push_back(Arg);
1285
1286 if (CurTok == ')') break;
1287
1288 if (CurTok != ',')
1289 return Error("Expected ')'");
1290 getNextToken();
1291 }
1292 }
1293
1294 // Eat the ')'.
1295 getNextToken();
1296
1297 return new CallExprAST(IdName, Args);
1298}
1299
1300/// numberexpr ::= number
1301static ExprAST *ParseNumberExpr() {
1302 ExprAST *Result = new NumberExprAST(NumVal);
1303 getNextToken(); // consume the number
1304 return Result;
1305}
1306
1307/// parenexpr ::= '(' expression ')'
1308static ExprAST *ParseParenExpr() {
1309 getNextToken(); // eat (.
1310 ExprAST *V = ParseExpression();
1311 if (!V) return 0;
1312
1313 if (CurTok != ')')
1314 return Error("expected ')'");
1315 getNextToken(); // eat ).
1316 return V;
1317}
1318
1319/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1320static ExprAST *ParseIfExpr() {
1321 getNextToken(); // eat the if.
1322
1323 // condition.
1324 ExprAST *Cond = ParseExpression();
1325 if (!Cond) return 0;
1326
1327 if (CurTok != tok_then)
1328 return Error("expected then");
1329 getNextToken(); // eat the then
1330
1331 ExprAST *Then = ParseExpression();
1332 if (Then == 0) return 0;
1333
1334 if (CurTok != tok_else)
1335 return Error("expected else");
1336
1337 getNextToken();
1338
1339 ExprAST *Else = ParseExpression();
1340 if (!Else) return 0;
1341
1342 return new IfExprAST(Cond, Then, Else);
1343}
1344
Chris Lattner20a0c802007-11-05 17:54:34 +00001345/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001346static ExprAST *ParseForExpr() {
1347 getNextToken(); // eat the for.
1348
1349 if (CurTok != tok_identifier)
1350 return Error("expected identifier after for");
1351
1352 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001353 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001354
1355 if (CurTok != '=')
1356 return Error("expected '=' after for");
1357 getNextToken(); // eat '='.
1358
1359
1360 ExprAST *Start = ParseExpression();
1361 if (Start == 0) return 0;
1362 if (CurTok != ',')
1363 return Error("expected ',' after for start value");
1364 getNextToken();
1365
1366 ExprAST *End = ParseExpression();
1367 if (End == 0) return 0;
1368
1369 // The step value is optional.
1370 ExprAST *Step = 0;
1371 if (CurTok == ',') {
1372 getNextToken();
1373 Step = ParseExpression();
1374 if (Step == 0) return 0;
1375 }
1376
1377 if (CurTok != tok_in)
1378 return Error("expected 'in' after for");
1379 getNextToken(); // eat 'in'.
1380
1381 ExprAST *Body = ParseExpression();
1382 if (Body == 0) return 0;
1383
1384 return new ForExprAST(IdName, Start, End, Step, Body);
1385}
1386
Chris Lattner20a0c802007-11-05 17:54:34 +00001387/// varexpr ::= 'var' identifier ('=' expression)?
1388// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001389static ExprAST *ParseVarExpr() {
1390 getNextToken(); // eat the var.
1391
1392 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1393
1394 // At least one variable name is required.
1395 if (CurTok != tok_identifier)
1396 return Error("expected identifier after var");
1397
1398 while (1) {
1399 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001400 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001401
1402 // Read the optional initializer.
1403 ExprAST *Init = 0;
1404 if (CurTok == '=') {
1405 getNextToken(); // eat the '='.
1406
1407 Init = ParseExpression();
1408 if (Init == 0) return 0;
1409 }
1410
1411 VarNames.push_back(std::make_pair(Name, Init));
1412
1413 // End of var list, exit loop.
1414 if (CurTok != ',') break;
1415 getNextToken(); // eat the ','.
1416
1417 if (CurTok != tok_identifier)
1418 return Error("expected identifier list after var");
1419 }
1420
1421 // At this point, we have to have 'in'.
1422 if (CurTok != tok_in)
1423 return Error("expected 'in' keyword after 'var'");
1424 getNextToken(); // eat 'in'.
1425
1426 ExprAST *Body = ParseExpression();
1427 if (Body == 0) return 0;
1428
1429 return new VarExprAST(VarNames, Body);
1430}
1431
1432
1433/// primary
1434/// ::= identifierexpr
1435/// ::= numberexpr
1436/// ::= parenexpr
1437/// ::= ifexpr
1438/// ::= forexpr
1439/// ::= varexpr
1440static ExprAST *ParsePrimary() {
1441 switch (CurTok) {
1442 default: return Error("unknown token when expecting an expression");
1443 case tok_identifier: return ParseIdentifierExpr();
1444 case tok_number: return ParseNumberExpr();
1445 case '(': return ParseParenExpr();
1446 case tok_if: return ParseIfExpr();
1447 case tok_for: return ParseForExpr();
1448 case tok_var: return ParseVarExpr();
1449 }
1450}
1451
1452/// unary
1453/// ::= primary
1454/// ::= '!' unary
1455static ExprAST *ParseUnary() {
1456 // If the current token is not an operator, it must be a primary expr.
1457 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1458 return ParsePrimary();
1459
1460 // If this is a unary operator, read it.
1461 int Opc = CurTok;
1462 getNextToken();
1463 if (ExprAST *Operand = ParseUnary())
1464 return new UnaryExprAST(Opc, Operand);
1465 return 0;
1466}
1467
1468/// binoprhs
1469/// ::= ('+' unary)*
1470static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1471 // If this is a binop, find its precedence.
1472 while (1) {
1473 int TokPrec = GetTokPrecedence();
1474
1475 // If this is a binop that binds at least as tightly as the current binop,
1476 // consume it, otherwise we are done.
1477 if (TokPrec &lt; ExprPrec)
1478 return LHS;
1479
1480 // Okay, we know this is a binop.
1481 int BinOp = CurTok;
1482 getNextToken(); // eat binop
1483
1484 // Parse the unary expression after the binary operator.
1485 ExprAST *RHS = ParseUnary();
1486 if (!RHS) return 0;
1487
1488 // If BinOp binds less tightly with RHS than the operator after RHS, let
1489 // the pending operator take RHS as its LHS.
1490 int NextPrec = GetTokPrecedence();
1491 if (TokPrec &lt; NextPrec) {
1492 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1493 if (RHS == 0) return 0;
1494 }
1495
1496 // Merge LHS/RHS.
1497 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1498 }
1499}
1500
1501/// expression
1502/// ::= unary binoprhs
1503///
1504static ExprAST *ParseExpression() {
1505 ExprAST *LHS = ParseUnary();
1506 if (!LHS) return 0;
1507
1508 return ParseBinOpRHS(0, LHS);
1509}
1510
1511/// prototype
1512/// ::= id '(' id* ')'
1513/// ::= binary LETTER number? (id, id)
1514/// ::= unary LETTER (id)
1515static PrototypeAST *ParsePrototype() {
1516 std::string FnName;
1517
1518 int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1519 unsigned BinaryPrecedence = 30;
1520
1521 switch (CurTok) {
1522 default:
1523 return ErrorP("Expected function name in prototype");
1524 case tok_identifier:
1525 FnName = IdentifierStr;
1526 Kind = 0;
1527 getNextToken();
1528 break;
1529 case tok_unary:
1530 getNextToken();
1531 if (!isascii(CurTok))
1532 return ErrorP("Expected unary operator");
1533 FnName = "unary";
1534 FnName += (char)CurTok;
1535 Kind = 1;
1536 getNextToken();
1537 break;
1538 case tok_binary:
1539 getNextToken();
1540 if (!isascii(CurTok))
1541 return ErrorP("Expected binary operator");
1542 FnName = "binary";
1543 FnName += (char)CurTok;
1544 Kind = 2;
1545 getNextToken();
1546
1547 // Read the precedence if present.
1548 if (CurTok == tok_number) {
1549 if (NumVal &lt; 1 || NumVal &gt; 100)
1550 return ErrorP("Invalid precedecnce: must be 1..100");
1551 BinaryPrecedence = (unsigned)NumVal;
1552 getNextToken();
1553 }
1554 break;
1555 }
1556
1557 if (CurTok != '(')
1558 return ErrorP("Expected '(' in prototype");
1559
1560 std::vector&lt;std::string&gt; ArgNames;
1561 while (getNextToken() == tok_identifier)
1562 ArgNames.push_back(IdentifierStr);
1563 if (CurTok != ')')
1564 return ErrorP("Expected ')' in prototype");
1565
1566 // success.
1567 getNextToken(); // eat ')'.
1568
1569 // Verify right number of names for operator.
1570 if (Kind &amp;&amp; ArgNames.size() != Kind)
1571 return ErrorP("Invalid number of operands for operator");
1572
1573 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1574}
1575
1576/// definition ::= 'def' prototype expression
1577static FunctionAST *ParseDefinition() {
1578 getNextToken(); // eat def.
1579 PrototypeAST *Proto = ParsePrototype();
1580 if (Proto == 0) return 0;
1581
1582 if (ExprAST *E = ParseExpression())
1583 return new FunctionAST(Proto, E);
1584 return 0;
1585}
1586
1587/// toplevelexpr ::= expression
1588static FunctionAST *ParseTopLevelExpr() {
1589 if (ExprAST *E = ParseExpression()) {
1590 // Make an anonymous proto.
1591 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1592 return new FunctionAST(Proto, E);
1593 }
1594 return 0;
1595}
1596
1597/// external ::= 'extern' prototype
1598static PrototypeAST *ParseExtern() {
1599 getNextToken(); // eat extern.
1600 return ParsePrototype();
1601}
1602
1603//===----------------------------------------------------------------------===//
1604// Code Generation
1605//===----------------------------------------------------------------------===//
1606
1607static Module *TheModule;
1608static LLVMFoldingBuilder Builder;
1609static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1610static FunctionPassManager *TheFPM;
1611
1612Value *ErrorV(const char *Str) { Error(Str); return 0; }
1613
1614/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1615/// the function. This is used for mutable variables etc.
1616static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1617 const std::string &amp;VarName) {
1618 LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
1619 TheFunction-&gt;getEntryBlock().begin());
1620 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1621}
1622
1623
1624Value *NumberExprAST::Codegen() {
1625 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1626}
1627
1628Value *VariableExprAST::Codegen() {
1629 // Look this variable up in the function.
1630 Value *V = NamedValues[Name];
1631 if (V == 0) return ErrorV("Unknown variable name");
1632
1633 // Load the value.
1634 return Builder.CreateLoad(V, Name.c_str());
1635}
1636
1637Value *UnaryExprAST::Codegen() {
1638 Value *OperandV = Operand-&gt;Codegen();
1639 if (OperandV == 0) return 0;
1640
1641 Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1642 if (F == 0)
1643 return ErrorV("Unknown unary operator");
1644
1645 return Builder.CreateCall(F, OperandV, "unop");
1646}
1647
1648
1649Value *BinaryExprAST::Codegen() {
1650 // Special case '=' because we don't want to emit the LHS as an expression.
1651 if (Op == '=') {
1652 // Assignment requires the LHS to be an identifier.
1653 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1654 if (!LHSE)
1655 return ErrorV("destination of '=' must be a variable");
1656 // Codegen the RHS.
1657 Value *Val = RHS-&gt;Codegen();
1658 if (Val == 0) return 0;
1659
1660 // Look up the name.
1661 Value *Variable = NamedValues[LHSE-&gt;getName()];
1662 if (Variable == 0) return ErrorV("Unknown variable name");
1663
1664 Builder.CreateStore(Val, Variable);
1665 return Val;
1666 }
1667
1668
1669 Value *L = LHS-&gt;Codegen();
1670 Value *R = RHS-&gt;Codegen();
1671 if (L == 0 || R == 0) return 0;
1672
1673 switch (Op) {
1674 case '+': return Builder.CreateAdd(L, R, "addtmp");
1675 case '-': return Builder.CreateSub(L, R, "subtmp");
1676 case '*': return Builder.CreateMul(L, R, "multmp");
1677 case '&lt;':
Chris Lattner71155212007-11-06 01:39:12 +00001678 L = Builder.CreateFCmpULT(L, R, "cmptmp");
Chris Lattner62a709d2007-11-05 00:23:57 +00001679 // Convert bool 0/1 to double 0.0 or 1.0
1680 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1681 default: break;
1682 }
1683
1684 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1685 // a call to it.
1686 Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1687 assert(F &amp;&amp; "binary operator not found!");
1688
1689 Value *Ops[] = { L, R };
1690 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1691}
1692
1693Value *CallExprAST::Codegen() {
1694 // Look up the name in the global module table.
1695 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1696 if (CalleeF == 0)
1697 return ErrorV("Unknown function referenced");
1698
1699 // If argument mismatch error.
1700 if (CalleeF-&gt;arg_size() != Args.size())
1701 return ErrorV("Incorrect # arguments passed");
1702
1703 std::vector&lt;Value*&gt; ArgsV;
1704 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1705 ArgsV.push_back(Args[i]-&gt;Codegen());
1706 if (ArgsV.back() == 0) return 0;
1707 }
1708
1709 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1710}
1711
1712Value *IfExprAST::Codegen() {
1713 Value *CondV = Cond-&gt;Codegen();
1714 if (CondV == 0) return 0;
1715
1716 // Convert condition to a bool by comparing equal to 0.0.
1717 CondV = Builder.CreateFCmpONE(CondV,
1718 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1719 "ifcond");
1720
1721 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1722
1723 // Create blocks for the then and else cases. Insert the 'then' block at the
1724 // end of the function.
1725 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1726 BasicBlock *ElseBB = new BasicBlock("else");
1727 BasicBlock *MergeBB = new BasicBlock("ifcont");
1728
1729 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1730
1731 // Emit then value.
1732 Builder.SetInsertPoint(ThenBB);
1733
1734 Value *ThenV = Then-&gt;Codegen();
1735 if (ThenV == 0) return 0;
1736
1737 Builder.CreateBr(MergeBB);
1738 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1739 ThenBB = Builder.GetInsertBlock();
1740
1741 // Emit else block.
1742 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1743 Builder.SetInsertPoint(ElseBB);
1744
1745 Value *ElseV = Else-&gt;Codegen();
1746 if (ElseV == 0) return 0;
1747
1748 Builder.CreateBr(MergeBB);
1749 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1750 ElseBB = Builder.GetInsertBlock();
1751
1752 // Emit merge block.
1753 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1754 Builder.SetInsertPoint(MergeBB);
1755 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1756
1757 PN-&gt;addIncoming(ThenV, ThenBB);
1758 PN-&gt;addIncoming(ElseV, ElseBB);
1759 return PN;
1760}
1761
1762Value *ForExprAST::Codegen() {
1763 // Output this as:
1764 // var = alloca double
1765 // ...
1766 // start = startexpr
1767 // store start -&gt; var
1768 // goto loop
1769 // loop:
1770 // ...
1771 // bodyexpr
1772 // ...
1773 // loopend:
1774 // step = stepexpr
1775 // endcond = endexpr
1776 //
1777 // curvar = load var
1778 // nextvar = curvar + step
1779 // store nextvar -&gt; var
1780 // br endcond, loop, endloop
1781 // outloop:
1782
1783 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1784
1785 // Create an alloca for the variable in the entry block.
1786 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1787
1788 // Emit the start code first, without 'variable' in scope.
1789 Value *StartVal = Start-&gt;Codegen();
1790 if (StartVal == 0) return 0;
1791
1792 // Store the value into the alloca.
1793 Builder.CreateStore(StartVal, Alloca);
1794
1795 // Make the new basic block for the loop header, inserting after current
1796 // block.
1797 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1798 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1799
1800 // Insert an explicit fall through from the current block to the LoopBB.
1801 Builder.CreateBr(LoopBB);
1802
1803 // Start insertion in LoopBB.
1804 Builder.SetInsertPoint(LoopBB);
1805
1806 // Within the loop, the variable is defined equal to the PHI node. If it
1807 // shadows an existing variable, we have to restore it, so save it now.
1808 AllocaInst *OldVal = NamedValues[VarName];
1809 NamedValues[VarName] = Alloca;
1810
1811 // Emit the body of the loop. This, like any other expr, can change the
1812 // current BB. Note that we ignore the value computed by the body, but don't
1813 // allow an error.
1814 if (Body-&gt;Codegen() == 0)
1815 return 0;
1816
1817 // Emit the step value.
1818 Value *StepVal;
1819 if (Step) {
1820 StepVal = Step-&gt;Codegen();
1821 if (StepVal == 0) return 0;
1822 } else {
1823 // If not specified, use 1.0.
1824 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1825 }
1826
1827 // Compute the end condition.
1828 Value *EndCond = End-&gt;Codegen();
1829 if (EndCond == 0) return EndCond;
1830
1831 // Reload, increment, and restore the alloca. This handles the case where
1832 // the body of the loop mutates the variable.
1833 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1834 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1835 Builder.CreateStore(NextVar, Alloca);
1836
1837 // Convert condition to a bool by comparing equal to 0.0.
1838 EndCond = Builder.CreateFCmpONE(EndCond,
1839 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1840 "loopcond");
1841
1842 // Create the "after loop" block and insert it.
1843 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1844 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1845
1846 // Insert the conditional branch into the end of LoopEndBB.
1847 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1848
1849 // Any new code will be inserted in AfterBB.
1850 Builder.SetInsertPoint(AfterBB);
1851
1852 // Restore the unshadowed variable.
1853 if (OldVal)
1854 NamedValues[VarName] = OldVal;
1855 else
1856 NamedValues.erase(VarName);
1857
1858
1859 // for expr always returns 0.0.
1860 return Constant::getNullValue(Type::DoubleTy);
1861}
1862
1863Value *VarExprAST::Codegen() {
1864 std::vector&lt;AllocaInst *&gt; OldBindings;
1865
1866 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1867
1868 // Register all variables and emit their initializer.
1869 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1870 const std::string &amp;VarName = VarNames[i].first;
1871 ExprAST *Init = VarNames[i].second;
1872
1873 // Emit the initializer before adding the variable to scope, this prevents
1874 // the initializer from referencing the variable itself, and permits stuff
1875 // like this:
1876 // var a = 1 in
1877 // var a = a in ... # refers to outer 'a'.
1878 Value *InitVal;
1879 if (Init) {
1880 InitVal = Init-&gt;Codegen();
1881 if (InitVal == 0) return 0;
1882 } else { // If not specified, use 0.0.
1883 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
1884 }
1885
1886 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1887 Builder.CreateStore(InitVal, Alloca);
1888
1889 // Remember the old variable binding so that we can restore the binding when
1890 // we unrecurse.
1891 OldBindings.push_back(NamedValues[VarName]);
1892
1893 // Remember this binding.
1894 NamedValues[VarName] = Alloca;
1895 }
1896
1897 // Codegen the body, now that all vars are in scope.
1898 Value *BodyVal = Body-&gt;Codegen();
1899 if (BodyVal == 0) return 0;
1900
1901 // Pop all our variables from scope.
1902 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1903 NamedValues[VarNames[i].first] = OldBindings[i];
1904
1905 // Return the body computation.
1906 return BodyVal;
1907}
1908
1909
1910Function *PrototypeAST::Codegen() {
1911 // Make the function type: double(double,double) etc.
1912 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1913 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1914
1915 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1916
1917 // If F conflicted, there was already something named 'Name'. If it has a
1918 // body, don't allow redefinition or reextern.
1919 if (F-&gt;getName() != Name) {
1920 // Delete the one we just made and get the existing one.
1921 F-&gt;eraseFromParent();
1922 F = TheModule-&gt;getFunction(Name);
1923
1924 // If F already has a body, reject this.
1925 if (!F-&gt;empty()) {
1926 ErrorF("redefinition of function");
1927 return 0;
1928 }
1929
1930 // If F took a different number of args, reject.
1931 if (F-&gt;arg_size() != Args.size()) {
1932 ErrorF("redefinition of function with different # args");
1933 return 0;
1934 }
1935 }
1936
1937 // Set names for all arguments.
1938 unsigned Idx = 0;
1939 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1940 ++AI, ++Idx)
1941 AI-&gt;setName(Args[Idx]);
1942
1943 return F;
1944}
1945
1946/// CreateArgumentAllocas - Create an alloca for each argument and register the
1947/// argument in the symbol table so that references to it will succeed.
1948void PrototypeAST::CreateArgumentAllocas(Function *F) {
1949 Function::arg_iterator AI = F-&gt;arg_begin();
1950 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1951 // Create an alloca for this variable.
1952 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1953
1954 // Store the initial value into the alloca.
1955 Builder.CreateStore(AI, Alloca);
1956
1957 // Add arguments to variable symbol table.
1958 NamedValues[Args[Idx]] = Alloca;
1959 }
1960}
1961
1962
1963Function *FunctionAST::Codegen() {
1964 NamedValues.clear();
1965
1966 Function *TheFunction = Proto-&gt;Codegen();
1967 if (TheFunction == 0)
1968 return 0;
1969
1970 // If this is an operator, install it.
1971 if (Proto-&gt;isBinaryOp())
1972 BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
1973
1974 // Create a new basic block to start insertion into.
1975 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1976 Builder.SetInsertPoint(BB);
1977
1978 // Add all arguments to the symbol table and create their allocas.
1979 Proto-&gt;CreateArgumentAllocas(TheFunction);
1980
1981 if (Value *RetVal = Body-&gt;Codegen()) {
1982 // Finish off the function.
1983 Builder.CreateRet(RetVal);
1984
1985 // Validate the generated code, checking for consistency.
1986 verifyFunction(*TheFunction);
1987
1988 // Optimize the function.
1989 TheFPM-&gt;run(*TheFunction);
1990
1991 return TheFunction;
1992 }
1993
1994 // Error reading body, remove function.
1995 TheFunction-&gt;eraseFromParent();
1996
1997 if (Proto-&gt;isBinaryOp())
1998 BinopPrecedence.erase(Proto-&gt;getOperatorName());
1999 return 0;
2000}
2001
2002//===----------------------------------------------------------------------===//
2003// Top-Level parsing and JIT Driver
2004//===----------------------------------------------------------------------===//
2005
2006static ExecutionEngine *TheExecutionEngine;
2007
2008static void HandleDefinition() {
2009 if (FunctionAST *F = ParseDefinition()) {
2010 if (Function *LF = F-&gt;Codegen()) {
2011 fprintf(stderr, "Read function definition:");
2012 LF-&gt;dump();
2013 }
2014 } else {
2015 // Skip token for error recovery.
2016 getNextToken();
2017 }
2018}
2019
2020static void HandleExtern() {
2021 if (PrototypeAST *P = ParseExtern()) {
2022 if (Function *F = P-&gt;Codegen()) {
2023 fprintf(stderr, "Read extern: ");
2024 F-&gt;dump();
2025 }
2026 } else {
2027 // Skip token for error recovery.
2028 getNextToken();
2029 }
2030}
2031
2032static void HandleTopLevelExpression() {
2033 // Evaluate a top level expression into an anonymous function.
2034 if (FunctionAST *F = ParseTopLevelExpr()) {
2035 if (Function *LF = F-&gt;Codegen()) {
2036 // JIT the function, returning a function pointer.
2037 void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
2038
2039 // Cast it to the right type (takes no arguments, returns a double) so we
2040 // can call it as a native function.
2041 double (*FP)() = (double (*)())FPtr;
2042 fprintf(stderr, "Evaluated to %f\n", FP());
2043 }
2044 } else {
2045 // Skip token for error recovery.
2046 getNextToken();
2047 }
2048}
2049
2050/// top ::= definition | external | expression | ';'
2051static void MainLoop() {
2052 while (1) {
2053 fprintf(stderr, "ready&gt; ");
2054 switch (CurTok) {
2055 case tok_eof: return;
2056 case ';': getNextToken(); break; // ignore top level semicolons.
2057 case tok_def: HandleDefinition(); break;
2058 case tok_extern: HandleExtern(); break;
2059 default: HandleTopLevelExpression(); break;
2060 }
2061 }
2062}
2063
2064
2065
2066//===----------------------------------------------------------------------===//
2067// "Library" functions that can be "extern'd" from user code.
2068//===----------------------------------------------------------------------===//
2069
2070/// putchard - putchar that takes a double and returns 0.
2071extern "C"
2072double putchard(double X) {
2073 putchar((char)X);
2074 return 0;
2075}
2076
2077/// printd - printf that takes a double prints it as "%f\n", returning 0.
2078extern "C"
2079double printd(double X) {
2080 printf("%f\n", X);
2081 return 0;
2082}
2083
2084//===----------------------------------------------------------------------===//
2085// Main driver code.
2086//===----------------------------------------------------------------------===//
2087
2088int main() {
2089 // Install standard binary operators.
2090 // 1 is lowest precedence.
2091 BinopPrecedence['='] = 2;
2092 BinopPrecedence['&lt;'] = 10;
2093 BinopPrecedence['+'] = 20;
2094 BinopPrecedence['-'] = 20;
2095 BinopPrecedence['*'] = 40; // highest.
2096
2097 // Prime the first token.
2098 fprintf(stderr, "ready&gt; ");
2099 getNextToken();
2100
2101 // Make the module, which holds all the code.
2102 TheModule = new Module("my cool jit");
2103
2104 // Create the JIT.
2105 TheExecutionEngine = ExecutionEngine::create(TheModule);
2106
2107 {
2108 ExistingModuleProvider OurModuleProvider(TheModule);
2109 FunctionPassManager OurFPM(&amp;OurModuleProvider);
2110
2111 // Set up the optimizer pipeline. Start with registering info about how the
2112 // target lays out data structures.
2113 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
2114 // Promote allocas to registers.
2115 OurFPM.add(createPromoteMemoryToRegisterPass());
2116 // Do simple "peephole" optimizations and bit-twiddling optzns.
2117 OurFPM.add(createInstructionCombiningPass());
2118 // Reassociate expressions.
2119 OurFPM.add(createReassociatePass());
2120 // Eliminate Common SubExpressions.
2121 OurFPM.add(createGVNPass());
2122 // Simplify the control flow graph (deleting unreachable blocks, etc).
2123 OurFPM.add(createCFGSimplificationPass());
2124
2125 // Set the global so the code gen can use this.
2126 TheFPM = &amp;OurFPM;
2127
2128 // Run the main "interpreter loop" now.
2129 MainLoop();
2130
2131 TheFPM = 0;
Chris Lattner515686b2008-02-05 06:18:42 +00002132
2133 // Print out all of the generated code.
2134 TheModule-&gt;dump();
2135
2136 } // Free module provider (and thus the module) and pass manager.
2137
Chris Lattner62a709d2007-11-05 00:23:57 +00002138 return 0;
2139}
Chris Lattner00c992d2007-11-03 08:55:29 +00002140</pre>
2141</div>
2142
2143</div>
2144
2145<!-- *********************************************************************** -->
2146<hr>
2147<address>
2148 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2149 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
2150 <a href="http://validator.w3.org/check/referer"><img
2151 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
2152
2153 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2154 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2155 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
2156</address>
2157</body>
2158</html>