<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
  <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
  construction</title>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="author" content="Chris Lattner">
  <link rel="stylesheet" href="../llvm.css" type="text/css">
</head>

<body>

<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>

<div class="doc_author">
  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable,
albeit simple, <a
href="http://en.wikipedia.org/wiki/Functional_programming">functional
programming language</a>. In our journey, we learned some parsing techniques,
how to build and represent an AST, how to build LLVM IR, and how to optimize
the resultant code and JIT compile it.</p>

<p>While Kaleidoscope is interesting as a functional language, the fact that it
is functional makes it "too easy" to generate LLVM IR for it. In particular, a
functional language makes it very easy to build LLVM IR directly in <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
Since LLVM requires that the input code be in SSA form, this is a very nice
property, and it is often unclear to newcomers how to generate code for an
imperative language with mutable variables.</p>

<p>The short (and happy) summary of this chapter is that there is no need for
your front-end to build SSA form: LLVM provides highly tuned and well-tested
support for this, though the way it works is a bit unexpected for some.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
To understand why mutable variables cause complexities in SSA construction,
consider this extremely simple C example:
</p>

<div class="doc_code">
<pre>
int G, H;
int test(_Bool Condition) {
  int X;
  if (Condition)
    X = G;
  else
    X = H;
  return X;
}
</pre>
</div>

<p>In this case, we have the variable "X", whose value depends on the path
executed in the program. Because there are two different possible values for X
before the return instruction, a PHI node is inserted to merge the two values.
The LLVM IR that we want for this example looks like this:</p>

<div class="doc_code">
<pre>
@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.2
}
</pre>
</div>

<p>In this example, the loads from the G and H global variables are explicit in
the LLVM IR, and they live in the then/else branches of the if statement
(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
in the cond_next block selects the right value to use based on where control
flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1. Alternatively, if control flow comes from cond_true, it gets
the value of X.0. The intent of this chapter is not to explain the details of
SSA form. For more information, see one of the many <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
references</a>.</p>
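
<p>As a rough illustration of what the phi node above computes at run time, the
following plain C++ sketch models it as a lookup keyed on the predecessor block.
The names <tt>Incoming</tt> and <tt>evalPhi</tt> are invented for this example
and are not part of LLVM; the fallback value for an unlisted predecessor is also
an assumption of the sketch.</p>

```cpp
#include <string>
#include <utility>
#include <vector>

// One (value, predecessor block) pair of a PHI node.
// Hypothetical model type, not an LLVM class.
using Incoming = std::pair<int, std::string>;

// Return the value paired with the block that control flow came from.
// Returns `fallback` if the predecessor is not listed (a malformed PHI).
int evalPhi(const std::vector<Incoming> &incoming,
            const std::string &cameFrom, int fallback = -1) {
  for (const Incoming &in : incoming)
    if (in.second == cameFrom)
      return in.first;
  return fallback;
}
```

<p>With the two incoming pairs from the IR above, asking for the value on
arrival from <tt>cond_true</tt> yields whatever X.0 held, and arrival from
<tt>cond_false</tt> yields X.1.</p>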

<p>The question for this article is "who places phi nodes when lowering
assignments to mutable variables?". The issue here is that LLVM
<em>requires</em> that its IR be in SSA form: there is no "non-SSA" mode for it.
However, SSA construction requires non-trivial algorithms and data structures,
so it is inconvenient and wasteful for every front-end to have to reproduce this
logic.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>The 'trick' here is that while LLVM does require all register values to be
in SSA form, it does not require (or permit) memory objects to be in SSA form.
In the example above, note that the loads from G and H are direct accesses to
G and H: they are not renamed or versioned. This differs from some other
compiler systems, which do try to version memory objects. In LLVM, instead of
encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
demand.</p>

<p>
With this in mind, the high-level idea is that we want to make a stack variable
(which lives in memory, because it is on the stack) for each mutable object in
a function. To take advantage of this trick, we need to talk about how LLVM
represents stack variables.
</p>

<p>In LLVM, all memory accesses are explicit with load/store instructions, and
it is carefully designed to not have (or need) an "address-of" operator. Notice
how the type of the @G/@H global variables is actually "i32*" even though the
variable is defined as "i32". What this means is that @G defines <em>space</em>
for an i32 in the global data area, but its <em>name</em> actually refers to the
address for that space. Stack variables work the same way, but instead of being
declared with global variable definitions, they are declared with the
<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>

<div class="doc_code">
<pre>
define i32 @test(i1 %Condition) {
entry:
  %X = alloca i32           ; type of %X is i32*.
  ...
  %tmp = load i32* %X       ; load the stack value %X from the stack.
  %tmp2 = add i32 %tmp, 1   ; increment it
  store i32 %tmp2, i32* %X  ; store it back
  ...
</pre>
</div>

<p>This code shows an example of how you can declare and manipulate a stack
variable in the LLVM IR. Stack memory allocated with the alloca instruction is
fully general: you can pass the address of the stack slot to functions, you can
store it in other variables, etc. In our example above, we could rewrite the
example to use the alloca technique to avoid using a PHI node:</p>

<div class="doc_code">
<pre>
@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  %X = alloca i32           ; type of %X is i32*.
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  store i32 %X.0, i32* %X   ; Update X
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  store i32 %X.1, i32* %X   ; Update X
  br label %cond_next

cond_next:
  %X.2 = load i32* %X       ; Read X
  ret i32 %X.2
}
</pre>
</div>

<p>With this, we have discovered a way to handle arbitrary mutable variables
without the need to create Phi nodes at all:</p>

<ol>
<li>Each mutable variable becomes a stack allocation.</li>
<li>Each read of the variable becomes a load from the stack.</li>
<li>Each update of the variable becomes a store to the stack.</li>
<li>Taking the address of a variable just uses the stack address directly.</li>
</ol>
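
<p>To make the four rules concrete, here is a minimal sketch of a toy emitter
that applies them and records textual IR in the same syntax as the examples
above. <tt>ToyEmitter</tt> and its member names are hypothetical and exist only
for illustration; a real front-end would go through the LLVM C++ API rather
than building strings.</p>

```cpp
#include <string>
#include <vector>

// Toy emitter that applies the four rules and records textual IR.
// Hypothetical helper for illustration only.
struct ToyEmitter {
  std::vector<std::string> entry; // allocas, kept in the entry block
  std::vector<std::string> body;  // all other instructions
  int nextTmp = 0;

  // Rule 1: a mutable variable becomes a stack allocation.
  void declare(const std::string &var) {
    entry.push_back("%" + var + " = alloca i32");
  }

  // Rule 2: a read becomes a load; returns the name of the loaded value.
  std::string read(const std::string &var) {
    std::string tmp = "%tmp" + std::to_string(nextTmp++);
    body.push_back(tmp + " = load i32* %" + var);
    return tmp;
  }

  // Rule 3: an update becomes a store.
  void write(const std::string &var, const std::string &value) {
    body.push_back("store i32 " + value + ", i32* %" + var);
  }

  // Rule 4: the address of a variable is just the alloca's own name.
  std::string addressOf(const std::string &var) const {
    return "%" + var;
  }
};
```

<p>Declaring <tt>X</tt>, reading it, and writing the result back produces one
alloca in the entry list followed by a load and a store in the body, which is
the same shape as the hand-written IR above.</p>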

<p>While this solution has solved our immediate problem, it introduced another
one: we have now apparently introduced a lot of stack traffic for very simple
and common operations, a major performance problem. Fortunately for us, the
LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
this case, promoting allocas like this into SSA registers, inserting Phi nodes
as appropriate. If you run this example through the pass, for example, you'll
get:</p>

<div class="doc_code">
<pre>
$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
@G = weak global i32 0
@H = weak global i32 0

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.01
}
</pre>
</div>

<p>The mem2reg pass implements the standard "iterated dominance frontier"
algorithm for constructing SSA form and has a number of optimizations that speed
up very common degenerate cases. mem2reg really is the answer for dealing with
mutable variables, and we highly recommend that you depend on it. Note that
mem2reg only works on variables in certain circumstances:</p>

<ol>
<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
promotes them. It does not apply to global variables or heap allocations.</li>

<li>mem2reg only looks for alloca instructions in the entry block of the
function. Being in the entry block guarantees that the alloca is only executed
once, which makes analysis simpler.</li>

<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
the address of the stack object is passed to a function, or if any funny pointer
arithmetic is involved, the alloca will not be promoted.</li>

<li>mem2reg only works on allocas of scalar values, and only if the array size
of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
promoting structs or arrays to registers. Note that the "scalarrepl" pass is
more powerful and can promote structs, "unions", and arrays in many cases.</li>

</ol>
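
<p>The conditions above can be read as a checklist. The sketch below encodes
them as a predicate over a simplified description of an alloca. It only mirrors
the list in this section and is not the pass's actual logic; <tt>UseKind</tt>,
<tt>AllocaInfo</tt> and <tt>looksPromotable</tt> are hypothetical names.</p>

```cpp
#include <vector>

// How an alloca's address is used. Hypothetical categories for this sketch.
enum class UseKind { Load, Store, PassedToCall, PointerArithmetic };

// Simplified description of one alloca; not an LLVM data structure.
struct AllocaInfo {
  bool inEntryBlock;         // condition 2: lives in the entry block
  bool isScalar;             // condition 4: not a struct or array type
  unsigned arraySize;        // condition 4: must be 1
  std::vector<UseKind> uses; // condition 3: only direct loads and stores
};

// Condition 1 (allocas only, no globals or heap) is implied by the input type.
bool looksPromotable(const AllocaInfo &a) {
  if (!a.inEntryBlock)
    return false;
  if (!a.isScalar || a.arraySize != 1)
    return false;
  for (UseKind u : a.uses)
    if (u != UseKind::Load && u != UseKind::Store)
      return false;
  return true;
}
```

<p>A front-end that only ever loads from and stores to entry-block scalar
allocas, as Kaleidoscope will, passes every check.</p>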

<p>
All of these properties are easy to satisfy for most imperative languages, and
we'll illustrate this below with Kaleidoscope. The final question you may be
asking is: should I bother with this nonsense for my front-end? Wouldn't it be
better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass? In short, we strongly recommend that you use this technique
for building SSA form, unless there is an extremely good reason not to. Using
this technique is:</p>

<ul>
<li>Proven and well tested: llvm-gcc and clang both use this technique for local
mutable variables. As such, the most common clients of LLVM are using this to
handle the bulk of their variables. You can be sure that bugs are found fast and
fixed early.</li>

<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
common cases as well as fully general. For example, it has fast-paths for
variables that are only used in a single block, variables that only have one
assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
</li>

<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
Debug information in LLVM</a> relies on having the address of the variable
exposed to attach debug info to it. This technique dovetails very naturally
with this style of debug info.</li>
</ul>

<p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with mutable
variables now!
</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="kalvars">Mutable Variables in
Kaleidoscope</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>Now that we know the sort of problem we want to tackle, let's see what this
looks like in the context of our little Kaleidoscope language. We're going to
add two features:</p>

<ol>
<li>The ability to mutate variables with the '=' operator.</li>
<li>The ability to define new variables.</li>
</ol>

<p>While the first item is really what this is about, we only have variables
for incoming arguments and for induction variables, and redefining them only
goes so far :). Also, the ability to define new variables is a
useful thing regardless of whether you will be mutating them. Here's a
motivating example that shows how we could use these:</p>

<div class="doc_code">
<pre>
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

# Recursive fib, we could do this before.
def fib(x)
  if (x &lt; 3) then
    1
  else
    fib(x-1)+fib(x-2);

# Iterative fib.
def fibi(x)
  <b>var a = 1, b = 1, c in</b>
  (for i = 3, i &lt; x in
     <b>c = a + b</b> :
     <b>a = b</b> :
     <b>b = c</b>) :
  b;

# Call it.
fibi(10);
</pre>
</div>

<p>
In order to mutate variables, we have to change our existing variables to use
the "alloca trick". Once we have that, we'll add our new operator, then extend
Kaleidoscope to support new variable definitions.
</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
Mutation</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
The symbol table in Kaleidoscope is managed at code generation time by the
'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
that holds the double value for the named variable. In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
the <em>memory location</em> of the variable in question. Note that this
change is a refactoring: it changes the structure of the code, but does not
(by itself) change the behavior of the compiler. All of these changes are
isolated in the Kaleidoscope code generator.</p>

<p>
At this point in Kaleidoscope's development, it only supports variables for two
things: incoming arguments to functions and the induction variable of 'for'
loops. For consistency, we'll allow mutation of these variables in addition to
other user-defined variables. This means that these will both need memory
locations.
</p>

<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
will tell us what parts of the code we need to update:</p>

<div class="doc_code">
<pre>
static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
</pre>
</div>

<p>Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of the
function:</p>

<div class="doc_code">
<pre>
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function.  This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const std::string &amp;VarName) {
  LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
                   TheFunction-&gt;getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
}
</pre>
</div>

<p>This funny looking code creates an LLVMBuilder object that is pointing at
the first instruction (.begin()) of the entry block. It then creates an alloca
with the expected name and returns it. Because all values in Kaleidoscope are
doubles, there is no need to pass in a type to use.</p>
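
<p>The point of aiming the temporary builder at <tt>.begin()</tt> of the entry
block is that the alloca lands there no matter which block the main builder is
currently appending to. The following sketch shows that idea with basic blocks
modelled as plain lists of strings. <tt>ToyBlock</tt> and the lower-case
<tt>createEntryBlockAlloca</tt> are stand-ins invented for this example, not the
tutorial's helper.</p>

```cpp
#include <string>
#include <vector>

// A basic block is modelled as a list of textual instructions.
using ToyBlock = std::vector<std::string>;

// Insert an alloca at the very start of the entry block (blocks.front()),
// regardless of which block code generation is currently appending to.
// Assumes the function already has at least one block.
void createEntryBlockAlloca(std::vector<ToyBlock> &blocks,
                            const std::string &varName) {
  ToyBlock &entry = blocks.front();
  entry.insert(entry.begin(), "%" + varName + " = alloca double");
}
```

<p>Calling this while "generating" code for a later block leaves that block
untouched and places the new alloca ahead of everything already in the entry
block.</p>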

<p>With this in place, the first functionality change we want to make is to
variable references. In our new scheme, variables live on the stack, so code
generating a reference to them actually needs to produce a load from the stack
slot:</p>

<div class="doc_code">
<pre>
Value *VariableExprAST::Codegen() {
  // Look this variable up in the function.
  Value *V = NamedValues[Name];
  if (V == 0) return ErrorV("Unknown variable name");

  // Load the value.
  return Builder.CreateLoad(V, Name.c_str());
}
</pre>
</div>

<p>As you can see, this is pretty straightforward. Next we need to update the
things that define the variables to set up the alloca. We'll start with
<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
the unabridged code):</p>

<div class="doc_code">
<pre>
  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  <b>// Create an alloca for the variable in the entry block.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>

  // Emit the start code first, without 'variable' in scope.
  Value *StartVal = Start-&gt;Codegen();
  if (StartVal == 0) return 0;

  <b>// Store the value into the alloca.
  Builder.CreateStore(StartVal, Alloca);</b>
  ...

  // Compute the end condition.
  Value *EndCond = End-&gt;Codegen();
  if (EndCond == 0) return EndCond;

  <b>// Reload, increment, and restore the alloca.  This handles the case where
  // the body of the loop mutates the variable.
  Value *CurVar = Builder.CreateLoad(Alloca);
  Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
  Builder.CreateStore(NextVar, Alloca);</b>
  ...
</pre>
</div>

<p>This code is virtually identical to the code <a
href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
big difference is that we no longer have to construct a PHI node, and we use
load/store to access the variable as needed.</p>

<p>To support mutable argument variables, we need to also make allocas for them.
The code for this is also pretty simple:</p>

<div class="doc_code">
<pre>
/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
  Function::arg_iterator AI = F-&gt;arg_begin();
  for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

    // Store the initial value into the alloca.
    Builder.CreateStore(AI, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = Alloca;
  }
}
</pre>
</div>

<p>For each argument, we make an alloca, store the input value to the function
into the alloca, and register the alloca as the memory location for the
argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
it sets up the entry block for the function.</p>

<p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
good codegen once again:</p>

<div class="doc_code">
<pre>
    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
    <b>// Promote allocas to registers.
    OurFPM.add(createPromoteMemoryToRegisterPass());</b>
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
</pre>
</div>

<p>It is interesting to see what the code looks like before and after the
mem2reg optimization runs. For example, this is the before/after code for our
recursive fib. Before the optimization:</p>

<div class="doc_code">
<pre>
define double @fib(double %x) {
entry:
  <b>%x1 = alloca double
  store double %x, double* %x1
  %x2 = load double* %x1</b>
  %multmp = fcmp ult double %x2, 3.000000e+00
  %booltmp = uitofp i1 %multmp to double
  %ifcond = fcmp one double %booltmp, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:    ; preds = %entry
  br label %ifcont

else:    ; preds = %entry
  <b>%x3 = load double* %x1</b>
  %subtmp = sub double %x3, 1.000000e+00
  %calltmp = call double @fib( double %subtmp )
  <b>%x4 = load double* %x1</b>
  %subtmp5 = sub double %x4, 2.000000e+00
  %calltmp6 = call double @fib( double %subtmp5 )
  %addtmp = add double %calltmp, %calltmp6
  br label %ifcont

ifcont:    ; preds = %else, %then
  %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
  ret double %iftmp
}
</pre>
</div>

<p>Here there is only one variable (x, the input argument) but you can still
see the extremely simple-minded code generation strategy we are using. In the
entry block, an alloca is created, and the initial input value is stored into
it. Each reference to the variable does a reload from the stack. Also, note
that we didn't modify the if/then/else expression, so it still inserts a PHI
node. While we could make an alloca for it, it is actually easier to create a
PHI node for it, so we still just make the PHI.</p>

<p>Here is the code after the mem2reg pass runs:</p>

<div class="doc_code">
<pre>
define double @fib(double %x) {
entry:
  %multmp = fcmp ult double <b>%x</b>, 3.000000e+00
  %booltmp = uitofp i1 %multmp to double
  %ifcond = fcmp one double %booltmp, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:
  br label %ifcont

else:
  %subtmp = sub double <b>%x</b>, 1.000000e+00
  %calltmp = call double @fib( double %subtmp )
  %subtmp5 = sub double <b>%x</b>, 2.000000e+00
  %calltmp6 = call double @fib( double %subtmp5 )
  %addtmp = add double %calltmp, %calltmp6
  br label %ifcont

ifcont:    ; preds = %else, %then
  %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
  ret double %iftmp
}
</pre>
</div>

<p>This is a trivial case for mem2reg, since there are no redefinitions of the
variable. The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>

<p>After the rest of the optimizers run, we get:</p>

<div class="doc_code">
<pre>
define double @fib(double %x) {
entry:
  %multmp = fcmp ult double %x, 3.000000e+00
  %booltmp = uitofp i1 %multmp to double
  %ifcond = fcmp ueq double %booltmp, 0.000000e+00
  br i1 %ifcond, label %else, label %ifcont

else:
  %subtmp = sub double %x, 1.000000e+00
  %calltmp = call double @fib( double %subtmp )
  %subtmp5 = sub double %x, 2.000000e+00
  %calltmp6 = call double @fib( double %subtmp5 )
  %addtmp = add double %calltmp, %calltmp6
  ret double %addtmp

ifcont:
  ret double 1.000000e+00
}
</pre>
</div>

<p>Here we see that the simplifycfg pass decided to clone the return instruction
into the end of the 'else' block. This allowed it to eliminate some branches
and the PHI node.</p>

<p>Now that all symbol table references are updated to use stack variables,
we'll add the assignment operator.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>With our current framework, adding a new assignment operator is really
simple. We will parse it just like any other binary operator, but handle it
internally (instead of allowing the user to define it). The first step is to
set a precedence:</p>

<div class="doc_code">
<pre>
int main() {
  // Install standard binary operators.
  // 1 is lowest precedence.
  <b>BinopPrecedence['='] = 2;</b>
  BinopPrecedence['&lt;'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
</pre>
</div>
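
<p>To see what a precedence of 2 buys us, here is a small self-contained sketch
of operator-precedence parsing over single-character operands. It uses the same
table values as above but returns a fully parenthesized string rather than an
AST. The names <tt>ToyPrecedence</tt>, <tt>precedenceAt</tt>,
<tt>parseRHS</tt> and <tt>parenthesize</tt> are invented for this example, and
the sketch assumes well-formed input with no spaces or parentheses.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>

// Precedence table mirroring the one above; 1 is the lowest precedence.
static const std::map<char, int> ToyPrecedence = {
    {'=', 2}, {'<', 10}, {'+', 20}, {'-', 20}};

// Precedence of the operator at `pos`, or -1 if there is no known operator.
static int precedenceAt(const std::string &src, std::size_t pos) {
  if (pos >= src.size())
    return -1;
  std::map<char, int>::const_iterator it = ToyPrecedence.find(src[pos]);
  return it == ToyPrecedence.end() ? -1 : it->second;
}

// Consume "op operand" pairs whose operator binds at least as tightly as
// minPrec, folding them into lhs as a parenthesized string.
static std::string parseRHS(const std::string &src, std::size_t &pos,
                            int minPrec, std::string lhs) {
  while (true) {
    int prec = precedenceAt(src, pos);
    if (prec < minPrec)
      return lhs;
    char op = src[pos++];
    std::string rhs(1, src[pos++]);
    // If the next operator binds tighter, let it take rhs first.
    if (prec < precedenceAt(src, pos))
      rhs = parseRHS(src, pos, prec + 1, rhs);
    lhs = "(" + lhs + op + rhs + ")";
  }
}

// Parse a whole expression such as "x=a<b+c".
std::string parenthesize(const std::string &src) {
  std::size_t pos = 0;
  std::string lhs(1, src[pos++]);
  return parseRHS(src, pos, 0, lhs);
}
```

<p>Because '=' has the lowest precedence of the built-in operators, everything
to its right is grouped before the assignment is applied. In this sketch,
operators of equal precedence group to the left, so two unparenthesized
assignments in a row end up with an assignment as the left operand of the outer
one.</p>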

<p>Now that the parser knows the precedence of the binary operator, it takes
care of all the parsing and AST generation. We just need to implement codegen
for the assignment operator. This looks like:</p>

<div class="doc_code">
<pre>
Value *BinaryExprAST::Codegen() {
  // Special case '=' because we don't want to emit the LHS as an expression.
  if (Op == '=') {
    // Assignment requires the LHS to be an identifier.
    VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
    if (!LHSE)
      return ErrorV("destination of '=' must be a variable");
</pre>
</div>

<p>Unlike the rest of the binary operators, our assignment operator doesn't
follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
as a special case before the other binary operators are handled. The other
strange thing about it is that it requires the LHS to be a variable directly.
</p>

<div class="doc_code">
<pre>
    // Codegen the RHS.
    Value *Val = RHS-&gt;Codegen();
    if (Val == 0) return 0;

    // Look up the name.
    Value *Variable = NamedValues[LHSE-&gt;getName()];
    if (Variable == 0) return ErrorV("Unknown variable name");

    Builder.CreateStore(Val, Variable);
    return Val;
  }
  ...
</pre>
</div>

<p>Once it has the variable, codegen'ing the assignment is straightforward:
we emit the RHS of the assignment, create a store, and return the computed
value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
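
<p>The "store, then return the stored value" behaviour is easy to model outside
of LLVM. In the sketch below, a <tt>std::map</tt> stands in for the stack slots
and <tt>assign</tt> plays the role of the '=' codegen; both names are
hypothetical and chosen only for this example.</p>

```cpp
#include <map>
#include <string>

// Toy "memory": one slot per variable name. Stands in for the allocas.
using ToyMemory = std::map<std::string, double>;

// Store `value` into `name` and hand the same value back, mirroring the
// codegen above, which returns Val after creating the store.
double assign(ToyMemory &mem, const std::string &name, double value) {
  mem[name] = value;
  return value;
}
```

<p>Because <tt>assign</tt> yields the value it stored, the result of the inner
call can feed the outer one, which is exactly what "X = (Y = Z)" relies on.</p>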

<p>Now that we have an assignment operator, we can mutate loop variables and
arguments. For example, we can now run code like this:</p>

<div class="doc_code">
<pre>
# Function to print a double.
extern printd(x);

# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

def test(x)
  printd(x) :
  x = 4 :
  printd(x);

test(123);
</pre>
</div>

<p>When run, this example prints "123" and then "4", showing that we did
actually mutate the value! Okay, we have now officially implemented our goal:
getting this to work requires SSA construction in the general case. However,
to be really useful, we want the ability to define our own local variables.
Let's add this next!
</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="localvars">User-defined Local
Variables</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>Adding var/in is just like the other extensions we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
The first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:</p>

<div class="doc_code">
<pre>
enum Token {
  ...
  <b>// var definition
  tok_var = -13</b>
...
}
...
static int gettok() {
...
    if (IdentifierStr == "in") return tok_in;
    if (IdentifierStr == "binary") return tok_binary;
    if (IdentifierStr == "unary") return tok_unary;
    <b>if (IdentifierStr == "var") return tok_var;</b>
    return tok_identifier;
...
</pre>
</div>

<p>The next step is to define the AST node that we will construct. For var/in,
it will look like this:</p>

<div class="doc_code">
<pre>
/// VarExprAST - Expression class for var/in
class VarExprAST : public ExprAST {
  std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
  ExprAST *Body;
public:
  VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
             ExprAST *body)
  : VarNames(varnames), Body(body) {}

  virtual Value *Codegen();
};
</pre>
</div>

<p>var/in allows a list of names to be defined all at once, and each name can
optionally have an initializer value. As such, we capture this information in
the VarNames vector. Also, var/in has a body; this body is allowed to access
the variables defined by the var/in.</p>

<p>With this ready, we can define the parser pieces. First thing we do is add
it as a primary expression:</p>

<div class="doc_code">
<pre>
/// primary
///   ::= identifierexpr
///   ::= numberexpr
///   ::= parenexpr
///   ::= ifexpr
///   ::= forexpr
<b>///   ::= varexpr</b>
static ExprAST *ParsePrimary() {
  switch (CurTok) {
  default: return Error("unknown token when expecting an expression");
  case tok_identifier: return ParseIdentifierExpr();
  case tok_number:     return ParseNumberExpr();
  case '(':            return ParseParenExpr();
  case tok_if:         return ParseIfExpr();
  case tok_for:        return ParseForExpr();
  <b>case tok_var:        return ParseVarExpr();</b>
  }
}
</pre>
</div>

<p>Next we define ParseVarExpr:</p>

<div class="doc_code">
<pre>
/// varexpr ::= 'var' identifier ('=' expression)?
///                    (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
  getNextToken();  // eat the var.

  std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;

  // At least one variable name is required.
  if (CurTok != tok_identifier)
    return Error("expected identifier after var");
</pre>
</div>

<p>The first part of this code parses the list of identifier/expr pairs into the
local <tt>VarNames</tt> vector.</p>

<div class="doc_code">
<pre>
  while (1) {
    std::string Name = IdentifierStr;
    getNextToken();  // eat identifier.

    // Read the optional initializer.
    ExprAST *Init = 0;
    if (CurTok == '=') {
      getNextToken(); // eat the '='.

      Init = ParseExpression();
      if (Init == 0) return 0;
    }

    VarNames.push_back(std::make_pair(Name, Init));

    // End of var list, exit loop.
    if (CurTok != ',') break;
    getNextToken(); // eat the ','.

    if (CurTok != tok_identifier)
      return Error("expected identifier list after var");
  }
</pre>
</div>
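
<p>The shape of this loop is easier to experiment with outside the compiler.
The sketch below runs the same "name, optional initializer, optional comma"
loop over a vector of pre-split tokens. It is a simplified stand-in: an
initializer is a single token rather than a full expression, and
<tt>ToyBinding</tt>, <tt>isName</tt> and <tt>parseVarList</tt> are hypothetical
names used only here.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (name, initializer text); an empty initializer means "none given".
using ToyBinding = std::pair<std::string, std::string>;

// A token counts as a variable name if it starts with a letter and is not
// the 'in' keyword.
static bool isName(const std::string &t) {
  return !t.empty() && t != "in" &&
         std::isalpha(static_cast<unsigned char>(t[0]));
}

// Parse  name ('=' init)? (',' name ('=' init)?)*  starting at `pos`.
// On success, `pos` is left on the first token after the list.
// Returns false on a malformed list.
bool parseVarList(const std::vector<std::string> &toks, std::size_t &pos,
                  std::vector<ToyBinding> &out) {
  while (true) {
    if (pos >= toks.size() || !isName(toks[pos]))
      return false; // expected identifier
    std::string name = toks[pos++];

    // Read the optional initializer (one token in this sketch).
    std::string init;
    if (pos < toks.size() && toks[pos] == "=") {
      ++pos; // eat '='
      if (pos >= toks.size())
        return false;
      init = toks[pos++];
    }
    out.push_back(std::make_pair(name, init));

    // End of var list, exit loop.
    if (pos >= toks.size() || toks[pos] != ",")
      return true;
    ++pos; // eat ','
  }
}
```

<p>Feeding it the tokens of "a = 1, b = 1, c in" yields three bindings, the
last with no initializer, and leaves the position on "in", ready for the body
to be parsed.</p>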
852
<p>Once all the variables are parsed, we then parse the body and create the
AST node:</p>

<div class="doc_code">
<pre>
  // At this point, we have to have 'in'.
  if (CurTok != tok_in)
    return Error("expected 'in' keyword after 'var'");
  getNextToken();  // eat 'in'.

  ExprAST *Body = ParseExpression();
  if (Body == 0) return 0;

  return new VarExprAST(VarNames, Body);
}
</pre>
</div>
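
<p>The same loop shape can be tried in isolation. The following is a minimal
sketch, not part of the tutorial's code: the <tt>Tok</tt> struct,
<tt>MakeTok</tt>, <tt>KindAt</tt>, and <tt>ParseVarList</tt> are hypothetical
names, tokens come from a pre-built vector instead of <tt>gettok</tt>, and
initializers are restricted to a single number so that no expression parser is
needed.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token model, for illustration only. The tutorial's lexer
// returns ints and keeps the identifier text in a global instead.
struct Tok {
  enum Kind { Ident, Number, Equal, Comma, In, End };
  Kind kind;
  std::string text;
};

Tok MakeTok(Tok::Kind K, const std::string &Text) {
  Tok T;
  T.kind = K;
  T.text = Text;
  return T;
}

// Returns the kind of the token at Pos, or Tok::End past the end of input.
Tok::Kind KindAt(const std::vector<Tok> &Toks, std::size_t Pos) {
  return Pos < Toks.size() ? Toks[Pos].kind : Tok::End;
}

// One (name, initializer text) pair per variable; an empty string means the
// variable had no initializer. A real parser would store an ExprAST* here.
typedef std::vector<std::pair<std::string, std::string> > VarList;

// Parses: identifier ('=' number)? (',' identifier ('=' number)?)* 'in'
// Returns false on a syntax error. On success, Pos is just past 'in'.
bool ParseVarList(const std::vector<Tok> &Toks, std::size_t &Pos,
                  VarList &Out) {
  assert(Pos <= Toks.size());

  // At least one variable name is required.
  if (KindAt(Toks, Pos) != Tok::Ident) return false;

  while (true) {
    std::string Name = Toks[Pos++].text;  // eat identifier.

    // Read the optional initializer.
    std::string Init;
    if (KindAt(Toks, Pos) == Tok::Equal) {
      ++Pos;  // eat the '='.
      if (KindAt(Toks, Pos) != Tok::Number) return false;
      Init = Toks[Pos++].text;  // eat the initializer.
    }

    Out.push_back(std::make_pair(Name, Init));

    // End of var list, exit loop.
    if (KindAt(Toks, Pos) != Tok::Comma) break;
    ++Pos;  // eat the ','.

    if (KindAt(Toks, Pos) != Tok::Ident) return false;
  }

  // At this point, we have to have 'in'.
  if (KindAt(Toks, Pos) != Tok::In) return false;
  ++Pos;  // eat 'in'.
  return true;
}
```

<p>Given the tokens for <tt>a = 1, b in</tt>, this sketch records the pairs
(a, "1") and (b, "") and leaves the position just past <tt>in</tt>, which is
the point where the real parser goes on to parse the body expression.</p>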

<p>Now that we can parse and represent the code, we need to support emission of
LLVM IR for it. This code starts out with:</p>

<div class="doc_code">
<pre>
Value *VarExprAST::Codegen() {
  std::vector&lt;AllocaInst *&gt; OldBindings;

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Register all variables and emit their initializer.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    const std::string &amp;VarName = VarNames[i].first;
    ExprAST *Init = VarNames[i].second;
</pre>
</div>

<p>Basically it loops over all the variables, installing them one at a time.
For each variable we put into the symbol table, we save the binding that it
replaces in <tt>OldBindings</tt>.</p>

<div class="doc_code">
<pre>
    // Emit the initializer before adding the variable to scope, this prevents
    // the initializer from referencing the variable itself, and permits stuff
    // like this:
    //  var a = 1 in
    //    var a = a in ...   # refers to outer 'a'.
    Value *InitVal;
    if (Init) {
      InitVal = Init-&gt;Codegen();
      if (InitVal == 0) return 0;
    } else { // If not specified, use 0.0.
      InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
    }

    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    Builder.CreateStore(InitVal, Alloca);

    // Remember the old variable binding so that we can restore the binding when
    // we unrecurse.
    OldBindings.push_back(NamedValues[VarName]);

    // Remember this binding.
    NamedValues[VarName] = Alloca;
  }
</pre>
</div>
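
<p>The ordering matters more than the amount of code suggests. As a rough
sketch of the idea outside of LLVM, assume a scope that maps names directly to
values and an initializer that is either a literal or a reference to another
name. <tt>Env</tt>, <tt>Init</tt>, <tt>MakeLiteral</tt>, <tt>MakeRef</tt>,
<tt>Eval</tt>, and <tt>Declare</tt> are all hypothetical names invented for
this illustration.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the symbol table: names map straight to values here, whereas
// the tutorial maps them to stack slots (AllocaInst*).
typedef std::map<std::string, double> Env;

// Hypothetical initializer: either a literal or a reference to another name.
struct Init {
  bool IsRef;
  std::string RefName;
  double Literal;
};

Init MakeLiteral(double V) {
  Init I;
  I.IsRef = false;
  I.Literal = V;
  return I;
}

Init MakeRef(const std::string &Name) {
  Init I;
  I.IsRef = true;
  I.RefName = Name;
  I.Literal = 0.0;
  return I;
}

// Evaluates an initializer against the bindings currently in scope.
// Returns false if it refers to a name that is not bound.
bool Eval(const Init &I, const Env &Scope, double &Out) {
  if (!I.IsRef) {
    Out = I.Literal;
    return true;
  }
  Env::const_iterator It = Scope.find(I.RefName);
  if (It == Scope.end()) return false;  // unknown variable name
  Out = It->second;
  return true;
}

// Declares Name in Scope. The initializer is evaluated first, while Name
// still refers to any outer binding; only then is the new binding installed.
bool Declare(Env &Scope, const std::string &Name, const Init &I) {
  double Value;
  if (!Eval(I, Scope, Value)) return false;
  Scope[Name] = Value;
  return true;
}
```

<p>With this ordering, declaring <tt>a</tt> with an initializer that refers to
<tt>a</tt> picks up the outer value when one exists, and is rejected as an
unknown name when none does. Installing the binding before evaluating the
initializer would instead let the initializer see the new, not yet initialized
variable.</p>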

<p>There are more comments here than code. The basic idea is that we emit the
initializer, create the alloca, then update the symbol table to point to it.
Once all the variables are installed in the symbol table, we evaluate the body
of the var/in expression:</p>

<div class="doc_code">
<pre>
  // Codegen the body, now that all vars are in scope.
  Value *BodyVal = Body-&gt;Codegen();
  if (BodyVal == 0) return 0;
</pre>
</div>

<p>Finally, before returning, we restore the previous variable bindings:</p>

<div class="doc_code">
<pre>
  // Pop all our variables from scope.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    NamedValues[VarNames[i].first] = OldBindings[i];

  // Return the body computation.
  return BodyVal;
}
</pre>
</div>
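
<p>The save/restore pattern can be exercised without LLVM. The sketch below is
a variation under stated assumptions rather than the tutorial's code:
<tt>SymbolTable</tt>, <tt>PushScope</tt>, and <tt>PopScope</tt> are
hypothetical names, and an <tt>int*</tt> stands in for <tt>AllocaInst*</tt>.
It differs from the loop above in two ways: it restores in reverse order, and
it erases entries that had no previous binding instead of storing a null
pointer. The forward loop above is sufficient when each name appears only once
in the list. Reverse order also covers a list that declares the same name
twice.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for NamedValues. The int* plays the role of an AllocaInst*; a
// null pointer means "no binding".
typedef std::map<std::string, int *> SymbolTable;
typedef std::vector<std::pair<std::string, int *> > BindingList;

// Installs NewBindings in order and returns the values they replaced,
// one saved entry per new binding.
std::vector<int *> PushScope(SymbolTable &Table,
                             const BindingList &NewBindings) {
  std::vector<int *> Old;
  for (std::size_t i = 0, e = NewBindings.size(); i != e; ++i) {
    int *&Slot = Table[NewBindings[i].first];
    Old.push_back(Slot);              // remember what we are shadowing
    Slot = NewBindings[i].second;     // install the new binding
  }
  return Old;
}

// Undoes PushScope. Walking backwards means that if a name was declared
// twice in the same list, the binding left in place at the end is the one
// that existed before the first declaration.
void PopScope(SymbolTable &Table, const BindingList &NewBindings,
              const std::vector<int *> &Old) {
  assert(Old.size() == NewBindings.size());
  for (std::size_t i = NewBindings.size(); i != 0; --i) {
    const std::string &Name = NewBindings[i - 1].first;
    if (Old[i - 1])
      Table[Name] = Old[i - 1];
    else
      Table.erase(Name);  // nothing was shadowed: drop the entry entirely
  }
}
```

<p>Calling <tt>PushScope</tt> before generating code for the body and
<tt>PopScope</tt> afterwards brackets the body in the same way the two loops in
<tt>VarExprAST::Codegen</tt> do.</p>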

<p>The end result of all of this is that we get properly scoped variable
definitions, and we even (trivially) allow mutation of them :).</p>

<p>With this, we have completed what we set out to do. Our nice iterative fib
example from the intro compiles and runs just fine. The mem2reg pass optimizes
all of our stack variables into SSA registers, inserting PHI nodes where needed,
and our front-end remains simple: no iterated dominator frontier computation
anywhere in sight.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="code">Full Code Listing</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
Here is the complete code listing for our running example, enhanced with mutable
variables and var/in support. To build this example, use:
</p>

<div class="doc_code">
<pre>
   # Compile
   g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
   # Run
   ./toy
</pre>
</div>

<p>Here is the code:</p>

<div class="doc_code">
<pre>
#include "llvm/DerivedTypes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/Module.h"
#include "llvm/ModuleProvider.h"
#include "llvm/PassManager.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Support/LLVMBuilder.h"
#include &lt;cassert&gt;
#include &lt;cctype&gt;
#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;string&gt;
#include &lt;map&gt;
#include &lt;vector&gt;
using namespace llvm;

//===----------------------------------------------------------------------===//
// Lexer
//===----------------------------------------------------------------------===//

// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
  tok_eof = -1,

  // commands
  tok_def = -2, tok_extern = -3,

  // primary
  tok_identifier = -4, tok_number = -5,

  // control
  tok_if = -6, tok_then = -7, tok_else = -8,
  tok_for = -9, tok_in = -10,

  // operators
  tok_binary = -11, tok_unary = -12,

  // var definition
  tok_var = -13
};

static std::string IdentifierStr;  // Filled in if tok_identifier
static double NumVal;              // Filled in if tok_number

/// gettok - Return the next token from standard input.
static int gettok() {
  static int LastChar = ' ';

  // Skip any whitespace.
  while (isspace(LastChar))
    LastChar = getchar();

  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
    IdentifierStr = LastChar;
    while (isalnum((LastChar = getchar())))
      IdentifierStr += LastChar;

    if (IdentifierStr == "def") return tok_def;
    if (IdentifierStr == "extern") return tok_extern;
    if (IdentifierStr == "if") return tok_if;
    if (IdentifierStr == "then") return tok_then;
    if (IdentifierStr == "else") return tok_else;
    if (IdentifierStr == "for") return tok_for;
    if (IdentifierStr == "in") return tok_in;
    if (IdentifierStr == "binary") return tok_binary;
    if (IdentifierStr == "unary") return tok_unary;
    if (IdentifierStr == "var") return tok_var;
    return tok_identifier;
  }

  if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
    std::string NumStr;
    do {
      NumStr += LastChar;
      LastChar = getchar();
    } while (isdigit(LastChar) || LastChar == '.');

    NumVal = strtod(NumStr.c_str(), 0);
    return tok_number;
  }

  if (LastChar == '#') {
    // Comment until end of line.
    do LastChar = getchar();
    while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');

    if (LastChar != EOF)
      return gettok();
  }

  // Check for end of file.  Don't eat the EOF.
  if (LastChar == EOF)
    return tok_eof;

  // Otherwise, just return the character as its ascii value.
  int ThisChar = LastChar;
  LastChar = getchar();
  return ThisChar;
}

//===----------------------------------------------------------------------===//
// Abstract Syntax Tree (aka Parse Tree)
//===----------------------------------------------------------------------===//

/// ExprAST - Base class for all expression nodes.
class ExprAST {
public:
  virtual ~ExprAST() {}
  virtual Value *Codegen() = 0;
};

/// NumberExprAST - Expression class for numeric literals like "1.0".
class NumberExprAST : public ExprAST {
  double Val;
public:
  NumberExprAST(double val) : Val(val) {}
  virtual Value *Codegen();
};

/// VariableExprAST - Expression class for referencing a variable, like "a".
class VariableExprAST : public ExprAST {
  std::string Name;
public:
  VariableExprAST(const std::string &amp;name) : Name(name) {}
  const std::string &amp;getName() const { return Name; }
  virtual Value *Codegen();
};

/// UnaryExprAST - Expression class for a unary operator.
class UnaryExprAST : public ExprAST {
  char Opcode;
  ExprAST *Operand;
public:
  UnaryExprAST(char opcode, ExprAST *operand)
    : Opcode(opcode), Operand(operand) {}
  virtual Value *Codegen();
};

/// BinaryExprAST - Expression class for a binary operator.
class BinaryExprAST : public ExprAST {
  char Op;
  ExprAST *LHS, *RHS;
public:
  BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
    : Op(op), LHS(lhs), RHS(rhs) {}
  virtual Value *Codegen();
};

/// CallExprAST - Expression class for function calls.
class CallExprAST : public ExprAST {
  std::string Callee;
  std::vector&lt;ExprAST*&gt; Args;
public:
  CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
    : Callee(callee), Args(args) {}
  virtual Value *Codegen();
};

/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
  ExprAST *Cond, *Then, *Else;
public:
  IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
    : Cond(cond), Then(then), Else(_else) {}
  virtual Value *Codegen();
};

/// ForExprAST - Expression class for for/in.
class ForExprAST : public ExprAST {
  std::string VarName;
  ExprAST *Start, *End, *Step, *Body;
public:
  ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
             ExprAST *step, ExprAST *body)
    : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
  virtual Value *Codegen();
};

/// VarExprAST - Expression class for var/in
class VarExprAST : public ExprAST {
  std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
  ExprAST *Body;
public:
  VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
             ExprAST *body)
  : VarNames(varnames), Body(body) {}

  virtual Value *Codegen();
};

/// PrototypeAST - This class represents the "prototype" for a function,
/// which captures its argument names as well as if it is an operator.
class PrototypeAST {
  std::string Name;
  std::vector&lt;std::string&gt; Args;
  bool isOperator;
  unsigned Precedence;  // Precedence if a binary op.
public:
  PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
               bool isoperator = false, unsigned prec = 0)
  : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}

  bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
  bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }

  char getOperatorName() const {
    assert(isUnaryOp() || isBinaryOp());
    return Name[Name.size()-1];
  }

  unsigned getBinaryPrecedence() const { return Precedence; }

  Function *Codegen();

  void CreateArgumentAllocas(Function *F);
};

/// FunctionAST - This class represents a function definition itself.
class FunctionAST {
  PrototypeAST *Proto;
  ExprAST *Body;
public:
  FunctionAST(PrototypeAST *proto, ExprAST *body)
    : Proto(proto), Body(body) {}

  Function *Codegen();
};

//===----------------------------------------------------------------------===//
// Parser
//===----------------------------------------------------------------------===//

/// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
/// token the parser is looking at.  getNextToken reads another token from the
/// lexer and updates CurTok with its results.
static int CurTok;
static int getNextToken() {
  return CurTok = gettok();
}

/// BinopPrecedence - This holds the precedence for each binary operator that is
/// defined.
static std::map&lt;char, int&gt; BinopPrecedence;

/// GetTokPrecedence - Get the precedence of the pending binary operator token.
static int GetTokPrecedence() {
  if (!isascii(CurTok))
    return -1;

  // Make sure it's a declared binop.
  int TokPrec = BinopPrecedence[CurTok];
  if (TokPrec &lt;= 0) return -1;
  return TokPrec;
}

/// Error* - These are little helper functions for error handling.
ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }

static ExprAST *ParseExpression();

/// identifierexpr
///   ::= identifier
///   ::= identifier '(' expression* ')'
static ExprAST *ParseIdentifierExpr() {
  std::string IdName = IdentifierStr;

  getNextToken();  // eat identifier.

  if (CurTok != '(') // Simple variable ref.
    return new VariableExprAST(IdName);

  // Call.
  getNextToken();  // eat (
  std::vector&lt;ExprAST*&gt; Args;
  if (CurTok != ')') {
    while (1) {
      ExprAST *Arg = ParseExpression();
      if (!Arg) return 0;
      Args.push_back(Arg);

      if (CurTok == ')') break;

      if (CurTok != ',')
        return Error("Expected ')'");
      getNextToken();
    }
  }

  // Eat the ')'.
  getNextToken();

  return new CallExprAST(IdName, Args);
}

/// numberexpr ::= number
static ExprAST *ParseNumberExpr() {
  ExprAST *Result = new NumberExprAST(NumVal);
  getNextToken(); // consume the number
  return Result;
}

/// parenexpr ::= '(' expression ')'
static ExprAST *ParseParenExpr() {
  getNextToken();  // eat (.
  ExprAST *V = ParseExpression();
  if (!V) return 0;

  if (CurTok != ')')
    return Error("expected ')'");
  getNextToken();  // eat ).
  return V;
}

/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static ExprAST *ParseIfExpr() {
  getNextToken();  // eat the if.

  // condition.
  ExprAST *Cond = ParseExpression();
  if (!Cond) return 0;

  if (CurTok != tok_then)
    return Error("expected then");
  getNextToken();  // eat the then

  ExprAST *Then = ParseExpression();
  if (Then == 0) return 0;

  if (CurTok != tok_else)
    return Error("expected else");

  getNextToken();

  ExprAST *Else = ParseExpression();
  if (!Else) return 0;

  return new IfExprAST(Cond, Then, Else);
}

/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
static ExprAST *ParseForExpr() {
  getNextToken();  // eat the for.

  if (CurTok != tok_identifier)
    return Error("expected identifier after for");

  std::string IdName = IdentifierStr;
  getNextToken();  // eat identifier.

  if (CurTok != '=')
    return Error("expected '=' after for");
  getNextToken();  // eat '='.


  ExprAST *Start = ParseExpression();
  if (Start == 0) return 0;
  if (CurTok != ',')
    return Error("expected ',' after for start value");
  getNextToken();

  ExprAST *End = ParseExpression();
  if (End == 0) return 0;

  // The step value is optional.
  ExprAST *Step = 0;
  if (CurTok == ',') {
    getNextToken();
    Step = ParseExpression();
    if (Step == 0) return 0;
  }

  if (CurTok != tok_in)
    return Error("expected 'in' after for");
  getNextToken();  // eat 'in'.

  ExprAST *Body = ParseExpression();
  if (Body == 0) return 0;

  return new ForExprAST(IdName, Start, End, Step, Body);
}

/// varexpr ::= 'var' identifier ('=' expression)?
//                    (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
  getNextToken();  // eat the var.

  std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;

  // At least one variable name is required.
  if (CurTok != tok_identifier)
    return Error("expected identifier after var");

  while (1) {
    std::string Name = IdentifierStr;
    getNextToken();  // eat identifier.

    // Read the optional initializer.
    ExprAST *Init = 0;
    if (CurTok == '=') {
      getNextToken(); // eat the '='.

      Init = ParseExpression();
      if (Init == 0) return 0;
    }

    VarNames.push_back(std::make_pair(Name, Init));

    // End of var list, exit loop.
    if (CurTok != ',') break;
    getNextToken(); // eat the ','.

    if (CurTok != tok_identifier)
      return Error("expected identifier list after var");
  }

  // At this point, we have to have 'in'.
  if (CurTok != tok_in)
    return Error("expected 'in' keyword after 'var'");
  getNextToken();  // eat 'in'.

  ExprAST *Body = ParseExpression();
  if (Body == 0) return 0;

  return new VarExprAST(VarNames, Body);
}


/// primary
///   ::= identifierexpr
///   ::= numberexpr
///   ::= parenexpr
///   ::= ifexpr
///   ::= forexpr
///   ::= varexpr
static ExprAST *ParsePrimary() {
  switch (CurTok) {
  default: return Error("unknown token when expecting an expression");
  case tok_identifier: return ParseIdentifierExpr();
  case tok_number:     return ParseNumberExpr();
  case '(':            return ParseParenExpr();
  case tok_if:         return ParseIfExpr();
  case tok_for:        return ParseForExpr();
  case tok_var:        return ParseVarExpr();
  }
}

/// unary
///   ::= primary
///   ::= '!' unary
static ExprAST *ParseUnary() {
  // If the current token is not an operator, it must be a primary expr.
  if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
    return ParsePrimary();

  // If this is a unary operator, read it.
  int Opc = CurTok;
  getNextToken();
  if (ExprAST *Operand = ParseUnary())
    return new UnaryExprAST(Opc, Operand);
  return 0;
}

/// binoprhs
///   ::= ('+' unary)*
static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
  // If this is a binop, find its precedence.
  while (1) {
    int TokPrec = GetTokPrecedence();

    // If this is a binop that binds at least as tightly as the current binop,
    // consume it, otherwise we are done.
    if (TokPrec &lt; ExprPrec)
      return LHS;

    // Okay, we know this is a binop.
    int BinOp = CurTok;
    getNextToken();  // eat binop

    // Parse the unary expression after the binary operator.
    ExprAST *RHS = ParseUnary();
    if (!RHS) return 0;

    // If BinOp binds less tightly with RHS than the operator after RHS, let
    // the pending operator take RHS as its LHS.
    int NextPrec = GetTokPrecedence();
    if (TokPrec &lt; NextPrec) {
      RHS = ParseBinOpRHS(TokPrec+1, RHS);
      if (RHS == 0) return 0;
    }

    // Merge LHS/RHS.
    LHS = new BinaryExprAST(BinOp, LHS, RHS);
  }
}

/// expression
///   ::= unary binoprhs
///
static ExprAST *ParseExpression() {
  ExprAST *LHS = ParseUnary();
  if (!LHS) return 0;

  return ParseBinOpRHS(0, LHS);
}

/// prototype
///   ::= id '(' id* ')'
///   ::= binary LETTER number? (id, id)
///   ::= unary LETTER (id)
static PrototypeAST *ParsePrototype() {
  std::string FnName;

  int Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
  unsigned BinaryPrecedence = 30;

  switch (CurTok) {
  default:
    return ErrorP("Expected function name in prototype");
  case tok_identifier:
    FnName = IdentifierStr;
    Kind = 0;
    getNextToken();
    break;
  case tok_unary:
    getNextToken();
    if (!isascii(CurTok))
      return ErrorP("Expected unary operator");
    FnName = "unary";
    FnName += (char)CurTok;
    Kind = 1;
    getNextToken();
    break;
  case tok_binary:
    getNextToken();
    if (!isascii(CurTok))
      return ErrorP("Expected binary operator");
    FnName = "binary";
    FnName += (char)CurTok;
    Kind = 2;
    getNextToken();

    // Read the precedence if present.
    if (CurTok == tok_number) {
      if (NumVal &lt; 1 || NumVal &gt; 100)
        return ErrorP("Invalid precedence: must be 1..100");
      BinaryPrecedence = (unsigned)NumVal;
      getNextToken();
    }
    break;
  }

  if (CurTok != '(')
    return ErrorP("Expected '(' in prototype");

  std::vector&lt;std::string&gt; ArgNames;
  while (getNextToken() == tok_identifier)
    ArgNames.push_back(IdentifierStr);
  if (CurTok != ')')
    return ErrorP("Expected ')' in prototype");

  // success.
  getNextToken();  // eat ')'.

  // Verify right number of names for operator.
  if (Kind &amp;&amp; ArgNames.size() != Kind)
    return ErrorP("Invalid number of operands for operator");

  return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
}

/// definition ::= 'def' prototype expression
static FunctionAST *ParseDefinition() {
  getNextToken();  // eat def.
  PrototypeAST *Proto = ParsePrototype();
  if (Proto == 0) return 0;

  if (ExprAST *E = ParseExpression())
    return new FunctionAST(Proto, E);
  return 0;
}

/// toplevelexpr ::= expression
static FunctionAST *ParseTopLevelExpr() {
  if (ExprAST *E = ParseExpression()) {
    // Make an anonymous proto.
    PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
    return new FunctionAST(Proto, E);
  }
  return 0;
}

/// external ::= 'extern' prototype
static PrototypeAST *ParseExtern() {
  getNextToken();  // eat extern.
  return ParsePrototype();
}

//===----------------------------------------------------------------------===//
// Code Generation
//===----------------------------------------------------------------------===//

static Module *TheModule;
static LLVMFoldingBuilder Builder;
static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
static FunctionPassManager *TheFPM;

Value *ErrorV(const char *Str) { Error(Str); return 0; }

/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function.  This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const std::string &amp;VarName) {
  LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
                   TheFunction-&gt;getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
}


Value *NumberExprAST::Codegen() {
  return ConstantFP::get(Type::DoubleTy, APFloat(Val));
}

Value *VariableExprAST::Codegen() {
  // Look this variable up in the function.
  Value *V = NamedValues[Name];
  if (V == 0) return ErrorV("Unknown variable name");

  // Load the value.
  return Builder.CreateLoad(V, Name.c_str());
}

Value *UnaryExprAST::Codegen() {
  Value *OperandV = Operand-&gt;Codegen();
  if (OperandV == 0) return 0;

  Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
  if (F == 0)
    return ErrorV("Unknown unary operator");

  return Builder.CreateCall(F, OperandV, "unop");
}


Value *BinaryExprAST::Codegen() {
  // Special case '=' because we don't want to emit the LHS as an expression.
  if (Op == '=') {
    // Assignment requires the LHS to be an identifier.
    VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
    if (!LHSE)
      return ErrorV("destination of '=' must be a variable");
    // Codegen the RHS.
    Value *Val = RHS-&gt;Codegen();
    if (Val == 0) return 0;

    // Look up the name.
    Value *Variable = NamedValues[LHSE-&gt;getName()];
    if (Variable == 0) return ErrorV("Unknown variable name");

    Builder.CreateStore(Val, Variable);
    return Val;
  }


  Value *L = LHS-&gt;Codegen();
  Value *R = RHS-&gt;Codegen();
  if (L == 0 || R == 0) return 0;

  switch (Op) {
  case '+': return Builder.CreateAdd(L, R, "addtmp");
  case '-': return Builder.CreateSub(L, R, "subtmp");
  case '*': return Builder.CreateMul(L, R, "multmp");
  case '&lt;':
    L = Builder.CreateFCmpULT(L, R, "cmptmp");
    // Convert bool 0/1 to double 0.0 or 1.0
    return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
  default: break;
  }

  // If it wasn't a builtin binary operator, it must be a user defined one. Emit
  // a call to it.
  Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
  assert(F &amp;&amp; "binary operator not found!");

  Value *Ops[] = { L, R };
  return Builder.CreateCall(F, Ops, Ops+2, "binop");
}

Value *CallExprAST::Codegen() {
  // Look up the name in the global module table.
  Function *CalleeF = TheModule-&gt;getFunction(Callee);
  if (CalleeF == 0)
    return ErrorV("Unknown function referenced");

  // If argument mismatch error.
  if (CalleeF-&gt;arg_size() != Args.size())
    return ErrorV("Incorrect # arguments passed");

  std::vector&lt;Value*&gt; ArgsV;
  for (unsigned i = 0, e = Args.size(); i != e; ++i) {
    ArgsV.push_back(Args[i]-&gt;Codegen());
    if (ArgsV.back() == 0) return 0;
  }

  return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
}

Value *IfExprAST::Codegen() {
  Value *CondV = Cond-&gt;Codegen();
  if (CondV == 0) return 0;

  // Convert condition to a bool by comparing it for inequality with 0.0.
  CondV = Builder.CreateFCmpONE(CondV,
                                ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
                                "ifcond");

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Create blocks for the then and else cases.  Insert the 'then' block at the
  // end of the function.
  BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
  BasicBlock *ElseBB = new BasicBlock("else");
  BasicBlock *MergeBB = new BasicBlock("ifcont");

  Builder.CreateCondBr(CondV, ThenBB, ElseBB);

  // Emit then value.
  Builder.SetInsertPoint(ThenBB);

  Value *ThenV = Then-&gt;Codegen();
  if (ThenV == 0) return 0;

  Builder.CreateBr(MergeBB);
  // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
  ThenBB = Builder.GetInsertBlock();

  // Emit else block.
  TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
  Builder.SetInsertPoint(ElseBB);

  Value *ElseV = Else-&gt;Codegen();
  if (ElseV == 0) return 0;

  Builder.CreateBr(MergeBB);
  // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
  ElseBB = Builder.GetInsertBlock();

  // Emit merge block.
  TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
  Builder.SetInsertPoint(MergeBB);
  PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");

  PN-&gt;addIncoming(ThenV, ThenBB);
  PN-&gt;addIncoming(ElseV, ElseBB);
  return PN;
}

Value *ForExprAST::Codegen() {
  // Output this as:
  //   var = alloca double
  //   ...
  //   start = startexpr
  //   store start -&gt; var
  //   goto loop
  // loop:
  //   ...
  //   bodyexpr
  //   ...
  // loopend:
  //   step = stepexpr
  //   endcond = endexpr
  //
  //   curvar = load var
  //   nextvar = curvar + step
  //   store nextvar -&gt; var
  //   br endcond, loop, outloop
  // outloop:

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Create an alloca for the variable in the entry block.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);

  // Emit the start code first, without 'variable' in scope.
  Value *StartVal = Start-&gt;Codegen();
  if (StartVal == 0) return 0;

  // Store the value into the alloca.
  Builder.CreateStore(StartVal, Alloca);

  // Make the new basic block for the loop header, inserting after current
  // block.
  BasicBlock *PreheaderBB = Builder.GetInsertBlock();
  BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);

  // Insert an explicit fall through from the current block to the LoopBB.
  Builder.CreateBr(LoopBB);

  // Start insertion in LoopBB.
  Builder.SetInsertPoint(LoopBB);

  // Within the loop, the variable is bound to the alloca.  If it shadows an
  // existing variable, we have to restore it, so save it now.
  AllocaInst *OldVal = NamedValues[VarName];
  NamedValues[VarName] = Alloca;

  // Emit the body of the loop.  This, like any other expr, can change the
  // current BB.  Note that we ignore the value computed by the body, but don't
  // allow an error.
  if (Body-&gt;Codegen() == 0)
    return 0;

  // Emit the step value.
  Value *StepVal;
  if (Step) {
    StepVal = Step-&gt;Codegen();
    if (StepVal == 0) return 0;
  } else {
    // If not specified, use 1.0.
    StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
  }

  // Compute the end condition.
  Value *EndCond = End-&gt;Codegen();
  if (EndCond == 0) return EndCond;

  // Reload, increment, and restore the alloca.  This handles the case where
  // the body of the loop mutates the variable.
  Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
  Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
  Builder.CreateStore(NextVar, Alloca);

  // Convert condition to a bool by comparing it for inequality with 0.0.
  EndCond = Builder.CreateFCmpONE(EndCond,
                                  ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
                                  "loopcond");

  // Create the "after loop" block and insert it.
  BasicBlock *LoopEndBB = Builder.GetInsertBlock();
  BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);

  // Insert the conditional branch into the end of LoopEndBB.
  Builder.CreateCondBr(EndCond, LoopBB, AfterBB);

  // Any new code will be inserted in AfterBB.
  Builder.SetInsertPoint(AfterBB);

  // Restore the unshadowed variable.
  if (OldVal)
    NamedValues[VarName] = OldVal;
  else
    NamedValues.erase(VarName);


  // for expr always returns 0.0.
  return Constant::getNullValue(Type::DoubleTy);
}

Value *VarExprAST::Codegen() {
  std::vector&lt;AllocaInst *&gt; OldBindings;

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Register all variables and emit their initializer.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    const std::string &amp;VarName = VarNames[i].first;
    ExprAST *Init = VarNames[i].second;

    // Emit the initializer before adding the variable to scope, this prevents
    // the initializer from referencing the variable itself, and permits stuff
    // like this:
    //  var a = 1 in
    //    var a = a in ...   # refers to outer 'a'.
    Value *InitVal;
    if (Init) {
      InitVal = Init-&gt;Codegen();
      if (InitVal == 0) return 0;
    } else { // If not specified, use 0.0.
      InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
    }

    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    Builder.CreateStore(InitVal, Alloca);

    // Remember the old variable binding so that we can restore the binding when
    // we unrecurse.
    OldBindings.push_back(NamedValues[VarName]);

    // Remember this binding.
    NamedValues[VarName] = Alloca;
  }

  // Codegen the body, now that all vars are in scope.
  Value *BodyVal = Body-&gt;Codegen();
  if (BodyVal == 0) return 0;

  // Pop all our variables from scope.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    NamedValues[VarNames[i].first] = OldBindings[i];

  // Return the body computation.
  return BodyVal;
}


Function *PrototypeAST::Codegen() {
  // Make the function type:  double(double,double) etc.
  std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
  FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);

  Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);

  // If F conflicted, there was already something named 'Name'.  If it has a
  // body, don't allow redefinition or reextern.
  if (F-&gt;getName() != Name) {
    // Delete the one we just made and get the existing one.
    F-&gt;eraseFromParent();
    F = TheModule-&gt;getFunction(Name);

    // If F already has a body, reject this.
    if (!F-&gt;empty()) {
      ErrorF("redefinition of function");
      return 0;
    }

    // If F took a different number of args, reject.
    if (F-&gt;arg_size() != Args.size()) {
      ErrorF("redefinition of function with different # args");
      return 0;
    }
  }

  // Set names for all arguments.
  unsigned Idx = 0;
  for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
       ++AI, ++Idx)
    AI-&gt;setName(Args[Idx]);

  return F;
}

/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
  Function::arg_iterator AI = F-&gt;arg_begin();
  for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

    // Store the initial value into the alloca.
    Builder.CreateStore(AI, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = Alloca;
  }
}

Function *FunctionAST::Codegen() {
  NamedValues.clear();

  Function *TheFunction = Proto-&gt;Codegen();
  if (TheFunction == 0)
    return 0;

  // If this is an operator, install it.
  if (Proto-&gt;isBinaryOp())
    BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();

  // Create a new basic block to start insertion into.
  BasicBlock *BB = new BasicBlock("entry", TheFunction);
  Builder.SetInsertPoint(BB);

  // Add all arguments to the symbol table and create their allocas.
  Proto-&gt;CreateArgumentAllocas(TheFunction);

  if (Value *RetVal = Body-&gt;Codegen()) {
    // Finish off the function.
    Builder.CreateRet(RetVal);

    // Validate the generated code, checking for consistency.
    verifyFunction(*TheFunction);

    // Optimize the function.
    TheFPM-&gt;run(*TheFunction);

    return TheFunction;
  }

  // Error reading body, remove function.
  TheFunction-&gt;eraseFromParent();

  if (Proto-&gt;isBinaryOp())
    BinopPrecedence.erase(Proto-&gt;getOperatorName());
  return 0;
}

//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//

static ExecutionEngine *TheExecutionEngine;

static void HandleDefinition() {
  if (FunctionAST *F = ParseDefinition()) {
    if (Function *LF = F-&gt;Codegen()) {
      fprintf(stderr, "Read function definition:");
      LF-&gt;dump();
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

static void HandleExtern() {
  if (PrototypeAST *P = ParseExtern()) {
    if (Function *F = P-&gt;Codegen()) {
      fprintf(stderr, "Read extern: ");
      F-&gt;dump();
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

static void HandleTopLevelExpression() {
  // Evaluate a top level expression into an anonymous function.
  if (FunctionAST *F = ParseTopLevelExpr()) {
    if (Function *LF = F-&gt;Codegen()) {
      // JIT the function, returning a function pointer.
      void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);

      // Cast it to the right type (takes no arguments, returns a double) so we
      // can call it as a native function.
      double (*FP)() = (double (*)())FPtr;
      fprintf(stderr, "Evaluated to %f\n", FP());
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

/// top ::= definition | external | expression | ';'
static void MainLoop() {
  while (1) {
    fprintf(stderr, "ready&gt; ");
    switch (CurTok) {
    case tok_eof:    return;
    case ';':        getNextToken(); break;  // ignore top level semicolons.
    case tok_def:    HandleDefinition(); break;
    case tok_extern: HandleExtern(); break;
    default:         HandleTopLevelExpression(); break;
    }
  }
}

//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//

/// putchard - putchar that takes a double and returns 0.
extern "C"
double putchard(double X) {
  putchar((char)X);
  return 0;
}

/// printd - printf that takes a double and prints it as "%f\n", returning 0.
extern "C"
double printd(double X) {
  printf("%f\n", X);
  return 0;
}

//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//

int main() {
  // Install standard binary operators.
  // 1 is lowest precedence.
  BinopPrecedence['='] = 2;
  BinopPrecedence['&lt;'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
  BinopPrecedence['*'] = 40;  // highest.

  // Prime the first token.
  fprintf(stderr, "ready&gt; ");
  getNextToken();

  // Make the module, which holds all the code.
  TheModule = new Module("my cool jit");

  // Create the JIT.
  TheExecutionEngine = ExecutionEngine::create(TheModule);

  {
    ExistingModuleProvider OurModuleProvider(TheModule);
    FunctionPassManager OurFPM(&amp;OurModuleProvider);

    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
    // Promote allocas to registers.
    OurFPM.add(createPromoteMemoryToRegisterPass());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
    // Eliminate Common SubExpressions.
    OurFPM.add(createGVNPass());
    // Simplify the control flow graph (deleting unreachable blocks, etc).
    OurFPM.add(createCFGSimplificationPass());

    // Set the global so the code gen can use this.
    TheFPM = &amp;OurFPM;

    // Run the main "interpreter loop" now.
    MainLoop();

    TheFPM = 0;
  }  // Free module provider and pass manager.

  // Print out all of the generated code.
  TheModule-&gt;dump();
  return 0;
}
</pre>
</div>

</div>

<!-- *********************************************************************** -->
<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>

  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
</address>
</body>
</html>