<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
7 construction</title>
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
11</head>
12
13<body>
14
15<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
16
17<div class="doc_author">
18 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
19</div>
20
21<!-- *********************************************************************** -->
22<div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
23<!-- *********************************************************************** -->
24
25<div class="doc_text">
26
27<p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
28LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable,
29albeit simple, <a
30href="http://en.wikipedia.org/wiki/Functional_programming">functional
31programming language</a>. In our journey, we learned some parsing techniques,
32how to build and represent an AST, how to build LLVM IR, and how to optimize
33the resultant code and JIT compile it.</p>
34
<p>While Kaleidoscope is interesting as a functional language, being functional
also makes it "too easy" to generate LLVM IR for it. In particular, a functional
language makes it very easy to build LLVM IR directly in <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
Since LLVM requires that the input code be in SSA form, this is a very nice
property. However, it is often unclear to newcomers how to generate code for an
imperative language with mutable variables.</p>
42
43<p>The short (and happy) summary of this chapter is that there is no need for
44your front-end to build SSA form: LLVM provides highly tuned and well tested
45support for this, though the way it works is a bit unexpected for some.</p>
46
47</div>
48
49<!-- *********************************************************************** -->
50<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
51<!-- *********************************************************************** -->
52
53<div class="doc_text">
54
55<p>
56To understand why mutable variables cause complexities in SSA construction,
57consider this extremely simple C example:
58</p>
59
60<div class="doc_code">
61<pre>
62int G, H;
63int test(_Bool Condition) {
64 int X;
65 if (Condition)
66 X = G;
67 else
68 X = H;
69 return X;
70}
71</pre>
72</div>
73
74<p>In this case, we have the variable "X", whose value depends on the path
75executed in the program. Because there are two different possible values for X
76before the return instruction, a PHI node is inserted to merge the two values.
77The LLVM IR that we want for this example looks like this:</p>
78
79<div class="doc_code">
80<pre>
81@G = weak global i32 0 ; type of @G is i32*
82@H = weak global i32 0 ; type of @H is i32*
83
84define i32 @test(i1 %Condition) {
85entry:
86 br i1 %Condition, label %cond_true, label %cond_false
87
88cond_true:
89 %X.0 = load i32* @G
90 br label %cond_next
91
92cond_false:
93 %X.1 = load i32* @H
94 br label %cond_next
95
96cond_next:
97 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
98 ret i32 %X.2
99}
100</pre>
101</div>
102
<p>In this example, the loads from the G and H global variables are explicit in
the LLVM IR, and they live in the then/else branches of the if statement
(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
in the cond_next block selects the right value to use based on where control
flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1. Alternatively, if control flow comes from cond_true, it gets
the value of X.0. The intent of this chapter is not to explain the details of
SSA form. For more information, see one of the many <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
references</a>.</p>
113
114<p>The question for this article is "who places phi nodes when lowering
115assignments to mutable variables?". The issue here is that LLVM
116<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
117However, SSA construction requires non-trivial algorithms and data structures,
118so it is inconvenient and wasteful for every front-end to have to reproduce this
119logic.</p>
120
121</div>
122
123<!-- *********************************************************************** -->
124<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
125<!-- *********************************************************************** -->
126
127<div class="doc_text">
128
<p>The 'trick' here is that while LLVM does require all register values to be
in SSA form, it does not require (or permit) memory objects to be in SSA form.
In the example above, note that the loads from G and H are direct accesses to
G and H: they are not renamed or versioned. This differs from some other
compiler systems, which do try to version memory objects. In LLVM, instead of
encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
demand.</p>
137
138<p>
139With this in mind, the high-level idea is that we want to make a stack variable
140(which lives in memory, because it is on the stack) for each mutable object in
141a function. To take advantage of this trick, we need to talk about how LLVM
142represents stack variables.
143</p>
144
145<p>In LLVM, all memory accesses are explicit with load/store instructions, and
146it is carefully designed to not have (or need) an "address-of" operator. Notice
147how the type of the @G/@H global variables is actually "i32*" even though the
148variable is defined as "i32". What this means is that @G defines <em>space</em>
149for an i32 in the global data area, but its <em>name</em> actually refers to the
150address for that space. Stack variables work the same way, but instead of being
151declared with global variable definitions, they are declared with the
152<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
153
154<div class="doc_code">
155<pre>
156define i32 @test(i1 %Condition) {
157entry:
158 %X = alloca i32 ; type of %X is i32*.
159 ...
160 %tmp = load i32* %X ; load the stack value %X from the stack.
161 %tmp2 = add i32 %tmp, 1 ; increment it
162 store i32 %tmp2, i32* %X ; store it back
163 ...
164</pre>
165</div>
166
167<p>This code shows an example of how you can declare and manipulate a stack
168variable in the LLVM IR. Stack memory allocated with the alloca instruction is
169fully general: you can pass the address of the stack slot to functions, you can
170store it in other variables, etc. In our example above, we could rewrite the
171example to use the alloca technique to avoid using a PHI node:</p>
172
173<div class="doc_code">
174<pre>
175@G = weak global i32 0 ; type of @G is i32*
176@H = weak global i32 0 ; type of @H is i32*
177
178define i32 @test(i1 %Condition) {
179entry:
180 %X = alloca i32 ; type of %X is i32*.
181 br i1 %Condition, label %cond_true, label %cond_false
182
183cond_true:
184 %X.0 = load i32* @G
185 store i32 %X.0, i32* %X ; Update X
186 br label %cond_next
187
188cond_false:
189 %X.1 = load i32* @H
190 store i32 %X.1, i32* %X ; Update X
191 br label %cond_next
192
193cond_next:
194 %X.2 = load i32* %X ; Read X
195 ret i32 %X.2
196}
197</pre>
198</div>
199
200<p>With this, we have discovered a way to handle arbitrary mutable variables
201without the need to create Phi nodes at all:</p>
202
203<ol>
204<li>Each mutable variable becomes a stack allocation.</li>
205<li>Each read of the variable becomes a load from the stack.</li>
206<li>Each update of the variable becomes a store to the stack.</li>
207<li>Taking the address of a variable just uses the stack address directly.</li>
208</ol>
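<p>The four rules above can be sketched as a small text emitter. This is only
an illustration: <tt>SlotEmitter</tt> and its methods are hypothetical names,
not part of LLVM, and the sketch handles only i32 variables, printing the same
IR syntax used in the examples of this chapter.</p>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): emits textual IR for i32 variables
// by following the four lowering rules listed above.
class SlotEmitter {
  std::set<std::string> Slots;     // variables that already have a stack slot
  std::vector<std::string> Lines;  // emitted instructions, in order
  unsigned NextTmp = 0;            // counter used to name fresh temporaries

public:
  // Rule 1: a mutable variable becomes a stack allocation.
  void declare(const std::string &Name) {
    Slots.insert(Name);
    Lines.push_back("%" + Name + " = alloca i32");
  }

  // Rule 2: a read becomes a load. Returns the name of the loaded value.
  std::string read(const std::string &Name) {
    assert(Slots.count(Name) && "read of undeclared variable");
    std::string Tmp = "%tmp" + std::to_string(NextTmp++);
    Lines.push_back(Tmp + " = load i32* %" + Name);
    return Tmp;
  }

  // Rule 3: an update becomes a store of an already computed value.
  void write(const std::string &Name, const std::string &Value) {
    assert(Slots.count(Name) && "write to undeclared variable");
    Lines.push_back("store i32 " + Value + ", i32* %" + Name);
  }

  // Rule 4: the address of a variable is simply its stack slot.
  std::string addressOf(const std::string &Name) const {
    assert(Slots.count(Name) && "address of undeclared variable");
    return "%" + Name;
  }

  const std::vector<std::string> &lines() const { return Lines; }
};
```

<p>Calling <tt>declare("X")</tt>, then <tt>read("X")</tt>, then
<tt>write("X", ...)</tt> with the value that was read produces an alloca, a load
and a store, in that order. No phi node is ever needed by this scheme.</p>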
209
<p>While this solution has solved our immediate problem, it introduced another
one: we have now apparently introduced a lot of stack traffic for very simple
and common operations, a major performance problem. Fortunately for us, the
LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
this case, promoting allocas like this into SSA registers and inserting Phi nodes
as appropriate. If you run this example through the pass, you'll get:</p>
217
218<div class="doc_code">
219<pre>
220$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
221@G = weak global i32 0
222@H = weak global i32 0
223
224define i32 @test(i1 %Condition) {
225entry:
226 br i1 %Condition, label %cond_true, label %cond_false
227
228cond_true:
229 %X.0 = load i32* @G
230 br label %cond_next
231
232cond_false:
233 %X.1 = load i32* @H
234 br label %cond_next
235
236cond_next:
237 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
238 ret i32 %X.01
239}
240</pre>
</div>

<p>The mem2reg pass implements the standard "iterated dominator frontier"
algorithm for constructing SSA form and has a number of optimizations that speed
up very common degenerate cases. mem2reg really is the answer for dealing with
mutable variables, and we highly recommend that you depend on it. Note that
mem2reg only works on variables in certain circumstances:</p>

<ol>
<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
promotes them. It does not apply to global variables or heap allocations.</li>

<li>mem2reg only looks for alloca instructions in the entry block of the
function. Being in the entry block guarantees that the alloca is only executed
once, which makes analysis simpler.</li>

<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
the address of the stack object is passed to a function, or if any funny pointer
arithmetic is involved, the alloca will not be promoted.</li>

<li>mem2reg only works on allocas of <a
href="../LangRef.html#t_classifications">first class</a>
values (such as pointers, scalars and vectors), and only if the array size
of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
promoting structs or arrays to registers. Note that the "scalarrepl" pass is
more powerful and can promote structs, "unions", and arrays in many cases.</li>
267
268</ol>
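<p>As a rough checklist, the conditions above can be written as a predicate over
a simplified description of an alloca. This is a sketch under stated assumptions
and not how mem2reg is implemented: <tt>AllocaInfo</tt>, <tt>UseKind</tt> and
<tt>looksPromotable</tt> are hypothetical names, and the first condition is
implicit because the structure only ever describes an alloca.</p>

```cpp
#include <vector>

// Hypothetical classification of how the address of an alloca is used.
enum class UseKind { Load, StoreTo, PassedToCall, PointerArithmetic };

// Hypothetical, simplified description of one alloca instruction.
struct AllocaInfo {
  bool InEntryBlock;          // is the alloca in the function's entry block?
  bool FirstClassType;        // pointer, scalar or vector (not struct/array)
  unsigned ArraySize;         // array size operand; 1 when omitted in the .ll
  std::vector<UseKind> Uses;  // every use of the alloca's address
};

// Returns true if the alloca satisfies all of the conditions listed above.
bool looksPromotable(const AllocaInfo &A) {
  if (!A.InEntryBlock)
    return false;
  if (!A.FirstClassType || A.ArraySize != 1)
    return false;
  // Only direct loads from and stores to the slot are allowed.
  for (UseKind U : A.Uses)
    if (U != UseKind::Load && U != UseKind::StoreTo)
      return false;
  return true;
}
```

<p>A variable that is only loaded and stored passes the check, while one whose
address is handed to a call does not, which matches the third condition.</p>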
269
<p>
All of these properties are easy to satisfy for most imperative languages, and
we'll illustrate this below with Kaleidoscope. The final question you may be
asking is: should I bother with this nonsense for my front-end? Wouldn't it be
better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass? In short, we strongly recommend that you use this technique
for building SSA form, unless there is an extremely good reason not to. Using
this technique is:</p>
278
279<ul>
<li>Proven and well tested: llvm-gcc and clang both use this technique for local
mutable variables. As such, the most common clients of LLVM are using this to
handle the bulk of their variables. You can be sure that bugs are found fast and
fixed early.</li>
284
285<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
286common cases as well as fully general. For example, it has fast-paths for
287variables that are only used in a single block, variables that only have one
288assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
289</li>
290
291<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
292Debug information in LLVM</a> relies on having the address of the variable
293exposed to attach debug info to it. This technique dovetails very naturally
294with this style of debug info.</li>
295</ul>
296
<p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with mutable
variables now!
</p>

</div>

<!-- *********************************************************************** -->
305<div class="doc_section"><a name="kalvars">Mutable Variables in
306Kaleidoscope</a></div>
307<!-- *********************************************************************** -->
308
309<div class="doc_text">
310
<p>Now that we know the sort of problem we want to tackle, let's see what this
looks like in the context of our little Kaleidoscope language. We're going to
add two features:</p>
314
315<ol>
316<li>The ability to mutate variables with the '=' operator.</li>
317<li>The ability to define new variables.</li>
318</ol>
319
320<p>While the first item is really what this is about, we only have variables
321for incoming arguments and for induction variables, and redefining them only
322goes so far :). Also, the ability to define new variables is a
323useful thing regardless of whether you will be mutating them. Here's a
324motivating example that shows how we could use these:</p>
325
326<div class="doc_code">
327<pre>
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

# Recursive fib, we could do this before.
def fib(x)
  if (x &lt; 3) then
    1
  else
    fib(x-1)+fib(x-2);

# Iterative fib.
def fibi(x)
  <b>var a = 1, b = 1, c in</b>
  (for i = 3, i &lt; x in
     <b>c = a + b</b> :
     <b>a = b</b> :
     <b>b = c</b>) :
  b;

# Call it.
fibi(10);
350</pre>
351</div>
352
353<p>
354In order to mutate variables, we have to change our existing variables to use
355the "alloca trick". Once we have that, we'll add our new operator, then extend
356Kaleidoscope to support new variable definitions.
357</p>
358
359</div>
360
361<!-- *********************************************************************** -->
362<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
363Mutation</a></div>
364<!-- *********************************************************************** -->
365
366<div class="doc_text">
367
<p>
The symbol table in Kaleidoscope is managed at code generation time by the
'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
that holds the double value for the named variable. In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
the <em>memory location</em> of the variable in question. Note that this
change is a refactoring: it changes the structure of the code, but does not
(by itself) change the behavior of the compiler. All of these changes are
isolated in the Kaleidoscope code generator.</p>
377
378<p>
379At this point in Kaleidoscope's development, it only supports variables for two
380things: incoming arguments to functions and the induction variable of 'for'
381loops. For consistency, we'll allow mutation of these variables in addition to
382other user-defined variables. This means that these will both need memory
383locations.
384</p>
385
<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
will tell us what parts of the code we need to update:</p>
389
390<div class="doc_code">
391<pre>
392static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
393</pre>
394</div>
395
<p>Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of the
function:</p>
399
400<div class="doc_code">
401<pre>
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function. This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const std::string &amp;VarName) {
  LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
                   TheFunction-&gt;getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
}
410</pre>
411</div>
412
413<p>This funny looking code creates an LLVMBuilder object that is pointing at
414the first instruction (.begin()) of the entry block. It then creates an alloca
415with the expected name and returns it. Because all values in Kaleidoscope are
416doubles, there is no need to pass in a type to use.</p>
417
418<p>With this in place, the first functionality change we want to make is to
419variable references. In our new scheme, variables live on the stack, so code
420generating a reference to them actually needs to produce a load from the stack
421slot:</p>
422
423<div class="doc_code">
424<pre>
425Value *VariableExprAST::Codegen() {
426 // Look this variable up in the function.
427 Value *V = NamedValues[Name];
428 if (V == 0) return ErrorV("Unknown variable name");
429
430 // Load the value.
431 return Builder.CreateLoad(V, Name.c_str());
432}
433</pre>
434</div>
435
436<p>As you can see, this is pretty straight-forward. Next we need to update the
437things that define the variables to set up the alloca. We'll start with
438<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
439the unabridged code):</p>
440
441<div class="doc_code">
442<pre>
  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  <b>// Create an alloca for the variable in the entry block.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>

  // Emit the start code first, without 'variable' in scope.
  Value *StartVal = Start-&gt;Codegen();
  if (StartVal == 0) return 0;

  <b>// Store the value into the alloca.
  Builder.CreateStore(StartVal, Alloca);</b>
  ...

  // Compute the end condition.
  Value *EndCond = End-&gt;Codegen();
  if (EndCond == 0) return EndCond;

  <b>// Reload, increment, and restore the alloca. This handles the case where
  // the body of the loop mutates the variable.
  Value *CurVar = Builder.CreateLoad(Alloca);
  Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
  Builder.CreateStore(NextVar, Alloca);</b>
  ...
466</pre>
467</div>
468
469<p>This code is virtually identical to the code <a
470href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
471big difference is that we no longer have to construct a PHI node, and we use
472load/store to access the variable as needed.</p>
473
474<p>To support mutable argument variables, we need to also make allocas for them.
475The code for this is also pretty simple:</p>
476
477<div class="doc_code">
478<pre>
479/// CreateArgumentAllocas - Create an alloca for each argument and register the
480/// argument in the symbol table so that references to it will succeed.
481void PrototypeAST::CreateArgumentAllocas(Function *F) {
482 Function::arg_iterator AI = F-&gt;arg_begin();
483 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
484 // Create an alloca for this variable.
485 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
486
487 // Store the initial value into the alloca.
488 Builder.CreateStore(AI, Alloca);
489
490 // Add arguments to variable symbol table.
491 NamedValues[Args[Idx]] = Alloca;
492 }
493}
494</pre>
495</div>
496
497<p>For each argument, we make an alloca, store the input value to the function
498into the alloca, and register the alloca as the memory location for the
499argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
500it sets up the entry block for the function.</p>
501
502<p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
503good codegen once again:</p>
504
505<div class="doc_code">
506<pre>
507 // Set up the optimizer pipeline. Start with registering info about how the
508 // target lays out data structures.
509 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
510 <b>// Promote allocas to registers.
511 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
512 // Do simple "peephole" optimizations and bit-twiddling optzns.
513 OurFPM.add(createInstructionCombiningPass());
514 // Reassociate expressions.
515 OurFPM.add(createReassociatePass());
516</pre>
517</div>
518
519<p>It is interesting to see what the code looks like before and after the
520mem2reg optimization runs. For example, this is the before/after code for our
521recursive fib. Before the optimization:</p>
522
523<div class="doc_code">
524<pre>
525define double @fib(double %x) {
526entry:
527 <b>%x1 = alloca double
528 store double %x, double* %x1
529 %x2 = load double* %x1</b>
530 %multmp = fcmp ult double %x2, 3.000000e+00
531 %booltmp = uitofp i1 %multmp to double
532 %ifcond = fcmp one double %booltmp, 0.000000e+00
533 br i1 %ifcond, label %then, label %else
534
535then: ; preds = %entry
536 br label %ifcont
537
538else: ; preds = %entry
539 <b>%x3 = load double* %x1</b>
540 %subtmp = sub double %x3, 1.000000e+00
541 %calltmp = call double @fib( double %subtmp )
542 <b>%x4 = load double* %x1</b>
543 %subtmp5 = sub double %x4, 2.000000e+00
544 %calltmp6 = call double @fib( double %subtmp5 )
545 %addtmp = add double %calltmp, %calltmp6
546 br label %ifcont
547
548ifcont: ; preds = %else, %then
549 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
550 ret double %iftmp
551}
552</pre>
553</div>
554
555<p>Here there is only one variable (x, the input argument) but you can still
556see the extremely simple-minded code generation strategy we are using. In the
557entry block, an alloca is created, and the initial input value is stored into
558it. Each reference to the variable does a reload from the stack. Also, note
559that we didn't modify the if/then/else expression, so it still inserts a PHI
560node. While we could make an alloca for it, it is actually easier to create a
561PHI node for it, so we still just make the PHI.</p>
562
563<p>Here is the code after the mem2reg pass runs:</p>
564
565<div class="doc_code">
566<pre>
567define double @fib(double %x) {
568entry:
569 %multmp = fcmp ult double <b>%x</b>, 3.000000e+00
570 %booltmp = uitofp i1 %multmp to double
571 %ifcond = fcmp one double %booltmp, 0.000000e+00
572 br i1 %ifcond, label %then, label %else
573
574then:
575 br label %ifcont
576
577else:
578 %subtmp = sub double <b>%x</b>, 1.000000e+00
579 %calltmp = call double @fib( double %subtmp )
580 %subtmp5 = sub double <b>%x</b>, 2.000000e+00
581 %calltmp6 = call double @fib( double %subtmp5 )
582 %addtmp = add double %calltmp, %calltmp6
583 br label %ifcont
584
585ifcont: ; preds = %else, %then
586 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
587 ret double %iftmp
588}
589</pre>
590</div>
591
<p>This is a trivial case for mem2reg, since there are no redefinitions of the
variable. The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>
595
596<p>After the rest of the optimizers run, we get:</p>
597
598<div class="doc_code">
599<pre>
600define double @fib(double %x) {
601entry:
602 %multmp = fcmp ult double %x, 3.000000e+00
603 %booltmp = uitofp i1 %multmp to double
604 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
605 br i1 %ifcond, label %else, label %ifcont
606
607else:
608 %subtmp = sub double %x, 1.000000e+00
609 %calltmp = call double @fib( double %subtmp )
610 %subtmp5 = sub double %x, 2.000000e+00
611 %calltmp6 = call double @fib( double %subtmp5 )
612 %addtmp = add double %calltmp, %calltmp6
613 ret double %addtmp
614
615ifcont:
616 ret double 1.000000e+00
617}
618</pre>
619</div>
620
621<p>Here we see that the simplifycfg pass decided to clone the return instruction
622into the end of the 'else' block. This allowed it to eliminate some branches
623and the PHI node.</p>
624
625<p>Now that all symbol table references are updated to use stack variables,
626we'll add the assignment operator.</p>
627
628</div>
629
630<!-- *********************************************************************** -->
631<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
632<!-- *********************************************************************** -->
633
634<div class="doc_text">
635
636<p>With our current framework, adding a new assignment operator is really
637simple. We will parse it just like any other binary operator, but handle it
638internally (instead of allowing the user to define it). The first step is to
639set a precedence:</p>
640
641<div class="doc_code">
642<pre>
643 int main() {
644 // Install standard binary operators.
645 // 1 is lowest precedence.
646 <b>BinopPrecedence['='] = 2;</b>
647 BinopPrecedence['&lt;'] = 10;
648 BinopPrecedence['+'] = 20;
649 BinopPrecedence['-'] = 20;
650</pre>
651</div>
652
653<p>Now that the parser knows the precedence of the binary operator, it takes
654care of all the parsing and AST generation. We just need to implement codegen
655for the assignment operator. This looks like:</p>
656
657<div class="doc_code">
658<pre>
659Value *BinaryExprAST::Codegen() {
660 // Special case '=' because we don't want to emit the LHS as an expression.
661 if (Op == '=') {
662 // Assignment requires the LHS to be an identifier.
663 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
664 if (!LHSE)
665 return ErrorV("destination of '=' must be a variable");
666</pre>
667</div>
668
669<p>Unlike the rest of the binary operators, our assignment operator doesn't
670follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
671as a special case before the other binary operators are handled. The other
672strange thing about it is that it requires the LHS to be a variable directly.
673</p>
674
675<div class="doc_code">
676<pre>
677 // Codegen the RHS.
678 Value *Val = RHS-&gt;Codegen();
679 if (Val == 0) return 0;
680
681 // Look up the name.
682 Value *Variable = NamedValues[LHSE-&gt;getName()];
683 if (Variable == 0) return ErrorV("Unknown variable name");
684
685 Builder.CreateStore(Val, Variable);
686 return Val;
687 }
688 ...
689</pre>
690</div>
691
692<p>Once it has the variable, codegen'ing the assignment is straight-forward:
693we emit the RHS of the assignment, create a store, and return the computed
694value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
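<p>The special handling of '=' can be tried out without LLVM by interpreting a
tiny AST directly. The following is a minimal sketch under that assumption, not
the tutorial's code generator: the class names are hypothetical, the
<tt>Memory</tt> map stands in for the stack slots that <tt>NamedValues</tt>
points at, and errors are reported with exceptions instead of
<tt>ErrorV</tt>.</p>

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for stack slots: variable name -> current value.
using Memory = std::map<std::string, double>;

struct Expr {
  virtual ~Expr() {}
  virtual double eval(Memory &M) const = 0;
};

struct Number : Expr {
  double Val;
  explicit Number(double V) : Val(V) {}
  double eval(Memory &) const override { return Val; }
};

struct Variable : Expr {
  std::string Name;
  explicit Variable(const std::string &N) : Name(N) {}
  double eval(Memory &M) const override {
    auto It = M.find(Name);
    if (It == M.end()) throw std::runtime_error("Unknown variable name");
    return It->second;  // a read corresponds to a load from the slot
  }
};

struct Binary : Expr {
  char Op;
  std::unique_ptr<Expr> LHS, RHS;
  Binary(char O, std::unique_ptr<Expr> L, std::unique_ptr<Expr> R)
      : Op(O), LHS(std::move(L)), RHS(std::move(R)) {}

  double eval(Memory &M) const override {
    // Special case '=': the LHS is not evaluated, it must name a variable.
    if (Op == '=') {
      auto *Dest = dynamic_cast<const Variable *>(LHS.get());
      if (!Dest)
        throw std::runtime_error("destination of '=' must be a variable");
      double Val = RHS->eval(M);  // evaluate the RHS
      auto It = M.find(Dest->Name);
      if (It == M.end()) throw std::runtime_error("Unknown variable name");
      It->second = Val;  // corresponds to the store
      return Val;        // the assignment itself yields the stored value
    }
    double L = LHS->eval(M), R = RHS->eval(M);
    switch (Op) {
    case '+': return L + R;
    case '-': return L - R;
    default: throw std::runtime_error("invalid binary operator");
    }
  }
};
```

<p>Because the assignment returns the value it stored, evaluating the tree for
"x = (y = 3)" updates both variables, while a tree whose left-hand side is a
number is rejected.</p>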
695
696<p>Now that we have an assignment operator, we can mutate loop variables and
697arguments. For example, we can now run code like this:</p>
698
699<div class="doc_code">
700<pre>
701# Function to print a double.
702extern printd(x);
703
704# Define ':' for sequencing: as a low-precedence operator that ignores operands
705# and just returns the RHS.
706def binary : 1 (x y) y;
707
708def test(x)
709 printd(x) :
710 x = 4 :
711 printd(x);
712
713test(123);
714</pre>
715</div>
716
<p>When run, this example prints "123" and then "4", showing that we did
actually mutate the value! Okay, we have now officially implemented our goal:
getting this to work requires SSA construction in the general case. However,
to be really useful, we want the ability to define our own local variables.
Let's add this next!
</p>
723
724</div>
725
726<!-- *********************************************************************** -->
727<div class="doc_section"><a name="localvars">User-defined Local
728Variables</a></div>
729<!-- *********************************************************************** -->
730
731<div class="doc_text">
732
<p>Adding var/in is just like the other extensions we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
The first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:</p>
737
738<div class="doc_code">
739<pre>
740enum Token {
741 ...
742 <b>// var definition
743 tok_var = -13</b>
744...
745}
746...
747static int gettok() {
748...
749 if (IdentifierStr == "in") return tok_in;
750 if (IdentifierStr == "binary") return tok_binary;
751 if (IdentifierStr == "unary") return tok_unary;
752 <b>if (IdentifierStr == "var") return tok_var;</b>
753 return tok_identifier;
754...
755</pre>
756</div>
757
758<p>The next step is to define the AST node that we will construct. For var/in,
759it will look like this:</p>
760
761<div class="doc_code">
762<pre>
763/// VarExprAST - Expression class for var/in
764class VarExprAST : public ExprAST {
765 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
766 ExprAST *Body;
767public:
768 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
769 ExprAST *body)
770 : VarNames(varnames), Body(body) {}
771
772 virtual Value *Codegen();
773};
774</pre>
775</div>
776
<p>var/in allows a list of names to be defined all at once, and each name can
optionally have an initializer value. As such, we capture this information in
the VarNames vector. Also, var/in has a body; this body is allowed to access
the variables defined by the var/in.</p>
781
<p>With this ready, we can define the parser pieces. The first thing we do is
add it as a primary expression:</p>
784
785<div class="doc_code">
786<pre>
787/// primary
788/// ::= identifierexpr
789/// ::= numberexpr
790/// ::= parenexpr
791/// ::= ifexpr
792/// ::= forexpr
793<b>/// ::= varexpr</b>
794static ExprAST *ParsePrimary() {
795 switch (CurTok) {
796 default: return Error("unknown token when expecting an expression");
797 case tok_identifier: return ParseIdentifierExpr();
798 case tok_number: return ParseNumberExpr();
799 case '(': return ParseParenExpr();
800 case tok_if: return ParseIfExpr();
801 case tok_for: return ParseForExpr();
802 <b>case tok_var: return ParseVarExpr();</b>
803 }
804}
805</pre>
806</div>
807
808<p>Next we define ParseVarExpr:</p>
809
810<div class="doc_code">
811<pre>
/// varexpr ::= 'var' identifier ('=' expression)?
///                    (',' identifier ('=' expression)?)* 'in' expression
static ExprAST *ParseVarExpr() {
  getNextToken();  // eat the var.

  std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;

  // At least one variable name is required.
  if (CurTok != tok_identifier)
    return Error("expected identifier after var");
822</pre>
823</div>
824
<p>The first part of this code parses the list of identifier/expr pairs into the
local <tt>VarNames</tt> vector.</p>
827
828<div class="doc_code">
829<pre>
  while (1) {
    std::string Name = IdentifierStr;
    getNextToken();  // eat identifier.

    // Read the optional initializer.
    ExprAST *Init = 0;
    if (CurTok == '=') {
      getNextToken(); // eat the '='.

      Init = ParseExpression();
      if (Init == 0) return 0;
    }

    VarNames.push_back(std::make_pair(Name, Init));

    // End of var list, exit loop.
    if (CurTok != ',') break;
    getNextToken(); // eat the ','.

    if (CurTok != tok_identifier)
      return Error("expected identifier list after var");
  }
852</pre>
853</div>

<p>Once all the variables are parsed, we then parse the body and create the
AST node:</p>

<div class="doc_code">
<pre>
  // At this point, we have to have 'in'.
  if (CurTok != tok_in)
    return Error("expected 'in' keyword after 'var'");
  getNextToken();  // eat 'in'.

  ExprAST *Body = ParseExpression();
  if (Body == 0) return 0;

  return new VarExprAST(VarNames, Body);
}
</pre>
</div>
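
<p>To see the shape of this parsing loop in isolation, here is a minimal
standalone sketch.  It is not part of the tutorial code: the pre-split token
list, the <tt>VarDecl</tt> struct, and the <tt>isIdentifier</tt> and
<tt>parseVarList</tt> helpers are hypothetical names, and initializers are
assumed to be plain numbers rather than full expressions.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the tutorial's (name, ExprAST*) pair: the
// initializer is reduced to a number, and HasInit records whether one was given.
struct VarDecl {
  std::string Name;
  bool HasInit;
  double Init;
};

// A token counts as an identifier if it starts with a letter and is not one of
// the two keywords this sketch knows about.
static bool isIdentifier(const std::string &Tok) {
  return !Tok.empty() && std::isalpha(static_cast<unsigned char>(Tok[0])) &&
         Tok != "var" && Tok != "in";
}

// Parses: 'var' id ('=' number)? (',' id ('=' number)?)* 'in'
// Toks is an already-split token list.  On success, Pos is left just past 'in'.
static bool parseVarList(const std::vector<std::string> &Toks, size_t &Pos,
                         std::vector<VarDecl> &Out) {
  if (Pos >= Toks.size() || Toks[Pos] != "var") return false;
  ++Pos;  // eat 'var'.

  // At least one variable name is required.
  if (Pos >= Toks.size() || !isIdentifier(Toks[Pos])) return false;

  while (true) {
    VarDecl D = {Toks[Pos++], false, 0.0};  // eat identifier.

    // Read the optional initializer.
    if (Pos < Toks.size() && Toks[Pos] == "=") {
      ++Pos;  // eat '='.
      if (Pos >= Toks.size()) return false;
      D.HasInit = true;
      D.Init = std::strtod(Toks[Pos++].c_str(), nullptr);
    }
    Out.push_back(D);

    // End of var list, exit loop.
    if (Pos >= Toks.size() || Toks[Pos] != ",") break;
    ++Pos;  // eat ','.

    if (Pos >= Toks.size() || !isIdentifier(Toks[Pos])) return false;
  }

  // At this point, we have to have 'in'.
  if (Pos >= Toks.size() || Toks[Pos] != "in") return false;
  ++Pos;  // eat 'in'.
  return true;
}
```

<p>Given the tokens <tt>var a = 1 , b in</tt>, <tt>parseVarList</tt> should fill
<tt>Out</tt> with two entries, only the first of which has an initializer, and
leave <tt>Pos</tt> just past <tt>in</tt>, which is where the real parser goes on
to parse the body.</p>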

<p>Now that we can parse and represent the code, we need to support emission of
LLVM IR for it.  This code starts out with:</p>

<div class="doc_code">
<pre>
Value *VarExprAST::Codegen() {
  std::vector&lt;AllocaInst *&gt; OldBindings;

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Register all variables and emit their initializer.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    const std::string &amp;VarName = VarNames[i].first;
    ExprAST *Init = VarNames[i].second;
</pre>
</div>

<p>Basically it loops over all the variables, installing them one at a time.
For each variable we put into the symbol table, we remember in OldBindings the
previous value that we replace.</p>

<div class="doc_code">
<pre>
    // Emit the initializer before adding the variable to scope.  This prevents
    // the initializer from referencing the variable itself, and permits stuff
    // like this:
    //  var a = 1 in
    //    var a = a in ...   # refers to outer 'a'.
    Value *InitVal;
    if (Init) {
      InitVal = Init-&gt;Codegen();
      if (InitVal == 0) return 0;
    } else { // If not specified, use 0.0.
      InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
    }

    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    Builder.CreateStore(InitVal, Alloca);

    // Remember the old variable binding so that we can restore the binding when
    // we unrecurse.
    OldBindings.push_back(NamedValues[VarName]);

    // Remember this binding.
    NamedValues[VarName] = Alloca;
  }
</pre>
</div>
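
<p>The ordering used in this loop (initializer first, binding second) can be
sketched without any LLVM types.  The names below are hypothetical and only for
illustration: <tt>SymbolTable</tt> maps names to plain <tt>double*</tt> slots
standing in for allocas, an <tt>Initializer</tt> is modelled as a function that
reads the table, and <tt>declareVar</tt> and <tt>readVar</tt> are helpers made
up for this sketch.</p>

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for NamedValues: name -> storage slot (null if unbound).
typedef std::map<std::string, double *> SymbolTable;

// An initializer is modelled as a function that reads the table it is given.
typedef std::function<double(SymbolTable &)> Initializer;

// Evaluate the initializer first, then install the binding, so that the
// initializer can only see bindings that existed before this declaration.
// Returns the binding that was replaced (null if there was none).
static double *declareVar(SymbolTable &Table, const std::string &Name,
                          double *Slot, const Initializer &Init) {
  *Slot = Init(Table);         // Step 1: 'Name' still refers to the outer slot.
  double *Old = Table[Name];   // Remember what we are about to shadow.
  Table[Name] = Slot;          // Step 2: the new binding becomes visible.
  return Old;
}

// Builds an initializer that reads variable 'Name' (0.0 if it is unbound).
static Initializer readVar(const std::string &Name) {
  return [Name](SymbolTable &Table) {
    double *Slot = Table[Name];
    return Slot ? *Slot : 0.0;
  };
}
```

<p>Declaring <tt>a</tt> with a constant initializer and then declaring a second
<tt>a</tt> with <tt>readVar("a")</tt> as its initializer copies the outer value
into the inner slot, mirroring the <tt>var a = a</tt> case in the comment.  If
the two steps were swapped, the initializer would read the new, still
uninitialized slot instead.</p>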

<p>The loop body contains more comments than code.  The basic idea is that we
emit the initializer, create the alloca, then update the symbol table to point
to it.  Once all the variables are installed in the symbol table, we evaluate
the body of the var/in expression:</p>

<div class="doc_code">
<pre>
  // Codegen the body, now that all vars are in scope.
  Value *BodyVal = Body-&gt;Codegen();
  if (BodyVal == 0) return 0;
</pre>
</div>

<p>Finally, before returning, we restore the previous variable bindings:</p>

<div class="doc_code">
<pre>
  // Pop all our variables from scope.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    NamedValues[VarNames[i].first] = OldBindings[i];

  // Return the body computation.
  return BodyVal;
}
</pre>
</div>
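
<p>One way to package this save-and-restore pattern is a small scope guard.  The
sketch below is an illustration under stated assumptions, not the tutorial's
implementation: <tt>SymbolTable</tt> and <tt>ScopedBindings</tt> are hypothetical
names, and plain <tt>double*</tt> slots stand in for <tt>AllocaInst*</tt>.
Unlike the loop above, it restores the newest binding first, which only matters
if the same name is declared twice in one var list, and because the restore runs
in the destructor it also happens on early returns.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NamedValues: a null slot means "not in scope".
typedef std::map<std::string, double *> SymbolTable;

// ScopedBindings - Illustrative helper that remembers what each name was bound
// to and puts those bindings back when it goes out of scope.
class ScopedBindings {
  SymbolTable &Table;
  std::vector<std::pair<std::string, double *> > Saved;

public:
  explicit ScopedBindings(SymbolTable &T) : Table(T) {}

  // Copying would restore the same bindings twice, so forbid it.
  ScopedBindings(const ScopedBindings &) = delete;
  ScopedBindings &operator=(const ScopedBindings &) = delete;

  // Bind Name to Slot, remembering the binding it replaces.
  void bind(const std::string &Name, double *Slot) {
    Saved.push_back(std::make_pair(Name, Table[Name]));
    Table[Name] = Slot;
  }

  // Undo the bindings newest-first, so that a name bound twice in one scope
  // ends up with the binding it had before the scope was entered.
  ~ScopedBindings() {
    for (size_t i = Saved.size(); i != 0; --i)
      Table[Saved[i - 1].first] = Saved[i - 1].second;
  }
};
```

<p>A caller would create a <tt>ScopedBindings</tt> on entry to the var/in
expression, call <tt>bind</tt> once per declared variable, and then evaluate the
body.  As in the tutorial code, a name that was previously unbound is left in
the table with a null slot rather than erased.</p>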

<p>The end result of all of this is that we get properly scoped variable
definitions, and we even (trivially) allow mutation of them :).</p>

<p>With this, we have completed what we set out to do.  Our nice iterative fib
example from the intro compiles and runs just fine.  The mem2reg pass optimizes
all of our stack variables into SSA registers, inserting PHI nodes where needed,
and our front-end remains simple: no iterated dominator frontier computation
anywhere in sight.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"><a name="code">Full Code Listing</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
Here is the complete code listing for our running example, enhanced with mutable
variables and var/in support.  To build this example, use:
</p>

<div class="doc_code">
<pre>
   # Compile
   g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
   # Run
   ./toy
</pre>
</div>

<p>Here is the code:</p>

<div class="doc_code">
<pre>
#include "llvm/DerivedTypes.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/Module.h"
#include "llvm/ModuleProvider.h"
#include "llvm/PassManager.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Support/LLVMBuilder.h"
#include &lt;cassert&gt;
#include &lt;cctype&gt;
#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;string&gt;
#include &lt;map&gt;
#include &lt;vector&gt;
using namespace llvm;
998
999//===----------------------------------------------------------------------===//
1000// Lexer
1001//===----------------------------------------------------------------------===//
1002
1003// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1004// of these for known things.
1005enum Token {
1006 tok_eof = -1,
1007
1008 // commands
1009 tok_def = -2, tok_extern = -3,
1010
1011 // primary
1012 tok_identifier = -4, tok_number = -5,
1013
1014 // control
1015 tok_if = -6, tok_then = -7, tok_else = -8,
1016 tok_for = -9, tok_in = -10,
1017
1018 // operators
1019 tok_binary = -11, tok_unary = -12,
1020
1021 // var definition
1022 tok_var = -13
1023};
1024
1025static std::string IdentifierStr; // Filled in if tok_identifier
1026static double NumVal; // Filled in if tok_number
1027
1028/// gettok - Return the next token from standard input.
1029static int gettok() {
1030 static int LastChar = ' ';
1031
1032 // Skip any whitespace.
1033 while (isspace(LastChar))
1034 LastChar = getchar();
1035
1036 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1037 IdentifierStr = LastChar;
1038 while (isalnum((LastChar = getchar())))
1039 IdentifierStr += LastChar;
1040
1041 if (IdentifierStr == "def") return tok_def;
1042 if (IdentifierStr == "extern") return tok_extern;
1043 if (IdentifierStr == "if") return tok_if;
1044 if (IdentifierStr == "then") return tok_then;
1045 if (IdentifierStr == "else") return tok_else;
1046 if (IdentifierStr == "for") return tok_for;
1047 if (IdentifierStr == "in") return tok_in;
1048 if (IdentifierStr == "binary") return tok_binary;
1049 if (IdentifierStr == "unary") return tok_unary;
1050 if (IdentifierStr == "var") return tok_var;
1051 return tok_identifier;
1052 }
1053
1054 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1055 std::string NumStr;
1056 do {
1057 NumStr += LastChar;
1058 LastChar = getchar();
1059 } while (isdigit(LastChar) || LastChar == '.');
1060
1061 NumVal = strtod(NumStr.c_str(), 0);
1062 return tok_number;
1063 }
1064
1065 if (LastChar == '#') {
1066 // Comment until end of line.
    do LastChar = getchar();
    while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
1069
1070 if (LastChar != EOF)
1071 return gettok();
1072 }
1073
1074 // Check for end of file. Don't eat the EOF.
1075 if (LastChar == EOF)
1076 return tok_eof;
1077
1078 // Otherwise, just return the character as its ascii value.
1079 int ThisChar = LastChar;
1080 LastChar = getchar();
1081 return ThisChar;
1082}
1083
1084//===----------------------------------------------------------------------===//
1085// Abstract Syntax Tree (aka Parse Tree)
1086//===----------------------------------------------------------------------===//
1087
1088/// ExprAST - Base class for all expression nodes.
1089class ExprAST {
1090public:
1091 virtual ~ExprAST() {}
1092 virtual Value *Codegen() = 0;
1093};
1094
1095/// NumberExprAST - Expression class for numeric literals like "1.0".
1096class NumberExprAST : public ExprAST {
1097 double Val;
1098public:
1099 NumberExprAST(double val) : Val(val) {}
1100 virtual Value *Codegen();
1101};
1102
1103/// VariableExprAST - Expression class for referencing a variable, like "a".
1104class VariableExprAST : public ExprAST {
1105 std::string Name;
1106public:
1107 VariableExprAST(const std::string &amp;name) : Name(name) {}
1108 const std::string &amp;getName() const { return Name; }
1109 virtual Value *Codegen();
1110};
1111
1112/// UnaryExprAST - Expression class for a unary operator.
1113class UnaryExprAST : public ExprAST {
1114 char Opcode;
1115 ExprAST *Operand;
1116public:
1117 UnaryExprAST(char opcode, ExprAST *operand)
1118 : Opcode(opcode), Operand(operand) {}
1119 virtual Value *Codegen();
1120};
1121
1122/// BinaryExprAST - Expression class for a binary operator.
1123class BinaryExprAST : public ExprAST {
1124 char Op;
1125 ExprAST *LHS, *RHS;
1126public:
1127 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1128 : Op(op), LHS(lhs), RHS(rhs) {}
1129 virtual Value *Codegen();
1130};
1131
1132/// CallExprAST - Expression class for function calls.
1133class CallExprAST : public ExprAST {
1134 std::string Callee;
1135 std::vector&lt;ExprAST*&gt; Args;
1136public:
1137 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1138 : Callee(callee), Args(args) {}
1139 virtual Value *Codegen();
1140};
1141
1142/// IfExprAST - Expression class for if/then/else.
1143class IfExprAST : public ExprAST {
1144 ExprAST *Cond, *Then, *Else;
1145public:
1146 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1147 : Cond(cond), Then(then), Else(_else) {}
1148 virtual Value *Codegen();
1149};
1150
1151/// ForExprAST - Expression class for for/in.
1152class ForExprAST : public ExprAST {
1153 std::string VarName;
1154 ExprAST *Start, *End, *Step, *Body;
1155public:
1156 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1157 ExprAST *step, ExprAST *body)
1158 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1159 virtual Value *Codegen();
1160};
1161
1162/// VarExprAST - Expression class for var/in
1163class VarExprAST : public ExprAST {
1164 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1165 ExprAST *Body;
1166public:
1167 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1168 ExprAST *body)
1169 : VarNames(varnames), Body(body) {}
1170
1171 virtual Value *Codegen();
1172};
1173
1174/// PrototypeAST - This class represents the "prototype" for a function,
1175/// which captures its argument names as well as if it is an operator.
1176class PrototypeAST {
1177 std::string Name;
1178 std::vector&lt;std::string&gt; Args;
1179 bool isOperator;
1180 unsigned Precedence; // Precedence if a binary op.
1181public:
1182 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1183 bool isoperator = false, unsigned prec = 0)
1184 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1185
1186 bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1187 bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1188
1189 char getOperatorName() const {
1190 assert(isUnaryOp() || isBinaryOp());
1191 return Name[Name.size()-1];
1192 }
1193
1194 unsigned getBinaryPrecedence() const { return Precedence; }
1195
1196 Function *Codegen();
1197
1198 void CreateArgumentAllocas(Function *F);
1199};
1200
1201/// FunctionAST - This class represents a function definition itself.
1202class FunctionAST {
1203 PrototypeAST *Proto;
1204 ExprAST *Body;
1205public:
1206 FunctionAST(PrototypeAST *proto, ExprAST *body)
1207 : Proto(proto), Body(body) {}
1208
1209 Function *Codegen();
1210};
1211
1212//===----------------------------------------------------------------------===//
1213// Parser
1214//===----------------------------------------------------------------------===//
1215
/// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
/// token the parser is looking at.  getNextToken reads another token from the
/// lexer and updates CurTok with its results.
1219static int CurTok;
1220static int getNextToken() {
1221 return CurTok = gettok();
1222}
1223
1224/// BinopPrecedence - This holds the precedence for each binary operator that is
1225/// defined.
1226static std::map&lt;char, int&gt; BinopPrecedence;
1227
1228/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1229static int GetTokPrecedence() {
1230 if (!isascii(CurTok))
1231 return -1;
1232
1233 // Make sure it's a declared binop.
1234 int TokPrec = BinopPrecedence[CurTok];
1235 if (TokPrec &lt;= 0) return -1;
1236 return TokPrec;
1237}
1238
1239/// Error* - These are little helper functions for error handling.
1240ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1241PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1242FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1243
1244static ExprAST *ParseExpression();
1245
/// identifierexpr
///   ::= identifier
///   ::= identifier '(' expression* ')'
1249static ExprAST *ParseIdentifierExpr() {
1250 std::string IdName = IdentifierStr;
1251
  getNextToken();  // eat identifier.
1253
1254 if (CurTok != '(') // Simple variable ref.
1255 return new VariableExprAST(IdName);
1256
1257 // Call.
1258 getNextToken(); // eat (
1259 std::vector&lt;ExprAST*&gt; Args;
1260 if (CurTok != ')') {
1261 while (1) {
1262 ExprAST *Arg = ParseExpression();
1263 if (!Arg) return 0;
1264 Args.push_back(Arg);
1265
1266 if (CurTok == ')') break;
1267
1268 if (CurTok != ',')
1269 return Error("Expected ')'");
1270 getNextToken();
1271 }
1272 }
1273
1274 // Eat the ')'.
1275 getNextToken();
1276
1277 return new CallExprAST(IdName, Args);
1278}
1279
1280/// numberexpr ::= number
1281static ExprAST *ParseNumberExpr() {
1282 ExprAST *Result = new NumberExprAST(NumVal);
1283 getNextToken(); // consume the number
1284 return Result;
1285}
1286
1287/// parenexpr ::= '(' expression ')'
1288static ExprAST *ParseParenExpr() {
1289 getNextToken(); // eat (.
1290 ExprAST *V = ParseExpression();
1291 if (!V) return 0;
1292
1293 if (CurTok != ')')
1294 return Error("expected ')'");
1295 getNextToken(); // eat ).
1296 return V;
1297}
1298
1299/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1300static ExprAST *ParseIfExpr() {
1301 getNextToken(); // eat the if.
1302
1303 // condition.
1304 ExprAST *Cond = ParseExpression();
1305 if (!Cond) return 0;
1306
1307 if (CurTok != tok_then)
1308 return Error("expected then");
1309 getNextToken(); // eat the then
1310
1311 ExprAST *Then = ParseExpression();
1312 if (Then == 0) return 0;
1313
1314 if (CurTok != tok_else)
1315 return Error("expected else");
1316
1317 getNextToken();
1318
1319 ExprAST *Else = ParseExpression();
1320 if (!Else) return 0;
1321
1322 return new IfExprAST(Cond, Then, Else);
1323}
1324
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1326static ExprAST *ParseForExpr() {
1327 getNextToken(); // eat the for.
1328
1329 if (CurTok != tok_identifier)
1330 return Error("expected identifier after for");
1331
1332 std::string IdName = IdentifierStr;
  getNextToken();  // eat identifier.
1334
1335 if (CurTok != '=')
1336 return Error("expected '=' after for");
1337 getNextToken(); // eat '='.
1338
1339
1340 ExprAST *Start = ParseExpression();
1341 if (Start == 0) return 0;
1342 if (CurTok != ',')
1343 return Error("expected ',' after for start value");
1344 getNextToken();
1345
1346 ExprAST *End = ParseExpression();
1347 if (End == 0) return 0;
1348
1349 // The step value is optional.
1350 ExprAST *Step = 0;
1351 if (CurTok == ',') {
1352 getNextToken();
1353 Step = ParseExpression();
1354 if (Step == 0) return 0;
1355 }
1356
1357 if (CurTok != tok_in)
1358 return Error("expected 'in' after for");
1359 getNextToken(); // eat 'in'.
1360
1361 ExprAST *Body = ParseExpression();
1362 if (Body == 0) return 0;
1363
1364 return new ForExprAST(IdName, Start, End, Step, Body);
1365}
1366
/// varexpr ::= 'var' identifier ('=' expression)?
///                    (',' identifier ('=' expression)?)* 'in' expression
1369static ExprAST *ParseVarExpr() {
1370 getNextToken(); // eat the var.
1371
1372 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1373
1374 // At least one variable name is required.
1375 if (CurTok != tok_identifier)
1376 return Error("expected identifier after var");
1377
1378 while (1) {
1379 std::string Name = IdentifierStr;
    getNextToken();  // eat identifier.
1381
1382 // Read the optional initializer.
1383 ExprAST *Init = 0;
1384 if (CurTok == '=') {
1385 getNextToken(); // eat the '='.
1386
1387 Init = ParseExpression();
1388 if (Init == 0) return 0;
1389 }
1390
1391 VarNames.push_back(std::make_pair(Name, Init));
1392
1393 // End of var list, exit loop.
1394 if (CurTok != ',') break;
1395 getNextToken(); // eat the ','.
1396
1397 if (CurTok != tok_identifier)
1398 return Error("expected identifier list after var");
1399 }
1400
1401 // At this point, we have to have 'in'.
1402 if (CurTok != tok_in)
1403 return Error("expected 'in' keyword after 'var'");
1404 getNextToken(); // eat 'in'.
1405
1406 ExprAST *Body = ParseExpression();
1407 if (Body == 0) return 0;
1408
1409 return new VarExprAST(VarNames, Body);
1410}
1411
1412
1413/// primary
1414/// ::= identifierexpr
1415/// ::= numberexpr
1416/// ::= parenexpr
1417/// ::= ifexpr
1418/// ::= forexpr
1419/// ::= varexpr
1420static ExprAST *ParsePrimary() {
1421 switch (CurTok) {
1422 default: return Error("unknown token when expecting an expression");
1423 case tok_identifier: return ParseIdentifierExpr();
1424 case tok_number: return ParseNumberExpr();
1425 case '(': return ParseParenExpr();
1426 case tok_if: return ParseIfExpr();
1427 case tok_for: return ParseForExpr();
1428 case tok_var: return ParseVarExpr();
1429 }
1430}
1431
1432/// unary
1433/// ::= primary
1434/// ::= '!' unary
1435static ExprAST *ParseUnary() {
1436 // If the current token is not an operator, it must be a primary expr.
1437 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1438 return ParsePrimary();
1439
1440 // If this is a unary operator, read it.
1441 int Opc = CurTok;
1442 getNextToken();
1443 if (ExprAST *Operand = ParseUnary())
1444 return new UnaryExprAST(Opc, Operand);
1445 return 0;
1446}
1447
1448/// binoprhs
1449/// ::= ('+' unary)*
1450static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1451 // If this is a binop, find its precedence.
1452 while (1) {
1453 int TokPrec = GetTokPrecedence();
1454
1455 // If this is a binop that binds at least as tightly as the current binop,
1456 // consume it, otherwise we are done.
1457 if (TokPrec &lt; ExprPrec)
1458 return LHS;
1459
1460 // Okay, we know this is a binop.
1461 int BinOp = CurTok;
1462 getNextToken(); // eat binop
1463
1464 // Parse the unary expression after the binary operator.
1465 ExprAST *RHS = ParseUnary();
1466 if (!RHS) return 0;
1467
1468 // If BinOp binds less tightly with RHS than the operator after RHS, let
1469 // the pending operator take RHS as its LHS.
1470 int NextPrec = GetTokPrecedence();
1471 if (TokPrec &lt; NextPrec) {
1472 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1473 if (RHS == 0) return 0;
1474 }
1475
1476 // Merge LHS/RHS.
1477 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1478 }
1479}
1480
1481/// expression
1482/// ::= unary binoprhs
1483///
1484static ExprAST *ParseExpression() {
1485 ExprAST *LHS = ParseUnary();
1486 if (!LHS) return 0;
1487
1488 return ParseBinOpRHS(0, LHS);
1489}
1490
1491/// prototype
1492/// ::= id '(' id* ')'
1493/// ::= binary LETTER number? (id, id)
1494/// ::= unary LETTER (id)
1495static PrototypeAST *ParsePrototype() {
1496 std::string FnName;
1497
1498 int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1499 unsigned BinaryPrecedence = 30;
1500
1501 switch (CurTok) {
1502 default:
1503 return ErrorP("Expected function name in prototype");
1504 case tok_identifier:
1505 FnName = IdentifierStr;
1506 Kind = 0;
1507 getNextToken();
1508 break;
1509 case tok_unary:
1510 getNextToken();
1511 if (!isascii(CurTok))
1512 return ErrorP("Expected unary operator");
1513 FnName = "unary";
1514 FnName += (char)CurTok;
1515 Kind = 1;
1516 getNextToken();
1517 break;
1518 case tok_binary:
1519 getNextToken();
1520 if (!isascii(CurTok))
1521 return ErrorP("Expected binary operator");
1522 FnName = "binary";
1523 FnName += (char)CurTok;
1524 Kind = 2;
1525 getNextToken();
1526
1527 // Read the precedence if present.
1528 if (CurTok == tok_number) {
1529 if (NumVal &lt; 1 || NumVal &gt; 100)
        return ErrorP("Invalid precedence: must be 1..100");
1531 BinaryPrecedence = (unsigned)NumVal;
1532 getNextToken();
1533 }
1534 break;
1535 }
1536
1537 if (CurTok != '(')
1538 return ErrorP("Expected '(' in prototype");
1539
1540 std::vector&lt;std::string&gt; ArgNames;
1541 while (getNextToken() == tok_identifier)
1542 ArgNames.push_back(IdentifierStr);
1543 if (CurTok != ')')
1544 return ErrorP("Expected ')' in prototype");
1545
1546 // success.
1547 getNextToken(); // eat ')'.
1548
1549 // Verify right number of names for operator.
1550 if (Kind &amp;&amp; ArgNames.size() != Kind)
1551 return ErrorP("Invalid number of operands for operator");
1552
1553 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1554}
1555
1556/// definition ::= 'def' prototype expression
1557static FunctionAST *ParseDefinition() {
1558 getNextToken(); // eat def.
1559 PrototypeAST *Proto = ParsePrototype();
1560 if (Proto == 0) return 0;
1561
1562 if (ExprAST *E = ParseExpression())
1563 return new FunctionAST(Proto, E);
1564 return 0;
1565}
1566
1567/// toplevelexpr ::= expression
1568static FunctionAST *ParseTopLevelExpr() {
1569 if (ExprAST *E = ParseExpression()) {
1570 // Make an anonymous proto.
1571 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1572 return new FunctionAST(Proto, E);
1573 }
1574 return 0;
1575}
1576
1577/// external ::= 'extern' prototype
1578static PrototypeAST *ParseExtern() {
1579 getNextToken(); // eat extern.
1580 return ParsePrototype();
1581}
1582
1583//===----------------------------------------------------------------------===//
1584// Code Generation
1585//===----------------------------------------------------------------------===//
1586
1587static Module *TheModule;
1588static LLVMFoldingBuilder Builder;
1589static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1590static FunctionPassManager *TheFPM;
1591
1592Value *ErrorV(const char *Str) { Error(Str); return 0; }
1593
1594/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1595/// the function. This is used for mutable variables etc.
1596static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1597 const std::string &amp;VarName) {
1598 LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
1599 TheFunction-&gt;getEntryBlock().begin());
1600 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1601}
1602
1603
1604Value *NumberExprAST::Codegen() {
1605 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1606}
1607
1608Value *VariableExprAST::Codegen() {
1609 // Look this variable up in the function.
1610 Value *V = NamedValues[Name];
1611 if (V == 0) return ErrorV("Unknown variable name");
1612
1613 // Load the value.
1614 return Builder.CreateLoad(V, Name.c_str());
1615}
1616
1617Value *UnaryExprAST::Codegen() {
1618 Value *OperandV = Operand-&gt;Codegen();
1619 if (OperandV == 0) return 0;
1620
1621 Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1622 if (F == 0)
1623 return ErrorV("Unknown unary operator");
1624
1625 return Builder.CreateCall(F, OperandV, "unop");
1626}
1627
1628
1629Value *BinaryExprAST::Codegen() {
1630 // Special case '=' because we don't want to emit the LHS as an expression.
1631 if (Op == '=') {
1632 // Assignment requires the LHS to be an identifier.
1633 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1634 if (!LHSE)
1635 return ErrorV("destination of '=' must be a variable");
1636 // Codegen the RHS.
1637 Value *Val = RHS-&gt;Codegen();
1638 if (Val == 0) return 0;
1639
1640 // Look up the name.
1641 Value *Variable = NamedValues[LHSE-&gt;getName()];
1642 if (Variable == 0) return ErrorV("Unknown variable name");
1643
1644 Builder.CreateStore(Val, Variable);
1645 return Val;
1646 }
1647
1648
1649 Value *L = LHS-&gt;Codegen();
1650 Value *R = RHS-&gt;Codegen();
1651 if (L == 0 || R == 0) return 0;
1652
1653 switch (Op) {
1654 case '+': return Builder.CreateAdd(L, R, "addtmp");
1655 case '-': return Builder.CreateSub(L, R, "subtmp");
1656 case '*': return Builder.CreateMul(L, R, "multmp");
1657 case '&lt;':
    L = Builder.CreateFCmpULT(L, R, "cmptmp");
1659 // Convert bool 0/1 to double 0.0 or 1.0
1660 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1661 default: break;
1662 }
1663
1664 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1665 // a call to it.
1666 Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1667 assert(F &amp;&amp; "binary operator not found!");
1668
1669 Value *Ops[] = { L, R };
1670 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1671}
1672
1673Value *CallExprAST::Codegen() {
1674 // Look up the name in the global module table.
1675 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1676 if (CalleeF == 0)
1677 return ErrorV("Unknown function referenced");
1678
  // Report an error if the argument count does not match.
1680 if (CalleeF-&gt;arg_size() != Args.size())
1681 return ErrorV("Incorrect # arguments passed");
1682
1683 std::vector&lt;Value*&gt; ArgsV;
1684 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1685 ArgsV.push_back(Args[i]-&gt;Codegen());
1686 if (ArgsV.back() == 0) return 0;
1687 }
1688
1689 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1690}
1691
1692Value *IfExprAST::Codegen() {
1693 Value *CondV = Cond-&gt;Codegen();
1694 if (CondV == 0) return 0;
1695
  // Convert condition to a bool by comparing not-equal to 0.0.
1697 CondV = Builder.CreateFCmpONE(CondV,
1698 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1699 "ifcond");
1700
1701 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1702
1703 // Create blocks for the then and else cases. Insert the 'then' block at the
1704 // end of the function.
1705 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1706 BasicBlock *ElseBB = new BasicBlock("else");
1707 BasicBlock *MergeBB = new BasicBlock("ifcont");
1708
1709 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1710
1711 // Emit then value.
1712 Builder.SetInsertPoint(ThenBB);
1713
1714 Value *ThenV = Then-&gt;Codegen();
1715 if (ThenV == 0) return 0;
1716
1717 Builder.CreateBr(MergeBB);
1718 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1719 ThenBB = Builder.GetInsertBlock();
1720
1721 // Emit else block.
1722 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1723 Builder.SetInsertPoint(ElseBB);
1724
1725 Value *ElseV = Else-&gt;Codegen();
1726 if (ElseV == 0) return 0;
1727
1728 Builder.CreateBr(MergeBB);
1729 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1730 ElseBB = Builder.GetInsertBlock();
1731
1732 // Emit merge block.
1733 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1734 Builder.SetInsertPoint(MergeBB);
1735 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1736
1737 PN-&gt;addIncoming(ThenV, ThenBB);
1738 PN-&gt;addIncoming(ElseV, ElseBB);
1739 return PN;
1740}
1741
1742Value *ForExprAST::Codegen() {
1743 // Output this as:
1744 // var = alloca double
1745 // ...
1746 // start = startexpr
1747 // store start -&gt; var
1748 // goto loop
1749 // loop:
1750 // ...
1751 // bodyexpr
1752 // ...
1753 // loopend:
1754 // step = stepexpr
1755 // endcond = endexpr
1756 //
1757 // curvar = load var
1758 // nextvar = curvar + step
1759 // store nextvar -&gt; var
  //    br endcond, loop, afterloop
  //  afterloop:
1762
1763 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1764
1765 // Create an alloca for the variable in the entry block.
1766 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1767
1768 // Emit the start code first, without 'variable' in scope.
1769 Value *StartVal = Start-&gt;Codegen();
1770 if (StartVal == 0) return 0;
1771
1772 // Store the value into the alloca.
1773 Builder.CreateStore(StartVal, Alloca);
1774
1775 // Make the new basic block for the loop header, inserting after current
1776 // block.
1777 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1778 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1779
1780 // Insert an explicit fall through from the current block to the LoopBB.
1781 Builder.CreateBr(LoopBB);
1782
1783 // Start insertion in LoopBB.
1784 Builder.SetInsertPoint(LoopBB);
1785
  // Within the loop, the variable is bound to its alloca.  If it shadows an
  // existing variable, we have to restore it, so save it now.
1788 AllocaInst *OldVal = NamedValues[VarName];
1789 NamedValues[VarName] = Alloca;
1790
1791 // Emit the body of the loop. This, like any other expr, can change the
1792 // current BB. Note that we ignore the value computed by the body, but don't
1793 // allow an error.
1794 if (Body-&gt;Codegen() == 0)
1795 return 0;
1796
1797 // Emit the step value.
1798 Value *StepVal;
1799 if (Step) {
1800 StepVal = Step-&gt;Codegen();
1801 if (StepVal == 0) return 0;
1802 } else {
1803 // If not specified, use 1.0.
1804 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1805 }
1806
1807 // Compute the end condition.
1808 Value *EndCond = End-&gt;Codegen();
1809 if (EndCond == 0) return EndCond;
1810
  // Reload, increment, and store back to the alloca.  This handles the case
  // where the body of the loop mutates the variable.
1813 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1814 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1815 Builder.CreateStore(NextVar, Alloca);
1816
  // Convert condition to a bool by comparing not-equal to 0.0.
1818 EndCond = Builder.CreateFCmpONE(EndCond,
1819 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1820 "loopcond");
1821
1822 // Create the "after loop" block and insert it.
1823 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1824 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1825
1826 // Insert the conditional branch into the end of LoopEndBB.
1827 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1828
1829 // Any new code will be inserted in AfterBB.
1830 Builder.SetInsertPoint(AfterBB);
1831
1832 // Restore the unshadowed variable.
1833 if (OldVal)
1834 NamedValues[VarName] = OldVal;
1835 else
1836 NamedValues.erase(VarName);
1837
1838
1839 // for expr always returns 0.0.
1840 return Constant::getNullValue(Type::DoubleTy);
1841}
1842
Value *VarExprAST::Codegen() {
  std::vector&lt;AllocaInst *&gt; OldBindings;

  Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();

  // Register all variables and emit their initializer.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    const std::string &amp;VarName = VarNames[i].first;
    ExprAST *Init = VarNames[i].second;

    // Emit the initializer before adding the variable to scope.  This prevents
    // the initializer from referencing the variable itself, and permits stuff
    // like this:
    //  var a = 1 in
    //    var a = a in ...   # refers to outer 'a'.
    Value *InitVal;
    if (Init) {
      InitVal = Init-&gt;Codegen();
      if (InitVal == 0) return 0;
    } else { // If not specified, use 0.0.
      InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
    }

    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    Builder.CreateStore(InitVal, Alloca);

    // Remember the old variable binding so that we can restore the binding when
    // we unrecurse.
    OldBindings.push_back(NamedValues[VarName]);

    // Remember this binding.
    NamedValues[VarName] = Alloca;
  }

  // Codegen the body, now that all vars are in scope.
  Value *BodyVal = Body-&gt;Codegen();
  if (BodyVal == 0) return 0;

  // Pop all our variables from scope.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    NamedValues[VarNames[i].first] = OldBindings[i];

  // Return the body computation.
  return BodyVal;
}


Function *PrototypeAST::Codegen() {
  // Make the function type:  double(double,double) etc.
  std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
  FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);

  Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);

  // If F conflicted, there was already something named 'Name'.  If it has a
  // body, don't allow redefinition or reextern.
  if (F-&gt;getName() != Name) {
    // Delete the one we just made and get the existing one.
    F-&gt;eraseFromParent();
    F = TheModule-&gt;getFunction(Name);

    // If F already has a body, reject this.
    if (!F-&gt;empty()) {
      ErrorF("redefinition of function");
      return 0;
    }

    // If F took a different number of args, reject.
    if (F-&gt;arg_size() != Args.size()) {
      ErrorF("redefinition of function with different # args");
      return 0;
    }
  }

  // Set names for all arguments.
  unsigned Idx = 0;
  for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
       ++AI, ++Idx)
    AI-&gt;setName(Args[Idx]);

  return F;
}

/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
  Function::arg_iterator AI = F-&gt;arg_begin();
  for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

    // Store the initial value into the alloca.
    Builder.CreateStore(AI, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = Alloca;
  }
}


Function *FunctionAST::Codegen() {
  NamedValues.clear();

  Function *TheFunction = Proto-&gt;Codegen();
  if (TheFunction == 0)
    return 0;

  // If this is an operator, install it.
  if (Proto-&gt;isBinaryOp())
    BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();

  // Create a new basic block to start insertion into.
  BasicBlock *BB = new BasicBlock("entry", TheFunction);
  Builder.SetInsertPoint(BB);

  // Add all arguments to the symbol table and create their allocas.
  Proto-&gt;CreateArgumentAllocas(TheFunction);

  if (Value *RetVal = Body-&gt;Codegen()) {
    // Finish off the function.
    Builder.CreateRet(RetVal);

    // Validate the generated code, checking for consistency.
    verifyFunction(*TheFunction);

    // Optimize the function.
    TheFPM-&gt;run(*TheFunction);

    return TheFunction;
  }

  // Error reading body, remove function.
  TheFunction-&gt;eraseFromParent();

  if (Proto-&gt;isBinaryOp())
    BinopPrecedence.erase(Proto-&gt;getOperatorName());
  return 0;
}

//===----------------------------------------------------------------------===//
// Top-Level parsing and JIT Driver
//===----------------------------------------------------------------------===//

static ExecutionEngine *TheExecutionEngine;

static void HandleDefinition() {
  if (FunctionAST *F = ParseDefinition()) {
    if (Function *LF = F-&gt;Codegen()) {
      fprintf(stderr, "Read function definition:");
      LF-&gt;dump();
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

static void HandleExtern() {
  if (PrototypeAST *P = ParseExtern()) {
    if (Function *F = P-&gt;Codegen()) {
      fprintf(stderr, "Read extern: ");
      F-&gt;dump();
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

static void HandleTopLevelExpression() {
  // Evaluate a top level expression into an anonymous function.
  if (FunctionAST *F = ParseTopLevelExpr()) {
    if (Function *LF = F-&gt;Codegen()) {
      // JIT the function, returning a function pointer.
      void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);

      // Cast it to the right type (takes no arguments, returns a double) so we
      // can call it as a native function.
      double (*FP)() = (double (*)())FPtr;
      fprintf(stderr, "Evaluated to %f\n", FP());
    }
  } else {
    // Skip token for error recovery.
    getNextToken();
  }
}

/// top ::= definition | external | expression | ';'
static void MainLoop() {
  while (1) {
    fprintf(stderr, "ready&gt; ");
    switch (CurTok) {
    case tok_eof:    return;
    case ';':        getNextToken(); break;  // ignore top level semicolons.
    case tok_def:    HandleDefinition(); break;
    case tok_extern: HandleExtern(); break;
    default:         HandleTopLevelExpression(); break;
    }
  }
}



//===----------------------------------------------------------------------===//
// "Library" functions that can be "extern'd" from user code.
//===----------------------------------------------------------------------===//

/// putchard - putchar that takes a double and returns 0.
extern "C"
double putchard(double X) {
  putchar((char)X);
  return 0;
}

/// printd - printf that takes a double and prints it as "%f\n", returning 0.
extern "C"
double printd(double X) {
  printf("%f\n", X);
  return 0;
}

//===----------------------------------------------------------------------===//
// Main driver code.
//===----------------------------------------------------------------------===//

int main() {
  // Install standard binary operators.
  // 1 is lowest precedence.
  BinopPrecedence['='] = 2;
  BinopPrecedence['&lt;'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;
  BinopPrecedence['*'] = 40;  // highest.

  // Prime the first token.
  fprintf(stderr, "ready&gt; ");
  getNextToken();

  // Make the module, which holds all the code.
  TheModule = new Module("my cool jit");

  // Create the JIT.
  TheExecutionEngine = ExecutionEngine::create(TheModule);

  {
    ExistingModuleProvider OurModuleProvider(TheModule);
    FunctionPassManager OurFPM(&amp;OurModuleProvider);

    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
    // Promote allocas to registers.
    OurFPM.add(createPromoteMemoryToRegisterPass());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
    // Eliminate Common SubExpressions.
    OurFPM.add(createGVNPass());
    // Simplify the control flow graph (deleting unreachable blocks, etc).
    OurFPM.add(createCFGSimplificationPass());

    // Set the global so the code gen can use this.
    TheFPM = &amp;OurFPM;

    // Run the main "interpreter loop" now.
    MainLoop();

    TheFPM = 0;
  }  // Free module provider and pass manager.


  // Print out all of the generated code.
  TheModule-&gt;dump();
  return 0;
}
</pre>
</div>

</div>

<!-- *********************************************************************** -->
<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>

  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
</address>
</body>
</html>