blob: 9aa65d3b6e126bf74905ad9289961c9801883596 [file] [log] [blame]
Chris Lattner00c992d2007-11-03 08:55:29 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
7 construction</title>
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
11</head>
12
13<body>
14
15<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
16
Chris Lattner128eb862007-11-05 19:06:59 +000017<ul>
Chris Lattner0e555b12007-11-05 20:04:56 +000018<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner128eb862007-11-05 19:06:59 +000019<li>Chapter 7
20 <ol>
21 <li><a href="#intro">Chapter 7 Introduction</a></li>
22 <li><a href="#why">Why is this a hard problem?</a></li>
23 <li><a href="#memory">Memory in LLVM</a></li>
24 <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
25 <li><a href="#adjustments">Adjusting Existing Variables for
26 Mutation</a></li>
27 <li><a href="#assignment">New Assignment Operator</a></li>
28 <li><a href="#localvars">User-defined Local Variables</a></li>
29 <li><a href="#code">Full Code Listing</a></li>
30 </ol>
31</li>
Chris Lattner0e555b12007-11-05 20:04:56 +000032<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
33 tidbits</li>
Chris Lattner128eb862007-11-05 19:06:59 +000034</ul>
35
Chris Lattner00c992d2007-11-03 08:55:29 +000036<div class="doc_author">
37 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
38</div>
39
40<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +000041<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
Chris Lattner00c992d2007-11-03 08:55:29 +000042<!-- *********************************************************************** -->
43
44<div class="doc_text">
45
Chris Lattner128eb862007-11-05 19:06:59 +000046<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
47with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
48respectable, albeit simple, <a
Chris Lattner00c992d2007-11-03 08:55:29 +000049href="http://en.wikipedia.org/wiki/Functional_programming">functional
50programming language</a>. In our journey, we learned some parsing techniques,
51how to build and represent an AST, how to build LLVM IR, and how to optimize
52the resultant code and JIT compile it.</p>
53
54<p>While Kaleidoscope is interesting as a functional language, this makes it
55"too easy" to generate LLVM IR for it. In particular, a functional language
56makes it very easy to build LLVM IR directly in <a
57href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
58Since LLVM requires that the input code be in SSA form, this is a very nice
59property and it is often unclear to newcomers how to generate code for an
60imperative language with mutable variables.</p>
61
62<p>The short (and happy) summary of this chapter is that there is no need for
63your front-end to build SSA form: LLVM provides highly tuned and well tested
64support for this, though the way it works is a bit unexpected for some.</p>
65
66</div>
67
68<!-- *********************************************************************** -->
69<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
70<!-- *********************************************************************** -->
71
72<div class="doc_text">
73
74<p>
75To understand why mutable variables cause complexities in SSA construction,
76consider this extremely simple C example:
77</p>
78
79<div class="doc_code">
80<pre>
81int G, H;
82int test(_Bool Condition) {
83 int X;
84 if (Condition)
85 X = G;
86 else
87 X = H;
88 return X;
89}
90</pre>
91</div>
92
93<p>In this case, we have the variable "X", whose value depends on the path
94executed in the program. Because there are two different possible values for X
95before the return instruction, a PHI node is inserted to merge the two values.
96The LLVM IR that we want for this example looks like this:</p>
97
98<div class="doc_code">
99<pre>
100@G = weak global i32 0 ; type of @G is i32*
101@H = weak global i32 0 ; type of @H is i32*
102
103define i32 @test(i1 %Condition) {
104entry:
105 br i1 %Condition, label %cond_true, label %cond_false
106
107cond_true:
108 %X.0 = load i32* @G
109 br label %cond_next
110
111cond_false:
112 %X.1 = load i32* @H
113 br label %cond_next
114
115cond_next:
116 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
117 ret i32 %X.2
118}
119</pre>
120</div>
121
122<p>In this example, the loads from the G and H global variables are explicit in
123the LLVM IR, and they live in the then/else branches of the if statement
124(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
125in the cond_next block selects the right value to use based on where control
126flow is coming from: if control flow comes from the cond_false block, X.2 gets
127the value of X.1. Alternatively, if control flow comes from cond_tree, it gets
128the value of X.0. The intent of this chapter is not to explain the details of
129SSA form. For more information, see one of the many <a
130href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
131references</a>.</p>
132
133<p>The question for this article is "who places phi nodes when lowering
134assignments to mutable variables?". The issue here is that LLVM
135<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
136However, SSA construction requires non-trivial algorithms and data structures,
137so it is inconvenient and wasteful for every front-end to have to reproduce this
138logic.</p>
139
140</div>
141
142<!-- *********************************************************************** -->
143<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
144<!-- *********************************************************************** -->
145
146<div class="doc_text">
147
148<p>The 'trick' here is that while LLVM does require all register values to be
149in SSA form, it does not require (or permit) memory objects to be in SSA form.
150In the example above, note that the loads from G and H are direct accesses to
151G and H: they are not renamed or versioned. This differs from some other
Chris Lattner2e5d07e2007-11-04 19:42:13 +0000152compiler systems, which do try to version memory objects. In LLVM, instead of
Chris Lattner00c992d2007-11-03 08:55:29 +0000153encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
154href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
155demand.</p>
156
157<p>
158With this in mind, the high-level idea is that we want to make a stack variable
159(which lives in memory, because it is on the stack) for each mutable object in
160a function. To take advantage of this trick, we need to talk about how LLVM
161represents stack variables.
162</p>
163
164<p>In LLVM, all memory accesses are explicit with load/store instructions, and
165it is carefully designed to not have (or need) an "address-of" operator. Notice
166how the type of the @G/@H global variables is actually "i32*" even though the
167variable is defined as "i32". What this means is that @G defines <em>space</em>
168for an i32 in the global data area, but its <em>name</em> actually refers to the
169address for that space. Stack variables work the same way, but instead of being
170declared with global variable definitions, they are declared with the
171<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
172
173<div class="doc_code">
174<pre>
175define i32 @test(i1 %Condition) {
176entry:
177 %X = alloca i32 ; type of %X is i32*.
178 ...
179 %tmp = load i32* %X ; load the stack value %X from the stack.
180 %tmp2 = add i32 %tmp, 1 ; increment it
181 store i32 %tmp2, i32* %X ; store it back
182 ...
183</pre>
184</div>
185
186<p>This code shows an example of how you can declare and manipulate a stack
187variable in the LLVM IR. Stack memory allocated with the alloca instruction is
188fully general: you can pass the address of the stack slot to functions, you can
189store it in other variables, etc. In our example above, we could rewrite the
190example to use the alloca technique to avoid using a PHI node:</p>
191
192<div class="doc_code">
193<pre>
194@G = weak global i32 0 ; type of @G is i32*
195@H = weak global i32 0 ; type of @H is i32*
196
197define i32 @test(i1 %Condition) {
198entry:
199 %X = alloca i32 ; type of %X is i32*.
200 br i1 %Condition, label %cond_true, label %cond_false
201
202cond_true:
203 %X.0 = load i32* @G
204 store i32 %X.0, i32* %X ; Update X
205 br label %cond_next
206
207cond_false:
208 %X.1 = load i32* @H
209 store i32 %X.1, i32* %X ; Update X
210 br label %cond_next
211
212cond_next:
213 %X.2 = load i32* %X ; Read X
214 ret i32 %X.2
215}
216</pre>
217</div>
218
219<p>With this, we have discovered a way to handle arbitrary mutable variables
220without the need to create Phi nodes at all:</p>
221
222<ol>
223<li>Each mutable variable becomes a stack allocation.</li>
224<li>Each read of the variable becomes a load from the stack.</li>
225<li>Each update of the variable becomes a store to the stack.</li>
226<li>Taking the address of a variable just uses the stack address directly.</li>
227</ol>
228
229<p>While this solution has solved our immediate problem, it introduced another
230one: we have now apparently introduced a lot of stack traffic for very simple
231and common operations, a major performance problem. Fortunately for us, the
232LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
233this case, promoting allocas like this into SSA registers, inserting Phi nodes
234as appropriate. If you run this example through the pass, for example, you'll
235get:</p>
236
237<div class="doc_code">
238<pre>
239$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
240@G = weak global i32 0
241@H = weak global i32 0
242
243define i32 @test(i1 %Condition) {
244entry:
245 br i1 %Condition, label %cond_true, label %cond_false
246
247cond_true:
248 %X.0 = load i32* @G
249 br label %cond_next
250
251cond_false:
252 %X.1 = load i32* @H
253 br label %cond_next
254
255cond_next:
256 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
257 ret i32 %X.01
258}
259</pre>
Chris Lattnere7198312007-11-03 22:22:30 +0000260</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000261
Chris Lattnere7198312007-11-03 22:22:30 +0000262<p>The mem2reg pass implements the standard "iterated dominator frontier"
263algorithm for constructing SSA form and has a number of optimizations that speed
264up very common degenerate cases. mem2reg really is the answer for dealing with
265mutable variables, and we highly recommend that you depend on it. Note that
266mem2reg only works on variables in certain circumstances:</p>
Chris Lattner00c992d2007-11-03 08:55:29 +0000267
Chris Lattnere7198312007-11-03 22:22:30 +0000268<ol>
269<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
270promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner00c992d2007-11-03 08:55:29 +0000271
Chris Lattnere7198312007-11-03 22:22:30 +0000272<li>mem2reg only looks for alloca instructions in the entry block of the
273function. Being in the entry block guarantees that the alloca is only executed
274once, which makes analysis simpler.</li>
Chris Lattner00c992d2007-11-03 08:55:29 +0000275
Chris Lattnere7198312007-11-03 22:22:30 +0000276<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
277the address of the stack object is passed to a function, or if any funny pointer
278arithmetic is involved, the alloca will not be promoted.</li>
279
Chris Lattnera56b22d2007-11-05 17:45:54 +0000280<li>mem2reg only works on allocas of <a
281href="../LangRef.html#t_classifications">first class</a>
282values (such as pointers, scalars and vectors), and only if the array size
Chris Lattnere7198312007-11-03 22:22:30 +0000283of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
284promoting structs or arrays to registers. Note that the "scalarrepl" pass is
285more powerful and can promote structs, "unions", and arrays in many cases.</li>
286
287</ol>
288
289<p>
290All of these properties are easy to satisfy for most imperative languages, and
Chris Lattner2e5d07e2007-11-04 19:42:13 +0000291we'll illustrate this below with Kaleidoscope. The final question you may be
Chris Lattnere7198312007-11-03 22:22:30 +0000292asking is: should I bother with this nonsense for my front-end? Wouldn't it be
293better if I just did SSA construction directly, avoiding use of the mem2reg
294optimization pass? In short, we strongly recommend that use you this technique
295for building SSA form, unless there is an extremely good reason not to. Using
296this technique is:</p>
297
298<ul>
299<li>Proven and well tested: llvm-gcc and clang both use this technique for local
300mutable variables. As such, the most common clients of LLVM are using this to
301handle a bulk of their variables. You can be sure that bugs are found fast and
302fixed early.</li>
303
304<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
305common cases as well as fully general. For example, it has fast-paths for
306variables that are only used in a single block, variables that only have one
307assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
308</li>
309
310<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
311Debug information in LLVM</a> relies on having the address of the variable
312exposed to attach debug info to it. This technique dovetails very naturally
313with this style of debug info.</li>
314</ul>
315
316<p>If nothing else, this makes it much easier to get your front-end up and
317running, and is very simple to implement. Lets extend Kaleidoscope with mutable
318variables now!
Chris Lattner00c992d2007-11-03 08:55:29 +0000319</p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000320
Chris Lattner00c992d2007-11-03 08:55:29 +0000321</div>
322
Chris Lattner62a709d2007-11-05 00:23:57 +0000323<!-- *********************************************************************** -->
324<div class="doc_section"><a name="kalvars">Mutable Variables in
325Kaleidoscope</a></div>
326<!-- *********************************************************************** -->
327
328<div class="doc_text">
329
330<p>Now that we know the sort of problem we want to tackle, lets see what this
331looks like in the context of our little Kaleidoscope language. We're going to
332add two features:</p>
333
334<ol>
335<li>The ability to mutate variables with the '=' operator.</li>
336<li>The ability to define new variables.</li>
337</ol>
338
339<p>While the first item is really what this is about, we only have variables
340for incoming arguments and for induction variables, and redefining them only
341goes so far :). Also, the ability to define new variables is a
342useful thing regardless of whether you will be mutating them. Here's a
343motivating example that shows how we could use these:</p>
344
345<div class="doc_code">
346<pre>
347# Define ':' for sequencing: as a low-precedence operator that ignores operands
348# and just returns the RHS.
349def binary : 1 (x y) y;
350
351# Recursive fib, we could do this before.
352def fib(x)
353 if (x &lt; 3) then
354 1
355 else
356 fib(x-1)+fib(x-2);
357
358# Iterative fib.
359def fibi(x)
360 <b>var a = 1, b = 1, c in</b>
361 (for i = 3, i &;t; x in
362 <b>c = a + b</b> :
363 <b>a = b</b> :
364 <b>b = c</b>) :
365 b;
366
367# Call it.
368fibi(10);
369</pre>
370</div>
371
372<p>
373In order to mutate variables, we have to change our existing variables to use
374the "alloca trick". Once we have that, we'll add our new operator, then extend
375Kaleidoscope to support new variable definitions.
376</p>
377
378</div>
379
380<!-- *********************************************************************** -->
381<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
382Mutation</a></div>
383<!-- *********************************************************************** -->
384
385<div class="doc_text">
386
387<p>
388The symbol table in Kaleidoscope is managed at code generation time by the
389'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
390that holds the double value for the named variable. In order to support
391mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
392the <em>memory location</em> of the variable in question. Note that this
393change is a refactoring: it changes the structure of the code, but does not
394(by itself) change the behavior of the compiler. All of these changes are
395isolated in the Kaleidoscope code generator.</p>
396
397<p>
398At this point in Kaleidoscope's development, it only supports variables for two
399things: incoming arguments to functions and the induction variable of 'for'
400loops. For consistency, we'll allow mutation of these variables in addition to
401other user-defined variables. This means that these will both need memory
402locations.
403</p>
404
405<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
406map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
407will tell use what parts of the code we need to update:</p>
408
409<div class="doc_code">
410<pre>
411static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
412</pre>
413</div>
414
415<p>Also, since we will need to create these alloca's, we'll use a helper
416function that ensures that the allocas are created in the entry block of the
417function:</p>
418
419<div class="doc_code">
420<pre>
421/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
422/// the function. This is used for mutable variables etc.
423static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
424 const std::string &amp;VarName) {
425 LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
426 TheFunction-&gt;getEntryBlock().begin());
427 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
428}
429</pre>
430</div>
431
432<p>This funny looking code creates an LLVMBuilder object that is pointing at
433the first instruction (.begin()) of the entry block. It then creates an alloca
434with the expected name and returns it. Because all values in Kaleidoscope are
435doubles, there is no need to pass in a type to use.</p>
436
437<p>With this in place, the first functionality change we want to make is to
438variable references. In our new scheme, variables live on the stack, so code
439generating a reference to them actually needs to produce a load from the stack
440slot:</p>
441
442<div class="doc_code">
443<pre>
444Value *VariableExprAST::Codegen() {
445 // Look this variable up in the function.
446 Value *V = NamedValues[Name];
447 if (V == 0) return ErrorV("Unknown variable name");
448
449 // Load the value.
450 return Builder.CreateLoad(V, Name.c_str());
451}
452</pre>
453</div>
454
455<p>As you can see, this is pretty straight-forward. Next we need to update the
456things that define the variables to set up the alloca. We'll start with
457<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
458the unabridged code):</p>
459
460<div class="doc_code">
461<pre>
462 Function *TheFunction = Builder.GetInsertBlock()->getParent();
463
464 <b>// Create an alloca for the variable in the entry block.
465 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
466
467 // Emit the start code first, without 'variable' in scope.
468 Value *StartVal = Start-&gt;Codegen();
469 if (StartVal == 0) return 0;
470
471 <b>// Store the value into the alloca.
472 Builder.CreateStore(StartVal, Alloca);</b>
473 ...
474
475 // Compute the end condition.
476 Value *EndCond = End-&gt;Codegen();
477 if (EndCond == 0) return EndCond;
478
479 <b>// Reload, increment, and restore the alloca. This handles the case where
480 // the body of the loop mutates the variable.
481 Value *CurVar = Builder.CreateLoad(Alloca);
482 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
483 Builder.CreateStore(NextVar, Alloca);</b>
484 ...
485</pre>
486</div>
487
488<p>This code is virtually identical to the code <a
489href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
490big difference is that we no longer have to construct a PHI node, and we use
491load/store to access the variable as needed.</p>
492
493<p>To support mutable argument variables, we need to also make allocas for them.
494The code for this is also pretty simple:</p>
495
496<div class="doc_code">
497<pre>
498/// CreateArgumentAllocas - Create an alloca for each argument and register the
499/// argument in the symbol table so that references to it will succeed.
500void PrototypeAST::CreateArgumentAllocas(Function *F) {
501 Function::arg_iterator AI = F-&gt;arg_begin();
502 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
503 // Create an alloca for this variable.
504 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
505
506 // Store the initial value into the alloca.
507 Builder.CreateStore(AI, Alloca);
508
509 // Add arguments to variable symbol table.
510 NamedValues[Args[Idx]] = Alloca;
511 }
512}
513</pre>
514</div>
515
516<p>For each argument, we make an alloca, store the input value to the function
517into the alloca, and register the alloca as the memory location for the
518argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
519it sets up the entry block for the function.</p>
520
521<p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
522good codegen once again:</p>
523
524<div class="doc_code">
525<pre>
526 // Set up the optimizer pipeline. Start with registering info about how the
527 // target lays out data structures.
528 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
529 <b>// Promote allocas to registers.
530 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
531 // Do simple "peephole" optimizations and bit-twiddling optzns.
532 OurFPM.add(createInstructionCombiningPass());
533 // Reassociate expressions.
534 OurFPM.add(createReassociatePass());
535</pre>
536</div>
537
538<p>It is interesting to see what the code looks like before and after the
539mem2reg optimization runs. For example, this is the before/after code for our
540recursive fib. Before the optimization:</p>
541
542<div class="doc_code">
543<pre>
544define double @fib(double %x) {
545entry:
546 <b>%x1 = alloca double
547 store double %x, double* %x1
548 %x2 = load double* %x1</b>
Chris Lattner71155212007-11-06 01:39:12 +0000549 %cmptmp = fcmp ult double %x2, 3.000000e+00
550 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000551 %ifcond = fcmp one double %booltmp, 0.000000e+00
552 br i1 %ifcond, label %then, label %else
553
554then: ; preds = %entry
555 br label %ifcont
556
557else: ; preds = %entry
558 <b>%x3 = load double* %x1</b>
559 %subtmp = sub double %x3, 1.000000e+00
560 %calltmp = call double @fib( double %subtmp )
561 <b>%x4 = load double* %x1</b>
562 %subtmp5 = sub double %x4, 2.000000e+00
563 %calltmp6 = call double @fib( double %subtmp5 )
564 %addtmp = add double %calltmp, %calltmp6
565 br label %ifcont
566
567ifcont: ; preds = %else, %then
568 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
569 ret double %iftmp
570}
571</pre>
572</div>
573
574<p>Here there is only one variable (x, the input argument) but you can still
575see the extremely simple-minded code generation strategy we are using. In the
576entry block, an alloca is created, and the initial input value is stored into
577it. Each reference to the variable does a reload from the stack. Also, note
578that we didn't modify the if/then/else expression, so it still inserts a PHI
579node. While we could make an alloca for it, it is actually easier to create a
580PHI node for it, so we still just make the PHI.</p>
581
582<p>Here is the code after the mem2reg pass runs:</p>
583
584<div class="doc_code">
585<pre>
586define double @fib(double %x) {
587entry:
Chris Lattner71155212007-11-06 01:39:12 +0000588 %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
589 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000590 %ifcond = fcmp one double %booltmp, 0.000000e+00
591 br i1 %ifcond, label %then, label %else
592
593then:
594 br label %ifcont
595
596else:
597 %subtmp = sub double <b>%x</b>, 1.000000e+00
598 %calltmp = call double @fib( double %subtmp )
599 %subtmp5 = sub double <b>%x</b>, 2.000000e+00
600 %calltmp6 = call double @fib( double %subtmp5 )
601 %addtmp = add double %calltmp, %calltmp6
602 br label %ifcont
603
604ifcont: ; preds = %else, %then
605 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
606 ret double %iftmp
607}
608</pre>
609</div>
610
611<p>This is a trivial case for mem2reg, since there are no redefinitions of the
612variable. The point of showing this is to calm your tension about inserting
613such blatent inefficiencies :).</p>
614
615<p>After the rest of the optimizers run, we get:</p>
616
617<div class="doc_code">
618<pre>
619define double @fib(double %x) {
620entry:
Chris Lattner71155212007-11-06 01:39:12 +0000621 %cmptmp = fcmp ult double %x, 3.000000e+00
622 %booltmp = uitofp i1 %cmptmp to double
Chris Lattner62a709d2007-11-05 00:23:57 +0000623 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
624 br i1 %ifcond, label %else, label %ifcont
625
626else:
627 %subtmp = sub double %x, 1.000000e+00
628 %calltmp = call double @fib( double %subtmp )
629 %subtmp5 = sub double %x, 2.000000e+00
630 %calltmp6 = call double @fib( double %subtmp5 )
631 %addtmp = add double %calltmp, %calltmp6
632 ret double %addtmp
633
634ifcont:
635 ret double 1.000000e+00
636}
637</pre>
638</div>
639
640<p>Here we see that the simplifycfg pass decided to clone the return instruction
641into the end of the 'else' block. This allowed it to eliminate some branches
642and the PHI node.</p>
643
644<p>Now that all symbol table references are updated to use stack variables,
645we'll add the assignment operator.</p>
646
647</div>
648
649<!-- *********************************************************************** -->
650<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
651<!-- *********************************************************************** -->
652
653<div class="doc_text">
654
655<p>With our current framework, adding a new assignment operator is really
656simple. We will parse it just like any other binary operator, but handle it
657internally (instead of allowing the user to define it). The first step is to
658set a precedence:</p>
659
660<div class="doc_code">
661<pre>
662 int main() {
663 // Install standard binary operators.
664 // 1 is lowest precedence.
665 <b>BinopPrecedence['='] = 2;</b>
666 BinopPrecedence['&lt;'] = 10;
667 BinopPrecedence['+'] = 20;
668 BinopPrecedence['-'] = 20;
669</pre>
670</div>
671
672<p>Now that the parser knows the precedence of the binary operator, it takes
673care of all the parsing and AST generation. We just need to implement codegen
674for the assignment operator. This looks like:</p>
675
676<div class="doc_code">
677<pre>
678Value *BinaryExprAST::Codegen() {
679 // Special case '=' because we don't want to emit the LHS as an expression.
680 if (Op == '=') {
681 // Assignment requires the LHS to be an identifier.
682 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
683 if (!LHSE)
684 return ErrorV("destination of '=' must be a variable");
685</pre>
686</div>
687
688<p>Unlike the rest of the binary operators, our assignment operator doesn't
689follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
690as a special case before the other binary operators are handled. The other
691strange thing about it is that it requires the LHS to be a variable directly.
692</p>
693
694<div class="doc_code">
695<pre>
696 // Codegen the RHS.
697 Value *Val = RHS-&gt;Codegen();
698 if (Val == 0) return 0;
699
700 // Look up the name.
701 Value *Variable = NamedValues[LHSE-&gt;getName()];
702 if (Variable == 0) return ErrorV("Unknown variable name");
703
704 Builder.CreateStore(Val, Variable);
705 return Val;
706 }
707 ...
708</pre>
709</div>
710
711<p>Once it has the variable, codegen'ing the assignment is straight-forward:
712we emit the RHS of the assignment, create a store, and return the computed
713value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
714
715<p>Now that we have an assignment operator, we can mutate loop variables and
716arguments. For example, we can now run code like this:</p>
717
718<div class="doc_code">
719<pre>
720# Function to print a double.
721extern printd(x);
722
723# Define ':' for sequencing: as a low-precedence operator that ignores operands
724# and just returns the RHS.
725def binary : 1 (x y) y;
726
727def test(x)
728 printd(x) :
729 x = 4 :
730 printd(x);
731
732test(123);
733</pre>
734</div>
735
736<p>When run, this example prints "123" and then "4", showing that we did
737actually mutate the value! Okay, we have now officially implemented our goal:
738getting this to work requires SSA construction in the general case. However,
739to be really useful, we want the ability to define our own local variables, lets
740add this next!
741</p>
742
743</div>
744
745<!-- *********************************************************************** -->
746<div class="doc_section"><a name="localvars">User-defined Local
747Variables</a></div>
748<!-- *********************************************************************** -->
749
750<div class="doc_text">
751
752<p>Adding var/in is just like any other other extensions we made to
753Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
754The first step for adding our new 'var/in' construct is to extend the lexer.
755As before, this is pretty trivial, the code looks like this:</p>
756
757<div class="doc_code">
758<pre>
759enum Token {
760 ...
761 <b>// var definition
762 tok_var = -13</b>
763...
764}
765...
766static int gettok() {
767...
768 if (IdentifierStr == "in") return tok_in;
769 if (IdentifierStr == "binary") return tok_binary;
770 if (IdentifierStr == "unary") return tok_unary;
771 <b>if (IdentifierStr == "var") return tok_var;</b>
772 return tok_identifier;
773...
774</pre>
775</div>
776
777<p>The next step is to define the AST node that we will construct. For var/in,
778it will look like this:</p>
779
780<div class="doc_code">
781<pre>
782/// VarExprAST - Expression class for var/in
783class VarExprAST : public ExprAST {
784 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
785 ExprAST *Body;
786public:
787 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
788 ExprAST *body)
789 : VarNames(varnames), Body(body) {}
790
791 virtual Value *Codegen();
792};
793</pre>
794</div>
795
796<p>var/in allows a list of names to be defined all at once, and each name can
797optionally have an initializer value. As such, we capture this information in
798the VarNames vector. Also, var/in has a body, this body is allowed to access
799the variables defined by the let/in.</p>
800
801<p>With this ready, we can define the parser pieces. First thing we do is add
802it as a primary expression:</p>
803
804<div class="doc_code">
805<pre>
806/// primary
807/// ::= identifierexpr
808/// ::= numberexpr
809/// ::= parenexpr
810/// ::= ifexpr
811/// ::= forexpr
812<b>/// ::= varexpr</b>
813static ExprAST *ParsePrimary() {
814 switch (CurTok) {
815 default: return Error("unknown token when expecting an expression");
816 case tok_identifier: return ParseIdentifierExpr();
817 case tok_number: return ParseNumberExpr();
818 case '(': return ParseParenExpr();
819 case tok_if: return ParseIfExpr();
820 case tok_for: return ParseForExpr();
821 <b>case tok_var: return ParseVarExpr();</b>
822 }
823}
824</pre>
825</div>
826
827<p>Next we define ParseVarExpr:</p>
828
829<div class="doc_code">
830<pre>
Chris Lattner20a0c802007-11-05 17:54:34 +0000831/// varexpr ::= 'var' identifier ('=' expression)?
832// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +0000833static ExprAST *ParseVarExpr() {
834 getNextToken(); // eat the var.
835
836 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
837
838 // At least one variable name is required.
839 if (CurTok != tok_identifier)
840 return Error("expected identifier after var");
841</pre>
842</div>
843
844<p>The first part of this code parses the list of identifier/expr pairs into the
845local <tt>VarNames</tt> vector.
846
847<div class="doc_code">
848<pre>
849 while (1) {
850 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +0000851 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +0000852
853 // Read the optional initializer.
854 ExprAST *Init = 0;
855 if (CurTok == '=') {
856 getNextToken(); // eat the '='.
857
858 Init = ParseExpression();
859 if (Init == 0) return 0;
860 }
861
862 VarNames.push_back(std::make_pair(Name, Init));
863
864 // End of var list, exit loop.
865 if (CurTok != ',') break;
866 getNextToken(); // eat the ','.
867
868 if (CurTok != tok_identifier)
869 return Error("expected identifier list after var");
870 }
871</pre>
872</div>
873
874<p>Once all the variables are parsed, we then parse the body and create the
875AST node:</p>
876
877<div class="doc_code">
878<pre>
879 // At this point, we have to have 'in'.
880 if (CurTok != tok_in)
881 return Error("expected 'in' keyword after 'var'");
882 getNextToken(); // eat 'in'.
883
884 ExprAST *Body = ParseExpression();
885 if (Body == 0) return 0;
886
887 return new VarExprAST(VarNames, Body);
888}
889</pre>
890</div>
891
892<p>Now that we can parse and represent the code, we need to support emission of
893LLVM IR for it. This code starts out with:</p>
894
895<div class="doc_code">
896<pre>
897Value *VarExprAST::Codegen() {
898 std::vector&lt;AllocaInst *&gt; OldBindings;
899
900 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
901
902 // Register all variables and emit their initializer.
903 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
904 const std::string &amp;VarName = VarNames[i].first;
905 ExprAST *Init = VarNames[i].second;
906</pre>
907</div>
908
909<p>Basically it loops over all the variables, installing them one at a time.
910For each variable we put into the symbol table, we remember the previous value
911that we replace in OldBindings.</p>
912
913<div class="doc_code">
914<pre>
915 // Emit the initializer before adding the variable to scope, this prevents
916 // the initializer from referencing the variable itself, and permits stuff
917 // like this:
918 // var a = 1 in
919 // var a = a in ... # refers to outer 'a'.
920 Value *InitVal;
921 if (Init) {
922 InitVal = Init-&gt;Codegen();
923 if (InitVal == 0) return 0;
924 } else { // If not specified, use 0.0.
925 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
926 }
927
928 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
929 Builder.CreateStore(InitVal, Alloca);
930
931 // Remember the old variable binding so that we can restore the binding when
932 // we unrecurse.
933 OldBindings.push_back(NamedValues[VarName]);
934
935 // Remember this binding.
936 NamedValues[VarName] = Alloca;
937 }
938</pre>
939</div>
940
941<p>There are more comments here than code. The basic idea is that we emit the
942initializer, create the alloca, then update the symbol table to point to it.
943Once all the variables are installed in the symbol table, we evaluate the body
944of the var/in expression:</p>
945
946<div class="doc_code">
947<pre>
948 // Codegen the body, now that all vars are in scope.
949 Value *BodyVal = Body-&gt;Codegen();
950 if (BodyVal == 0) return 0;
951</pre>
952</div>
953
954<p>Finally, before returning, we restore the previous variable bindings:</p>
955
956<div class="doc_code">
957<pre>
958 // Pop all our variables from scope.
959 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
960 NamedValues[VarNames[i].first] = OldBindings[i];
961
962 // Return the body computation.
963 return BodyVal;
964}
965</pre>
966</div>
967
968<p>The end result of all of this is that we get properly scoped variable
969definitions, and we even (trivially) allow mutation of them :).</p>
970
971<p>With this, we completed what we set out to do. Our nice iterative fib
972example from the intro compiles and runs just fine. The mem2reg pass optimizes
973all of our stack variables into SSA registers, inserting PHI nodes where needed,
974and our front-end remains simple: no iterated dominator frontier computation
975anywhere in sight.</p>
976
977</div>
Chris Lattner00c992d2007-11-03 08:55:29 +0000978
979<!-- *********************************************************************** -->
980<div class="doc_section"><a name="code">Full Code Listing</a></div>
981<!-- *********************************************************************** -->
982
983<div class="doc_text">
984
985<p>
Chris Lattner62a709d2007-11-05 00:23:57 +0000986Here is the complete code listing for our running example, enhanced with mutable
987variables and var/in support. To build this example, use:
Chris Lattner00c992d2007-11-03 08:55:29 +0000988</p>
989
990<div class="doc_code">
991<pre>
992 # Compile
993 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
994 # Run
995 ./toy
996</pre>
997</div>
998
999<p>Here is the code:</p>
1000
1001<div class="doc_code">
1002<pre>
Chris Lattner62a709d2007-11-05 00:23:57 +00001003#include "llvm/DerivedTypes.h"
1004#include "llvm/ExecutionEngine/ExecutionEngine.h"
1005#include "llvm/Module.h"
1006#include "llvm/ModuleProvider.h"
1007#include "llvm/PassManager.h"
1008#include "llvm/Analysis/Verifier.h"
1009#include "llvm/Target/TargetData.h"
1010#include "llvm/Transforms/Scalar.h"
1011#include "llvm/Support/LLVMBuilder.h"
1012#include &lt;cstdio&gt;
1013#include &lt;string&gt;
1014#include &lt;map&gt;
1015#include &lt;vector&gt;
1016using namespace llvm;
1017
1018//===----------------------------------------------------------------------===//
1019// Lexer
1020//===----------------------------------------------------------------------===//
1021
1022// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1023// of these for known things.
1024enum Token {
1025 tok_eof = -1,
1026
1027 // commands
1028 tok_def = -2, tok_extern = -3,
1029
1030 // primary
1031 tok_identifier = -4, tok_number = -5,
1032
1033 // control
1034 tok_if = -6, tok_then = -7, tok_else = -8,
1035 tok_for = -9, tok_in = -10,
1036
1037 // operators
1038 tok_binary = -11, tok_unary = -12,
1039
1040 // var definition
1041 tok_var = -13
1042};
1043
1044static std::string IdentifierStr; // Filled in if tok_identifier
1045static double NumVal; // Filled in if tok_number
1046
1047/// gettok - Return the next token from standard input.
1048static int gettok() {
1049 static int LastChar = ' ';
1050
1051 // Skip any whitespace.
1052 while (isspace(LastChar))
1053 LastChar = getchar();
1054
1055 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1056 IdentifierStr = LastChar;
1057 while (isalnum((LastChar = getchar())))
1058 IdentifierStr += LastChar;
1059
1060 if (IdentifierStr == "def") return tok_def;
1061 if (IdentifierStr == "extern") return tok_extern;
1062 if (IdentifierStr == "if") return tok_if;
1063 if (IdentifierStr == "then") return tok_then;
1064 if (IdentifierStr == "else") return tok_else;
1065 if (IdentifierStr == "for") return tok_for;
1066 if (IdentifierStr == "in") return tok_in;
1067 if (IdentifierStr == "binary") return tok_binary;
1068 if (IdentifierStr == "unary") return tok_unary;
1069 if (IdentifierStr == "var") return tok_var;
1070 return tok_identifier;
1071 }
1072
1073 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1074 std::string NumStr;
1075 do {
1076 NumStr += LastChar;
1077 LastChar = getchar();
1078 } while (isdigit(LastChar) || LastChar == '.');
1079
1080 NumVal = strtod(NumStr.c_str(), 0);
1081 return tok_number;
1082 }
1083
1084 if (LastChar == '#') {
1085 // Comment until end of line.
1086 do LastChar = getchar();
1087 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
1088
1089 if (LastChar != EOF)
1090 return gettok();
1091 }
1092
1093 // Check for end of file. Don't eat the EOF.
1094 if (LastChar == EOF)
1095 return tok_eof;
1096
1097 // Otherwise, just return the character as its ascii value.
1098 int ThisChar = LastChar;
1099 LastChar = getchar();
1100 return ThisChar;
1101}
1102
1103//===----------------------------------------------------------------------===//
1104// Abstract Syntax Tree (aka Parse Tree)
1105//===----------------------------------------------------------------------===//
1106
1107/// ExprAST - Base class for all expression nodes.
1108class ExprAST {
1109public:
1110 virtual ~ExprAST() {}
1111 virtual Value *Codegen() = 0;
1112};
1113
1114/// NumberExprAST - Expression class for numeric literals like "1.0".
1115class NumberExprAST : public ExprAST {
1116 double Val;
1117public:
1118 NumberExprAST(double val) : Val(val) {}
1119 virtual Value *Codegen();
1120};
1121
1122/// VariableExprAST - Expression class for referencing a variable, like "a".
1123class VariableExprAST : public ExprAST {
1124 std::string Name;
1125public:
1126 VariableExprAST(const std::string &amp;name) : Name(name) {}
1127 const std::string &amp;getName() const { return Name; }
1128 virtual Value *Codegen();
1129};
1130
1131/// UnaryExprAST - Expression class for a unary operator.
1132class UnaryExprAST : public ExprAST {
1133 char Opcode;
1134 ExprAST *Operand;
1135public:
1136 UnaryExprAST(char opcode, ExprAST *operand)
1137 : Opcode(opcode), Operand(operand) {}
1138 virtual Value *Codegen();
1139};
1140
1141/// BinaryExprAST - Expression class for a binary operator.
1142class BinaryExprAST : public ExprAST {
1143 char Op;
1144 ExprAST *LHS, *RHS;
1145public:
1146 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1147 : Op(op), LHS(lhs), RHS(rhs) {}
1148 virtual Value *Codegen();
1149};
1150
1151/// CallExprAST - Expression class for function calls.
1152class CallExprAST : public ExprAST {
1153 std::string Callee;
1154 std::vector&lt;ExprAST*&gt; Args;
1155public:
1156 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1157 : Callee(callee), Args(args) {}
1158 virtual Value *Codegen();
1159};
1160
1161/// IfExprAST - Expression class for if/then/else.
1162class IfExprAST : public ExprAST {
1163 ExprAST *Cond, *Then, *Else;
1164public:
1165 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1166 : Cond(cond), Then(then), Else(_else) {}
1167 virtual Value *Codegen();
1168};
1169
1170/// ForExprAST - Expression class for for/in.
1171class ForExprAST : public ExprAST {
1172 std::string VarName;
1173 ExprAST *Start, *End, *Step, *Body;
1174public:
1175 ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1176 ExprAST *step, ExprAST *body)
1177 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1178 virtual Value *Codegen();
1179};
1180
1181/// VarExprAST - Expression class for var/in
1182class VarExprAST : public ExprAST {
1183 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1184 ExprAST *Body;
1185public:
1186 VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1187 ExprAST *body)
1188 : VarNames(varnames), Body(body) {}
1189
1190 virtual Value *Codegen();
1191};
1192
1193/// PrototypeAST - This class represents the "prototype" for a function,
1194/// which captures its argument names as well as if it is an operator.
1195class PrototypeAST {
1196 std::string Name;
1197 std::vector&lt;std::string&gt; Args;
1198 bool isOperator;
1199 unsigned Precedence; // Precedence if a binary op.
1200public:
1201 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1202 bool isoperator = false, unsigned prec = 0)
1203 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1204
1205 bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1206 bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1207
1208 char getOperatorName() const {
1209 assert(isUnaryOp() || isBinaryOp());
1210 return Name[Name.size()-1];
1211 }
1212
1213 unsigned getBinaryPrecedence() const { return Precedence; }
1214
1215 Function *Codegen();
1216
1217 void CreateArgumentAllocas(Function *F);
1218};
1219
1220/// FunctionAST - This class represents a function definition itself.
1221class FunctionAST {
1222 PrototypeAST *Proto;
1223 ExprAST *Body;
1224public:
1225 FunctionAST(PrototypeAST *proto, ExprAST *body)
1226 : Proto(proto), Body(body) {}
1227
1228 Function *Codegen();
1229};
1230
1231//===----------------------------------------------------------------------===//
1232// Parser
1233//===----------------------------------------------------------------------===//
1234
1235/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1236/// token the parser it looking at. getNextToken reads another token from the
1237/// lexer and updates CurTok with its results.
1238static int CurTok;
1239static int getNextToken() {
1240 return CurTok = gettok();
1241}
1242
1243/// BinopPrecedence - This holds the precedence for each binary operator that is
1244/// defined.
1245static std::map&lt;char, int&gt; BinopPrecedence;
1246
1247/// GetTokPrecedence - Get the precedence of the pending binary operator token.
1248static int GetTokPrecedence() {
1249 if (!isascii(CurTok))
1250 return -1;
1251
1252 // Make sure it's a declared binop.
1253 int TokPrec = BinopPrecedence[CurTok];
1254 if (TokPrec &lt;= 0) return -1;
1255 return TokPrec;
1256}
1257
1258/// Error* - These are little helper functions for error handling.
1259ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1260PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1261FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1262
1263static ExprAST *ParseExpression();
1264
1265/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +00001266/// ::= identifier
1267/// ::= identifier '(' expression* ')'
Chris Lattner62a709d2007-11-05 00:23:57 +00001268static ExprAST *ParseIdentifierExpr() {
1269 std::string IdName = IdentifierStr;
1270
Chris Lattner20a0c802007-11-05 17:54:34 +00001271 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001272
1273 if (CurTok != '(') // Simple variable ref.
1274 return new VariableExprAST(IdName);
1275
1276 // Call.
1277 getNextToken(); // eat (
1278 std::vector&lt;ExprAST*&gt; Args;
1279 if (CurTok != ')') {
1280 while (1) {
1281 ExprAST *Arg = ParseExpression();
1282 if (!Arg) return 0;
1283 Args.push_back(Arg);
1284
1285 if (CurTok == ')') break;
1286
1287 if (CurTok != ',')
1288 return Error("Expected ')'");
1289 getNextToken();
1290 }
1291 }
1292
1293 // Eat the ')'.
1294 getNextToken();
1295
1296 return new CallExprAST(IdName, Args);
1297}
1298
1299/// numberexpr ::= number
1300static ExprAST *ParseNumberExpr() {
1301 ExprAST *Result = new NumberExprAST(NumVal);
1302 getNextToken(); // consume the number
1303 return Result;
1304}
1305
1306/// parenexpr ::= '(' expression ')'
1307static ExprAST *ParseParenExpr() {
1308 getNextToken(); // eat (.
1309 ExprAST *V = ParseExpression();
1310 if (!V) return 0;
1311
1312 if (CurTok != ')')
1313 return Error("expected ')'");
1314 getNextToken(); // eat ).
1315 return V;
1316}
1317
1318/// ifexpr ::= 'if' expression 'then' expression 'else' expression
1319static ExprAST *ParseIfExpr() {
1320 getNextToken(); // eat the if.
1321
1322 // condition.
1323 ExprAST *Cond = ParseExpression();
1324 if (!Cond) return 0;
1325
1326 if (CurTok != tok_then)
1327 return Error("expected then");
1328 getNextToken(); // eat the then
1329
1330 ExprAST *Then = ParseExpression();
1331 if (Then == 0) return 0;
1332
1333 if (CurTok != tok_else)
1334 return Error("expected else");
1335
1336 getNextToken();
1337
1338 ExprAST *Else = ParseExpression();
1339 if (!Else) return 0;
1340
1341 return new IfExprAST(Cond, Then, Else);
1342}
1343
Chris Lattner20a0c802007-11-05 17:54:34 +00001344/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001345static ExprAST *ParseForExpr() {
1346 getNextToken(); // eat the for.
1347
1348 if (CurTok != tok_identifier)
1349 return Error("expected identifier after for");
1350
1351 std::string IdName = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001352 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001353
1354 if (CurTok != '=')
1355 return Error("expected '=' after for");
1356 getNextToken(); // eat '='.
1357
1358
1359 ExprAST *Start = ParseExpression();
1360 if (Start == 0) return 0;
1361 if (CurTok != ',')
1362 return Error("expected ',' after for start value");
1363 getNextToken();
1364
1365 ExprAST *End = ParseExpression();
1366 if (End == 0) return 0;
1367
1368 // The step value is optional.
1369 ExprAST *Step = 0;
1370 if (CurTok == ',') {
1371 getNextToken();
1372 Step = ParseExpression();
1373 if (Step == 0) return 0;
1374 }
1375
1376 if (CurTok != tok_in)
1377 return Error("expected 'in' after for");
1378 getNextToken(); // eat 'in'.
1379
1380 ExprAST *Body = ParseExpression();
1381 if (Body == 0) return 0;
1382
1383 return new ForExprAST(IdName, Start, End, Step, Body);
1384}
1385
Chris Lattner20a0c802007-11-05 17:54:34 +00001386/// varexpr ::= 'var' identifier ('=' expression)?
1387// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner62a709d2007-11-05 00:23:57 +00001388static ExprAST *ParseVarExpr() {
1389 getNextToken(); // eat the var.
1390
1391 std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1392
1393 // At least one variable name is required.
1394 if (CurTok != tok_identifier)
1395 return Error("expected identifier after var");
1396
1397 while (1) {
1398 std::string Name = IdentifierStr;
Chris Lattner20a0c802007-11-05 17:54:34 +00001399 getNextToken(); // eat identifier.
Chris Lattner62a709d2007-11-05 00:23:57 +00001400
1401 // Read the optional initializer.
1402 ExprAST *Init = 0;
1403 if (CurTok == '=') {
1404 getNextToken(); // eat the '='.
1405
1406 Init = ParseExpression();
1407 if (Init == 0) return 0;
1408 }
1409
1410 VarNames.push_back(std::make_pair(Name, Init));
1411
1412 // End of var list, exit loop.
1413 if (CurTok != ',') break;
1414 getNextToken(); // eat the ','.
1415
1416 if (CurTok != tok_identifier)
1417 return Error("expected identifier list after var");
1418 }
1419
1420 // At this point, we have to have 'in'.
1421 if (CurTok != tok_in)
1422 return Error("expected 'in' keyword after 'var'");
1423 getNextToken(); // eat 'in'.
1424
1425 ExprAST *Body = ParseExpression();
1426 if (Body == 0) return 0;
1427
1428 return new VarExprAST(VarNames, Body);
1429}
1430
1431
1432/// primary
1433/// ::= identifierexpr
1434/// ::= numberexpr
1435/// ::= parenexpr
1436/// ::= ifexpr
1437/// ::= forexpr
1438/// ::= varexpr
1439static ExprAST *ParsePrimary() {
1440 switch (CurTok) {
1441 default: return Error("unknown token when expecting an expression");
1442 case tok_identifier: return ParseIdentifierExpr();
1443 case tok_number: return ParseNumberExpr();
1444 case '(': return ParseParenExpr();
1445 case tok_if: return ParseIfExpr();
1446 case tok_for: return ParseForExpr();
1447 case tok_var: return ParseVarExpr();
1448 }
1449}
1450
1451/// unary
1452/// ::= primary
1453/// ::= '!' unary
1454static ExprAST *ParseUnary() {
1455 // If the current token is not an operator, it must be a primary expr.
1456 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1457 return ParsePrimary();
1458
1459 // If this is a unary operator, read it.
1460 int Opc = CurTok;
1461 getNextToken();
1462 if (ExprAST *Operand = ParseUnary())
1463 return new UnaryExprAST(Opc, Operand);
1464 return 0;
1465}
1466
1467/// binoprhs
1468/// ::= ('+' unary)*
1469static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1470 // If this is a binop, find its precedence.
1471 while (1) {
1472 int TokPrec = GetTokPrecedence();
1473
1474 // If this is a binop that binds at least as tightly as the current binop,
1475 // consume it, otherwise we are done.
1476 if (TokPrec &lt; ExprPrec)
1477 return LHS;
1478
1479 // Okay, we know this is a binop.
1480 int BinOp = CurTok;
1481 getNextToken(); // eat binop
1482
1483 // Parse the unary expression after the binary operator.
1484 ExprAST *RHS = ParseUnary();
1485 if (!RHS) return 0;
1486
1487 // If BinOp binds less tightly with RHS than the operator after RHS, let
1488 // the pending operator take RHS as its LHS.
1489 int NextPrec = GetTokPrecedence();
1490 if (TokPrec &lt; NextPrec) {
1491 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1492 if (RHS == 0) return 0;
1493 }
1494
1495 // Merge LHS/RHS.
1496 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1497 }
1498}
1499
1500/// expression
1501/// ::= unary binoprhs
1502///
1503static ExprAST *ParseExpression() {
1504 ExprAST *LHS = ParseUnary();
1505 if (!LHS) return 0;
1506
1507 return ParseBinOpRHS(0, LHS);
1508}
1509
1510/// prototype
1511/// ::= id '(' id* ')'
1512/// ::= binary LETTER number? (id, id)
1513/// ::= unary LETTER (id)
1514static PrototypeAST *ParsePrototype() {
1515 std::string FnName;
1516
1517 int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1518 unsigned BinaryPrecedence = 30;
1519
1520 switch (CurTok) {
1521 default:
1522 return ErrorP("Expected function name in prototype");
1523 case tok_identifier:
1524 FnName = IdentifierStr;
1525 Kind = 0;
1526 getNextToken();
1527 break;
1528 case tok_unary:
1529 getNextToken();
1530 if (!isascii(CurTok))
1531 return ErrorP("Expected unary operator");
1532 FnName = "unary";
1533 FnName += (char)CurTok;
1534 Kind = 1;
1535 getNextToken();
1536 break;
1537 case tok_binary:
1538 getNextToken();
1539 if (!isascii(CurTok))
1540 return ErrorP("Expected binary operator");
1541 FnName = "binary";
1542 FnName += (char)CurTok;
1543 Kind = 2;
1544 getNextToken();
1545
1546 // Read the precedence if present.
1547 if (CurTok == tok_number) {
1548 if (NumVal &lt; 1 || NumVal &gt; 100)
1549 return ErrorP("Invalid precedecnce: must be 1..100");
1550 BinaryPrecedence = (unsigned)NumVal;
1551 getNextToken();
1552 }
1553 break;
1554 }
1555
1556 if (CurTok != '(')
1557 return ErrorP("Expected '(' in prototype");
1558
1559 std::vector&lt;std::string&gt; ArgNames;
1560 while (getNextToken() == tok_identifier)
1561 ArgNames.push_back(IdentifierStr);
1562 if (CurTok != ')')
1563 return ErrorP("Expected ')' in prototype");
1564
1565 // success.
1566 getNextToken(); // eat ')'.
1567
1568 // Verify right number of names for operator.
1569 if (Kind &amp;&amp; ArgNames.size() != Kind)
1570 return ErrorP("Invalid number of operands for operator");
1571
1572 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1573}
1574
1575/// definition ::= 'def' prototype expression
1576static FunctionAST *ParseDefinition() {
1577 getNextToken(); // eat def.
1578 PrototypeAST *Proto = ParsePrototype();
1579 if (Proto == 0) return 0;
1580
1581 if (ExprAST *E = ParseExpression())
1582 return new FunctionAST(Proto, E);
1583 return 0;
1584}
1585
1586/// toplevelexpr ::= expression
1587static FunctionAST *ParseTopLevelExpr() {
1588 if (ExprAST *E = ParseExpression()) {
1589 // Make an anonymous proto.
1590 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1591 return new FunctionAST(Proto, E);
1592 }
1593 return 0;
1594}
1595
1596/// external ::= 'extern' prototype
1597static PrototypeAST *ParseExtern() {
1598 getNextToken(); // eat extern.
1599 return ParsePrototype();
1600}
1601
1602//===----------------------------------------------------------------------===//
1603// Code Generation
1604//===----------------------------------------------------------------------===//
1605
1606static Module *TheModule;
1607static LLVMFoldingBuilder Builder;
1608static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1609static FunctionPassManager *TheFPM;
1610
1611Value *ErrorV(const char *Str) { Error(Str); return 0; }
1612
1613/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1614/// the function. This is used for mutable variables etc.
1615static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1616 const std::string &amp;VarName) {
1617 LLVMBuilder TmpB(&amp;TheFunction-&gt;getEntryBlock(),
1618 TheFunction-&gt;getEntryBlock().begin());
1619 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1620}
1621
1622
1623Value *NumberExprAST::Codegen() {
1624 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1625}
1626
1627Value *VariableExprAST::Codegen() {
1628 // Look this variable up in the function.
1629 Value *V = NamedValues[Name];
1630 if (V == 0) return ErrorV("Unknown variable name");
1631
1632 // Load the value.
1633 return Builder.CreateLoad(V, Name.c_str());
1634}
1635
1636Value *UnaryExprAST::Codegen() {
1637 Value *OperandV = Operand-&gt;Codegen();
1638 if (OperandV == 0) return 0;
1639
1640 Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1641 if (F == 0)
1642 return ErrorV("Unknown unary operator");
1643
1644 return Builder.CreateCall(F, OperandV, "unop");
1645}
1646
1647
1648Value *BinaryExprAST::Codegen() {
1649 // Special case '=' because we don't want to emit the LHS as an expression.
1650 if (Op == '=') {
1651 // Assignment requires the LHS to be an identifier.
1652 VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1653 if (!LHSE)
1654 return ErrorV("destination of '=' must be a variable");
1655 // Codegen the RHS.
1656 Value *Val = RHS-&gt;Codegen();
1657 if (Val == 0) return 0;
1658
1659 // Look up the name.
1660 Value *Variable = NamedValues[LHSE-&gt;getName()];
1661 if (Variable == 0) return ErrorV("Unknown variable name");
1662
1663 Builder.CreateStore(Val, Variable);
1664 return Val;
1665 }
1666
1667
1668 Value *L = LHS-&gt;Codegen();
1669 Value *R = RHS-&gt;Codegen();
1670 if (L == 0 || R == 0) return 0;
1671
1672 switch (Op) {
1673 case '+': return Builder.CreateAdd(L, R, "addtmp");
1674 case '-': return Builder.CreateSub(L, R, "subtmp");
1675 case '*': return Builder.CreateMul(L, R, "multmp");
1676 case '&lt;':
Chris Lattner71155212007-11-06 01:39:12 +00001677 L = Builder.CreateFCmpULT(L, R, "cmptmp");
Chris Lattner62a709d2007-11-05 00:23:57 +00001678 // Convert bool 0/1 to double 0.0 or 1.0
1679 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1680 default: break;
1681 }
1682
1683 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1684 // a call to it.
1685 Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1686 assert(F &amp;&amp; "binary operator not found!");
1687
1688 Value *Ops[] = { L, R };
1689 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1690}
1691
1692Value *CallExprAST::Codegen() {
1693 // Look up the name in the global module table.
1694 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1695 if (CalleeF == 0)
1696 return ErrorV("Unknown function referenced");
1697
1698 // If argument mismatch error.
1699 if (CalleeF-&gt;arg_size() != Args.size())
1700 return ErrorV("Incorrect # arguments passed");
1701
1702 std::vector&lt;Value*&gt; ArgsV;
1703 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1704 ArgsV.push_back(Args[i]-&gt;Codegen());
1705 if (ArgsV.back() == 0) return 0;
1706 }
1707
1708 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1709}
1710
1711Value *IfExprAST::Codegen() {
1712 Value *CondV = Cond-&gt;Codegen();
1713 if (CondV == 0) return 0;
1714
1715 // Convert condition to a bool by comparing equal to 0.0.
1716 CondV = Builder.CreateFCmpONE(CondV,
1717 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1718 "ifcond");
1719
1720 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1721
1722 // Create blocks for the then and else cases. Insert the 'then' block at the
1723 // end of the function.
1724 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1725 BasicBlock *ElseBB = new BasicBlock("else");
1726 BasicBlock *MergeBB = new BasicBlock("ifcont");
1727
1728 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1729
1730 // Emit then value.
1731 Builder.SetInsertPoint(ThenBB);
1732
1733 Value *ThenV = Then-&gt;Codegen();
1734 if (ThenV == 0) return 0;
1735
1736 Builder.CreateBr(MergeBB);
1737 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1738 ThenBB = Builder.GetInsertBlock();
1739
1740 // Emit else block.
1741 TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1742 Builder.SetInsertPoint(ElseBB);
1743
1744 Value *ElseV = Else-&gt;Codegen();
1745 if (ElseV == 0) return 0;
1746
1747 Builder.CreateBr(MergeBB);
1748 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1749 ElseBB = Builder.GetInsertBlock();
1750
1751 // Emit merge block.
1752 TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1753 Builder.SetInsertPoint(MergeBB);
1754 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1755
1756 PN-&gt;addIncoming(ThenV, ThenBB);
1757 PN-&gt;addIncoming(ElseV, ElseBB);
1758 return PN;
1759}
1760
1761Value *ForExprAST::Codegen() {
1762 // Output this as:
1763 // var = alloca double
1764 // ...
1765 // start = startexpr
1766 // store start -&gt; var
1767 // goto loop
1768 // loop:
1769 // ...
1770 // bodyexpr
1771 // ...
1772 // loopend:
1773 // step = stepexpr
1774 // endcond = endexpr
1775 //
1776 // curvar = load var
1777 // nextvar = curvar + step
1778 // store nextvar -&gt; var
1779 // br endcond, loop, endloop
1780 // outloop:
1781
1782 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1783
1784 // Create an alloca for the variable in the entry block.
1785 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1786
1787 // Emit the start code first, without 'variable' in scope.
1788 Value *StartVal = Start-&gt;Codegen();
1789 if (StartVal == 0) return 0;
1790
1791 // Store the value into the alloca.
1792 Builder.CreateStore(StartVal, Alloca);
1793
1794 // Make the new basic block for the loop header, inserting after current
1795 // block.
1796 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1797 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1798
1799 // Insert an explicit fall through from the current block to the LoopBB.
1800 Builder.CreateBr(LoopBB);
1801
1802 // Start insertion in LoopBB.
1803 Builder.SetInsertPoint(LoopBB);
1804
1805 // Within the loop, the variable is defined equal to the PHI node. If it
1806 // shadows an existing variable, we have to restore it, so save it now.
1807 AllocaInst *OldVal = NamedValues[VarName];
1808 NamedValues[VarName] = Alloca;
1809
1810 // Emit the body of the loop. This, like any other expr, can change the
1811 // current BB. Note that we ignore the value computed by the body, but don't
1812 // allow an error.
1813 if (Body-&gt;Codegen() == 0)
1814 return 0;
1815
1816 // Emit the step value.
1817 Value *StepVal;
1818 if (Step) {
1819 StepVal = Step-&gt;Codegen();
1820 if (StepVal == 0) return 0;
1821 } else {
1822 // If not specified, use 1.0.
1823 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1824 }
1825
1826 // Compute the end condition.
1827 Value *EndCond = End-&gt;Codegen();
1828 if (EndCond == 0) return EndCond;
1829
1830 // Reload, increment, and restore the alloca. This handles the case where
1831 // the body of the loop mutates the variable.
1832 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1833 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1834 Builder.CreateStore(NextVar, Alloca);
1835
1836 // Convert condition to a bool by comparing equal to 0.0.
1837 EndCond = Builder.CreateFCmpONE(EndCond,
1838 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1839 "loopcond");
1840
1841 // Create the "after loop" block and insert it.
1842 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1843 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1844
1845 // Insert the conditional branch into the end of LoopEndBB.
1846 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1847
1848 // Any new code will be inserted in AfterBB.
1849 Builder.SetInsertPoint(AfterBB);
1850
1851 // Restore the unshadowed variable.
1852 if (OldVal)
1853 NamedValues[VarName] = OldVal;
1854 else
1855 NamedValues.erase(VarName);
1856
1857
1858 // for expr always returns 0.0.
1859 return Constant::getNullValue(Type::DoubleTy);
1860}
1861
1862Value *VarExprAST::Codegen() {
1863 std::vector&lt;AllocaInst *&gt; OldBindings;
1864
1865 Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1866
1867 // Register all variables and emit their initializer.
1868 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1869 const std::string &amp;VarName = VarNames[i].first;
1870 ExprAST *Init = VarNames[i].second;
1871
1872 // Emit the initializer before adding the variable to scope, this prevents
1873 // the initializer from referencing the variable itself, and permits stuff
1874 // like this:
1875 // var a = 1 in
1876 // var a = a in ... # refers to outer 'a'.
1877 Value *InitVal;
1878 if (Init) {
1879 InitVal = Init-&gt;Codegen();
1880 if (InitVal == 0) return 0;
1881 } else { // If not specified, use 0.0.
1882 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
1883 }
1884
1885 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1886 Builder.CreateStore(InitVal, Alloca);
1887
1888 // Remember the old variable binding so that we can restore the binding when
1889 // we unrecurse.
1890 OldBindings.push_back(NamedValues[VarName]);
1891
1892 // Remember this binding.
1893 NamedValues[VarName] = Alloca;
1894 }
1895
1896 // Codegen the body, now that all vars are in scope.
1897 Value *BodyVal = Body-&gt;Codegen();
1898 if (BodyVal == 0) return 0;
1899
1900 // Pop all our variables from scope.
1901 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1902 NamedValues[VarNames[i].first] = OldBindings[i];
1903
1904 // Return the body computation.
1905 return BodyVal;
1906}
1907
1908
1909Function *PrototypeAST::Codegen() {
1910 // Make the function type: double(double,double) etc.
1911 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1912 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1913
1914 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1915
1916 // If F conflicted, there was already something named 'Name'. If it has a
1917 // body, don't allow redefinition or reextern.
1918 if (F-&gt;getName() != Name) {
1919 // Delete the one we just made and get the existing one.
1920 F-&gt;eraseFromParent();
1921 F = TheModule-&gt;getFunction(Name);
1922
1923 // If F already has a body, reject this.
1924 if (!F-&gt;empty()) {
1925 ErrorF("redefinition of function");
1926 return 0;
1927 }
1928
1929 // If F took a different number of args, reject.
1930 if (F-&gt;arg_size() != Args.size()) {
1931 ErrorF("redefinition of function with different # args");
1932 return 0;
1933 }
1934 }
1935
1936 // Set names for all arguments.
1937 unsigned Idx = 0;
1938 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1939 ++AI, ++Idx)
1940 AI-&gt;setName(Args[Idx]);
1941
1942 return F;
1943}
1944
1945/// CreateArgumentAllocas - Create an alloca for each argument and register the
1946/// argument in the symbol table so that references to it will succeed.
1947void PrototypeAST::CreateArgumentAllocas(Function *F) {
1948 Function::arg_iterator AI = F-&gt;arg_begin();
1949 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1950 // Create an alloca for this variable.
1951 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1952
1953 // Store the initial value into the alloca.
1954 Builder.CreateStore(AI, Alloca);
1955
1956 // Add arguments to variable symbol table.
1957 NamedValues[Args[Idx]] = Alloca;
1958 }
1959}
1960
1961
1962Function *FunctionAST::Codegen() {
1963 NamedValues.clear();
1964
1965 Function *TheFunction = Proto-&gt;Codegen();
1966 if (TheFunction == 0)
1967 return 0;
1968
1969 // If this is an operator, install it.
1970 if (Proto-&gt;isBinaryOp())
1971 BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
1972
1973 // Create a new basic block to start insertion into.
1974 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1975 Builder.SetInsertPoint(BB);
1976
1977 // Add all arguments to the symbol table and create their allocas.
1978 Proto-&gt;CreateArgumentAllocas(TheFunction);
1979
1980 if (Value *RetVal = Body-&gt;Codegen()) {
1981 // Finish off the function.
1982 Builder.CreateRet(RetVal);
1983
1984 // Validate the generated code, checking for consistency.
1985 verifyFunction(*TheFunction);
1986
1987 // Optimize the function.
1988 TheFPM-&gt;run(*TheFunction);
1989
1990 return TheFunction;
1991 }
1992
1993 // Error reading body, remove function.
1994 TheFunction-&gt;eraseFromParent();
1995
1996 if (Proto-&gt;isBinaryOp())
1997 BinopPrecedence.erase(Proto-&gt;getOperatorName());
1998 return 0;
1999}
2000
2001//===----------------------------------------------------------------------===//
2002// Top-Level parsing and JIT Driver
2003//===----------------------------------------------------------------------===//
2004
2005static ExecutionEngine *TheExecutionEngine;
2006
2007static void HandleDefinition() {
2008 if (FunctionAST *F = ParseDefinition()) {
2009 if (Function *LF = F-&gt;Codegen()) {
2010 fprintf(stderr, "Read function definition:");
2011 LF-&gt;dump();
2012 }
2013 } else {
2014 // Skip token for error recovery.
2015 getNextToken();
2016 }
2017}
2018
2019static void HandleExtern() {
2020 if (PrototypeAST *P = ParseExtern()) {
2021 if (Function *F = P-&gt;Codegen()) {
2022 fprintf(stderr, "Read extern: ");
2023 F-&gt;dump();
2024 }
2025 } else {
2026 // Skip token for error recovery.
2027 getNextToken();
2028 }
2029}
2030
2031static void HandleTopLevelExpression() {
2032 // Evaluate a top level expression into an anonymous function.
2033 if (FunctionAST *F = ParseTopLevelExpr()) {
2034 if (Function *LF = F-&gt;Codegen()) {
2035 // JIT the function, returning a function pointer.
2036 void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
2037
2038 // Cast it to the right type (takes no arguments, returns a double) so we
2039 // can call it as a native function.
2040 double (*FP)() = (double (*)())FPtr;
2041 fprintf(stderr, "Evaluated to %f\n", FP());
2042 }
2043 } else {
2044 // Skip token for error recovery.
2045 getNextToken();
2046 }
2047}
2048
2049/// top ::= definition | external | expression | ';'
2050static void MainLoop() {
2051 while (1) {
2052 fprintf(stderr, "ready&gt; ");
2053 switch (CurTok) {
2054 case tok_eof: return;
2055 case ';': getNextToken(); break; // ignore top level semicolons.
2056 case tok_def: HandleDefinition(); break;
2057 case tok_extern: HandleExtern(); break;
2058 default: HandleTopLevelExpression(); break;
2059 }
2060 }
2061}
2062
2063
2064
2065//===----------------------------------------------------------------------===//
2066// "Library" functions that can be "extern'd" from user code.
2067//===----------------------------------------------------------------------===//
2068
2069/// putchard - putchar that takes a double and returns 0.
2070extern "C"
2071double putchard(double X) {
2072 putchar((char)X);
2073 return 0;
2074}
2075
2076/// printd - printf that takes a double prints it as "%f\n", returning 0.
2077extern "C"
2078double printd(double X) {
2079 printf("%f\n", X);
2080 return 0;
2081}
2082
2083//===----------------------------------------------------------------------===//
2084// Main driver code.
2085//===----------------------------------------------------------------------===//
2086
2087int main() {
2088 // Install standard binary operators.
2089 // 1 is lowest precedence.
2090 BinopPrecedence['='] = 2;
2091 BinopPrecedence['&lt;'] = 10;
2092 BinopPrecedence['+'] = 20;
2093 BinopPrecedence['-'] = 20;
2094 BinopPrecedence['*'] = 40; // highest.
2095
2096 // Prime the first token.
2097 fprintf(stderr, "ready&gt; ");
2098 getNextToken();
2099
2100 // Make the module, which holds all the code.
2101 TheModule = new Module("my cool jit");
2102
2103 // Create the JIT.
2104 TheExecutionEngine = ExecutionEngine::create(TheModule);
2105
2106 {
2107 ExistingModuleProvider OurModuleProvider(TheModule);
2108 FunctionPassManager OurFPM(&amp;OurModuleProvider);
2109
2110 // Set up the optimizer pipeline. Start with registering info about how the
2111 // target lays out data structures.
2112 OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
2113 // Promote allocas to registers.
2114 OurFPM.add(createPromoteMemoryToRegisterPass());
2115 // Do simple "peephole" optimizations and bit-twiddling optzns.
2116 OurFPM.add(createInstructionCombiningPass());
2117 // Reassociate expressions.
2118 OurFPM.add(createReassociatePass());
2119 // Eliminate Common SubExpressions.
2120 OurFPM.add(createGVNPass());
2121 // Simplify the control flow graph (deleting unreachable blocks, etc).
2122 OurFPM.add(createCFGSimplificationPass());
2123
2124 // Set the global so the code gen can use this.
2125 TheFPM = &amp;OurFPM;
2126
2127 // Run the main "interpreter loop" now.
2128 MainLoop();
2129
2130 TheFPM = 0;
2131 } // Free module provider and pass manager.
2132
2133
2134 // Print out all of the generated code.
2135 TheModule-&gt;dump();
2136 return 0;
2137}
Chris Lattner00c992d2007-11-03 08:55:29 +00002138</pre>
2139</div>
2140
2141</div>
2142
2143<!-- *********************************************************************** -->
2144<hr>
2145<address>
2146 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2147 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
2148 <a href="http://validator.w3.org/check/referer"><img
2149 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
2150
2151 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2152 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2153 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
2154</address>
2155</body>
2156</html>