blob: 602dcb5f6f41e2d5a43e93bd3f98824876188f9e [file] [log] [blame]
Sean Silvaee47edf2012-12-05 00:26:32 +00001=======================================================
2Kaleidoscope: Extending the Language: Mutable Variables
3=======================================================
4
5.. contents::
6 :local:
7
8Written by `Chris Lattner <mailto:sabre@nondot.org>`_
9
10Chapter 7 Introduction
11======================
12
13Welcome to Chapter 7 of the "`Implementing a language with
14LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
15very respectable, albeit simple, `functional programming
16language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
17journey, we learned some parsing techniques, how to build and represent
18an AST, how to build LLVM IR, and how to optimize the resultant code as
19well as JIT compile it.
20
21While Kaleidoscope is interesting as a functional language, the fact
22that it is functional makes it "too easy" to generate LLVM IR for it. In
23particular, a functional language makes it very easy to build LLVM IR
24directly in `SSA
25form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
26Since LLVM requires that the input code be in SSA form, this is a very
27nice property and it is often unclear to newcomers how to generate code
28for an imperative language with mutable variables.
29
30The short (and happy) summary of this chapter is that there is no need
31for your front-end to build SSA form: LLVM provides highly tuned and
32well tested support for this, though the way it works is a bit
33unexpected for some.
34
35Why is this a hard problem?
36===========================
37
38To understand why mutable variables cause complexities in SSA
39construction, consider this extremely simple C example:
40
41.. code-block:: c
42
43 int G, H;
44 int test(_Bool Condition) {
45 int X;
46 if (Condition)
47 X = G;
48 else
49 X = H;
50 return X;
51 }
52
53In this case, we have the variable "X", whose value depends on the path
54executed in the program. Because there are two different possible values
55for X before the return instruction, a PHI node is inserted to merge the
56two values. The LLVM IR that we want for this example looks like this:
57
58.. code-block:: llvm
59
60 @G = weak global i32 0 ; type of @G is i32*
61 @H = weak global i32 0 ; type of @H is i32*
62
63 define i32 @test(i1 %Condition) {
64 entry:
65 br i1 %Condition, label %cond_true, label %cond_false
66
67 cond_true:
68 %X.0 = load i32* @G
69 br label %cond_next
70
71 cond_false:
72 %X.1 = load i32* @H
73 br label %cond_next
74
75 cond_next:
76 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
77 ret i32 %X.2
78 }
79
80In this example, the loads from the G and H global variables are
81explicit in the LLVM IR, and they live in the then/else branches of the
82if statement (cond\_true/cond\_false). In order to merge the incoming
83values, the X.2 phi node in the cond\_next block selects the right value
84to use based on where control flow is coming from: if control flow comes
85from the cond\_false block, X.2 gets the value of X.1. Alternatively, if
86control flow comes from cond\_true, it gets the value of X.0. The intent
87of this chapter is not to explain the details of SSA form. For more
88information, see one of the many `online
89references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
90
91The question for this article is "who places the phi nodes when lowering
92assignments to mutable variables?". The issue here is that LLVM
93*requires* that its IR be in SSA form: there is no "non-ssa" mode for
94it. However, SSA construction requires non-trivial algorithms and data
95structures, so it is inconvenient and wasteful for every front-end to
96have to reproduce this logic.
97
98Memory in LLVM
99==============
100
101The 'trick' here is that while LLVM does require all register values to
102be in SSA form, it does not require (or permit) memory objects to be in
103SSA form. In the example above, note that the loads from G and H are
104direct accesses to G and H: they are not renamed or versioned. This
105differs from some other compiler systems, which do try to version memory
106objects. In LLVM, instead of encoding dataflow analysis of memory into
107the LLVM IR, it is handled with `Analysis
108Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.
109
110With this in mind, the high-level idea is that we want to make a stack
111variable (which lives in memory, because it is on the stack) for each
112mutable object in a function. To take advantage of this trick, we need
113to talk about how LLVM represents stack variables.
114
115In LLVM, all memory accesses are explicit with load/store instructions,
116and it is carefully designed not to have (or need) an "address-of"
117operator. Notice how the type of the @G/@H global variables is actually
118"i32\*" even though the variable is defined as "i32". What this means is
119that @G defines *space* for an i32 in the global data area, but its
120*name* actually refers to the address for that space. Stack variables
121work the same way, except that instead of being declared with global
122variable definitions, they are declared with the `LLVM alloca
123instruction <../LangRef.html#i_alloca>`_:
124
125.. code-block:: llvm
126
127 define i32 @example() {
128 entry:
129 %X = alloca i32 ; type of %X is i32*.
130 ...
131 %tmp = load i32* %X ; load the stack value %X from the stack.
132 %tmp2 = add i32 %tmp, 1 ; increment it
133 store i32 %tmp2, i32* %X ; store it back
134 ...
135
136This code shows an example of how you can declare and manipulate a stack
137variable in the LLVM IR. Stack memory allocated with the alloca
138instruction is fully general: you can pass the address of the stack slot
139to functions, you can store it in other variables, etc. In our example
140above, we could rewrite the example to use the alloca technique to avoid
141using a PHI node:
142
143.. code-block:: llvm
144
145 @G = weak global i32 0 ; type of @G is i32*
146 @H = weak global i32 0 ; type of @H is i32*
147
148 define i32 @test(i1 %Condition) {
149 entry:
150 %X = alloca i32 ; type of %X is i32*.
151 br i1 %Condition, label %cond_true, label %cond_false
152
153 cond_true:
154 %X.0 = load i32* @G
155 store i32 %X.0, i32* %X ; Update X
156 br label %cond_next
157
158 cond_false:
159 %X.1 = load i32* @H
160 store i32 %X.1, i32* %X ; Update X
161 br label %cond_next
162
163 cond_next:
164 %X.2 = load i32* %X ; Read X
165 ret i32 %X.2
166 }
167
168With this, we have discovered a way to handle arbitrary mutable
169variables without the need to create Phi nodes at all:
170
171#. Each mutable variable becomes a stack allocation.
172#. Each read of the variable becomes a load from the stack.
173#. Each update of the variable becomes a store to the stack.
174#. Taking the address of a variable just uses the stack address
175 directly.
176
177While this solution has solved our immediate problem, it introduced
178another one: we have now apparently introduced a lot of stack traffic
179for very simple and common operations, a major performance problem.
180Fortunately for us, the LLVM optimizer has a highly-tuned optimization
181pass named "mem2reg" that handles this case, promoting allocas like this
182into SSA registers, inserting Phi nodes as appropriate. If you run this
183example through the pass, for example, you'll get:
184
185.. code-block:: bash
186
187 $ llvm-as < example.ll | opt -mem2reg | llvm-dis
188 @G = weak global i32 0
189 @H = weak global i32 0
190
191 define i32 @test(i1 %Condition) {
192 entry:
193 br i1 %Condition, label %cond_true, label %cond_false
194
195 cond_true:
196 %X.0 = load i32* @G
197 br label %cond_next
198
199 cond_false:
200 %X.1 = load i32* @H
201 br label %cond_next
202
203 cond_next:
204 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
205 ret i32 %X.01
206 }
207
208The mem2reg pass implements the standard "iterated dominance frontier"
209algorithm for constructing SSA form and has a number of optimizations
210that speed up (very common) degenerate cases. The mem2reg optimization
211pass is the answer to dealing with mutable variables, and we highly
212recommend that you depend on it. Note that mem2reg only works on
213variables in certain circumstances:
214
215#. mem2reg is alloca-driven: it looks for allocas and if it can handle
216 them, it promotes them. It does not apply to global variables or heap
217 allocations.
218#. mem2reg only looks for alloca instructions in the entry block of the
219 function. Being in the entry block guarantees that the alloca is only
220 executed once, which makes analysis simpler.
221#. mem2reg only promotes allocas whose uses are direct loads and stores.
222 If the address of the stack object is passed to a function, or if any
223 funny pointer arithmetic is involved, the alloca will not be
224 promoted.
225#. mem2reg only works on allocas of `first
226 class <../LangRef.html#t_classifications>`_ values (such as pointers,
227 scalars and vectors), and only if the array size of the allocation is
228 1 (or missing in the .ll file). mem2reg is not capable of promoting
229 structs or arrays to registers. Note that the "scalarrepl" pass is
230 more powerful and can promote structs, "unions", and arrays in many
231 cases.
232
233All of these properties are easy to satisfy for most imperative
234languages, and we'll illustrate it below with Kaleidoscope. The final
235question you may be asking is: should I bother with this nonsense for my
236front-end? Wouldn't it be better if I just did SSA construction
237directly, avoiding use of the mem2reg optimization pass? In short, we
238strongly recommend that you use this technique for building SSA form,
239unless there is an extremely good reason not to. Using this technique
240is:
241
242- Proven and well tested: llvm-gcc and clang both use this technique
243 for local mutable variables. As such, the most common clients of LLVM
244 are using this to handle a bulk of their variables. You can be sure
245 that bugs are found fast and fixed early.
246- Extremely Fast: mem2reg has a number of special cases that make it
247 fast in common cases as well as fully general. For example, it has
248 fast-paths for variables that are only used in a single block,
249 variables that only have one assignment point, good heuristics to
250 avoid insertion of unneeded phi nodes, etc.
251- Needed for debug info generation: `Debug information in
252 LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
253 the variable exposed so that debug info can be attached to it. This
254 technique dovetails very naturally with this style of debug info.
255
256If nothing else, this makes it much easier to get your front-end up and
257running, and is very simple to implement. Lets extend Kaleidoscope with
258mutable variables now!
259
260Mutable Variables in Kaleidoscope
261=================================
262
263Now that we know the sort of problem we want to tackle, lets see what
264this looks like in the context of our little Kaleidoscope language.
265We're going to add two features:
266
267#. The ability to mutate variables with the '=' operator.
268#. The ability to define new variables.
269
270While the first item is really what this is about, we only have
271variables for incoming arguments as well as for induction variables, and
272redefining those only goes so far :). Also, the ability to define new
273variables is a useful thing regardless of whether you will be mutating
274them. Here's a motivating example that shows how we could use these:
275
276::
277
278 # Define ':' for sequencing: as a low-precedence operator that ignores operands
279 # and just returns the RHS.
280 def binary : 1 (x y) y;
281
282 # Recursive fib, we could do this before.
283 def fib(x)
284 if (x < 3) then
285 1
286 else
287 fib(x-1)+fib(x-2);
288
289 # Iterative fib.
290 def fibi(x)
291 var a = 1, b = 1, c in
292 (for i = 3, i < x in
293 c = a + b :
294 a = b :
295 b = c) :
296 b;
297
298 # Call it.
299 fibi(10);
300
301In order to mutate variables, we have to change our existing variables
302to use the "alloca trick". Once we have that, we'll add our new
303operator, then extend Kaleidoscope to support new variable definitions.
304
305Adjusting Existing Variables for Mutation
306=========================================
307
308The symbol table in Kaleidoscope is managed at code generation time by
309the '``NamedValues``' map. This map currently keeps track of the LLVM
310"Value\*" that holds the double value for the named variable. In order
311to support mutation, we need to change this slightly, so that it
312``NamedValues`` holds the *memory location* of the variable in question.
313Note that this change is a refactoring: it changes the structure of the
314code, but does not (by itself) change the behavior of the compiler. All
315of these changes are isolated in the Kaleidoscope code generator.
316
317At this point in Kaleidoscope's development, it only supports variables
318for two things: incoming arguments to functions and the induction
319variable of 'for' loops. For consistency, we'll allow mutation of these
320variables in addition to other user-defined variables. This means that
321these will both need memory locations.
322
323To start our transformation of Kaleidoscope, we'll change the
324NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once
325we do this, the C++ compiler will tell us what parts of the code we need
326to update:
327
328.. code-block:: c++
329
330 static std::map<std::string, AllocaInst*> NamedValues;
331
332Also, since we will need to create these alloca's, we'll use a helper
333function that ensures that the allocas are created in the entry block of
334the function:
335
336.. code-block:: c++
337
338 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
339 /// the function. This is used for mutable variables etc.
340 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
341 const std::string &VarName) {
342 IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
343 TheFunction->getEntryBlock().begin());
344 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
345 VarName.c_str());
346 }
347
348This funny looking code creates an IRBuilder object that is pointing at
349the first instruction (.begin()) of the entry block. It then creates an
350alloca with the expected name and returns it. Because all values in
351Kaleidoscope are doubles, there is no need to pass in a type to use.
352
353With this in place, the first functionality change we want to make is to
354variable references. In our new scheme, variables live on the stack, so
355code generating a reference to them actually needs to produce a load
356from the stack slot:
357
358.. code-block:: c++
359
360 Value *VariableExprAST::Codegen() {
361 // Look this variable up in the function.
362 Value *V = NamedValues[Name];
363 if (V == 0) return ErrorV("Unknown variable name");
364
365 // Load the value.
366 return Builder.CreateLoad(V, Name.c_str());
367 }
368
369As you can see, this is pretty straightforward. Now we need to update
370the things that define the variables to set up the alloca. We'll start
371with ``ForExprAST::Codegen`` (see the `full code listing <#code>`_ for
372the unabridged code):
373
374.. code-block:: c++
375
376 Function *TheFunction = Builder.GetInsertBlock()->getParent();
377
378 // Create an alloca for the variable in the entry block.
379 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
380
381 // Emit the start code first, without 'variable' in scope.
382 Value *StartVal = Start->Codegen();
383 if (StartVal == 0) return 0;
384
385 // Store the value into the alloca.
386 Builder.CreateStore(StartVal, Alloca);
387 ...
388
389 // Compute the end condition.
390 Value *EndCond = End->Codegen();
391 if (EndCond == 0) return EndCond;
392
393 // Reload, increment, and restore the alloca. This handles the case where
394 // the body of the loop mutates the variable.
395 Value *CurVar = Builder.CreateLoad(Alloca);
396 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
397 Builder.CreateStore(NextVar, Alloca);
398 ...
399
400This code is virtually identical to the code `before we allowed mutable
401variables <LangImpl5.html#forcodegen>`_. The big difference is that we
402no longer have to construct a PHI node, and we use load/store to access
403the variable as needed.
404
405To support mutable argument variables, we need to also make allocas for
406them. The code for this is also pretty simple:
407
408.. code-block:: c++
409
410 /// CreateArgumentAllocas - Create an alloca for each argument and register the
411 /// argument in the symbol table so that references to it will succeed.
412 void PrototypeAST::CreateArgumentAllocas(Function *F) {
413 Function::arg_iterator AI = F->arg_begin();
414 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
415 // Create an alloca for this variable.
416 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
417
418 // Store the initial value into the alloca.
419 Builder.CreateStore(AI, Alloca);
420
421 // Add arguments to variable symbol table.
422 NamedValues[Args[Idx]] = Alloca;
423 }
424 }
425
426For each argument, we make an alloca, store the input value to the
427function into the alloca, and register the alloca as the memory location
428for the argument. This method gets invoked by ``FunctionAST::Codegen``
429right after it sets up the entry block for the function.
430
431The final missing piece is adding the mem2reg pass, which allows us to
432get good codegen once again:
433
434.. code-block:: c++
435
436 // Set up the optimizer pipeline. Start with registering info about how the
437 // target lays out data structures.
438 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
439 // Promote allocas to registers.
440 OurFPM.add(createPromoteMemoryToRegisterPass());
441 // Do simple "peephole" optimizations and bit-twiddling optzns.
442 OurFPM.add(createInstructionCombiningPass());
443 // Reassociate expressions.
444 OurFPM.add(createReassociatePass());
445
446It is interesting to see what the code looks like before and after the
447mem2reg optimization runs. For example, this is the before/after code
448for our recursive fib function. Before the optimization:
449
450.. code-block:: llvm
451
452 define double @fib(double %x) {
453 entry:
454 %x1 = alloca double
455 store double %x, double* %x1
456 %x2 = load double* %x1
457 %cmptmp = fcmp ult double %x2, 3.000000e+00
458 %booltmp = uitofp i1 %cmptmp to double
459 %ifcond = fcmp one double %booltmp, 0.000000e+00
460 br i1 %ifcond, label %then, label %else
461
462 then: ; preds = %entry
463 br label %ifcont
464
465 else: ; preds = %entry
466 %x3 = load double* %x1
467 %subtmp = fsub double %x3, 1.000000e+00
468 %calltmp = call double @fib(double %subtmp)
469 %x4 = load double* %x1
470 %subtmp5 = fsub double %x4, 2.000000e+00
471 %calltmp6 = call double @fib(double %subtmp5)
472 %addtmp = fadd double %calltmp, %calltmp6
473 br label %ifcont
474
475 ifcont: ; preds = %else, %then
476 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
477 ret double %iftmp
478 }
479
480Here there is only one variable (x, the input argument) but you can
481still see the extremely simple-minded code generation strategy we are
482using. In the entry block, an alloca is created, and the initial input
483value is stored into it. Each reference to the variable does a reload
484from the stack. Also, note that we didn't modify the if/then/else
485expression, so it still inserts a PHI node. While we could make an
486alloca for it, it is actually easier to create a PHI node for it, so we
487still just make the PHI.
488
489Here is the code after the mem2reg pass runs:
490
491.. code-block:: llvm
492
493 define double @fib(double %x) {
494 entry:
495 %cmptmp = fcmp ult double %x, 3.000000e+00
496 %booltmp = uitofp i1 %cmptmp to double
497 %ifcond = fcmp one double %booltmp, 0.000000e+00
498 br i1 %ifcond, label %then, label %else
499
500 then:
501 br label %ifcont
502
503 else:
504 %subtmp = fsub double %x, 1.000000e+00
505 %calltmp = call double @fib(double %subtmp)
506 %subtmp5 = fsub double %x, 2.000000e+00
507 %calltmp6 = call double @fib(double %subtmp5)
508 %addtmp = fadd double %calltmp, %calltmp6
509 br label %ifcont
510
511 ifcont: ; preds = %else, %then
512 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
513 ret double %iftmp
514 }
515
516This is a trivial case for mem2reg, since there are no redefinitions of
517the variable. The point of showing this is to calm your tension about
518inserting such blatent inefficiencies :).
519
520After the rest of the optimizers run, we get:
521
522.. code-block:: llvm
523
524 define double @fib(double %x) {
525 entry:
526 %cmptmp = fcmp ult double %x, 3.000000e+00
527 %booltmp = uitofp i1 %cmptmp to double
528 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
529 br i1 %ifcond, label %else, label %ifcont
530
531 else:
532 %subtmp = fsub double %x, 1.000000e+00
533 %calltmp = call double @fib(double %subtmp)
534 %subtmp5 = fsub double %x, 2.000000e+00
535 %calltmp6 = call double @fib(double %subtmp5)
536 %addtmp = fadd double %calltmp, %calltmp6
537 ret double %addtmp
538
539 ifcont:
540 ret double 1.000000e+00
541 }
542
543Here we see that the simplifycfg pass decided to clone the return
544instruction into the end of the 'else' block. This allowed it to
545eliminate some branches and the PHI node.
546
547Now that all symbol table references are updated to use stack variables,
548we'll add the assignment operator.
549
550New Assignment Operator
551=======================
552
553With our current framework, adding a new assignment operator is really
554simple. We will parse it just like any other binary operator, but handle
555it internally (instead of allowing the user to define it). The first
556step is to set a precedence:
557
558.. code-block:: c++
559
560 int main() {
561 // Install standard binary operators.
562 // 1 is lowest precedence.
563 BinopPrecedence['='] = 2;
564 BinopPrecedence['<'] = 10;
565 BinopPrecedence['+'] = 20;
566 BinopPrecedence['-'] = 20;
567
568Now that the parser knows the precedence of the binary operator, it
569takes care of all the parsing and AST generation. We just need to
570implement codegen for the assignment operator. This looks like:
571
572.. code-block:: c++
573
574 Value *BinaryExprAST::Codegen() {
575 // Special case '=' because we don't want to emit the LHS as an expression.
576 if (Op == '=') {
577 // Assignment requires the LHS to be an identifier.
578 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
579 if (!LHSE)
580 return ErrorV("destination of '=' must be a variable");
581
582Unlike the rest of the binary operators, our assignment operator doesn't
583follow the "emit LHS, emit RHS, do computation" model. As such, it is
584handled as a special case before the other binary operators are handled.
585The other strange thing is that it requires the LHS to be a variable. It
586is invalid to have "(x+1) = expr" - only things like "x = expr" are
587allowed.
588
589.. code-block:: c++
590
591 // Codegen the RHS.
592 Value *Val = RHS->Codegen();
593 if (Val == 0) return 0;
594
595 // Look up the name.
596 Value *Variable = NamedValues[LHSE->getName()];
597 if (Variable == 0) return ErrorV("Unknown variable name");
598
599 Builder.CreateStore(Val, Variable);
600 return Val;
601 }
602 ...
603
604Once we have the variable, codegen'ing the assignment is
605straightforward: we emit the RHS of the assignment, create a store, and
606return the computed value. Returning a value allows for chained
607assignments like "X = (Y = Z)".
608
609Now that we have an assignment operator, we can mutate loop variables
610and arguments. For example, we can now run code like this:
611
612::
613
614 # Function to print a double.
615 extern printd(x);
616
617 # Define ':' for sequencing: as a low-precedence operator that ignores operands
618 # and just returns the RHS.
619 def binary : 1 (x y) y;
620
621 def test(x)
622 printd(x) :
623 x = 4 :
624 printd(x);
625
626 test(123);
627
628When run, this example prints "123" and then "4", showing that we did
629actually mutate the value! Okay, we have now officially implemented our
630goal: getting this to work requires SSA construction in the general
631case. However, to be really useful, we want the ability to define our
632own local variables, lets add this next!
633
634User-defined Local Variables
635============================
636
637Adding var/in is just like any other other extensions we made to
638Kaleidoscope: we extend the lexer, the parser, the AST and the code
639generator. The first step for adding our new 'var/in' construct is to
640extend the lexer. As before, this is pretty trivial, the code looks like
641this:
642
643.. code-block:: c++
644
645 enum Token {
646 ...
647 // var definition
648 tok_var = -13
649 ...
650 }
651 ...
652 static int gettok() {
653 ...
654 if (IdentifierStr == "in") return tok_in;
655 if (IdentifierStr == "binary") return tok_binary;
656 if (IdentifierStr == "unary") return tok_unary;
657 if (IdentifierStr == "var") return tok_var;
658 return tok_identifier;
659 ...
660
661The next step is to define the AST node that we will construct. For
662var/in, it looks like this:
663
664.. code-block:: c++
665
666 /// VarExprAST - Expression class for var/in
667 class VarExprAST : public ExprAST {
668 std::vector<std::pair<std::string, ExprAST*> > VarNames;
669 ExprAST *Body;
670 public:
671 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
672 ExprAST *body)
673 : VarNames(varnames), Body(body) {}
674
675 virtual Value *Codegen();
676 };
677
678var/in allows a list of names to be defined all at once, and each name
679can optionally have an initializer value. As such, we capture this
680information in the VarNames vector. Also, var/in has a body, this body
681is allowed to access the variables defined by the var/in.
682
683With this in place, we can define the parser pieces. The first thing we
684do is add it as a primary expression:
685
686.. code-block:: c++
687
688 /// primary
689 /// ::= identifierexpr
690 /// ::= numberexpr
691 /// ::= parenexpr
692 /// ::= ifexpr
693 /// ::= forexpr
694 /// ::= varexpr
695 static ExprAST *ParsePrimary() {
696 switch (CurTok) {
697 default: return Error("unknown token when expecting an expression");
698 case tok_identifier: return ParseIdentifierExpr();
699 case tok_number: return ParseNumberExpr();
700 case '(': return ParseParenExpr();
701 case tok_if: return ParseIfExpr();
702 case tok_for: return ParseForExpr();
703 case tok_var: return ParseVarExpr();
704 }
705 }
706
707Next we define ParseVarExpr:
708
709.. code-block:: c++
710
711 /// varexpr ::= 'var' identifier ('=' expression)?
712 // (',' identifier ('=' expression)?)* 'in' expression
713 static ExprAST *ParseVarExpr() {
714 getNextToken(); // eat the var.
715
716 std::vector<std::pair<std::string, ExprAST*> > VarNames;
717
718 // At least one variable name is required.
719 if (CurTok != tok_identifier)
720 return Error("expected identifier after var");
721
722The first part of this code parses the list of identifier/expr pairs
723into the local ``VarNames`` vector.
724
725.. code-block:: c++
726
727 while (1) {
728 std::string Name = IdentifierStr;
729 getNextToken(); // eat identifier.
730
731 // Read the optional initializer.
732 ExprAST *Init = 0;
733 if (CurTok == '=') {
734 getNextToken(); // eat the '='.
735
736 Init = ParseExpression();
737 if (Init == 0) return 0;
738 }
739
740 VarNames.push_back(std::make_pair(Name, Init));
741
742 // End of var list, exit loop.
743 if (CurTok != ',') break;
744 getNextToken(); // eat the ','.
745
746 if (CurTok != tok_identifier)
747 return Error("expected identifier list after var");
748 }
749
750Once all the variables are parsed, we then parse the body and create the
751AST node:
752
753.. code-block:: c++
754
755 // At this point, we have to have 'in'.
756 if (CurTok != tok_in)
757 return Error("expected 'in' keyword after 'var'");
758 getNextToken(); // eat 'in'.
759
760 ExprAST *Body = ParseExpression();
761 if (Body == 0) return 0;
762
763 return new VarExprAST(VarNames, Body);
764 }
765
766Now that we can parse and represent the code, we need to support
767emission of LLVM IR for it. This code starts out with:
768
769.. code-block:: c++
770
771 Value *VarExprAST::Codegen() {
772 std::vector<AllocaInst *> OldBindings;
773
774 Function *TheFunction = Builder.GetInsertBlock()->getParent();
775
776 // Register all variables and emit their initializer.
777 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
778 const std::string &VarName = VarNames[i].first;
779 ExprAST *Init = VarNames[i].second;
780
781Basically it loops over all the variables, installing them one at a
782time. For each variable we put into the symbol table, we remember the
783previous value that we replace in OldBindings.
784
785.. code-block:: c++
786
787 // Emit the initializer before adding the variable to scope, this prevents
788 // the initializer from referencing the variable itself, and permits stuff
789 // like this:
790 // var a = 1 in
791 // var a = a in ... # refers to outer 'a'.
792 Value *InitVal;
793 if (Init) {
794 InitVal = Init->Codegen();
795 if (InitVal == 0) return 0;
796 } else { // If not specified, use 0.0.
797 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
798 }
799
800 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
801 Builder.CreateStore(InitVal, Alloca);
802
803 // Remember the old variable binding so that we can restore the binding when
804 // we unrecurse.
805 OldBindings.push_back(NamedValues[VarName]);
806
807 // Remember this binding.
808 NamedValues[VarName] = Alloca;
809 }
810
811There are more comments here than code. The basic idea is that we emit
812the initializer, create the alloca, then update the symbol table to
813point to it. Once all the variables are installed in the symbol table,
814we evaluate the body of the var/in expression:
815
816.. code-block:: c++
817
818 // Codegen the body, now that all vars are in scope.
819 Value *BodyVal = Body->Codegen();
820 if (BodyVal == 0) return 0;
821
822Finally, before returning, we restore the previous variable bindings:
823
824.. code-block:: c++
825
826 // Pop all our variables from scope.
827 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
828 NamedValues[VarNames[i].first] = OldBindings[i];
829
830 // Return the body computation.
831 return BodyVal;
832 }
833
834The end result of all of this is that we get properly scoped variable
835definitions, and we even (trivially) allow mutation of them :).
836
837With this, we completed what we set out to do. Our nice iterative fib
838example from the intro compiles and runs just fine. The mem2reg pass
839optimizes all of our stack variables into SSA registers, inserting PHI
840nodes where needed, and our front-end remains simple: no "iterated
841dominance frontier" computation anywhere in sight.
842
843Full Code Listing
844=================
845
846Here is the complete code listing for our running example, enhanced with
847mutable variables and var/in support. To build this example, use:
848
849.. code-block:: bash
850
851 # Compile
852 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
853 # Run
854 ./toy
855
856Here is the code:
857
858.. code-block:: c++
859
860 #include "llvm/DerivedTypes.h"
861 #include "llvm/ExecutionEngine/ExecutionEngine.h"
862 #include "llvm/ExecutionEngine/JIT.h"
863 #include "llvm/IRBuilder.h"
864 #include "llvm/LLVMContext.h"
865 #include "llvm/Module.h"
866 #include "llvm/PassManager.h"
867 #include "llvm/Analysis/Verifier.h"
868 #include "llvm/Analysis/Passes.h"
869 #include "llvm/DataLayout.h"
870 #include "llvm/Transforms/Scalar.h"
871 #include "llvm/Support/TargetSelect.h"
872 #include <cstdio>
873 #include <string>
874 #include <map>
875 #include <vector>
876 using namespace llvm;
877
878 //===----------------------------------------------------------------------===//
879 // Lexer
880 //===----------------------------------------------------------------------===//
881
882 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
883 // of these for known things.
884 enum Token {
885 tok_eof = -1,
886
887 // commands
888 tok_def = -2, tok_extern = -3,
889
890 // primary
891 tok_identifier = -4, tok_number = -5,
892
893 // control
894 tok_if = -6, tok_then = -7, tok_else = -8,
895 tok_for = -9, tok_in = -10,
896
897 // operators
898 tok_binary = -11, tok_unary = -12,
899
900 // var definition
901 tok_var = -13
902 };
903
904 static std::string IdentifierStr; // Filled in if tok_identifier
905 static double NumVal; // Filled in if tok_number
906
907 /// gettok - Return the next token from standard input.
908 static int gettok() {
909 static int LastChar = ' ';
910
911 // Skip any whitespace.
912 while (isspace(LastChar))
913 LastChar = getchar();
914
915 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
916 IdentifierStr = LastChar;
917 while (isalnum((LastChar = getchar())))
918 IdentifierStr += LastChar;
919
920 if (IdentifierStr == "def") return tok_def;
921 if (IdentifierStr == "extern") return tok_extern;
922 if (IdentifierStr == "if") return tok_if;
923 if (IdentifierStr == "then") return tok_then;
924 if (IdentifierStr == "else") return tok_else;
925 if (IdentifierStr == "for") return tok_for;
926 if (IdentifierStr == "in") return tok_in;
927 if (IdentifierStr == "binary") return tok_binary;
928 if (IdentifierStr == "unary") return tok_unary;
929 if (IdentifierStr == "var") return tok_var;
930 return tok_identifier;
931 }
932
933 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
934 std::string NumStr;
935 do {
936 NumStr += LastChar;
937 LastChar = getchar();
938 } while (isdigit(LastChar) || LastChar == '.');
939
940 NumVal = strtod(NumStr.c_str(), 0);
941 return tok_number;
942 }
943
944 if (LastChar == '#') {
945 // Comment until end of line.
946 do LastChar = getchar();
947 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
948
949 if (LastChar != EOF)
950 return gettok();
951 }
952
953 // Check for end of file. Don't eat the EOF.
954 if (LastChar == EOF)
955 return tok_eof;
956
957 // Otherwise, just return the character as its ascii value.
958 int ThisChar = LastChar;
959 LastChar = getchar();
960 return ThisChar;
961 }
962
963 //===----------------------------------------------------------------------===//
964 // Abstract Syntax Tree (aka Parse Tree)
965 //===----------------------------------------------------------------------===//
966
967 /// ExprAST - Base class for all expression nodes.
968 class ExprAST {
969 public:
970 virtual ~ExprAST() {}
971 virtual Value *Codegen() = 0;
972 };
973
974 /// NumberExprAST - Expression class for numeric literals like "1.0".
975 class NumberExprAST : public ExprAST {
976 double Val;
977 public:
978 NumberExprAST(double val) : Val(val) {}
979 virtual Value *Codegen();
980 };
981
982 /// VariableExprAST - Expression class for referencing a variable, like "a".
983 class VariableExprAST : public ExprAST {
984 std::string Name;
985 public:
986 VariableExprAST(const std::string &name) : Name(name) {}
987 const std::string &getName() const { return Name; }
988 virtual Value *Codegen();
989 };
990
991 /// UnaryExprAST - Expression class for a unary operator.
992 class UnaryExprAST : public ExprAST {
993 char Opcode;
994 ExprAST *Operand;
995 public:
996 UnaryExprAST(char opcode, ExprAST *operand)
997 : Opcode(opcode), Operand(operand) {}
998 virtual Value *Codegen();
999 };
1000
1001 /// BinaryExprAST - Expression class for a binary operator.
1002 class BinaryExprAST : public ExprAST {
1003 char Op;
1004 ExprAST *LHS, *RHS;
1005 public:
1006 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1007 : Op(op), LHS(lhs), RHS(rhs) {}
1008 virtual Value *Codegen();
1009 };
1010
1011 /// CallExprAST - Expression class for function calls.
1012 class CallExprAST : public ExprAST {
1013 std::string Callee;
1014 std::vector<ExprAST*> Args;
1015 public:
1016 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
1017 : Callee(callee), Args(args) {}
1018 virtual Value *Codegen();
1019 };
1020
1021 /// IfExprAST - Expression class for if/then/else.
1022 class IfExprAST : public ExprAST {
1023 ExprAST *Cond, *Then, *Else;
1024 public:
1025 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1026 : Cond(cond), Then(then), Else(_else) {}
1027 virtual Value *Codegen();
1028 };
1029
1030 /// ForExprAST - Expression class for for/in.
1031 class ForExprAST : public ExprAST {
1032 std::string VarName;
1033 ExprAST *Start, *End, *Step, *Body;
1034 public:
1035 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
1036 ExprAST *step, ExprAST *body)
1037 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1038 virtual Value *Codegen();
1039 };
1040
1041 /// VarExprAST - Expression class for var/in
1042 class VarExprAST : public ExprAST {
1043 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1044 ExprAST *Body;
1045 public:
1046 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
1047 ExprAST *body)
1048 : VarNames(varnames), Body(body) {}
1049
1050 virtual Value *Codegen();
1051 };
1052
1053 /// PrototypeAST - This class represents the "prototype" for a function,
1054 /// which captures its name, and its argument names (thus implicitly the number
1055 /// of arguments the function takes), as well as if it is an operator.
1056 class PrototypeAST {
1057 std::string Name;
1058 std::vector<std::string> Args;
1059 bool isOperator;
1060 unsigned Precedence; // Precedence if a binary op.
1061 public:
1062 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
1063 bool isoperator = false, unsigned prec = 0)
1064 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1065
1066 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
1067 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
1068
1069 char getOperatorName() const {
1070 assert(isUnaryOp() || isBinaryOp());
1071 return Name[Name.size()-1];
1072 }
1073
1074 unsigned getBinaryPrecedence() const { return Precedence; }
1075
1076 Function *Codegen();
1077
1078 void CreateArgumentAllocas(Function *F);
1079 };
1080
1081 /// FunctionAST - This class represents a function definition itself.
1082 class FunctionAST {
1083 PrototypeAST *Proto;
1084 ExprAST *Body;
1085 public:
1086 FunctionAST(PrototypeAST *proto, ExprAST *body)
1087 : Proto(proto), Body(body) {}
1088
1089 Function *Codegen();
1090 };
1091
1092 //===----------------------------------------------------------------------===//
1093 // Parser
1094 //===----------------------------------------------------------------------===//
1095
1096 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1097 /// token the parser is looking at. getNextToken reads another token from the
1098 /// lexer and updates CurTok with its results.
1099 static int CurTok;
1100 static int getNextToken() {
1101 return CurTok = gettok();
1102 }
1103
1104 /// BinopPrecedence - This holds the precedence for each binary operator that is
1105 /// defined.
1106 static std::map<char, int> BinopPrecedence;
1107
1108 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1109 static int GetTokPrecedence() {
1110 if (!isascii(CurTok))
1111 return -1;
1112
1113 // Make sure it's a declared binop.
1114 int TokPrec = BinopPrecedence[CurTok];
1115 if (TokPrec <= 0) return -1;
1116 return TokPrec;
1117 }
1118
1119 /// Error* - These are little helper functions for error handling.
1120 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1121 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1122 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1123
1124 static ExprAST *ParseExpression();
1125
1126 /// identifierexpr
1127 /// ::= identifier
1128 /// ::= identifier '(' expression* ')'
1129 static ExprAST *ParseIdentifierExpr() {
1130 std::string IdName = IdentifierStr;
1131
1132 getNextToken(); // eat identifier.
1133
1134 if (CurTok != '(') // Simple variable ref.
1135 return new VariableExprAST(IdName);
1136
1137 // Call.
1138 getNextToken(); // eat (
1139 std::vector<ExprAST*> Args;
1140 if (CurTok != ')') {
1141 while (1) {
1142 ExprAST *Arg = ParseExpression();
1143 if (!Arg) return 0;
1144 Args.push_back(Arg);
1145
1146 if (CurTok == ')') break;
1147
1148 if (CurTok != ',')
1149 return Error("Expected ')' or ',' in argument list");
1150 getNextToken();
1151 }
1152 }
1153
1154 // Eat the ')'.
1155 getNextToken();
1156
1157 return new CallExprAST(IdName, Args);
1158 }
1159
1160 /// numberexpr ::= number
1161 static ExprAST *ParseNumberExpr() {
1162 ExprAST *Result = new NumberExprAST(NumVal);
1163 getNextToken(); // consume the number
1164 return Result;
1165 }
1166
1167 /// parenexpr ::= '(' expression ')'
1168 static ExprAST *ParseParenExpr() {
1169 getNextToken(); // eat (.
1170 ExprAST *V = ParseExpression();
1171 if (!V) return 0;
1172
1173 if (CurTok != ')')
1174 return Error("expected ')'");
1175 getNextToken(); // eat ).
1176 return V;
1177 }
1178
1179 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1180 static ExprAST *ParseIfExpr() {
1181 getNextToken(); // eat the if.
1182
1183 // condition.
1184 ExprAST *Cond = ParseExpression();
1185 if (!Cond) return 0;
1186
1187 if (CurTok != tok_then)
1188 return Error("expected then");
1189 getNextToken(); // eat the then
1190
1191 ExprAST *Then = ParseExpression();
1192 if (Then == 0) return 0;
1193
1194 if (CurTok != tok_else)
1195 return Error("expected else");
1196
1197 getNextToken();
1198
1199 ExprAST *Else = ParseExpression();
1200 if (!Else) return 0;
1201
1202 return new IfExprAST(Cond, Then, Else);
1203 }
1204
1205 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1206 static ExprAST *ParseForExpr() {
1207 getNextToken(); // eat the for.
1208
1209 if (CurTok != tok_identifier)
1210 return Error("expected identifier after for");
1211
1212 std::string IdName = IdentifierStr;
1213 getNextToken(); // eat identifier.
1214
1215 if (CurTok != '=')
1216 return Error("expected '=' after for");
1217 getNextToken(); // eat '='.
1218
1219
1220 ExprAST *Start = ParseExpression();
1221 if (Start == 0) return 0;
1222 if (CurTok != ',')
1223 return Error("expected ',' after for start value");
1224 getNextToken();
1225
1226 ExprAST *End = ParseExpression();
1227 if (End == 0) return 0;
1228
1229 // The step value is optional.
1230 ExprAST *Step = 0;
1231 if (CurTok == ',') {
1232 getNextToken();
1233 Step = ParseExpression();
1234 if (Step == 0) return 0;
1235 }
1236
1237 if (CurTok != tok_in)
1238 return Error("expected 'in' after for");
1239 getNextToken(); // eat 'in'.
1240
1241 ExprAST *Body = ParseExpression();
1242 if (Body == 0) return 0;
1243
1244 return new ForExprAST(IdName, Start, End, Step, Body);
1245 }
1246
1247 /// varexpr ::= 'var' identifier ('=' expression)?
1248 // (',' identifier ('=' expression)?)* 'in' expression
1249 static ExprAST *ParseVarExpr() {
1250 getNextToken(); // eat the var.
1251
1252 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1253
1254 // At least one variable name is required.
1255 if (CurTok != tok_identifier)
1256 return Error("expected identifier after var");
1257
1258 while (1) {
1259 std::string Name = IdentifierStr;
1260 getNextToken(); // eat identifier.
1261
1262 // Read the optional initializer.
1263 ExprAST *Init = 0;
1264 if (CurTok == '=') {
1265 getNextToken(); // eat the '='.
1266
1267 Init = ParseExpression();
1268 if (Init == 0) return 0;
1269 }
1270
1271 VarNames.push_back(std::make_pair(Name, Init));
1272
1273 // End of var list, exit loop.
1274 if (CurTok != ',') break;
1275 getNextToken(); // eat the ','.
1276
1277 if (CurTok != tok_identifier)
1278 return Error("expected identifier list after var");
1279 }
1280
1281 // At this point, we have to have 'in'.
1282 if (CurTok != tok_in)
1283 return Error("expected 'in' keyword after 'var'");
1284 getNextToken(); // eat 'in'.
1285
1286 ExprAST *Body = ParseExpression();
1287 if (Body == 0) return 0;
1288
1289 return new VarExprAST(VarNames, Body);
1290 }
1291
1292 /// primary
1293 /// ::= identifierexpr
1294 /// ::= numberexpr
1295 /// ::= parenexpr
1296 /// ::= ifexpr
1297 /// ::= forexpr
1298 /// ::= varexpr
1299 static ExprAST *ParsePrimary() {
1300 switch (CurTok) {
1301 default: return Error("unknown token when expecting an expression");
1302 case tok_identifier: return ParseIdentifierExpr();
1303 case tok_number: return ParseNumberExpr();
1304 case '(': return ParseParenExpr();
1305 case tok_if: return ParseIfExpr();
1306 case tok_for: return ParseForExpr();
1307 case tok_var: return ParseVarExpr();
1308 }
1309 }
1310
1311 /// unary
1312 /// ::= primary
1313 /// ::= '!' unary
1314 static ExprAST *ParseUnary() {
1315 // If the current token is not an operator, it must be a primary expr.
1316 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1317 return ParsePrimary();
1318
1319 // If this is a unary operator, read it.
1320 int Opc = CurTok;
1321 getNextToken();
1322 if (ExprAST *Operand = ParseUnary())
1323 return new UnaryExprAST(Opc, Operand);
1324 return 0;
1325 }
1326
1327 /// binoprhs
1328 /// ::= ('+' unary)*
1329 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1330 // If this is a binop, find its precedence.
1331 while (1) {
1332 int TokPrec = GetTokPrecedence();
1333
1334 // If this is a binop that binds at least as tightly as the current binop,
1335 // consume it, otherwise we are done.
1336 if (TokPrec < ExprPrec)
1337 return LHS;
1338
1339 // Okay, we know this is a binop.
1340 int BinOp = CurTok;
1341 getNextToken(); // eat binop
1342
1343 // Parse the unary expression after the binary operator.
1344 ExprAST *RHS = ParseUnary();
1345 if (!RHS) return 0;
1346
1347 // If BinOp binds less tightly with RHS than the operator after RHS, let
1348 // the pending operator take RHS as its LHS.
1349 int NextPrec = GetTokPrecedence();
1350 if (TokPrec < NextPrec) {
1351 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1352 if (RHS == 0) return 0;
1353 }
1354
1355 // Merge LHS/RHS.
1356 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1357 }
1358 }
1359
1360 /// expression
1361 /// ::= unary binoprhs
1362 ///
1363 static ExprAST *ParseExpression() {
1364 ExprAST *LHS = ParseUnary();
1365 if (!LHS) return 0;
1366
1367 return ParseBinOpRHS(0, LHS);
1368 }
1369
1370 /// prototype
1371 /// ::= id '(' id* ')'
1372 /// ::= binary LETTER number? (id, id)
1373 /// ::= unary LETTER (id)
1374 static PrototypeAST *ParsePrototype() {
1375 std::string FnName;
1376
1377 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1378 unsigned BinaryPrecedence = 30;
1379
1380 switch (CurTok) {
1381 default:
1382 return ErrorP("Expected function name in prototype");
1383 case tok_identifier:
1384 FnName = IdentifierStr;
1385 Kind = 0;
1386 getNextToken();
1387 break;
1388 case tok_unary:
1389 getNextToken();
1390 if (!isascii(CurTok))
1391 return ErrorP("Expected unary operator");
1392 FnName = "unary";
1393 FnName += (char)CurTok;
1394 Kind = 1;
1395 getNextToken();
1396 break;
1397 case tok_binary:
1398 getNextToken();
1399 if (!isascii(CurTok))
1400 return ErrorP("Expected binary operator");
1401 FnName = "binary";
1402 FnName += (char)CurTok;
1403 Kind = 2;
1404 getNextToken();
1405
1406 // Read the precedence if present.
1407 if (CurTok == tok_number) {
1408 if (NumVal < 1 || NumVal > 100)
1409 return ErrorP("Invalid precedecnce: must be 1..100");
1410 BinaryPrecedence = (unsigned)NumVal;
1411 getNextToken();
1412 }
1413 break;
1414 }
1415
1416 if (CurTok != '(')
1417 return ErrorP("Expected '(' in prototype");
1418
1419 std::vector<std::string> ArgNames;
1420 while (getNextToken() == tok_identifier)
1421 ArgNames.push_back(IdentifierStr);
1422 if (CurTok != ')')
1423 return ErrorP("Expected ')' in prototype");
1424
1425 // success.
1426 getNextToken(); // eat ')'.
1427
1428 // Verify right number of names for operator.
1429 if (Kind && ArgNames.size() != Kind)
1430 return ErrorP("Invalid number of operands for operator");
1431
1432 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1433 }
1434
1435 /// definition ::= 'def' prototype expression
1436 static FunctionAST *ParseDefinition() {
1437 getNextToken(); // eat def.
1438 PrototypeAST *Proto = ParsePrototype();
1439 if (Proto == 0) return 0;
1440
1441 if (ExprAST *E = ParseExpression())
1442 return new FunctionAST(Proto, E);
1443 return 0;
1444 }
1445
1446 /// toplevelexpr ::= expression
1447 static FunctionAST *ParseTopLevelExpr() {
1448 if (ExprAST *E = ParseExpression()) {
1449 // Make an anonymous proto.
1450 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1451 return new FunctionAST(Proto, E);
1452 }
1453 return 0;
1454 }
1455
1456 /// external ::= 'extern' prototype
1457 static PrototypeAST *ParseExtern() {
1458 getNextToken(); // eat extern.
1459 return ParsePrototype();
1460 }
1461
1462 //===----------------------------------------------------------------------===//
1463 // Code Generation
1464 //===----------------------------------------------------------------------===//
1465
1466 static Module *TheModule;
1467 static IRBuilder<> Builder(getGlobalContext());
1468 static std::map<std::string, AllocaInst*> NamedValues;
1469 static FunctionPassManager *TheFPM;
1470
1471 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1472
1473 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1474 /// the function. This is used for mutable variables etc.
1475 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1476 const std::string &VarName) {
1477 IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
1478 TheFunction->getEntryBlock().begin());
1479 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
1480 VarName.c_str());
1481 }
1482
1483 Value *NumberExprAST::Codegen() {
1484 return ConstantFP::get(getGlobalContext(), APFloat(Val));
1485 }
1486
1487 Value *VariableExprAST::Codegen() {
1488 // Look this variable up in the function.
1489 Value *V = NamedValues[Name];
1490 if (V == 0) return ErrorV("Unknown variable name");
1491
1492 // Load the value.
1493 return Builder.CreateLoad(V, Name.c_str());
1494 }
1495
1496 Value *UnaryExprAST::Codegen() {
1497 Value *OperandV = Operand->Codegen();
1498 if (OperandV == 0) return 0;
1499
1500 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
1501 if (F == 0)
1502 return ErrorV("Unknown unary operator");
1503
1504 return Builder.CreateCall(F, OperandV, "unop");
1505 }
1506
1507 Value *BinaryExprAST::Codegen() {
1508 // Special case '=' because we don't want to emit the LHS as an expression.
1509 if (Op == '=') {
1510 // Assignment requires the LHS to be an identifier.
1511 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
1512 if (!LHSE)
1513 return ErrorV("destination of '=' must be a variable");
1514 // Codegen the RHS.
1515 Value *Val = RHS->Codegen();
1516 if (Val == 0) return 0;
1517
1518 // Look up the name.
1519 Value *Variable = NamedValues[LHSE->getName()];
1520 if (Variable == 0) return ErrorV("Unknown variable name");
1521
1522 Builder.CreateStore(Val, Variable);
1523 return Val;
1524 }
1525
1526 Value *L = LHS->Codegen();
1527 Value *R = RHS->Codegen();
1528 if (L == 0 || R == 0) return 0;
1529
1530 switch (Op) {
1531 case '+': return Builder.CreateFAdd(L, R, "addtmp");
1532 case '-': return Builder.CreateFSub(L, R, "subtmp");
1533 case '*': return Builder.CreateFMul(L, R, "multmp");
1534 case '<':
1535 L = Builder.CreateFCmpULT(L, R, "cmptmp");
1536 // Convert bool 0/1 to double 0.0 or 1.0
1537 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
1538 "booltmp");
1539 default: break;
1540 }
1541
1542 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1543 // a call to it.
1544 Function *F = TheModule->getFunction(std::string("binary")+Op);
1545 assert(F && "binary operator not found!");
1546
1547 Value *Ops[2] = { L, R };
1548 return Builder.CreateCall(F, Ops, "binop");
1549 }
1550
1551 Value *CallExprAST::Codegen() {
1552 // Look up the name in the global module table.
1553 Function *CalleeF = TheModule->getFunction(Callee);
1554 if (CalleeF == 0)
1555 return ErrorV("Unknown function referenced");
1556
1557 // If argument mismatch error.
1558 if (CalleeF->arg_size() != Args.size())
1559 return ErrorV("Incorrect # arguments passed");
1560
1561 std::vector<Value*> ArgsV;
1562 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1563 ArgsV.push_back(Args[i]->Codegen());
1564 if (ArgsV.back() == 0) return 0;
1565 }
1566
1567 return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
1568 }
1569
1570 Value *IfExprAST::Codegen() {
1571 Value *CondV = Cond->Codegen();
1572 if (CondV == 0) return 0;
1573
1574 // Convert condition to a bool by comparing equal to 0.0.
1575 CondV = Builder.CreateFCmpONE(CondV,
1576 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1577 "ifcond");
1578
1579 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1580
1581 // Create blocks for the then and else cases. Insert the 'then' block at the
1582 // end of the function.
1583 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
1584 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
1585 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
1586
1587 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1588
1589 // Emit then value.
1590 Builder.SetInsertPoint(ThenBB);
1591
1592 Value *ThenV = Then->Codegen();
1593 if (ThenV == 0) return 0;
1594
1595 Builder.CreateBr(MergeBB);
1596 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1597 ThenBB = Builder.GetInsertBlock();
1598
1599 // Emit else block.
1600 TheFunction->getBasicBlockList().push_back(ElseBB);
1601 Builder.SetInsertPoint(ElseBB);
1602
1603 Value *ElseV = Else->Codegen();
1604 if (ElseV == 0) return 0;
1605
1606 Builder.CreateBr(MergeBB);
1607 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1608 ElseBB = Builder.GetInsertBlock();
1609
1610 // Emit merge block.
1611 TheFunction->getBasicBlockList().push_back(MergeBB);
1612 Builder.SetInsertPoint(MergeBB);
1613 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2,
1614 "iftmp");
1615
1616 PN->addIncoming(ThenV, ThenBB);
1617 PN->addIncoming(ElseV, ElseBB);
1618 return PN;
1619 }
1620
1621 Value *ForExprAST::Codegen() {
1622 // Output this as:
1623 // var = alloca double
1624 // ...
1625 // start = startexpr
1626 // store start -> var
1627 // goto loop
1628 // loop:
1629 // ...
1630 // bodyexpr
1631 // ...
1632 // loopend:
1633 // step = stepexpr
1634 // endcond = endexpr
1635 //
1636 // curvar = load var
1637 // nextvar = curvar + step
1638 // store nextvar -> var
1639 // br endcond, loop, endloop
1640 // outloop:
1641
1642 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1643
1644 // Create an alloca for the variable in the entry block.
1645 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1646
1647 // Emit the start code first, without 'variable' in scope.
1648 Value *StartVal = Start->Codegen();
1649 if (StartVal == 0) return 0;
1650
1651 // Store the value into the alloca.
1652 Builder.CreateStore(StartVal, Alloca);
1653
1654 // Make the new basic block for the loop header, inserting after current
1655 // block.
1656 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
1657
1658 // Insert an explicit fall through from the current block to the LoopBB.
1659 Builder.CreateBr(LoopBB);
1660
1661 // Start insertion in LoopBB.
1662 Builder.SetInsertPoint(LoopBB);
1663
1664 // Within the loop, the variable is defined equal to the PHI node. If it
1665 // shadows an existing variable, we have to restore it, so save it now.
1666 AllocaInst *OldVal = NamedValues[VarName];
1667 NamedValues[VarName] = Alloca;
1668
1669 // Emit the body of the loop. This, like any other expr, can change the
1670 // current BB. Note that we ignore the value computed by the body, but don't
1671 // allow an error.
1672 if (Body->Codegen() == 0)
1673 return 0;
1674
1675 // Emit the step value.
1676 Value *StepVal;
1677 if (Step) {
1678 StepVal = Step->Codegen();
1679 if (StepVal == 0) return 0;
1680 } else {
1681 // If not specified, use 1.0.
1682 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
1683 }
1684
1685 // Compute the end condition.
1686 Value *EndCond = End->Codegen();
1687 if (EndCond == 0) return EndCond;
1688
1689 // Reload, increment, and restore the alloca. This handles the case where
1690 // the body of the loop mutates the variable.
1691 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1692 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
1693 Builder.CreateStore(NextVar, Alloca);
1694
1695 // Convert condition to a bool by comparing equal to 0.0.
1696 EndCond = Builder.CreateFCmpONE(EndCond,
1697 ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1698 "loopcond");
1699
1700 // Create the "after loop" block and insert it.
1701 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
1702
1703 // Insert the conditional branch into the end of LoopEndBB.
1704 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1705
1706 // Any new code will be inserted in AfterBB.
1707 Builder.SetInsertPoint(AfterBB);
1708
1709 // Restore the unshadowed variable.
1710 if (OldVal)
1711 NamedValues[VarName] = OldVal;
1712 else
1713 NamedValues.erase(VarName);
1714
1715
1716 // for expr always returns 0.0.
1717 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
1718 }
1719
1720 Value *VarExprAST::Codegen() {
1721 std::vector<AllocaInst *> OldBindings;
1722
1723 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1724
1725 // Register all variables and emit their initializer.
1726 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1727 const std::string &VarName = VarNames[i].first;
1728 ExprAST *Init = VarNames[i].second;
1729
1730 // Emit the initializer before adding the variable to scope, this prevents
1731 // the initializer from referencing the variable itself, and permits stuff
1732 // like this:
1733 // var a = 1 in
1734 // var a = a in ... # refers to outer 'a'.
1735 Value *InitVal;
1736 if (Init) {
1737 InitVal = Init->Codegen();
1738 if (InitVal == 0) return 0;
1739 } else { // If not specified, use 0.0.
1740 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
1741 }
1742
1743 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1744 Builder.CreateStore(InitVal, Alloca);
1745
1746 // Remember the old variable binding so that we can restore the binding when
1747 // we unrecurse.
1748 OldBindings.push_back(NamedValues[VarName]);
1749
1750 // Remember this binding.
1751 NamedValues[VarName] = Alloca;
1752 }
1753
1754 // Codegen the body, now that all vars are in scope.
1755 Value *BodyVal = Body->Codegen();
1756 if (BodyVal == 0) return 0;
1757
1758 // Pop all our variables from scope.
1759 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1760 NamedValues[VarNames[i].first] = OldBindings[i];
1761
1762 // Return the body computation.
1763 return BodyVal;
1764 }
1765
1766 Function *PrototypeAST::Codegen() {
1767 // Make the function type: double(double,double) etc.
1768 std::vector<Type*> Doubles(Args.size(),
1769 Type::getDoubleTy(getGlobalContext()));
1770 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1771 Doubles, false);
1772
1773 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1774
1775 // If F conflicted, there was already something named 'Name'. If it has a
1776 // body, don't allow redefinition or reextern.
1777 if (F->getName() != Name) {
1778 // Delete the one we just made and get the existing one.
1779 F->eraseFromParent();
1780 F = TheModule->getFunction(Name);
1781
1782 // If F already has a body, reject this.
1783 if (!F->empty()) {
1784 ErrorF("redefinition of function");
1785 return 0;
1786 }
1787
1788 // If F took a different number of args, reject.
1789 if (F->arg_size() != Args.size()) {
1790 ErrorF("redefinition of function with different # args");
1791 return 0;
1792 }
1793 }
1794
1795 // Set names for all arguments.
1796 unsigned Idx = 0;
1797 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1798 ++AI, ++Idx)
1799 AI->setName(Args[Idx]);
1800
1801 return F;
1802 }
1803
1804 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1805 /// argument in the symbol table so that references to it will succeed.
1806 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1807 Function::arg_iterator AI = F->arg_begin();
1808 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1809 // Create an alloca for this variable.
1810 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1811
1812 // Store the initial value into the alloca.
1813 Builder.CreateStore(AI, Alloca);
1814
1815 // Add arguments to variable symbol table.
1816 NamedValues[Args[Idx]] = Alloca;
1817 }
1818 }
1819
1820 Function *FunctionAST::Codegen() {
1821 NamedValues.clear();
1822
1823 Function *TheFunction = Proto->Codegen();
1824 if (TheFunction == 0)
1825 return 0;
1826
1827 // If this is an operator, install it.
1828 if (Proto->isBinaryOp())
1829 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1830
1831 // Create a new basic block to start insertion into.
1832 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1833 Builder.SetInsertPoint(BB);
1834
1835 // Add all arguments to the symbol table and create their allocas.
1836 Proto->CreateArgumentAllocas(TheFunction);
1837
1838 if (Value *RetVal = Body->Codegen()) {
1839 // Finish off the function.
1840 Builder.CreateRet(RetVal);
1841
1842 // Validate the generated code, checking for consistency.
1843 verifyFunction(*TheFunction);
1844
1845 // Optimize the function.
1846 TheFPM->run(*TheFunction);
1847
1848 return TheFunction;
1849 }
1850
1851 // Error reading body, remove function.
1852 TheFunction->eraseFromParent();
1853
1854 if (Proto->isBinaryOp())
1855 BinopPrecedence.erase(Proto->getOperatorName());
1856 return 0;
1857 }
1858
1859 //===----------------------------------------------------------------------===//
1860 // Top-Level parsing and JIT Driver
1861 //===----------------------------------------------------------------------===//
1862
1863 static ExecutionEngine *TheExecutionEngine;
1864
1865 static void HandleDefinition() {
1866 if (FunctionAST *F = ParseDefinition()) {
1867 if (Function *LF = F->Codegen()) {
1868 fprintf(stderr, "Read function definition:");
1869 LF->dump();
1870 }
1871 } else {
1872 // Skip token for error recovery.
1873 getNextToken();
1874 }
1875 }
1876
1877 static void HandleExtern() {
1878 if (PrototypeAST *P = ParseExtern()) {
1879 if (Function *F = P->Codegen()) {
1880 fprintf(stderr, "Read extern: ");
1881 F->dump();
1882 }
1883 } else {
1884 // Skip token for error recovery.
1885 getNextToken();
1886 }
1887 }
1888
1889 static void HandleTopLevelExpression() {
1890 // Evaluate a top-level expression into an anonymous function.
1891 if (FunctionAST *F = ParseTopLevelExpr()) {
1892 if (Function *LF = F->Codegen()) {
1893 // JIT the function, returning a function pointer.
1894 void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
1895
1896 // Cast it to the right type (takes no arguments, returns a double) so we
1897 // can call it as a native function.
1898 double (*FP)() = (double (*)())(intptr_t)FPtr;
1899 fprintf(stderr, "Evaluated to %f\n", FP());
1900 }
1901 } else {
1902 // Skip token for error recovery.
1903 getNextToken();
1904 }
1905 }
1906
1907 /// top ::= definition | external | expression | ';'
1908 static void MainLoop() {
1909 while (1) {
1910 fprintf(stderr, "ready> ");
1911 switch (CurTok) {
1912 case tok_eof: return;
1913 case ';': getNextToken(); break; // ignore top-level semicolons.
1914 case tok_def: HandleDefinition(); break;
1915 case tok_extern: HandleExtern(); break;
1916 default: HandleTopLevelExpression(); break;
1917 }
1918 }
1919 }
1920
1921 //===----------------------------------------------------------------------===//
1922 // "Library" functions that can be "extern'd" from user code.
1923 //===----------------------------------------------------------------------===//
1924
1925 /// putchard - putchar that takes a double and returns 0.
1926 extern "C"
1927 double putchard(double X) {
1928 putchar((char)X);
1929 return 0;
1930 }
1931
1932 /// printd - printf that takes a double prints it as "%f\n", returning 0.
1933 extern "C"
1934 double printd(double X) {
1935 printf("%f\n", X);
1936 return 0;
1937 }
1938
1939 //===----------------------------------------------------------------------===//
1940 // Main driver code.
1941 //===----------------------------------------------------------------------===//
1942
1943 int main() {
1944 InitializeNativeTarget();
1945 LLVMContext &Context = getGlobalContext();
1946
1947 // Install standard binary operators.
1948 // 1 is lowest precedence.
1949 BinopPrecedence['='] = 2;
1950 BinopPrecedence['<'] = 10;
1951 BinopPrecedence['+'] = 20;
1952 BinopPrecedence['-'] = 20;
1953 BinopPrecedence['*'] = 40; // highest.
1954
1955 // Prime the first token.
1956 fprintf(stderr, "ready> ");
1957 getNextToken();
1958
1959 // Make the module, which holds all the code.
1960 TheModule = new Module("my cool jit", Context);
1961
1962 // Create the JIT. This takes ownership of the module.
1963 std::string ErrStr;
1964 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
1965 if (!TheExecutionEngine) {
1966 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
1967 exit(1);
1968 }
1969
1970 FunctionPassManager OurFPM(TheModule);
1971
1972 // Set up the optimizer pipeline. Start with registering info about how the
1973 // target lays out data structures.
1974 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
1975 // Provide basic AliasAnalysis support for GVN.
1976 OurFPM.add(createBasicAliasAnalysisPass());
1977 // Promote allocas to registers.
1978 OurFPM.add(createPromoteMemoryToRegisterPass());
1979 // Do simple "peephole" optimizations and bit-twiddling optzns.
1980 OurFPM.add(createInstructionCombiningPass());
1981 // Reassociate expressions.
1982 OurFPM.add(createReassociatePass());
1983 // Eliminate Common SubExpressions.
1984 OurFPM.add(createGVNPass());
1985 // Simplify the control flow graph (deleting unreachable blocks, etc).
1986 OurFPM.add(createCFGSimplificationPass());
1987
1988 OurFPM.doInitialization();
1989
1990 // Set the global so the code gen can use this.
1991 TheFPM = &OurFPM;
1992
1993 // Run the main "interpreter loop" now.
1994 MainLoop();
1995
1996 TheFPM = 0;
1997
1998 // Print out all of the generated code.
1999 TheModule->dump();
2000
2001 return 0;
2002 }
2003
2004`Next: Conclusion and other useful LLVM tidbits <LangImpl8.html>`_
2005