blob: b1f237c2c4a4c9371b21005009e5a69bf03d531b [file] [log] [blame]
Chris Lattner2e902042007-10-22 07:01:42 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
15
16<div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
18</div>
19
20<!-- *********************************************************************** -->
21<div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
22<!-- *********************************************************************** -->
23
24<div class="doc_text">
25
26<p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
27LLVM</a>" tutorial. This chapter shows you how to transform the <a
28href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
29This will teach you a little bit about how LLVM does things, as well as
30demonstrate how easy it is to use. It's much more work to build a lexer and
31parser than it is to generate LLVM IR code.
32</p>
33
34</div>
35
36<!-- *********************************************************************** -->
37<div class="doc_section"><a name="basics">Code Generation setup</a></div>
38<!-- *********************************************************************** -->
39
40<div class="doc_text">
41
42<p>
43In order to generate LLVM IR, we want some simple setup to get started. First,
44we define virtual codegen methods in each AST class:</p>
45
46<div class="doc_code">
47<pre>
48/// ExprAST - Base class for all expression nodes.
49class ExprAST {
50public:
51 virtual ~ExprAST() {}
52 virtual Value *Codegen() = 0;
53};
54
55/// NumberExprAST - Expression class for numeric literals like "1.0".
56class NumberExprAST : public ExprAST {
57 double Val;
58public:
Chris Lattner28571ed2007-10-23 04:27:44 +000059 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +000060 virtual Value *Codegen();
61};
62...
63</pre>
64</div>
65
Chris Lattner28571ed2007-10-23 04:27:44 +000066<p>The Codegen() method says to emit IR for that AST node and all things it
67depends on, and they all return an LLVM Value object.
68"Value" is the class used to represent a "<a
69href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
70Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
71of SSA values is that their value is computed as the related instruction
72executes, and it does not get a new value until (and if) the instruction
73re-executes. In order words, there is no way to "change" an SSA value. For
74more information, please read up on <a
75href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
76Assignment</a> - the concepts are really quite natural once you grok them.</p>
77
78<p>The
Chris Lattner2e902042007-10-22 07:01:42 +000079second thing we want is an "Error" method like we used for parser, which will
80be used to report errors found during code generation (for example, use of an
81undeclared parameter):</p>
82
83<div class="doc_code">
84<pre>
85Value *ErrorV(const char *Str) { Error(Str); return 0; }
86
87static Module *TheModule;
88static LLVMBuilder Builder;
89static std::map&lt;std::string, Value*&gt; NamedValues;
90</pre>
91</div>
92
93<p>The static variables will be used during code generation. <tt>TheModule</tt>
94is the LLVM construct that contains all of the functions and global variables in
95a chunk of code. In many ways, it is the top-level structure that the LLVM IR
96uses to contain code.</p>
97
98<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
99LLVM instructions. The <tt>Builder</tt> keeps track of the current place to
100insert instructions and has methods to create new instructions.</p>
101
102<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
103current scope and what their LLVM representation is. In this form of
104Kaleidoscope, the only things that can be referenced are function parameters.
105As such, function parameters will be in this map when generating code for their
106function body.</p>
107
108<p>
109With these basics in place, we can start talking about how to generate code for
110each expression. Note that this assumes that the <tt>Builder</tt> has been set
111up to generate code <em>into</em> something. For now, we'll assume that this
112has already been done, and we'll just use it to emit code.
113</p>
114
115</div>
116
117<!-- *********************************************************************** -->
118<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
119<!-- *********************************************************************** -->
120
121<div class="doc_text">
122
123<p>Generating LLVM code for expression nodes is very straight-forward: less
124than 45 lines of commented code for all four of our expression nodes. First,
125we'll do numeric literals:</p>
126
127<div class="doc_code">
128<pre>
129Value *NumberExprAST::Codegen() {
130 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
131}
132</pre>
133</div>
134
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000135<p>In the LLVM IR, numeric constants are represented with the
136<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
137internally (<tt>APFloat</tt> has the capability of holding floating point
138constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
139creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
Chris Lattner2e902042007-10-22 07:01:42 +0000140that constants are all uniqued together and shared. For this reason, the API
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000141uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000142
143<div class="doc_code">
144<pre>
145Value *VariableExprAST::Codegen() {
146 // Look this variable up in the function.
147 Value *V = NamedValues[Name];
148 return V ? V : ErrorV("Unknown variable name");
149}
150</pre>
151</div>
152
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000153<p>References to variables is also quite simple here. In the simple version
154of Kaleidoscope, we assume that the variable has already been emited somewhere
155and its value is available. In practice, the only values that can be in the
156<tt>NamedValues</tt> map are function arguments. This
Chris Lattner2e902042007-10-22 07:01:42 +0000157code simply checks to see that the specified name is in the map (if not, an
158unknown variable is being referenced) and returns the value for it.</p>
159
160<div class="doc_code">
161<pre>
162Value *BinaryExprAST::Codegen() {
163 Value *L = LHS-&gt;Codegen();
164 Value *R = RHS-&gt;Codegen();
165 if (L == 0 || R == 0) return 0;
166
167 switch (Op) {
168 case '+': return Builder.CreateAdd(L, R, "addtmp");
169 case '-': return Builder.CreateSub(L, R, "subtmp");
170 case '*': return Builder.CreateMul(L, R, "multmp");
171 case '&lt;':
172 L = Builder.CreateFCmpULT(L, R, "multmp");
173 // Convert bool 0/1 to double 0.0 or 1.0
174 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
175 default: return ErrorV("invalid binary operator");
176 }
177}
178</pre>
179</div>
180
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000181<p>Binary operators start to get more interesting. The basic idea here is that
182we recursively emit code for the left-hand side of the expression, then the
183right-hand side, then we compute the result of the binary expression. In this
184code, we do a simple switch on the opcode to create the right LLVM instruction.
185</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000186
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000187<p>In this example, the LLVM builder class is starting to show its value.
188Because it knows where to insert the newly created instruction, you just have to
189specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
190operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
191for the generated instruction. One nice thing about LLVM is that the name is
192just a hint: if there are multiple additions in a single function, the first
193will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
194giving it a name like "addtmp42". Local value names for instructions are purely
195optional, but it makes it much easier to read the IR dumps.</p>
196
197<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
198have very strict type properties: for example, the Left and Right operators of
199an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
200type, and that the result of the add matches the operands. Because all values
201in Kaleidoscope are doubles, this makes for very simple code for add, sub and
202mul.</p>
203
204<p>On the other hand, LLVM specifies that the <a
205href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
206(a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
207value. In order to get these semantics, we combine the fcmp instruction with
208a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
209converts its input integer into a floating point value by treating the input
210as an unsigned value. In contrast, if we used the <a
211href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
212operator would return 0.0 and -1.0, depending on the input value.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000213
214<div class="doc_code">
215<pre>
216Value *CallExprAST::Codegen() {
217 // Look up the name in the global module table.
218 Function *CalleeF = TheModule-&gt;getFunction(Callee);
219 if (CalleeF == 0)
220 return ErrorV("Unknown function referenced");
221
222 // If argument mismatch error.
223 if (CalleeF-&gt;arg_size() != Args.size())
224 return ErrorV("Incorrect # arguments passed");
225
226 std::vector&lt;Value*&gt; ArgsV;
227 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
228 ArgsV.push_back(Args[i]-&gt;Codegen());
229 if (ArgsV.back() == 0) return 0;
230 }
231
232 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
233}
234</pre>
235</div>
236
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000237<p>Code generation for function calls is quite straight-forward with LLVM. The
238code above first looks the name of the function up in the LLVM Module's symbol
239table. Recall that the LLVM Module is the container that holds all of the
240functions we are JIT'ing. By giving each function the same name as what the
241user specifies, we can use the LLVM symbol table to resolve function names for
242us.</p>
243
244<p>Once we have the function to call, we recursively codegen each argument that
245is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
246instruction</a>. Note that LLVM uses the native C calling conventions by
247default, allowing these calls to call into standard library functions like
248"sin" and "cos" with no additional effort.</p>
249
250<p>This wraps up our handling of the four basic expressions that we have so far
251in Kaleidoscope. Feel free to go in and add some more. For example, by
252browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
253several other interesting instructions that are really easy to plug into our
254basic framework.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000255
256</div>
257
258<!-- *********************************************************************** -->
Chris Lattner35abbf52007-10-23 06:23:57 +0000259<div class="doc_section"><a name="funcs">Function Code Generation</a></div>
Chris Lattner2e902042007-10-22 07:01:42 +0000260<!-- *********************************************************************** -->
261
262<div class="doc_text">
263
Chris Lattner35abbf52007-10-23 06:23:57 +0000264<p>Code generation for prototypes and functions has to handle a number of
265details, which make their code less beautiful and elegant than expression code
266generation, but they illustrate some important points. First, lets talk about
267code generation for prototypes: this is used both for function bodies as well
268as external function declarations. The code starts with:</p>
269
270<div class="doc_code">
271<pre>
272Function *PrototypeAST::Codegen() {
273 // Make the function type: double(double,double) etc.
274 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
275 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
276
277 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
278</pre>
279</div>
280
281<p>This code packs a lot of power into a few lines. The first step is to create
282the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
283function arguments in Kaleidoscope are of type double, the first line creates
284a vector of "N" LLVM Double types. It then uses the <tt>FunctionType::get</tt>
285method to create a function type that takes "N" doubles as arguments, returns
286one double as a result, and that is not vararg (the false parameter indicates
287this). Note that Types in LLVM are uniqued just like Constants are, so you
288don't "new" a type, you "get" it.</p>
289
290<p>The final line above actually creates the function that the prototype will
291correspond to. This indicates which type, linkage, and name to use, and which
292module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
293means that the function may be defined outside the current module and/or that it
294is callable by functions outside the module. The Name passed in is the name the
295user specified: since "<tt>TheModule</tt>" is specified, this name is registered
296in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
297above.</p>
298
299<div class="doc_code">
300<pre>
301 // If F conflicted, there was already something named 'Name'. If it has a
302 // body, don't allow redefinition or reextern.
303 if (F-&gt;getName() != Name) {
304 // Delete the one we just made and get the existing one.
305 F-&gt;eraseFromParent();
306 F = TheModule-&gt;getFunction(Name);
307</pre>
308</div>
309
310<p>The Module symbol table works just like the Function symbol table when it
311comes to name conflicts: if a new function is created with a name was previously
312added to the symbol table, it will get implicitly renamed when added to the
313Module. The code above exploits this fact to tell if there was a previous
314definition of this function.</p>
315
316<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
317first, we want to allow 'extern'ing a function more than once, so long as the
318prototypes for the externs match (since all arguments have the same type, we
319just have to check that the number of arguments match). Second, we want to
320allow 'extern'ing a function and then definining a body for it. This is useful
321when defining mutually recursive functions.</p>
322
323<p>In order to implement this, the code above first checks to see if there is
324a collision on the name of the function. If so, it deletes the function we just
325created (by calling <tt>eraseFromParent</tt>) and then calling
326<tt>getFunction</tt> to get the existing function with the specified name. Note
327that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
328unlinks the object from its parent (e.g. a Function from a Module) and returns
329it. The "erase" form unlinks the object and then deletes it.</p>
330
331<div class="doc_code">
332<pre>
333 // If F already has a body, reject this.
334 if (!F-&gt;empty()) {
335 ErrorF("redefinition of function");
336 return 0;
337 }
338
339 // If F took a different number of args, reject.
340 if (F-&gt;arg_size() != Args.size()) {
341 ErrorF("redefinition of function with different # args");
342 return 0;
343 }
344 }
345</pre>
346</div>
347
348<p>In order to verify the logic above, we first check to see if the preexisting
349function is "empty". In this case, empty means that it has no basic blocks in
350it, which means it has no body. If it has no body, this means its a forward
351declaration. Since we don't allow anything after a full definition of the
352function, the code rejects this case. If the previous reference to a function
353was an 'extern', we simply verify that the number of arguments for that
354definition and this one match up. If not, we emit an error.</p>
355
356<div class="doc_code">
357<pre>
358 // Set names for all arguments.
359 unsigned Idx = 0;
360 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
361 ++AI, ++Idx) {
362 AI-&gt;setName(Args[Idx]);
363
364 // Add arguments to variable symbol table.
365 NamedValues[Args[Idx]] = AI;
366 }
367 return F;
368}
369</pre>
370</div>
371
372<p>The last bit of code for prototypes loops over all of the arguments in the
373function, setting the name of the LLVM Argument objects to match and registering
374the arguments in the <tt>NamedValues</tt> map for future use by the
375<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
376object to the caller. Note that we don't check for conflicting
377argument names here (e.g. "extern foo(a b a)"). Doing so would be very
378straight-forward.</p>
379
380<div class="doc_code">
381<pre>
382Function *FunctionAST::Codegen() {
383 NamedValues.clear();
384
385 Function *TheFunction = Proto-&gt;Codegen();
386 if (TheFunction == 0)
387 return 0;
388</pre>
389</div>
390
391<p>Code generation for function definitions starts out simply enough: first we
392codegen the prototype and verify that it is ok. We also clear out the
393<tt>NamedValues</tt> map to make sure that there isn't anything in it from the
394last function we compiled.</p>
395
396<div class="doc_code">
397<pre>
398 // Create a new basic block to start insertion into.
399 BasicBlock *BB = new BasicBlock("entry", TheFunction);
400 Builder.SetInsertPoint(BB);
401
402 if (Value *RetVal = Body-&gt;Codegen()) {
403 // Finish off the function.
404 Builder.CreateRet(RetVal);
405 return TheFunction;
406 }
407</pre>
408</div>
409
410<p>Now we get to the point where the <tt>Builder</tt> is set up. The first
411line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
412block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
413second line then tells the builder that new instructions should be inserted into
414the end of the new basic block. Basic blocks in LLVM are an important part
415of functions that define the <a
416href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
417Since we don't have any control flow, our functions will only contain one
418block so far. We'll fix this in a future installment :).</p>
419
420<p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
421the root expression of the function. If no error happens, this emits code to
422compute the expression into the entry block and returns the value that was
423computed. Assuming no error, we then create an LLVM <a
424href="../LangRef.html#i_ret">ret instruction</a>. This completes the function,
425which is then returned.</p>
426
427<div class="doc_code">
428<pre>
429 // Error reading body, remove function.
430 TheFunction-&gt;eraseFromParent();
431 return 0;
432}
433</pre>
434</div>
435
436<p>The only piece left here is handling of the error case. For simplicity, we
437simply handle this by deleting the function we produced with the
438<tt>eraseFromParent</tt> method. This allows the user to redefine a function
439that they incorrectly typed in before: if we didn't delete it, it would live in
440the symbol table, with a body, preventing future redefinition.</p>
441
442<p>This code does have a bug though. Since the <tt>PrototypeAST::Codegen</tt>
443can return a previously defined forward declaration, this can actually delete
444a forward declaration. There are a number of ways to fix this bug, see what you
445can come up with! Here is a testcase:</p>
446
447<div class="doc_code">
448<pre>
449extern foo(a b); # ok, defines foo.
450def foo(a b) c; # error, 'c' is invalid.
451def bar() foo(1, 2); # error, unknown function "foo"
452</pre>
453</div>
454
455</div>
456
457<!-- *********************************************************************** -->
458<div class="doc_section"><a name="driver">Driver Changes and
459Closing Thoughts</a></div>
460<!-- *********************************************************************** -->
461
462<div class="doc_text">
463
464<p>
465For now, code generation to LLVM doesn't really get us much, except that we can
466look at the pretty IR calls. The sample code inserts calls to Codegen into the
467"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
468dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
469functions. For example:
470</p>
471
472<div class="doc_code">
473<pre>
474ready> <b>4+5</b>;
475ready> Read top-level expression:
476define double @""() {
477entry:
478 %addtmp = add double 4.000000e+00, 5.000000e+00
479 ret double %addtmp
480}
481</pre>
482</div>
483
484<p>Note how the parser turns the top-level expression into anonymous functions
485for us. This will be handy when we add JIT support in the next chapter. Also
486note that the code is very literally transcribed, no optimizations are being
487performed. We will add optimizations explicitly in the next chapter.</p>
488
489<div class="doc_code">
490<pre>
491ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
492ready&gt; Read function definition:
493define double @foo(double %a, double %b) {
494entry:
495 %multmp = mul double %a, %a
496 %multmp1 = mul double 2.000000e+00, %a
497 %multmp2 = mul double %multmp1, %b
498 %addtmp = add double %multmp, %multmp2
499 %multmp3 = mul double %b, %b
500 %addtmp4 = add double %addtmp, %multmp3
501 ret double %addtmp4
502}
503</pre>
504</div>
505
506<p>This shows some simple arithmetic. Notice the striking similarity to the
507LLVM builder calls that we use to create the instructions.</p>
508
509<div class="doc_code">
510<pre>
511ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
512ready&gt; Read function definition:
513define double @bar(double %a) {
514entry:
515 %calltmp = call double @foo( double %a, double 4.000000e+00 )
516 %calltmp1 = call double @bar( double 3.133700e+04 )
517 %addtmp = add double %calltmp, %calltmp1
518 ret double %addtmp
519}
520</pre>
521</div>
522
523<p>This shows some function calls. Note that the runtime of this function might
524be fairly high. In the future we'll add conditional control flow to make
525recursion actually be useful :).</p>
526
527<div class="doc_code">
528<pre>
529ready&gt; <b>extern cos(x);</b>
530ready&gt; Read extern:
531declare double @cos(double)
532
533ready&gt; <b>cos(1.234);</b>
534ready&gt; Read top-level expression:
535define double @""() {
536entry:
Chris Lattner8eef4b22007-10-23 06:30:50 +0000537 %calltmp = call double @cos( double 1.234000e+00 )
Chris Lattner35abbf52007-10-23 06:23:57 +0000538 ret double %calltmp
539}
540</pre>
541</div>
542
543<p>This shows an extern for the libm "cos" function, and a call to it.</p>
544
545
546<div class="doc_code">
547<pre>
548ready&gt; <b>^D</b>
549; ModuleID = 'my cool jit'
550
551define double @""() {
552entry:
553 %addtmp = add double 4.000000e+00, 5.000000e+00
554 ret double %addtmp
555}
556
557define double @foo(double %a, double %b) {
558entry:
559 %multmp = mul double %a, %a
560 %multmp1 = mul double 2.000000e+00, %a
561 %multmp2 = mul double %multmp1, %b
562 %addtmp = add double %multmp, %multmp2
563 %multmp3 = mul double %b, %b
564 %addtmp4 = add double %addtmp, %multmp3
565 ret double %addtmp4
566}
567
568define double @bar(double %a) {
569entry:
570 %calltmp = call double @foo( double %a, double 4.000000e+00 )
571 %calltmp1 = call double @bar( double 3.133700e+04 )
572 %addtmp = add double %calltmp, %calltmp1
573 ret double %addtmp
574}
575
576declare double @cos(double)
577
578define double @""() {
579entry:
580 %calltmp = call double @cos( double 1.234000e+00 )
581 ret double %calltmp
582}
583</pre>
584</div>
585
586<p>When you quit the current demo, it dumps out the IR for the entire module
587generated. Here you can see the big picture with all the functions referencing
588each other.</p>
589
590<p>This wraps up this chapter of the Kaleidoscope tutorial. Up next we'll
591describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
592support</a> to this so we can actually start running code!</p>
593
594</div>
595
596
597<!-- *********************************************************************** -->
598<div class="doc_section"><a name="code">Full Code Listing</a></div>
599<!-- *********************************************************************** -->
600
601<div class="doc_text">
602
603<p>
604Here is the complete code listing for our running example, enhanced with the
605LLVM code generator. Because this uses the LLVM libraries, we need to link
606them in. To do this, we use the <a
607href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
608our makefile/command line about which options to use:</p>
609
610<div class="doc_code">
611<pre>
612 # Compile
613 g++ -g toy.cpp `llvm-config --cppflags` `llvm-config --ldflags` \
614 `llvm-config --libs core` -I ~/llvm/include/ -o toy
615 # Run
616 ./toy
617</pre>
618</div>
619
620<p>Here is the code:</p>
621
Chris Lattner2e902042007-10-22 07:01:42 +0000622<div class="doc_code">
623<pre>
624// To build this:
Chris Lattner2e902042007-10-22 07:01:42 +0000625// See example below.
626
627#include "llvm/DerivedTypes.h"
628#include "llvm/Module.h"
629#include "llvm/Support/LLVMBuilder.h"
630#include &lt;cstdio&gt;
631#include &lt;string&gt;
632#include &lt;map&gt;
633#include &lt;vector&gt;
634using namespace llvm;
635
636//===----------------------------------------------------------------------===//
637// Lexer
638//===----------------------------------------------------------------------===//
639
640// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
641// of these for known things.
642enum Token {
643 tok_eof = -1,
644
645 // commands
646 tok_def = -2, tok_extern = -3,
647
648 // primary
649 tok_identifier = -4, tok_number = -5,
650};
651
652static std::string IdentifierStr; // Filled in if tok_identifier
653static double NumVal; // Filled in if tok_number
654
655/// gettok - Return the next token from standard input.
656static int gettok() {
657 static int LastChar = ' ';
658
659 // Skip any whitespace.
660 while (isspace(LastChar))
661 LastChar = getchar();
662
663 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
664 IdentifierStr = LastChar;
665 while (isalnum((LastChar = getchar())))
666 IdentifierStr += LastChar;
667
668 if (IdentifierStr == "def") return tok_def;
669 if (IdentifierStr == "extern") return tok_extern;
670 return tok_identifier;
671 }
672
673 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
674 std::string NumStr;
675 do {
676 NumStr += LastChar;
677 LastChar = getchar();
678 } while (isdigit(LastChar) || LastChar == '.');
679
680 NumVal = strtod(NumStr.c_str(), 0);
681 return tok_number;
682 }
683
684 if (LastChar == '#') {
685 // Comment until end of line.
686 do LastChar = getchar();
687 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
688
689 if (LastChar != EOF)
690 return gettok();
691 }
692
693 // Check for end of file. Don't eat the EOF.
694 if (LastChar == EOF)
695 return tok_eof;
696
697 // Otherwise, just return the character as its ascii value.
698 int ThisChar = LastChar;
699 LastChar = getchar();
700 return ThisChar;
701}
702
703//===----------------------------------------------------------------------===//
704// Abstract Syntax Tree (aka Parse Tree)
705//===----------------------------------------------------------------------===//
706
707/// ExprAST - Base class for all expression nodes.
708class ExprAST {
709public:
710 virtual ~ExprAST() {}
711 virtual Value *Codegen() = 0;
712};
713
714/// NumberExprAST - Expression class for numeric literals like "1.0".
715class NumberExprAST : public ExprAST {
716 double Val;
717public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000718 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000719 virtual Value *Codegen();
720};
721
722/// VariableExprAST - Expression class for referencing a variable, like "a".
723class VariableExprAST : public ExprAST {
724 std::string Name;
725public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000726 explicit VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000727 virtual Value *Codegen();
728};
729
730/// BinaryExprAST - Expression class for a binary operator.
731class BinaryExprAST : public ExprAST {
732 char Op;
733 ExprAST *LHS, *RHS;
734public:
735 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
736 : Op(op), LHS(lhs), RHS(rhs) {}
737 virtual Value *Codegen();
738};
739
740/// CallExprAST - Expression class for function calls.
741class CallExprAST : public ExprAST {
742 std::string Callee;
743 std::vector&lt;ExprAST*&gt; Args;
744public:
745 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
746 : Callee(callee), Args(args) {}
747 virtual Value *Codegen();
748};
749
750/// PrototypeAST - This class represents the "prototype" for a function,
751/// which captures its argument names as well as if it is an operator.
752class PrototypeAST {
753 std::string Name;
754 std::vector&lt;std::string&gt; Args;
755public:
756 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
757 : Name(name), Args(args) {}
758
759 Function *Codegen();
760};
761
762/// FunctionAST - This class represents a function definition itself.
763class FunctionAST {
764 PrototypeAST *Proto;
765 ExprAST *Body;
766public:
767 FunctionAST(PrototypeAST *proto, ExprAST *body)
768 : Proto(proto), Body(body) {}
769
770 Function *Codegen();
771};
772
773//===----------------------------------------------------------------------===//
774// Parser
775//===----------------------------------------------------------------------===//
776
777/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
778/// token the parser it looking at. getNextToken reads another token from the
779/// lexer and updates CurTok with its results.
780static int CurTok;
781static int getNextToken() {
782 return CurTok = gettok();
783}
784
785/// BinopPrecedence - This holds the precedence for each binary operator that is
786/// defined.
787static std::map&lt;char, int&gt; BinopPrecedence;
788
789/// GetTokPrecedence - Get the precedence of the pending binary operator token.
790static int GetTokPrecedence() {
791 if (!isascii(CurTok))
792 return -1;
793
794 // Make sure it's a declared binop.
795 int TokPrec = BinopPrecedence[CurTok];
796 if (TokPrec &lt;= 0) return -1;
797 return TokPrec;
798}
799
800/// Error* - These are little helper functions for error handling.
801ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
802PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
803FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
804
805static ExprAST *ParseExpression();
806
807/// identifierexpr
808/// ::= identifer
809/// ::= identifer '(' expression* ')'
810static ExprAST *ParseIdentifierExpr() {
811 std::string IdName = IdentifierStr;
812
813 getNextToken(); // eat identifer.
814
815 if (CurTok != '(') // Simple variable ref.
816 return new VariableExprAST(IdName);
817
818 // Call.
819 getNextToken(); // eat (
820 std::vector&lt;ExprAST*&gt; Args;
821 while (1) {
822 ExprAST *Arg = ParseExpression();
823 if (!Arg) return 0;
824 Args.push_back(Arg);
825
826 if (CurTok == ')') break;
827
828 if (CurTok != ',')
829 return Error("Expected ')'");
830 getNextToken();
831 }
832
833 // Eat the ')'.
834 getNextToken();
835
836 return new CallExprAST(IdName, Args);
837}
838
839/// numberexpr ::= number
840static ExprAST *ParseNumberExpr() {
841 ExprAST *Result = new NumberExprAST(NumVal);
842 getNextToken(); // consume the number
843 return Result;
844}
845
846/// parenexpr ::= '(' expression ')'
847static ExprAST *ParseParenExpr() {
848 getNextToken(); // eat (.
849 ExprAST *V = ParseExpression();
850 if (!V) return 0;
851
852 if (CurTok != ')')
853 return Error("expected ')'");
854 getNextToken(); // eat ).
855 return V;
856}
857
858/// primary
859/// ::= identifierexpr
860/// ::= numberexpr
861/// ::= parenexpr
862static ExprAST *ParsePrimary() {
863 switch (CurTok) {
864 default: return Error("unknown token when expecting an expression");
865 case tok_identifier: return ParseIdentifierExpr();
866 case tok_number: return ParseNumberExpr();
867 case '(': return ParseParenExpr();
868 }
869}
870
871/// binoprhs
872/// ::= ('+' primary)*
873static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
874 // If this is a binop, find its precedence.
875 while (1) {
876 int TokPrec = GetTokPrecedence();
877
878 // If this is a binop that binds at least as tightly as the current binop,
879 // consume it, otherwise we are done.
880 if (TokPrec &lt; ExprPrec)
881 return LHS;
882
883 // Okay, we know this is a binop.
884 int BinOp = CurTok;
885 getNextToken(); // eat binop
886
887 // Parse the primary expression after the binary operator.
888 ExprAST *RHS = ParsePrimary();
889 if (!RHS) return 0;
890
891 // If BinOp binds less tightly with RHS than the operator after RHS, let
892 // the pending operator take RHS as its LHS.
893 int NextPrec = GetTokPrecedence();
894 if (TokPrec &lt; NextPrec) {
895 RHS = ParseBinOpRHS(TokPrec+1, RHS);
896 if (RHS == 0) return 0;
897 }
898
899 // Merge LHS/RHS.
900 LHS = new BinaryExprAST(BinOp, LHS, RHS);
901 }
902}
903
904/// expression
905/// ::= primary binoprhs
906///
907static ExprAST *ParseExpression() {
908 ExprAST *LHS = ParsePrimary();
909 if (!LHS) return 0;
910
911 return ParseBinOpRHS(0, LHS);
912}
913
914/// prototype
915/// ::= id '(' id* ')'
916static PrototypeAST *ParsePrototype() {
917 if (CurTok != tok_identifier)
918 return ErrorP("Expected function name in prototype");
919
920 std::string FnName = IdentifierStr;
921 getNextToken();
922
923 if (CurTok != '(')
924 return ErrorP("Expected '(' in prototype");
925
926 std::vector&lt;std::string&gt; ArgNames;
927 while (getNextToken() == tok_identifier)
928 ArgNames.push_back(IdentifierStr);
929 if (CurTok != ')')
930 return ErrorP("Expected ')' in prototype");
931
932 // success.
933 getNextToken(); // eat ')'.
934
935 return new PrototypeAST(FnName, ArgNames);
936}
937
938/// definition ::= 'def' prototype expression
939static FunctionAST *ParseDefinition() {
940 getNextToken(); // eat def.
941 PrototypeAST *Proto = ParsePrototype();
942 if (Proto == 0) return 0;
943
944 if (ExprAST *E = ParseExpression())
945 return new FunctionAST(Proto, E);
946 return 0;
947}
948
949/// toplevelexpr ::= expression
950static FunctionAST *ParseTopLevelExpr() {
951 if (ExprAST *E = ParseExpression()) {
952 // Make an anonymous proto.
953 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
954 return new FunctionAST(Proto, E);
955 }
956 return 0;
957}
958
959/// external ::= 'extern' prototype
960static PrototypeAST *ParseExtern() {
961 getNextToken(); // eat extern.
962 return ParsePrototype();
963}
964
965//===----------------------------------------------------------------------===//
966// Code Generation
967//===----------------------------------------------------------------------===//
968
969static Module *TheModule;
970static LLVMBuilder Builder;
971static std::map&lt;std::string, Value*&gt; NamedValues;
972
973Value *ErrorV(const char *Str) { Error(Str); return 0; }
974
975Value *NumberExprAST::Codegen() {
976 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
977}
978
979Value *VariableExprAST::Codegen() {
980 // Look this variable up in the function.
981 Value *V = NamedValues[Name];
982 return V ? V : ErrorV("Unknown variable name");
983}
984
985Value *BinaryExprAST::Codegen() {
986 Value *L = LHS-&gt;Codegen();
987 Value *R = RHS-&gt;Codegen();
988 if (L == 0 || R == 0) return 0;
989
990 switch (Op) {
991 case '+': return Builder.CreateAdd(L, R, "addtmp");
992 case '-': return Builder.CreateSub(L, R, "subtmp");
993 case '*': return Builder.CreateMul(L, R, "multmp");
994 case '&lt;':
995 L = Builder.CreateFCmpULT(L, R, "multmp");
996 // Convert bool 0/1 to double 0.0 or 1.0
997 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
998 default: return ErrorV("invalid binary operator");
999 }
1000}
1001
1002Value *CallExprAST::Codegen() {
1003 // Look up the name in the global module table.
1004 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1005 if (CalleeF == 0)
1006 return ErrorV("Unknown function referenced");
1007
1008 // If argument mismatch error.
1009 if (CalleeF-&gt;arg_size() != Args.size())
1010 return ErrorV("Incorrect # arguments passed");
1011
1012 std::vector&lt;Value*&gt; ArgsV;
1013 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1014 ArgsV.push_back(Args[i]-&gt;Codegen());
1015 if (ArgsV.back() == 0) return 0;
1016 }
1017
1018 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1019}
1020
1021Function *PrototypeAST::Codegen() {
1022 // Make the function type: double(double,double) etc.
Chris Lattner35abbf52007-10-23 06:23:57 +00001023 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1024 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
Chris Lattner2e902042007-10-22 07:01:42 +00001025
1026 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1027
1028 // If F conflicted, there was already something named 'Name'. If it has a
1029 // body, don't allow redefinition or reextern.
1030 if (F-&gt;getName() != Name) {
1031 // Delete the one we just made and get the existing one.
1032 F-&gt;eraseFromParent();
1033 F = TheModule-&gt;getFunction(Name);
1034
1035 // If F already has a body, reject this.
1036 if (!F-&gt;empty()) {
1037 ErrorF("redefinition of function");
1038 return 0;
1039 }
1040
1041 // If F took a different number of args, reject.
1042 if (F-&gt;arg_size() != Args.size()) {
1043 ErrorF("redefinition of function with different # args");
1044 return 0;
1045 }
1046 }
1047
1048 // Set names for all arguments.
1049 unsigned Idx = 0;
1050 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1051 ++AI, ++Idx) {
1052 AI-&gt;setName(Args[Idx]);
1053
1054 // Add arguments to variable symbol table.
1055 NamedValues[Args[Idx]] = AI;
1056 }
1057
1058 return F;
1059}
1060
1061Function *FunctionAST::Codegen() {
1062 NamedValues.clear();
1063
1064 Function *TheFunction = Proto-&gt;Codegen();
1065 if (TheFunction == 0)
1066 return 0;
1067
1068 // Create a new basic block to start insertion into.
Chris Lattner35abbf52007-10-23 06:23:57 +00001069 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1070 Builder.SetInsertPoint(BB);
Chris Lattner2e902042007-10-22 07:01:42 +00001071
1072 if (Value *RetVal = Body-&gt;Codegen()) {
1073 // Finish off the function.
1074 Builder.CreateRet(RetVal);
1075 return TheFunction;
1076 }
1077
1078 // Error reading body, remove function.
1079 TheFunction-&gt;eraseFromParent();
1080 return 0;
1081}
1082
1083//===----------------------------------------------------------------------===//
1084// Top-Level parsing and JIT Driver
1085//===----------------------------------------------------------------------===//
1086
1087static void HandleDefinition() {
1088 if (FunctionAST *F = ParseDefinition()) {
1089 if (Function *LF = F-&gt;Codegen()) {
1090 fprintf(stderr, "Read function definition:");
1091 LF-&gt;dump();
1092 }
1093 } else {
1094 // Skip token for error recovery.
1095 getNextToken();
1096 }
1097}
1098
1099static void HandleExtern() {
1100 if (PrototypeAST *P = ParseExtern()) {
1101 if (Function *F = P-&gt;Codegen()) {
1102 fprintf(stderr, "Read extern: ");
1103 F-&gt;dump();
1104 }
1105 } else {
1106 // Skip token for error recovery.
1107 getNextToken();
1108 }
1109}
1110
1111static void HandleTopLevelExpression() {
1112 // Evaluate a top level expression into an anonymous function.
1113 if (FunctionAST *F = ParseTopLevelExpr()) {
1114 if (Function *LF = F-&gt;Codegen()) {
1115 fprintf(stderr, "Read top-level expression:");
1116 LF-&gt;dump();
1117 }
1118 } else {
1119 // Skip token for error recovery.
1120 getNextToken();
1121 }
1122}
1123
1124/// top ::= definition | external | expression | ';'
1125static void MainLoop() {
1126 while (1) {
1127 fprintf(stderr, "ready&gt; ");
1128 switch (CurTok) {
1129 case tok_eof: return;
1130 case ';': getNextToken(); break; // ignore top level semicolons.
1131 case tok_def: HandleDefinition(); break;
1132 case tok_extern: HandleExtern(); break;
1133 default: HandleTopLevelExpression(); break;
1134 }
1135 }
1136}
1137
1138
1139
1140//===----------------------------------------------------------------------===//
1141// "Library" functions that can be "extern'd" from user code.
1142//===----------------------------------------------------------------------===//
1143
1144/// putchard - putchar that takes a double and returns 0.
1145extern "C"
1146double putchard(double X) {
1147 putchar((char)X);
1148 return 0;
1149}
1150
1151//===----------------------------------------------------------------------===//
1152// Main driver code.
1153//===----------------------------------------------------------------------===//
1154
1155int main() {
1156 TheModule = new Module("my cool jit");
1157
1158 // Install standard binary operators.
1159 // 1 is lowest precedence.
1160 BinopPrecedence['&lt;'] = 10;
1161 BinopPrecedence['+'] = 20;
1162 BinopPrecedence['-'] = 20;
1163 BinopPrecedence['*'] = 40; // highest.
1164
1165 // Prime the first token.
1166 fprintf(stderr, "ready&gt; ");
1167 getNextToken();
1168
1169 MainLoop();
1170 TheModule-&gt;dump();
1171 return 0;
1172}
Chris Lattner2e902042007-10-22 07:01:42 +00001173</pre>
1174</div>
1175</div>
1176
1177<!-- *********************************************************************** -->
1178<hr>
1179<address>
1180 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1181 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1182 <a href="http://validator.w3.org/check/referer"><img
Chris Lattner8eef4b22007-10-23 06:30:50 +00001183 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner2e902042007-10-22 07:01:42 +00001184
1185 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1186 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1187 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1188</address>
1189</body>
1190</html>