blob: 7e3ed0377372f5d24352ed4f0f6eae684efe81ce [file] [log] [blame]
Chris Lattner2e902042007-10-22 07:01:42 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
15
16<div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
18</div>
19
20<!-- *********************************************************************** -->
21<div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
22<!-- *********************************************************************** -->
23
24<div class="doc_text">
25
26<p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
27LLVM</a>" tutorial. This chapter shows you how to transform the <a
28href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
29This will teach you a little bit about how LLVM does things, as well as
30demonstrate how easy it is to use. It's much more work to build a lexer and
31parser than it is to generate LLVM IR code.
32</p>
33
34</div>
35
36<!-- *********************************************************************** -->
37<div class="doc_section"><a name="basics">Code Generation setup</a></div>
38<!-- *********************************************************************** -->
39
40<div class="doc_text">
41
42<p>
43In order to generate LLVM IR, we want some simple setup to get started. First,
44we define virtual codegen methods in each AST class:</p>
45
46<div class="doc_code">
47<pre>
48/// ExprAST - Base class for all expression nodes.
49class ExprAST {
50public:
51 virtual ~ExprAST() {}
52 virtual Value *Codegen() = 0;
53};
54
55/// NumberExprAST - Expression class for numeric literals like "1.0".
56class NumberExprAST : public ExprAST {
57 double Val;
58public:
Chris Lattner28571ed2007-10-23 04:27:44 +000059 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +000060 virtual Value *Codegen();
61};
62...
63</pre>
64</div>
65
Chris Lattner28571ed2007-10-23 04:27:44 +000066<p>The Codegen() method says to emit IR for that AST node and all things it
67depends on, and they all return an LLVM Value object.
68"Value" is the class used to represent a "<a
69href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
70Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
71of SSA values is that their value is computed as the related instruction
72executes, and it does not get a new value until (and if) the instruction
73re-executes. In order words, there is no way to "change" an SSA value. For
74more information, please read up on <a
75href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
76Assignment</a> - the concepts are really quite natural once you grok them.</p>
77
78<p>The
Chris Lattner2e902042007-10-22 07:01:42 +000079second thing we want is an "Error" method like we used for parser, which will
80be used to report errors found during code generation (for example, use of an
81undeclared parameter):</p>
82
83<div class="doc_code">
84<pre>
85Value *ErrorV(const char *Str) { Error(Str); return 0; }
86
87static Module *TheModule;
88static LLVMBuilder Builder;
89static std::map&lt;std::string, Value*&gt; NamedValues;
90</pre>
91</div>
92
93<p>The static variables will be used during code generation. <tt>TheModule</tt>
94is the LLVM construct that contains all of the functions and global variables in
95a chunk of code. In many ways, it is the top-level structure that the LLVM IR
96uses to contain code.</p>
97
98<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
99LLVM instructions. The <tt>Builder</tt> keeps track of the current place to
100insert instructions and has methods to create new instructions.</p>
101
102<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
103current scope and what their LLVM representation is. In this form of
104Kaleidoscope, the only things that can be referenced are function parameters.
105As such, function parameters will be in this map when generating code for their
106function body.</p>
107
108<p>
109With these basics in place, we can start talking about how to generate code for
110each expression. Note that this assumes that the <tt>Builder</tt> has been set
111up to generate code <em>into</em> something. For now, we'll assume that this
112has already been done, and we'll just use it to emit code.
113</p>
114
115</div>
116
117<!-- *********************************************************************** -->
118<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
119<!-- *********************************************************************** -->
120
121<div class="doc_text">
122
123<p>Generating LLVM code for expression nodes is very straight-forward: less
124than 45 lines of commented code for all four of our expression nodes. First,
125we'll do numeric literals:</p>
126
127<div class="doc_code">
128<pre>
129Value *NumberExprAST::Codegen() {
130 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
131}
132</pre>
133</div>
134
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000135<p>In the LLVM IR, numeric constants are represented with the
136<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
137internally (<tt>APFloat</tt> has the capability of holding floating point
138constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
139creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
Chris Lattner2e902042007-10-22 07:01:42 +0000140that constants are all uniqued together and shared. For this reason, the API
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000141uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000142
143<div class="doc_code">
144<pre>
145Value *VariableExprAST::Codegen() {
146 // Look this variable up in the function.
147 Value *V = NamedValues[Name];
148 return V ? V : ErrorV("Unknown variable name");
149}
150</pre>
151</div>
152
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000153<p>References to variables is also quite simple here. In the simple version
154of Kaleidoscope, we assume that the variable has already been emited somewhere
155and its value is available. In practice, the only values that can be in the
156<tt>NamedValues</tt> map are function arguments. This
Chris Lattner2e902042007-10-22 07:01:42 +0000157code simply checks to see that the specified name is in the map (if not, an
158unknown variable is being referenced) and returns the value for it.</p>
159
160<div class="doc_code">
161<pre>
162Value *BinaryExprAST::Codegen() {
163 Value *L = LHS-&gt;Codegen();
164 Value *R = RHS-&gt;Codegen();
165 if (L == 0 || R == 0) return 0;
166
167 switch (Op) {
168 case '+': return Builder.CreateAdd(L, R, "addtmp");
169 case '-': return Builder.CreateSub(L, R, "subtmp");
170 case '*': return Builder.CreateMul(L, R, "multmp");
171 case '&lt;':
172 L = Builder.CreateFCmpULT(L, R, "multmp");
173 // Convert bool 0/1 to double 0.0 or 1.0
174 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
175 default: return ErrorV("invalid binary operator");
176 }
177}
178</pre>
179</div>
180
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000181<p>Binary operators start to get more interesting. The basic idea here is that
182we recursively emit code for the left-hand side of the expression, then the
183right-hand side, then we compute the result of the binary expression. In this
184code, we do a simple switch on the opcode to create the right LLVM instruction.
185</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000186
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000187<p>In this example, the LLVM builder class is starting to show its value.
188Because it knows where to insert the newly created instruction, you just have to
189specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
190operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
191for the generated instruction. One nice thing about LLVM is that the name is
192just a hint: if there are multiple additions in a single function, the first
193will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
194giving it a name like "addtmp42". Local value names for instructions are purely
195optional, but it makes it much easier to read the IR dumps.</p>
196
197<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
198have very strict type properties: for example, the Left and Right operators of
199an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
200type, and that the result of the add matches the operands. Because all values
201in Kaleidoscope are doubles, this makes for very simple code for add, sub and
202mul.</p>
203
204<p>On the other hand, LLVM specifies that the <a
205href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
206(a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
207value. In order to get these semantics, we combine the fcmp instruction with
208a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
209converts its input integer into a floating point value by treating the input
210as an unsigned value. In contrast, if we used the <a
211href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
212operator would return 0.0 and -1.0, depending on the input value.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000213
214<div class="doc_code">
215<pre>
216Value *CallExprAST::Codegen() {
217 // Look up the name in the global module table.
218 Function *CalleeF = TheModule-&gt;getFunction(Callee);
219 if (CalleeF == 0)
220 return ErrorV("Unknown function referenced");
221
222 // If argument mismatch error.
223 if (CalleeF-&gt;arg_size() != Args.size())
224 return ErrorV("Incorrect # arguments passed");
225
226 std::vector&lt;Value*&gt; ArgsV;
227 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
228 ArgsV.push_back(Args[i]-&gt;Codegen());
229 if (ArgsV.back() == 0) return 0;
230 }
231
232 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
233}
234</pre>
235</div>
236
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000237<p>Code generation for function calls is quite straight-forward with LLVM. The
238code above first looks the name of the function up in the LLVM Module's symbol
239table. Recall that the LLVM Module is the container that holds all of the
240functions we are JIT'ing. By giving each function the same name as what the
241user specifies, we can use the LLVM symbol table to resolve function names for
242us.</p>
243
244<p>Once we have the function to call, we recursively codegen each argument that
245is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
246instruction</a>. Note that LLVM uses the native C calling conventions by
247default, allowing these calls to call into standard library functions like
248"sin" and "cos" with no additional effort.</p>
249
250<p>This wraps up our handling of the four basic expressions that we have so far
251in Kaleidoscope. Feel free to go in and add some more. For example, by
252browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
253several other interesting instructions that are really easy to plug into our
254basic framework.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000255
256</div>
257
258<!-- *********************************************************************** -->
Chris Lattner35abbf52007-10-23 06:23:57 +0000259<div class="doc_section"><a name="funcs">Function Code Generation</a></div>
Chris Lattner2e902042007-10-22 07:01:42 +0000260<!-- *********************************************************************** -->
261
262<div class="doc_text">
263
Chris Lattner35abbf52007-10-23 06:23:57 +0000264<p>Code generation for prototypes and functions has to handle a number of
265details, which make their code less beautiful and elegant than expression code
266generation, but they illustrate some important points. First, lets talk about
267code generation for prototypes: this is used both for function bodies as well
268as external function declarations. The code starts with:</p>
269
270<div class="doc_code">
271<pre>
272Function *PrototypeAST::Codegen() {
273 // Make the function type: double(double,double) etc.
274 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
275 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
276
277 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
278</pre>
279</div>
280
281<p>This code packs a lot of power into a few lines. The first step is to create
282the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
283function arguments in Kaleidoscope are of type double, the first line creates
284a vector of "N" LLVM Double types. It then uses the <tt>FunctionType::get</tt>
285method to create a function type that takes "N" doubles as arguments, returns
286one double as a result, and that is not vararg (the false parameter indicates
287this). Note that Types in LLVM are uniqued just like Constants are, so you
288don't "new" a type, you "get" it.</p>
289
290<p>The final line above actually creates the function that the prototype will
291correspond to. This indicates which type, linkage, and name to use, and which
292module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
293means that the function may be defined outside the current module and/or that it
294is callable by functions outside the module. The Name passed in is the name the
295user specified: since "<tt>TheModule</tt>" is specified, this name is registered
296in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
297above.</p>
298
299<div class="doc_code">
300<pre>
301 // If F conflicted, there was already something named 'Name'. If it has a
302 // body, don't allow redefinition or reextern.
303 if (F-&gt;getName() != Name) {
304 // Delete the one we just made and get the existing one.
305 F-&gt;eraseFromParent();
306 F = TheModule-&gt;getFunction(Name);
307</pre>
308</div>
309
310<p>The Module symbol table works just like the Function symbol table when it
311comes to name conflicts: if a new function is created with a name was previously
312added to the symbol table, it will get implicitly renamed when added to the
313Module. The code above exploits this fact to tell if there was a previous
314definition of this function.</p>
315
316<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
317first, we want to allow 'extern'ing a function more than once, so long as the
318prototypes for the externs match (since all arguments have the same type, we
319just have to check that the number of arguments match). Second, we want to
320allow 'extern'ing a function and then definining a body for it. This is useful
321when defining mutually recursive functions.</p>
322
323<p>In order to implement this, the code above first checks to see if there is
324a collision on the name of the function. If so, it deletes the function we just
325created (by calling <tt>eraseFromParent</tt>) and then calling
326<tt>getFunction</tt> to get the existing function with the specified name. Note
327that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
328unlinks the object from its parent (e.g. a Function from a Module) and returns
329it. The "erase" form unlinks the object and then deletes it.</p>
330
331<div class="doc_code">
332<pre>
333 // If F already has a body, reject this.
334 if (!F-&gt;empty()) {
335 ErrorF("redefinition of function");
336 return 0;
337 }
338
339 // If F took a different number of args, reject.
340 if (F-&gt;arg_size() != Args.size()) {
341 ErrorF("redefinition of function with different # args");
342 return 0;
343 }
344 }
345</pre>
346</div>
347
348<p>In order to verify the logic above, we first check to see if the preexisting
349function is "empty". In this case, empty means that it has no basic blocks in
350it, which means it has no body. If it has no body, this means its a forward
351declaration. Since we don't allow anything after a full definition of the
352function, the code rejects this case. If the previous reference to a function
353was an 'extern', we simply verify that the number of arguments for that
354definition and this one match up. If not, we emit an error.</p>
355
356<div class="doc_code">
357<pre>
358 // Set names for all arguments.
359 unsigned Idx = 0;
360 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
361 ++AI, ++Idx) {
362 AI-&gt;setName(Args[Idx]);
363
364 // Add arguments to variable symbol table.
365 NamedValues[Args[Idx]] = AI;
366 }
367 return F;
368}
369</pre>
370</div>
371
372<p>The last bit of code for prototypes loops over all of the arguments in the
373function, setting the name of the LLVM Argument objects to match and registering
374the arguments in the <tt>NamedValues</tt> map for future use by the
375<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
376object to the caller. Note that we don't check for conflicting
377argument names here (e.g. "extern foo(a b a)"). Doing so would be very
378straight-forward.</p>
379
380<div class="doc_code">
381<pre>
382Function *FunctionAST::Codegen() {
383 NamedValues.clear();
384
385 Function *TheFunction = Proto-&gt;Codegen();
386 if (TheFunction == 0)
387 return 0;
388</pre>
389</div>
390
391<p>Code generation for function definitions starts out simply enough: first we
392codegen the prototype and verify that it is ok. We also clear out the
393<tt>NamedValues</tt> map to make sure that there isn't anything in it from the
394last function we compiled.</p>
395
396<div class="doc_code">
397<pre>
398 // Create a new basic block to start insertion into.
399 BasicBlock *BB = new BasicBlock("entry", TheFunction);
400 Builder.SetInsertPoint(BB);
401
402 if (Value *RetVal = Body-&gt;Codegen()) {
403 // Finish off the function.
404 Builder.CreateRet(RetVal);
405 return TheFunction;
406 }
407</pre>
408</div>
409
410<p>Now we get to the point where the <tt>Builder</tt> is set up. The first
411line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
412block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
413second line then tells the builder that new instructions should be inserted into
414the end of the new basic block. Basic blocks in LLVM are an important part
415of functions that define the <a
416href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
417Since we don't have any control flow, our functions will only contain one
418block so far. We'll fix this in a future installment :).</p>
419
420<p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
421the root expression of the function. If no error happens, this emits code to
422compute the expression into the entry block and returns the value that was
423computed. Assuming no error, we then create an LLVM <a
424href="../LangRef.html#i_ret">ret instruction</a>. This completes the function,
425which is then returned.</p>
426
427<div class="doc_code">
428<pre>
429 // Error reading body, remove function.
430 TheFunction-&gt;eraseFromParent();
431 return 0;
432}
433</pre>
434</div>
435
436<p>The only piece left here is handling of the error case. For simplicity, we
437simply handle this by deleting the function we produced with the
438<tt>eraseFromParent</tt> method. This allows the user to redefine a function
439that they incorrectly typed in before: if we didn't delete it, it would live in
440the symbol table, with a body, preventing future redefinition.</p>
441
442<p>This code does have a bug though. Since the <tt>PrototypeAST::Codegen</tt>
443can return a previously defined forward declaration, this can actually delete
444a forward declaration. There are a number of ways to fix this bug, see what you
445can come up with! Here is a testcase:</p>
446
447<div class="doc_code">
448<pre>
449extern foo(a b); # ok, defines foo.
450def foo(a b) c; # error, 'c' is invalid.
451def bar() foo(1, 2); # error, unknown function "foo"
452</pre>
453</div>
454
455</div>
456
457<!-- *********************************************************************** -->
458<div class="doc_section"><a name="driver">Driver Changes and
459Closing Thoughts</a></div>
460<!-- *********************************************************************** -->
461
462<div class="doc_text">
463
464<p>
465For now, code generation to LLVM doesn't really get us much, except that we can
466look at the pretty IR calls. The sample code inserts calls to Codegen into the
467"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
468dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
469functions. For example:
470</p>
471
472<div class="doc_code">
473<pre>
474ready> <b>4+5</b>;
475ready> Read top-level expression:
476define double @""() {
477entry:
478 %addtmp = add double 4.000000e+00, 5.000000e+00
479 ret double %addtmp
480}
481</pre>
482</div>
483
484<p>Note how the parser turns the top-level expression into anonymous functions
485for us. This will be handy when we add JIT support in the next chapter. Also
486note that the code is very literally transcribed, no optimizations are being
487performed. We will add optimizations explicitly in the next chapter.</p>
488
489<div class="doc_code">
490<pre>
491ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
492ready&gt; Read function definition:
493define double @foo(double %a, double %b) {
494entry:
495 %multmp = mul double %a, %a
496 %multmp1 = mul double 2.000000e+00, %a
497 %multmp2 = mul double %multmp1, %b
498 %addtmp = add double %multmp, %multmp2
499 %multmp3 = mul double %b, %b
500 %addtmp4 = add double %addtmp, %multmp3
501 ret double %addtmp4
502}
503</pre>
504</div>
505
506<p>This shows some simple arithmetic. Notice the striking similarity to the
507LLVM builder calls that we use to create the instructions.</p>
508
509<div class="doc_code">
510<pre>
511ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
512ready&gt; Read function definition:
513define double @bar(double %a) {
514entry:
515 %calltmp = call double @foo( double %a, double 4.000000e+00 )
516 %calltmp1 = call double @bar( double 3.133700e+04 )
517 %addtmp = add double %calltmp, %calltmp1
518 ret double %addtmp
519}
520</pre>
521</div>
522
523<p>This shows some function calls. Note that the runtime of this function might
524be fairly high. In the future we'll add conditional control flow to make
525recursion actually be useful :).</p>
526
527<div class="doc_code">
528<pre>
529ready&gt; <b>extern cos(x);</b>
530ready&gt; Read extern:
531declare double @cos(double)
532
533ready&gt; <b>cos(1.234);</b>
534ready&gt; Read top-level expression:
535define double @""() {
536entry:
Chris Lattner8eef4b22007-10-23 06:30:50 +0000537 %calltmp = call double @cos( double 1.234000e+00 )
Chris Lattner35abbf52007-10-23 06:23:57 +0000538 ret double %calltmp
539}
540</pre>
541</div>
542
543<p>This shows an extern for the libm "cos" function, and a call to it.</p>
544
545
546<div class="doc_code">
547<pre>
548ready&gt; <b>^D</b>
549; ModuleID = 'my cool jit'
550
551define double @""() {
552entry:
553 %addtmp = add double 4.000000e+00, 5.000000e+00
554 ret double %addtmp
555}
556
557define double @foo(double %a, double %b) {
558entry:
559 %multmp = mul double %a, %a
560 %multmp1 = mul double 2.000000e+00, %a
561 %multmp2 = mul double %multmp1, %b
562 %addtmp = add double %multmp, %multmp2
563 %multmp3 = mul double %b, %b
564 %addtmp4 = add double %addtmp, %multmp3
565 ret double %addtmp4
566}
567
568define double @bar(double %a) {
569entry:
570 %calltmp = call double @foo( double %a, double 4.000000e+00 )
571 %calltmp1 = call double @bar( double 3.133700e+04 )
572 %addtmp = add double %calltmp, %calltmp1
573 ret double %addtmp
574}
575
576declare double @cos(double)
577
578define double @""() {
579entry:
580 %calltmp = call double @cos( double 1.234000e+00 )
581 ret double %calltmp
582}
583</pre>
584</div>
585
586<p>When you quit the current demo, it dumps out the IR for the entire module
587generated. Here you can see the big picture with all the functions referencing
588each other.</p>
589
590<p>This wraps up this chapter of the Kaleidoscope tutorial. Up next we'll
591describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
592support</a> to this so we can actually start running code!</p>
593
594</div>
595
596
597<!-- *********************************************************************** -->
598<div class="doc_section"><a name="code">Full Code Listing</a></div>
599<!-- *********************************************************************** -->
600
601<div class="doc_text">
602
603<p>
604Here is the complete code listing for our running example, enhanced with the
605LLVM code generator. Because this uses the LLVM libraries, we need to link
606them in. To do this, we use the <a
607href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
608our makefile/command line about which options to use:</p>
609
610<div class="doc_code">
611<pre>
612 # Compile
Chris Lattner9ac0ca02007-10-24 05:09:48 +0000613 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
Chris Lattner35abbf52007-10-23 06:23:57 +0000614 # Run
615 ./toy
616</pre>
617</div>
618
619<p>Here is the code:</p>
620
Chris Lattner2e902042007-10-22 07:01:42 +0000621<div class="doc_code">
622<pre>
623// To build this:
Chris Lattner2e902042007-10-22 07:01:42 +0000624// See example below.
625
626#include "llvm/DerivedTypes.h"
627#include "llvm/Module.h"
628#include "llvm/Support/LLVMBuilder.h"
629#include &lt;cstdio&gt;
630#include &lt;string&gt;
631#include &lt;map&gt;
632#include &lt;vector&gt;
633using namespace llvm;
634
635//===----------------------------------------------------------------------===//
636// Lexer
637//===----------------------------------------------------------------------===//
638
639// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
640// of these for known things.
641enum Token {
642 tok_eof = -1,
643
644 // commands
645 tok_def = -2, tok_extern = -3,
646
647 // primary
648 tok_identifier = -4, tok_number = -5,
649};
650
651static std::string IdentifierStr; // Filled in if tok_identifier
652static double NumVal; // Filled in if tok_number
653
654/// gettok - Return the next token from standard input.
655static int gettok() {
656 static int LastChar = ' ';
657
658 // Skip any whitespace.
659 while (isspace(LastChar))
660 LastChar = getchar();
661
662 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
663 IdentifierStr = LastChar;
664 while (isalnum((LastChar = getchar())))
665 IdentifierStr += LastChar;
666
667 if (IdentifierStr == "def") return tok_def;
668 if (IdentifierStr == "extern") return tok_extern;
669 return tok_identifier;
670 }
671
672 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
673 std::string NumStr;
674 do {
675 NumStr += LastChar;
676 LastChar = getchar();
677 } while (isdigit(LastChar) || LastChar == '.');
678
679 NumVal = strtod(NumStr.c_str(), 0);
680 return tok_number;
681 }
682
683 if (LastChar == '#') {
684 // Comment until end of line.
685 do LastChar = getchar();
686 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
687
688 if (LastChar != EOF)
689 return gettok();
690 }
691
692 // Check for end of file. Don't eat the EOF.
693 if (LastChar == EOF)
694 return tok_eof;
695
696 // Otherwise, just return the character as its ascii value.
697 int ThisChar = LastChar;
698 LastChar = getchar();
699 return ThisChar;
700}
701
702//===----------------------------------------------------------------------===//
703// Abstract Syntax Tree (aka Parse Tree)
704//===----------------------------------------------------------------------===//
705
706/// ExprAST - Base class for all expression nodes.
707class ExprAST {
708public:
709 virtual ~ExprAST() {}
710 virtual Value *Codegen() = 0;
711};
712
713/// NumberExprAST - Expression class for numeric literals like "1.0".
714class NumberExprAST : public ExprAST {
715 double Val;
716public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000717 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000718 virtual Value *Codegen();
719};
720
721/// VariableExprAST - Expression class for referencing a variable, like "a".
722class VariableExprAST : public ExprAST {
723 std::string Name;
724public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000725 explicit VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000726 virtual Value *Codegen();
727};
728
729/// BinaryExprAST - Expression class for a binary operator.
730class BinaryExprAST : public ExprAST {
731 char Op;
732 ExprAST *LHS, *RHS;
733public:
734 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
735 : Op(op), LHS(lhs), RHS(rhs) {}
736 virtual Value *Codegen();
737};
738
739/// CallExprAST - Expression class for function calls.
740class CallExprAST : public ExprAST {
741 std::string Callee;
742 std::vector&lt;ExprAST*&gt; Args;
743public:
744 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
745 : Callee(callee), Args(args) {}
746 virtual Value *Codegen();
747};
748
749/// PrototypeAST - This class represents the "prototype" for a function,
750/// which captures its argument names as well as if it is an operator.
751class PrototypeAST {
752 std::string Name;
753 std::vector&lt;std::string&gt; Args;
754public:
755 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
756 : Name(name), Args(args) {}
757
758 Function *Codegen();
759};
760
761/// FunctionAST - This class represents a function definition itself.
762class FunctionAST {
763 PrototypeAST *Proto;
764 ExprAST *Body;
765public:
766 FunctionAST(PrototypeAST *proto, ExprAST *body)
767 : Proto(proto), Body(body) {}
768
769 Function *Codegen();
770};
771
772//===----------------------------------------------------------------------===//
773// Parser
774//===----------------------------------------------------------------------===//
775
776/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
777/// token the parser it looking at. getNextToken reads another token from the
778/// lexer and updates CurTok with its results.
779static int CurTok;
780static int getNextToken() {
781 return CurTok = gettok();
782}
783
784/// BinopPrecedence - This holds the precedence for each binary operator that is
785/// defined.
786static std::map&lt;char, int&gt; BinopPrecedence;
787
788/// GetTokPrecedence - Get the precedence of the pending binary operator token.
789static int GetTokPrecedence() {
790 if (!isascii(CurTok))
791 return -1;
792
793 // Make sure it's a declared binop.
794 int TokPrec = BinopPrecedence[CurTok];
795 if (TokPrec &lt;= 0) return -1;
796 return TokPrec;
797}
798
799/// Error* - These are little helper functions for error handling.
800ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
801PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
802FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
803
804static ExprAST *ParseExpression();
805
806/// identifierexpr
807/// ::= identifer
808/// ::= identifer '(' expression* ')'
809static ExprAST *ParseIdentifierExpr() {
810 std::string IdName = IdentifierStr;
811
812 getNextToken(); // eat identifer.
813
814 if (CurTok != '(') // Simple variable ref.
815 return new VariableExprAST(IdName);
816
817 // Call.
818 getNextToken(); // eat (
819 std::vector&lt;ExprAST*&gt; Args;
820 while (1) {
821 ExprAST *Arg = ParseExpression();
822 if (!Arg) return 0;
823 Args.push_back(Arg);
824
825 if (CurTok == ')') break;
826
827 if (CurTok != ',')
828 return Error("Expected ')'");
829 getNextToken();
830 }
831
832 // Eat the ')'.
833 getNextToken();
834
835 return new CallExprAST(IdName, Args);
836}
837
838/// numberexpr ::= number
839static ExprAST *ParseNumberExpr() {
840 ExprAST *Result = new NumberExprAST(NumVal);
841 getNextToken(); // consume the number
842 return Result;
843}
844
845/// parenexpr ::= '(' expression ')'
846static ExprAST *ParseParenExpr() {
847 getNextToken(); // eat (.
848 ExprAST *V = ParseExpression();
849 if (!V) return 0;
850
851 if (CurTok != ')')
852 return Error("expected ')'");
853 getNextToken(); // eat ).
854 return V;
855}
856
857/// primary
858/// ::= identifierexpr
859/// ::= numberexpr
860/// ::= parenexpr
861static ExprAST *ParsePrimary() {
862 switch (CurTok) {
863 default: return Error("unknown token when expecting an expression");
864 case tok_identifier: return ParseIdentifierExpr();
865 case tok_number: return ParseNumberExpr();
866 case '(': return ParseParenExpr();
867 }
868}
869
870/// binoprhs
871/// ::= ('+' primary)*
872static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
873 // If this is a binop, find its precedence.
874 while (1) {
875 int TokPrec = GetTokPrecedence();
876
877 // If this is a binop that binds at least as tightly as the current binop,
878 // consume it, otherwise we are done.
879 if (TokPrec &lt; ExprPrec)
880 return LHS;
881
882 // Okay, we know this is a binop.
883 int BinOp = CurTok;
884 getNextToken(); // eat binop
885
886 // Parse the primary expression after the binary operator.
887 ExprAST *RHS = ParsePrimary();
888 if (!RHS) return 0;
889
890 // If BinOp binds less tightly with RHS than the operator after RHS, let
891 // the pending operator take RHS as its LHS.
892 int NextPrec = GetTokPrecedence();
893 if (TokPrec &lt; NextPrec) {
894 RHS = ParseBinOpRHS(TokPrec+1, RHS);
895 if (RHS == 0) return 0;
896 }
897
898 // Merge LHS/RHS.
899 LHS = new BinaryExprAST(BinOp, LHS, RHS);
900 }
901}
902
903/// expression
904/// ::= primary binoprhs
905///
906static ExprAST *ParseExpression() {
907 ExprAST *LHS = ParsePrimary();
908 if (!LHS) return 0;
909
910 return ParseBinOpRHS(0, LHS);
911}
912
913/// prototype
914/// ::= id '(' id* ')'
915static PrototypeAST *ParsePrototype() {
916 if (CurTok != tok_identifier)
917 return ErrorP("Expected function name in prototype");
918
919 std::string FnName = IdentifierStr;
920 getNextToken();
921
922 if (CurTok != '(')
923 return ErrorP("Expected '(' in prototype");
924
925 std::vector&lt;std::string&gt; ArgNames;
926 while (getNextToken() == tok_identifier)
927 ArgNames.push_back(IdentifierStr);
928 if (CurTok != ')')
929 return ErrorP("Expected ')' in prototype");
930
931 // success.
932 getNextToken(); // eat ')'.
933
934 return new PrototypeAST(FnName, ArgNames);
935}
936
937/// definition ::= 'def' prototype expression
938static FunctionAST *ParseDefinition() {
939 getNextToken(); // eat def.
940 PrototypeAST *Proto = ParsePrototype();
941 if (Proto == 0) return 0;
942
943 if (ExprAST *E = ParseExpression())
944 return new FunctionAST(Proto, E);
945 return 0;
946}
947
948/// toplevelexpr ::= expression
949static FunctionAST *ParseTopLevelExpr() {
950 if (ExprAST *E = ParseExpression()) {
951 // Make an anonymous proto.
952 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
953 return new FunctionAST(Proto, E);
954 }
955 return 0;
956}
957
958/// external ::= 'extern' prototype
959static PrototypeAST *ParseExtern() {
960 getNextToken(); // eat extern.
961 return ParsePrototype();
962}
963
964//===----------------------------------------------------------------------===//
965// Code Generation
966//===----------------------------------------------------------------------===//
967
968static Module *TheModule;
969static LLVMBuilder Builder;
970static std::map&lt;std::string, Value*&gt; NamedValues;
971
972Value *ErrorV(const char *Str) { Error(Str); return 0; }
973
974Value *NumberExprAST::Codegen() {
975 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
976}
977
978Value *VariableExprAST::Codegen() {
979 // Look this variable up in the function.
980 Value *V = NamedValues[Name];
981 return V ? V : ErrorV("Unknown variable name");
982}
983
984Value *BinaryExprAST::Codegen() {
985 Value *L = LHS-&gt;Codegen();
986 Value *R = RHS-&gt;Codegen();
987 if (L == 0 || R == 0) return 0;
988
989 switch (Op) {
990 case '+': return Builder.CreateAdd(L, R, "addtmp");
991 case '-': return Builder.CreateSub(L, R, "subtmp");
992 case '*': return Builder.CreateMul(L, R, "multmp");
993 case '&lt;':
994 L = Builder.CreateFCmpULT(L, R, "multmp");
995 // Convert bool 0/1 to double 0.0 or 1.0
996 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
997 default: return ErrorV("invalid binary operator");
998 }
999}
1000
1001Value *CallExprAST::Codegen() {
1002 // Look up the name in the global module table.
1003 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1004 if (CalleeF == 0)
1005 return ErrorV("Unknown function referenced");
1006
1007 // If argument mismatch error.
1008 if (CalleeF-&gt;arg_size() != Args.size())
1009 return ErrorV("Incorrect # arguments passed");
1010
1011 std::vector&lt;Value*&gt; ArgsV;
1012 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1013 ArgsV.push_back(Args[i]-&gt;Codegen());
1014 if (ArgsV.back() == 0) return 0;
1015 }
1016
1017 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1018}
1019
1020Function *PrototypeAST::Codegen() {
1021 // Make the function type: double(double,double) etc.
Chris Lattner35abbf52007-10-23 06:23:57 +00001022 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1023 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
Chris Lattner2e902042007-10-22 07:01:42 +00001024
1025 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1026
1027 // If F conflicted, there was already something named 'Name'. If it has a
1028 // body, don't allow redefinition or reextern.
1029 if (F-&gt;getName() != Name) {
1030 // Delete the one we just made and get the existing one.
1031 F-&gt;eraseFromParent();
1032 F = TheModule-&gt;getFunction(Name);
1033
1034 // If F already has a body, reject this.
1035 if (!F-&gt;empty()) {
1036 ErrorF("redefinition of function");
1037 return 0;
1038 }
1039
1040 // If F took a different number of args, reject.
1041 if (F-&gt;arg_size() != Args.size()) {
1042 ErrorF("redefinition of function with different # args");
1043 return 0;
1044 }
1045 }
1046
1047 // Set names for all arguments.
1048 unsigned Idx = 0;
1049 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1050 ++AI, ++Idx) {
1051 AI-&gt;setName(Args[Idx]);
1052
1053 // Add arguments to variable symbol table.
1054 NamedValues[Args[Idx]] = AI;
1055 }
1056
1057 return F;
1058}
1059
1060Function *FunctionAST::Codegen() {
1061 NamedValues.clear();
1062
1063 Function *TheFunction = Proto-&gt;Codegen();
1064 if (TheFunction == 0)
1065 return 0;
1066
1067 // Create a new basic block to start insertion into.
Chris Lattner35abbf52007-10-23 06:23:57 +00001068 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1069 Builder.SetInsertPoint(BB);
Chris Lattner2e902042007-10-22 07:01:42 +00001070
1071 if (Value *RetVal = Body-&gt;Codegen()) {
1072 // Finish off the function.
1073 Builder.CreateRet(RetVal);
1074 return TheFunction;
1075 }
1076
1077 // Error reading body, remove function.
1078 TheFunction-&gt;eraseFromParent();
1079 return 0;
1080}
1081
1082//===----------------------------------------------------------------------===//
1083// Top-Level parsing and JIT Driver
1084//===----------------------------------------------------------------------===//
1085
1086static void HandleDefinition() {
1087 if (FunctionAST *F = ParseDefinition()) {
1088 if (Function *LF = F-&gt;Codegen()) {
1089 fprintf(stderr, "Read function definition:");
1090 LF-&gt;dump();
1091 }
1092 } else {
1093 // Skip token for error recovery.
1094 getNextToken();
1095 }
1096}
1097
1098static void HandleExtern() {
1099 if (PrototypeAST *P = ParseExtern()) {
1100 if (Function *F = P-&gt;Codegen()) {
1101 fprintf(stderr, "Read extern: ");
1102 F-&gt;dump();
1103 }
1104 } else {
1105 // Skip token for error recovery.
1106 getNextToken();
1107 }
1108}
1109
1110static void HandleTopLevelExpression() {
1111 // Evaluate a top level expression into an anonymous function.
1112 if (FunctionAST *F = ParseTopLevelExpr()) {
1113 if (Function *LF = F-&gt;Codegen()) {
1114 fprintf(stderr, "Read top-level expression:");
1115 LF-&gt;dump();
1116 }
1117 } else {
1118 // Skip token for error recovery.
1119 getNextToken();
1120 }
1121}
1122
1123/// top ::= definition | external | expression | ';'
1124static void MainLoop() {
1125 while (1) {
1126 fprintf(stderr, "ready&gt; ");
1127 switch (CurTok) {
1128 case tok_eof: return;
1129 case ';': getNextToken(); break; // ignore top level semicolons.
1130 case tok_def: HandleDefinition(); break;
1131 case tok_extern: HandleExtern(); break;
1132 default: HandleTopLevelExpression(); break;
1133 }
1134 }
1135}
1136
1137
1138
1139//===----------------------------------------------------------------------===//
1140// "Library" functions that can be "extern'd" from user code.
1141//===----------------------------------------------------------------------===//
1142
1143/// putchard - putchar that takes a double and returns 0.
1144extern "C"
1145double putchard(double X) {
1146 putchar((char)X);
1147 return 0;
1148}
1149
1150//===----------------------------------------------------------------------===//
1151// Main driver code.
1152//===----------------------------------------------------------------------===//
1153
1154int main() {
1155 TheModule = new Module("my cool jit");
1156
1157 // Install standard binary operators.
1158 // 1 is lowest precedence.
1159 BinopPrecedence['&lt;'] = 10;
1160 BinopPrecedence['+'] = 20;
1161 BinopPrecedence['-'] = 20;
1162 BinopPrecedence['*'] = 40; // highest.
1163
1164 // Prime the first token.
1165 fprintf(stderr, "ready&gt; ");
1166 getNextToken();
1167
1168 MainLoop();
1169 TheModule-&gt;dump();
1170 return 0;
1171}
Chris Lattner2e902042007-10-22 07:01:42 +00001172</pre>
1173</div>
1174</div>
1175
1176<!-- *********************************************************************** -->
1177<hr>
1178<address>
1179 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1180 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1181 <a href="http://validator.w3.org/check/referer"><img
Chris Lattner8eef4b22007-10-23 06:30:50 +00001182 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner2e902042007-10-22 07:01:42 +00001183
1184 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1185 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1186 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1187</address>
1188</body>
1189</html>