blob: 3733a66bfcff7f3b3d3a190e0c54141879b0ee1d [file] [log] [blame]
Chris Lattner2e902042007-10-22 07:01:42 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
15
Chris Lattner128eb862007-11-05 19:06:59 +000016<ul>
17<li>Chapter 3
18 <ol>
19 <li><a href="#intro">Chapter 3 Introduction</a></li>
20 <li><a href="#basics">Code Generation setup</a></li>
21 <li><a href="#exprs">Expression Code Generation</a></li>
22 <li><a href="#funcs">Function Code Generation</a></li>
23 <li><a href="#driver">Driver Changes and Closing Thoughts</a></li>
24 <li><a href="#code">Full Code Listing</a></li>
25 </ol>
26</li>
27</ul>
28
Chris Lattner2e902042007-10-22 07:01:42 +000029<div class="doc_author">
30 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
31</div>
32
33<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +000034<div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div>
Chris Lattner2e902042007-10-22 07:01:42 +000035<!-- *********************************************************************** -->
36
37<div class="doc_text">
38
Chris Lattner128eb862007-11-05 19:06:59 +000039<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
40with LLVM</a>" tutorial. This chapter shows you how to transform the <a
Chris Lattner2e902042007-10-22 07:01:42 +000041href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
42This will teach you a little bit about how LLVM does things, as well as
43demonstrate how easy it is to use. It's much more work to build a lexer and
44parser than it is to generate LLVM IR code.
45</p>
46
47</div>
48
49<!-- *********************************************************************** -->
50<div class="doc_section"><a name="basics">Code Generation setup</a></div>
51<!-- *********************************************************************** -->
52
53<div class="doc_text">
54
55<p>
56In order to generate LLVM IR, we want some simple setup to get started. First,
57we define virtual codegen methods in each AST class:</p>
58
59<div class="doc_code">
60<pre>
61/// ExprAST - Base class for all expression nodes.
62class ExprAST {
63public:
64 virtual ~ExprAST() {}
65 virtual Value *Codegen() = 0;
66};
67
68/// NumberExprAST - Expression class for numeric literals like "1.0".
69class NumberExprAST : public ExprAST {
70 double Val;
71public:
Chris Lattner28571ed2007-10-23 04:27:44 +000072 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +000073 virtual Value *Codegen();
74};
75...
76</pre>
77</div>
78
Chris Lattner28571ed2007-10-23 04:27:44 +000079<p>The Codegen() method says to emit IR for that AST node and all things it
80depends on, and they all return an LLVM Value object.
81"Value" is the class used to represent a "<a
82href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
83Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
84of SSA values is that their value is computed as the related instruction
85executes, and it does not get a new value until (and if) the instruction
86re-executes. In order words, there is no way to "change" an SSA value. For
87more information, please read up on <a
88href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
89Assignment</a> - the concepts are really quite natural once you grok them.</p>
90
91<p>The
Chris Lattner2e902042007-10-22 07:01:42 +000092second thing we want is an "Error" method like we used for parser, which will
93be used to report errors found during code generation (for example, use of an
94undeclared parameter):</p>
95
96<div class="doc_code">
97<pre>
98Value *ErrorV(const char *Str) { Error(Str); return 0; }
99
100static Module *TheModule;
101static LLVMBuilder Builder;
102static std::map&lt;std::string, Value*&gt; NamedValues;
103</pre>
104</div>
105
106<p>The static variables will be used during code generation. <tt>TheModule</tt>
107is the LLVM construct that contains all of the functions and global variables in
108a chunk of code. In many ways, it is the top-level structure that the LLVM IR
109uses to contain code.</p>
110
111<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
Chris Lattnerf6e53df2007-11-05 18:02:15 +0000112LLVM instructions. Instances of the <a
113href="http://llvm.org/doxygen/LLVMBuilder_8h-source.html"><tt>LLVMBuilder</tt>
114class</a> keeps track of the current place to
Chris Lattner2e902042007-10-22 07:01:42 +0000115insert instructions and has methods to create new instructions.</p>
116
117<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
118current scope and what their LLVM representation is. In this form of
119Kaleidoscope, the only things that can be referenced are function parameters.
120As such, function parameters will be in this map when generating code for their
121function body.</p>
122
123<p>
124With these basics in place, we can start talking about how to generate code for
125each expression. Note that this assumes that the <tt>Builder</tt> has been set
126up to generate code <em>into</em> something. For now, we'll assume that this
127has already been done, and we'll just use it to emit code.
128</p>
129
130</div>
131
132<!-- *********************************************************************** -->
133<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
134<!-- *********************************************************************** -->
135
136<div class="doc_text">
137
138<p>Generating LLVM code for expression nodes is very straight-forward: less
139than 45 lines of commented code for all four of our expression nodes. First,
140we'll do numeric literals:</p>
141
142<div class="doc_code">
143<pre>
144Value *NumberExprAST::Codegen() {
145 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
146}
147</pre>
148</div>
149
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000150<p>In the LLVM IR, numeric constants are represented with the
151<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
152internally (<tt>APFloat</tt> has the capability of holding floating point
153constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
154creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
Chris Lattner2e902042007-10-22 07:01:42 +0000155that constants are all uniqued together and shared. For this reason, the API
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000156uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000157
158<div class="doc_code">
159<pre>
160Value *VariableExprAST::Codegen() {
161 // Look this variable up in the function.
162 Value *V = NamedValues[Name];
163 return V ? V : ErrorV("Unknown variable name");
164}
165</pre>
166</div>
167
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000168<p>References to variables is also quite simple here. In the simple version
169of Kaleidoscope, we assume that the variable has already been emited somewhere
170and its value is available. In practice, the only values that can be in the
171<tt>NamedValues</tt> map are function arguments. This
Chris Lattner2e902042007-10-22 07:01:42 +0000172code simply checks to see that the specified name is in the map (if not, an
173unknown variable is being referenced) and returns the value for it.</p>
174
175<div class="doc_code">
176<pre>
177Value *BinaryExprAST::Codegen() {
178 Value *L = LHS-&gt;Codegen();
179 Value *R = RHS-&gt;Codegen();
180 if (L == 0 || R == 0) return 0;
181
182 switch (Op) {
183 case '+': return Builder.CreateAdd(L, R, "addtmp");
184 case '-': return Builder.CreateSub(L, R, "subtmp");
185 case '*': return Builder.CreateMul(L, R, "multmp");
186 case '&lt;':
187 L = Builder.CreateFCmpULT(L, R, "multmp");
188 // Convert bool 0/1 to double 0.0 or 1.0
189 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
190 default: return ErrorV("invalid binary operator");
191 }
192}
193</pre>
194</div>
195
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000196<p>Binary operators start to get more interesting. The basic idea here is that
197we recursively emit code for the left-hand side of the expression, then the
198right-hand side, then we compute the result of the binary expression. In this
199code, we do a simple switch on the opcode to create the right LLVM instruction.
200</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000201
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000202<p>In this example, the LLVM builder class is starting to show its value.
203Because it knows where to insert the newly created instruction, you just have to
204specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
205operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
206for the generated instruction. One nice thing about LLVM is that the name is
207just a hint: if there are multiple additions in a single function, the first
208will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
209giving it a name like "addtmp42". Local value names for instructions are purely
210optional, but it makes it much easier to read the IR dumps.</p>
211
212<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
213have very strict type properties: for example, the Left and Right operators of
214an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
215type, and that the result of the add matches the operands. Because all values
216in Kaleidoscope are doubles, this makes for very simple code for add, sub and
217mul.</p>
218
219<p>On the other hand, LLVM specifies that the <a
220href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
221(a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
222value. In order to get these semantics, we combine the fcmp instruction with
223a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
224converts its input integer into a floating point value by treating the input
225as an unsigned value. In contrast, if we used the <a
226href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
227operator would return 0.0 and -1.0, depending on the input value.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000228
229<div class="doc_code">
230<pre>
231Value *CallExprAST::Codegen() {
232 // Look up the name in the global module table.
233 Function *CalleeF = TheModule-&gt;getFunction(Callee);
234 if (CalleeF == 0)
235 return ErrorV("Unknown function referenced");
236
237 // If argument mismatch error.
238 if (CalleeF-&gt;arg_size() != Args.size())
239 return ErrorV("Incorrect # arguments passed");
240
241 std::vector&lt;Value*&gt; ArgsV;
242 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
243 ArgsV.push_back(Args[i]-&gt;Codegen());
244 if (ArgsV.back() == 0) return 0;
245 }
246
247 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
248}
249</pre>
250</div>
251
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000252<p>Code generation for function calls is quite straight-forward with LLVM. The
253code above first looks the name of the function up in the LLVM Module's symbol
254table. Recall that the LLVM Module is the container that holds all of the
255functions we are JIT'ing. By giving each function the same name as what the
256user specifies, we can use the LLVM symbol table to resolve function names for
257us.</p>
258
259<p>Once we have the function to call, we recursively codegen each argument that
260is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
261instruction</a>. Note that LLVM uses the native C calling conventions by
262default, allowing these calls to call into standard library functions like
263"sin" and "cos" with no additional effort.</p>
264
265<p>This wraps up our handling of the four basic expressions that we have so far
266in Kaleidoscope. Feel free to go in and add some more. For example, by
267browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
268several other interesting instructions that are really easy to plug into our
269basic framework.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000270
271</div>
272
273<!-- *********************************************************************** -->
Chris Lattner35abbf52007-10-23 06:23:57 +0000274<div class="doc_section"><a name="funcs">Function Code Generation</a></div>
Chris Lattner2e902042007-10-22 07:01:42 +0000275<!-- *********************************************************************** -->
276
277<div class="doc_text">
278
Chris Lattner35abbf52007-10-23 06:23:57 +0000279<p>Code generation for prototypes and functions has to handle a number of
280details, which make their code less beautiful and elegant than expression code
281generation, but they illustrate some important points. First, lets talk about
282code generation for prototypes: this is used both for function bodies as well
283as external function declarations. The code starts with:</p>
284
285<div class="doc_code">
286<pre>
287Function *PrototypeAST::Codegen() {
288 // Make the function type: double(double,double) etc.
289 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
290 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
291
292 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
293</pre>
294</div>
295
Chris Lattnercf9893d2007-11-05 19:22:50 +0000296<p>This code packs a lot of power into a few lines. Note first that this
297function returns a Function* instead of a Value*. Because a "prototype" really
298talks about the external interface for a function (not the value computed by
299an expression), it makes sense for it to return the LLVM Function it corresponds
300to when codegen'd.</p>
301
302<p>The next step is to create
Chris Lattner35abbf52007-10-23 06:23:57 +0000303the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
304function arguments in Kaleidoscope are of type double, the first line creates
305a vector of "N" LLVM Double types. It then uses the <tt>FunctionType::get</tt>
306method to create a function type that takes "N" doubles as arguments, returns
307one double as a result, and that is not vararg (the false parameter indicates
308this). Note that Types in LLVM are uniqued just like Constants are, so you
309don't "new" a type, you "get" it.</p>
310
311<p>The final line above actually creates the function that the prototype will
312correspond to. This indicates which type, linkage, and name to use, and which
313module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
314means that the function may be defined outside the current module and/or that it
315is callable by functions outside the module. The Name passed in is the name the
316user specified: since "<tt>TheModule</tt>" is specified, this name is registered
317in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
318above.</p>
319
320<div class="doc_code">
321<pre>
322 // If F conflicted, there was already something named 'Name'. If it has a
323 // body, don't allow redefinition or reextern.
324 if (F-&gt;getName() != Name) {
325 // Delete the one we just made and get the existing one.
326 F-&gt;eraseFromParent();
327 F = TheModule-&gt;getFunction(Name);
328</pre>
329</div>
330
331<p>The Module symbol table works just like the Function symbol table when it
332comes to name conflicts: if a new function is created with a name was previously
333added to the symbol table, it will get implicitly renamed when added to the
334Module. The code above exploits this fact to tell if there was a previous
335definition of this function.</p>
336
337<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
338first, we want to allow 'extern'ing a function more than once, so long as the
339prototypes for the externs match (since all arguments have the same type, we
340just have to check that the number of arguments match). Second, we want to
341allow 'extern'ing a function and then definining a body for it. This is useful
342when defining mutually recursive functions.</p>
343
344<p>In order to implement this, the code above first checks to see if there is
345a collision on the name of the function. If so, it deletes the function we just
346created (by calling <tt>eraseFromParent</tt>) and then calling
347<tt>getFunction</tt> to get the existing function with the specified name. Note
348that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
349unlinks the object from its parent (e.g. a Function from a Module) and returns
350it. The "erase" form unlinks the object and then deletes it.</p>
351
352<div class="doc_code">
353<pre>
354 // If F already has a body, reject this.
355 if (!F-&gt;empty()) {
356 ErrorF("redefinition of function");
357 return 0;
358 }
359
360 // If F took a different number of args, reject.
361 if (F-&gt;arg_size() != Args.size()) {
362 ErrorF("redefinition of function with different # args");
363 return 0;
364 }
365 }
366</pre>
367</div>
368
369<p>In order to verify the logic above, we first check to see if the preexisting
370function is "empty". In this case, empty means that it has no basic blocks in
371it, which means it has no body. If it has no body, this means its a forward
372declaration. Since we don't allow anything after a full definition of the
373function, the code rejects this case. If the previous reference to a function
374was an 'extern', we simply verify that the number of arguments for that
375definition and this one match up. If not, we emit an error.</p>
376
377<div class="doc_code">
378<pre>
379 // Set names for all arguments.
380 unsigned Idx = 0;
381 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
382 ++AI, ++Idx) {
383 AI-&gt;setName(Args[Idx]);
384
385 // Add arguments to variable symbol table.
386 NamedValues[Args[Idx]] = AI;
387 }
388 return F;
389}
390</pre>
391</div>
392
393<p>The last bit of code for prototypes loops over all of the arguments in the
394function, setting the name of the LLVM Argument objects to match and registering
395the arguments in the <tt>NamedValues</tt> map for future use by the
396<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
397object to the caller. Note that we don't check for conflicting
398argument names here (e.g. "extern foo(a b a)"). Doing so would be very
399straight-forward.</p>
400
401<div class="doc_code">
402<pre>
403Function *FunctionAST::Codegen() {
404 NamedValues.clear();
405
406 Function *TheFunction = Proto-&gt;Codegen();
407 if (TheFunction == 0)
408 return 0;
409</pre>
410</div>
411
412<p>Code generation for function definitions starts out simply enough: first we
413codegen the prototype and verify that it is ok. We also clear out the
414<tt>NamedValues</tt> map to make sure that there isn't anything in it from the
415last function we compiled.</p>
416
417<div class="doc_code">
418<pre>
419 // Create a new basic block to start insertion into.
420 BasicBlock *BB = new BasicBlock("entry", TheFunction);
421 Builder.SetInsertPoint(BB);
422
423 if (Value *RetVal = Body-&gt;Codegen()) {
Chris Lattner35abbf52007-10-23 06:23:57 +0000424</pre>
425</div>
426
427<p>Now we get to the point where the <tt>Builder</tt> is set up. The first
428line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
429block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
430second line then tells the builder that new instructions should be inserted into
431the end of the new basic block. Basic blocks in LLVM are an important part
432of functions that define the <a
433href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
434Since we don't have any control flow, our functions will only contain one
435block so far. We'll fix this in a future installment :).</p>
436
Chris Lattnerd9b86162007-10-25 04:30:35 +0000437<div class="doc_code">
438<pre>
439 if (Value *RetVal = Body-&gt;Codegen()) {
440 // Finish off the function.
441 Builder.CreateRet(RetVal);
442
443 // Validate the generated code, checking for consistency.
444 verifyFunction(*TheFunction);
445 return TheFunction;
446 }
447</pre>
448</div>
449
Chris Lattner35abbf52007-10-23 06:23:57 +0000450<p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
451the root expression of the function. If no error happens, this emits code to
452compute the expression into the entry block and returns the value that was
453computed. Assuming no error, we then create an LLVM <a
Chris Lattnerd9b86162007-10-25 04:30:35 +0000454href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
455Once the function is built, we call the <tt>verifyFunction</tt> function, which
456is provided by LLVM. This function does a variety of consistency checks on the
457generated code, to determine if our compiler is doing everything right. Using
458this is important: it can catch a lot of bugs. Once the function is finished
459and validated, we return it.</p>
Chris Lattner35abbf52007-10-23 06:23:57 +0000460
461<div class="doc_code">
462<pre>
463 // Error reading body, remove function.
464 TheFunction-&gt;eraseFromParent();
465 return 0;
466}
467</pre>
468</div>
469
470<p>The only piece left here is handling of the error case. For simplicity, we
471simply handle this by deleting the function we produced with the
472<tt>eraseFromParent</tt> method. This allows the user to redefine a function
473that they incorrectly typed in before: if we didn't delete it, it would live in
474the symbol table, with a body, preventing future redefinition.</p>
475
476<p>This code does have a bug though. Since the <tt>PrototypeAST::Codegen</tt>
477can return a previously defined forward declaration, this can actually delete
478a forward declaration. There are a number of ways to fix this bug, see what you
479can come up with! Here is a testcase:</p>
480
481<div class="doc_code">
482<pre>
483extern foo(a b); # ok, defines foo.
484def foo(a b) c; # error, 'c' is invalid.
485def bar() foo(1, 2); # error, unknown function "foo"
486</pre>
487</div>
488
489</div>
490
491<!-- *********************************************************************** -->
492<div class="doc_section"><a name="driver">Driver Changes and
493Closing Thoughts</a></div>
494<!-- *********************************************************************** -->
495
496<div class="doc_text">
497
498<p>
499For now, code generation to LLVM doesn't really get us much, except that we can
500look at the pretty IR calls. The sample code inserts calls to Codegen into the
501"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
502dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
503functions. For example:
504</p>
505
506<div class="doc_code">
507<pre>
508ready> <b>4+5</b>;
509ready> Read top-level expression:
510define double @""() {
511entry:
512 %addtmp = add double 4.000000e+00, 5.000000e+00
513 ret double %addtmp
514}
515</pre>
516</div>
517
518<p>Note how the parser turns the top-level expression into anonymous functions
519for us. This will be handy when we add JIT support in the next chapter. Also
520note that the code is very literally transcribed, no optimizations are being
521performed. We will add optimizations explicitly in the next chapter.</p>
522
523<div class="doc_code">
524<pre>
525ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
526ready&gt; Read function definition:
527define double @foo(double %a, double %b) {
528entry:
529 %multmp = mul double %a, %a
530 %multmp1 = mul double 2.000000e+00, %a
531 %multmp2 = mul double %multmp1, %b
532 %addtmp = add double %multmp, %multmp2
533 %multmp3 = mul double %b, %b
534 %addtmp4 = add double %addtmp, %multmp3
535 ret double %addtmp4
536}
537</pre>
538</div>
539
540<p>This shows some simple arithmetic. Notice the striking similarity to the
541LLVM builder calls that we use to create the instructions.</p>
542
543<div class="doc_code">
544<pre>
545ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
546ready&gt; Read function definition:
547define double @bar(double %a) {
548entry:
549 %calltmp = call double @foo( double %a, double 4.000000e+00 )
550 %calltmp1 = call double @bar( double 3.133700e+04 )
551 %addtmp = add double %calltmp, %calltmp1
552 ret double %addtmp
553}
554</pre>
555</div>
556
Chris Lattnerfc60ab02007-11-05 17:39:26 +0000557<p>This shows some function calls. Note that this function will take a long
558time to execute if you call it. In the future we'll add conditional control
559flow to make recursion actually be useful :).</p>
Chris Lattner35abbf52007-10-23 06:23:57 +0000560
561<div class="doc_code">
562<pre>
563ready&gt; <b>extern cos(x);</b>
564ready&gt; Read extern:
565declare double @cos(double)
566
567ready&gt; <b>cos(1.234);</b>
568ready&gt; Read top-level expression:
569define double @""() {
570entry:
Chris Lattner8eef4b22007-10-23 06:30:50 +0000571 %calltmp = call double @cos( double 1.234000e+00 )
Chris Lattner35abbf52007-10-23 06:23:57 +0000572 ret double %calltmp
573}
574</pre>
575</div>
576
577<p>This shows an extern for the libm "cos" function, and a call to it.</p>
578
579
580<div class="doc_code">
581<pre>
582ready&gt; <b>^D</b>
583; ModuleID = 'my cool jit'
584
585define double @""() {
586entry:
587 %addtmp = add double 4.000000e+00, 5.000000e+00
588 ret double %addtmp
589}
590
591define double @foo(double %a, double %b) {
592entry:
593 %multmp = mul double %a, %a
594 %multmp1 = mul double 2.000000e+00, %a
595 %multmp2 = mul double %multmp1, %b
596 %addtmp = add double %multmp, %multmp2
597 %multmp3 = mul double %b, %b
598 %addtmp4 = add double %addtmp, %multmp3
599 ret double %addtmp4
600}
601
602define double @bar(double %a) {
603entry:
604 %calltmp = call double @foo( double %a, double 4.000000e+00 )
605 %calltmp1 = call double @bar( double 3.133700e+04 )
606 %addtmp = add double %calltmp, %calltmp1
607 ret double %addtmp
608}
609
610declare double @cos(double)
611
612define double @""() {
613entry:
614 %calltmp = call double @cos( double 1.234000e+00 )
615 ret double %calltmp
616}
617</pre>
618</div>
619
620<p>When you quit the current demo, it dumps out the IR for the entire module
621generated. Here you can see the big picture with all the functions referencing
622each other.</p>
623
624<p>This wraps up this chapter of the Kaleidoscope tutorial. Up next we'll
625describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
626support</a> to this so we can actually start running code!</p>
627
628</div>
629
630
631<!-- *********************************************************************** -->
632<div class="doc_section"><a name="code">Full Code Listing</a></div>
633<!-- *********************************************************************** -->
634
635<div class="doc_text">
636
637<p>
638Here is the complete code listing for our running example, enhanced with the
639LLVM code generator. Because this uses the LLVM libraries, we need to link
640them in. To do this, we use the <a
641href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
642our makefile/command line about which options to use:</p>
643
644<div class="doc_code">
645<pre>
646 # Compile
Chris Lattner9ac0ca02007-10-24 05:09:48 +0000647 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
Chris Lattner35abbf52007-10-23 06:23:57 +0000648 # Run
649 ./toy
650</pre>
651</div>
652
653<p>Here is the code:</p>
654
Chris Lattner2e902042007-10-22 07:01:42 +0000655<div class="doc_code">
656<pre>
657// To build this:
Chris Lattner2e902042007-10-22 07:01:42 +0000658// See example below.
659
660#include "llvm/DerivedTypes.h"
661#include "llvm/Module.h"
Chris Lattnerd9b86162007-10-25 04:30:35 +0000662#include "llvm/Analysis/Verifier.h"
Chris Lattner2e902042007-10-22 07:01:42 +0000663#include "llvm/Support/LLVMBuilder.h"
664#include &lt;cstdio&gt;
665#include &lt;string&gt;
666#include &lt;map&gt;
667#include &lt;vector&gt;
668using namespace llvm;
669
670//===----------------------------------------------------------------------===//
671// Lexer
672//===----------------------------------------------------------------------===//
673
674// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
675// of these for known things.
676enum Token {
677 tok_eof = -1,
678
679 // commands
680 tok_def = -2, tok_extern = -3,
681
682 // primary
683 tok_identifier = -4, tok_number = -5,
684};
685
686static std::string IdentifierStr; // Filled in if tok_identifier
687static double NumVal; // Filled in if tok_number
688
689/// gettok - Return the next token from standard input.
690static int gettok() {
691 static int LastChar = ' ';
692
693 // Skip any whitespace.
694 while (isspace(LastChar))
695 LastChar = getchar();
696
697 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
698 IdentifierStr = LastChar;
699 while (isalnum((LastChar = getchar())))
700 IdentifierStr += LastChar;
701
702 if (IdentifierStr == "def") return tok_def;
703 if (IdentifierStr == "extern") return tok_extern;
704 return tok_identifier;
705 }
706
707 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
708 std::string NumStr;
709 do {
710 NumStr += LastChar;
711 LastChar = getchar();
712 } while (isdigit(LastChar) || LastChar == '.');
713
714 NumVal = strtod(NumStr.c_str(), 0);
715 return tok_number;
716 }
717
718 if (LastChar == '#') {
719 // Comment until end of line.
720 do LastChar = getchar();
721 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
722
723 if (LastChar != EOF)
724 return gettok();
725 }
726
727 // Check for end of file. Don't eat the EOF.
728 if (LastChar == EOF)
729 return tok_eof;
730
731 // Otherwise, just return the character as its ascii value.
732 int ThisChar = LastChar;
733 LastChar = getchar();
734 return ThisChar;
735}
736
737//===----------------------------------------------------------------------===//
738// Abstract Syntax Tree (aka Parse Tree)
739//===----------------------------------------------------------------------===//
740
741/// ExprAST - Base class for all expression nodes.
742class ExprAST {
743public:
744 virtual ~ExprAST() {}
745 virtual Value *Codegen() = 0;
746};
747
748/// NumberExprAST - Expression class for numeric literals like "1.0".
749class NumberExprAST : public ExprAST {
750 double Val;
751public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000752 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000753 virtual Value *Codegen();
754};
755
756/// VariableExprAST - Expression class for referencing a variable, like "a".
757class VariableExprAST : public ExprAST {
758 std::string Name;
759public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000760 explicit VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000761 virtual Value *Codegen();
762};
763
764/// BinaryExprAST - Expression class for a binary operator.
765class BinaryExprAST : public ExprAST {
766 char Op;
767 ExprAST *LHS, *RHS;
768public:
769 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
770 : Op(op), LHS(lhs), RHS(rhs) {}
771 virtual Value *Codegen();
772};
773
774/// CallExprAST - Expression class for function calls.
775class CallExprAST : public ExprAST {
776 std::string Callee;
777 std::vector&lt;ExprAST*&gt; Args;
778public:
779 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
780 : Callee(callee), Args(args) {}
781 virtual Value *Codegen();
782};
783
784/// PrototypeAST - This class represents the "prototype" for a function,
785/// which captures its argument names as well as if it is an operator.
786class PrototypeAST {
787 std::string Name;
788 std::vector&lt;std::string&gt; Args;
789public:
790 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
791 : Name(name), Args(args) {}
792
793 Function *Codegen();
794};
795
796/// FunctionAST - This class represents a function definition itself.
797class FunctionAST {
798 PrototypeAST *Proto;
799 ExprAST *Body;
800public:
801 FunctionAST(PrototypeAST *proto, ExprAST *body)
802 : Proto(proto), Body(body) {}
803
804 Function *Codegen();
805};
806
807//===----------------------------------------------------------------------===//
808// Parser
809//===----------------------------------------------------------------------===//
810
811/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
812/// token the parser it looking at. getNextToken reads another token from the
813/// lexer and updates CurTok with its results.
814static int CurTok;
815static int getNextToken() {
816 return CurTok = gettok();
817}
818
819/// BinopPrecedence - This holds the precedence for each binary operator that is
820/// defined.
821static std::map&lt;char, int&gt; BinopPrecedence;
822
823/// GetTokPrecedence - Get the precedence of the pending binary operator token.
824static int GetTokPrecedence() {
825 if (!isascii(CurTok))
826 return -1;
827
828 // Make sure it's a declared binop.
829 int TokPrec = BinopPrecedence[CurTok];
830 if (TokPrec &lt;= 0) return -1;
831 return TokPrec;
832}
833
834/// Error* - These are little helper functions for error handling.
835ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
836PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
837FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
838
839static ExprAST *ParseExpression();
840
841/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +0000842/// ::= identifier
843/// ::= identifier '(' expression* ')'
Chris Lattner2e902042007-10-22 07:01:42 +0000844static ExprAST *ParseIdentifierExpr() {
845 std::string IdName = IdentifierStr;
846
Chris Lattner20a0c802007-11-05 17:54:34 +0000847 getNextToken(); // eat identifier.
Chris Lattner2e902042007-10-22 07:01:42 +0000848
849 if (CurTok != '(') // Simple variable ref.
850 return new VariableExprAST(IdName);
851
852 // Call.
853 getNextToken(); // eat (
854 std::vector&lt;ExprAST*&gt; Args;
855 while (1) {
856 ExprAST *Arg = ParseExpression();
857 if (!Arg) return 0;
858 Args.push_back(Arg);
859
860 if (CurTok == ')') break;
861
862 if (CurTok != ',')
863 return Error("Expected ')'");
864 getNextToken();
865 }
866
867 // Eat the ')'.
868 getNextToken();
869
870 return new CallExprAST(IdName, Args);
871}
872
873/// numberexpr ::= number
874static ExprAST *ParseNumberExpr() {
875 ExprAST *Result = new NumberExprAST(NumVal);
876 getNextToken(); // consume the number
877 return Result;
878}
879
880/// parenexpr ::= '(' expression ')'
881static ExprAST *ParseParenExpr() {
882 getNextToken(); // eat (.
883 ExprAST *V = ParseExpression();
884 if (!V) return 0;
885
886 if (CurTok != ')')
887 return Error("expected ')'");
888 getNextToken(); // eat ).
889 return V;
890}
891
892/// primary
893/// ::= identifierexpr
894/// ::= numberexpr
895/// ::= parenexpr
896static ExprAST *ParsePrimary() {
897 switch (CurTok) {
898 default: return Error("unknown token when expecting an expression");
899 case tok_identifier: return ParseIdentifierExpr();
900 case tok_number: return ParseNumberExpr();
901 case '(': return ParseParenExpr();
902 }
903}
904
905/// binoprhs
906/// ::= ('+' primary)*
907static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
908 // If this is a binop, find its precedence.
909 while (1) {
910 int TokPrec = GetTokPrecedence();
911
912 // If this is a binop that binds at least as tightly as the current binop,
913 // consume it, otherwise we are done.
914 if (TokPrec &lt; ExprPrec)
915 return LHS;
916
917 // Okay, we know this is a binop.
918 int BinOp = CurTok;
919 getNextToken(); // eat binop
920
921 // Parse the primary expression after the binary operator.
922 ExprAST *RHS = ParsePrimary();
923 if (!RHS) return 0;
924
925 // If BinOp binds less tightly with RHS than the operator after RHS, let
926 // the pending operator take RHS as its LHS.
927 int NextPrec = GetTokPrecedence();
928 if (TokPrec &lt; NextPrec) {
929 RHS = ParseBinOpRHS(TokPrec+1, RHS);
930 if (RHS == 0) return 0;
931 }
932
933 // Merge LHS/RHS.
934 LHS = new BinaryExprAST(BinOp, LHS, RHS);
935 }
936}
937
938/// expression
939/// ::= primary binoprhs
940///
941static ExprAST *ParseExpression() {
942 ExprAST *LHS = ParsePrimary();
943 if (!LHS) return 0;
944
945 return ParseBinOpRHS(0, LHS);
946}
947
948/// prototype
949/// ::= id '(' id* ')'
950static PrototypeAST *ParsePrototype() {
951 if (CurTok != tok_identifier)
952 return ErrorP("Expected function name in prototype");
953
954 std::string FnName = IdentifierStr;
955 getNextToken();
956
957 if (CurTok != '(')
958 return ErrorP("Expected '(' in prototype");
959
960 std::vector&lt;std::string&gt; ArgNames;
961 while (getNextToken() == tok_identifier)
962 ArgNames.push_back(IdentifierStr);
963 if (CurTok != ')')
964 return ErrorP("Expected ')' in prototype");
965
966 // success.
967 getNextToken(); // eat ')'.
968
969 return new PrototypeAST(FnName, ArgNames);
970}
971
972/// definition ::= 'def' prototype expression
973static FunctionAST *ParseDefinition() {
974 getNextToken(); // eat def.
975 PrototypeAST *Proto = ParsePrototype();
976 if (Proto == 0) return 0;
977
978 if (ExprAST *E = ParseExpression())
979 return new FunctionAST(Proto, E);
980 return 0;
981}
982
983/// toplevelexpr ::= expression
984static FunctionAST *ParseTopLevelExpr() {
985 if (ExprAST *E = ParseExpression()) {
986 // Make an anonymous proto.
987 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
988 return new FunctionAST(Proto, E);
989 }
990 return 0;
991}
992
993/// external ::= 'extern' prototype
994static PrototypeAST *ParseExtern() {
995 getNextToken(); // eat extern.
996 return ParsePrototype();
997}
998
999//===----------------------------------------------------------------------===//
1000// Code Generation
1001//===----------------------------------------------------------------------===//
1002
1003static Module *TheModule;
1004static LLVMBuilder Builder;
1005static std::map&lt;std::string, Value*&gt; NamedValues;
1006
1007Value *ErrorV(const char *Str) { Error(Str); return 0; }
1008
1009Value *NumberExprAST::Codegen() {
1010 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1011}
1012
1013Value *VariableExprAST::Codegen() {
1014 // Look this variable up in the function.
1015 Value *V = NamedValues[Name];
1016 return V ? V : ErrorV("Unknown variable name");
1017}
1018
1019Value *BinaryExprAST::Codegen() {
1020 Value *L = LHS-&gt;Codegen();
1021 Value *R = RHS-&gt;Codegen();
1022 if (L == 0 || R == 0) return 0;
1023
1024 switch (Op) {
1025 case '+': return Builder.CreateAdd(L, R, "addtmp");
1026 case '-': return Builder.CreateSub(L, R, "subtmp");
1027 case '*': return Builder.CreateMul(L, R, "multmp");
1028 case '&lt;':
1029 L = Builder.CreateFCmpULT(L, R, "multmp");
1030 // Convert bool 0/1 to double 0.0 or 1.0
1031 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1032 default: return ErrorV("invalid binary operator");
1033 }
1034}
1035
1036Value *CallExprAST::Codegen() {
1037 // Look up the name in the global module table.
1038 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1039 if (CalleeF == 0)
1040 return ErrorV("Unknown function referenced");
1041
1042 // If argument mismatch error.
1043 if (CalleeF-&gt;arg_size() != Args.size())
1044 return ErrorV("Incorrect # arguments passed");
1045
1046 std::vector&lt;Value*&gt; ArgsV;
1047 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1048 ArgsV.push_back(Args[i]-&gt;Codegen());
1049 if (ArgsV.back() == 0) return 0;
1050 }
1051
1052 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1053}
1054
1055Function *PrototypeAST::Codegen() {
1056 // Make the function type: double(double,double) etc.
Chris Lattner35abbf52007-10-23 06:23:57 +00001057 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1058 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
Chris Lattner2e902042007-10-22 07:01:42 +00001059
1060 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1061
1062 // If F conflicted, there was already something named 'Name'. If it has a
1063 // body, don't allow redefinition or reextern.
1064 if (F-&gt;getName() != Name) {
1065 // Delete the one we just made and get the existing one.
1066 F-&gt;eraseFromParent();
1067 F = TheModule-&gt;getFunction(Name);
1068
1069 // If F already has a body, reject this.
1070 if (!F-&gt;empty()) {
1071 ErrorF("redefinition of function");
1072 return 0;
1073 }
1074
1075 // If F took a different number of args, reject.
1076 if (F-&gt;arg_size() != Args.size()) {
1077 ErrorF("redefinition of function with different # args");
1078 return 0;
1079 }
1080 }
1081
1082 // Set names for all arguments.
1083 unsigned Idx = 0;
1084 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1085 ++AI, ++Idx) {
1086 AI-&gt;setName(Args[Idx]);
1087
1088 // Add arguments to variable symbol table.
1089 NamedValues[Args[Idx]] = AI;
1090 }
1091
1092 return F;
1093}
1094
1095Function *FunctionAST::Codegen() {
1096 NamedValues.clear();
1097
1098 Function *TheFunction = Proto-&gt;Codegen();
1099 if (TheFunction == 0)
1100 return 0;
1101
1102 // Create a new basic block to start insertion into.
Chris Lattner35abbf52007-10-23 06:23:57 +00001103 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1104 Builder.SetInsertPoint(BB);
Chris Lattner2e902042007-10-22 07:01:42 +00001105
1106 if (Value *RetVal = Body-&gt;Codegen()) {
1107 // Finish off the function.
1108 Builder.CreateRet(RetVal);
Chris Lattnerd9b86162007-10-25 04:30:35 +00001109
1110 // Validate the generated code, checking for consistency.
1111 verifyFunction(*TheFunction);
Chris Lattner2e902042007-10-22 07:01:42 +00001112 return TheFunction;
1113 }
1114
1115 // Error reading body, remove function.
1116 TheFunction-&gt;eraseFromParent();
1117 return 0;
1118}
1119
1120//===----------------------------------------------------------------------===//
1121// Top-Level parsing and JIT Driver
1122//===----------------------------------------------------------------------===//
1123
1124static void HandleDefinition() {
1125 if (FunctionAST *F = ParseDefinition()) {
1126 if (Function *LF = F-&gt;Codegen()) {
1127 fprintf(stderr, "Read function definition:");
1128 LF-&gt;dump();
1129 }
1130 } else {
1131 // Skip token for error recovery.
1132 getNextToken();
1133 }
1134}
1135
1136static void HandleExtern() {
1137 if (PrototypeAST *P = ParseExtern()) {
1138 if (Function *F = P-&gt;Codegen()) {
1139 fprintf(stderr, "Read extern: ");
1140 F-&gt;dump();
1141 }
1142 } else {
1143 // Skip token for error recovery.
1144 getNextToken();
1145 }
1146}
1147
1148static void HandleTopLevelExpression() {
1149 // Evaluate a top level expression into an anonymous function.
1150 if (FunctionAST *F = ParseTopLevelExpr()) {
1151 if (Function *LF = F-&gt;Codegen()) {
1152 fprintf(stderr, "Read top-level expression:");
1153 LF-&gt;dump();
1154 }
1155 } else {
1156 // Skip token for error recovery.
1157 getNextToken();
1158 }
1159}
1160
1161/// top ::= definition | external | expression | ';'
1162static void MainLoop() {
1163 while (1) {
1164 fprintf(stderr, "ready&gt; ");
1165 switch (CurTok) {
1166 case tok_eof: return;
1167 case ';': getNextToken(); break; // ignore top level semicolons.
1168 case tok_def: HandleDefinition(); break;
1169 case tok_extern: HandleExtern(); break;
1170 default: HandleTopLevelExpression(); break;
1171 }
1172 }
1173}
1174
1175
1176
1177//===----------------------------------------------------------------------===//
1178// "Library" functions that can be "extern'd" from user code.
1179//===----------------------------------------------------------------------===//
1180
1181/// putchard - putchar that takes a double and returns 0.
1182extern "C"
1183double putchard(double X) {
1184 putchar((char)X);
1185 return 0;
1186}
1187
1188//===----------------------------------------------------------------------===//
1189// Main driver code.
1190//===----------------------------------------------------------------------===//
1191
1192int main() {
1193 TheModule = new Module("my cool jit");
1194
1195 // Install standard binary operators.
1196 // 1 is lowest precedence.
1197 BinopPrecedence['&lt;'] = 10;
1198 BinopPrecedence['+'] = 20;
1199 BinopPrecedence['-'] = 20;
1200 BinopPrecedence['*'] = 40; // highest.
1201
1202 // Prime the first token.
1203 fprintf(stderr, "ready&gt; ");
1204 getNextToken();
1205
1206 MainLoop();
1207 TheModule-&gt;dump();
1208 return 0;
1209}
Chris Lattner2e902042007-10-22 07:01:42 +00001210</pre>
1211</div>
1212</div>
1213
1214<!-- *********************************************************************** -->
1215<hr>
1216<address>
1217 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1218 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1219 <a href="http://validator.w3.org/check/referer"><img
Chris Lattner8eef4b22007-10-23 06:30:50 +00001220 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner2e902042007-10-22 07:01:42 +00001221
1222 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1223 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1224 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1225</address>
1226</body>
1227</html>