blob: 59513ab3f66f6188ce5d45a1aecd022287894ee3 [file] [log] [blame]
Chris Lattner2e902042007-10-22 07:01:42 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
15
16<div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
18</div>
19
20<!-- *********************************************************************** -->
21<div class="doc_section"><a name="intro">Part 3 Introduction</a></div>
22<!-- *********************************************************************** -->
23
24<div class="doc_text">
25
26<p>Welcome to part 3 of the "<a href="index.html">Implementing a language with
27LLVM</a>" tutorial. This chapter shows you how to transform the <a
28href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
29This will teach you a little bit about how LLVM does things, as well as
30demonstrate how easy it is to use. It's much more work to build a lexer and
31parser than it is to generate LLVM IR code.
32</p>
33
34</div>
35
36<!-- *********************************************************************** -->
37<div class="doc_section"><a name="basics">Code Generation setup</a></div>
38<!-- *********************************************************************** -->
39
40<div class="doc_text">
41
42<p>
43In order to generate LLVM IR, we want some simple setup to get started. First,
44we define virtual codegen methods in each AST class:</p>
45
46<div class="doc_code">
47<pre>
48/// ExprAST - Base class for all expression nodes.
49class ExprAST {
50public:
51 virtual ~ExprAST() {}
52 virtual Value *Codegen() = 0;
53};
54
55/// NumberExprAST - Expression class for numeric literals like "1.0".
56class NumberExprAST : public ExprAST {
57 double Val;
58public:
Chris Lattner28571ed2007-10-23 04:27:44 +000059 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +000060 virtual Value *Codegen();
61};
62...
63</pre>
64</div>
65
Chris Lattner28571ed2007-10-23 04:27:44 +000066<p>The Codegen() method says to emit IR for that AST node and all things it
67depends on, and they all return an LLVM Value object.
68"Value" is the class used to represent a "<a
69href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
70Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
71of SSA values is that their value is computed as the related instruction
72executes, and it does not get a new value until (and if) the instruction
73re-executes. In order words, there is no way to "change" an SSA value. For
74more information, please read up on <a
75href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
76Assignment</a> - the concepts are really quite natural once you grok them.</p>
77
78<p>The
Chris Lattner2e902042007-10-22 07:01:42 +000079second thing we want is an "Error" method like we used for parser, which will
80be used to report errors found during code generation (for example, use of an
81undeclared parameter):</p>
82
83<div class="doc_code">
84<pre>
85Value *ErrorV(const char *Str) { Error(Str); return 0; }
86
87static Module *TheModule;
88static LLVMBuilder Builder;
89static std::map&lt;std::string, Value*&gt; NamedValues;
90</pre>
91</div>
92
93<p>The static variables will be used during code generation. <tt>TheModule</tt>
94is the LLVM construct that contains all of the functions and global variables in
95a chunk of code. In many ways, it is the top-level structure that the LLVM IR
96uses to contain code.</p>
97
98<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
Chris Lattnerf6e53df2007-11-05 18:02:15 +000099LLVM instructions. Instances of the <a
100href="http://llvm.org/doxygen/LLVMBuilder_8h-source.html"><tt>LLVMBuilder</tt>
101class</a> keeps track of the current place to
Chris Lattner2e902042007-10-22 07:01:42 +0000102insert instructions and has methods to create new instructions.</p>
103
104<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
105current scope and what their LLVM representation is. In this form of
106Kaleidoscope, the only things that can be referenced are function parameters.
107As such, function parameters will be in this map when generating code for their
108function body.</p>
109
110<p>
111With these basics in place, we can start talking about how to generate code for
112each expression. Note that this assumes that the <tt>Builder</tt> has been set
113up to generate code <em>into</em> something. For now, we'll assume that this
114has already been done, and we'll just use it to emit code.
115</p>
116
117</div>
118
119<!-- *********************************************************************** -->
120<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
121<!-- *********************************************************************** -->
122
123<div class="doc_text">
124
125<p>Generating LLVM code for expression nodes is very straight-forward: less
126than 45 lines of commented code for all four of our expression nodes. First,
127we'll do numeric literals:</p>
128
129<div class="doc_code">
130<pre>
131Value *NumberExprAST::Codegen() {
132 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
133}
134</pre>
135</div>
136
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000137<p>In the LLVM IR, numeric constants are represented with the
138<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
139internally (<tt>APFloat</tt> has the capability of holding floating point
140constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
141creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
Chris Lattner2e902042007-10-22 07:01:42 +0000142that constants are all uniqued together and shared. For this reason, the API
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000143uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000144
145<div class="doc_code">
146<pre>
147Value *VariableExprAST::Codegen() {
148 // Look this variable up in the function.
149 Value *V = NamedValues[Name];
150 return V ? V : ErrorV("Unknown variable name");
151}
152</pre>
153</div>
154
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000155<p>References to variables is also quite simple here. In the simple version
156of Kaleidoscope, we assume that the variable has already been emited somewhere
157and its value is available. In practice, the only values that can be in the
158<tt>NamedValues</tt> map are function arguments. This
Chris Lattner2e902042007-10-22 07:01:42 +0000159code simply checks to see that the specified name is in the map (if not, an
160unknown variable is being referenced) and returns the value for it.</p>
161
162<div class="doc_code">
163<pre>
164Value *BinaryExprAST::Codegen() {
165 Value *L = LHS-&gt;Codegen();
166 Value *R = RHS-&gt;Codegen();
167 if (L == 0 || R == 0) return 0;
168
169 switch (Op) {
170 case '+': return Builder.CreateAdd(L, R, "addtmp");
171 case '-': return Builder.CreateSub(L, R, "subtmp");
172 case '*': return Builder.CreateMul(L, R, "multmp");
173 case '&lt;':
174 L = Builder.CreateFCmpULT(L, R, "multmp");
175 // Convert bool 0/1 to double 0.0 or 1.0
176 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
177 default: return ErrorV("invalid binary operator");
178 }
179}
180</pre>
181</div>
182
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000183<p>Binary operators start to get more interesting. The basic idea here is that
184we recursively emit code for the left-hand side of the expression, then the
185right-hand side, then we compute the result of the binary expression. In this
186code, we do a simple switch on the opcode to create the right LLVM instruction.
187</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000188
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000189<p>In this example, the LLVM builder class is starting to show its value.
190Because it knows where to insert the newly created instruction, you just have to
191specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
192operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
193for the generated instruction. One nice thing about LLVM is that the name is
194just a hint: if there are multiple additions in a single function, the first
195will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
196giving it a name like "addtmp42". Local value names for instructions are purely
197optional, but it makes it much easier to read the IR dumps.</p>
198
199<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
200have very strict type properties: for example, the Left and Right operators of
201an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
202type, and that the result of the add matches the operands. Because all values
203in Kaleidoscope are doubles, this makes for very simple code for add, sub and
204mul.</p>
205
206<p>On the other hand, LLVM specifies that the <a
207href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
208(a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
209value. In order to get these semantics, we combine the fcmp instruction with
210a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
211converts its input integer into a floating point value by treating the input
212as an unsigned value. In contrast, if we used the <a
213href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
214operator would return 0.0 and -1.0, depending on the input value.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000215
216<div class="doc_code">
217<pre>
218Value *CallExprAST::Codegen() {
219 // Look up the name in the global module table.
220 Function *CalleeF = TheModule-&gt;getFunction(Callee);
221 if (CalleeF == 0)
222 return ErrorV("Unknown function referenced");
223
224 // If argument mismatch error.
225 if (CalleeF-&gt;arg_size() != Args.size())
226 return ErrorV("Incorrect # arguments passed");
227
228 std::vector&lt;Value*&gt; ArgsV;
229 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
230 ArgsV.push_back(Args[i]-&gt;Codegen());
231 if (ArgsV.back() == 0) return 0;
232 }
233
234 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
235}
236</pre>
237</div>
238
Chris Lattnerd3f0cdd2007-10-23 04:51:30 +0000239<p>Code generation for function calls is quite straight-forward with LLVM. The
240code above first looks the name of the function up in the LLVM Module's symbol
241table. Recall that the LLVM Module is the container that holds all of the
242functions we are JIT'ing. By giving each function the same name as what the
243user specifies, we can use the LLVM symbol table to resolve function names for
244us.</p>
245
246<p>Once we have the function to call, we recursively codegen each argument that
247is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
248instruction</a>. Note that LLVM uses the native C calling conventions by
249default, allowing these calls to call into standard library functions like
250"sin" and "cos" with no additional effort.</p>
251
252<p>This wraps up our handling of the four basic expressions that we have so far
253in Kaleidoscope. Feel free to go in and add some more. For example, by
254browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
255several other interesting instructions that are really easy to plug into our
256basic framework.</p>
Chris Lattner2e902042007-10-22 07:01:42 +0000257
258</div>
259
260<!-- *********************************************************************** -->
Chris Lattner35abbf52007-10-23 06:23:57 +0000261<div class="doc_section"><a name="funcs">Function Code Generation</a></div>
Chris Lattner2e902042007-10-22 07:01:42 +0000262<!-- *********************************************************************** -->
263
264<div class="doc_text">
265
Chris Lattner35abbf52007-10-23 06:23:57 +0000266<p>Code generation for prototypes and functions has to handle a number of
267details, which make their code less beautiful and elegant than expression code
268generation, but they illustrate some important points. First, lets talk about
269code generation for prototypes: this is used both for function bodies as well
270as external function declarations. The code starts with:</p>
271
272<div class="doc_code">
273<pre>
274Function *PrototypeAST::Codegen() {
275 // Make the function type: double(double,double) etc.
276 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
277 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
278
279 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
280</pre>
281</div>
282
283<p>This code packs a lot of power into a few lines. The first step is to create
284the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
285function arguments in Kaleidoscope are of type double, the first line creates
286a vector of "N" LLVM Double types. It then uses the <tt>FunctionType::get</tt>
287method to create a function type that takes "N" doubles as arguments, returns
288one double as a result, and that is not vararg (the false parameter indicates
289this). Note that Types in LLVM are uniqued just like Constants are, so you
290don't "new" a type, you "get" it.</p>
291
292<p>The final line above actually creates the function that the prototype will
293correspond to. This indicates which type, linkage, and name to use, and which
294module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
295means that the function may be defined outside the current module and/or that it
296is callable by functions outside the module. The Name passed in is the name the
297user specified: since "<tt>TheModule</tt>" is specified, this name is registered
298in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
299above.</p>
300
301<div class="doc_code">
302<pre>
303 // If F conflicted, there was already something named 'Name'. If it has a
304 // body, don't allow redefinition or reextern.
305 if (F-&gt;getName() != Name) {
306 // Delete the one we just made and get the existing one.
307 F-&gt;eraseFromParent();
308 F = TheModule-&gt;getFunction(Name);
309</pre>
310</div>
311
312<p>The Module symbol table works just like the Function symbol table when it
313comes to name conflicts: if a new function is created with a name was previously
314added to the symbol table, it will get implicitly renamed when added to the
315Module. The code above exploits this fact to tell if there was a previous
316definition of this function.</p>
317
318<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
319first, we want to allow 'extern'ing a function more than once, so long as the
320prototypes for the externs match (since all arguments have the same type, we
321just have to check that the number of arguments match). Second, we want to
322allow 'extern'ing a function and then definining a body for it. This is useful
323when defining mutually recursive functions.</p>
324
325<p>In order to implement this, the code above first checks to see if there is
326a collision on the name of the function. If so, it deletes the function we just
327created (by calling <tt>eraseFromParent</tt>) and then calling
328<tt>getFunction</tt> to get the existing function with the specified name. Note
329that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
330unlinks the object from its parent (e.g. a Function from a Module) and returns
331it. The "erase" form unlinks the object and then deletes it.</p>
332
333<div class="doc_code">
334<pre>
335 // If F already has a body, reject this.
336 if (!F-&gt;empty()) {
337 ErrorF("redefinition of function");
338 return 0;
339 }
340
341 // If F took a different number of args, reject.
342 if (F-&gt;arg_size() != Args.size()) {
343 ErrorF("redefinition of function with different # args");
344 return 0;
345 }
346 }
347</pre>
348</div>
349
350<p>In order to verify the logic above, we first check to see if the preexisting
351function is "empty". In this case, empty means that it has no basic blocks in
352it, which means it has no body. If it has no body, this means its a forward
353declaration. Since we don't allow anything after a full definition of the
354function, the code rejects this case. If the previous reference to a function
355was an 'extern', we simply verify that the number of arguments for that
356definition and this one match up. If not, we emit an error.</p>
357
358<div class="doc_code">
359<pre>
360 // Set names for all arguments.
361 unsigned Idx = 0;
362 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
363 ++AI, ++Idx) {
364 AI-&gt;setName(Args[Idx]);
365
366 // Add arguments to variable symbol table.
367 NamedValues[Args[Idx]] = AI;
368 }
369 return F;
370}
371</pre>
372</div>
373
374<p>The last bit of code for prototypes loops over all of the arguments in the
375function, setting the name of the LLVM Argument objects to match and registering
376the arguments in the <tt>NamedValues</tt> map for future use by the
377<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
378object to the caller. Note that we don't check for conflicting
379argument names here (e.g. "extern foo(a b a)"). Doing so would be very
380straight-forward.</p>
381
382<div class="doc_code">
383<pre>
384Function *FunctionAST::Codegen() {
385 NamedValues.clear();
386
387 Function *TheFunction = Proto-&gt;Codegen();
388 if (TheFunction == 0)
389 return 0;
390</pre>
391</div>
392
393<p>Code generation for function definitions starts out simply enough: first we
394codegen the prototype and verify that it is ok. We also clear out the
395<tt>NamedValues</tt> map to make sure that there isn't anything in it from the
396last function we compiled.</p>
397
398<div class="doc_code">
399<pre>
400 // Create a new basic block to start insertion into.
401 BasicBlock *BB = new BasicBlock("entry", TheFunction);
402 Builder.SetInsertPoint(BB);
403
404 if (Value *RetVal = Body-&gt;Codegen()) {
Chris Lattner35abbf52007-10-23 06:23:57 +0000405</pre>
406</div>
407
408<p>Now we get to the point where the <tt>Builder</tt> is set up. The first
409line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
410block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
411second line then tells the builder that new instructions should be inserted into
412the end of the new basic block. Basic blocks in LLVM are an important part
413of functions that define the <a
414href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
415Since we don't have any control flow, our functions will only contain one
416block so far. We'll fix this in a future installment :).</p>
417
Chris Lattnerd9b86162007-10-25 04:30:35 +0000418<div class="doc_code">
419<pre>
420 if (Value *RetVal = Body-&gt;Codegen()) {
421 // Finish off the function.
422 Builder.CreateRet(RetVal);
423
424 // Validate the generated code, checking for consistency.
425 verifyFunction(*TheFunction);
426 return TheFunction;
427 }
428</pre>
429</div>
430
Chris Lattner35abbf52007-10-23 06:23:57 +0000431<p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
432the root expression of the function. If no error happens, this emits code to
433compute the expression into the entry block and returns the value that was
434computed. Assuming no error, we then create an LLVM <a
Chris Lattnerd9b86162007-10-25 04:30:35 +0000435href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
436Once the function is built, we call the <tt>verifyFunction</tt> function, which
437is provided by LLVM. This function does a variety of consistency checks on the
438generated code, to determine if our compiler is doing everything right. Using
439this is important: it can catch a lot of bugs. Once the function is finished
440and validated, we return it.</p>
Chris Lattner35abbf52007-10-23 06:23:57 +0000441
442<div class="doc_code">
443<pre>
444 // Error reading body, remove function.
445 TheFunction-&gt;eraseFromParent();
446 return 0;
447}
448</pre>
449</div>
450
451<p>The only piece left here is handling of the error case. For simplicity, we
452simply handle this by deleting the function we produced with the
453<tt>eraseFromParent</tt> method. This allows the user to redefine a function
454that they incorrectly typed in before: if we didn't delete it, it would live in
455the symbol table, with a body, preventing future redefinition.</p>
456
457<p>This code does have a bug though. Since the <tt>PrototypeAST::Codegen</tt>
458can return a previously defined forward declaration, this can actually delete
459a forward declaration. There are a number of ways to fix this bug, see what you
460can come up with! Here is a testcase:</p>
461
462<div class="doc_code">
463<pre>
464extern foo(a b); # ok, defines foo.
465def foo(a b) c; # error, 'c' is invalid.
466def bar() foo(1, 2); # error, unknown function "foo"
467</pre>
468</div>
469
470</div>
471
472<!-- *********************************************************************** -->
473<div class="doc_section"><a name="driver">Driver Changes and
474Closing Thoughts</a></div>
475<!-- *********************************************************************** -->
476
477<div class="doc_text">
478
479<p>
480For now, code generation to LLVM doesn't really get us much, except that we can
481look at the pretty IR calls. The sample code inserts calls to Codegen into the
482"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
483dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
484functions. For example:
485</p>
486
487<div class="doc_code">
488<pre>
489ready> <b>4+5</b>;
490ready> Read top-level expression:
491define double @""() {
492entry:
493 %addtmp = add double 4.000000e+00, 5.000000e+00
494 ret double %addtmp
495}
496</pre>
497</div>
498
499<p>Note how the parser turns the top-level expression into anonymous functions
500for us. This will be handy when we add JIT support in the next chapter. Also
501note that the code is very literally transcribed, no optimizations are being
502performed. We will add optimizations explicitly in the next chapter.</p>
503
504<div class="doc_code">
505<pre>
506ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
507ready&gt; Read function definition:
508define double @foo(double %a, double %b) {
509entry:
510 %multmp = mul double %a, %a
511 %multmp1 = mul double 2.000000e+00, %a
512 %multmp2 = mul double %multmp1, %b
513 %addtmp = add double %multmp, %multmp2
514 %multmp3 = mul double %b, %b
515 %addtmp4 = add double %addtmp, %multmp3
516 ret double %addtmp4
517}
518</pre>
519</div>
520
521<p>This shows some simple arithmetic. Notice the striking similarity to the
522LLVM builder calls that we use to create the instructions.</p>
523
524<div class="doc_code">
525<pre>
526ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
527ready&gt; Read function definition:
528define double @bar(double %a) {
529entry:
530 %calltmp = call double @foo( double %a, double 4.000000e+00 )
531 %calltmp1 = call double @bar( double 3.133700e+04 )
532 %addtmp = add double %calltmp, %calltmp1
533 ret double %addtmp
534}
535</pre>
536</div>
537
Chris Lattnerfc60ab02007-11-05 17:39:26 +0000538<p>This shows some function calls. Note that this function will take a long
539time to execute if you call it. In the future we'll add conditional control
540flow to make recursion actually be useful :).</p>
Chris Lattner35abbf52007-10-23 06:23:57 +0000541
542<div class="doc_code">
543<pre>
544ready&gt; <b>extern cos(x);</b>
545ready&gt; Read extern:
546declare double @cos(double)
547
548ready&gt; <b>cos(1.234);</b>
549ready&gt; Read top-level expression:
550define double @""() {
551entry:
Chris Lattner8eef4b22007-10-23 06:30:50 +0000552 %calltmp = call double @cos( double 1.234000e+00 )
Chris Lattner35abbf52007-10-23 06:23:57 +0000553 ret double %calltmp
554}
555</pre>
556</div>
557
558<p>This shows an extern for the libm "cos" function, and a call to it.</p>
559
560
561<div class="doc_code">
562<pre>
563ready&gt; <b>^D</b>
564; ModuleID = 'my cool jit'
565
566define double @""() {
567entry:
568 %addtmp = add double 4.000000e+00, 5.000000e+00
569 ret double %addtmp
570}
571
572define double @foo(double %a, double %b) {
573entry:
574 %multmp = mul double %a, %a
575 %multmp1 = mul double 2.000000e+00, %a
576 %multmp2 = mul double %multmp1, %b
577 %addtmp = add double %multmp, %multmp2
578 %multmp3 = mul double %b, %b
579 %addtmp4 = add double %addtmp, %multmp3
580 ret double %addtmp4
581}
582
583define double @bar(double %a) {
584entry:
585 %calltmp = call double @foo( double %a, double 4.000000e+00 )
586 %calltmp1 = call double @bar( double 3.133700e+04 )
587 %addtmp = add double %calltmp, %calltmp1
588 ret double %addtmp
589}
590
591declare double @cos(double)
592
593define double @""() {
594entry:
595 %calltmp = call double @cos( double 1.234000e+00 )
596 ret double %calltmp
597}
598</pre>
599</div>
600
601<p>When you quit the current demo, it dumps out the IR for the entire module
602generated. Here you can see the big picture with all the functions referencing
603each other.</p>
604
605<p>This wraps up this chapter of the Kaleidoscope tutorial. Up next we'll
606describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
607support</a> to this so we can actually start running code!</p>
608
609</div>
610
611
612<!-- *********************************************************************** -->
613<div class="doc_section"><a name="code">Full Code Listing</a></div>
614<!-- *********************************************************************** -->
615
616<div class="doc_text">
617
618<p>
619Here is the complete code listing for our running example, enhanced with the
620LLVM code generator. Because this uses the LLVM libraries, we need to link
621them in. To do this, we use the <a
622href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
623our makefile/command line about which options to use:</p>
624
625<div class="doc_code">
626<pre>
627 # Compile
Chris Lattner9ac0ca02007-10-24 05:09:48 +0000628 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
Chris Lattner35abbf52007-10-23 06:23:57 +0000629 # Run
630 ./toy
631</pre>
632</div>
633
634<p>Here is the code:</p>
635
Chris Lattner2e902042007-10-22 07:01:42 +0000636<div class="doc_code">
637<pre>
638// To build this:
Chris Lattner2e902042007-10-22 07:01:42 +0000639// See example below.
640
641#include "llvm/DerivedTypes.h"
642#include "llvm/Module.h"
Chris Lattnerd9b86162007-10-25 04:30:35 +0000643#include "llvm/Analysis/Verifier.h"
Chris Lattner2e902042007-10-22 07:01:42 +0000644#include "llvm/Support/LLVMBuilder.h"
645#include &lt;cstdio&gt;
646#include &lt;string&gt;
647#include &lt;map&gt;
648#include &lt;vector&gt;
649using namespace llvm;
650
651//===----------------------------------------------------------------------===//
652// Lexer
653//===----------------------------------------------------------------------===//
654
655// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
656// of these for known things.
657enum Token {
658 tok_eof = -1,
659
660 // commands
661 tok_def = -2, tok_extern = -3,
662
663 // primary
664 tok_identifier = -4, tok_number = -5,
665};
666
667static std::string IdentifierStr; // Filled in if tok_identifier
668static double NumVal; // Filled in if tok_number
669
670/// gettok - Return the next token from standard input.
671static int gettok() {
672 static int LastChar = ' ';
673
674 // Skip any whitespace.
675 while (isspace(LastChar))
676 LastChar = getchar();
677
678 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
679 IdentifierStr = LastChar;
680 while (isalnum((LastChar = getchar())))
681 IdentifierStr += LastChar;
682
683 if (IdentifierStr == "def") return tok_def;
684 if (IdentifierStr == "extern") return tok_extern;
685 return tok_identifier;
686 }
687
688 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
689 std::string NumStr;
690 do {
691 NumStr += LastChar;
692 LastChar = getchar();
693 } while (isdigit(LastChar) || LastChar == '.');
694
695 NumVal = strtod(NumStr.c_str(), 0);
696 return tok_number;
697 }
698
699 if (LastChar == '#') {
700 // Comment until end of line.
701 do LastChar = getchar();
702 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
703
704 if (LastChar != EOF)
705 return gettok();
706 }
707
708 // Check for end of file. Don't eat the EOF.
709 if (LastChar == EOF)
710 return tok_eof;
711
712 // Otherwise, just return the character as its ascii value.
713 int ThisChar = LastChar;
714 LastChar = getchar();
715 return ThisChar;
716}
717
718//===----------------------------------------------------------------------===//
719// Abstract Syntax Tree (aka Parse Tree)
720//===----------------------------------------------------------------------===//
721
722/// ExprAST - Base class for all expression nodes.
723class ExprAST {
724public:
725 virtual ~ExprAST() {}
726 virtual Value *Codegen() = 0;
727};
728
729/// NumberExprAST - Expression class for numeric literals like "1.0".
730class NumberExprAST : public ExprAST {
731 double Val;
732public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000733 explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000734 virtual Value *Codegen();
735};
736
737/// VariableExprAST - Expression class for referencing a variable, like "a".
738class VariableExprAST : public ExprAST {
739 std::string Name;
740public:
Chris Lattner28571ed2007-10-23 04:27:44 +0000741 explicit VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattner2e902042007-10-22 07:01:42 +0000742 virtual Value *Codegen();
743};
744
745/// BinaryExprAST - Expression class for a binary operator.
746class BinaryExprAST : public ExprAST {
747 char Op;
748 ExprAST *LHS, *RHS;
749public:
750 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
751 : Op(op), LHS(lhs), RHS(rhs) {}
752 virtual Value *Codegen();
753};
754
755/// CallExprAST - Expression class for function calls.
756class CallExprAST : public ExprAST {
757 std::string Callee;
758 std::vector&lt;ExprAST*&gt; Args;
759public:
760 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
761 : Callee(callee), Args(args) {}
762 virtual Value *Codegen();
763};
764
765/// PrototypeAST - This class represents the "prototype" for a function,
766/// which captures its argument names as well as if it is an operator.
767class PrototypeAST {
768 std::string Name;
769 std::vector&lt;std::string&gt; Args;
770public:
771 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
772 : Name(name), Args(args) {}
773
774 Function *Codegen();
775};
776
777/// FunctionAST - This class represents a function definition itself.
778class FunctionAST {
779 PrototypeAST *Proto;
780 ExprAST *Body;
781public:
782 FunctionAST(PrototypeAST *proto, ExprAST *body)
783 : Proto(proto), Body(body) {}
784
785 Function *Codegen();
786};
787
788//===----------------------------------------------------------------------===//
789// Parser
790//===----------------------------------------------------------------------===//
791
792/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
793/// token the parser it looking at. getNextToken reads another token from the
794/// lexer and updates CurTok with its results.
795static int CurTok;
796static int getNextToken() {
797 return CurTok = gettok();
798}
799
800/// BinopPrecedence - This holds the precedence for each binary operator that is
801/// defined.
802static std::map&lt;char, int&gt; BinopPrecedence;
803
804/// GetTokPrecedence - Get the precedence of the pending binary operator token.
805static int GetTokPrecedence() {
806 if (!isascii(CurTok))
807 return -1;
808
809 // Make sure it's a declared binop.
810 int TokPrec = BinopPrecedence[CurTok];
811 if (TokPrec &lt;= 0) return -1;
812 return TokPrec;
813}
814
815/// Error* - These are little helper functions for error handling.
816ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
817PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
818FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
819
820static ExprAST *ParseExpression();
821
822/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +0000823/// ::= identifier
824/// ::= identifier '(' expression* ')'
Chris Lattner2e902042007-10-22 07:01:42 +0000825static ExprAST *ParseIdentifierExpr() {
826 std::string IdName = IdentifierStr;
827
Chris Lattner20a0c802007-11-05 17:54:34 +0000828 getNextToken(); // eat identifier.
Chris Lattner2e902042007-10-22 07:01:42 +0000829
830 if (CurTok != '(') // Simple variable ref.
831 return new VariableExprAST(IdName);
832
833 // Call.
834 getNextToken(); // eat (
835 std::vector&lt;ExprAST*&gt; Args;
836 while (1) {
837 ExprAST *Arg = ParseExpression();
838 if (!Arg) return 0;
839 Args.push_back(Arg);
840
841 if (CurTok == ')') break;
842
843 if (CurTok != ',')
844 return Error("Expected ')'");
845 getNextToken();
846 }
847
848 // Eat the ')'.
849 getNextToken();
850
851 return new CallExprAST(IdName, Args);
852}
853
854/// numberexpr ::= number
855static ExprAST *ParseNumberExpr() {
856 ExprAST *Result = new NumberExprAST(NumVal);
857 getNextToken(); // consume the number
858 return Result;
859}
860
861/// parenexpr ::= '(' expression ')'
862static ExprAST *ParseParenExpr() {
863 getNextToken(); // eat (.
864 ExprAST *V = ParseExpression();
865 if (!V) return 0;
866
867 if (CurTok != ')')
868 return Error("expected ')'");
869 getNextToken(); // eat ).
870 return V;
871}
872
873/// primary
874/// ::= identifierexpr
875/// ::= numberexpr
876/// ::= parenexpr
877static ExprAST *ParsePrimary() {
878 switch (CurTok) {
879 default: return Error("unknown token when expecting an expression");
880 case tok_identifier: return ParseIdentifierExpr();
881 case tok_number: return ParseNumberExpr();
882 case '(': return ParseParenExpr();
883 }
884}
885
886/// binoprhs
887/// ::= ('+' primary)*
888static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
889 // If this is a binop, find its precedence.
890 while (1) {
891 int TokPrec = GetTokPrecedence();
892
893 // If this is a binop that binds at least as tightly as the current binop,
894 // consume it, otherwise we are done.
895 if (TokPrec &lt; ExprPrec)
896 return LHS;
897
898 // Okay, we know this is a binop.
899 int BinOp = CurTok;
900 getNextToken(); // eat binop
901
902 // Parse the primary expression after the binary operator.
903 ExprAST *RHS = ParsePrimary();
904 if (!RHS) return 0;
905
906 // If BinOp binds less tightly with RHS than the operator after RHS, let
907 // the pending operator take RHS as its LHS.
908 int NextPrec = GetTokPrecedence();
909 if (TokPrec &lt; NextPrec) {
910 RHS = ParseBinOpRHS(TokPrec+1, RHS);
911 if (RHS == 0) return 0;
912 }
913
914 // Merge LHS/RHS.
915 LHS = new BinaryExprAST(BinOp, LHS, RHS);
916 }
917}
918
919/// expression
920/// ::= primary binoprhs
921///
922static ExprAST *ParseExpression() {
923 ExprAST *LHS = ParsePrimary();
924 if (!LHS) return 0;
925
926 return ParseBinOpRHS(0, LHS);
927}
928
929/// prototype
930/// ::= id '(' id* ')'
931static PrototypeAST *ParsePrototype() {
932 if (CurTok != tok_identifier)
933 return ErrorP("Expected function name in prototype");
934
935 std::string FnName = IdentifierStr;
936 getNextToken();
937
938 if (CurTok != '(')
939 return ErrorP("Expected '(' in prototype");
940
941 std::vector&lt;std::string&gt; ArgNames;
942 while (getNextToken() == tok_identifier)
943 ArgNames.push_back(IdentifierStr);
944 if (CurTok != ')')
945 return ErrorP("Expected ')' in prototype");
946
947 // success.
948 getNextToken(); // eat ')'.
949
950 return new PrototypeAST(FnName, ArgNames);
951}
952
953/// definition ::= 'def' prototype expression
954static FunctionAST *ParseDefinition() {
955 getNextToken(); // eat def.
956 PrototypeAST *Proto = ParsePrototype();
957 if (Proto == 0) return 0;
958
959 if (ExprAST *E = ParseExpression())
960 return new FunctionAST(Proto, E);
961 return 0;
962}
963
964/// toplevelexpr ::= expression
965static FunctionAST *ParseTopLevelExpr() {
966 if (ExprAST *E = ParseExpression()) {
967 // Make an anonymous proto.
968 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
969 return new FunctionAST(Proto, E);
970 }
971 return 0;
972}
973
974/// external ::= 'extern' prototype
975static PrototypeAST *ParseExtern() {
976 getNextToken(); // eat extern.
977 return ParsePrototype();
978}
979
980//===----------------------------------------------------------------------===//
981// Code Generation
982//===----------------------------------------------------------------------===//
983
984static Module *TheModule;
985static LLVMBuilder Builder;
986static std::map&lt;std::string, Value*&gt; NamedValues;
987
988Value *ErrorV(const char *Str) { Error(Str); return 0; }
989
990Value *NumberExprAST::Codegen() {
991 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
992}
993
994Value *VariableExprAST::Codegen() {
995 // Look this variable up in the function.
996 Value *V = NamedValues[Name];
997 return V ? V : ErrorV("Unknown variable name");
998}
999
1000Value *BinaryExprAST::Codegen() {
1001 Value *L = LHS-&gt;Codegen();
1002 Value *R = RHS-&gt;Codegen();
1003 if (L == 0 || R == 0) return 0;
1004
1005 switch (Op) {
1006 case '+': return Builder.CreateAdd(L, R, "addtmp");
1007 case '-': return Builder.CreateSub(L, R, "subtmp");
1008 case '*': return Builder.CreateMul(L, R, "multmp");
1009 case '&lt;':
1010 L = Builder.CreateFCmpULT(L, R, "multmp");
1011 // Convert bool 0/1 to double 0.0 or 1.0
1012 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1013 default: return ErrorV("invalid binary operator");
1014 }
1015}
1016
1017Value *CallExprAST::Codegen() {
1018 // Look up the name in the global module table.
1019 Function *CalleeF = TheModule-&gt;getFunction(Callee);
1020 if (CalleeF == 0)
1021 return ErrorV("Unknown function referenced");
1022
1023 // If argument mismatch error.
1024 if (CalleeF-&gt;arg_size() != Args.size())
1025 return ErrorV("Incorrect # arguments passed");
1026
1027 std::vector&lt;Value*&gt; ArgsV;
1028 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1029 ArgsV.push_back(Args[i]-&gt;Codegen());
1030 if (ArgsV.back() == 0) return 0;
1031 }
1032
1033 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1034}
1035
1036Function *PrototypeAST::Codegen() {
1037 // Make the function type: double(double,double) etc.
Chris Lattner35abbf52007-10-23 06:23:57 +00001038 std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1039 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
Chris Lattner2e902042007-10-22 07:01:42 +00001040
1041 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1042
1043 // If F conflicted, there was already something named 'Name'. If it has a
1044 // body, don't allow redefinition or reextern.
1045 if (F-&gt;getName() != Name) {
1046 // Delete the one we just made and get the existing one.
1047 F-&gt;eraseFromParent();
1048 F = TheModule-&gt;getFunction(Name);
1049
1050 // If F already has a body, reject this.
1051 if (!F-&gt;empty()) {
1052 ErrorF("redefinition of function");
1053 return 0;
1054 }
1055
1056 // If F took a different number of args, reject.
1057 if (F-&gt;arg_size() != Args.size()) {
1058 ErrorF("redefinition of function with different # args");
1059 return 0;
1060 }
1061 }
1062
1063 // Set names for all arguments.
1064 unsigned Idx = 0;
1065 for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1066 ++AI, ++Idx) {
1067 AI-&gt;setName(Args[Idx]);
1068
1069 // Add arguments to variable symbol table.
1070 NamedValues[Args[Idx]] = AI;
1071 }
1072
1073 return F;
1074}
1075
1076Function *FunctionAST::Codegen() {
1077 NamedValues.clear();
1078
1079 Function *TheFunction = Proto-&gt;Codegen();
1080 if (TheFunction == 0)
1081 return 0;
1082
1083 // Create a new basic block to start insertion into.
Chris Lattner35abbf52007-10-23 06:23:57 +00001084 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1085 Builder.SetInsertPoint(BB);
Chris Lattner2e902042007-10-22 07:01:42 +00001086
1087 if (Value *RetVal = Body-&gt;Codegen()) {
1088 // Finish off the function.
1089 Builder.CreateRet(RetVal);
Chris Lattnerd9b86162007-10-25 04:30:35 +00001090
1091 // Validate the generated code, checking for consistency.
1092 verifyFunction(*TheFunction);
Chris Lattner2e902042007-10-22 07:01:42 +00001093 return TheFunction;
1094 }
1095
1096 // Error reading body, remove function.
1097 TheFunction-&gt;eraseFromParent();
1098 return 0;
1099}
1100
1101//===----------------------------------------------------------------------===//
1102// Top-Level parsing and JIT Driver
1103//===----------------------------------------------------------------------===//
1104
1105static void HandleDefinition() {
1106 if (FunctionAST *F = ParseDefinition()) {
1107 if (Function *LF = F-&gt;Codegen()) {
1108 fprintf(stderr, "Read function definition:");
1109 LF-&gt;dump();
1110 }
1111 } else {
1112 // Skip token for error recovery.
1113 getNextToken();
1114 }
1115}
1116
1117static void HandleExtern() {
1118 if (PrototypeAST *P = ParseExtern()) {
1119 if (Function *F = P-&gt;Codegen()) {
1120 fprintf(stderr, "Read extern: ");
1121 F-&gt;dump();
1122 }
1123 } else {
1124 // Skip token for error recovery.
1125 getNextToken();
1126 }
1127}
1128
1129static void HandleTopLevelExpression() {
1130 // Evaluate a top level expression into an anonymous function.
1131 if (FunctionAST *F = ParseTopLevelExpr()) {
1132 if (Function *LF = F-&gt;Codegen()) {
1133 fprintf(stderr, "Read top-level expression:");
1134 LF-&gt;dump();
1135 }
1136 } else {
1137 // Skip token for error recovery.
1138 getNextToken();
1139 }
1140}
1141
1142/// top ::= definition | external | expression | ';'
1143static void MainLoop() {
1144 while (1) {
1145 fprintf(stderr, "ready&gt; ");
1146 switch (CurTok) {
1147 case tok_eof: return;
1148 case ';': getNextToken(); break; // ignore top level semicolons.
1149 case tok_def: HandleDefinition(); break;
1150 case tok_extern: HandleExtern(); break;
1151 default: HandleTopLevelExpression(); break;
1152 }
1153 }
1154}
1155
1156
1157
1158//===----------------------------------------------------------------------===//
1159// "Library" functions that can be "extern'd" from user code.
1160//===----------------------------------------------------------------------===//
1161
1162/// putchard - putchar that takes a double and returns 0.
1163extern "C"
1164double putchard(double X) {
1165 putchar((char)X);
1166 return 0;
1167}
1168
1169//===----------------------------------------------------------------------===//
1170// Main driver code.
1171//===----------------------------------------------------------------------===//
1172
1173int main() {
1174 TheModule = new Module("my cool jit");
1175
1176 // Install standard binary operators.
1177 // 1 is lowest precedence.
1178 BinopPrecedence['&lt;'] = 10;
1179 BinopPrecedence['+'] = 20;
1180 BinopPrecedence['-'] = 20;
1181 BinopPrecedence['*'] = 40; // highest.
1182
1183 // Prime the first token.
1184 fprintf(stderr, "ready&gt; ");
1185 getNextToken();
1186
1187 MainLoop();
1188 TheModule-&gt;dump();
1189 return 0;
1190}
Chris Lattner2e902042007-10-22 07:01:42 +00001191</pre>
1192</div>
1193</div>
1194
1195<!-- *********************************************************************** -->
1196<hr>
1197<address>
1198 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1199 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1200 <a href="http://validator.w3.org/check/referer"><img
Chris Lattner8eef4b22007-10-23 06:30:50 +00001201 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner2e902042007-10-22 07:01:42 +00001202
1203 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1204 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1205 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1206</address>
1207</body>
1208</html>