blob: 9c13b486fa84a6e4bc4e2c90e07712f99ab7ef78 [file] [log] [blame]
Chris Lattnere6c91042007-10-22 06:34:15 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Implementing a Parser and AST</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
14<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
15
Chris Lattner128eb862007-11-05 19:06:59 +000016<ul>
Chris Lattner0e555b12007-11-05 20:04:56 +000017<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner128eb862007-11-05 19:06:59 +000018<li>Chapter 2
19 <ol>
20 <li><a href="#intro">Chapter 2 Introduction</a></li>
21 <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
22 <li><a href="#parserbasics">Parser Basics</a></li>
23 <li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
24 <li><a href="#parserbinops">Binary Expression Parsing</a></li>
25 <li><a href="#parsertop">Parsing the Rest</a></li>
26 <li><a href="#driver">The Driver</a></li>
27 <li><a href="#conclusions">Conclusions</a></li>
28 <li><a href="#code">Full Code Listing</a></li>
29 </ol>
30</li>
Chris Lattner0e555b12007-11-05 20:04:56 +000031<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
Chris Lattner128eb862007-11-05 19:06:59 +000032</ul>
33
Chris Lattnere6c91042007-10-22 06:34:15 +000034<div class="doc_author">
35 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
36</div>
37
38<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +000039<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
Chris Lattnere6c91042007-10-22 06:34:15 +000040<!-- *********************************************************************** -->
41
42<div class="doc_text">
43
Chris Lattner128eb862007-11-05 19:06:59 +000044<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
Chris Lattnerbd779a32007-11-06 18:13:32 +000045with LLVM</a>" tutorial. This chapter shows you how to use the lexer, built in
46<a href="LangImpl1.html">Chapter 1</a>, to build a full <a
Chris Lattnere6c91042007-10-22 06:34:15 +000047href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
Chris Lattnercde1d9d2007-11-06 07:16:22 +000048our Kaleidoscope language. Once we have a parser, we'll define and build an <a
Chris Lattnere6c91042007-10-22 06:34:15 +000049href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
50Tree</a> (AST).</p>
51
52<p>The parser we will build uses a combination of <a
53href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
54Parsing</a> and <a href=
55"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
Chris Lattnerbd779a32007-11-06 18:13:32 +000056Parsing</a> to parse the Kaleidoscope language (the latter for
57binary expressions and the former for everything else). Before we get to
58parsing though, lets talk about the output of the parser: the Abstract Syntax
59Tree.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +000060
61</div>
62
63<!-- *********************************************************************** -->
64<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
65<!-- *********************************************************************** -->
66
67<div class="doc_text">
68
Chris Lattnerbd779a32007-11-06 18:13:32 +000069<p>The AST for a program captures its behavior in such a way that it is easy for
Chris Lattnere6c91042007-10-22 06:34:15 +000070later stages of the compiler (e.g. code generation) to interpret. We basically
71want one object for each construct in the language, and the AST should closely
72model the language. In Kaleidoscope, we have expressions, a prototype, and a
73function object. We'll start with expressions first:</p>
74
75<div class="doc_code">
76<pre>
77/// ExprAST - Base class for all expression nodes.
78class ExprAST {
79public:
80 virtual ~ExprAST() {}
81};
82
83/// NumberExprAST - Expression class for numeric literals like "1.0".
84class NumberExprAST : public ExprAST {
85 double Val;
86public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +000087 NumberExprAST(double val) : Val(val) {}
Chris Lattnere6c91042007-10-22 06:34:15 +000088};
89</pre>
90</div>
91
92<p>The code above shows the definition of the base ExprAST class and one
Chris Lattnerbd779a32007-11-06 18:13:32 +000093subclass which we use for numeric literals. The important thing to note about
94this code is that the NumberExprAST class captures the numeric value of the
95literal as an instance variable. This allows later phases of the compiler to
96know what the stored numeric value is.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +000097
98<p>Right now we only create the AST, so there are no useful accessor methods on
99them. It would be very easy to add a virtual method to pretty print the code,
100for example. Here are the other expression AST node definitions that we'll use
Chris Lattner729eb142008-02-10 19:11:04 +0000101in the basic form of the Kaleidoscope language:
Chris Lattnere6c91042007-10-22 06:34:15 +0000102</p>
103
104<div class="doc_code">
105<pre>
106/// VariableExprAST - Expression class for referencing a variable, like "a".
107class VariableExprAST : public ExprAST {
108 std::string Name;
109public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000110 VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattnere6c91042007-10-22 06:34:15 +0000111};
112
113/// BinaryExprAST - Expression class for a binary operator.
114class BinaryExprAST : public ExprAST {
115 char Op;
116 ExprAST *LHS, *RHS;
117public:
118 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
119 : Op(op), LHS(lhs), RHS(rhs) {}
120};
121
122/// CallExprAST - Expression class for function calls.
123class CallExprAST : public ExprAST {
124 std::string Callee;
125 std::vector&lt;ExprAST*&gt; Args;
126public:
127 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
128 : Callee(callee), Args(args) {}
129};
130</pre>
131</div>
132
Chris Lattner729eb142008-02-10 19:11:04 +0000133<p>This is all (intentionally) rather straight-forward: variables capture the
Chris Lattnere6c91042007-10-22 06:34:15 +0000134variable name, binary operators capture their opcode (e.g. '+'), and calls
Chris Lattnerbd779a32007-11-06 18:13:32 +0000135capture a function name as well as a list of any argument expressions. One thing
136that is nice about our AST is that it captures the language features without
137talking about the syntax of the language. Note that there is no discussion about
138precedence of binary operators, lexical structure, etc.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000139
140<p>For our basic language, these are all of the expression nodes we'll define.
Owen Andersonc6311b92007-10-22 06:48:28 +0000141Because it doesn't have conditional control flow, it isn't Turing-complete;
Chris Lattnere6c91042007-10-22 06:34:15 +0000142we'll fix that in a later installment. The two things we need next are a way
143to talk about the interface to a function, and a way to talk about functions
144themselves:</p>
145
146<div class="doc_code">
147<pre>
148/// PrototypeAST - This class represents the "prototype" for a function,
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000149/// which captures its name, and its argument names (thus implicitly the number
150/// of arguments the function takes).
Chris Lattnere6c91042007-10-22 06:34:15 +0000151class PrototypeAST {
152 std::string Name;
153 std::vector&lt;std::string&gt; Args;
154public:
155 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
156 : Name(name), Args(args) {}
157};
158
159/// FunctionAST - This class represents a function definition itself.
160class FunctionAST {
161 PrototypeAST *Proto;
162 ExprAST *Body;
163public:
164 FunctionAST(PrototypeAST *proto, ExprAST *body)
165 : Proto(proto), Body(body) {}
166};
167</pre>
168</div>
169
170<p>In Kaleidoscope, functions are typed with just a count of their arguments.
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000171Since all values are double precision floating point, the type of each argument
172doesn't need to be stored anywhere. In a more aggressive and realistic
173language, the "ExprAST" class would probably have a type field.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000174
175<p>With this scaffolding, we can now talk about parsing expressions and function
176bodies in Kaleidoscope.</p>
177
178</div>
179
180<!-- *********************************************************************** -->
181<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
182<!-- *********************************************************************** -->
183
184<div class="doc_text">
185
186<p>Now that we have an AST to build, we need to define the parser code to build
187it. The idea here is that we want to parse something like "x+y" (which is
188returned as three tokens by the lexer) into an AST that could be generated with
189calls like this:</p>
190
191<div class="doc_code">
192<pre>
193 ExprAST *X = new VariableExprAST("x");
194 ExprAST *Y = new VariableExprAST("y");
195 ExprAST *Result = new BinaryExprAST('+', X, Y);
196</pre>
197</div>
198
199<p>In order to do this, we'll start by defining some basic helper routines:</p>
200
201<div class="doc_code">
202<pre>
203/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
Chris Lattner729eb142008-02-10 19:11:04 +0000204/// token the parser is looking at. getNextToken reads another token from the
Chris Lattnere6c91042007-10-22 06:34:15 +0000205/// lexer and updates CurTok with its results.
206static int CurTok;
207static int getNextToken() {
208 return CurTok = gettok();
209}
210</pre>
211</div>
212
213<p>
214This implements a simple token buffer around the lexer. This allows
215us to look one token ahead at what the lexer is returning. Every function in
Chris Lattnere9495122007-10-25 18:05:29 +0000216our parser will assume that CurTok is the current token that needs to be
Chris Lattnere6c91042007-10-22 06:34:15 +0000217parsed.</p>
218
Chris Lattnere6c91042007-10-22 06:34:15 +0000219<div class="doc_code">
220<pre>
221
222/// Error* - These are little helper functions for error handling.
223ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
224PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
225FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
226</pre>
227</div>
228
229<p>
230The <tt>Error</tt> routines are simple helper routines that our parser will use
231to handle errors. The error recovery in our parser will not be the best and
Duncan Sands72261ff2007-11-05 16:04:58 +0000232is not particular user-friendly, but it will be enough for our tutorial. These
Chris Lattnere6c91042007-10-22 06:34:15 +0000233routines make it easier to handle errors in routines that have various return
234types: they always return null.</p>
235
Chris Lattnerbd779a32007-11-06 18:13:32 +0000236<p>With these basic helper functions, we can implement the first
237piece of our grammar: numeric literals.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000238
239</div>
240
241<!-- *********************************************************************** -->
242<div class="doc_section"><a name="parserprimexprs">Basic Expression
243 Parsing</a></div>
244<!-- *********************************************************************** -->
245
246<div class="doc_text">
247
248<p>We start with numeric literals, because they are the simplest to process.
249For each production in our grammar, we'll define a function which parses that
250production. For numeric literals, we have:
251</p>
252
253<div class="doc_code">
254<pre>
255/// numberexpr ::= number
256static ExprAST *ParseNumberExpr() {
257 ExprAST *Result = new NumberExprAST(NumVal);
258 getNextToken(); // consume the number
259 return Result;
260}
261</pre>
262</div>
263
264<p>This routine is very simple: it expects to be called when the current token
265is a <tt>tok_number</tt> token. It takes the current number value, creates
Chris Lattner729eb142008-02-10 19:11:04 +0000266a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally
Chris Lattnere6c91042007-10-22 06:34:15 +0000267returns.</p>
268
Chris Lattnerbd779a32007-11-06 18:13:32 +0000269<p>There are some interesting aspects to this. The most important one is that
Chris Lattner729eb142008-02-10 19:11:04 +0000270this routine eats all of the tokens that correspond to the production and
Chris Lattnere6c91042007-10-22 06:34:15 +0000271returns the lexer buffer with the next token (which is not part of the grammar
272production) ready to go. This is a fairly standard way to go for recursive
273descent parsers. For a better example, the parenthesis operator is defined like
274this:</p>
275
276<div class="doc_code">
277<pre>
278/// parenexpr ::= '(' expression ')'
279static ExprAST *ParseParenExpr() {
280 getNextToken(); // eat (.
281 ExprAST *V = ParseExpression();
282 if (!V) return 0;
283
284 if (CurTok != ')')
285 return Error("expected ')'");
286 getNextToken(); // eat ).
287 return V;
288}
289</pre>
290</div>
291
Chris Lattnerbd779a32007-11-06 18:13:32 +0000292<p>This function illustrates a number of interesting things about the
293parser:</p>
294
295<p>
Chris Lattner729eb142008-02-10 19:11:04 +00002961) It shows how we use the Error routines. When called, this function expects
Chris Lattnere6c91042007-10-22 06:34:15 +0000297that the current token is a '(' token, but after parsing the subexpression, it
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000298is possible that there is no ')' waiting. For example, if the user types in
Chris Lattnere6c91042007-10-22 06:34:15 +0000299"(4 x" instead of "(4)", the parser should emit an error. Because errors can
300occur, the parser needs a way to indicate that they happened: in our parser, we
301return null on an error.</p>
302
Chris Lattnerbd779a32007-11-06 18:13:32 +0000303<p>2) Another interesting aspect of this function is that it uses recursion by
Owen Andersonc6311b92007-10-22 06:48:28 +0000304calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
Chris Lattnere6c91042007-10-22 06:34:15 +0000305<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
306recursive grammars, and keeps each production very simple. Note that
Duncan Sands72261ff2007-11-05 16:04:58 +0000307parentheses do not cause construction of AST nodes themselves. While we could
Chris Lattner729eb142008-02-10 19:11:04 +0000308do it this way, the most important role of parentheses are to guide the parser
309and provide grouping. Once the parser constructs the AST, parentheses are not
Chris Lattnerbd779a32007-11-06 18:13:32 +0000310needed.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000311
312<p>The next simple production is for handling variable references and function
313calls:</p>
314
315<div class="doc_code">
316<pre>
317/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +0000318/// ::= identifier
319/// ::= identifier '(' expression* ')'
Chris Lattnere6c91042007-10-22 06:34:15 +0000320static ExprAST *ParseIdentifierExpr() {
321 std::string IdName = IdentifierStr;
322
Chris Lattner20a0c802007-11-05 17:54:34 +0000323 getNextToken(); // eat identifier.
Chris Lattnere6c91042007-10-22 06:34:15 +0000324
325 if (CurTok != '(') // Simple variable ref.
326 return new VariableExprAST(IdName);
327
328 // Call.
329 getNextToken(); // eat (
330 std::vector&lt;ExprAST*&gt; Args;
Chris Lattner71155212007-11-06 01:39:12 +0000331 if (CurTok != ')') {
332 while (1) {
333 ExprAST *Arg = ParseExpression();
334 if (!Arg) return 0;
335 Args.push_back(Arg);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000336
Chris Lattner71155212007-11-06 01:39:12 +0000337 if (CurTok == ')') break;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000338
Chris Lattner71155212007-11-06 01:39:12 +0000339 if (CurTok != ',')
Chris Lattner6c4be9c2008-04-14 16:44:41 +0000340 return Error("Expected ')' or ',' in argument list");
Chris Lattner71155212007-11-06 01:39:12 +0000341 getNextToken();
342 }
Chris Lattnere6c91042007-10-22 06:34:15 +0000343 }
344
345 // Eat the ')'.
346 getNextToken();
347
348 return new CallExprAST(IdName, Args);
349}
350</pre>
351</div>
352
Chris Lattner729eb142008-02-10 19:11:04 +0000353<p>This routine follows the same style as the other routines. (It expects to be
Chris Lattnere6c91042007-10-22 06:34:15 +0000354called if the current token is a <tt>tok_identifier</tt> token). It also has
355recursion and error handling. One interesting aspect of this is that it uses
356<em>look-ahead</em> to determine if the current identifier is a stand alone
357variable reference or if it is a function call expression. It handles this by
Chris Lattner729eb142008-02-10 19:11:04 +0000358checking to see if the token after the identifier is a '(' token, constructing
Chris Lattnere6c91042007-10-22 06:34:15 +0000359either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
360</p>
361
Chris Lattner729eb142008-02-10 19:11:04 +0000362<p>Now that we have all of our simple expression-parsing logic in place, we can
363define a helper function to wrap it together into one entry point. We call this
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000364class of expressions "primary" expressions, for reasons that will become more
365clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to
366parse an arbitrary primary expression, we need to determine what sort of
Chris Lattner729eb142008-02-10 19:11:04 +0000367expression it is:</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000368
369<div class="doc_code">
370<pre>
371/// primary
372/// ::= identifierexpr
373/// ::= numberexpr
374/// ::= parenexpr
375static ExprAST *ParsePrimary() {
376 switch (CurTok) {
377 default: return Error("unknown token when expecting an expression");
378 case tok_identifier: return ParseIdentifierExpr();
379 case tok_number: return ParseNumberExpr();
380 case '(': return ParseParenExpr();
381 }
382}
383</pre>
384</div>
385
Chris Lattner729eb142008-02-10 19:11:04 +0000386<p>Now that you see the definition of this function, it is more obvious why we
387can assume the state of CurTok in the various functions. This uses look-ahead
388to determine which sort of expression is being inspected, and then parses it
389with a function call.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000390
Chris Lattner729eb142008-02-10 19:11:04 +0000391<p>Now that basic expressions are handled, we need to handle binary expressions.
392They are a bit more complex.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000393
394</div>
395
396<!-- *********************************************************************** -->
397<div class="doc_section"><a name="parserbinops">Binary Expression
398 Parsing</a></div>
399<!-- *********************************************************************** -->
400
401<div class="doc_text">
402
403<p>Binary expressions are significantly harder to parse because they are often
404ambiguous. For example, when given the string "x+y*z", the parser can choose
405to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from
406mathematics, we expect the later parse, because "*" (multiplication) has
407higher <em>precedence</em> than "+" (addition).</p>
408
409<p>There are many ways to handle this, but an elegant and efficient way is to
410use <a href=
411"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
412Parsing</a>. This parsing technique uses the precedence of binary operators to
413guide recursion. To start with, we need a table of precedences:</p>
414
415<div class="doc_code">
416<pre>
417/// BinopPrecedence - This holds the precedence for each binary operator that is
418/// defined.
419static std::map&lt;char, int&gt; BinopPrecedence;
420
421/// GetTokPrecedence - Get the precedence of the pending binary operator token.
422static int GetTokPrecedence() {
423 if (!isascii(CurTok))
424 return -1;
425
426 // Make sure it's a declared binop.
427 int TokPrec = BinopPrecedence[CurTok];
428 if (TokPrec &lt;= 0) return -1;
429 return TokPrec;
430}
431
432int main() {
433 // Install standard binary operators.
434 // 1 is lowest precedence.
435 BinopPrecedence['&lt;'] = 10;
436 BinopPrecedence['+'] = 20;
437 BinopPrecedence['-'] = 20;
438 BinopPrecedence['*'] = 40; // highest.
439 ...
440}
441</pre>
442</div>
443
444<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000445(this can obviously be extended by you, our brave and intrepid reader). The
Chris Lattnere6c91042007-10-22 06:34:15 +0000446<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
447or -1 if the token is not a binary operator. Having a map makes it easy to add
448new operators and makes it clear that the algorithm doesn't depend on the
449specific operators involved, but it would be easy enough to eliminate the map
Chris Lattner729eb142008-02-10 19:11:04 +0000450and do the comparisons in the <tt>GetTokPrecedence</tt> function. (Or just use
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000451a fixed-size array).</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000452
453<p>With the helper above defined, we can now start parsing binary expressions.
454The basic idea of operator precedence parsing is to break down an expression
Chris Lattner729eb142008-02-10 19:11:04 +0000455with potentially ambiguous binary operators into pieces. Consider ,for example,
Chris Lattnere6c91042007-10-22 06:34:15 +0000456the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this
457as a stream of primary expressions separated by binary operators. As such,
458it will first parse the leading primary expression "a", then it will see the
459pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses
Duncan Sands72261ff2007-11-05 16:04:58 +0000460are primary expressions, the binary expression parser doesn't need to worry
Chris Lattnere6c91042007-10-22 06:34:15 +0000461about nested subexpressions like (c+d) at all.
462</p>
463
464<p>
465To start, an expression is a primary expression potentially followed by a
466sequence of [binop,primaryexpr] pairs:</p>
467
468<div class="doc_code">
469<pre>
470/// expression
471/// ::= primary binoprhs
472///
473static ExprAST *ParseExpression() {
474 ExprAST *LHS = ParsePrimary();
475 if (!LHS) return 0;
476
477 return ParseBinOpRHS(0, LHS);
478}
479</pre>
480</div>
481
482<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
Chris Lattnerbd779a32007-11-06 18:13:32 +0000483us. It takes a precedence and a pointer to an expression for the part that has been
484parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
Chris Lattnere6c91042007-10-22 06:34:15 +0000485allowed to be empty, in which case it returns the expression that is passed into
486it. In our example above, the code passes the expression for "a" into
487<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
488
489<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
490minimal operator precedence</em> that the function is allowed to eat. For
491example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
492passed in a precedence of 40, it will not consume any tokens (because the
493precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
494with:</p>
495
496<div class="doc_code">
497<pre>
498/// binoprhs
499/// ::= ('+' primary)*
500static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
501 // If this is a binop, find its precedence.
502 while (1) {
503 int TokPrec = GetTokPrecedence();
504
505 // If this is a binop that binds at least as tightly as the current binop,
506 // consume it, otherwise we are done.
507 if (TokPrec &lt; ExprPrec)
508 return LHS;
509</pre>
510</div>
511
512<p>This code gets the precedence of the current token and checks to see if if is
513too low. Because we defined invalid tokens to have a precedence of -1, this
514check implicitly knows that the pair-stream ends when the token stream runs out
515of binary operators. If this check succeeds, we know that the token is a binary
516operator and that it will be included in this expression:</p>
517
518<div class="doc_code">
519<pre>
520 // Okay, we know this is a binop.
521 int BinOp = CurTok;
522 getNextToken(); // eat binop
523
524 // Parse the primary expression after the binary operator.
525 ExprAST *RHS = ParsePrimary();
526 if (!RHS) return 0;
527</pre>
528</div>
529
530<p>As such, this code eats (and remembers) the binary operator and then parses
Chris Lattnerbd779a32007-11-06 18:13:32 +0000531the primary expression that follows. This builds up the whole pair, the first of
Chris Lattnere6c91042007-10-22 06:34:15 +0000532which is [+, b] for the running example.</p>
533
534<p>Now that we parsed the left-hand side of an expression and one pair of the
535RHS sequence, we have to decide which way the expression associates. In
536particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
537To determine this, we look ahead at "binop" to determine its precedence and
538compare it to BinOp's precedence (which is '+' in this case):</p>
539
540<div class="doc_code">
541<pre>
542 // If BinOp binds less tightly with RHS than the operator after RHS, let
543 // the pending operator take RHS as its LHS.
544 int NextPrec = GetTokPrecedence();
545 if (TokPrec &lt; NextPrec) {
546</pre>
547</div>
548
549<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
550precedence of our current operator, then we know that the parentheses associate
Chris Lattnerbd779a32007-11-06 18:13:32 +0000551as "(a+b) binop ...". In our example, the current operator is "+" and the next
552operator is "+", we know that they have the same precedence. In this case we'll
Chris Lattnere6c91042007-10-22 06:34:15 +0000553create the AST node for "a+b", and then continue parsing:</p>
554
555<div class="doc_code">
556<pre>
557 ... if body omitted ...
558 }
559
560 // Merge LHS/RHS.
561 LHS = new BinaryExprAST(BinOp, LHS, RHS);
562 } // loop around to the top of the while loop.
563}
564</pre>
565</div>
566
567<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
Chris Lattnerbd779a32007-11-06 18:13:32 +0000568iteration of the loop, with "+" as the current token. The code above will eat,
569remember, and parse "(c+d)" as the primary expression, which makes the
570current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with
571"*" as the binop to the right of the primary. In this case, the precedence of "*" is
Chris Lattnere6c91042007-10-22 06:34:15 +0000572higher than the precedence of "+" so the if condition will be entered.</p>
573
574<p>The critical question left here is "how can the if condition parse the right
575hand side in full"? In particular, to build the AST correctly for our example,
576it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to
577do this is surprisingly simple (code from the above two blocks duplicated for
578context):</p>
579
580<div class="doc_code">
581<pre>
582 // If BinOp binds less tightly with RHS than the operator after RHS, let
583 // the pending operator take RHS as its LHS.
584 int NextPrec = GetTokPrecedence();
585 if (TokPrec &lt; NextPrec) {
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000586 <b>RHS = ParseBinOpRHS(TokPrec+1, RHS);
587 if (RHS == 0) return 0;</b>
Chris Lattnere6c91042007-10-22 06:34:15 +0000588 }
589 // Merge LHS/RHS.
590 LHS = new BinaryExprAST(BinOp, LHS, RHS);
591 } // loop around to the top of the while loop.
592}
593</pre>
594</div>
595
596<p>At this point, we know that the binary operator to the RHS of our primary
597has higher precedence than the binop we are currently parsing. As such, we know
598that any sequence of pairs whose operators are all higher precedence than "+"
599should be parsed together and returned as "RHS". To do this, we recursively
600invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
601precedence required for it to continue. In our example above, this will cause
602it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the RHS
603of the '+' expression.</p>
604
Chris Lattnerbd779a32007-11-06 18:13:32 +0000605<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
Chris Lattnere6c91042007-10-22 06:34:15 +0000606and added to the AST. With this little bit of code (14 non-trivial lines), we
607correctly handle fully general binary expression parsing in a very elegant way.
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000608This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
609running through it with a few tough examples to see how it works.
Chris Lattnere6c91042007-10-22 06:34:15 +0000610</p>
611
612<p>This wraps up handling of expressions. At this point, we can point the
Chris Lattnerbd779a32007-11-06 18:13:32 +0000613parser at an arbitrary token stream and build an expression from it, stopping
Chris Lattnere6c91042007-10-22 06:34:15 +0000614at the first token that is not part of the expression. Next up we need to
Chris Lattnerbd779a32007-11-06 18:13:32 +0000615handle function definitions, etc.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000616
617</div>
618
619<!-- *********************************************************************** -->
620<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
621<!-- *********************************************************************** -->
622
623<div class="doc_text">
624
625<p>
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000626The next thing missing is handling of function prototypes. In Kaleidoscope,
Chris Lattnere6c91042007-10-22 06:34:15 +0000627these are used both for 'extern' function declarations as well as function body
628definitions. The code to do this is straight-forward and not very interesting
629(once you've survived expressions):
630</p>
631
632<div class="doc_code">
633<pre>
634/// prototype
635/// ::= id '(' id* ')'
636static PrototypeAST *ParsePrototype() {
637 if (CurTok != tok_identifier)
638 return ErrorP("Expected function name in prototype");
639
640 std::string FnName = IdentifierStr;
641 getNextToken();
642
643 if (CurTok != '(')
644 return ErrorP("Expected '(' in prototype");
645
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000646 // Read the list of argument names.
Chris Lattnere6c91042007-10-22 06:34:15 +0000647 std::vector&lt;std::string&gt; ArgNames;
648 while (getNextToken() == tok_identifier)
649 ArgNames.push_back(IdentifierStr);
650 if (CurTok != ')')
651 return ErrorP("Expected ')' in prototype");
652
653 // success.
654 getNextToken(); // eat ')'.
655
656 return new PrototypeAST(FnName, ArgNames);
657}
658</pre>
659</div>
660
661<p>Given this, a function definition is very simple, just a prototype plus
Duncan Sands72261ff2007-11-05 16:04:58 +0000662an expression to implement the body:</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000663
664<div class="doc_code">
665<pre>
666/// definition ::= 'def' prototype expression
667static FunctionAST *ParseDefinition() {
668 getNextToken(); // eat def.
669 PrototypeAST *Proto = ParsePrototype();
670 if (Proto == 0) return 0;
671
672 if (ExprAST *E = ParseExpression())
673 return new FunctionAST(Proto, E);
674 return 0;
675}
676</pre>
677</div>
678
679<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
Chris Lattnerbd779a32007-11-06 18:13:32 +0000680well as to support forward declaration of user functions. These 'extern's are just
Chris Lattnere6c91042007-10-22 06:34:15 +0000681prototypes with no body:</p>
682
683<div class="doc_code">
684<pre>
685/// external ::= 'extern' prototype
686static PrototypeAST *ParseExtern() {
687 getNextToken(); // eat extern.
688 return ParsePrototype();
689}
690</pre>
691</div>
692
693<p>Finally, we'll also let the user type in arbitrary top-level expressions and
694evaluate them on the fly. We will handle this by defining anonymous nullary
695(zero argument) functions for them:</p>
696
697<div class="doc_code">
698<pre>
699/// toplevelexpr ::= expression
700static FunctionAST *ParseTopLevelExpr() {
701 if (ExprAST *E = ParseExpression()) {
702 // Make an anonymous proto.
703 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
704 return new FunctionAST(Proto, E);
705 }
706 return 0;
707}
708</pre>
709</div>
710
Chris Lattner729eb142008-02-10 19:11:04 +0000711<p>Now that we have all the pieces, let's build a little driver that will let us
Chris Lattnere6c91042007-10-22 06:34:15 +0000712actually <em>execute</em> this code we've built!</p>
713
714</div>
715
716<!-- *********************************************************************** -->
717<div class="doc_section"><a name="driver">The Driver</a></div>
718<!-- *********************************************************************** -->
719
720<div class="doc_text">
721
722<p>The driver for this simply invokes all of the parsing pieces with a top-level
723dispatch loop. There isn't much interesting here, so I'll just include the
724top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
725Parsing" section.</p>
726
727<div class="doc_code">
728<pre>
729/// top ::= definition | external | expression | ';'
730static void MainLoop() {
731 while (1) {
732 fprintf(stderr, "ready&gt; ");
733 switch (CurTok) {
734 case tok_eof: return;
Chris Lattner729eb142008-02-10 19:11:04 +0000735 case ';': getNextToken(); break; // ignore top-level semicolons.
Chris Lattnere6c91042007-10-22 06:34:15 +0000736 case tok_def: HandleDefinition(); break;
737 case tok_extern: HandleExtern(); break;
738 default: HandleTopLevelExpression(); break;
739 }
740 }
741}
742</pre>
743</div>
744
Chris Lattner729eb142008-02-10 19:11:04 +0000745<p>The most interesting part of this is that we ignore top-level semicolons.
Owen Andersonc6311b92007-10-22 06:48:28 +0000746Why is this, you ask? The basic reason is that if you type "4 + 5" at the
Chris Lattnerbd779a32007-11-06 18:13:32 +0000747command line, the parser doesn't know whether that is the end of what you will type
748or not. For example, on the next line you could type "def foo..." in which case
Chris Lattnere6c91042007-10-22 06:34:15 +00007494+5 is the end of a top-level expression. Alternatively you could type "* 6",
750which would continue the expression. Having top-level semicolons allows you to
Chris Lattner729eb142008-02-10 19:11:04 +0000751type "4+5;", and the parser will know you are done.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000752
753</div>
754
755<!-- *********************************************************************** -->
Chris Lattner128eb862007-11-05 19:06:59 +0000756<div class="doc_section"><a name="conclusions">Conclusions</a></div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000757<!-- *********************************************************************** -->
758
759<div class="doc_text">
760
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000761<p>With just under 400 lines of commented code (240 lines of non-comment,
762non-blank code), we fully defined our minimal language, including a lexer,
Chris Lattner729eb142008-02-10 19:11:04 +0000763parser, and AST builder. With this done, the executable will validate
764Kaleidoscope code and tell us if it is grammatically invalid. For
Chris Lattnere6c91042007-10-22 06:34:15 +0000765example, here is a sample interaction:</p>
766
767<div class="doc_code">
768<pre>
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000769$ <b>./a.out</b>
770ready&gt; <b>def foo(x y) x+foo(y, 4.0);</b>
771Parsed a function definition.
772ready&gt; <b>def foo(x y) x+y y;</b>
773Parsed a function definition.
774Parsed a top-level expr
775ready&gt; <b>def foo(x y) x+y );</b>
776Parsed a function definition.
777Error: unknown token when expecting an expression
778ready&gt; <b>extern sin(a);</b>
Chris Lattner35abbf52007-10-23 06:23:57 +0000779ready&gt; Parsed an extern
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000780ready&gt; <b>^D</b>
Chris Lattnere6c91042007-10-22 06:34:15 +0000781$
782</pre>
783</div>
784
Chris Lattnerd93a5842007-10-23 05:43:01 +0000785<p>There is a lot of room for extension here. You can define new AST nodes,
786extend the language in many ways, etc. In the <a href="LangImpl3.html">next
Chris Lattnerbd779a32007-11-06 18:13:32 +0000787installment</a>, we will describe how to generate LLVM Intermediate
788Representation (IR) from the AST.</p>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000789
790</div>
791
792<!-- *********************************************************************** -->
793<div class="doc_section"><a name="code">Full Code Listing</a></div>
794<!-- *********************************************************************** -->
795
796<div class="doc_text">
797
Chris Lattnere6c91042007-10-22 06:34:15 +0000798<p>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000799Here is the complete code listing for this and the previous chapter.
800Note that it is fully self-contained: you don't need LLVM or any external
Chris Lattner729eb142008-02-10 19:11:04 +0000801libraries at all for this. (Besides the C and C++ standard libraries, of
802course.) To build this, just compile with:</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000803
804<div class="doc_code">
805<pre>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000806 # Compile
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000807 g++ -g -O3 toy.cpp
Chris Lattnerd93a5842007-10-23 05:43:01 +0000808 # Run
809 ./a.out
810</pre>
811</div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000812
Chris Lattnerd93a5842007-10-23 05:43:01 +0000813<p>Here is the code:</p>
814
815<div class="doc_code">
816<pre>
Chris Lattnere6c91042007-10-22 06:34:15 +0000817#include &lt;cstdio&gt;
John McCall46c81812009-08-17 21:07:37 +0000818#include &lt;cstdlib&gt;
Chris Lattnere6c91042007-10-22 06:34:15 +0000819#include &lt;string&gt;
Chris Lattnerd93a5842007-10-23 05:43:01 +0000820#include &lt;map&gt;
Chris Lattnere6c91042007-10-22 06:34:15 +0000821#include &lt;vector&gt;
822
823//===----------------------------------------------------------------------===//
824// Lexer
825//===----------------------------------------------------------------------===//
826
827// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
828// of these for known things.
829enum Token {
830 tok_eof = -1,
831
832 // commands
833 tok_def = -2, tok_extern = -3,
834
835 // primary
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000836 tok_identifier = -4, tok_number = -5
Chris Lattnere6c91042007-10-22 06:34:15 +0000837};
838
839static std::string IdentifierStr; // Filled in if tok_identifier
840static double NumVal; // Filled in if tok_number
841
842/// gettok - Return the next token from standard input.
843static int gettok() {
844 static int LastChar = ' ';
845
846 // Skip any whitespace.
847 while (isspace(LastChar))
848 LastChar = getchar();
849
850 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
851 IdentifierStr = LastChar;
852 while (isalnum((LastChar = getchar())))
853 IdentifierStr += LastChar;
854
855 if (IdentifierStr == "def") return tok_def;
856 if (IdentifierStr == "extern") return tok_extern;
857 return tok_identifier;
858 }
859
860 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
861 std::string NumStr;
862 do {
863 NumStr += LastChar;
864 LastChar = getchar();
865 } while (isdigit(LastChar) || LastChar == '.');
866
867 NumVal = strtod(NumStr.c_str(), 0);
868 return tok_number;
869 }
870
871 if (LastChar == '#') {
872 // Comment until end of line.
873 do LastChar = getchar();
Chris Lattnerc80c23f2007-12-02 22:46:01 +0000874 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
Chris Lattnere6c91042007-10-22 06:34:15 +0000875
876 if (LastChar != EOF)
877 return gettok();
878 }
879
880 // Check for end of file. Don't eat the EOF.
881 if (LastChar == EOF)
882 return tok_eof;
883
884 // Otherwise, just return the character as its ascii value.
885 int ThisChar = LastChar;
886 LastChar = getchar();
887 return ThisChar;
888}
889
890//===----------------------------------------------------------------------===//
891// Abstract Syntax Tree (aka Parse Tree)
892//===----------------------------------------------------------------------===//
893
894/// ExprAST - Base class for all expression nodes.
895class ExprAST {
896public:
897 virtual ~ExprAST() {}
898};
899
900/// NumberExprAST - Expression class for numeric literals like "1.0".
901class NumberExprAST : public ExprAST {
902 double Val;
903public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000904 NumberExprAST(double val) : Val(val) {}
Chris Lattnere6c91042007-10-22 06:34:15 +0000905};
906
907/// VariableExprAST - Expression class for referencing a variable, like "a".
908class VariableExprAST : public ExprAST {
909 std::string Name;
910public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000911 VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattnere6c91042007-10-22 06:34:15 +0000912};
913
914/// BinaryExprAST - Expression class for a binary operator.
915class BinaryExprAST : public ExprAST {
916 char Op;
917 ExprAST *LHS, *RHS;
918public:
919 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
920 : Op(op), LHS(lhs), RHS(rhs) {}
921};
922
923/// CallExprAST - Expression class for function calls.
924class CallExprAST : public ExprAST {
925 std::string Callee;
926 std::vector&lt;ExprAST*&gt; Args;
927public:
928 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
929 : Callee(callee), Args(args) {}
930};
931
932/// PrototypeAST - This class represents the "prototype" for a function,
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000933/// which captures its name, and its argument names (thus implicitly the number
934/// of arguments the function takes).
Chris Lattnere6c91042007-10-22 06:34:15 +0000935class PrototypeAST {
936 std::string Name;
Chris Lattner5ea8ef82008-02-22 17:09:39 +0000937 std::vector&lt;std::string&gt; Args;
Chris Lattnere6c91042007-10-22 06:34:15 +0000938public:
939 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
940 : Name(name), Args(args) {}
941
942};
943
944/// FunctionAST - This class represents a function definition itself.
945class FunctionAST {
946 PrototypeAST *Proto;
947 ExprAST *Body;
948public:
949 FunctionAST(PrototypeAST *proto, ExprAST *body)
950 : Proto(proto), Body(body) {}
951
952};
953
954//===----------------------------------------------------------------------===//
955// Parser
956//===----------------------------------------------------------------------===//
957
958/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
Chris Lattner729eb142008-02-10 19:11:04 +0000959/// token the parser is looking at. getNextToken reads another token from the
Chris Lattnere6c91042007-10-22 06:34:15 +0000960/// lexer and updates CurTok with its results.
961static int CurTok;
962static int getNextToken() {
963 return CurTok = gettok();
964}
965
966/// BinopPrecedence - This holds the precedence for each binary operator that is
967/// defined.
968static std::map&lt;char, int&gt; BinopPrecedence;
969
970/// GetTokPrecedence - Get the precedence of the pending binary operator token.
971static int GetTokPrecedence() {
972 if (!isascii(CurTok))
973 return -1;
974
975 // Make sure it's a declared binop.
976 int TokPrec = BinopPrecedence[CurTok];
977 if (TokPrec &lt;= 0) return -1;
978 return TokPrec;
979}
980
981/// Error* - These are little helper functions for error handling.
982ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
983PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
984FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
985
986static ExprAST *ParseExpression();
987
988/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +0000989/// ::= identifier
990/// ::= identifier '(' expression* ')'
Chris Lattnere6c91042007-10-22 06:34:15 +0000991static ExprAST *ParseIdentifierExpr() {
992 std::string IdName = IdentifierStr;
993
Chris Lattner20a0c802007-11-05 17:54:34 +0000994 getNextToken(); // eat identifier.
Chris Lattnere6c91042007-10-22 06:34:15 +0000995
996 if (CurTok != '(') // Simple variable ref.
997 return new VariableExprAST(IdName);
998
999 // Call.
1000 getNextToken(); // eat (
1001 std::vector&lt;ExprAST*&gt; Args;
Chris Lattner71155212007-11-06 01:39:12 +00001002 if (CurTok != ')') {
1003 while (1) {
1004 ExprAST *Arg = ParseExpression();
1005 if (!Arg) return 0;
1006 Args.push_back(Arg);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001007
Chris Lattner71155212007-11-06 01:39:12 +00001008 if (CurTok == ')') break;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001009
Chris Lattner71155212007-11-06 01:39:12 +00001010 if (CurTok != ',')
Chris Lattner6c4be9c2008-04-14 16:44:41 +00001011 return Error("Expected ')' or ',' in argument list");
Chris Lattner71155212007-11-06 01:39:12 +00001012 getNextToken();
1013 }
Chris Lattnere6c91042007-10-22 06:34:15 +00001014 }
1015
1016 // Eat the ')'.
1017 getNextToken();
1018
1019 return new CallExprAST(IdName, Args);
1020}
1021
1022/// numberexpr ::= number
1023static ExprAST *ParseNumberExpr() {
1024 ExprAST *Result = new NumberExprAST(NumVal);
1025 getNextToken(); // consume the number
1026 return Result;
1027}
1028
1029/// parenexpr ::= '(' expression ')'
1030static ExprAST *ParseParenExpr() {
1031 getNextToken(); // eat (.
1032 ExprAST *V = ParseExpression();
1033 if (!V) return 0;
1034
1035 if (CurTok != ')')
1036 return Error("expected ')'");
1037 getNextToken(); // eat ).
1038 return V;
1039}
1040
1041/// primary
1042/// ::= identifierexpr
1043/// ::= numberexpr
1044/// ::= parenexpr
1045static ExprAST *ParsePrimary() {
1046 switch (CurTok) {
1047 default: return Error("unknown token when expecting an expression");
1048 case tok_identifier: return ParseIdentifierExpr();
1049 case tok_number: return ParseNumberExpr();
1050 case '(': return ParseParenExpr();
1051 }
1052}
1053
1054/// binoprhs
1055/// ::= ('+' primary)*
1056static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1057 // If this is a binop, find its precedence.
1058 while (1) {
1059 int TokPrec = GetTokPrecedence();
1060
1061 // If this is a binop that binds at least as tightly as the current binop,
1062 // consume it, otherwise we are done.
1063 if (TokPrec &lt; ExprPrec)
1064 return LHS;
1065
1066 // Okay, we know this is a binop.
1067 int BinOp = CurTok;
1068 getNextToken(); // eat binop
1069
1070 // Parse the primary expression after the binary operator.
1071 ExprAST *RHS = ParsePrimary();
1072 if (!RHS) return 0;
1073
1074 // If BinOp binds less tightly with RHS than the operator after RHS, let
1075 // the pending operator take RHS as its LHS.
1076 int NextPrec = GetTokPrecedence();
1077 if (TokPrec &lt; NextPrec) {
1078 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1079 if (RHS == 0) return 0;
1080 }
1081
1082 // Merge LHS/RHS.
1083 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1084 }
1085}
1086
1087/// expression
1088/// ::= primary binoprhs
1089///
1090static ExprAST *ParseExpression() {
1091 ExprAST *LHS = ParsePrimary();
1092 if (!LHS) return 0;
1093
1094 return ParseBinOpRHS(0, LHS);
1095}
1096
1097/// prototype
1098/// ::= id '(' id* ')'
1099static PrototypeAST *ParsePrototype() {
1100 if (CurTok != tok_identifier)
1101 return ErrorP("Expected function name in prototype");
1102
1103 std::string FnName = IdentifierStr;
1104 getNextToken();
1105
1106 if (CurTok != '(')
1107 return ErrorP("Expected '(' in prototype");
1108
1109 std::vector&lt;std::string&gt; ArgNames;
1110 while (getNextToken() == tok_identifier)
1111 ArgNames.push_back(IdentifierStr);
1112 if (CurTok != ')')
1113 return ErrorP("Expected ')' in prototype");
1114
1115 // success.
1116 getNextToken(); // eat ')'.
1117
1118 return new PrototypeAST(FnName, ArgNames);
1119}
1120
1121/// definition ::= 'def' prototype expression
1122static FunctionAST *ParseDefinition() {
1123 getNextToken(); // eat def.
1124 PrototypeAST *Proto = ParsePrototype();
1125 if (Proto == 0) return 0;
1126
1127 if (ExprAST *E = ParseExpression())
1128 return new FunctionAST(Proto, E);
1129 return 0;
1130}
1131
1132/// toplevelexpr ::= expression
1133static FunctionAST *ParseTopLevelExpr() {
1134 if (ExprAST *E = ParseExpression()) {
1135 // Make an anonymous proto.
Chris Lattner5ea8ef82008-02-22 17:09:39 +00001136 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
Chris Lattnere6c91042007-10-22 06:34:15 +00001137 return new FunctionAST(Proto, E);
1138 }
1139 return 0;
1140}
1141
1142/// external ::= 'extern' prototype
1143static PrototypeAST *ParseExtern() {
1144 getNextToken(); // eat extern.
1145 return ParsePrototype();
1146}
1147
1148//===----------------------------------------------------------------------===//
1149// Top-Level parsing
1150//===----------------------------------------------------------------------===//
1151
1152static void HandleDefinition() {
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001153 if (ParseDefinition()) {
Chris Lattnere6c91042007-10-22 06:34:15 +00001154 fprintf(stderr, "Parsed a function definition.\n");
1155 } else {
1156 // Skip token for error recovery.
1157 getNextToken();
1158 }
1159}
1160
1161static void HandleExtern() {
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001162 if (ParseExtern()) {
Chris Lattnere6c91042007-10-22 06:34:15 +00001163 fprintf(stderr, "Parsed an extern\n");
1164 } else {
1165 // Skip token for error recovery.
1166 getNextToken();
1167 }
1168}
1169
1170static void HandleTopLevelExpression() {
Chris Lattner729eb142008-02-10 19:11:04 +00001171 // Evaluate a top-level expression into an anonymous function.
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001172 if (ParseTopLevelExpr()) {
Chris Lattnere6c91042007-10-22 06:34:15 +00001173 fprintf(stderr, "Parsed a top-level expr\n");
1174 } else {
1175 // Skip token for error recovery.
1176 getNextToken();
1177 }
1178}
1179
1180/// top ::= definition | external | expression | ';'
1181static void MainLoop() {
1182 while (1) {
1183 fprintf(stderr, "ready&gt; ");
1184 switch (CurTok) {
1185 case tok_eof: return;
Chris Lattner729eb142008-02-10 19:11:04 +00001186 case ';': getNextToken(); break; // ignore top-level semicolons.
Chris Lattnere6c91042007-10-22 06:34:15 +00001187 case tok_def: HandleDefinition(); break;
1188 case tok_extern: HandleExtern(); break;
1189 default: HandleTopLevelExpression(); break;
1190 }
1191 }
1192}
1193
1194//===----------------------------------------------------------------------===//
1195// Main driver code.
1196//===----------------------------------------------------------------------===//
1197
1198int main() {
1199 // Install standard binary operators.
1200 // 1 is lowest precedence.
1201 BinopPrecedence['&lt;'] = 10;
1202 BinopPrecedence['+'] = 20;
1203 BinopPrecedence['-'] = 20;
1204 BinopPrecedence['*'] = 40; // highest.
1205
1206 // Prime the first token.
1207 fprintf(stderr, "ready&gt; ");
1208 getNextToken();
1209
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001210 // Run the main "interpreter loop" now.
Chris Lattnere6c91042007-10-22 06:34:15 +00001211 MainLoop();
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001212
Chris Lattnere6c91042007-10-22 06:34:15 +00001213 return 0;
1214}
1215</pre>
1216</div>
Chris Lattner729eb142008-02-10 19:11:04 +00001217<a href="LangImpl3.html">Next: Implementing Code Generation to LLVM IR</a>
Chris Lattnere6c91042007-10-22 06:34:15 +00001218</div>
1219
1220<!-- *********************************************************************** -->
1221<hr>
1222<address>
1223 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1224 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1225 <a href="http://validator.w3.org/check/referer"><img
Chris Lattner8eef4b22007-10-23 06:30:50 +00001226 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattnere6c91042007-10-22 06:34:15 +00001227
1228 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1229 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
Dan Gohman523e3922010-02-03 17:27:31 +00001230 Last modified: $Date$
Chris Lattnere6c91042007-10-22 06:34:15 +00001231</address>
1232</body>
1233</html>