blob: acccd20a09048426ca3dbd62e2dfaa18d4b36f7a [file] [log] [blame]
Chris Lattnere6c91042007-10-22 06:34:15 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3
4<html>
5<head>
6 <title>Kaleidoscope: Implementing a Parser and AST</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
10</head>
11
12<body>
13
NAKAMURA Takumi05d02652011-04-18 23:59:50 +000014<h1>Kaleidoscope: Implementing a Parser and AST</h1>
Chris Lattnere6c91042007-10-22 06:34:15 +000015
Chris Lattner128eb862007-11-05 19:06:59 +000016<ul>
Chris Lattner0e555b12007-11-05 20:04:56 +000017<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner128eb862007-11-05 19:06:59 +000018<li>Chapter 2
19 <ol>
20 <li><a href="#intro">Chapter 2 Introduction</a></li>
21 <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
22 <li><a href="#parserbasics">Parser Basics</a></li>
23 <li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
24 <li><a href="#parserbinops">Binary Expression Parsing</a></li>
25 <li><a href="#parsertop">Parsing the Rest</a></li>
26 <li><a href="#driver">The Driver</a></li>
27 <li><a href="#conclusions">Conclusions</a></li>
28 <li><a href="#code">Full Code Listing</a></li>
29 </ol>
30</li>
Chris Lattner0e555b12007-11-05 20:04:56 +000031<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
Chris Lattner128eb862007-11-05 19:06:59 +000032</ul>
33
Chris Lattnere6c91042007-10-22 06:34:15 +000034<div class="doc_author">
35 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
36</div>
37
38<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +000039<h2><a name="intro">Chapter 2 Introduction</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +000040<!-- *********************************************************************** -->
41
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +000042<div>
Chris Lattnere6c91042007-10-22 06:34:15 +000043
Chris Lattner128eb862007-11-05 19:06:59 +000044<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
Chris Lattnerbd779a32007-11-06 18:13:32 +000045with LLVM</a>" tutorial. This chapter shows you how to use the lexer, built in
46<a href="LangImpl1.html">Chapter 1</a>, to build a full <a
Chris Lattnere6c91042007-10-22 06:34:15 +000047href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
Chris Lattnercde1d9d2007-11-06 07:16:22 +000048our Kaleidoscope language. Once we have a parser, we'll define and build an <a
Chris Lattnere6c91042007-10-22 06:34:15 +000049href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
50Tree</a> (AST).</p>
51
52<p>The parser we will build uses a combination of <a
53href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
54Parsing</a> and <a href=
55"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
Chris Lattnerbd779a32007-11-06 18:13:32 +000056Parsing</a> to parse the Kaleidoscope language (the latter for
57binary expressions and the former for everything else). Before we get to
58parsing though, lets talk about the output of the parser: the Abstract Syntax
59Tree.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +000060
61</div>
62
63<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +000064<h2><a name="ast">The Abstract Syntax Tree (AST)</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +000065<!-- *********************************************************************** -->
66
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +000067<div>
Chris Lattnere6c91042007-10-22 06:34:15 +000068
Chris Lattnerbd779a32007-11-06 18:13:32 +000069<p>The AST for a program captures its behavior in such a way that it is easy for
Chris Lattnere6c91042007-10-22 06:34:15 +000070later stages of the compiler (e.g. code generation) to interpret. We basically
71want one object for each construct in the language, and the AST should closely
72model the language. In Kaleidoscope, we have expressions, a prototype, and a
73function object. We'll start with expressions first:</p>
74
75<div class="doc_code">
76<pre>
77/// ExprAST - Base class for all expression nodes.
78class ExprAST {
79public:
80 virtual ~ExprAST() {}
81};
82
83/// NumberExprAST - Expression class for numeric literals like "1.0".
84class NumberExprAST : public ExprAST {
85 double Val;
86public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +000087 NumberExprAST(double val) : Val(val) {}
Chris Lattnere6c91042007-10-22 06:34:15 +000088};
89</pre>
90</div>
91
92<p>The code above shows the definition of the base ExprAST class and one
Chris Lattnerbd779a32007-11-06 18:13:32 +000093subclass which we use for numeric literals. The important thing to note about
94this code is that the NumberExprAST class captures the numeric value of the
95literal as an instance variable. This allows later phases of the compiler to
96know what the stored numeric value is.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +000097
98<p>Right now we only create the AST, so there are no useful accessor methods on
99them. It would be very easy to add a virtual method to pretty print the code,
100for example. Here are the other expression AST node definitions that we'll use
Chris Lattner729eb142008-02-10 19:11:04 +0000101in the basic form of the Kaleidoscope language:
Chris Lattnere6c91042007-10-22 06:34:15 +0000102</p>
103
104<div class="doc_code">
105<pre>
106/// VariableExprAST - Expression class for referencing a variable, like "a".
107class VariableExprAST : public ExprAST {
108 std::string Name;
109public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000110 VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattnere6c91042007-10-22 06:34:15 +0000111};
112
113/// BinaryExprAST - Expression class for a binary operator.
114class BinaryExprAST : public ExprAST {
115 char Op;
116 ExprAST *LHS, *RHS;
117public:
118 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
119 : Op(op), LHS(lhs), RHS(rhs) {}
120};
121
122/// CallExprAST - Expression class for function calls.
123class CallExprAST : public ExprAST {
124 std::string Callee;
125 std::vector&lt;ExprAST*&gt; Args;
126public:
127 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
128 : Callee(callee), Args(args) {}
129};
130</pre>
131</div>
132
Chris Lattner729eb142008-02-10 19:11:04 +0000133<p>This is all (intentionally) rather straight-forward: variables capture the
Chris Lattnere6c91042007-10-22 06:34:15 +0000134variable name, binary operators capture their opcode (e.g. '+'), and calls
Chris Lattnerbd779a32007-11-06 18:13:32 +0000135capture a function name as well as a list of any argument expressions. One thing
136that is nice about our AST is that it captures the language features without
137talking about the syntax of the language. Note that there is no discussion about
138precedence of binary operators, lexical structure, etc.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000139
140<p>For our basic language, these are all of the expression nodes we'll define.
Owen Andersonc6311b92007-10-22 06:48:28 +0000141Because it doesn't have conditional control flow, it isn't Turing-complete;
Chris Lattnere6c91042007-10-22 06:34:15 +0000142we'll fix that in a later installment. The two things we need next are a way
143to talk about the interface to a function, and a way to talk about functions
144themselves:</p>
145
146<div class="doc_code">
147<pre>
148/// PrototypeAST - This class represents the "prototype" for a function,
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000149/// which captures its name, and its argument names (thus implicitly the number
150/// of arguments the function takes).
Chris Lattnere6c91042007-10-22 06:34:15 +0000151class PrototypeAST {
152 std::string Name;
153 std::vector&lt;std::string&gt; Args;
154public:
155 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
156 : Name(name), Args(args) {}
157};
158
159/// FunctionAST - This class represents a function definition itself.
160class FunctionAST {
161 PrototypeAST *Proto;
162 ExprAST *Body;
163public:
164 FunctionAST(PrototypeAST *proto, ExprAST *body)
165 : Proto(proto), Body(body) {}
166};
167</pre>
168</div>
169
170<p>In Kaleidoscope, functions are typed with just a count of their arguments.
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000171Since all values are double precision floating point, the type of each argument
172doesn't need to be stored anywhere. In a more aggressive and realistic
173language, the "ExprAST" class would probably have a type field.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000174
175<p>With this scaffolding, we can now talk about parsing expressions and function
176bodies in Kaleidoscope.</p>
177
178</div>
179
180<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000181<h2><a name="parserbasics">Parser Basics</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +0000182<!-- *********************************************************************** -->
183
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000184<div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000185
186<p>Now that we have an AST to build, we need to define the parser code to build
187it. The idea here is that we want to parse something like "x+y" (which is
188returned as three tokens by the lexer) into an AST that could be generated with
189calls like this:</p>
190
191<div class="doc_code">
192<pre>
193 ExprAST *X = new VariableExprAST("x");
194 ExprAST *Y = new VariableExprAST("y");
195 ExprAST *Result = new BinaryExprAST('+', X, Y);
196</pre>
197</div>
198
199<p>In order to do this, we'll start by defining some basic helper routines:</p>
200
201<div class="doc_code">
202<pre>
203/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
Chris Lattner729eb142008-02-10 19:11:04 +0000204/// token the parser is looking at. getNextToken reads another token from the
Chris Lattnere6c91042007-10-22 06:34:15 +0000205/// lexer and updates CurTok with its results.
206static int CurTok;
207static int getNextToken() {
208 return CurTok = gettok();
209}
210</pre>
211</div>
212
213<p>
214This implements a simple token buffer around the lexer. This allows
215us to look one token ahead at what the lexer is returning. Every function in
Chris Lattnere9495122007-10-25 18:05:29 +0000216our parser will assume that CurTok is the current token that needs to be
Chris Lattnere6c91042007-10-22 06:34:15 +0000217parsed.</p>
218
Chris Lattnere6c91042007-10-22 06:34:15 +0000219<div class="doc_code">
220<pre>
221
222/// Error* - These are little helper functions for error handling.
223ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
224PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
225FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
226</pre>
227</div>
228
229<p>
230The <tt>Error</tt> routines are simple helper routines that our parser will use
231to handle errors. The error recovery in our parser will not be the best and
Duncan Sands72261ff2007-11-05 16:04:58 +0000232is not particular user-friendly, but it will be enough for our tutorial. These
Chris Lattnere6c91042007-10-22 06:34:15 +0000233routines make it easier to handle errors in routines that have various return
234types: they always return null.</p>
235
Chris Lattnerbd779a32007-11-06 18:13:32 +0000236<p>With these basic helper functions, we can implement the first
237piece of our grammar: numeric literals.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000238
239</div>
240
241<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000242<h2><a name="parserprimexprs">Basic Expression Parsing</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +0000243<!-- *********************************************************************** -->
244
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000245<div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000246
247<p>We start with numeric literals, because they are the simplest to process.
248For each production in our grammar, we'll define a function which parses that
249production. For numeric literals, we have:
250</p>
251
252<div class="doc_code">
253<pre>
254/// numberexpr ::= number
255static ExprAST *ParseNumberExpr() {
256 ExprAST *Result = new NumberExprAST(NumVal);
257 getNextToken(); // consume the number
258 return Result;
259}
260</pre>
261</div>
262
263<p>This routine is very simple: it expects to be called when the current token
264is a <tt>tok_number</tt> token. It takes the current number value, creates
Chris Lattner729eb142008-02-10 19:11:04 +0000265a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally
Chris Lattnere6c91042007-10-22 06:34:15 +0000266returns.</p>
267
Chris Lattnerbd779a32007-11-06 18:13:32 +0000268<p>There are some interesting aspects to this. The most important one is that
Chris Lattner729eb142008-02-10 19:11:04 +0000269this routine eats all of the tokens that correspond to the production and
Chris Lattnere6c91042007-10-22 06:34:15 +0000270returns the lexer buffer with the next token (which is not part of the grammar
271production) ready to go. This is a fairly standard way to go for recursive
272descent parsers. For a better example, the parenthesis operator is defined like
273this:</p>
274
275<div class="doc_code">
276<pre>
277/// parenexpr ::= '(' expression ')'
278static ExprAST *ParseParenExpr() {
279 getNextToken(); // eat (.
280 ExprAST *V = ParseExpression();
281 if (!V) return 0;
282
283 if (CurTok != ')')
284 return Error("expected ')'");
285 getNextToken(); // eat ).
286 return V;
287}
288</pre>
289</div>
290
Chris Lattnerbd779a32007-11-06 18:13:32 +0000291<p>This function illustrates a number of interesting things about the
292parser:</p>
293
294<p>
Chris Lattner729eb142008-02-10 19:11:04 +00002951) It shows how we use the Error routines. When called, this function expects
Chris Lattnere6c91042007-10-22 06:34:15 +0000296that the current token is a '(' token, but after parsing the subexpression, it
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000297is possible that there is no ')' waiting. For example, if the user types in
Chris Lattnere6c91042007-10-22 06:34:15 +0000298"(4 x" instead of "(4)", the parser should emit an error. Because errors can
299occur, the parser needs a way to indicate that they happened: in our parser, we
300return null on an error.</p>
301
Chris Lattnerbd779a32007-11-06 18:13:32 +0000302<p>2) Another interesting aspect of this function is that it uses recursion by
Owen Andersonc6311b92007-10-22 06:48:28 +0000303calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
Chris Lattnere6c91042007-10-22 06:34:15 +0000304<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
305recursive grammars, and keeps each production very simple. Note that
Duncan Sands72261ff2007-11-05 16:04:58 +0000306parentheses do not cause construction of AST nodes themselves. While we could
Chris Lattner729eb142008-02-10 19:11:04 +0000307do it this way, the most important role of parentheses are to guide the parser
308and provide grouping. Once the parser constructs the AST, parentheses are not
Chris Lattnerbd779a32007-11-06 18:13:32 +0000309needed.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000310
311<p>The next simple production is for handling variable references and function
312calls:</p>
313
314<div class="doc_code">
315<pre>
316/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +0000317/// ::= identifier
318/// ::= identifier '(' expression* ')'
Chris Lattnere6c91042007-10-22 06:34:15 +0000319static ExprAST *ParseIdentifierExpr() {
320 std::string IdName = IdentifierStr;
321
Chris Lattner20a0c802007-11-05 17:54:34 +0000322 getNextToken(); // eat identifier.
Chris Lattnere6c91042007-10-22 06:34:15 +0000323
324 if (CurTok != '(') // Simple variable ref.
325 return new VariableExprAST(IdName);
326
327 // Call.
328 getNextToken(); // eat (
329 std::vector&lt;ExprAST*&gt; Args;
Chris Lattner71155212007-11-06 01:39:12 +0000330 if (CurTok != ')') {
331 while (1) {
332 ExprAST *Arg = ParseExpression();
333 if (!Arg) return 0;
334 Args.push_back(Arg);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000335
Chris Lattner71155212007-11-06 01:39:12 +0000336 if (CurTok == ')') break;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000337
Chris Lattner71155212007-11-06 01:39:12 +0000338 if (CurTok != ',')
Chris Lattner6c4be9c2008-04-14 16:44:41 +0000339 return Error("Expected ')' or ',' in argument list");
Chris Lattner71155212007-11-06 01:39:12 +0000340 getNextToken();
341 }
Chris Lattnere6c91042007-10-22 06:34:15 +0000342 }
343
344 // Eat the ')'.
345 getNextToken();
346
347 return new CallExprAST(IdName, Args);
348}
349</pre>
350</div>
351
Chris Lattner729eb142008-02-10 19:11:04 +0000352<p>This routine follows the same style as the other routines. (It expects to be
Chris Lattnere6c91042007-10-22 06:34:15 +0000353called if the current token is a <tt>tok_identifier</tt> token). It also has
354recursion and error handling. One interesting aspect of this is that it uses
355<em>look-ahead</em> to determine if the current identifier is a stand alone
356variable reference or if it is a function call expression. It handles this by
Chris Lattner729eb142008-02-10 19:11:04 +0000357checking to see if the token after the identifier is a '(' token, constructing
Chris Lattnere6c91042007-10-22 06:34:15 +0000358either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
359</p>
360
Chris Lattner729eb142008-02-10 19:11:04 +0000361<p>Now that we have all of our simple expression-parsing logic in place, we can
362define a helper function to wrap it together into one entry point. We call this
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000363class of expressions "primary" expressions, for reasons that will become more
364clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to
365parse an arbitrary primary expression, we need to determine what sort of
Chris Lattner729eb142008-02-10 19:11:04 +0000366expression it is:</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000367
368<div class="doc_code">
369<pre>
370/// primary
371/// ::= identifierexpr
372/// ::= numberexpr
373/// ::= parenexpr
374static ExprAST *ParsePrimary() {
375 switch (CurTok) {
376 default: return Error("unknown token when expecting an expression");
377 case tok_identifier: return ParseIdentifierExpr();
378 case tok_number: return ParseNumberExpr();
379 case '(': return ParseParenExpr();
380 }
381}
382</pre>
383</div>
384
Chris Lattner729eb142008-02-10 19:11:04 +0000385<p>Now that you see the definition of this function, it is more obvious why we
386can assume the state of CurTok in the various functions. This uses look-ahead
387to determine which sort of expression is being inspected, and then parses it
388with a function call.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000389
Chris Lattner729eb142008-02-10 19:11:04 +0000390<p>Now that basic expressions are handled, we need to handle binary expressions.
391They are a bit more complex.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000392
393</div>
394
395<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000396<h2><a name="parserbinops">Binary Expression Parsing</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +0000397<!-- *********************************************************************** -->
398
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000399<div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000400
401<p>Binary expressions are significantly harder to parse because they are often
402ambiguous. For example, when given the string "x+y*z", the parser can choose
403to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from
404mathematics, we expect the later parse, because "*" (multiplication) has
405higher <em>precedence</em> than "+" (addition).</p>
406
407<p>There are many ways to handle this, but an elegant and efficient way is to
408use <a href=
409"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
410Parsing</a>. This parsing technique uses the precedence of binary operators to
411guide recursion. To start with, we need a table of precedences:</p>
412
413<div class="doc_code">
414<pre>
415/// BinopPrecedence - This holds the precedence for each binary operator that is
416/// defined.
417static std::map&lt;char, int&gt; BinopPrecedence;
418
419/// GetTokPrecedence - Get the precedence of the pending binary operator token.
420static int GetTokPrecedence() {
421 if (!isascii(CurTok))
422 return -1;
423
424 // Make sure it's a declared binop.
425 int TokPrec = BinopPrecedence[CurTok];
426 if (TokPrec &lt;= 0) return -1;
427 return TokPrec;
428}
429
430int main() {
431 // Install standard binary operators.
432 // 1 is lowest precedence.
433 BinopPrecedence['&lt;'] = 10;
434 BinopPrecedence['+'] = 20;
435 BinopPrecedence['-'] = 20;
436 BinopPrecedence['*'] = 40; // highest.
437 ...
438}
439</pre>
440</div>
441
442<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000443(this can obviously be extended by you, our brave and intrepid reader). The
Chris Lattnere6c91042007-10-22 06:34:15 +0000444<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
445or -1 if the token is not a binary operator. Having a map makes it easy to add
446new operators and makes it clear that the algorithm doesn't depend on the
447specific operators involved, but it would be easy enough to eliminate the map
Chris Lattner729eb142008-02-10 19:11:04 +0000448and do the comparisons in the <tt>GetTokPrecedence</tt> function. (Or just use
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000449a fixed-size array).</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000450
451<p>With the helper above defined, we can now start parsing binary expressions.
452The basic idea of operator precedence parsing is to break down an expression
Chris Lattner729eb142008-02-10 19:11:04 +0000453with potentially ambiguous binary operators into pieces. Consider ,for example,
Chris Lattnere6c91042007-10-22 06:34:15 +0000454the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this
455as a stream of primary expressions separated by binary operators. As such,
456it will first parse the leading primary expression "a", then it will see the
457pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses
Duncan Sands72261ff2007-11-05 16:04:58 +0000458are primary expressions, the binary expression parser doesn't need to worry
Chris Lattnere6c91042007-10-22 06:34:15 +0000459about nested subexpressions like (c+d) at all.
460</p>
461
462<p>
463To start, an expression is a primary expression potentially followed by a
464sequence of [binop,primaryexpr] pairs:</p>
465
466<div class="doc_code">
467<pre>
468/// expression
469/// ::= primary binoprhs
470///
471static ExprAST *ParseExpression() {
472 ExprAST *LHS = ParsePrimary();
473 if (!LHS) return 0;
474
475 return ParseBinOpRHS(0, LHS);
476}
477</pre>
478</div>
479
480<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
Chris Lattnerbd779a32007-11-06 18:13:32 +0000481us. It takes a precedence and a pointer to an expression for the part that has been
482parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
Chris Lattnere6c91042007-10-22 06:34:15 +0000483allowed to be empty, in which case it returns the expression that is passed into
484it. In our example above, the code passes the expression for "a" into
485<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
486
487<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
488minimal operator precedence</em> that the function is allowed to eat. For
489example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
490passed in a precedence of 40, it will not consume any tokens (because the
491precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
492with:</p>
493
494<div class="doc_code">
495<pre>
496/// binoprhs
497/// ::= ('+' primary)*
498static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
499 // If this is a binop, find its precedence.
500 while (1) {
501 int TokPrec = GetTokPrecedence();
502
503 // If this is a binop that binds at least as tightly as the current binop,
504 // consume it, otherwise we are done.
505 if (TokPrec &lt; ExprPrec)
506 return LHS;
507</pre>
508</div>
509
510<p>This code gets the precedence of the current token and checks to see if if is
511too low. Because we defined invalid tokens to have a precedence of -1, this
512check implicitly knows that the pair-stream ends when the token stream runs out
513of binary operators. If this check succeeds, we know that the token is a binary
514operator and that it will be included in this expression:</p>
515
516<div class="doc_code">
517<pre>
518 // Okay, we know this is a binop.
519 int BinOp = CurTok;
520 getNextToken(); // eat binop
521
522 // Parse the primary expression after the binary operator.
523 ExprAST *RHS = ParsePrimary();
524 if (!RHS) return 0;
525</pre>
526</div>
527
528<p>As such, this code eats (and remembers) the binary operator and then parses
Chris Lattnerbd779a32007-11-06 18:13:32 +0000529the primary expression that follows. This builds up the whole pair, the first of
Chris Lattnere6c91042007-10-22 06:34:15 +0000530which is [+, b] for the running example.</p>
531
532<p>Now that we parsed the left-hand side of an expression and one pair of the
533RHS sequence, we have to decide which way the expression associates. In
534particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
535To determine this, we look ahead at "binop" to determine its precedence and
536compare it to BinOp's precedence (which is '+' in this case):</p>
537
538<div class="doc_code">
539<pre>
540 // If BinOp binds less tightly with RHS than the operator after RHS, let
541 // the pending operator take RHS as its LHS.
542 int NextPrec = GetTokPrecedence();
543 if (TokPrec &lt; NextPrec) {
544</pre>
545</div>
546
547<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
548precedence of our current operator, then we know that the parentheses associate
Chris Lattnerbd779a32007-11-06 18:13:32 +0000549as "(a+b) binop ...". In our example, the current operator is "+" and the next
550operator is "+", we know that they have the same precedence. In this case we'll
Chris Lattnere6c91042007-10-22 06:34:15 +0000551create the AST node for "a+b", and then continue parsing:</p>
552
553<div class="doc_code">
554<pre>
555 ... if body omitted ...
556 }
557
558 // Merge LHS/RHS.
559 LHS = new BinaryExprAST(BinOp, LHS, RHS);
560 } // loop around to the top of the while loop.
561}
562</pre>
563</div>
564
565<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
Chris Lattnerbd779a32007-11-06 18:13:32 +0000566iteration of the loop, with "+" as the current token. The code above will eat,
567remember, and parse "(c+d)" as the primary expression, which makes the
568current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with
569"*" as the binop to the right of the primary. In this case, the precedence of "*" is
Chris Lattnere6c91042007-10-22 06:34:15 +0000570higher than the precedence of "+" so the if condition will be entered.</p>
571
572<p>The critical question left here is "how can the if condition parse the right
573hand side in full"? In particular, to build the AST correctly for our example,
574it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to
575do this is surprisingly simple (code from the above two blocks duplicated for
576context):</p>
577
578<div class="doc_code">
579<pre>
580 // If BinOp binds less tightly with RHS than the operator after RHS, let
581 // the pending operator take RHS as its LHS.
582 int NextPrec = GetTokPrecedence();
583 if (TokPrec &lt; NextPrec) {
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000584 <b>RHS = ParseBinOpRHS(TokPrec+1, RHS);
585 if (RHS == 0) return 0;</b>
Chris Lattnere6c91042007-10-22 06:34:15 +0000586 }
587 // Merge LHS/RHS.
588 LHS = new BinaryExprAST(BinOp, LHS, RHS);
589 } // loop around to the top of the while loop.
590}
591</pre>
592</div>
593
594<p>At this point, we know that the binary operator to the RHS of our primary
595has higher precedence than the binop we are currently parsing. As such, we know
596that any sequence of pairs whose operators are all higher precedence than "+"
597should be parsed together and returned as "RHS". To do this, we recursively
598invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
599precedence required for it to continue. In our example above, this will cause
600it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the RHS
601of the '+' expression.</p>
602
Chris Lattnerbd779a32007-11-06 18:13:32 +0000603<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
Chris Lattnere6c91042007-10-22 06:34:15 +0000604and added to the AST. With this little bit of code (14 non-trivial lines), we
605correctly handle fully general binary expression parsing in a very elegant way.
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000606This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
607running through it with a few tough examples to see how it works.
Chris Lattnere6c91042007-10-22 06:34:15 +0000608</p>
609
610<p>This wraps up handling of expressions. At this point, we can point the
Chris Lattnerbd779a32007-11-06 18:13:32 +0000611parser at an arbitrary token stream and build an expression from it, stopping
Chris Lattnere6c91042007-10-22 06:34:15 +0000612at the first token that is not part of the expression. Next up we need to
Chris Lattnerbd779a32007-11-06 18:13:32 +0000613handle function definitions, etc.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000614
615</div>
616
617<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000618<h2><a name="parsertop">Parsing the Rest</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +0000619<!-- *********************************************************************** -->
620
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000621<div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000622
623<p>
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000624The next thing missing is handling of function prototypes. In Kaleidoscope,
Chris Lattnere6c91042007-10-22 06:34:15 +0000625these are used both for 'extern' function declarations as well as function body
626definitions. The code to do this is straight-forward and not very interesting
627(once you've survived expressions):
628</p>
629
630<div class="doc_code">
631<pre>
632/// prototype
633/// ::= id '(' id* ')'
634static PrototypeAST *ParsePrototype() {
635 if (CurTok != tok_identifier)
636 return ErrorP("Expected function name in prototype");
637
638 std::string FnName = IdentifierStr;
639 getNextToken();
640
641 if (CurTok != '(')
642 return ErrorP("Expected '(' in prototype");
643
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000644 // Read the list of argument names.
Chris Lattnere6c91042007-10-22 06:34:15 +0000645 std::vector&lt;std::string&gt; ArgNames;
646 while (getNextToken() == tok_identifier)
647 ArgNames.push_back(IdentifierStr);
648 if (CurTok != ')')
649 return ErrorP("Expected ')' in prototype");
650
651 // success.
652 getNextToken(); // eat ')'.
653
654 return new PrototypeAST(FnName, ArgNames);
655}
656</pre>
657</div>
658
659<p>Given this, a function definition is very simple, just a prototype plus
Duncan Sands72261ff2007-11-05 16:04:58 +0000660an expression to implement the body:</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000661
662<div class="doc_code">
663<pre>
664/// definition ::= 'def' prototype expression
665static FunctionAST *ParseDefinition() {
666 getNextToken(); // eat def.
667 PrototypeAST *Proto = ParsePrototype();
668 if (Proto == 0) return 0;
669
670 if (ExprAST *E = ParseExpression())
671 return new FunctionAST(Proto, E);
672 return 0;
673}
674</pre>
675</div>
676
677<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
Chris Lattnerbd779a32007-11-06 18:13:32 +0000678well as to support forward declaration of user functions. These 'extern's are just
Chris Lattnere6c91042007-10-22 06:34:15 +0000679prototypes with no body:</p>
680
681<div class="doc_code">
682<pre>
683/// external ::= 'extern' prototype
684static PrototypeAST *ParseExtern() {
685 getNextToken(); // eat extern.
686 return ParsePrototype();
687}
688</pre>
689</div>
690
691<p>Finally, we'll also let the user type in arbitrary top-level expressions and
692evaluate them on the fly. We will handle this by defining anonymous nullary
693(zero argument) functions for them:</p>
694
695<div class="doc_code">
696<pre>
697/// toplevelexpr ::= expression
698static FunctionAST *ParseTopLevelExpr() {
699 if (ExprAST *E = ParseExpression()) {
700 // Make an anonymous proto.
701 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
702 return new FunctionAST(Proto, E);
703 }
704 return 0;
705}
706</pre>
707</div>
708
Chris Lattner729eb142008-02-10 19:11:04 +0000709<p>Now that we have all the pieces, let's build a little driver that will let us
Chris Lattnere6c91042007-10-22 06:34:15 +0000710actually <em>execute</em> this code we've built!</p>
711
712</div>
713
714<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000715<h2><a name="driver">The Driver</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +0000716<!-- *********************************************************************** -->
717
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000718<div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000719
720<p>The driver for this simply invokes all of the parsing pieces with a top-level
721dispatch loop. There isn't much interesting here, so I'll just include the
722top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
723Parsing" section.</p>
724
725<div class="doc_code">
726<pre>
727/// top ::= definition | external | expression | ';'
728static void MainLoop() {
729 while (1) {
730 fprintf(stderr, "ready&gt; ");
731 switch (CurTok) {
732 case tok_eof: return;
Chris Lattner729eb142008-02-10 19:11:04 +0000733 case ';': getNextToken(); break; // ignore top-level semicolons.
Chris Lattnere6c91042007-10-22 06:34:15 +0000734 case tok_def: HandleDefinition(); break;
735 case tok_extern: HandleExtern(); break;
736 default: HandleTopLevelExpression(); break;
737 }
738 }
739}
740</pre>
741</div>
742
Chris Lattner729eb142008-02-10 19:11:04 +0000743<p>The most interesting part of this is that we ignore top-level semicolons.
Owen Andersonc6311b92007-10-22 06:48:28 +0000744Why is this, you ask? The basic reason is that if you type "4 + 5" at the
Chris Lattnerbd779a32007-11-06 18:13:32 +0000745command line, the parser doesn't know whether that is the end of what you will type
746or not. For example, on the next line you could type "def foo..." in which case
Chris Lattnere6c91042007-10-22 06:34:15 +00007474+5 is the end of a top-level expression. Alternatively you could type "* 6",
748which would continue the expression. Having top-level semicolons allows you to
Chris Lattner729eb142008-02-10 19:11:04 +0000749type "4+5;", and the parser will know you are done.</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000750
751</div>
752
753<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000754<h2><a name="conclusions">Conclusions</a></h2>
Chris Lattnere6c91042007-10-22 06:34:15 +0000755<!-- *********************************************************************** -->
756
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000757<div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000758
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000759<p>With just under 400 lines of commented code (240 lines of non-comment,
760non-blank code), we fully defined our minimal language, including a lexer,
Chris Lattner729eb142008-02-10 19:11:04 +0000761parser, and AST builder. With this done, the executable will validate
762Kaleidoscope code and tell us if it is grammatically invalid. For
Chris Lattnere6c91042007-10-22 06:34:15 +0000763example, here is a sample interaction:</p>
764
765<div class="doc_code">
766<pre>
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000767$ <b>./a.out</b>
768ready&gt; <b>def foo(x y) x+foo(y, 4.0);</b>
769Parsed a function definition.
770ready&gt; <b>def foo(x y) x+y y;</b>
771Parsed a function definition.
772Parsed a top-level expr
773ready&gt; <b>def foo(x y) x+y );</b>
774Parsed a function definition.
775Error: unknown token when expecting an expression
776ready&gt; <b>extern sin(a);</b>
Chris Lattner35abbf52007-10-23 06:23:57 +0000777ready&gt; Parsed an extern
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000778ready&gt; <b>^D</b>
Chris Lattnere6c91042007-10-22 06:34:15 +0000779$
780</pre>
781</div>
782
Chris Lattnerd93a5842007-10-23 05:43:01 +0000783<p>There is a lot of room for extension here. You can define new AST nodes,
784extend the language in many ways, etc. In the <a href="LangImpl3.html">next
Chris Lattnerbd779a32007-11-06 18:13:32 +0000785installment</a>, we will describe how to generate LLVM Intermediate
786Representation (IR) from the AST.</p>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000787
788</div>
789
790<!-- *********************************************************************** -->
NAKAMURA Takumi05d02652011-04-18 23:59:50 +0000791<h2><a name="code">Full Code Listing</a></h2>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000792<!-- *********************************************************************** -->
793
NAKAMURA Takumif5af6ad2011-04-23 00:30:22 +0000794<div>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000795
Chris Lattnere6c91042007-10-22 06:34:15 +0000796<p>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000797Here is the complete code listing for this and the previous chapter.
798Note that it is fully self-contained: you don't need LLVM or any external
Chris Lattner729eb142008-02-10 19:11:04 +0000799libraries at all for this. (Besides the C and C++ standard libraries, of
800course.) To build this, just compile with:</p>
Chris Lattnere6c91042007-10-22 06:34:15 +0000801
802<div class="doc_code">
803<pre>
Chris Lattnerd93a5842007-10-23 05:43:01 +0000804 # Compile
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000805 g++ -g -O3 toy.cpp
Chris Lattnerd93a5842007-10-23 05:43:01 +0000806 # Run
807 ./a.out
808</pre>
809</div>
Chris Lattnere6c91042007-10-22 06:34:15 +0000810
Chris Lattnerd93a5842007-10-23 05:43:01 +0000811<p>Here is the code:</p>
812
813<div class="doc_code">
814<pre>
Chris Lattnere6c91042007-10-22 06:34:15 +0000815#include &lt;cstdio&gt;
John McCall46c81812009-08-17 21:07:37 +0000816#include &lt;cstdlib&gt;
Chris Lattnere6c91042007-10-22 06:34:15 +0000817#include &lt;string&gt;
Chris Lattnerd93a5842007-10-23 05:43:01 +0000818#include &lt;map&gt;
Chris Lattnere6c91042007-10-22 06:34:15 +0000819#include &lt;vector&gt;
820
821//===----------------------------------------------------------------------===//
822// Lexer
823//===----------------------------------------------------------------------===//
824
825// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
826// of these for known things.
827enum Token {
828 tok_eof = -1,
829
830 // commands
831 tok_def = -2, tok_extern = -3,
832
833 // primary
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000834 tok_identifier = -4, tok_number = -5
Chris Lattnere6c91042007-10-22 06:34:15 +0000835};
836
837static std::string IdentifierStr; // Filled in if tok_identifier
838static double NumVal; // Filled in if tok_number
839
840/// gettok - Return the next token from standard input.
841static int gettok() {
842 static int LastChar = ' ';
843
844 // Skip any whitespace.
845 while (isspace(LastChar))
846 LastChar = getchar();
847
848 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
849 IdentifierStr = LastChar;
850 while (isalnum((LastChar = getchar())))
851 IdentifierStr += LastChar;
852
853 if (IdentifierStr == "def") return tok_def;
854 if (IdentifierStr == "extern") return tok_extern;
855 return tok_identifier;
856 }
857
858 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
859 std::string NumStr;
860 do {
861 NumStr += LastChar;
862 LastChar = getchar();
863 } while (isdigit(LastChar) || LastChar == '.');
864
865 NumVal = strtod(NumStr.c_str(), 0);
866 return tok_number;
867 }
868
869 if (LastChar == '#') {
870 // Comment until end of line.
871 do LastChar = getchar();
Chris Lattnerc80c23f2007-12-02 22:46:01 +0000872 while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
Chris Lattnere6c91042007-10-22 06:34:15 +0000873
874 if (LastChar != EOF)
875 return gettok();
876 }
877
878 // Check for end of file. Don't eat the EOF.
879 if (LastChar == EOF)
880 return tok_eof;
881
882 // Otherwise, just return the character as its ascii value.
883 int ThisChar = LastChar;
884 LastChar = getchar();
885 return ThisChar;
886}
887
888//===----------------------------------------------------------------------===//
889// Abstract Syntax Tree (aka Parse Tree)
890//===----------------------------------------------------------------------===//
891
892/// ExprAST - Base class for all expression nodes.
893class ExprAST {
894public:
895 virtual ~ExprAST() {}
896};
897
898/// NumberExprAST - Expression class for numeric literals like "1.0".
899class NumberExprAST : public ExprAST {
900 double Val;
901public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000902 NumberExprAST(double val) : Val(val) {}
Chris Lattnere6c91042007-10-22 06:34:15 +0000903};
904
905/// VariableExprAST - Expression class for referencing a variable, like "a".
906class VariableExprAST : public ExprAST {
907 std::string Name;
908public:
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +0000909 VariableExprAST(const std::string &amp;name) : Name(name) {}
Chris Lattnere6c91042007-10-22 06:34:15 +0000910};
911
912/// BinaryExprAST - Expression class for a binary operator.
913class BinaryExprAST : public ExprAST {
914 char Op;
915 ExprAST *LHS, *RHS;
916public:
917 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
918 : Op(op), LHS(lhs), RHS(rhs) {}
919};
920
921/// CallExprAST - Expression class for function calls.
922class CallExprAST : public ExprAST {
923 std::string Callee;
924 std::vector&lt;ExprAST*&gt; Args;
925public:
926 CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
927 : Callee(callee), Args(args) {}
928};
929
930/// PrototypeAST - This class represents the "prototype" for a function,
Chris Lattnercde1d9d2007-11-06 07:16:22 +0000931/// which captures its name, and its argument names (thus implicitly the number
932/// of arguments the function takes).
Chris Lattnere6c91042007-10-22 06:34:15 +0000933class PrototypeAST {
934 std::string Name;
Chris Lattner5ea8ef82008-02-22 17:09:39 +0000935 std::vector&lt;std::string&gt; Args;
Chris Lattnere6c91042007-10-22 06:34:15 +0000936public:
937 PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
938 : Name(name), Args(args) {}
939
940};
941
942/// FunctionAST - This class represents a function definition itself.
943class FunctionAST {
944 PrototypeAST *Proto;
945 ExprAST *Body;
946public:
947 FunctionAST(PrototypeAST *proto, ExprAST *body)
948 : Proto(proto), Body(body) {}
949
950};
951
952//===----------------------------------------------------------------------===//
953// Parser
954//===----------------------------------------------------------------------===//
955
956/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
Chris Lattner729eb142008-02-10 19:11:04 +0000957/// token the parser is looking at. getNextToken reads another token from the
Chris Lattnere6c91042007-10-22 06:34:15 +0000958/// lexer and updates CurTok with its results.
959static int CurTok;
960static int getNextToken() {
961 return CurTok = gettok();
962}
963
964/// BinopPrecedence - This holds the precedence for each binary operator that is
965/// defined.
966static std::map&lt;char, int&gt; BinopPrecedence;
967
968/// GetTokPrecedence - Get the precedence of the pending binary operator token.
969static int GetTokPrecedence() {
970 if (!isascii(CurTok))
971 return -1;
972
973 // Make sure it's a declared binop.
974 int TokPrec = BinopPrecedence[CurTok];
975 if (TokPrec &lt;= 0) return -1;
976 return TokPrec;
977}
978
979/// Error* - These are little helper functions for error handling.
980ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
981PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
982FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
983
984static ExprAST *ParseExpression();
985
986/// identifierexpr
Chris Lattner20a0c802007-11-05 17:54:34 +0000987/// ::= identifier
988/// ::= identifier '(' expression* ')'
Chris Lattnere6c91042007-10-22 06:34:15 +0000989static ExprAST *ParseIdentifierExpr() {
990 std::string IdName = IdentifierStr;
991
Chris Lattner20a0c802007-11-05 17:54:34 +0000992 getNextToken(); // eat identifier.
Chris Lattnere6c91042007-10-22 06:34:15 +0000993
994 if (CurTok != '(') // Simple variable ref.
995 return new VariableExprAST(IdName);
996
997 // Call.
998 getNextToken(); // eat (
999 std::vector&lt;ExprAST*&gt; Args;
Chris Lattner71155212007-11-06 01:39:12 +00001000 if (CurTok != ')') {
1001 while (1) {
1002 ExprAST *Arg = ParseExpression();
1003 if (!Arg) return 0;
1004 Args.push_back(Arg);
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001005
Chris Lattner71155212007-11-06 01:39:12 +00001006 if (CurTok == ')') break;
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001007
Chris Lattner71155212007-11-06 01:39:12 +00001008 if (CurTok != ',')
Chris Lattner6c4be9c2008-04-14 16:44:41 +00001009 return Error("Expected ')' or ',' in argument list");
Chris Lattner71155212007-11-06 01:39:12 +00001010 getNextToken();
1011 }
Chris Lattnere6c91042007-10-22 06:34:15 +00001012 }
1013
1014 // Eat the ')'.
1015 getNextToken();
1016
1017 return new CallExprAST(IdName, Args);
1018}
1019
1020/// numberexpr ::= number
1021static ExprAST *ParseNumberExpr() {
1022 ExprAST *Result = new NumberExprAST(NumVal);
1023 getNextToken(); // consume the number
1024 return Result;
1025}
1026
1027/// parenexpr ::= '(' expression ')'
1028static ExprAST *ParseParenExpr() {
1029 getNextToken(); // eat (.
1030 ExprAST *V = ParseExpression();
1031 if (!V) return 0;
1032
1033 if (CurTok != ')')
1034 return Error("expected ')'");
1035 getNextToken(); // eat ).
1036 return V;
1037}
1038
1039/// primary
1040/// ::= identifierexpr
1041/// ::= numberexpr
1042/// ::= parenexpr
1043static ExprAST *ParsePrimary() {
1044 switch (CurTok) {
1045 default: return Error("unknown token when expecting an expression");
1046 case tok_identifier: return ParseIdentifierExpr();
1047 case tok_number: return ParseNumberExpr();
1048 case '(': return ParseParenExpr();
1049 }
1050}
1051
1052/// binoprhs
1053/// ::= ('+' primary)*
1054static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1055 // If this is a binop, find its precedence.
1056 while (1) {
1057 int TokPrec = GetTokPrecedence();
1058
1059 // If this is a binop that binds at least as tightly as the current binop,
1060 // consume it, otherwise we are done.
1061 if (TokPrec &lt; ExprPrec)
1062 return LHS;
1063
1064 // Okay, we know this is a binop.
1065 int BinOp = CurTok;
1066 getNextToken(); // eat binop
1067
1068 // Parse the primary expression after the binary operator.
1069 ExprAST *RHS = ParsePrimary();
1070 if (!RHS) return 0;
1071
1072 // If BinOp binds less tightly with RHS than the operator after RHS, let
1073 // the pending operator take RHS as its LHS.
1074 int NextPrec = GetTokPrecedence();
1075 if (TokPrec &lt; NextPrec) {
1076 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1077 if (RHS == 0) return 0;
1078 }
1079
1080 // Merge LHS/RHS.
1081 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1082 }
1083}
1084
1085/// expression
1086/// ::= primary binoprhs
1087///
1088static ExprAST *ParseExpression() {
1089 ExprAST *LHS = ParsePrimary();
1090 if (!LHS) return 0;
1091
1092 return ParseBinOpRHS(0, LHS);
1093}
1094
1095/// prototype
1096/// ::= id '(' id* ')'
1097static PrototypeAST *ParsePrototype() {
1098 if (CurTok != tok_identifier)
1099 return ErrorP("Expected function name in prototype");
1100
1101 std::string FnName = IdentifierStr;
1102 getNextToken();
1103
1104 if (CurTok != '(')
1105 return ErrorP("Expected '(' in prototype");
1106
1107 std::vector&lt;std::string&gt; ArgNames;
1108 while (getNextToken() == tok_identifier)
1109 ArgNames.push_back(IdentifierStr);
1110 if (CurTok != ')')
1111 return ErrorP("Expected ')' in prototype");
1112
1113 // success.
1114 getNextToken(); // eat ')'.
1115
1116 return new PrototypeAST(FnName, ArgNames);
1117}
1118
1119/// definition ::= 'def' prototype expression
1120static FunctionAST *ParseDefinition() {
1121 getNextToken(); // eat def.
1122 PrototypeAST *Proto = ParsePrototype();
1123 if (Proto == 0) return 0;
1124
1125 if (ExprAST *E = ParseExpression())
1126 return new FunctionAST(Proto, E);
1127 return 0;
1128}
1129
1130/// toplevelexpr ::= expression
1131static FunctionAST *ParseTopLevelExpr() {
1132 if (ExprAST *E = ParseExpression()) {
1133 // Make an anonymous proto.
Chris Lattner5ea8ef82008-02-22 17:09:39 +00001134 PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
Chris Lattnere6c91042007-10-22 06:34:15 +00001135 return new FunctionAST(Proto, E);
1136 }
1137 return 0;
1138}
1139
1140/// external ::= 'extern' prototype
1141static PrototypeAST *ParseExtern() {
1142 getNextToken(); // eat extern.
1143 return ParsePrototype();
1144}
1145
1146//===----------------------------------------------------------------------===//
1147// Top-Level parsing
1148//===----------------------------------------------------------------------===//
1149
1150static void HandleDefinition() {
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001151 if (ParseDefinition()) {
Chris Lattnere6c91042007-10-22 06:34:15 +00001152 fprintf(stderr, "Parsed a function definition.\n");
1153 } else {
1154 // Skip token for error recovery.
1155 getNextToken();
1156 }
1157}
1158
1159static void HandleExtern() {
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001160 if (ParseExtern()) {
Chris Lattnere6c91042007-10-22 06:34:15 +00001161 fprintf(stderr, "Parsed an extern\n");
1162 } else {
1163 // Skip token for error recovery.
1164 getNextToken();
1165 }
1166}
1167
1168static void HandleTopLevelExpression() {
Chris Lattner729eb142008-02-10 19:11:04 +00001169 // Evaluate a top-level expression into an anonymous function.
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001170 if (ParseTopLevelExpr()) {
Chris Lattnere6c91042007-10-22 06:34:15 +00001171 fprintf(stderr, "Parsed a top-level expr\n");
1172 } else {
1173 // Skip token for error recovery.
1174 getNextToken();
1175 }
1176}
1177
1178/// top ::= definition | external | expression | ';'
1179static void MainLoop() {
1180 while (1) {
1181 fprintf(stderr, "ready&gt; ");
1182 switch (CurTok) {
1183 case tok_eof: return;
Chris Lattner729eb142008-02-10 19:11:04 +00001184 case ';': getNextToken(); break; // ignore top-level semicolons.
Chris Lattnere6c91042007-10-22 06:34:15 +00001185 case tok_def: HandleDefinition(); break;
1186 case tok_extern: HandleExtern(); break;
1187 default: HandleTopLevelExpression(); break;
1188 }
1189 }
1190}
1191
1192//===----------------------------------------------------------------------===//
1193// Main driver code.
1194//===----------------------------------------------------------------------===//
1195
1196int main() {
1197 // Install standard binary operators.
1198 // 1 is lowest precedence.
1199 BinopPrecedence['&lt;'] = 10;
1200 BinopPrecedence['+'] = 20;
1201 BinopPrecedence['-'] = 20;
1202 BinopPrecedence['*'] = 40; // highest.
1203
1204 // Prime the first token.
1205 fprintf(stderr, "ready&gt; ");
1206 getNextToken();
1207
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001208 // Run the main "interpreter loop" now.
Chris Lattnere6c91042007-10-22 06:34:15 +00001209 MainLoop();
Erick Tryzelaarfd1ec5e2009-09-22 21:14:49 +00001210
Chris Lattnere6c91042007-10-22 06:34:15 +00001211 return 0;
1212}
1213</pre>
1214</div>
Chris Lattner729eb142008-02-10 19:11:04 +00001215<a href="LangImpl3.html">Next: Implementing Code Generation to LLVM IR</a>
Chris Lattnere6c91042007-10-22 06:34:15 +00001216</div>
1217
1218<!-- *********************************************************************** -->
1219<hr>
1220<address>
1221 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1222 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1223 <a href="http://validator.w3.org/check/referer"><img
Chris Lattner8eef4b22007-10-23 06:30:50 +00001224 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattnere6c91042007-10-22 06:34:15 +00001225
1226 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
NAKAMURA Takumib9a33632011-04-09 02:13:37 +00001227 <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
Dan Gohman523e3922010-02-03 17:27:31 +00001228 Last modified: $Date$
Chris Lattnere6c91042007-10-22 06:34:15 +00001229</address>
1230</body>
1231</html>