Blame - docs/tutorial/LangImpl2.html - fp2-dev/platform/external/llvm

blob: df54f5b84f9fcbb526e25de786738f8ce7f2e0cb [file] [log] [blame]

Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Implementing a Parser and AST</title>
				7	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				8	<meta name="author" content="Chris Lattner">
				9	<link rel="stylesheet" href="../llvm.css" type="text/css">
				10	</head>
				11
				12	<body>
				13
				14	<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
				15
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	16	<ul>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	17	<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	18	<li>Chapter 2
				19	<ol>
				20	<li><a href="#intro">Chapter 2 Introduction</a></li>
				21	<li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
				22	<li><a href="#parserbasics">Parser Basics</a></li>
				23	<li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
				24	<li><a href="#parserbinops">Binary Expression Parsing</a></li>
				25	<li><a href="#parsertop">Parsing the Rest</a></li>
				26	<li><a href="#driver">The Driver</a></li>
				27	<li><a href="#conclusions">Conclusions</a></li>
				28	<li><a href="#code">Full Code Listing</a></li>
				29	</ol>
				30	</li>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	31	<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	32	</ul>
				33
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	34	<div class="doc_author">
				35	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				36	</div>
				37
				38	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	39	<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	40	<!-- *********************************************************************** -->
				41
				42	<div class="doc_text">
				43
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	44	<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
				45	with LLVM</a>" tutorial. This chapter shows you how to use the <a
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	46	href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
				47	href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	48	our Kaleidoscope language. Once we have a parser, we'll define and build an <a
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	49	href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
				50	Tree</a> (AST).</p>
				51
				52	<p>The parser we will build uses a combination of <a
				53	href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
				54	Parsing</a> and <a href=
				55	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	56	Parsing</a> to parse the Kaleidoscope language (the latter for binary expression
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	57	and the former for everything else). Before we get to parsing though, lets talk
				58	about the output of the parser: the Abstract Syntax Tree.</p>
				59
				60	</div>
				61
				62	<!-- *********************************************************************** -->
				63	<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
				64	<!-- *********************************************************************** -->
				65
				66	<div class="doc_text">
				67
				68	<p>The AST for a program captures its behavior in a way that it is easy for
				69	later stages of the compiler (e.g. code generation) to interpret. We basically
				70	want one object for each construct in the language, and the AST should closely
				71	model the language. In Kaleidoscope, we have expressions, a prototype, and a
				72	function object. We'll start with expressions first:</p>
				73
				74	<div class="doc_code">
				75	<pre>
				76	/// ExprAST - Base class for all expression nodes.
				77	class ExprAST {
				78	public:
				79	virtual ~ExprAST() {}
				80	};
				81
				82	/// NumberExprAST - Expression class for numeric literals like "1.0".
				83	class NumberExprAST : public ExprAST {
				84	double Val;
				85	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	86	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	87	};
				88	</pre>
				89	</div>
				90
				91	<p>The code above shows the definition of the base ExprAST class and one
				92	subclass which we use for numeric literals. The important thing about this is
				93	that the NumberExprAST class captures the numeric value of the literal in the
				94	class, so that later phases of the compiler can know what it is.</p>
				95
				96	<p>Right now we only create the AST, so there are no useful accessor methods on
				97	them. It would be very easy to add a virtual method to pretty print the code,
				98	for example. Here are the other expression AST node definitions that we'll use
				99	in the basic form of the Kaleidoscope language.
				100	</p>
				101
				102	<div class="doc_code">
				103	<pre>
				104	/// VariableExprAST - Expression class for referencing a variable, like "a".
				105	class VariableExprAST : public ExprAST {
				106	std::string Name;
				107	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	108	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	109	};
				110
				111	/// BinaryExprAST - Expression class for a binary operator.
				112	class BinaryExprAST : public ExprAST {
				113	char Op;
				114	ExprAST LHS, RHS;
				115	public:
				116	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				117	: Op(op), LHS(lhs), RHS(rhs) {}
				118	};
				119
				120	/// CallExprAST - Expression class for function calls.
				121	class CallExprAST : public ExprAST {
				122	std::string Callee;
				123	std::vector<ExprAST*> Args;
				124	public:
				125	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				126	: Callee(callee), Args(args) {}
				127	};
				128	</pre>
				129	</div>
				130
				131	<p>This is all (intentially) rather straight-forward: variables capture the
				132	variable name, binary operators capture their opcode (e.g. '+'), and calls
				133	capture a function name and list of argument expressions. One thing that is
				134	nice about our AST is that it captures the language features without talking
				135	about the syntax of the language. Note that there is no discussion about
				136	precedence of binary operators, lexical structure etc.</p>
				137
				138	<p>For our basic language, these are all of the expression nodes we'll define.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	139	Because it doesn't have conditional control flow, it isn't Turing-complete;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	140	we'll fix that in a later installment. The two things we need next are a way
				141	to talk about the interface to a function, and a way to talk about functions
				142	themselves:</p>
				143
				144	<div class="doc_code">
				145	<pre>
				146	/// PrototypeAST - This class represents the "prototype" for a function,
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	147	/// which captures its name, and its argument names (thus implicitly the number
				148	/// of arguments the function takes).
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	149	class PrototypeAST {
				150	std::string Name;
				151	std::vector<std::string> Args;
				152	public:
				153	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				154	: Name(name), Args(args) {}
				155	};
				156
				157	/// FunctionAST - This class represents a function definition itself.
				158	class FunctionAST {
				159	PrototypeAST *Proto;
				160	ExprAST *Body;
				161	public:
				162	FunctionAST(PrototypeAST proto, ExprAST body)
				163	: Proto(proto), Body(body) {}
				164	};
				165	</pre>
				166	</div>
				167
				168	<p>In Kaleidoscope, functions are typed with just a count of their arguments.
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	169	Since all values are double precision floating point, the type of each argument
				170	doesn't need to be stored anywhere. In a more aggressive and realistic
				171	language, the "ExprAST" class would probably have a type field.</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	172
				173	<p>With this scaffolding, we can now talk about parsing expressions and function
				174	bodies in Kaleidoscope.</p>
				175
				176	</div>
				177
				178	<!-- *********************************************************************** -->
				179	<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
				180	<!-- *********************************************************************** -->
				181
				182	<div class="doc_text">
				183
				184	<p>Now that we have an AST to build, we need to define the parser code to build
				185	it. The idea here is that we want to parse something like "x+y" (which is
				186	returned as three tokens by the lexer) into an AST that could be generated with
				187	calls like this:</p>
				188
				189	<div class="doc_code">
				190	<pre>
				191	ExprAST *X = new VariableExprAST("x");
				192	ExprAST *Y = new VariableExprAST("y");
				193	ExprAST *Result = new BinaryExprAST('+', X, Y);
				194	</pre>
				195	</div>
				196
				197	<p>In order to do this, we'll start by defining some basic helper routines:</p>
				198
				199	<div class="doc_code">
				200	<pre>
				201	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				202	/// token the parser it looking at. getNextToken reads another token from the
				203	/// lexer and updates CurTok with its results.
				204	static int CurTok;
				205	static int getNextToken() {
				206	return CurTok = gettok();
				207	}
				208	</pre>
				209	</div>
				210
				211	<p>
				212	This implements a simple token buffer around the lexer. This allows
				213	us to look one token ahead at what the lexer is returning. Every function in
Chris Lattner	e949512	2007-10-25 18:05:29 +0000	[diff] [blame]	214	our parser will assume that CurTok is the current token that needs to be
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	215	parsed.</p>
				216
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	217	<div class="doc_code">
				218	<pre>
				219
				220	/// Error* - These are little helper functions for error handling.
				221	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				222	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				223	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				224	</pre>
				225	</div>
				226
				227	<p>
				228	The <tt>Error</tt> routines are simple helper routines that our parser will use
				229	to handle errors. The error recovery in our parser will not be the best and
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	230	is not particular user-friendly, but it will be enough for our tutorial. These
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	231	routines make it easier to handle errors in routines that have various return
				232	types: they always return null.</p>
				233
				234	<p>With these basic helper functions implemented, we can implement the first
				235	piece of our grammar: we'll start with numeric literals.</p>
				236
				237	</div>
				238
				239	<!-- *********************************************************************** -->
				240	<div class="doc_section"><a name="parserprimexprs">Basic Expression
				241	Parsing</a></div>
				242	<!-- *********************************************************************** -->
				243
				244	<div class="doc_text">
				245
				246	<p>We start with numeric literals, because they are the simplest to process.
				247	For each production in our grammar, we'll define a function which parses that
				248	production. For numeric literals, we have:
				249	</p>
				250
				251	<div class="doc_code">
				252	<pre>
				253	/// numberexpr ::= number
				254	static ExprAST *ParseNumberExpr() {
				255	ExprAST *Result = new NumberExprAST(NumVal);
				256	getNextToken(); // consume the number
				257	return Result;
				258	}
				259	</pre>
				260	</div>
				261
				262	<p>This routine is very simple: it expects to be called when the current token
				263	is a <tt>tok_number</tt> token. It takes the current number value, creates
				264	a <tt>NumberExprAST</tt> node, advances the lexer to the next token, then
				265	returns.</p>
				266
				267	<p>There are some interesting aspects of this. The most important one is that
				268	this routine eats all of the tokens that correspond to the production, and
				269	returns the lexer buffer with the next token (which is not part of the grammar
				270	production) ready to go. This is a fairly standard way to go for recursive
				271	descent parsers. For a better example, the parenthesis operator is defined like
				272	this:</p>
				273
				274	<div class="doc_code">
				275	<pre>
				276	/// parenexpr ::= '(' expression ')'
				277	static ExprAST *ParseParenExpr() {
				278	getNextToken(); // eat (.
				279	ExprAST *V = ParseExpression();
				280	if (!V) return 0;
				281
				282	if (CurTok != ')')
				283	return Error("expected ')'");
				284	getNextToken(); // eat ).
				285	return V;
				286	}
				287	</pre>
				288	</div>
				289
				290	<p>This function illustrates a number of interesting things about the parser:
				291	1) it shows how we use the Error routines. When called, this function expects
				292	that the current token is a '(' token, but after parsing the subexpression, it
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	293	is possible that there is no ')' waiting. For example, if the user types in
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	294	"(4 x" instead of "(4)", the parser should emit an error. Because errors can
				295	occur, the parser needs a way to indicate that they happened: in our parser, we
				296	return null on an error.</p>
				297
				298	<p>Another interesting aspect of this function is that it uses recursion by
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	299	calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	300	<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
				301	recursive grammars, and keeps each production very simple. Note that
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	302	parentheses do not cause construction of AST nodes themselves. While we could
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	303	do this, the most important role of parens are to guide the parser and provide
				304	grouping. Once the parser constructs the AST, parens are not needed.</p>
				305
				306	<p>The next simple production is for handling variable references and function
				307	calls:</p>
				308
				309	<div class="doc_code">
				310	<pre>
				311	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	312	/// ::= identifier
				313	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	314	static ExprAST *ParseIdentifierExpr() {
				315	std::string IdName = IdentifierStr;
				316
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	317	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	318
				319	if (CurTok != '(') // Simple variable ref.
				320	return new VariableExprAST(IdName);
				321
				322	// Call.
				323	getNextToken(); // eat (
				324	std::vector<ExprAST*> Args;
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	325	if (CurTok != ')') {
				326	while (1) {
				327	ExprAST *Arg = ParseExpression();
				328	if (!Arg) return 0;
				329	Args.push_back(Arg);
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	330
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	331	if (CurTok == ')') break;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	332
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	333	if (CurTok != ',')
				334	return Error("Expected ')'");
				335	getNextToken();
				336	}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	337	}
				338
				339	// Eat the ')'.
				340	getNextToken();
				341
				342	return new CallExprAST(IdName, Args);
				343	}
				344	</pre>
				345	</div>
				346
				347	<p>This routine follows the same style as the other routines (it expects to be
				348	called if the current token is a <tt>tok_identifier</tt> token). It also has
				349	recursion and error handling. One interesting aspect of this is that it uses
				350	<em>look-ahead</em> to determine if the current identifier is a stand alone
				351	variable reference or if it is a function call expression. It handles this by
				352	checking to see if the token after the identifier is a '(' token, and constructs
				353	either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
				354	</p>
				355
				356	<p>Now that we have all of our simple expression parsing logic in place, we can
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	357	define a helper function to wrap it together into one entry-point. We call this
				358	class of expressions "primary" expressions, for reasons that will become more
				359	clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to
				360	parse an arbitrary primary expression, we need to determine what sort of
				361	specific expression it is:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	362
				363	<div class="doc_code">
				364	<pre>
				365	/// primary
				366	/// ::= identifierexpr
				367	/// ::= numberexpr
				368	/// ::= parenexpr
				369	static ExprAST *ParsePrimary() {
				370	switch (CurTok) {
				371	default: return Error("unknown token when expecting an expression");
				372	case tok_identifier: return ParseIdentifierExpr();
				373	case tok_number: return ParseNumberExpr();
				374	case '(': return ParseParenExpr();
				375	}
				376	}
				377	</pre>
				378	</div>
				379
				380	<p>Now that you see the definition of this function, it makes it more obvious
				381	why we can assume the state of CurTok in the various functions. This uses
				382	look-ahead to determine which sort of expression is being inspected, and parses
				383	it with a function call.</p>
				384
				385	<p>Now that basic expressions are handled, we need to handle binary expressions,
				386	which are a bit more complex.</p>
				387
				388	</div>
				389
				390	<!-- *********************************************************************** -->
				391	<div class="doc_section"><a name="parserbinops">Binary Expression
				392	Parsing</a></div>
				393	<!-- *********************************************************************** -->
				394
				395	<div class="doc_text">
				396
				397	<p>Binary expressions are significantly harder to parse because they are often
				398	ambiguous. For example, when given the string "x+y*z", the parser can choose
				399	to parse it as either "(x+y)z" or "x+(yz)". With common definitions from
				400	mathematics, we expect the later parse, because "*" (multiplication) has
				401	higher <em>precedence</em> than "+" (addition).</p>
				402
				403	<p>There are many ways to handle this, but an elegant and efficient way is to
				404	use <a href=
				405	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				406	Parsing</a>. This parsing technique uses the precedence of binary operators to
				407	guide recursion. To start with, we need a table of precedences:</p>
				408
				409	<div class="doc_code">
				410	<pre>
				411	/// BinopPrecedence - This holds the precedence for each binary operator that is
				412	/// defined.
				413	static std::map<char, int> BinopPrecedence;
				414
				415	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				416	static int GetTokPrecedence() {
				417	if (!isascii(CurTok))
				418	return -1;
				419
				420	// Make sure it's a declared binop.
				421	int TokPrec = BinopPrecedence[CurTok];
				422	if (TokPrec <= 0) return -1;
				423	return TokPrec;
				424	}
				425
				426	int main() {
				427	// Install standard binary operators.
				428	// 1 is lowest precedence.
				429	BinopPrecedence['<'] = 10;
				430	BinopPrecedence['+'] = 20;
				431	BinopPrecedence['-'] = 20;
				432	BinopPrecedence['*'] = 40; // highest.
				433	...
				434	}
				435	</pre>
				436	</div>
				437
				438	<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	439	(this can obviously be extended by you, our brave and intrepid reader). The
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	440	<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
				441	or -1 if the token is not a binary operator. Having a map makes it easy to add
				442	new operators and makes it clear that the algorithm doesn't depend on the
				443	specific operators involved, but it would be easy enough to eliminate the map
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	444	and do the comparisons in the <tt>GetTokPrecedence</tt> function (or just use
				445	a fixed-size array).</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	446
				447	<p>With the helper above defined, we can now start parsing binary expressions.
				448	The basic idea of operator precedence parsing is to break down an expression
				449	with potentially ambiguous binary operators into pieces. Consider for example
				450	the expression "a+b+(c+d)ef+g". Operator precedence parsing considers this
				451	as a stream of primary expressions separated by binary operators. As such,
				452	it will first parse the leading primary expression "a", then it will see the
				453	pairs [+, b] [+, (c+d)] [, e] [, f] and [+, g]. Note that because parentheses
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	454	are primary expressions, the binary expression parser doesn't need to worry
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	455	about nested subexpressions like (c+d) at all.
				456	</p>
				457
				458	<p>
				459	To start, an expression is a primary expression potentially followed by a
				460	sequence of [binop,primaryexpr] pairs:</p>
				461
				462	<div class="doc_code">
				463	<pre>
				464	/// expression
				465	/// ::= primary binoprhs
				466	///
				467	static ExprAST *ParseExpression() {
				468	ExprAST *LHS = ParsePrimary();
				469	if (!LHS) return 0;
				470
				471	return ParseBinOpRHS(0, LHS);
				472	}
				473	</pre>
				474	</div>
				475
				476	<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
				477	us. It takes a precedence and a pointer to an expression for the part parsed
				478	so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
				479	allowed to be empty, in which case it returns the expression that is passed into
				480	it. In our example above, the code passes the expression for "a" into
				481	<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
				482
				483	<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
				484	minimal operator precedence</em> that the function is allowed to eat. For
				485	example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
				486	passed in a precedence of 40, it will not consume any tokens (because the
				487	precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
				488	with:</p>
				489
				490	<div class="doc_code">
				491	<pre>
				492	/// binoprhs
				493	/// ::= ('+' primary)*
				494	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				495	// If this is a binop, find its precedence.
				496	while (1) {
				497	int TokPrec = GetTokPrecedence();
				498
				499	// If this is a binop that binds at least as tightly as the current binop,
				500	// consume it, otherwise we are done.
				501	if (TokPrec < ExprPrec)
				502	return LHS;
				503	</pre>
				504	</div>
				505
				506	<p>This code gets the precedence of the current token and checks to see if if is
				507	too low. Because we defined invalid tokens to have a precedence of -1, this
				508	check implicitly knows that the pair-stream ends when the token stream runs out
				509	of binary operators. If this check succeeds, we know that the token is a binary
				510	operator and that it will be included in this expression:</p>
				511
				512	<div class="doc_code">
				513	<pre>
				514	// Okay, we know this is a binop.
				515	int BinOp = CurTok;
				516	getNextToken(); // eat binop
				517
				518	// Parse the primary expression after the binary operator.
				519	ExprAST *RHS = ParsePrimary();
				520	if (!RHS) return 0;
				521	</pre>
				522	</div>
				523
				524	<p>As such, this code eats (and remembers) the binary operator and then parses
				525	the following primary expression. This builds up the whole pair, the first of
				526	which is [+, b] for the running example.</p>
				527
				528	<p>Now that we parsed the left-hand side of an expression and one pair of the
				529	RHS sequence, we have to decide which way the expression associates. In
				530	particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
				531	To determine this, we look ahead at "binop" to determine its precedence and
				532	compare it to BinOp's precedence (which is '+' in this case):</p>
				533
				534	<div class="doc_code">
				535	<pre>
				536	// If BinOp binds less tightly with RHS than the operator after RHS, let
				537	// the pending operator take RHS as its LHS.
				538	int NextPrec = GetTokPrecedence();
				539	if (TokPrec < NextPrec) {
				540	</pre>
				541	</div>
				542
				543	<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
				544	precedence of our current operator, then we know that the parentheses associate
				545	as "(a+b) binop ...". In our example, since the next operator is "+" and so is
				546	our current one, we know that they have the same precedence. In this case we'll
				547	create the AST node for "a+b", and then continue parsing:</p>
				548
				549	<div class="doc_code">
				550	<pre>
				551	... if body omitted ...
				552	}
				553
				554	// Merge LHS/RHS.
				555	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				556	} // loop around to the top of the while loop.
				557	}
				558	</pre>
				559	</div>
				560
				561	<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
				562	iteration of the loop, with "+" as the current token. The code above will eat
				563	and remember it and parse "(c+d)" as the primary expression, which makes the
				564	current pair be [+, (c+d)]. It will then enter the 'if' above with "*" as the
				565	binop to the right of the primary. In this case, the precedence of "*" is
				566	higher than the precedence of "+" so the if condition will be entered.</p>
				567
				568	<p>The critical question left here is "how can the if condition parse the right
				569	hand side in full"? In particular, to build the AST correctly for our example,
				570	it needs to get all of "(c+d)ef" as the RHS expression variable. The code to
				571	do this is surprisingly simple (code from the above two blocks duplicated for
				572	context):</p>
				573
				574	<div class="doc_code">
				575	<pre>
				576	// If BinOp binds less tightly with RHS than the operator after RHS, let
				577	// the pending operator take RHS as its LHS.
				578	int NextPrec = GetTokPrecedence();
				579	if (TokPrec < NextPrec) {
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	580	<b>RHS = ParseBinOpRHS(TokPrec+1, RHS);
				581	if (RHS == 0) return 0;</b>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	582	}
				583	// Merge LHS/RHS.
				584	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				585	} // loop around to the top of the while loop.
				586	}
				587	</pre>
				588	</div>
				589
				590	<p>At this point, we know that the binary operator to the RHS of our primary
				591	has higher precedence than the binop we are currently parsing. As such, we know
				592	that any sequence of pairs whose operators are all higher precedence than "+"
				593	should be parsed together and returned as "RHS". To do this, we recursively
				594	invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
				595	precedence required for it to continue. In our example above, this will cause
				596	it to return the AST node for "(c+d)ef" as RHS, which is then set as the RHS
				597	of the '+' expression.</p>
				598
				599	<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed.
				600	and added to the AST. With this little bit of code (14 non-trivial lines), we
				601	correctly handle fully general binary expression parsing in a very elegant way.
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	602	This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
				603	running through it with a few tough examples to see how it works.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	604	</p>
				605
				606	<p>This wraps up handling of expressions. At this point, we can point the
				607	parser at an arbitrary token stream and build an expression from them, stopping
				608	at the first token that is not part of the expression. Next up we need to
				609	handle function definitions etc.</p>
				610
				611	</div>
				612
				613	<!-- *********************************************************************** -->
				614	<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
				615	<!-- *********************************************************************** -->
				616
				617	<div class="doc_text">
				618
				619	<p>
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	620	The next thing missing is handling of function prototypes. In Kaleidoscope,
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	621	these are used both for 'extern' function declarations as well as function body
				622	definitions. The code to do this is straight-forward and not very interesting
				623	(once you've survived expressions):
				624	</p>
				625
				626	<div class="doc_code">
				627	<pre>
				628	/// prototype
				629	/// ::= id '(' id* ')'
				630	static PrototypeAST *ParsePrototype() {
				631	if (CurTok != tok_identifier)
				632	return ErrorP("Expected function name in prototype");
				633
				634	std::string FnName = IdentifierStr;
				635	getNextToken();
				636
				637	if (CurTok != '(')
				638	return ErrorP("Expected '(' in prototype");
				639
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	640	// Read the list of argument names.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	641	std::vector<std::string> ArgNames;
				642	while (getNextToken() == tok_identifier)
				643	ArgNames.push_back(IdentifierStr);
				644	if (CurTok != ')')
				645	return ErrorP("Expected ')' in prototype");
				646
				647	// success.
				648	getNextToken(); // eat ')'.
				649
				650	return new PrototypeAST(FnName, ArgNames);
				651	}
				652	</pre>
				653	</div>
				654
				655	<p>Given this, a function definition is very simple, just a prototype plus
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	656	an expression to implement the body:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	657
				658	<div class="doc_code">
				659	<pre>
				660	/// definition ::= 'def' prototype expression
				661	static FunctionAST *ParseDefinition() {
				662	getNextToken(); // eat def.
				663	PrototypeAST *Proto = ParsePrototype();
				664	if (Proto == 0) return 0;
				665
				666	if (ExprAST *E = ParseExpression())
				667	return new FunctionAST(Proto, E);
				668	return 0;
				669	}
				670	</pre>
				671	</div>
				672
				673	<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
				674	well as to support forward declaration of user functions. 'externs' are just
				675	prototypes with no body:</p>
				676
				677	<div class="doc_code">
				678	<pre>
				679	/// external ::= 'extern' prototype
				680	static PrototypeAST *ParseExtern() {
				681	getNextToken(); // eat extern.
				682	return ParsePrototype();
				683	}
				684	</pre>
				685	</div>
				686
				687	<p>Finally, we'll also let the user type in arbitrary top-level expressions and
				688	evaluate them on the fly. We will handle this by defining anonymous nullary
				689	(zero argument) functions for them:</p>
				690
				691	<div class="doc_code">
				692	<pre>
				693	/// toplevelexpr ::= expression
				694	static FunctionAST *ParseTopLevelExpr() {
				695	if (ExprAST *E = ParseExpression()) {
				696	// Make an anonymous proto.
				697	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				698	return new FunctionAST(Proto, E);
				699	}
				700	return 0;
				701	}
				702	</pre>
				703	</div>
				704
				705	<p>Now that we have all the pieces, lets build a little driver that will let us
				706	actually <em>execute</em> this code we've built!</p>
				707
				708	</div>
				709
				710	<!-- *********************************************************************** -->
				711	<div class="doc_section"><a name="driver">The Driver</a></div>
				712	<!-- *********************************************************************** -->
				713
				714	<div class="doc_text">
				715
				716	<p>The driver for this simply invokes all of the parsing pieces with a top-level
				717	dispatch loop. There isn't much interesting here, so I'll just include the
				718	top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
				719	Parsing" section.</p>
				720
				721	<div class="doc_code">
				722	<pre>
				723	/// top ::= definition \| external \| expression \| ';'
				724	static void MainLoop() {
				725	while (1) {
				726	fprintf(stderr, "ready> ");
				727	switch (CurTok) {
				728	case tok_eof: return;
				729	case ';': getNextToken(); break; // ignore top level semicolons.
				730	case tok_def: HandleDefinition(); break;
				731	case tok_extern: HandleExtern(); break;
				732	default: HandleTopLevelExpression(); break;
				733	}
				734	}
				735	}
				736	</pre>
				737	</div>
				738
				739	<p>The most interesting part of this is that we ignore top-level semi colons.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	740	Why is this, you ask? The basic reason is that if you type "4 + 5" at the
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	741	command line, the parser doesn't know that that is the end of what you will
				742	type. For example, on the next line you could type "def foo..." in which case
				743	4+5 is the end of a top-level expression. Alternatively you could type "* 6",
				744	which would continue the expression. Having top-level semicolons allows you to
				745	type "4+5;" and the parser will know you are done.</p>
				746
				747	</div>
				748
				749	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	750	<div class="doc_section"><a name="conclusions">Conclusions</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	751	<!-- *********************************************************************** -->
				752
				753	<div class="doc_text">
				754
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	755	<p>With just under 400 lines of commented code (240 lines of non-comment,
				756	non-blank code), we fully defined our minimal language, including a lexer,
				757	parser and AST builder. With this done, the executable will validate
				758	Kaleidoscope code and tell us if it is gramatically invalid. For
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	759	example, here is a sample interaction:</p>
				760
				761	<div class="doc_code">
				762	<pre>
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	763	$ <b>./a.out</b>
				764	ready> <b>def foo(x y) x+foo(y, 4.0);</b>
				765	Parsed a function definition.
				766	ready> <b>def foo(x y) x+y y;</b>
				767	Parsed a function definition.
				768	Parsed a top-level expr
				769	ready> <b>def foo(x y) x+y );</b>
				770	Parsed a function definition.
				771	Error: unknown token when expecting an expression
				772	ready> <b>extern sin(a);</b>
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	773	ready> Parsed an extern
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	774	ready> <b>^D</b>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	775	$
				776	</pre>
				777	</div>
				778
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	779	<p>There is a lot of room for extension here. You can define new AST nodes,
				780	extend the language in many ways, etc. In the <a href="LangImpl3.html">next
				781	installment</a>, we will describe how to generate LLVM IR from the AST.</p>
				782
				783	</div>
				784
				785	<!-- *********************************************************************** -->
				786	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				787	<!-- *********************************************************************** -->
				788
				789	<div class="doc_text">
				790
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	791	<p>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	792	Here is the complete code listing for this and the previous chapter.
				793	Note that it is fully self-contained: you don't need LLVM or any external
				794	libraries at all for this (other than the C and C++ standard libraries of
				795	course). To build this, just compile with:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	796
				797	<div class="doc_code">
				798	<pre>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	799	# Compile
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	800	g++ -g -O3 toy.cpp
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	801	# Run
				802	./a.out
				803	</pre>
				804	</div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	805
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	806	<p>Here is the code:</p>
				807
				808	<div class="doc_code">
				809	<pre>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	810	#include <cstdio>
				811	#include <string>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	812	#include <map>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	813	#include <vector>
				814
				815	//===----------------------------------------------------------------------===//
				816	// Lexer
				817	//===----------------------------------------------------------------------===//
				818
				819	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				820	// of these for known things.
				821	enum Token {
				822	tok_eof = -1,
				823
				824	// commands
				825	tok_def = -2, tok_extern = -3,
				826
				827	// primary
				828	tok_identifier = -4, tok_number = -5,
				829	};
				830
				831	static std::string IdentifierStr; // Filled in if tok_identifier
				832	static double NumVal; // Filled in if tok_number
				833
				834	/// gettok - Return the next token from standard input.
				835	static int gettok() {
				836	static int LastChar = ' ';
				837
				838	// Skip any whitespace.
				839	while (isspace(LastChar))
				840	LastChar = getchar();
				841
				842	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				843	IdentifierStr = LastChar;
				844	while (isalnum((LastChar = getchar())))
				845	IdentifierStr += LastChar;
				846
				847	if (IdentifierStr == "def") return tok_def;
				848	if (IdentifierStr == "extern") return tok_extern;
				849	return tok_identifier;
				850	}
				851
				852	if (isdigit(LastChar) \|\| LastChar == '.') { // Number: [0-9.]+
				853	std::string NumStr;
				854	do {
				855	NumStr += LastChar;
				856	LastChar = getchar();
				857	} while (isdigit(LastChar) \|\| LastChar == '.');
				858
				859	NumVal = strtod(NumStr.c_str(), 0);
				860	return tok_number;
				861	}
				862
				863	if (LastChar == '#') {
				864	// Comment until end of line.
				865	do LastChar = getchar();
				866	while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
				867
				868	if (LastChar != EOF)
				869	return gettok();
				870	}
				871
				872	// Check for end of file. Don't eat the EOF.
				873	if (LastChar == EOF)
				874	return tok_eof;
				875
				876	// Otherwise, just return the character as its ascii value.
				877	int ThisChar = LastChar;
				878	LastChar = getchar();
				879	return ThisChar;
				880	}
				881
				882	//===----------------------------------------------------------------------===//
				883	// Abstract Syntax Tree (aka Parse Tree)
				884	//===----------------------------------------------------------------------===//
				885
				886	/// ExprAST - Base class for all expression nodes.
				887	class ExprAST {
				888	public:
				889	virtual ~ExprAST() {}
				890	};
				891
				892	/// NumberExprAST - Expression class for numeric literals like "1.0".
				893	class NumberExprAST : public ExprAST {
				894	double Val;
				895	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	896	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	897	};
				898
				899	/// VariableExprAST - Expression class for referencing a variable, like "a".
				900	class VariableExprAST : public ExprAST {
				901	std::string Name;
				902	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	903	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	904	};
				905
				906	/// BinaryExprAST - Expression class for a binary operator.
				907	class BinaryExprAST : public ExprAST {
				908	char Op;
				909	ExprAST LHS, RHS;
				910	public:
				911	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				912	: Op(op), LHS(lhs), RHS(rhs) {}
				913	};
				914
				915	/// CallExprAST - Expression class for function calls.
				916	class CallExprAST : public ExprAST {
				917	std::string Callee;
				918	std::vector<ExprAST*> Args;
				919	public:
				920	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				921	: Callee(callee), Args(args) {}
				922	};
				923
				924	/// PrototypeAST - This class represents the "prototype" for a function,
Chris Lattner	cde1d9d	2007-11-06 07:16:22 +0000	[diff] [blame^]	925	/// which captures its name, and its argument names (thus implicitly the number
				926	/// of arguments the function takes).
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	927	class PrototypeAST {
				928	std::string Name;
				929	std::vector< Args;
				930	public:
				931	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				932	: Name(name), Args(args) {}
				933
				934	};
				935
				936	/// FunctionAST - This class represents a function definition itself.
				937	class FunctionAST {
				938	PrototypeAST *Proto;
				939	ExprAST *Body;
				940	public:
				941	FunctionAST(PrototypeAST proto, ExprAST body)
				942	: Proto(proto), Body(body) {}
				943
				944	};
				945
				946	//===----------------------------------------------------------------------===//
				947	// Parser
				948	//===----------------------------------------------------------------------===//
				949
				950	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				951	/// token the parser it looking at. getNextToken reads another token from the
				952	/// lexer and updates CurTok with its results.
				953	static int CurTok;
				954	static int getNextToken() {
				955	return CurTok = gettok();
				956	}
				957
				958	/// BinopPrecedence - This holds the precedence for each binary operator that is
				959	/// defined.
				960	static std::map<char, int> BinopPrecedence;
				961
				962	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				963	static int GetTokPrecedence() {
				964	if (!isascii(CurTok))
				965	return -1;
				966
				967	// Make sure it's a declared binop.
				968	int TokPrec = BinopPrecedence[CurTok];
				969	if (TokPrec <= 0) return -1;
				970	return TokPrec;
				971	}
				972
				973	/// Error* - These are little helper functions for error handling.
				974	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				975	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				976	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				977
				978	static ExprAST *ParseExpression();
				979
				980	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	981	/// ::= identifier
				982	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	983	static ExprAST *ParseIdentifierExpr() {
				984	std::string IdName = IdentifierStr;
				985
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	986	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	987
				988	if (CurTok != '(') // Simple variable ref.
				989	return new VariableExprAST(IdName);
				990
				991	// Call.
				992	getNextToken(); // eat (
				993	std::vector<ExprAST*> Args;
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	994	if (CurTok != ')') {
				995	while (1) {
				996	ExprAST *Arg = ParseExpression();
				997	if (!Arg) return 0;
				998	Args.push_back(Arg);
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	999
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	1000	if (CurTok == ')') break;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1001
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	1002	if (CurTok != ',')
				1003	return Error("Expected ')'");
				1004	getNextToken();
				1005	}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1006	}
				1007
				1008	// Eat the ')'.
				1009	getNextToken();
				1010
				1011	return new CallExprAST(IdName, Args);
				1012	}
				1013
				1014	/// numberexpr ::= number
				1015	static ExprAST *ParseNumberExpr() {
				1016	ExprAST *Result = new NumberExprAST(NumVal);
				1017	getNextToken(); // consume the number
				1018	return Result;
				1019	}
				1020
				1021	/// parenexpr ::= '(' expression ')'
				1022	static ExprAST *ParseParenExpr() {
				1023	getNextToken(); // eat (.
				1024	ExprAST *V = ParseExpression();
				1025	if (!V) return 0;
				1026
				1027	if (CurTok != ')')
				1028	return Error("expected ')'");
				1029	getNextToken(); // eat ).
				1030	return V;
				1031	}
				1032
				1033	/// primary
				1034	/// ::= identifierexpr
				1035	/// ::= numberexpr
				1036	/// ::= parenexpr
				1037	static ExprAST *ParsePrimary() {
				1038	switch (CurTok) {
				1039	default: return Error("unknown token when expecting an expression");
				1040	case tok_identifier: return ParseIdentifierExpr();
				1041	case tok_number: return ParseNumberExpr();
				1042	case '(': return ParseParenExpr();
				1043	}
				1044	}
				1045
				1046	/// binoprhs
				1047	/// ::= ('+' primary)*
				1048	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				1049	// If this is a binop, find its precedence.
				1050	while (1) {
				1051	int TokPrec = GetTokPrecedence();
				1052
				1053	// If this is a binop that binds at least as tightly as the current binop,
				1054	// consume it, otherwise we are done.
				1055	if (TokPrec < ExprPrec)
				1056	return LHS;
				1057
				1058	// Okay, we know this is a binop.
				1059	int BinOp = CurTok;
				1060	getNextToken(); // eat binop
				1061
				1062	// Parse the primary expression after the binary operator.
				1063	ExprAST *RHS = ParsePrimary();
				1064	if (!RHS) return 0;
				1065
				1066	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1067	// the pending operator take RHS as its LHS.
				1068	int NextPrec = GetTokPrecedence();
				1069	if (TokPrec < NextPrec) {
				1070	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1071	if (RHS == 0) return 0;
				1072	}
				1073
				1074	// Merge LHS/RHS.
				1075	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1076	}
				1077	}
				1078
				1079	/// expression
				1080	/// ::= primary binoprhs
				1081	///
				1082	static ExprAST *ParseExpression() {
				1083	ExprAST *LHS = ParsePrimary();
				1084	if (!LHS) return 0;
				1085
				1086	return ParseBinOpRHS(0, LHS);
				1087	}
				1088
				1089	/// prototype
				1090	/// ::= id '(' id* ')'
				1091	static PrototypeAST *ParsePrototype() {
				1092	if (CurTok != tok_identifier)
				1093	return ErrorP("Expected function name in prototype");
				1094
				1095	std::string FnName = IdentifierStr;
				1096	getNextToken();
				1097
				1098	if (CurTok != '(')
				1099	return ErrorP("Expected '(' in prototype");
				1100
				1101	std::vector<std::string> ArgNames;
				1102	while (getNextToken() == tok_identifier)
				1103	ArgNames.push_back(IdentifierStr);
				1104	if (CurTok != ')')
				1105	return ErrorP("Expected ')' in prototype");
				1106
				1107	// success.
				1108	getNextToken(); // eat ')'.
				1109
				1110	return new PrototypeAST(FnName, ArgNames);
				1111	}
				1112
				1113	/// definition ::= 'def' prototype expression
				1114	static FunctionAST *ParseDefinition() {
				1115	getNextToken(); // eat def.
				1116	PrototypeAST *Proto = ParsePrototype();
				1117	if (Proto == 0) return 0;
				1118
				1119	if (ExprAST *E = ParseExpression())
				1120	return new FunctionAST(Proto, E);
				1121	return 0;
				1122	}
				1123
				1124	/// toplevelexpr ::= expression
				1125	static FunctionAST *ParseTopLevelExpr() {
				1126	if (ExprAST *E = ParseExpression()) {
				1127	// Make an anonymous proto.
				1128	PrototypeAST *Proto = new PrototypeAST("", std::vector<());
				1129	return new FunctionAST(Proto, E);
				1130	}
				1131	return 0;
				1132	}
				1133
				1134	/// external ::= 'extern' prototype
				1135	static PrototypeAST *ParseExtern() {
				1136	getNextToken(); // eat extern.
				1137	return ParsePrototype();
				1138	}
				1139
				1140	//===----------------------------------------------------------------------===//
				1141	// Top-Level parsing
				1142	//===----------------------------------------------------------------------===//
				1143
				1144	static void HandleDefinition() {
				1145	if (FunctionAST *F = ParseDefinition()) {
				1146	fprintf(stderr, "Parsed a function definition.\n");
				1147	} else {
				1148	// Skip token for error recovery.
				1149	getNextToken();
				1150	}
				1151	}
				1152
				1153	static void HandleExtern() {
				1154	if (PrototypeAST *P = ParseExtern()) {
				1155	fprintf(stderr, "Parsed an extern\n");
				1156	} else {
				1157	// Skip token for error recovery.
				1158	getNextToken();
				1159	}
				1160	}
				1161
				1162	static void HandleTopLevelExpression() {
				1163	// Evaluate a top level expression into an anonymous function.
				1164	if (FunctionAST *F = ParseTopLevelExpr()) {
				1165	fprintf(stderr, "Parsed a top-level expr\n");
				1166	} else {
				1167	// Skip token for error recovery.
				1168	getNextToken();
				1169	}
				1170	}
				1171
				1172	/// top ::= definition \| external \| expression \| ';'
				1173	static void MainLoop() {
				1174	while (1) {
				1175	fprintf(stderr, "ready> ");
				1176	switch (CurTok) {
				1177	case tok_eof: return;
				1178	case ';': getNextToken(); break; // ignore top level semicolons.
				1179	case tok_def: HandleDefinition(); break;
				1180	case tok_extern: HandleExtern(); break;
				1181	default: HandleTopLevelExpression(); break;
				1182	}
				1183	}
				1184	}
				1185
				1186	//===----------------------------------------------------------------------===//
				1187	// Main driver code.
				1188	//===----------------------------------------------------------------------===//
				1189
				1190	int main() {
				1191	// Install standard binary operators.
				1192	// 1 is lowest precedence.
				1193	BinopPrecedence['<'] = 10;
				1194	BinopPrecedence['+'] = 20;
				1195	BinopPrecedence['-'] = 20;
				1196	BinopPrecedence['*'] = 40; // highest.
				1197
				1198	// Prime the first token.
				1199	fprintf(stderr, "ready> ");
				1200	getNextToken();
				1201
				1202	MainLoop();
				1203	return 0;
				1204	}
				1205	</pre>
				1206	</div>
				1207	</div>
				1208
				1209	<!-- *********************************************************************** -->
				1210	<hr>
				1211	<address>
				1212	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				1213	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				1214	<a href="http://validator.w3.org/check/referer"><img
Chris Lattner	8eef4b2	2007-10-23 06:30:50 +0000	[diff] [blame]	1215	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1216
				1217	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				1218	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				1219	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				1220	</address>
				1221	</body>
				1222	</html>