Blame - docs/tutorial/LangImpl2.html - fp2-dev/platform/external/llvm

blob: 2339d96a91292e380746dc6915e6f5d291e74052 [file] [log] [blame]

Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Implementing a Parser and AST</title>
				7	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				8	<meta name="author" content="Chris Lattner">
				9	<link rel="stylesheet" href="../llvm.css" type="text/css">
				10	</head>
				11
				12	<body>
				13
				14	<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
				15
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	16	<ul>
				17	<li>Chapter 2
				18	<ol>
				19	<li><a href="#intro">Chapter 2 Introduction</a></li>
				20	<li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
				21	<li><a href="#parserbasics">Parser Basics</a></li>
				22	<li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
				23	<li><a href="#parserbinops">Binary Expression Parsing</a></li>
				24	<li><a href="#parsertop">Parsing the Rest</a></li>
				25	<li><a href="#driver">The Driver</a></li>
				26	<li><a href="#conclusions">Conclusions</a></li>
				27	<li><a href="#code">Full Code Listing</a></li>
				28	</ol>
				29	</li>
				30	</ul>
				31
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	32	<div class="doc_author">
				33	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				34	</div>
				35
				36	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	37	<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	38	<!-- *********************************************************************** -->
				39
				40	<div class="doc_text">
				41
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	42	<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
				43	with LLVM</a>" tutorial. This chapter shows you how to use the <a
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	44	href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
				45	href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
				46	our Kaleidoscope language and build an <a
				47	href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
				48	Tree</a> (AST).</p>
				49
				50	<p>The parser we will build uses a combination of <a
				51	href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
				52	Parsing</a> and <a href=
				53	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				54	Parsing</a> to parse the Kaleidoscope language (the later for binary expression
				55	and the former for everything else). Before we get to parsing though, lets talk
				56	about the output of the parser: the Abstract Syntax Tree.</p>
				57
				58	</div>
				59
				60	<!-- *********************************************************************** -->
				61	<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
				62	<!-- *********************************************************************** -->
				63
				64	<div class="doc_text">
				65
				66	<p>The AST for a program captures its behavior in a way that it is easy for
				67	later stages of the compiler (e.g. code generation) to interpret. We basically
				68	want one object for each construct in the language, and the AST should closely
				69	model the language. In Kaleidoscope, we have expressions, a prototype, and a
				70	function object. We'll start with expressions first:</p>
				71
				72	<div class="doc_code">
				73	<pre>
				74	/// ExprAST - Base class for all expression nodes.
				75	class ExprAST {
				76	public:
				77	virtual ~ExprAST() {}
				78	};
				79
				80	/// NumberExprAST - Expression class for numeric literals like "1.0".
				81	class NumberExprAST : public ExprAST {
				82	double Val;
				83	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	84	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	85	};
				86	</pre>
				87	</div>
				88
				89	<p>The code above shows the definition of the base ExprAST class and one
				90	subclass which we use for numeric literals. The important thing about this is
				91	that the NumberExprAST class captures the numeric value of the literal in the
				92	class, so that later phases of the compiler can know what it is.</p>
				93
				94	<p>Right now we only create the AST, so there are no useful accessor methods on
				95	them. It would be very easy to add a virtual method to pretty print the code,
				96	for example. Here are the other expression AST node definitions that we'll use
				97	in the basic form of the Kaleidoscope language.
				98	</p>
				99
				100	<div class="doc_code">
				101	<pre>
				102	/// VariableExprAST - Expression class for referencing a variable, like "a".
				103	class VariableExprAST : public ExprAST {
				104	std::string Name;
				105	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	106	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	107	};
				108
				109	/// BinaryExprAST - Expression class for a binary operator.
				110	class BinaryExprAST : public ExprAST {
				111	char Op;
				112	ExprAST LHS, RHS;
				113	public:
				114	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				115	: Op(op), LHS(lhs), RHS(rhs) {}
				116	};
				117
				118	/// CallExprAST - Expression class for function calls.
				119	class CallExprAST : public ExprAST {
				120	std::string Callee;
				121	std::vector<ExprAST*> Args;
				122	public:
				123	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				124	: Callee(callee), Args(args) {}
				125	};
				126	</pre>
				127	</div>
				128
				129	<p>This is all (intentially) rather straight-forward: variables capture the
				130	variable name, binary operators capture their opcode (e.g. '+'), and calls
				131	capture a function name and list of argument expressions. One thing that is
				132	nice about our AST is that it captures the language features without talking
				133	about the syntax of the language. Note that there is no discussion about
				134	precedence of binary operators, lexical structure etc.</p>
				135
				136	<p>For our basic language, these are all of the expression nodes we'll define.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	137	Because it doesn't have conditional control flow, it isn't Turing-complete;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	138	we'll fix that in a later installment. The two things we need next are a way
				139	to talk about the interface to a function, and a way to talk about functions
				140	themselves:</p>
				141
				142	<div class="doc_code">
				143	<pre>
				144	/// PrototypeAST - This class represents the "prototype" for a function,
				145	/// which captures its argument names as well as if it is an operator.
				146	class PrototypeAST {
				147	std::string Name;
				148	std::vector<std::string> Args;
				149	public:
				150	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				151	: Name(name), Args(args) {}
				152	};
				153
				154	/// FunctionAST - This class represents a function definition itself.
				155	class FunctionAST {
				156	PrototypeAST *Proto;
				157	ExprAST *Body;
				158	public:
				159	FunctionAST(PrototypeAST proto, ExprAST body)
				160	: Proto(proto), Body(body) {}
				161	};
				162	</pre>
				163	</div>
				164
				165	<p>In Kaleidoscope, functions are typed with just a count of their arguments.
				166	Since all values are double precision floating point, this fact doesn't need to
				167	be captured anywhere. In a more aggressive and realistic language, the
				168	"ExprAST" class would probably have a type field.</p>
				169
				170	<p>With this scaffolding, we can now talk about parsing expressions and function
				171	bodies in Kaleidoscope.</p>
				172
				173	</div>
				174
				175	<!-- *********************************************************************** -->
				176	<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
				177	<!-- *********************************************************************** -->
				178
				179	<div class="doc_text">
				180
				181	<p>Now that we have an AST to build, we need to define the parser code to build
				182	it. The idea here is that we want to parse something like "x+y" (which is
				183	returned as three tokens by the lexer) into an AST that could be generated with
				184	calls like this:</p>
				185
				186	<div class="doc_code">
				187	<pre>
				188	ExprAST *X = new VariableExprAST("x");
				189	ExprAST *Y = new VariableExprAST("y");
				190	ExprAST *Result = new BinaryExprAST('+', X, Y);
				191	</pre>
				192	</div>
				193
				194	<p>In order to do this, we'll start by defining some basic helper routines:</p>
				195
				196	<div class="doc_code">
				197	<pre>
				198	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				199	/// token the parser it looking at. getNextToken reads another token from the
				200	/// lexer and updates CurTok with its results.
				201	static int CurTok;
				202	static int getNextToken() {
				203	return CurTok = gettok();
				204	}
				205	</pre>
				206	</div>
				207
				208	<p>
				209	This implements a simple token buffer around the lexer. This allows
				210	us to look one token ahead at what the lexer is returning. Every function in
Chris Lattner	e949512	2007-10-25 18:05:29 +0000	[diff] [blame]	211	our parser will assume that CurTok is the current token that needs to be
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	212	parsed.</p>
				213
Chris Lattner	bee175b	2007-10-22 16:44:31 +0000	[diff] [blame]	214	<p>Again, we define these with global variables; it would be better design to
				215	wrap the entire parser in a class and use instance variables for these.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	216	</p>
				217
				218	<div class="doc_code">
				219	<pre>
				220
				221	/// Error* - These are little helper functions for error handling.
				222	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				223	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				224	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				225	</pre>
				226	</div>
				227
				228	<p>
				229	The <tt>Error</tt> routines are simple helper routines that our parser will use
				230	to handle errors. The error recovery in our parser will not be the best and
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	231	is not particular user-friendly, but it will be enough for our tutorial. These
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	232	routines make it easier to handle errors in routines that have various return
				233	types: they always return null.</p>
				234
				235	<p>With these basic helper functions implemented, we can implement the first
				236	piece of our grammar: we'll start with numeric literals.</p>
				237
				238	</div>
				239
				240	<!-- *********************************************************************** -->
				241	<div class="doc_section"><a name="parserprimexprs">Basic Expression
				242	Parsing</a></div>
				243	<!-- *********************************************************************** -->
				244
				245	<div class="doc_text">
				246
				247	<p>We start with numeric literals, because they are the simplest to process.
				248	For each production in our grammar, we'll define a function which parses that
				249	production. For numeric literals, we have:
				250	</p>
				251
				252	<div class="doc_code">
				253	<pre>
				254	/// numberexpr ::= number
				255	static ExprAST *ParseNumberExpr() {
				256	ExprAST *Result = new NumberExprAST(NumVal);
				257	getNextToken(); // consume the number
				258	return Result;
				259	}
				260	</pre>
				261	</div>
				262
				263	<p>This routine is very simple: it expects to be called when the current token
				264	is a <tt>tok_number</tt> token. It takes the current number value, creates
				265	a <tt>NumberExprAST</tt> node, advances the lexer to the next token, then
				266	returns.</p>
				267
				268	<p>There are some interesting aspects of this. The most important one is that
				269	this routine eats all of the tokens that correspond to the production, and
				270	returns the lexer buffer with the next token (which is not part of the grammar
				271	production) ready to go. This is a fairly standard way to go for recursive
				272	descent parsers. For a better example, the parenthesis operator is defined like
				273	this:</p>
				274
				275	<div class="doc_code">
				276	<pre>
				277	/// parenexpr ::= '(' expression ')'
				278	static ExprAST *ParseParenExpr() {
				279	getNextToken(); // eat (.
				280	ExprAST *V = ParseExpression();
				281	if (!V) return 0;
				282
				283	if (CurTok != ')')
				284	return Error("expected ')'");
				285	getNextToken(); // eat ).
				286	return V;
				287	}
				288	</pre>
				289	</div>
				290
				291	<p>This function illustrates a number of interesting things about the parser:
				292	1) it shows how we use the Error routines. When called, this function expects
				293	that the current token is a '(' token, but after parsing the subexpression, it
				294	is possible that there is not a ')' waiting. For example, if the user types in
				295	"(4 x" instead of "(4)", the parser should emit an error. Because errors can
				296	occur, the parser needs a way to indicate that they happened: in our parser, we
				297	return null on an error.</p>
				298
				299	<p>Another interesting aspect of this function is that it uses recursion by
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	300	calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	301	<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
				302	recursive grammars, and keeps each production very simple. Note that
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	303	parentheses do not cause construction of AST nodes themselves. While we could
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	304	do this, the most important role of parens are to guide the parser and provide
				305	grouping. Once the parser constructs the AST, parens are not needed.</p>
				306
				307	<p>The next simple production is for handling variable references and function
				308	calls:</p>
				309
				310	<div class="doc_code">
				311	<pre>
				312	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	313	/// ::= identifier
				314	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	315	static ExprAST *ParseIdentifierExpr() {
				316	std::string IdName = IdentifierStr;
				317
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	318	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	319
				320	if (CurTok != '(') // Simple variable ref.
				321	return new VariableExprAST(IdName);
				322
				323	// Call.
				324	getNextToken(); // eat (
				325	std::vector<ExprAST*> Args;
				326	while (1) {
				327	ExprAST *Arg = ParseExpression();
				328	if (!Arg) return 0;
				329	Args.push_back(Arg);
				330
				331	if (CurTok == ')') break;
				332
				333	if (CurTok != ',')
				334	return Error("Expected ')'");
				335	getNextToken();
				336	}
				337
				338	// Eat the ')'.
				339	getNextToken();
				340
				341	return new CallExprAST(IdName, Args);
				342	}
				343	</pre>
				344	</div>
				345
				346	<p>This routine follows the same style as the other routines (it expects to be
				347	called if the current token is a <tt>tok_identifier</tt> token). It also has
				348	recursion and error handling. One interesting aspect of this is that it uses
				349	<em>look-ahead</em> to determine if the current identifier is a stand alone
				350	variable reference or if it is a function call expression. It handles this by
				351	checking to see if the token after the identifier is a '(' token, and constructs
				352	either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
				353	</p>
				354
				355	<p>Now that we have all of our simple expression parsing logic in place, we can
				356	define a helper function to wrap them up in a class. We call this class of
				357	expressions "primary" expressions, for reasons that will become more clear
				358	later. In order to parse a primary expression, we need to determine what sort
				359	of expression it is:</p>
				360
				361	<div class="doc_code">
				362	<pre>
				363	/// primary
				364	/// ::= identifierexpr
				365	/// ::= numberexpr
				366	/// ::= parenexpr
				367	static ExprAST *ParsePrimary() {
				368	switch (CurTok) {
				369	default: return Error("unknown token when expecting an expression");
				370	case tok_identifier: return ParseIdentifierExpr();
				371	case tok_number: return ParseNumberExpr();
				372	case '(': return ParseParenExpr();
				373	}
				374	}
				375	</pre>
				376	</div>
				377
				378	<p>Now that you see the definition of this function, it makes it more obvious
				379	why we can assume the state of CurTok in the various functions. This uses
				380	look-ahead to determine which sort of expression is being inspected, and parses
				381	it with a function call.</p>
				382
				383	<p>Now that basic expressions are handled, we need to handle binary expressions,
				384	which are a bit more complex.</p>
				385
				386	</div>
				387
				388	<!-- *********************************************************************** -->
				389	<div class="doc_section"><a name="parserbinops">Binary Expression
				390	Parsing</a></div>
				391	<!-- *********************************************************************** -->
				392
				393	<div class="doc_text">
				394
				395	<p>Binary expressions are significantly harder to parse because they are often
				396	ambiguous. For example, when given the string "x+y*z", the parser can choose
				397	to parse it as either "(x+y)z" or "x+(yz)". With common definitions from
				398	mathematics, we expect the later parse, because "*" (multiplication) has
				399	higher <em>precedence</em> than "+" (addition).</p>
				400
				401	<p>There are many ways to handle this, but an elegant and efficient way is to
				402	use <a href=
				403	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				404	Parsing</a>. This parsing technique uses the precedence of binary operators to
				405	guide recursion. To start with, we need a table of precedences:</p>
				406
				407	<div class="doc_code">
				408	<pre>
				409	/// BinopPrecedence - This holds the precedence for each binary operator that is
				410	/// defined.
				411	static std::map<char, int> BinopPrecedence;
				412
				413	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				414	static int GetTokPrecedence() {
				415	if (!isascii(CurTok))
				416	return -1;
				417
				418	// Make sure it's a declared binop.
				419	int TokPrec = BinopPrecedence[CurTok];
				420	if (TokPrec <= 0) return -1;
				421	return TokPrec;
				422	}
				423
				424	int main() {
				425	// Install standard binary operators.
				426	// 1 is lowest precedence.
				427	BinopPrecedence['<'] = 10;
				428	BinopPrecedence['+'] = 20;
				429	BinopPrecedence['-'] = 20;
				430	BinopPrecedence['*'] = 40; // highest.
				431	...
				432	}
				433	</pre>
				434	</div>
				435
				436	<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
				437	(this can obviously be extended by you, the reader). The
				438	<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
				439	or -1 if the token is not a binary operator. Having a map makes it easy to add
				440	new operators and makes it clear that the algorithm doesn't depend on the
				441	specific operators involved, but it would be easy enough to eliminate the map
				442	and do the comparisons in the <tt>GetTokPrecedence</tt> function.</p>
				443
				444	<p>With the helper above defined, we can now start parsing binary expressions.
				445	The basic idea of operator precedence parsing is to break down an expression
				446	with potentially ambiguous binary operators into pieces. Consider for example
				447	the expression "a+b+(c+d)ef+g". Operator precedence parsing considers this
				448	as a stream of primary expressions separated by binary operators. As such,
				449	it will first parse the leading primary expression "a", then it will see the
				450	pairs [+, b] [+, (c+d)] [, e] [, f] and [+, g]. Note that because parentheses
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	451	are primary expressions, the binary expression parser doesn't need to worry
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	452	about nested subexpressions like (c+d) at all.
				453	</p>
				454
				455	<p>
				456	To start, an expression is a primary expression potentially followed by a
				457	sequence of [binop,primaryexpr] pairs:</p>
				458
				459	<div class="doc_code">
				460	<pre>
				461	/// expression
				462	/// ::= primary binoprhs
				463	///
				464	static ExprAST *ParseExpression() {
				465	ExprAST *LHS = ParsePrimary();
				466	if (!LHS) return 0;
				467
				468	return ParseBinOpRHS(0, LHS);
				469	}
				470	</pre>
				471	</div>
				472
				473	<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
				474	us. It takes a precedence and a pointer to an expression for the part parsed
				475	so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
				476	allowed to be empty, in which case it returns the expression that is passed into
				477	it. In our example above, the code passes the expression for "a" into
				478	<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
				479
				480	<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
				481	minimal operator precedence</em> that the function is allowed to eat. For
				482	example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
				483	passed in a precedence of 40, it will not consume any tokens (because the
				484	precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
				485	with:</p>
				486
				487	<div class="doc_code">
				488	<pre>
				489	/// binoprhs
				490	/// ::= ('+' primary)*
				491	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				492	// If this is a binop, find its precedence.
				493	while (1) {
				494	int TokPrec = GetTokPrecedence();
				495
				496	// If this is a binop that binds at least as tightly as the current binop,
				497	// consume it, otherwise we are done.
				498	if (TokPrec < ExprPrec)
				499	return LHS;
				500	</pre>
				501	</div>
				502
				503	<p>This code gets the precedence of the current token and checks to see if if is
				504	too low. Because we defined invalid tokens to have a precedence of -1, this
				505	check implicitly knows that the pair-stream ends when the token stream runs out
				506	of binary operators. If this check succeeds, we know that the token is a binary
				507	operator and that it will be included in this expression:</p>
				508
				509	<div class="doc_code">
				510	<pre>
				511	// Okay, we know this is a binop.
				512	int BinOp = CurTok;
				513	getNextToken(); // eat binop
				514
				515	// Parse the primary expression after the binary operator.
				516	ExprAST *RHS = ParsePrimary();
				517	if (!RHS) return 0;
				518	</pre>
				519	</div>
				520
				521	<p>As such, this code eats (and remembers) the binary operator and then parses
				522	the following primary expression. This builds up the whole pair, the first of
				523	which is [+, b] for the running example.</p>
				524
				525	<p>Now that we parsed the left-hand side of an expression and one pair of the
				526	RHS sequence, we have to decide which way the expression associates. In
				527	particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
				528	To determine this, we look ahead at "binop" to determine its precedence and
				529	compare it to BinOp's precedence (which is '+' in this case):</p>
				530
				531	<div class="doc_code">
				532	<pre>
				533	// If BinOp binds less tightly with RHS than the operator after RHS, let
				534	// the pending operator take RHS as its LHS.
				535	int NextPrec = GetTokPrecedence();
				536	if (TokPrec < NextPrec) {
				537	</pre>
				538	</div>
				539
				540	<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
				541	precedence of our current operator, then we know that the parentheses associate
				542	as "(a+b) binop ...". In our example, since the next operator is "+" and so is
				543	our current one, we know that they have the same precedence. In this case we'll
				544	create the AST node for "a+b", and then continue parsing:</p>
				545
				546	<div class="doc_code">
				547	<pre>
				548	... if body omitted ...
				549	}
				550
				551	// Merge LHS/RHS.
				552	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				553	} // loop around to the top of the while loop.
				554	}
				555	</pre>
				556	</div>
				557
				558	<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
				559	iteration of the loop, with "+" as the current token. The code above will eat
				560	and remember it and parse "(c+d)" as the primary expression, which makes the
				561	current pair be [+, (c+d)]. It will then enter the 'if' above with "*" as the
				562	binop to the right of the primary. In this case, the precedence of "*" is
				563	higher than the precedence of "+" so the if condition will be entered.</p>
				564
				565	<p>The critical question left here is "how can the if condition parse the right
				566	hand side in full"? In particular, to build the AST correctly for our example,
				567	it needs to get all of "(c+d)ef" as the RHS expression variable. The code to
				568	do this is surprisingly simple (code from the above two blocks duplicated for
				569	context):</p>
				570
				571	<div class="doc_code">
				572	<pre>
				573	// If BinOp binds less tightly with RHS than the operator after RHS, let
				574	// the pending operator take RHS as its LHS.
				575	int NextPrec = GetTokPrecedence();
				576	if (TokPrec < NextPrec) {
				577	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				578	if (RHS == 0) return 0;
				579	}
				580	// Merge LHS/RHS.
				581	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				582	} // loop around to the top of the while loop.
				583	}
				584	</pre>
				585	</div>
				586
				587	<p>At this point, we know that the binary operator to the RHS of our primary
				588	has higher precedence than the binop we are currently parsing. As such, we know
				589	that any sequence of pairs whose operators are all higher precedence than "+"
				590	should be parsed together and returned as "RHS". To do this, we recursively
				591	invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
				592	precedence required for it to continue. In our example above, this will cause
				593	it to return the AST node for "(c+d)ef" as RHS, which is then set as the RHS
				594	of the '+' expression.</p>
				595
				596	<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed.
				597	and added to the AST. With this little bit of code (14 non-trivial lines), we
				598	correctly handle fully general binary expression parsing in a very elegant way.
				599	</p>
				600
				601	<p>This wraps up handling of expressions. At this point, we can point the
				602	parser at an arbitrary token stream and build an expression from them, stopping
				603	at the first token that is not part of the expression. Next up we need to
				604	handle function definitions etc.</p>
				605
				606	</div>
				607
				608	<!-- *********************************************************************** -->
				609	<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
				610	<!-- *********************************************************************** -->
				611
				612	<div class="doc_text">
				613
				614	<p>
				615	The first basic thing missing is that of function prototypes. In Kaleidoscope,
				616	these are used both for 'extern' function declarations as well as function body
				617	definitions. The code to do this is straight-forward and not very interesting
				618	(once you've survived expressions):
				619	</p>
				620
				621	<div class="doc_code">
				622	<pre>
				623	/// prototype
				624	/// ::= id '(' id* ')'
				625	static PrototypeAST *ParsePrototype() {
				626	if (CurTok != tok_identifier)
				627	return ErrorP("Expected function name in prototype");
				628
				629	std::string FnName = IdentifierStr;
				630	getNextToken();
				631
				632	if (CurTok != '(')
				633	return ErrorP("Expected '(' in prototype");
				634
				635	std::vector<std::string> ArgNames;
				636	while (getNextToken() == tok_identifier)
				637	ArgNames.push_back(IdentifierStr);
				638	if (CurTok != ')')
				639	return ErrorP("Expected ')' in prototype");
				640
				641	// success.
				642	getNextToken(); // eat ')'.
				643
				644	return new PrototypeAST(FnName, ArgNames);
				645	}
				646	</pre>
				647	</div>
				648
				649	<p>Given this, a function definition is very simple, just a prototype plus
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	650	an expression to implement the body:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	651
				652	<div class="doc_code">
				653	<pre>
				654	/// definition ::= 'def' prototype expression
				655	static FunctionAST *ParseDefinition() {
				656	getNextToken(); // eat def.
				657	PrototypeAST *Proto = ParsePrototype();
				658	if (Proto == 0) return 0;
				659
				660	if (ExprAST *E = ParseExpression())
				661	return new FunctionAST(Proto, E);
				662	return 0;
				663	}
				664	</pre>
				665	</div>
				666
				667	<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
				668	well as to support forward declaration of user functions. 'externs' are just
				669	prototypes with no body:</p>
				670
				671	<div class="doc_code">
				672	<pre>
				673	/// external ::= 'extern' prototype
				674	static PrototypeAST *ParseExtern() {
				675	getNextToken(); // eat extern.
				676	return ParsePrototype();
				677	}
				678	</pre>
				679	</div>
				680
				681	<p>Finally, we'll also let the user type in arbitrary top-level expressions and
				682	evaluate them on the fly. We will handle this by defining anonymous nullary
				683	(zero argument) functions for them:</p>
				684
				685	<div class="doc_code">
				686	<pre>
				687	/// toplevelexpr ::= expression
				688	static FunctionAST *ParseTopLevelExpr() {
				689	if (ExprAST *E = ParseExpression()) {
				690	// Make an anonymous proto.
				691	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				692	return new FunctionAST(Proto, E);
				693	}
				694	return 0;
				695	}
				696	</pre>
				697	</div>
				698
				699	<p>Now that we have all the pieces, lets build a little driver that will let us
				700	actually <em>execute</em> this code we've built!</p>
				701
				702	</div>
				703
				704	<!-- *********************************************************************** -->
				705	<div class="doc_section"><a name="driver">The Driver</a></div>
				706	<!-- *********************************************************************** -->
				707
				708	<div class="doc_text">
				709
				710	<p>The driver for this simply invokes all of the parsing pieces with a top-level
				711	dispatch loop. There isn't much interesting here, so I'll just include the
				712	top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
				713	Parsing" section.</p>
				714
				715	<div class="doc_code">
				716	<pre>
				717	/// top ::= definition \| external \| expression \| ';'
				718	static void MainLoop() {
				719	while (1) {
				720	fprintf(stderr, "ready> ");
				721	switch (CurTok) {
				722	case tok_eof: return;
				723	case ';': getNextToken(); break; // ignore top level semicolons.
				724	case tok_def: HandleDefinition(); break;
				725	case tok_extern: HandleExtern(); break;
				726	default: HandleTopLevelExpression(); break;
				727	}
				728	}
				729	}
				730	</pre>
				731	</div>
				732
				733	<p>The most interesting part of this is that we ignore top-level semi colons.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	734	Why is this, you ask? The basic reason is that if you type "4 + 5" at the
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	735	command line, the parser doesn't know that that is the end of what you will
				736	type. For example, on the next line you could type "def foo..." in which case
				737	4+5 is the end of a top-level expression. Alternatively you could type "* 6",
				738	which would continue the expression. Having top-level semicolons allows you to
				739	type "4+5;" and the parser will know you are done.</p>
				740
				741	</div>
				742
				743	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	744	<div class="doc_section"><a name="conclusions">Conclusions</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	745	<!-- *********************************************************************** -->
				746
				747	<div class="doc_text">
				748
				749	<p>With just under 400 lines of commented code, we fully defined our minimal
				750	language, including a lexer, parser and AST builder. With this done, the
				751	executable will validate code and tell us if it is gramatically invalid. For
				752	example, here is a sample interaction:</p>
				753
				754	<div class="doc_code">
				755	<pre>
				756	$ ./a.out
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	757	ready> def foo(x y) x+foo(y, 4.0);
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	758	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	759	ready> def foo(x y) x+y y;
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	760	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	761	ready> Parsed a top-level expr
				762	ready> def foo(x y) x+y );
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	763	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	764	ready> Error: unknown token when expecting an expression
				765	ready> extern sin(a);
				766	ready> Parsed an extern
				767	ready> ^D
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	768	$
				769	</pre>
				770	</div>
				771
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	772	<p>There is a lot of room for extension here. You can define new AST nodes,
				773	extend the language in many ways, etc. In the <a href="LangImpl3.html">next
				774	installment</a>, we will describe how to generate LLVM IR from the AST.</p>
				775
				776	</div>
				777
				778	<!-- *********************************************************************** -->
				779	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				780	<!-- *********************************************************************** -->
				781
				782	<div class="doc_text">
				783
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	784	<p>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	785	Here is the complete code listing for this and the previous chapter.
				786	Note that it is fully self-contained: you don't need LLVM or any external
				787	libraries at all for this (other than the C and C++ standard libraries of
				788	course). To build this, just compile with:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	789
				790	<div class="doc_code">
				791	<pre>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	792	# Compile
				793	g++ -g toy.cpp
				794	# Run
				795	./a.out
				796	</pre>
				797	</div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	798
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	799	<p>Here is the code:</p>
				800
				801	<div class="doc_code">
				802	<pre>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	803	#include <cstdio>
				804	#include <string>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	805	#include <map>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	806	#include <vector>
				807
				808	//===----------------------------------------------------------------------===//
				809	// Lexer
				810	//===----------------------------------------------------------------------===//
				811
				812	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				813	// of these for known things.
				814	enum Token {
				815	tok_eof = -1,
				816
				817	// commands
				818	tok_def = -2, tok_extern = -3,
				819
				820	// primary
				821	tok_identifier = -4, tok_number = -5,
				822	};
				823
				824	static std::string IdentifierStr; // Filled in if tok_identifier
				825	static double NumVal; // Filled in if tok_number
				826
				827	/// gettok - Return the next token from standard input.
				828	static int gettok() {
				829	static int LastChar = ' ';
				830
				831	// Skip any whitespace.
				832	while (isspace(LastChar))
				833	LastChar = getchar();
				834
				835	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				836	IdentifierStr = LastChar;
				837	while (isalnum((LastChar = getchar())))
				838	IdentifierStr += LastChar;
				839
				840	if (IdentifierStr == "def") return tok_def;
				841	if (IdentifierStr == "extern") return tok_extern;
				842	return tok_identifier;
				843	}
				844
				845	if (isdigit(LastChar) \|\| LastChar == '.') { // Number: [0-9.]+
				846	std::string NumStr;
				847	do {
				848	NumStr += LastChar;
				849	LastChar = getchar();
				850	} while (isdigit(LastChar) \|\| LastChar == '.');
				851
				852	NumVal = strtod(NumStr.c_str(), 0);
				853	return tok_number;
				854	}
				855
				856	if (LastChar == '#') {
				857	// Comment until end of line.
				858	do LastChar = getchar();
				859	while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
				860
				861	if (LastChar != EOF)
				862	return gettok();
				863	}
				864
				865	// Check for end of file. Don't eat the EOF.
				866	if (LastChar == EOF)
				867	return tok_eof;
				868
				869	// Otherwise, just return the character as its ascii value.
				870	int ThisChar = LastChar;
				871	LastChar = getchar();
				872	return ThisChar;
				873	}
				874
				875	//===----------------------------------------------------------------------===//
				876	// Abstract Syntax Tree (aka Parse Tree)
				877	//===----------------------------------------------------------------------===//
				878
				879	/// ExprAST - Base class for all expression nodes.
				880	class ExprAST {
				881	public:
				882	virtual ~ExprAST() {}
				883	};
				884
				885	/// NumberExprAST - Expression class for numeric literals like "1.0".
				886	class NumberExprAST : public ExprAST {
				887	double Val;
				888	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	889	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	890	};
				891
				892	/// VariableExprAST - Expression class for referencing a variable, like "a".
				893	class VariableExprAST : public ExprAST {
				894	std::string Name;
				895	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	896	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	897	};
				898
				899	/// BinaryExprAST - Expression class for a binary operator.
				900	class BinaryExprAST : public ExprAST {
				901	char Op;
				902	ExprAST LHS, RHS;
				903	public:
				904	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				905	: Op(op), LHS(lhs), RHS(rhs) {}
				906	};
				907
				908	/// CallExprAST - Expression class for function calls.
				909	class CallExprAST : public ExprAST {
				910	std::string Callee;
				911	std::vector<ExprAST*> Args;
				912	public:
				913	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				914	: Callee(callee), Args(args) {}
				915	};
				916
				917	/// PrototypeAST - This class represents the "prototype" for a function,
				918	/// which captures its argument names as well as if it is an operator.
				919	class PrototypeAST {
				920	std::string Name;
				921	std::vector< Args;
				922	public:
				923	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				924	: Name(name), Args(args) {}
				925
				926	};
				927
				928	/// FunctionAST - This class represents a function definition itself.
				929	class FunctionAST {
				930	PrototypeAST *Proto;
				931	ExprAST *Body;
				932	public:
				933	FunctionAST(PrototypeAST proto, ExprAST body)
				934	: Proto(proto), Body(body) {}
				935
				936	};
				937
				938	//===----------------------------------------------------------------------===//
				939	// Parser
				940	//===----------------------------------------------------------------------===//
				941
				942	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				943	/// token the parser it looking at. getNextToken reads another token from the
				944	/// lexer and updates CurTok with its results.
				945	static int CurTok;
				946	static int getNextToken() {
				947	return CurTok = gettok();
				948	}
				949
				950	/// BinopPrecedence - This holds the precedence for each binary operator that is
				951	/// defined.
				952	static std::map<char, int> BinopPrecedence;
				953
				954	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				955	static int GetTokPrecedence() {
				956	if (!isascii(CurTok))
				957	return -1;
				958
				959	// Make sure it's a declared binop.
				960	int TokPrec = BinopPrecedence[CurTok];
				961	if (TokPrec <= 0) return -1;
				962	return TokPrec;
				963	}
				964
				965	/// Error* - These are little helper functions for error handling.
				966	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				967	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				968	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				969
				970	static ExprAST *ParseExpression();
				971
				972	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	973	/// ::= identifier
				974	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	975	static ExprAST *ParseIdentifierExpr() {
				976	std::string IdName = IdentifierStr;
				977
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	978	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	979
				980	if (CurTok != '(') // Simple variable ref.
				981	return new VariableExprAST(IdName);
				982
				983	// Call.
				984	getNextToken(); // eat (
				985	std::vector<ExprAST*> Args;
				986	while (1) {
				987	ExprAST *Arg = ParseExpression();
				988	if (!Arg) return 0;
				989	Args.push_back(Arg);
				990
				991	if (CurTok == ')') break;
				992
				993	if (CurTok != ',')
				994	return Error("Expected ')'");
				995	getNextToken();
				996	}
				997
				998	// Eat the ')'.
				999	getNextToken();
				1000
				1001	return new CallExprAST(IdName, Args);
				1002	}
				1003
				1004	/// numberexpr ::= number
				1005	static ExprAST *ParseNumberExpr() {
				1006	ExprAST *Result = new NumberExprAST(NumVal);
				1007	getNextToken(); // consume the number
				1008	return Result;
				1009	}
				1010
				1011	/// parenexpr ::= '(' expression ')'
				1012	static ExprAST *ParseParenExpr() {
				1013	getNextToken(); // eat (.
				1014	ExprAST *V = ParseExpression();
				1015	if (!V) return 0;
				1016
				1017	if (CurTok != ')')
				1018	return Error("expected ')'");
				1019	getNextToken(); // eat ).
				1020	return V;
				1021	}
				1022
				1023	/// primary
				1024	/// ::= identifierexpr
				1025	/// ::= numberexpr
				1026	/// ::= parenexpr
				1027	static ExprAST *ParsePrimary() {
				1028	switch (CurTok) {
				1029	default: return Error("unknown token when expecting an expression");
				1030	case tok_identifier: return ParseIdentifierExpr();
				1031	case tok_number: return ParseNumberExpr();
				1032	case '(': return ParseParenExpr();
				1033	}
				1034	}
				1035
				1036	/// binoprhs
				1037	/// ::= ('+' primary)*
				1038	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				1039	// If this is a binop, find its precedence.
				1040	while (1) {
				1041	int TokPrec = GetTokPrecedence();
				1042
				1043	// If this is a binop that binds at least as tightly as the current binop,
				1044	// consume it, otherwise we are done.
				1045	if (TokPrec < ExprPrec)
				1046	return LHS;
				1047
				1048	// Okay, we know this is a binop.
				1049	int BinOp = CurTok;
				1050	getNextToken(); // eat binop
				1051
				1052	// Parse the primary expression after the binary operator.
				1053	ExprAST *RHS = ParsePrimary();
				1054	if (!RHS) return 0;
				1055
				1056	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1057	// the pending operator take RHS as its LHS.
				1058	int NextPrec = GetTokPrecedence();
				1059	if (TokPrec < NextPrec) {
				1060	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1061	if (RHS == 0) return 0;
				1062	}
				1063
				1064	// Merge LHS/RHS.
				1065	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1066	}
				1067	}
				1068
				1069	/// expression
				1070	/// ::= primary binoprhs
				1071	///
				1072	static ExprAST *ParseExpression() {
				1073	ExprAST *LHS = ParsePrimary();
				1074	if (!LHS) return 0;
				1075
				1076	return ParseBinOpRHS(0, LHS);
				1077	}
				1078
				1079	/// prototype
				1080	/// ::= id '(' id* ')'
				1081	static PrototypeAST *ParsePrototype() {
				1082	if (CurTok != tok_identifier)
				1083	return ErrorP("Expected function name in prototype");
				1084
				1085	std::string FnName = IdentifierStr;
				1086	getNextToken();
				1087
				1088	if (CurTok != '(')
				1089	return ErrorP("Expected '(' in prototype");
				1090
				1091	std::vector<std::string> ArgNames;
				1092	while (getNextToken() == tok_identifier)
				1093	ArgNames.push_back(IdentifierStr);
				1094	if (CurTok != ')')
				1095	return ErrorP("Expected ')' in prototype");
				1096
				1097	// success.
				1098	getNextToken(); // eat ')'.
				1099
				1100	return new PrototypeAST(FnName, ArgNames);
				1101	}
				1102
				1103	/// definition ::= 'def' prototype expression
				1104	static FunctionAST *ParseDefinition() {
				1105	getNextToken(); // eat def.
				1106	PrototypeAST *Proto = ParsePrototype();
				1107	if (Proto == 0) return 0;
				1108
				1109	if (ExprAST *E = ParseExpression())
				1110	return new FunctionAST(Proto, E);
				1111	return 0;
				1112	}
				1113
				1114	/// toplevelexpr ::= expression
				1115	static FunctionAST *ParseTopLevelExpr() {
				1116	if (ExprAST *E = ParseExpression()) {
				1117	// Make an anonymous proto.
				1118	PrototypeAST *Proto = new PrototypeAST("", std::vector<());
				1119	return new FunctionAST(Proto, E);
				1120	}
				1121	return 0;
				1122	}
				1123
				1124	/// external ::= 'extern' prototype
				1125	static PrototypeAST *ParseExtern() {
				1126	getNextToken(); // eat extern.
				1127	return ParsePrototype();
				1128	}
				1129
				1130	//===----------------------------------------------------------------------===//
				1131	// Top-Level parsing
				1132	//===----------------------------------------------------------------------===//
				1133
				1134	static void HandleDefinition() {
				1135	if (FunctionAST *F = ParseDefinition()) {
				1136	fprintf(stderr, "Parsed a function definition.\n");
				1137	} else {
				1138	// Skip token for error recovery.
				1139	getNextToken();
				1140	}
				1141	}
				1142
				1143	static void HandleExtern() {
				1144	if (PrototypeAST *P = ParseExtern()) {
				1145	fprintf(stderr, "Parsed an extern\n");
				1146	} else {
				1147	// Skip token for error recovery.
				1148	getNextToken();
				1149	}
				1150	}
				1151
				1152	static void HandleTopLevelExpression() {
				1153	// Evaluate a top level expression into an anonymous function.
				1154	if (FunctionAST *F = ParseTopLevelExpr()) {
				1155	fprintf(stderr, "Parsed a top-level expr\n");
				1156	} else {
				1157	// Skip token for error recovery.
				1158	getNextToken();
				1159	}
				1160	}
				1161
				1162	/// top ::= definition \| external \| expression \| ';'
				1163	static void MainLoop() {
				1164	while (1) {
				1165	fprintf(stderr, "ready> ");
				1166	switch (CurTok) {
				1167	case tok_eof: return;
				1168	case ';': getNextToken(); break; // ignore top level semicolons.
				1169	case tok_def: HandleDefinition(); break;
				1170	case tok_extern: HandleExtern(); break;
				1171	default: HandleTopLevelExpression(); break;
				1172	}
				1173	}
				1174	}
				1175
				1176	//===----------------------------------------------------------------------===//
				1177	// Main driver code.
				1178	//===----------------------------------------------------------------------===//
				1179
				1180	int main() {
				1181	// Install standard binary operators.
				1182	// 1 is lowest precedence.
				1183	BinopPrecedence['<'] = 10;
				1184	BinopPrecedence['+'] = 20;
				1185	BinopPrecedence['-'] = 20;
				1186	BinopPrecedence['*'] = 40; // highest.
				1187
				1188	// Prime the first token.
				1189	fprintf(stderr, "ready> ");
				1190	getNextToken();
				1191
				1192	MainLoop();
				1193	return 0;
				1194	}
				1195	</pre>
				1196	</div>
				1197	</div>
				1198
				1199	<!-- *********************************************************************** -->
				1200	<hr>
				1201	<address>
				1202	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				1203	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				1204	<a href="http://validator.w3.org/check/referer"><img
Chris Lattner	8eef4b2	2007-10-23 06:30:50 +0000	[diff] [blame]	1205	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1206
				1207	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				1208	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				1209	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				1210	</address>
				1211	</body>
				1212	</html>