Blame - docs/tutorial/LangImpl2.html - fp2-dev/platform/external/llvm

blob: b8b6112884163ef50fa8471cbaf8877c9ad824b0 [file] [log] [blame]

Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Implementing a Parser and AST</title>
				7	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				8	<meta name="author" content="Chris Lattner">
				9	<link rel="stylesheet" href="../llvm.css" type="text/css">
				10	</head>
				11
				12	<body>
				13
				14	<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
				15
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	16	<ul>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	17	<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	18	<li>Chapter 2
				19	<ol>
				20	<li><a href="#intro">Chapter 2 Introduction</a></li>
				21	<li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
				22	<li><a href="#parserbasics">Parser Basics</a></li>
				23	<li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
				24	<li><a href="#parserbinops">Binary Expression Parsing</a></li>
				25	<li><a href="#parsertop">Parsing the Rest</a></li>
				26	<li><a href="#driver">The Driver</a></li>
				27	<li><a href="#conclusions">Conclusions</a></li>
				28	<li><a href="#code">Full Code Listing</a></li>
				29	</ol>
				30	</li>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	31	<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	32	</ul>
				33
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	34	<div class="doc_author">
				35	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				36	</div>
				37
				38	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	39	<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	40	<!-- *********************************************************************** -->
				41
				42	<div class="doc_text">
				43
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	44	<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
				45	with LLVM</a>" tutorial. This chapter shows you how to use the <a
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	46	href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
				47	href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
				48	our Kaleidoscope language and build an <a
				49	href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
				50	Tree</a> (AST).</p>
				51
				52	<p>The parser we will build uses a combination of <a
				53	href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
				54	Parsing</a> and <a href=
				55	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				56	Parsing</a> to parse the Kaleidoscope language (the later for binary expression
				57	and the former for everything else). Before we get to parsing though, lets talk
				58	about the output of the parser: the Abstract Syntax Tree.</p>
				59
				60	</div>
				61
				62	<!-- *********************************************************************** -->
				63	<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
				64	<!-- *********************************************************************** -->
				65
				66	<div class="doc_text">
				67
				68	<p>The AST for a program captures its behavior in a way that it is easy for
				69	later stages of the compiler (e.g. code generation) to interpret. We basically
				70	want one object for each construct in the language, and the AST should closely
				71	model the language. In Kaleidoscope, we have expressions, a prototype, and a
				72	function object. We'll start with expressions first:</p>
				73
				74	<div class="doc_code">
				75	<pre>
				76	/// ExprAST - Base class for all expression nodes.
				77	class ExprAST {
				78	public:
				79	virtual ~ExprAST() {}
				80	};
				81
				82	/// NumberExprAST - Expression class for numeric literals like "1.0".
				83	class NumberExprAST : public ExprAST {
				84	double Val;
				85	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	86	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	87	};
				88	</pre>
				89	</div>
				90
				91	<p>The code above shows the definition of the base ExprAST class and one
				92	subclass which we use for numeric literals. The important thing about this is
				93	that the NumberExprAST class captures the numeric value of the literal in the
				94	class, so that later phases of the compiler can know what it is.</p>
				95
				96	<p>Right now we only create the AST, so there are no useful accessor methods on
				97	them. It would be very easy to add a virtual method to pretty print the code,
				98	for example. Here are the other expression AST node definitions that we'll use
				99	in the basic form of the Kaleidoscope language.
				100	</p>
				101
				102	<div class="doc_code">
				103	<pre>
				104	/// VariableExprAST - Expression class for referencing a variable, like "a".
				105	class VariableExprAST : public ExprAST {
				106	std::string Name;
				107	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	108	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	109	};
				110
				111	/// BinaryExprAST - Expression class for a binary operator.
				112	class BinaryExprAST : public ExprAST {
				113	char Op;
				114	ExprAST LHS, RHS;
				115	public:
				116	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				117	: Op(op), LHS(lhs), RHS(rhs) {}
				118	};
				119
				120	/// CallExprAST - Expression class for function calls.
				121	class CallExprAST : public ExprAST {
				122	std::string Callee;
				123	std::vector<ExprAST*> Args;
				124	public:
				125	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				126	: Callee(callee), Args(args) {}
				127	};
				128	</pre>
				129	</div>
				130
				131	<p>This is all (intentially) rather straight-forward: variables capture the
				132	variable name, binary operators capture their opcode (e.g. '+'), and calls
				133	capture a function name and list of argument expressions. One thing that is
				134	nice about our AST is that it captures the language features without talking
				135	about the syntax of the language. Note that there is no discussion about
				136	precedence of binary operators, lexical structure etc.</p>
				137
				138	<p>For our basic language, these are all of the expression nodes we'll define.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	139	Because it doesn't have conditional control flow, it isn't Turing-complete;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	140	we'll fix that in a later installment. The two things we need next are a way
				141	to talk about the interface to a function, and a way to talk about functions
				142	themselves:</p>
				143
				144	<div class="doc_code">
				145	<pre>
				146	/// PrototypeAST - This class represents the "prototype" for a function,
				147	/// which captures its argument names as well as if it is an operator.
				148	class PrototypeAST {
				149	std::string Name;
				150	std::vector<std::string> Args;
				151	public:
				152	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				153	: Name(name), Args(args) {}
				154	};
				155
				156	/// FunctionAST - This class represents a function definition itself.
				157	class FunctionAST {
				158	PrototypeAST *Proto;
				159	ExprAST *Body;
				160	public:
				161	FunctionAST(PrototypeAST proto, ExprAST body)
				162	: Proto(proto), Body(body) {}
				163	};
				164	</pre>
				165	</div>
				166
				167	<p>In Kaleidoscope, functions are typed with just a count of their arguments.
				168	Since all values are double precision floating point, this fact doesn't need to
				169	be captured anywhere. In a more aggressive and realistic language, the
				170	"ExprAST" class would probably have a type field.</p>
				171
				172	<p>With this scaffolding, we can now talk about parsing expressions and function
				173	bodies in Kaleidoscope.</p>
				174
				175	</div>
				176
				177	<!-- *********************************************************************** -->
				178	<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
				179	<!-- *********************************************************************** -->
				180
				181	<div class="doc_text">
				182
				183	<p>Now that we have an AST to build, we need to define the parser code to build
				184	it. The idea here is that we want to parse something like "x+y" (which is
				185	returned as three tokens by the lexer) into an AST that could be generated with
				186	calls like this:</p>
				187
				188	<div class="doc_code">
				189	<pre>
				190	ExprAST *X = new VariableExprAST("x");
				191	ExprAST *Y = new VariableExprAST("y");
				192	ExprAST *Result = new BinaryExprAST('+', X, Y);
				193	</pre>
				194	</div>
				195
				196	<p>In order to do this, we'll start by defining some basic helper routines:</p>
				197
				198	<div class="doc_code">
				199	<pre>
				200	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				201	/// token the parser it looking at. getNextToken reads another token from the
				202	/// lexer and updates CurTok with its results.
				203	static int CurTok;
				204	static int getNextToken() {
				205	return CurTok = gettok();
				206	}
				207	</pre>
				208	</div>
				209
				210	<p>
				211	This implements a simple token buffer around the lexer. This allows
				212	us to look one token ahead at what the lexer is returning. Every function in
Chris Lattner	e949512	2007-10-25 18:05:29 +0000	[diff] [blame]	213	our parser will assume that CurTok is the current token that needs to be
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	214	parsed.</p>
				215
Chris Lattner	bee175b	2007-10-22 16:44:31 +0000	[diff] [blame]	216	<p>Again, we define these with global variables; it would be better design to
				217	wrap the entire parser in a class and use instance variables for these.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	218	</p>
				219
				220	<div class="doc_code">
				221	<pre>
				222
				223	/// Error* - These are little helper functions for error handling.
				224	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				225	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				226	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				227	</pre>
				228	</div>
				229
				230	<p>
				231	The <tt>Error</tt> routines are simple helper routines that our parser will use
				232	to handle errors. The error recovery in our parser will not be the best and
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	233	is not particular user-friendly, but it will be enough for our tutorial. These
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	234	routines make it easier to handle errors in routines that have various return
				235	types: they always return null.</p>
				236
				237	<p>With these basic helper functions implemented, we can implement the first
				238	piece of our grammar: we'll start with numeric literals.</p>
				239
				240	</div>
				241
				242	<!-- *********************************************************************** -->
				243	<div class="doc_section"><a name="parserprimexprs">Basic Expression
				244	Parsing</a></div>
				245	<!-- *********************************************************************** -->
				246
				247	<div class="doc_text">
				248
				249	<p>We start with numeric literals, because they are the simplest to process.
				250	For each production in our grammar, we'll define a function which parses that
				251	production. For numeric literals, we have:
				252	</p>
				253
				254	<div class="doc_code">
				255	<pre>
				256	/// numberexpr ::= number
				257	static ExprAST *ParseNumberExpr() {
				258	ExprAST *Result = new NumberExprAST(NumVal);
				259	getNextToken(); // consume the number
				260	return Result;
				261	}
				262	</pre>
				263	</div>
				264
				265	<p>This routine is very simple: it expects to be called when the current token
				266	is a <tt>tok_number</tt> token. It takes the current number value, creates
				267	a <tt>NumberExprAST</tt> node, advances the lexer to the next token, then
				268	returns.</p>
				269
				270	<p>There are some interesting aspects of this. The most important one is that
				271	this routine eats all of the tokens that correspond to the production, and
				272	returns the lexer buffer with the next token (which is not part of the grammar
				273	production) ready to go. This is a fairly standard way to go for recursive
				274	descent parsers. For a better example, the parenthesis operator is defined like
				275	this:</p>
				276
				277	<div class="doc_code">
				278	<pre>
				279	/// parenexpr ::= '(' expression ')'
				280	static ExprAST *ParseParenExpr() {
				281	getNextToken(); // eat (.
				282	ExprAST *V = ParseExpression();
				283	if (!V) return 0;
				284
				285	if (CurTok != ')')
				286	return Error("expected ')'");
				287	getNextToken(); // eat ).
				288	return V;
				289	}
				290	</pre>
				291	</div>
				292
				293	<p>This function illustrates a number of interesting things about the parser:
				294	1) it shows how we use the Error routines. When called, this function expects
				295	that the current token is a '(' token, but after parsing the subexpression, it
				296	is possible that there is not a ')' waiting. For example, if the user types in
				297	"(4 x" instead of "(4)", the parser should emit an error. Because errors can
				298	occur, the parser needs a way to indicate that they happened: in our parser, we
				299	return null on an error.</p>
				300
				301	<p>Another interesting aspect of this function is that it uses recursion by
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	302	calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	303	<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
				304	recursive grammars, and keeps each production very simple. Note that
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	305	parentheses do not cause construction of AST nodes themselves. While we could
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	306	do this, the most important role of parens are to guide the parser and provide
				307	grouping. Once the parser constructs the AST, parens are not needed.</p>
				308
				309	<p>The next simple production is for handling variable references and function
				310	calls:</p>
				311
				312	<div class="doc_code">
				313	<pre>
				314	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	315	/// ::= identifier
				316	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	317	static ExprAST *ParseIdentifierExpr() {
				318	std::string IdName = IdentifierStr;
				319
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	320	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	321
				322	if (CurTok != '(') // Simple variable ref.
				323	return new VariableExprAST(IdName);
				324
				325	// Call.
				326	getNextToken(); // eat (
				327	std::vector<ExprAST*> Args;
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	328	if (CurTok != ')') {
				329	while (1) {
				330	ExprAST *Arg = ParseExpression();
				331	if (!Arg) return 0;
				332	Args.push_back(Arg);
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	333
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	334	if (CurTok == ')') break;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	335
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	336	if (CurTok != ',')
				337	return Error("Expected ')'");
				338	getNextToken();
				339	}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	340	}
				341
				342	// Eat the ')'.
				343	getNextToken();
				344
				345	return new CallExprAST(IdName, Args);
				346	}
				347	</pre>
				348	</div>
				349
				350	<p>This routine follows the same style as the other routines (it expects to be
				351	called if the current token is a <tt>tok_identifier</tt> token). It also has
				352	recursion and error handling. One interesting aspect of this is that it uses
				353	<em>look-ahead</em> to determine if the current identifier is a stand alone
				354	variable reference or if it is a function call expression. It handles this by
				355	checking to see if the token after the identifier is a '(' token, and constructs
				356	either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
				357	</p>
				358
				359	<p>Now that we have all of our simple expression parsing logic in place, we can
				360	define a helper function to wrap them up in a class. We call this class of
				361	expressions "primary" expressions, for reasons that will become more clear
				362	later. In order to parse a primary expression, we need to determine what sort
				363	of expression it is:</p>
				364
				365	<div class="doc_code">
				366	<pre>
				367	/// primary
				368	/// ::= identifierexpr
				369	/// ::= numberexpr
				370	/// ::= parenexpr
				371	static ExprAST *ParsePrimary() {
				372	switch (CurTok) {
				373	default: return Error("unknown token when expecting an expression");
				374	case tok_identifier: return ParseIdentifierExpr();
				375	case tok_number: return ParseNumberExpr();
				376	case '(': return ParseParenExpr();
				377	}
				378	}
				379	</pre>
				380	</div>
				381
				382	<p>Now that you see the definition of this function, it makes it more obvious
				383	why we can assume the state of CurTok in the various functions. This uses
				384	look-ahead to determine which sort of expression is being inspected, and parses
				385	it with a function call.</p>
				386
				387	<p>Now that basic expressions are handled, we need to handle binary expressions,
				388	which are a bit more complex.</p>
				389
				390	</div>
				391
				392	<!-- *********************************************************************** -->
				393	<div class="doc_section"><a name="parserbinops">Binary Expression
				394	Parsing</a></div>
				395	<!-- *********************************************************************** -->
				396
				397	<div class="doc_text">
				398
				399	<p>Binary expressions are significantly harder to parse because they are often
				400	ambiguous. For example, when given the string "x+y*z", the parser can choose
				401	to parse it as either "(x+y)z" or "x+(yz)". With common definitions from
				402	mathematics, we expect the later parse, because "*" (multiplication) has
				403	higher <em>precedence</em> than "+" (addition).</p>
				404
				405	<p>There are many ways to handle this, but an elegant and efficient way is to
				406	use <a href=
				407	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				408	Parsing</a>. This parsing technique uses the precedence of binary operators to
				409	guide recursion. To start with, we need a table of precedences:</p>
				410
				411	<div class="doc_code">
				412	<pre>
				413	/// BinopPrecedence - This holds the precedence for each binary operator that is
				414	/// defined.
				415	static std::map<char, int> BinopPrecedence;
				416
				417	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				418	static int GetTokPrecedence() {
				419	if (!isascii(CurTok))
				420	return -1;
				421
				422	// Make sure it's a declared binop.
				423	int TokPrec = BinopPrecedence[CurTok];
				424	if (TokPrec <= 0) return -1;
				425	return TokPrec;
				426	}
				427
				428	int main() {
				429	// Install standard binary operators.
				430	// 1 is lowest precedence.
				431	BinopPrecedence['<'] = 10;
				432	BinopPrecedence['+'] = 20;
				433	BinopPrecedence['-'] = 20;
				434	BinopPrecedence['*'] = 40; // highest.
				435	...
				436	}
				437	</pre>
				438	</div>
				439
				440	<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
				441	(this can obviously be extended by you, the reader). The
				442	<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
				443	or -1 if the token is not a binary operator. Having a map makes it easy to add
				444	new operators and makes it clear that the algorithm doesn't depend on the
				445	specific operators involved, but it would be easy enough to eliminate the map
				446	and do the comparisons in the <tt>GetTokPrecedence</tt> function.</p>
				447
				448	<p>With the helper above defined, we can now start parsing binary expressions.
				449	The basic idea of operator precedence parsing is to break down an expression
				450	with potentially ambiguous binary operators into pieces. Consider for example
				451	the expression "a+b+(c+d)ef+g". Operator precedence parsing considers this
				452	as a stream of primary expressions separated by binary operators. As such,
				453	it will first parse the leading primary expression "a", then it will see the
				454	pairs [+, b] [+, (c+d)] [, e] [, f] and [+, g]. Note that because parentheses
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	455	are primary expressions, the binary expression parser doesn't need to worry
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	456	about nested subexpressions like (c+d) at all.
				457	</p>
				458
				459	<p>
				460	To start, an expression is a primary expression potentially followed by a
				461	sequence of [binop,primaryexpr] pairs:</p>
				462
				463	<div class="doc_code">
				464	<pre>
				465	/// expression
				466	/// ::= primary binoprhs
				467	///
				468	static ExprAST *ParseExpression() {
				469	ExprAST *LHS = ParsePrimary();
				470	if (!LHS) return 0;
				471
				472	return ParseBinOpRHS(0, LHS);
				473	}
				474	</pre>
				475	</div>
				476
				477	<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
				478	us. It takes a precedence and a pointer to an expression for the part parsed
				479	so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
				480	allowed to be empty, in which case it returns the expression that is passed into
				481	it. In our example above, the code passes the expression for "a" into
				482	<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
				483
				484	<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
				485	minimal operator precedence</em> that the function is allowed to eat. For
				486	example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
				487	passed in a precedence of 40, it will not consume any tokens (because the
				488	precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
				489	with:</p>
				490
				491	<div class="doc_code">
				492	<pre>
				493	/// binoprhs
				494	/// ::= ('+' primary)*
				495	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				496	// If this is a binop, find its precedence.
				497	while (1) {
				498	int TokPrec = GetTokPrecedence();
				499
				500	// If this is a binop that binds at least as tightly as the current binop,
				501	// consume it, otherwise we are done.
				502	if (TokPrec < ExprPrec)
				503	return LHS;
				504	</pre>
				505	</div>
				506
				507	<p>This code gets the precedence of the current token and checks to see if if is
				508	too low. Because we defined invalid tokens to have a precedence of -1, this
				509	check implicitly knows that the pair-stream ends when the token stream runs out
				510	of binary operators. If this check succeeds, we know that the token is a binary
				511	operator and that it will be included in this expression:</p>
				512
				513	<div class="doc_code">
				514	<pre>
				515	// Okay, we know this is a binop.
				516	int BinOp = CurTok;
				517	getNextToken(); // eat binop
				518
				519	// Parse the primary expression after the binary operator.
				520	ExprAST *RHS = ParsePrimary();
				521	if (!RHS) return 0;
				522	</pre>
				523	</div>
				524
				525	<p>As such, this code eats (and remembers) the binary operator and then parses
				526	the following primary expression. This builds up the whole pair, the first of
				527	which is [+, b] for the running example.</p>
				528
				529	<p>Now that we parsed the left-hand side of an expression and one pair of the
				530	RHS sequence, we have to decide which way the expression associates. In
				531	particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
				532	To determine this, we look ahead at "binop" to determine its precedence and
				533	compare it to BinOp's precedence (which is '+' in this case):</p>
				534
				535	<div class="doc_code">
				536	<pre>
				537	// If BinOp binds less tightly with RHS than the operator after RHS, let
				538	// the pending operator take RHS as its LHS.
				539	int NextPrec = GetTokPrecedence();
				540	if (TokPrec < NextPrec) {
				541	</pre>
				542	</div>
				543
				544	<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
				545	precedence of our current operator, then we know that the parentheses associate
				546	as "(a+b) binop ...". In our example, since the next operator is "+" and so is
				547	our current one, we know that they have the same precedence. In this case we'll
				548	create the AST node for "a+b", and then continue parsing:</p>
				549
				550	<div class="doc_code">
				551	<pre>
				552	... if body omitted ...
				553	}
				554
				555	// Merge LHS/RHS.
				556	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				557	} // loop around to the top of the while loop.
				558	}
				559	</pre>
				560	</div>
				561
				562	<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
				563	iteration of the loop, with "+" as the current token. The code above will eat
				564	and remember it and parse "(c+d)" as the primary expression, which makes the
				565	current pair be [+, (c+d)]. It will then enter the 'if' above with "*" as the
				566	binop to the right of the primary. In this case, the precedence of "*" is
				567	higher than the precedence of "+" so the if condition will be entered.</p>
				568
				569	<p>The critical question left here is "how can the if condition parse the right
				570	hand side in full"? In particular, to build the AST correctly for our example,
				571	it needs to get all of "(c+d)ef" as the RHS expression variable. The code to
				572	do this is surprisingly simple (code from the above two blocks duplicated for
				573	context):</p>
				574
				575	<div class="doc_code">
				576	<pre>
				577	// If BinOp binds less tightly with RHS than the operator after RHS, let
				578	// the pending operator take RHS as its LHS.
				579	int NextPrec = GetTokPrecedence();
				580	if (TokPrec < NextPrec) {
				581	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				582	if (RHS == 0) return 0;
				583	}
				584	// Merge LHS/RHS.
				585	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				586	} // loop around to the top of the while loop.
				587	}
				588	</pre>
				589	</div>
				590
				591	<p>At this point, we know that the binary operator to the RHS of our primary
				592	has higher precedence than the binop we are currently parsing. As such, we know
				593	that any sequence of pairs whose operators are all higher precedence than "+"
				594	should be parsed together and returned as "RHS". To do this, we recursively
				595	invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
				596	precedence required for it to continue. In our example above, this will cause
				597	it to return the AST node for "(c+d)ef" as RHS, which is then set as the RHS
				598	of the '+' expression.</p>
				599
				600	<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed.
				601	and added to the AST. With this little bit of code (14 non-trivial lines), we
				602	correctly handle fully general binary expression parsing in a very elegant way.
				603	</p>
				604
				605	<p>This wraps up handling of expressions. At this point, we can point the
				606	parser at an arbitrary token stream and build an expression from them, stopping
				607	at the first token that is not part of the expression. Next up we need to
				608	handle function definitions etc.</p>
				609
				610	</div>
				611
				612	<!-- *********************************************************************** -->
				613	<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
				614	<!-- *********************************************************************** -->
				615
				616	<div class="doc_text">
				617
				618	<p>
				619	The first basic thing missing is that of function prototypes. In Kaleidoscope,
				620	these are used both for 'extern' function declarations as well as function body
				621	definitions. The code to do this is straight-forward and not very interesting
				622	(once you've survived expressions):
				623	</p>
				624
				625	<div class="doc_code">
				626	<pre>
				627	/// prototype
				628	/// ::= id '(' id* ')'
				629	static PrototypeAST *ParsePrototype() {
				630	if (CurTok != tok_identifier)
				631	return ErrorP("Expected function name in prototype");
				632
				633	std::string FnName = IdentifierStr;
				634	getNextToken();
				635
				636	if (CurTok != '(')
				637	return ErrorP("Expected '(' in prototype");
				638
				639	std::vector<std::string> ArgNames;
				640	while (getNextToken() == tok_identifier)
				641	ArgNames.push_back(IdentifierStr);
				642	if (CurTok != ')')
				643	return ErrorP("Expected ')' in prototype");
				644
				645	// success.
				646	getNextToken(); // eat ')'.
				647
				648	return new PrototypeAST(FnName, ArgNames);
				649	}
				650	</pre>
				651	</div>
				652
				653	<p>Given this, a function definition is very simple, just a prototype plus
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	654	an expression to implement the body:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	655
				656	<div class="doc_code">
				657	<pre>
				658	/// definition ::= 'def' prototype expression
				659	static FunctionAST *ParseDefinition() {
				660	getNextToken(); // eat def.
				661	PrototypeAST *Proto = ParsePrototype();
				662	if (Proto == 0) return 0;
				663
				664	if (ExprAST *E = ParseExpression())
				665	return new FunctionAST(Proto, E);
				666	return 0;
				667	}
				668	</pre>
				669	</div>
				670
				671	<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
				672	well as to support forward declaration of user functions. 'externs' are just
				673	prototypes with no body:</p>
				674
				675	<div class="doc_code">
				676	<pre>
				677	/// external ::= 'extern' prototype
				678	static PrototypeAST *ParseExtern() {
				679	getNextToken(); // eat extern.
				680	return ParsePrototype();
				681	}
				682	</pre>
				683	</div>
				684
				685	<p>Finally, we'll also let the user type in arbitrary top-level expressions and
				686	evaluate them on the fly. We will handle this by defining anonymous nullary
				687	(zero argument) functions for them:</p>
				688
				689	<div class="doc_code">
				690	<pre>
				691	/// toplevelexpr ::= expression
				692	static FunctionAST *ParseTopLevelExpr() {
				693	if (ExprAST *E = ParseExpression()) {
				694	// Make an anonymous proto.
				695	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				696	return new FunctionAST(Proto, E);
				697	}
				698	return 0;
				699	}
				700	</pre>
				701	</div>
				702
				703	<p>Now that we have all the pieces, lets build a little driver that will let us
				704	actually <em>execute</em> this code we've built!</p>
				705
				706	</div>
				707
				708	<!-- *********************************************************************** -->
				709	<div class="doc_section"><a name="driver">The Driver</a></div>
				710	<!-- *********************************************************************** -->
				711
				712	<div class="doc_text">
				713
				714	<p>The driver for this simply invokes all of the parsing pieces with a top-level
				715	dispatch loop. There isn't much interesting here, so I'll just include the
				716	top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
				717	Parsing" section.</p>
				718
				719	<div class="doc_code">
				720	<pre>
				721	/// top ::= definition \| external \| expression \| ';'
				722	static void MainLoop() {
				723	while (1) {
				724	fprintf(stderr, "ready> ");
				725	switch (CurTok) {
				726	case tok_eof: return;
				727	case ';': getNextToken(); break; // ignore top level semicolons.
				728	case tok_def: HandleDefinition(); break;
				729	case tok_extern: HandleExtern(); break;
				730	default: HandleTopLevelExpression(); break;
				731	}
				732	}
				733	}
				734	</pre>
				735	</div>
				736
				737	<p>The most interesting part of this is that we ignore top-level semi colons.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	738	Why is this, you ask? The basic reason is that if you type "4 + 5" at the
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	739	command line, the parser doesn't know that that is the end of what you will
				740	type. For example, on the next line you could type "def foo..." in which case
				741	4+5 is the end of a top-level expression. Alternatively you could type "* 6",
				742	which would continue the expression. Having top-level semicolons allows you to
				743	type "4+5;" and the parser will know you are done.</p>
				744
				745	</div>
				746
				747	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	748	<div class="doc_section"><a name="conclusions">Conclusions</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	749	<!-- *********************************************************************** -->
				750
				751	<div class="doc_text">
				752
				753	<p>With just under 400 lines of commented code, we fully defined our minimal
				754	language, including a lexer, parser and AST builder. With this done, the
				755	executable will validate code and tell us if it is gramatically invalid. For
				756	example, here is a sample interaction:</p>
				757
				758	<div class="doc_code">
				759	<pre>
				760	$ ./a.out
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	761	ready> def foo(x y) x+foo(y, 4.0);
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	762	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	763	ready> def foo(x y) x+y y;
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	764	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	765	ready> Parsed a top-level expr
				766	ready> def foo(x y) x+y );
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	767	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	768	ready> Error: unknown token when expecting an expression
				769	ready> extern sin(a);
				770	ready> Parsed an extern
				771	ready> ^D
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	772	$
				773	</pre>
				774	</div>
				775
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	776	<p>There is a lot of room for extension here. You can define new AST nodes,
				777	extend the language in many ways, etc. In the <a href="LangImpl3.html">next
				778	installment</a>, we will describe how to generate LLVM IR from the AST.</p>
				779
				780	</div>
				781
				782	<!-- *********************************************************************** -->
				783	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				784	<!-- *********************************************************************** -->
				785
				786	<div class="doc_text">
				787
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	788	<p>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	789	Here is the complete code listing for this and the previous chapter.
				790	Note that it is fully self-contained: you don't need LLVM or any external
				791	libraries at all for this (other than the C and C++ standard libraries of
				792	course). To build this, just compile with:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	793
				794	<div class="doc_code">
				795	<pre>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	796	# Compile
				797	g++ -g toy.cpp
				798	# Run
				799	./a.out
				800	</pre>
				801	</div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	802
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	803	<p>Here is the code:</p>
				804
				805	<div class="doc_code">
				806	<pre>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	807	#include <cstdio>
				808	#include <string>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	809	#include <map>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	810	#include <vector>
				811
				812	//===----------------------------------------------------------------------===//
				813	// Lexer
				814	//===----------------------------------------------------------------------===//
				815
				816	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				817	// of these for known things.
				818	enum Token {
				819	tok_eof = -1,
				820
				821	// commands
				822	tok_def = -2, tok_extern = -3,
				823
				824	// primary
				825	tok_identifier = -4, tok_number = -5,
				826	};
				827
				828	static std::string IdentifierStr; // Filled in if tok_identifier
				829	static double NumVal; // Filled in if tok_number
				830
				831	/// gettok - Return the next token from standard input.
				832	static int gettok() {
				833	static int LastChar = ' ';
				834
				835	// Skip any whitespace.
				836	while (isspace(LastChar))
				837	LastChar = getchar();
				838
				839	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				840	IdentifierStr = LastChar;
				841	while (isalnum((LastChar = getchar())))
				842	IdentifierStr += LastChar;
				843
				844	if (IdentifierStr == "def") return tok_def;
				845	if (IdentifierStr == "extern") return tok_extern;
				846	return tok_identifier;
				847	}
				848
				849	if (isdigit(LastChar) \|\| LastChar == '.') { // Number: [0-9.]+
				850	std::string NumStr;
				851	do {
				852	NumStr += LastChar;
				853	LastChar = getchar();
				854	} while (isdigit(LastChar) \|\| LastChar == '.');
				855
				856	NumVal = strtod(NumStr.c_str(), 0);
				857	return tok_number;
				858	}
				859
				860	if (LastChar == '#') {
				861	// Comment until end of line.
				862	do LastChar = getchar();
				863	while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
				864
				865	if (LastChar != EOF)
				866	return gettok();
				867	}
				868
				869	// Check for end of file. Don't eat the EOF.
				870	if (LastChar == EOF)
				871	return tok_eof;
				872
				873	// Otherwise, just return the character as its ascii value.
				874	int ThisChar = LastChar;
				875	LastChar = getchar();
				876	return ThisChar;
				877	}
				878
				879	//===----------------------------------------------------------------------===//
				880	// Abstract Syntax Tree (aka Parse Tree)
				881	//===----------------------------------------------------------------------===//
				882
				883	/// ExprAST - Base class for all expression nodes.
				884	class ExprAST {
				885	public:
				886	virtual ~ExprAST() {}
				887	};
				888
				889	/// NumberExprAST - Expression class for numeric literals like "1.0".
				890	class NumberExprAST : public ExprAST {
				891	double Val;
				892	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	893	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	894	};
				895
				896	/// VariableExprAST - Expression class for referencing a variable, like "a".
				897	class VariableExprAST : public ExprAST {
				898	std::string Name;
				899	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	900	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	901	};
				902
				903	/// BinaryExprAST - Expression class for a binary operator.
				904	class BinaryExprAST : public ExprAST {
				905	char Op;
				906	ExprAST LHS, RHS;
				907	public:
				908	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				909	: Op(op), LHS(lhs), RHS(rhs) {}
				910	};
				911
				912	/// CallExprAST - Expression class for function calls.
				913	class CallExprAST : public ExprAST {
				914	std::string Callee;
				915	std::vector<ExprAST*> Args;
				916	public:
				917	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				918	: Callee(callee), Args(args) {}
				919	};
				920
				921	/// PrototypeAST - This class represents the "prototype" for a function,
				922	/// which captures its argument names as well as if it is an operator.
				923	class PrototypeAST {
				924	std::string Name;
				925	std::vector< Args;
				926	public:
				927	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				928	: Name(name), Args(args) {}
				929
				930	};
				931
				932	/// FunctionAST - This class represents a function definition itself.
				933	class FunctionAST {
				934	PrototypeAST *Proto;
				935	ExprAST *Body;
				936	public:
				937	FunctionAST(PrototypeAST proto, ExprAST body)
				938	: Proto(proto), Body(body) {}
				939
				940	};
				941
				942	//===----------------------------------------------------------------------===//
				943	// Parser
				944	//===----------------------------------------------------------------------===//
				945
				946	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				947	/// token the parser it looking at. getNextToken reads another token from the
				948	/// lexer and updates CurTok with its results.
				949	static int CurTok;
				950	static int getNextToken() {
				951	return CurTok = gettok();
				952	}
				953
				954	/// BinopPrecedence - This holds the precedence for each binary operator that is
				955	/// defined.
				956	static std::map<char, int> BinopPrecedence;
				957
				958	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				959	static int GetTokPrecedence() {
				960	if (!isascii(CurTok))
				961	return -1;
				962
				963	// Make sure it's a declared binop.
				964	int TokPrec = BinopPrecedence[CurTok];
				965	if (TokPrec <= 0) return -1;
				966	return TokPrec;
				967	}
				968
				969	/// Error* - These are little helper functions for error handling.
				970	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				971	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				972	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				973
				974	static ExprAST *ParseExpression();
				975
				976	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	977	/// ::= identifier
				978	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	979	static ExprAST *ParseIdentifierExpr() {
				980	std::string IdName = IdentifierStr;
				981
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	982	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	983
				984	if (CurTok != '(') // Simple variable ref.
				985	return new VariableExprAST(IdName);
				986
				987	// Call.
				988	getNextToken(); // eat (
				989	std::vector<ExprAST*> Args;
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	990	if (CurTok != ')') {
				991	while (1) {
				992	ExprAST *Arg = ParseExpression();
				993	if (!Arg) return 0;
				994	Args.push_back(Arg);
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	995
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	996	if (CurTok == ')') break;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	997
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	998	if (CurTok != ',')
				999	return Error("Expected ')'");
				1000	getNextToken();
				1001	}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1002	}
				1003
				1004	// Eat the ')'.
				1005	getNextToken();
				1006
				1007	return new CallExprAST(IdName, Args);
				1008	}
				1009
				1010	/// numberexpr ::= number
				1011	static ExprAST *ParseNumberExpr() {
				1012	ExprAST *Result = new NumberExprAST(NumVal);
				1013	getNextToken(); // consume the number
				1014	return Result;
				1015	}
				1016
				1017	/// parenexpr ::= '(' expression ')'
				1018	static ExprAST *ParseParenExpr() {
				1019	getNextToken(); // eat (.
				1020	ExprAST *V = ParseExpression();
				1021	if (!V) return 0;
				1022
				1023	if (CurTok != ')')
				1024	return Error("expected ')'");
				1025	getNextToken(); // eat ).
				1026	return V;
				1027	}
				1028
				1029	/// primary
				1030	/// ::= identifierexpr
				1031	/// ::= numberexpr
				1032	/// ::= parenexpr
				1033	static ExprAST *ParsePrimary() {
				1034	switch (CurTok) {
				1035	default: return Error("unknown token when expecting an expression");
				1036	case tok_identifier: return ParseIdentifierExpr();
				1037	case tok_number: return ParseNumberExpr();
				1038	case '(': return ParseParenExpr();
				1039	}
				1040	}
				1041
				1042	/// binoprhs
				1043	/// ::= ('+' primary)*
				1044	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				1045	// If this is a binop, find its precedence.
				1046	while (1) {
				1047	int TokPrec = GetTokPrecedence();
				1048
				1049	// If this is a binop that binds at least as tightly as the current binop,
				1050	// consume it, otherwise we are done.
				1051	if (TokPrec < ExprPrec)
				1052	return LHS;
				1053
				1054	// Okay, we know this is a binop.
				1055	int BinOp = CurTok;
				1056	getNextToken(); // eat binop
				1057
				1058	// Parse the primary expression after the binary operator.
				1059	ExprAST *RHS = ParsePrimary();
				1060	if (!RHS) return 0;
				1061
				1062	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1063	// the pending operator take RHS as its LHS.
				1064	int NextPrec = GetTokPrecedence();
				1065	if (TokPrec < NextPrec) {
				1066	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1067	if (RHS == 0) return 0;
				1068	}
				1069
				1070	// Merge LHS/RHS.
				1071	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1072	}
				1073	}
				1074
				1075	/// expression
				1076	/// ::= primary binoprhs
				1077	///
				1078	static ExprAST *ParseExpression() {
				1079	ExprAST *LHS = ParsePrimary();
				1080	if (!LHS) return 0;
				1081
				1082	return ParseBinOpRHS(0, LHS);
				1083	}
				1084
				1085	/// prototype
				1086	/// ::= id '(' id* ')'
				1087	static PrototypeAST *ParsePrototype() {
				1088	if (CurTok != tok_identifier)
				1089	return ErrorP("Expected function name in prototype");
				1090
				1091	std::string FnName = IdentifierStr;
				1092	getNextToken();
				1093
				1094	if (CurTok != '(')
				1095	return ErrorP("Expected '(' in prototype");
				1096
				1097	std::vector<std::string> ArgNames;
				1098	while (getNextToken() == tok_identifier)
				1099	ArgNames.push_back(IdentifierStr);
				1100	if (CurTok != ')')
				1101	return ErrorP("Expected ')' in prototype");
				1102
				1103	// success.
				1104	getNextToken(); // eat ')'.
				1105
				1106	return new PrototypeAST(FnName, ArgNames);
				1107	}
				1108
				1109	/// definition ::= 'def' prototype expression
				1110	static FunctionAST *ParseDefinition() {
				1111	getNextToken(); // eat def.
				1112	PrototypeAST *Proto = ParsePrototype();
				1113	if (Proto == 0) return 0;
				1114
				1115	if (ExprAST *E = ParseExpression())
				1116	return new FunctionAST(Proto, E);
				1117	return 0;
				1118	}
				1119
				1120	/// toplevelexpr ::= expression
				1121	static FunctionAST *ParseTopLevelExpr() {
				1122	if (ExprAST *E = ParseExpression()) {
				1123	// Make an anonymous proto.
				1124	PrototypeAST *Proto = new PrototypeAST("", std::vector<());
				1125	return new FunctionAST(Proto, E);
				1126	}
				1127	return 0;
				1128	}
				1129
				1130	/// external ::= 'extern' prototype
				1131	static PrototypeAST *ParseExtern() {
				1132	getNextToken(); // eat extern.
				1133	return ParsePrototype();
				1134	}
				1135
				1136	//===----------------------------------------------------------------------===//
				1137	// Top-Level parsing
				1138	//===----------------------------------------------------------------------===//
				1139
				1140	static void HandleDefinition() {
				1141	if (FunctionAST *F = ParseDefinition()) {
				1142	fprintf(stderr, "Parsed a function definition.\n");
				1143	} else {
				1144	// Skip token for error recovery.
				1145	getNextToken();
				1146	}
				1147	}
				1148
				1149	static void HandleExtern() {
				1150	if (PrototypeAST *P = ParseExtern()) {
				1151	fprintf(stderr, "Parsed an extern\n");
				1152	} else {
				1153	// Skip token for error recovery.
				1154	getNextToken();
				1155	}
				1156	}
				1157
				1158	static void HandleTopLevelExpression() {
				1159	// Evaluate a top level expression into an anonymous function.
				1160	if (FunctionAST *F = ParseTopLevelExpr()) {
				1161	fprintf(stderr, "Parsed a top-level expr\n");
				1162	} else {
				1163	// Skip token for error recovery.
				1164	getNextToken();
				1165	}
				1166	}
				1167
				1168	/// top ::= definition \| external \| expression \| ';'
				1169	static void MainLoop() {
				1170	while (1) {
				1171	fprintf(stderr, "ready> ");
				1172	switch (CurTok) {
				1173	case tok_eof: return;
				1174	case ';': getNextToken(); break; // ignore top level semicolons.
				1175	case tok_def: HandleDefinition(); break;
				1176	case tok_extern: HandleExtern(); break;
				1177	default: HandleTopLevelExpression(); break;
				1178	}
				1179	}
				1180	}
				1181
				1182	//===----------------------------------------------------------------------===//
				1183	// Main driver code.
				1184	//===----------------------------------------------------------------------===//
				1185
				1186	int main() {
				1187	// Install standard binary operators.
				1188	// 1 is lowest precedence.
				1189	BinopPrecedence['<'] = 10;
				1190	BinopPrecedence['+'] = 20;
				1191	BinopPrecedence['-'] = 20;
				1192	BinopPrecedence['*'] = 40; // highest.
				1193
				1194	// Prime the first token.
				1195	fprintf(stderr, "ready> ");
				1196	getNextToken();
				1197
				1198	MainLoop();
				1199	return 0;
				1200	}
				1201	</pre>
				1202	</div>
				1203	</div>
				1204
				1205	<!-- *********************************************************************** -->
				1206	<hr>
				1207	<address>
				1208	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				1209	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				1210	<a href="http://validator.w3.org/check/referer"><img
Chris Lattner	8eef4b2	2007-10-23 06:30:50 +0000	[diff] [blame]	1211	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1212
				1213	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				1214	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				1215	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				1216	</address>
				1217	</body>
				1218	</html>