Blame - docs/tutorial/LangImpl2.html - fp2-dev/platform/external/llvm

blob: 7eed8d644b29e8e72abc8a7977a4e6f0b529c7c0 [file] [log] [blame]

Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Implementing a Parser and AST</title>
				7	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				8	<meta name="author" content="Chris Lattner">
				9	<link rel="stylesheet" href="../llvm.css" type="text/css">
				10	</head>
				11
				12	<body>
				13
				14	<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
				15
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	16	<ul>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame^]	17	<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	18	<li>Chapter 2
				19	<ol>
				20	<li><a href="#intro">Chapter 2 Introduction</a></li>
				21	<li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
				22	<li><a href="#parserbasics">Parser Basics</a></li>
				23	<li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
				24	<li><a href="#parserbinops">Binary Expression Parsing</a></li>
				25	<li><a href="#parsertop">Parsing the Rest</a></li>
				26	<li><a href="#driver">The Driver</a></li>
				27	<li><a href="#conclusions">Conclusions</a></li>
				28	<li><a href="#code">Full Code Listing</a></li>
				29	</ol>
				30	</li>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame^]	31	<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	32	</ul>
				33
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	34	<div class="doc_author">
				35	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				36	</div>
				37
				38	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	39	<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	40	<!-- *********************************************************************** -->
				41
				42	<div class="doc_text">
				43
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	44	<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
				45	with LLVM</a>" tutorial. This chapter shows you how to use the <a
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	46	href="LangImpl1.html">Lexer built in Chapter 1</a> to build a full <a
				47	href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
				48	our Kaleidoscope language and build an <a
				49	href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
				50	Tree</a> (AST).</p>
				51
				52	<p>The parser we will build uses a combination of <a
				53	href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
				54	Parsing</a> and <a href=
				55	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				56	Parsing</a> to parse the Kaleidoscope language (the later for binary expression
				57	and the former for everything else). Before we get to parsing though, lets talk
				58	about the output of the parser: the Abstract Syntax Tree.</p>
				59
				60	</div>
				61
				62	<!-- *********************************************************************** -->
				63	<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
				64	<!-- *********************************************************************** -->
				65
				66	<div class="doc_text">
				67
				68	<p>The AST for a program captures its behavior in a way that it is easy for
				69	later stages of the compiler (e.g. code generation) to interpret. We basically
				70	want one object for each construct in the language, and the AST should closely
				71	model the language. In Kaleidoscope, we have expressions, a prototype, and a
				72	function object. We'll start with expressions first:</p>
				73
				74	<div class="doc_code">
				75	<pre>
				76	/// ExprAST - Base class for all expression nodes.
				77	class ExprAST {
				78	public:
				79	virtual ~ExprAST() {}
				80	};
				81
				82	/// NumberExprAST - Expression class for numeric literals like "1.0".
				83	class NumberExprAST : public ExprAST {
				84	double Val;
				85	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	86	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	87	};
				88	</pre>
				89	</div>
				90
				91	<p>The code above shows the definition of the base ExprAST class and one
				92	subclass which we use for numeric literals. The important thing about this is
				93	that the NumberExprAST class captures the numeric value of the literal in the
				94	class, so that later phases of the compiler can know what it is.</p>
				95
				96	<p>Right now we only create the AST, so there are no useful accessor methods on
				97	them. It would be very easy to add a virtual method to pretty print the code,
				98	for example. Here are the other expression AST node definitions that we'll use
				99	in the basic form of the Kaleidoscope language.
				100	</p>
				101
				102	<div class="doc_code">
				103	<pre>
				104	/// VariableExprAST - Expression class for referencing a variable, like "a".
				105	class VariableExprAST : public ExprAST {
				106	std::string Name;
				107	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	108	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	109	};
				110
				111	/// BinaryExprAST - Expression class for a binary operator.
				112	class BinaryExprAST : public ExprAST {
				113	char Op;
				114	ExprAST LHS, RHS;
				115	public:
				116	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				117	: Op(op), LHS(lhs), RHS(rhs) {}
				118	};
				119
				120	/// CallExprAST - Expression class for function calls.
				121	class CallExprAST : public ExprAST {
				122	std::string Callee;
				123	std::vector<ExprAST*> Args;
				124	public:
				125	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				126	: Callee(callee), Args(args) {}
				127	};
				128	</pre>
				129	</div>
				130
				131	<p>This is all (intentially) rather straight-forward: variables capture the
				132	variable name, binary operators capture their opcode (e.g. '+'), and calls
				133	capture a function name and list of argument expressions. One thing that is
				134	nice about our AST is that it captures the language features without talking
				135	about the syntax of the language. Note that there is no discussion about
				136	precedence of binary operators, lexical structure etc.</p>
				137
				138	<p>For our basic language, these are all of the expression nodes we'll define.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	139	Because it doesn't have conditional control flow, it isn't Turing-complete;
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	140	we'll fix that in a later installment. The two things we need next are a way
				141	to talk about the interface to a function, and a way to talk about functions
				142	themselves:</p>
				143
				144	<div class="doc_code">
				145	<pre>
				146	/// PrototypeAST - This class represents the "prototype" for a function,
				147	/// which captures its argument names as well as if it is an operator.
				148	class PrototypeAST {
				149	std::string Name;
				150	std::vector<std::string> Args;
				151	public:
				152	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				153	: Name(name), Args(args) {}
				154	};
				155
				156	/// FunctionAST - This class represents a function definition itself.
				157	class FunctionAST {
				158	PrototypeAST *Proto;
				159	ExprAST *Body;
				160	public:
				161	FunctionAST(PrototypeAST proto, ExprAST body)
				162	: Proto(proto), Body(body) {}
				163	};
				164	</pre>
				165	</div>
				166
				167	<p>In Kaleidoscope, functions are typed with just a count of their arguments.
				168	Since all values are double precision floating point, this fact doesn't need to
				169	be captured anywhere. In a more aggressive and realistic language, the
				170	"ExprAST" class would probably have a type field.</p>
				171
				172	<p>With this scaffolding, we can now talk about parsing expressions and function
				173	bodies in Kaleidoscope.</p>
				174
				175	</div>
				176
				177	<!-- *********************************************************************** -->
				178	<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
				179	<!-- *********************************************************************** -->
				180
				181	<div class="doc_text">
				182
				183	<p>Now that we have an AST to build, we need to define the parser code to build
				184	it. The idea here is that we want to parse something like "x+y" (which is
				185	returned as three tokens by the lexer) into an AST that could be generated with
				186	calls like this:</p>
				187
				188	<div class="doc_code">
				189	<pre>
				190	ExprAST *X = new VariableExprAST("x");
				191	ExprAST *Y = new VariableExprAST("y");
				192	ExprAST *Result = new BinaryExprAST('+', X, Y);
				193	</pre>
				194	</div>
				195
				196	<p>In order to do this, we'll start by defining some basic helper routines:</p>
				197
				198	<div class="doc_code">
				199	<pre>
				200	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				201	/// token the parser it looking at. getNextToken reads another token from the
				202	/// lexer and updates CurTok with its results.
				203	static int CurTok;
				204	static int getNextToken() {
				205	return CurTok = gettok();
				206	}
				207	</pre>
				208	</div>
				209
				210	<p>
				211	This implements a simple token buffer around the lexer. This allows
				212	us to look one token ahead at what the lexer is returning. Every function in
Chris Lattner	e949512	2007-10-25 18:05:29 +0000	[diff] [blame]	213	our parser will assume that CurTok is the current token that needs to be
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	214	parsed.</p>
				215
Chris Lattner	bee175b	2007-10-22 16:44:31 +0000	[diff] [blame]	216	<p>Again, we define these with global variables; it would be better design to
				217	wrap the entire parser in a class and use instance variables for these.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	218	</p>
				219
				220	<div class="doc_code">
				221	<pre>
				222
				223	/// Error* - These are little helper functions for error handling.
				224	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				225	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				226	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				227	</pre>
				228	</div>
				229
				230	<p>
				231	The <tt>Error</tt> routines are simple helper routines that our parser will use
				232	to handle errors. The error recovery in our parser will not be the best and
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	233	is not particular user-friendly, but it will be enough for our tutorial. These
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	234	routines make it easier to handle errors in routines that have various return
				235	types: they always return null.</p>
				236
				237	<p>With these basic helper functions implemented, we can implement the first
				238	piece of our grammar: we'll start with numeric literals.</p>
				239
				240	</div>
				241
				242	<!-- *********************************************************************** -->
				243	<div class="doc_section"><a name="parserprimexprs">Basic Expression
				244	Parsing</a></div>
				245	<!-- *********************************************************************** -->
				246
				247	<div class="doc_text">
				248
				249	<p>We start with numeric literals, because they are the simplest to process.
				250	For each production in our grammar, we'll define a function which parses that
				251	production. For numeric literals, we have:
				252	</p>
				253
				254	<div class="doc_code">
				255	<pre>
				256	/// numberexpr ::= number
				257	static ExprAST *ParseNumberExpr() {
				258	ExprAST *Result = new NumberExprAST(NumVal);
				259	getNextToken(); // consume the number
				260	return Result;
				261	}
				262	</pre>
				263	</div>
				264
				265	<p>This routine is very simple: it expects to be called when the current token
				266	is a <tt>tok_number</tt> token. It takes the current number value, creates
				267	a <tt>NumberExprAST</tt> node, advances the lexer to the next token, then
				268	returns.</p>
				269
				270	<p>There are some interesting aspects of this. The most important one is that
				271	this routine eats all of the tokens that correspond to the production, and
				272	returns the lexer buffer with the next token (which is not part of the grammar
				273	production) ready to go. This is a fairly standard way to go for recursive
				274	descent parsers. For a better example, the parenthesis operator is defined like
				275	this:</p>
				276
				277	<div class="doc_code">
				278	<pre>
				279	/// parenexpr ::= '(' expression ')'
				280	static ExprAST *ParseParenExpr() {
				281	getNextToken(); // eat (.
				282	ExprAST *V = ParseExpression();
				283	if (!V) return 0;
				284
				285	if (CurTok != ')')
				286	return Error("expected ')'");
				287	getNextToken(); // eat ).
				288	return V;
				289	}
				290	</pre>
				291	</div>
				292
				293	<p>This function illustrates a number of interesting things about the parser:
				294	1) it shows how we use the Error routines. When called, this function expects
				295	that the current token is a '(' token, but after parsing the subexpression, it
				296	is possible that there is not a ')' waiting. For example, if the user types in
				297	"(4 x" instead of "(4)", the parser should emit an error. Because errors can
				298	occur, the parser needs a way to indicate that they happened: in our parser, we
				299	return null on an error.</p>
				300
				301	<p>Another interesting aspect of this function is that it uses recursion by
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	302	calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	303	<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
				304	recursive grammars, and keeps each production very simple. Note that
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	305	parentheses do not cause construction of AST nodes themselves. While we could
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	306	do this, the most important role of parens are to guide the parser and provide
				307	grouping. Once the parser constructs the AST, parens are not needed.</p>
				308
				309	<p>The next simple production is for handling variable references and function
				310	calls:</p>
				311
				312	<div class="doc_code">
				313	<pre>
				314	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	315	/// ::= identifier
				316	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	317	static ExprAST *ParseIdentifierExpr() {
				318	std::string IdName = IdentifierStr;
				319
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	320	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	321
				322	if (CurTok != '(') // Simple variable ref.
				323	return new VariableExprAST(IdName);
				324
				325	// Call.
				326	getNextToken(); // eat (
				327	std::vector<ExprAST*> Args;
				328	while (1) {
				329	ExprAST *Arg = ParseExpression();
				330	if (!Arg) return 0;
				331	Args.push_back(Arg);
				332
				333	if (CurTok == ')') break;
				334
				335	if (CurTok != ',')
				336	return Error("Expected ')'");
				337	getNextToken();
				338	}
				339
				340	// Eat the ')'.
				341	getNextToken();
				342
				343	return new CallExprAST(IdName, Args);
				344	}
				345	</pre>
				346	</div>
				347
				348	<p>This routine follows the same style as the other routines (it expects to be
				349	called if the current token is a <tt>tok_identifier</tt> token). It also has
				350	recursion and error handling. One interesting aspect of this is that it uses
				351	<em>look-ahead</em> to determine if the current identifier is a stand alone
				352	variable reference or if it is a function call expression. It handles this by
				353	checking to see if the token after the identifier is a '(' token, and constructs
				354	either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
				355	</p>
				356
				357	<p>Now that we have all of our simple expression parsing logic in place, we can
				358	define a helper function to wrap them up in a class. We call this class of
				359	expressions "primary" expressions, for reasons that will become more clear
				360	later. In order to parse a primary expression, we need to determine what sort
				361	of expression it is:</p>
				362
				363	<div class="doc_code">
				364	<pre>
				365	/// primary
				366	/// ::= identifierexpr
				367	/// ::= numberexpr
				368	/// ::= parenexpr
				369	static ExprAST *ParsePrimary() {
				370	switch (CurTok) {
				371	default: return Error("unknown token when expecting an expression");
				372	case tok_identifier: return ParseIdentifierExpr();
				373	case tok_number: return ParseNumberExpr();
				374	case '(': return ParseParenExpr();
				375	}
				376	}
				377	</pre>
				378	</div>
				379
				380	<p>Now that you see the definition of this function, it makes it more obvious
				381	why we can assume the state of CurTok in the various functions. This uses
				382	look-ahead to determine which sort of expression is being inspected, and parses
				383	it with a function call.</p>
				384
				385	<p>Now that basic expressions are handled, we need to handle binary expressions,
				386	which are a bit more complex.</p>
				387
				388	</div>
				389
				390	<!-- *********************************************************************** -->
				391	<div class="doc_section"><a name="parserbinops">Binary Expression
				392	Parsing</a></div>
				393	<!-- *********************************************************************** -->
				394
				395	<div class="doc_text">
				396
				397	<p>Binary expressions are significantly harder to parse because they are often
				398	ambiguous. For example, when given the string "x+y*z", the parser can choose
				399	to parse it as either "(x+y)z" or "x+(yz)". With common definitions from
				400	mathematics, we expect the later parse, because "*" (multiplication) has
				401	higher <em>precedence</em> than "+" (addition).</p>
				402
				403	<p>There are many ways to handle this, but an elegant and efficient way is to
				404	use <a href=
				405	"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
				406	Parsing</a>. This parsing technique uses the precedence of binary operators to
				407	guide recursion. To start with, we need a table of precedences:</p>
				408
				409	<div class="doc_code">
				410	<pre>
				411	/// BinopPrecedence - This holds the precedence for each binary operator that is
				412	/// defined.
				413	static std::map<char, int> BinopPrecedence;
				414
				415	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				416	static int GetTokPrecedence() {
				417	if (!isascii(CurTok))
				418	return -1;
				419
				420	// Make sure it's a declared binop.
				421	int TokPrec = BinopPrecedence[CurTok];
				422	if (TokPrec <= 0) return -1;
				423	return TokPrec;
				424	}
				425
				426	int main() {
				427	// Install standard binary operators.
				428	// 1 is lowest precedence.
				429	BinopPrecedence['<'] = 10;
				430	BinopPrecedence['+'] = 20;
				431	BinopPrecedence['-'] = 20;
				432	BinopPrecedence['*'] = 40; // highest.
				433	...
				434	}
				435	</pre>
				436	</div>
				437
				438	<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
				439	(this can obviously be extended by you, the reader). The
				440	<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
				441	or -1 if the token is not a binary operator. Having a map makes it easy to add
				442	new operators and makes it clear that the algorithm doesn't depend on the
				443	specific operators involved, but it would be easy enough to eliminate the map
				444	and do the comparisons in the <tt>GetTokPrecedence</tt> function.</p>
				445
				446	<p>With the helper above defined, we can now start parsing binary expressions.
				447	The basic idea of operator precedence parsing is to break down an expression
				448	with potentially ambiguous binary operators into pieces. Consider for example
				449	the expression "a+b+(c+d)ef+g". Operator precedence parsing considers this
				450	as a stream of primary expressions separated by binary operators. As such,
				451	it will first parse the leading primary expression "a", then it will see the
				452	pairs [+, b] [+, (c+d)] [, e] [, f] and [+, g]. Note that because parentheses
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	453	are primary expressions, the binary expression parser doesn't need to worry
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	454	about nested subexpressions like (c+d) at all.
				455	</p>
				456
				457	<p>
				458	To start, an expression is a primary expression potentially followed by a
				459	sequence of [binop,primaryexpr] pairs:</p>
				460
				461	<div class="doc_code">
				462	<pre>
				463	/// expression
				464	/// ::= primary binoprhs
				465	///
				466	static ExprAST *ParseExpression() {
				467	ExprAST *LHS = ParsePrimary();
				468	if (!LHS) return 0;
				469
				470	return ParseBinOpRHS(0, LHS);
				471	}
				472	</pre>
				473	</div>
				474
				475	<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
				476	us. It takes a precedence and a pointer to an expression for the part parsed
				477	so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
				478	allowed to be empty, in which case it returns the expression that is passed into
				479	it. In our example above, the code passes the expression for "a" into
				480	<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
				481
				482	<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
				483	minimal operator precedence</em> that the function is allowed to eat. For
				484	example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
				485	passed in a precedence of 40, it will not consume any tokens (because the
				486	precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
				487	with:</p>
				488
				489	<div class="doc_code">
				490	<pre>
				491	/// binoprhs
				492	/// ::= ('+' primary)*
				493	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				494	// If this is a binop, find its precedence.
				495	while (1) {
				496	int TokPrec = GetTokPrecedence();
				497
				498	// If this is a binop that binds at least as tightly as the current binop,
				499	// consume it, otherwise we are done.
				500	if (TokPrec < ExprPrec)
				501	return LHS;
				502	</pre>
				503	</div>
				504
				505	<p>This code gets the precedence of the current token and checks to see if if is
				506	too low. Because we defined invalid tokens to have a precedence of -1, this
				507	check implicitly knows that the pair-stream ends when the token stream runs out
				508	of binary operators. If this check succeeds, we know that the token is a binary
				509	operator and that it will be included in this expression:</p>
				510
				511	<div class="doc_code">
				512	<pre>
				513	// Okay, we know this is a binop.
				514	int BinOp = CurTok;
				515	getNextToken(); // eat binop
				516
				517	// Parse the primary expression after the binary operator.
				518	ExprAST *RHS = ParsePrimary();
				519	if (!RHS) return 0;
				520	</pre>
				521	</div>
				522
				523	<p>As such, this code eats (and remembers) the binary operator and then parses
				524	the following primary expression. This builds up the whole pair, the first of
				525	which is [+, b] for the running example.</p>
				526
				527	<p>Now that we parsed the left-hand side of an expression and one pair of the
				528	RHS sequence, we have to decide which way the expression associates. In
				529	particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
				530	To determine this, we look ahead at "binop" to determine its precedence and
				531	compare it to BinOp's precedence (which is '+' in this case):</p>
				532
				533	<div class="doc_code">
				534	<pre>
				535	// If BinOp binds less tightly with RHS than the operator after RHS, let
				536	// the pending operator take RHS as its LHS.
				537	int NextPrec = GetTokPrecedence();
				538	if (TokPrec < NextPrec) {
				539	</pre>
				540	</div>
				541
				542	<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
				543	precedence of our current operator, then we know that the parentheses associate
				544	as "(a+b) binop ...". In our example, since the next operator is "+" and so is
				545	our current one, we know that they have the same precedence. In this case we'll
				546	create the AST node for "a+b", and then continue parsing:</p>
				547
				548	<div class="doc_code">
				549	<pre>
				550	... if body omitted ...
				551	}
				552
				553	// Merge LHS/RHS.
				554	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				555	} // loop around to the top of the while loop.
				556	}
				557	</pre>
				558	</div>
				559
				560	<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
				561	iteration of the loop, with "+" as the current token. The code above will eat
				562	and remember it and parse "(c+d)" as the primary expression, which makes the
				563	current pair be [+, (c+d)]. It will then enter the 'if' above with "*" as the
				564	binop to the right of the primary. In this case, the precedence of "*" is
				565	higher than the precedence of "+" so the if condition will be entered.</p>
				566
				567	<p>The critical question left here is "how can the if condition parse the right
				568	hand side in full"? In particular, to build the AST correctly for our example,
				569	it needs to get all of "(c+d)ef" as the RHS expression variable. The code to
				570	do this is surprisingly simple (code from the above two blocks duplicated for
				571	context):</p>
				572
				573	<div class="doc_code">
				574	<pre>
				575	// If BinOp binds less tightly with RHS than the operator after RHS, let
				576	// the pending operator take RHS as its LHS.
				577	int NextPrec = GetTokPrecedence();
				578	if (TokPrec < NextPrec) {
				579	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				580	if (RHS == 0) return 0;
				581	}
				582	// Merge LHS/RHS.
				583	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				584	} // loop around to the top of the while loop.
				585	}
				586	</pre>
				587	</div>
				588
				589	<p>At this point, we know that the binary operator to the RHS of our primary
				590	has higher precedence than the binop we are currently parsing. As such, we know
				591	that any sequence of pairs whose operators are all higher precedence than "+"
				592	should be parsed together and returned as "RHS". To do this, we recursively
				593	invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
				594	precedence required for it to continue. In our example above, this will cause
				595	it to return the AST node for "(c+d)ef" as RHS, which is then set as the RHS
				596	of the '+' expression.</p>
				597
				598	<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed.
				599	and added to the AST. With this little bit of code (14 non-trivial lines), we
				600	correctly handle fully general binary expression parsing in a very elegant way.
				601	</p>
				602
				603	<p>This wraps up handling of expressions. At this point, we can point the
				604	parser at an arbitrary token stream and build an expression from them, stopping
				605	at the first token that is not part of the expression. Next up we need to
				606	handle function definitions etc.</p>
				607
				608	</div>
				609
				610	<!-- *********************************************************************** -->
				611	<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
				612	<!-- *********************************************************************** -->
				613
				614	<div class="doc_text">
				615
				616	<p>
				617	The first basic thing missing is that of function prototypes. In Kaleidoscope,
				618	these are used both for 'extern' function declarations as well as function body
				619	definitions. The code to do this is straight-forward and not very interesting
				620	(once you've survived expressions):
				621	</p>
				622
				623	<div class="doc_code">
				624	<pre>
				625	/// prototype
				626	/// ::= id '(' id* ')'
				627	static PrototypeAST *ParsePrototype() {
				628	if (CurTok != tok_identifier)
				629	return ErrorP("Expected function name in prototype");
				630
				631	std::string FnName = IdentifierStr;
				632	getNextToken();
				633
				634	if (CurTok != '(')
				635	return ErrorP("Expected '(' in prototype");
				636
				637	std::vector<std::string> ArgNames;
				638	while (getNextToken() == tok_identifier)
				639	ArgNames.push_back(IdentifierStr);
				640	if (CurTok != ')')
				641	return ErrorP("Expected ')' in prototype");
				642
				643	// success.
				644	getNextToken(); // eat ')'.
				645
				646	return new PrototypeAST(FnName, ArgNames);
				647	}
				648	</pre>
				649	</div>
				650
				651	<p>Given this, a function definition is very simple, just a prototype plus
Duncan Sands	72261ff	2007-11-05 16:04:58 +0000	[diff] [blame]	652	an expression to implement the body:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	653
				654	<div class="doc_code">
				655	<pre>
				656	/// definition ::= 'def' prototype expression
				657	static FunctionAST *ParseDefinition() {
				658	getNextToken(); // eat def.
				659	PrototypeAST *Proto = ParsePrototype();
				660	if (Proto == 0) return 0;
				661
				662	if (ExprAST *E = ParseExpression())
				663	return new FunctionAST(Proto, E);
				664	return 0;
				665	}
				666	</pre>
				667	</div>
				668
				669	<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
				670	well as to support forward declaration of user functions. 'externs' are just
				671	prototypes with no body:</p>
				672
				673	<div class="doc_code">
				674	<pre>
				675	/// external ::= 'extern' prototype
				676	static PrototypeAST *ParseExtern() {
				677	getNextToken(); // eat extern.
				678	return ParsePrototype();
				679	}
				680	</pre>
				681	</div>
				682
				683	<p>Finally, we'll also let the user type in arbitrary top-level expressions and
				684	evaluate them on the fly. We will handle this by defining anonymous nullary
				685	(zero argument) functions for them:</p>
				686
				687	<div class="doc_code">
				688	<pre>
				689	/// toplevelexpr ::= expression
				690	static FunctionAST *ParseTopLevelExpr() {
				691	if (ExprAST *E = ParseExpression()) {
				692	// Make an anonymous proto.
				693	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				694	return new FunctionAST(Proto, E);
				695	}
				696	return 0;
				697	}
				698	</pre>
				699	</div>
				700
				701	<p>Now that we have all the pieces, lets build a little driver that will let us
				702	actually <em>execute</em> this code we've built!</p>
				703
				704	</div>
				705
				706	<!-- *********************************************************************** -->
				707	<div class="doc_section"><a name="driver">The Driver</a></div>
				708	<!-- *********************************************************************** -->
				709
				710	<div class="doc_text">
				711
				712	<p>The driver for this simply invokes all of the parsing pieces with a top-level
				713	dispatch loop. There isn't much interesting here, so I'll just include the
				714	top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
				715	Parsing" section.</p>
				716
				717	<div class="doc_code">
				718	<pre>
				719	/// top ::= definition \| external \| expression \| ';'
				720	static void MainLoop() {
				721	while (1) {
				722	fprintf(stderr, "ready> ");
				723	switch (CurTok) {
				724	case tok_eof: return;
				725	case ';': getNextToken(); break; // ignore top level semicolons.
				726	case tok_def: HandleDefinition(); break;
				727	case tok_extern: HandleExtern(); break;
				728	default: HandleTopLevelExpression(); break;
				729	}
				730	}
				731	}
				732	</pre>
				733	</div>
				734
				735	<p>The most interesting part of this is that we ignore top-level semi colons.
Owen Anderson	c6311b9	2007-10-22 06:48:28 +0000	[diff] [blame]	736	Why is this, you ask? The basic reason is that if you type "4 + 5" at the
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	737	command line, the parser doesn't know that that is the end of what you will
				738	type. For example, on the next line you could type "def foo..." in which case
				739	4+5 is the end of a top-level expression. Alternatively you could type "* 6",
				740	which would continue the expression. Having top-level semicolons allows you to
				741	type "4+5;" and the parser will know you are done.</p>
				742
				743	</div>
				744
				745	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	746	<div class="doc_section"><a name="conclusions">Conclusions</a></div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	747	<!-- *********************************************************************** -->
				748
				749	<div class="doc_text">
				750
				751	<p>With just under 400 lines of commented code, we fully defined our minimal
				752	language, including a lexer, parser and AST builder. With this done, the
				753	executable will validate code and tell us if it is gramatically invalid. For
				754	example, here is a sample interaction:</p>
				755
				756	<div class="doc_code">
				757	<pre>
				758	$ ./a.out
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	759	ready> def foo(x y) x+foo(y, 4.0);
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	760	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	761	ready> def foo(x y) x+y y;
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	762	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	763	ready> Parsed a top-level expr
				764	ready> def foo(x y) x+y );
Chris Lattner	01fcc04	2007-11-05 17:38:34 +0000	[diff] [blame]	765	ready> Parsed a function definition.
Chris Lattner	35abbf5	2007-10-23 06:23:57 +0000	[diff] [blame]	766	ready> Error: unknown token when expecting an expression
				767	ready> extern sin(a);
				768	ready> Parsed an extern
				769	ready> ^D
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	770	$
				771	</pre>
				772	</div>
				773
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	774	<p>There is a lot of room for extension here. You can define new AST nodes,
				775	extend the language in many ways, etc. In the <a href="LangImpl3.html">next
				776	installment</a>, we will describe how to generate LLVM IR from the AST.</p>
				777
				778	</div>
				779
				780	<!-- *********************************************************************** -->
				781	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				782	<!-- *********************************************************************** -->
				783
				784	<div class="doc_text">
				785
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	786	<p>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	787	Here is the complete code listing for this and the previous chapter.
				788	Note that it is fully self-contained: you don't need LLVM or any external
				789	libraries at all for this (other than the C and C++ standard libraries of
				790	course). To build this, just compile with:</p>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	791
				792	<div class="doc_code">
				793	<pre>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	794	# Compile
				795	g++ -g toy.cpp
				796	# Run
				797	./a.out
				798	</pre>
				799	</div>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	800
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	801	<p>Here is the code:</p>
				802
				803	<div class="doc_code">
				804	<pre>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	805	#include <cstdio>
				806	#include <string>
Chris Lattner	d93a584	2007-10-23 05:43:01 +0000	[diff] [blame]	807	#include <map>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	808	#include <vector>
				809
				810	//===----------------------------------------------------------------------===//
				811	// Lexer
				812	//===----------------------------------------------------------------------===//
				813
				814	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				815	// of these for known things.
				816	enum Token {
				817	tok_eof = -1,
				818
				819	// commands
				820	tok_def = -2, tok_extern = -3,
				821
				822	// primary
				823	tok_identifier = -4, tok_number = -5,
				824	};
				825
				826	static std::string IdentifierStr; // Filled in if tok_identifier
				827	static double NumVal; // Filled in if tok_number
				828
				829	/// gettok - Return the next token from standard input.
				830	static int gettok() {
				831	static int LastChar = ' ';
				832
				833	// Skip any whitespace.
				834	while (isspace(LastChar))
				835	LastChar = getchar();
				836
				837	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				838	IdentifierStr = LastChar;
				839	while (isalnum((LastChar = getchar())))
				840	IdentifierStr += LastChar;
				841
				842	if (IdentifierStr == "def") return tok_def;
				843	if (IdentifierStr == "extern") return tok_extern;
				844	return tok_identifier;
				845	}
				846
				847	if (isdigit(LastChar) \|\| LastChar == '.') { // Number: [0-9.]+
				848	std::string NumStr;
				849	do {
				850	NumStr += LastChar;
				851	LastChar = getchar();
				852	} while (isdigit(LastChar) \|\| LastChar == '.');
				853
				854	NumVal = strtod(NumStr.c_str(), 0);
				855	return tok_number;
				856	}
				857
				858	if (LastChar == '#') {
				859	// Comment until end of line.
				860	do LastChar = getchar();
				861	while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
				862
				863	if (LastChar != EOF)
				864	return gettok();
				865	}
				866
				867	// Check for end of file. Don't eat the EOF.
				868	if (LastChar == EOF)
				869	return tok_eof;
				870
				871	// Otherwise, just return the character as its ascii value.
				872	int ThisChar = LastChar;
				873	LastChar = getchar();
				874	return ThisChar;
				875	}
				876
				877	//===----------------------------------------------------------------------===//
				878	// Abstract Syntax Tree (aka Parse Tree)
				879	//===----------------------------------------------------------------------===//
				880
				881	/// ExprAST - Base class for all expression nodes.
				882	class ExprAST {
				883	public:
				884	virtual ~ExprAST() {}
				885	};
				886
				887	/// NumberExprAST - Expression class for numeric literals like "1.0".
				888	class NumberExprAST : public ExprAST {
				889	double Val;
				890	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	891	explicit NumberExprAST(double val) : Val(val) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	892	};
				893
				894	/// VariableExprAST - Expression class for referencing a variable, like "a".
				895	class VariableExprAST : public ExprAST {
				896	std::string Name;
				897	public:
Chris Lattner	28571ed	2007-10-23 04:27:44 +0000	[diff] [blame]	898	explicit VariableExprAST(const std::string &name) : Name(name) {}
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	899	};
				900
				901	/// BinaryExprAST - Expression class for a binary operator.
				902	class BinaryExprAST : public ExprAST {
				903	char Op;
				904	ExprAST LHS, RHS;
				905	public:
				906	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				907	: Op(op), LHS(lhs), RHS(rhs) {}
				908	};
				909
				910	/// CallExprAST - Expression class for function calls.
				911	class CallExprAST : public ExprAST {
				912	std::string Callee;
				913	std::vector<ExprAST*> Args;
				914	public:
				915	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				916	: Callee(callee), Args(args) {}
				917	};
				918
				919	/// PrototypeAST - This class represents the "prototype" for a function,
				920	/// which captures its argument names as well as if it is an operator.
				921	class PrototypeAST {
				922	std::string Name;
				923	std::vector< Args;
				924	public:
				925	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				926	: Name(name), Args(args) {}
				927
				928	};
				929
				930	/// FunctionAST - This class represents a function definition itself.
				931	class FunctionAST {
				932	PrototypeAST *Proto;
				933	ExprAST *Body;
				934	public:
				935	FunctionAST(PrototypeAST proto, ExprAST body)
				936	: Proto(proto), Body(body) {}
				937
				938	};
				939
				940	//===----------------------------------------------------------------------===//
				941	// Parser
				942	//===----------------------------------------------------------------------===//
				943
				944	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				945	/// token the parser it looking at. getNextToken reads another token from the
				946	/// lexer and updates CurTok with its results.
				947	static int CurTok;
				948	static int getNextToken() {
				949	return CurTok = gettok();
				950	}
				951
				952	/// BinopPrecedence - This holds the precedence for each binary operator that is
				953	/// defined.
				954	static std::map<char, int> BinopPrecedence;
				955
				956	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				957	static int GetTokPrecedence() {
				958	if (!isascii(CurTok))
				959	return -1;
				960
				961	// Make sure it's a declared binop.
				962	int TokPrec = BinopPrecedence[CurTok];
				963	if (TokPrec <= 0) return -1;
				964	return TokPrec;
				965	}
				966
				967	/// Error* - These are little helper functions for error handling.
				968	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				969	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				970	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				971
				972	static ExprAST *ParseExpression();
				973
				974	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	975	/// ::= identifier
				976	/// ::= identifier '(' expression* ')'
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	977	static ExprAST *ParseIdentifierExpr() {
				978	std::string IdName = IdentifierStr;
				979
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	980	getNextToken(); // eat identifier.
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	981
				982	if (CurTok != '(') // Simple variable ref.
				983	return new VariableExprAST(IdName);
				984
				985	// Call.
				986	getNextToken(); // eat (
				987	std::vector<ExprAST*> Args;
				988	while (1) {
				989	ExprAST *Arg = ParseExpression();
				990	if (!Arg) return 0;
				991	Args.push_back(Arg);
				992
				993	if (CurTok == ')') break;
				994
				995	if (CurTok != ',')
				996	return Error("Expected ')'");
				997	getNextToken();
				998	}
				999
				1000	// Eat the ')'.
				1001	getNextToken();
				1002
				1003	return new CallExprAST(IdName, Args);
				1004	}
				1005
				1006	/// numberexpr ::= number
				1007	static ExprAST *ParseNumberExpr() {
				1008	ExprAST *Result = new NumberExprAST(NumVal);
				1009	getNextToken(); // consume the number
				1010	return Result;
				1011	}
				1012
				1013	/// parenexpr ::= '(' expression ')'
				1014	static ExprAST *ParseParenExpr() {
				1015	getNextToken(); // eat (.
				1016	ExprAST *V = ParseExpression();
				1017	if (!V) return 0;
				1018
				1019	if (CurTok != ')')
				1020	return Error("expected ')'");
				1021	getNextToken(); // eat ).
				1022	return V;
				1023	}
				1024
				1025	/// primary
				1026	/// ::= identifierexpr
				1027	/// ::= numberexpr
				1028	/// ::= parenexpr
				1029	static ExprAST *ParsePrimary() {
				1030	switch (CurTok) {
				1031	default: return Error("unknown token when expecting an expression");
				1032	case tok_identifier: return ParseIdentifierExpr();
				1033	case tok_number: return ParseNumberExpr();
				1034	case '(': return ParseParenExpr();
				1035	}
				1036	}
				1037
				1038	/// binoprhs
				1039	/// ::= ('+' primary)*
				1040	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				1041	// If this is a binop, find its precedence.
				1042	while (1) {
				1043	int TokPrec = GetTokPrecedence();
				1044
				1045	// If this is a binop that binds at least as tightly as the current binop,
				1046	// consume it, otherwise we are done.
				1047	if (TokPrec < ExprPrec)
				1048	return LHS;
				1049
				1050	// Okay, we know this is a binop.
				1051	int BinOp = CurTok;
				1052	getNextToken(); // eat binop
				1053
				1054	// Parse the primary expression after the binary operator.
				1055	ExprAST *RHS = ParsePrimary();
				1056	if (!RHS) return 0;
				1057
				1058	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1059	// the pending operator take RHS as its LHS.
				1060	int NextPrec = GetTokPrecedence();
				1061	if (TokPrec < NextPrec) {
				1062	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1063	if (RHS == 0) return 0;
				1064	}
				1065
				1066	// Merge LHS/RHS.
				1067	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1068	}
				1069	}
				1070
				1071	/// expression
				1072	/// ::= primary binoprhs
				1073	///
				1074	static ExprAST *ParseExpression() {
				1075	ExprAST *LHS = ParsePrimary();
				1076	if (!LHS) return 0;
				1077
				1078	return ParseBinOpRHS(0, LHS);
				1079	}
				1080
				1081	/// prototype
				1082	/// ::= id '(' id* ')'
				1083	static PrototypeAST *ParsePrototype() {
				1084	if (CurTok != tok_identifier)
				1085	return ErrorP("Expected function name in prototype");
				1086
				1087	std::string FnName = IdentifierStr;
				1088	getNextToken();
				1089
				1090	if (CurTok != '(')
				1091	return ErrorP("Expected '(' in prototype");
				1092
				1093	std::vector<std::string> ArgNames;
				1094	while (getNextToken() == tok_identifier)
				1095	ArgNames.push_back(IdentifierStr);
				1096	if (CurTok != ')')
				1097	return ErrorP("Expected ')' in prototype");
				1098
				1099	// success.
				1100	getNextToken(); // eat ')'.
				1101
				1102	return new PrototypeAST(FnName, ArgNames);
				1103	}
				1104
				1105	/// definition ::= 'def' prototype expression
				1106	static FunctionAST *ParseDefinition() {
				1107	getNextToken(); // eat def.
				1108	PrototypeAST *Proto = ParsePrototype();
				1109	if (Proto == 0) return 0;
				1110
				1111	if (ExprAST *E = ParseExpression())
				1112	return new FunctionAST(Proto, E);
				1113	return 0;
				1114	}
				1115
				1116	/// toplevelexpr ::= expression
				1117	static FunctionAST *ParseTopLevelExpr() {
				1118	if (ExprAST *E = ParseExpression()) {
				1119	// Make an anonymous proto.
				1120	PrototypeAST *Proto = new PrototypeAST("", std::vector<());
				1121	return new FunctionAST(Proto, E);
				1122	}
				1123	return 0;
				1124	}
				1125
				1126	/// external ::= 'extern' prototype
				1127	static PrototypeAST *ParseExtern() {
				1128	getNextToken(); // eat extern.
				1129	return ParsePrototype();
				1130	}
				1131
				1132	//===----------------------------------------------------------------------===//
				1133	// Top-Level parsing
				1134	//===----------------------------------------------------------------------===//
				1135
				1136	static void HandleDefinition() {
				1137	if (FunctionAST *F = ParseDefinition()) {
				1138	fprintf(stderr, "Parsed a function definition.\n");
				1139	} else {
				1140	// Skip token for error recovery.
				1141	getNextToken();
				1142	}
				1143	}
				1144
				1145	static void HandleExtern() {
				1146	if (PrototypeAST *P = ParseExtern()) {
				1147	fprintf(stderr, "Parsed an extern\n");
				1148	} else {
				1149	// Skip token for error recovery.
				1150	getNextToken();
				1151	}
				1152	}
				1153
				1154	static void HandleTopLevelExpression() {
				1155	// Evaluate a top level expression into an anonymous function.
				1156	if (FunctionAST *F = ParseTopLevelExpr()) {
				1157	fprintf(stderr, "Parsed a top-level expr\n");
				1158	} else {
				1159	// Skip token for error recovery.
				1160	getNextToken();
				1161	}
				1162	}
				1163
				1164	/// top ::= definition \| external \| expression \| ';'
				1165	static void MainLoop() {
				1166	while (1) {
				1167	fprintf(stderr, "ready> ");
				1168	switch (CurTok) {
				1169	case tok_eof: return;
				1170	case ';': getNextToken(); break; // ignore top level semicolons.
				1171	case tok_def: HandleDefinition(); break;
				1172	case tok_extern: HandleExtern(); break;
				1173	default: HandleTopLevelExpression(); break;
				1174	}
				1175	}
				1176	}
				1177
				1178	//===----------------------------------------------------------------------===//
				1179	// Main driver code.
				1180	//===----------------------------------------------------------------------===//
				1181
				1182	int main() {
				1183	// Install standard binary operators.
				1184	// 1 is lowest precedence.
				1185	BinopPrecedence['<'] = 10;
				1186	BinopPrecedence['+'] = 20;
				1187	BinopPrecedence['-'] = 20;
				1188	BinopPrecedence['*'] = 40; // highest.
				1189
				1190	// Prime the first token.
				1191	fprintf(stderr, "ready> ");
				1192	getNextToken();
				1193
				1194	MainLoop();
				1195	return 0;
				1196	}
				1197	</pre>
				1198	</div>
				1199	</div>
				1200
				1201	<!-- *********************************************************************** -->
				1202	<hr>
				1203	<address>
				1204	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				1205	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				1206	<a href="http://validator.w3.org/check/referer"><img
Chris Lattner	8eef4b2	2007-10-23 06:30:50 +0000	[diff] [blame]	1207	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
Chris Lattner	e6c9104	2007-10-22 06:34:15 +0000	[diff] [blame]	1208
				1209	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				1210	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				1211	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				1212	</address>
				1213	</body>
				1214	</html>