Blame - docs/tutorial/LangImpl2.rst - fp2-dev/platform/external/llvm

blob: 06b18ff6c23961a0f65d15addd580fe5fb3a8735 [file] [log] [blame]

Sean Silva	ee47edf	2012-12-05 00:26:32 +0000	[diff] [blame]	1	===========================================
				2	Kaleidoscope: Implementing a Parser and AST
				3	===========================================
				4
				5	.. contents::
				6	:local:
				7
Sean Silva	ee47edf	2012-12-05 00:26:32 +0000	[diff] [blame]	8	Chapter 2 Introduction
				9	======================
				10
				11	Welcome to Chapter 2 of the "`Implementing a language with
				12	LLVM <index.html>`_" tutorial. This chapter shows you how to use the
				13	lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
				14	`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
				15	language. Once we have a parser, we'll define and build an `Abstract
				16	Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
				17
				18	The parser we will build uses a combination of `Recursive Descent
				19	Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
				20	`Operator-Precedence
				21	Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
				22	parse the Kaleidoscope language (the latter for binary expressions and
				23	the former for everything else). Before we get to parsing though, lets
				24	talk about the output of the parser: the Abstract Syntax Tree.
				25
				26	The Abstract Syntax Tree (AST)
				27	==============================
				28
				29	The AST for a program captures its behavior in such a way that it is
				30	easy for later stages of the compiler (e.g. code generation) to
				31	interpret. We basically want one object for each construct in the
				32	language, and the AST should closely model the language. In
				33	Kaleidoscope, we have expressions, a prototype, and a function object.
				34	We'll start with expressions first:
				35
				36	.. code-block:: c++
				37
				38	/// ExprAST - Base class for all expression nodes.
				39	class ExprAST {
				40	public:
				41	virtual ~ExprAST() {}
				42	};
				43
				44	/// NumberExprAST - Expression class for numeric literals like "1.0".
				45	class NumberExprAST : public ExprAST {
				46	double Val;
				47	public:
				48	NumberExprAST(double val) : Val(val) {}
				49	};
				50
				51	The code above shows the definition of the base ExprAST class and one
				52	subclass which we use for numeric literals. The important thing to note
				53	about this code is that the NumberExprAST class captures the numeric
				54	value of the literal as an instance variable. This allows later phases
				55	of the compiler to know what the stored numeric value is.
				56
				57	Right now we only create the AST, so there are no useful accessor
				58	methods on them. It would be very easy to add a virtual method to pretty
				59	print the code, for example. Here are the other expression AST node
				60	definitions that we'll use in the basic form of the Kaleidoscope
				61	language:
				62
				63	.. code-block:: c++
				64
				65	/// VariableExprAST - Expression class for referencing a variable, like "a".
				66	class VariableExprAST : public ExprAST {
				67	std::string Name;
				68	public:
				69	VariableExprAST(const std::string &name) : Name(name) {}
				70	};
				71
				72	/// BinaryExprAST - Expression class for a binary operator.
				73	class BinaryExprAST : public ExprAST {
				74	char Op;
				75	ExprAST LHS, RHS;
				76	public:
				77	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				78	: Op(op), LHS(lhs), RHS(rhs) {}
				79	};
				80
				81	/// CallExprAST - Expression class for function calls.
				82	class CallExprAST : public ExprAST {
				83	std::string Callee;
				84	std::vector<ExprAST*> Args;
				85	public:
				86	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				87	: Callee(callee), Args(args) {}
				88	};
				89
				90	This is all (intentionally) rather straight-forward: variables capture
				91	the variable name, binary operators capture their opcode (e.g. '+'), and
				92	calls capture a function name as well as a list of any argument
				93	expressions. One thing that is nice about our AST is that it captures
				94	the language features without talking about the syntax of the language.
				95	Note that there is no discussion about precedence of binary operators,
				96	lexical structure, etc.
				97
				98	For our basic language, these are all of the expression nodes we'll
				99	define. Because it doesn't have conditional control flow, it isn't
				100	Turing-complete; we'll fix that in a later installment. The two things
				101	we need next are a way to talk about the interface to a function, and a
				102	way to talk about functions themselves:
				103
				104	.. code-block:: c++
				105
				106	/// PrototypeAST - This class represents the "prototype" for a function,
				107	/// which captures its name, and its argument names (thus implicitly the number
				108	/// of arguments the function takes).
				109	class PrototypeAST {
				110	std::string Name;
				111	std::vector<std::string> Args;
				112	public:
				113	PrototypeAST(const std::string &name, const std::vector<std::string> &args)
				114	: Name(name), Args(args) {}
				115	};
				116
				117	/// FunctionAST - This class represents a function definition itself.
				118	class FunctionAST {
				119	PrototypeAST *Proto;
				120	ExprAST *Body;
				121	public:
				122	FunctionAST(PrototypeAST proto, ExprAST body)
				123	: Proto(proto), Body(body) {}
				124	};
				125
				126	In Kaleidoscope, functions are typed with just a count of their
				127	arguments. Since all values are double precision floating point, the
				128	type of each argument doesn't need to be stored anywhere. In a more
				129	aggressive and realistic language, the "ExprAST" class would probably
				130	have a type field.
				131
				132	With this scaffolding, we can now talk about parsing expressions and
				133	function bodies in Kaleidoscope.
				134
				135	Parser Basics
				136	=============
				137
				138	Now that we have an AST to build, we need to define the parser code to
				139	build it. The idea here is that we want to parse something like "x+y"
				140	(which is returned as three tokens by the lexer) into an AST that could
				141	be generated with calls like this:
				142
				143	.. code-block:: c++
				144
				145	ExprAST *X = new VariableExprAST("x");
				146	ExprAST *Y = new VariableExprAST("y");
				147	ExprAST *Result = new BinaryExprAST('+', X, Y);
				148
				149	In order to do this, we'll start by defining some basic helper routines:
				150
				151	.. code-block:: c++
				152
				153	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				154	/// token the parser is looking at. getNextToken reads another token from the
				155	/// lexer and updates CurTok with its results.
				156	static int CurTok;
				157	static int getNextToken() {
				158	return CurTok = gettok();
				159	}
				160
				161	This implements a simple token buffer around the lexer. This allows us
				162	to look one token ahead at what the lexer is returning. Every function
				163	in our parser will assume that CurTok is the current token that needs to
				164	be parsed.
				165
				166	.. code-block:: c++
				167
				168
				169	/// Error* - These are little helper functions for error handling.
				170	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				171	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				172	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				173
				174	The ``Error`` routines are simple helper routines that our parser will
				175	use to handle errors. The error recovery in our parser will not be the
				176	best and is not particular user-friendly, but it will be enough for our
				177	tutorial. These routines make it easier to handle errors in routines
				178	that have various return types: they always return null.
				179
				180	With these basic helper functions, we can implement the first piece of
				181	our grammar: numeric literals.
				182
				183	Basic Expression Parsing
				184	========================
				185
				186	We start with numeric literals, because they are the simplest to
				187	process. For each production in our grammar, we'll define a function
				188	which parses that production. For numeric literals, we have:
				189
				190	.. code-block:: c++
				191
				192	/// numberexpr ::= number
				193	static ExprAST *ParseNumberExpr() {
				194	ExprAST *Result = new NumberExprAST(NumVal);
				195	getNextToken(); // consume the number
				196	return Result;
				197	}
				198
				199	This routine is very simple: it expects to be called when the current
				200	token is a ``tok_number`` token. It takes the current number value,
				201	creates a ``NumberExprAST`` node, advances the lexer to the next token,
				202	and finally returns.
				203
				204	There are some interesting aspects to this. The most important one is
				205	that this routine eats all of the tokens that correspond to the
				206	production and returns the lexer buffer with the next token (which is
				207	not part of the grammar production) ready to go. This is a fairly
				208	standard way to go for recursive descent parsers. For a better example,
				209	the parenthesis operator is defined like this:
				210
				211	.. code-block:: c++
				212
				213	/// parenexpr ::= '(' expression ')'
				214	static ExprAST *ParseParenExpr() {
				215	getNextToken(); // eat (.
				216	ExprAST *V = ParseExpression();
				217	if (!V) return 0;
				218
				219	if (CurTok != ')')
				220	return Error("expected ')'");
				221	getNextToken(); // eat ).
				222	return V;
				223	}
				224
				225	This function illustrates a number of interesting things about the
				226	parser:
				227
				228	1) It shows how we use the Error routines. When called, this function
				229	expects that the current token is a '(' token, but after parsing the
				230	subexpression, it is possible that there is no ')' waiting. For example,
				231	if the user types in "(4 x" instead of "(4)", the parser should emit an
				232	error. Because errors can occur, the parser needs a way to indicate that
				233	they happened: in our parser, we return null on an error.
				234
				235	2) Another interesting aspect of this function is that it uses recursion
				236	by calling ``ParseExpression`` (we will soon see that
				237	``ParseExpression`` can call ``ParseParenExpr``). This is powerful
				238	because it allows us to handle recursive grammars, and keeps each
				239	production very simple. Note that parentheses do not cause construction
				240	of AST nodes themselves. While we could do it this way, the most
				241	important role of parentheses are to guide the parser and provide
				242	grouping. Once the parser constructs the AST, parentheses are not
				243	needed.
				244
				245	The next simple production is for handling variable references and
				246	function calls:
				247
				248	.. code-block:: c++
				249
				250	/// identifierexpr
				251	/// ::= identifier
				252	/// ::= identifier '(' expression* ')'
				253	static ExprAST *ParseIdentifierExpr() {
				254	std::string IdName = IdentifierStr;
				255
				256	getNextToken(); // eat identifier.
				257
				258	if (CurTok != '(') // Simple variable ref.
				259	return new VariableExprAST(IdName);
				260
				261	// Call.
				262	getNextToken(); // eat (
				263	std::vector<ExprAST*> Args;
				264	if (CurTok != ')') {
				265	while (1) {
				266	ExprAST *Arg = ParseExpression();
				267	if (!Arg) return 0;
				268	Args.push_back(Arg);
				269
				270	if (CurTok == ')') break;
				271
				272	if (CurTok != ',')
				273	return Error("Expected ')' or ',' in argument list");
				274	getNextToken();
				275	}
				276	}
				277
				278	// Eat the ')'.
				279	getNextToken();
				280
				281	return new CallExprAST(IdName, Args);
				282	}
				283
				284	This routine follows the same style as the other routines. (It expects
				285	to be called if the current token is a ``tok_identifier`` token). It
				286	also has recursion and error handling. One interesting aspect of this is
				287	that it uses look-ahead to determine if the current identifier is a
				288	stand alone variable reference or if it is a function call expression.
				289	It handles this by checking to see if the token after the identifier is
				290	a '(' token, constructing either a ``VariableExprAST`` or
				291	``CallExprAST`` node as appropriate.
				292
				293	Now that we have all of our simple expression-parsing logic in place, we
				294	can define a helper function to wrap it together into one entry point.
				295	We call this class of expressions "primary" expressions, for reasons
				296	that will become more clear `later in the
				297	tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary
				298	primary expression, we need to determine what sort of expression it is:
				299
				300	.. code-block:: c++
				301
				302	/// primary
				303	/// ::= identifierexpr
				304	/// ::= numberexpr
				305	/// ::= parenexpr
				306	static ExprAST *ParsePrimary() {
				307	switch (CurTok) {
				308	default: return Error("unknown token when expecting an expression");
				309	case tok_identifier: return ParseIdentifierExpr();
				310	case tok_number: return ParseNumberExpr();
				311	case '(': return ParseParenExpr();
				312	}
				313	}
				314
				315	Now that you see the definition of this function, it is more obvious why
				316	we can assume the state of CurTok in the various functions. This uses
				317	look-ahead to determine which sort of expression is being inspected, and
				318	then parses it with a function call.
				319
				320	Now that basic expressions are handled, we need to handle binary
				321	expressions. They are a bit more complex.
				322
				323	Binary Expression Parsing
				324	=========================
				325
				326	Binary expressions are significantly harder to parse because they are
				327	often ambiguous. For example, when given the string "x+y\*z", the parser
				328	can choose to parse it as either "(x+y)\z" or "x+(y\z)". With common
				329	definitions from mathematics, we expect the later parse, because "\*"
				330	(multiplication) has higher precedence than "+" (addition).
				331
				332	There are many ways to handle this, but an elegant and efficient way is
				333	to use `Operator-Precedence
				334	Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
				335	This parsing technique uses the precedence of binary operators to guide
				336	recursion. To start with, we need a table of precedences:
				337
				338	.. code-block:: c++
				339
				340	/// BinopPrecedence - This holds the precedence for each binary operator that is
				341	/// defined.
				342	static std::map<char, int> BinopPrecedence;
				343
				344	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				345	static int GetTokPrecedence() {
				346	if (!isascii(CurTok))
				347	return -1;
				348
				349	// Make sure it's a declared binop.
				350	int TokPrec = BinopPrecedence[CurTok];
				351	if (TokPrec <= 0) return -1;
				352	return TokPrec;
				353	}
				354
				355	int main() {
				356	// Install standard binary operators.
				357	// 1 is lowest precedence.
				358	BinopPrecedence['<'] = 10;
				359	BinopPrecedence['+'] = 20;
				360	BinopPrecedence['-'] = 20;
				361	BinopPrecedence['*'] = 40; // highest.
				362	...
				363	}
				364
				365	For the basic form of Kaleidoscope, we will only support 4 binary
				366	operators (this can obviously be extended by you, our brave and intrepid
				367	reader). The ``GetTokPrecedence`` function returns the precedence for
				368	the current token, or -1 if the token is not a binary operator. Having a
				369	map makes it easy to add new operators and makes it clear that the
				370	algorithm doesn't depend on the specific operators involved, but it
				371	would be easy enough to eliminate the map and do the comparisons in the
				372	``GetTokPrecedence`` function. (Or just use a fixed-size array).
				373
				374	With the helper above defined, we can now start parsing binary
				375	expressions. The basic idea of operator precedence parsing is to break
				376	down an expression with potentially ambiguous binary operators into
				377	pieces. Consider ,for example, the expression "a+b+(c+d)\e\f+g".
				378	Operator precedence parsing considers this as a stream of primary
				379	expressions separated by binary operators. As such, it will first parse
				380	the leading primary expression "a", then it will see the pairs [+, b]
				381	[+, (c+d)] [\, e] [\, f] and [+, g]. Note that because parentheses are
				382	primary expressions, the binary expression parser doesn't need to worry
				383	about nested subexpressions like (c+d) at all.
				384
				385	To start, an expression is a primary expression potentially followed by
				386	a sequence of [binop,primaryexpr] pairs:
				387
				388	.. code-block:: c++
				389
				390	/// expression
				391	/// ::= primary binoprhs
				392	///
				393	static ExprAST *ParseExpression() {
				394	ExprAST *LHS = ParsePrimary();
				395	if (!LHS) return 0;
				396
				397	return ParseBinOpRHS(0, LHS);
				398	}
				399
				400	``ParseBinOpRHS`` is the function that parses the sequence of pairs for
				401	us. It takes a precedence and a pointer to an expression for the part
				402	that has been parsed so far. Note that "x" is a perfectly valid
				403	expression: As such, "binoprhs" is allowed to be empty, in which case it
				404	returns the expression that is passed into it. In our example above, the
				405	code passes the expression for "a" into ``ParseBinOpRHS`` and the
				406	current token is "+".
				407
				408	The precedence value passed into ``ParseBinOpRHS`` indicates the
				409	minimal operator precedence that the function is allowed to eat. For
				410	example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
				411	passed in a precedence of 40, it will not consume any tokens (because
				412	the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
				413	starts with:
				414
				415	.. code-block:: c++
				416
				417	/// binoprhs
				418	/// ::= ('+' primary)*
				419	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				420	// If this is a binop, find its precedence.
				421	while (1) {
				422	int TokPrec = GetTokPrecedence();
				423
				424	// If this is a binop that binds at least as tightly as the current binop,
				425	// consume it, otherwise we are done.
				426	if (TokPrec < ExprPrec)
				427	return LHS;
				428
				429	This code gets the precedence of the current token and checks to see if
				430	if is too low. Because we defined invalid tokens to have a precedence of
				431	-1, this check implicitly knows that the pair-stream ends when the token
				432	stream runs out of binary operators. If this check succeeds, we know
				433	that the token is a binary operator and that it will be included in this
				434	expression:
				435
				436	.. code-block:: c++
				437
				438	// Okay, we know this is a binop.
				439	int BinOp = CurTok;
				440	getNextToken(); // eat binop
				441
				442	// Parse the primary expression after the binary operator.
				443	ExprAST *RHS = ParsePrimary();
				444	if (!RHS) return 0;
				445
				446	As such, this code eats (and remembers) the binary operator and then
				447	parses the primary expression that follows. This builds up the whole
				448	pair, the first of which is [+, b] for the running example.
				449
				450	Now that we parsed the left-hand side of an expression and one pair of
				451	the RHS sequence, we have to decide which way the expression associates.
				452	In particular, we could have "(a+b) binop unparsed" or "a + (b binop
				453	unparsed)". To determine this, we look ahead at "binop" to determine its
				454	precedence and compare it to BinOp's precedence (which is '+' in this
				455	case):
				456
				457	.. code-block:: c++
				458
				459	// If BinOp binds less tightly with RHS than the operator after RHS, let
				460	// the pending operator take RHS as its LHS.
				461	int NextPrec = GetTokPrecedence();
				462	if (TokPrec < NextPrec) {
				463
				464	If the precedence of the binop to the right of "RHS" is lower or equal
				465	to the precedence of our current operator, then we know that the
				466	parentheses associate as "(a+b) binop ...". In our example, the current
				467	operator is "+" and the next operator is "+", we know that they have the
				468	same precedence. In this case we'll create the AST node for "a+b", and
				469	then continue parsing:
				470
				471	.. code-block:: c++
				472
				473	... if body omitted ...
				474	}
				475
				476	// Merge LHS/RHS.
				477	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				478	} // loop around to the top of the while loop.
				479	}
				480
				481	In our example above, this will turn "a+b+" into "(a+b)" and execute the
				482	next iteration of the loop, with "+" as the current token. The code
				483	above will eat, remember, and parse "(c+d)" as the primary expression,
				484	which makes the current pair equal to [+, (c+d)]. It will then evaluate
				485	the 'if' conditional above with "\*" as the binop to the right of the
				486	primary. In this case, the precedence of "\*" is higher than the
				487	precedence of "+" so the if condition will be entered.
				488
				489	The critical question left here is "how can the if condition parse the
				490	right hand side in full"? In particular, to build the AST correctly for
				491	our example, it needs to get all of "(c+d)\e\f" as the RHS expression
				492	variable. The code to do this is surprisingly simple (code from the
				493	above two blocks duplicated for context):
				494
				495	.. code-block:: c++
				496
				497	// If BinOp binds less tightly with RHS than the operator after RHS, let
				498	// the pending operator take RHS as its LHS.
				499	int NextPrec = GetTokPrecedence();
				500	if (TokPrec < NextPrec) {
				501	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				502	if (RHS == 0) return 0;
				503	}
				504	// Merge LHS/RHS.
				505	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				506	} // loop around to the top of the while loop.
				507	}
				508
				509	At this point, we know that the binary operator to the RHS of our
				510	primary has higher precedence than the binop we are currently parsing.
				511	As such, we know that any sequence of pairs whose operators are all
				512	higher precedence than "+" should be parsed together and returned as
				513	"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
				514	specifying "TokPrec+1" as the minimum precedence required for it to
				515	continue. In our example above, this will cause it to return the AST
				516	node for "(c+d)\e\f" as RHS, which is then set as the RHS of the '+'
				517	expression.
				518
				519	Finally, on the next iteration of the while loop, the "+g" piece is
				520	parsed and added to the AST. With this little bit of code (14
				521	non-trivial lines), we correctly handle fully general binary expression
				522	parsing in a very elegant way. This was a whirlwind tour of this code,
				523	and it is somewhat subtle. I recommend running through it with a few
				524	tough examples to see how it works.
				525
				526	This wraps up handling of expressions. At this point, we can point the
				527	parser at an arbitrary token stream and build an expression from it,
				528	stopping at the first token that is not part of the expression. Next up
				529	we need to handle function definitions, etc.
				530
				531	Parsing the Rest
				532	================
				533
				534	The next thing missing is handling of function prototypes. In
				535	Kaleidoscope, these are used both for 'extern' function declarations as
				536	well as function body definitions. The code to do this is
				537	straight-forward and not very interesting (once you've survived
				538	expressions):
				539
				540	.. code-block:: c++
				541
				542	/// prototype
				543	/// ::= id '(' id* ')'
				544	static PrototypeAST *ParsePrototype() {
				545	if (CurTok != tok_identifier)
				546	return ErrorP("Expected function name in prototype");
				547
				548	std::string FnName = IdentifierStr;
				549	getNextToken();
				550
				551	if (CurTok != '(')
				552	return ErrorP("Expected '(' in prototype");
				553
				554	// Read the list of argument names.
				555	std::vector<std::string> ArgNames;
				556	while (getNextToken() == tok_identifier)
				557	ArgNames.push_back(IdentifierStr);
				558	if (CurTok != ')')
				559	return ErrorP("Expected ')' in prototype");
				560
				561	// success.
				562	getNextToken(); // eat ')'.
				563
				564	return new PrototypeAST(FnName, ArgNames);
				565	}
				566
				567	Given this, a function definition is very simple, just a prototype plus
				568	an expression to implement the body:
				569
				570	.. code-block:: c++
				571
				572	/// definition ::= 'def' prototype expression
				573	static FunctionAST *ParseDefinition() {
				574	getNextToken(); // eat def.
				575	PrototypeAST *Proto = ParsePrototype();
				576	if (Proto == 0) return 0;
				577
				578	if (ExprAST *E = ParseExpression())
				579	return new FunctionAST(Proto, E);
				580	return 0;
				581	}
				582
				583	In addition, we support 'extern' to declare functions like 'sin' and
				584	'cos' as well as to support forward declaration of user functions. These
				585	'extern's are just prototypes with no body:
				586
				587	.. code-block:: c++
				588
				589	/// external ::= 'extern' prototype
				590	static PrototypeAST *ParseExtern() {
				591	getNextToken(); // eat extern.
				592	return ParsePrototype();
				593	}
				594
				595	Finally, we'll also let the user type in arbitrary top-level expressions
				596	and evaluate them on the fly. We will handle this by defining anonymous
				597	nullary (zero argument) functions for them:
				598
				599	.. code-block:: c++
				600
				601	/// toplevelexpr ::= expression
				602	static FunctionAST *ParseTopLevelExpr() {
				603	if (ExprAST *E = ParseExpression()) {
				604	// Make an anonymous proto.
				605	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				606	return new FunctionAST(Proto, E);
				607	}
				608	return 0;
				609	}
				610
				611	Now that we have all the pieces, let's build a little driver that will
				612	let us actually execute this code we've built!
				613
				614	The Driver
				615	==========
				616
				617	The driver for this simply invokes all of the parsing pieces with a
				618	top-level dispatch loop. There isn't much interesting here, so I'll just
				619	include the top-level loop. See `below <#code>`_ for full code in the
				620	"Top-Level Parsing" section.
				621
				622	.. code-block:: c++
				623
				624	/// top ::= definition \| external \| expression \| ';'
				625	static void MainLoop() {
				626	while (1) {
				627	fprintf(stderr, "ready> ");
				628	switch (CurTok) {
				629	case tok_eof: return;
				630	case ';': getNextToken(); break; // ignore top-level semicolons.
				631	case tok_def: HandleDefinition(); break;
				632	case tok_extern: HandleExtern(); break;
				633	default: HandleTopLevelExpression(); break;
				634	}
				635	}
				636	}
				637
				638	The most interesting part of this is that we ignore top-level
				639	semicolons. Why is this, you ask? The basic reason is that if you type
				640	"4 + 5" at the command line, the parser doesn't know whether that is the
				641	end of what you will type or not. For example, on the next line you
				642	could type "def foo..." in which case 4+5 is the end of a top-level
				643	expression. Alternatively you could type "\* 6", which would continue
				644	the expression. Having top-level semicolons allows you to type "4+5;",
				645	and the parser will know you are done.
				646
				647	Conclusions
				648	===========
				649
				650	With just under 400 lines of commented code (240 lines of non-comment,
				651	non-blank code), we fully defined our minimal language, including a
				652	lexer, parser, and AST builder. With this done, the executable will
				653	validate Kaleidoscope code and tell us if it is grammatically invalid.
				654	For example, here is a sample interaction:
				655
				656	.. code-block:: bash
				657
				658	$ ./a.out
				659	ready> def foo(x y) x+foo(y, 4.0);
				660	Parsed a function definition.
				661	ready> def foo(x y) x+y y;
				662	Parsed a function definition.
				663	Parsed a top-level expr
				664	ready> def foo(x y) x+y );
				665	Parsed a function definition.
				666	Error: unknown token when expecting an expression
				667	ready> extern sin(a);
				668	ready> Parsed an extern
				669	ready> ^D
				670	$
				671
				672	There is a lot of room for extension here. You can define new AST nodes,
				673	extend the language in many ways, etc. In the `next
				674	installment <LangImpl3.html>`_, we will describe how to generate LLVM
				675	Intermediate Representation (IR) from the AST.
				676
				677	Full Code Listing
				678	=================
				679
				680	Here is the complete code listing for this and the previous chapter.
				681	Note that it is fully self-contained: you don't need LLVM or any
				682	external libraries at all for this. (Besides the C and C++ standard
				683	libraries, of course.) To build this, just compile with:
				684
				685	.. code-block:: bash
				686
				687	# Compile
				688	clang++ -g -O3 toy.cpp
				689	# Run
				690	./a.out
				691
				692	Here is the code:
				693
Logan Chien	2dce4f2	2013-06-08 09:03:03 +0000	[diff] [blame]	694	.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
				695	:language: c++
Sean Silva	ee47edf	2012-12-05 00:26:32 +0000	[diff] [blame]	696
				697	`Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_
				698