Blame - llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.rst - toolchain/llvm-project

blob: 3a2680099dfd27123f36205cf3af95729a0b7aa3 [file] [log] [blame]

Hans Wennborg	74e4f8a	2019-04-11 07:46:25 +0000	[diff] [blame]	1	:orphan:
				2
Chris Lattner	d80f118	2019-04-07 13:14:23 +0000	[diff] [blame]	3	===========================================
				4	Kaleidoscope: Implementing a Parser and AST
				5	===========================================
				6
				7	.. contents::
				8	:local:
				9
				10	Chapter 2 Introduction
				11	======================
				12
				13	Welcome to Chapter 2 of the "`Implementing a language with
				14	LLVM <index.html>`_" tutorial. This chapter shows you how to use the
				15	lexer, built in `Chapter 1 <LangImpl01.html>`_, to build a full
				16	`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
				17	language. Once we have a parser, we'll define and build an `Abstract
				18	Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
				19
				20	The parser we will build uses a combination of `Recursive Descent
				21	Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
				22	`Operator-Precedence
				23	Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
				24	parse the Kaleidoscope language (the latter for binary expressions and
				25	the former for everything else). Before we get to parsing though, let's
				26	talk about the output of the parser: the Abstract Syntax Tree.
				27
				28	The Abstract Syntax Tree (AST)
				29	==============================
				30
				31	The AST for a program captures its behavior in such a way that it is
				32	easy for later stages of the compiler (e.g. code generation) to
				33	interpret. We basically want one object for each construct in the
				34	language, and the AST should closely model the language. In
				35	Kaleidoscope, we have expressions, a prototype, and a function object.
				36	We'll start with expressions first:
				37
				38	.. code-block:: c++
				39
				40	/// ExprAST - Base class for all expression nodes.
				41	class ExprAST {
				42	public:
				43	virtual ~ExprAST() {}
				44	};
				45
				46	/// NumberExprAST - Expression class for numeric literals like "1.0".
				47	class NumberExprAST : public ExprAST {
				48	double Val;
				49
				50	public:
				51	NumberExprAST(double Val) : Val(Val) {}
				52	};
				53
				54	The code above shows the definition of the base ExprAST class and one
				55	subclass which we use for numeric literals. The important thing to note
				56	about this code is that the NumberExprAST class captures the numeric
				57	value of the literal as an instance variable. This allows later phases
				58	of the compiler to know what the stored numeric value is.
				59
				60	Right now we only create the AST, so there are no useful accessor
				61	methods on them. It would be very easy to add a virtual method to pretty
				62	print the code, for example. Here are the other expression AST node
				63	definitions that we'll use in the basic form of the Kaleidoscope
				64	language:
				65
				66	.. code-block:: c++
				67
				68	/// VariableExprAST - Expression class for referencing a variable, like "a".
				69	class VariableExprAST : public ExprAST {
				70	std::string Name;
				71
				72	public:
				73	VariableExprAST(const std::string &Name) : Name(Name) {}
				74	};
				75
				76	/// BinaryExprAST - Expression class for a binary operator.
				77	class BinaryExprAST : public ExprAST {
				78	char Op;
				79	std::unique_ptr<ExprAST> LHS, RHS;
				80
				81	public:
				82	BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
				83	std::unique_ptr<ExprAST> RHS)
				84	: Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
				85	};
				86
				87	/// CallExprAST - Expression class for function calls.
				88	class CallExprAST : public ExprAST {
				89	std::string Callee;
				90	std::vector<std::unique_ptr<ExprAST>> Args;
				91
				92	public:
				93	CallExprAST(const std::string &Callee,
				94	std::vector<std::unique_ptr<ExprAST>> Args)
				95	: Callee(Callee), Args(std::move(Args)) {}
				96	};
				97
				98	This is all (intentionally) rather straight-forward: variables capture
				99	the variable name, binary operators capture their opcode (e.g. '+'), and
				100	calls capture a function name as well as a list of any argument
				101	expressions. One thing that is nice about our AST is that it captures
				102	the language features without talking about the syntax of the language.
				103	Note that there is no discussion about precedence of binary operators,
				104	lexical structure, etc.
				105
				106	For our basic language, these are all of the expression nodes we'll
				107	define. Because it doesn't have conditional control flow, it isn't
				108	Turing-complete; we'll fix that in a later installment. The two things
				109	we need next are a way to talk about the interface to a function, and a
				110	way to talk about functions themselves:
				111
				112	.. code-block:: c++
				113
				114	/// PrototypeAST - This class represents the "prototype" for a function,
				115	/// which captures its name, and its argument names (thus implicitly the number
				116	/// of arguments the function takes).
				117	class PrototypeAST {
				118	std::string Name;
				119	std::vector<std::string> Args;
				120
				121	public:
				122	PrototypeAST(const std::string &name, std::vector<std::string> Args)
				123	: Name(name), Args(std::move(Args)) {}
				124
				125	const std::string &getName() const { return Name; }
				126	};
				127
				128	/// FunctionAST - This class represents a function definition itself.
				129	class FunctionAST {
				130	std::unique_ptr<PrototypeAST> Proto;
				131	std::unique_ptr<ExprAST> Body;
				132
				133	public:
				134	FunctionAST(std::unique_ptr<PrototypeAST> Proto,
				135	std::unique_ptr<ExprAST> Body)
				136	: Proto(std::move(Proto)), Body(std::move(Body)) {}
				137	};
				138
				139	In Kaleidoscope, functions are typed with just a count of their
				140	arguments. Since all values are double precision floating point, the
				141	type of each argument doesn't need to be stored anywhere. In a more
				142	aggressive and realistic language, the "ExprAST" class would probably
				143	have a type field.
				144
				145	With this scaffolding, we can now talk about parsing expressions and
				146	function bodies in Kaleidoscope.
				147
				148	Parser Basics
				149	=============
				150
				151	Now that we have an AST to build, we need to define the parser code to
				152	build it. The idea here is that we want to parse something like "x+y"
				153	(which is returned as three tokens by the lexer) into an AST that could
				154	be generated with calls like this:
				155
				156	.. code-block:: c++
				157
				158	auto LHS = llvm::make_unique<VariableExprAST>("x");
				159	auto RHS = llvm::make_unique<VariableExprAST>("y");
				160	auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
				161	std::move(RHS));
				162
				163	In order to do this, we'll start by defining some basic helper routines:
				164
				165	.. code-block:: c++
				166
				167	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				168	/// token the parser is looking at. getNextToken reads another token from the
				169	/// lexer and updates CurTok with its results.
				170	static int CurTok;
				171	static int getNextToken() {
				172	return CurTok = gettok();
				173	}
				174
				175	This implements a simple token buffer around the lexer. This allows us
				176	to look one token ahead at what the lexer is returning. Every function
				177	in our parser will assume that CurTok is the current token that needs to
				178	be parsed.
				179
				180	.. code-block:: c++
				181
				182
				183	/// LogError* - These are little helper functions for error handling.
				184	std::unique_ptr<ExprAST> LogError(const char *Str) {
				185	fprintf(stderr, "LogError: %s\n", Str);
				186	return nullptr;
				187	}
				188	std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
				189	LogError(Str);
				190	return nullptr;
				191	}
				192
				193	The ``LogError`` routines are simple helper routines that our parser will
				194	use to handle errors. The error recovery in our parser will not be the
				195	best and is not particular user-friendly, but it will be enough for our
				196	tutorial. These routines make it easier to handle errors in routines
				197	that have various return types: they always return null.
				198
				199	With these basic helper functions, we can implement the first piece of
				200	our grammar: numeric literals.
				201
				202	Basic Expression Parsing
				203	========================
				204
				205	We start with numeric literals, because they are the simplest to
				206	process. For each production in our grammar, we'll define a function
				207	which parses that production. For numeric literals, we have:
				208
				209	.. code-block:: c++
				210
				211	/// numberexpr ::= number
				212	static std::unique_ptr<ExprAST> ParseNumberExpr() {
				213	auto Result = llvm::make_unique<NumberExprAST>(NumVal);
				214	getNextToken(); // consume the number
				215	return std::move(Result);
				216	}
				217
				218	This routine is very simple: it expects to be called when the current
				219	token is a ``tok_number`` token. It takes the current number value,
				220	creates a ``NumberExprAST`` node, advances the lexer to the next token,
				221	and finally returns.
				222
				223	There are some interesting aspects to this. The most important one is
				224	that this routine eats all of the tokens that correspond to the
				225	production and returns the lexer buffer with the next token (which is
				226	not part of the grammar production) ready to go. This is a fairly
				227	standard way to go for recursive descent parsers. For a better example,
				228	the parenthesis operator is defined like this:
				229
				230	.. code-block:: c++
				231
				232	/// parenexpr ::= '(' expression ')'
				233	static std::unique_ptr<ExprAST> ParseParenExpr() {
				234	getNextToken(); // eat (.
				235	auto V = ParseExpression();
				236	if (!V)
				237	return nullptr;
				238
				239	if (CurTok != ')')
				240	return LogError("expected ')'");
				241	getNextToken(); // eat ).
				242	return V;
				243	}
				244
				245	This function illustrates a number of interesting things about the
				246	parser:
				247
				248	1) It shows how we use the LogError routines. When called, this function
				249	expects that the current token is a '(' token, but after parsing the
				250	subexpression, it is possible that there is no ')' waiting. For example,
				251	if the user types in "(4 x" instead of "(4)", the parser should emit an
				252	error. Because errors can occur, the parser needs a way to indicate that
				253	they happened: in our parser, we return null on an error.
				254
				255	2) Another interesting aspect of this function is that it uses recursion
				256	by calling ``ParseExpression`` (we will soon see that
				257	``ParseExpression`` can call ``ParseParenExpr``). This is powerful
				258	because it allows us to handle recursive grammars, and keeps each
				259	production very simple. Note that parentheses do not cause construction
				260	of AST nodes themselves. While we could do it this way, the most
				261	important role of parentheses are to guide the parser and provide
				262	grouping. Once the parser constructs the AST, parentheses are not
				263	needed.
				264
				265	The next simple production is for handling variable references and
				266	function calls:
				267
				268	.. code-block:: c++
				269
				270	/// identifierexpr
				271	/// ::= identifier
				272	/// ::= identifier '(' expression* ')'
				273	static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
				274	std::string IdName = IdentifierStr;
				275
				276	getNextToken(); // eat identifier.
				277
				278	if (CurTok != '(') // Simple variable ref.
				279	return llvm::make_unique<VariableExprAST>(IdName);
				280
				281	// Call.
				282	getNextToken(); // eat (
				283	std::vector<std::unique_ptr<ExprAST>> Args;
				284	if (CurTok != ')') {
				285	while (1) {
				286	if (auto Arg = ParseExpression())
				287	Args.push_back(std::move(Arg));
				288	else
				289	return nullptr;
				290
				291	if (CurTok == ')')
				292	break;
				293
				294	if (CurTok != ',')
				295	return LogError("Expected ')' or ',' in argument list");
				296	getNextToken();
				297	}
				298	}
				299
				300	// Eat the ')'.
				301	getNextToken();
				302
				303	return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
				304	}
				305
				306	This routine follows the same style as the other routines. (It expects
				307	to be called if the current token is a ``tok_identifier`` token). It
				308	also has recursion and error handling. One interesting aspect of this is
				309	that it uses look-ahead to determine if the current identifier is a
				310	stand alone variable reference or if it is a function call expression.
				311	It handles this by checking to see if the token after the identifier is
				312	a '(' token, constructing either a ``VariableExprAST`` or
				313	``CallExprAST`` node as appropriate.
				314
				315	Now that we have all of our simple expression-parsing logic in place, we
				316	can define a helper function to wrap it together into one entry point.
				317	We call this class of expressions "primary" expressions, for reasons
				318	that will become more clear `later in the
				319	tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
				320	primary expression, we need to determine what sort of expression it is:
				321
				322	.. code-block:: c++
				323
				324	/// primary
				325	/// ::= identifierexpr
				326	/// ::= numberexpr
				327	/// ::= parenexpr
				328	static std::unique_ptr<ExprAST> ParsePrimary() {
				329	switch (CurTok) {
				330	default:
				331	return LogError("unknown token when expecting an expression");
				332	case tok_identifier:
				333	return ParseIdentifierExpr();
				334	case tok_number:
				335	return ParseNumberExpr();
				336	case '(':
				337	return ParseParenExpr();
				338	}
				339	}
				340
				341	Now that you see the definition of this function, it is more obvious why
				342	we can assume the state of CurTok in the various functions. This uses
				343	look-ahead to determine which sort of expression is being inspected, and
				344	then parses it with a function call.
				345
				346	Now that basic expressions are handled, we need to handle binary
				347	expressions. They are a bit more complex.
				348
				349	Binary Expression Parsing
				350	=========================
				351
				352	Binary expressions are significantly harder to parse because they are
				353	often ambiguous. For example, when given the string "x+y\*z", the parser
				354	can choose to parse it as either "(x+y)\z" or "x+(y\z)". With common
				355	definitions from mathematics, we expect the later parse, because "\*"
				356	(multiplication) has higher precedence than "+" (addition).
				357
				358	There are many ways to handle this, but an elegant and efficient way is
				359	to use `Operator-Precedence
				360	Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
				361	This parsing technique uses the precedence of binary operators to guide
				362	recursion. To start with, we need a table of precedences:
				363
				364	.. code-block:: c++
				365
				366	/// BinopPrecedence - This holds the precedence for each binary operator that is
				367	/// defined.
				368	static std::map<char, int> BinopPrecedence;
				369
				370	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				371	static int GetTokPrecedence() {
				372	if (!isascii(CurTok))
				373	return -1;
				374
				375	// Make sure it's a declared binop.
				376	int TokPrec = BinopPrecedence[CurTok];
				377	if (TokPrec <= 0) return -1;
				378	return TokPrec;
				379	}
				380
				381	int main() {
				382	// Install standard binary operators.
				383	// 1 is lowest precedence.
				384	BinopPrecedence['<'] = 10;
				385	BinopPrecedence['+'] = 20;
				386	BinopPrecedence['-'] = 20;
				387	BinopPrecedence['*'] = 40; // highest.
				388	...
				389	}
				390
				391	For the basic form of Kaleidoscope, we will only support 4 binary
				392	operators (this can obviously be extended by you, our brave and intrepid
				393	reader). The ``GetTokPrecedence`` function returns the precedence for
				394	the current token, or -1 if the token is not a binary operator. Having a
				395	map makes it easy to add new operators and makes it clear that the
				396	algorithm doesn't depend on the specific operators involved, but it
				397	would be easy enough to eliminate the map and do the comparisons in the
				398	``GetTokPrecedence`` function. (Or just use a fixed-size array).
				399
				400	With the helper above defined, we can now start parsing binary
				401	expressions. The basic idea of operator precedence parsing is to break
				402	down an expression with potentially ambiguous binary operators into
				403	pieces. Consider, for example, the expression "a+b+(c+d)\e\f+g".
				404	Operator precedence parsing considers this as a stream of primary
				405	expressions separated by binary operators. As such, it will first parse
				406	the leading primary expression "a", then it will see the pairs [+, b]
				407	[+, (c+d)] [\, e] [\, f] and [+, g]. Note that because parentheses are
				408	primary expressions, the binary expression parser doesn't need to worry
				409	about nested subexpressions like (c+d) at all.
				410
				411	To start, an expression is a primary expression potentially followed by
				412	a sequence of [binop,primaryexpr] pairs:
				413
				414	.. code-block:: c++
				415
				416	/// expression
				417	/// ::= primary binoprhs
				418	///
				419	static std::unique_ptr<ExprAST> ParseExpression() {
				420	auto LHS = ParsePrimary();
				421	if (!LHS)
				422	return nullptr;
				423
				424	return ParseBinOpRHS(0, std::move(LHS));
				425	}
				426
				427	``ParseBinOpRHS`` is the function that parses the sequence of pairs for
				428	us. It takes a precedence and a pointer to an expression for the part
				429	that has been parsed so far. Note that "x" is a perfectly valid
				430	expression: As such, "binoprhs" is allowed to be empty, in which case it
				431	returns the expression that is passed into it. In our example above, the
				432	code passes the expression for "a" into ``ParseBinOpRHS`` and the
				433	current token is "+".
				434
				435	The precedence value passed into ``ParseBinOpRHS`` indicates the
				436	minimal operator precedence that the function is allowed to eat. For
				437	example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
				438	passed in a precedence of 40, it will not consume any tokens (because
				439	the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
				440	starts with:
				441
				442	.. code-block:: c++
				443
				444	/// binoprhs
				445	/// ::= ('+' primary)*
				446	static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
				447	std::unique_ptr<ExprAST> LHS) {
				448	// If this is a binop, find its precedence.
				449	while (1) {
				450	int TokPrec = GetTokPrecedence();
				451
				452	// If this is a binop that binds at least as tightly as the current binop,
				453	// consume it, otherwise we are done.
				454	if (TokPrec < ExprPrec)
				455	return LHS;
				456
				457	This code gets the precedence of the current token and checks to see if
				458	if is too low. Because we defined invalid tokens to have a precedence of
				459	-1, this check implicitly knows that the pair-stream ends when the token
				460	stream runs out of binary operators. If this check succeeds, we know
				461	that the token is a binary operator and that it will be included in this
				462	expression:
				463
				464	.. code-block:: c++
				465
				466	// Okay, we know this is a binop.
				467	int BinOp = CurTok;
				468	getNextToken(); // eat binop
				469
				470	// Parse the primary expression after the binary operator.
				471	auto RHS = ParsePrimary();
				472	if (!RHS)
				473	return nullptr;
				474
				475	As such, this code eats (and remembers) the binary operator and then
				476	parses the primary expression that follows. This builds up the whole
				477	pair, the first of which is [+, b] for the running example.
				478
				479	Now that we parsed the left-hand side of an expression and one pair of
				480	the RHS sequence, we have to decide which way the expression associates.
				481	In particular, we could have "(a+b) binop unparsed" or "a + (b binop
				482	unparsed)". To determine this, we look ahead at "binop" to determine its
				483	precedence and compare it to BinOp's precedence (which is '+' in this
				484	case):
				485
				486	.. code-block:: c++
				487
				488	// If BinOp binds less tightly with RHS than the operator after RHS, let
				489	// the pending operator take RHS as its LHS.
				490	int NextPrec = GetTokPrecedence();
				491	if (TokPrec < NextPrec) {
				492
				493	If the precedence of the binop to the right of "RHS" is lower or equal
				494	to the precedence of our current operator, then we know that the
				495	parentheses associate as "(a+b) binop ...". In our example, the current
				496	operator is "+" and the next operator is "+", we know that they have the
				497	same precedence. In this case we'll create the AST node for "a+b", and
				498	then continue parsing:
				499
				500	.. code-block:: c++
				501
				502	... if body omitted ...
				503	}
				504
				505	// Merge LHS/RHS.
				506	LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
				507	std::move(RHS));
				508	} // loop around to the top of the while loop.
				509	}
				510
				511	In our example above, this will turn "a+b+" into "(a+b)" and execute the
				512	next iteration of the loop, with "+" as the current token. The code
				513	above will eat, remember, and parse "(c+d)" as the primary expression,
				514	which makes the current pair equal to [+, (c+d)]. It will then evaluate
				515	the 'if' conditional above with "\*" as the binop to the right of the
				516	primary. In this case, the precedence of "\*" is higher than the
				517	precedence of "+" so the if condition will be entered.
				518
				519	The critical question left here is "how can the if condition parse the
				520	right hand side in full"? In particular, to build the AST correctly for
				521	our example, it needs to get all of "(c+d)\e\f" as the RHS expression
				522	variable. The code to do this is surprisingly simple (code from the
				523	above two blocks duplicated for context):
				524
				525	.. code-block:: c++
				526
				527	// If BinOp binds less tightly with RHS than the operator after RHS, let
				528	// the pending operator take RHS as its LHS.
				529	int NextPrec = GetTokPrecedence();
				530	if (TokPrec < NextPrec) {
				531	RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
				532	if (!RHS)
				533	return nullptr;
				534	}
				535	// Merge LHS/RHS.
				536	LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
				537	std::move(RHS));
				538	} // loop around to the top of the while loop.
				539	}
				540
				541	At this point, we know that the binary operator to the RHS of our
				542	primary has higher precedence than the binop we are currently parsing.
				543	As such, we know that any sequence of pairs whose operators are all
				544	higher precedence than "+" should be parsed together and returned as
				545	"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
				546	specifying "TokPrec+1" as the minimum precedence required for it to
				547	continue. In our example above, this will cause it to return the AST
				548	node for "(c+d)\e\f" as RHS, which is then set as the RHS of the '+'
				549	expression.
				550
				551	Finally, on the next iteration of the while loop, the "+g" piece is
				552	parsed and added to the AST. With this little bit of code (14
				553	non-trivial lines), we correctly handle fully general binary expression
				554	parsing in a very elegant way. This was a whirlwind tour of this code,
				555	and it is somewhat subtle. I recommend running through it with a few
				556	tough examples to see how it works.
				557
				558	This wraps up handling of expressions. At this point, we can point the
				559	parser at an arbitrary token stream and build an expression from it,
				560	stopping at the first token that is not part of the expression. Next up
				561	we need to handle function definitions, etc.
				562
				563	Parsing the Rest
				564	================
				565
				566	The next thing missing is handling of function prototypes. In
				567	Kaleidoscope, these are used both for 'extern' function declarations as
				568	well as function body definitions. The code to do this is
				569	straight-forward and not very interesting (once you've survived
				570	expressions):
				571
				572	.. code-block:: c++
				573
				574	/// prototype
				575	/// ::= id '(' id* ')'
				576	static std::unique_ptr<PrototypeAST> ParsePrototype() {
				577	if (CurTok != tok_identifier)
				578	return LogErrorP("Expected function name in prototype");
				579
				580	std::string FnName = IdentifierStr;
				581	getNextToken();
				582
				583	if (CurTok != '(')
				584	return LogErrorP("Expected '(' in prototype");
				585
				586	// Read the list of argument names.
				587	std::vector<std::string> ArgNames;
				588	while (getNextToken() == tok_identifier)
				589	ArgNames.push_back(IdentifierStr);
				590	if (CurTok != ')')
				591	return LogErrorP("Expected ')' in prototype");
				592
				593	// success.
				594	getNextToken(); // eat ')'.
				595
				596	return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
				597	}
				598
				599	Given this, a function definition is very simple, just a prototype plus
				600	an expression to implement the body:
				601
				602	.. code-block:: c++
				603
				604	/// definition ::= 'def' prototype expression
				605	static std::unique_ptr<FunctionAST> ParseDefinition() {
				606	getNextToken(); // eat def.
				607	auto Proto = ParsePrototype();
				608	if (!Proto) return nullptr;
				609
				610	if (auto E = ParseExpression())
				611	return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
				612	return nullptr;
				613	}
				614
				615	In addition, we support 'extern' to declare functions like 'sin' and
				616	'cos' as well as to support forward declaration of user functions. These
				617	'extern's are just prototypes with no body:
				618
				619	.. code-block:: c++
				620
				621	/// external ::= 'extern' prototype
				622	static std::unique_ptr<PrototypeAST> ParseExtern() {
				623	getNextToken(); // eat extern.
				624	return ParsePrototype();
				625	}
				626
				627	Finally, we'll also let the user type in arbitrary top-level expressions
				628	and evaluate them on the fly. We will handle this by defining anonymous
				629	nullary (zero argument) functions for them:
				630
				631	.. code-block:: c++
				632
				633	/// toplevelexpr ::= expression
				634	static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
				635	if (auto E = ParseExpression()) {
				636	// Make an anonymous proto.
				637	auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
				638	return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
				639	}
				640	return nullptr;
				641	}
				642
				643	Now that we have all the pieces, let's build a little driver that will
				644	let us actually execute this code we've built!
				645
				646	The Driver
				647	==========
				648
				649	The driver for this simply invokes all of the parsing pieces with a
				650	top-level dispatch loop. There isn't much interesting here, so I'll just
				651	include the top-level loop. See `below <#full-code-listing>`_ for full code in the
				652	"Top-Level Parsing" section.
				653
				654	.. code-block:: c++
				655
				656	/// top ::= definition \| external \| expression \| ';'
				657	static void MainLoop() {
				658	while (1) {
				659	fprintf(stderr, "ready> ");
				660	switch (CurTok) {
				661	case tok_eof:
				662	return;
				663	case ';': // ignore top-level semicolons.
				664	getNextToken();
				665	break;
				666	case tok_def:
				667	HandleDefinition();
				668	break;
				669	case tok_extern:
				670	HandleExtern();
				671	break;
				672	default:
				673	HandleTopLevelExpression();
				674	break;
				675	}
				676	}
				677	}
				678
				679	The most interesting part of this is that we ignore top-level
				680	semicolons. Why is this, you ask? The basic reason is that if you type
				681	"4 + 5" at the command line, the parser doesn't know whether that is the
				682	end of what you will type or not. For example, on the next line you
				683	could type "def foo..." in which case 4+5 is the end of a top-level
				684	expression. Alternatively you could type "\* 6", which would continue
				685	the expression. Having top-level semicolons allows you to type "4+5;",
				686	and the parser will know you are done.
				687
				688	Conclusions
				689	===========
				690
				691	With just under 400 lines of commented code (240 lines of non-comment,
				692	non-blank code), we fully defined our minimal language, including a
				693	lexer, parser, and AST builder. With this done, the executable will
				694	validate Kaleidoscope code and tell us if it is grammatically invalid.
				695	For example, here is a sample interaction:
				696
				697	.. code-block:: bash
				698
				699	$ ./a.out
				700	ready> def foo(x y) x+foo(y, 4.0);
				701	Parsed a function definition.
				702	ready> def foo(x y) x+y y;
				703	Parsed a function definition.
				704	Parsed a top-level expr
				705	ready> def foo(x y) x+y );
				706	Parsed a function definition.
				707	Error: unknown token when expecting an expression
				708	ready> extern sin(a);
				709	ready> Parsed an extern
				710	ready> ^D
				711	$
				712
				713	There is a lot of room for extension here. You can define new AST nodes,
				714	extend the language in many ways, etc. In the `next
				715	installment <LangImpl03.html>`_, we will describe how to generate LLVM
				716	Intermediate Representation (IR) from the AST.
				717
				718	Full Code Listing
				719	=================
				720
				721	Here is the complete code listing for our running example. Because this
				722	uses the LLVM libraries, we need to link them in. To do this, we use the
				723	`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
				724	our makefile/command line about which options to use:
				725
				726	.. code-block:: bash
				727
				728	# Compile
				729	clang++ -g -O3 toy.cpp `llvm-config --cxxflags`
				730	# Run
				731	./a.out
				732
				733	Here is the code:
				734
Hans Wennborg	147e0dd	2019-04-11 07:30:56 +0000	[diff] [blame]	735	.. literalinclude:: ../../../examples/Kaleidoscope/Chapter2/toy.cpp
Chris Lattner	d80f118	2019-04-07 13:14:23 +0000	[diff] [blame]	736	:language: c++
				737
				738	`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_
				739