Blame - docs/tutorial/LangImpl7.html - fp2-dev/platform/external/llvm

blob: b002f7a45193f31427a246f451e7be5699eb3c0f [file] [log] [blame]

Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
				7	construction</title>
				8	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				9	<meta name="author" content="Chris Lattner">
				10	<link rel="stylesheet" href="../llvm.css" type="text/css">
				11	</head>
				12
				13	<body>
				14
				15	<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
				16
				17	<div class="doc_author">
				18	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				19	</div>
				20
				21	<!-- *********************************************************************** -->
				22	<div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
				23	<!-- *********************************************************************** -->
				24
				25	<div class="doc_text">
				26
				27	<p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
				28	LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable,
				29	albeit simple, <a
				30	href="http://en.wikipedia.org/wiki/Functional_programming">functional
				31	programming language</a>. In our journey, we learned some parsing techniques,
				32	how to build and represent an AST, how to build LLVM IR, and how to optimize
				33	the resultant code and JIT compile it.</p>
				34
				35	<p>While Kaleidoscope is interesting as a functional language, this makes it
				36	"too easy" to generate LLVM IR for it. In particular, a functional language
				37	makes it very easy to build LLVM IR directly in <a
				38	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
				39	Since LLVM requires that the input code be in SSA form, this is a very nice
				40	property and it is often unclear to newcomers how to generate code for an
				41	imperative language with mutable variables.</p>
				42
				43	<p>The short (and happy) summary of this chapter is that there is no need for
				44	your front-end to build SSA form: LLVM provides highly tuned and well tested
				45	support for this, though the way it works is a bit unexpected for some.</p>
				46
				47	</div>
				48
				49	<!-- *********************************************************************** -->
				50	<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
				51	<!-- *********************************************************************** -->
				52
				53	<div class="doc_text">
				54
				55	<p>
				56	To understand why mutable variables cause complexities in SSA construction,
				57	consider this extremely simple C example:
				58	</p>
				59
				60	<div class="doc_code">
				61	<pre>
				62	int G, H;
				63	int test(_Bool Condition) {
				64	int X;
				65	if (Condition)
				66	X = G;
				67	else
				68	X = H;
				69	return X;
				70	}
				71	</pre>
				72	</div>
				73
				74	<p>In this case, we have the variable "X", whose value depends on the path
				75	executed in the program. Because there are two different possible values for X
				76	before the return instruction, a PHI node is inserted to merge the two values.
				77	The LLVM IR that we want for this example looks like this:</p>
				78
				79	<div class="doc_code">
				80	<pre>
				81	@G = weak global i32 0 ; type of @G is i32*
				82	@H = weak global i32 0 ; type of @H is i32*
				83
				84	define i32 @test(i1 %Condition) {
				85	entry:
				86	br i1 %Condition, label %cond_true, label %cond_false
				87
				88	cond_true:
				89	%X.0 = load i32* @G
				90	br label %cond_next
				91
				92	cond_false:
				93	%X.1 = load i32* @H
				94	br label %cond_next
				95
				96	cond_next:
				97	%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				98	ret i32 %X.2
				99	}
				100	</pre>
				101	</div>
				102
				103	<p>In this example, the loads from the G and H global variables are explicit in
				104	the LLVM IR, and they live in the then/else branches of the if statement
				105	(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
				106	in the cond_next block selects the right value to use based on where control
				107	flow is coming from: if control flow comes from the cond_false block, X.2 gets
				108	the value of X.1. Alternatively, if control flow comes from cond_tree, it gets
				109	the value of X.0. The intent of this chapter is not to explain the details of
				110	SSA form. For more information, see one of the many <a
				111	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
				112	references</a>.</p>
				113
				114	<p>The question for this article is "who places phi nodes when lowering
				115	assignments to mutable variables?". The issue here is that LLVM
				116	<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
				117	However, SSA construction requires non-trivial algorithms and data structures,
				118	so it is inconvenient and wasteful for every front-end to have to reproduce this
				119	logic.</p>
				120
				121	</div>
				122
				123	<!-- *********************************************************************** -->
				124	<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
				125	<!-- *********************************************************************** -->
				126
				127	<div class="doc_text">
				128
				129	<p>The 'trick' here is that while LLVM does require all register values to be
				130	in SSA form, it does not require (or permit) memory objects to be in SSA form.
				131	In the example above, note that the loads from G and H are direct accesses to
				132	G and H: they are not renamed or versioned. This differs from some other
Chris Lattner	2e5d07e	2007-11-04 19:42:13 +0000	[diff] [blame]	133	compiler systems, which do try to version memory objects. In LLVM, instead of
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	134	encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
				135	href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
				136	demand.</p>
				137
				138	<p>
				139	With this in mind, the high-level idea is that we want to make a stack variable
				140	(which lives in memory, because it is on the stack) for each mutable object in
				141	a function. To take advantage of this trick, we need to talk about how LLVM
				142	represents stack variables.
				143	</p>
				144
				145	<p>In LLVM, all memory accesses are explicit with load/store instructions, and
				146	it is carefully designed to not have (or need) an "address-of" operator. Notice
				147	how the type of the @G/@H global variables is actually "i32*" even though the
				148	variable is defined as "i32". What this means is that @G defines <em>space</em>
				149	for an i32 in the global data area, but its <em>name</em> actually refers to the
				150	address for that space. Stack variables work the same way, but instead of being
				151	declared with global variable definitions, they are declared with the
				152	<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
				153
				154	<div class="doc_code">
				155	<pre>
				156	define i32 @test(i1 %Condition) {
				157	entry:
				158	%X = alloca i32 ; type of %X is i32*.
				159	...
				160	%tmp = load i32* %X ; load the stack value %X from the stack.
				161	%tmp2 = add i32 %tmp, 1 ; increment it
				162	store i32 %tmp2, i32* %X ; store it back
				163	...
				164	</pre>
				165	</div>
				166
				167	<p>This code shows an example of how you can declare and manipulate a stack
				168	variable in the LLVM IR. Stack memory allocated with the alloca instruction is
				169	fully general: you can pass the address of the stack slot to functions, you can
				170	store it in other variables, etc. In our example above, we could rewrite the
				171	example to use the alloca technique to avoid using a PHI node:</p>
				172
				173	<div class="doc_code">
				174	<pre>
				175	@G = weak global i32 0 ; type of @G is i32*
				176	@H = weak global i32 0 ; type of @H is i32*
				177
				178	define i32 @test(i1 %Condition) {
				179	entry:
				180	%X = alloca i32 ; type of %X is i32*.
				181	br i1 %Condition, label %cond_true, label %cond_false
				182
				183	cond_true:
				184	%X.0 = load i32* @G
				185	store i32 %X.0, i32* %X ; Update X
				186	br label %cond_next
				187
				188	cond_false:
				189	%X.1 = load i32* @H
				190	store i32 %X.1, i32* %X ; Update X
				191	br label %cond_next
				192
				193	cond_next:
				194	%X.2 = load i32* %X ; Read X
				195	ret i32 %X.2
				196	}
				197	</pre>
				198	</div>
				199
				200	<p>With this, we have discovered a way to handle arbitrary mutable variables
				201	without the need to create Phi nodes at all:</p>
				202
				203	<ol>
				204	<li>Each mutable variable becomes a stack allocation.</li>
				205	<li>Each read of the variable becomes a load from the stack.</li>
				206	<li>Each update of the variable becomes a store to the stack.</li>
				207	<li>Taking the address of a variable just uses the stack address directly.</li>
				208	</ol>
				209
				210	<p>While this solution has solved our immediate problem, it introduced another
				211	one: we have now apparently introduced a lot of stack traffic for very simple
				212	and common operations, a major performance problem. Fortunately for us, the
				213	LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
				214	this case, promoting allocas like this into SSA registers, inserting Phi nodes
				215	as appropriate. If you run this example through the pass, for example, you'll
				216	get:</p>
				217
				218	<div class="doc_code">
				219	<pre>
				220	$ <b>llvm-as < example.ll \| opt -mem2reg \| llvm-dis</b>
				221	@G = weak global i32 0
				222	@H = weak global i32 0
				223
				224	define i32 @test(i1 %Condition) {
				225	entry:
				226	br i1 %Condition, label %cond_true, label %cond_false
				227
				228	cond_true:
				229	%X.0 = load i32* @G
				230	br label %cond_next
				231
				232	cond_false:
				233	%X.1 = load i32* @H
				234	br label %cond_next
				235
				236	cond_next:
				237	%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				238	ret i32 %X.01
				239	}
				240	</pre>
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	241	</div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	242
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	243	<p>The mem2reg pass implements the standard "iterated dominator frontier"
				244	algorithm for constructing SSA form and has a number of optimizations that speed
				245	up very common degenerate cases. mem2reg really is the answer for dealing with
				246	mutable variables, and we highly recommend that you depend on it. Note that
				247	mem2reg only works on variables in certain circumstances:</p>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	248
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	249	<ol>
				250	<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
				251	promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	252
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	253	<li>mem2reg only looks for alloca instructions in the entry block of the
				254	function. Being in the entry block guarantees that the alloca is only executed
				255	once, which makes analysis simpler.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	256
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	257	<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
				258	the address of the stack object is passed to a function, or if any funny pointer
				259	arithmetic is involved, the alloca will not be promoted.</li>
				260
Chris Lattner	a56b22d	2007-11-05 17:45:54 +0000	[diff] [blame^]	261	<li>mem2reg only works on allocas of <a
				262	href="../LangRef.html#t_classifications">first class</a>
				263	values (such as pointers, scalars and vectors), and only if the array size
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	264	of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
				265	promoting structs or arrays to registers. Note that the "scalarrepl" pass is
				266	more powerful and can promote structs, "unions", and arrays in many cases.</li>
				267
				268	</ol>
				269
				270	<p>
				271	All of these properties are easy to satisfy for most imperative languages, and
Chris Lattner	2e5d07e	2007-11-04 19:42:13 +0000	[diff] [blame]	272	we'll illustrate this below with Kaleidoscope. The final question you may be
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	273	asking is: should I bother with this nonsense for my front-end? Wouldn't it be
				274	better if I just did SSA construction directly, avoiding use of the mem2reg
				275	optimization pass? In short, we strongly recommend that use you this technique
				276	for building SSA form, unless there is an extremely good reason not to. Using
				277	this technique is:</p>
				278
				279	<ul>
				280	<li>Proven and well tested: llvm-gcc and clang both use this technique for local
				281	mutable variables. As such, the most common clients of LLVM are using this to
				282	handle a bulk of their variables. You can be sure that bugs are found fast and
				283	fixed early.</li>
				284
				285	<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
				286	common cases as well as fully general. For example, it has fast-paths for
				287	variables that are only used in a single block, variables that only have one
				288	assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
				289	</li>
				290
				291	<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
				292	Debug information in LLVM</a> relies on having the address of the variable
				293	exposed to attach debug info to it. This technique dovetails very naturally
				294	with this style of debug info.</li>
				295	</ul>
				296
				297	<p>If nothing else, this makes it much easier to get your front-end up and
				298	running, and is very simple to implement. Lets extend Kaleidoscope with mutable
				299	variables now!
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	300	</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	301
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	302	</div>
				303
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	304	<!-- *********************************************************************** -->
				305	<div class="doc_section"><a name="kalvars">Mutable Variables in
				306	Kaleidoscope</a></div>
				307	<!-- *********************************************************************** -->
				308
				309	<div class="doc_text">
				310
				311	<p>Now that we know the sort of problem we want to tackle, lets see what this
				312	looks like in the context of our little Kaleidoscope language. We're going to
				313	add two features:</p>
				314
				315	<ol>
				316	<li>The ability to mutate variables with the '=' operator.</li>
				317	<li>The ability to define new variables.</li>
				318	</ol>
				319
				320	<p>While the first item is really what this is about, we only have variables
				321	for incoming arguments and for induction variables, and redefining them only
				322	goes so far :). Also, the ability to define new variables is a
				323	useful thing regardless of whether you will be mutating them. Here's a
				324	motivating example that shows how we could use these:</p>
				325
				326	<div class="doc_code">
				327	<pre>
				328	# Define ':' for sequencing: as a low-precedence operator that ignores operands
				329	# and just returns the RHS.
				330	def binary : 1 (x y) y;
				331
				332	# Recursive fib, we could do this before.
				333	def fib(x)
				334	if (x < 3) then
				335	1
				336	else
				337	fib(x-1)+fib(x-2);
				338
				339	# Iterative fib.
				340	def fibi(x)
				341	<b>var a = 1, b = 1, c in</b>
				342	(for i = 3, i &;t; x in
				343	<b>c = a + b</b> :
				344	<b>a = b</b> :
				345	<b>b = c</b>) :
				346	b;
				347
				348	# Call it.
				349	fibi(10);
				350	</pre>
				351	</div>
				352
				353	<p>
				354	In order to mutate variables, we have to change our existing variables to use
				355	the "alloca trick". Once we have that, we'll add our new operator, then extend
				356	Kaleidoscope to support new variable definitions.
				357	</p>
				358
				359	</div>
				360
				361	<!-- *********************************************************************** -->
				362	<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
				363	Mutation</a></div>
				364	<!-- *********************************************************************** -->
				365
				366	<div class="doc_text">
				367
				368	<p>
				369	The symbol table in Kaleidoscope is managed at code generation time by the
				370	'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
				371	that holds the double value for the named variable. In order to support
				372	mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
				373	the <em>memory location</em> of the variable in question. Note that this
				374	change is a refactoring: it changes the structure of the code, but does not
				375	(by itself) change the behavior of the compiler. All of these changes are
				376	isolated in the Kaleidoscope code generator.</p>
				377
				378	<p>
				379	At this point in Kaleidoscope's development, it only supports variables for two
				380	things: incoming arguments to functions and the induction variable of 'for'
				381	loops. For consistency, we'll allow mutation of these variables in addition to
				382	other user-defined variables. This means that these will both need memory
				383	locations.
				384	</p>
				385
				386	<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
				387	map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
				388	will tell use what parts of the code we need to update:</p>
				389
				390	<div class="doc_code">
				391	<pre>
				392	static std::map<std::string, AllocaInst*> NamedValues;
				393	</pre>
				394	</div>
				395
				396	<p>Also, since we will need to create these alloca's, we'll use a helper
				397	function that ensures that the allocas are created in the entry block of the
				398	function:</p>
				399
				400	<div class="doc_code">
				401	<pre>
				402	/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
				403	/// the function. This is used for mutable variables etc.
				404	static AllocaInst CreateEntryBlockAlloca(Function TheFunction,
				405	const std::string &VarName) {
				406	LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
				407	TheFunction->getEntryBlock().begin());
				408	return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
				409	}
				410	</pre>
				411	</div>
				412
				413	<p>This funny looking code creates an LLVMBuilder object that is pointing at
				414	the first instruction (.begin()) of the entry block. It then creates an alloca
				415	with the expected name and returns it. Because all values in Kaleidoscope are
				416	doubles, there is no need to pass in a type to use.</p>
				417
				418	<p>With this in place, the first functionality change we want to make is to
				419	variable references. In our new scheme, variables live on the stack, so code
				420	generating a reference to them actually needs to produce a load from the stack
				421	slot:</p>
				422
				423	<div class="doc_code">
				424	<pre>
				425	Value *VariableExprAST::Codegen() {
				426	// Look this variable up in the function.
				427	Value *V = NamedValues[Name];
				428	if (V == 0) return ErrorV("Unknown variable name");
				429
				430	// Load the value.
				431	return Builder.CreateLoad(V, Name.c_str());
				432	}
				433	</pre>
				434	</div>
				435
				436	<p>As you can see, this is pretty straight-forward. Next we need to update the
				437	things that define the variables to set up the alloca. We'll start with
				438	<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
				439	the unabridged code):</p>
				440
				441	<div class="doc_code">
				442	<pre>
				443	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				444
				445	<b>// Create an alloca for the variable in the entry block.
				446	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
				447
				448	// Emit the start code first, without 'variable' in scope.
				449	Value *StartVal = Start->Codegen();
				450	if (StartVal == 0) return 0;
				451
				452	<b>// Store the value into the alloca.
				453	Builder.CreateStore(StartVal, Alloca);</b>
				454	...
				455
				456	// Compute the end condition.
				457	Value *EndCond = End->Codegen();
				458	if (EndCond == 0) return EndCond;
				459
				460	<b>// Reload, increment, and restore the alloca. This handles the case where
				461	// the body of the loop mutates the variable.
				462	Value *CurVar = Builder.CreateLoad(Alloca);
				463	Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
				464	Builder.CreateStore(NextVar, Alloca);</b>
				465	...
				466	</pre>
				467	</div>
				468
				469	<p>This code is virtually identical to the code <a
				470	href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
				471	big difference is that we no longer have to construct a PHI node, and we use
				472	load/store to access the variable as needed.</p>
				473
				474	<p>To support mutable argument variables, we need to also make allocas for them.
				475	The code for this is also pretty simple:</p>
				476
				477	<div class="doc_code">
				478	<pre>
				479	/// CreateArgumentAllocas - Create an alloca for each argument and register the
				480	/// argument in the symbol table so that references to it will succeed.
				481	void PrototypeAST::CreateArgumentAllocas(Function *F) {
				482	Function::arg_iterator AI = F->arg_begin();
				483	for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
				484	// Create an alloca for this variable.
				485	AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
				486
				487	// Store the initial value into the alloca.
				488	Builder.CreateStore(AI, Alloca);
				489
				490	// Add arguments to variable symbol table.
				491	NamedValues[Args[Idx]] = Alloca;
				492	}
				493	}
				494	</pre>
				495	</div>
				496
				497	<p>For each argument, we make an alloca, store the input value to the function
				498	into the alloca, and register the alloca as the memory location for the
				499	argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
				500	it sets up the entry block for the function.</p>
				501
				502	<p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
				503	good codegen once again:</p>
				504
				505	<div class="doc_code">
				506	<pre>
				507	// Set up the optimizer pipeline. Start with registering info about how the
				508	// target lays out data structures.
				509	OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
				510	<b>// Promote allocas to registers.
				511	OurFPM.add(createPromoteMemoryToRegisterPass());</b>
				512	// Do simple "peephole" optimizations and bit-twiddling optzns.
				513	OurFPM.add(createInstructionCombiningPass());
				514	// Reassociate expressions.
				515	OurFPM.add(createReassociatePass());
				516	</pre>
				517	</div>
				518
				519	<p>It is interesting to see what the code looks like before and after the
				520	mem2reg optimization runs. For example, this is the before/after code for our
				521	recursive fib. Before the optimization:</p>
				522
				523	<div class="doc_code">
				524	<pre>
				525	define double @fib(double %x) {
				526	entry:
				527	<b>%x1 = alloca double
				528	store double %x, double* %x1
				529	%x2 = load double* %x1</b>
				530	%multmp = fcmp ult double %x2, 3.000000e+00
				531	%booltmp = uitofp i1 %multmp to double
				532	%ifcond = fcmp one double %booltmp, 0.000000e+00
				533	br i1 %ifcond, label %then, label %else
				534
				535	then: ; preds = %entry
				536	br label %ifcont
				537
				538	else: ; preds = %entry
				539	<b>%x3 = load double* %x1</b>
				540	%subtmp = sub double %x3, 1.000000e+00
				541	%calltmp = call double @fib( double %subtmp )
				542	<b>%x4 = load double* %x1</b>
				543	%subtmp5 = sub double %x4, 2.000000e+00
				544	%calltmp6 = call double @fib( double %subtmp5 )
				545	%addtmp = add double %calltmp, %calltmp6
				546	br label %ifcont
				547
				548	ifcont: ; preds = %else, %then
				549	%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
				550	ret double %iftmp
				551	}
				552	</pre>
				553	</div>
				554
				555	<p>Here there is only one variable (x, the input argument) but you can still
				556	see the extremely simple-minded code generation strategy we are using. In the
				557	entry block, an alloca is created, and the initial input value is stored into
				558	it. Each reference to the variable does a reload from the stack. Also, note
				559	that we didn't modify the if/then/else expression, so it still inserts a PHI
				560	node. While we could make an alloca for it, it is actually easier to create a
				561	PHI node for it, so we still just make the PHI.</p>
				562
				563	<p>Here is the code after the mem2reg pass runs:</p>
				564
				565	<div class="doc_code">
				566	<pre>
				567	define double @fib(double %x) {
				568	entry:
				569	%multmp = fcmp ult double <b>%x</b>, 3.000000e+00
				570	%booltmp = uitofp i1 %multmp to double
				571	%ifcond = fcmp one double %booltmp, 0.000000e+00
				572	br i1 %ifcond, label %then, label %else
				573
				574	then:
				575	br label %ifcont
				576
				577	else:
				578	%subtmp = sub double <b>%x</b>, 1.000000e+00
				579	%calltmp = call double @fib( double %subtmp )
				580	%subtmp5 = sub double <b>%x</b>, 2.000000e+00
				581	%calltmp6 = call double @fib( double %subtmp5 )
				582	%addtmp = add double %calltmp, %calltmp6
				583	br label %ifcont
				584
				585	ifcont: ; preds = %else, %then
				586	%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
				587	ret double %iftmp
				588	}
				589	</pre>
				590	</div>
				591
				592	<p>This is a trivial case for mem2reg, since there are no redefinitions of the
				593	variable. The point of showing this is to calm your tension about inserting
				594	such blatent inefficiencies :).</p>
				595
				596	<p>After the rest of the optimizers run, we get:</p>
				597
				598	<div class="doc_code">
				599	<pre>
				600	define double @fib(double %x) {
				601	entry:
				602	%multmp = fcmp ult double %x, 3.000000e+00
				603	%booltmp = uitofp i1 %multmp to double
				604	%ifcond = fcmp ueq double %booltmp, 0.000000e+00
				605	br i1 %ifcond, label %else, label %ifcont
				606
				607	else:
				608	%subtmp = sub double %x, 1.000000e+00
				609	%calltmp = call double @fib( double %subtmp )
				610	%subtmp5 = sub double %x, 2.000000e+00
				611	%calltmp6 = call double @fib( double %subtmp5 )
				612	%addtmp = add double %calltmp, %calltmp6
				613	ret double %addtmp
				614
				615	ifcont:
				616	ret double 1.000000e+00
				617	}
				618	</pre>
				619	</div>
				620
				621	<p>Here we see that the simplifycfg pass decided to clone the return instruction
				622	into the end of the 'else' block. This allowed it to eliminate some branches
				623	and the PHI node.</p>
				624
				625	<p>Now that all symbol table references are updated to use stack variables,
				626	we'll add the assignment operator.</p>
				627
				628	</div>
				629
				630	<!-- *********************************************************************** -->
				631	<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
				632	<!-- *********************************************************************** -->
				633
				634	<div class="doc_text">
				635
				636	<p>With our current framework, adding a new assignment operator is really
				637	simple. We will parse it just like any other binary operator, but handle it
				638	internally (instead of allowing the user to define it). The first step is to
				639	set a precedence:</p>
				640
				641	<div class="doc_code">
				642	<pre>
				643	int main() {
				644	// Install standard binary operators.
				645	// 1 is lowest precedence.
				646	<b>BinopPrecedence['='] = 2;</b>
				647	BinopPrecedence['<'] = 10;
				648	BinopPrecedence['+'] = 20;
				649	BinopPrecedence['-'] = 20;
				650	</pre>
				651	</div>
				652
				653	<p>Now that the parser knows the precedence of the binary operator, it takes
				654	care of all the parsing and AST generation. We just need to implement codegen
				655	for the assignment operator. This looks like:</p>
				656
				657	<div class="doc_code">
				658	<pre>
				659	Value *BinaryExprAST::Codegen() {
				660	// Special case '=' because we don't want to emit the LHS as an expression.
				661	if (Op == '=') {
				662	// Assignment requires the LHS to be an identifier.
				663	VariableExprAST LHSE = dynamic_cast<VariableExprAST>(LHS);
				664	if (!LHSE)
				665	return ErrorV("destination of '=' must be a variable");
				666	</pre>
				667	</div>
				668
				669	<p>Unlike the rest of the binary operators, our assignment operator doesn't
				670	follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
				671	as a special case before the other binary operators are handled. The other
				672	strange thing about it is that it requires the LHS to be a variable directly.
				673	</p>
				674
				675	<div class="doc_code">
				676	<pre>
				677	// Codegen the RHS.
				678	Value *Val = RHS->Codegen();
				679	if (Val == 0) return 0;
				680
				681	// Look up the name.
				682	Value *Variable = NamedValues[LHSE->getName()];
				683	if (Variable == 0) return ErrorV("Unknown variable name");
				684
				685	Builder.CreateStore(Val, Variable);
				686	return Val;
				687	}
				688	...
				689	</pre>
				690	</div>
				691
				692	<p>Once it has the variable, codegen'ing the assignment is straight-forward:
				693	we emit the RHS of the assignment, create a store, and return the computed
				694	value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
				695
				696	<p>Now that we have an assignment operator, we can mutate loop variables and
				697	arguments. For example, we can now run code like this:</p>
				698
				699	<div class="doc_code">
				700	<pre>
				701	# Function to print a double.
				702	extern printd(x);
				703
				704	# Define ':' for sequencing: as a low-precedence operator that ignores operands
				705	# and just returns the RHS.
				706	def binary : 1 (x y) y;
				707
				708	def test(x)
				709	printd(x) :
				710	x = 4 :
				711	printd(x);
				712
				713	test(123);
				714	</pre>
				715	</div>
				716
				717	<p>When run, this example prints "123" and then "4", showing that we did
				718	actually mutate the value! Okay, we have now officially implemented our goal:
				719	getting this to work requires SSA construction in the general case. However,
				720	to be really useful, we want the ability to define our own local variables, lets
				721	add this next!
				722	</p>
				723
				724	</div>
				725
				726	<!-- *********************************************************************** -->
				727	<div class="doc_section"><a name="localvars">User-defined Local
				728	Variables</a></div>
				729	<!-- *********************************************************************** -->
				730
				731	<div class="doc_text">
				732
				733	<p>Adding var/in is just like any other other extensions we made to
				734	Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
				735	The first step for adding our new 'var/in' construct is to extend the lexer.
				736	As before, this is pretty trivial, the code looks like this:</p>
				737
				738	<div class="doc_code">
				739	<pre>
				740	enum Token {
				741	...
				742	<b>// var definition
				743	tok_var = -13</b>
				744	...
				745	}
				746	...
				747	static int gettok() {
				748	...
				749	if (IdentifierStr == "in") return tok_in;
				750	if (IdentifierStr == "binary") return tok_binary;
				751	if (IdentifierStr == "unary") return tok_unary;
				752	<b>if (IdentifierStr == "var") return tok_var;</b>
				753	return tok_identifier;
				754	...
				755	</pre>
				756	</div>
				757
				758	<p>The next step is to define the AST node that we will construct. For var/in,
				759	it will look like this:</p>
				760
				761	<div class="doc_code">
				762	<pre>
				763	/// VarExprAST - Expression class for var/in
				764	class VarExprAST : public ExprAST {
				765	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				766	ExprAST *Body;
				767	public:
				768	VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
				769	ExprAST *body)
				770	: VarNames(varnames), Body(body) {}
				771
				772	virtual Value *Codegen();
				773	};
				774	</pre>
				775	</div>
				776
				777	<p>var/in allows a list of names to be defined all at once, and each name can
				778	optionally have an initializer value. As such, we capture this information in
				779	the VarNames vector. Also, var/in has a body, this body is allowed to access
				780	the variables defined by the let/in.</p>
				781
				782	<p>With this ready, we can define the parser pieces. First thing we do is add
				783	it as a primary expression:</p>
				784
				785	<div class="doc_code">
				786	<pre>
				787	/// primary
				788	/// ::= identifierexpr
				789	/// ::= numberexpr
				790	/// ::= parenexpr
				791	/// ::= ifexpr
				792	/// ::= forexpr
				793	<b>/// ::= varexpr</b>
				794	static ExprAST *ParsePrimary() {
				795	switch (CurTok) {
				796	default: return Error("unknown token when expecting an expression");
				797	case tok_identifier: return ParseIdentifierExpr();
				798	case tok_number: return ParseNumberExpr();
				799	case '(': return ParseParenExpr();
				800	case tok_if: return ParseIfExpr();
				801	case tok_for: return ParseForExpr();
				802	<b>case tok_var: return ParseVarExpr();</b>
				803	}
				804	}
				805	</pre>
				806	</div>
				807
				808	<p>Next we define ParseVarExpr:</p>
				809
				810	<div class="doc_code">
				811	<pre>
				812	/// varexpr ::= 'var' identifer ('=' expression)?
				813	// (',' identifer ('=' expression)?)* 'in' expression
				814	static ExprAST *ParseVarExpr() {
				815	getNextToken(); // eat the var.
				816
				817	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				818
				819	// At least one variable name is required.
				820	if (CurTok != tok_identifier)
				821	return Error("expected identifier after var");
				822	</pre>
				823	</div>
				824
				825	<p>The first part of this code parses the list of identifier/expr pairs into the
				826	local <tt>VarNames</tt> vector.
				827
				828	<div class="doc_code">
				829	<pre>
				830	while (1) {
				831	std::string Name = IdentifierStr;
				832	getNextToken(); // eat identifer.
				833
				834	// Read the optional initializer.
				835	ExprAST *Init = 0;
				836	if (CurTok == '=') {
				837	getNextToken(); // eat the '='.
				838
				839	Init = ParseExpression();
				840	if (Init == 0) return 0;
				841	}
				842
				843	VarNames.push_back(std::make_pair(Name, Init));
				844
				845	// End of var list, exit loop.
				846	if (CurTok != ',') break;
				847	getNextToken(); // eat the ','.
				848
				849	if (CurTok != tok_identifier)
				850	return Error("expected identifier list after var");
				851	}
				852	</pre>
				853	</div>
				854
				855	<p>Once all the variables are parsed, we then parse the body and create the
				856	AST node:</p>
				857
				858	<div class="doc_code">
				859	<pre>
				860	// At this point, we have to have 'in'.
				861	if (CurTok != tok_in)
				862	return Error("expected 'in' keyword after 'var'");
				863	getNextToken(); // eat 'in'.
				864
				865	ExprAST *Body = ParseExpression();
				866	if (Body == 0) return 0;
				867
				868	return new VarExprAST(VarNames, Body);
				869	}
				870	</pre>
				871	</div>
				872
				873	<p>Now that we can parse and represent the code, we need to support emission of
				874	LLVM IR for it. This code starts out with:</p>
				875
				876	<div class="doc_code">
				877	<pre>
				878	Value *VarExprAST::Codegen() {
				879	std::vector<AllocaInst *> OldBindings;
				880
				881	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				882
				883	// Register all variables and emit their initializer.
				884	for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
				885	const std::string &VarName = VarNames[i].first;
				886	ExprAST *Init = VarNames[i].second;
				887	</pre>
				888	</div>
				889
				890	<p>Basically it loops over all the variables, installing them one at a time.
				891	For each variable we put into the symbol table, we remember the previous value
				892	that we replace in OldBindings.</p>
				893
				894	<div class="doc_code">
				895	<pre>
				896	// Emit the initializer before adding the variable to scope, this prevents
				897	// the initializer from referencing the variable itself, and permits stuff
				898	// like this:
				899	// var a = 1 in
				900	// var a = a in ... # refers to outer 'a'.
				901	Value *InitVal;
				902	if (Init) {
				903	InitVal = Init->Codegen();
				904	if (InitVal == 0) return 0;
				905	} else { // If not specified, use 0.0.
				906	InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
				907	}
				908
				909	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				910	Builder.CreateStore(InitVal, Alloca);
				911
				912	// Remember the old variable binding so that we can restore the binding when
				913	// we unrecurse.
				914	OldBindings.push_back(NamedValues[VarName]);
				915
				916	// Remember this binding.
				917	NamedValues[VarName] = Alloca;
				918	}
				919	</pre>
				920	</div>
				921
				922	<p>There are more comments here than code. The basic idea is that we emit the
				923	initializer, create the alloca, then update the symbol table to point to it.
				924	Once all the variables are installed in the symbol table, we evaluate the body
				925	of the var/in expression:</p>
				926
				927	<div class="doc_code">
				928	<pre>
				929	// Codegen the body, now that all vars are in scope.
				930	Value *BodyVal = Body->Codegen();
				931	if (BodyVal == 0) return 0;
				932	</pre>
				933	</div>
				934
				935	<p>Finally, before returning, we restore the previous variable bindings:</p>
				936
				937	<div class="doc_code">
				938	<pre>
				939	// Pop all our variables from scope.
				940	for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
				941	NamedValues[VarNames[i].first] = OldBindings[i];
				942
				943	// Return the body computation.
				944	return BodyVal;
				945	}
				946	</pre>
				947	</div>
				948
				949	<p>The end result of all of this is that we get properly scoped variable
				950	definitions, and we even (trivially) allow mutation of them :).</p>
				951
				952	<p>With this, we completed what we set out to do. Our nice iterative fib
				953	example from the intro compiles and runs just fine. The mem2reg pass optimizes
				954	all of our stack variables into SSA registers, inserting PHI nodes where needed,
				955	and our front-end remains simple: no iterated dominator frontier computation
				956	anywhere in sight.</p>
				957
				958	</div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	959
				960	<!-- *********************************************************************** -->
				961	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				962	<!-- *********************************************************************** -->
				963
				964	<div class="doc_text">
				965
				966	<p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	967	Here is the complete code listing for our running example, enhanced with mutable
				968	variables and var/in support. To build this example, use:
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	969	</p>
				970
				971	<div class="doc_code">
				972	<pre>
				973	# Compile
				974	g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
				975	# Run
				976	./toy
				977	</pre>
				978	</div>
				979
				980	<p>Here is the code:</p>
				981
				982	<div class="doc_code">
				983	<pre>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	984	#include "llvm/DerivedTypes.h"
				985	#include "llvm/ExecutionEngine/ExecutionEngine.h"
				986	#include "llvm/Module.h"
				987	#include "llvm/ModuleProvider.h"
				988	#include "llvm/PassManager.h"
				989	#include "llvm/Analysis/Verifier.h"
				990	#include "llvm/Target/TargetData.h"
				991	#include "llvm/Transforms/Scalar.h"
				992	#include "llvm/Support/LLVMBuilder.h"
				993	#include <cstdio>
				994	#include <string>
				995	#include <map>
				996	#include <vector>
				997	using namespace llvm;
				998
				999	//===----------------------------------------------------------------------===//
				1000	// Lexer
				1001	//===----------------------------------------------------------------------===//
				1002
				1003	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				1004	// of these for known things.
				1005	enum Token {
				1006	tok_eof = -1,
				1007
				1008	// commands
				1009	tok_def = -2, tok_extern = -3,
				1010
				1011	// primary
				1012	tok_identifier = -4, tok_number = -5,
				1013
				1014	// control
				1015	tok_if = -6, tok_then = -7, tok_else = -8,
				1016	tok_for = -9, tok_in = -10,
				1017
				1018	// operators
				1019	tok_binary = -11, tok_unary = -12,
				1020
				1021	// var definition
				1022	tok_var = -13
				1023	};
				1024
				1025	static std::string IdentifierStr; // Filled in if tok_identifier
				1026	static double NumVal; // Filled in if tok_number
				1027
				1028	/// gettok - Return the next token from standard input.
				1029	static int gettok() {
				1030	static int LastChar = ' ';
				1031
				1032	// Skip any whitespace.
				1033	while (isspace(LastChar))
				1034	LastChar = getchar();
				1035
				1036	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				1037	IdentifierStr = LastChar;
				1038	while (isalnum((LastChar = getchar())))
				1039	IdentifierStr += LastChar;
				1040
				1041	if (IdentifierStr == "def") return tok_def;
				1042	if (IdentifierStr == "extern") return tok_extern;
				1043	if (IdentifierStr == "if") return tok_if;
				1044	if (IdentifierStr == "then") return tok_then;
				1045	if (IdentifierStr == "else") return tok_else;
				1046	if (IdentifierStr == "for") return tok_for;
				1047	if (IdentifierStr == "in") return tok_in;
				1048	if (IdentifierStr == "binary") return tok_binary;
				1049	if (IdentifierStr == "unary") return tok_unary;
				1050	if (IdentifierStr == "var") return tok_var;
				1051	return tok_identifier;
				1052	}
				1053
				1054	if (isdigit(LastChar) \|\| LastChar == '.') { // Number: [0-9.]+
				1055	std::string NumStr;
				1056	do {
				1057	NumStr += LastChar;
				1058	LastChar = getchar();
				1059	} while (isdigit(LastChar) \|\| LastChar == '.');
				1060
				1061	NumVal = strtod(NumStr.c_str(), 0);
				1062	return tok_number;
				1063	}
				1064
				1065	if (LastChar == '#') {
				1066	// Comment until end of line.
				1067	do LastChar = getchar();
				1068	while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
				1069
				1070	if (LastChar != EOF)
				1071	return gettok();
				1072	}
				1073
				1074	// Check for end of file. Don't eat the EOF.
				1075	if (LastChar == EOF)
				1076	return tok_eof;
				1077
				1078	// Otherwise, just return the character as its ascii value.
				1079	int ThisChar = LastChar;
				1080	LastChar = getchar();
				1081	return ThisChar;
				1082	}
				1083
				1084	//===----------------------------------------------------------------------===//
				1085	// Abstract Syntax Tree (aka Parse Tree)
				1086	//===----------------------------------------------------------------------===//
				1087
				1088	/// ExprAST - Base class for all expression nodes.
				1089	class ExprAST {
				1090	public:
				1091	virtual ~ExprAST() {}
				1092	virtual Value *Codegen() = 0;
				1093	};
				1094
				1095	/// NumberExprAST - Expression class for numeric literals like "1.0".
				1096	class NumberExprAST : public ExprAST {
				1097	double Val;
				1098	public:
				1099	NumberExprAST(double val) : Val(val) {}
				1100	virtual Value *Codegen();
				1101	};
				1102
				1103	/// VariableExprAST - Expression class for referencing a variable, like "a".
				1104	class VariableExprAST : public ExprAST {
				1105	std::string Name;
				1106	public:
				1107	VariableExprAST(const std::string &name) : Name(name) {}
				1108	const std::string &getName() const { return Name; }
				1109	virtual Value *Codegen();
				1110	};
				1111
				1112	/// UnaryExprAST - Expression class for a unary operator.
				1113	class UnaryExprAST : public ExprAST {
				1114	char Opcode;
				1115	ExprAST *Operand;
				1116	public:
				1117	UnaryExprAST(char opcode, ExprAST *operand)
				1118	: Opcode(opcode), Operand(operand) {}
				1119	virtual Value *Codegen();
				1120	};
				1121
				1122	/// BinaryExprAST - Expression class for a binary operator.
				1123	class BinaryExprAST : public ExprAST {
				1124	char Op;
				1125	ExprAST LHS, RHS;
				1126	public:
				1127	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				1128	: Op(op), LHS(lhs), RHS(rhs) {}
				1129	virtual Value *Codegen();
				1130	};
				1131
				1132	/// CallExprAST - Expression class for function calls.
				1133	class CallExprAST : public ExprAST {
				1134	std::string Callee;
				1135	std::vector<ExprAST*> Args;
				1136	public:
				1137	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				1138	: Callee(callee), Args(args) {}
				1139	virtual Value *Codegen();
				1140	};
				1141
				1142	/// IfExprAST - Expression class for if/then/else.
				1143	class IfExprAST : public ExprAST {
				1144	ExprAST Cond, Then, *Else;
				1145	public:
				1146	IfExprAST(ExprAST cond, ExprAST then, ExprAST *_else)
				1147	: Cond(cond), Then(then), Else(_else) {}
				1148	virtual Value *Codegen();
				1149	};
				1150
				1151	/// ForExprAST - Expression class for for/in.
				1152	class ForExprAST : public ExprAST {
				1153	std::string VarName;
				1154	ExprAST Start, End, Step, Body;
				1155	public:
				1156	ForExprAST(const std::string &varname, ExprAST start, ExprAST end,
				1157	ExprAST step, ExprAST body)
				1158	: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
				1159	virtual Value *Codegen();
				1160	};
				1161
				1162	/// VarExprAST - Expression class for var/in
				1163	class VarExprAST : public ExprAST {
				1164	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				1165	ExprAST *Body;
				1166	public:
				1167	VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
				1168	ExprAST *body)
				1169	: VarNames(varnames), Body(body) {}
				1170
				1171	virtual Value *Codegen();
				1172	};
				1173
				1174	/// PrototypeAST - This class represents the "prototype" for a function,
				1175	/// which captures its argument names as well as if it is an operator.
				1176	class PrototypeAST {
				1177	std::string Name;
				1178	std::vector<std::string> Args;
				1179	bool isOperator;
				1180	unsigned Precedence; // Precedence if a binary op.
				1181	public:
				1182	PrototypeAST(const std::string &name, const std::vector<std::string> &args,
				1183	bool isoperator = false, unsigned prec = 0)
				1184	: Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
				1185
				1186	bool isUnaryOp() const { return isOperator && Args.size() == 1; }
				1187	bool isBinaryOp() const { return isOperator && Args.size() == 2; }
				1188
				1189	char getOperatorName() const {
				1190	assert(isUnaryOp() \|\| isBinaryOp());
				1191	return Name[Name.size()-1];
				1192	}
				1193
				1194	unsigned getBinaryPrecedence() const { return Precedence; }
				1195
				1196	Function *Codegen();
				1197
				1198	void CreateArgumentAllocas(Function *F);
				1199	};
				1200
				1201	/// FunctionAST - This class represents a function definition itself.
				1202	class FunctionAST {
				1203	PrototypeAST *Proto;
				1204	ExprAST *Body;
				1205	public:
				1206	FunctionAST(PrototypeAST proto, ExprAST body)
				1207	: Proto(proto), Body(body) {}
				1208
				1209	Function *Codegen();
				1210	};
				1211
				1212	//===----------------------------------------------------------------------===//
				1213	// Parser
				1214	//===----------------------------------------------------------------------===//
				1215
				1216	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				1217	/// token the parser it looking at. getNextToken reads another token from the
				1218	/// lexer and updates CurTok with its results.
				1219	static int CurTok;
				1220	static int getNextToken() {
				1221	return CurTok = gettok();
				1222	}
				1223
				1224	/// BinopPrecedence - This holds the precedence for each binary operator that is
				1225	/// defined.
				1226	static std::map<char, int> BinopPrecedence;
				1227
				1228	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				1229	static int GetTokPrecedence() {
				1230	if (!isascii(CurTok))
				1231	return -1;
				1232
				1233	// Make sure it's a declared binop.
				1234	int TokPrec = BinopPrecedence[CurTok];
				1235	if (TokPrec <= 0) return -1;
				1236	return TokPrec;
				1237	}
				1238
				1239	/// Error* - These are little helper functions for error handling.
				1240	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				1241	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				1242	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				1243
				1244	static ExprAST *ParseExpression();
				1245
				1246	/// identifierexpr
				1247	/// ::= identifer
				1248	/// ::= identifer '(' expression* ')'
				1249	static ExprAST *ParseIdentifierExpr() {
				1250	std::string IdName = IdentifierStr;
				1251
				1252	getNextToken(); // eat identifer.
				1253
				1254	if (CurTok != '(') // Simple variable ref.
				1255	return new VariableExprAST(IdName);
				1256
				1257	// Call.
				1258	getNextToken(); // eat (
				1259	std::vector<ExprAST*> Args;
				1260	if (CurTok != ')') {
				1261	while (1) {
				1262	ExprAST *Arg = ParseExpression();
				1263	if (!Arg) return 0;
				1264	Args.push_back(Arg);
				1265
				1266	if (CurTok == ')') break;
				1267
				1268	if (CurTok != ',')
				1269	return Error("Expected ')'");
				1270	getNextToken();
				1271	}
				1272	}
				1273
				1274	// Eat the ')'.
				1275	getNextToken();
				1276
				1277	return new CallExprAST(IdName, Args);
				1278	}
				1279
				1280	/// numberexpr ::= number
				1281	static ExprAST *ParseNumberExpr() {
				1282	ExprAST *Result = new NumberExprAST(NumVal);
				1283	getNextToken(); // consume the number
				1284	return Result;
				1285	}
				1286
				1287	/// parenexpr ::= '(' expression ')'
				1288	static ExprAST *ParseParenExpr() {
				1289	getNextToken(); // eat (.
				1290	ExprAST *V = ParseExpression();
				1291	if (!V) return 0;
				1292
				1293	if (CurTok != ')')
				1294	return Error("expected ')'");
				1295	getNextToken(); // eat ).
				1296	return V;
				1297	}
				1298
				1299	/// ifexpr ::= 'if' expression 'then' expression 'else' expression
				1300	static ExprAST *ParseIfExpr() {
				1301	getNextToken(); // eat the if.
				1302
				1303	// condition.
				1304	ExprAST *Cond = ParseExpression();
				1305	if (!Cond) return 0;
				1306
				1307	if (CurTok != tok_then)
				1308	return Error("expected then");
				1309	getNextToken(); // eat the then
				1310
				1311	ExprAST *Then = ParseExpression();
				1312	if (Then == 0) return 0;
				1313
				1314	if (CurTok != tok_else)
				1315	return Error("expected else");
				1316
				1317	getNextToken();
				1318
				1319	ExprAST *Else = ParseExpression();
				1320	if (!Else) return 0;
				1321
				1322	return new IfExprAST(Cond, Then, Else);
				1323	}
				1324
				1325	/// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
				1326	static ExprAST *ParseForExpr() {
				1327	getNextToken(); // eat the for.
				1328
				1329	if (CurTok != tok_identifier)
				1330	return Error("expected identifier after for");
				1331
				1332	std::string IdName = IdentifierStr;
				1333	getNextToken(); // eat identifer.
				1334
				1335	if (CurTok != '=')
				1336	return Error("expected '=' after for");
				1337	getNextToken(); // eat '='.
				1338
				1339
				1340	ExprAST *Start = ParseExpression();
				1341	if (Start == 0) return 0;
				1342	if (CurTok != ',')
				1343	return Error("expected ',' after for start value");
				1344	getNextToken();
				1345
				1346	ExprAST *End = ParseExpression();
				1347	if (End == 0) return 0;
				1348
				1349	// The step value is optional.
				1350	ExprAST *Step = 0;
				1351	if (CurTok == ',') {
				1352	getNextToken();
				1353	Step = ParseExpression();
				1354	if (Step == 0) return 0;
				1355	}
				1356
				1357	if (CurTok != tok_in)
				1358	return Error("expected 'in' after for");
				1359	getNextToken(); // eat 'in'.
				1360
				1361	ExprAST *Body = ParseExpression();
				1362	if (Body == 0) return 0;
				1363
				1364	return new ForExprAST(IdName, Start, End, Step, Body);
				1365	}
				1366
				1367	/// varexpr ::= 'var' identifer ('=' expression)?
				1368	// (',' identifer ('=' expression)?)* 'in' expression
				1369	static ExprAST *ParseVarExpr() {
				1370	getNextToken(); // eat the var.
				1371
				1372	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				1373
				1374	// At least one variable name is required.
				1375	if (CurTok != tok_identifier)
				1376	return Error("expected identifier after var");
				1377
				1378	while (1) {
				1379	std::string Name = IdentifierStr;
				1380	getNextToken(); // eat identifer.
				1381
				1382	// Read the optional initializer.
				1383	ExprAST *Init = 0;
				1384	if (CurTok == '=') {
				1385	getNextToken(); // eat the '='.
				1386
				1387	Init = ParseExpression();
				1388	if (Init == 0) return 0;
				1389	}
				1390
				1391	VarNames.push_back(std::make_pair(Name, Init));
				1392
				1393	// End of var list, exit loop.
				1394	if (CurTok != ',') break;
				1395	getNextToken(); // eat the ','.
				1396
				1397	if (CurTok != tok_identifier)
				1398	return Error("expected identifier list after var");
				1399	}
				1400
				1401	// At this point, we have to have 'in'.
				1402	if (CurTok != tok_in)
				1403	return Error("expected 'in' keyword after 'var'");
				1404	getNextToken(); // eat 'in'.
				1405
				1406	ExprAST *Body = ParseExpression();
				1407	if (Body == 0) return 0;
				1408
				1409	return new VarExprAST(VarNames, Body);
				1410	}
				1411
				1412
				1413	/// primary
				1414	/// ::= identifierexpr
				1415	/// ::= numberexpr
				1416	/// ::= parenexpr
				1417	/// ::= ifexpr
				1418	/// ::= forexpr
				1419	/// ::= varexpr
				1420	static ExprAST *ParsePrimary() {
				1421	switch (CurTok) {
				1422	default: return Error("unknown token when expecting an expression");
				1423	case tok_identifier: return ParseIdentifierExpr();
				1424	case tok_number: return ParseNumberExpr();
				1425	case '(': return ParseParenExpr();
				1426	case tok_if: return ParseIfExpr();
				1427	case tok_for: return ParseForExpr();
				1428	case tok_var: return ParseVarExpr();
				1429	}
				1430	}
				1431
				1432	/// unary
				1433	/// ::= primary
				1434	/// ::= '!' unary
				1435	static ExprAST *ParseUnary() {
				1436	// If the current token is not an operator, it must be a primary expr.
				1437	if (!isascii(CurTok) \|\| CurTok == '(' \|\| CurTok == ',')
				1438	return ParsePrimary();
				1439
				1440	// If this is a unary operator, read it.
				1441	int Opc = CurTok;
				1442	getNextToken();
				1443	if (ExprAST *Operand = ParseUnary())
				1444	return new UnaryExprAST(Opc, Operand);
				1445	return 0;
				1446	}
				1447
				1448	/// binoprhs
				1449	/// ::= ('+' unary)*
				1450	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				1451	// If this is a binop, find its precedence.
				1452	while (1) {
				1453	int TokPrec = GetTokPrecedence();
				1454
				1455	// If this is a binop that binds at least as tightly as the current binop,
				1456	// consume it, otherwise we are done.
				1457	if (TokPrec < ExprPrec)
				1458	return LHS;
				1459
				1460	// Okay, we know this is a binop.
				1461	int BinOp = CurTok;
				1462	getNextToken(); // eat binop
				1463
				1464	// Parse the unary expression after the binary operator.
				1465	ExprAST *RHS = ParseUnary();
				1466	if (!RHS) return 0;
				1467
				1468	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1469	// the pending operator take RHS as its LHS.
				1470	int NextPrec = GetTokPrecedence();
				1471	if (TokPrec < NextPrec) {
				1472	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1473	if (RHS == 0) return 0;
				1474	}
				1475
				1476	// Merge LHS/RHS.
				1477	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1478	}
				1479	}
				1480
				1481	/// expression
				1482	/// ::= unary binoprhs
				1483	///
				1484	static ExprAST *ParseExpression() {
				1485	ExprAST *LHS = ParseUnary();
				1486	if (!LHS) return 0;
				1487
				1488	return ParseBinOpRHS(0, LHS);
				1489	}
				1490
				1491	/// prototype
				1492	/// ::= id '(' id* ')'
				1493	/// ::= binary LETTER number? (id, id)
				1494	/// ::= unary LETTER (id)
				1495	static PrototypeAST *ParsePrototype() {
				1496	std::string FnName;
				1497
				1498	int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
				1499	unsigned BinaryPrecedence = 30;
				1500
				1501	switch (CurTok) {
				1502	default:
				1503	return ErrorP("Expected function name in prototype");
				1504	case tok_identifier:
				1505	FnName = IdentifierStr;
				1506	Kind = 0;
				1507	getNextToken();
				1508	break;
				1509	case tok_unary:
				1510	getNextToken();
				1511	if (!isascii(CurTok))
				1512	return ErrorP("Expected unary operator");
				1513	FnName = "unary";
				1514	FnName += (char)CurTok;
				1515	Kind = 1;
				1516	getNextToken();
				1517	break;
				1518	case tok_binary:
				1519	getNextToken();
				1520	if (!isascii(CurTok))
				1521	return ErrorP("Expected binary operator");
				1522	FnName = "binary";
				1523	FnName += (char)CurTok;
				1524	Kind = 2;
				1525	getNextToken();
				1526
				1527	// Read the precedence if present.
				1528	if (CurTok == tok_number) {
				1529	if (NumVal < 1 \|\| NumVal > 100)
				1530	return ErrorP("Invalid precedecnce: must be 1..100");
				1531	BinaryPrecedence = (unsigned)NumVal;
				1532	getNextToken();
				1533	}
				1534	break;
				1535	}
				1536
				1537	if (CurTok != '(')
				1538	return ErrorP("Expected '(' in prototype");
				1539
				1540	std::vector<std::string> ArgNames;
				1541	while (getNextToken() == tok_identifier)
				1542	ArgNames.push_back(IdentifierStr);
				1543	if (CurTok != ')')
				1544	return ErrorP("Expected ')' in prototype");
				1545
				1546	// success.
				1547	getNextToken(); // eat ')'.
				1548
				1549	// Verify right number of names for operator.
				1550	if (Kind && ArgNames.size() != Kind)
				1551	return ErrorP("Invalid number of operands for operator");
				1552
				1553	return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
				1554	}
				1555
				1556	/// definition ::= 'def' prototype expression
				1557	static FunctionAST *ParseDefinition() {
				1558	getNextToken(); // eat def.
				1559	PrototypeAST *Proto = ParsePrototype();
				1560	if (Proto == 0) return 0;
				1561
				1562	if (ExprAST *E = ParseExpression())
				1563	return new FunctionAST(Proto, E);
				1564	return 0;
				1565	}
				1566
				1567	/// toplevelexpr ::= expression
				1568	static FunctionAST *ParseTopLevelExpr() {
				1569	if (ExprAST *E = ParseExpression()) {
				1570	// Make an anonymous proto.
				1571	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				1572	return new FunctionAST(Proto, E);
				1573	}
				1574	return 0;
				1575	}
				1576
				1577	/// external ::= 'extern' prototype
				1578	static PrototypeAST *ParseExtern() {
				1579	getNextToken(); // eat extern.
				1580	return ParsePrototype();
				1581	}
				1582
				1583	//===----------------------------------------------------------------------===//
				1584	// Code Generation
				1585	//===----------------------------------------------------------------------===//
				1586
				1587	static Module *TheModule;
				1588	static LLVMFoldingBuilder Builder;
				1589	static std::map<std::string, AllocaInst*> NamedValues;
				1590	static FunctionPassManager *TheFPM;
				1591
				1592	Value ErrorV(const char Str) { Error(Str); return 0; }
				1593
				1594	/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
				1595	/// the function. This is used for mutable variables etc.
				1596	static AllocaInst CreateEntryBlockAlloca(Function TheFunction,
				1597	const std::string &VarName) {
				1598	LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
				1599	TheFunction->getEntryBlock().begin());
				1600	return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
				1601	}
				1602
				1603
				1604	Value *NumberExprAST::Codegen() {
				1605	return ConstantFP::get(Type::DoubleTy, APFloat(Val));
				1606	}
				1607
				1608	Value *VariableExprAST::Codegen() {
				1609	// Look this variable up in the function.
				1610	Value *V = NamedValues[Name];
				1611	if (V == 0) return ErrorV("Unknown variable name");
				1612
				1613	// Load the value.
				1614	return Builder.CreateLoad(V, Name.c_str());
				1615	}
				1616
				1617	Value *UnaryExprAST::Codegen() {
				1618	Value *OperandV = Operand->Codegen();
				1619	if (OperandV == 0) return 0;
				1620
				1621	Function *F = TheModule->getFunction(std::string("unary")+Opcode);
				1622	if (F == 0)
				1623	return ErrorV("Unknown unary operator");
				1624
				1625	return Builder.CreateCall(F, OperandV, "unop");
				1626	}
				1627
				1628
				1629	Value *BinaryExprAST::Codegen() {
				1630	// Special case '=' because we don't want to emit the LHS as an expression.
				1631	if (Op == '=') {
				1632	// Assignment requires the LHS to be an identifier.
				1633	VariableExprAST LHSE = dynamic_cast<VariableExprAST>(LHS);
				1634	if (!LHSE)
				1635	return ErrorV("destination of '=' must be a variable");
				1636	// Codegen the RHS.
				1637	Value *Val = RHS->Codegen();
				1638	if (Val == 0) return 0;
				1639
				1640	// Look up the name.
				1641	Value *Variable = NamedValues[LHSE->getName()];
				1642	if (Variable == 0) return ErrorV("Unknown variable name");
				1643
				1644	Builder.CreateStore(Val, Variable);
				1645	return Val;
				1646	}
				1647
				1648
				1649	Value *L = LHS->Codegen();
				1650	Value *R = RHS->Codegen();
				1651	if (L == 0 \|\| R == 0) return 0;
				1652
				1653	switch (Op) {
				1654	case '+': return Builder.CreateAdd(L, R, "addtmp");
				1655	case '-': return Builder.CreateSub(L, R, "subtmp");
				1656	case '*': return Builder.CreateMul(L, R, "multmp");
				1657	case '<':
				1658	L = Builder.CreateFCmpULT(L, R, "multmp");
				1659	// Convert bool 0/1 to double 0.0 or 1.0
				1660	return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
				1661	default: break;
				1662	}
				1663
				1664	// If it wasn't a builtin binary operator, it must be a user defined one. Emit
				1665	// a call to it.
				1666	Function *F = TheModule->getFunction(std::string("binary")+Op);
				1667	assert(F && "binary operator not found!");
				1668
				1669	Value *Ops[] = { L, R };
				1670	return Builder.CreateCall(F, Ops, Ops+2, "binop");
				1671	}
				1672
				1673	Value *CallExprAST::Codegen() {
				1674	// Look up the name in the global module table.
				1675	Function *CalleeF = TheModule->getFunction(Callee);
				1676	if (CalleeF == 0)
				1677	return ErrorV("Unknown function referenced");
				1678
				1679	// If argument mismatch error.
				1680	if (CalleeF->arg_size() != Args.size())
				1681	return ErrorV("Incorrect # arguments passed");
				1682
				1683	std::vector<Value*> ArgsV;
				1684	for (unsigned i = 0, e = Args.size(); i != e; ++i) {
				1685	ArgsV.push_back(Args[i]->Codegen());
				1686	if (ArgsV.back() == 0) return 0;
				1687	}
				1688
				1689	return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
				1690	}
				1691
				1692	Value *IfExprAST::Codegen() {
				1693	Value *CondV = Cond->Codegen();
				1694	if (CondV == 0) return 0;
				1695
				1696	// Convert condition to a bool by comparing equal to 0.0.
				1697	CondV = Builder.CreateFCmpONE(CondV,
				1698	ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
				1699	"ifcond");
				1700
				1701	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1702
				1703	// Create blocks for the then and else cases. Insert the 'then' block at the
				1704	// end of the function.
				1705	BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
				1706	BasicBlock *ElseBB = new BasicBlock("else");
				1707	BasicBlock *MergeBB = new BasicBlock("ifcont");
				1708
				1709	Builder.CreateCondBr(CondV, ThenBB, ElseBB);
				1710
				1711	// Emit then value.
				1712	Builder.SetInsertPoint(ThenBB);
				1713
				1714	Value *ThenV = Then->Codegen();
				1715	if (ThenV == 0) return 0;
				1716
				1717	Builder.CreateBr(MergeBB);
				1718	// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
				1719	ThenBB = Builder.GetInsertBlock();
				1720
				1721	// Emit else block.
				1722	TheFunction->getBasicBlockList().push_back(ElseBB);
				1723	Builder.SetInsertPoint(ElseBB);
				1724
				1725	Value *ElseV = Else->Codegen();
				1726	if (ElseV == 0) return 0;
				1727
				1728	Builder.CreateBr(MergeBB);
				1729	// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
				1730	ElseBB = Builder.GetInsertBlock();
				1731
				1732	// Emit merge block.
				1733	TheFunction->getBasicBlockList().push_back(MergeBB);
				1734	Builder.SetInsertPoint(MergeBB);
				1735	PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
				1736
				1737	PN->addIncoming(ThenV, ThenBB);
				1738	PN->addIncoming(ElseV, ElseBB);
				1739	return PN;
				1740	}
				1741
				1742	Value *ForExprAST::Codegen() {
				1743	// Output this as:
				1744	// var = alloca double
				1745	// ...
				1746	// start = startexpr
				1747	// store start -> var
				1748	// goto loop
				1749	// loop:
				1750	// ...
				1751	// bodyexpr
				1752	// ...
				1753	// loopend:
				1754	// step = stepexpr
				1755	// endcond = endexpr
				1756	//
				1757	// curvar = load var
				1758	// nextvar = curvar + step
				1759	// store nextvar -> var
				1760	// br endcond, loop, endloop
				1761	// outloop:
				1762
				1763	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1764
				1765	// Create an alloca for the variable in the entry block.
				1766	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				1767
				1768	// Emit the start code first, without 'variable' in scope.
				1769	Value *StartVal = Start->Codegen();
				1770	if (StartVal == 0) return 0;
				1771
				1772	// Store the value into the alloca.
				1773	Builder.CreateStore(StartVal, Alloca);
				1774
				1775	// Make the new basic block for the loop header, inserting after current
				1776	// block.
				1777	BasicBlock *PreheaderBB = Builder.GetInsertBlock();
				1778	BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
				1779
				1780	// Insert an explicit fall through from the current block to the LoopBB.
				1781	Builder.CreateBr(LoopBB);
				1782
				1783	// Start insertion in LoopBB.
				1784	Builder.SetInsertPoint(LoopBB);
				1785
				1786	// Within the loop, the variable is defined equal to the PHI node. If it
				1787	// shadows an existing variable, we have to restore it, so save it now.
				1788	AllocaInst *OldVal = NamedValues[VarName];
				1789	NamedValues[VarName] = Alloca;
				1790
				1791	// Emit the body of the loop. This, like any other expr, can change the
				1792	// current BB. Note that we ignore the value computed by the body, but don't
				1793	// allow an error.
				1794	if (Body->Codegen() == 0)
				1795	return 0;
				1796
				1797	// Emit the step value.
				1798	Value *StepVal;
				1799	if (Step) {
				1800	StepVal = Step->Codegen();
				1801	if (StepVal == 0) return 0;
				1802	} else {
				1803	// If not specified, use 1.0.
				1804	StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
				1805	}
				1806
				1807	// Compute the end condition.
				1808	Value *EndCond = End->Codegen();
				1809	if (EndCond == 0) return EndCond;
				1810
				1811	// Reload, increment, and restore the alloca. This handles the case where
				1812	// the body of the loop mutates the variable.
				1813	Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
				1814	Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
				1815	Builder.CreateStore(NextVar, Alloca);
				1816
				1817	// Convert condition to a bool by comparing equal to 0.0.
				1818	EndCond = Builder.CreateFCmpONE(EndCond,
				1819	ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
				1820	"loopcond");
				1821
				1822	// Create the "after loop" block and insert it.
				1823	BasicBlock *LoopEndBB = Builder.GetInsertBlock();
				1824	BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
				1825
				1826	// Insert the conditional branch into the end of LoopEndBB.
				1827	Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
				1828
				1829	// Any new code will be inserted in AfterBB.
				1830	Builder.SetInsertPoint(AfterBB);
				1831
				1832	// Restore the unshadowed variable.
				1833	if (OldVal)
				1834	NamedValues[VarName] = OldVal;
				1835	else
				1836	NamedValues.erase(VarName);
				1837
				1838
				1839	// for expr always returns 0.0.
				1840	return Constant::getNullValue(Type::DoubleTy);
				1841	}
				1842
				1843	Value *VarExprAST::Codegen() {
				1844	std::vector<AllocaInst *> OldBindings;
				1845
				1846	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1847
				1848	// Register all variables and emit their initializer.
				1849	for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
				1850	const std::string &VarName = VarNames[i].first;
				1851	ExprAST *Init = VarNames[i].second;
				1852
				1853	// Emit the initializer before adding the variable to scope, this prevents
				1854	// the initializer from referencing the variable itself, and permits stuff
				1855	// like this:
				1856	// var a = 1 in
				1857	// var a = a in ... # refers to outer 'a'.
				1858	Value *InitVal;
				1859	if (Init) {
				1860	InitVal = Init->Codegen();
				1861	if (InitVal == 0) return 0;
				1862	} else { // If not specified, use 0.0.
				1863	InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
				1864	}
				1865
				1866	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				1867	Builder.CreateStore(InitVal, Alloca);
				1868
				1869	// Remember the old variable binding so that we can restore the binding when
				1870	// we unrecurse.
				1871	OldBindings.push_back(NamedValues[VarName]);
				1872
				1873	// Remember this binding.
				1874	NamedValues[VarName] = Alloca;
				1875	}
				1876
				1877	// Codegen the body, now that all vars are in scope.
				1878	Value *BodyVal = Body->Codegen();
				1879	if (BodyVal == 0) return 0;
				1880
				1881	// Pop all our variables from scope.
				1882	for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
				1883	NamedValues[VarNames[i].first] = OldBindings[i];
				1884
				1885	// Return the body computation.
				1886	return BodyVal;
				1887	}
				1888
				1889
				1890	Function *PrototypeAST::Codegen() {
				1891	// Make the function type: double(double,double) etc.
				1892	std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
				1893	FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
				1894
				1895	Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
				1896
				1897	// If F conflicted, there was already something named 'Name'. If it has a
				1898	// body, don't allow redefinition or reextern.
				1899	if (F->getName() != Name) {
				1900	// Delete the one we just made and get the existing one.
				1901	F->eraseFromParent();
				1902	F = TheModule->getFunction(Name);
				1903
				1904	// If F already has a body, reject this.
				1905	if (!F->empty()) {
				1906	ErrorF("redefinition of function");
				1907	return 0;
				1908	}
				1909
				1910	// If F took a different number of args, reject.
				1911	if (F->arg_size() != Args.size()) {
				1912	ErrorF("redefinition of function with different # args");
				1913	return 0;
				1914	}
				1915	}
				1916
				1917	// Set names for all arguments.
				1918	unsigned Idx = 0;
				1919	for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
				1920	++AI, ++Idx)
				1921	AI->setName(Args[Idx]);
				1922
				1923	return F;
				1924	}
				1925
				1926	/// CreateArgumentAllocas - Create an alloca for each argument and register the
				1927	/// argument in the symbol table so that references to it will succeed.
				1928	void PrototypeAST::CreateArgumentAllocas(Function *F) {
				1929	Function::arg_iterator AI = F->arg_begin();
				1930	for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
				1931	// Create an alloca for this variable.
				1932	AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
				1933
				1934	// Store the initial value into the alloca.
				1935	Builder.CreateStore(AI, Alloca);
				1936
				1937	// Add arguments to variable symbol table.
				1938	NamedValues[Args[Idx]] = Alloca;
				1939	}
				1940	}
				1941
				1942
				1943	Function *FunctionAST::Codegen() {
				1944	NamedValues.clear();
				1945
				1946	Function *TheFunction = Proto->Codegen();
				1947	if (TheFunction == 0)
				1948	return 0;
				1949
				1950	// If this is an operator, install it.
				1951	if (Proto->isBinaryOp())
				1952	BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
				1953
				1954	// Create a new basic block to start insertion into.
				1955	BasicBlock *BB = new BasicBlock("entry", TheFunction);
				1956	Builder.SetInsertPoint(BB);
				1957
				1958	// Add all arguments to the symbol table and create their allocas.
				1959	Proto->CreateArgumentAllocas(TheFunction);
				1960
				1961	if (Value *RetVal = Body->Codegen()) {
				1962	// Finish off the function.
				1963	Builder.CreateRet(RetVal);
				1964
				1965	// Validate the generated code, checking for consistency.
				1966	verifyFunction(*TheFunction);
				1967
				1968	// Optimize the function.
				1969	TheFPM->run(*TheFunction);
				1970
				1971	return TheFunction;
				1972	}
				1973
				1974	// Error reading body, remove function.
				1975	TheFunction->eraseFromParent();
				1976
				1977	if (Proto->isBinaryOp())
				1978	BinopPrecedence.erase(Proto->getOperatorName());
				1979	return 0;
				1980	}
				1981
				1982	//===----------------------------------------------------------------------===//
				1983	// Top-Level parsing and JIT Driver
				1984	//===----------------------------------------------------------------------===//
				1985
				1986	static ExecutionEngine *TheExecutionEngine;
				1987
				1988	static void HandleDefinition() {
				1989	if (FunctionAST *F = ParseDefinition()) {
				1990	if (Function *LF = F->Codegen()) {
				1991	fprintf(stderr, "Read function definition:");
				1992	LF->dump();
				1993	}
				1994	} else {
				1995	// Skip token for error recovery.
				1996	getNextToken();
				1997	}
				1998	}
				1999
				2000	static void HandleExtern() {
				2001	if (PrototypeAST *P = ParseExtern()) {
				2002	if (Function *F = P->Codegen()) {
				2003	fprintf(stderr, "Read extern: ");
				2004	F->dump();
				2005	}
				2006	} else {
				2007	// Skip token for error recovery.
				2008	getNextToken();
				2009	}
				2010	}
				2011
				2012	static void HandleTopLevelExpression() {
				2013	// Evaluate a top level expression into an anonymous function.
				2014	if (FunctionAST *F = ParseTopLevelExpr()) {
				2015	if (Function *LF = F->Codegen()) {
				2016	// JIT the function, returning a function pointer.
				2017	void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
				2018
				2019	// Cast it to the right type (takes no arguments, returns a double) so we
				2020	// can call it as a native function.
				2021	double (FP)() = (double ()())FPtr;
				2022	fprintf(stderr, "Evaluated to %f\n", FP());
				2023	}
				2024	} else {
				2025	// Skip token for error recovery.
				2026	getNextToken();
				2027	}
				2028	}
				2029
				2030	/// top ::= definition \| external \| expression \| ';'
				2031	static void MainLoop() {
				2032	while (1) {
				2033	fprintf(stderr, "ready> ");
				2034	switch (CurTok) {
				2035	case tok_eof: return;
				2036	case ';': getNextToken(); break; // ignore top level semicolons.
				2037	case tok_def: HandleDefinition(); break;
				2038	case tok_extern: HandleExtern(); break;
				2039	default: HandleTopLevelExpression(); break;
				2040	}
				2041	}
				2042	}
				2043
				2044
				2045
				2046	//===----------------------------------------------------------------------===//
				2047	// "Library" functions that can be "extern'd" from user code.
				2048	//===----------------------------------------------------------------------===//
				2049
				2050	/// putchard - putchar that takes a double and returns 0.
				2051	extern "C"
				2052	double putchard(double X) {
				2053	putchar((char)X);
				2054	return 0;
				2055	}
				2056
				2057	/// printd - printf that takes a double prints it as "%f\n", returning 0.
				2058	extern "C"
				2059	double printd(double X) {
				2060	printf("%f\n", X);
				2061	return 0;
				2062	}
				2063
				2064	//===----------------------------------------------------------------------===//
				2065	// Main driver code.
				2066	//===----------------------------------------------------------------------===//
				2067
				2068	int main() {
				2069	// Install standard binary operators.
				2070	// 1 is lowest precedence.
				2071	BinopPrecedence['='] = 2;
				2072	BinopPrecedence['<'] = 10;
				2073	BinopPrecedence['+'] = 20;
				2074	BinopPrecedence['-'] = 20;
				2075	BinopPrecedence['*'] = 40; // highest.
				2076
				2077	// Prime the first token.
				2078	fprintf(stderr, "ready> ");
				2079	getNextToken();
				2080
				2081	// Make the module, which holds all the code.
				2082	TheModule = new Module("my cool jit");
				2083
				2084	// Create the JIT.
				2085	TheExecutionEngine = ExecutionEngine::create(TheModule);
				2086
				2087	{
				2088	ExistingModuleProvider OurModuleProvider(TheModule);
				2089	FunctionPassManager OurFPM(&OurModuleProvider);
				2090
				2091	// Set up the optimizer pipeline. Start with registering info about how the
				2092	// target lays out data structures.
				2093	OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
				2094	// Promote allocas to registers.
				2095	OurFPM.add(createPromoteMemoryToRegisterPass());
				2096	// Do simple "peephole" optimizations and bit-twiddling optzns.
				2097	OurFPM.add(createInstructionCombiningPass());
				2098	// Reassociate expressions.
				2099	OurFPM.add(createReassociatePass());
				2100	// Eliminate Common SubExpressions.
				2101	OurFPM.add(createGVNPass());
				2102	// Simplify the control flow graph (deleting unreachable blocks, etc).
				2103	OurFPM.add(createCFGSimplificationPass());
				2104
				2105	// Set the global so the code gen can use this.
				2106	TheFPM = &OurFPM;
				2107
				2108	// Run the main "interpreter loop" now.
				2109	MainLoop();
				2110
				2111	TheFPM = 0;
				2112	} // Free module provider and pass manager.
				2113
				2114
				2115	// Print out all of the generated code.
				2116	TheModule->dump();
				2117	return 0;
				2118	}
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	2119	</pre>
				2120	</div>
				2121
				2122	</div>
				2123
				2124	<!-- *********************************************************************** -->
				2125	<hr>
				2126	<address>
				2127	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				2128	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				2129	<a href="http://validator.w3.org/check/referer"><img
				2130	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
				2131
				2132	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				2133	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				2134	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				2135	</address>
				2136	</body>
				2137	</html>