Blame - docs/tutorial/LangImpl7.html - fp2-dev/platform/external/llvm

blob: 9aa65d3b6e126bf74905ad9289961c9801883596 [file] [log] [blame]

Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
				7	construction</title>
				8	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				9	<meta name="author" content="Chris Lattner">
				10	<link rel="stylesheet" href="../llvm.css" type="text/css">
				11	</head>
				12
				13	<body>
				14
				15	<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
				16
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	17	<ul>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	18	<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	19	<li>Chapter 7
				20	<ol>
				21	<li><a href="#intro">Chapter 7 Introduction</a></li>
				22	<li><a href="#why">Why is this a hard problem?</a></li>
				23	<li><a href="#memory">Memory in LLVM</a></li>
				24	<li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
				25	<li><a href="#adjustments">Adjusting Existing Variables for
				26	Mutation</a></li>
				27	<li><a href="#assignment">New Assignment Operator</a></li>
				28	<li><a href="#localvars">User-defined Local Variables</a></li>
				29	<li><a href="#code">Full Code Listing</a></li>
				30	</ol>
				31	</li>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	32	<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
				33	tidbits</li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	34	</ul>
				35
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	36	<div class="doc_author">
				37	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				38	</div>
				39
				40	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	41	<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	42	<!-- *********************************************************************** -->
				43
				44	<div class="doc_text">
				45
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	46	<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
				47	with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
				48	respectable, albeit simple, <a
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	49	href="http://en.wikipedia.org/wiki/Functional_programming">functional
				50	programming language</a>. In our journey, we learned some parsing techniques,
				51	how to build and represent an AST, how to build LLVM IR, and how to optimize
				52	the resultant code and JIT compile it.</p>
				53
				54	<p>While Kaleidoscope is interesting as a functional language, this makes it
				55	"too easy" to generate LLVM IR for it. In particular, a functional language
				56	makes it very easy to build LLVM IR directly in <a
				57	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
				58	Since LLVM requires that the input code be in SSA form, this is a very nice
				59	property and it is often unclear to newcomers how to generate code for an
				60	imperative language with mutable variables.</p>
				61
				62	<p>The short (and happy) summary of this chapter is that there is no need for
				63	your front-end to build SSA form: LLVM provides highly tuned and well tested
				64	support for this, though the way it works is a bit unexpected for some.</p>
				65
				66	</div>
				67
				68	<!-- *********************************************************************** -->
				69	<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
				70	<!-- *********************************************************************** -->
				71
				72	<div class="doc_text">
				73
				74	<p>
				75	To understand why mutable variables cause complexities in SSA construction,
				76	consider this extremely simple C example:
				77	</p>
				78
				79	<div class="doc_code">
				80	<pre>
				81	int G, H;
				82	int test(_Bool Condition) {
				83	int X;
				84	if (Condition)
				85	X = G;
				86	else
				87	X = H;
				88	return X;
				89	}
				90	</pre>
				91	</div>
				92
				93	<p>In this case, we have the variable "X", whose value depends on the path
				94	executed in the program. Because there are two different possible values for X
				95	before the return instruction, a PHI node is inserted to merge the two values.
				96	The LLVM IR that we want for this example looks like this:</p>
				97
				98	<div class="doc_code">
				99	<pre>
				100	@G = weak global i32 0 ; type of @G is i32*
				101	@H = weak global i32 0 ; type of @H is i32*
				102
				103	define i32 @test(i1 %Condition) {
				104	entry:
				105	br i1 %Condition, label %cond_true, label %cond_false
				106
				107	cond_true:
				108	%X.0 = load i32* @G
				109	br label %cond_next
				110
				111	cond_false:
				112	%X.1 = load i32* @H
				113	br label %cond_next
				114
				115	cond_next:
				116	%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				117	ret i32 %X.2
				118	}
				119	</pre>
				120	</div>
				121
				122	<p>In this example, the loads from the G and H global variables are explicit in
				123	the LLVM IR, and they live in the then/else branches of the if statement
				124	(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
				125	in the cond_next block selects the right value to use based on where control
				126	flow is coming from: if control flow comes from the cond_false block, X.2 gets
				127	the value of X.1. Alternatively, if control flow comes from cond_tree, it gets
				128	the value of X.0. The intent of this chapter is not to explain the details of
				129	SSA form. For more information, see one of the many <a
				130	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
				131	references</a>.</p>
				132
				133	<p>The question for this article is "who places phi nodes when lowering
				134	assignments to mutable variables?". The issue here is that LLVM
				135	<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
				136	However, SSA construction requires non-trivial algorithms and data structures,
				137	so it is inconvenient and wasteful for every front-end to have to reproduce this
				138	logic.</p>
				139
				140	</div>
				141
				142	<!-- *********************************************************************** -->
				143	<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
				144	<!-- *********************************************************************** -->
				145
				146	<div class="doc_text">
				147
				148	<p>The 'trick' here is that while LLVM does require all register values to be
				149	in SSA form, it does not require (or permit) memory objects to be in SSA form.
				150	In the example above, note that the loads from G and H are direct accesses to
				151	G and H: they are not renamed or versioned. This differs from some other
Chris Lattner	2e5d07e	2007-11-04 19:42:13 +0000	[diff] [blame]	152	compiler systems, which do try to version memory objects. In LLVM, instead of
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	153	encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
				154	href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
				155	demand.</p>
				156
				157	<p>
				158	With this in mind, the high-level idea is that we want to make a stack variable
				159	(which lives in memory, because it is on the stack) for each mutable object in
				160	a function. To take advantage of this trick, we need to talk about how LLVM
				161	represents stack variables.
				162	</p>
				163
				164	<p>In LLVM, all memory accesses are explicit with load/store instructions, and
				165	it is carefully designed to not have (or need) an "address-of" operator. Notice
				166	how the type of the @G/@H global variables is actually "i32*" even though the
				167	variable is defined as "i32". What this means is that @G defines <em>space</em>
				168	for an i32 in the global data area, but its <em>name</em> actually refers to the
				169	address for that space. Stack variables work the same way, but instead of being
				170	declared with global variable definitions, they are declared with the
				171	<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
				172
				173	<div class="doc_code">
				174	<pre>
				175	define i32 @test(i1 %Condition) {
				176	entry:
				177	%X = alloca i32 ; type of %X is i32*.
				178	...
				179	%tmp = load i32* %X ; load the stack value %X from the stack.
				180	%tmp2 = add i32 %tmp, 1 ; increment it
				181	store i32 %tmp2, i32* %X ; store it back
				182	...
				183	</pre>
				184	</div>
				185
				186	<p>This code shows an example of how you can declare and manipulate a stack
				187	variable in the LLVM IR. Stack memory allocated with the alloca instruction is
				188	fully general: you can pass the address of the stack slot to functions, you can
				189	store it in other variables, etc. In our example above, we could rewrite the
				190	example to use the alloca technique to avoid using a PHI node:</p>
				191
				192	<div class="doc_code">
				193	<pre>
				194	@G = weak global i32 0 ; type of @G is i32*
				195	@H = weak global i32 0 ; type of @H is i32*
				196
				197	define i32 @test(i1 %Condition) {
				198	entry:
				199	%X = alloca i32 ; type of %X is i32*.
				200	br i1 %Condition, label %cond_true, label %cond_false
				201
				202	cond_true:
				203	%X.0 = load i32* @G
				204	store i32 %X.0, i32* %X ; Update X
				205	br label %cond_next
				206
				207	cond_false:
				208	%X.1 = load i32* @H
				209	store i32 %X.1, i32* %X ; Update X
				210	br label %cond_next
				211
				212	cond_next:
				213	%X.2 = load i32* %X ; Read X
				214	ret i32 %X.2
				215	}
				216	</pre>
				217	</div>
				218
				219	<p>With this, we have discovered a way to handle arbitrary mutable variables
				220	without the need to create Phi nodes at all:</p>
				221
				222	<ol>
				223	<li>Each mutable variable becomes a stack allocation.</li>
				224	<li>Each read of the variable becomes a load from the stack.</li>
				225	<li>Each update of the variable becomes a store to the stack.</li>
				226	<li>Taking the address of a variable just uses the stack address directly.</li>
				227	</ol>
				228
				229	<p>While this solution has solved our immediate problem, it introduced another
				230	one: we have now apparently introduced a lot of stack traffic for very simple
				231	and common operations, a major performance problem. Fortunately for us, the
				232	LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
				233	this case, promoting allocas like this into SSA registers, inserting Phi nodes
				234	as appropriate. If you run this example through the pass, for example, you'll
				235	get:</p>
				236
				237	<div class="doc_code">
				238	<pre>
				239	$ <b>llvm-as < example.ll \| opt -mem2reg \| llvm-dis</b>
				240	@G = weak global i32 0
				241	@H = weak global i32 0
				242
				243	define i32 @test(i1 %Condition) {
				244	entry:
				245	br i1 %Condition, label %cond_true, label %cond_false
				246
				247	cond_true:
				248	%X.0 = load i32* @G
				249	br label %cond_next
				250
				251	cond_false:
				252	%X.1 = load i32* @H
				253	br label %cond_next
				254
				255	cond_next:
				256	%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				257	ret i32 %X.01
				258	}
				259	</pre>
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	260	</div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	261
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	262	<p>The mem2reg pass implements the standard "iterated dominator frontier"
				263	algorithm for constructing SSA form and has a number of optimizations that speed
				264	up very common degenerate cases. mem2reg really is the answer for dealing with
				265	mutable variables, and we highly recommend that you depend on it. Note that
				266	mem2reg only works on variables in certain circumstances:</p>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	267
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	268	<ol>
				269	<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
				270	promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	271
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	272	<li>mem2reg only looks for alloca instructions in the entry block of the
				273	function. Being in the entry block guarantees that the alloca is only executed
				274	once, which makes analysis simpler.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	275
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	276	<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
				277	the address of the stack object is passed to a function, or if any funny pointer
				278	arithmetic is involved, the alloca will not be promoted.</li>
				279
Chris Lattner	a56b22d	2007-11-05 17:45:54 +0000	[diff] [blame]	280	<li>mem2reg only works on allocas of <a
				281	href="../LangRef.html#t_classifications">first class</a>
				282	values (such as pointers, scalars and vectors), and only if the array size
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	283	of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
				284	promoting structs or arrays to registers. Note that the "scalarrepl" pass is
				285	more powerful and can promote structs, "unions", and arrays in many cases.</li>
				286
				287	</ol>
				288
				289	<p>
				290	All of these properties are easy to satisfy for most imperative languages, and
Chris Lattner	2e5d07e	2007-11-04 19:42:13 +0000	[diff] [blame]	291	we'll illustrate this below with Kaleidoscope. The final question you may be
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	292	asking is: should I bother with this nonsense for my front-end? Wouldn't it be
				293	better if I just did SSA construction directly, avoiding use of the mem2reg
				294	optimization pass? In short, we strongly recommend that use you this technique
				295	for building SSA form, unless there is an extremely good reason not to. Using
				296	this technique is:</p>
				297
				298	<ul>
				299	<li>Proven and well tested: llvm-gcc and clang both use this technique for local
				300	mutable variables. As such, the most common clients of LLVM are using this to
				301	handle a bulk of their variables. You can be sure that bugs are found fast and
				302	fixed early.</li>
				303
				304	<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
				305	common cases as well as fully general. For example, it has fast-paths for
				306	variables that are only used in a single block, variables that only have one
				307	assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
				308	</li>
				309
				310	<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
				311	Debug information in LLVM</a> relies on having the address of the variable
				312	exposed to attach debug info to it. This technique dovetails very naturally
				313	with this style of debug info.</li>
				314	</ul>
				315
				316	<p>If nothing else, this makes it much easier to get your front-end up and
				317	running, and is very simple to implement. Lets extend Kaleidoscope with mutable
				318	variables now!
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	319	</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	320
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	321	</div>
				322
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	323	<!-- *********************************************************************** -->
				324	<div class="doc_section"><a name="kalvars">Mutable Variables in
				325	Kaleidoscope</a></div>
				326	<!-- *********************************************************************** -->
				327
				328	<div class="doc_text">
				329
				330	<p>Now that we know the sort of problem we want to tackle, lets see what this
				331	looks like in the context of our little Kaleidoscope language. We're going to
				332	add two features:</p>
				333
				334	<ol>
				335	<li>The ability to mutate variables with the '=' operator.</li>
				336	<li>The ability to define new variables.</li>
				337	</ol>
				338
				339	<p>While the first item is really what this is about, we only have variables
				340	for incoming arguments and for induction variables, and redefining them only
				341	goes so far :). Also, the ability to define new variables is a
				342	useful thing regardless of whether you will be mutating them. Here's a
				343	motivating example that shows how we could use these:</p>
				344
				345	<div class="doc_code">
				346	<pre>
				347	# Define ':' for sequencing: as a low-precedence operator that ignores operands
				348	# and just returns the RHS.
				349	def binary : 1 (x y) y;
				350
				351	# Recursive fib, we could do this before.
				352	def fib(x)
				353	if (x < 3) then
				354	1
				355	else
				356	fib(x-1)+fib(x-2);
				357
				358	# Iterative fib.
				359	def fibi(x)
				360	<b>var a = 1, b = 1, c in</b>
				361	(for i = 3, i &;t; x in
				362	<b>c = a + b</b> :
				363	<b>a = b</b> :
				364	<b>b = c</b>) :
				365	b;
				366
				367	# Call it.
				368	fibi(10);
				369	</pre>
				370	</div>
				371
				372	<p>
				373	In order to mutate variables, we have to change our existing variables to use
				374	the "alloca trick". Once we have that, we'll add our new operator, then extend
				375	Kaleidoscope to support new variable definitions.
				376	</p>
				377
				378	</div>
				379
				380	<!-- *********************************************************************** -->
				381	<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
				382	Mutation</a></div>
				383	<!-- *********************************************************************** -->
				384
				385	<div class="doc_text">
				386
				387	<p>
				388	The symbol table in Kaleidoscope is managed at code generation time by the
				389	'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
				390	that holds the double value for the named variable. In order to support
				391	mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
				392	the <em>memory location</em> of the variable in question. Note that this
				393	change is a refactoring: it changes the structure of the code, but does not
				394	(by itself) change the behavior of the compiler. All of these changes are
				395	isolated in the Kaleidoscope code generator.</p>
				396
				397	<p>
				398	At this point in Kaleidoscope's development, it only supports variables for two
				399	things: incoming arguments to functions and the induction variable of 'for'
				400	loops. For consistency, we'll allow mutation of these variables in addition to
				401	other user-defined variables. This means that these will both need memory
				402	locations.
				403	</p>
				404
				405	<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
				406	map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
				407	will tell use what parts of the code we need to update:</p>
				408
				409	<div class="doc_code">
				410	<pre>
				411	static std::map<std::string, AllocaInst*> NamedValues;
				412	</pre>
				413	</div>
				414
				415	<p>Also, since we will need to create these alloca's, we'll use a helper
				416	function that ensures that the allocas are created in the entry block of the
				417	function:</p>
				418
				419	<div class="doc_code">
				420	<pre>
				421	/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
				422	/// the function. This is used for mutable variables etc.
				423	static AllocaInst CreateEntryBlockAlloca(Function TheFunction,
				424	const std::string &VarName) {
				425	LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
				426	TheFunction->getEntryBlock().begin());
				427	return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
				428	}
				429	</pre>
				430	</div>
				431
				432	<p>This funny looking code creates an LLVMBuilder object that is pointing at
				433	the first instruction (.begin()) of the entry block. It then creates an alloca
				434	with the expected name and returns it. Because all values in Kaleidoscope are
				435	doubles, there is no need to pass in a type to use.</p>
				436
				437	<p>With this in place, the first functionality change we want to make is to
				438	variable references. In our new scheme, variables live on the stack, so code
				439	generating a reference to them actually needs to produce a load from the stack
				440	slot:</p>
				441
				442	<div class="doc_code">
				443	<pre>
				444	Value *VariableExprAST::Codegen() {
				445	// Look this variable up in the function.
				446	Value *V = NamedValues[Name];
				447	if (V == 0) return ErrorV("Unknown variable name");
				448
				449	// Load the value.
				450	return Builder.CreateLoad(V, Name.c_str());
				451	}
				452	</pre>
				453	</div>
				454
				455	<p>As you can see, this is pretty straight-forward. Next we need to update the
				456	things that define the variables to set up the alloca. We'll start with
				457	<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
				458	the unabridged code):</p>
				459
				460	<div class="doc_code">
				461	<pre>
				462	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				463
				464	<b>// Create an alloca for the variable in the entry block.
				465	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
				466
				467	// Emit the start code first, without 'variable' in scope.
				468	Value *StartVal = Start->Codegen();
				469	if (StartVal == 0) return 0;
				470
				471	<b>// Store the value into the alloca.
				472	Builder.CreateStore(StartVal, Alloca);</b>
				473	...
				474
				475	// Compute the end condition.
				476	Value *EndCond = End->Codegen();
				477	if (EndCond == 0) return EndCond;
				478
				479	<b>// Reload, increment, and restore the alloca. This handles the case where
				480	// the body of the loop mutates the variable.
				481	Value *CurVar = Builder.CreateLoad(Alloca);
				482	Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
				483	Builder.CreateStore(NextVar, Alloca);</b>
				484	...
				485	</pre>
				486	</div>
				487
				488	<p>This code is virtually identical to the code <a
				489	href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
				490	big difference is that we no longer have to construct a PHI node, and we use
				491	load/store to access the variable as needed.</p>
				492
				493	<p>To support mutable argument variables, we need to also make allocas for them.
				494	The code for this is also pretty simple:</p>
				495
				496	<div class="doc_code">
				497	<pre>
				498	/// CreateArgumentAllocas - Create an alloca for each argument and register the
				499	/// argument in the symbol table so that references to it will succeed.
				500	void PrototypeAST::CreateArgumentAllocas(Function *F) {
				501	Function::arg_iterator AI = F->arg_begin();
				502	for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
				503	// Create an alloca for this variable.
				504	AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
				505
				506	// Store the initial value into the alloca.
				507	Builder.CreateStore(AI, Alloca);
				508
				509	// Add arguments to variable symbol table.
				510	NamedValues[Args[Idx]] = Alloca;
				511	}
				512	}
				513	</pre>
				514	</div>
				515
				516	<p>For each argument, we make an alloca, store the input value to the function
				517	into the alloca, and register the alloca as the memory location for the
				518	argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
				519	it sets up the entry block for the function.</p>
				520
				521	<p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
				522	good codegen once again:</p>
				523
				524	<div class="doc_code">
				525	<pre>
				526	// Set up the optimizer pipeline. Start with registering info about how the
				527	// target lays out data structures.
				528	OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
				529	<b>// Promote allocas to registers.
				530	OurFPM.add(createPromoteMemoryToRegisterPass());</b>
				531	// Do simple "peephole" optimizations and bit-twiddling optzns.
				532	OurFPM.add(createInstructionCombiningPass());
				533	// Reassociate expressions.
				534	OurFPM.add(createReassociatePass());
				535	</pre>
				536	</div>
				537
				538	<p>It is interesting to see what the code looks like before and after the
				539	mem2reg optimization runs. For example, this is the before/after code for our
				540	recursive fib. Before the optimization:</p>
				541
				542	<div class="doc_code">
				543	<pre>
				544	define double @fib(double %x) {
				545	entry:
				546	<b>%x1 = alloca double
				547	store double %x, double* %x1
				548	%x2 = load double* %x1</b>
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	549	%cmptmp = fcmp ult double %x2, 3.000000e+00
				550	%booltmp = uitofp i1 %cmptmp to double
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	551	%ifcond = fcmp one double %booltmp, 0.000000e+00
				552	br i1 %ifcond, label %then, label %else
				553
				554	then: ; preds = %entry
				555	br label %ifcont
				556
				557	else: ; preds = %entry
				558	<b>%x3 = load double* %x1</b>
				559	%subtmp = sub double %x3, 1.000000e+00
				560	%calltmp = call double @fib( double %subtmp )
				561	<b>%x4 = load double* %x1</b>
				562	%subtmp5 = sub double %x4, 2.000000e+00
				563	%calltmp6 = call double @fib( double %subtmp5 )
				564	%addtmp = add double %calltmp, %calltmp6
				565	br label %ifcont
				566
				567	ifcont: ; preds = %else, %then
				568	%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
				569	ret double %iftmp
				570	}
				571	</pre>
				572	</div>
				573
				574	<p>Here there is only one variable (x, the input argument) but you can still
				575	see the extremely simple-minded code generation strategy we are using. In the
				576	entry block, an alloca is created, and the initial input value is stored into
				577	it. Each reference to the variable does a reload from the stack. Also, note
				578	that we didn't modify the if/then/else expression, so it still inserts a PHI
				579	node. While we could make an alloca for it, it is actually easier to create a
				580	PHI node for it, so we still just make the PHI.</p>
				581
				582	<p>Here is the code after the mem2reg pass runs:</p>
				583
				584	<div class="doc_code">
				585	<pre>
				586	define double @fib(double %x) {
				587	entry:
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	588	%cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
				589	%booltmp = uitofp i1 %cmptmp to double
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	590	%ifcond = fcmp one double %booltmp, 0.000000e+00
				591	br i1 %ifcond, label %then, label %else
				592
				593	then:
				594	br label %ifcont
				595
				596	else:
				597	%subtmp = sub double <b>%x</b>, 1.000000e+00
				598	%calltmp = call double @fib( double %subtmp )
				599	%subtmp5 = sub double <b>%x</b>, 2.000000e+00
				600	%calltmp6 = call double @fib( double %subtmp5 )
				601	%addtmp = add double %calltmp, %calltmp6
				602	br label %ifcont
				603
				604	ifcont: ; preds = %else, %then
				605	%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
				606	ret double %iftmp
				607	}
				608	</pre>
				609	</div>
				610
				611	<p>This is a trivial case for mem2reg, since there are no redefinitions of the
				612	variable. The point of showing this is to calm your tension about inserting
				613	such blatent inefficiencies :).</p>
				614
				615	<p>After the rest of the optimizers run, we get:</p>
				616
				617	<div class="doc_code">
				618	<pre>
				619	define double @fib(double %x) {
				620	entry:
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	621	%cmptmp = fcmp ult double %x, 3.000000e+00
				622	%booltmp = uitofp i1 %cmptmp to double
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	623	%ifcond = fcmp ueq double %booltmp, 0.000000e+00
				624	br i1 %ifcond, label %else, label %ifcont
				625
				626	else:
				627	%subtmp = sub double %x, 1.000000e+00
				628	%calltmp = call double @fib( double %subtmp )
				629	%subtmp5 = sub double %x, 2.000000e+00
				630	%calltmp6 = call double @fib( double %subtmp5 )
				631	%addtmp = add double %calltmp, %calltmp6
				632	ret double %addtmp
				633
				634	ifcont:
				635	ret double 1.000000e+00
				636	}
				637	</pre>
				638	</div>
				639
				640	<p>Here we see that the simplifycfg pass decided to clone the return instruction
				641	into the end of the 'else' block. This allowed it to eliminate some branches
				642	and the PHI node.</p>
				643
				644	<p>Now that all symbol table references are updated to use stack variables,
				645	we'll add the assignment operator.</p>
				646
				647	</div>
				648
				649	<!-- *********************************************************************** -->
				650	<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
				651	<!-- *********************************************************************** -->
				652
				653	<div class="doc_text">
				654
				655	<p>With our current framework, adding a new assignment operator is really
				656	simple. We will parse it just like any other binary operator, but handle it
				657	internally (instead of allowing the user to define it). The first step is to
				658	set a precedence:</p>
				659
				660	<div class="doc_code">
				661	<pre>
				662	int main() {
				663	// Install standard binary operators.
				664	// 1 is lowest precedence.
				665	<b>BinopPrecedence['='] = 2;</b>
				666	BinopPrecedence['<'] = 10;
				667	BinopPrecedence['+'] = 20;
				668	BinopPrecedence['-'] = 20;
				669	</pre>
				670	</div>
				671
				672	<p>Now that the parser knows the precedence of the binary operator, it takes
				673	care of all the parsing and AST generation. We just need to implement codegen
				674	for the assignment operator. This looks like:</p>
				675
				676	<div class="doc_code">
				677	<pre>
				678	Value *BinaryExprAST::Codegen() {
				679	// Special case '=' because we don't want to emit the LHS as an expression.
				680	if (Op == '=') {
				681	// Assignment requires the LHS to be an identifier.
				682	VariableExprAST LHSE = dynamic_cast<VariableExprAST>(LHS);
				683	if (!LHSE)
				684	return ErrorV("destination of '=' must be a variable");
				685	</pre>
				686	</div>
				687
				688	<p>Unlike the rest of the binary operators, our assignment operator doesn't
				689	follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
				690	as a special case before the other binary operators are handled. The other
				691	strange thing about it is that it requires the LHS to be a variable directly.
				692	</p>
				693
				694	<div class="doc_code">
				695	<pre>
				696	// Codegen the RHS.
				697	Value *Val = RHS->Codegen();
				698	if (Val == 0) return 0;
				699
				700	// Look up the name.
				701	Value *Variable = NamedValues[LHSE->getName()];
				702	if (Variable == 0) return ErrorV("Unknown variable name");
				703
				704	Builder.CreateStore(Val, Variable);
				705	return Val;
				706	}
				707	...
				708	</pre>
				709	</div>
				710
				711	<p>Once it has the variable, codegen'ing the assignment is straight-forward:
				712	we emit the RHS of the assignment, create a store, and return the computed
				713	value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
				714
				715	<p>Now that we have an assignment operator, we can mutate loop variables and
				716	arguments. For example, we can now run code like this:</p>
				717
				718	<div class="doc_code">
				719	<pre>
				720	# Function to print a double.
				721	extern printd(x);
				722
				723	# Define ':' for sequencing: as a low-precedence operator that ignores operands
				724	# and just returns the RHS.
				725	def binary : 1 (x y) y;
				726
				727	def test(x)
				728	printd(x) :
				729	x = 4 :
				730	printd(x);
				731
				732	test(123);
				733	</pre>
				734	</div>
				735
				736	<p>When run, this example prints "123" and then "4", showing that we did
				737	actually mutate the value! Okay, we have now officially implemented our goal:
				738	getting this to work requires SSA construction in the general case. However,
				739	to be really useful, we want the ability to define our own local variables, lets
				740	add this next!
				741	</p>
				742
				743	</div>
				744
				745	<!-- *********************************************************************** -->
				746	<div class="doc_section"><a name="localvars">User-defined Local
				747	Variables</a></div>
				748	<!-- *********************************************************************** -->
				749
				750	<div class="doc_text">
				751
				752	<p>Adding var/in is just like any other other extensions we made to
				753	Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
				754	The first step for adding our new 'var/in' construct is to extend the lexer.
				755	As before, this is pretty trivial, the code looks like this:</p>
				756
				757	<div class="doc_code">
				758	<pre>
				759	enum Token {
				760	...
				761	<b>// var definition
				762	tok_var = -13</b>
				763	...
				764	}
				765	...
				766	static int gettok() {
				767	...
				768	if (IdentifierStr == "in") return tok_in;
				769	if (IdentifierStr == "binary") return tok_binary;
				770	if (IdentifierStr == "unary") return tok_unary;
				771	<b>if (IdentifierStr == "var") return tok_var;</b>
				772	return tok_identifier;
				773	...
				774	</pre>
				775	</div>
				776
				777	<p>The next step is to define the AST node that we will construct. For var/in,
				778	it will look like this:</p>
				779
				780	<div class="doc_code">
				781	<pre>
				782	/// VarExprAST - Expression class for var/in
				783	class VarExprAST : public ExprAST {
				784	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				785	ExprAST *Body;
				786	public:
				787	VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
				788	ExprAST *body)
				789	: VarNames(varnames), Body(body) {}
				790
				791	virtual Value *Codegen();
				792	};
				793	</pre>
				794	</div>
				795
				796	<p>var/in allows a list of names to be defined all at once, and each name can
				797	optionally have an initializer value. As such, we capture this information in
				798	the VarNames vector. Also, var/in has a body, this body is allowed to access
				799	the variables defined by the let/in.</p>
				800
				801	<p>With this ready, we can define the parser pieces. First thing we do is add
				802	it as a primary expression:</p>
				803
				804	<div class="doc_code">
				805	<pre>
				806	/// primary
				807	/// ::= identifierexpr
				808	/// ::= numberexpr
				809	/// ::= parenexpr
				810	/// ::= ifexpr
				811	/// ::= forexpr
				812	<b>/// ::= varexpr</b>
				813	static ExprAST *ParsePrimary() {
				814	switch (CurTok) {
				815	default: return Error("unknown token when expecting an expression");
				816	case tok_identifier: return ParseIdentifierExpr();
				817	case tok_number: return ParseNumberExpr();
				818	case '(': return ParseParenExpr();
				819	case tok_if: return ParseIfExpr();
				820	case tok_for: return ParseForExpr();
				821	<b>case tok_var: return ParseVarExpr();</b>
				822	}
				823	}
				824	</pre>
				825	</div>
				826
				827	<p>Next we define ParseVarExpr:</p>
				828
				829	<div class="doc_code">
				830	<pre>
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	831	/// varexpr ::= 'var' identifier ('=' expression)?
				832	// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	833	static ExprAST *ParseVarExpr() {
				834	getNextToken(); // eat the var.
				835
				836	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				837
				838	// At least one variable name is required.
				839	if (CurTok != tok_identifier)
				840	return Error("expected identifier after var");
				841	</pre>
				842	</div>
				843
				844	<p>The first part of this code parses the list of identifier/expr pairs into the
				845	local <tt>VarNames</tt> vector.
				846
				847	<div class="doc_code">
				848	<pre>
				849	while (1) {
				850	std::string Name = IdentifierStr;
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	851	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	852
				853	// Read the optional initializer.
				854	ExprAST *Init = 0;
				855	if (CurTok == '=') {
				856	getNextToken(); // eat the '='.
				857
				858	Init = ParseExpression();
				859	if (Init == 0) return 0;
				860	}
				861
				862	VarNames.push_back(std::make_pair(Name, Init));
				863
				864	// End of var list, exit loop.
				865	if (CurTok != ',') break;
				866	getNextToken(); // eat the ','.
				867
				868	if (CurTok != tok_identifier)
				869	return Error("expected identifier list after var");
				870	}
				871	</pre>
				872	</div>
				873
				874	<p>Once all the variables are parsed, we then parse the body and create the
				875	AST node:</p>
				876
				877	<div class="doc_code">
				878	<pre>
				879	// At this point, we have to have 'in'.
				880	if (CurTok != tok_in)
				881	return Error("expected 'in' keyword after 'var'");
				882	getNextToken(); // eat 'in'.
				883
				884	ExprAST *Body = ParseExpression();
				885	if (Body == 0) return 0;
				886
				887	return new VarExprAST(VarNames, Body);
				888	}
				889	</pre>
				890	</div>
				891
				892	<p>Now that we can parse and represent the code, we need to support emission of
				893	LLVM IR for it. This code starts out with:</p>
				894
				895	<div class="doc_code">
				896	<pre>
				897	Value *VarExprAST::Codegen() {
				898	std::vector<AllocaInst *> OldBindings;
				899
				900	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				901
				902	// Register all variables and emit their initializer.
				903	for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
				904	const std::string &VarName = VarNames[i].first;
				905	ExprAST *Init = VarNames[i].second;
				906	</pre>
				907	</div>
				908
				909	<p>Basically it loops over all the variables, installing them one at a time.
				910	For each variable we put into the symbol table, we remember the previous value
				911	that we replace in OldBindings.</p>
				912
				913	<div class="doc_code">
				914	<pre>
				915	// Emit the initializer before adding the variable to scope, this prevents
				916	// the initializer from referencing the variable itself, and permits stuff
				917	// like this:
				918	// var a = 1 in
				919	// var a = a in ... # refers to outer 'a'.
				920	Value *InitVal;
				921	if (Init) {
				922	InitVal = Init->Codegen();
				923	if (InitVal == 0) return 0;
				924	} else { // If not specified, use 0.0.
				925	InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
				926	}
				927
				928	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				929	Builder.CreateStore(InitVal, Alloca);
				930
				931	// Remember the old variable binding so that we can restore the binding when
				932	// we unrecurse.
				933	OldBindings.push_back(NamedValues[VarName]);
				934
				935	// Remember this binding.
				936	NamedValues[VarName] = Alloca;
				937	}
				938	</pre>
				939	</div>
				940
				941	<p>There are more comments here than code. The basic idea is that we emit the
				942	initializer, create the alloca, then update the symbol table to point to it.
				943	Once all the variables are installed in the symbol table, we evaluate the body
				944	of the var/in expression:</p>
				945
				946	<div class="doc_code">
				947	<pre>
				948	// Codegen the body, now that all vars are in scope.
				949	Value *BodyVal = Body->Codegen();
				950	if (BodyVal == 0) return 0;
				951	</pre>
				952	</div>
				953
				954	<p>Finally, before returning, we restore the previous variable bindings:</p>
				955
				956	<div class="doc_code">
				957	<pre>
				958	// Pop all our variables from scope.
				959	for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
				960	NamedValues[VarNames[i].first] = OldBindings[i];
				961
				962	// Return the body computation.
				963	return BodyVal;
				964	}
				965	</pre>
				966	</div>
				967
				968	<p>The end result of all of this is that we get properly scoped variable
				969	definitions, and we even (trivially) allow mutation of them :).</p>
				970
				971	<p>With this, we completed what we set out to do. Our nice iterative fib
				972	example from the intro compiles and runs just fine. The mem2reg pass optimizes
				973	all of our stack variables into SSA registers, inserting PHI nodes where needed,
				974	and our front-end remains simple: no iterated dominator frontier computation
				975	anywhere in sight.</p>
				976
				977	</div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	978
				979	<!-- *********************************************************************** -->
				980	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				981	<!-- *********************************************************************** -->
				982
				983	<div class="doc_text">
				984
				985	<p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	986	Here is the complete code listing for our running example, enhanced with mutable
				987	variables and var/in support. To build this example, use:
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	988	</p>
				989
				990	<div class="doc_code">
				991	<pre>
				992	# Compile
				993	g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
				994	# Run
				995	./toy
				996	</pre>
				997	</div>
				998
				999	<p>Here is the code:</p>
				1000
				1001	<div class="doc_code">
				1002	<pre>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1003	#include "llvm/DerivedTypes.h"
				1004	#include "llvm/ExecutionEngine/ExecutionEngine.h"
				1005	#include "llvm/Module.h"
				1006	#include "llvm/ModuleProvider.h"
				1007	#include "llvm/PassManager.h"
				1008	#include "llvm/Analysis/Verifier.h"
				1009	#include "llvm/Target/TargetData.h"
				1010	#include "llvm/Transforms/Scalar.h"
				1011	#include "llvm/Support/LLVMBuilder.h"
				1012	#include <cstdio>
				1013	#include <string>
				1014	#include <map>
				1015	#include <vector>
				1016	using namespace llvm;
				1017
				1018	//===----------------------------------------------------------------------===//
				1019	// Lexer
				1020	//===----------------------------------------------------------------------===//
				1021
				1022	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				1023	// of these for known things.
				1024	enum Token {
				1025	tok_eof = -1,
				1026
				1027	// commands
				1028	tok_def = -2, tok_extern = -3,
				1029
				1030	// primary
				1031	tok_identifier = -4, tok_number = -5,
				1032
				1033	// control
				1034	tok_if = -6, tok_then = -7, tok_else = -8,
				1035	tok_for = -9, tok_in = -10,
				1036
				1037	// operators
				1038	tok_binary = -11, tok_unary = -12,
				1039
				1040	// var definition
				1041	tok_var = -13
				1042	};
				1043
				1044	static std::string IdentifierStr; // Filled in if tok_identifier
				1045	static double NumVal; // Filled in if tok_number
				1046
				1047	/// gettok - Return the next token from standard input.
				1048	static int gettok() {
				1049	static int LastChar = ' ';
				1050
				1051	// Skip any whitespace.
				1052	while (isspace(LastChar))
				1053	LastChar = getchar();
				1054
				1055	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				1056	IdentifierStr = LastChar;
				1057	while (isalnum((LastChar = getchar())))
				1058	IdentifierStr += LastChar;
				1059
				1060	if (IdentifierStr == "def") return tok_def;
				1061	if (IdentifierStr == "extern") return tok_extern;
				1062	if (IdentifierStr == "if") return tok_if;
				1063	if (IdentifierStr == "then") return tok_then;
				1064	if (IdentifierStr == "else") return tok_else;
				1065	if (IdentifierStr == "for") return tok_for;
				1066	if (IdentifierStr == "in") return tok_in;
				1067	if (IdentifierStr == "binary") return tok_binary;
				1068	if (IdentifierStr == "unary") return tok_unary;
				1069	if (IdentifierStr == "var") return tok_var;
				1070	return tok_identifier;
				1071	}
				1072
				1073	if (isdigit(LastChar) \|\| LastChar == '.') { // Number: [0-9.]+
				1074	std::string NumStr;
				1075	do {
				1076	NumStr += LastChar;
				1077	LastChar = getchar();
				1078	} while (isdigit(LastChar) \|\| LastChar == '.');
				1079
				1080	NumVal = strtod(NumStr.c_str(), 0);
				1081	return tok_number;
				1082	}
				1083
				1084	if (LastChar == '#') {
				1085	// Comment until end of line.
				1086	do LastChar = getchar();
				1087	while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
				1088
				1089	if (LastChar != EOF)
				1090	return gettok();
				1091	}
				1092
				1093	// Check for end of file. Don't eat the EOF.
				1094	if (LastChar == EOF)
				1095	return tok_eof;
				1096
				1097	// Otherwise, just return the character as its ascii value.
				1098	int ThisChar = LastChar;
				1099	LastChar = getchar();
				1100	return ThisChar;
				1101	}
				1102
				1103	//===----------------------------------------------------------------------===//
				1104	// Abstract Syntax Tree (aka Parse Tree)
				1105	//===----------------------------------------------------------------------===//
				1106
				1107	/// ExprAST - Base class for all expression nodes.
				1108	class ExprAST {
				1109	public:
				1110	virtual ~ExprAST() {}
				1111	virtual Value *Codegen() = 0;
				1112	};
				1113
				1114	/// NumberExprAST - Expression class for numeric literals like "1.0".
				1115	class NumberExprAST : public ExprAST {
				1116	double Val;
				1117	public:
				1118	NumberExprAST(double val) : Val(val) {}
				1119	virtual Value *Codegen();
				1120	};
				1121
				1122	/// VariableExprAST - Expression class for referencing a variable, like "a".
				1123	class VariableExprAST : public ExprAST {
				1124	std::string Name;
				1125	public:
				1126	VariableExprAST(const std::string &name) : Name(name) {}
				1127	const std::string &getName() const { return Name; }
				1128	virtual Value *Codegen();
				1129	};
				1130
				1131	/// UnaryExprAST - Expression class for a unary operator.
				1132	class UnaryExprAST : public ExprAST {
				1133	char Opcode;
				1134	ExprAST *Operand;
				1135	public:
				1136	UnaryExprAST(char opcode, ExprAST *operand)
				1137	: Opcode(opcode), Operand(operand) {}
				1138	virtual Value *Codegen();
				1139	};
				1140
				1141	/// BinaryExprAST - Expression class for a binary operator.
				1142	class BinaryExprAST : public ExprAST {
				1143	char Op;
				1144	ExprAST LHS, RHS;
				1145	public:
				1146	BinaryExprAST(char op, ExprAST lhs, ExprAST rhs)
				1147	: Op(op), LHS(lhs), RHS(rhs) {}
				1148	virtual Value *Codegen();
				1149	};
				1150
				1151	/// CallExprAST - Expression class for function calls.
				1152	class CallExprAST : public ExprAST {
				1153	std::string Callee;
				1154	std::vector<ExprAST*> Args;
				1155	public:
				1156	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				1157	: Callee(callee), Args(args) {}
				1158	virtual Value *Codegen();
				1159	};
				1160
				1161	/// IfExprAST - Expression class for if/then/else.
				1162	class IfExprAST : public ExprAST {
				1163	ExprAST Cond, Then, *Else;
				1164	public:
				1165	IfExprAST(ExprAST cond, ExprAST then, ExprAST *_else)
				1166	: Cond(cond), Then(then), Else(_else) {}
				1167	virtual Value *Codegen();
				1168	};
				1169
				1170	/// ForExprAST - Expression class for for/in.
				1171	class ForExprAST : public ExprAST {
				1172	std::string VarName;
				1173	ExprAST Start, End, Step, Body;
				1174	public:
				1175	ForExprAST(const std::string &varname, ExprAST start, ExprAST end,
				1176	ExprAST step, ExprAST body)
				1177	: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
				1178	virtual Value *Codegen();
				1179	};
				1180
				1181	/// VarExprAST - Expression class for var/in
				1182	class VarExprAST : public ExprAST {
				1183	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				1184	ExprAST *Body;
				1185	public:
				1186	VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
				1187	ExprAST *body)
				1188	: VarNames(varnames), Body(body) {}
				1189
				1190	virtual Value *Codegen();
				1191	};
				1192
				1193	/// PrototypeAST - This class represents the "prototype" for a function,
				1194	/// which captures its argument names as well as if it is an operator.
				1195	class PrototypeAST {
				1196	std::string Name;
				1197	std::vector<std::string> Args;
				1198	bool isOperator;
				1199	unsigned Precedence; // Precedence if a binary op.
				1200	public:
				1201	PrototypeAST(const std::string &name, const std::vector<std::string> &args,
				1202	bool isoperator = false, unsigned prec = 0)
				1203	: Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
				1204
				1205	bool isUnaryOp() const { return isOperator && Args.size() == 1; }
				1206	bool isBinaryOp() const { return isOperator && Args.size() == 2; }
				1207
				1208	char getOperatorName() const {
				1209	assert(isUnaryOp() \|\| isBinaryOp());
				1210	return Name[Name.size()-1];
				1211	}
				1212
				1213	unsigned getBinaryPrecedence() const { return Precedence; }
				1214
				1215	Function *Codegen();
				1216
				1217	void CreateArgumentAllocas(Function *F);
				1218	};
				1219
				1220	/// FunctionAST - This class represents a function definition itself.
				1221	class FunctionAST {
				1222	PrototypeAST *Proto;
				1223	ExprAST *Body;
				1224	public:
				1225	FunctionAST(PrototypeAST proto, ExprAST body)
				1226	: Proto(proto), Body(body) {}
				1227
				1228	Function *Codegen();
				1229	};
				1230
				1231	//===----------------------------------------------------------------------===//
				1232	// Parser
				1233	//===----------------------------------------------------------------------===//
				1234
				1235	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				1236	/// token the parser it looking at. getNextToken reads another token from the
				1237	/// lexer and updates CurTok with its results.
				1238	static int CurTok;
				1239	static int getNextToken() {
				1240	return CurTok = gettok();
				1241	}
				1242
				1243	/// BinopPrecedence - This holds the precedence for each binary operator that is
				1244	/// defined.
				1245	static std::map<char, int> BinopPrecedence;
				1246
				1247	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				1248	static int GetTokPrecedence() {
				1249	if (!isascii(CurTok))
				1250	return -1;
				1251
				1252	// Make sure it's a declared binop.
				1253	int TokPrec = BinopPrecedence[CurTok];
				1254	if (TokPrec <= 0) return -1;
				1255	return TokPrec;
				1256	}
				1257
				1258	/// Error* - These are little helper functions for error handling.
				1259	ExprAST Error(const char Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				1260	PrototypeAST ErrorP(const char Str) { Error(Str); return 0; }
				1261	FunctionAST ErrorF(const char Str) { Error(Str); return 0; }
				1262
				1263	static ExprAST *ParseExpression();
				1264
				1265	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1266	/// ::= identifier
				1267	/// ::= identifier '(' expression* ')'
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1268	static ExprAST *ParseIdentifierExpr() {
				1269	std::string IdName = IdentifierStr;
				1270
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1271	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1272
				1273	if (CurTok != '(') // Simple variable ref.
				1274	return new VariableExprAST(IdName);
				1275
				1276	// Call.
				1277	getNextToken(); // eat (
				1278	std::vector<ExprAST*> Args;
				1279	if (CurTok != ')') {
				1280	while (1) {
				1281	ExprAST *Arg = ParseExpression();
				1282	if (!Arg) return 0;
				1283	Args.push_back(Arg);
				1284
				1285	if (CurTok == ')') break;
				1286
				1287	if (CurTok != ',')
				1288	return Error("Expected ')'");
				1289	getNextToken();
				1290	}
				1291	}
				1292
				1293	// Eat the ')'.
				1294	getNextToken();
				1295
				1296	return new CallExprAST(IdName, Args);
				1297	}
				1298
				1299	/// numberexpr ::= number
				1300	static ExprAST *ParseNumberExpr() {
				1301	ExprAST *Result = new NumberExprAST(NumVal);
				1302	getNextToken(); // consume the number
				1303	return Result;
				1304	}
				1305
				1306	/// parenexpr ::= '(' expression ')'
				1307	static ExprAST *ParseParenExpr() {
				1308	getNextToken(); // eat (.
				1309	ExprAST *V = ParseExpression();
				1310	if (!V) return 0;
				1311
				1312	if (CurTok != ')')
				1313	return Error("expected ')'");
				1314	getNextToken(); // eat ).
				1315	return V;
				1316	}
				1317
				1318	/// ifexpr ::= 'if' expression 'then' expression 'else' expression
				1319	static ExprAST *ParseIfExpr() {
				1320	getNextToken(); // eat the if.
				1321
				1322	// condition.
				1323	ExprAST *Cond = ParseExpression();
				1324	if (!Cond) return 0;
				1325
				1326	if (CurTok != tok_then)
				1327	return Error("expected then");
				1328	getNextToken(); // eat the then
				1329
				1330	ExprAST *Then = ParseExpression();
				1331	if (Then == 0) return 0;
				1332
				1333	if (CurTok != tok_else)
				1334	return Error("expected else");
				1335
				1336	getNextToken();
				1337
				1338	ExprAST *Else = ParseExpression();
				1339	if (!Else) return 0;
				1340
				1341	return new IfExprAST(Cond, Then, Else);
				1342	}
				1343
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1344	/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1345	static ExprAST *ParseForExpr() {
				1346	getNextToken(); // eat the for.
				1347
				1348	if (CurTok != tok_identifier)
				1349	return Error("expected identifier after for");
				1350
				1351	std::string IdName = IdentifierStr;
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1352	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1353
				1354	if (CurTok != '=')
				1355	return Error("expected '=' after for");
				1356	getNextToken(); // eat '='.
				1357
				1358
				1359	ExprAST *Start = ParseExpression();
				1360	if (Start == 0) return 0;
				1361	if (CurTok != ',')
				1362	return Error("expected ',' after for start value");
				1363	getNextToken();
				1364
				1365	ExprAST *End = ParseExpression();
				1366	if (End == 0) return 0;
				1367
				1368	// The step value is optional.
				1369	ExprAST *Step = 0;
				1370	if (CurTok == ',') {
				1371	getNextToken();
				1372	Step = ParseExpression();
				1373	if (Step == 0) return 0;
				1374	}
				1375
				1376	if (CurTok != tok_in)
				1377	return Error("expected 'in' after for");
				1378	getNextToken(); // eat 'in'.
				1379
				1380	ExprAST *Body = ParseExpression();
				1381	if (Body == 0) return 0;
				1382
				1383	return new ForExprAST(IdName, Start, End, Step, Body);
				1384	}
				1385
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1386	/// varexpr ::= 'var' identifier ('=' expression)?
				1387	// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1388	static ExprAST *ParseVarExpr() {
				1389	getNextToken(); // eat the var.
				1390
				1391	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				1392
				1393	// At least one variable name is required.
				1394	if (CurTok != tok_identifier)
				1395	return Error("expected identifier after var");
				1396
				1397	while (1) {
				1398	std::string Name = IdentifierStr;
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1399	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1400
				1401	// Read the optional initializer.
				1402	ExprAST *Init = 0;
				1403	if (CurTok == '=') {
				1404	getNextToken(); // eat the '='.
				1405
				1406	Init = ParseExpression();
				1407	if (Init == 0) return 0;
				1408	}
				1409
				1410	VarNames.push_back(std::make_pair(Name, Init));
				1411
				1412	// End of var list, exit loop.
				1413	if (CurTok != ',') break;
				1414	getNextToken(); // eat the ','.
				1415
				1416	if (CurTok != tok_identifier)
				1417	return Error("expected identifier list after var");
				1418	}
				1419
				1420	// At this point, we have to have 'in'.
				1421	if (CurTok != tok_in)
				1422	return Error("expected 'in' keyword after 'var'");
				1423	getNextToken(); // eat 'in'.
				1424
				1425	ExprAST *Body = ParseExpression();
				1426	if (Body == 0) return 0;
				1427
				1428	return new VarExprAST(VarNames, Body);
				1429	}
				1430
				1431
				1432	/// primary
				1433	/// ::= identifierexpr
				1434	/// ::= numberexpr
				1435	/// ::= parenexpr
				1436	/// ::= ifexpr
				1437	/// ::= forexpr
				1438	/// ::= varexpr
				1439	static ExprAST *ParsePrimary() {
				1440	switch (CurTok) {
				1441	default: return Error("unknown token when expecting an expression");
				1442	case tok_identifier: return ParseIdentifierExpr();
				1443	case tok_number: return ParseNumberExpr();
				1444	case '(': return ParseParenExpr();
				1445	case tok_if: return ParseIfExpr();
				1446	case tok_for: return ParseForExpr();
				1447	case tok_var: return ParseVarExpr();
				1448	}
				1449	}
				1450
				1451	/// unary
				1452	/// ::= primary
				1453	/// ::= '!' unary
				1454	static ExprAST *ParseUnary() {
				1455	// If the current token is not an operator, it must be a primary expr.
				1456	if (!isascii(CurTok) \|\| CurTok == '(' \|\| CurTok == ',')
				1457	return ParsePrimary();
				1458
				1459	// If this is a unary operator, read it.
				1460	int Opc = CurTok;
				1461	getNextToken();
				1462	if (ExprAST *Operand = ParseUnary())
				1463	return new UnaryExprAST(Opc, Operand);
				1464	return 0;
				1465	}
				1466
				1467	/// binoprhs
				1468	/// ::= ('+' unary)*
				1469	static ExprAST ParseBinOpRHS(int ExprPrec, ExprAST LHS) {
				1470	// If this is a binop, find its precedence.
				1471	while (1) {
				1472	int TokPrec = GetTokPrecedence();
				1473
				1474	// If this is a binop that binds at least as tightly as the current binop,
				1475	// consume it, otherwise we are done.
				1476	if (TokPrec < ExprPrec)
				1477	return LHS;
				1478
				1479	// Okay, we know this is a binop.
				1480	int BinOp = CurTok;
				1481	getNextToken(); // eat binop
				1482
				1483	// Parse the unary expression after the binary operator.
				1484	ExprAST *RHS = ParseUnary();
				1485	if (!RHS) return 0;
				1486
				1487	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1488	// the pending operator take RHS as its LHS.
				1489	int NextPrec = GetTokPrecedence();
				1490	if (TokPrec < NextPrec) {
				1491	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1492	if (RHS == 0) return 0;
				1493	}
				1494
				1495	// Merge LHS/RHS.
				1496	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1497	}
				1498	}
				1499
				1500	/// expression
				1501	/// ::= unary binoprhs
				1502	///
				1503	static ExprAST *ParseExpression() {
				1504	ExprAST *LHS = ParseUnary();
				1505	if (!LHS) return 0;
				1506
				1507	return ParseBinOpRHS(0, LHS);
				1508	}
				1509
				1510	/// prototype
				1511	/// ::= id '(' id* ')'
				1512	/// ::= binary LETTER number? (id, id)
				1513	/// ::= unary LETTER (id)
				1514	static PrototypeAST *ParsePrototype() {
				1515	std::string FnName;
				1516
				1517	int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
				1518	unsigned BinaryPrecedence = 30;
				1519
				1520	switch (CurTok) {
				1521	default:
				1522	return ErrorP("Expected function name in prototype");
				1523	case tok_identifier:
				1524	FnName = IdentifierStr;
				1525	Kind = 0;
				1526	getNextToken();
				1527	break;
				1528	case tok_unary:
				1529	getNextToken();
				1530	if (!isascii(CurTok))
				1531	return ErrorP("Expected unary operator");
				1532	FnName = "unary";
				1533	FnName += (char)CurTok;
				1534	Kind = 1;
				1535	getNextToken();
				1536	break;
				1537	case tok_binary:
				1538	getNextToken();
				1539	if (!isascii(CurTok))
				1540	return ErrorP("Expected binary operator");
				1541	FnName = "binary";
				1542	FnName += (char)CurTok;
				1543	Kind = 2;
				1544	getNextToken();
				1545
				1546	// Read the precedence if present.
				1547	if (CurTok == tok_number) {
				1548	if (NumVal < 1 \|\| NumVal > 100)
				1549	return ErrorP("Invalid precedecnce: must be 1..100");
				1550	BinaryPrecedence = (unsigned)NumVal;
				1551	getNextToken();
				1552	}
				1553	break;
				1554	}
				1555
				1556	if (CurTok != '(')
				1557	return ErrorP("Expected '(' in prototype");
				1558
				1559	std::vector<std::string> ArgNames;
				1560	while (getNextToken() == tok_identifier)
				1561	ArgNames.push_back(IdentifierStr);
				1562	if (CurTok != ')')
				1563	return ErrorP("Expected ')' in prototype");
				1564
				1565	// success.
				1566	getNextToken(); // eat ')'.
				1567
				1568	// Verify right number of names for operator.
				1569	if (Kind && ArgNames.size() != Kind)
				1570	return ErrorP("Invalid number of operands for operator");
				1571
				1572	return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
				1573	}
				1574
				1575	/// definition ::= 'def' prototype expression
				1576	static FunctionAST *ParseDefinition() {
				1577	getNextToken(); // eat def.
				1578	PrototypeAST *Proto = ParsePrototype();
				1579	if (Proto == 0) return 0;
				1580
				1581	if (ExprAST *E = ParseExpression())
				1582	return new FunctionAST(Proto, E);
				1583	return 0;
				1584	}
				1585
				1586	/// toplevelexpr ::= expression
				1587	static FunctionAST *ParseTopLevelExpr() {
				1588	if (ExprAST *E = ParseExpression()) {
				1589	// Make an anonymous proto.
				1590	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				1591	return new FunctionAST(Proto, E);
				1592	}
				1593	return 0;
				1594	}
				1595
				1596	/// external ::= 'extern' prototype
				1597	static PrototypeAST *ParseExtern() {
				1598	getNextToken(); // eat extern.
				1599	return ParsePrototype();
				1600	}
				1601
				1602	//===----------------------------------------------------------------------===//
				1603	// Code Generation
				1604	//===----------------------------------------------------------------------===//
				1605
				1606	static Module *TheModule;
				1607	static LLVMFoldingBuilder Builder;
				1608	static std::map<std::string, AllocaInst*> NamedValues;
				1609	static FunctionPassManager *TheFPM;
				1610
				1611	Value ErrorV(const char Str) { Error(Str); return 0; }
				1612
				1613	/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
				1614	/// the function. This is used for mutable variables etc.
				1615	static AllocaInst CreateEntryBlockAlloca(Function TheFunction,
				1616	const std::string &VarName) {
				1617	LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
				1618	TheFunction->getEntryBlock().begin());
				1619	return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
				1620	}
				1621
				1622
				1623	Value *NumberExprAST::Codegen() {
				1624	return ConstantFP::get(Type::DoubleTy, APFloat(Val));
				1625	}
				1626
				1627	Value *VariableExprAST::Codegen() {
				1628	// Look this variable up in the function.
				1629	Value *V = NamedValues[Name];
				1630	if (V == 0) return ErrorV("Unknown variable name");
				1631
				1632	// Load the value.
				1633	return Builder.CreateLoad(V, Name.c_str());
				1634	}
				1635
				1636	Value *UnaryExprAST::Codegen() {
				1637	Value *OperandV = Operand->Codegen();
				1638	if (OperandV == 0) return 0;
				1639
				1640	Function *F = TheModule->getFunction(std::string("unary")+Opcode);
				1641	if (F == 0)
				1642	return ErrorV("Unknown unary operator");
				1643
				1644	return Builder.CreateCall(F, OperandV, "unop");
				1645	}
				1646
				1647
				1648	Value *BinaryExprAST::Codegen() {
				1649	// Special case '=' because we don't want to emit the LHS as an expression.
				1650	if (Op == '=') {
				1651	// Assignment requires the LHS to be an identifier.
				1652	VariableExprAST LHSE = dynamic_cast<VariableExprAST>(LHS);
				1653	if (!LHSE)
				1654	return ErrorV("destination of '=' must be a variable");
				1655	// Codegen the RHS.
				1656	Value *Val = RHS->Codegen();
				1657	if (Val == 0) return 0;
				1658
				1659	// Look up the name.
				1660	Value *Variable = NamedValues[LHSE->getName()];
				1661	if (Variable == 0) return ErrorV("Unknown variable name");
				1662
				1663	Builder.CreateStore(Val, Variable);
				1664	return Val;
				1665	}
				1666
				1667
				1668	Value *L = LHS->Codegen();
				1669	Value *R = RHS->Codegen();
				1670	if (L == 0 \|\| R == 0) return 0;
				1671
				1672	switch (Op) {
				1673	case '+': return Builder.CreateAdd(L, R, "addtmp");
				1674	case '-': return Builder.CreateSub(L, R, "subtmp");
				1675	case '*': return Builder.CreateMul(L, R, "multmp");
				1676	case '<':
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame^]	1677	L = Builder.CreateFCmpULT(L, R, "cmptmp");
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1678	// Convert bool 0/1 to double 0.0 or 1.0
				1679	return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
				1680	default: break;
				1681	}
				1682
				1683	// If it wasn't a builtin binary operator, it must be a user defined one. Emit
				1684	// a call to it.
				1685	Function *F = TheModule->getFunction(std::string("binary")+Op);
				1686	assert(F && "binary operator not found!");
				1687
				1688	Value *Ops[] = { L, R };
				1689	return Builder.CreateCall(F, Ops, Ops+2, "binop");
				1690	}
				1691
				1692	Value *CallExprAST::Codegen() {
				1693	// Look up the name in the global module table.
				1694	Function *CalleeF = TheModule->getFunction(Callee);
				1695	if (CalleeF == 0)
				1696	return ErrorV("Unknown function referenced");
				1697
				1698	// If argument mismatch error.
				1699	if (CalleeF->arg_size() != Args.size())
				1700	return ErrorV("Incorrect # arguments passed");
				1701
				1702	std::vector<Value*> ArgsV;
				1703	for (unsigned i = 0, e = Args.size(); i != e; ++i) {
				1704	ArgsV.push_back(Args[i]->Codegen());
				1705	if (ArgsV.back() == 0) return 0;
				1706	}
				1707
				1708	return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
				1709	}
				1710
				1711	Value *IfExprAST::Codegen() {
				1712	Value *CondV = Cond->Codegen();
				1713	if (CondV == 0) return 0;
				1714
				1715	// Convert condition to a bool by comparing equal to 0.0.
				1716	CondV = Builder.CreateFCmpONE(CondV,
				1717	ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
				1718	"ifcond");
				1719
				1720	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1721
				1722	// Create blocks for the then and else cases. Insert the 'then' block at the
				1723	// end of the function.
				1724	BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
				1725	BasicBlock *ElseBB = new BasicBlock("else");
				1726	BasicBlock *MergeBB = new BasicBlock("ifcont");
				1727
				1728	Builder.CreateCondBr(CondV, ThenBB, ElseBB);
				1729
				1730	// Emit then value.
				1731	Builder.SetInsertPoint(ThenBB);
				1732
				1733	Value *ThenV = Then->Codegen();
				1734	if (ThenV == 0) return 0;
				1735
				1736	Builder.CreateBr(MergeBB);
				1737	// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
				1738	ThenBB = Builder.GetInsertBlock();
				1739
				1740	// Emit else block.
				1741	TheFunction->getBasicBlockList().push_back(ElseBB);
				1742	Builder.SetInsertPoint(ElseBB);
				1743
				1744	Value *ElseV = Else->Codegen();
				1745	if (ElseV == 0) return 0;
				1746
				1747	Builder.CreateBr(MergeBB);
				1748	// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
				1749	ElseBB = Builder.GetInsertBlock();
				1750
				1751	// Emit merge block.
				1752	TheFunction->getBasicBlockList().push_back(MergeBB);
				1753	Builder.SetInsertPoint(MergeBB);
				1754	PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
				1755
				1756	PN->addIncoming(ThenV, ThenBB);
				1757	PN->addIncoming(ElseV, ElseBB);
				1758	return PN;
				1759	}
				1760
				1761	Value *ForExprAST::Codegen() {
				1762	// Output this as:
				1763	// var = alloca double
				1764	// ...
				1765	// start = startexpr
				1766	// store start -> var
				1767	// goto loop
				1768	// loop:
				1769	// ...
				1770	// bodyexpr
				1771	// ...
				1772	// loopend:
				1773	// step = stepexpr
				1774	// endcond = endexpr
				1775	//
				1776	// curvar = load var
				1777	// nextvar = curvar + step
				1778	// store nextvar -> var
				1779	// br endcond, loop, endloop
				1780	// outloop:
				1781
				1782	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1783
				1784	// Create an alloca for the variable in the entry block.
				1785	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				1786
				1787	// Emit the start code first, without 'variable' in scope.
				1788	Value *StartVal = Start->Codegen();
				1789	if (StartVal == 0) return 0;
				1790
				1791	// Store the value into the alloca.
				1792	Builder.CreateStore(StartVal, Alloca);
				1793
				1794	// Make the new basic block for the loop header, inserting after current
				1795	// block.
				1796	BasicBlock *PreheaderBB = Builder.GetInsertBlock();
				1797	BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
				1798
				1799	// Insert an explicit fall through from the current block to the LoopBB.
				1800	Builder.CreateBr(LoopBB);
				1801
				1802	// Start insertion in LoopBB.
				1803	Builder.SetInsertPoint(LoopBB);
				1804
				1805	// Within the loop, the variable is defined equal to the PHI node. If it
				1806	// shadows an existing variable, we have to restore it, so save it now.
				1807	AllocaInst *OldVal = NamedValues[VarName];
				1808	NamedValues[VarName] = Alloca;
				1809
				1810	// Emit the body of the loop. This, like any other expr, can change the
				1811	// current BB. Note that we ignore the value computed by the body, but don't
				1812	// allow an error.
				1813	if (Body->Codegen() == 0)
				1814	return 0;
				1815
				1816	// Emit the step value.
				1817	Value *StepVal;
				1818	if (Step) {
				1819	StepVal = Step->Codegen();
				1820	if (StepVal == 0) return 0;
				1821	} else {
				1822	// If not specified, use 1.0.
				1823	StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
				1824	}
				1825
				1826	// Compute the end condition.
				1827	Value *EndCond = End->Codegen();
				1828	if (EndCond == 0) return EndCond;
				1829
				1830	// Reload, increment, and restore the alloca. This handles the case where
				1831	// the body of the loop mutates the variable.
				1832	Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
				1833	Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
				1834	Builder.CreateStore(NextVar, Alloca);
				1835
				1836	// Convert condition to a bool by comparing equal to 0.0.
				1837	EndCond = Builder.CreateFCmpONE(EndCond,
				1838	ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
				1839	"loopcond");
				1840
				1841	// Create the "after loop" block and insert it.
				1842	BasicBlock *LoopEndBB = Builder.GetInsertBlock();
				1843	BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
				1844
				1845	// Insert the conditional branch into the end of LoopEndBB.
				1846	Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
				1847
				1848	// Any new code will be inserted in AfterBB.
				1849	Builder.SetInsertPoint(AfterBB);
				1850
				1851	// Restore the unshadowed variable.
				1852	if (OldVal)
				1853	NamedValues[VarName] = OldVal;
				1854	else
				1855	NamedValues.erase(VarName);
				1856
				1857
				1858	// for expr always returns 0.0.
				1859	return Constant::getNullValue(Type::DoubleTy);
				1860	}
				1861
				1862	Value *VarExprAST::Codegen() {
				1863	std::vector<AllocaInst *> OldBindings;
				1864
				1865	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1866
				1867	// Register all variables and emit their initializer.
				1868	for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
				1869	const std::string &VarName = VarNames[i].first;
				1870	ExprAST *Init = VarNames[i].second;
				1871
				1872	// Emit the initializer before adding the variable to scope, this prevents
				1873	// the initializer from referencing the variable itself, and permits stuff
				1874	// like this:
				1875	// var a = 1 in
				1876	// var a = a in ... # refers to outer 'a'.
				1877	Value *InitVal;
				1878	if (Init) {
				1879	InitVal = Init->Codegen();
				1880	if (InitVal == 0) return 0;
				1881	} else { // If not specified, use 0.0.
				1882	InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
				1883	}
				1884
				1885	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				1886	Builder.CreateStore(InitVal, Alloca);
				1887
				1888	// Remember the old variable binding so that we can restore the binding when
				1889	// we unrecurse.
				1890	OldBindings.push_back(NamedValues[VarName]);
				1891
				1892	// Remember this binding.
				1893	NamedValues[VarName] = Alloca;
				1894	}
				1895
				1896	// Codegen the body, now that all vars are in scope.
				1897	Value *BodyVal = Body->Codegen();
				1898	if (BodyVal == 0) return 0;
				1899
				1900	// Pop all our variables from scope.
				1901	for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
				1902	NamedValues[VarNames[i].first] = OldBindings[i];
				1903
				1904	// Return the body computation.
				1905	return BodyVal;
				1906	}
				1907
				1908
				1909	Function *PrototypeAST::Codegen() {
				1910	// Make the function type: double(double,double) etc.
				1911	std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
				1912	FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
				1913
				1914	Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
				1915
				1916	// If F conflicted, there was already something named 'Name'. If it has a
				1917	// body, don't allow redefinition or reextern.
				1918	if (F->getName() != Name) {
				1919	// Delete the one we just made and get the existing one.
				1920	F->eraseFromParent();
				1921	F = TheModule->getFunction(Name);
				1922
				1923	// If F already has a body, reject this.
				1924	if (!F->empty()) {
				1925	ErrorF("redefinition of function");
				1926	return 0;
				1927	}
				1928
				1929	// If F took a different number of args, reject.
				1930	if (F->arg_size() != Args.size()) {
				1931	ErrorF("redefinition of function with different # args");
				1932	return 0;
				1933	}
				1934	}
				1935
				1936	// Set names for all arguments.
				1937	unsigned Idx = 0;
				1938	for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
				1939	++AI, ++Idx)
				1940	AI->setName(Args[Idx]);
				1941
				1942	return F;
				1943	}
				1944
				1945	/// CreateArgumentAllocas - Create an alloca for each argument and register the
				1946	/// argument in the symbol table so that references to it will succeed.
				1947	void PrototypeAST::CreateArgumentAllocas(Function *F) {
				1948	Function::arg_iterator AI = F->arg_begin();
				1949	for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
				1950	// Create an alloca for this variable.
				1951	AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
				1952
				1953	// Store the initial value into the alloca.
				1954	Builder.CreateStore(AI, Alloca);
				1955
				1956	// Add arguments to variable symbol table.
				1957	NamedValues[Args[Idx]] = Alloca;
				1958	}
				1959	}
				1960
				1961
				1962	Function *FunctionAST::Codegen() {
				1963	NamedValues.clear();
				1964
				1965	Function *TheFunction = Proto->Codegen();
				1966	if (TheFunction == 0)
				1967	return 0;
				1968
				1969	// If this is an operator, install it.
				1970	if (Proto->isBinaryOp())
				1971	BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
				1972
				1973	// Create a new basic block to start insertion into.
				1974	BasicBlock *BB = new BasicBlock("entry", TheFunction);
				1975	Builder.SetInsertPoint(BB);
				1976
				1977	// Add all arguments to the symbol table and create their allocas.
				1978	Proto->CreateArgumentAllocas(TheFunction);
				1979
				1980	if (Value *RetVal = Body->Codegen()) {
				1981	// Finish off the function.
				1982	Builder.CreateRet(RetVal);
				1983
				1984	// Validate the generated code, checking for consistency.
				1985	verifyFunction(*TheFunction);
				1986
				1987	// Optimize the function.
				1988	TheFPM->run(*TheFunction);
				1989
				1990	return TheFunction;
				1991	}
				1992
				1993	// Error reading body, remove function.
				1994	TheFunction->eraseFromParent();
				1995
				1996	if (Proto->isBinaryOp())
				1997	BinopPrecedence.erase(Proto->getOperatorName());
				1998	return 0;
				1999	}
				2000
				2001	//===----------------------------------------------------------------------===//
				2002	// Top-Level parsing and JIT Driver
				2003	//===----------------------------------------------------------------------===//
				2004
				2005	static ExecutionEngine *TheExecutionEngine;
				2006
				2007	static void HandleDefinition() {
				2008	if (FunctionAST *F = ParseDefinition()) {
				2009	if (Function *LF = F->Codegen()) {
				2010	fprintf(stderr, "Read function definition:");
				2011	LF->dump();
				2012	}
				2013	} else {
				2014	// Skip token for error recovery.
				2015	getNextToken();
				2016	}
				2017	}
				2018
				2019	static void HandleExtern() {
				2020	if (PrototypeAST *P = ParseExtern()) {
				2021	if (Function *F = P->Codegen()) {
				2022	fprintf(stderr, "Read extern: ");
				2023	F->dump();
				2024	}
				2025	} else {
				2026	// Skip token for error recovery.
				2027	getNextToken();
				2028	}
				2029	}
				2030
				2031	static void HandleTopLevelExpression() {
				2032	// Evaluate a top level expression into an anonymous function.
				2033	if (FunctionAST *F = ParseTopLevelExpr()) {
				2034	if (Function *LF = F->Codegen()) {
				2035	// JIT the function, returning a function pointer.
				2036	void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
				2037
				2038	// Cast it to the right type (takes no arguments, returns a double) so we
				2039	// can call it as a native function.
				2040	double (FP)() = (double ()())FPtr;
				2041	fprintf(stderr, "Evaluated to %f\n", FP());
				2042	}
				2043	} else {
				2044	// Skip token for error recovery.
				2045	getNextToken();
				2046	}
				2047	}
				2048
				2049	/// top ::= definition \| external \| expression \| ';'
				2050	static void MainLoop() {
				2051	while (1) {
				2052	fprintf(stderr, "ready> ");
				2053	switch (CurTok) {
				2054	case tok_eof: return;
				2055	case ';': getNextToken(); break; // ignore top level semicolons.
				2056	case tok_def: HandleDefinition(); break;
				2057	case tok_extern: HandleExtern(); break;
				2058	default: HandleTopLevelExpression(); break;
				2059	}
				2060	}
				2061	}
				2062
				2063
				2064
				2065	//===----------------------------------------------------------------------===//
				2066	// "Library" functions that can be "extern'd" from user code.
				2067	//===----------------------------------------------------------------------===//
				2068
				2069	/// putchard - putchar that takes a double and returns 0.
				2070	extern "C"
				2071	double putchard(double X) {
				2072	putchar((char)X);
				2073	return 0;
				2074	}
				2075
				2076	/// printd - printf that takes a double prints it as "%f\n", returning 0.
				2077	extern "C"
				2078	double printd(double X) {
				2079	printf("%f\n", X);
				2080	return 0;
				2081	}
				2082
				2083	//===----------------------------------------------------------------------===//
				2084	// Main driver code.
				2085	//===----------------------------------------------------------------------===//
				2086
				2087	int main() {
				2088	// Install standard binary operators.
				2089	// 1 is lowest precedence.
				2090	BinopPrecedence['='] = 2;
				2091	BinopPrecedence['<'] = 10;
				2092	BinopPrecedence['+'] = 20;
				2093	BinopPrecedence['-'] = 20;
				2094	BinopPrecedence['*'] = 40; // highest.
				2095
				2096	// Prime the first token.
				2097	fprintf(stderr, "ready> ");
				2098	getNextToken();
				2099
				2100	// Make the module, which holds all the code.
				2101	TheModule = new Module("my cool jit");
				2102
				2103	// Create the JIT.
				2104	TheExecutionEngine = ExecutionEngine::create(TheModule);
				2105
				2106	{
				2107	ExistingModuleProvider OurModuleProvider(TheModule);
				2108	FunctionPassManager OurFPM(&OurModuleProvider);
				2109
				2110	// Set up the optimizer pipeline. Start with registering info about how the
				2111	// target lays out data structures.
				2112	OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
				2113	// Promote allocas to registers.
				2114	OurFPM.add(createPromoteMemoryToRegisterPass());
				2115	// Do simple "peephole" optimizations and bit-twiddling optzns.
				2116	OurFPM.add(createInstructionCombiningPass());
				2117	// Reassociate expressions.
				2118	OurFPM.add(createReassociatePass());
				2119	// Eliminate Common SubExpressions.
				2120	OurFPM.add(createGVNPass());
				2121	// Simplify the control flow graph (deleting unreachable blocks, etc).
				2122	OurFPM.add(createCFGSimplificationPass());
				2123
				2124	// Set the global so the code gen can use this.
				2125	TheFPM = &OurFPM;
				2126
				2127	// Run the main "interpreter loop" now.
				2128	MainLoop();
				2129
				2130	TheFPM = 0;
				2131	} // Free module provider and pass manager.
				2132
				2133
				2134	// Print out all of the generated code.
				2135	TheModule->dump();
				2136	return 0;
				2137	}
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	2138	</pre>
				2139	</div>
				2140
				2141	</div>
				2142
				2143	<!-- *********************************************************************** -->
				2144	<hr>
				2145	<address>
				2146	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				2147	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				2148	<a href="http://validator.w3.org/check/referer"><img
				2149	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
				2150
				2151	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				2152	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				2153	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				2154	</address>
				2155	</body>
				2156	</html>