Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
				7	construction</title>
				8	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				9	<meta name="author" content="Chris Lattner">
				10	<link rel="stylesheet" href="../llvm.css" type="text/css">
				11	</head>
				12
				13	<body>
				14
				15	<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
				16
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	17	<ul>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	18	<li><a href="index.html">Up to Tutorial Index</a></li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	19	<li>Chapter 7
				20	<ol>
				21	<li><a href="#intro">Chapter 7 Introduction</a></li>
				22	<li><a href="#why">Why is this a hard problem?</a></li>
				23	<li><a href="#memory">Memory in LLVM</a></li>
				24	<li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
				25	<li><a href="#adjustments">Adjusting Existing Variables for
				26	Mutation</a></li>
				27	<li><a href="#assignment">New Assignment Operator</a></li>
				28	<li><a href="#localvars">User-defined Local Variables</a></li>
				29	<li><a href="#code">Full Code Listing</a></li>
				30	</ol>
				31	</li>
Chris Lattner	0e555b1	2007-11-05 20:04:56 +0000	[diff] [blame]	32	<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
				33	tidbits</li>
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	34	</ul>
				35
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	36	<div class="doc_author">
				37	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				38	</div>
				39
				40	<!-- *********************************************************************** -->
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	41	<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	42	<!-- *********************************************************************** -->
				43
				44	<div class="doc_text">
				45
Chris Lattner	128eb86	2007-11-05 19:06:59 +0000	[diff] [blame]	46	<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
				47	with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
				48	respectable, albeit simple, <a
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	49	href="http://en.wikipedia.org/wiki/Functional_programming">functional
				50	programming language</a>. In our journey, we learned some parsing techniques,
				51	how to build and represent an AST, how to build LLVM IR, and how to optimize
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	52	the resultant code as well as JIT compile it.</p>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	53
<p>While Kaleidoscope is interesting as a functional language, the fact that it
is functional makes it "too easy" to generate LLVM IR for it. In particular, a
functional language makes it very easy to build LLVM IR directly in <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
Since LLVM requires that the input code be in SSA form, this is a very nice
property. It does not carry over to imperative languages, though, and it is
often unclear to newcomers how to generate code for a language with mutable
variables.</p>
				61
				62	<p>The short (and happy) summary of this chapter is that there is no need for
				63	your front-end to build SSA form: LLVM provides highly tuned and well tested
				64	support for this, though the way it works is a bit unexpected for some.</p>
				65
				66	</div>
				67
				68	<!-- *********************************************************************** -->
				69	<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
				70	<!-- *********************************************************************** -->
				71
				72	<div class="doc_text">
				73
				74	<p>
				75	To understand why mutable variables cause complexities in SSA construction,
				76	consider this extremely simple C example:
				77	</p>
				78
				79	<div class="doc_code">
				80	<pre>
int G, H;
int test(_Bool Condition) {
  int X;
  if (Condition)
    X = G;
  else
    X = H;
  return X;
}
				90	</pre>
				91	</div>
				92
				93	<p>In this case, we have the variable "X", whose value depends on the path
				94	executed in the program. Because there are two different possible values for X
				95	before the return instruction, a PHI node is inserted to merge the two values.
				96	The LLVM IR that we want for this example looks like this:</p>
				97
				98	<div class="doc_code">
				99	<pre>
				100	@G = weak global i32 0 ; type of @G is i32*
				101	@H = weak global i32 0 ; type of @H is i32*
				102
				103	define i32 @test(i1 %Condition) {
				104	entry:
				105	br i1 %Condition, label %cond_true, label %cond_false
				106
				107	cond_true:
				108	%X.0 = load i32* @G
				109	br label %cond_next
				110
				111	cond_false:
				112	%X.1 = load i32* @H
				113	br label %cond_next
				114
				115	cond_next:
				116	%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				117	ret i32 %X.2
				118	}
				119	</pre>
				120	</div>
				121
				122	<p>In this example, the loads from the G and H global variables are explicit in
				123	the LLVM IR, and they live in the then/else branches of the if statement
				124	(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
				125	in the cond_next block selects the right value to use based on where control
				126	flow is coming from: if control flow comes from the cond_false block, X.2 gets
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	127	the value of X.1. Alternatively, if control flow comes from cond_true, it gets
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	128	the value of X.0. The intent of this chapter is not to explain the details of
				129	SSA form. For more information, see one of the many <a
				130	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
				131	references</a>.</p>
				132
<p>The question for this article is "who places the phi nodes when lowering
assignments to mutable variables?". The issue here is that LLVM
<em>requires</em> that its IR be in SSA form: there is no "non-SSA" mode for it.
However, SSA construction requires non-trivial algorithms and data structures,
so it is inconvenient and wasteful for every front-end to have to reproduce this
logic.</p>
				139
				140	</div>
				141
				142	<!-- *********************************************************************** -->
				143	<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
				144	<!-- *********************************************************************** -->
				145
				146	<div class="doc_text">
				147
				148	<p>The 'trick' here is that while LLVM does require all register values to be
				149	in SSA form, it does not require (or permit) memory objects to be in SSA form.
				150	In the example above, note that the loads from G and H are direct accesses to
				151	G and H: they are not renamed or versioned. This differs from some other
Chris Lattner	2e5d07e	2007-11-04 19:42:13 +0000	[diff] [blame]	152	compiler systems, which do try to version memory objects. In LLVM, instead of
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	153	encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
				154	href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
				155	demand.</p>
				156
				157	<p>
				158	With this in mind, the high-level idea is that we want to make a stack variable
				159	(which lives in memory, because it is on the stack) for each mutable object in
				160	a function. To take advantage of this trick, we need to talk about how LLVM
				161	represents stack variables.
				162	</p>
				163
				164	<p>In LLVM, all memory accesses are explicit with load/store instructions, and
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	165	it is carefully designed not to have (or need) an "address-of" operator. Notice
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	166	how the type of the @G/@H global variables is actually "i32*" even though the
				167	variable is defined as "i32". What this means is that @G defines <em>space</em>
				168	for an i32 in the global data area, but its <em>name</em> actually refers to the
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	169	address for that space. Stack variables work the same way, except that instead of
				170	being declared with global variable definitions, they are declared with the
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	171	<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
				172
				173	<div class="doc_code">
				174	<pre>
Chris Lattner	1e46a6c	2007-11-07 06:34:39 +0000	[diff] [blame]	175	define i32 @example() {
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	176	entry:
				177	%X = alloca i32 ; type of %X is i32*.
				178	...
				179	%tmp = load i32* %X ; load the stack value %X from the stack.
				180	%tmp2 = add i32 %tmp, 1 ; increment it
				181	store i32 %tmp2, i32* %X ; store it back
				182	...
				183	</pre>
				184	</div>
				185
				186	<p>This code shows an example of how you can declare and manipulate a stack
				187	variable in the LLVM IR. Stack memory allocated with the alloca instruction is
				188	fully general: you can pass the address of the stack slot to functions, you can
				189	store it in other variables, etc. In our example above, we could rewrite the
				190	example to use the alloca technique to avoid using a PHI node:</p>
				191
				192	<div class="doc_code">
				193	<pre>
				194	@G = weak global i32 0 ; type of @G is i32*
				195	@H = weak global i32 0 ; type of @H is i32*
				196
				197	define i32 @test(i1 %Condition) {
				198	entry:
				199	%X = alloca i32 ; type of %X is i32*.
				200	br i1 %Condition, label %cond_true, label %cond_false
				201
				202	cond_true:
				203	%X.0 = load i32* @G
				204	store i32 %X.0, i32* %X ; Update X
				205	br label %cond_next
				206
				207	cond_false:
				208	%X.1 = load i32* @H
				209	store i32 %X.1, i32* %X ; Update X
				210	br label %cond_next
				211
				212	cond_next:
				213	%X.2 = load i32* %X ; Read X
				214	ret i32 %X.2
				215	}
				216	</pre>
				217	</div>
				218
				219	<p>With this, we have discovered a way to handle arbitrary mutable variables
				220	without the need to create Phi nodes at all:</p>
				221
				222	<ol>
				223	<li>Each mutable variable becomes a stack allocation.</li>
				224	<li>Each read of the variable becomes a load from the stack.</li>
				225	<li>Each update of the variable becomes a store to the stack.</li>
				226	<li>Taking the address of a variable just uses the stack address directly.</li>
				227	</ol>
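
<p>The first three rules above can be sketched as a tiny emitter of textual IR.
This is only an illustration under stated assumptions: <tt>SlotEmitter</tt>,
<tt>declare</tt>, <tt>read</tt> and <tt>assign</tt> are hypothetical names, not
LLVM APIs, and the class just builds strings in the i32 syntax used in this
chapter.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): emits textual IR for i32 variables
// that live in stack slots, following the rules listed above.
class SlotEmitter {
  std::vector<std::string> Lines;
  unsigned NextTmp = 0;

public:
  // Rule 1: each mutable variable becomes a stack allocation.
  void declare(const std::string &Var) {
    assert(!Var.empty() && "variable needs a name");
    Lines.push_back("%" + Var + " = alloca i32");
  }

  // Rule 2: each read becomes a load. Returns the name of the loaded value.
  std::string read(const std::string &Var) {
    std::string Tmp = "%t" + std::to_string(NextTmp++);
    Lines.push_back(Tmp + " = load i32* %" + Var);
    return Tmp;
  }

  // Rule 3: each update becomes a store of an already computed value.
  void assign(const std::string &Var, const std::string &Value) {
    Lines.push_back("store i32 " + Value + ", i32* %" + Var);
  }

  const std::vector<std::string> &lines() const { return Lines; }
};
```

<p>Calling <tt>declare("X")</tt>, <tt>assign("X", "7")</tt> and
<tt>read("X")</tt> in that order yields an alloca, a store and a load, with no
phi node anywhere in the output.</p>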
				228
				229	<p>While this solution has solved our immediate problem, it introduced another
				230	one: we have now apparently introduced a lot of stack traffic for very simple
				231	and common operations, a major performance problem. Fortunately for us, the
				232	LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
				233	this case, promoting allocas like this into SSA registers, inserting Phi nodes
				234	as appropriate. If you run this example through the pass, for example, you'll
				235	get:</p>
				236
				237	<div class="doc_code">
				238	<pre>
$ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b>
				240	@G = weak global i32 0
				241	@H = weak global i32 0
				242
				243	define i32 @test(i1 %Condition) {
				244	entry:
				245	br i1 %Condition, label %cond_true, label %cond_false
				246
				247	cond_true:
				248	%X.0 = load i32* @G
				249	br label %cond_next
				250
				251	cond_false:
				252	%X.1 = load i32* @H
				253	br label %cond_next
				254
				255	cond_next:
				256	%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				257	ret i32 %X.01
				258	}
				259	</pre>
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	260	</div>
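
<p>The phi node lands in cond_next because that is the block where the two
stores to X meet. A minimal sketch of this placement rule, iterating over
dominance frontiers, might look like the code below. It is not mem2reg's actual
implementation: <tt>placePhis</tt> and <tt>diamondFrontiers</tt> are
hypothetical names, and the dominance frontiers are assumed to be precomputed
rather than derived from the CFG.</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using BlockSet = std::set<std::string>;

// Given each block's dominance frontier (assumed precomputed) and the blocks
// that store to a variable, return the blocks that need a phi node.
BlockSet placePhis(const std::map<std::string, BlockSet> &Frontier,
                   const BlockSet &StoreBlocks) {
  BlockSet Phis;
  std::vector<std::string> Worklist(StoreBlocks.begin(), StoreBlocks.end());
  while (!Worklist.empty()) {
    std::string B = Worklist.back();
    Worklist.pop_back();
    assert(Frontier.count(B) && "every block needs a frontier entry");
    for (const std::string &F : Frontier.at(B)) {
      // A phi is itself a new definition, so its block is processed as well.
      // This is the "iterated" part of the algorithm.
      if (Phis.insert(F).second)
        Worklist.push_back(F);
    }
  }
  return Phis;
}

// Dominance frontiers of the diamond-shaped CFG of @test, written by hand.
std::map<std::string, BlockSet> diamondFrontiers() {
  return {{"entry", BlockSet{}},
          {"cond_true", BlockSet{"cond_next"}},
          {"cond_false", BlockSet{"cond_next"}},
          {"cond_next", BlockSet{}}};
}
```

<p>With stores in cond_true and cond_false, the sketch reports cond_next as the
only block that needs a phi, which matches the output above.</p>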
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	261
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	262	<p>The mem2reg pass implements the standard "iterated dominance frontier"
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	263	algorithm for constructing SSA form and has a number of optimizations that speed
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	264	up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
				265	with mutable variables, and we highly recommend that you depend on it. Note that
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	266	mem2reg only works on variables in certain circumstances:</p>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	267
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	268	<ol>
				269	<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
				270	promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	271
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	272	<li>mem2reg only looks for alloca instructions in the entry block of the
				273	function. Being in the entry block guarantees that the alloca is only executed
				274	once, which makes analysis simpler.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	275
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	276	<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
				277	the address of the stack object is passed to a function, or if any funny pointer
				278	arithmetic is involved, the alloca will not be promoted.</li>
				279
Chris Lattner	a56b22d	2007-11-05 17:45:54 +0000	[diff] [blame]	280	<li>mem2reg only works on allocas of <a
				281	href="../LangRef.html#t_classifications">first class</a>
				282	values (such as pointers, scalars and vectors), and only if the array size
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	283	of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
				284	promoting structs or arrays to registers. Note that the "scalarrepl" pass is
				285	more powerful and can promote structs, "unions", and arrays in many cases.</li>
				286
				287	</ol>
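
<p>The conditions above can be read as a single predicate. The sketch below
uses a toy description of an alloca; <tt>ToyAlloca</tt>, <tt>UseKind</tt>,
<tt>TypeClass</tt> and <tt>looksPromotable</tt> are hypothetical names, and the
real pass inspects LLVM instructions rather than a struct like this. The extra
cases that scalarrepl can handle are not modeled.</p>

```cpp
#include <cassert>
#include <vector>

enum class UseKind { Load, Store, PassedToCall, PointerArithmetic };
enum class TypeClass { Scalar, Pointer, Vector, Struct, Array };

// Toy description of one stack allocation; not an LLVM data structure.
struct ToyAlloca {
  bool InEntryBlock;         // is the alloca in the function's entry block?
  TypeClass Allocated;       // kind of type being allocated
  unsigned ArraySize;        // 1 when no array size is written in the .ll file
  std::vector<UseKind> Uses; // how the resulting address is used
};

// Returns true when the alloca meets the conditions listed above.
bool looksPromotable(const ToyAlloca &A) {
  assert(A.ArraySize != 0 && "toy model assumes at least one element");
  // Only allocas in the entry block are considered.
  if (!A.InEntryBlock)
    return false;
  // Every use must be a direct load or store.
  for (UseKind U : A.Uses)
    if (U != UseKind::Load && U != UseKind::Store)
      return false;
  // Only single first-class values are promoted, not structs or arrays.
  bool FirstClass = A.Allocated == TypeClass::Scalar ||
                    A.Allocated == TypeClass::Pointer ||
                    A.Allocated == TypeClass::Vector;
  return FirstClass && A.ArraySize == 1;
}
```

<p>An entry-block scalar that is only loaded and stored passes the check;
passing its address to a call, or allocating it outside the entry block, makes
the check fail.</p>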
				288
				289	<p>
				290	All of these properties are easy to satisfy for most imperative languages, and
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	291	we'll illustrate it below with Kaleidoscope. The final question you may be
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	292	asking is: should I bother with this nonsense for my front-end? Wouldn't it be
				293	better if I just did SSA construction directly, avoiding use of the mem2reg
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	294	optimization pass? In short, we strongly recommend that you use this technique
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	295	for building SSA form, unless there is an extremely good reason not to. Using
				296	this technique is:</p>
				297
				298	<ul>
<li>Proven and well tested: llvm-gcc and clang both use this technique for local
mutable variables. As such, the most common clients of LLVM are using this to
handle the bulk of their variables. You can be sure that bugs are found fast and
fixed early.</li>
				303
				304	<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
				305	common cases as well as fully general. For example, it has fast-paths for
				306	variables that are only used in a single block, variables that only have one
				307	assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
				308	</li>
				309
				310	<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
				311	Debug information in LLVM</a> relies on having the address of the variable
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	312	exposed so that debug info can be attached to it. This technique dovetails
				313	very naturally with this style of debug info.</li>
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	314	</ul>
				315
<p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with mutable
variables now!
</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	320
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	321	</div>
				322
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	323	<!-- *********************************************************************** -->
				324	<div class="doc_section"><a name="kalvars">Mutable Variables in
				325	Kaleidoscope</a></div>
				326	<!-- *********************************************************************** -->
				327
				328	<div class="doc_text">
				329
<p>Now that we know the sort of problem we want to tackle, let's see what this
looks like in the context of our little Kaleidoscope language. We're going to
add two features:</p>
				333
				334	<ol>
				335	<li>The ability to mutate variables with the '=' operator.</li>
				336	<li>The ability to define new variables.</li>
				337	</ol>
				338
				339	<p>While the first item is really what this is about, we only have variables
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	340	for incoming arguments as well as for induction variables, and redefining those only
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	341	goes so far :). Also, the ability to define new variables is a
				342	useful thing regardless of whether you will be mutating them. Here's a
				343	motivating example that shows how we could use these:</p>
				344
				345	<div class="doc_code">
				346	<pre>
# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

# Recursive fib, we could do this before.
def fib(x)
  if (x < 3) then
    1
  else
    fib(x-1)+fib(x-2);

# Iterative fib.
def fibi(x)
  <b>var a = 1, b = 1, c in</b>
  (for i = 3, i < x in
     <b>c = a + b</b> :
     <b>a = b</b> :
     <b>b = c</b>) :
  b;

# Call it.
fibi(10);
				369	</pre>
				370	</div>
				371
				372	<p>
				373	In order to mutate variables, we have to change our existing variables to use
				374	the "alloca trick". Once we have that, we'll add our new operator, then extend
				375	Kaleidoscope to support new variable definitions.
				376	</p>
				377
				378	</div>
				379
				380	<!-- *********************************************************************** -->
				381	<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
				382	Mutation</a></div>
				383	<!-- *********************************************************************** -->
				384
				385	<div class="doc_text">
				386
<p>
The symbol table in Kaleidoscope is managed at code generation time by the
'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
that holds the double value for the named variable. In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
the <em>memory location</em> of the variable in question. Note that this
change is a refactoring: it changes the structure of the code, but does not
(by itself) change the behavior of the compiler. All of these changes are
isolated in the Kaleidoscope code generator.</p>
				396
				397	<p>
				398	At this point in Kaleidoscope's development, it only supports variables for two
				399	things: incoming arguments to functions and the induction variable of 'for'
				400	loops. For consistency, we'll allow mutation of these variables in addition to
				401	other user-defined variables. This means that these will both need memory
				402	locations.
				403	</p>
				404
				405	<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	406	map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++
				407	compiler will tell us what parts of the code we need to update:</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	408
				409	<div class="doc_code">
				410	<pre>
				411	static std::map<std::string, AllocaInst*> NamedValues;
				412	</pre>
				413	</div>
				414
<p>Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of the
function:</p>
				418
				419	<div class="doc_code">
				420	<pre>
/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function. This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const std::string &VarName) {
  IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
                   TheFunction->getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
}
				429	</pre>
				430	</div>
				431
Duncan Sands	89f6d88	2008-04-13 06:22:09 +0000	[diff] [blame]	432	<p>This funny looking code creates an IRBuilder object that is pointing at
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	433	the first instruction (.begin()) of the entry block. It then creates an alloca
				434	with the expected name and returns it. Because all values in Kaleidoscope are
				435	doubles, there is no need to pass in a type to use.</p>
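
<p>The effect of inserting at <tt>.begin()</tt> of the entry block can be seen
in a toy model that needs no LLVM headers. <tt>ToyBlock</tt>,
<tt>ToyFunction</tt> and <tt>createEntryBlockAlloca</tt> below are hypothetical
stand-ins that hold instructions as strings. They only illustrate where the
alloca ends up, not how IRBuilder works internally.</p>

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Toy stand-ins for LLVM's BasicBlock and Function, for illustration only.
struct ToyBlock {
  std::string Name;
  std::list<std::string> Insts;
};

struct ToyFunction {
  std::vector<ToyBlock> Blocks; // Blocks.front() is the entry block
};

// Inserts an alloca at the very beginning of the entry block, no matter which
// block code is currently being emitted into, and returns the slot's name.
std::string createEntryBlockAlloca(ToyFunction &F, const std::string &VarName) {
  assert(!F.Blocks.empty() && "function needs an entry block");
  std::list<std::string> &Entry = F.Blocks.front().Insts;
  Entry.insert(Entry.begin(), "%" + VarName + " = alloca double");
  return "%" + VarName;
}
```

<p>Even if the caller is in the middle of emitting a loop body, the new alloca
becomes the first instruction of the entry block, which is where mem2reg looks
for it.</p>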
				436
				437	<p>With this in place, the first functionality change we want to make is to
				438	variable references. In our new scheme, variables live on the stack, so code
				439	generating a reference to them actually needs to produce a load from the stack
				440	slot:</p>
				441
				442	<div class="doc_code">
				443	<pre>
Value *VariableExprAST::Codegen() {
  // Look this variable up in the function.
  Value *V = NamedValues[Name];
  if (V == 0) return ErrorV("Unknown variable name");

  <b>// Load the value.
  return Builder.CreateLoad(V, Name.c_str());</b>
}
				452	</pre>
				453	</div>
				454
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	455	<p>As you can see, this is pretty straightforward. Now we need to update the
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	456	things that define the variables to set up the alloca. We'll start with
				457	<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
				458	the unabridged code):</p>
				459
				460	<div class="doc_code">
				461	<pre>
  Function *TheFunction = Builder.GetInsertBlock()->getParent();

  <b>// Create an alloca for the variable in the entry block.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>

  // Emit the start code first, without 'variable' in scope.
  Value *StartVal = Start->Codegen();
  if (StartVal == 0) return 0;

  <b>// Store the value into the alloca.
  Builder.CreateStore(StartVal, Alloca);</b>
  ...

  // Compute the end condition.
  Value *EndCond = End->Codegen();
  if (EndCond == 0) return EndCond;

  <b>// Reload, increment, and restore the alloca. This handles the case where
  // the body of the loop mutates the variable.
  Value *CurVar = Builder.CreateLoad(Alloca);
  Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
  Builder.CreateStore(NextVar, Alloca);</b>
  ...
				485	</pre>
				486	</div>
				487
				488	<p>This code is virtually identical to the code <a
				489	href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
				490	big difference is that we no longer have to construct a PHI node, and we use
				491	load/store to access the variable as needed.</p>
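
<p>To see why the reload matters, here is a plain C++ simulation of the emitted
loop, with the stack slot modeled as an ordinary <tt>double</tt>. The function
<tt>runLoop</tt> is a hypothetical name, and the sketch assumes that the loop
body is emitted in the elided part before the end condition, as it was in
Chapter 5.</p>

```cpp
#include <cassert>
#include <functional>

// Simulates the generated 'for' loop and returns how many times the body ran.
// Slot plays the role of the alloca for the induction variable.
int runLoop(double Start, double Step,
            const std::function<void(double &)> &Body,
            const std::function<bool(double)> &EndCond) {
  assert(Body && EndCond && "both callbacks are required");
  double Slot = Start; // store StartVal into the alloca
  int Iterations = 0;
  while (true) {
    Body(Slot); // the body may load from or store to the slot
    ++Iterations;
    bool Continue = EndCond(Slot);  // compute the end condition
    double CurVar = Slot;           // reload, so body mutations are seen
    double NextVar = CurVar + Step; // increment
    Slot = NextVar;                 // store back
    if (!Continue)
      break;
  }
  return Iterations;
}
```

<p>With a start of 1, a step of 1 and the end condition <tt>i &lt; 3</tt>, an
empty body runs three times, while a body that also adds 1 to the variable runs
only twice, because the increment starts from the value the body left in the
slot.</p>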
				492
				493	<p>To support mutable argument variables, we need to also make allocas for them.
				494	The code for this is also pretty simple:</p>
				495
				496	<div class="doc_code">
				497	<pre>
/// CreateArgumentAllocas - Create an alloca for each argument and register the
/// argument in the symbol table so that references to it will succeed.
void PrototypeAST::CreateArgumentAllocas(Function *F) {
  Function::arg_iterator AI = F->arg_begin();
  for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
    // Create an alloca for this variable.
    AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

    // Store the initial value into the alloca.
    Builder.CreateStore(AI, Alloca);

    // Add arguments to variable symbol table.
    NamedValues[Args[Idx]] = Alloca;
  }
}
				513	</pre>
				514	</div>
				515
				516	<p>For each argument, we make an alloca, store the input value to the function
				517	into the alloca, and register the alloca as the memory location for the
				518	argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
				519	it sets up the entry block for the function.</p>
				520
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	521	<p>The final missing piece is adding the mem2reg pass, which allows us to get
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	522	good codegen once again:</p>
				523
				524	<div class="doc_code">
				525	<pre>
    // Set up the optimizer pipeline. Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
    <b>// Promote allocas to registers.
    OurFPM.add(createPromoteMemoryToRegisterPass());</b>
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
				535	</pre>
				536	</div>
				537
				538	<p>It is interesting to see what the code looks like before and after the
				539	mem2reg optimization runs. For example, this is the before/after code for our
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	540	recursive fib function. Before the optimization:</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	541
				542	<div class="doc_code">
				543	<pre>
				544	define double @fib(double %x) {
				545	entry:
				546	<b>%x1 = alloca double
				547	store double %x, double* %x1
				548	%x2 = load double* %x1</b>
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	549	%cmptmp = fcmp ult double %x2, 3.000000e+00
				550	%booltmp = uitofp i1 %cmptmp to double
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	551	%ifcond = fcmp one double %booltmp, 0.000000e+00
				552	br i1 %ifcond, label %then, label %else
				553
				554	then: ; preds = %entry
				555	br label %ifcont
				556
				557	else: ; preds = %entry
				558	<b>%x3 = load double* %x1</b>
				559	%subtmp = sub double %x3, 1.000000e+00
				560	%calltmp = call double @fib( double %subtmp )
				561	<b>%x4 = load double* %x1</b>
				562	%subtmp5 = sub double %x4, 2.000000e+00
				563	%calltmp6 = call double @fib( double %subtmp5 )
				564	%addtmp = add double %calltmp, %calltmp6
				565	br label %ifcont
				566
				567	ifcont: ; preds = %else, %then
				568	%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
				569	ret double %iftmp
				570	}
				571	</pre>
				572	</div>
				573
				574	<p>Here there is only one variable (x, the input argument) but you can still
				575	see the extremely simple-minded code generation strategy we are using. In the
				576	entry block, an alloca is created, and the initial input value is stored into
				577	it. Each reference to the variable does a reload from the stack. Also, note
				578	that we didn't modify the if/then/else expression, so it still inserts a PHI
				579	node. While we could make an alloca for it, it is actually easier to create a
				580	PHI node for it, so we still just make the PHI.</p>
				581
				582	<p>Here is the code after the mem2reg pass runs:</p>
				583
				584	<div class="doc_code">
				585	<pre>
				586	define double @fib(double %x) {
				587	entry:
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	588	%cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
				589	%booltmp = uitofp i1 %cmptmp to double
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	590	%ifcond = fcmp one double %booltmp, 0.000000e+00
				591	br i1 %ifcond, label %then, label %else
				592
				593	then:
				594	br label %ifcont
				595
				596	else:
				597	%subtmp = sub double <b>%x</b>, 1.000000e+00
				598	%calltmp = call double @fib( double %subtmp )
				599	%subtmp5 = sub double <b>%x</b>, 2.000000e+00
				600	%calltmp6 = call double @fib( double %subtmp5 )
				601	%addtmp = add double %calltmp, %calltmp6
				602	br label %ifcont
				603
				604	ifcont: ; preds = %else, %then
				605	%iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
				606	ret double %iftmp
				607	}
				608	</pre>
				609	</div>
				610
<p>This is a trivial case for mem2reg, since there are no redefinitions of the
variable. The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>
				614
				615	<p>After the rest of the optimizers run, we get:</p>
				616
				617	<div class="doc_code">
				618	<pre>
				619	define double @fib(double %x) {
				620	entry:
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	621	%cmptmp = fcmp ult double %x, 3.000000e+00
				622	%booltmp = uitofp i1 %cmptmp to double
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	623	%ifcond = fcmp ueq double %booltmp, 0.000000e+00
				624	br i1 %ifcond, label %else, label %ifcont
				625
				626	else:
				627	%subtmp = sub double %x, 1.000000e+00
				628	%calltmp = call double @fib( double %subtmp )
				629	%subtmp5 = sub double %x, 2.000000e+00
				630	%calltmp6 = call double @fib( double %subtmp5 )
				631	%addtmp = add double %calltmp, %calltmp6
				632	ret double %addtmp
				633
				634	ifcont:
				635	ret double 1.000000e+00
				636	}
				637	</pre>
				638	</div>
				639
				640	<p>Here we see that the simplifycfg pass decided to clone the return instruction
				641	into the end of the 'else' block. This allowed it to eliminate some branches
				642	and the PHI node.</p>
				643
				644	<p>Now that all symbol table references are updated to use stack variables,
				645	we'll add the assignment operator.</p>
				646
				647	</div>
				648
				649	<!-- *********************************************************************** -->
				650	<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
				651	<!-- *********************************************************************** -->
				652
				653	<div class="doc_text">
				654
				655	<p>With our current framework, adding a new assignment operator is really
				656	simple. We will parse it just like any other binary operator, but handle it
				657	internally (instead of allowing the user to define it). The first step is to
				658	set a precedence:</p>
				659
				660	<div class="doc_code">
				661	<pre>
				662	int main() {
				663	// Install standard binary operators.
				664	// 1 is lowest precedence.
				665	<b>BinopPrecedence['='] = 2;</b>
				666	BinopPrecedence['<'] = 10;
				667	BinopPrecedence['+'] = 20;
				668	BinopPrecedence['-'] = 20;
				669	</pre>
				670	</div>
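<p>To see what giving '=' the lowest built-in precedence does to grouping, here
is a minimal standalone sketch of the same precedence-climbing loop that
<tt>ParseBinOpRHS</tt> uses. It is not part of the tutorial code:
<tt>MiniParser</tt> and <tt>parenthesize</tt> are hypothetical names, operands
are assumed to be single letters with no whitespace or parentheses, and the
result is a fully parenthesized string rather than an AST.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical illustration only: mirrors the control flow of ParseBinOpRHS,
// but builds a parenthesized string instead of BinaryExprAST nodes.
struct MiniParser {
  std::string Src;
  std::size_t Pos;
  std::map<char, int> Prec;

  // Precedence of the pending operator, or -1 if there is none.
  int tokPrec() const {
    if (Pos >= Src.size()) return -1;
    std::map<char, int>::const_iterator I = Prec.find(Src[Pos]);
    return I == Prec.end() ? -1 : I->second;
  }

  // A primary is a single character; '?' marks a missing operand.
  std::string parsePrimary() {
    if (Pos >= Src.size()) return "?";
    return std::string(1, Src[Pos++]);
  }

  std::string parseBinOpRHS(int ExprPrec, std::string LHS) {
    while (true) {
      int TokPrec = tokPrec();
      if (TokPrec < ExprPrec) return LHS;  // pending operator binds too loosely
      char Op = Src[Pos++];                // eat the operator
      std::string RHS = parsePrimary();
      // If the next operator binds tighter, let it take RHS as its LHS.
      if (TokPrec < tokPrec())
        RHS = parseBinOpRHS(TokPrec + 1, RHS);
      LHS = "(" + LHS + Op + RHS + ")";    // merge LHS/RHS
    }
  }
};

// Uses the same precedences as the table installed in main().
std::string parenthesize(const std::string &Src) {
  MiniParser P;
  P.Src = Src;
  P.Pos = 0;
  P.Prec['='] = 2;
  P.Prec['<'] = 10;
  P.Prec['+'] = 20;
  P.Prec['-'] = 20;
  return P.parseBinOpRHS(0, P.parsePrimary());
}
```

<p>Under these assumptions, <tt>parenthesize("x=a+b")</tt> groups as
<tt>(x=(a+b))</tt>, so the whole right-hand side is computed before the store.
The loop groups operators of equal precedence to the left, so <tt>x=y=z</tt>
comes out as <tt>((x=y)=z)</tt>.</p>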
				671
				672	<p>Now that the parser knows the precedence of the binary operator, it takes
				673	care of all the parsing and AST generation. We just need to implement codegen
				674	for the assignment operator. This looks like:</p>
				675
				676	<div class="doc_code">
				677	<pre>
				678	Value *BinaryExprAST::Codegen() {
				679	// Special case '=' because we don't want to emit the LHS as an expression.
				680	if (Op == '=') {
				681	// Assignment requires the LHS to be an identifier.
				682	VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
				683	if (!LHSE)
				684	return ErrorV("destination of '=' must be a variable");
				685	</pre>
				686	</div>
				687
				688	<p>Unlike the rest of the binary operators, our assignment operator doesn't
				689	follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
				690	as a special case before the other binary operators are handled. The other
Chris Lattner	1e46a6c	2007-11-07 06:34:39 +0000	[diff] [blame]	691	strange thing is that it requires the LHS to be a variable. It is invalid to
				692	have "(x+1) = expr" - only things like "x = expr" are allowed.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	693	</p>
				694
				695	<div class="doc_code">
				696	<pre>
				697	// Codegen the RHS.
				698	Value *Val = RHS->Codegen();
				699	if (Val == 0) return 0;
				700
				701	// Look up the name.
				702	Value *Variable = NamedValues[LHSE->getName()];
				703	if (Variable == 0) return ErrorV("Unknown variable name");
				704
				705	Builder.CreateStore(Val, Variable);
				706	return Val;
				707	}
				708	...
				709	</pre>
				710	</div>
				711
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	712	<p>Once we have the variable, codegen'ing the assignment is straightforward:
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	713	we emit the RHS of the assignment, create a store, and return the computed
				714	value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
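<p>The "emit RHS, store, return the value" behavior can be modeled without LLVM.
The following is only a sketch under stated assumptions: <tt>Expr</tt>,
<tt>Num</tt>, <tt>Var</tt> and <tt>Assign</tt> are hypothetical stand-ins for
the tutorial's AST classes, they evaluate directly to a <tt>double</tt> instead
of emitting IR, a <tt>std::map</tt> plays the role of the stack slots, and the
error case throws an exception where the tutorial returns <tt>ErrorV</tt>.</p>

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical mini AST that evaluates directly instead of emitting LLVM IR.
struct Expr {
  virtual ~Expr() {}
  virtual double eval(std::map<std::string, double> &Vars) const = 0;
};

struct Num : Expr {
  double Val;
  explicit Num(double V) : Val(V) {}
  double eval(std::map<std::string, double> &) const { return Val; }
};

struct Var : Expr {
  std::string Name;
  explicit Var(const std::string &N) : Name(N) {}
  double eval(std::map<std::string, double> &Vars) const { return Vars[Name]; }
};

struct Assign : Expr {
  const Expr *LHS, *RHS;
  Assign(const Expr *L, const Expr *R) : LHS(L), RHS(R) {}
  double eval(std::map<std::string, double> &Vars) const {
    // The destination is never evaluated as an expression; it must be a Var.
    const Var *Dest = dynamic_cast<const Var *>(LHS);
    if (!Dest)
      throw std::runtime_error("destination of '=' must be a variable");
    double V = RHS->eval(Vars);  // evaluate the RHS
    Vars[Dest->Name] = V;        // the "store"
    return V;                    // yield the stored value to the enclosing expr
  }
};
```

<p>Because <tt>Assign::eval</tt> returns the stored value, nesting one
<tt>Assign</tt> as the RHS of another models "X = (Y = Z)": both variables end
up holding the same value.</p>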
				715
				716	<p>Now that we have an assignment operator, we can mutate loop variables and
				717	arguments. For example, we can now run code like this:</p>
				718
				719	<div class="doc_code">
				720	<pre>
				721	# Function to print a double.
				722	extern printd(x);
				723
				724	# Define ':' for sequencing: as a low-precedence operator that ignores operands
				725	# and just returns the RHS.
				726	def binary : 1 (x y) y;
				727
				728	def test(x)
				729	printd(x) :
				730	x = 4 :
				731	printd(x);
				732
				733	test(123);
				734	</pre>
				735	</div>
				736
				737	<p>When run, this example prints "123" and then "4", showing that we did
				738	actually mutate the value! Okay, we have now officially implemented our goal:
				739	getting this to work requires SSA construction in the general case. However,
				740	to be really useful, we want the ability to define our own local variables, so
				741	let's add this next!
				742	</p>
				743
				744	</div>
				745
				746	<!-- *********************************************************************** -->
				747	<div class="doc_section"><a name="localvars">User-defined Local
				748	Variables</a></div>
				749	<!-- *********************************************************************** -->
				750
				751	<div class="doc_text">
				752
				753	<p>Adding var/in is just like any other extension we made to
				754	Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
				755	The first step for adding our new 'var/in' construct is to extend the lexer.
				756	As before, this is pretty trivial; the code looks like this:</p>
				757
				758	<div class="doc_code">
				759	<pre>
				760	enum Token {
				761	...
				762	<b>// var definition
				763	tok_var = -13</b>
				764	...
				765	}
				766	...
				767	static int gettok() {
				768	...
				769	if (IdentifierStr == "in") return tok_in;
				770	if (IdentifierStr == "binary") return tok_binary;
				771	if (IdentifierStr == "unary") return tok_unary;
				772	<b>if (IdentifierStr == "var") return tok_var;</b>
				773	return tok_identifier;
				774	...
				775	</pre>
				776	</div>
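<p>As a standalone illustration of the keyword check, the sketch below pulls the
comparison chain out into a function of its own. <tt>classifyIdentifier</tt> is
a hypothetical name (the tutorial does this inline in <tt>gettok</tt>), and only
the tokens shown in the excerpt above are included, with the values used in the
full listing.</p>

```cpp
#include <string>

// Subset of the tutorial's Token enum, with the values from the full listing.
enum Token {
  tok_identifier = -4,
  tok_in = -10,
  tok_binary = -11,
  tok_unary = -12,
  tok_var = -13
};

// Hypothetical helper: maps a complete identifier string to its token code.
// The whole identifier is compared, so a name that merely starts with "var"
// is still an ordinary identifier.
int classifyIdentifier(const std::string &Id) {
  if (Id == "in") return tok_in;
  if (Id == "binary") return tok_binary;
  if (Id == "unary") return tok_unary;
  if (Id == "var") return tok_var;
  return tok_identifier;
}
```

<p>With this, <tt>classifyIdentifier("var")</tt> gives <tt>tok_var</tt>, while
<tt>classifyIdentifier("variable")</tt> still gives
<tt>tok_identifier</tt>.</p>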
				777
				778	<p>The next step is to define the AST node that we will construct. For var/in,
Chris Lattner	1e46a6c	2007-11-07 06:34:39 +0000	[diff] [blame]	779	it looks like this:</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	780
				781	<div class="doc_code">
				782	<pre>
				783	/// VarExprAST - Expression class for var/in
				784	class VarExprAST : public ExprAST {
				785	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				786	ExprAST *Body;
				787	public:
				788	VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
				789	ExprAST *body)
				790	: VarNames(varnames), Body(body) {}
				791
				792	virtual Value *Codegen();
				793	};
				794	</pre>
				795	</div>
				796
				797	<p>var/in allows a list of names to be defined all at once, and each name can
				798	optionally have an initializer value. As such, we capture this information in
				799	the VarNames vector. Also, var/in has a body, which is allowed to access
Chris Lattner	1e46a6c	2007-11-07 06:34:39 +0000	[diff] [blame]	800	the variables defined by the var/in.</p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	801
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	802	<p>With this in place, we can define the parser pieces. The first thing we do is add
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	803	it as a primary expression:</p>
				804
				805	<div class="doc_code">
				806	<pre>
				807	/// primary
				808	/// ::= identifierexpr
				809	/// ::= numberexpr
				810	/// ::= parenexpr
				811	/// ::= ifexpr
				812	/// ::= forexpr
				813	<b>/// ::= varexpr</b>
				814	static ExprAST *ParsePrimary() {
				815	switch (CurTok) {
				816	default: return Error("unknown token when expecting an expression");
				817	case tok_identifier: return ParseIdentifierExpr();
				818	case tok_number: return ParseNumberExpr();
				819	case '(': return ParseParenExpr();
				820	case tok_if: return ParseIfExpr();
				821	case tok_for: return ParseForExpr();
				822	<b>case tok_var: return ParseVarExpr();</b>
				823	}
				824	}
				825	</pre>
				826	</div>
				827
				828	<p>Next we define ParseVarExpr:</p>
				829
				830	<div class="doc_code">
				831	<pre>
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	832	/// varexpr ::= 'var' identifier ('=' expression)?
				833	/// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	834	static ExprAST *ParseVarExpr() {
				835	getNextToken(); // eat the var.
				836
				837	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				838
				839	// At least one variable name is required.
				840	if (CurTok != tok_identifier)
				841	return Error("expected identifier after var");
				842	</pre>
				843	</div>
				844
				845	<p>The first part of this code parses the list of identifier/expr pairs into the
				846	local <tt>VarNames</tt> vector.</p>
				847
				848	<div class="doc_code">
				849	<pre>
				850	while (1) {
				851	std::string Name = IdentifierStr;
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	852	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	853
				854	// Read the optional initializer.
				855	ExprAST *Init = 0;
				856	if (CurTok == '=') {
				857	getNextToken(); // eat the '='.
				858
				859	Init = ParseExpression();
				860	if (Init == 0) return 0;
				861	}
				862
				863	VarNames.push_back(std::make_pair(Name, Init));
				864
				865	// End of var list, exit loop.
				866	if (CurTok != ',') break;
				867	getNextToken(); // eat the ','.
				868
				869	if (CurTok != tok_identifier)
				870	return Error("expected identifier list after var");
				871	}
				872	</pre>
				873	</div>
				874
				875	<p>Once all the variables are parsed, we then parse the body and create the
				876	AST node:</p>
				877
				878	<div class="doc_code">
				879	<pre>
				880	// At this point, we have to have 'in'.
				881	if (CurTok != tok_in)
				882	return Error("expected 'in' keyword after 'var'");
				883	getNextToken(); // eat 'in'.
				884
				885	ExprAST *Body = ParseExpression();
				886	if (Body == 0) return 0;
				887
				888	return new VarExprAST(VarNames, Body);
				889	}
				890	</pre>
				891	</div>
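<p>The shape of the var-list grammar is easier to test in isolation. The sketch
below is a simplified model, not the tutorial's parser: <tt>parseVarList</tt>
and <tt>isIdentifier</tt> are hypothetical names, the input is assumed to be
already split into string tokens starting just after <tt>var</tt>, and each
initializer is assumed to be a single token rather than a full expression. An
empty initializer string stands for "no initializer given".</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: an identifier starts with a letter and is not 'in'.
static bool isIdentifier(const std::string &T) {
  return !T.empty() && std::isalpha(static_cast<unsigned char>(T[0])) &&
         T != "in";
}

// Parses: name ('=' init)? (',' name ('=' init)?)* 'in'
// On success, appends (name, init) pairs to Out, leaves Pos just past 'in',
// and returns true. Returns false on a syntax error.
bool parseVarList(const std::vector<std::string> &Toks, std::size_t &Pos,
                  std::vector<std::pair<std::string, std::string> > &Out) {
  // At least one variable name is required.
  if (Pos >= Toks.size() || !isIdentifier(Toks[Pos])) return false;

  while (true) {
    std::string Name = Toks[Pos++];  // eat identifier
    std::string Init;                // stays empty if there is no initializer
    if (Pos < Toks.size() && Toks[Pos] == "=") {
      ++Pos;                         // eat '='
      if (Pos >= Toks.size()) return false;
      Init = Toks[Pos++];            // single-token stand-in for an expression
    }
    Out.push_back(std::make_pair(Name, Init));

    // End of var list, exit loop.
    if (Pos >= Toks.size() || Toks[Pos] != ",") break;
    ++Pos;                           // eat ','
    if (Pos >= Toks.size() || !isIdentifier(Toks[Pos])) return false;
  }

  // At this point, we have to have 'in'.
  if (Pos >= Toks.size() || Toks[Pos] != "in") return false;
  ++Pos;                             // eat 'in'
  return true;
}
```

<p>For the tokens <tt>a = 1 , b in body</tt>, this yields the pairs
<tt>("a", "1")</tt> and <tt>("b", "")</tt> and stops with <tt>Pos</tt> at
<tt>body</tt>, which is where the real parser would call
<tt>ParseExpression</tt> for the body.</p>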
				892
				893	<p>Now that we can parse and represent the code, we need to support emission of
				894	LLVM IR for it. This code starts out with:</p>
				895
				896	<div class="doc_code">
				897	<pre>
				898	Value *VarExprAST::Codegen() {
				899	std::vector<AllocaInst *> OldBindings;
				900
				901	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				902
				903	// Register all variables and emit their initializer.
				904	for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
				905	const std::string &VarName = VarNames[i].first;
				906	ExprAST *Init = VarNames[i].second;
				907	</pre>
				908	</div>
				909
				910	<p>Basically it loops over all the variables, installing them one at a time.
				911	For each variable we put into the symbol table, we remember the previous value
				912	that we replace in OldBindings.</p>
				913
				914	<div class="doc_code">
				915	<pre>
				916	// Emit the initializer before adding the variable to scope, this prevents
				917	// the initializer from referencing the variable itself, and permits stuff
				918	// like this:
				919	// var a = 1 in
				920	// var a = a in ... # refers to outer 'a'.
				921	Value *InitVal;
				922	if (Init) {
				923	InitVal = Init->Codegen();
				924	if (InitVal == 0) return 0;
				925	} else { // If not specified, use 0.0.
Gabor Greif	5934adf	2008-06-10 01:52:17 +0000	[diff] [blame]	926	InitVal = ConstantFP::get(APFloat(0.0));
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	927	}
				928
				929	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				930	Builder.CreateStore(InitVal, Alloca);
				931
				932	// Remember the old variable binding so that we can restore the binding when
				933	// we unrecurse.
				934	OldBindings.push_back(NamedValues[VarName]);
				935
				936	// Remember this binding.
				937	NamedValues[VarName] = Alloca;
				938	}
				939	</pre>
				940	</div>
				941
				942	<p>There are more comments here than code. The basic idea is that we emit the
				943	initializer, create the alloca, then update the symbol table to point to it.
				944	Once all the variables are installed in the symbol table, we evaluate the body
				945	of the var/in expression:</p>
				946
				947	<div class="doc_code">
				948	<pre>
				949	// Codegen the body, now that all vars are in scope.
				950	Value *BodyVal = Body->Codegen();
				951	if (BodyVal == 0) return 0;
				952	</pre>
				953	</div>
				954
				955	<p>Finally, before returning, we restore the previous variable bindings:</p>
				956
				957	<div class="doc_code">
				958	<pre>
				959	// Pop all our variables from scope.
				960	for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
				961	NamedValues[VarNames[i].first] = OldBindings[i];
				962
				963	// Return the body computation.
				964	return BodyVal;
				965	}
				966	</pre>
				967	</div>
				968
				969	<p>The end result of all of this is that we get properly scoped variable
				970	definitions, and we even (trivially) allow mutation of them :).</p>
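<p>The scoping discipline (evaluate the initializer, shadow the old binding,
run the body, restore) can be sketched without LLVM. This is a simplified
model under stated assumptions: <tt>Scope</tt> and <tt>evalVarIn</tt> are
hypothetical names, the table maps names to <tt>double</tt> values instead of
<tt>AllocaInst</tt> pointers, only one variable is introduced per call, and a
name with no previous binding is erased on exit where the tutorial restores a
null pointer.</p>

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for NamedValues: name -> current value.
typedef std::map<std::string, double> Scope;

// Models "var Name = Init in Body". Init and Body are callables that take
// the scope and return a double.
template <typename InitFn, typename BodyFn>
double evalVarIn(Scope &S, const std::string &Name, InitFn Init, BodyFn Body) {
  // Evaluate the initializer before installing the variable, so that it
  // still sees the outer binding of Name (if any).
  double InitVal = Init(S);

  // Remember the old binding so that it can be restored afterwards.
  bool HadOld = S.count(Name) != 0;
  double Old = HadOld ? S[Name] : 0.0;

  // Install the new binding and evaluate the body with it in scope.
  S[Name] = InitVal;
  double Result = Body(S);

  // Pop the variable from scope.
  if (HadOld)
    S[Name] = Old;
  else
    S.erase(Name);
  return Result;
}
```

<p>For example, with an outer <tt>a</tt> equal to 1, an initializer that
computes <tt>a + 10</tt> sees the outer value and produces 11. A body that
doubles <tt>a</tt> and returns it produces 22, and once <tt>evalVarIn</tt>
returns, the outer <tt>a</tt> is 1 again.</p>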
				971
				972	<p>With this, we have completed what we set out to do. Our nice iterative fib
				973	example from the intro compiles and runs just fine. The mem2reg pass optimizes
				974	all of our stack variables into SSA registers, inserting PHI nodes where needed,
Chris Lattner	b7e6b1a	2007-11-15 04:51:31 +0000	[diff] [blame]	975	and our front-end remains simple: no "iterated dominance frontier" computation
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	976	anywhere in sight.</p>
				977
				978	</div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	979
				980	<!-- *********************************************************************** -->
				981	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				982	<!-- *********************************************************************** -->
				983
				984	<div class="doc_text">
				985
				986	<p>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	987	Here is the complete code listing for our running example, enhanced with mutable
				988	variables and var/in support. To build this example, use:
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	989	</p>
				990
				991	<div class="doc_code">
				992	<pre>
				993	# Compile
				994	g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
				995	# Run
				996	./toy
				997	</pre>
				998	</div>
				999
				1000	<p>Here is the code:</p>
				1001
				1002	<div class="doc_code">
				1003	<pre>
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1004	#include "llvm/DerivedTypes.h"
				1005	#include "llvm/ExecutionEngine/ExecutionEngine.h"
				1006	#include "llvm/Module.h"
				1007	#include "llvm/ModuleProvider.h"
				1008	#include "llvm/PassManager.h"
				1009	#include "llvm/Analysis/Verifier.h"
				1010	#include "llvm/Target/TargetData.h"
				1011	#include "llvm/Transforms/Scalar.h"
Duncan Sands	89f6d88	2008-04-13 06:22:09 +0000	[diff] [blame]	1012	#include "llvm/Support/IRBuilder.h"
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1013	#include <cstdio>
				1014	#include <string>
				1015	#include <map>
				1016	#include <vector>
				1017	using namespace llvm;
				1018
				1019	//===----------------------------------------------------------------------===//
				1020	// Lexer
				1021	//===----------------------------------------------------------------------===//
				1022
				1023	// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
				1024	// of these for known things.
				1025	enum Token {
				1026	tok_eof = -1,
				1027
				1028	// commands
				1029	tok_def = -2, tok_extern = -3,
				1030
				1031	// primary
				1032	tok_identifier = -4, tok_number = -5,
				1033
				1034	// control
				1035	tok_if = -6, tok_then = -7, tok_else = -8,
				1036	tok_for = -9, tok_in = -10,
				1037
				1038	// operators
				1039	tok_binary = -11, tok_unary = -12,
				1040
				1041	// var definition
				1042	tok_var = -13
				1043	};
				1044
				1045	static std::string IdentifierStr; // Filled in if tok_identifier
				1046	static double NumVal; // Filled in if tok_number
				1047
				1048	/// gettok - Return the next token from standard input.
				1049	static int gettok() {
				1050	static int LastChar = ' ';
				1051
				1052	// Skip any whitespace.
				1053	while (isspace(LastChar))
				1054	LastChar = getchar();
				1055
				1056	if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
				1057	IdentifierStr = LastChar;
				1058	while (isalnum((LastChar = getchar())))
				1059	IdentifierStr += LastChar;
				1060
				1061	if (IdentifierStr == "def") return tok_def;
				1062	if (IdentifierStr == "extern") return tok_extern;
				1063	if (IdentifierStr == "if") return tok_if;
				1064	if (IdentifierStr == "then") return tok_then;
				1065	if (IdentifierStr == "else") return tok_else;
				1066	if (IdentifierStr == "for") return tok_for;
				1067	if (IdentifierStr == "in") return tok_in;
				1068	if (IdentifierStr == "binary") return tok_binary;
				1069	if (IdentifierStr == "unary") return tok_unary;
				1070	if (IdentifierStr == "var") return tok_var;
				1071	return tok_identifier;
				1072	}
				1073
				1074	if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
				1075	std::string NumStr;
				1076	do {
				1077	NumStr += LastChar;
				1078	LastChar = getchar();
				1079	} while (isdigit(LastChar) || LastChar == '.');
				1080
				1081	NumVal = strtod(NumStr.c_str(), 0);
				1082	return tok_number;
				1083	}
				1084
				1085	if (LastChar == '#') {
				1086	// Comment until end of line.
				1087	do LastChar = getchar();
Chris Lattner	c80c23f	2007-12-02 22:46:01 +0000	[diff] [blame]	1088	while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1089
				1090	if (LastChar != EOF)
				1091	return gettok();
				1092	}
				1093
				1094	// Check for end of file. Don't eat the EOF.
				1095	if (LastChar == EOF)
				1096	return tok_eof;
				1097
				1098	// Otherwise, just return the character as its ascii value.
				1099	int ThisChar = LastChar;
				1100	LastChar = getchar();
				1101	return ThisChar;
				1102	}
				1103
				1104	//===----------------------------------------------------------------------===//
				1105	// Abstract Syntax Tree (aka Parse Tree)
				1106	//===----------------------------------------------------------------------===//
				1107
				1108	/// ExprAST - Base class for all expression nodes.
				1109	class ExprAST {
				1110	public:
				1111	virtual ~ExprAST() {}
				1112	virtual Value *Codegen() = 0;
				1113	};
				1114
				1115	/// NumberExprAST - Expression class for numeric literals like "1.0".
				1116	class NumberExprAST : public ExprAST {
				1117	double Val;
				1118	public:
				1119	NumberExprAST(double val) : Val(val) {}
				1120	virtual Value *Codegen();
				1121	};
				1122
				1123	/// VariableExprAST - Expression class for referencing a variable, like "a".
				1124	class VariableExprAST : public ExprAST {
				1125	std::string Name;
				1126	public:
				1127	VariableExprAST(const std::string &name) : Name(name) {}
				1128	const std::string &getName() const { return Name; }
				1129	virtual Value *Codegen();
				1130	};
				1131
				1132	/// UnaryExprAST - Expression class for a unary operator.
				1133	class UnaryExprAST : public ExprAST {
				1134	char Opcode;
				1135	ExprAST *Operand;
				1136	public:
				1137	UnaryExprAST(char opcode, ExprAST *operand)
				1138	: Opcode(opcode), Operand(operand) {}
				1139	virtual Value *Codegen();
				1140	};
				1141
				1142	/// BinaryExprAST - Expression class for a binary operator.
				1143	class BinaryExprAST : public ExprAST {
				1144	char Op;
				1145	ExprAST *LHS, *RHS;
				1146	public:
				1147	BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
				1148	: Op(op), LHS(lhs), RHS(rhs) {}
				1149	virtual Value *Codegen();
				1150	};
				1151
				1152	/// CallExprAST - Expression class for function calls.
				1153	class CallExprAST : public ExprAST {
				1154	std::string Callee;
				1155	std::vector<ExprAST*> Args;
				1156	public:
				1157	CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
				1158	: Callee(callee), Args(args) {}
				1159	virtual Value *Codegen();
				1160	};
				1161
				1162	/// IfExprAST - Expression class for if/then/else.
				1163	class IfExprAST : public ExprAST {
				1164	ExprAST *Cond, *Then, *Else;
				1165	public:
				1166	IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
				1167	: Cond(cond), Then(then), Else(_else) {}
				1168	virtual Value *Codegen();
				1169	};
				1170
				1171	/// ForExprAST - Expression class for for/in.
				1172	class ForExprAST : public ExprAST {
				1173	std::string VarName;
				1174	ExprAST *Start, *End, *Step, *Body;
				1175	public:
				1176	ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
				1177	ExprAST *step, ExprAST *body)
				1178	: VarName(varname), Start(start), End(end), Step(step), Body(body) {}
				1179	virtual Value *Codegen();
				1180	};
				1181
				1182	/// VarExprAST - Expression class for var/in
				1183	class VarExprAST : public ExprAST {
				1184	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				1185	ExprAST *Body;
				1186	public:
				1187	VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
				1188	ExprAST *body)
				1189	: VarNames(varnames), Body(body) {}
				1190
				1191	virtual Value *Codegen();
				1192	};
				1193
				1194	/// PrototypeAST - This class represents the "prototype" for a function,
				1195	/// which captures its argument names as well as if it is an operator.
				1196	class PrototypeAST {
				1197	std::string Name;
				1198	std::vector<std::string> Args;
				1199	bool isOperator;
				1200	unsigned Precedence; // Precedence if a binary op.
				1201	public:
				1202	PrototypeAST(const std::string &name, const std::vector<std::string> &args,
				1203	bool isoperator = false, unsigned prec = 0)
				1204	: Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
				1205
				1206	bool isUnaryOp() const { return isOperator && Args.size() == 1; }
				1207	bool isBinaryOp() const { return isOperator && Args.size() == 2; }
				1208
				1209	char getOperatorName() const {
				1210	assert(isUnaryOp() || isBinaryOp());
				1211	return Name[Name.size()-1];
				1212	}
				1213
				1214	unsigned getBinaryPrecedence() const { return Precedence; }
				1215
				1216	Function *Codegen();
				1217
				1218	void CreateArgumentAllocas(Function *F);
				1219	};
				1220
				1221	/// FunctionAST - This class represents a function definition itself.
				1222	class FunctionAST {
				1223	PrototypeAST *Proto;
				1224	ExprAST *Body;
				1225	public:
				1226	FunctionAST(PrototypeAST *proto, ExprAST *body)
				1227	: Proto(proto), Body(body) {}
				1228
				1229	Function *Codegen();
				1230	};
				1231
				1232	//===----------------------------------------------------------------------===//
				1233	// Parser
				1234	//===----------------------------------------------------------------------===//
				1235
				1236	/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
				1237	/// token the parser is looking at. getNextToken reads another token from the
				1238	/// lexer and updates CurTok with its results.
				1239	static int CurTok;
				1240	static int getNextToken() {
				1241	return CurTok = gettok();
				1242	}
				1243
				1244	/// BinopPrecedence - This holds the precedence for each binary operator that is
				1245	/// defined.
				1246	static std::map<char, int> BinopPrecedence;
				1247
				1248	/// GetTokPrecedence - Get the precedence of the pending binary operator token.
				1249	static int GetTokPrecedence() {
				1250	if (!isascii(CurTok))
				1251	return -1;
				1252
				1253	// Make sure it's a declared binop.
				1254	int TokPrec = BinopPrecedence[CurTok];
				1255	if (TokPrec <= 0) return -1;
				1256	return TokPrec;
				1257	}
				1258
				1259	/// Error* - These are little helper functions for error handling.
				1260	ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
				1261	PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
				1262	FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
				1263
				1264	static ExprAST *ParseExpression();
				1265
				1266	/// identifierexpr
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1267	/// ::= identifier
				1268	/// ::= identifier '(' expression* ')'
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1269	static ExprAST *ParseIdentifierExpr() {
				1270	std::string IdName = IdentifierStr;
				1271
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1272	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1273
				1274	if (CurTok != '(') // Simple variable ref.
				1275	return new VariableExprAST(IdName);
				1276
				1277	// Call.
				1278	getNextToken(); // eat (
				1279	std::vector<ExprAST*> Args;
				1280	if (CurTok != ')') {
				1281	while (1) {
				1282	ExprAST *Arg = ParseExpression();
				1283	if (!Arg) return 0;
				1284	Args.push_back(Arg);
				1285
				1286	if (CurTok == ')') break;
				1287
				1288	if (CurTok != ',')
Chris Lattner	6c4be9c	2008-04-14 16:44:41 +0000	[diff] [blame]	1289	return Error("Expected ')' or ',' in argument list");
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1290	getNextToken();
				1291	}
				1292	}
				1293
				1294	// Eat the ')'.
				1295	getNextToken();
				1296
				1297	return new CallExprAST(IdName, Args);
				1298	}
				1299
				1300	/// numberexpr ::= number
				1301	static ExprAST *ParseNumberExpr() {
				1302	ExprAST *Result = new NumberExprAST(NumVal);
				1303	getNextToken(); // consume the number
				1304	return Result;
				1305	}
				1306
				1307	/// parenexpr ::= '(' expression ')'
				1308	static ExprAST *ParseParenExpr() {
				1309	getNextToken(); // eat (.
				1310	ExprAST *V = ParseExpression();
				1311	if (!V) return 0;
				1312
				1313	if (CurTok != ')')
				1314	return Error("expected ')'");
				1315	getNextToken(); // eat ).
				1316	return V;
				1317	}
				1318
				1319	/// ifexpr ::= 'if' expression 'then' expression 'else' expression
				1320	static ExprAST *ParseIfExpr() {
				1321	getNextToken(); // eat the if.
				1322
				1323	// condition.
				1324	ExprAST *Cond = ParseExpression();
				1325	if (!Cond) return 0;
				1326
				1327	if (CurTok != tok_then)
				1328	return Error("expected then");
				1329	getNextToken(); // eat the then
				1330
				1331	ExprAST *Then = ParseExpression();
				1332	if (Then == 0) return 0;
				1333
				1334	if (CurTok != tok_else)
				1335	return Error("expected else");
				1336
				1337	getNextToken();
				1338
				1339	ExprAST *Else = ParseExpression();
				1340	if (!Else) return 0;
				1341
				1342	return new IfExprAST(Cond, Then, Else);
				1343	}
				1344
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1345	/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1346	static ExprAST *ParseForExpr() {
				1347	getNextToken(); // eat the for.
				1348
				1349	if (CurTok != tok_identifier)
				1350	return Error("expected identifier after for");
				1351
				1352	std::string IdName = IdentifierStr;
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1353	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1354
				1355	if (CurTok != '=')
				1356	return Error("expected '=' after for");
				1357	getNextToken(); // eat '='.
				1358
				1359
				1360	ExprAST *Start = ParseExpression();
				1361	if (Start == 0) return 0;
				1362	if (CurTok != ',')
				1363	return Error("expected ',' after for start value");
				1364	getNextToken();
				1365
				1366	ExprAST *End = ParseExpression();
				1367	if (End == 0) return 0;
				1368
				1369	// The step value is optional.
				1370	ExprAST *Step = 0;
				1371	if (CurTok == ',') {
				1372	getNextToken();
				1373	Step = ParseExpression();
				1374	if (Step == 0) return 0;
				1375	}
				1376
				1377	if (CurTok != tok_in)
				1378	return Error("expected 'in' after for");
				1379	getNextToken(); // eat 'in'.
				1380
				1381	ExprAST *Body = ParseExpression();
				1382	if (Body == 0) return 0;
				1383
				1384	return new ForExprAST(IdName, Start, End, Step, Body);
				1385	}
				1386
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1387	/// varexpr ::= 'var' identifier ('=' expression)?
				1388	/// (',' identifier ('=' expression)?)* 'in' expression
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1389	static ExprAST *ParseVarExpr() {
				1390	getNextToken(); // eat the var.
				1391
				1392	std::vector<std::pair<std::string, ExprAST*> > VarNames;
				1393
				1394	// At least one variable name is required.
				1395	if (CurTok != tok_identifier)
				1396	return Error("expected identifier after var");
				1397
				1398	while (1) {
				1399	std::string Name = IdentifierStr;
Chris Lattner	20a0c80	2007-11-05 17:54:34 +0000	[diff] [blame]	1400	getNextToken(); // eat identifier.
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1401
				1402	// Read the optional initializer.
				1403	ExprAST *Init = 0;
				1404	if (CurTok == '=') {
				1405	getNextToken(); // eat the '='.
				1406
				1407	Init = ParseExpression();
				1408	if (Init == 0) return 0;
				1409	}
				1410
				1411	VarNames.push_back(std::make_pair(Name, Init));
				1412
				1413	// End of var list, exit loop.
				1414	if (CurTok != ',') break;
				1415	getNextToken(); // eat the ','.
				1416
				1417	if (CurTok != tok_identifier)
				1418	return Error("expected identifier list after var");
				1419	}
				1420
				1421	// At this point, we have to have 'in'.
				1422	if (CurTok != tok_in)
				1423	return Error("expected 'in' keyword after 'var'");
				1424	getNextToken(); // eat 'in'.
				1425
				1426	ExprAST *Body = ParseExpression();
				1427	if (Body == 0) return 0;
				1428
				1429	return new VarExprAST(VarNames, Body);
				1430	}
				1431
				1432
				1433	/// primary
				1434	/// ::= identifierexpr
				1435	/// ::= numberexpr
				1436	/// ::= parenexpr
				1437	/// ::= ifexpr
				1438	/// ::= forexpr
				1439	/// ::= varexpr
				1440	static ExprAST *ParsePrimary() {
				1441	switch (CurTok) {
				1442	default: return Error("unknown token when expecting an expression");
				1443	case tok_identifier: return ParseIdentifierExpr();
				1444	case tok_number: return ParseNumberExpr();
				1445	case '(': return ParseParenExpr();
				1446	case tok_if: return ParseIfExpr();
				1447	case tok_for: return ParseForExpr();
				1448	case tok_var: return ParseVarExpr();
				1449	}
				1450	}
				1451
				1452	/// unary
				1453	/// ::= primary
				1454	/// ::= '!' unary
				1455	static ExprAST *ParseUnary() {
				1456	// If the current token is not an operator, it must be a primary expr.
				1457	if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
				1458	return ParsePrimary();
				1459
				1460	// If this is a unary operator, read it.
				1461	int Opc = CurTok;
				1462	getNextToken();
				1463	if (ExprAST *Operand = ParseUnary())
				1464	return new UnaryExprAST(Opc, Operand);
				1465	return 0;
				1466	}
				1467
				1468	/// binoprhs
				1469	/// ::= ('+' unary)*
				1470	static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
				1471	// If this is a binop, find its precedence.
				1472	while (1) {
				1473	int TokPrec = GetTokPrecedence();
				1474
				1475	// If this is a binop that binds at least as tightly as the current binop,
				1476	// consume it, otherwise we are done.
				1477	if (TokPrec < ExprPrec)
				1478	return LHS;
				1479
				1480	// Okay, we know this is a binop.
				1481	int BinOp = CurTok;
				1482	getNextToken(); // eat binop
				1483
				1484	// Parse the unary expression after the binary operator.
				1485	ExprAST *RHS = ParseUnary();
				1486	if (!RHS) return 0;
				1487
				1488	// If BinOp binds less tightly with RHS than the operator after RHS, let
				1489	// the pending operator take RHS as its LHS.
				1490	int NextPrec = GetTokPrecedence();
				1491	if (TokPrec < NextPrec) {
				1492	RHS = ParseBinOpRHS(TokPrec+1, RHS);
				1493	if (RHS == 0) return 0;
				1494	}
				1495
				1496	// Merge LHS/RHS.
				1497	LHS = new BinaryExprAST(BinOp, LHS, RHS);
				1498	}
				1499	}
				1500
				1501	/// expression
				1502	/// ::= unary binoprhs
				1503	///
				1504	static ExprAST *ParseExpression() {
				1505	ExprAST *LHS = ParseUnary();
				1506	if (!LHS) return 0;
				1507
				1508	return ParseBinOpRHS(0, LHS);
				1509	}
				1510
				1511	/// prototype
				1512	/// ::= id '(' id* ')'
				1513	/// ::= binary LETTER number? (id, id)
				1514	/// ::= unary LETTER (id)
				1515	static PrototypeAST *ParsePrototype() {
				1516	std::string FnName;
				1517
				1518	int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
				1519	unsigned BinaryPrecedence = 30;
				1520
				1521	switch (CurTok) {
				1522	default:
				1523	return ErrorP("Expected function name in prototype");
				1524	case tok_identifier:
				1525	FnName = IdentifierStr;
				1526	Kind = 0;
				1527	getNextToken();
				1528	break;
				1529	case tok_unary:
				1530	getNextToken();
				1531	if (!isascii(CurTok))
				1532	return ErrorP("Expected unary operator");
				1533	FnName = "unary";
				1534	FnName += (char)CurTok;
				1535	Kind = 1;
				1536	getNextToken();
				1537	break;
				1538	case tok_binary:
				1539	getNextToken();
				1540	if (!isascii(CurTok))
				1541	return ErrorP("Expected binary operator");
				1542	FnName = "binary";
				1543	FnName += (char)CurTok;
				1544	Kind = 2;
				1545	getNextToken();
				1546
				1547	// Read the precedence if present.
				1548	if (CurTok == tok_number) {
				1549	if (NumVal < 1 || NumVal > 100)
				1550	return ErrorP("Invalid precedence: must be 1..100");
				1551	BinaryPrecedence = (unsigned)NumVal;
				1552	getNextToken();
				1553	}
				1554	break;
				1555	}
				1556
				1557	if (CurTok != '(')
				1558	return ErrorP("Expected '(' in prototype");
				1559
				1560	std::vector<std::string> ArgNames;
				1561	while (getNextToken() == tok_identifier)
				1562	ArgNames.push_back(IdentifierStr);
				1563	if (CurTok != ')')
				1564	return ErrorP("Expected ')' in prototype");
				1565
				1566	// success.
				1567	getNextToken(); // eat ')'.
				1568
				1569	// Verify right number of names for operator.
				1570	if (Kind && ArgNames.size() != Kind)
				1571	return ErrorP("Invalid number of operands for operator");
				1572
				1573	return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
				1574	}
				1575
				1576	/// definition ::= 'def' prototype expression
				1577	static FunctionAST *ParseDefinition() {
				1578	getNextToken(); // eat def.
				1579	PrototypeAST *Proto = ParsePrototype();
				1580	if (Proto == 0) return 0;
				1581
				1582	if (ExprAST *E = ParseExpression())
				1583	return new FunctionAST(Proto, E);
				1584	return 0;
				1585	}
				1586
				1587	/// toplevelexpr ::= expression
				1588	static FunctionAST *ParseTopLevelExpr() {
				1589	if (ExprAST *E = ParseExpression()) {
				1590	// Make an anonymous proto.
				1591	PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
				1592	return new FunctionAST(Proto, E);
				1593	}
				1594	return 0;
				1595	}
				1596
				1597	/// external ::= 'extern' prototype
				1598	static PrototypeAST *ParseExtern() {
				1599	getNextToken(); // eat extern.
				1600	return ParsePrototype();
				1601	}
				1602
				1603	//===----------------------------------------------------------------------===//
				1604	// Code Generation
				1605	//===----------------------------------------------------------------------===//
				1606
				1607	static Module *TheModule;
Gabor Greif	d6c1ed0	2009-03-11 19:51:07 +0000	[diff] [blame^]	1608	static IRBuilder<> Builder;
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1609	static std::map<std::string, AllocaInst*> NamedValues;
				1610	static FunctionPassManager *TheFPM;
				1611
				1612	Value *ErrorV(const char *Str) { Error(Str); return 0; }
				1613
				1614	/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
				1615	/// the function. This is used for mutable variables etc.
				1616	static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
				1617	const std::string &VarName) {
Gabor Greif	d6c1ed0	2009-03-11 19:51:07 +0000	[diff] [blame^]	1618	IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
Duncan Sands	89f6d88	2008-04-13 06:22:09 +0000	[diff] [blame]	1619	TheFunction->getEntryBlock().begin());
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1620	return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
				1621	}
				1622
				1623
				1624	Value *NumberExprAST::Codegen() {
Gabor Greif	5934adf	2008-06-10 01:52:17 +0000	[diff] [blame]	1625	return ConstantFP::get(APFloat(Val));
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1626	}
				1627
				1628	Value *VariableExprAST::Codegen() {
				1629	// Look this variable up in the function.
				1630	Value *V = NamedValues[Name];
				1631	if (V == 0) return ErrorV("Unknown variable name");
				1632
				1633	// Load the value.
				1634	return Builder.CreateLoad(V, Name.c_str());
				1635	}
				1636
				1637	Value *UnaryExprAST::Codegen() {
				1638	Value *OperandV = Operand->Codegen();
				1639	if (OperandV == 0) return 0;
				1640
				1641	Function *F = TheModule->getFunction(std::string("unary")+Opcode);
				1642	if (F == 0)
				1643	return ErrorV("Unknown unary operator");
				1644
				1645	return Builder.CreateCall(F, OperandV, "unop");
				1646	}
				1647
				1648
				1649	Value *BinaryExprAST::Codegen() {
				1650	// Special case '=' because we don't want to emit the LHS as an expression.
				1651	if (Op == '=') {
				1652	// Assignment requires the LHS to be an identifier.
				1653	VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
				1654	if (!LHSE)
				1655	return ErrorV("destination of '=' must be a variable");
				1656	// Codegen the RHS.
				1657	Value *Val = RHS->Codegen();
				1658	if (Val == 0) return 0;
				1659
				1660	// Look up the name.
				1661	Value *Variable = NamedValues[LHSE->getName()];
				1662	if (Variable == 0) return ErrorV("Unknown variable name");
				1663
				1664	Builder.CreateStore(Val, Variable);
				1665	return Val;
				1666	}
				1667
				1668
				1669	Value *L = LHS->Codegen();
				1670	Value *R = RHS->Codegen();
				1671	if (L == 0 || R == 0) return 0;
				1672
				1673	switch (Op) {
				1674	case '+': return Builder.CreateAdd(L, R, "addtmp");
				1675	case '-': return Builder.CreateSub(L, R, "subtmp");
				1676	case '*': return Builder.CreateMul(L, R, "multmp");
				1677	case '<':
Chris Lattner	7115521	2007-11-06 01:39:12 +0000	[diff] [blame]	1678	L = Builder.CreateFCmpULT(L, R, "cmptmp");
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1679	// Convert bool 0/1 to double 0.0 or 1.0
				1680	return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
				1681	default: break;
				1682	}
				1683
				1684	// If it wasn't a builtin binary operator, it must be a user defined one. Emit
				1685	// a call to it.
				1686	Function *F = TheModule->getFunction(std::string("binary")+Op);
				1687	assert(F && "binary operator not found!");
				1688
				1689	Value *Ops[] = { L, R };
				1690	return Builder.CreateCall(F, Ops, Ops+2, "binop");
				1691	}
				1692
				1693	Value *CallExprAST::Codegen() {
				1694	// Look up the name in the global module table.
				1695	Function *CalleeF = TheModule->getFunction(Callee);
				1696	if (CalleeF == 0)
				1697	return ErrorV("Unknown function referenced");
				1698
				1699	// Report an error if the argument count does not match.
				1700	if (CalleeF->arg_size() != Args.size())
				1701	return ErrorV("Incorrect # arguments passed");
				1702
				1703	std::vector<Value*> ArgsV;
				1704	for (unsigned i = 0, e = Args.size(); i != e; ++i) {
				1705	ArgsV.push_back(Args[i]->Codegen());
				1706	if (ArgsV.back() == 0) return 0;
				1707	}
				1708
				1709	return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
				1710	}
				1711
				1712	Value *IfExprAST::Codegen() {
				1713	Value *CondV = Cond->Codegen();
				1714	if (CondV == 0) return 0;
				1715
				1716	// Convert condition to a bool by comparing not-equal to 0.0.
				1717	CondV = Builder.CreateFCmpONE(CondV,
Gabor Greif	5934adf	2008-06-10 01:52:17 +0000	[diff] [blame]	1718	ConstantFP::get(APFloat(0.0)),
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1719	"ifcond");
				1720
				1721	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1722
				1723	// Create blocks for the then and else cases. Insert the 'then' block at the
				1724	// end of the function.
Gabor Greif	df7d2b4	2008-04-19 22:25:09 +0000	[diff] [blame]	1725	BasicBlock *ThenBB = BasicBlock::Create("then", TheFunction);
				1726	BasicBlock *ElseBB = BasicBlock::Create("else");
				1727	BasicBlock *MergeBB = BasicBlock::Create("ifcont");
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1728
				1729	Builder.CreateCondBr(CondV, ThenBB, ElseBB);
				1730
				1731	// Emit then value.
				1732	Builder.SetInsertPoint(ThenBB);
				1733
				1734	Value *ThenV = Then->Codegen();
				1735	if (ThenV == 0) return 0;
				1736
				1737	Builder.CreateBr(MergeBB);
				1738	// Codegen of 'Then' can change the current block, update ThenBB for the PHI.
				1739	ThenBB = Builder.GetInsertBlock();
				1740
				1741	// Emit else block.
				1742	TheFunction->getBasicBlockList().push_back(ElseBB);
				1743	Builder.SetInsertPoint(ElseBB);
				1744
				1745	Value *ElseV = Else->Codegen();
				1746	if (ElseV == 0) return 0;
				1747
				1748	Builder.CreateBr(MergeBB);
				1749	// Codegen of 'Else' can change the current block, update ElseBB for the PHI.
				1750	ElseBB = Builder.GetInsertBlock();
				1751
				1752	// Emit merge block.
				1753	TheFunction->getBasicBlockList().push_back(MergeBB);
				1754	Builder.SetInsertPoint(MergeBB);
				1755	PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
				1756
				1757	PN->addIncoming(ThenV, ThenBB);
				1758	PN->addIncoming(ElseV, ElseBB);
				1759	return PN;
				1760	}
				1761
				1762	Value *ForExprAST::Codegen() {
				1763	// Output this as:
				1764	// var = alloca double
				1765	// ...
				1766	// start = startexpr
				1767	// store start -> var
				1768	// goto loop
				1769	// loop:
				1770	// ...
				1771	// bodyexpr
				1772	// ...
				1773	// loopend:
				1774	// step = stepexpr
				1775	// endcond = endexpr
				1776	//
				1777	// curvar = load var
				1778	// nextvar = curvar + step
				1779	// store nextvar -> var
				1780	// br endcond, loop, afterloop
				1781	// afterloop:
				1782
				1783	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1784
				1785	// Create an alloca for the variable in the entry block.
				1786	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				1787
				1788	// Emit the start code first, without 'variable' in scope.
				1789	Value *StartVal = Start->Codegen();
				1790	if (StartVal == 0) return 0;
				1791
				1792	// Store the value into the alloca.
				1793	Builder.CreateStore(StartVal, Alloca);
				1794
				1795	// Make the new basic block for the loop header, inserting after current
				1796	// block.
				1797	BasicBlock *PreheaderBB = Builder.GetInsertBlock();
Gabor Greif	df7d2b4	2008-04-19 22:25:09 +0000	[diff] [blame]	1798	BasicBlock *LoopBB = BasicBlock::Create("loop", TheFunction);
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1799
				1800	// Insert an explicit fall through from the current block to the LoopBB.
				1801	Builder.CreateBr(LoopBB);
				1802
				1803	// Start insertion in LoopBB.
				1804	Builder.SetInsertPoint(LoopBB);
				1805
				1806	// Within the loop, the variable refers to the alloca. If it shadows an
				1807	// existing variable, we have to restore it, so save it now.
				1808	AllocaInst *OldVal = NamedValues[VarName];
				1809	NamedValues[VarName] = Alloca;
				1810
				1811	// Emit the body of the loop. This, like any other expr, can change the
				1812	// current BB. Note that we ignore the value computed by the body, but don't
				1813	// allow an error.
				1814	if (Body->Codegen() == 0)
				1815	return 0;
				1816
				1817	// Emit the step value.
				1818	Value *StepVal;
				1819	if (Step) {
				1820	StepVal = Step->Codegen();
				1821	if (StepVal == 0) return 0;
				1822	} else {
				1823	// If not specified, use 1.0.
Gabor Greif	5934adf	2008-06-10 01:52:17 +0000	[diff] [blame]	1824	StepVal = ConstantFP::get(APFloat(1.0));
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1825	}
				1826
				1827	// Compute the end condition.
				1828	Value *EndCond = End->Codegen();
				1829	if (EndCond == 0) return EndCond;
				1830
				1831	// Reload, increment, and restore the alloca. This handles the case where
				1832	// the body of the loop mutates the variable.
				1833	Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
				1834	Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
				1835	Builder.CreateStore(NextVar, Alloca);
				1836
				1837	// Convert condition to a bool by comparing not-equal to 0.0.
				1838	EndCond = Builder.CreateFCmpONE(EndCond,
Gabor Greif	5934adf	2008-06-10 01:52:17 +0000	[diff] [blame]	1839	ConstantFP::get(APFloat(0.0)),
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1840	"loopcond");
				1841
				1842	// Create the "after loop" block and insert it.
				1843	BasicBlock *LoopEndBB = Builder.GetInsertBlock();
Gabor Greif	df7d2b4	2008-04-19 22:25:09 +0000	[diff] [blame]	1844	BasicBlock *AfterBB = BasicBlock::Create("afterloop", TheFunction);
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1845
				1846	// Insert the conditional branch into the end of LoopEndBB.
				1847	Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
				1848
				1849	// Any new code will be inserted in AfterBB.
				1850	Builder.SetInsertPoint(AfterBB);
				1851
				1852	// Restore the unshadowed variable.
				1853	if (OldVal)
				1854	NamedValues[VarName] = OldVal;
				1855	else
				1856	NamedValues.erase(VarName);
				1857
				1858
				1859	// for expr always returns 0.0.
				1860	return Constant::getNullValue(Type::DoubleTy);
				1861	}
				1862
				1863	Value *VarExprAST::Codegen() {
				1864	std::vector<AllocaInst *> OldBindings;
				1865
				1866	Function *TheFunction = Builder.GetInsertBlock()->getParent();
				1867
				1868	// Register all variables and emit their initializer.
				1869	for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
				1870	const std::string &VarName = VarNames[i].first;
				1871	ExprAST *Init = VarNames[i].second;
				1872
				1873	// Emit the initializer before adding the variable to scope, this prevents
				1874	// the initializer from referencing the variable itself, and permits stuff
				1875	// like this:
				1876	// var a = 1 in
				1877	// var a = a in ... # refers to outer 'a'.
				1878	Value *InitVal;
				1879	if (Init) {
				1880	InitVal = Init->Codegen();
				1881	if (InitVal == 0) return 0;
				1882	} else { // If not specified, use 0.0.
Gabor Greif	5934adf	2008-06-10 01:52:17 +0000	[diff] [blame]	1883	InitVal = ConstantFP::get(APFloat(0.0));
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1884	}
				1885
				1886	AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
				1887	Builder.CreateStore(InitVal, Alloca);
				1888
				1889	// Remember the old variable binding so that we can restore the binding when
				1890	// we unrecurse.
				1891	OldBindings.push_back(NamedValues[VarName]);
				1892
				1893	// Remember this binding.
				1894	NamedValues[VarName] = Alloca;
				1895	}
				1896
				1897	// Codegen the body, now that all vars are in scope.
				1898	Value *BodyVal = Body->Codegen();
				1899	if (BodyVal == 0) return 0;
				1900
				1901	// Pop all our variables from scope.
				1902	for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
				1903	NamedValues[VarNames[i].first] = OldBindings[i];
				1904
				1905	// Return the body computation.
				1906	return BodyVal;
				1907	}
				1908
				1909
				1910	Function *PrototypeAST::Codegen() {
				1911	// Make the function type: double(double,double) etc.
				1912	std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
				1913	FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
				1914
Gabor Greif	df7d2b4	2008-04-19 22:25:09 +0000	[diff] [blame]	1915	Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1916
				1917	// If F conflicted, there was already something named 'Name'. If it has a
				1918	// body, don't allow redefinition or reextern.
				1919	if (F->getName() != Name) {
				1920	// Delete the one we just made and get the existing one.
				1921	F->eraseFromParent();
				1922	F = TheModule->getFunction(Name);
				1923
				1924	// If F already has a body, reject this.
				1925	if (!F->empty()) {
				1926	ErrorF("redefinition of function");
				1927	return 0;
				1928	}
				1929
				1930	// If F took a different number of args, reject.
				1931	if (F->arg_size() != Args.size()) {
				1932	ErrorF("redefinition of function with different # args");
				1933	return 0;
				1934	}
				1935	}
				1936
				1937	// Set names for all arguments.
				1938	unsigned Idx = 0;
				1939	for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
				1940	++AI, ++Idx)
				1941	AI->setName(Args[Idx]);
				1942
				1943	return F;
				1944	}
				1945
				1946	/// CreateArgumentAllocas - Create an alloca for each argument and register the
				1947	/// argument in the symbol table so that references to it will succeed.
				1948	void PrototypeAST::CreateArgumentAllocas(Function *F) {
				1949	Function::arg_iterator AI = F->arg_begin();
				1950	for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
				1951	// Create an alloca for this variable.
				1952	AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
				1953
				1954	// Store the initial value into the alloca.
				1955	Builder.CreateStore(AI, Alloca);
				1956
				1957	// Add arguments to variable symbol table.
				1958	NamedValues[Args[Idx]] = Alloca;
				1959	}
				1960	}
				1961
				1962
				1963	Function *FunctionAST::Codegen() {
				1964	NamedValues.clear();
				1965
				1966	Function *TheFunction = Proto->Codegen();
				1967	if (TheFunction == 0)
				1968	return 0;
				1969
				1970	// If this is an operator, install it.
				1971	if (Proto->isBinaryOp())
				1972	BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
				1973
				1974	// Create a new basic block to start insertion into.
Gabor Greif	df7d2b4	2008-04-19 22:25:09 +0000	[diff] [blame]	1975	BasicBlock *BB = BasicBlock::Create("entry", TheFunction);
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	1976	Builder.SetInsertPoint(BB);
				1977
				1978	// Add all arguments to the symbol table and create their allocas.
				1979	Proto->CreateArgumentAllocas(TheFunction);
				1980
				1981	if (Value *RetVal = Body->Codegen()) {
				1982	// Finish off the function.
				1983	Builder.CreateRet(RetVal);
				1984
				1985	// Validate the generated code, checking for consistency.
				1986	verifyFunction(*TheFunction);
				1987
				1988	// Optimize the function.
				1989	TheFPM->run(*TheFunction);
				1990
				1991	return TheFunction;
				1992	}
				1993
				1994	// Error reading body, remove function.
				1995	TheFunction->eraseFromParent();
				1996
				1997	if (Proto->isBinaryOp())
				1998	BinopPrecedence.erase(Proto->getOperatorName());
				1999	return 0;
				2000	}
				2001
				2002	//===----------------------------------------------------------------------===//
				2003	// Top-Level parsing and JIT Driver
				2004	//===----------------------------------------------------------------------===//
				2005
				2006	static ExecutionEngine *TheExecutionEngine;
				2007
				2008	static void HandleDefinition() {
				2009	if (FunctionAST *F = ParseDefinition()) {
				2010	if (Function *LF = F->Codegen()) {
				2011	fprintf(stderr, "Read function definition:");
				2012	LF->dump();
				2013	}
				2014	} else {
				2015	// Skip token for error recovery.
				2016	getNextToken();
				2017	}
				2018	}
				2019
				2020	static void HandleExtern() {
				2021	if (PrototypeAST *P = ParseExtern()) {
				2022	if (Function *F = P->Codegen()) {
				2023	fprintf(stderr, "Read extern: ");
				2024	F->dump();
				2025	}
				2026	} else {
				2027	// Skip token for error recovery.
				2028	getNextToken();
				2029	}
				2030	}
				2031
				2032	static void HandleTopLevelExpression() {
				2033	// Evaluate a top level expression into an anonymous function.
				2034	if (FunctionAST *F = ParseTopLevelExpr()) {
				2035	if (Function *LF = F->Codegen()) {
				2036	// JIT the function, returning a function pointer.
				2037	void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
				2038
				2039	// Cast it to the right type (takes no arguments, returns a double) so we
				2040	// can call it as a native function.
				2041	double (*FP)() = (double (*)())FPtr;
				2042	fprintf(stderr, "Evaluated to %f\n", FP());
				2043	}
				2044	} else {
				2045	// Skip token for error recovery.
				2046	getNextToken();
				2047	}
				2048	}
				2049
				2050	/// top ::= definition | external | expression | ';'
				2051	static void MainLoop() {
				2052	while (1) {
				2053	fprintf(stderr, "ready> ");
				2054	switch (CurTok) {
				2055	case tok_eof: return;
				2056	case ';': getNextToken(); break; // ignore top level semicolons.
				2057	case tok_def: HandleDefinition(); break;
				2058	case tok_extern: HandleExtern(); break;
				2059	default: HandleTopLevelExpression(); break;
				2060	}
				2061	}
				2062	}
				2063
				2064
				2065
				2066	//===----------------------------------------------------------------------===//
				2067	// "Library" functions that can be "extern'd" from user code.
				2068	//===----------------------------------------------------------------------===//
				2069
				2070	/// putchard - putchar that takes a double and returns 0.
				2071	extern "C"
				2072	double putchard(double X) {
				2073	putchar((char)X);
				2074	return 0;
				2075	}
				2076
				2077	/// printd - printf that takes a double and prints it as "%f\n", returning 0.
				2078	extern "C"
				2079	double printd(double X) {
				2080	printf("%f\n", X);
				2081	return 0;
				2082	}
				2083
				2084	//===----------------------------------------------------------------------===//
				2085	// Main driver code.
				2086	//===----------------------------------------------------------------------===//
				2087
				2088	int main() {
				2089	// Install standard binary operators.
				2090	// 1 is lowest precedence.
				2091	BinopPrecedence['='] = 2;
				2092	BinopPrecedence['<'] = 10;
				2093	BinopPrecedence['+'] = 20;
				2094	BinopPrecedence['-'] = 20;
				2095	BinopPrecedence['*'] = 40; // highest.
				2096
				2097	// Prime the first token.
				2098	fprintf(stderr, "ready> ");
				2099	getNextToken();
				2100
				2101	// Make the module, which holds all the code.
				2102	TheModule = new Module("my cool jit");
				2103
				2104	// Create the JIT.
				2105	TheExecutionEngine = ExecutionEngine::create(TheModule);
				2106
				2107	{
				2108	ExistingModuleProvider OurModuleProvider(TheModule);
				2109	FunctionPassManager OurFPM(&OurModuleProvider);
				2110
				2111	// Set up the optimizer pipeline. Start with registering info about how the
				2112	// target lays out data structures.
				2113	OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
				2114	// Promote allocas to registers.
				2115	OurFPM.add(createPromoteMemoryToRegisterPass());
				2116	// Do simple "peephole" optimizations and bit-twiddling optzns.
				2117	OurFPM.add(createInstructionCombiningPass());
				2118	// Reassociate expressions.
				2119	OurFPM.add(createReassociatePass());
				2120	// Eliminate Common SubExpressions.
				2121	OurFPM.add(createGVNPass());
				2122	// Simplify the control flow graph (deleting unreachable blocks, etc).
				2123	OurFPM.add(createCFGSimplificationPass());
				2124
				2125	// Set the global so the code gen can use this.
				2126	TheFPM = &OurFPM;
				2127
				2128	// Run the main "interpreter loop" now.
				2129	MainLoop();
				2130
				2131	TheFPM = 0;
Chris Lattner	515686b	2008-02-05 06:18:42 +0000	[diff] [blame]	2132
				2133	// Print out all of the generated code.
				2134	TheModule->dump();
				2135
				2136	} // Free module provider (and thus the module) and pass manager.
				2137
Chris Lattner	62a709d	2007-11-05 00:23:57 +0000	[diff] [blame]	2138	return 0;
				2139	}
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	2140	</pre>
				2141	</div>
				2142
Chris Lattner	729eb14	2008-02-10 19:11:04 +0000	[diff] [blame]	2143	<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	2144	</div>
				2145
				2146	<!-- *********************************************************************** -->
				2147	<hr>
				2148	<address>
				2149	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				2150	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				2151	<a href="http://validator.w3.org/check/referer"><img
				2152	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
				2153
				2154	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				2155	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				2156	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				2157	</address>
				2158	</body>
				2159	</html>