Blame - docs/tutorial/LangImpl7.html - fp2-dev/platform/external/llvm

blob: ca0d2f06e034fb9880e4ce620cf4329e14aad9fe [file] [log] [blame]

Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3
				4	<html>
				5	<head>
				6	<title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
				7	construction</title>
				8	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				9	<meta name="author" content="Chris Lattner">
				10	<link rel="stylesheet" href="../llvm.css" type="text/css">
				11	</head>
				12
				13	<body>
				14
				15	<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
				16
				17	<div class="doc_author">
				18	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
				19	</div>
				20
				21	<!-- *********************************************************************** -->
				22	<div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
				23	<!-- *********************************************************************** -->
				24
				25	<div class="doc_text">
				26
				27	<p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
				28	LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable,
				29	albeit simple, <a
				30	href="http://en.wikipedia.org/wiki/Functional_programming">functional
				31	programming language</a>. In our journey, we learned some parsing techniques,
				32	how to build and represent an AST, how to build LLVM IR, and how to optimize
				33	the resultant code and JIT compile it.</p>
				34
				35	<p>While Kaleidoscope is interesting as a functional language, this makes it
				36	"too easy" to generate LLVM IR for it. In particular, a functional language
				37	makes it very easy to build LLVM IR directly in <a
				38	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
				39	Since LLVM requires that the input code be in SSA form, this is a very nice
				40	property and it is often unclear to newcomers how to generate code for an
				41	imperative language with mutable variables.</p>
				42
				43	<p>The short (and happy) summary of this chapter is that there is no need for
				44	your front-end to build SSA form: LLVM provides highly tuned and well tested
				45	support for this, though the way it works is a bit unexpected for some.</p>
				46
				47	</div>
				48
				49	<!-- *********************************************************************** -->
				50	<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
				51	<!-- *********************************************************************** -->
				52
				53	<div class="doc_text">
				54
				55	<p>
				56	To understand why mutable variables cause complexities in SSA construction,
				57	consider this extremely simple C example:
				58	</p>
				59
				60	<div class="doc_code">
				61	<pre>
				62	int G, H;
				63	int test(_Bool Condition) {
				64	int X;
				65	if (Condition)
				66	X = G;
				67	else
				68	X = H;
				69	return X;
				70	}
				71	</pre>
				72	</div>
				73
				74	<p>In this case, we have the variable "X", whose value depends on the path
				75	executed in the program. Because there are two different possible values for X
				76	before the return instruction, a PHI node is inserted to merge the two values.
				77	The LLVM IR that we want for this example looks like this:</p>
				78
				79	<div class="doc_code">
				80	<pre>
				81	@G = weak global i32 0 ; type of @G is i32*
				82	@H = weak global i32 0 ; type of @H is i32*
				83
				84	define i32 @test(i1 %Condition) {
				85	entry:
				86	br i1 %Condition, label %cond_true, label %cond_false
				87
				88	cond_true:
				89	%X.0 = load i32* @G
				90	br label %cond_next
				91
				92	cond_false:
				93	%X.1 = load i32* @H
				94	br label %cond_next
				95
				96	cond_next:
				97	%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				98	ret i32 %X.2
				99	}
				100	</pre>
				101	</div>
				102
				103	<p>In this example, the loads from the G and H global variables are explicit in
				104	the LLVM IR, and they live in the then/else branches of the if statement
				105	(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
				106	in the cond_next block selects the right value to use based on where control
				107	flow is coming from: if control flow comes from the cond_false block, X.2 gets
				108	the value of X.1. Alternatively, if control flow comes from cond_tree, it gets
				109	the value of X.0. The intent of this chapter is not to explain the details of
				110	SSA form. For more information, see one of the many <a
				111	href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
				112	references</a>.</p>
				113
				114	<p>The question for this article is "who places phi nodes when lowering
				115	assignments to mutable variables?". The issue here is that LLVM
				116	<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
				117	However, SSA construction requires non-trivial algorithms and data structures,
				118	so it is inconvenient and wasteful for every front-end to have to reproduce this
				119	logic.</p>
				120
				121	</div>
				122
				123	<!-- *********************************************************************** -->
				124	<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
				125	<!-- *********************************************************************** -->
				126
				127	<div class="doc_text">
				128
				129	<p>The 'trick' here is that while LLVM does require all register values to be
				130	in SSA form, it does not require (or permit) memory objects to be in SSA form.
				131	In the example above, note that the loads from G and H are direct accesses to
				132	G and H: they are not renamed or versioned. This differs from some other
				133	compiler systems, which does try to version memory objects. In LLVM, instead of
				134	encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
				135	href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
				136	demand.</p>
				137
				138	<p>
				139	With this in mind, the high-level idea is that we want to make a stack variable
				140	(which lives in memory, because it is on the stack) for each mutable object in
				141	a function. To take advantage of this trick, we need to talk about how LLVM
				142	represents stack variables.
				143	</p>
				144
				145	<p>In LLVM, all memory accesses are explicit with load/store instructions, and
				146	it is carefully designed to not have (or need) an "address-of" operator. Notice
				147	how the type of the @G/@H global variables is actually "i32*" even though the
				148	variable is defined as "i32". What this means is that @G defines <em>space</em>
				149	for an i32 in the global data area, but its <em>name</em> actually refers to the
				150	address for that space. Stack variables work the same way, but instead of being
				151	declared with global variable definitions, they are declared with the
				152	<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
				153
				154	<div class="doc_code">
				155	<pre>
				156	define i32 @test(i1 %Condition) {
				157	entry:
				158	%X = alloca i32 ; type of %X is i32*.
				159	...
				160	%tmp = load i32* %X ; load the stack value %X from the stack.
				161	%tmp2 = add i32 %tmp, 1 ; increment it
				162	store i32 %tmp2, i32* %X ; store it back
				163	...
				164	</pre>
				165	</div>
				166
				167	<p>This code shows an example of how you can declare and manipulate a stack
				168	variable in the LLVM IR. Stack memory allocated with the alloca instruction is
				169	fully general: you can pass the address of the stack slot to functions, you can
				170	store it in other variables, etc. In our example above, we could rewrite the
				171	example to use the alloca technique to avoid using a PHI node:</p>
				172
				173	<div class="doc_code">
				174	<pre>
				175	@G = weak global i32 0 ; type of @G is i32*
				176	@H = weak global i32 0 ; type of @H is i32*
				177
				178	define i32 @test(i1 %Condition) {
				179	entry:
				180	%X = alloca i32 ; type of %X is i32*.
				181	br i1 %Condition, label %cond_true, label %cond_false
				182
				183	cond_true:
				184	%X.0 = load i32* @G
				185	store i32 %X.0, i32* %X ; Update X
				186	br label %cond_next
				187
				188	cond_false:
				189	%X.1 = load i32* @H
				190	store i32 %X.1, i32* %X ; Update X
				191	br label %cond_next
				192
				193	cond_next:
				194	%X.2 = load i32* %X ; Read X
				195	ret i32 %X.2
				196	}
				197	</pre>
				198	</div>
				199
				200	<p>With this, we have discovered a way to handle arbitrary mutable variables
				201	without the need to create Phi nodes at all:</p>
				202
				203	<ol>
				204	<li>Each mutable variable becomes a stack allocation.</li>
				205	<li>Each read of the variable becomes a load from the stack.</li>
				206	<li>Each update of the variable becomes a store to the stack.</li>
				207	<li>Taking the address of a variable just uses the stack address directly.</li>
				208	</ol>
				209
				210	<p>While this solution has solved our immediate problem, it introduced another
				211	one: we have now apparently introduced a lot of stack traffic for very simple
				212	and common operations, a major performance problem. Fortunately for us, the
				213	LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
				214	this case, promoting allocas like this into SSA registers, inserting Phi nodes
				215	as appropriate. If you run this example through the pass, for example, you'll
				216	get:</p>
				217
				218	<div class="doc_code">
				219	<pre>
				220	$ <b>llvm-as < example.ll \| opt -mem2reg \| llvm-dis</b>
				221	@G = weak global i32 0
				222	@H = weak global i32 0
				223
				224	define i32 @test(i1 %Condition) {
				225	entry:
				226	br i1 %Condition, label %cond_true, label %cond_false
				227
				228	cond_true:
				229	%X.0 = load i32* @G
				230	br label %cond_next
				231
				232	cond_false:
				233	%X.1 = load i32* @H
				234	br label %cond_next
				235
				236	cond_next:
				237	%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
				238	ret i32 %X.01
				239	}
				240	</pre>
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	241	</div>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	242
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	243	<p>The mem2reg pass implements the standard "iterated dominator frontier"
				244	algorithm for constructing SSA form and has a number of optimizations that speed
				245	up very common degenerate cases. mem2reg really is the answer for dealing with
				246	mutable variables, and we highly recommend that you depend on it. Note that
				247	mem2reg only works on variables in certain circumstances:</p>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	248
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	249	<ol>
				250	<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
				251	promotes them. It does not apply to global variables or heap allocations.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	252
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	253	<li>mem2reg only looks for alloca instructions in the entry block of the
				254	function. Being in the entry block guarantees that the alloca is only executed
				255	once, which makes analysis simpler.</li>
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	256
Chris Lattner	e719831	2007-11-03 22:22:30 +0000	[diff] [blame]	257	<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
				258	the address of the stack object is passed to a function, or if any funny pointer
				259	arithmetic is involved, the alloca will not be promoted.</li>
				260
				261	<li>mem2reg only works on allocas of scalar values, and only if the array size
				262	of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
				263	promoting structs or arrays to registers. Note that the "scalarrepl" pass is
				264	more powerful and can promote structs, "unions", and arrays in many cases.</li>
				265
				266	</ol>
				267
				268	<p>
				269	All of these properties are easy to satisfy for most imperative languages, and
				270	we'll illustrated this below with Kaleidoscope. The final question you may be
				271	asking is: should I bother with this nonsense for my front-end? Wouldn't it be
				272	better if I just did SSA construction directly, avoiding use of the mem2reg
				273	optimization pass? In short, we strongly recommend that use you this technique
				274	for building SSA form, unless there is an extremely good reason not to. Using
				275	this technique is:</p>
				276
				277	<ul>
				278	<li>Proven and well tested: llvm-gcc and clang both use this technique for local
				279	mutable variables. As such, the most common clients of LLVM are using this to
				280	handle a bulk of their variables. You can be sure that bugs are found fast and
				281	fixed early.</li>
				282
				283	<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
				284	common cases as well as fully general. For example, it has fast-paths for
				285	variables that are only used in a single block, variables that only have one
				286	assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
				287	</li>
				288
				289	<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
				290	Debug information in LLVM</a> relies on having the address of the variable
				291	exposed to attach debug info to it. This technique dovetails very naturally
				292	with this style of debug info.</li>
				293	</ul>
				294
				295	<p>If nothing else, this makes it much easier to get your front-end up and
				296	running, and is very simple to implement. Lets extend Kaleidoscope with mutable
				297	variables now!
Chris Lattner	00c992d	2007-11-03 08:55:29 +0000	[diff] [blame]	298	</p>
				299	</div>
				300
				301
				302	<!-- *********************************************************************** -->
				303	<div class="doc_section"><a name="code">Full Code Listing</a></div>
				304	<!-- *********************************************************************** -->
				305
				306	<div class="doc_text">
				307
				308	<p>
				309	Here is the complete code listing for our running example, enhanced with the
				310	if/then/else and for expressions.. To build this example, use:
				311	</p>
				312
				313	<div class="doc_code">
				314	<pre>
				315	# Compile
				316	g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
				317	# Run
				318	./toy
				319	</pre>
				320	</div>
				321
				322	<p>Here is the code:</p>
				323
				324	<div class="doc_code">
				325	<pre>
				326	</pre>
				327	</div>
				328
				329	</div>
				330
				331	<!-- *********************************************************************** -->
				332	<hr>
				333	<address>
				334	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				335	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				336	<a href="http://validator.w3.org/check/referer"><img
				337	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
				338
				339	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				340	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				341	Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
				342	</address>
				343	</body>
				344	</html>