| Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
 | 2 |                       "http://www.w3.org/TR/html4/strict.dtd"> | 
 | 3 |  | 
 | 4 | <html> | 
 | 5 | <head> | 
 | 6 |   <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA | 
 | 7 |          construction</title> | 
 | 8 |   <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | 
 | 9 |   <meta name="author" content="Chris Lattner"> | 
 | 10 |   <link rel="stylesheet" href="../llvm.css" type="text/css"> | 
 | 11 | </head> | 
 | 12 |  | 
 | 13 | <body> | 
 | 14 |  | 
 | 15 | <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div> | 
 | 16 |  | 
 | 17 | <div class="doc_author"> | 
 | 18 |   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> | 
 | 19 | </div> | 
 | 20 |  | 
 | 21 | <!-- *********************************************************************** --> | 
 | 22 | <div class="doc_section"><a name="intro">Part 7 Introduction</a></div> | 
 | 23 | <!-- *********************************************************************** --> | 
 | 24 |  | 
 | 25 | <div class="doc_text"> | 
 | 26 |  | 
 | 27 | <p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with | 
 | 28 | LLVM</a>" tutorial.  In parts 1 through 6, we've built a very respectable, | 
 | 29 | albeit simple, <a  | 
 | 30 | href="http://en.wikipedia.org/wiki/Functional_programming">functional | 
 | 31 | programming language</a>.  In our journey, we learned some parsing techniques, | 
 | 32 | how to build and represent an AST, how to build LLVM IR, and how to optimize | 
 | 33 | the resultant code and JIT compile it.</p> | 
 | 34 |  | 
 | 35 | <p>While Kaleidoscope is interesting as a functional language, this makes it  | 
 | 36 | "too easy" to generate LLVM IR for it.  In particular, a functional language | 
 | 37 | makes it very easy to build LLVM IR directly in <a  | 
 | 38 | href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>. | 
 | 39 | Since LLVM requires that the input code be in SSA form, this is a very nice | 
 | 40 | property and it is often unclear to newcomers how to generate code for an | 
 | 41 | imperative language with mutable variables.</p> | 
 | 42 |  | 
 | 43 | <p>The short (and happy) summary of this chapter is that there is no need for | 
 | 44 | your front-end to build SSA form: LLVM provides highly tuned and well tested | 
 | 45 | support for this, though the way it works is a bit unexpected for some.</p> | 
 | 46 |  | 
 | 47 | </div> | 
 | 48 |  | 
 | 49 | <!-- *********************************************************************** --> | 
 | 50 | <div class="doc_section"><a name="why">Why is this a hard problem?</a></div> | 
 | 51 | <!-- *********************************************************************** --> | 
 | 52 |  | 
 | 53 | <div class="doc_text"> | 
 | 54 |  | 
 | 55 | <p> | 
 | 56 | To understand why mutable variables cause complexities in SSA construction,  | 
 | 57 | consider this extremely simple C example: | 
 | 58 | </p> | 
 | 59 |  | 
 | 60 | <div class="doc_code"> | 
 | 61 | <pre> | 
 | 62 | int G, H; | 
 | 63 | int test(_Bool Condition) { | 
 | 64 |   int X; | 
 | 65 |   if (Condition) | 
 | 66 |     X = G; | 
 | 67 |   else | 
 | 68 |     X = H; | 
 | 69 |   return X; | 
 | 70 | } | 
 | 71 | </pre> | 
 | 72 | </div> | 
 | 73 |  | 
 | 74 | <p>In this case, we have the variable "X", whose value depends on the path  | 
 | 75 | executed in the program.  Because there are two different possible values for X | 
 | 76 | before the return instruction, a PHI node is inserted to merge the two values. | 
 | 77 | The LLVM IR that we want for this example looks like this:</p> | 
 | 78 |  | 
 | 79 | <div class="doc_code"> | 
 | 80 | <pre> | 
 | 81 | @G = weak global i32 0   ; type of @G is i32* | 
 | 82 | @H = weak global i32 0   ; type of @H is i32* | 
 | 83 |  | 
 | 84 | define i32 @test(i1 %Condition) { | 
 | 85 | entry: | 
 | 86 | 	br i1 %Condition, label %cond_true, label %cond_false | 
 | 87 |  | 
 | 88 | cond_true: | 
 | 89 | 	%X.0 = load i32* @G | 
 | 90 | 	br label %cond_next | 
 | 91 |  | 
 | 92 | cond_false: | 
 | 93 | 	%X.1 = load i32* @H | 
 | 94 | 	br label %cond_next | 
 | 95 |  | 
 | 96 | cond_next: | 
 | 97 | 	%X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] | 
 | 98 | 	ret i32 %X.2 | 
 | 99 | } | 
 | 100 | </pre> | 
 | 101 | </div> | 
 | 102 |  | 
 | 103 | <p>In this example, the loads from the G and H global variables are explicit in | 
 | 104 | the LLVM IR, and they live in the then/else branches of the if statement | 
 | 105 | (cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node | 
 | 106 | in the cond_next block selects the right value to use based on where control  | 
 | 107 | flow is coming from: if control flow comes from the cond_false block, X.2 gets | 
 | 108 | the value of X.1.  Alternatively, if control flow comes from cond_tree, it gets | 
 | 109 | the value of X.0.  The intent of this chapter is not to explain the details of | 
 | 110 | SSA form.  For more information, see one of the many <a  | 
 | 111 | href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online  | 
 | 112 | references</a>.</p> | 
 | 113 |  | 
 | 114 | <p>The question for this article is "who places phi nodes when lowering  | 
 | 115 | assignments to mutable variables?".  The issue here is that LLVM  | 
 | 116 | <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it. | 
 | 117 | However, SSA construction requires non-trivial algorithms and data structures, | 
 | 118 | so it is inconvenient and wasteful for every front-end to have to reproduce this | 
 | 119 | logic.</p> | 
 | 120 |  | 
 | 121 | </div> | 
 | 122 |  | 
 | 123 | <!-- *********************************************************************** --> | 
 | 124 | <div class="doc_section"><a name="memory">Memory in LLVM</a></div> | 
 | 125 | <!-- *********************************************************************** --> | 
 | 126 |  | 
 | 127 | <div class="doc_text"> | 
 | 128 |  | 
 | 129 | <p>The 'trick' here is that while LLVM does require all register values to be | 
 | 130 | in SSA form, it does not require (or permit) memory objects to be in SSA form. | 
 | 131 | In the example above, note that the loads from G and H are direct accesses to | 
 | 132 | G and H: they are not renamed or versioned.  This differs from some other | 
 | 133 | compiler systems, which does try to version memory objects.  In LLVM, instead of | 
 | 134 | encoding dataflow analysis of memory into the LLVM IR, it is handled with <a  | 
 | 135 | href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on | 
 | 136 | demand.</p> | 
 | 137 |  | 
 | 138 | <p> | 
 | 139 | With this in mind, the high-level idea is that we want to make a stack variable | 
 | 140 | (which lives in memory, because it is on the stack) for each mutable object in | 
 | 141 | a function.  To take advantage of this trick, we need to talk about how LLVM | 
 | 142 | represents stack variables. | 
 | 143 | </p> | 
 | 144 |  | 
 | 145 | <p>In LLVM, all memory accesses are explicit with load/store instructions, and | 
 | 146 | it is carefully designed to not have (or need) an "address-of" operator.  Notice | 
 | 147 | how the type of the @G/@H global variables is actually "i32*" even though the  | 
 | 148 | variable is defined as "i32".  What this means is that @G defines <em>space</em> | 
 | 149 | for an i32 in the global data area, but its <em>name</em> actually refers to the | 
 | 150 | address for that space.  Stack variables work the same way, but instead of being | 
 | 151 | declared with global variable definitions, they are declared with the  | 
 | 152 | <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p> | 
 | 153 |  | 
 | 154 | <div class="doc_code"> | 
 | 155 | <pre> | 
 | 156 | define i32 @test(i1 %Condition) { | 
 | 157 | entry: | 
 | 158 | 	%X = alloca i32           ; type of %X is i32*. | 
 | 159 | 	... | 
 | 160 | 	%tmp = load i32* %X       ; load the stack value %X from the stack. | 
 | 161 | 	%tmp2 = add i32 %tmp, 1   ; increment it | 
 | 162 | 	store i32 %tmp2, i32* %X  ; store it back | 
 | 163 | 	... | 
 | 164 | </pre> | 
 | 165 | </div> | 
 | 166 |  | 
 | 167 | <p>This code shows an example of how you can declare and manipulate a stack | 
 | 168 | variable in the LLVM IR.  Stack memory allocated with the alloca instruction is | 
 | 169 | fully general: you can pass the address of the stack slot to functions, you can | 
 | 170 | store it in other variables, etc.  In our example above, we could rewrite the | 
 | 171 | example to use the alloca technique to avoid using a PHI node:</p> | 
 | 172 |  | 
 | 173 | <div class="doc_code"> | 
 | 174 | <pre> | 
 | 175 | @G = weak global i32 0   ; type of @G is i32* | 
 | 176 | @H = weak global i32 0   ; type of @H is i32* | 
 | 177 |  | 
 | 178 | define i32 @test(i1 %Condition) { | 
 | 179 | entry: | 
 | 180 | 	%X = alloca i32           ; type of %X is i32*. | 
 | 181 | 	br i1 %Condition, label %cond_true, label %cond_false | 
 | 182 |  | 
 | 183 | cond_true: | 
 | 184 | 	%X.0 = load i32* @G | 
 | 185 |         store i32 %X.0, i32* %X   ; Update X | 
 | 186 | 	br label %cond_next | 
 | 187 |  | 
 | 188 | cond_false: | 
 | 189 | 	%X.1 = load i32* @H | 
 | 190 |         store i32 %X.1, i32* %X   ; Update X | 
 | 191 | 	br label %cond_next | 
 | 192 |  | 
 | 193 | cond_next: | 
 | 194 | 	%X.2 = load i32* %X       ; Read X | 
 | 195 | 	ret i32 %X.2 | 
 | 196 | } | 
 | 197 | </pre> | 
 | 198 | </div> | 
 | 199 |  | 
 | 200 | <p>With this, we have discovered a way to handle arbitrary mutable variables | 
 | 201 | without the need to create Phi nodes at all:</p> | 
 | 202 |  | 
 | 203 | <ol> | 
 | 204 | <li>Each mutable variable becomes a stack allocation.</li> | 
 | 205 | <li>Each read of the variable becomes a load from the stack.</li> | 
 | 206 | <li>Each update of the variable becomes a store to the stack.</li> | 
 | 207 | <li>Taking the address of a variable just uses the stack address directly.</li> | 
 | 208 | </ol> | 
 | 209 |  | 
 | 210 | <p>While this solution has solved our immediate problem, it introduced another | 
 | 211 | one: we have now apparently introduced a lot of stack traffic for very simple | 
 | 212 | and common operations, a major performance problem.  Fortunately for us, the | 
 | 213 | LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles | 
 | 214 | this case, promoting allocas like this into SSA registers, inserting Phi nodes | 
 | 215 | as appropriate.  If you run this example through the pass, for example, you'll | 
 | 216 | get:</p> | 
 | 217 |  | 
 | 218 | <div class="doc_code"> | 
 | 219 | <pre> | 
 | 220 | $ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b> | 
 | 221 | @G = weak global i32 0 | 
 | 222 | @H = weak global i32 0 | 
 | 223 |  | 
 | 224 | define i32 @test(i1 %Condition) { | 
 | 225 | entry: | 
 | 226 | 	br i1 %Condition, label %cond_true, label %cond_false | 
 | 227 |  | 
 | 228 | cond_true: | 
 | 229 | 	%X.0 = load i32* @G | 
 | 230 | 	br label %cond_next | 
 | 231 |  | 
 | 232 | cond_false: | 
 | 233 | 	%X.1 = load i32* @H | 
 | 234 | 	br label %cond_next | 
 | 235 |  | 
 | 236 | cond_next: | 
 | 237 | 	%X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] | 
 | 238 | 	ret i32 %X.01 | 
 | 239 | } | 
 | 240 | </pre> | 
| Chris Lattner | e719831 | 2007-11-03 22:22:30 +0000 | [diff] [blame] | 241 | </div> | 
| Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame] | 242 |  | 
| Chris Lattner | e719831 | 2007-11-03 22:22:30 +0000 | [diff] [blame] | 243 | <p>The mem2reg pass implements the standard "iterated dominator frontier" | 
 | 244 | algorithm for constructing SSA form and has a number of optimizations that speed | 
 | 245 | up very common degenerate cases.  mem2reg really is the answer for dealing with | 
 | 246 | mutable variables, and we highly recommend that you depend on it.  Note that | 
 | 247 | mem2reg only works on variables in certain circumstances:</p> | 
| Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame] | 248 |  | 
| Chris Lattner | e719831 | 2007-11-03 22:22:30 +0000 | [diff] [blame] | 249 | <ol> | 
 | 250 | <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it | 
 | 251 | promotes them.  It does not apply to global variables or heap allocations.</li> | 
| Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame] | 252 |  | 
| Chris Lattner | e719831 | 2007-11-03 22:22:30 +0000 | [diff] [blame] | 253 | <li>mem2reg only looks for alloca instructions in the entry block of the | 
 | 254 | function.  Being in the entry block guarantees that the alloca is only executed | 
 | 255 | once, which makes analysis simpler.</li> | 
| Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame] | 256 |  | 
| Chris Lattner | e719831 | 2007-11-03 22:22:30 +0000 | [diff] [blame] | 257 | <li>mem2reg only promotes allocas whose uses are direct loads and stores.  If | 
 | 258 | the address of the stack object is passed to a function, or if any funny pointer | 
 | 259 | arithmetic is involved, the alloca will not be promoted.</li> | 
 | 260 |  | 
 | 261 | <li>mem2reg only works on allocas of scalar values, and only if the array size | 
 | 262 | of the allocation is 1 (or missing in the .ll file).  mem2reg is not capable of | 
 | 263 | promoting structs or arrays to registers.  Note that the "scalarrepl" pass is | 
 | 264 | more powerful and can promote structs, "unions", and arrays in many cases.</li> | 
 | 265 |  | 
 | 266 | </ol> | 
 | 267 |  | 
 | 268 | <p> | 
 | 269 | All of these properties are easy to satisfy for most imperative languages, and | 
 | 270 | we'll illustrated this below with Kaleidoscope.  The final question you may be | 
 | 271 | asking is: should I bother with this nonsense for my front-end?  Wouldn't it be | 
 | 272 | better if I just did SSA construction directly, avoiding use of the mem2reg | 
 | 273 | optimization pass?  In short, we strongly recommend that use you this technique | 
 | 274 | for building SSA form, unless there is an extremely good reason not to.  Using | 
 | 275 | this technique is:</p> | 
 | 276 |  | 
 | 277 | <ul> | 
 | 278 | <li>Proven and well tested: llvm-gcc and clang both use this technique for local | 
 | 279 | mutable variables.  As such, the most common clients of LLVM are using this to | 
 | 280 | handle a bulk of their variables.  You can be sure that bugs are found fast and | 
 | 281 | fixed early.</li> | 
 | 282 |  | 
 | 283 | <li>Extremely Fast: mem2reg has a number of special cases that make it fast in | 
 | 284 | common cases as well as fully general.  For example, it has fast-paths for | 
 | 285 | variables that are only used in a single block, variables that only have one | 
 | 286 | assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc. | 
 | 287 | </li> | 
 | 288 |  | 
 | 289 | <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html"> | 
 | 290 | Debug information in LLVM</a> relies on having the address of the variable | 
 | 291 | exposed to attach debug info to it.  This technique dovetails very naturally  | 
 | 292 | with this style of debug info.</li> | 
 | 293 | </ul> | 
 | 294 |  | 
 | 295 | <p>If nothing else, this makes it much easier to get your front-end up and  | 
 | 296 | running, and is very simple to implement.  Lets extend Kaleidoscope with mutable | 
 | 297 | variables now! | 
| Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame] | 298 | </p> | 
 | 299 | </div> | 
 | 300 |  | 
 | 301 |  | 
 | 302 | <!-- *********************************************************************** --> | 
 | 303 | <div class="doc_section"><a name="code">Full Code Listing</a></div> | 
 | 304 | <!-- *********************************************************************** --> | 
 | 305 |  | 
 | 306 | <div class="doc_text"> | 
 | 307 |  | 
 | 308 | <p> | 
 | 309 | Here is the complete code listing for our running example, enhanced with the | 
 | 310 | if/then/else and for expressions..  To build this example, use: | 
 | 311 | </p> | 
 | 312 |  | 
 | 313 | <div class="doc_code"> | 
 | 314 | <pre> | 
 | 315 |    # Compile | 
 | 316 |    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy | 
 | 317 |    # Run | 
 | 318 |    ./toy | 
 | 319 | </pre> | 
 | 320 | </div> | 
 | 321 |  | 
 | 322 | <p>Here is the code:</p> | 
 | 323 |  | 
 | 324 | <div class="doc_code"> | 
 | 325 | <pre> | 
 | 326 | </pre> | 
 | 327 | </div> | 
 | 328 |  | 
 | 329 | </div> | 
 | 330 |  | 
 | 331 | <!-- *********************************************************************** --> | 
 | 332 | <hr> | 
 | 333 | <address> | 
 | 334 |   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
 | 335 |   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
 | 336 |   <a href="http://validator.w3.org/check/referer"><img | 
 | 337 |   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
 | 338 |  | 
 | 339 |   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
 | 340 |   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> | 
 | 341 |   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ | 
 | 342 | </address> | 
 | 343 | </body> | 
 | 344 | </html> |