Chris Lattner | 00c992d | 2007-11-03 08:55:29 +0000 | [diff] [blame^] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| 2 | "http://www.w3.org/TR/html4/strict.dtd"> |
| 3 | |
| 4 | <html> |
| 5 | <head> |
| 6 | <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA |
| 7 | construction</title> |
| 8 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| 9 | <meta name="author" content="Chris Lattner"> |
| 10 | <link rel="stylesheet" href="../llvm.css" type="text/css"> |
| 11 | </head> |
| 12 | |
| 13 | <body> |
| 14 | |
| 15 | <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div> |
| 16 | |
| 17 | <div class="doc_author"> |
| 18 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> |
| 19 | </div> |
| 20 | |
| 21 | <!-- *********************************************************************** --> |
| 22 | <div class="doc_section"><a name="intro">Part 7 Introduction</a></div> |
| 23 | <!-- *********************************************************************** --> |
| 24 | |
| 25 | <div class="doc_text"> |
| 26 | |
| 27 | <p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with |
| 28 | LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable, |
| 29 | albeit simple, <a |
| 30 | href="http://en.wikipedia.org/wiki/Functional_programming">functional |
| 31 | programming language</a>. In our journey, we learned some parsing techniques, |
| 32 | how to build and represent an AST, how to build LLVM IR, and how to optimize |
| 33 | the resultant code and JIT compile it.</p> |
| 34 | |
| 35 | <p>While Kaleidoscope is interesting as a functional language, this makes it |
| 36 | "too easy" to generate LLVM IR for it. In particular, a functional language |
| 37 | makes it very easy to build LLVM IR directly in <a |
| 38 | href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>. |
| 39 | Since LLVM requires that the input code be in SSA form, this is a very nice |
| 40 | property and it is often unclear to newcomers how to generate code for an |
| 41 | imperative language with mutable variables.</p> |
| 42 | |
| 43 | <p>The short (and happy) summary of this chapter is that there is no need for |
| 44 | your front-end to build SSA form: LLVM provides highly tuned and well tested |
| 45 | support for this, though the way it works is a bit unexpected for some.</p> |
| 46 | |
| 47 | </div> |
| 48 | |
| 49 | <!-- *********************************************************************** --> |
| 50 | <div class="doc_section"><a name="why">Why is this a hard problem?</a></div> |
| 51 | <!-- *********************************************************************** --> |
| 52 | |
| 53 | <div class="doc_text"> |
| 54 | |
| 55 | <p> |
| 56 | To understand why mutable variables cause complexities in SSA construction, |
| 57 | consider this extremely simple C example: |
| 58 | </p> |
| 59 | |
| 60 | <div class="doc_code"> |
| 61 | <pre> |
| 62 | int G, H; |
| 63 | int test(_Bool Condition) { |
| 64 | int X; |
| 65 | if (Condition) |
| 66 | X = G; |
| 67 | else |
| 68 | X = H; |
| 69 | return X; |
| 70 | } |
| 71 | </pre> |
| 72 | </div> |
| 73 | |
| 74 | <p>In this case, we have the variable "X", whose value depends on the path |
| 75 | executed in the program. Because there are two different possible values for X |
| 76 | before the return instruction, a PHI node is inserted to merge the two values. |
| 77 | The LLVM IR that we want for this example looks like this:</p> |
| 78 | |
| 79 | <div class="doc_code"> |
| 80 | <pre> |
| 81 | @G = weak global i32 0 ; type of @G is i32* |
| 82 | @H = weak global i32 0 ; type of @H is i32* |
| 83 | |
| 84 | define i32 @test(i1 %Condition) { |
| 85 | entry: |
| 86 | br i1 %Condition, label %cond_true, label %cond_false |
| 87 | |
| 88 | cond_true: |
| 89 | %X.0 = load i32* @G |
| 90 | br label %cond_next |
| 91 | |
| 92 | cond_false: |
| 93 | %X.1 = load i32* @H |
| 94 | br label %cond_next |
| 95 | |
| 96 | cond_next: |
| 97 | %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] |
| 98 | ret i32 %X.2 |
| 99 | } |
| 100 | </pre> |
| 101 | </div> |
| 102 | |
| 103 | <p>In this example, the loads from the G and H global variables are explicit in |
| 104 | the LLVM IR, and they live in the then/else branches of the if statement |
| 105 | (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node |
| 106 | in the cond_next block selects the right value to use based on where control |
| 107 | flow is coming from: if control flow comes from the cond_false block, X.2 gets |
| 108 | the value of X.1. Alternatively, if control flow comes from cond_tree, it gets |
| 109 | the value of X.0. The intent of this chapter is not to explain the details of |
| 110 | SSA form. For more information, see one of the many <a |
| 111 | href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online |
| 112 | references</a>.</p> |
| 113 | |
| 114 | <p>The question for this article is "who places phi nodes when lowering |
| 115 | assignments to mutable variables?". The issue here is that LLVM |
| 116 | <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it. |
| 117 | However, SSA construction requires non-trivial algorithms and data structures, |
| 118 | so it is inconvenient and wasteful for every front-end to have to reproduce this |
| 119 | logic.</p> |
| 120 | |
| 121 | </div> |
| 122 | |
| 123 | <!-- *********************************************************************** --> |
| 124 | <div class="doc_section"><a name="memory">Memory in LLVM</a></div> |
| 125 | <!-- *********************************************************************** --> |
| 126 | |
| 127 | <div class="doc_text"> |
| 128 | |
| 129 | <p>The 'trick' here is that while LLVM does require all register values to be |
| 130 | in SSA form, it does not require (or permit) memory objects to be in SSA form. |
| 131 | In the example above, note that the loads from G and H are direct accesses to |
| 132 | G and H: they are not renamed or versioned. This differs from some other |
| 133 | compiler systems, which does try to version memory objects. In LLVM, instead of |
| 134 | encoding dataflow analysis of memory into the LLVM IR, it is handled with <a |
| 135 | href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on |
| 136 | demand.</p> |
| 137 | |
| 138 | <p> |
| 139 | With this in mind, the high-level idea is that we want to make a stack variable |
| 140 | (which lives in memory, because it is on the stack) for each mutable object in |
| 141 | a function. To take advantage of this trick, we need to talk about how LLVM |
| 142 | represents stack variables. |
| 143 | </p> |
| 144 | |
| 145 | <p>In LLVM, all memory accesses are explicit with load/store instructions, and |
| 146 | it is carefully designed to not have (or need) an "address-of" operator. Notice |
| 147 | how the type of the @G/@H global variables is actually "i32*" even though the |
| 148 | variable is defined as "i32". What this means is that @G defines <em>space</em> |
| 149 | for an i32 in the global data area, but its <em>name</em> actually refers to the |
| 150 | address for that space. Stack variables work the same way, but instead of being |
| 151 | declared with global variable definitions, they are declared with the |
| 152 | <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p> |
| 153 | |
| 154 | <div class="doc_code"> |
| 155 | <pre> |
| 156 | define i32 @test(i1 %Condition) { |
| 157 | entry: |
| 158 | %X = alloca i32 ; type of %X is i32*. |
| 159 | ... |
| 160 | %tmp = load i32* %X ; load the stack value %X from the stack. |
| 161 | %tmp2 = add i32 %tmp, 1 ; increment it |
| 162 | store i32 %tmp2, i32* %X ; store it back |
| 163 | ... |
| 164 | </pre> |
| 165 | </div> |
| 166 | |
| 167 | <p>This code shows an example of how you can declare and manipulate a stack |
| 168 | variable in the LLVM IR. Stack memory allocated with the alloca instruction is |
| 169 | fully general: you can pass the address of the stack slot to functions, you can |
| 170 | store it in other variables, etc. In our example above, we could rewrite the |
| 171 | example to use the alloca technique to avoid using a PHI node:</p> |
| 172 | |
| 173 | <div class="doc_code"> |
| 174 | <pre> |
| 175 | @G = weak global i32 0 ; type of @G is i32* |
| 176 | @H = weak global i32 0 ; type of @H is i32* |
| 177 | |
| 178 | define i32 @test(i1 %Condition) { |
| 179 | entry: |
| 180 | %X = alloca i32 ; type of %X is i32*. |
| 181 | br i1 %Condition, label %cond_true, label %cond_false |
| 182 | |
| 183 | cond_true: |
| 184 | %X.0 = load i32* @G |
| 185 | store i32 %X.0, i32* %X ; Update X |
| 186 | br label %cond_next |
| 187 | |
| 188 | cond_false: |
| 189 | %X.1 = load i32* @H |
| 190 | store i32 %X.1, i32* %X ; Update X |
| 191 | br label %cond_next |
| 192 | |
| 193 | cond_next: |
| 194 | %X.2 = load i32* %X ; Read X |
| 195 | ret i32 %X.2 |
| 196 | } |
| 197 | </pre> |
| 198 | </div> |
| 199 | |
| 200 | <p>With this, we have discovered a way to handle arbitrary mutable variables |
| 201 | without the need to create Phi nodes at all:</p> |
| 202 | |
| 203 | <ol> |
| 204 | <li>Each mutable variable becomes a stack allocation.</li> |
| 205 | <li>Each read of the variable becomes a load from the stack.</li> |
| 206 | <li>Each update of the variable becomes a store to the stack.</li> |
| 207 | <li>Taking the address of a variable just uses the stack address directly.</li> |
| 208 | </ol> |
| 209 | |
| 210 | <p>While this solution has solved our immediate problem, it introduced another |
| 211 | one: we have now apparently introduced a lot of stack traffic for very simple |
| 212 | and common operations, a major performance problem. Fortunately for us, the |
| 213 | LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles |
| 214 | this case, promoting allocas like this into SSA registers, inserting Phi nodes |
| 215 | as appropriate. If you run this example through the pass, for example, you'll |
| 216 | get:</p> |
| 217 | |
| 218 | <div class="doc_code"> |
| 219 | <pre> |
| 220 | $ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b> |
| 221 | @G = weak global i32 0 |
| 222 | @H = weak global i32 0 |
| 223 | |
| 224 | define i32 @test(i1 %Condition) { |
| 225 | entry: |
| 226 | br i1 %Condition, label %cond_true, label %cond_false |
| 227 | |
| 228 | cond_true: |
| 229 | %X.0 = load i32* @G |
| 230 | br label %cond_next |
| 231 | |
| 232 | cond_false: |
| 233 | %X.1 = load i32* @H |
| 234 | br label %cond_next |
| 235 | |
| 236 | cond_next: |
| 237 | %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] |
| 238 | ret i32 %X.01 |
| 239 | } |
| 240 | </pre> |
| 241 | |
| 242 | <p>The mem2reg pass is guaranteed to work, and |
| 243 | |
| 244 | which cases. |
| 245 | </p> |
| 246 | |
| 247 | <p>The final question you may be asking is: should I bother with this nonsense |
| 248 | for my front-end? Wouldn't it be better if I just did SSA construction |
| 249 | directly, avoiding use of the mem2reg optimization pass? |
| 250 | |
| 251 | Proven, well tested, debug info, etc. |
| 252 | </p> |
| 253 | </div> |
| 254 | |
| 255 | |
| 256 | <!-- *********************************************************************** --> |
| 257 | <div class="doc_section"><a name="code">Full Code Listing</a></div> |
| 258 | <!-- *********************************************************************** --> |
| 259 | |
| 260 | <div class="doc_text"> |
| 261 | |
| 262 | <p> |
| 263 | Here is the complete code listing for our running example, enhanced with the |
| 264 | if/then/else and for expressions.. To build this example, use: |
| 265 | </p> |
| 266 | |
| 267 | <div class="doc_code"> |
| 268 | <pre> |
| 269 | # Compile |
| 270 | g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy |
| 271 | # Run |
| 272 | ./toy |
| 273 | </pre> |
| 274 | </div> |
| 275 | |
| 276 | <p>Here is the code:</p> |
| 277 | |
| 278 | <div class="doc_code"> |
| 279 | <pre> |
| 280 | </pre> |
| 281 | </div> |
| 282 | |
| 283 | </div> |
| 284 | |
| 285 | <!-- *********************************************************************** --> |
| 286 | <hr> |
| 287 | <address> |
| 288 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img |
| 289 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> |
| 290 | <a href="http://validator.w3.org/check/referer"><img |
| 291 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> |
| 292 | |
| 293 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> |
| 294 | <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> |
| 295 | Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ |
| 296 | </address> |
| 297 | </body> |
| 298 | </html> |