=======================================================
Kaleidoscope: Extending the Language: Mutable Variables
=======================================================

.. contents::
   :local:

Chapter 7 Introduction
======================

Welcome to Chapter 7 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
very respectable, albeit simple, `functional programming
language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
journey, we learned some parsing techniques, how to build and represent
an AST, how to build LLVM IR, and how to optimize the resultant code as
well as JIT compile it.

While Kaleidoscope is interesting as a functional language, the fact
that it is functional makes it "too easy" to generate LLVM IR for it. In
particular, a functional language makes it very easy to build LLVM IR
directly in `SSA
form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
Since LLVM requires that the input code be in SSA form, this is a very
nice property, and it is often unclear to newcomers how to generate code
for an imperative language with mutable variables.

The short (and happy) summary of this chapter is that there is no need
for your front-end to build SSA form: LLVM provides highly tuned and
well-tested support for this, though the way it works is a bit
unexpected for some.

Why is this a hard problem?
===========================

To understand why mutable variables cause complexities in SSA
construction, consider this extremely simple C example:

.. code-block:: c

    int G, H;
    int test(_Bool Condition) {
      int X;
      if (Condition)
        X = G;
      else
        X = H;
      return X;
    }

In this case, we have the variable "X", whose value depends on the path
executed in the program. Because there are two different possible values
for X before the return instruction, a PHI node is inserted to merge the
two values. The LLVM IR that we want for this example looks like this:

.. code-block:: llvm

    @G = weak global i32 0   ; type of @G is i32*
    @H = weak global i32 0   ; type of @H is i32*

    define i32 @test(i1 %Condition) {
    entry:
      br i1 %Condition, label %cond_true, label %cond_false

    cond_true:
      %X.0 = load i32* @G
      br label %cond_next

    cond_false:
      %X.1 = load i32* @H
      br label %cond_next

    cond_next:
      %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
      ret i32 %X.2
    }

In this example, the loads from the G and H global variables are
explicit in the LLVM IR, and they live in the then/else branches of the
if statement (cond\_true/cond\_false). In order to merge the incoming
values, the X.2 phi node in the cond\_next block selects the right value
to use based on where control flow is coming from: if control flow comes
from the cond\_false block, X.2 gets the value of X.1. Alternatively, if
control flow comes from cond\_true, it gets the value of X.0. The intent
of this chapter is not to explain the details of SSA form. For more
information, see one of the many `online
references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.

The question for this article is "who places the phi nodes when lowering
assignments to mutable variables?". The issue here is that LLVM
*requires* that its IR be in SSA form: there is no "non-SSA" mode for
it. However, SSA construction requires non-trivial algorithms and data
structures, so it is inconvenient and wasteful for every front-end to
have to reproduce this logic.

Memory in LLVM
==============

The 'trick' here is that while LLVM does require all register values to
be in SSA form, it does not require (or permit) memory objects to be in
SSA form. In the example above, note that the loads from G and H are
direct accesses to G and H: they are not renamed or versioned. This
differs from some other compiler systems, which do try to version memory
objects. In LLVM, instead of encoding dataflow analysis of memory into
the LLVM IR, it is handled with `Analysis
Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.

With this in mind, the high-level idea is that we want to make a stack
variable (which lives in memory, because it is on the stack) for each
mutable object in a function. To take advantage of this trick, we need
to talk about how LLVM represents stack variables.

In LLVM, all memory accesses are explicit with load/store instructions,
and it is carefully designed not to have (or need) an "address-of"
operator. Notice how the type of the @G/@H global variables is actually
"i32\*" even though the variable is defined as "i32". What this means is
that @G defines *space* for an i32 in the global data area, but its
*name* actually refers to the address for that space. Stack variables
work the same way, except that instead of being declared with global
variable definitions, they are declared with the `LLVM alloca
instruction <../LangRef.html#i_alloca>`_:

.. code-block:: llvm

    define i32 @example() {
    entry:
      %X = alloca i32           ; type of %X is i32*.
      ...
      %tmp = load i32* %X       ; load the stack value %X from the stack.
      %tmp2 = add i32 %tmp, 1   ; increment it
      store i32 %tmp2, i32* %X  ; store it back
      ...

This code shows an example of how you can declare and manipulate a stack
variable in the LLVM IR. Stack memory allocated with the alloca
instruction is fully general: you can pass the address of the stack slot
to functions, you can store it in other variables, etc. In our example
above, we could rewrite the example to use the alloca technique to avoid
using a PHI node:

.. code-block:: llvm

    @G = weak global i32 0   ; type of @G is i32*
    @H = weak global i32 0   ; type of @H is i32*

    define i32 @test(i1 %Condition) {
    entry:
      %X = alloca i32           ; type of %X is i32*.
      br i1 %Condition, label %cond_true, label %cond_false

    cond_true:
      %X.0 = load i32* @G
      store i32 %X.0, i32* %X   ; Update X
      br label %cond_next

    cond_false:
      %X.1 = load i32* @H
      store i32 %X.1, i32* %X   ; Update X
      br label %cond_next

    cond_next:
      %X.2 = load i32* %X       ; Read X
      ret i32 %X.2
    }

With this, we have discovered a way to handle arbitrary mutable
variables without the need to create Phi nodes at all:

#. Each mutable variable becomes a stack allocation.
#. Each read of the variable becomes a load from the stack.
#. Each update of the variable becomes a store to the stack.
#. Taking the address of a variable just uses the stack address
   directly.

While this solution has solved our immediate problem, it introduced
another one: we have now apparently introduced a lot of stack traffic
for very simple and common operations, a major performance problem.
Fortunately for us, the LLVM optimizer has a highly-tuned optimization
pass named "mem2reg" that handles this case, promoting allocas like this
into SSA registers, inserting Phi nodes as appropriate. If you run this
example through the pass, for example, you'll get:

.. code-block:: bash

    $ llvm-as < example.ll | opt -mem2reg | llvm-dis
    @G = weak global i32 0
    @H = weak global i32 0

    define i32 @test(i1 %Condition) {
    entry:
      br i1 %Condition, label %cond_true, label %cond_false

    cond_true:
      %X.0 = load i32* @G
      br label %cond_next

    cond_false:
      %X.1 = load i32* @H
      br label %cond_next

    cond_next:
      %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
      ret i32 %X.01
    }

The mem2reg pass implements the standard "iterated dominance frontier"
algorithm for constructing SSA form and has a number of optimizations
that speed up (very common) degenerate cases. The mem2reg optimization
pass is the answer to dealing with mutable variables, and we highly
recommend that you depend on it. Note that mem2reg only works on
variables in certain circumstances:

#. mem2reg is alloca-driven: it looks for allocas and if it can handle
   them, it promotes them. It does not apply to global variables or heap
   allocations.
#. mem2reg only looks for alloca instructions in the entry block of the
   function. Being in the entry block guarantees that the alloca is only
   executed once, which makes analysis simpler.
#. mem2reg only promotes allocas whose uses are direct loads and stores.
   If the address of the stack object is passed to a function, or if any
   funny pointer arithmetic is involved, the alloca will not be
   promoted.
#. mem2reg only works on allocas of `first
   class <../LangRef.html#t_classifications>`_ values (such as pointers,
   scalars and vectors), and only if the array size of the allocation is
   1 (or missing in the .ll file). mem2reg is not capable of promoting
   structs or arrays to registers. Note that the "scalarrepl" pass is
   more powerful and can promote structs, "unions", and arrays in many
   cases.

All of these properties are easy to satisfy for most imperative
languages, and we'll illustrate them below with Kaleidoscope. The final
question you may be asking is: should I bother with this nonsense for my
front-end? Wouldn't it be better if I just did SSA construction
directly, avoiding use of the mem2reg optimization pass? In short, we
strongly recommend that you use this technique for building SSA form,
unless there is an extremely good reason not to. Using this technique
is:

- Proven and well tested: llvm-gcc and clang both use this technique
  for local mutable variables. As such, the most common clients of LLVM
  are using this to handle the bulk of their variables. You can be sure
  that bugs are found fast and fixed early.
- Extremely fast: mem2reg has a number of special cases that make it
  fast in common cases as well as fully general. For example, it has
  fast-paths for variables that are only used in a single block,
  variables that only have one assignment point, good heuristics to
  avoid insertion of unneeded phi nodes, etc.
- Needed for debug info generation: `Debug information in
  LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
  the variable exposed so that debug info can be attached to it. This
  technique dovetails very naturally with this style of debug info.

If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with
mutable variables now!

Mutable Variables in Kaleidoscope
=================================

Now that we know the sort of problem we want to tackle, let's see what
this looks like in the context of our little Kaleidoscope language.
We're going to add two features:

#. The ability to mutate variables with the '=' operator.
#. The ability to define new variables.

While the first item is really what this is about, we only have
variables for incoming arguments and for induction variables, and
redefining those only goes so far :). Also, the ability to define new
variables is a useful thing regardless of whether you will be mutating
them. Here's a motivating example that shows how we could use these:

::

    # Define ':' for sequencing: as a low-precedence operator that ignores operands
    # and just returns the RHS.
    def binary : 1 (x y) y;

    # Recursive fib, we could do this before.
    def fib(x)
      if (x < 3) then
        1
      else
        fib(x-1)+fib(x-2);

    # Iterative fib.
    def fibi(x)
      var a = 1, b = 1, c in
      (for i = 3, i < x in
         c = a + b :
         a = b :
         b = c) :
      b;

    # Call it.
    fibi(10);

In order to mutate variables, we have to change our existing variables
to use the "alloca trick". Once we have that, we'll add our new
operator, then extend Kaleidoscope to support new variable definitions.

Adjusting Existing Variables for Mutation
=========================================

The symbol table in Kaleidoscope is managed at code generation time by
the '``NamedValues``' map. This map currently keeps track of the LLVM
"Value\*" that holds the double value for the named variable. In order
to support mutation, we need to change this slightly, so that
``NamedValues`` holds the *memory location* of the variable in question.
Note that this change is a refactoring: it changes the structure of the
code, but does not (by itself) change the behavior of the compiler. All
of these changes are isolated in the Kaleidoscope code generator.

At this point in Kaleidoscope's development, it only supports variables
for two things: incoming arguments to functions and the induction
variable of 'for' loops. For consistency, we'll allow mutation of these
variables in addition to other user-defined variables. This means that
these will both need memory locations.

To start our transformation of Kaleidoscope, we'll change the
NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once
we do this, the C++ compiler will tell us what parts of the code we need
to update:

.. code-block:: c++

    static std::map<std::string, AllocaInst*> NamedValues;

Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of
the function:

.. code-block:: c++

    /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
    /// the function.  This is used for mutable variables etc.
    static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                              const std::string &VarName) {
      IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
                       TheFunction->getEntryBlock().begin());
      return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
                               VarName.c_str());
    }

This funny-looking code creates an IRBuilder object that is pointing at
the first instruction (.begin()) of the entry block. It then creates an
alloca with the expected name and returns it. Because all values in
Kaleidoscope are doubles, there is no need to pass in a type to use.

With this in place, the first functionality change we want to make is to
variable references. In our new scheme, variables live on the stack, so
code generating a reference to them actually needs to produce a load
from the stack slot:

.. code-block:: c++

    Value *VariableExprAST::Codegen() {
      // Look this variable up in the function.
      Value *V = NamedValues[Name];
      if (V == 0) return ErrorV("Unknown variable name");

      // Load the value.
      return Builder.CreateLoad(V, Name.c_str());
    }

As you can see, this is pretty straightforward. Now we need to update
the things that define the variables to set up the alloca. We'll start
with ``ForExprAST::Codegen`` (see the `full code listing <#code>`_ for
the unabridged code):

.. code-block:: c++

    Function *TheFunction = Builder.GetInsertBlock()->getParent();

    // Create an alloca for the variable in the entry block.
    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);

    // Emit the start code first, without 'variable' in scope.
    Value *StartVal = Start->Codegen();
    if (StartVal == 0) return 0;

    // Store the value into the alloca.
    Builder.CreateStore(StartVal, Alloca);
    ...

    // Compute the end condition.
    Value *EndCond = End->Codegen();
    if (EndCond == 0) return EndCond;

    // Reload, increment, and restore the alloca.  This handles the case where
    // the body of the loop mutates the variable.
    Value *CurVar = Builder.CreateLoad(Alloca);
    Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
    Builder.CreateStore(NextVar, Alloca);
    ...

This code is virtually identical to the code `before we allowed mutable
variables <LangImpl5.html#forcodegen>`_. The big difference is that we
no longer have to construct a PHI node, and we use load/store to access
the variable as needed.

To support mutable argument variables, we need to also make allocas for
them. The code for this is also pretty simple:

.. code-block:: c++

    /// CreateArgumentAllocas - Create an alloca for each argument and register the
    /// argument in the symbol table so that references to it will succeed.
    void PrototypeAST::CreateArgumentAllocas(Function *F) {
      Function::arg_iterator AI = F->arg_begin();
      for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
        // Create an alloca for this variable.
        AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);

        // Store the initial value into the alloca.
        Builder.CreateStore(AI, Alloca);

        // Add arguments to variable symbol table.
        NamedValues[Args[Idx]] = Alloca;
      }
    }

For each argument, we make an alloca, store the input value to the
function into the alloca, and register the alloca as the memory location
for the argument. This method gets invoked by ``FunctionAST::Codegen``
right after it sets up the entry block for the function.

The final missing piece is adding the mem2reg pass, which allows us to
get good codegen once again:

.. code-block:: c++

    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
    // Promote allocas to registers.
    OurFPM.add(createPromoteMemoryToRegisterPass());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());

It is interesting to see what the code looks like before and after the
mem2reg optimization runs. For example, this is the before/after code
for our recursive fib function. Before the optimization:

.. code-block:: llvm

    define double @fib(double %x) {
    entry:
      %x1 = alloca double
      store double %x, double* %x1
      %x2 = load double* %x1
      %cmptmp = fcmp ult double %x2, 3.000000e+00
      %booltmp = uitofp i1 %cmptmp to double
      %ifcond = fcmp one double %booltmp, 0.000000e+00
      br i1 %ifcond, label %then, label %else

    then:       ; preds = %entry
      br label %ifcont

    else:       ; preds = %entry
      %x3 = load double* %x1
      %subtmp = fsub double %x3, 1.000000e+00
      %calltmp = call double @fib(double %subtmp)
      %x4 = load double* %x1
      %subtmp5 = fsub double %x4, 2.000000e+00
      %calltmp6 = call double @fib(double %subtmp5)
      %addtmp = fadd double %calltmp, %calltmp6
      br label %ifcont

    ifcont:     ; preds = %else, %then
      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
      ret double %iftmp
    }

Here there is only one variable (x, the input argument) but you can
still see the extremely simple-minded code generation strategy we are
using. In the entry block, an alloca is created, and the initial input
value is stored into it. Each reference to the variable does a reload
from the stack. Also, note that we didn't modify the if/then/else
expression, so it still inserts a PHI node. While we could make an
alloca for it, it is actually easier to create a PHI node for it, so we
still just make the PHI.

Here is the code after the mem2reg pass runs:

.. code-block:: llvm

    define double @fib(double %x) {
    entry:
      %cmptmp = fcmp ult double %x, 3.000000e+00
      %booltmp = uitofp i1 %cmptmp to double
      %ifcond = fcmp one double %booltmp, 0.000000e+00
      br i1 %ifcond, label %then, label %else

    then:
      br label %ifcont

    else:
      %subtmp = fsub double %x, 1.000000e+00
      %calltmp = call double @fib(double %subtmp)
      %subtmp5 = fsub double %x, 2.000000e+00
      %calltmp6 = call double @fib(double %subtmp5)
      %addtmp = fadd double %calltmp, %calltmp6
      br label %ifcont

    ifcont:     ; preds = %else, %then
      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
      ret double %iftmp
    }

This is a trivial case for mem2reg, since there are no redefinitions of
the variable. The point of showing this is to calm your tension about
inserting such blatant inefficiencies :).

After the rest of the optimizers run, we get:

.. code-block:: llvm

    define double @fib(double %x) {
    entry:
      %cmptmp = fcmp ult double %x, 3.000000e+00
      %booltmp = uitofp i1 %cmptmp to double
      %ifcond = fcmp ueq double %booltmp, 0.000000e+00
      br i1 %ifcond, label %else, label %ifcont

    else:
      %subtmp = fsub double %x, 1.000000e+00
      %calltmp = call double @fib(double %subtmp)
      %subtmp5 = fsub double %x, 2.000000e+00
      %calltmp6 = call double @fib(double %subtmp5)
      %addtmp = fadd double %calltmp, %calltmp6
      ret double %addtmp

    ifcont:
      ret double 1.000000e+00
    }

Here we see that the simplifycfg pass decided to clone the return
instruction into the end of the 'else' block. This allowed it to
eliminate some branches and the PHI node.

Now that all symbol table references are updated to use stack variables,
we'll add the assignment operator.

New Assignment Operator
=======================

With our current framework, adding a new assignment operator is really
simple. We will parse it just like any other binary operator, but handle
it internally (instead of allowing the user to define it). The first
step is to set a precedence:

.. code-block:: c++

    int main() {
      // Install standard binary operators.
      // 1 is lowest precedence.
      BinopPrecedence['='] = 2;
      BinopPrecedence['<'] = 10;
      BinopPrecedence['+'] = 20;
      BinopPrecedence['-'] = 20;

Now that the parser knows the precedence of the binary operator, it
takes care of all the parsing and AST generation. We just need to
implement codegen for the assignment operator. This looks like:

.. code-block:: c++

    Value *BinaryExprAST::Codegen() {
      // Special case '=' because we don't want to emit the LHS as an expression.
      if (Op == '=') {
        // Assignment requires the LHS to be an identifier.
        VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
        if (!LHSE)
          return ErrorV("destination of '=' must be a variable");

Unlike the rest of the binary operators, our assignment operator doesn't
follow the "emit LHS, emit RHS, do computation" model. As such, it is
handled as a special case before the other binary operators are handled.
The other strange thing is that it requires the LHS to be a variable. It
is invalid to have "(x+1) = expr" - only things like "x = expr" are
allowed.

.. code-block:: c++

        // Codegen the RHS.
        Value *Val = RHS->Codegen();
        if (Val == 0) return 0;

        // Look up the name.
        Value *Variable = NamedValues[LHSE->getName()];
        if (Variable == 0) return ErrorV("Unknown variable name");

        Builder.CreateStore(Val, Variable);
        return Val;
      }
      ...

Once we have the variable, codegen'ing the assignment is
straightforward: we emit the RHS of the assignment, create a store, and
return the computed value. Returning a value allows for chained
assignments like "X = (Y = Z)".

Now that we have an assignment operator, we can mutate loop variables
and arguments. For example, we can now run code like this:

::

    # Function to print a double.
    extern printd(x);

    # Define ':' for sequencing: as a low-precedence operator that ignores operands
    # and just returns the RHS.
    def binary : 1 (x y) y;

    def test(x)
      printd(x) :
      x = 4 :
      printd(x);

    test(123);

When run, this example prints "123" and then "4", showing that we did
actually mutate the value! Okay, we have now officially implemented our
goal: getting this to work requires SSA construction in the general
case. However, to be really useful, we want the ability to define our
own local variables. Let's add this next!

User-defined Local Variables
============================

Adding var/in is just like any other extension we made to Kaleidoscope:
we extend the lexer, the parser, the AST and the code generator. The
first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:

.. code-block:: c++

    enum Token {
      ...
      // var definition
      tok_var = -13
      ...
    };
    ...
    static int gettok() {
      ...
      if (IdentifierStr == "in") return tok_in;
      if (IdentifierStr == "binary") return tok_binary;
      if (IdentifierStr == "unary") return tok_unary;
      if (IdentifierStr == "var") return tok_var;
      return tok_identifier;
      ...

The next step is to define the AST node that we will construct. For
var/in, it looks like this:

.. code-block:: c++

    /// VarExprAST - Expression class for var/in
    class VarExprAST : public ExprAST {
      std::vector<std::pair<std::string, ExprAST*> > VarNames;
      ExprAST *Body;
    public:
      VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
                 ExprAST *body)
        : VarNames(varnames), Body(body) {}

      virtual Value *Codegen();
    };

var/in allows a list of names to be defined all at once, and each name
can optionally have an initializer value. As such, we capture this
information in the VarNames vector. Also, var/in has a body; this body
is allowed to access the variables defined by the var/in.

With this in place, we can define the parser pieces. The first thing we
do is add it as a primary expression:

.. code-block:: c++

    /// primary
    ///   ::= identifierexpr
    ///   ::= numberexpr
    ///   ::= parenexpr
    ///   ::= ifexpr
    ///   ::= forexpr
    ///   ::= varexpr
    static ExprAST *ParsePrimary() {
      switch (CurTok) {
      default: return Error("unknown token when expecting an expression");
      case tok_identifier: return ParseIdentifierExpr();
      case tok_number:     return ParseNumberExpr();
      case '(':            return ParseParenExpr();
      case tok_if:         return ParseIfExpr();
      case tok_for:        return ParseForExpr();
      case tok_var:        return ParseVarExpr();
      }
    }

Next we define ParseVarExpr:

.. code-block:: c++

    /// varexpr ::= 'var' identifier ('=' expression)?
    ///                   (',' identifier ('=' expression)?)* 'in' expression
    static ExprAST *ParseVarExpr() {
      getNextToken();  // eat the var.

      std::vector<std::pair<std::string, ExprAST*> > VarNames;

      // At least one variable name is required.
      if (CurTok != tok_identifier)
        return Error("expected identifier after var");

The first part of this code parses the list of identifier/expr pairs
into the local ``VarNames`` vector.

.. code-block:: c++

      while (1) {
        std::string Name = IdentifierStr;
        getNextToken();  // eat identifier.

        // Read the optional initializer.
        ExprAST *Init = 0;
        if (CurTok == '=') {
          getNextToken(); // eat the '='.

          Init = ParseExpression();
          if (Init == 0) return 0;
        }

        VarNames.push_back(std::make_pair(Name, Init));

        // End of var list, exit loop.
        if (CurTok != ',') break;
        getNextToken(); // eat the ','.

        if (CurTok != tok_identifier)
          return Error("expected identifier list after var");
      }

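The loop's shape — identifier, optional ``=`` plus initializer, then either a
comma that continues the list or anything else that ends it — can be exercised
in isolation. The following self-contained toy (a character-level sketch, not
the real token-based parser, and ``parseVarList`` is a hypothetical name) scans
a string like ``"a=1,b,c=3"`` into name/initializer pairs:

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Each result entry pairs a variable name with its initializer text, or ""
// when the initializer was omitted -- mirroring the null Init pointer above.
std::vector<std::pair<std::string, std::string>>
parseVarList(const std::string &Src) {
  std::vector<std::pair<std::string, std::string>> Result;
  size_t I = 0;
  while (I < Src.size()) {
    // Read an identifier.
    std::string Name;
    while (I < Src.size() && std::isalnum((unsigned char)Src[I]))
      Name += Src[I++];

    // Read the optional "=initializer" part.
    std::string Init;
    if (I < Src.size() && Src[I] == '=') {
      ++I; // eat the '='.
      while (I < Src.size() && Src[I] != ',')
        Init += Src[I++];
    }

    Result.push_back(std::make_pair(Name, Init));

    // End of var list, exit loop.
    if (I >= Src.size() || Src[I] != ',')
      break;
    ++I; // eat the ','.
  }
  return Result;
}
```

Feeding it ``"a=1,b,c=3"`` yields three pairs, with ``b`` carrying an empty
initializer, just as the real parser leaves ``Init`` null for ``b``.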
Once all the variables are parsed, we then parse the body and create the
AST node:

.. code-block:: c++

      // At this point, we have to have 'in'.
      if (CurTok != tok_in)
        return Error("expected 'in' keyword after 'var'");
      getNextToken();  // eat 'in'.

      ExprAST *Body = ParseExpression();
      if (Body == 0) return 0;

      return new VarExprAST(VarNames, Body);
    }

Now that we can parse and represent the code, we need to support
emission of LLVM IR for it. This code starts out with:

.. code-block:: c++

    Value *VarExprAST::Codegen() {
      std::vector<AllocaInst *> OldBindings;

      Function *TheFunction = Builder.GetInsertBlock()->getParent();

      // Register all variables and emit their initializer.
      for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
        const std::string &VarName = VarNames[i].first;
        ExprAST *Init = VarNames[i].second;

Basically it loops over all the variables, installing them one at a
time. For each variable we put into the symbol table, we remember the
previous value that it replaces in ``OldBindings``.

.. code-block:: c++

        // Emit the initializer before adding the variable to scope, this prevents
        // the initializer from referencing the variable itself, and permits stuff
        // like this:
        //  var a = 1 in
        //    var a = a in ...   # refers to outer 'a'.
        Value *InitVal;
        if (Init) {
          InitVal = Init->Codegen();
          if (InitVal == 0) return 0;
        } else { // If not specified, use 0.0.
          InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
        }

        AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
        Builder.CreateStore(InitVal, Alloca);

        // Remember the old variable binding so that we can restore the binding when
        // we unrecurse.
        OldBindings.push_back(NamedValues[VarName]);

        // Remember this binding.
        NamedValues[VarName] = Alloca;
      }

There are more comments here than code. The basic idea is that we emit
the initializer, create the alloca, then update the symbol table to
point to it. Once all the variables are installed in the symbol table,
we evaluate the body of the var/in expression:

.. code-block:: c++

      // Codegen the body, now that all vars are in scope.
      Value *BodyVal = Body->Codegen();
      if (BodyVal == 0) return 0;

Finally, before returning, we restore the previous variable bindings:

.. code-block:: c++

      // Pop all our variables from scope.
      for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
        NamedValues[VarNames[i].first] = OldBindings[i];

      // Return the body computation.
      return BodyVal;
    }

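The save/shadow/restore pattern is independent of LLVM itself, so it can be
sketched with a plain map. In this minimal model (an assumption-laden sketch:
plain ``int`` values stand in for the ``AllocaInst*`` bindings, and
``withBinding`` is a hypothetical helper, not tutorial code), installing a
binding, using it, and restoring the old one has exactly the shape of
``VarExprAST::Codegen``:

```cpp
#include <map>
#include <string>

// Looks a name up in the symbol table, returning 0 when unbound -- playing
// the role of a null AllocaInst* in the real code.
int lookup(const std::map<std::string, int> &NamedValues,
           const std::string &Name) {
  auto It = NamedValues.find(Name);
  return It == NamedValues.end() ? 0 : It->second;
}

// Installs Name=Value for the duration of the "body", then restores the
// previous binding, mirroring OldBindings in VarExprAST::Codegen.
int withBinding(std::map<std::string, int> &NamedValues,
                const std::string &Name, int Value) {
  int OldBinding = lookup(NamedValues, Name); // remember the old binding.
  NamedValues[Name] = Value;                  // install the new one.
  int BodyVal = lookup(NamedValues, Name);    // "evaluate the body" in scope.
  NamedValues[Name] = OldBinding;             // pop the scope on the way out.
  return BodyVal;
}
```

Inside the "body" the inner binding shadows the outer one; afterwards the
outer binding is visible again, which is exactly the nesting behavior the
``var a = 1 in var a = a in ...`` comment above relies on.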
The end result of all of this is that we get properly scoped variable
definitions, and we even (trivially) allow mutation of them :).

With this, we completed what we set out to do. Our nice iterative fib
example from the intro compiles and runs just fine. The mem2reg pass
optimizes all of our stack variables into SSA registers, inserting PHI
nodes where needed, and our front-end remains simple: no "iterated
dominance frontier" computation anywhere in sight.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
mutable variables and var/in support. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp
   :language: c++

`Next: Conclusion and other useful LLVM tidbits <LangImpl8.html>`_
