:orphan:

=======================================================
Kaleidoscope: Extending the Language: Mutable Variables
=======================================================

.. contents::
   :local:
Chapter 7 Introduction
======================

Welcome to Chapter 7 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
very respectable, albeit simple, `functional programming
language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
journey, we learned some parsing techniques, how to build and represent
an AST, how to build LLVM IR, and how to optimize the resultant code as
well as JIT compile it.

While Kaleidoscope is interesting as a functional language, the fact
that it is functional makes it "too easy" to generate LLVM IR for it. In
particular, a functional language makes it very easy to build LLVM IR
directly in `SSA
form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
Since LLVM requires that the input code be in SSA form, this is a very
nice property. Newcomers, however, are often unsure how to generate code
for an imperative language with mutable variables.

The short (and happy) summary of this chapter is that there is no need
for your front-end to build SSA form: LLVM provides highly tuned and
well-tested support for this, though the way it works is a bit
unexpected for some.
Why is this a hard problem?
===========================

To understand why mutable variables cause complexities in SSA
construction, consider this extremely simple C example:

.. code-block:: c

    int G, H;
    int test(_Bool Condition) {
      int X;
      if (Condition)
        X = G;
      else
        X = H;
      return X;
    }

In this case, we have the variable "X", whose value depends on the path
executed in the program. Because there are two different possible values
for X before the return instruction, a PHI node is inserted to merge the
two values. The LLVM IR that we want for this example looks like this:
.. code-block:: llvm

    @G = weak global i32 0   ; type of @G is i32*
    @H = weak global i32 0   ; type of @H is i32*

    define i32 @test(i1 %Condition) {
    entry:
      br i1 %Condition, label %cond_true, label %cond_false

    cond_true:
      %X.0 = load i32, i32* @G
      br label %cond_next

    cond_false:
      %X.1 = load i32, i32* @H
      br label %cond_next

    cond_next:
      %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
      ret i32 %X.2
    }

In this example, the loads from the G and H global variables are
explicit in the LLVM IR, and they live in the then/else branches of the
if statement (cond_true/cond_false). In order to merge the incoming
values, the X.2 phi node in the cond_next block selects the right value
to use based on where control flow is coming from: if control flow comes
from the cond_false block, X.2 gets the value of X.1. Alternatively, if
control flow comes from cond_true, it gets the value of X.0. The intent
of this chapter is not to explain the details of SSA form. For more
information, see one of the many `online
references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.

The question for this article is "who places the phi nodes when lowering
assignments to mutable variables?". The issue here is that LLVM
*requires* that its IR be in SSA form: there is no "non-SSA" mode for
it. However, SSA construction requires non-trivial algorithms and data
structures, so it is inconvenient and wasteful for every front-end to
have to reproduce this logic.
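
To give a sense of the machinery a front-end would otherwise need, here is a
small, self-contained C++ sketch (invented for illustration; none of these
names come from LLVM's API) of the classic iterative dominator computation on
the diamond CFG above. Real SSA construction builds dominance frontiers on top
of exactly this kind of analysis to decide where phi nodes belong:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Compute dominator sets for a tiny CFG by iterating
// Dom(n) = {n} union intersection-of Dom(p) over predecessors p
// until a fixpoint is reached.
std::map<std::string, std::set<std::string>>
computeDominators(const std::map<std::string, std::vector<std::string>> &Preds,
                  const std::string &Entry) {
  std::set<std::string> All;
  for (auto &KV : Preds)
    All.insert(KV.first);

  // The entry dominates only itself initially; everything else starts at All.
  std::map<std::string, std::set<std::string>> Dom;
  for (auto &N : All)
    Dom[N] = (N == Entry) ? std::set<std::string>{Entry} : All;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (auto &N : All) {
      if (N == Entry)
        continue;
      // Intersect the dominator sets of all predecessors.
      std::set<std::string> NewDom = All;
      for (auto &P : Preds.at(N)) {
        std::set<std::string> Tmp;
        for (auto &D : NewDom)
          if (Dom[P].count(D))
            Tmp.insert(D);
        NewDom = Tmp;
      }
      NewDom.insert(N); // every block dominates itself
      if (NewDom != Dom[N]) {
        Dom[N] = NewDom;
        Changed = true;
      }
    }
  }
  return Dom;
}
```

For the diamond above, the merge block cond_next is dominated only by the
entry block and itself; neither branch dominates it, which is precisely why X
needs a phi node there.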
Memory in LLVM
==============

The 'trick' here is that while LLVM does require all register values to
be in SSA form, it does not require (or permit) memory objects to be in
SSA form. In the example above, note that the loads from G and H are
direct accesses to G and H: they are not renamed or versioned. This
differs from some other compiler systems, which do try to version memory
objects. In LLVM, instead of encoding dataflow analysis of memory into
the LLVM IR, it is handled with `Analysis
Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.

With this in mind, the high-level idea is that we want to make a stack
variable (which lives in memory, because it is on the stack) for each
mutable object in a function. To take advantage of this trick, we need
to talk about how LLVM represents stack variables.

In LLVM, all memory accesses are explicit with load/store instructions,
and it is carefully designed not to have (or need) an "address-of"
operator. Notice how the type of the @G/@H global variables is actually
"i32\*" even though the variable is defined as "i32". What this means is
that @G defines *space* for an i32 in the global data area, but its
*name* actually refers to the address for that space. Stack variables
work the same way, except that instead of being declared with global
variable definitions, they are declared with the `LLVM alloca
instruction <../LangRef.html#alloca-instruction>`_:

.. code-block:: llvm

    define i32 @example() {
    entry:
      %X = alloca i32           ; type of %X is i32*.
      ...
      %tmp = load i32, i32* %X  ; load the stack value %X from the stack.
      %tmp2 = add i32 %tmp, 1   ; increment it
      store i32 %tmp2, i32* %X  ; store it back
      ...

This code shows an example of how you can declare and manipulate a stack
variable in the LLVM IR. Stack memory allocated with the alloca
instruction is fully general: you can pass the address of the stack slot
to functions, you can store it in other variables, etc. In our example
above, we could rewrite the example to use the alloca technique to avoid
using a PHI node:
.. code-block:: llvm

    @G = weak global i32 0   ; type of @G is i32*
    @H = weak global i32 0   ; type of @H is i32*

    define i32 @test(i1 %Condition) {
    entry:
      %X = alloca i32           ; type of %X is i32*.
      br i1 %Condition, label %cond_true, label %cond_false

    cond_true:
      %X.0 = load i32, i32* @G
      store i32 %X.0, i32* %X   ; Update X
      br label %cond_next

    cond_false:
      %X.1 = load i32, i32* @H
      store i32 %X.1, i32* %X   ; Update X
      br label %cond_next

    cond_next:
      %X.2 = load i32, i32* %X  ; Read X
      ret i32 %X.2
    }

With this, we have discovered a way to handle arbitrary mutable
variables without the need to create Phi nodes at all:

#. Each mutable variable becomes a stack allocation.
#. Each read of the variable becomes a load from the stack.
#. Each update of the variable becomes a store to the stack.
#. Taking the address of a variable just uses the stack address
   directly.

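The four rules above can be sketched as a deliberately tiny C++ model
(the ``FrontEnd``/``Slots`` names are invented for illustration, and a plain
owned double stands in for a real stack slot):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model of the lowering recipe: every mutable variable gets a
// "stack slot" (here just an owned double), and all reads and writes
// go through that slot, exactly as loads/stores would in LLVM IR.
struct FrontEnd {
  std::map<std::string, std::unique_ptr<double>> Slots;

  double *allocaSlot(const std::string &Name) {    // rule 1: stack allocation
    Slots[Name] = std::make_unique<double>(0.0);
    return Slots[Name].get();
  }
  double load(const std::string &Name) {           // rule 2: read = load
    return *Slots.at(Name);
  }
  void store(const std::string &Name, double V) {  // rule 3: update = store
    *Slots.at(Name) = V;
  }
  double *addressOf(const std::string &Name) {     // rule 4: address-of
    return Slots.at(Name).get();
  }
};
```

Nothing here is clever, and that is the point: the front-end never has to
reason about control flow at all.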
While this solution has solved our immediate problem, it introduced
another one: we have now apparently introduced a lot of stack traffic
for very simple and common operations, a major performance problem.
Fortunately for us, the LLVM optimizer has a highly-tuned optimization
pass named "mem2reg" that handles this case, promoting allocas like this
into SSA registers, inserting Phi nodes as appropriate. If you run this
example through the pass, for example, you'll get:

.. code-block:: bash

    $ llvm-as < example.ll | opt -mem2reg | llvm-dis
    @G = weak global i32 0
    @H = weak global i32 0

    define i32 @test(i1 %Condition) {
    entry:
      br i1 %Condition, label %cond_true, label %cond_false

    cond_true:
      %X.0 = load i32, i32* @G
      br label %cond_next

    cond_false:
      %X.1 = load i32, i32* @H
      br label %cond_next

    cond_next:
      %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
      ret i32 %X.01
    }

The mem2reg pass implements the standard "iterated dominance frontier"
algorithm for constructing SSA form and has a number of optimizations
that speed up (very common) degenerate cases. The mem2reg optimization
pass is the answer to dealing with mutable variables, and we highly
recommend that you depend on it. Note that mem2reg only works on
variables in certain circumstances:

#. mem2reg is alloca-driven: it looks for allocas and if it can handle
   them, it promotes them. It does not apply to global variables or heap
   allocations.
#. mem2reg only looks for alloca instructions in the entry block of the
   function. Being in the entry block guarantees that the alloca is only
   executed once, which makes analysis simpler.
#. mem2reg only promotes allocas whose uses are direct loads and stores.
   If the address of the stack object is passed to a function, or if any
   funny pointer arithmetic is involved, the alloca will not be
   promoted.
#. mem2reg only works on allocas of `first
   class <../LangRef.html#first-class-types>`_ values (such as pointers,
   scalars and vectors), and only if the array size of the allocation is
   1 (or missing in the .ll file). mem2reg is not capable of promoting
   structs or arrays to registers. Note that the "sroa" pass is
   more powerful and can promote structs, "unions", and arrays in many
   cases.

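As a rough mental model only (the field names below are invented for
illustration and do not correspond to LLVM's real API), the conditions above
amount to a simple predicate over each alloca:

```cpp
#include <cassert>

// Hypothetical per-alloca summary mirroring the four conditions listed
// above; this is a teaching aid, not LLVM's actual data structure.
struct AllocaInfo {
  bool InEntryBlock;        // condition 2: alloca appears in the entry block
  bool OnlyDirectLoadStore; // condition 3: address never escapes or is offset
  bool FirstClassType;      // condition 4: scalar/pointer/vector element type
  unsigned ArraySize;       // condition 4: array size of the allocation
};

bool isPromotable(const AllocaInfo &AI) {
  return AI.InEntryBlock && AI.OnlyDirectLoadStore &&
         AI.FirstClassType && AI.ArraySize == 1;
}
```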
All of these properties are easy to satisfy for most imperative
languages, and we'll illustrate this below with Kaleidoscope. The final
question you may be asking is: should I bother with this nonsense for my
front-end? Wouldn't it be better if I just did SSA construction
directly, avoiding use of the mem2reg optimization pass? In short, we
strongly recommend that you use this technique for building SSA form,
unless there is an extremely good reason not to. Using this technique
is:

- Proven and well tested: clang uses this technique
  for local mutable variables. As such, the most common clients of LLVM
  are using this to handle the bulk of their variables. You can be sure
  that bugs are found fast and fixed early.
- Extremely fast: mem2reg has a number of special cases that make it
  fast in common cases as well as fully general. For example, it has
  fast-paths for variables that are only used in a single block,
  variables that only have one assignment point, good heuristics to
  avoid insertion of unneeded phi nodes, etc.
- Needed for debug info generation: `Debug information in
  LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
  the variable exposed so that debug info can be attached to it. This
  technique dovetails very naturally with this style of debug info.

If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with
mutable variables now!
Mutable Variables in Kaleidoscope
=================================

Now that we know the sort of problem we want to tackle, let's see what
this looks like in the context of our little Kaleidoscope language.
We're going to add two features:

#. The ability to mutate variables with the '=' operator.
#. The ability to define new variables.

While the first item is really what this is about, we only have
variables for incoming arguments as well as for induction variables, and
redefining those only goes so far :). Also, the ability to define new
variables is a useful thing regardless of whether you will be mutating
them. Here's a motivating example that shows how we could use these:

::

    # Define ':' for sequencing: as a low-precedence operator that ignores operands
    # and just returns the RHS.
    def binary : 1 (x y) y;

    # Recursive fib, we could do this before.
    def fib(x)
      if (x < 3) then
        1
      else
        fib(x-1)+fib(x-2);

    # Iterative fib.
    def fibi(x)
      var a = 1, b = 1, c in
      (for i = 3, i < x in
         c = a + b :
         a = b :
         b = c) :
      b;

    # Call it.
    fibi(10);

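For comparison, the iterative ``fibi`` above corresponds to this ordinary C++
loop, under the usual convention fib(1) = fib(2) = 1 (Kaleidoscope's 'for'
tests its end condition after running the body, so the body also runs for the
final index, which is why the C++ loop uses ``i <= x``):

```cpp
#include <cassert>

// C++ equivalent of the Kaleidoscope 'fibi' above: a and b walk the
// Fibonacci sequence, c holds the next value.
double fibi(int x) {
  double a = 1, b = 1, c = 0;
  for (int i = 3; i <= x; ++i) {
    c = a + b;
    a = b;
    b = c;
  }
  return b;
}
```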
In order to mutate variables, we have to change our existing variables
to use the "alloca trick". Once we have that, we'll add our new
operator, then extend Kaleidoscope to support new variable definitions.

Adjusting Existing Variables for Mutation
=========================================

The symbol table in Kaleidoscope is managed at code generation time by
the '``NamedValues``' map. This map currently keeps track of the LLVM
"Value\*" that holds the double value for the named variable. In order
to support mutation, we need to change this slightly, so that
``NamedValues`` holds the *memory location* of the variable in question.
Note that this change is a refactoring: it changes the structure of the
code, but does not (by itself) change the behavior of the compiler. All
of these changes are isolated in the Kaleidoscope code generator.

At this point in Kaleidoscope's development, it only supports variables
for two things: incoming arguments to functions and the induction
variable of 'for' loops. For consistency, we'll allow mutation of these
variables in addition to other user-defined variables. This means that
these will both need memory locations.

To start our transformation of Kaleidoscope, we'll change the
NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once
we do this, the C++ compiler will tell us what parts of the code we need
to update:

.. code-block:: c++

    static std::map<std::string, AllocaInst*> NamedValues;

Also, since we will need to create these allocas, we'll use a helper
function that ensures that the allocas are created in the entry block of
the function:

.. code-block:: c++

    /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
    /// the function.  This is used for mutable variables etc.
    static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                              const std::string &VarName) {
      IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
                       TheFunction->getEntryBlock().begin());
      return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), 0,
                               VarName.c_str());
    }

This funny looking code creates an IRBuilder object that is pointing at
the first instruction (.begin()) of the entry block. It then creates an
alloca with the expected name and returns it. Because all values in
Kaleidoscope are doubles, there is no need to pass in a type to use.
With this in place, the first functionality change we want to make is to
variable references. In our new scheme, variables live on the stack, so
code generating a reference to them actually needs to produce a load
from the stack slot:

.. code-block:: c++

    Value *VariableExprAST::codegen() {
      // Look this variable up in the function.
      Value *V = NamedValues[Name];
      if (!V)
        return LogErrorV("Unknown variable name");

      // Load the value.
      return Builder.CreateLoad(V, Name.c_str());
    }

As you can see, this is pretty straightforward. Now we need to update
the things that define the variables to set up the alloca. We'll start
with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for
the unabridged code):

.. code-block:: c++

    Function *TheFunction = Builder.GetInsertBlock()->getParent();

    // Create an alloca for the variable in the entry block.
    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);

    // Emit the start code first, without 'variable' in scope.
    Value *StartVal = Start->codegen();
    if (!StartVal)
      return nullptr;

    // Store the value into the alloca.
    Builder.CreateStore(StartVal, Alloca);
    ...

    // Compute the end condition.
    Value *EndCond = End->codegen();
    if (!EndCond)
      return nullptr;

    // Reload, increment, and restore the alloca.  This handles the case where
    // the body of the loop mutates the variable.
    Value *CurVar = Builder.CreateLoad(Alloca);
    Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
    Builder.CreateStore(NextVar, Alloca);
    ...

This code is virtually identical to the code `before we allowed mutable
variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we
no longer have to construct a PHI node, and we use load/store to access
the variable as needed.
To support mutable argument variables, we need to also make allocas for
them. The code for this is also pretty simple:

.. code-block:: c++

    Function *FunctionAST::codegen() {
      ...
      Builder.SetInsertPoint(BB);

      // Record the function arguments in the NamedValues map.
      NamedValues.clear();
      for (auto &Arg : TheFunction->args()) {
        // Create an alloca for this variable.
        AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());

        // Store the initial value into the alloca.
        Builder.CreateStore(&Arg, Alloca);

        // Add arguments to variable symbol table.
        NamedValues[Arg.getName()] = Alloca;
      }

      if (Value *RetVal = Body->codegen()) {
        ...

For each argument, we make an alloca, store the input value to the
function into the alloca, and register the alloca as the memory location
for the argument. This happens in ``FunctionAST::codegen()`` right after
it sets up the entry block for the function.

The final missing piece is adding the mem2reg pass, which allows us to
get good codegen once again:

.. code-block:: c++

    // Promote allocas to registers.
    TheFPM->add(createPromoteMemoryToRegisterPass());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    TheFPM->add(createInstructionCombiningPass());
    // Reassociate expressions.
    TheFPM->add(createReassociatePass());
    ...

It is interesting to see what the code looks like before and after the
mem2reg optimization runs. For example, this is the before/after code
for our recursive fib function. Before the optimization:
.. code-block:: llvm

    define double @fib(double %x) {
    entry:
      %x1 = alloca double
      store double %x, double* %x1
      %x2 = load double, double* %x1
      %cmptmp = fcmp ult double %x2, 3.000000e+00
      %booltmp = uitofp i1 %cmptmp to double
      %ifcond = fcmp one double %booltmp, 0.000000e+00
      br i1 %ifcond, label %then, label %else

    then:       ; preds = %entry
      br label %ifcont

    else:       ; preds = %entry
      %x3 = load double, double* %x1
      %subtmp = fsub double %x3, 1.000000e+00
      %calltmp = call double @fib(double %subtmp)
      %x4 = load double, double* %x1
      %subtmp5 = fsub double %x4, 2.000000e+00
      %calltmp6 = call double @fib(double %subtmp5)
      %addtmp = fadd double %calltmp, %calltmp6
      br label %ifcont

    ifcont:     ; preds = %else, %then
      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
      ret double %iftmp
    }

Here there is only one variable (x, the input argument) but you can
still see the extremely simple-minded code generation strategy we are
using. In the entry block, an alloca is created, and the initial input
value is stored into it. Each reference to the variable does a reload
from the stack. Also, note that we didn't modify the if/then/else
expression, so it still inserts a PHI node. While we could make an
alloca for it, it is actually easier to create a PHI node for it, so we
still just make the PHI.

Here is the code after the mem2reg pass runs:
.. code-block:: llvm

    define double @fib(double %x) {
    entry:
      %cmptmp = fcmp ult double %x, 3.000000e+00
      %booltmp = uitofp i1 %cmptmp to double
      %ifcond = fcmp one double %booltmp, 0.000000e+00
      br i1 %ifcond, label %then, label %else

    then:
      br label %ifcont

    else:
      %subtmp = fsub double %x, 1.000000e+00
      %calltmp = call double @fib(double %subtmp)
      %subtmp5 = fsub double %x, 2.000000e+00
      %calltmp6 = call double @fib(double %subtmp5)
      %addtmp = fadd double %calltmp, %calltmp6
      br label %ifcont

    ifcont:     ; preds = %else, %then
      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
      ret double %iftmp
    }

This is a trivial case for mem2reg, since there are no redefinitions of
the variable. The point of showing this is to calm your tension about
inserting such blatant inefficiencies :).

After the rest of the optimizers run, we get:

.. code-block:: llvm
    define double @fib(double %x) {
    entry:
      %cmptmp = fcmp ult double %x, 3.000000e+00
      %booltmp = uitofp i1 %cmptmp to double
      %ifcond = fcmp ueq double %booltmp, 0.000000e+00
      br i1 %ifcond, label %else, label %ifcont

    else:
      %subtmp = fsub double %x, 1.000000e+00
      %calltmp = call double @fib(double %subtmp)
      %subtmp5 = fsub double %x, 2.000000e+00
      %calltmp6 = call double @fib(double %subtmp5)
      %addtmp = fadd double %calltmp, %calltmp6
      ret double %addtmp

    ifcont:
      ret double 1.000000e+00
    }

Here we see that the simplifycfg pass decided to clone the return
instruction into the end of the 'else' block. This allowed it to
eliminate some branches and the PHI node.

Now that all symbol table references are updated to use stack variables,
we'll add the assignment operator.

New Assignment Operator
=======================

With our current framework, adding a new assignment operator is really
simple. We will parse it just like any other binary operator, but handle
it internally (instead of allowing the user to define it). The first
step is to set a precedence:

.. code-block:: c++

    int main() {
      // Install standard binary operators.
      // 1 is lowest precedence.
      BinopPrecedence['='] = 2;
      BinopPrecedence['<'] = 10;
      BinopPrecedence['+'] = 20;
      BinopPrecedence['-'] = 20;

Now that the parser knows the precedence of the binary operator, it
takes care of all the parsing and AST generation. We just need to
implement codegen for the assignment operator. This looks like:

.. code-block:: c++

    Value *BinaryExprAST::codegen() {
      // Special case '=' because we don't want to emit the LHS as an expression.
      if (Op == '=') {
        // Assignment requires the LHS to be an identifier.
        VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());
        if (!LHSE)
          return LogErrorV("destination of '=' must be a variable");

Unlike the rest of the binary operators, our assignment operator doesn't
follow the "emit LHS, emit RHS, do computation" model. As such, it is
handled as a special case before the other binary operators are handled.
The other strange thing is that it requires the LHS to be a variable. It
is invalid to have "(x+1) = expr" - only things like "x = expr" are
allowed.

.. code-block:: c++

        // Codegen the RHS.
        Value *Val = RHS->codegen();
        if (!Val)
          return nullptr;

        // Look up the name.
        Value *Variable = NamedValues[LHSE->getName()];
        if (!Variable)
          return LogErrorV("Unknown variable name");

        Builder.CreateStore(Val, Variable);
        return Val;
      }
      ...

Once we have the variable, codegen'ing the assignment is
straightforward: we emit the RHS of the assignment, create a store, and
return the computed value. Returning a value allows for chained
assignments like "X = (Y = Z)".

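The "return the stored value" convention is easy to see in a tiny C++ model of
assignment (the names here are invented for illustration, with a plain map
standing in for the allocas):

```cpp
#include <cassert>
#include <map>
#include <string>

// Each variable's "slot"; a stand-in for the real alloca-backed storage.
std::map<std::string, double> Slots;

// Store into the variable's slot and return the stored value; returning
// the value is exactly what makes "X = (Y = Z)" work.
double assign(const std::string &Name, double V) {
  Slots[Name] = V;
  return V;
}
```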
Now that we have an assignment operator, we can mutate loop variables
and arguments. For example, we can now run code like this:

::

    # Function to print a double.
    extern printd(x);

    # Define ':' for sequencing: as a low-precedence operator that ignores operands
    # and just returns the RHS.
    def binary : 1 (x y) y;

    def test(x)
      printd(x) :
      x = 4 :
      printd(x);

    test(123);

When run, this example prints "123" and then "4", showing that we did
actually mutate the value! Okay, we have now officially implemented our
goal: getting this to work requires SSA construction in the general
case. However, to be really useful, we want the ability to define our
own local variables. Let's add this next!
User-defined Local Variables
============================

Adding var/in is just like any other extension we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code
generator. The first step for adding our new 'var/in' construct is to
extend the lexer. As before, this is pretty trivial; the code looks like
this:

.. code-block:: c++

    enum Token {
      ...
      // var definition
      tok_var = -13
    ...
    }
    ...
    static int gettok() {
    ...
        if (IdentifierStr == "in")
          return tok_in;
        if (IdentifierStr == "binary")
          return tok_binary;
        if (IdentifierStr == "unary")
          return tok_unary;
        if (IdentifierStr == "var")
          return tok_var;
        return tok_identifier;
    ...

The next step is to define the AST node that we will construct. For
var/in, it looks like this:

.. code-block:: c++

    /// VarExprAST - Expression class for var/in
    class VarExprAST : public ExprAST {
      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
      std::unique_ptr<ExprAST> Body;

    public:
      VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
                 std::unique_ptr<ExprAST> Body)
          : VarNames(std::move(VarNames)), Body(std::move(Body)) {}

      Value *codegen() override;
    };

var/in allows a list of names to be defined all at once, and each name
can optionally have an initializer value. As such, we capture this
information in the VarNames vector. Also, var/in has a body; this body
is allowed to access the variables defined by the var/in.

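Before diving into the parser and codegen, it is worth seeing the scoping
discipline var/in needs in isolation: install the new bindings, run the body,
then restore whatever was shadowed. Here is a small self-contained C++ sketch
of that idea (the names are invented for illustration; the real codegen below
operates on ``NamedValues``' AllocaInst\* entries, not plain doubles):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the symbol table; real code maps names to AllocaInst*.
std::map<std::string, double> NamedVals;

// Install Vars, evaluate Body with them visible, then undo: erase the
// brand-new names and put back the values they shadowed.
double withBindings(const std::vector<std::pair<std::string, double>> &Vars,
                    double (*Body)()) {
  std::vector<std::pair<std::string, double>> Old;
  std::vector<std::string> Fresh;
  for (auto &V : Vars) {
    auto It = NamedVals.find(V.first);
    if (It != NamedVals.end())
      Old.push_back(*It);       // shadowed: remember the previous value
    else
      Fresh.push_back(V.first); // brand-new name
    NamedVals[V.first] = V.second;
  }

  double Result = Body();       // the body sees the new bindings

  for (auto &N : Fresh)
    NamedVals.erase(N);
  for (auto &O : Old)
    NamedVals[O.first] = O.second; // unshadow
  return Result;
}
```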
| 695 | With this in place, we can define the parser pieces. The first thing we |
| 696 | do is add it as a primary expression: |
| 697 | |
| 698 | .. code-block:: c++ |
| 699 | |
| 700 | /// primary |
| 701 | /// ::= identifierexpr |
| 702 | /// ::= numberexpr |
| 703 | /// ::= parenexpr |
| 704 | /// ::= ifexpr |
| 705 | /// ::= forexpr |
| 706 | /// ::= varexpr |
| 707 | static std::unique_ptr<ExprAST> ParsePrimary() { |
| 708 | switch (CurTok) { |
| 709 | default: |
| 710 | return LogError("unknown token when expecting an expression"); |
| 711 | case tok_identifier: |
| 712 | return ParseIdentifierExpr(); |
| 713 | case tok_number: |
| 714 | return ParseNumberExpr(); |
| 715 | case '(': |
| 716 | return ParseParenExpr(); |
| 717 | case tok_if: |
| 718 | return ParseIfExpr(); |
| 719 | case tok_for: |
| 720 | return ParseForExpr(); |
| 721 | case tok_var: |
| 722 | return ParseVarExpr(); |
| 723 | } |
| 724 | } |

Next we define ParseVarExpr:

.. code-block:: c++

  /// varexpr ::= 'var' identifier ('=' expression)?
  ///                    (',' identifier ('=' expression)?)* 'in' expression
  static std::unique_ptr<ExprAST> ParseVarExpr() {
    getNextToken(); // eat the var.

    std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;

    // At least one variable name is required.
    if (CurTok != tok_identifier)
      return LogError("expected identifier after var");

The first part of this code parses the list of identifier/expr pairs
into the local ``VarNames`` vector.

.. code-block:: c++

    while (true) {
      std::string Name = IdentifierStr;
      getNextToken(); // eat identifier.

      // Read the optional initializer.
      std::unique_ptr<ExprAST> Init;
      if (CurTok == '=') {
        getNextToken(); // eat the '='.

        Init = ParseExpression();
        if (!Init)
          return nullptr;
      }

      VarNames.push_back(std::make_pair(Name, std::move(Init)));

      // End of var list, exit loop.
      if (CurTok != ',')
        break;
      getNextToken(); // eat the ','.

      if (CurTok != tok_identifier)
        return LogError("expected identifier list after var");
    }
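
To see the shape of this loop in isolation, here is a self-contained model
that walks a pre-tokenized list instead of the tutorial's lexer. The ``Tok``
struct and ``parseVarList`` helper are hypothetical, purely for
illustration; only the loop structure mirrors the code above:

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy tokens: Kind is 'i' (identifier), '=' , ',' , or 'n' (number).
  struct Tok { char Kind; std::string Text; };

  // Collect (name, initializer-text) pairs, "" meaning no initializer,
  // following the same identifier / optional '=' expr / ',' rhythm as
  // ParseVarExpr.
  std::vector<std::pair<std::string, std::string>>
  parseVarList(const std::vector<Tok> &Toks) {
    std::vector<std::pair<std::string, std::string>> Vars;
    size_t i = 0;
    while (i < Toks.size() && Toks[i].Kind == 'i') {
      std::string Name = Toks[i++].Text; // eat identifier.
      std::string Init;
      if (i < Toks.size() && Toks[i].Kind == '=') {
        ++i;                             // eat the '='.
        Init = Toks[i++].Text;           // eat the initializer.
      }
      Vars.emplace_back(Name, Init);
      if (i >= Toks.size() || Toks[i].Kind != ',')
        break;                           // end of var list, exit loop.
      ++i;                               // eat the ','.
    }
    return Vars;
  }

Feeding it the tokens of ``a = 1, b, c = 3`` yields three pairs, with ``b``
getting the empty "no initializer" marker.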

Once all the variables are parsed, we then parse the body and create the
AST node:

.. code-block:: c++

    // At this point, we have to have 'in'.
    if (CurTok != tok_in)
      return LogError("expected 'in' keyword after 'var'");
    getNextToken(); // eat 'in'.

    auto Body = ParseExpression();
    if (!Body)
      return nullptr;

    return std::make_unique<VarExprAST>(std::move(VarNames),
                                        std::move(Body));
  }

Now that we can parse and represent the code, we need to support
emission of LLVM IR for it. This code starts out with:

.. code-block:: c++

  Value *VarExprAST::codegen() {
    std::vector<AllocaInst *> OldBindings;

    Function *TheFunction = Builder.GetInsertBlock()->getParent();

    // Register all variables and emit their initializer.
    for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
      const std::string &VarName = VarNames[i].first;
      ExprAST *Init = VarNames[i].second.get();

This code loops over all the variables, installing them one at a time.
For each variable we put into the symbol table, we remember the
previous value that we replace in OldBindings.

.. code-block:: c++

      // Emit the initializer before adding the variable to scope; this
      // prevents the initializer from referencing the variable itself, and
      // permits stuff like this:
      //  var a = 1 in
      //    var a = a in ...   # refers to outer 'a'.
      Value *InitVal;
      if (Init) {
        InitVal = Init->codegen();
        if (!InitVal)
          return nullptr;
      } else { // If not specified, use 0.0.
        InitVal = ConstantFP::get(TheContext, APFloat(0.0));
      }

      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
      Builder.CreateStore(InitVal, Alloca);

      // Remember the old variable binding so that we can restore the binding
      // when we unrecurse.
      OldBindings.push_back(NamedValues[VarName]);

      // Remember this binding.
      NamedValues[VarName] = Alloca;
    }

There are more comments here than code. The basic idea is that we emit
the initializer, create the alloca, then update the symbol table to
point to it. Once all the variables are installed in the symbol table,
we evaluate the body of the var/in expression:

.. code-block:: c++

    // Codegen the body, now that all vars are in scope.
    Value *BodyVal = Body->codegen();
    if (!BodyVal)
      return nullptr;

Finally, before returning, we restore the previous variable bindings:

.. code-block:: c++

    // Pop all our variables from scope.
    for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
      NamedValues[VarNames[i].first] = OldBindings[i];

    // Return the body computation.
    return BodyVal;
  }

The end result of all of this is that we get properly scoped variable
definitions, and we even (trivially) allow mutation of them :).
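
The scoping discipline itself is easy to see in isolation. Below is a
self-contained sketch of the same save/overwrite/restore pattern, with
``int`` standing in for ``AllocaInst*`` (0 playing the role of the null
"no previous binding" entry); the ``evalWithBinding`` helper is
hypothetical, for illustration only:

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>

  // Symbol table, as in the tutorial, but mapping to int instead of
  // AllocaInst*.
  std::map<std::string, int> NamedValues;

  // Mirror of VarExprAST::codegen's scoping: save the shadowed binding,
  // install the new one, "evaluate the body", then restore the old binding.
  int evalWithBinding(const std::string &Name, int NewSlot) {
    int Old = NamedValues[Name];        // remember the old binding (0 if none).
    NamedValues[Name] = NewSlot;        // remember this binding.
    int BodyResult = NamedValues[Name]; // the body sees the inner binding.
    NamedValues[Name] = Old;            // pop the variable from scope.
    return BodyResult;
  }

After the call returns, the outer binding is intact, which is exactly why
``var a = a in ...`` can refer to the outer ``a`` in its initializer and
still leave it visible afterwards.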

With this, we completed what we set out to do. Our nice iterative fib
example from the intro compiles and runs just fine. The mem2reg pass
optimizes all of our stack variables into SSA registers, inserting PHI
nodes where needed, and our front-end remains simple: no "iterated
dominance frontier" computation anywhere in sight.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
mutable variables and var/in support. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter7/toy.cpp
   :language: c++

`Next: Compiling to Object Code <LangImpl08.html>`_
