| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| |
| <html> |
| <head> |
| <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA |
| construction</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| <meta name="author" content="Chris Lattner"> |
| <link rel="stylesheet" href="../llvm.css" type="text/css"> |
| </head> |
| |
| <body> |
| |
| <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div> |
| |
| <div class="doc_author"> |
| <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="intro">Part 7 Introduction</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with |
| LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable, |
| albeit simple, <a |
| href="http://en.wikipedia.org/wiki/Functional_programming">functional |
| programming language</a>. In our journey, we learned some parsing techniques, |
| how to build and represent an AST, how to build LLVM IR, and how to optimize |
| the resultant code and JIT compile it.</p> |
| |
| <p>While Kaleidoscope is interesting as a functional language, this makes it |
| "too easy" to generate LLVM IR for it. In particular, a functional language |
| makes it very easy to build LLVM IR directly in <a |
| href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>. |
| Since LLVM requires that the input code be in SSA form, this is a very nice |
| property and it is often unclear to newcomers how to generate code for an |
| imperative language with mutable variables.</p> |
| |
| <p>The short (and happy) summary of this chapter is that there is no need for |
| your front-end to build SSA form: LLVM provides highly tuned and well tested |
| support for this, though the way it works is a bit unexpected for some.</p> |
| |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="why">Why is this a hard problem?</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p> |
| To understand why mutable variables cause complexities in SSA construction, |
| consider this extremely simple C example: |
| </p> |
| |
| <div class="doc_code"> |
| <pre> |
| int G, H; |
| int test(_Bool Condition) { |
| int X; |
| if (Condition) |
| X = G; |
| else |
| X = H; |
| return X; |
| } |
| </pre> |
| </div> |
| |
| <p>In this case, we have the variable "X", whose value depends on the path |
| executed in the program. Because there are two different possible values for X |
| before the return instruction, a PHI node is inserted to merge the two values. |
| The LLVM IR that we want for this example looks like this:</p> |
| |
| <div class="doc_code"> |
| <pre> |
| @G = weak global i32 0 ; type of @G is i32* |
| @H = weak global i32 0 ; type of @H is i32* |
| |
| define i32 @test(i1 %Condition) { |
| entry: |
| br i1 %Condition, label %cond_true, label %cond_false |
| |
| cond_true: |
| %X.0 = load i32* @G |
| br label %cond_next |
| |
| cond_false: |
| %X.1 = load i32* @H |
| br label %cond_next |
| |
| cond_next: |
| %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] |
| ret i32 %X.2 |
| } |
| </pre> |
| </div> |
| |
| <p>In this example, the loads from the G and H global variables are explicit in |
| the LLVM IR, and they live in the then/else branches of the if statement |
| (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node |
| in the cond_next block selects the right value to use based on where control |
| flow is coming from: if control flow comes from the cond_false block, X.2 gets |
| the value of X.1. Alternatively, if control flow comes from cond_tree, it gets |
| the value of X.0. The intent of this chapter is not to explain the details of |
| SSA form. For more information, see one of the many <a |
| href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online |
| references</a>.</p> |
| |
| <p>The question for this article is "who places phi nodes when lowering |
| assignments to mutable variables?". The issue here is that LLVM |
| <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it. |
| However, SSA construction requires non-trivial algorithms and data structures, |
| so it is inconvenient and wasteful for every front-end to have to reproduce this |
| logic.</p> |
| |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="memory">Memory in LLVM</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p>The 'trick' here is that while LLVM does require all register values to be |
| in SSA form, it does not require (or permit) memory objects to be in SSA form. |
| In the example above, note that the loads from G and H are direct accesses to |
| G and H: they are not renamed or versioned. This differs from some other |
| compiler systems, which does try to version memory objects. In LLVM, instead of |
| encoding dataflow analysis of memory into the LLVM IR, it is handled with <a |
| href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on |
| demand.</p> |
| |
| <p> |
| With this in mind, the high-level idea is that we want to make a stack variable |
| (which lives in memory, because it is on the stack) for each mutable object in |
| a function. To take advantage of this trick, we need to talk about how LLVM |
| represents stack variables. |
| </p> |
| |
| <p>In LLVM, all memory accesses are explicit with load/store instructions, and |
| it is carefully designed to not have (or need) an "address-of" operator. Notice |
| how the type of the @G/@H global variables is actually "i32*" even though the |
| variable is defined as "i32". What this means is that @G defines <em>space</em> |
| for an i32 in the global data area, but its <em>name</em> actually refers to the |
| address for that space. Stack variables work the same way, but instead of being |
| declared with global variable definitions, they are declared with the |
| <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p> |
| |
| <div class="doc_code"> |
| <pre> |
| define i32 @test(i1 %Condition) { |
| entry: |
| %X = alloca i32 ; type of %X is i32*. |
| ... |
| %tmp = load i32* %X ; load the stack value %X from the stack. |
| %tmp2 = add i32 %tmp, 1 ; increment it |
| store i32 %tmp2, i32* %X ; store it back |
| ... |
| </pre> |
| </div> |
| |
| <p>This code shows an example of how you can declare and manipulate a stack |
| variable in the LLVM IR. Stack memory allocated with the alloca instruction is |
| fully general: you can pass the address of the stack slot to functions, you can |
| store it in other variables, etc. In our example above, we could rewrite the |
| example to use the alloca technique to avoid using a PHI node:</p> |
| |
| <div class="doc_code"> |
| <pre> |
| @G = weak global i32 0 ; type of @G is i32* |
| @H = weak global i32 0 ; type of @H is i32* |
| |
| define i32 @test(i1 %Condition) { |
| entry: |
| %X = alloca i32 ; type of %X is i32*. |
| br i1 %Condition, label %cond_true, label %cond_false |
| |
| cond_true: |
| %X.0 = load i32* @G |
| store i32 %X.0, i32* %X ; Update X |
| br label %cond_next |
| |
| cond_false: |
| %X.1 = load i32* @H |
| store i32 %X.1, i32* %X ; Update X |
| br label %cond_next |
| |
| cond_next: |
| %X.2 = load i32* %X ; Read X |
| ret i32 %X.2 |
| } |
| </pre> |
| </div> |
| |
| <p>With this, we have discovered a way to handle arbitrary mutable variables |
| without the need to create Phi nodes at all:</p> |
| |
| <ol> |
| <li>Each mutable variable becomes a stack allocation.</li> |
| <li>Each read of the variable becomes a load from the stack.</li> |
| <li>Each update of the variable becomes a store to the stack.</li> |
| <li>Taking the address of a variable just uses the stack address directly.</li> |
| </ol> |
| |
| <p>While this solution has solved our immediate problem, it introduced another |
| one: we have now apparently introduced a lot of stack traffic for very simple |
| and common operations, a major performance problem. Fortunately for us, the |
| LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles |
| this case, promoting allocas like this into SSA registers, inserting Phi nodes |
| as appropriate. If you run this example through the pass, for example, you'll |
| get:</p> |
| |
| <div class="doc_code"> |
| <pre> |
| $ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b> |
| @G = weak global i32 0 |
| @H = weak global i32 0 |
| |
| define i32 @test(i1 %Condition) { |
| entry: |
| br i1 %Condition, label %cond_true, label %cond_false |
| |
| cond_true: |
| %X.0 = load i32* @G |
| br label %cond_next |
| |
| cond_false: |
| %X.1 = load i32* @H |
| br label %cond_next |
| |
| cond_next: |
| %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] |
| ret i32 %X.01 |
| } |
| </pre> |
| </div> |
| |
| <p>The mem2reg pass implements the standard "iterated dominator frontier" |
| algorithm for constructing SSA form and has a number of optimizations that speed |
| up very common degenerate cases. mem2reg really is the answer for dealing with |
| mutable variables, and we highly recommend that you depend on it. Note that |
| mem2reg only works on variables in certain circumstances:</p> |
| |
| <ol> |
| <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it |
| promotes them. It does not apply to global variables or heap allocations.</li> |
| |
| <li>mem2reg only looks for alloca instructions in the entry block of the |
| function. Being in the entry block guarantees that the alloca is only executed |
| once, which makes analysis simpler.</li> |
| |
| <li>mem2reg only promotes allocas whose uses are direct loads and stores. If |
| the address of the stack object is passed to a function, or if any funny pointer |
| arithmetic is involved, the alloca will not be promoted.</li> |
| |
| <li>mem2reg only works on allocas of scalar values, and only if the array size |
| of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of |
| promoting structs or arrays to registers. Note that the "scalarrepl" pass is |
| more powerful and can promote structs, "unions", and arrays in many cases.</li> |
| |
| </ol> |
| |
| <p> |
| All of these properties are easy to satisfy for most imperative languages, and |
| we'll illustrated this below with Kaleidoscope. The final question you may be |
| asking is: should I bother with this nonsense for my front-end? Wouldn't it be |
| better if I just did SSA construction directly, avoiding use of the mem2reg |
| optimization pass? In short, we strongly recommend that use you this technique |
| for building SSA form, unless there is an extremely good reason not to. Using |
| this technique is:</p> |
| |
| <ul> |
| <li>Proven and well tested: llvm-gcc and clang both use this technique for local |
| mutable variables. As such, the most common clients of LLVM are using this to |
| handle a bulk of their variables. You can be sure that bugs are found fast and |
| fixed early.</li> |
| |
| <li>Extremely Fast: mem2reg has a number of special cases that make it fast in |
| common cases as well as fully general. For example, it has fast-paths for |
| variables that are only used in a single block, variables that only have one |
| assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc. |
| </li> |
| |
| <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html"> |
| Debug information in LLVM</a> relies on having the address of the variable |
| exposed to attach debug info to it. This technique dovetails very naturally |
| with this style of debug info.</li> |
| </ul> |
| |
| <p>If nothing else, this makes it much easier to get your front-end up and |
| running, and is very simple to implement. Lets extend Kaleidoscope with mutable |
| variables now! |
| </p> |
| </div> |
| |
| |
| <!-- *********************************************************************** --> |
| <div class="doc_section"><a name="code">Full Code Listing</a></div> |
| <!-- *********************************************************************** --> |
| |
| <div class="doc_text"> |
| |
| <p> |
| Here is the complete code listing for our running example, enhanced with the |
| if/then/else and for expressions.. To build this example, use: |
| </p> |
| |
| <div class="doc_code"> |
| <pre> |
| # Compile |
| g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy |
| # Run |
| ./toy |
| </pre> |
| </div> |
| |
| <p>Here is the code:</p> |
| |
| <div class="doc_code"> |
| <pre> |
| </pre> |
| </div> |
| |
| </div> |
| |
| <!-- *********************************************************************** --> |
| <hr> |
| <address> |
| <a href="http://jigsaw.w3.org/css-validator/check/referer"><img |
| src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> |
| <a href="http://validator.w3.org/check/referer"><img |
| src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> |
| |
| <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> |
| <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> |
| Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ |
| </address> |
| </body> |
| </html> |