Chris Lattner | c9d5d2c | 2007-11-01 06:49:54 +0000 | [diff] [blame^] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| 2 | "http://www.w3.org/TR/html4/strict.dtd"> |
| 3 | |
| 4 | <html> |
| 5 | <head> |
| 6 | <title>Kaleidoscope: Extending the Language: Operator Overloading</title> |
| 7 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| 8 | <meta name="author" content="Chris Lattner"> |
| 9 | <link rel="stylesheet" href="../llvm.css" type="text/css"> |
| 10 | </head> |
| 11 | |
| 12 | <body> |
| 13 | |
| 14 | <div class="doc_title">Kaleidoscope: Extending the Language: Operator Overloading</div> |
| 15 | |
| 16 | <div class="doc_author"> |
| 17 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> |
| 18 | </div> |
| 19 | |
| 20 | <!-- *********************************************************************** --> |
| 21 | <div class="doc_section"><a name="intro">Part 6 Introduction</a></div> |
| 22 | <!-- *********************************************************************** --> |
| 23 | |
| 24 | <div class="doc_text"> |
| 25 | |
| 26 | <p>Welcome to Part 6 of the "<a href="index.html">Implementing a language with |
| 27 | LLVM</a>" tutorial. At this point in our tutorial, we now have a fully |
| 28 | functional language that is fairly minimal, but also useful. One big problem |
| 29 | with it though is that it doesn't have many useful operators (like division, |
| 30 | logical negation, or even any comparisons other than less-than.</p> |
| 31 | |
| 32 | <p>This chapter of the tutorial takes a wild digression into adding operator |
| 33 | overloading to the simple and beautiful Kaleidoscope language, giving us a |
| 34 | simple and ugly language in some ways, but also a powerful one at the same time. |
| 35 | One of the great things about creating your own language is that you get to |
| 36 | decide what is good or bad. In this tutorial we'll assume that it is okay and |
| 37 | use this as a way to show some interesting parsing techniques.</p> |
| 38 | |
| 39 | </div> |
| 40 | |
| 41 | <!-- *********************************************************************** --> |
| 42 | <div class="doc_section"><a name="idea">Operator Overloading: the Idea</a></div> |
| 43 | <!-- *********************************************************************** --> |
| 44 | |
| 45 | <div class="doc_text"> |
| 46 | |
| 47 | <p> |
| 48 | The operator overloading that we will add to Kaleidoscope is more general than |
| 49 | languages like C++. In C++, you are only allowed to redefine existing |
| 50 | operators: you can't programatically change the grammar, introduce new |
| 51 | operators, change precedence levels, etc. In this chapter, we will add this |
| 52 | capability to Kaleidoscope, which will allow us to round out the set of |
| 53 | operators that are supported, culminating in a more interesting example app.</p> |
| 54 | |
| 55 | <p>The point of going into operator overloading in a tutorial like this is to |
| 56 | show the power and flexibility of using a hand-written parser. The parser we |
| 57 | are using so far is using recursive descent for most parts of the grammar, and |
| 58 | operator precedence parsing for the expressions. See <a |
| 59 | href="LangImpl2.html">Chapter 2</a> for details. Without using operator |
| 60 | precedence parsing, it would be very difficult to allow the programmer to |
| 61 | introduce new operators into the grammar: the grammar is dynamically extensible |
| 62 | as the JIT runs.</p> |
| 63 | |
| 64 | <p>The two specific features we'll add are programmable unary operators (right |
| 65 | now, Kaleidoscope has no unary operators at all) as well as binary operators. |
| 66 | An example of this is:</p> |
| 67 | |
| 68 | <div class="doc_code"> |
| 69 | <pre> |
| 70 | # Logical unary not. |
| 71 | def unary!(v) |
| 72 | if v then |
| 73 | 0 |
| 74 | else |
| 75 | 1; |
| 76 | |
| 77 | # Define > with the same precedence as <. |
| 78 | def binary> 10 (LHS RHS) |
| 79 | !(LHS < RHS); # alternatively, could just use "RHS < LHS" |
| 80 | |
| 81 | # Binary "logical or", (note that it does not "short circuit") |
| 82 | def binary| 5 (LHS RHS) |
| 83 | if LHS then |
| 84 | 1 |
| 85 | else if RHS then |
| 86 | 1 |
| 87 | else |
| 88 | 0; |
| 89 | |
| 90 | # Define = with slightly lower precedence than relationals. |
| 91 | def binary= 9 (LHS RHS) |
| 92 | !(LHS < RHS | LHS > RHS); |
| 93 | </pre> |
| 94 | </div> |
| 95 | |
| 96 | <p>Many languages aspire to being able to implement their standard runtime |
| 97 | library in the language itself. In Kaleidoscope, we can implement significant |
| 98 | parts of the language in the library!</p> |
| 99 | |
| 100 | <p>We will break down implementation of these features into two parts: |
| 101 | implementing support for overloading of binary operators and adding unary |
| 102 | operators.</p> |
| 103 | |
| 104 | </div> |
| 105 | |
| 106 | <!-- *********************************************************************** --> |
| 107 | <div class="doc_section"><a name="binary">Overloading Binary Operators</a></div> |
| 108 | <!-- *********************************************************************** --> |
| 109 | |
| 110 | <div class="doc_text"> |
| 111 | |
| 112 | <p>Adding support for overloaded binary operators is pretty simple with our |
| 113 | current framework. We'll first add support for the unary/binary keywords:</p> |
| 114 | |
| 115 | <div class="doc_code"> |
| 116 | <pre> |
| 117 | enum Token { |
| 118 | ... |
| 119 | <b>// operators |
| 120 | tok_binary = -11, tok_unary = -12</b> |
| 121 | }; |
| 122 | ... |
| 123 | static int gettok() { |
| 124 | ... |
| 125 | if (IdentifierStr == "for") return tok_for; |
| 126 | if (IdentifierStr == "in") return tok_in; |
| 127 | <b>if (IdentifierStr == "binary") return tok_binary; |
| 128 | if (IdentifierStr == "unary") return tok_unary;</b> |
| 129 | return tok_identifier; |
| 130 | </pre> |
| 131 | </div> |
| 132 | |
| 133 | <p>This just adds lexer support for the unary and binary keywords, like we |
| 134 | did in <a href="LangImpl5.html#iflexer">previous chapters</a>. One nice thing |
| 135 | about our current AST is that we represent binary operators fully generally |
| 136 | with their ASCII code as the opcode. For our extended operators, we'll use the |
| 137 | same representation, so we don't need any new AST or parser support.</p> |
| 138 | |
| 139 | <p>On the other hand, we have to be able to represent the definitions of these |
| 140 | new operators, in the "def binary| 5" part of the function definition. In the |
| 141 | grammar so far, the "name" for the function definition is parsed as the |
| 142 | "prototype" production and into the <tt>PrototypeAST</tt> AST node. To |
| 143 | represent our new user-defined operators as prototypes, we have to extend |
| 144 | the <tt>PrototypeAST</tt> AST node like this:</p> |
| 145 | |
| 146 | <div class="doc_code"> |
| 147 | <pre> |
| 148 | /// PrototypeAST - This class represents the "prototype" for a function, |
| 149 | /// which captures its argument names as well as if it is an operator. |
| 150 | class PrototypeAST { |
| 151 | std::string Name; |
| 152 | std::vector<std::string> Args; |
| 153 | <b>bool isOperator; |
| 154 | unsigned Precedence; // Precedence if a binary op.</b> |
| 155 | public: |
| 156 | PrototypeAST(const std::string &name, const std::vector<std::string> &args, |
| 157 | <b>bool isoperator = false, unsigned prec = 0</b>) |
| 158 | : Name(name), Args(args), <b>isOperator(isoperator), Precedence(prec)</b> {} |
| 159 | |
| 160 | <b>bool isUnaryOp() const { return isOperator && Args.size() == 1; } |
| 161 | bool isBinaryOp() const { return isOperator && Args.size() == 2; } |
| 162 | |
| 163 | char getOperatorName() const { |
| 164 | assert(isUnaryOp() || isBinaryOp()); |
| 165 | return Name[Name.size()-1]; |
| 166 | } |
| 167 | |
| 168 | unsigned getBinaryPrecedence() const { return Precedence; }</b> |
| 169 | |
| 170 | Function *Codegen(); |
| 171 | }; |
| 172 | </pre> |
| 173 | </div> |
| 174 | |
| 175 | <p>Basically, in addition to knowing a name for the prototype, we now keep track |
| 176 | of whether it was an operator, and if it was, what precedence level the operator |
| 177 | is at. The precedence is only used for binary operators.</p> |
| 178 | |
| 179 | |
| 180 | <p>...</p> |
| 181 | |
| 182 | </div> |
| 183 | |
| 184 | |
| 185 | <!-- *********************************************************************** --> |
| 186 | <div class="doc_section"><a name="code">Full Code Listing</a></div> |
| 187 | <!-- *********************************************************************** --> |
| 188 | |
| 189 | <div class="doc_text"> |
| 190 | |
| 191 | <p> |
| 192 | Here is the complete code listing for our running example, enhanced with the |
| 193 | if/then/else and for expressions.. To build this example, use: |
| 194 | </p> |
| 195 | |
| 196 | <div class="doc_code"> |
| 197 | <pre> |
| 198 | # Compile |
| 199 | g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy |
| 200 | # Run |
| 201 | ./toy |
| 202 | </pre> |
| 203 | </div> |
| 204 | |
| 205 | <p>Here is the code:</p> |
| 206 | |
| 207 | <div class="doc_code"> |
| 208 | <pre> |
| 209 | </pre> |
| 210 | </div> |
| 211 | |
| 212 | </div> |
| 213 | |
| 214 | <!-- *********************************************************************** --> |
| 215 | <hr> |
| 216 | <address> |
| 217 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img |
| 218 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> |
| 219 | <a href="http://validator.w3.org/check/referer"><img |
| 220 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> |
| 221 | |
| 222 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> |
| 223 | <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> |
| 224 | Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ |
| 225 | </address> |
| 226 | </body> |
| 227 | </html> |