| Chris Lattner | c9d5d2c | 2007-11-01 06:49:54 +0000 | [diff] [blame^] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | 2 | "http://www.w3.org/TR/html4/strict.dtd"> | 
|  | 3 |  | 
|  | 4 | <html> | 
|  | 5 | <head> | 
|  | 6 | <title>Kaleidoscope: Extending the Language: Operator Overloading</title> | 
|  | 7 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | 
|  | 8 | <meta name="author" content="Chris Lattner"> | 
|  | 9 | <link rel="stylesheet" href="../llvm.css" type="text/css"> | 
|  | 10 | </head> | 
|  | 11 |  | 
|  | 12 | <body> | 
|  | 13 |  | 
|  | 14 | <div class="doc_title">Kaleidoscope: Extending the Language: Operator Overloading</div> | 
|  | 15 |  | 
|  | 16 | <div class="doc_author"> | 
|  | 17 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> | 
|  | 18 | </div> | 
|  | 19 |  | 
|  | 20 | <!-- *********************************************************************** --> | 
|  | 21 | <div class="doc_section"><a name="intro">Part 6 Introduction</a></div> | 
|  | 22 | <!-- *********************************************************************** --> | 
|  | 23 |  | 
|  | 24 | <div class="doc_text"> | 
|  | 25 |  | 
|  | 26 | <p>Welcome to Part 6 of the "<a href="index.html">Implementing a language with | 
|  | 27 | LLVM</a>" tutorial.  At this point in our tutorial, we now have a fully | 
|  | 28 | functional language that is fairly minimal, but also useful.  One big problem | 
|  | 29 | with it though is that it doesn't have many useful operators (like division, | 
|  | 30 | logical negation, or even any comparisons other than less-than.</p> | 
|  | 31 |  | 
|  | 32 | <p>This chapter of the tutorial takes a wild digression into adding operator | 
|  | 33 | overloading to the simple and beautiful Kaleidoscope language, giving us a | 
|  | 34 | simple and ugly language in some ways, but also a powerful one at the same time. | 
|  | 35 | One of the great things about creating your own language is that you get to | 
|  | 36 | decide what is good or bad.  In this tutorial we'll assume that it is okay and | 
|  | 37 | use this as a way to show some interesting parsing techniques.</p> | 
|  | 38 |  | 
|  | 39 | </div> | 
|  | 40 |  | 
|  | 41 | <!-- *********************************************************************** --> | 
|  | 42 | <div class="doc_section"><a name="idea">Operator Overloading: the Idea</a></div> | 
|  | 43 | <!-- *********************************************************************** --> | 
|  | 44 |  | 
|  | 45 | <div class="doc_text"> | 
|  | 46 |  | 
|  | 47 | <p> | 
|  | 48 | The operator overloading that we will add to Kaleidoscope is more general than | 
|  | 49 | languages like C++.  In C++, you are only allowed to redefine existing | 
|  | 50 | operators: you can't programatically change the grammar, introduce new | 
|  | 51 | operators, change precedence levels, etc.  In this chapter, we will add this | 
|  | 52 | capability to Kaleidoscope, which will allow us to round out the set of | 
|  | 53 | operators that are supported, culminating in a more interesting example app.</p> | 
|  | 54 |  | 
|  | 55 | <p>The point of going into operator overloading in a tutorial like this is to | 
|  | 56 | show the power and flexibility of using a hand-written parser.  The parser we | 
|  | 57 | are using so far is using recursive descent for most parts of the grammar, and | 
|  | 58 | operator precedence parsing for the expressions.  See <a | 
|  | 59 | href="LangImpl2.html">Chapter 2</a> for details.  Without using operator | 
|  | 60 | precedence parsing, it would be very difficult to allow the programmer to | 
|  | 61 | introduce new operators into the grammar: the grammar is dynamically extensible | 
|  | 62 | as the JIT runs.</p> | 
|  | 63 |  | 
|  | 64 | <p>The two specific features we'll add are programmable unary operators (right | 
|  | 65 | now, Kaleidoscope has no unary operators at all) as well as binary operators. | 
|  | 66 | An example of this is:</p> | 
|  | 67 |  | 
|  | 68 | <div class="doc_code"> | 
|  | 69 | <pre> | 
|  | 70 | # Logical unary not. | 
|  | 71 | def unary!(v) | 
|  | 72 | if v then | 
|  | 73 | 0 | 
|  | 74 | else | 
|  | 75 | 1; | 
|  | 76 |  | 
|  | 77 | # Define > with the same precedence as <. | 
|  | 78 | def binary> 10 (LHS RHS) | 
|  | 79 | !(LHS < RHS);     # alternatively, could just use "RHS < LHS" | 
|  | 80 |  | 
|  | 81 | # Binary "logical or", (note that it does not "short circuit") | 
|  | 82 | def binary| 5 (LHS RHS) | 
|  | 83 | if LHS then | 
|  | 84 | 1 | 
|  | 85 | else if RHS then | 
|  | 86 | 1 | 
|  | 87 | else | 
|  | 88 | 0; | 
|  | 89 |  | 
|  | 90 | # Define = with slightly lower precedence than relationals. | 
|  | 91 | def binary= 9 (LHS RHS) | 
|  | 92 | !(LHS < RHS | LHS > RHS); | 
|  | 93 | </pre> | 
|  | 94 | </div> | 
|  | 95 |  | 
|  | 96 | <p>Many languages aspire to being able to implement their standard runtime | 
|  | 97 | library in the language itself.  In Kaleidoscope, we can implement significant | 
|  | 98 | parts of the language in the library!</p> | 
|  | 99 |  | 
|  | 100 | <p>We will break down implementation of these features into two parts: | 
|  | 101 | implementing support for overloading of binary operators and adding unary | 
|  | 102 | operators.</p> | 
|  | 103 |  | 
|  | 104 | </div> | 
|  | 105 |  | 
|  | 106 | <!-- *********************************************************************** --> | 
|  | 107 | <div class="doc_section"><a name="binary">Overloading Binary Operators</a></div> | 
|  | 108 | <!-- *********************************************************************** --> | 
|  | 109 |  | 
|  | 110 | <div class="doc_text"> | 
|  | 111 |  | 
|  | 112 | <p>Adding support for overloaded binary operators is pretty simple with our | 
|  | 113 | current framework.  We'll first add support for the unary/binary keywords:</p> | 
|  | 114 |  | 
|  | 115 | <div class="doc_code"> | 
|  | 116 | <pre> | 
|  | 117 | enum Token { | 
|  | 118 | ... | 
|  | 119 | <b>// operators | 
|  | 120 | tok_binary = -11, tok_unary = -12</b> | 
|  | 121 | }; | 
|  | 122 | ... | 
|  | 123 | static int gettok() { | 
|  | 124 | ... | 
|  | 125 | if (IdentifierStr == "for") return tok_for; | 
|  | 126 | if (IdentifierStr == "in") return tok_in; | 
|  | 127 | <b>if (IdentifierStr == "binary") return tok_binary; | 
|  | 128 | if (IdentifierStr == "unary") return tok_unary;</b> | 
|  | 129 | return tok_identifier; | 
|  | 130 | </pre> | 
|  | 131 | </div> | 
|  | 132 |  | 
|  | 133 | <p>This just adds lexer support for the unary and binary keywords, like we | 
|  | 134 | did in <a href="LangImpl5.html#iflexer">previous chapters</a>.  One nice thing | 
|  | 135 | about our current AST is that we represent binary operators fully generally | 
|  | 136 | with their ASCII code as the opcode.  For our extended operators, we'll use the | 
|  | 137 | same representation, so we don't need any new AST or parser support.</p> | 
|  | 138 |  | 
|  | 139 | <p>On the other hand, we have to be able to represent the definitions of these | 
|  | 140 | new operators, in the "def binary| 5" part of the function definition.  In the | 
|  | 141 | grammar so far, the "name" for the function definition is parsed as the | 
|  | 142 | "prototype" production and into the <tt>PrototypeAST</tt> AST node.  To | 
|  | 143 | represent our new user-defined operators as prototypes, we have to extend | 
|  | 144 | the  <tt>PrototypeAST</tt> AST node like this:</p> | 
|  | 145 |  | 
|  | 146 | <div class="doc_code"> | 
|  | 147 | <pre> | 
|  | 148 | /// PrototypeAST - This class represents the "prototype" for a function, | 
|  | 149 | /// which captures its argument names as well as if it is an operator. | 
|  | 150 | class PrototypeAST { | 
|  | 151 | std::string Name; | 
|  | 152 | std::vector<std::string> Args; | 
|  | 153 | <b>bool isOperator; | 
|  | 154 | unsigned Precedence;  // Precedence if a binary op.</b> | 
|  | 155 | public: | 
|  | 156 | PrototypeAST(const std::string &name, const std::vector<std::string> &args, | 
|  | 157 | <b>bool isoperator = false, unsigned prec = 0</b>) | 
|  | 158 | : Name(name), Args(args), <b>isOperator(isoperator), Precedence(prec)</b> {} | 
|  | 159 |  | 
|  | 160 | <b>bool isUnaryOp() const { return isOperator && Args.size() == 1; } | 
|  | 161 | bool isBinaryOp() const { return isOperator && Args.size() == 2; } | 
|  | 162 |  | 
|  | 163 | char getOperatorName() const { | 
|  | 164 | assert(isUnaryOp() || isBinaryOp()); | 
|  | 165 | return Name[Name.size()-1]; | 
|  | 166 | } | 
|  | 167 |  | 
|  | 168 | unsigned getBinaryPrecedence() const { return Precedence; }</b> | 
|  | 169 |  | 
|  | 170 | Function *Codegen(); | 
|  | 171 | }; | 
|  | 172 | </pre> | 
|  | 173 | </div> | 
|  | 174 |  | 
|  | 175 | <p>Basically, in addition to knowing a name for the prototype, we now keep track | 
|  | 176 | of whether it was an operator, and if it was, what precedence level the operator | 
|  | 177 | is at.  The precedence is only used for binary operators.</p> | 
|  | 178 |  | 
|  | 179 |  | 
|  | 180 | <p>...</p> | 
|  | 181 |  | 
|  | 182 | </div> | 
|  | 183 |  | 
|  | 184 |  | 
|  | 185 | <!-- *********************************************************************** --> | 
|  | 186 | <div class="doc_section"><a name="code">Full Code Listing</a></div> | 
|  | 187 | <!-- *********************************************************************** --> | 
|  | 188 |  | 
|  | 189 | <div class="doc_text"> | 
|  | 190 |  | 
|  | 191 | <p> | 
|  | 192 | Here is the complete code listing for our running example, enhanced with the | 
|  | 193 | if/then/else and for expressions..  To build this example, use: | 
|  | 194 | </p> | 
|  | 195 |  | 
|  | 196 | <div class="doc_code"> | 
|  | 197 | <pre> | 
|  | 198 | # Compile | 
|  | 199 | g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy | 
|  | 200 | # Run | 
|  | 201 | ./toy | 
|  | 202 | </pre> | 
|  | 203 | </div> | 
|  | 204 |  | 
|  | 205 | <p>Here is the code:</p> | 
|  | 206 |  | 
|  | 207 | <div class="doc_code"> | 
|  | 208 | <pre> | 
|  | 209 | </pre> | 
|  | 210 | </div> | 
|  | 211 |  | 
|  | 212 | </div> | 
|  | 213 |  | 
|  | 214 | <!-- *********************************************************************** --> | 
|  | 215 | <hr> | 
|  | 216 | <address> | 
|  | 217 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | 218 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
|  | 219 | <a href="http://validator.w3.org/check/referer"><img | 
|  | 220 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
|  | 221 |  | 
|  | 222 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
|  | 223 | <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> | 
|  | 224 | Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ | 
|  | 225 | </address> | 
|  | 226 | </body> | 
|  | 227 | </html> |