Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 1 | ============================================================ |
| 2 | Kaleidoscope: Extending the Language: User-defined Operators |
| 3 | ============================================================ |
| 4 | |
| 5 | .. contents:: |
| 6 | :local: |
| 7 | |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 8 | Chapter 6 Introduction |
| 9 | ====================== |
| 10 | |
| 11 | Welcome to Chapter 6 of the "`Implementing a language with |
| 12 | LLVM <index.html>`_" tutorial. At this point in our tutorial, we now |
| 13 | have a fully functional language that is fairly minimal, but also |
| 14 | useful. There is still one big problem with it, however. Our language |
| 15 | doesn't have many useful operators (like division, logical negation, or |
| 16 | even any comparisons besides less-than). |
| 17 | |
| 18 | This chapter of the tutorial takes a wild digression into adding |
| 19 | user-defined operators to the simple and beautiful Kaleidoscope |
| 20 | language. This digression now gives us a simple and ugly language in |
| 21 | some ways, but also a powerful one at the same time. One of the great |
| 22 | things about creating your own language is that you get to decide what |
| 23 | is good or bad. In this tutorial we'll assume that it is okay to use |
| 24 | this as a way to show some interesting parsing techniques. |
| 25 | |
| 26 | At the end of this tutorial, we'll run through an example Kaleidoscope |
Alex Denisov | 596e979 | 2015-12-15 20:50:29 +0000 | [diff] [blame] | 27 | application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 28 | example of what you can build with Kaleidoscope and its feature set. |
| 29 | |
| 30 | User-defined Operators: the Idea |
| 31 | ================================ |
| 32 | |
| 33 | The "operator overloading" that we will add to Kaleidoscope is more |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 34 | general than in languages like C++. In C++, you are only allowed to |
Sylvestre Ledru | 843b751 | 2016-08-28 20:29:18 +0000 | [diff] [blame] | 35 | redefine existing operators: you can't programmatically change the |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 36 | grammar, introduce new operators, change precedence levels, etc. In this |
| 37 | chapter, we will add this capability to Kaleidoscope, which will let the |
| 38 | user round out the set of operators that are supported. |
| 39 | |
| 40 | The point of going into user-defined operators in a tutorial like this |
| 41 | is to show the power and flexibility of using a hand-written parser. |
| 42 | Thus far, the parser we have been implementing uses recursive descent |
| 43 | for most parts of the grammar and operator precedence parsing for the |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 44 | expressions. See `Chapter 2 <LangImpl2.html>`_ for details. By |
| 45 | using operator precedence parsing, it is very easy to allow |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 46 | the programmer to introduce new operators into the grammar: the grammar |
| 47 | is dynamically extensible as the JIT runs. |
| 48 | |
| 49 | The two specific features we'll add are programmable unary operators |
| 50 | (right now, Kaleidoscope has no unary operators at all) as well as |
| 51 | binary operators. An example of this is: |
| 52 | |
| 53 | :: |
| 54 | |
| 55 | # Logical unary not. |
| 56 | def unary!(v) |
| 57 | if v then |
| 58 | 0 |
| 59 | else |
| 60 | 1; |
| 61 | |
| 62 | # Define > with the same precedence as <. |
| 63 | def binary> 10 (LHS RHS) |
| 64 | RHS < LHS; |
| 65 | |
| 66 | # Binary "logical or", (note that it does not "short circuit") |
| 67 | def binary| 5 (LHS RHS) |
| 68 | if LHS then |
| 69 | 1 |
| 70 | else if RHS then |
| 71 | 1 |
| 72 | else |
| 73 | 0; |
| 74 | |
| 75 | # Define = with slightly lower precedence than relationals. |
| 76 | def binary= 9 (LHS RHS) |
| 77 | !(LHS < RHS | LHS > RHS); |
| 78 | |
| 79 | Many languages aspire to being able to implement their standard runtime |
| 80 | library in the language itself. In Kaleidoscope, we can implement |
| 81 | significant parts of the language in the library! |
| 82 | |
| 83 | We will break down implementation of these features into two parts: |
| 84 | implementing support for user-defined binary operators and adding unary |
| 85 | operators. |
| 86 | |
| 87 | User-defined Binary Operators |
| 88 | ============================= |
| 89 | |
| 90 | Adding support for user-defined binary operators is pretty simple with |
| 91 | our current framework. We'll first add support for the unary/binary |
| 92 | keywords: |
| 93 | |
| 94 | .. code-block:: c++ |
| 95 | |
| 96 | enum Token { |
| 97 | ... |
| 98 | // operators |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 99 | tok_binary = -11, |
| 100 | tok_unary = -12 |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 101 | }; |
| 102 | ... |
| 103 | static int gettok() { |
| 104 | ... |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 105 | if (IdentifierStr == "for") |
| 106 | return tok_for; |
| 107 | if (IdentifierStr == "in") |
| 108 | return tok_in; |
| 109 | if (IdentifierStr == "binary") |
| 110 | return tok_binary; |
| 111 | if (IdentifierStr == "unary") |
| 112 | return tok_unary; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 113 | return tok_identifier; |
| 114 | |
| 115 | This just adds lexer support for the unary and binary keywords, like we |
Alex Denisov | 596e979 | 2015-12-15 20:50:29 +0000 | [diff] [blame] | 116 | did in `previous chapters <LangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 117 | about our current AST, is that we represent binary operators with full |
| 118 | generalisation by using their ASCII code as the opcode. For our extended |
| 119 | operators, we'll use this same representation, so we don't need any new |
| 120 | AST or parser support. |
| 121 | |
| 122 | On the other hand, we have to be able to represent the definitions of |
| 123 | these new operators, in the "def binary\| 5" part of the function |
| 124 | definition. In our grammar so far, the "name" for the function |
| 125 | definition is parsed as the "prototype" production and into the |
| 126 | ``PrototypeAST`` AST node. To represent our new user-defined operators |
| 127 | as prototypes, we have to extend the ``PrototypeAST`` AST node like |
| 128 | this: |
| 129 | |
| 130 | .. code-block:: c++ |
| 131 | |
| 132 | /// PrototypeAST - This class represents the "prototype" for a function, |
| 133 | /// which captures its argument names as well as if it is an operator. |
| 134 | class PrototypeAST { |
| 135 | std::string Name; |
| 136 | std::vector<std::string> Args; |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 137 | bool IsOperator; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 138 | unsigned Precedence; // Precedence if a binary op. |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 139 | |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 140 | public: |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 141 | PrototypeAST(const std::string &name, std::vector<std::string> Args, |
| 142 | bool IsOperator = false, unsigned Prec = 0) |
| 143 | : Name(name), Args(std::move(Args)), IsOperator(IsOperator), |
| 144 | Precedence(Prec) {} |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 145 | |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 146 | Function *codegen(); |
| 147 | const std::string &getName() const { return Name; } |
| 148 | |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 149 | bool isUnaryOp() const { return IsOperator && Args.size() == 1; } |
| 150 | bool isBinaryOp() const { return IsOperator && Args.size() == 2; } |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 151 | |
| 152 | char getOperatorName() const { |
| 153 | assert(isUnaryOp() || isBinaryOp()); |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 154 | return Name[Name.size() - 1]; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 155 | } |
| 156 | |
| 157 | unsigned getBinaryPrecedence() const { return Precedence; } |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 158 | }; |
| 159 | |
| 160 | Basically, in addition to knowing a name for the prototype, we now keep |
| 161 | track of whether it was an operator, and if it was, what precedence |
| 162 | level the operator is at. The precedence is only used for binary |
| 163 | operators (as you'll see below, it just doesn't apply for unary |
| 164 | operators). Now that we have a way to represent the prototype for a |
| 165 | user-defined operator, we need to parse it: |
| 166 | |
| 167 | .. code-block:: c++ |
| 168 | |
| 169 | /// prototype |
| 170 | /// ::= id '(' id* ')' |
| 171 | /// ::= binary LETTER number? (id, id) |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 172 | static std::unique_ptr<PrototypeAST> ParsePrototype() { |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 173 | std::string FnName; |
| 174 | |
| 175 | unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. |
| 176 | unsigned BinaryPrecedence = 30; |
| 177 | |
| 178 | switch (CurTok) { |
| 179 | default: |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 180 | return LogErrorP("Expected function name in prototype"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 181 | case tok_identifier: |
| 182 | FnName = IdentifierStr; |
| 183 | Kind = 0; |
| 184 | getNextToken(); |
| 185 | break; |
| 186 | case tok_binary: |
| 187 | getNextToken(); |
| 188 | if (!isascii(CurTok)) |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 189 | return LogErrorP("Expected binary operator"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 190 | FnName = "binary"; |
| 191 | FnName += (char)CurTok; |
| 192 | Kind = 2; |
| 193 | getNextToken(); |
| 194 | |
| 195 | // Read the precedence if present. |
| 196 | if (CurTok == tok_number) { |
| 197 | if (NumVal < 1 || NumVal > 100) |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 198 | return LogErrorP("Invalid precedence: must be 1..100"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 199 | BinaryPrecedence = (unsigned)NumVal; |
| 200 | getNextToken(); |
| 201 | } |
| 202 | break; |
| 203 | } |
| 204 | |
| 205 | if (CurTok != '(') |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 206 | return LogErrorP("Expected '(' in prototype"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 207 | |
| 208 | std::vector<std::string> ArgNames; |
| 209 | while (getNextToken() == tok_identifier) |
| 210 | ArgNames.push_back(IdentifierStr); |
| 211 | if (CurTok != ')') |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 212 | return LogErrorP("Expected ')' in prototype"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 213 | |
| 214 | // success. |
| 215 | getNextToken(); // eat ')'. |
| 216 | |
| 217 | // Verify right number of names for operator. |
| 218 | if (Kind && ArgNames.size() != Kind) |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 219 | return LogErrorP("Invalid number of operands for operator"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 220 | |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 221 | return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0, |
| 222 | BinaryPrecedence); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 223 | } |
| 224 | |
| 225 | This is all fairly straightforward parsing code, and we have already |
| 226 | seen a lot of similar code in the past. One interesting part about the |
| 227 | code above is the couple lines that set up ``FnName`` for binary |
| 228 | operators. This builds names like "binary@" for a newly defined "@" |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 229 | operator. It then takes advantage of the fact that symbol names in the |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 230 | LLVM symbol table are allowed to have any character in them, including |
| 231 | embedded nul characters. |
| 232 | |
| 233 | The next interesting thing to add, is codegen support for these binary |
| 234 | operators. Given our current structure, this is a simple addition of a |
| 235 | default case for our existing binary operator node: |
| 236 | |
| 237 | .. code-block:: c++ |
| 238 | |
Lang Hames | 2d789c3 | 2015-08-26 03:07:41 +0000 | [diff] [blame] | 239 | Value *BinaryExprAST::codegen() { |
| 240 | Value *L = LHS->codegen(); |
| 241 | Value *R = RHS->codegen(); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 242 | if (!L || !R) |
| 243 | return nullptr; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 244 | |
| 245 | switch (Op) { |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 246 | case '+': |
| 247 | return Builder.CreateFAdd(L, R, "addtmp"); |
| 248 | case '-': |
| 249 | return Builder.CreateFSub(L, R, "subtmp"); |
| 250 | case '*': |
| 251 | return Builder.CreateFMul(L, R, "multmp"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 252 | case '<': |
| 253 | L = Builder.CreateFCmpULT(L, R, "cmptmp"); |
| 254 | // Convert bool 0/1 to double 0.0 or 1.0 |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 255 | return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 256 | "booltmp"); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 257 | default: |
| 258 | break; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 259 | } |
| 260 | |
| 261 | // If it wasn't a builtin binary operator, it must be a user defined one. Emit |
| 262 | // a call to it. |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 263 | Function *F = getFunction(std::string("binary") + Op); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 264 | assert(F && "binary operator not found!"); |
| 265 | |
| 266 | Value *Ops[2] = { L, R }; |
| 267 | return Builder.CreateCall(F, Ops, "binop"); |
| 268 | } |
| 269 | |
| 270 | As you can see above, the new code is actually really simple. It just |
| 271 | does a lookup for the appropriate operator in the symbol table and |
| 272 | generates a function call to it. Since user-defined operators are just |
| 273 | built as normal functions (because the "prototype" boils down to a |
| 274 | function with the right name) everything falls into place. |
| 275 | |
| 276 | The final piece of code we are missing, is a bit of top-level magic: |
| 277 | |
| 278 | .. code-block:: c++ |
| 279 | |
Lang Hames | 2d789c3 | 2015-08-26 03:07:41 +0000 | [diff] [blame] | 280 | Function *FunctionAST::codegen() { |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 281 | // Transfer ownership of the prototype to the FunctionProtos map, but keep a |
| 282 | // reference to it for use below. |
| 283 | auto &P = *Proto; |
| 284 | FunctionProtos[Proto->getName()] = std::move(Proto); |
| 285 | Function *TheFunction = getFunction(P.getName()); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 286 | if (!TheFunction) |
| 287 | return nullptr; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 288 | |
| 289 | // If this is an operator, install it. |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 290 | if (P.isBinaryOp()) |
| 291 | BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence(); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 292 | |
| 293 | // Create a new basic block to start insertion into. |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 294 | BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); |
| 295 | ... |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 296 | |
| 297 | Basically, before codegening a function, if it is a user-defined |
| 298 | operator, we register it in the precedence table. This allows the binary |
| 299 | operator parsing logic we already have in place to handle it. Since we |
| 300 | are working on a fully-general operator precedence parser, this is all |
| 301 | we need to do to "extend the grammar". |
| 302 | |
| 303 | Now we have useful user-defined binary operators. This builds a lot on |
| 304 | the previous framework we built for other operators. Adding unary |
| 305 | operators is a bit more challenging, because we don't have any framework |
| 306 | for it yet - lets see what it takes. |
| 307 | |
| 308 | User-defined Unary Operators |
| 309 | ============================ |
| 310 | |
| 311 | Since we don't currently support unary operators in the Kaleidoscope |
| 312 | language, we'll need to add everything to support them. Above, we added |
| 313 | simple support for the 'unary' keyword to the lexer. In addition to |
| 314 | that, we need an AST node: |
| 315 | |
| 316 | .. code-block:: c++ |
| 317 | |
| 318 | /// UnaryExprAST - Expression class for a unary operator. |
| 319 | class UnaryExprAST : public ExprAST { |
| 320 | char Opcode; |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 321 | std::unique_ptr<ExprAST> Operand; |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 322 | |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 323 | public: |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 324 | UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand) |
| 325 | : Opcode(Opcode), Operand(std::move(Operand)) {} |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 326 | |
| 327 | Value *codegen() override; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 328 | }; |
| 329 | |
| 330 | This AST node is very simple and obvious by now. It directly mirrors the |
| 331 | binary operator AST node, except that it only has one child. With this, |
| 332 | we need to add the parsing logic. Parsing a unary operator is pretty |
| 333 | simple: we'll add a new function to do it: |
| 334 | |
| 335 | .. code-block:: c++ |
| 336 | |
| 337 | /// unary |
| 338 | /// ::= primary |
| 339 | /// ::= '!' unary |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 340 | static std::unique_ptr<ExprAST> ParseUnary() { |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 341 | // If the current token is not an operator, it must be a primary expr. |
| 342 | if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') |
| 343 | return ParsePrimary(); |
| 344 | |
| 345 | // If this is a unary operator, read it. |
| 346 | int Opc = CurTok; |
| 347 | getNextToken(); |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 348 | if (auto Operand = ParseUnary()) |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 349 | return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand)); |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 350 | return nullptr; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 351 | } |
| 352 | |
| 353 | The grammar we add is pretty straightforward here. If we see a unary |
| 354 | operator when parsing a primary operator, we eat the operator as a |
| 355 | prefix and parse the remaining piece as another unary operator. This |
| 356 | allows us to handle multiple unary operators (e.g. "!!x"). Note that |
| 357 | unary operators can't have ambiguous parses like binary operators can, |
| 358 | so there is no need for precedence information. |
| 359 | |
| 360 | The problem with this function, is that we need to call ParseUnary from |
| 361 | somewhere. To do this, we change previous callers of ParsePrimary to |
| 362 | call ParseUnary instead: |
| 363 | |
| 364 | .. code-block:: c++ |
| 365 | |
| 366 | /// binoprhs |
| 367 | /// ::= ('+' unary)* |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 368 | static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, |
| 369 | std::unique_ptr<ExprAST> LHS) { |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 370 | ... |
| 371 | // Parse the unary expression after the binary operator. |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 372 | auto RHS = ParseUnary(); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 373 | if (!RHS) |
| 374 | return nullptr; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 375 | ... |
| 376 | } |
| 377 | /// expression |
| 378 | /// ::= unary binoprhs |
| 379 | /// |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 380 | static std::unique_ptr<ExprAST> ParseExpression() { |
| 381 | auto LHS = ParseUnary(); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 382 | if (!LHS) |
| 383 | return nullptr; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 384 | |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 385 | return ParseBinOpRHS(0, std::move(LHS)); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 386 | } |
| 387 | |
| 388 | With these two simple changes, we are now able to parse unary operators |
| 389 | and build the AST for them. Next up, we need to add parser support for |
| 390 | prototypes, to parse the unary operator prototype. We extend the binary |
| 391 | operator code above with: |
| 392 | |
| 393 | .. code-block:: c++ |
| 394 | |
| 395 | /// prototype |
| 396 | /// ::= id '(' id* ')' |
| 397 | /// ::= binary LETTER number? (id, id) |
| 398 | /// ::= unary LETTER (id) |
Lang Hames | 09bf4c1 | 2015-08-18 18:11:06 +0000 | [diff] [blame] | 399 | static std::unique_ptr<PrototypeAST> ParsePrototype() { |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 400 | std::string FnName; |
| 401 | |
| 402 | unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. |
| 403 | unsigned BinaryPrecedence = 30; |
| 404 | |
| 405 | switch (CurTok) { |
| 406 | default: |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 407 | return LogErrorP("Expected function name in prototype"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 408 | case tok_identifier: |
| 409 | FnName = IdentifierStr; |
| 410 | Kind = 0; |
| 411 | getNextToken(); |
| 412 | break; |
| 413 | case tok_unary: |
| 414 | getNextToken(); |
| 415 | if (!isascii(CurTok)) |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 416 | return LogErrorP("Expected unary operator"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 417 | FnName = "unary"; |
| 418 | FnName += (char)CurTok; |
| 419 | Kind = 1; |
| 420 | getNextToken(); |
| 421 | break; |
| 422 | case tok_binary: |
| 423 | ... |
| 424 | |
| 425 | As with binary operators, we name unary operators with a name that |
| 426 | includes the operator character. This assists us at code generation |
| 427 | time. Speaking of, the final piece we need to add is codegen support for |
| 428 | unary operators. It looks like this: |
| 429 | |
| 430 | .. code-block:: c++ |
| 431 | |
Lang Hames | 2d789c3 | 2015-08-26 03:07:41 +0000 | [diff] [blame] | 432 | Value *UnaryExprAST::codegen() { |
| 433 | Value *OperandV = Operand->codegen(); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 434 | if (!OperandV) |
| 435 | return nullptr; |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 436 | |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 437 | Function *F = getFunction(std::string("unary") + Opcode); |
Lang Hames | 59b0da8 | 2015-08-19 18:15:58 +0000 | [diff] [blame] | 438 | if (!F) |
Lang Hames | f9878c5 | 2016-03-25 17:33:32 +0000 | [diff] [blame] | 439 | return LogErrorV("Unknown unary operator"); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 440 | |
| 441 | return Builder.CreateCall(F, OperandV, "unop"); |
| 442 | } |
| 443 | |
| 444 | This code is similar to, but simpler than, the code for binary |
| 445 | operators. It is simpler primarily because it doesn't need to handle any |
| 446 | predefined operators. |
| 447 | |
| 448 | Kicking the Tires |
| 449 | ================= |
| 450 | |
| 451 | It is somewhat hard to believe, but with a few simple extensions we've |
| 452 | covered in the last chapters, we have grown a real-ish language. With |
| 453 | this, we can do a lot of interesting things, including I/O, math, and a |
| 454 | bunch of other things. For example, we can now add a nice sequencing |
| 455 | operator (printd is defined to print out the specified value and a |
| 456 | newline): |
| 457 | |
| 458 | :: |
| 459 | |
| 460 | ready> extern printd(x); |
| 461 | Read extern: |
| 462 | declare double @printd(double) |
| 463 | |
| 464 | ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 465 | ... |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 466 | ready> printd(123) : printd(456) : printd(789); |
| 467 | 123.000000 |
| 468 | 456.000000 |
| 469 | 789.000000 |
| 470 | Evaluated to 0.000000 |
| 471 | |
| 472 | We can also define a bunch of other "primitive" operations, such as: |
| 473 | |
| 474 | :: |
| 475 | |
| 476 | # Logical unary not. |
| 477 | def unary!(v) |
| 478 | if v then |
| 479 | 0 |
| 480 | else |
| 481 | 1; |
| 482 | |
| 483 | # Unary negate. |
| 484 | def unary-(v) |
| 485 | 0-v; |
| 486 | |
| 487 | # Define > with the same precedence as <. |
| 488 | def binary> 10 (LHS RHS) |
| 489 | RHS < LHS; |
| 490 | |
| 491 | # Binary logical or, which does not short circuit. |
| 492 | def binary| 5 (LHS RHS) |
| 493 | if LHS then |
| 494 | 1 |
| 495 | else if RHS then |
| 496 | 1 |
| 497 | else |
| 498 | 0; |
| 499 | |
| 500 | # Binary logical and, which does not short circuit. |
| 501 | def binary& 6 (LHS RHS) |
| 502 | if !LHS then |
| 503 | 0 |
| 504 | else |
| 505 | !!RHS; |
| 506 | |
| 507 | # Define = with slightly lower precedence than relationals. |
| 508 | def binary = 9 (LHS RHS) |
| 509 | !(LHS < RHS | LHS > RHS); |
| 510 | |
| 511 | # Define ':' for sequencing: as a low-precedence operator that ignores operands |
| 512 | # and just returns the RHS. |
| 513 | def binary : 1 (x y) y; |
| 514 | |
| 515 | Given the previous if/then/else support, we can also define interesting |
| 516 | functions for I/O. For example, the following prints out a character |
| 517 | whose "density" reflects the value passed in: the lower the value, the |
| 518 | denser the character: |
| 519 | |
| 520 | :: |
| 521 | |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 522 | ready> extern putchard(char); |
| 523 | ... |
| 524 | ready> def printdensity(d) |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 525 | if d > 8 then |
| 526 | putchard(32) # ' ' |
| 527 | else if d > 4 then |
| 528 | putchard(46) # '.' |
| 529 | else if d > 2 then |
| 530 | putchard(43) # '+' |
| 531 | else |
| 532 | putchard(42); # '*' |
| 533 | ... |
| 534 | ready> printdensity(1): printdensity(2): printdensity(3): |
| 535 | printdensity(4): printdensity(5): printdensity(9): |
| 536 | putchard(10); |
| 537 | **++. |
| 538 | Evaluated to 0.000000 |
| 539 | |
| 540 | Based on these simple primitive operations, we can start to define more |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 541 | interesting things. For example, here's a little function that determines |
| 542 | the number of iterations it takes for a certain function in the complex |
| 543 | plane to diverge: |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 544 | |
| 545 | :: |
| 546 | |
| 547 | # Determine whether the specific location diverges. |
| 548 | # Solve for z = z^2 + c in the complex plane. |
Hans Wennborg | 1d8b31b | 2016-06-14 16:05:12 +0000 | [diff] [blame] | 549 | def mandelconverger(real imag iters creal cimag) |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 550 | if iters > 255 | (real*real + imag*imag > 4) then |
| 551 | iters |
| 552 | else |
Hans Wennborg | 1d8b31b | 2016-06-14 16:05:12 +0000 | [diff] [blame] | 553 | mandelconverger(real*real - imag*imag + creal, |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 554 | 2*real*imag + cimag, |
| 555 | iters+1, creal, cimag); |
| 556 | |
| 557 | # Return the number of iterations required for the iteration to escape |
Hans Wennborg | 1d8b31b | 2016-06-14 16:05:12 +0000 | [diff] [blame] | 558 | def mandelconverge(real imag) |
| 559 | mandelconverger(real, imag, 0, real, imag); |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 560 | |
| 561 | This "``z = z2 + c``" function is a beautiful little creature that is |
| 562 | the basis for computation of the `Mandelbrot |
| 563 | Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our |
| 564 | ``mandelconverge`` function returns the number of iterations that it |
| 565 | takes for a complex orbit to escape, saturating to 255. This is not a |
| 566 | very useful function by itself, but if you plot its value over a |
| 567 | two-dimensional plane, you can see the Mandelbrot set. Given that we are |
| 568 | limited to using putchard here, our amazing graphical output is limited, |
| 569 | but we can whip together something using the density plotter above: |
| 570 | |
| 571 | :: |
| 572 | |
Hans Wennborg | 1d8b31b | 2016-06-14 16:05:12 +0000 | [diff] [blame] | 573 | # Compute and plot the mandelbrot set with the specified 2 dimensional range |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 574 | # info. |
| 575 | def mandelhelp(xmin xmax xstep ymin ymax ystep) |
| 576 | for y = ymin, y < ymax, ystep in ( |
| 577 | (for x = xmin, x < xmax, xstep in |
Hans Wennborg | 1d8b31b | 2016-06-14 16:05:12 +0000 | [diff] [blame] | 578 | printdensity(mandelconverge(x,y))) |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 579 | : putchard(10) |
| 580 | ) |
| 581 | |
| 582 | # mandel - This is a convenient helper function for plotting the mandelbrot set |
| 583 | # from the specified position with the specified Magnification. |
| 584 | def mandel(realstart imagstart realmag imagmag) |
| 585 | mandelhelp(realstart, realstart+realmag*78, realmag, |
| 586 | imagstart, imagstart+imagmag*40, imagmag); |
| 587 | |
Hans Wennborg | 1d8b31b | 2016-06-14 16:05:12 +0000 | [diff] [blame] | 588 | Given this, we can try plotting out the mandelbrot set! Lets try it out: |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 589 | |
| 590 | :: |
| 591 | |
| 592 | ready> mandel(-2.3, -1.3, 0.05, 0.07); |
| 593 | *******************************+++++++++++************************************* |
| 594 | *************************+++++++++++++++++++++++******************************* |
| 595 | **********************+++++++++++++++++++++++++++++**************************** |
| 596 | *******************+++++++++++++++++++++.. ...++++++++************************* |
| 597 | *****************++++++++++++++++++++++.... ...+++++++++*********************** |
| 598 | ***************+++++++++++++++++++++++..... ...+++++++++********************* |
| 599 | **************+++++++++++++++++++++++.... ....+++++++++******************** |
| 600 | *************++++++++++++++++++++++...... .....++++++++******************* |
| 601 | ************+++++++++++++++++++++....... .......+++++++****************** |
| 602 | ***********+++++++++++++++++++.... ... .+++++++***************** |
| 603 | **********+++++++++++++++++....... .+++++++**************** |
| 604 | *********++++++++++++++........... ...+++++++*************** |
| 605 | ********++++++++++++............ ...++++++++************** |
| 606 | ********++++++++++... .......... .++++++++************** |
| 607 | *******+++++++++..... .+++++++++************* |
| 608 | *******++++++++...... ..+++++++++************* |
| 609 | *******++++++....... ..+++++++++************* |
| 610 | *******+++++...... ..+++++++++************* |
| 611 | *******.... .... ...+++++++++************* |
| 612 | *******.... . ...+++++++++************* |
| 613 | *******+++++...... ...+++++++++************* |
| 614 | *******++++++....... ..+++++++++************* |
| 615 | *******++++++++...... .+++++++++************* |
| 616 | *******+++++++++..... ..+++++++++************* |
| 617 | ********++++++++++... .......... .++++++++************** |
| 618 | ********++++++++++++............ ...++++++++************** |
| 619 | *********++++++++++++++.......... ...+++++++*************** |
| 620 | **********++++++++++++++++........ .+++++++**************** |
| 621 | **********++++++++++++++++++++.... ... ..+++++++**************** |
| 622 | ***********++++++++++++++++++++++....... .......++++++++***************** |
| 623 | ************+++++++++++++++++++++++...... ......++++++++****************** |
| 624 | **************+++++++++++++++++++++++.... ....++++++++******************** |
| 625 | ***************+++++++++++++++++++++++..... ...+++++++++********************* |
| 626 | *****************++++++++++++++++++++++.... ...++++++++*********************** |
| 627 | *******************+++++++++++++++++++++......++++++++************************* |
| 628 | *********************++++++++++++++++++++++.++++++++*************************** |
| 629 | *************************+++++++++++++++++++++++******************************* |
| 630 | ******************************+++++++++++++************************************ |
| 631 | ******************************************************************************* |
| 632 | ******************************************************************************* |
| 633 | ******************************************************************************* |
| 634 | Evaluated to 0.000000 |
| 635 | ready> mandel(-2, -1, 0.02, 0.04); |
| 636 | **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| 637 | ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| 638 | *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. |
| 639 | *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... |
| 640 | *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... |
| 641 | ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ |
| 642 | **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... |
| 643 | ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. |
| 644 | ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . |
| 645 | **********++++++++++++++++++++++++++++++++++++++++++++++............. |
| 646 | ********+++++++++++++++++++++++++++++++++++++++++++.................. |
| 647 | *******+++++++++++++++++++++++++++++++++++++++....................... |
| 648 | ******+++++++++++++++++++++++++++++++++++........................... |
| 649 | *****++++++++++++++++++++++++++++++++............................ |
| 650 | *****++++++++++++++++++++++++++++............................... |
| 651 | ****++++++++++++++++++++++++++...... ......................... |
| 652 | ***++++++++++++++++++++++++......... ...... ........... |
| 653 | ***++++++++++++++++++++++............ |
| 654 | **+++++++++++++++++++++.............. |
| 655 | **+++++++++++++++++++................ |
| 656 | *++++++++++++++++++................. |
| 657 | *++++++++++++++++............ ... |
| 658 | *++++++++++++++.............. |
| 659 | *+++....++++................ |
| 660 | *.......... ........... |
| 661 | * |
| 662 | *.......... ........... |
| 663 | *+++....++++................ |
| 664 | *++++++++++++++.............. |
| 665 | *++++++++++++++++............ ... |
| 666 | *++++++++++++++++++................. |
| 667 | **+++++++++++++++++++................ |
| 668 | **+++++++++++++++++++++.............. |
| 669 | ***++++++++++++++++++++++............ |
| 670 | ***++++++++++++++++++++++++......... ...... ........... |
| 671 | ****++++++++++++++++++++++++++...... ......................... |
| 672 | *****++++++++++++++++++++++++++++............................... |
| 673 | *****++++++++++++++++++++++++++++++++............................ |
| 674 | ******+++++++++++++++++++++++++++++++++++........................... |
| 675 | *******+++++++++++++++++++++++++++++++++++++++....................... |
| 676 | ********+++++++++++++++++++++++++++++++++++++++++++.................. |
| 677 | Evaluated to 0.000000 |
| 678 | ready> mandel(-0.9, -1.4, 0.02, 0.03); |
| 679 | ******************************************************************************* |
| 680 | ******************************************************************************* |
| 681 | ******************************************************************************* |
| 682 | **********+++++++++++++++++++++************************************************ |
| 683 | *+++++++++++++++++++++++++++++++++++++++*************************************** |
| 684 | +++++++++++++++++++++++++++++++++++++++++++++********************************** |
| 685 | ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** |
| 686 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* |
| 687 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** |
| 688 | +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* |
| 689 | +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** |
| 690 | +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** |
| 691 | ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ |
| 692 | +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** |
| 693 | ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** |
| 694 | ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** |
| 695 | +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** |
| 696 | ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** |
| 697 | ++++++++++++++++++++........... .........++++++++++++++++++++++* |
| 698 | ++++++++++++++++++............ ...........++++++++++++++++++++ |
| 699 | ++++++++++++++++............... .............++++++++++++++++++ |
| 700 | ++++++++++++++................. ...............++++++++++++++++ |
| 701 | ++++++++++++.................. .................++++++++++++++ |
| 702 | +++++++++.................. .................+++++++++++++ |
| 703 | ++++++........ . ......... ..++++++++++++ |
| 704 | ++............ ...... ....++++++++++ |
| 705 | .............. ...++++++++++ |
| 706 | .............. ....+++++++++ |
| 707 | .............. .....++++++++ |
| 708 | ............. ......++++++++ |
| 709 | ........... .......++++++++ |
| 710 | ......... ........+++++++ |
| 711 | ......... ........+++++++ |
| 712 | ......... ....+++++++ |
| 713 | ........ ...+++++++ |
| 714 | ....... ...+++++++ |
| 715 | ....+++++++ |
| 716 | .....+++++++ |
| 717 | ....+++++++ |
| 718 | ....+++++++ |
| 719 | ....+++++++ |
| 720 | Evaluated to 0.000000 |
| 721 | ready> ^D |
| 722 | |
| 723 | At this point, you may be starting to realize that Kaleidoscope is a |
| 724 | real and powerful language. It may not be self-similar :), but it can be |
| 725 | used to plot things that are! |
| 726 | |
| 727 | With this, we conclude the "adding user-defined operators" chapter of |
| 728 | the tutorial. We have successfully augmented our language, adding the |
| 729 | ability to extend the language in the library, and we have shown how |
| 730 | this can be used to build a simple but interesting end-user application |
| 731 | in Kaleidoscope. At this point, Kaleidoscope can build a variety of |
| 732 | applications that are functional and can call functions with |
| 733 | side-effects, but it can't actually define and mutate a variable itself. |
| 734 | |
| 735 | Strikingly, variable mutation is an important feature of some languages, |
| 736 | and it is not at all obvious how to `add support for mutable |
| 737 | variables <LangImpl7.html>`_ without having to add an "SSA construction" |
| 738 | phase to your front-end. In the next chapter, we will describe how you |
| 739 | can add variable mutation without building SSA in your front-end. |
| 740 | |
| 741 | Full Code Listing |
| 742 | ================= |
| 743 | |
| 744 | Here is the complete code listing for our running example, enhanced with |
Mehdi Amini | bb6805d | 2017-02-11 21:26:52 +0000 | [diff] [blame] | 745 | the support for user-defined operators. To build this example, use: |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 746 | |
| 747 | .. code-block:: bash |
| 748 | |
| 749 | # Compile |
Eric Christopher | a8c6a0a | 2015-01-08 19:07:01 +0000 | [diff] [blame] | 750 | clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 751 | # Run |
| 752 | ./toy |
| 753 | |
| 754 | On some platforms, you will need to specify -rdynamic or |
| 755 | -Wl,--export-dynamic when linking. This ensures that symbols defined in |
| 756 | the main executable are exported to the dynamic linker and so are |
| 757 | available for symbol resolution at run time. This is not needed if you |
| 758 | compile your support code into a shared library, although doing that |
| 759 | will cause problems on Windows. |
| 760 | |
| 761 | Here is the code: |
| 762 | |
Logan Chien | 855b17d | 2013-06-08 09:03:03 +0000 | [diff] [blame] | 763 | .. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp |
| 764 | :language: c++ |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 765 | |
| 766 | `Next: Extending the language: mutable variables / SSA |
Wilfred Hughes | 945f43e | 2016-07-02 17:01:59 +0000 | [diff] [blame] | 767 | construction <LangImpl07.html>`_ |
Sean Silva | d7fb396 | 2012-12-05 00:26:32 +0000 | [diff] [blame] | 768 | |