| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | 2 | "http://www.w3.org/TR/html4/strict.dtd"> | 
|  | 3 |  | 
|  | 4 | <html> | 
|  | 5 | <head> | 
| Chris Lattner | 99005a4 | 2007-11-05 19:10:15 +0000 | [diff] [blame] | 6 | <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title> | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 7 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | 
|  | 8 | <meta name="author" content="Chris Lattner"> | 
|  | 9 | <link rel="stylesheet" href="../llvm.css" type="text/css"> | 
|  | 10 | </head> | 
|  | 11 |  | 
|  | 12 | <body> | 
|  | 13 |  | 
| Chris Lattner | 99005a4 | 2007-11-05 19:10:15 +0000 | [diff] [blame] | 14 | <div class="doc_title">Kaleidoscope: Conclusion and other useful LLVM | 
|  | 15 | tidbits</div> | 
|  | 16 |  | 
|  | 17 | <ul> | 
|  | 18 | <li>Chapter 8 | 
|  | 19 | <ol> | 
|  | 20 | <li><a href="#conclusion">Tutorial Conclusion</a></li> | 
|  | 21 | <li><a href="#llvmirproperties">Properties of LLVM IR</a> | 
|  | 22 | <ul> | 
|  | 23 | <li><a href="#targetindep">Target Independence</a></li> | 
|  | 24 | <li><a href="#safety">Safety Guarantees</a></li> | 
|  | 25 | <li><a href="#langspecific">Language-Specific Optimizations</a></li> | 
|  | 26 | </ul> | 
|  | 27 | </li> | 
|  | 28 | <li><a href="#tipsandtricks">Tips and Tricks</a> | 
|  | 29 | <ul> | 
|  | 30 | <li><a href="#offsetofsizeof">Implementing portable | 
|  | 31 | offsetof/sizeof</a></li> | 
|  | 32 | <li><a href="#gcstack">Garbage Collected Stack Frames</a></li> | 
|  | 33 | </ul> | 
|  | 34 | </li> | 
|  | 35 | </ol> | 
|  | 36 | </li> | 
|  | 37 | </ul> | 
|  | 38 |  | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 39 |  | 
|  | 40 | <div class="doc_author"> | 
|  | 41 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> | 
|  | 42 | </div> | 
|  | 43 |  | 
|  | 44 | <!-- *********************************************************************** --> | 
| Chris Lattner | 99005a4 | 2007-11-05 19:10:15 +0000 | [diff] [blame] | 45 | <div class="doc_section"><a name="conclusion">Tutorial Conclusion</a></div> | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 46 | <!-- *********************************************************************** --> | 
|  | 47 |  | 
|  | 48 | <div class="doc_text"> | 
|  | 49 |  | 
|  | 50 | <p>Welcome to the the final chapter of the "<a href="index.html">Implementing a | 
|  | 51 | language with LLVM</a>" tutorial.  In the course of this tutorial, we have grown | 
|  | 52 | our little Kaleidoscope language from being a useless toy, to being a | 
|  | 53 | semi-interesting (but probably still useless) toy. :)</p> | 
|  | 54 |  | 
|  | 55 | <p>It is interesting to see how far we've come, and how little code it has | 
|  | 56 | taken.  We built the entire lexer, parser, AST, code generator, and an | 
|  | 57 | interactive run-loop (with a JIT!) by-hand in under 700 lines of | 
|  | 58 | (non-comment/non-blank) code.</p> | 
|  | 59 |  | 
|  | 60 | <p>Our little language supports a couple of interesting features: it supports | 
|  | 61 | user defined binary and unary operators, it uses JIT compilation for immediate | 
|  | 62 | evaluation, and it supports a few control flow constructs with SSA construction. | 
|  | 63 | </p> | 
|  | 64 |  | 
|  | 65 | <p>Part of the idea of this tutorial was to show you how easy and fun it can be | 
|  | 66 | to define, build, and play with languages.  Building a compiler need not be a | 
|  | 67 | scary or mystical process!  Now that you've seen some of the basics, I strongly | 
|  | 68 | encourage you to take the code and hack on it.  For example, try adding:</p> | 
|  | 69 |  | 
|  | 70 | <ul> | 
|  | 71 | <li><b>global variables</b> - While global variables have questional value in | 
|  | 72 | modern software engineering, they are often useful when putting together quick | 
|  | 73 | little hacks like the Kaleidoscope compiler itself.  Fortunately, our current | 
|  | 74 | setup makes it very easy to add global variables: just have value lookup check | 
|  | 75 | to see if an unresolved variable is in the global variable symbol table before | 
|  | 76 | rejecting it.  To create a new global variable, make an instance of the LLVM | 
|  | 77 | <tt>GlobalVariable</tt> class.</li> | 
|  | 78 |  | 
|  | 79 | <li><b>typed variables</b> - Kaleidoscope currently only supports variables of | 
|  | 80 | type double.  This gives the language a very nice elegance, because only | 
|  | 81 | supporting one type means that you never have to specify types.  Different | 
|  | 82 | languages have different ways of handling this.  The easiest way is to require | 
|  | 83 | the user to specify types for every variable definition, and record the type | 
|  | 84 | of the variable in the symbol table along with its Value*.</li> | 
|  | 85 |  | 
|  | 86 | <li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start | 
|  | 87 | extending the type system in all sorts of interesting ways.  Simple arrays are | 
|  | 88 | very easy and are quite useful for many different applications.  Adding them is | 
|  | 89 | mostly an exercise in learning how the LLVM <a | 
|  | 90 | href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works. | 
|  | 91 | The getelementptr instruction is so nifty/unconventional, it <a | 
| Chris Lattner | 9ca08f3 | 2007-11-05 19:28:07 +0000 | [diff] [blame^] | 92 | href="../GetElementPtr.html">has its own FAQ</a>!).  If you add support | 
|  | 93 | for recursive types (e.g. linked lists), make sure to read the <a | 
|  | 94 | href="../ProgrammersManual.html#TypeResolve">section in the LLVM | 
|  | 95 | Programmer's Manual</a> that describes how to construct them.</li> | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 96 |  | 
|  | 97 | <li><b>standard runtime</b> - Our current language allows the user to access | 
|  | 98 | arbitrary external functions, and we use it for things like "printd" and | 
|  | 99 | "putchard".  As you extend the language to add higher-level constructs, often | 
|  | 100 | these constructs make the most amount of sense to be lowered into calls into a | 
|  | 101 | language-supplied runtime.  For example, if you add hash tables to the language, | 
|  | 102 | it would probably make sense to add the routines to a runtime, instead of | 
|  | 103 | inlining them all the way.</li> | 
|  | 104 |  | 
|  | 105 | <li><b>memory management</b> - Currently we can only access the stack in | 
|  | 106 | Kaleidoscope.  It would also be useful to be able to allocate heap memory, | 
|  | 107 | either with calls to the standard libc malloc/free interface or with a garbage | 
|  | 108 | collector.  If you choose to use garbage collection, note that LLVM fully | 
|  | 109 | supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a> | 
|  | 110 | including algorithms that move objects and need to scan/update the stack.</li> | 
|  | 111 |  | 
|  | 112 | <li><b>debugger support</b> - LLVM supports generation of <a | 
|  | 113 | href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by | 
|  | 114 | common debuggers like GDB.  Adding support for debug info is fairly | 
|  | 115 | straight-forward.  The best way to understand it is to compile some C/C++ code | 
|  | 116 | with "<tt>llvm-gcc -g -O0</tt>" and taking a look at what it produces.</li> | 
|  | 117 |  | 
| Chris Lattner | a3f07ef | 2007-11-05 07:00:54 +0000 | [diff] [blame] | 118 | <li><b>exception handling support</b> - LLVM supports generation of <a | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 119 | href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate | 
|  | 120 | with code compiled in other languages.  You could also generate code by | 
|  | 121 | implicitly making every function return an error value and checking it.  You | 
|  | 122 | could also make explicit use of setjmp/longjmp.  There are many different ways | 
|  | 123 | to go here.</li> | 
|  | 124 |  | 
|  | 125 | <li><b>object orientation, generics, database access, complex numbers, | 
|  | 126 | geometric programming, ...</b> - Really, there is | 
|  | 127 | no end of crazy features that you can add to the language.</li> | 
|  | 128 |  | 
| Chris Lattner | a3f07ef | 2007-11-05 07:00:54 +0000 | [diff] [blame] | 129 | <li><b>unusual domains</b> - We've been talking about applying LLVM to a domain | 
|  | 130 | that many people are interested in: building a compiler for a specific language. | 
|  | 131 | However, there are many other domains that can use compiler technology that are | 
|  | 132 | not typically considered.  For example, LLVM has been used to implement OpenGL | 
|  | 133 | graphics acceleration, translate C++ code to ActionScript, and many other | 
|  | 134 | cute and clever things.  Maybe you will be the first to JIT compile a regular | 
|  | 135 | expression interpreter into native code with LLVM?</li> | 
|  | 136 |  | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 137 | </ul> | 
|  | 138 |  | 
|  | 139 | <p> | 
|  | 140 | Have fun - try doing something crazy and unusual.  Building a language like | 
|  | 141 | everyone else always has is much less fun than trying something a little crazy | 
|  | 142 | and off the wall and seeing how it turns out.  If you get stuck or want to talk | 
|  | 143 | about it, feel free to email the <a | 
|  | 144 | href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing | 
|  | 145 | list</a>: it has lots of people who are interested in languages and are often | 
|  | 146 | willing to help out. | 
|  | 147 | </p> | 
|  | 148 |  | 
|  | 149 | <p>Before we end, I want to talk about some "tips and tricks" for generating | 
|  | 150 | LLVM IR.  These are some of the more subtle things that may not be obvious, but | 
|  | 151 | are very useful if you want to take advantage of LLVM's capabilities.</p> | 
|  | 152 |  | 
|  | 153 | </div> | 
|  | 154 |  | 
|  | 155 | <!-- *********************************************************************** --> | 
| Chris Lattner | a3f07ef | 2007-11-05 07:00:54 +0000 | [diff] [blame] | 156 | <div class="doc_section"><a name="llvmirproperties">Properties of LLVM | 
|  | 157 | IR</a></div> | 
|  | 158 | <!-- *********************************************************************** --> | 
|  | 159 |  | 
|  | 160 | <div class="doc_text"> | 
|  | 161 |  | 
|  | 162 | <p>We have a couple common questions about code in the LLVM IR form, lets just | 
|  | 163 | get these out of the way right now shall we?</p> | 
|  | 164 |  | 
|  | 165 | </div> | 
|  | 166 |  | 
|  | 167 | <!-- ======================================================================= --> | 
|  | 168 | <div class="doc_subsubsection"><a name="targetindep">Target | 
|  | 169 | Independence</a></div> | 
|  | 170 | <!-- ======================================================================= --> | 
|  | 171 |  | 
|  | 172 | <div class="doc_text"> | 
|  | 173 |  | 
|  | 174 | <p>Kaleidoscope is an example of a "portable language": any program written in | 
|  | 175 | Kaleidoscope will work the same way on any target that it runs on.  Many other | 
|  | 176 | languages have this property, e.g. lisp, java, haskell, javascript, python, etc | 
|  | 177 | (note that while these languages are portable, not all their libraries are).</p> | 
|  | 178 |  | 
|  | 179 | <p>One nice aspect of LLVM is that it is often capable of preserving language | 
|  | 180 | independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled | 
|  | 181 | program and run it on any target that LLVM supports, even emitting C code and | 
|  | 182 | compiling that on targets that LLVM doesn't support natively.  You can trivially | 
|  | 183 | tell that the Kaleidoscope compiler generates target-independent code because it | 
|  | 184 | never queries for any target-specific information when generating code.</p> | 
|  | 185 |  | 
|  | 186 | <p>The fact that LLVM provides a compact target-independent representation for | 
|  | 187 | code gets a lot of people excited.  Unfortunately, these people are usually | 
|  | 188 | thinking about C or a language from the C family when they are asking questions | 
|  | 189 | about language portability.  I say "unfortunately", because there is really no | 
|  | 190 | way to make (fully general) C code portable, other than shipping the source code | 
|  | 191 | around (and of course, C source code is not actually portable in general | 
|  | 192 | either - ever port a really old application from 32- to 64-bits?).</p> | 
|  | 193 |  | 
|  | 194 | <p>The problem with C (again, in its full generality) is that it is heavily | 
|  | 195 | laden with target specific assumptions.  As one simple example, the preprocessor | 
|  | 196 | often destructively removes target-independence from the code when it processes | 
|  | 197 | the input text:</p> | 
|  | 198 |  | 
|  | 199 | <div class="doc_code"> | 
|  | 200 | <pre> | 
|  | 201 | #ifdef __i386__ | 
|  | 202 | int X = 1; | 
|  | 203 | #else | 
|  | 204 | int X = 42; | 
|  | 205 | #endif | 
|  | 206 | </pre> | 
|  | 207 | </div> | 
|  | 208 |  | 
|  | 209 | <p>While it is possible to engineer more and more complex solutions to problems | 
|  | 210 | like this, it cannot be solved in full generality in a way better than shipping | 
|  | 211 | the actual source code.</p> | 
|  | 212 |  | 
|  | 213 | <p>That said, there are interesting subsets of C that can be made portable.  If | 
|  | 214 | you are willing to fix primitive types to a fixed size (say int = 32-bits, | 
|  | 215 | and long = 64-bits), don't care about ABI compatibility with existing binaries, | 
|  | 216 | and are willing to give up some other minor features, you can have portable | 
|  | 217 | code.  This can even make real sense for specialized domains such as an | 
|  | 218 | in-kernel language.</p> | 
|  | 219 |  | 
|  | 220 | </div> | 
|  | 221 |  | 
|  | 222 | <!-- ======================================================================= --> | 
|  | 223 | <div class="doc_subsubsection"><a name="safety">Safety Guarantees</a></div> | 
|  | 224 | <!-- ======================================================================= --> | 
|  | 225 |  | 
|  | 226 | <div class="doc_text"> | 
|  | 227 |  | 
|  | 228 | <p>Many of the languages above are also "safe" languages: it is impossible for | 
|  | 229 | a program written in Java to corrupt its address space and crash the process. | 
|  | 230 | Safety is an interesting property that requires a combination of language | 
|  | 231 | design, runtime support, and often operating system support.</p> | 
|  | 232 |  | 
|  | 233 | <p>It is certainly possible to implement a safe language in LLVM, but LLVM IR | 
|  | 234 | does not itself guarantee safety.  The LLVM IR allows unsafe pointer casts, | 
|  | 235 | use after free bugs, buffer over-runs, and a variety of other problems.  Safety | 
|  | 236 | needs to be implemented as a layer on top of LLVM and, conveniently, several | 
|  | 237 | groups have investigated this.  Ask on the <a | 
|  | 238 | href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing | 
|  | 239 | list</a> if you are interested in more details.</p> | 
|  | 240 |  | 
|  | 241 | </div> | 
|  | 242 |  | 
|  | 243 | <!-- ======================================================================= --> | 
|  | 244 | <div class="doc_subsubsection"><a name="langspecific">Language-Specific | 
|  | 245 | Optimizations</a></div> | 
|  | 246 | <!-- ======================================================================= --> | 
|  | 247 |  | 
|  | 248 | <div class="doc_text"> | 
|  | 249 |  | 
|  | 250 | <p>One thing about LLVM that turns off many people is that it does not solve all | 
|  | 251 | the world's problems in one system (sorry 'world hunger', someone else will have | 
|  | 252 | to solve you some other day).  One specific complaint is that people perceive | 
|  | 253 | LLVM as being incapable of performing high-level language-specific optimization: | 
|  | 254 | LLVM "loses too much information".</p> | 
|  | 255 |  | 
|  | 256 | <p>Unfortunately, this is really not the place to give you a full and unified | 
|  | 257 | version of "Chris Lattner's theory of compiler design".  Instead, I'll make a | 
|  | 258 | few observations:</p> | 
|  | 259 |  | 
|  | 260 | <p>First, you're right that LLVM does lose information.  For example, as of this | 
|  | 261 | writing, there is no way to distinguish in the LLVM IR whether an SSA-value came | 
|  | 262 | from a C "int" or a C "long" on an ILP32 machine (other than debug info).  Both | 
|  | 263 | get compiled down to an 'i32' value and the information about what it came from | 
|  | 264 | is lost.  The more general issue here is that the LLVM type system uses | 
|  | 265 | "structural equivalence" instead of "name equivalence".  Another place this | 
|  | 266 | surprises people is if you have two types in a high-level language that have the | 
|  | 267 | same structure (e.g. two different structs that have a single int field): these | 
|  | 268 | types will compile down into a single LLVM type and it will be impossible to | 
|  | 269 | tell what it came from.</p> | 
|  | 270 |  | 
|  | 271 | <p>Second, while LLVM does lose information, LLVM is not a fixed target: we | 
|  | 272 | continue to enhance and improve it in many different ways.  In addition to | 
|  | 273 | adding new features (LLVM did not always support exceptions or debug info), we | 
|  | 274 | also extend the IR to capture important information for optimization (e.g. | 
|  | 275 | whether an argument is sign or zero extended, information about pointers | 
|  | 276 | aliasing, etc.  Many of the enhancements are user-driven: people want LLVM to | 
|  | 277 | do some specific feature, so they go ahead and extend it to do so.</p> | 
|  | 278 |  | 
|  | 279 | <p>Third, it <em>is certainly possible</em> to add language-specific | 
|  | 280 | optimizations, and you have a number of choices in how to do it.  As one trivial | 
|  | 281 | example, it is possible to add language-specific optimization passes that | 
|  | 282 | "known" things about code compiled for a language.  In the case of the C family, | 
|  | 283 | there is an optimziation pass that "knows" about the standard C library | 
|  | 284 | functions.  If you call "exit(0)" in main(), it knows that it is safe to | 
|  | 285 | optimize that into "return 0;" for example, because C specifies what the 'exit' | 
|  | 286 | function does.</p> | 
|  | 287 |  | 
|  | 288 | <p>In addition to simple library knowledge, it is possible to embed a variety of | 
|  | 289 | other language-specific information into the LLVM IR.  If you have a specific | 
|  | 290 | need and run into a wall, please bring the topic up on the llvmdev list.  At the | 
|  | 291 | very worst, you can always treat LLVM as if it were a "dumb code generator" and | 
|  | 292 | implement the high-level optimizations you desire in your front-end on the | 
|  | 293 | language-specific AST. | 
|  | 294 | </p> | 
|  | 295 |  | 
|  | 296 | </div> | 
|  | 297 |  | 
|  | 298 | <!-- *********************************************************************** --> | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 299 | <div class="doc_section"><a name="tipsandtricks">Tips and Tricks</a></div> | 
|  | 300 | <!-- *********************************************************************** --> | 
|  | 301 |  | 
|  | 302 | <div class="doc_text"> | 
|  | 303 |  | 
| Chris Lattner | a3f07ef | 2007-11-05 07:00:54 +0000 | [diff] [blame] | 304 | <p>There is a variety of useful tips and tricks that you come to know after | 
|  | 305 | working on/with LLVM that aren't obvious at first glance.  Instead of letting | 
|  | 306 | everyone rediscover them, this section talks about some of these issues.</p> | 
|  | 307 |  | 
|  | 308 | </div> | 
|  | 309 |  | 
|  | 310 | <!-- ======================================================================= --> | 
|  | 311 | <div class="doc_subsubsection"><a name="offsetofsizeof">Implementing portable | 
|  | 312 | offsetof/sizeof</a></div> | 
|  | 313 | <!-- ======================================================================= --> | 
|  | 314 |  | 
|  | 315 | <div class="doc_text"> | 
|  | 316 |  | 
|  | 317 | <p>One interesting thing that comes up if you are trying to keep the code | 
|  | 318 | generated by your compiler "target independent" is that you often need to know | 
|  | 319 | the size of some LLVM type or the offset of some field in an llvm structure. | 
|  | 320 | For example, you might need to pass the size of a type into a function that | 
|  | 321 | allocates memory.</p> | 
|  | 322 |  | 
|  | 323 | <p>Unfortunately, this can vary widely across targets: for example the width of | 
|  | 324 | a pointer is trivially target-specific.  However, there is a <a | 
|  | 325 | href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever | 
|  | 326 | way to use the getelementptr instruction</a> that allows you to compute this | 
|  | 327 | in a portable way.</p> | 
|  | 328 |  | 
|  | 329 | </div> | 
|  | 330 |  | 
|  | 331 | <!-- ======================================================================= --> | 
|  | 332 | <div class="doc_subsubsection"><a name="gcstack">Garbage Collected | 
|  | 333 | Stack Frames</a></div> | 
|  | 334 | <!-- ======================================================================= --> | 
|  | 335 |  | 
|  | 336 | <div class="doc_text"> | 
|  | 337 |  | 
|  | 338 | <p>Some languages want to explicitly manage their stack frames, often so that | 
|  | 339 | they are garbage collected or to allow easy implementation of closures.  There | 
|  | 340 | are often better ways to implement these features than explicit stack frames, | 
|  | 341 | but <a | 
|  | 342 | href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM | 
|  | 343 | does support them if you want</a>.  It requires your front-end to convert the | 
|  | 344 | code into <a | 
|  | 345 | href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation | 
|  | 346 | Passing Style</a> and use of tail calls (which LLVM also supports).</p> | 
| Chris Lattner | b8fc650 | 2007-11-05 01:58:13 +0000 | [diff] [blame] | 347 |  | 
|  | 348 | </div> | 
|  | 349 |  | 
|  | 350 | <!-- *********************************************************************** --> | 
|  | 351 | <hr> | 
|  | 352 | <address> | 
|  | 353 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | 354 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
|  | 355 | <a href="http://validator.w3.org/check/referer"><img | 
|  | 356 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
|  | 357 |  | 
|  | 358 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
|  | 359 | <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> | 
|  | 360 | Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ | 
|  | 361 | </address> | 
|  | 362 | </body> | 
|  | 363 | </html> |