| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | 2 | "http://www.w3.org/TR/html4/strict.dtd"> | 
|  | 3 | <html> | 
|  | 4 | <head> | 
|  | 5 | <title>Accurate Garbage Collection with LLVM</title> | 
|  | 6 | <link rel="stylesheet" href="llvm.css" type="text/css"> | 
|  | 7 | </head> | 
|  | 8 | <body> | 
|  | 9 |  | 
|  | 10 | <div class="doc_title"> | 
|  | 11 | Accurate Garbage Collection with LLVM | 
|  | 12 | </div> | 
|  | 13 |  | 
|  | 14 | <ol> | 
|  | 15 | <li><a href="#introduction">Introduction</a> | 
|  | 16 | <ul> | 
|  | 17 | <li><a href="#feature">GC features provided and algorithms supported</a></li> | 
|  | 18 | </ul> | 
|  | 19 | </li> | 
|  | 20 |  | 
|  | 21 | <li><a href="#interfaces">Interfaces for user programs</a> | 
|  | 22 | <ul> | 
|  | 23 | <li><a href="#roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a></li> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 24 | <li><a href="#allocate">Allocating memory from the GC</a></li> | 
|  | 25 | <li><a href="#barriers">Reading and writing references to the heap</a></li> | 
|  | 26 | <li><a href="#explicit">Explicit invocation of the garbage collector</a></li> | 
|  | 27 | </ul> | 
|  | 28 | </li> | 
|  | 29 |  | 
|  | 30 | <li><a href="#gcimpl">Implementing a garbage collector</a> | 
|  | 31 | <ul> | 
|  | 32 | <li><a href="#llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a></li> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 33 | <li><a href="#callbacks">Callback functions used to implement the garbage collector</a></li> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 34 | </ul> | 
|  | 35 | </li> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 36 | <li><a href="#gcimpls">GC implementations available</a> | 
|  | 37 | <ul> | 
|  | 38 | <li><a href="#semispace">SemiSpace - A simple copying garbage collector</a></li> | 
| Chris Lattner | 0b02dbc | 2004-07-09 05:03:54 +0000 | [diff] [blame] | 39 | </ul> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 40 | </li> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 41 |  | 
|  | 42 | <!-- | 
|  | 43 | <li><a href="#codegen">Implementing GC support in a code generator</a></li> | 
|  | 44 | --> | 
|  | 45 | </ol> | 
|  | 46 |  | 
|  | 47 | <div class="doc_author"> | 
|  | 48 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> | 
|  | 49 | </div> | 
|  | 50 |  | 
|  | 51 | <!-- *********************************************************************** --> | 
|  | 52 | <div class="doc_section"> | 
|  | 53 | <a name="introduction">Introduction</a> | 
|  | 54 | </div> | 
|  | 55 | <!-- *********************************************************************** --> | 
|  | 56 |  | 
|  | 57 | <div class="doc_text"> | 
|  | 58 |  | 
|  | 59 | <p>Garbage collection is a widely used technique that frees the programmer from | 
|  | 60 | having to know the life-times of heap objects, making software easier to produce | 
|  | 61 | and maintain.  Many programming languages rely on garbage collection for | 
|  | 62 | automatic memory management.  There are two primary forms of garbage collection: | 
|  | 63 | conservative and accurate.</p> | 
|  | 64 |  | 
|  | 65 | <p>Conservative garbage collection often does not require any special support | 
|  | 66 | from either the language or the compiler: it can handle non-type-safe | 
|  | 67 | programming languages (such as C/C++) and does not require any special | 
|  | 68 | information from the compiler.  The [LINK] Boehm collector is an example of a | 
|  | 69 | state-of-the-art conservative collector.</p> | 
|  | 70 |  | 
|  | 71 | <p>Accurate garbage collection requires the ability to identify all pointers in | 
|  | 72 | the program at run-time (which requires that the source-language be type-safe in | 
|  | 73 | most cases).  Identifying pointers at run-time requires compiler support to | 
|  | 74 | locate all places that hold live pointer variables at run-time, including the | 
|  | 75 | <a href="#roots">processor stack and registers</a>.</p> | 
|  | 76 |  | 
|  | 77 | <p> | 
|  | 78 | Conservative garbage collection is attractive because it does not require any | 
|  | 79 | special compiler support, but it does have problems.  In particular, because the | 
|  | 80 | conservative garbage collector cannot <i>know</i> that a particular word in the | 
|  | 81 | machine is a pointer, it cannot move live objects in the heap (preventing the | 
|  | 82 | use of compacting and generational GC algorithms) and it can occasionally suffer | 
|  | 83 | from memory leaks due to integer values that happen to point to objects in the | 
|  | 84 | program.  In addition, some aggressive compiler transformations can break | 
|  | 85 | conservative garbage collectors (though these seem rare in practice). | 
|  | 86 | </p> | 
|  | 87 |  | 
|  | 88 | <p> | 
|  | 89 | Accurate garbage collectors do not suffer from any of these problems, but they | 
|  | 90 | can suffer from degraded scalar optimization of the program.  In particular, | 
|  | 91 | because the runtime must be able to identify and update all pointers active in | 
|  | 92 | the program, some optimizations are less effective.  In practice, however, the | 
|  | 93 | locality and performance benefits of using aggressive garbage allocation | 
|  | 94 | techniques dominates any low-level losses. | 
|  | 95 | </p> | 
|  | 96 |  | 
|  | 97 | <p> | 
|  | 98 | This document describes the mechanisms and interfaces provided by LLVM to | 
|  | 99 | support accurate garbage collection. | 
|  | 100 | </p> | 
|  | 101 |  | 
|  | 102 | </div> | 
|  | 103 |  | 
|  | 104 | <!-- ======================================================================= --> | 
|  | 105 | <div class="doc_subsection"> | 
|  | 106 | <a name="feature">GC features provided and algorithms supported</a> | 
|  | 107 | </div> | 
|  | 108 |  | 
|  | 109 | <div class="doc_text"> | 
|  | 110 |  | 
|  | 111 | <p> | 
|  | 112 | LLVM provides support for a broad class of garbage collection algorithms, | 
|  | 113 | including compacting semi-space collectors, mark-sweep collectors, generational | 
|  | 114 | collectors, and even reference counting implementations.  It includes support | 
|  | 115 | for <a href="#barriers">read and write barriers</a>, and associating <a | 
|  | 116 | href="#roots">meta-data with stack objects</a> (used for tagless garbage | 
|  | 117 | collection).  All LLVM code generators support garbage collection, including the | 
|  | 118 | C backend. | 
|  | 119 | </p> | 
|  | 120 |  | 
|  | 121 | <p> | 
|  | 122 | We hope that the primitive support built into LLVM is sufficient to support a | 
|  | 123 | broad class of garbage collected languages, including Scheme, ML, scripting | 
|  | 124 | languages, Java, C#, etc.  That said, the implemented garbage collectors may | 
|  | 125 | need to be extended to support language-specific features such as finalization, | 
|  | 126 | weak references, or other features.  As these needs are identified and | 
|  | 127 | implemented, they should be added to this specification. | 
|  | 128 | </p> | 
|  | 129 |  | 
|  | 130 | <p> | 
|  | 131 | LLVM does not currently support garbage collection of multi-threaded programs or | 
|  | 132 | GC-safe points other than function calls, but these will be added in the future | 
|  | 133 | as there is interest. | 
|  | 134 | </p> | 
|  | 135 |  | 
|  | 136 | </div> | 
|  | 137 |  | 
|  | 138 | <!-- *********************************************************************** --> | 
|  | 139 | <div class="doc_section"> | 
|  | 140 | <a name="interfaces">Interfaces for user programs</a> | 
|  | 141 | </div> | 
|  | 142 | <!-- *********************************************************************** --> | 
|  | 143 |  | 
|  | 144 | <div class="doc_text"> | 
|  | 145 |  | 
|  | 146 | <p>This section describes the interfaces provided by LLVM and by the garbage | 
|  | 147 | collector run-time that should be used by user programs.  As such, this is the | 
|  | 148 | interface that front-end authors should generate code for. | 
|  | 149 | </p> | 
|  | 150 |  | 
|  | 151 | </div> | 
|  | 152 |  | 
|  | 153 | <!-- ======================================================================= --> | 
|  | 154 | <div class="doc_subsection"> | 
|  | 155 | <a name="roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a> | 
|  | 156 | </div> | 
|  | 157 |  | 
|  | 158 | <div class="doc_text"> | 
|  | 159 |  | 
|  | 160 | <div class="doc_code"><tt> | 
|  | 161 | void %llvm.gcroot(<ty>** %ptrloc, <ty2>* %metadata) | 
|  | 162 | </tt></div> | 
|  | 163 |  | 
|  | 164 | <p> | 
|  | 165 | The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM of a pointer variable | 
|  | 166 | on the stack.  The first argument contains the address of the variable on the | 
|  | 167 | stack, and the second contains a pointer to metadata that should be associated | 
|  | 168 | with the pointer (which <b>must</b> be a constant or global value address).  At | 
|  | 169 | runtime, the <tt>llvm.gcroot</tt> intrinsic stores a null pointer into the | 
|  | 170 | specified location to initialize the pointer.</p> | 
|  | 171 |  | 
|  | 172 | <p> | 
|  | 173 | Consider the following fragment of Java code: | 
|  | 174 | </p> | 
|  | 175 |  | 
|  | 176 | <pre> | 
|  | 177 | { | 
|  | 178 | Object X;   // A null-initialized reference to an object | 
|  | 179 | ... | 
|  | 180 | } | 
|  | 181 | </pre> | 
|  | 182 |  | 
|  | 183 | <p> | 
|  | 184 | This block (which may be located in the middle of a function or in a loop nest), | 
|  | 185 | could be compiled to this LLVM code: | 
|  | 186 | </p> | 
|  | 187 |  | 
|  | 188 | <pre> | 
|  | 189 | Entry: | 
|  | 190 | ;; In the entry block for the function, allocate the | 
|  | 191 | ;; stack space for X, which is an LLVM pointer. | 
|  | 192 | %X = alloca %Object* | 
|  | 193 | ... | 
|  | 194 |  | 
|  | 195 | ;; "CodeBlock" is the block corresponding to the start | 
| Reid Spencer | 03d186a | 2004-05-25 08:45:31 +0000 | [diff] [blame] | 196 | ;;  of the scope above. | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 197 | CodeBlock: | 
|  | 198 | ;; Initialize the object, telling LLVM that it is now live. | 
|  | 199 | ;; Java has type-tags on objects, so it doesn't need any | 
|  | 200 | ;; metadata. | 
|  | 201 | call void %llvm.gcroot(%Object** %X, sbyte* null) | 
|  | 202 | ... | 
|  | 203 |  | 
|  | 204 | ;; As the pointer goes out of scope, store a null value into | 
|  | 205 | ;; it, to indicate that the value is no longer live. | 
|  | 206 | store %Object* null, %Object** %X | 
|  | 207 | ... | 
|  | 208 | </pre> | 
|  | 209 |  | 
|  | 210 | </div> | 
|  | 211 |  | 
|  | 212 | <!-- ======================================================================= --> | 
|  | 213 | <div class="doc_subsection"> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 214 | <a name="allocate">Allocating memory from the GC</a> | 
|  | 215 | </div> | 
|  | 216 |  | 
|  | 217 | <div class="doc_text"> | 
|  | 218 |  | 
|  | 219 | <div class="doc_code"><tt> | 
|  | 220 | sbyte *%llvm_gc_allocate(unsigned %Size) | 
|  | 221 | </tt></div> | 
|  | 222 |  | 
|  | 223 | <p>The <tt>llvm_gc_allocate</tt> function is a global function defined by the | 
| Chris Lattner | aab3aff | 2004-06-09 03:59:05 +0000 | [diff] [blame] | 224 | garbage collector implementation to allocate memory.  It returns a | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 225 | zeroed-out block of memory of the appropriate size.</p> | 
|  | 226 |  | 
|  | 227 | </div> | 
|  | 228 |  | 
|  | 229 | <!-- ======================================================================= --> | 
|  | 230 | <div class="doc_subsection"> | 
|  | 231 | <a name="barriers">Reading and writing references to the heap</a> | 
|  | 232 | </div> | 
|  | 233 |  | 
|  | 234 | <div class="doc_text"> | 
|  | 235 |  | 
|  | 236 | <div class="doc_code"><tt> | 
|  | 237 | sbyte *%llvm.gcread(sbyte **)<br> | 
|  | 238 | void %llvm.gcwrite(sbyte*, sbyte**) | 
|  | 239 | </tt></div> | 
|  | 240 |  | 
|  | 241 | <p>Several of the more interesting garbage collectors (e.g., generational | 
|  | 242 | collectors) need to be informed when the mutator (the program that needs garbage | 
|  | 243 | collection) reads or writes object references into the heap.  In the case of a | 
|  | 244 | generational collector, it needs to keep track of which "old" generation objects | 
|  | 245 | have references stored into them.  The amount of code that typically needs to be | 
| Chris Lattner | aab3aff | 2004-06-09 03:59:05 +0000 | [diff] [blame] | 246 | executed is usually quite small (and not on the critical path of any | 
|  | 247 | computation), so the overall performance impact of the inserted code is | 
|  | 248 | tolerable.</p> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 249 |  | 
|  | 250 | <p>To support garbage collectors that use read or write barriers, LLVM provides | 
|  | 251 | the <tt>llvm.gcread</tt> and <tt>llvm.gcwrite</tt> intrinsics.  The first | 
|  | 252 | intrinsic has exactly the same semantics as a non-volatile LLVM load and the | 
|  | 253 | second has the same semantics as a non-volatile LLVM store.  At code generation | 
|  | 254 | time, these intrinsics are replaced with calls into the garbage collector | 
|  | 255 | (<tt><a href="#llvm_gc_readwrite">llvm_gc_read</a></tt> and <tt><a | 
|  | 256 | href="#llvm_gc_readwrite">llvm_gc_write</a></tt> respectively), which are then | 
|  | 257 | inlined into the code. | 
|  | 258 | </p> | 
|  | 259 |  | 
|  | 260 | <p> | 
|  | 261 | If you are writing a front-end for a garbage collected language, every load or | 
|  | 262 | store of a reference from or to the heap should use these intrinsics instead of | 
|  | 263 | normal LLVM loads/stores.</p> | 
|  | 264 |  | 
|  | 265 | </div> | 
|  | 266 |  | 
|  | 267 | <!-- ======================================================================= --> | 
|  | 268 | <div class="doc_subsection"> | 
|  | 269 | <a name="initialize">Garbage collector startup and initialization</a> | 
|  | 270 | </div> | 
|  | 271 |  | 
|  | 272 | <div class="doc_text"> | 
|  | 273 |  | 
|  | 274 | <div class="doc_code"><tt> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 275 | void %llvm_gc_initialize(unsigned %InitialHeapSize) | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 276 | </tt></div> | 
|  | 277 |  | 
|  | 278 | <p> | 
|  | 279 | The <tt>llvm_gc_initialize</tt> function should be called once before any other | 
|  | 280 | garbage collection functions are called.  This gives the garbage collector the | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 281 | chance to initialize itself and allocate the heap spaces.  The initial heap size | 
|  | 282 | to allocate should be specified as an argument. | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 283 | </p> | 
|  | 284 |  | 
|  | 285 | </div> | 
|  | 286 |  | 
|  | 287 | <!-- ======================================================================= --> | 
|  | 288 | <div class="doc_subsection"> | 
|  | 289 | <a name="explicit">Explicit invocation of the garbage collector</a> | 
|  | 290 | </div> | 
|  | 291 |  | 
|  | 292 | <div class="doc_text"> | 
|  | 293 |  | 
|  | 294 | <div class="doc_code"><tt> | 
|  | 295 | void %llvm_gc_collect() | 
|  | 296 | </tt></div> | 
|  | 297 |  | 
|  | 298 | <p> | 
|  | 299 | The <tt>llvm_gc_collect</tt> function is exported by the garbage collector | 
|  | 300 | implementations to provide a full collection, even when the heap is not | 
|  | 301 | exhausted.  This can be used by end-user code as a hint, and may be ignored by | 
|  | 302 | the garbage collector. | 
|  | 303 | </p> | 
|  | 304 |  | 
|  | 305 | </div> | 
|  | 306 |  | 
|  | 307 |  | 
|  | 308 | <!-- *********************************************************************** --> | 
|  | 309 | <div class="doc_section"> | 
|  | 310 | <a name="gcimpl">Implementing a garbage collector</a> | 
|  | 311 | </div> | 
|  | 312 | <!-- *********************************************************************** --> | 
|  | 313 |  | 
|  | 314 | <div class="doc_text"> | 
|  | 315 |  | 
|  | 316 | <p> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 317 | Implementing a garbage collector for LLVM is fairly straight-forward.  The LLVM | 
|  | 318 | garbage collectors are provided in a form that makes them easy to link into the | 
|  | 319 | language-specific runtime that a language front-end would use.  They require | 
|  | 320 | functionality from the language-specific runtime to get information about <a | 
|  | 321 | href="#gcdescriptors">where pointers are located in heap objects</a>. | 
|  | 322 | </p> | 
|  | 323 |  | 
|  | 324 | <p>The | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 325 | implementation must include the <a | 
|  | 326 | href="#allocate"><tt>llvm_gc_allocate</tt></a> and <a | 
|  | 327 | href="#explicit"><tt>llvm_gc_collect</tt></a> functions, and it must implement | 
|  | 328 | the <a href="#llvm_gc_readwrite">read/write barrier</a> functions as well.  To | 
|  | 329 | do this, it will probably have to <a href="#traceroots">trace through the roots | 
|  | 330 | from the stack</a> and understand the <a href="#gcdescriptors">GC descriptors | 
|  | 331 | for heap objects</a>.  Luckily, there are some <a href="#gcimpls">example | 
|  | 332 | implementations</a> available. | 
|  | 333 | </p> | 
|  | 334 | </div> | 
|  | 335 |  | 
|  | 336 |  | 
|  | 337 | <!-- ======================================================================= --> | 
|  | 338 | <div class="doc_subsection"> | 
|  | 339 | <a name="llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a> | 
|  | 340 | </div> | 
|  | 341 |  | 
|  | 342 | <div class="doc_text"> | 
|  | 343 | <div class="doc_code"><tt> | 
|  | 344 | void *llvm_gc_read(void **)<br> | 
|  | 345 | void llvm_gc_write(void*, void**) | 
|  | 346 | </tt></div> | 
|  | 347 |  | 
|  | 348 | <p> | 
|  | 349 | These functions <i>must</i> be implemented in every garbage collector, even if | 
|  | 350 | they do not need read/write barriers.  In this case, just load or store the | 
|  | 351 | pointer, then return. | 
|  | 352 | </p> | 
|  | 353 |  | 
|  | 354 | <p> | 
|  | 355 | If an actual read or write barrier is needed, it should be straight-forward to | 
|  | 356 | implement it.  Note that we may add a pointer to the start of the memory object | 
|  | 357 | as a parameter in the future, if needed. | 
|  | 358 | </p> | 
|  | 359 |  | 
|  | 360 | </div> | 
|  | 361 |  | 
|  | 362 | <!-- ======================================================================= --> | 
|  | 363 | <div class="doc_subsection"> | 
| Chris Lattner | 0b02dbc | 2004-07-09 05:03:54 +0000 | [diff] [blame] | 364 | <a name="callbacks">Callback functions used to implement the garbage collector</a> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 365 | </div> | 
|  | 366 |  | 
| Chris Lattner | 0b02dbc | 2004-07-09 05:03:54 +0000 | [diff] [blame] | 367 | <div class="doc_text"> | 
|  | 368 | <p> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 369 | Garbage collector implementations make use of call-back functions that are | 
|  | 370 | implemented by other parts of the LLVM system. | 
| Chris Lattner | 0b02dbc | 2004-07-09 05:03:54 +0000 | [diff] [blame] | 371 | </p> | 
|  | 372 | </div> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 373 |  | 
|  | 374 | <!--_________________________________________________________________________--> | 
|  | 375 | <div class="doc_subsubsection"> | 
|  | 376 | <a name="traceroots">Tracing GC pointers from the program stack</a> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 377 | </div> | 
|  | 378 |  | 
|  | 379 | <div class="doc_text"> | 
|  | 380 | <div class="doc_code"><tt> | 
|  | 381 | void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta)); | 
|  | 382 | </tt></div> | 
|  | 383 |  | 
|  | 384 | <p> | 
|  | 385 | The <tt>llvm_cg_walk_gcroots</tt> function is a function provided by the code | 
|  | 386 | generator that iterates through all of the GC roots on the stack, calling the | 
|  | 387 | specified function pointer with each record.  For each GC root, the address of | 
|  | 388 | the pointer and the meta-data (from the <a | 
|  | 389 | href="#gcroot"><tt>llvm.gcroot</tt></a> intrinsic) are provided. | 
|  | 390 | </p> | 
|  | 391 | </div> | 
|  | 392 |  | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 393 | <!--_________________________________________________________________________--> | 
|  | 394 | <div class="doc_subsubsection"> | 
|  | 395 | <a name="staticroots">Tracing GC pointers from static roots</a> | 
|  | 396 | </div> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 397 |  | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 398 | <div class="doc_text"> | 
|  | 399 | TODO | 
|  | 400 | </div> | 
|  | 401 |  | 
|  | 402 |  | 
|  | 403 | <!--_________________________________________________________________________--> | 
|  | 404 | <div class="doc_subsubsection"> | 
|  | 405 | <a name="gcdescriptors">Tracing GC pointers from heap objects</a> | 
|  | 406 | </div> | 
|  | 407 |  | 
|  | 408 | <div class="doc_text"> | 
|  | 409 | <p> | 
|  | 410 | The three most common ways to keep track of where pointers live in heap objects | 
|  | 411 | are (listed in order of space overhead required):</p> | 
|  | 412 |  | 
|  | 413 | <ol> | 
|  | 414 | <li>In languages with polymorphic objects, pointers from an object header are | 
|  | 415 | usually used to identify the GC pointers in the heap object.  This is common for | 
|  | 416 | object-oriented languages like Self, Smalltalk, Java, or C#.</li> | 
|  | 417 |  | 
|  | 418 | <li>If heap objects are not polymorphic, often the "shape" of the heap can be | 
|  | 419 | determined from the roots of the heap or from some other meta-data [<a | 
|  | 420 | href="#appel89">Appel89</a>, <a href="#goldberg91">Goldberg91</a>, <a | 
|  | 421 | href="#tolmach94">Tolmach94</a>].  In this case, the garbage collector can | 
|  | 422 | propagate the information around from meta data stored with the roots.  This | 
|  | 423 | often eliminates the need to have a header on objects in the heap.  This is | 
|  | 424 | common in the ML family.</li> | 
|  | 425 |  | 
|  | 426 | <li>If all heap objects have pointers in the same locations, or pointers can be | 
|  | 427 | distinguished just by looking at them (e.g., the low order bit is clear), no | 
|  | 428 | book-keeping is needed at all.  This is common for Lisp-like languages.</li> | 
|  | 429 | </ol> | 
|  | 430 |  | 
|  | 431 | <p>The LLVM garbage collectors are capable of supporting all of these styles of | 
|  | 432 | language, including ones that mix various implementations.  To do this, it | 
|  | 433 | allows the source-language to associate meta-data with the <a | 
|  | 434 | href="#roots">stack roots</a>, and the heap tracing routines can propagate the | 
|  | 435 | information.  In addition, LLVM allows the front-end to extract GC information | 
|  | 436 | from in any form from a specific object pointer (this supports situations #1 and | 
|  | 437 | #3). | 
|  | 438 | </p> | 
|  | 439 |  | 
|  | 440 | <p><b>Making this efficient</b></p> | 
|  | 441 |  | 
|  | 442 |  | 
|  | 443 |  | 
|  | 444 | </div> | 
|  | 445 |  | 
|  | 446 |  | 
|  | 447 |  | 
|  | 448 | <!-- *********************************************************************** --> | 
|  | 449 | <div class="doc_section"> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 450 | <a name="gcimpls">GC implementations available</a> | 
|  | 451 | </div> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 452 | <!-- *********************************************************************** --> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 453 |  | 
|  | 454 | <div class="doc_text"> | 
|  | 455 |  | 
|  | 456 | <p> | 
|  | 457 | To make this more concrete, the currently implemented LLVM garbage collectors | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 458 | all live in the <tt>llvm/runtime/GC/*</tt> directories in the LLVM source-base. | 
|  | 459 | If you are interested in implementing an algorithm, there are many interesting | 
|  | 460 | possibilities (mark/sweep, a generational collector, a reference counting | 
|  | 461 | collector, etc), or you could choose to improve one of the existing algorithms. | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 462 | </p> | 
|  | 463 |  | 
|  | 464 | </div> | 
|  | 465 |  | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 466 | <!-- ======================================================================= --> | 
|  | 467 | <div class="doc_subsection"> | 
| Chris Lattner | 0b02dbc | 2004-07-09 05:03:54 +0000 | [diff] [blame] | 468 | <a name="semispace">SemiSpace - A simple copying garbage collector</a> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame] | 469 | </div> | 
|  | 470 |  | 
|  | 471 | <div class="doc_text"> | 
|  | 472 | <p> | 
|  | 473 | SemiSpace is a very simple copying collector.  When it starts up, it allocates | 
|  | 474 | two blocks of memory for the heap.  It uses a simple bump-pointer allocator to | 
|  | 475 | allocate memory from the first block until it runs out of space.  When it runs | 
|  | 476 | out of space, it traces through all of the roots of the program, copying blocks | 
|  | 477 | to the other half of the memory space. | 
|  | 478 | </p> | 
|  | 479 |  | 
|  | 480 | </div> | 
|  | 481 |  | 
|  | 482 | <!--_________________________________________________________________________--> | 
|  | 483 | <div class="doc_subsubsection"> | 
|  | 484 | Possible Improvements | 
|  | 485 | </div> | 
|  | 486 |  | 
|  | 487 | <div class="doc_text"> | 
|  | 488 |  | 
|  | 489 | <p> | 
|  | 490 | If a collection cycle happens and the heap is not compacted very much (say less | 
|  | 491 | than 25% of the allocated memory was freed), the memory regions should be | 
|  | 492 | doubled in size.</p> | 
|  | 493 |  | 
|  | 494 | </div> | 
|  | 495 |  | 
|  | 496 | <!-- *********************************************************************** --> | 
|  | 497 | <div class="doc_section"> | 
|  | 498 | <a name="references">References</a> | 
|  | 499 | </div> | 
|  | 500 | <!-- *********************************************************************** --> | 
|  | 501 |  | 
|  | 502 | <div class="doc_text"> | 
|  | 503 |  | 
|  | 504 | <p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew | 
|  | 505 | W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p> | 
|  | 506 |  | 
|  | 507 | <p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for | 
|  | 508 | strongly typed programming languages.  Benjamin Goldberg. ACM SIGPLAN | 
|  | 509 | PLDI'91.</p> | 
|  | 510 |  | 
|  | 511 | <p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using | 
|  | 512 | explicit type parameters.  Andrew Tolmach.  Proceedings of the 1994 ACM | 
|  | 513 | conference on LISP and functional programming.</p> | 
|  | 514 |  | 
|  | 515 | </div> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 516 |  | 
|  | 517 | <!-- *********************************************************************** --> | 
|  | 518 |  | 
|  | 519 | <hr> | 
|  | 520 | <address> | 
|  | 521 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | 522 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
|  | 523 | <a href="http://validator.w3.org/check/referer"><img | 
|  | 524 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
|  | 525 |  | 
|  | 526 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
|  | 527 | <a href="http://llvm.cs.uiuc.edu">LLVM Compiler Infrastructure</a><br> | 
|  | 528 | Last modified: $Date$ | 
|  | 529 | </address> | 
|  | 530 |  | 
|  | 531 | </body> | 
|  | 532 | </html> |