| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | 2 | "http://www.w3.org/TR/html4/strict.dtd"> | 
|  | 3 | <html> | 
|  | 4 | <head> | 
|  | 5 | <title>Accurate Garbage Collection with LLVM</title> | 
|  | 6 | <link rel="stylesheet" href="llvm.css" type="text/css"> | 
|  | 7 | </head> | 
|  | 8 | <body> | 
|  | 9 |  | 
|  | 10 | <div class="doc_title"> | 
|  | 11 | Accurate Garbage Collection with LLVM | 
|  | 12 | </div> | 
|  | 13 |  | 
|  | 14 | <ol> | 
|  | 15 | <li><a href="#introduction">Introduction</a> | 
|  | 16 | <ul> | 
|  | 17 | <li><a href="#feature">GC features provided and algorithms supported</a></li> | 
|  | 18 | </ul> | 
|  | 19 | </li> | 
|  | 20 |  | 
|  | 21 | <li><a href="#interfaces">Interfaces for user programs</a> | 
|  | 22 | <ul> | 
|  | 23 | <li><a href="#roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a></li> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 24 | <li><a href="#allocate">Allocating memory from the GC</a></li> | 
|  | 25 | <li><a href="#barriers">Reading and writing references to the heap</a></li> | 
|  | 26 | <li><a href="#explicit">Explicit invocation of the garbage collector</a></li> | 
|  | 27 | </ul> | 
|  | 28 | </li> | 
|  | 29 |  | 
|  | 30 | <li><a href="#gcimpl">Implementing a garbage collector</a> | 
|  | 31 | <ul> | 
|  | 32 | <li><a href="#llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a></li> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 33 | <li><a href="#callbacks">Callback functions used to implement the garbage collector</a></li> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 34 | </ul> | 
|  | 35 | </li> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 36 | <li><a href="#gcimpls">GC implementations available</a> | 
|  | 37 | <ul> | 
|  | 38 | <li><a href="#semispace">SemiSpace - A simple copying garbage collector</a></li> | 
|  | 39 | </li> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 40 |  | 
|  | 41 | <!-- | 
|  | 42 | <li><a href="#codegen">Implementing GC support in a code generator</a></li> | 
|  | 43 | --> | 
|  | 44 | </ol> | 
|  | 45 |  | 
|  | 46 | <div class="doc_author"> | 
|  | 47 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> | 
|  | 48 | </div> | 
|  | 49 |  | 
|  | 50 | <!-- *********************************************************************** --> | 
|  | 51 | <div class="doc_section"> | 
|  | 52 | <a name="introduction">Introduction</a> | 
|  | 53 | </div> | 
|  | 54 | <!-- *********************************************************************** --> | 
|  | 55 |  | 
|  | 56 | <div class="doc_text"> | 
|  | 57 |  | 
|  | 58 | <p>Garbage collection is a widely used technique that frees the programmer from | 
|  | 59 | having to know the life-times of heap objects, making software easier to produce | 
|  | 60 | and maintain.  Many programming languages rely on garbage collection for | 
|  | 61 | automatic memory management.  There are two primary forms of garbage collection: | 
|  | 62 | conservative and accurate.</p> | 
|  | 63 |  | 
|  | 64 | <p>Conservative garbage collection often does not require any special support | 
|  | 65 | from either the language or the compiler: it can handle non-type-safe | 
|  | 66 | programming languages (such as C/C++) and does not require any special | 
|  | 67 | information from the compiler.  The [LINK] Boehm collector is an example of a | 
|  | 68 | state-of-the-art conservative collector.</p> | 
|  | 69 |  | 
|  | 70 | <p>Accurate garbage collection requires the ability to identify all pointers in | 
|  | 71 | the program at run-time (which requires that the source-language be type-safe in | 
|  | 72 | most cases).  Identifying pointers at run-time requires compiler support to | 
|  | 73 | locate all places that hold live pointer variables at run-time, including the | 
|  | 74 | <a href="#roots">processor stack and registers</a>.</p> | 
|  | 75 |  | 
|  | 76 | <p> | 
|  | 77 | Conservative garbage collection is attractive because it does not require any | 
|  | 78 | special compiler support, but it does have problems.  In particular, because the | 
|  | 79 | conservative garbage collector cannot <i>know</i> that a particular word in the | 
|  | 80 | machine is a pointer, it cannot move live objects in the heap (preventing the | 
|  | 81 | use of compacting and generational GC algorithms) and it can occasionally suffer | 
|  | 82 | from memory leaks due to integer values that happen to point to objects in the | 
|  | 83 | program.  In addition, some aggressive compiler transformations can break | 
|  | 84 | conservative garbage collectors (though these seem rare in practice). | 
|  | 85 | </p> | 
|  | 86 |  | 
|  | 87 | <p> | 
|  | 88 | Accurate garbage collectors do not suffer from any of these problems, but they | 
|  | 89 | can suffer from degraded scalar optimization of the program.  In particular, | 
|  | 90 | because the runtime must be able to identify and update all pointers active in | 
|  | 91 | the program, some optimizations are less effective.  In practice, however, the | 
|  | 92 | locality and performance benefits of using aggressive garbage allocation | 
|  | 93 | techniques dominates any low-level losses. | 
|  | 94 | </p> | 
|  | 95 |  | 
|  | 96 | <p> | 
|  | 97 | This document describes the mechanisms and interfaces provided by LLVM to | 
|  | 98 | support accurate garbage collection. | 
|  | 99 | </p> | 
|  | 100 |  | 
|  | 101 | </div> | 
|  | 102 |  | 
|  | 103 | <!-- ======================================================================= --> | 
|  | 104 | <div class="doc_subsection"> | 
|  | 105 | <a name="feature">GC features provided and algorithms supported</a> | 
|  | 106 | </div> | 
|  | 107 |  | 
|  | 108 | <div class="doc_text"> | 
|  | 109 |  | 
|  | 110 | <p> | 
|  | 111 | LLVM provides support for a broad class of garbage collection algorithms, | 
|  | 112 | including compacting semi-space collectors, mark-sweep collectors, generational | 
|  | 113 | collectors, and even reference counting implementations.  It includes support | 
|  | 114 | for <a href="#barriers">read and write barriers</a>, and associating <a | 
|  | 115 | href="#roots">meta-data with stack objects</a> (used for tagless garbage | 
|  | 116 | collection).  All LLVM code generators support garbage collection, including the | 
|  | 117 | C backend. | 
|  | 118 | </p> | 
|  | 119 |  | 
|  | 120 | <p> | 
|  | 121 | We hope that the primitive support built into LLVM is sufficient to support a | 
|  | 122 | broad class of garbage collected languages, including Scheme, ML, scripting | 
|  | 123 | languages, Java, C#, etc.  That said, the implemented garbage collectors may | 
|  | 124 | need to be extended to support language-specific features such as finalization, | 
|  | 125 | weak references, or other features.  As these needs are identified and | 
|  | 126 | implemented, they should be added to this specification. | 
|  | 127 | </p> | 
|  | 128 |  | 
|  | 129 | <p> | 
|  | 130 | LLVM does not currently support garbage collection of multi-threaded programs or | 
|  | 131 | GC-safe points other than function calls, but these will be added in the future | 
|  | 132 | as there is interest. | 
|  | 133 | </p> | 
|  | 134 |  | 
|  | 135 | </div> | 
|  | 136 |  | 
|  | 137 | <!-- *********************************************************************** --> | 
|  | 138 | <div class="doc_section"> | 
|  | 139 | <a name="interfaces">Interfaces for user programs</a> | 
|  | 140 | </div> | 
|  | 141 | <!-- *********************************************************************** --> | 
|  | 142 |  | 
|  | 143 | <div class="doc_text"> | 
|  | 144 |  | 
|  | 145 | <p>This section describes the interfaces provided by LLVM and by the garbage | 
|  | 146 | collector run-time that should be used by user programs.  As such, this is the | 
|  | 147 | interface that front-end authors should generate code for. | 
|  | 148 | </p> | 
|  | 149 |  | 
|  | 150 | </div> | 
|  | 151 |  | 
|  | 152 | <!-- ======================================================================= --> | 
|  | 153 | <div class="doc_subsection"> | 
|  | 154 | <a name="roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a> | 
|  | 155 | </div> | 
|  | 156 |  | 
|  | 157 | <div class="doc_text"> | 
|  | 158 |  | 
|  | 159 | <div class="doc_code"><tt> | 
|  | 160 | void %llvm.gcroot(<ty>** %ptrloc, <ty2>* %metadata) | 
|  | 161 | </tt></div> | 
|  | 162 |  | 
|  | 163 | <p> | 
|  | 164 | The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM of a pointer variable | 
|  | 165 | on the stack.  The first argument contains the address of the variable on the | 
|  | 166 | stack, and the second contains a pointer to metadata that should be associated | 
|  | 167 | with the pointer (which <b>must</b> be a constant or global value address).  At | 
|  | 168 | runtime, the <tt>llvm.gcroot</tt> intrinsic stores a null pointer into the | 
|  | 169 | specified location to initialize the pointer.</p> | 
|  | 170 |  | 
|  | 171 | <p> | 
|  | 172 | Consider the following fragment of Java code: | 
|  | 173 | </p> | 
|  | 174 |  | 
|  | 175 | <pre> | 
|  | 176 | { | 
|  | 177 | Object X;   // A null-initialized reference to an object | 
|  | 178 | ... | 
|  | 179 | } | 
|  | 180 | </pre> | 
|  | 181 |  | 
|  | 182 | <p> | 
|  | 183 | This block (which may be located in the middle of a function or in a loop nest), | 
|  | 184 | could be compiled to this LLVM code: | 
|  | 185 | </p> | 
|  | 186 |  | 
|  | 187 | <pre> | 
|  | 188 | Entry: | 
|  | 189 | ;; In the entry block for the function, allocate the | 
|  | 190 | ;; stack space for X, which is an LLVM pointer. | 
|  | 191 | %X = alloca %Object* | 
|  | 192 | ... | 
|  | 193 |  | 
|  | 194 | ;; "CodeBlock" is the block corresponding to the start | 
| Reid Spencer | 03d186a | 2004-05-25 08:45:31 +0000 | [diff] [blame] | 195 | ;;  of the scope above. | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 196 | CodeBlock: | 
|  | 197 | ;; Initialize the object, telling LLVM that it is now live. | 
|  | 198 | ;; Java has type-tags on objects, so it doesn't need any | 
|  | 199 | ;; metadata. | 
|  | 200 | call void %llvm.gcroot(%Object** %X, sbyte* null) | 
|  | 201 | ... | 
|  | 202 |  | 
|  | 203 | ;; As the pointer goes out of scope, store a null value into | 
|  | 204 | ;; it, to indicate that the value is no longer live. | 
|  | 205 | store %Object* null, %Object** %X | 
|  | 206 | ... | 
|  | 207 | </pre> | 
|  | 208 |  | 
|  | 209 | </div> | 
|  | 210 |  | 
|  | 211 | <!-- ======================================================================= --> | 
|  | 212 | <div class="doc_subsection"> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 213 | <a name="allocate">Allocating memory from the GC</a> | 
|  | 214 | </div> | 
|  | 215 |  | 
|  | 216 | <div class="doc_text"> | 
|  | 217 |  | 
|  | 218 | <div class="doc_code"><tt> | 
|  | 219 | sbyte *%llvm_gc_allocate(unsigned %Size) | 
|  | 220 | </tt></div> | 
|  | 221 |  | 
|  | 222 | <p>The <tt>llvm_gc_allocate</tt> function is a global function defined by the | 
|  | 223 | garbage collector implementation to allocate memory.  It should return a | 
|  | 224 | zeroed-out block of memory of the appropriate size.</p> | 
|  | 225 |  | 
|  | 226 | </div> | 
|  | 227 |  | 
|  | 228 | <!-- ======================================================================= --> | 
|  | 229 | <div class="doc_subsection"> | 
|  | 230 | <a name="barriers">Reading and writing references to the heap</a> | 
|  | 231 | </div> | 
|  | 232 |  | 
|  | 233 | <div class="doc_text"> | 
|  | 234 |  | 
|  | 235 | <div class="doc_code"><tt> | 
|  | 236 | sbyte *%llvm.gcread(sbyte **)<br> | 
|  | 237 | void %llvm.gcwrite(sbyte*, sbyte**) | 
|  | 238 | </tt></div> | 
|  | 239 |  | 
|  | 240 | <p>Several of the more interesting garbage collectors (e.g., generational | 
|  | 241 | collectors) need to be informed when the mutator (the program that needs garbage | 
|  | 242 | collection) reads or writes object references into the heap.  In the case of a | 
|  | 243 | generational collector, it needs to keep track of which "old" generation objects | 
|  | 244 | have references stored into them.  The amount of code that typically needs to be | 
|  | 245 | executed is usually quite small, so the overall performance impact of the | 
|  | 246 | inserted code is tolerable.</p> | 
|  | 247 |  | 
|  | 248 | <p>To support garbage collectors that use read or write barriers, LLVM provides | 
|  | 249 | the <tt>llvm.gcread</tt> and <tt>llvm.gcwrite</tt> intrinsics.  The first | 
|  | 250 | intrinsic has exactly the same semantics as a non-volatile LLVM load and the | 
|  | 251 | second has the same semantics as a non-volatile LLVM store.  At code generation | 
|  | 252 | time, these intrinsics are replaced with calls into the garbage collector | 
|  | 253 | (<tt><a href="#llvm_gc_readwrite">llvm_gc_read</a></tt> and <tt><a | 
|  | 254 | href="#llvm_gc_readwrite">llvm_gc_write</a></tt> respectively), which are then | 
|  | 255 | inlined into the code. | 
|  | 256 | </p> | 
|  | 257 |  | 
|  | 258 | <p> | 
|  | 259 | If you are writing a front-end for a garbage collected language, every load or | 
|  | 260 | store of a reference from or to the heap should use these intrinsics instead of | 
|  | 261 | normal LLVM loads/stores.</p> | 
|  | 262 |  | 
|  | 263 | </div> | 
|  | 264 |  | 
|  | 265 | <!-- ======================================================================= --> | 
|  | 266 | <div class="doc_subsection"> | 
|  | 267 | <a name="initialize">Garbage collector startup and initialization</a> | 
|  | 268 | </div> | 
|  | 269 |  | 
|  | 270 | <div class="doc_text"> | 
|  | 271 |  | 
|  | 272 | <div class="doc_code"><tt> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 273 | void %llvm_gc_initialize(unsigned %InitialHeapSize) | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 274 | </tt></div> | 
|  | 275 |  | 
|  | 276 | <p> | 
|  | 277 | The <tt>llvm_gc_initialize</tt> function should be called once before any other | 
|  | 278 | garbage collection functions are called.  This gives the garbage collector the | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 279 | chance to initialize itself and allocate the heap spaces.  The initial heap size | 
|  | 280 | to allocate should be specified as an argument. | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 281 | </p> | 
|  | 282 |  | 
|  | 283 | </div> | 
|  | 284 |  | 
|  | 285 | <!-- ======================================================================= --> | 
|  | 286 | <div class="doc_subsection"> | 
|  | 287 | <a name="explicit">Explicit invocation of the garbage collector</a> | 
|  | 288 | </div> | 
|  | 289 |  | 
|  | 290 | <div class="doc_text"> | 
|  | 291 |  | 
|  | 292 | <div class="doc_code"><tt> | 
|  | 293 | void %llvm_gc_collect() | 
|  | 294 | </tt></div> | 
|  | 295 |  | 
|  | 296 | <p> | 
|  | 297 | The <tt>llvm_gc_collect</tt> function is exported by the garbage collector | 
|  | 298 | implementations to provide a full collection, even when the heap is not | 
|  | 299 | exhausted.  This can be used by end-user code as a hint, and may be ignored by | 
|  | 300 | the garbage collector. | 
|  | 301 | </p> | 
|  | 302 |  | 
|  | 303 | </div> | 
|  | 304 |  | 
|  | 305 |  | 
|  | 306 | <!-- *********************************************************************** --> | 
|  | 307 | <div class="doc_section"> | 
|  | 308 | <a name="gcimpl">Implementing a garbage collector</a> | 
|  | 309 | </div> | 
|  | 310 | <!-- *********************************************************************** --> | 
|  | 311 |  | 
|  | 312 | <div class="doc_text"> | 
|  | 313 |  | 
|  | 314 | <p> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 315 | Implementing a garbage collector for LLVM is fairly straight-forward.  The LLVM | 
|  | 316 | garbage collectors are provided in a form that makes them easy to link into the | 
|  | 317 | language-specific runtime that a language front-end would use.  They require | 
|  | 318 | functionality from the language-specific runtime to get information about <a | 
|  | 319 | href="#gcdescriptors">where pointers are located in heap objects</a>. | 
|  | 320 | </p> | 
|  | 321 |  | 
|  | 322 | <p>The | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 323 | implementation must include the <a | 
|  | 324 | href="#allocate"><tt>llvm_gc_allocate</tt></a> and <a | 
|  | 325 | href="#explicit"><tt>llvm_gc_collect</tt></a> functions, and it must implement | 
|  | 326 | the <a href="#llvm_gc_readwrite">read/write barrier</a> functions as well.  To | 
|  | 327 | do this, it will probably have to <a href="#traceroots">trace through the roots | 
|  | 328 | from the stack</a> and understand the <a href="#gcdescriptors">GC descriptors | 
|  | 329 | for heap objects</a>.  Luckily, there are some <a href="#gcimpls">example | 
|  | 330 | implementations</a> available. | 
|  | 331 | </p> | 
|  | 332 | </div> | 
|  | 333 |  | 
|  | 334 |  | 
|  | 335 | <!-- ======================================================================= --> | 
|  | 336 | <div class="doc_subsection"> | 
|  | 337 | <a name="llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a> | 
|  | 338 | </div> | 
|  | 339 |  | 
|  | 340 | <div class="doc_text"> | 
|  | 341 | <div class="doc_code"><tt> | 
|  | 342 | void *llvm_gc_read(void **)<br> | 
|  | 343 | void llvm_gc_write(void*, void**) | 
|  | 344 | </tt></div> | 
|  | 345 |  | 
|  | 346 | <p> | 
|  | 347 | These functions <i>must</i> be implemented in every garbage collector, even if | 
|  | 348 | they do not need read/write barriers.  In this case, just load or store the | 
|  | 349 | pointer, then return. | 
|  | 350 | </p> | 
|  | 351 |  | 
|  | 352 | <p> | 
|  | 353 | If an actual read or write barrier is needed, it should be straight-forward to | 
|  | 354 | implement it.  Note that we may add a pointer to the start of the memory object | 
|  | 355 | as a parameter in the future, if needed. | 
|  | 356 | </p> | 
|  | 357 |  | 
|  | 358 | </div> | 
|  | 359 |  | 
|  | 360 | <!-- ======================================================================= --> | 
|  | 361 | <div class="doc_subsection"> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 362 | <a name="callbacks">Callback functions used to implement the garbage collector</a></li> | 
|  | 363 | </div> | 
|  | 364 |  | 
|  | 365 | Garbage collector implementations make use of call-back functions that are | 
|  | 366 | implemented by other parts of the LLVM system. | 
|  | 367 |  | 
|  | 368 | <!--_________________________________________________________________________--> | 
|  | 369 | <div class="doc_subsubsection"> | 
|  | 370 | <a name="traceroots">Tracing GC pointers from the program stack</a> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 371 | </div> | 
|  | 372 |  | 
|  | 373 | <div class="doc_text"> | 
|  | 374 | <div class="doc_code"><tt> | 
|  | 375 | void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta)); | 
|  | 376 | </tt></div> | 
|  | 377 |  | 
|  | 378 | <p> | 
|  | 379 | The <tt>llvm_cg_walk_gcroots</tt> function is a function provided by the code | 
|  | 380 | generator that iterates through all of the GC roots on the stack, calling the | 
|  | 381 | specified function pointer with each record.  For each GC root, the address of | 
|  | 382 | the pointer and the meta-data (from the <a | 
|  | 383 | href="#gcroot"><tt>llvm.gcroot</tt></a> intrinsic) are provided. | 
|  | 384 | </p> | 
|  | 385 | </div> | 
|  | 386 |  | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 387 | <!--_________________________________________________________________________--> | 
|  | 388 | <div class="doc_subsubsection"> | 
|  | 389 | <a name="staticroots">Tracing GC pointers from static roots</a> | 
|  | 390 | </div> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 391 |  | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 392 | <div class="doc_text"> | 
|  | 393 | TODO | 
|  | 394 | </div> | 
|  | 395 |  | 
|  | 396 |  | 
|  | 397 | <!--_________________________________________________________________________--> | 
|  | 398 | <div class="doc_subsubsection"> | 
|  | 399 | <a name="gcdescriptors">Tracing GC pointers from heap objects</a> | 
|  | 400 | </div> | 
|  | 401 |  | 
|  | 402 | <div class="doc_text"> | 
|  | 403 | <p> | 
|  | 404 | The three most common ways to keep track of where pointers live in heap objects | 
|  | 405 | are (listed in order of space overhead required):</p> | 
|  | 406 |  | 
|  | 407 | <ol> | 
|  | 408 | <li>In languages with polymorphic objects, pointers from an object header are | 
|  | 409 | usually used to identify the GC pointers in the heap object.  This is common for | 
|  | 410 | object-oriented languages like Self, Smalltalk, Java, or C#.</li> | 
|  | 411 |  | 
|  | 412 | <li>If heap objects are not polymorphic, often the "shape" of the heap can be | 
|  | 413 | determined from the roots of the heap or from some other meta-data [<a | 
|  | 414 | href="#appel89">Appel89</a>, <a href="#goldberg91">Goldberg91</a>, <a | 
|  | 415 | href="#tolmach94">Tolmach94</a>].  In this case, the garbage collector can | 
|  | 416 | propagate the information around from meta data stored with the roots.  This | 
|  | 417 | often eliminates the need to have a header on objects in the heap.  This is | 
|  | 418 | common in the ML family.</li> | 
|  | 419 |  | 
|  | 420 | <li>If all heap objects have pointers in the same locations, or pointers can be | 
|  | 421 | distinguished just by looking at them (e.g., the low order bit is clear), no | 
|  | 422 | book-keeping is needed at all.  This is common for Lisp-like languages.</li> | 
|  | 423 | </ol> | 
|  | 424 |  | 
|  | 425 | <p>The LLVM garbage collectors are capable of supporting all of these styles of | 
|  | 426 | language, including ones that mix various implementations.  To do this, it | 
|  | 427 | allows the source-language to associate meta-data with the <a | 
|  | 428 | href="#roots">stack roots</a>, and the heap tracing routines can propagate the | 
|  | 429 | information.  In addition, LLVM allows the front-end to extract GC information | 
|  | 430 | from in any form from a specific object pointer (this supports situations #1 and | 
|  | 431 | #3). | 
|  | 432 | </p> | 
|  | 433 |  | 
|  | 434 | <p><b>Making this efficient</b></p> | 
|  | 435 |  | 
|  | 436 |  | 
|  | 437 |  | 
|  | 438 | </div> | 
|  | 439 |  | 
|  | 440 |  | 
|  | 441 |  | 
|  | 442 | <!-- *********************************************************************** --> | 
|  | 443 | <div class="doc_section"> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 444 | <a name="gcimpls">GC implementations available</a> | 
|  | 445 | </div> | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 446 | <!-- *********************************************************************** --> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 447 |  | 
|  | 448 | <div class="doc_text"> | 
|  | 449 |  | 
|  | 450 | <p> | 
|  | 451 | To make this more concrete, the currently implemented LLVM garbage collectors | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 452 | all live in the <tt>llvm/runtime/GC/*</tt> directories in the LLVM source-base. | 
|  | 453 | If you are interested in implementing an algorithm, there are many interesting | 
|  | 454 | possibilities (mark/sweep, a generational collector, a reference counting | 
|  | 455 | collector, etc), or you could choose to improve one of the existing algorithms. | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 456 | </p> | 
|  | 457 |  | 
|  | 458 | </div> | 
|  | 459 |  | 
| Chris Lattner | 9b2a184 | 2004-05-27 05:52:10 +0000 | [diff] [blame^] | 460 | <!-- ======================================================================= --> | 
|  | 461 | <div class="doc_subsection"> | 
|  | 462 | <a name="semispace">SemiSpace - A simple copying garbage collector</a></li> | 
|  | 463 | </div> | 
|  | 464 |  | 
|  | 465 | <div class="doc_text"> | 
|  | 466 | <p> | 
|  | 467 | SemiSpace is a very simple copying collector.  When it starts up, it allocates | 
|  | 468 | two blocks of memory for the heap.  It uses a simple bump-pointer allocator to | 
|  | 469 | allocate memory from the first block until it runs out of space.  When it runs | 
|  | 470 | out of space, it traces through all of the roots of the program, copying blocks | 
|  | 471 | to the other half of the memory space. | 
|  | 472 | </p> | 
|  | 473 |  | 
|  | 474 | </div> | 
|  | 475 |  | 
|  | 476 | <!--_________________________________________________________________________--> | 
|  | 477 | <div class="doc_subsubsection"> | 
|  | 478 | Possible Improvements | 
|  | 479 | </div> | 
|  | 480 |  | 
|  | 481 | <div class="doc_text"> | 
|  | 482 |  | 
|  | 483 | <p> | 
|  | 484 | If a collection cycle happens and the heap is not compacted very much (say less | 
|  | 485 | than 25% of the allocated memory was freed), the memory regions should be | 
|  | 486 | doubled in size.</p> | 
|  | 487 |  | 
|  | 488 | </div> | 
|  | 489 |  | 
|  | 490 | <!-- *********************************************************************** --> | 
|  | 491 | <div class="doc_section"> | 
|  | 492 | <a name="references">References</a> | 
|  | 493 | </div> | 
|  | 494 | <!-- *********************************************************************** --> | 
|  | 495 |  | 
|  | 496 | <div class="doc_text"> | 
|  | 497 |  | 
|  | 498 | <p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew | 
|  | 499 | W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p> | 
|  | 500 |  | 
|  | 501 | <p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for | 
|  | 502 | strongly typed programming languages.  Benjamin Goldberg. ACM SIGPLAN | 
|  | 503 | PLDI'91.</p> | 
|  | 504 |  | 
|  | 505 | <p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using | 
|  | 506 | explicit type parameters.  Andrew Tolmach.  Proceedings of the 1994 ACM | 
|  | 507 | conference on LISP and functional programming.</p> | 
|  | 508 |  | 
|  | 509 | </div> | 
| Chris Lattner | 0d8c2db | 2004-05-23 21:02:20 +0000 | [diff] [blame] | 510 |  | 
|  | 511 | <!-- *********************************************************************** --> | 
|  | 512 |  | 
|  | 513 | <hr> | 
|  | 514 | <address> | 
|  | 515 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | 516 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
|  | 517 | <a href="http://validator.w3.org/check/referer"><img | 
|  | 518 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
|  | 519 |  | 
|  | 520 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
|  | 521 | <a href="http://llvm.cs.uiuc.edu">LLVM Compiler Infrastructure</a><br> | 
|  | 522 | Last modified: $Date$ | 
|  | 523 | </address> | 
|  | 524 |  | 
|  | 525 | </body> | 
|  | 526 | </html> |