blob: df887f364fa9fbdfc53e2c4aecdd092211eba596 [file] [log] [blame]
Chris Lattner0d8c2db2004-05-23 21:02:20 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3<html>
4<head>
5 <title>Accurate Garbage Collection with LLVM</title>
6 <link rel="stylesheet" href="llvm.css" type="text/css">
7</head>
8<body>
9
10<div class="doc_title">
11 Accurate Garbage Collection with LLVM
12</div>
13
14<ol>
15 <li><a href="#introduction">Introduction</a>
16 <ul>
17 <li><a href="#feature">GC features provided and algorithms supported</a></li>
18 </ul>
19 </li>
20
21 <li><a href="#interfaces">Interfaces for user programs</a>
22 <ul>
23 <li><a href="#roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a></li>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000024 <li><a href="#allocate">Allocating memory from the GC</a></li>
25 <li><a href="#barriers">Reading and writing references to the heap</a></li>
26 <li><a href="#explicit">Explicit invocation of the garbage collector</a></li>
27 </ul>
28 </li>
29
30 <li><a href="#gcimpl">Implementing a garbage collector</a>
31 <ul>
32 <li><a href="#llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a></li>
Chris Lattner9b2a1842004-05-27 05:52:10 +000033 <li><a href="#callbacks">Callback functions used to implement the garbage collector</a></li>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000034 </ul>
35 </li>
Chris Lattner9b2a1842004-05-27 05:52:10 +000036 <li><a href="#gcimpls">GC implementations available</a>
37 <ul>
38 <li><a href="#semispace">SemiSpace - A simple copying garbage collector</a></li>
Chris Lattner0b02dbc2004-07-09 05:03:54 +000039 </ul>
Chris Lattner9b2a1842004-05-27 05:52:10 +000040 </li>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000041
42<!--
43 <li><a href="#codegen">Implementing GC support in a code generator</a></li>
44-->
45</ol>
46
47<div class="doc_author">
48 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
49</div>
50
51<!-- *********************************************************************** -->
52<div class="doc_section">
53 <a name="introduction">Introduction</a>
54</div>
55<!-- *********************************************************************** -->
56
57<div class="doc_text">
58
59<p>Garbage collection is a widely used technique that frees the programmer from
60having to know the life-times of heap objects, making software easier to produce
61and maintain. Many programming languages rely on garbage collection for
62automatic memory management. There are two primary forms of garbage collection:
63conservative and accurate.</p>
64
65<p>Conservative garbage collection often does not require any special support
66from either the language or the compiler: it can handle non-type-safe
67programming languages (such as C/C++) and does not require any special
Jeff Cohen65fc36b2007-04-18 17:26:14 +000068information from the compiler. The
69<a href="http://www.hpl.hp.com/personal/Hans_Boehm/gc/">Boehm collector</a> is
70an example of a state-of-the-art conservative collector.</p>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000071
72<p>Accurate garbage collection requires the ability to identify all pointers in
73the program at run-time (which requires that the source-language be type-safe in
74most cases). Identifying pointers at run-time requires compiler support to
75locate all places that hold live pointer variables at run-time, including the
76<a href="#roots">processor stack and registers</a>.</p>
77
78<p>
79Conservative garbage collection is attractive because it does not require any
80special compiler support, but it does have problems. In particular, because the
81conservative garbage collector cannot <i>know</i> that a particular word in the
82machine is a pointer, it cannot move live objects in the heap (preventing the
83use of compacting and generational GC algorithms) and it can occasionally suffer
84from memory leaks due to integer values that happen to point to objects in the
85program. In addition, some aggressive compiler transformations can break
86conservative garbage collectors (though these seem rare in practice).
87</p>
88
89<p>
90Accurate garbage collectors do not suffer from any of these problems, but they
91can suffer from degraded scalar optimization of the program. In particular,
92because the runtime must be able to identify and update all pointers active in
93the program, some optimizations are less effective. In practice, however, the
94locality and performance benefits of using aggressive garbage allocation
95techniques dominates any low-level losses.
96</p>
97
98<p>
99This document describes the mechanisms and interfaces provided by LLVM to
100support accurate garbage collection.
101</p>
102
103</div>
104
105<!-- ======================================================================= -->
106<div class="doc_subsection">
107 <a name="feature">GC features provided and algorithms supported</a>
108</div>
109
110<div class="doc_text">
111
112<p>
113LLVM provides support for a broad class of garbage collection algorithms,
114including compacting semi-space collectors, mark-sweep collectors, generational
115collectors, and even reference counting implementations. It includes support
116for <a href="#barriers">read and write barriers</a>, and associating <a
117href="#roots">meta-data with stack objects</a> (used for tagless garbage
118collection). All LLVM code generators support garbage collection, including the
119C backend.
120</p>
121
122<p>
123We hope that the primitive support built into LLVM is sufficient to support a
124broad class of garbage collected languages, including Scheme, ML, scripting
125languages, Java, C#, etc. That said, the implemented garbage collectors may
126need to be extended to support language-specific features such as finalization,
127weak references, or other features. As these needs are identified and
128implemented, they should be added to this specification.
129</p>
130
131<p>
132LLVM does not currently support garbage collection of multi-threaded programs or
133GC-safe points other than function calls, but these will be added in the future
134as there is interest.
135</p>
136
137</div>
138
139<!-- *********************************************************************** -->
140<div class="doc_section">
141 <a name="interfaces">Interfaces for user programs</a>
142</div>
143<!-- *********************************************************************** -->
144
145<div class="doc_text">
146
147<p>This section describes the interfaces provided by LLVM and by the garbage
148collector run-time that should be used by user programs. As such, this is the
149interface that front-end authors should generate code for.
150</p>
151
152</div>
153
154<!-- ======================================================================= -->
155<div class="doc_subsection">
156 <a name="roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a>
157</div>
158
159<div class="doc_text">
160
161<div class="doc_code"><tt>
Chris Lattner1df4f752007-09-21 17:30:40 +0000162 void %llvm.gcroot(i8** %ptrloc, i8* %metadata)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000163</tt></div>
164
165<p>
166The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM of a pointer variable
167on the stack. The first argument contains the address of the variable on the
168stack, and the second contains a pointer to metadata that should be associated
Chris Lattner36597a52007-09-12 17:53:10 +0000169with the pointer (which <b>must</b> be a constant or global value address).</p>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000170
171<p>
172Consider the following fragment of Java code:
173</p>
174
175<pre>
176 {
177 Object X; // A null-initialized reference to an object
178 ...
179 }
180</pre>
181
182<p>
183This block (which may be located in the middle of a function or in a loop nest),
184could be compiled to this LLVM code:
185</p>
186
187<pre>
188Entry:
189 ;; In the entry block for the function, allocate the
190 ;; stack space for X, which is an LLVM pointer.
191 %X = alloca %Object*
192 ...
193
Chris Lattner36597a52007-09-12 17:53:10 +0000194 ;; Java null-initializes pointers.
195 store %Object* null, %Object** %X
196
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000197 ;; "CodeBlock" is the block corresponding to the start
Reid Spencer03d186a2004-05-25 08:45:31 +0000198 ;; of the scope above.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000199CodeBlock:
200 ;; Initialize the object, telling LLVM that it is now live.
201 ;; Java has type-tags on objects, so it doesn't need any
202 ;; metadata.
Chris Lattner1df4f752007-09-21 17:30:40 +0000203 %tmp = bitcast %Object** %X to i8**
204 call void %llvm.gcroot(i8** %tmp, i8* null)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000205 ...
206
207 ;; As the pointer goes out of scope, store a null value into
208 ;; it, to indicate that the value is no longer live.
209 store %Object* null, %Object** %X
210 ...
211</pre>
212
213</div>
214
215<!-- ======================================================================= -->
216<div class="doc_subsection">
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000217 <a name="allocate">Allocating memory from the GC</a>
218</div>
219
220<div class="doc_text">
221
222<div class="doc_code"><tt>
Chris Lattner1df4f752007-09-21 17:30:40 +0000223 void *%llvm_gc_allocate(unsigned %Size)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000224</tt></div>
225
226<p>The <tt>llvm_gc_allocate</tt> function is a global function defined by the
Chris Lattneraab3aff2004-06-09 03:59:05 +0000227garbage collector implementation to allocate memory. It returns a
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000228zeroed-out block of memory of the appropriate size.</p>
229
230</div>
231
232<!-- ======================================================================= -->
233<div class="doc_subsection">
234 <a name="barriers">Reading and writing references to the heap</a>
235</div>
236
237<div class="doc_text">
238
239<div class="doc_code"><tt>
Chris Lattner1df4f752007-09-21 17:30:40 +0000240 i8 *%llvm.gcread(i8 *, i8 **)<br>
241 void %llvm.gcwrite(i8*, i8*, i8**)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000242</tt></div>
243
244<p>Several of the more interesting garbage collectors (e.g., generational
245collectors) need to be informed when the mutator (the program that needs garbage
246collection) reads or writes object references into the heap. In the case of a
247generational collector, it needs to keep track of which "old" generation objects
248have references stored into them. The amount of code that typically needs to be
Chris Lattneraab3aff2004-06-09 03:59:05 +0000249executed is usually quite small (and not on the critical path of any
250computation), so the overall performance impact of the inserted code is
251tolerable.</p>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000252
253<p>To support garbage collectors that use read or write barriers, LLVM provides
254the <tt>llvm.gcread</tt> and <tt>llvm.gcwrite</tt> intrinsics. The first
255intrinsic has exactly the same semantics as a non-volatile LLVM load and the
Chris Lattner728f03f2004-07-22 05:49:38 +0000256second has the same semantics as a non-volatile LLVM store, with the
257additions that they also take a pointer to the start of the memory
258object as an argument. At code generation
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000259time, these intrinsics are replaced with calls into the garbage collector
260(<tt><a href="#llvm_gc_readwrite">llvm_gc_read</a></tt> and <tt><a
261href="#llvm_gc_readwrite">llvm_gc_write</a></tt> respectively), which are then
262inlined into the code.
263</p>
264
265<p>
266If you are writing a front-end for a garbage collected language, every load or
267store of a reference from or to the heap should use these intrinsics instead of
268normal LLVM loads/stores.</p>
269
270</div>
271
272<!-- ======================================================================= -->
273<div class="doc_subsection">
274 <a name="initialize">Garbage collector startup and initialization</a>
275</div>
276
277<div class="doc_text">
278
279<div class="doc_code"><tt>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000280 void %llvm_gc_initialize(unsigned %InitialHeapSize)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000281</tt></div>
282
283<p>
284The <tt>llvm_gc_initialize</tt> function should be called once before any other
285garbage collection functions are called. This gives the garbage collector the
Chris Lattner9b2a1842004-05-27 05:52:10 +0000286chance to initialize itself and allocate the heap spaces. The initial heap size
287to allocate should be specified as an argument.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000288</p>
289
290</div>
291
292<!-- ======================================================================= -->
293<div class="doc_subsection">
294 <a name="explicit">Explicit invocation of the garbage collector</a>
295</div>
296
297<div class="doc_text">
298
299<div class="doc_code"><tt>
300 void %llvm_gc_collect()
301</tt></div>
302
303<p>
304The <tt>llvm_gc_collect</tt> function is exported by the garbage collector
305implementations to provide a full collection, even when the heap is not
306exhausted. This can be used by end-user code as a hint, and may be ignored by
307the garbage collector.
308</p>
309
310</div>
311
312
313<!-- *********************************************************************** -->
314<div class="doc_section">
315 <a name="gcimpl">Implementing a garbage collector</a>
316</div>
317<!-- *********************************************************************** -->
318
319<div class="doc_text">
320
321<p>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000322Implementing a garbage collector for LLVM is fairly straight-forward. The LLVM
323garbage collectors are provided in a form that makes them easy to link into the
324language-specific runtime that a language front-end would use. They require
325functionality from the language-specific runtime to get information about <a
326href="#gcdescriptors">where pointers are located in heap objects</a>.
327</p>
328
329<p>The
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000330implementation must include the <a
331href="#allocate"><tt>llvm_gc_allocate</tt></a> and <a
332href="#explicit"><tt>llvm_gc_collect</tt></a> functions, and it must implement
333the <a href="#llvm_gc_readwrite">read/write barrier</a> functions as well. To
334do this, it will probably have to <a href="#traceroots">trace through the roots
335from the stack</a> and understand the <a href="#gcdescriptors">GC descriptors
336for heap objects</a>. Luckily, there are some <a href="#gcimpls">example
337implementations</a> available.
338</p>
339</div>
340
341
342<!-- ======================================================================= -->
343<div class="doc_subsection">
344 <a name="llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a>
345</div>
346
347<div class="doc_text">
348 <div class="doc_code"><tt>
Chris Lattner728f03f2004-07-22 05:49:38 +0000349 void *llvm_gc_read(void*, void **)<br>
350 void llvm_gc_write(void*, void *, void**)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000351 </tt></div>
352
353<p>
354These functions <i>must</i> be implemented in every garbage collector, even if
355they do not need read/write barriers. In this case, just load or store the
356pointer, then return.
357</p>
358
359<p>
360If an actual read or write barrier is needed, it should be straight-forward to
Chris Lattner728f03f2004-07-22 05:49:38 +0000361implement it.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000362</p>
363
364</div>
365
366<!-- ======================================================================= -->
367<div class="doc_subsection">
Chris Lattner0b02dbc2004-07-09 05:03:54 +0000368 <a name="callbacks">Callback functions used to implement the garbage collector</a>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000369</div>
370
Chris Lattner0b02dbc2004-07-09 05:03:54 +0000371<div class="doc_text">
372<p>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000373Garbage collector implementations make use of call-back functions that are
374implemented by other parts of the LLVM system.
Chris Lattner0b02dbc2004-07-09 05:03:54 +0000375</p>
376</div>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000377
378<!--_________________________________________________________________________-->
379<div class="doc_subsubsection">
380 <a name="traceroots">Tracing GC pointers from the program stack</a>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000381</div>
382
383<div class="doc_text">
384 <div class="doc_code"><tt>
385 void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta));
386 </tt></div>
387
388<p>
389The <tt>llvm_cg_walk_gcroots</tt> function is a function provided by the code
390generator that iterates through all of the GC roots on the stack, calling the
391specified function pointer with each record. For each GC root, the address of
392the pointer and the meta-data (from the <a
Duncan Sands8036ca42007-03-30 12:22:09 +0000393href="#roots"><tt>llvm.gcroot</tt></a> intrinsic) are provided.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000394</p>
395</div>
396
Chris Lattner9b2a1842004-05-27 05:52:10 +0000397<!--_________________________________________________________________________-->
398<div class="doc_subsubsection">
399 <a name="staticroots">Tracing GC pointers from static roots</a>
400</div>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000401
Chris Lattner9b2a1842004-05-27 05:52:10 +0000402<div class="doc_text">
403TODO
404</div>
405
406
407<!--_________________________________________________________________________-->
408<div class="doc_subsubsection">
409 <a name="gcdescriptors">Tracing GC pointers from heap objects</a>
410</div>
411
412<div class="doc_text">
413<p>
414The three most common ways to keep track of where pointers live in heap objects
415are (listed in order of space overhead required):</p>
416
417<ol>
418<li>In languages with polymorphic objects, pointers from an object header are
419usually used to identify the GC pointers in the heap object. This is common for
420object-oriented languages like Self, Smalltalk, Java, or C#.</li>
421
422<li>If heap objects are not polymorphic, often the "shape" of the heap can be
423determined from the roots of the heap or from some other meta-data [<a
424href="#appel89">Appel89</a>, <a href="#goldberg91">Goldberg91</a>, <a
425href="#tolmach94">Tolmach94</a>]. In this case, the garbage collector can
426propagate the information around from meta data stored with the roots. This
427often eliminates the need to have a header on objects in the heap. This is
428common in the ML family.</li>
429
430<li>If all heap objects have pointers in the same locations, or pointers can be
431distinguished just by looking at them (e.g., the low order bit is clear), no
432book-keeping is needed at all. This is common for Lisp-like languages.</li>
433</ol>
434
435<p>The LLVM garbage collectors are capable of supporting all of these styles of
436language, including ones that mix various implementations. To do this, it
437allows the source-language to associate meta-data with the <a
438href="#roots">stack roots</a>, and the heap tracing routines can propagate the
439information. In addition, LLVM allows the front-end to extract GC information
440from in any form from a specific object pointer (this supports situations #1 and
441#3).
442</p>
443
444<p><b>Making this efficient</b></p>
445
446
447
448</div>
449
450
451
452<!-- *********************************************************************** -->
453<div class="doc_section">
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000454 <a name="gcimpls">GC implementations available</a>
455</div>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000456<!-- *********************************************************************** -->
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000457
458<div class="doc_text">
459
460<p>
461To make this more concrete, the currently implemented LLVM garbage collectors
Chris Lattner9b2a1842004-05-27 05:52:10 +0000462all live in the <tt>llvm/runtime/GC/*</tt> directories in the LLVM source-base.
463If you are interested in implementing an algorithm, there are many interesting
464possibilities (mark/sweep, a generational collector, a reference counting
465collector, etc), or you could choose to improve one of the existing algorithms.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000466</p>
467
468</div>
469
Chris Lattner9b2a1842004-05-27 05:52:10 +0000470<!-- ======================================================================= -->
471<div class="doc_subsection">
Chris Lattner0b02dbc2004-07-09 05:03:54 +0000472 <a name="semispace">SemiSpace - A simple copying garbage collector</a>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000473</div>
474
475<div class="doc_text">
476<p>
477SemiSpace is a very simple copying collector. When it starts up, it allocates
478two blocks of memory for the heap. It uses a simple bump-pointer allocator to
479allocate memory from the first block until it runs out of space. When it runs
480out of space, it traces through all of the roots of the program, copying blocks
481to the other half of the memory space.
482</p>
483
484</div>
485
486<!--_________________________________________________________________________-->
487<div class="doc_subsubsection">
488 Possible Improvements
489</div>
490
491<div class="doc_text">
492
493<p>
494If a collection cycle happens and the heap is not compacted very much (say less
495than 25% of the allocated memory was freed), the memory regions should be
496doubled in size.</p>
497
498</div>
499
500<!-- *********************************************************************** -->
501<div class="doc_section">
502 <a name="references">References</a>
503</div>
504<!-- *********************************************************************** -->
505
506<div class="doc_text">
507
508<p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew
509W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p>
510
511<p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for
512strongly typed programming languages. Benjamin Goldberg. ACM SIGPLAN
513PLDI'91.</p>
514
515<p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using
516explicit type parameters. Andrew Tolmach. Proceedings of the 1994 ACM
517conference on LISP and functional programming.</p>
518
519</div>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000520
521<!-- *********************************************************************** -->
522
523<hr>
524<address>
525 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
526 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
527 <a href="http://validator.w3.org/check/referer"><img
528 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
529
530 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
Reid Spencer05fe4b02006-03-14 05:39:39 +0000531 <a href="http://llvm.org">LLVM Compiler Infrastructure</a><br>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000532 Last modified: $Date$
533</address>
534
535</body>
536</html>