blob: 3263fc624b06fc818242b07cab803a52a02e6de8 [file] [log] [blame]
Chris Lattner0d8c2db2004-05-23 21:02:20 +00001<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3<html>
4<head>
5 <title>Accurate Garbage Collection with LLVM</title>
6 <link rel="stylesheet" href="llvm.css" type="text/css">
7</head>
8<body>
9
10<div class="doc_title">
11 Accurate Garbage Collection with LLVM
12</div>
13
14<ol>
15 <li><a href="#introduction">Introduction</a>
16 <ul>
17 <li><a href="#feature">GC features provided and algorithms supported</a></li>
18 </ul>
19 </li>
20
21 <li><a href="#interfaces">Interfaces for user programs</a>
22 <ul>
23 <li><a href="#roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a></li>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000024 <li><a href="#allocate">Allocating memory from the GC</a></li>
25 <li><a href="#barriers">Reading and writing references to the heap</a></li>
26 <li><a href="#explicit">Explicit invocation of the garbage collector</a></li>
27 </ul>
28 </li>
29
30 <li><a href="#gcimpl">Implementing a garbage collector</a>
31 <ul>
32 <li><a href="#llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a></li>
Chris Lattner9b2a1842004-05-27 05:52:10 +000033 <li><a href="#callbacks">Callback functions used to implement the garbage collector</a></li>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000034 </ul>
35 </li>
Chris Lattner9b2a1842004-05-27 05:52:10 +000036 <li><a href="#gcimpls">GC implementations available</a>
37 <ul>
38 <li><a href="#semispace">SemiSpace - A simple copying garbage collector</a></li>
39 </li>
Chris Lattner0d8c2db2004-05-23 21:02:20 +000040
41<!--
42 <li><a href="#codegen">Implementing GC support in a code generator</a></li>
43-->
44</ol>
45
46<div class="doc_author">
47 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
48</div>
49
50<!-- *********************************************************************** -->
51<div class="doc_section">
52 <a name="introduction">Introduction</a>
53</div>
54<!-- *********************************************************************** -->
55
56<div class="doc_text">
57
58<p>Garbage collection is a widely used technique that frees the programmer from
59having to know the life-times of heap objects, making software easier to produce
60and maintain. Many programming languages rely on garbage collection for
61automatic memory management. There are two primary forms of garbage collection:
62conservative and accurate.</p>
63
64<p>Conservative garbage collection often does not require any special support
65from either the language or the compiler: it can handle non-type-safe
66programming languages (such as C/C++) and does not require any special
67information from the compiler. The [LINK] Boehm collector is an example of a
68state-of-the-art conservative collector.</p>
69
70<p>Accurate garbage collection requires the ability to identify all pointers in
71the program at run-time (which requires that the source-language be type-safe in
72most cases). Identifying pointers at run-time requires compiler support to
73locate all places that hold live pointer variables at run-time, including the
74<a href="#roots">processor stack and registers</a>.</p>
75
76<p>
77Conservative garbage collection is attractive because it does not require any
78special compiler support, but it does have problems. In particular, because the
79conservative garbage collector cannot <i>know</i> that a particular word in the
80machine is a pointer, it cannot move live objects in the heap (preventing the
81use of compacting and generational GC algorithms) and it can occasionally suffer
82from memory leaks due to integer values that happen to point to objects in the
83program. In addition, some aggressive compiler transformations can break
84conservative garbage collectors (though these seem rare in practice).
85</p>
86
87<p>
88Accurate garbage collectors do not suffer from any of these problems, but they
89can suffer from degraded scalar optimization of the program. In particular,
90because the runtime must be able to identify and update all pointers active in
91the program, some optimizations are less effective. In practice, however, the
92locality and performance benefits of using aggressive garbage allocation
93techniques dominates any low-level losses.
94</p>
95
96<p>
97This document describes the mechanisms and interfaces provided by LLVM to
98support accurate garbage collection.
99</p>
100
101</div>
102
103<!-- ======================================================================= -->
104<div class="doc_subsection">
105 <a name="feature">GC features provided and algorithms supported</a>
106</div>
107
108<div class="doc_text">
109
110<p>
111LLVM provides support for a broad class of garbage collection algorithms,
112including compacting semi-space collectors, mark-sweep collectors, generational
113collectors, and even reference counting implementations. It includes support
114for <a href="#barriers">read and write barriers</a>, and associating <a
115href="#roots">meta-data with stack objects</a> (used for tagless garbage
116collection). All LLVM code generators support garbage collection, including the
117C backend.
118</p>
119
120<p>
121We hope that the primitive support built into LLVM is sufficient to support a
122broad class of garbage collected languages, including Scheme, ML, scripting
123languages, Java, C#, etc. That said, the implemented garbage collectors may
124need to be extended to support language-specific features such as finalization,
125weak references, or other features. As these needs are identified and
126implemented, they should be added to this specification.
127</p>
128
129<p>
130LLVM does not currently support garbage collection of multi-threaded programs or
131GC-safe points other than function calls, but these will be added in the future
132as there is interest.
133</p>
134
135</div>
136
137<!-- *********************************************************************** -->
138<div class="doc_section">
139 <a name="interfaces">Interfaces for user programs</a>
140</div>
141<!-- *********************************************************************** -->
142
143<div class="doc_text">
144
145<p>This section describes the interfaces provided by LLVM and by the garbage
146collector run-time that should be used by user programs. As such, this is the
147interface that front-end authors should generate code for.
148</p>
149
150</div>
151
152<!-- ======================================================================= -->
153<div class="doc_subsection">
154 <a name="roots">Identifying GC roots on the stack: <tt>llvm.gcroot</tt></a>
155</div>
156
157<div class="doc_text">
158
159<div class="doc_code"><tt>
160 void %llvm.gcroot(&lt;ty&gt;** %ptrloc, &lt;ty2&gt;* %metadata)
161</tt></div>
162
163<p>
164The <tt>llvm.gcroot</tt> intrinsic is used to inform LLVM of a pointer variable
165on the stack. The first argument contains the address of the variable on the
166stack, and the second contains a pointer to metadata that should be associated
167with the pointer (which <b>must</b> be a constant or global value address). At
168runtime, the <tt>llvm.gcroot</tt> intrinsic stores a null pointer into the
169specified location to initialize the pointer.</p>
170
171<p>
172Consider the following fragment of Java code:
173</p>
174
175<pre>
176 {
177 Object X; // A null-initialized reference to an object
178 ...
179 }
180</pre>
181
182<p>
183This block (which may be located in the middle of a function or in a loop nest),
184could be compiled to this LLVM code:
185</p>
186
187<pre>
188Entry:
189 ;; In the entry block for the function, allocate the
190 ;; stack space for X, which is an LLVM pointer.
191 %X = alloca %Object*
192 ...
193
194 ;; "CodeBlock" is the block corresponding to the start
Reid Spencer03d186a2004-05-25 08:45:31 +0000195 ;; of the scope above.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000196CodeBlock:
197 ;; Initialize the object, telling LLVM that it is now live.
198 ;; Java has type-tags on objects, so it doesn't need any
199 ;; metadata.
200 call void %llvm.gcroot(%Object** %X, sbyte* null)
201 ...
202
203 ;; As the pointer goes out of scope, store a null value into
204 ;; it, to indicate that the value is no longer live.
205 store %Object* null, %Object** %X
206 ...
207</pre>
208
209</div>
210
211<!-- ======================================================================= -->
212<div class="doc_subsection">
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000213 <a name="allocate">Allocating memory from the GC</a>
214</div>
215
216<div class="doc_text">
217
218<div class="doc_code"><tt>
219 sbyte *%llvm_gc_allocate(unsigned %Size)
220</tt></div>
221
222<p>The <tt>llvm_gc_allocate</tt> function is a global function defined by the
Chris Lattneraab3aff2004-06-09 03:59:05 +0000223garbage collector implementation to allocate memory. It returns a
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000224zeroed-out block of memory of the appropriate size.</p>
225
226</div>
227
228<!-- ======================================================================= -->
229<div class="doc_subsection">
230 <a name="barriers">Reading and writing references to the heap</a>
231</div>
232
233<div class="doc_text">
234
235<div class="doc_code"><tt>
236 sbyte *%llvm.gcread(sbyte **)<br>
237 void %llvm.gcwrite(sbyte*, sbyte**)
238</tt></div>
239
240<p>Several of the more interesting garbage collectors (e.g., generational
241collectors) need to be informed when the mutator (the program that needs garbage
242collection) reads or writes object references into the heap. In the case of a
243generational collector, it needs to keep track of which "old" generation objects
244have references stored into them. The amount of code that typically needs to be
Chris Lattneraab3aff2004-06-09 03:59:05 +0000245executed is usually quite small (and not on the critical path of any
246computation), so the overall performance impact of the inserted code is
247tolerable.</p>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000248
249<p>To support garbage collectors that use read or write barriers, LLVM provides
250the <tt>llvm.gcread</tt> and <tt>llvm.gcwrite</tt> intrinsics. The first
251intrinsic has exactly the same semantics as a non-volatile LLVM load and the
252second has the same semantics as a non-volatile LLVM store. At code generation
253time, these intrinsics are replaced with calls into the garbage collector
254(<tt><a href="#llvm_gc_readwrite">llvm_gc_read</a></tt> and <tt><a
255href="#llvm_gc_readwrite">llvm_gc_write</a></tt> respectively), which are then
256inlined into the code.
257</p>
258
259<p>
260If you are writing a front-end for a garbage collected language, every load or
261store of a reference from or to the heap should use these intrinsics instead of
262normal LLVM loads/stores.</p>
263
264</div>
265
266<!-- ======================================================================= -->
267<div class="doc_subsection">
268 <a name="initialize">Garbage collector startup and initialization</a>
269</div>
270
271<div class="doc_text">
272
273<div class="doc_code"><tt>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000274 void %llvm_gc_initialize(unsigned %InitialHeapSize)
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000275</tt></div>
276
277<p>
278The <tt>llvm_gc_initialize</tt> function should be called once before any other
279garbage collection functions are called. This gives the garbage collector the
Chris Lattner9b2a1842004-05-27 05:52:10 +0000280chance to initialize itself and allocate the heap spaces. The initial heap size
281to allocate should be specified as an argument.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000282</p>
283
284</div>
285
286<!-- ======================================================================= -->
287<div class="doc_subsection">
288 <a name="explicit">Explicit invocation of the garbage collector</a>
289</div>
290
291<div class="doc_text">
292
293<div class="doc_code"><tt>
294 void %llvm_gc_collect()
295</tt></div>
296
297<p>
298The <tt>llvm_gc_collect</tt> function is exported by the garbage collector
299implementations to provide a full collection, even when the heap is not
300exhausted. This can be used by end-user code as a hint, and may be ignored by
301the garbage collector.
302</p>
303
304</div>
305
306
307<!-- *********************************************************************** -->
308<div class="doc_section">
309 <a name="gcimpl">Implementing a garbage collector</a>
310</div>
311<!-- *********************************************************************** -->
312
313<div class="doc_text">
314
315<p>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000316Implementing a garbage collector for LLVM is fairly straight-forward. The LLVM
317garbage collectors are provided in a form that makes them easy to link into the
318language-specific runtime that a language front-end would use. They require
319functionality from the language-specific runtime to get information about <a
320href="#gcdescriptors">where pointers are located in heap objects</a>.
321</p>
322
323<p>The
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000324implementation must include the <a
325href="#allocate"><tt>llvm_gc_allocate</tt></a> and <a
326href="#explicit"><tt>llvm_gc_collect</tt></a> functions, and it must implement
327the <a href="#llvm_gc_readwrite">read/write barrier</a> functions as well. To
328do this, it will probably have to <a href="#traceroots">trace through the roots
329from the stack</a> and understand the <a href="#gcdescriptors">GC descriptors
330for heap objects</a>. Luckily, there are some <a href="#gcimpls">example
331implementations</a> available.
332</p>
333</div>
334
335
336<!-- ======================================================================= -->
337<div class="doc_subsection">
338 <a name="llvm_gc_readwrite">Implementing <tt>llvm_gc_read</tt> and <tt>llvm_gc_write</tt></a>
339</div>
340
341<div class="doc_text">
342 <div class="doc_code"><tt>
343 void *llvm_gc_read(void **)<br>
344 void llvm_gc_write(void*, void**)
345 </tt></div>
346
347<p>
348These functions <i>must</i> be implemented in every garbage collector, even if
349they do not need read/write barriers. In this case, just load or store the
350pointer, then return.
351</p>
352
353<p>
354If an actual read or write barrier is needed, it should be straight-forward to
355implement it. Note that we may add a pointer to the start of the memory object
356as a parameter in the future, if needed.
357</p>
358
359</div>
360
361<!-- ======================================================================= -->
362<div class="doc_subsection">
Chris Lattner9b2a1842004-05-27 05:52:10 +0000363 <a name="callbacks">Callback functions used to implement the garbage collector</a></li>
364</div>
365
366Garbage collector implementations make use of call-back functions that are
367implemented by other parts of the LLVM system.
368
369<!--_________________________________________________________________________-->
370<div class="doc_subsubsection">
371 <a name="traceroots">Tracing GC pointers from the program stack</a>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000372</div>
373
374<div class="doc_text">
375 <div class="doc_code"><tt>
376 void llvm_cg_walk_gcroots(void (*FP)(void **Root, void *Meta));
377 </tt></div>
378
379<p>
380The <tt>llvm_cg_walk_gcroots</tt> function is a function provided by the code
381generator that iterates through all of the GC roots on the stack, calling the
382specified function pointer with each record. For each GC root, the address of
383the pointer and the meta-data (from the <a
384href="#gcroot"><tt>llvm.gcroot</tt></a> intrinsic) are provided.
385</p>
386</div>
387
Chris Lattner9b2a1842004-05-27 05:52:10 +0000388<!--_________________________________________________________________________-->
389<div class="doc_subsubsection">
390 <a name="staticroots">Tracing GC pointers from static roots</a>
391</div>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000392
Chris Lattner9b2a1842004-05-27 05:52:10 +0000393<div class="doc_text">
394TODO
395</div>
396
397
398<!--_________________________________________________________________________-->
399<div class="doc_subsubsection">
400 <a name="gcdescriptors">Tracing GC pointers from heap objects</a>
401</div>
402
403<div class="doc_text">
404<p>
405The three most common ways to keep track of where pointers live in heap objects
406are (listed in order of space overhead required):</p>
407
408<ol>
409<li>In languages with polymorphic objects, pointers from an object header are
410usually used to identify the GC pointers in the heap object. This is common for
411object-oriented languages like Self, Smalltalk, Java, or C#.</li>
412
413<li>If heap objects are not polymorphic, often the "shape" of the heap can be
414determined from the roots of the heap or from some other meta-data [<a
415href="#appel89">Appel89</a>, <a href="#goldberg91">Goldberg91</a>, <a
416href="#tolmach94">Tolmach94</a>]. In this case, the garbage collector can
417propagate the information around from meta data stored with the roots. This
418often eliminates the need to have a header on objects in the heap. This is
419common in the ML family.</li>
420
421<li>If all heap objects have pointers in the same locations, or pointers can be
422distinguished just by looking at them (e.g., the low order bit is clear), no
423book-keeping is needed at all. This is common for Lisp-like languages.</li>
424</ol>
425
426<p>The LLVM garbage collectors are capable of supporting all of these styles of
427language, including ones that mix various implementations. To do this, it
428allows the source-language to associate meta-data with the <a
429href="#roots">stack roots</a>, and the heap tracing routines can propagate the
430information. In addition, LLVM allows the front-end to extract GC information
431from in any form from a specific object pointer (this supports situations #1 and
432#3).
433</p>
434
435<p><b>Making this efficient</b></p>
436
437
438
439</div>
440
441
442
443<!-- *********************************************************************** -->
444<div class="doc_section">
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000445 <a name="gcimpls">GC implementations available</a>
446</div>
Chris Lattner9b2a1842004-05-27 05:52:10 +0000447<!-- *********************************************************************** -->
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000448
449<div class="doc_text">
450
451<p>
452To make this more concrete, the currently implemented LLVM garbage collectors
Chris Lattner9b2a1842004-05-27 05:52:10 +0000453all live in the <tt>llvm/runtime/GC/*</tt> directories in the LLVM source-base.
454If you are interested in implementing an algorithm, there are many interesting
455possibilities (mark/sweep, a generational collector, a reference counting
456collector, etc), or you could choose to improve one of the existing algorithms.
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000457</p>
458
459</div>
460
Chris Lattner9b2a1842004-05-27 05:52:10 +0000461<!-- ======================================================================= -->
462<div class="doc_subsection">
463 <a name="semispace">SemiSpace - A simple copying garbage collector</a></li>
464</div>
465
466<div class="doc_text">
467<p>
468SemiSpace is a very simple copying collector. When it starts up, it allocates
469two blocks of memory for the heap. It uses a simple bump-pointer allocator to
470allocate memory from the first block until it runs out of space. When it runs
471out of space, it traces through all of the roots of the program, copying blocks
472to the other half of the memory space.
473</p>
474
475</div>
476
477<!--_________________________________________________________________________-->
478<div class="doc_subsubsection">
479 Possible Improvements
480</div>
481
482<div class="doc_text">
483
484<p>
485If a collection cycle happens and the heap is not compacted very much (say less
486than 25% of the allocated memory was freed), the memory regions should be
487doubled in size.</p>
488
489</div>
490
491<!-- *********************************************************************** -->
492<div class="doc_section">
493 <a name="references">References</a>
494</div>
495<!-- *********************************************************************** -->
496
497<div class="doc_text">
498
499<p><a name="appel89">[Appel89]</a> Runtime Tags Aren't Necessary. Andrew
500W. Appel. Lisp and Symbolic Computation 19(7):703-705, July 1989.</p>
501
502<p><a name="goldberg91">[Goldberg91]</a> Tag-free garbage collection for
503strongly typed programming languages. Benjamin Goldberg. ACM SIGPLAN
504PLDI'91.</p>
505
506<p><a name="tolmach94">[Tolmach94]</a> Tag-free garbage collection using
507explicit type parameters. Andrew Tolmach. Proceedings of the 1994 ACM
508conference on LISP and functional programming.</p>
509
510</div>
Chris Lattner0d8c2db2004-05-23 21:02:20 +0000511
512<!-- *********************************************************************** -->
513
514<hr>
515<address>
516 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
517 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
518 <a href="http://validator.w3.org/check/referer"><img
519 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
520
521 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
522 <a href="http://llvm.cs.uiuc.edu">LLVM Compiler Infrastructure</a><br>
523 Last modified: $Date$
524</address>
525
526</body>
527</html>