<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>LLVM Bytecode File Format</title>
  <link rel="stylesheet" href="llvm.css" type="text/css">
  <style type="text/css">
    TR, TD { border: 2px solid gray; padding-left: 4pt; padding-right: 4pt; padding-top: 2pt; padding-bottom: 2pt; }
    TH { border: 2px solid gray; font-weight: bold; font-size: 105%; }
    TABLE { text-align: center; border: 2px solid black;
      border-collapse: collapse; margin-top: 1em; margin-left: 1em; margin-right: 1em; margin-bottom: 1em; }
    .td_left { border: 2px solid gray; text-align: left; }
  </style>
</head>
<body>
  <div class="doc_title"> LLVM Bytecode File Format </div>
<ol>
  <li><a href="#abstract">Abstract</a></li>
  <li><a href="#concepts">Concepts</a>
    <ol>
      <li><a href="#blocks">Blocks</a></li>
      <li><a href="#lists">Lists</a></li>
      <li><a href="#fields">Fields</a></li>
      <li><a href="#align">Alignment</a></li>
      <li><a href="#vbr">Variable Bit-Rate Encoding</a></li>
      <li><a href="#encoding">Encoding Primitives</a></li>
      <li><a href="#notation">Field Notation</a></li>
      <li><a href="#slots">Slots</a></li>
    </ol>
  </li>
  <li><a href="#general">General Structure</a> </li>
  <li><a href="#blockdefs">Block Definitions</a>
    <ol>
      <li><a href="#signature">Signature Block</a></li>
      <li><a href="#module">Module Block</a></li>
      <li><a href="#globaltypes">Global Type Pool</a></li>
      <li><a href="#globalinfo">Module Info Block</a></li>
      <li><a href="#constantpool">Global Constant Pool</a></li>
      <li><a href="#functiondefs">Function Definition</a></li>
      <li><a href="#compactiontable">Compaction Table</a></li>
      <li><a href="#instructionlist">Instruction List</a></li>
      <li><a href="#symtab">Symbol Table</a></li>
    </ol>
  </li>
  <li><a href="#versiondiffs">Version Differences</a>
    <ol>
      <li><a href="#vers12">Version 1.2 Differences From 1.3</a></li>
      <li><a href="#vers11">Version 1.1 Differences From 1.2</a></li>
      <li><a href="#vers10">Version 1.0 Differences From 1.1</a></li>
    </ol>
  </li>
</ol>
<div class="doc_author">
<p>Written by <a href="mailto:rspencer@x10sys.com">Reid Spencer</a>
</p>
</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="abstract">Abstract </a></div>
<!-- *********************************************************************** -->
<div class="doc_text">
  <p>This document describes the LLVM bytecode file format. It specifies the
  binary encoding rules of the bytecode file format so that equivalent systems
  can encode bytecode files correctly. The LLVM bytecode representation is
  used to store the intermediate representation on disk in compacted form.</p>
  <p>The LLVM bytecode format may change in the future, but LLVM will always be
  backwards compatible with older formats. This document describes only
  the most recent version of the bytecode format. See
  <a href="#versiondiffs">Version Differences</a> for details on how the
  current version differs from previous versions.</p>
</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="concepts">Concepts</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
  <p>This section describes the general concepts of the bytecode file format
  without getting into specific layout details. It is recommended that you read
  this section thoroughly before interpreting the detailed descriptions.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="blocks">Blocks</a> </div>
<div class="doc_text">
  <p>LLVM bytecode files consist simply of a sequence of blocks of bytes using
  a binary encoding. Each block begins with a header of two unsigned integers.
  The first value identifies the type of block and the second value provides
  the size of the block in bytes. The block identifier is needed because
  entire blocks may be omitted from the file if they are empty;
  it tells the reader which kind of block comes next
  in the file. Note that blocks can be nested within other blocks.</p>
  <p>All blocks are variable length, and the block header specifies the size
  of the block. All blocks begin on a byte index that is aligned to a
  32-bit boundary. The first block is 32-bit aligned because it
  starts at offset 0. Each block is padded with zero fill bytes to ensure that
  the next block also starts on a 32-bit boundary.</p>
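  <p>The header and padding rules above can be sketched in Python. This is a
  minimal sketch rather than LLVM's own reader or writer: the helper names
  <code>write_block</code> and <code>read_blocks</code> are hypothetical, the
  header fields are taken to be little-endian 32-bit integers (as the
  <em>unsigned</em> primitive is defined later in this document), and the size
  field is <em>assumed</em> to count only the body bytes, excluding the header
  and the fill bytes.</p>

```python
import struct


def write_block(block_id: int, payload: bytes) -> bytes:
    """Emit one block: two 32-bit header fields, the body, then zero fill bytes.

    Assumption: the size field counts the body only (no header, no padding).
    """
    header = struct.pack("<II", block_id, len(payload))
    padding = b"\x00" * (-len(payload) % 4)  # pad the body to a 32-bit boundary
    return header + payload + padding


def read_blocks(data: bytes):
    """Yield (block_id, payload) for each block in a flat sequence of blocks."""
    offset = 0
    while offset < len(data):
        block_id, size = struct.unpack_from("<II", data, offset)
        offset += 8
        yield block_id, data[offset:offset + size]
        offset += size
        offset += -offset % 4  # skip fill bytes so the next header is aligned


# Two blocks back to back; the identifiers are arbitrary example values.
stream = write_block(0x12, b"abcde") + write_block(0x13, b"")
blocks = list(read_blocks(stream))
```

  <p>Because the reader skips the fill bytes after each body, every header it
  decodes starts on a 32-bit boundary, which matches the alignment rule
  described above.</p>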
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="lists">Lists</a> </div>
<div class="doc_text">
  <p>LLVM bytecode blocks often contain lists of things of a similar type. For
  example, a function contains a list of instructions and a function type
  contains a list of argument types. There are two basic types of lists:
  length lists (<a href="#llist">llist</a>), and zero-terminated lists
  (<a href="#zlist">zlist</a>), as described below in the
  <a href="#encoding">Encoding Primitives</a>.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="fields">Fields</a> </div>
<div class="doc_text">
<p>Fields are units of information that LLVM knows how to write atomically.
Most fields have a uniform length or some kind of length indication built into
their encoding. For example, a constant string (array of bytes) is
written simply as the length followed by the characters. Although this is
similar to a list, constant strings are treated atomically and are thus
fields.</p>
<p>Fields use a condensed bit format specific to the type of information
they must contain. As few bits as possible are written for each field. The
sections that follow will provide the details on how these fields are
written and how the bits are to be interpreted.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="align">Alignment</a> </div>
<div class="doc_text">
  <p>To support cross-platform differences, the bytecode file is aligned on
  certain boundaries. This means that a small amount of padding (at most 3
  bytes) will be added to ensure that the next entry is aligned to a 32-bit
  boundary.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="vbr">Variable Bit-Rate Encoding</a> </div>
<div class="doc_text">
<p>Most of the values written to LLVM bytecode files are small integers. To
minimize the number of bytes written for these quantities, an encoding
scheme similar to UTF-8 is used to write integer data. The scheme is known as
variable bit rate (vbr) encoding. In this encoding, the high bit of each
byte indicates whether more bytes follow. If (byte &amp; 0x80) is non-zero
in any given byte, another byte immediately follows that
also contributes to the value. For the final byte, (byte &amp; 0x80) is zero
(the high bit is not set). In each byte only the low seven bits contribute to
the value, and the least significant bits are written first. Consequently
32-bit quantities can take from one to <em>five</em>
bytes to encode. In general, smaller quantities will encode in fewer bytes,
as follows:</p>
<table>
  <tr>
    <th>Byte #</th>
    <th>Significant Bits</th>
    <th>Maximum Value</th>
  </tr>
  <tr><td>1</td><td>0-6</td><td>127</td></tr>
  <tr><td>2</td><td>7-13</td><td>16,383</td></tr>
  <tr><td>3</td><td>14-20</td><td>2,097,151</td></tr>
  <tr><td>4</td><td>21-27</td><td>268,435,455</td></tr>
  <tr><td>5</td><td>28-34</td><td>34,359,738,367</td></tr>
  <tr><td>6</td><td>35-41</td><td>4,398,046,511,103</td></tr>
  <tr><td>7</td><td>42-48</td><td>562,949,953,421,311</td></tr>
  <tr><td>8</td><td>49-55</td><td>72,057,594,037,927,935</td></tr>
  <tr><td>9</td><td>56-62</td><td>9,223,372,036,854,775,807</td></tr>
  <tr><td>10</td><td>63-69</td><td>1,180,591,620,717,411,303,423</td></tr>
</table>
<p>Note that in practice, the tenth byte can only encode bit 63,
since the largest quantity that uses this encoding is a 64-bit integer.</p>
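<p>The encoding described above can be sketched in a few lines of Python.
This is an illustrative sketch, not LLVM's implementation; the function names
<code>encode_vbr</code> and <code>decode_vbr</code> are hypothetical. It emits
the low seven bits first, as the table indicates, and sets the high bit on
every byte except the last.</p>

```python
def encode_vbr(value: int) -> bytes:
    """Encode a non-negative integer, seven bits per byte, low bits first."""
    if value < 0:
        raise ValueError("unsigned vbr cannot encode negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # final byte: high bit clear
    return bytes(out)


def decode_vbr(data: bytes, offset: int = 0):
    """Decode one vbr value; return (value, offset of the next unread byte)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


encoded = encode_vbr(300)            # two bytes: 0xAC 0x02
decoded, next_offset = decode_vbr(encoded)
```

<p>With this sketch, the largest 32-bit value takes five bytes and the largest
64-bit value takes ten, in line with the table.</p>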

<p><em>Signed</em> VBR values are encoded with the standard vbr encoding, but
with the sign bit as the low order bit instead of the high order bit. This
allows small negative quantities to be encoded efficiently. For example, -3
is encoded as "((3 &lt;&lt; 1) | 1)" and 3 is encoded as "((3 &lt;&lt; 1) |
0)", emitted with the standard vbr encoding above.</p>
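<p>The signed form can be sketched the same way. The Python below is a
hypothetical illustration (the names <code>encode_signed_vbr</code> and
<code>decode_signed_vbr</code> are not from LLVM): it shifts the magnitude
left by one bit, stores the sign in the low bit, and then applies the unsigned
encoding. How the most negative 64-bit value is handled is not covered by this
section, so the sketch does not attempt it.</p>

```python
def encode_vbr(value: int) -> bytes:
    """Unsigned vbr: seven bits per byte, low bits first, high bit = 'more'."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_signed_vbr(value: int) -> bytes:
    """Move the sign into the low bit, then use the unsigned encoding."""
    sign = 1 if value < 0 else 0
    return encode_vbr((abs(value) << 1) | sign)


def decode_signed_vbr(data: bytes) -> int:
    """Inverse of encode_signed_vbr for a buffer holding exactly one value."""
    raw = 0
    for index, byte in enumerate(data):
        raw |= (byte & 0x7F) << (7 * index)
    magnitude = raw >> 1
    return -magnitude if raw & 1 else magnitude


minus_three = encode_signed_vbr(-3)  # (3 << 1) | 1 == 7
plus_three = encode_signed_vbr(3)    # (3 << 1) | 0 == 6
```

<p>Both values fit in a single byte, whereas a two's complement
representation of -3 passed through the unsigned encoding would need the
maximum number of bytes.</p>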
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="encoding">Encoding Primitives</a> </div>
<div class="doc_text">
  <p>Each field in the bytecode format is encoded into the file using a small
  set of primitive formats. The table below defines the encoding rules for the
  various primitives used and gives them each a type name. These type names are
  used in the descriptions of blocks and fields in the
  <a href="#blockdefs">Block Definitions</a> section. Any type name with the
  suffix <em>_vbr</em> indicates
  a quantity that is encoded using variable bit rate encoding as described
  above.</p>
<table class="doc_table" >
  <tr>
    <th><b>Type</b></th>
    <th class="td_left"><b>Rule</b></th>
  </tr>
  <tr>
    <td><a name="unsigned"><b>unsigned</b></a></td>
    <td class="td_left">A 32-bit unsigned integer that always occupies four
      consecutive bytes. The unsigned integer is encoded using LSB first
      ordering. That is, bits 2<sup>0</sup> through 2<sup>7</sup> are in the
      byte with the lowest file offset (little endian).</td>
  </tr><tr>
    <td><a name="uint32_vbr"><b>uint32_vbr</b></a></td>
    <td class="td_left">A 32-bit unsigned integer that occupies from one to five
    bytes using variable bit rate encoding.</td>
  </tr><tr>
    <td><a name="uint64_vbr"><b>uint64_vbr</b></a></td>
    <td class="td_left">A 64-bit unsigned integer that occupies from one to ten
    bytes using variable bit rate encoding.</td>
  </tr><tr>
    <td><a name="int64_vbr"><b>int64_vbr</b></a></td>
    <td class="td_left">A 64-bit signed integer that occupies from one to ten
    bytes using the signed variable bit rate encoding.</td>
  </tr><tr>
    <td><a name="char"><b>char</b></a></td>
    <td class="td_left">A single unsigned character encoded into one byte.</td>
  </tr><tr>
    <td><a name="bit"><b>bit(n-m)</b></a></td>
    <td class="td_left">A set of bits within some larger integer field. The
    values of <code>n</code> and <code>m</code> specify the inclusive range
    of bits that define the subfield. The value for <code>m</code> may be
    omitted if it is the same as <code>n</code>.</td>
  </tr><tr>
    <td><a name="string"><b>string</b></a></td>
    <td class="td_left">A uint32_vbr indicating the type of the constant string,
    which also includes its length, immediately followed by the characters of
    the string. There is no terminating null byte in the string.</td>
  </tr><tr>
    <td><a name="data"><b>data</b></a></td>
    <td class="td_left">An arbitrarily long segment of data to which no
    interpretation is implied. This is used for float, double, and constant
    initializers.</td>
  </tr><tr>
    <td><a name="llist"><b>llist(x)</b></a></td>
    <td class="td_left">A length list of x. This means the list is encoded as
    a <a href="#uint32_vbr">uint32_vbr</a> providing the length of the list,
    followed by a sequence of that many "x" items. This implies that the reader
    should iterate the number of times provided by the length.</td>
  </tr><tr>
    <td><a name="zlist"><b>zlist(x)</b></a></td>
    <td class="td_left">A zero-terminated list of x. This means the list is encoded
    as a sequence of an indeterminate number of "x" items, followed by a
    <a href="#uint32_vbr">uint32_vbr</a> terminating value of zero. This implies
    that none of the "x" items can have a zero value (or else the list
    terminates).</td>
  </tr><tr>
    <td><a name="block"><b>block</b></a></td>
    <td class="td_left">A block of data that is logically related. A block
      begins with an <a href="#unsigned">unsigned</a> that provides the block
      identifier (constant value) and an <a href="#unsigned">unsigned</a> that
      provides the length of the block. Blocks may contain other blocks.
    </td>
  </tr>
</table>
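  <p>As a rough illustration of how a reader might consume some of these
  primitives, the Python sketch below decodes an <em>unsigned</em>, a
  <em>uint32_vbr</em>, an <em>llist</em> and a <em>zlist</em>. All function
  names are hypothetical, and the zlist reader makes the simplifying
  assumption that the list items are themselves uint32_vbr values, so that a
  zero item can be recognized as the terminator.</p>

```python
import struct


def read_unsigned(data: bytes, offset: int):
    """Fixed four-byte, little-endian 32-bit unsigned integer."""
    return struct.unpack_from("<I", data, offset)[0], offset + 4


def read_uint32_vbr(data: bytes, offset: int):
    """Variable bit rate integer: seven bits per byte, low bits first."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def read_llist(data: bytes, offset: int, read_item):
    """Length list: a uint32_vbr count followed by that many items."""
    count, offset = read_uint32_vbr(data, offset)
    items = []
    for _ in range(count):
        item, offset = read_item(data, offset)
        items.append(item)
    return items, offset


def read_zlist(data: bytes, offset: int):
    """Zero-terminated list, assuming every item is a uint32_vbr."""
    items = []
    while True:
        item, offset = read_uint32_vbr(data, offset)
        if item == 0:  # the terminating value ends the list
            return items, offset
        items.append(item)


length_list, after_llist = read_llist(b"\x03\x05\x06\x07", 0, read_uint32_vbr)
zero_list, after_zlist = read_zlist(b"\x05\x06\x00\xff", 0)
```

  <p>Each reader returns the decoded value together with the offset of the
  next unread byte, so the readers can be chained while walking through a
  block.</p>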
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="notation">Field Notation</a> </div>
<div class="doc_text">
  <p>In the detailed block and field descriptions that follow, a regex-like
  notation is used to describe optional and repeated fields. A very limited
  subset of regex is used to describe these, as given in the following table:
  </p>
  <table class="doc_table" >
    <tr>
      <th><b>Character</b></th>
      <th class="td_left"><b>Meaning</b></th>
    </tr><tr>
      <td><b><code>?</code></b></td>
      <td class="td_left">The question mark indicates 0 or 1 occurrences of
      the thing preceding it.</td>
    </tr><tr>
      <td><b><code>*</code></b></td>
      <td class="td_left">The asterisk indicates 0 or more occurrences of the
      thing preceding it.</td>
    </tr><tr>
      <td><b><code>+</code></b></td>
      <td class="td_left">The plus sign indicates 1 or more occurrences of the
      thing preceding it.</td>
    </tr><tr>
      <td><b><code>()</code></b></td>
      <td class="td_left">Parentheses are used for grouping.</td>
    </tr><tr>
      <td><b><code>,</code></b></td>
      <td class="td_left">The comma separates sequential fields.</td>
    </tr>
  </table>
  <p>So, for example, consider the following specifications:</p>
  <div class="doc_code">
    <ol>
      <li><code>string?</code></li>
      <li><code>(uint32_vbr,uint32_vbr)+</code></li>
      <li><code>(unsigned?,uint32_vbr)*</code></li>
      <li><code>(llist(unsigned))?</code></li>
    </ol>
  </div>
  <p>with the following interpretations:</p>
  <ol>
    <li>An optional string. Matches either nothing or a single string.</li>
    <li>One or more pairs of uint32_vbr.</li>
    <li>Zero or more occurrences of either an unsigned followed by a uint32_vbr
    or just a uint32_vbr.</li>
    <li>An optional length list of unsigned values.</li>
  </ol>
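  <p>One way to check a reading of this notation is to translate it into an
  ordinary regular expression over one-letter tokens. The Python sketch below
  does that for the four specifications above. The token letters and the
  helper names <code>to_regex</code> and <code>matches</code> are purely
  illustrative assumptions and play no part in the bytecode format itself.</p>

```python
import re

# One arbitrary letter per field type, chosen only for this illustration.
TOKENS = {
    "string": "s",
    "uint32_vbr": "v",
    "unsigned": "u",
    "llist(unsigned)": "L",
}


def to_regex(spec: str) -> str:
    """Rewrite a field specification as a regex over the token letters."""
    # Replace longer names first so "llist(unsigned)" is not split by "unsigned".
    for name in sorted(TOKENS, key=len, reverse=True):
        spec = spec.replace(name, TOKENS[name])
    return spec.replace(",", "")  # the comma only sequences fields


def matches(spec: str, fields) -> bool:
    """True if the sequence of field type names satisfies the specification."""
    text = "".join(TOKENS[name] for name in fields)
    return re.fullmatch(to_regex(spec), text) is not None


pairs_ok = matches("(uint32_vbr,uint32_vbr)+", ["uint32_vbr"] * 4)
pairs_bad = matches("(uint32_vbr,uint32_vbr)+", ["uint32_vbr"] * 3)
```

  <p>Here four uint32_vbr fields satisfy the second specification, while three
  do not, because the fields must come in pairs.</p>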
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="slots">Slots</a> </div>
<div class="doc_text">
<p>The bytecode format uses the notion of a "slot" to reference Types and
Values. Since the bytecode file is a <em>direct</em> representation of LLVM's
intermediate representation, there is a need to represent pointers in the file.
Slots are used for this purpose. For example, if one has the following assembly:
</p>
<div class="doc_code"><code>
  %MyType = type { int, sbyte }<br>
  %MyVar = external global %MyType
</code></div>
<p>there are two definitions. The definition of <tt>%MyVar</tt> uses
<tt>%MyType</tt>. In the C++ IR this linkage between <tt>%MyVar</tt> and
<tt>%MyType</tt> is
explicit through the use of C++ pointers. In bytecode, however, there is no
way to store memory addresses. Instead, we compute and write out slot
numbers for every Type and Value written to the file.</p>
<p>A slot number is simply an unsigned 32-bit integer encoded in the variable
bit rate scheme (see <a href="#encoding">encoding</a>). This ensures that
low slot numbers are encoded in one byte. LLVM uses several techniques to
keep the slot numbers low. The first is to associate
slot numbers with their "type plane". That is, Values of the same type are
written to the bytecode file in a list (sequentially). Their order in that list
determines their slot number. This means that slot #1 doesn't mean anything
unless you also specify for which type you want slot #1. Types are handled
specially and are always written to the file first (in the
<a href="#globaltypes">Global Type Pool</a>) and
in such a way that both forward and backward references of the types can often be
resolved with a single pass through the type pool. </p>
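<p>The idea of per-type slot numbering can be sketched as follows. This Python
fragment is only a conceptual illustration: the function name
<code>assign_slots</code> is hypothetical, the value and type names are
example data, and it <em>assumes</em> that numbering starts at zero in each
type plane, which this section does not specify.</p>

```python
from collections import defaultdict


def assign_slots(values):
    """Number values in writing order, with a separate counter per type plane.

    `values` is an iterable of (type_name, value_name) pairs.
    Returns {value_name: (type_name, slot_number)}.
    Assumption: each plane starts counting at zero.
    """
    next_slot = defaultdict(int)
    slots = {}
    for type_name, value_name in values:
        slots[value_name] = (type_name, next_slot[type_name])
        next_slot[type_name] += 1
    return slots


slots = assign_slots([
    ("int", "%a"),
    ("sbyte", "%b"),
    ("int", "%c"),
])
```

<p>In this sketch <tt>%a</tt> and <tt>%b</tt> receive the same slot number,
which is why a slot number is only meaningful together with its type.</p>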
<p>Slot numbers are also kept small by rearranging their order. Because of the
structure of LLVM, certain values are much more likely to be used frequently
in the body of a function. For this reason, a compaction table is provided in
the body of a function if its use would make the function body smaller.
Suppose you have a function body that uses just the types "int*" and "{double}"
but uses them thousands of times. It is worthwhile to ensure that the slot numbers
for these types are low so they can be encoded in a single byte (via vbr).
This is exactly what the compaction table does.</p>
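<p>The saving can be made concrete with a small calculation. The Python below
is a conceptual sketch only: it does not reproduce the on-disk layout of the
compaction table, the ordering by use count is an assumed heuristic, and the
names <code>build_compaction_table</code> and <code>vbr_size</code> are
hypothetical.</p>

```python
from collections import Counter


def vbr_size(value: int) -> int:
    """Number of bytes the variable bit rate encoding needs for `value`."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def build_compaction_table(global_slot_uses):
    """Return a list in which index = local slot and entry = global slot.

    Assumed heuristic: the most frequently used global slots come first;
    ties are broken by global slot number so the result is deterministic.
    """
    counts = Counter(global_slot_uses)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [slot for slot, _ in ordered]


# A function body that refers to two high-numbered global slots 1000 times each.
uses = [300] * 1000 + [4000] * 1000
table = build_compaction_table(uses)
local_slot = {global_slot: index for index, global_slot in enumerate(table)}

bytes_without = sum(vbr_size(slot) for slot in uses)
bytes_with = sum(vbr_size(local_slot[slot]) for slot in uses)
```

<p>In this example every reference shrinks from two bytes to one, so the
references take half the space even before the (small) cost of writing the
table itself is counted.</p>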
</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="general">General Structure</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
  <p>This section provides the general structure of the LLVM bytecode file
  format. The bytecode file format requires blocks to be in a certain order and
  nested in a particular way so that an LLVM module can be constructed
  efficiently from the contents of the file. This ordering defines the general
  structure of bytecode files. The table below shows the order
  in which all block types may appear. Please note that some of the blocks are
  optional and some may be repeated. The structure is fairly loose because
  optional blocks, if empty, are completely omitted from the file.</p>
<table>
  <tr>
    <th>ID</th>
    <th>Parent</th>
    <th>Optional?</th>
    <th>Repeated?</th>
    <th>Level</th>
    <th>Block Type</th>
    <th>Description</th>
  </tr>
  <tr><td>N/A</td><td>File</td><td>No</td><td>No</td><td>0</td>
    <td class="td_left"><a href="#signature">Signature</a></td>
    <td class="td_left">This contains the file signature (magic number)
    that identifies the file as LLVM bytecode.</td>
  </tr>
  <tr><td>0x01</td><td>File</td><td>No</td><td>No</td><td>0</td>
    <td class="td_left"><a href="#module">Module</a></td>
    <td class="td_left">This is the top level block in a bytecode file. It
    contains all the other blocks.</td>
  </tr>
  <tr><td>0x15</td><td>Module</td><td>No</td><td>No</td><td>1</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;<a href="#globaltypes">Global&nbsp;Type&nbsp;Pool</a></td>
    <td class="td_left">This block contains all the global (module) level
    types.</td>
  </tr>
  <tr><td>0x14</td><td>Module</td><td>No</td><td>No</td><td>1</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;<a href="#globalinfo">Module&nbsp;Globals&nbsp;Info</a></td>
    <td class="td_left">This block contains the type, constness, and linkage
    for each of the global variables in the module. It also contains the
    type of the functions and the constant initializers.</td>
  </tr>
  <tr><td>0x12</td><td>Module</td><td>Yes</td><td>No</td><td>1</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;<a href="#constantpool">Module&nbsp;Constant&nbsp;Pool</a></td>
    <td class="td_left">This block contains all the global constants
    except function arguments, global values and constant strings.</td>
  </tr>
  <tr><td>0x11</td><td>Module</td><td>Yes</td><td>Yes</td><td>1</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;<a href="#functiondefs">Function&nbsp;Definitions</a>*</td>
    <td class="td_left">One function block is written for each function in
    the module. The function block contains the instructions, compaction
    table, type constant pool, and symbol table for the function.</td>
  </tr>
  <tr><td>0x12</td><td>Function</td><td>Yes</td><td>No</td><td>2</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#constantpool">Function&nbsp;Constant&nbsp;Pool</a></td>
    <td class="td_left">Any constants (including types) used solely
    within the function are emitted here in the function constant pool.
    </td>
  </tr>
  <tr><td>0x33</td><td>Function</td><td>Yes</td><td>No</td><td>2</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#compactiontable">Compaction&nbsp;Table</a></td>
    <td class="td_left">This table reduces bytecode size by providing a
    function-local mapping of type and value slot numbers to their
    global slot numbers.</td>
  </tr>
  <tr><td>0x32</td><td>Function</td><td>No</td><td>No</td><td>2</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#instructionlist">Instruction&nbsp;List</a></td>
    <td class="td_left">This block contains all the instructions of the
    function. The basic blocks are inferred by terminating instructions.
    </td>
  </tr>
  <tr><td>0x13</td><td>Function</td><td>Yes</td><td>No</td><td>2</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#symtab">Function&nbsp;Symbol&nbsp;Table</a></td>
    <td class="td_left">This symbol table provides the names for the
    function specific values used (basic block labels mostly).</td>
  </tr>
  <tr><td>0x13</td><td>Module</td><td>Yes</td><td>No</td><td>1</td>
    <td class="td_left">&nbsp;&nbsp;&nbsp;<a href="#symtab">Module&nbsp;Symbol&nbsp;Table</a></td>
    <td class="td_left">This symbol table provides the names for the various
    entries in the file that are not function specific (global variables and
    functions, mostly).</td>
  </tr>
</table>
<p>Use the links in the table for details about the contents of each of the block types.</p>
</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="blockdefs">Block Definitions</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
  <p>This section provides the detailed layout of the individual block types
  in the LLVM bytecode file format. </p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="signature">Signature Block</a> </div>
<div class="doc_text">
<p>The signature occurs in every LLVM bytecode file and is always first.
It simply provides a few bytes of data to identify the file as being an LLVM
bytecode file. This block is always four bytes in length and differs from the
other blocks because there is no identifier and no block length at the start
of the block. Essentially, this block is just the "magic number" for the file.</p>
<table>
  <tr>
    <th><b>Type</b></th>
    <th class="td_left"><b>Field Description</b></th>
  </tr><tr>
    <td><a href="#char">char</a></td>
    <td class="td_left">Constant "l" (0x6C)</td>
  </tr><tr>
    <td><a href="#char">char</a></td>
    <td class="td_left">Constant "l" (0x6C)</td>
  </tr><tr>
    <td><a href="#char">char</a></td>
    <td class="td_left">Constant "v" (0x76)</td>
  </tr><tr>
    <td><a href="#char">char</a></td>
    <td class="td_left">Constant "m" (0x6D)</td>
  </tr>
</table>
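<p>A reader can recognize a bytecode file by comparing its first four bytes
with these constants. The following Python check is a minimal sketch; the
function name <code>has_bytecode_signature</code> is hypothetical.</p>

```python
# The four signature characters from the table: "l", "l", "v", "m".
SIGNATURE = bytes([0x6C, 0x6C, 0x76, 0x6D])


def has_bytecode_signature(data: bytes) -> bool:
    """True if the buffer starts with the four-byte signature block."""
    return data[:4] == SIGNATURE


looks_like_bytecode = has_bytecode_signature(b"llvm\x01\x00\x00\x00")
```

<p>The comparison is case sensitive and also rejects buffers shorter than
four bytes, since the slice then cannot equal the signature.</p>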
461</div>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000462
Reid Spencerb39021b2004-05-23 17:05:09 +0000463<!-- _______________________________________________________________________ -->
464<div class="doc_subsection"><a name="module">Module Block</a> </div>
465<div class="doc_text">
466<p>The module block contains a small pre-amble and all the other blocks in
Reid Spencer1ab929c2004-07-05 08:18:07 +0000467the file. The table below shows the structure of the module block. Note that it
468only provides the module identifier, size of the module block, and the format
469information. Everything else is contained in other blocks, described in other
470sections.</p>
Reid Spencer2cc36152004-07-05 19:04:27 +0000471<table>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000472 <tr>
473 <th><b>Type</b></th>
474 <th class="td_left"><b>Field Description</b></th>
475 </tr><tr>
476 <td><a href="#unsigned">unsigned</a></td>
477 <td class="td_left">Module Identifier (0x01)</td>
478 </tr><tr>
479 <td><a href="#unsigned">unsigned</a></td>
480 <td class="td_left">Size of the module block in bytes</td>
481 </tr><tr>
482 <td><a href="#uint32_vbr">uint32_vbr</a></td>
483 <td class="td_left"><a href="#format">Format Information</a></td>
484 </tr><tr>
485 <td><a href="#block">block</a></td>
486 <td class="td_left"><a href="#globaltypes">Global Type Pool</a></td>
487 </tr><tr>
488 <td><a href="#block">block</a></td>
    <td class="td_left"><a href="#globalinfo">Module Global Info</a></td>
490 </tr><tr>
491 <td><a href="#block">block</a></td>
492 <td class="td_left"><a href="#constantpool">Module Constant Pool</a></td>
493 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000494 <td><a href="#block">block</a>*</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000495 <td class="td_left"><a href="#functiondefs">Function Definitions</a></td>
496 </tr><tr>
497 <td><a href="#block">block</a></td>
498 <td class="td_left"><a href="#symboltable">Module Symbol Table</a></td>
499 </tr>
500</table>
501</div>
502
503<!-- _______________________________________________________________________ -->
504<div class="doc_subsubsection"><a name="format">Format Information</a></div>
505<div class="doc_text">
506<p>The format information field is encoded into a 32-bit vbr-encoded unsigned
507integer as shown in the following table.</p>
508<table>
509 <tr>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000510 <th><b>Type</b></th>
511 <th class="td_left"><b>Description</b></th>
512 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000513 <td><a href="#bit">bit(0)</a></td>
514 <td class="td_left">Target is big endian?</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000515 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000516 <td><a href="#bit">bit(1)</a></td>
    <td class="td_left">Target pointers are 64-bit?</td>
  </tr><tr>
    <td><a href="#bit">bit(2)</a></td>
    <td class="td_left">Target has no endianness?</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000521 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000522 <td><a href="#bit">bit(3)</a></td>
523 <td class="td_left">Target has no pointer size?</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000524 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000525 <td><a href="#bit">bit(4-31)</a></td>
526 <td class="td_left">Bytecode format version</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000527 </tr>
528</table>
529<p>
Of particular note, the bytecode format number is simply a 28-bit
monotonically increasing integer that identifies the version of the bytecode
format (which is not directly related to the LLVM release number). The
bytecode versions defined so far are listed below (note that this document only
describes the latest version, #2, which is used by LLVM 1.3):</p>
Chris Lattner2b905652004-05-24 05:35:17 +0000535<ul>
536<li>#0: LLVM 1.0 &amp; 1.1</li>
537<li>#1: LLVM 1.2</li>
538<li>#2: LLVM 1.3</li>
539</ul>
Chris Lattner2b905652004-05-24 05:35:17 +0000540<p>Note that we plan to eventually expand the target description capabilities
Reid Spencer1ab929c2004-07-05 08:18:07 +0000541of bytecode files to <a href="http://llvm.cs.uiuc.edu/PR263">target triples</a>.
542</p>
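<p>The bit layout above can be sketched as follows. This is a minimal
illustration that assumes the field has already been decoded from its vbr form
into a plain integer; the names <code>FormatInfo</code>,
<code>decode_format_info</code> and <code>encode_format_info</code> are
hypothetical and do not come from the LLVM sources.</p>

```python
from typing import NamedTuple


class FormatInfo(NamedTuple):
    """Hypothetical container for the fields of the format information word."""
    big_endian: bool
    pointers_64bit: bool
    no_endianness: bool
    no_pointer_size: bool
    version: int


def decode_format_info(word: int) -> FormatInfo:
    """Split an already vbr-decoded 32-bit value into its documented fields."""
    return FormatInfo(
        big_endian=bool(word & 0x1),           # bit 0
        pointers_64bit=bool(word & 0x2),       # bit 1
        no_endianness=bool(word & 0x4),        # bit 2
        no_pointer_size=bool(word & 0x8),      # bit 3
        version=(word >> 4) & 0x0FFFFFFF,      # bits 4-31 (28 bits)
    )


def encode_format_info(info: FormatInfo) -> int:
    """Inverse of decode_format_info."""
    flags = (
        int(info.big_endian)
        | (int(info.pointers_64bit) << 1)
        | (int(info.no_endianness) << 2)
        | (int(info.no_pointer_size) << 3)
    )
    return flags | (info.version << 4)
```

<p>For example, the value 0x23 would describe a big endian target with 64-bit
pointers using bytecode format version 2.</p>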
Reid Spencer50026612004-05-22 02:28:36 +0000543</div>
Chris Lattner2b905652004-05-24 05:35:17 +0000544
Reid Spencer50026612004-05-22 02:28:36 +0000545<!-- _______________________________________________________________________ -->
Reid Spencer1ab929c2004-07-05 08:18:07 +0000546<div class="doc_subsection"><a name="globaltypes">Global Type Pool</a> </div>
Reid Spencer50026612004-05-22 02:28:36 +0000547<div class="doc_text">
Chris Lattner2b905652004-05-24 05:35:17 +0000548<p>The global type pool consists of type definitions. Their order of appearance
Reid Spencerb39021b2004-05-23 17:05:09 +0000549in the file determines their slot number (0 based). Slot numbers are used to
550replace pointers in the intermediate representation. Each slot number uniquely
551identifies one entry in a type plane (a collection of values of the same type).
552Since all values have types and are associated with the order in which the type
553pool is written, the global type pool <em>must</em> be written as the first
block of a module. If it is not, attempts to read the file will fail because
neither forward nor backward type resolution will be possible.</p>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000556<p>The type pool is simply a list of type definitions, as shown in the table
Reid Spencerb39021b2004-05-23 17:05:09 +0000557below.</p>
Reid Spencer2cc36152004-07-05 19:04:27 +0000558<table>
Reid Spencerb39021b2004-05-23 17:05:09 +0000559 <tr>
Reid Spencerb39021b2004-05-23 17:05:09 +0000560 <th><b>Type</b></th>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000561 <th class="td_left"><b>Field Description</b></th>
Reid Spencerb39021b2004-05-23 17:05:09 +0000562 </tr><tr>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000563 <td><a href="#unsigned">unsigned</a></td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000564 <td class="td_left">Type Pool Identifier (0x15)</td>
Reid Spencerb39021b2004-05-23 17:05:09 +0000565 </tr><tr>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000566 <td><a href="#unsigned">unsigned</a></td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000567 <td class="td_left">Size in bytes of the type pool block.</td>
Reid Spencerb39021b2004-05-23 17:05:09 +0000568 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000569 <td><a href="#llist">llist</a>(<a href="#type">type</a>)</td>
570 <td class="td_left">A length list of type definitions.</td>
Reid Spencerb39021b2004-05-23 17:05:09 +0000571 </tr>
572</table>
Reid Spencer50026612004-05-22 02:28:36 +0000573</div>
574<!-- _______________________________________________________________________ -->
Reid Spencer1ab929c2004-07-05 08:18:07 +0000575<div class="doc_subsubsection"><a name="type">Type Definitions</a></div>
576<div class="doc_text">
Reid Spencer82c46712004-07-07 13:34:26 +0000577<p>Types in the type pool are defined using a different format for each kind
578of type, as given in the following sections.</p>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000579<h3>Primitive Types</h3>
<p>The primitive types encompass the basic integer and floating point types.</p>
581<table>
582 <tr>
583 <th><b>Type</b></th>
584 <th class="td_left"><b>Description</b></th>
585 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000586 <td><a href="#uint32_vbr">uint32_vbr</a></td>
587 <td class="td_left">Type ID for the primitive types (values 1 to 11)
588 <sup>1</sup></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000589 </tr>
590</table>
Reid Spencer2cc36152004-07-05 19:04:27 +0000591Notes:
592<ol>
Reid Spencer82c46712004-07-07 13:34:26 +0000593 <li>The values for the Type IDs for the primitive types are provided by the
594 definition of the <code>llvm::Type::TypeID</code> enumeration in
595 <code>include/llvm/Type.h</code>. The enumeration gives the following
596 mapping:<ol>
597 <li>bool</li>
598 <li>ubyte</li>
599 <li>sbyte</li>
600 <li>ushort</li>
601 <li>short</li>
602 <li>uint</li>
603 <li>int</li>
604 <li>ulong</li>
605 <li>long</li>
606 <li>float</li>
607 <li>double</li>
608 </ol></li>
Reid Spencer2cc36152004-07-05 19:04:27 +0000609</ol>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000610<h3>Function Types</h3>
611<table>
612 <tr>
613 <th><b>Type</b></th>
614 <th class="td_left"><b>Description</b></th>
615 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000616 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000617 <td class="td_left">Type ID for function types (13)</td>
618 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000619 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000620 <td class="td_left">Slot number of function's return type.</td>
621 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000622 <td><a href="#llist">llist</a>(<a href="#uint32_vbr">uint32_vbr</a>)</td>
623 <td class="td_left">Slot number of each argument's type.</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000624 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000625 <td><a href="#uint32_vbr">uint32_vbr</a>?</td>
626 <td class="td_left">Value 0 if this is a varargs function, missing otherwise.</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000627 </tr>
628</table>
629<h3>Structure Types</h3>
630<table>
631 <tr>
632 <th><b>Type</b></th>
633 <th class="td_left"><b>Description</b></th>
634 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000635 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000636 <td class="td_left">Type ID for structure types (14)</td>
637 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000638 <td><a href="#zlist">zlist</a>(<a href="#uint32_vbr">uint32_vbr</a>)</td>
    <td class="td_left">Slot number of the type of each of the structure's fields.</td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000640 </tr>
641</table>
642<h3>Array Types</h3>
643<table>
644 <tr>
645 <th><b>Type</b></th>
646 <th class="td_left"><b>Description</b></th>
647 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000648 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000649 <td class="td_left">Type ID for Array Types (15)</td>
650 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000651 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000652 <td class="td_left">Slot number of array's element type.</td>
653 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000654 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000655 <td class="td_left">The number of elements in the array.</td>
656 </tr>
657</table>
658<h3>Pointer Types</h3>
659<table>
660 <tr>
661 <th><b>Type</b></th>
662 <th class="td_left"><b>Description</b></th>
663 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000664 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000665 <td class="td_left">Type ID For Pointer Types (16)</td>
666 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000667 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000668 <td class="td_left">Slot number of pointer's element type.</td>
669 </tr>
670</table>
671<h3>Opaque Types</h3>
672<table>
673 <tr>
674 <th><b>Type</b></th>
675 <th class="td_left"><b>Description</b></th>
676 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000677 <td><a href="#uint32_vbr">uint32_vbr</a></td>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000678 <td class="td_left">Type ID For Opaque Types (17)</td>
679 </tr>
680</table>
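<p>To make the field order concrete, the sketch below builds the sequence of
integer fields for each kind of type definition. It is an illustration only:
every function name is hypothetical, the fields are left as plain integers
rather than vbr-encoded bytes, and it assumes that an
<a href="#llist">llist</a> is a count followed by the items and a
<a href="#zlist">zlist</a> is the items followed by a zero. The slot numbers
used with it are arbitrary.</p>

```python
# Type IDs as given in the tables and note above.
PRIMITIVE_TYPE_IDS = {
    "bool": 1, "ubyte": 2, "sbyte": 3, "ushort": 4, "short": 5, "uint": 6,
    "int": 7, "ulong": 8, "long": 9, "float": 10, "double": 11,
}
FUNCTION_TYPE_ID = 13
STRUCT_TYPE_ID = 14
ARRAY_TYPE_ID = 15
POINTER_TYPE_ID = 16
OPAQUE_TYPE_ID = 17


def llist(items):
    """Assumed llist layout: the length, then the items."""
    return [len(items), *items]


def zlist(items):
    """Assumed zlist layout: the items, then a terminating zero."""
    return [*items, 0]


def primitive_type(name):
    return [PRIMITIVE_TYPE_IDS[name]]


def function_type(return_slot, arg_slots, varargs=False):
    fields = [FUNCTION_TYPE_ID, return_slot, *llist(arg_slots)]
    if varargs:
        fields.append(0)  # trailing 0 is present only for varargs functions
    return fields


def struct_type(field_slots):
    return [STRUCT_TYPE_ID, *zlist(field_slots)]


def array_type(element_slot, num_elements):
    return [ARRAY_TYPE_ID, element_slot, num_elements]


def pointer_type(element_slot):
    return [POINTER_TYPE_ID, element_slot]


def opaque_type():
    return [OPAQUE_TYPE_ID]
```

<p>Under these assumptions, <code>function_type(7, [7, 10])</code> yields the
fields 13, 7, 2, 7, 10: the type ID, the return type slot, the argument count
and the two argument type slots.</p>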
681</div>
682<!-- _______________________________________________________________________ -->
683<div class="doc_subsection"><a name="globalinfo">Module Global Info</a> </div>
Reid Spencer50026612004-05-22 02:28:36 +0000684<div class="doc_text">
Reid Spencer2cc36152004-07-05 19:04:27 +0000685 <p>The module global info block contains the definitions of all global
686 variables including their initializers and the <em>declaration</em> of all
Chris Lattnerf4ddea62004-07-06 19:58:54 +0000687 functions. The format is shown in the table below:</p>
Reid Spencer2cc36152004-07-05 19:04:27 +0000688 <table>
689 <tr>
690 <th><b>Type</b></th>
691 <th class="td_left"><b>Field Description</b></th>
692 </tr><tr>
693 <td><a href="#unsigned">unsigned</a></td>
694 <td class="td_left">Module global info identifier (0x14)</td>
695 </tr><tr>
696 <td><a href="#unsigned">unsigned</a></td>
697 <td class="td_left">Size in bytes of the module global info block.</td>
698 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000699 <td><a href="#zlist">zlist</a>(<a href="#globalvar">globalvar</a>)</td>
      <td class="td_left">A zero terminated list of global var definitions
        occurring in the module.</td>
    </tr><tr>
      <td><a href="#zlist">zlist</a>(<a href="#uint32_vbr">uint32_vbr</a>)</td>
      <td class="td_left">A zero terminated list of function types occurring in
        the module.</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000706 </tr>
707 </table>
Reid Spencer50026612004-05-22 02:28:36 +0000708</div>
Reid Spencer2cc36152004-07-05 19:04:27 +0000709
710<!-- _______________________________________________________________________ -->
711<div class="doc_subsubsection"><a name="globalvar">Global Variable Field</a>
712</div>
713<div class="doc_text">
Reid Spencer82c46712004-07-07 13:34:26 +0000714 <p>Global variables are written using an <a href="#uint32_vbr">uint32_vbr</a>
715 that encodes information about the global variable and a list of the constant
716 initializers for the global var, if any.</p>
717 <p>The table below provides the bit layout of the first
718 <a href="#uint32_vbr">uint32_vbr</a> that describes the global variable.</p>
Reid Spencer2cc36152004-07-05 19:04:27 +0000719 <table>
720 <tr>
Reid Spencer2cc36152004-07-05 19:04:27 +0000721 <th><b>Type</b></th>
722 <th class="td_left"><b>Description</b></th>
723 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000724 <td><a href="#bit">bit(0)</a></td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000725 <td class="td_left">Is constant?</td>
726 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000727 <td><a href="#bit">bit(1)</a></td>
      <td class="td_left">Has initializer? Note that this bit determines whether
        the constant initializer field (described below) follows.</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000730 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000731 <td><a href="#bit">bit(2-4)</a></td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000732 <td class="td_left">Linkage type: 0=External, 1=Weak, 2=Appending,
733 3=Internal, 4=LinkOnce</td>
734 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000735 <td><a href="#bit">bit(5-31)</a></td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000736 <td class="td_left">Slot number of type for the global variable.</td>
737 </tr>
738 </table>
Reid Spencer82c46712004-07-07 13:34:26 +0000739 <p>The table below provides the format of the constant initializers for the
740 global variable field, if it has one.</p>
741 <table>
742 <tr>
743 <th><b>Type</b></th>
744 <th class="td_left"><b>Description</b></th>
745 </tr><tr>
      <td>(<a href="#zlist">zlist</a>(<a href="#uint32_vbr">uint32_vbr</a>))?
      </td>
749 <td class="td_left">An optional zero-terminated list of slot numbers of
750 the global variable's constant initializer.</td>
751 </tr>
752 </table>
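  <p>The bit layout of the first <a href="#uint32_vbr">uint32_vbr</a> can be
  sketched as below. The function names are hypothetical, and the value is
  treated as a plain integer (before vbr encoding or after vbr decoding).</p>

```python
# Linkage values as listed in the table above.
LINKAGE_NAMES = ["External", "Weak", "Appending", "Internal", "LinkOnce"]


def pack_global_var(is_constant, has_initializer, linkage, type_slot):
    """Combine the four documented fields into one 32-bit value."""
    if not 0 <= linkage < len(LINKAGE_NAMES):
        raise ValueError("linkage must be in the range 0..4")
    if not 0 <= type_slot < (1 << 27):
        raise ValueError("type slot must fit in bits 5-31 (27 bits)")
    return (
        int(is_constant)                 # bit 0
        | (int(has_initializer) << 1)    # bit 1
        | (linkage << 2)                 # bits 2-4
        | (type_slot << 5)               # bits 5-31
    )


def unpack_global_var(word):
    """Split a 32-bit value back into its documented fields."""
    return {
        "is_constant": bool(word & 0x1),
        "has_initializer": bool((word >> 1) & 0x1),
        "linkage": (word >> 2) & 0x7,
        "type_slot": (word >> 5) & 0x7FFFFFF,
    }
```

  <p>A constant, initialized global with Internal linkage whose type is in
  slot 20 would pack to the value 655 under this layout.</p>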
Reid Spencer2cc36152004-07-05 19:04:27 +0000753</div>
754
Reid Spencer50026612004-05-22 02:28:36 +0000755<!-- _______________________________________________________________________ -->
Reid Spencer1ab929c2004-07-05 08:18:07 +0000756<div class="doc_subsection"><a name="constantpool">Constant Pool</a> </div>
Reid Spencer50026612004-05-22 02:28:36 +0000757<div class="doc_text">
  <p>A constant pool defines a set of constant values. There are actually two
  types of constant pool blocks: one for modules and one for functions. For
  modules, the block begins with the constant strings encountered anywhere in
  the module. For functions, the block begins with types only encountered in
  the function. In both cases the header is identical. The tables that follow
  show the header, module constant pool preamble, function constant pool
  preamble, and the part common to both function and module constant pools.</p>
765 <p><b>Common Block Header</b></p>
766 <table>
767 <tr>
768 <th><b>Type</b></th>
769 <th class="td_left"><b>Field Description</b></th>
770 </tr><tr>
771 <td><a href="#unsigned">unsigned</a></td>
772 <td class="td_left">Constant pool identifier (0x12)</td>
Reid Spencer82c46712004-07-07 13:34:26 +0000773 </tr><tr>
774 <td><a href="#unsigned">unsigned</a></td>
775 <td class="td_left">Size in bytes of the constant pool block.</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000776 </tr>
777 </table>
778 <p><b>Module Constant Pool Preamble (constant strings)</b></p>
779 <table>
780 <tr>
781 <th><b>Type</b></th>
782 <th class="td_left"><b>Field Description</b></th>
783 </tr><tr>
784 <td><a href="#uint32_vbr">uint32_vbr</a></td>
785 <td class="td_left">The number of constant strings that follow.</td>
786 </tr><tr>
787 <td><a href="#uint32_vbr">uint32_vbr</a></td>
788 <td class="td_left">Zero. This identifies the following "plane" as
Reid Spencer82c46712004-07-07 13:34:26 +0000789 containing the constant strings. This is needed to identify it
790 uniquely from other constant planes that follow.
Reid Spencer2cc36152004-07-05 19:04:27 +0000791 </td>
792 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000793 <td><a href="#uint32_vbr">uint32_vbr</a>+</td>
794 <td class="td_left">Slot number of the constant string's type. Note
795 that the constant string's type implicitly defines the length of
796 the string.
Reid Spencer2cc36152004-07-05 19:04:27 +0000797 </td>
798 </tr>
799 </table>
Reid Spencer2cc36152004-07-05 19:04:27 +0000800 <p><b>Function Constant Pool Preamble (function types)</b></p>
  <p>The structure of the types for functions is identical to the
  <a href="#globaltypes">Global Type Pool</a>. Please refer to that section
  for the details.</p>
804 <p><b>Common Part (other constants)</b></p>
805 <table>
806 <tr>
807 <th><b>Type</b></th>
808 <th class="td_left"><b>Field Description</b></th>
809 </tr><tr>
810 <td><a href="#uint32_vbr">uint32_vbr</a></td>
811 <td class="td_left">Number of entries in this type plane.</td>
812 </tr><tr>
813 <td><a href="#uint32_vbr">uint32_vbr</a></td>
814 <td class="td_left">Type slot number of this plane.</td>
815 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000816 <td><a href="#constant">constant</a>+</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000817 <td class="td_left">The definition of a constant (see below).</td>
818 </tr>
819 </table>
820</div>
821<!-- _______________________________________________________________________ -->
822<div class="doc_subsubsection"><a name="constant">Constant Field</a></div>
823<div class="doc_text">
  <p>Constants come in many shapes and flavors. The sections that follow define
825 the format for each of them. All constants start with a
826 <a href="#uint32_vbr">uint32_vbr</a> encoded integer that provides the number
827 of operands for the constant. For primitive, structure, and array constants,
828 this will always be zero since those types of constants have no operands.
829 In this case, we have the following field definitions:</p>
830 <ul>
831 <li><b>Bool</b>. This is written as an <a href="#uint32_vbr">uint32_vbr</a>
832 of value 1U or 0U.</li>
833 <li><b>Signed Integers (sbyte,short,int,long)</b>. These are written as
834 an <a href="#int64_vbr">int64_vbr</a> with the corresponding value.</li>
835 <li><b>Unsigned Integers (ubyte,ushort,uint,ulong)</b>. These are written
836 as an <a href="#uint64_vbr">uint64_vbr</a> with the corresponding value.
837 </li>
838 <li><b>Floating Point</b>. Both the float and double types are written
839 literally in binary format.</li>
840 <li><b>Arrays</b>. Arrays are written simply as a list of
841 <a href="#uint32_vbr">uint32_vbr</a> encoded slot numbers to the constant
842 element values.</li>
843 <li><b>Structures</b>. Structures are written simply as a list of
844 <a href="#uint32_vbr">uint32_vbr</a> encoded slot numbers to the constant
845 field values of the structure.</li>
846 </ul>
847 <p>When the number of operands to the constant is non-zero, we have a
848 constant expression and its field format is provided in the table below.</p>
849 <table>
850 <tr>
851 <th><b>Type</b></th>
852 <th class="td_left"><b>Field Description</b></th>
853 </tr><tr>
854 <td><a href="#uint32_vbr">uint32_vbr</a></td>
855 <td class="td_left">Op code of the instruction for the constant
856 expression.</td>
857 </tr><tr>
858 <td><a href="#uint32_vbr">uint32_vbr</a></td>
859 <td class="td_left">The slot number of the constant value for an
860 operand.<sup>1</sup></td>
861 </tr><tr>
862 <td><a href="#uint32_vbr">uint32_vbr</a></td>
863 <td class="td_left">The slot number for the type of the constant value
864 for an operand.<sup>1</sup></td>
865 </tr>
866 </table>
867 Notes:<ol>
868 <li>Both these fields are repeatable but only in pairs.</li>
869 </ol>
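  <p>As an illustration of the constant expression layout, the sketch below
  lists the fields in the order described above: the operand count that starts
  every constant, then the opcode, then one (value slot, type slot) pair per
  operand. The function name is hypothetical, the opcode value in the usage
  note is arbitrary, and the fields are shown as plain integers rather than
  vbr-encoded bytes.</p>

```python
def encode_constant_expr(opcode, operands):
    """Return the field sequence for a constant expression.

    operands is a list of (value_slot, type_slot) pairs.
    """
    if not operands:
        # A zero operand count marks a non-expression constant instead.
        raise ValueError("a constant expression needs at least one operand")
    fields = [len(operands), opcode]
    for value_slot, type_slot in operands:
        fields.append(value_slot)
        fields.append(type_slot)
    return fields
```

  <p>For instance, <code>encode_constant_expr(9, [(4, 7), (5, 7)])</code>
  produces the fields 2, 9, 4, 7, 5, 7.</p>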
Reid Spencer50026612004-05-22 02:28:36 +0000870</div>
871<!-- _______________________________________________________________________ -->
Reid Spencer51f31e02004-07-05 22:28:02 +0000872<div class="doc_subsection"><a name="functiondefs">Function Definition</a></div>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000873<div class="doc_text">
Reid Spencer82c46712004-07-07 13:34:26 +0000874 <p>Function definitions contain the linkage, constant pool or compaction
875 table, instruction list, and symbol table for a function. The following table
876 shows the structure of a function definition.</p>
Reid Spencer51f31e02004-07-05 22:28:02 +0000877 <table>
878 <tr>
879 <th><b>Type</b></th>
880 <th class="td_left"><b>Field Description</b></th>
881 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000882 <td><a href="#unsigned">unsigned</a></td>
883 <td class="td_left">Function definition block identifier (0x11)</td>
884 </tr><tr>
885 <td><a href="#unsigned">unsigned</a></td>
886 <td class="td_left">Size in bytes of the function definition block.</td>
887 </tr><tr>
Reid Spencer51f31e02004-07-05 22:28:02 +0000888 <td><a href="#uint32_vbr">uint32_vbr</a></td>
889 <td class="td_left">The linkage type of the function: 0=External, 1=Weak,
890 2=Appending, 3=Internal, 4=LinkOnce<sup>1</sup></td>
891 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000892 <td><a href="#block">block</a></td>
893 <td class="td_left">The <a href="#constantpool">constant pool</a> block
894 for this function.<sup>2</sup></td>
Reid Spencer51f31e02004-07-05 22:28:02 +0000895 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000896 <td><a href="#block">block</a></td>
897 <td class="td_left">The <a href="#compactiontable">compaction table</a>
898 block for the function.<sup>2</sup></td>
Reid Spencer51f31e02004-07-05 22:28:02 +0000899 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000900 <td><a href="#block">block</a></td>
901 <td class="td_left">The <a href="#instructionlist">instruction list</a>
902 for the function.</td>
Reid Spencer51f31e02004-07-05 22:28:02 +0000903 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000904 <td><a href="#block">block</a></td>
905 <td class="td_left">The function's <a href="#symboltable">symbol table</a>
906 containing only those symbols pertinent to the function (mostly
907 block labels).</td>
Reid Spencer51f31e02004-07-05 22:28:02 +0000908 </tr>
909 </table>
910 Notes:<ol>
    <li>If the linkage type is "External" then none of the other
    fields will be present, as the function is defined elsewhere.</li>
    <li>Only one of the constant pool or compaction table will be
    written. Compaction tables are only written if they will actually save
    bytecode space. If not, then a regular constant pool is written.</li>
916 </ol>
Reid Spencer1ab929c2004-07-05 08:18:07 +0000917</div>
918<!-- _______________________________________________________________________ -->
919<div class="doc_subsection"><a name="compactiontable">Compaction Table</a> </div>
920<div class="doc_text">
Reid Spencer2cc36152004-07-05 19:04:27 +0000921 <p>Compaction tables are part of a function definition. They are merely a
922 device for reducing the size of bytecode files. The size of a bytecode
923 file is dependent on the <em>value</em> of the slot numbers used because
924 larger values use more bytes in the variable bit rate encoding scheme.
Reid Spencer82c46712004-07-07 13:34:26 +0000925 Furthermore, the compressed instruction format reserves only six bits for
Reid Spencer2cc36152004-07-05 19:04:27 +0000926 the type of the instruction. In large modules, declaring hundreds or thousands
927 of types, the values of the slot numbers can be quite large. However,
928 functions may use only a small fraction of the global types. In such cases
929 a compaction table is created that maps the global type and value slot
Reid Spencer82c46712004-07-07 13:34:26 +0000930 numbers to smaller values used by a function. Functions will contain either
931 a function-specific constant pool <em>or</em> a compaction table but not
932 both. Compaction tables have the format shown in the table below.</p>
Reid Spencer2cc36152004-07-05 19:04:27 +0000933 <table>
934 <tr>
935 <th><b>Type</b></th>
936 <th class="td_left"><b>Field Description</b></th>
937 </tr><tr>
938 <td><a href="#uint32_vbr">uint32_vbr</a></td>
939 <td class="td_left">The number of types that follow</td>
940 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000941 <td><a href="#uint32_vbr">uint32_vbr</a>+</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000942 <td class="td_left">The slot number in the global type plane of the
943 type that will be referenced in the function with the index of
Reid Spencer82c46712004-07-07 13:34:26 +0000944 this entry in the compaction table.</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000945 </tr><tr>
946 <td><a href="#type_len">type_len</a></td>
947 <td class="td_left">An encoding of the type and number of values that
Reid Spencer82c46712004-07-07 13:34:26 +0000948 follow. This field's encoding varies depending on the size of
949 the type plane. See <a href="#type_len">Type and Length</a> for
950 further details.</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000951 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000952 <td><a href="#uint32_vbr">uint32_vbr</a>+</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000953 <td class="td_left">The slot number in the globals of the value that
954 will be referenced in the function with the index of this entry in
Reid Spencer82c46712004-07-07 13:34:26 +0000955 the compaction table</td>
Reid Spencer2cc36152004-07-05 19:04:27 +0000956 </tr>
957 </table>
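  <p>The size argument can be illustrated with a small sketch. It assumes that
  the <a href="#vbr">variable bit-rate encoding</a> stores seven payload bits
  per byte, and it assigns local indices in ascending order of global slot
  number, which is a choice made for this example only. Both function names
  are hypothetical.</p>

```python
def vbr_size(value):
    """Bytes needed for a non-negative integer, assuming 7 payload bits per byte."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def build_type_compaction(used_global_slots):
    """Map the global type slots used by a function to small local indices.

    Returns (table, remap): table[i] is the global slot referenced by local
    index i, and remap is the inverse lookup used when writing the function.
    """
    table = sorted(set(used_global_slots))
    remap = {global_slot: index for index, global_slot in enumerate(table)}
    return table, remap
```

  <p>With these assumptions, a function that uses global type slots 5, 300 and
  20000 refers to them as 0, 1 and 2. Each reference to slot 20000 then costs
  one byte rather than three.</p>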
Reid Spencer1ab929c2004-07-05 08:18:07 +0000958</div>
Reid Spencer2cc36152004-07-05 19:04:27 +0000959
960<!-- _______________________________________________________________________ -->
961<div class="doc_subsubsection"><a name="type_len">Type and Length</a></div>
962<div class="doc_text">
963 <p>The type and length of a compaction table type plane is encoded differently
964 depending on the length of the plane. For planes of length 1 or 2, the length
965 is encoded into bits 0 and 1 of a <a href="#uint32_vbr">uint32_vbr</a> and the
966 type is encoded into bits 2-31. Because type numbers are often small, this
967 often saves an extra byte per plane. If the length of the plane is greater
968 than 2 then the encoding uses a <a href="#uint32_vbr">uint32_vbr</a> for each
969 of the length and type, in that order.</p>
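  <p>Read literally, the rule above can be sketched as the following writer.
  The function name is hypothetical and the result is a list of plain integers
  that would each be written as a <a href="#uint32_vbr">uint32_vbr</a>. The
  text does not say how a reader tells the two forms apart, nor how an empty
  plane is written, so this sketch covers only the writing side for planes of
  length 1 or more.</p>

```python
def encode_type_len(type_slot, length):
    """Return the field(s) describing one compaction table type plane."""
    if length < 1:
        raise ValueError("empty planes are not covered by this sketch")
    if length <= 2:
        if not 0 <= type_slot < (1 << 30):
            raise ValueError("type slot must fit in bits 2-31 (30 bits)")
        # Short plane: length in bits 0-1, type in bits 2-31 of one value.
        return [(type_slot << 2) | length]
    # Longer plane: separate values, length first and then type.
    return [length, type_slot]
```

  <p>For a plane of type slot 5, a length of 2 gives the single value 22, while
  a length of 7 gives the two values 7 and 5.</p>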
970</div>
971
Reid Spencer1ab929c2004-07-05 08:18:07 +0000972<!-- _______________________________________________________________________ -->
973<div class="doc_subsection"><a name="instructionlist">Instruction List</a> </div>
Reid Spencer50026612004-05-22 02:28:36 +0000974<div class="doc_text">
Reid Spencer51f31e02004-07-05 22:28:02 +0000975 <p>The instructions in a function are written as a simple list. Basic blocks
976 are inferred by the terminating instruction types. The format of the block
977 is given in the following table.</p>
978 <table>
979 <tr>
980 <th><b>Type</b></th>
981 <th class="td_left"><b>Field Description</b></th>
982 </tr><tr>
983 <td><a href="#unsigned">unsigned</a></td>
984 <td class="td_left">Instruction list identifier (0x33).</td>
985 </tr><tr>
986 <td><a href="#unsigned">unsigned</a></td>
987 <td class="td_left">Size in bytes of the instruction list.</td>
988 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +0000989 <td><a href="#instruction">instruction</a>+</td>
990 <td class="td_left">An instruction. Instructions have a variety of formats.
991 See <a href="#instruction">Instructions</a> for details.</td>
Reid Spencer51f31e02004-07-05 22:28:02 +0000992 </tr>
993 </table>
Reid Spencer50026612004-05-22 02:28:36 +0000994</div>
Reid Spencer51f31e02004-07-05 22:28:02 +0000995
996<!-- _______________________________________________________________________ -->
997<div class="doc_subsubsection"><a name="instruction">Instructions</a></div>
998<div class="doc_text">
999 <p>For brevity, instructions are written in one of four formats, depending on
1000 the number of operands to the instruction. Each instruction begins with a
1001 <a href="#uint32_vbr">uint32_vbr</a> that encodes the type of the instruction
1002 as well as other things. The tables that follow describe the format of this
1003 first word of each instruction.</p>
1004 <p><b>Instruction Format 0</b></p>
  <p>This format is used for a few instructions that can't easily be optimized
  because they have large numbers of operands (e.g. PHI Node or getelementptr).
  The opcode, type, and operand fields are each written as successive
  fields.</p>
1008 <table>
1009 <tr>
1010 <th><b>Type</b></th>
1011 <th class="td_left"><b>Field Description</b></th>
1012 </tr><tr>
1013 <td><a href="#uint32_vbr">uint32_vbr</a></td>
1014 <td class="td_left">Specifies the opcode of the instruction. Note that for
1015 compatibility with the other instruction formats, the opcode is shifted
1016 left by 2 bits. Bits 0 and 1 must have value zero for this format.</td>
1017 </tr><tr>
1018 <td><a href="#uint32_vbr">uint32_vbr</a></td>
1019 <td class="td_left">Provides the slot number of the result type of the
1020 instruction</td>
1021 </tr><tr>
1022 <td><a href="#uint32_vbr">uint32_vbr</a></td>
1023 <td class="td_left">The number of operands that follow.</td>
1024 </tr><tr>
Reid Spencer82c46712004-07-07 13:34:26 +00001025 <td><a href="#uint32_vbr">uint32_vbr</a>+</td>
1026 <td class="td_left">The slot number of the value(s) for the operand(s).
1027 <sup>1</sup></td>
Reid Spencer51f31e02004-07-05 22:28:02 +00001028 </tr>
1029 </table>
1030 Notes:<ol>
Reid Spencer51f31e02004-07-05 22:28:02 +00001031 <li>Note that if the instruction is a getelementptr and the type of the
1032 operand is a sequential type (array or pointer) then the slot number is
1033 shifted up two bits and the low order bits will encode the type of index
1034 used, as follows: 0=uint, 1=int, 2=ulong, 3=long.</li>
1035 </ol>
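  <p>A minimal sketch of format 0 follows. The function names are hypothetical,
  the opcode in the usage note is arbitrary, and each returned integer stands
  for one <a href="#uint32_vbr">uint32_vbr</a> field before encoding.</p>

```python
# Index kinds for getelementptr operands, as listed in note 1.
INDEX_KINDS = {"uint": 0, "int": 1, "ulong": 2, "long": 3}


def encode_format0(opcode, type_slot, operand_slots):
    """Return the successive fields of a format 0 instruction."""
    return [
        opcode << 2,           # bits 0 and 1 stay zero to mark format 0
        type_slot,             # slot number of the result type
        len(operand_slots),    # number of operands that follow
        *operand_slots,        # slot number of each operand value
    ]


def gep_index_operand(value_slot, index_kind):
    """Encode a getelementptr index into a sequential type (see note 1)."""
    return (value_slot << 2) | INDEX_KINDS[index_kind]
```

  <p>For example, <code>encode_format0(27, 7, [1, 2, 3, 4])</code> gives the
  fields 108, 7, 4, 1, 2, 3, 4, and the low two bits of the first field are
  zero as the format requires.</p>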
1036 <p><b>Instruction Format 1</b></p>
1037 <p>This format encodes the opcode, type and a single operand into a single
1038 <a href="#uint32_vbr">uint32_vbr</a> as follows:</p>
1039 <table>
1040 <tr>
1041 <th><b>Bits</b></th>
1042 <th><b>Type</b></th>
1043 <th class="td_left"><b>Field Description</b></th>
1044 </tr><tr>
1045 <td>0-1</td><td>constant "1"</td>
      <td class="td_left">These two bits must be the value 1 which identifies
        this as an instruction of format 1.</td>
1049 </tr><tr>
1050 <td>2-7</td><td><a href="#opcodes">opcode</a></td>
1051 <td class="td_left">Specifies the opcode of the instruction. Note that
Reid Spencer82c46712004-07-07 13:34:26 +00001052 the maximum opcode value is 63.</td>
Reid Spencer51f31e02004-07-05 22:28:02 +00001053 </tr><tr>
1054 <td>8-19</td><td><a href="#unsigned">unsigned</a></td>
1055 <td class="td_left">Specifies the slot number of the type for this
1056 instruction. Maximum slot number is 2<sup>12</sup>-1=4095.</td>
1057 </tr><tr>
1058 <td>20-31</td><td><a href="#unsigned">unsigned</a></td>
1059 <td class="td_left">Specifies the slot number of the value for the
1060 first operand. Maximum slot number is 2<sup>12</sup>-1=4095. Note
1061 that the value 2<sup>12</sup>-1 denotes zero operands.</td>
1062 </tr>
1063 </table>
  <p><b>Instruction Format 2</b></p>
  <p>This format encodes the opcode, type and two operands into a single
  <a href="#uint32_vbr">uint32_vbr</a> as follows:</p>
  <table>
    <tr>
      <th><b>Bits</b></th>
      <th><b>Type</b></th>
      <th class="td_left"><b>Field Description</b></th>
    </tr><tr>
      <td>0-1</td><td>constant "2"</td>
      <td class="td_left">These two bits must be the value 2, which identifies
      this as an instruction of format 2.</td>
    </tr><tr>
      <td>2-7</td><td><a href="#opcodes">opcode</a></td>
      <td class="td_left">Specifies the opcode of the instruction. Note that
      the maximum opcode value is 63.</td>
    </tr><tr>
      <td>8-15</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the type for this
      instruction. Maximum slot number is 2<sup>8</sup>-1=255.</td>
    </tr><tr>
      <td>16-23</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the value for the
      first operand. Maximum slot number is 2<sup>8</sup>-1=255.</td>
    </tr><tr>
      <td>24-31</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the value for the
      second operand. Maximum slot number is 2<sup>8</sup>-1=255.</td>
    </tr>
  </table>
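  <p>The same idea applies to format 2, where each slot gets eight bits. The
  sketch below again assumes that bit 0 is the least significant bit;
  <tt>fitsFormat2</tt> and <tt>encodeFormat2</tt> are hypothetical names used
  only for illustration.</p>

```cpp
#include <cassert>
#include <cstdint>

// True when the type slot and both operand slots fit the 8-bit fields
// listed in the table (maximum value 255 each).
inline bool fitsFormat2(std::uint32_t typeSlot, std::uint32_t op1,
                        std::uint32_t op2) {
  return typeSlot <= 255 && op1 <= 255 && op2 <= 255;
}

// Hypothetical encoder; assumes bit 0 is the least significant bit.
inline std::uint32_t encodeFormat2(std::uint32_t opcode,
                                   std::uint32_t typeSlot,
                                   std::uint32_t op1, std::uint32_t op2) {
  assert(opcode <= 63 && fitsFormat2(typeSlot, op1, op2));
  return 2u                  // bits 0-1: format identifier
         | (opcode << 2)     // bits 2-7: opcode
         | (typeSlot << 8)   // bits 8-15: type slot
         | (op1 << 16)       // bits 16-23: first operand slot
         | (op2 << 24);      // bits 24-31: second operand slot
}
```

  <p>With arbitrary example values (opcode 9, type slot 4, operand slots 10
  and 11) the encoded word is 0x0B0A0426.</p>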
  <p><b>Instruction Format 3</b></p>
  <p>This format encodes the opcode, type and three operands into a single
  <a href="#uint32_vbr">uint32_vbr</a> as follows:</p>
  <table>
    <tr>
      <th><b>Bits</b></th>
      <th><b>Type</b></th>
      <th class="td_left"><b>Field Description</b></th>
    </tr><tr>
      <td>0-1</td><td>constant "3"</td>
      <td class="td_left">These two bits must be the value 3, which identifies
      this as an instruction of format 3.</td>
    </tr><tr>
      <td>2-7</td><td><a href="#opcodes">opcode</a></td>
      <td class="td_left">Specifies the opcode of the instruction. Note that
      the maximum opcode value is 63.</td>
    </tr><tr>
      <td>8-13</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the type for this
      instruction. Maximum slot number is 2<sup>6</sup>-1=63.</td>
    </tr><tr>
      <td>14-19</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the value for the
      first operand. Maximum slot number is 2<sup>6</sup>-1=63.</td>
    </tr><tr>
      <td>20-25</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the value for the
      second operand. Maximum slot number is 2<sup>6</sup>-1=63.</td>
    </tr><tr>
      <td>26-31</td><td><a href="#unsigned">unsigned</a></td>
      <td class="td_left">Specifies the slot number of the value for the
      third operand. Maximum slot number is 2<sup>6</sup>-1=63.</td>
    </tr>
  </table>
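  <p>Because formats 1, 2 and 3 differ only in the field width (12, 8 and 6
  bits) and in the number of operands, one routine can decode all three. The
  following is a sketch under the assumption that bit 0 is the least
  significant bit; <tt>CompactInstruction</tt> and <tt>decodeCompact</tt> are
  hypothetical names, and format 0 is deliberately left out.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CompactInstruction {
  unsigned format;                      // 1, 2 or 3
  std::uint32_t opcode;                 // bits 2-7
  std::uint32_t typeSlot;               // first field after the opcode
  std::vector<std::uint32_t> operands;  // raw operand slot fields
};

// Hypothetical decoder for formats 1-3; format 0 is outside this sketch.
inline CompactInstruction decodeCompact(std::uint32_t word) {
  const unsigned format = word & 0x3u;
  assert(format != 0 && "format 0 is not handled by this sketch");
  // Field widths from the tables: 12 bits (format 1), 8 (format 2),
  // 6 (format 3). The format number also equals the operand count.
  const unsigned width = format == 1 ? 12 : format == 2 ? 8 : 6;
  const std::uint32_t mask = (1u << width) - 1;

  CompactInstruction inst{format, (word >> 2) & 0x3Fu, (word >> 8) & mask, {}};
  for (unsigned i = 0; i < format; ++i)
    inst.operands.push_back((word >> (8 + width * (i + 1))) & mask);
  return inst;
}
```

  <p>Note that the sketch returns the raw operand field for format 1, so a
  caller still has to treat the reserved value 4095 as "zero operands".</p>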
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection"><a name="symtab">Symbol Table</a> </div>
<div class="doc_text">
<p>A symbol table can be written in conjunction with a module or a function.
A symbol table is a list of type planes. Each type plane starts with the number
of entries in the plane and the type plane's slot number (so the type can be
looked up in the global type pool). For each entry in a type plane, the slot
number of the value and the name associated with that value are written. The
format is given in the table below. </p>
<table>
  <tr>
    <th><b>Type</b></th>
    <th class="td_left"><b>Field Description</b></th>
  </tr><tr>
    <td><a href="#unsigned">unsigned</a></td>
    <td class="td_left">Symbol Table Identifier (0x13)</td>
  </tr><tr>
    <td><a href="#unsigned">unsigned</a></td>
    <td class="td_left">Size in bytes of the symbol table block.</td>
  </tr><tr>
    <td><a href="#uint32_vbr">uint32_vbr</a></td>
    <td class="td_left">Number of entries in type plane</td>
  </tr><tr>
    <td><a href="#symtab_entry">symtab_entry</a>*</td>
    <td class="td_left">Provides the slot number of the type and its name.</td>
  </tr><tr>
    <td><a href="#symtab_plane">symtab_plane</a>*</td>
    <td class="td_left">A type plane containing value slot number and name
    for all values of the same type.</td>
  </tr>
</table>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="symtab_plane">Symbol Table Plane</a>
</div>
<div class="doc_text">
  <p>A symbol table plane provides the symbol table entries for all values of
  a common type. The encoding is given in the following table:</p>
<table>
  <tr>
    <th><b>Type</b></th>
    <th class="td_left"><b>Field Description</b></th>
  </tr><tr>
    <td><a href="#uint32_vbr">uint32_vbr</a></td>
    <td class="td_left">Number of entries in this plane.</td>
  </tr><tr>
    <td><a href="#uint32_vbr">uint32_vbr</a></td>
    <td class="td_left">Slot number of type for this plane.</td>
  </tr><tr>
    <td><a href="#symtab_entry">symtab_entry</a>+</td>
    <td class="td_left">The symbol table entries for this plane.</td>
  </tr>
</table>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="symtab_entry">Symbol Table Entry</a>
</div>
<div class="doc_text">
  <p>A symbol table entry provides the association between a type or value's
  slot number and the name given to that type or value. The format is given
  in the following table:</p>
<table>
  <tr>
    <th><b>Type</b></th>
    <th class="td_left"><b>Field Description</b></th>
  </tr><tr>
    <td><a href="#uint32_vbr">uint32_vbr</a></td>
    <td class="td_left">Slot number of the type or value being given a name.
    </td>
  </tr><tr>
    <td><a href="#uint32_vbr">uint32_vbr</a></td>
    <td class="td_left">Length of the character array that follows.</td>
  </tr><tr>
    <td><a href="#char">char</a>+</td>
    <td class="td_left">The characters of the name.</td>
  </tr>
</table>
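  <p>As a rough illustration, the sketch below reads one symbol table plane
  and its entries from a byte buffer. It assumes that a
  <a href="#uint32_vbr">uint32_vbr</a> stores seven payload bits per byte,
  least significant group first, with the high bit set on every byte except
  the last; check that assumption against the
  <a href="#vbr">Variable Bit-Rate Encoding</a> section. All names
  (<tt>readVbr32</tt>, <tt>SymtabEntry</tt>, <tt>readSymtabEntry</tt>,
  <tt>SymtabPlane</tt>, <tt>readSymtabPlane</tt>) are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed VBR layout: 7 payload bits per byte, least significant group
// first, high bit set on every byte except the last.
inline std::uint32_t readVbr32(const std::vector<std::uint8_t> &buf,
                               std::size_t &pos) {
  std::uint32_t value = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    assert(pos < buf.size() && "truncated VBR value");
    const std::uint8_t byte = buf[pos++];
    value |= static_cast<std::uint32_t>(byte & 0x7Fu) << shift;
    if ((byte & 0x80u) == 0)
      return value;
  }
  assert(false && "VBR value longer than five bytes");
  return value;
}

struct SymtabEntry {
  std::uint32_t slot;  // slot number of the type or value being named
  std::string name;    // the characters of the name
};

// Entry layout from the table: slot number, name length, name characters.
inline SymtabEntry readSymtabEntry(const std::vector<std::uint8_t> &buf,
                                   std::size_t &pos) {
  SymtabEntry entry;
  entry.slot = readVbr32(buf, pos);
  const std::uint32_t length = readVbr32(buf, pos);
  assert(length <= buf.size() - pos && "name runs past the end of the buffer");
  entry.name.assign(reinterpret_cast<const char *>(buf.data()) + pos, length);
  pos += length;
  return entry;
}

struct SymtabPlane {
  std::uint32_t typeSlot;
  std::vector<SymtabEntry> entries;
};

// Plane layout: entry count, slot number of the plane's type, then entries.
inline SymtabPlane readSymtabPlane(const std::vector<std::uint8_t> &buf,
                                   std::size_t &pos) {
  const std::uint32_t count = readVbr32(buf, pos);
  SymtabPlane plane;
  plane.typeSlot = readVbr32(buf, pos);
  for (std::uint32_t i = 0; i < count; ++i)
    plane.entries.push_back(readSymtabEntry(buf, pos));
  return plane;
}
```

  <p>The reader advances <tt>pos</tt> past everything it consumes, so
  consecutive planes can be read by calling <tt>readSymtabPlane</tt>
  repeatedly on the same buffer.</p>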
</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="versiondiffs">Version Differences</a> </div>
<!-- *********************************************************************** -->
<div class="doc_text">
<p>This section describes the differences in the Bytecode Format across LLVM
versions. The versions are listed in reverse chronological order because the
previous sections document the current version. Each section here
describes the differences between that version and the one that <i>follows</i>.
</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection">
<a name="vers12">Version 1.2 Differences From 1.3</a></div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection">Type Derives From Value</div>
<div class="doc_text">
  <p>In version 1.2, the Type class in the LLVM IR derives from the Value class.
  This is not the case in version 1.3. Consequently, in version 1.2 the notion
  of a "Type Type" was used to write out values that were Types. The types
  always occupied plane 12 (corresponding to the TypeTyID) of any type-planed
  set of values. In 1.3 this representation is not convenient because the
  TypeTyID (12) is not present and its value is now used for LabelTyID.
  Consequently, data structures that involve types are now written with all
  the types first, followed by each of the value planes according to those
  types. In version 1.2, the types would have been written intermingled with
  the values.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection">Restricted getelementptr Types</div>
<div class="doc_text">
  <p>In version 1.2, the getelementptr instruction required a ubyte type index
  for accessing a structure field and a long type index for accessing an array
  element. Consequently, it was only possible to access structures of 255 or
  fewer elements. Starting in version 1.3, this restriction was lifted.
  Structures must now be indexed with uint constants. Arrays may now be
  indexed with int, uint, long, or ulong typed values.
  As a consequence, the bytecode format had to change in order to accommodate
  the larger range of structure indices.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection">
<a name="vers11">Version 1.1 Differences From 1.2 </a></div>
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection">Explicit Primitive Zeros</div>
<div class="doc_text">
  <p>In version 1.1, the zero value for primitives was explicitly encoded into
  the bytecode format. Since these zero values are constant values in the
  LLVM IR and never change, there is no reason to explicitly encode them. This
  explicit encoding was removed in version 1.2.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection">Inconsistent Module Global Info</div>
<div class="doc_text">
  <p>In version 1.1, the Module Global Info block was not aligned, causing the
  next block to be read in on an unaligned boundary. This problem was corrected
  in version 1.2.</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsection">
<a name="vers10">Version 1.0 Differences From 1.1</a></div>
<div class="doc_text">
<p>None. Version 1.0 and 1.1 bytecode formats are identical.</p>
</div>

<!-- *********************************************************************** -->
<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>

  <a href="mailto:rspencer@x10sys.com">Reid Spencer</a> and
  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.cs.uiuc.edu">The LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
</address>
</body>
</html>
<!-- vim: sw=2
-->