| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | 2 | "http://www.w3.org/TR/html4/strict.dtd"> | 
|  | 3 | <html> | 
|  | 4 | <head> | 
|  | 5 | <title>LLVM Bytecode File Format</title> | 
|  | 6 | <link rel="stylesheet" href="llvm.css" type="text/css"> | 
| Reid Spencer | 6f1d699 | 2004-05-23 17:12:45 +0000 | [diff] [blame] | 7 | <style type="text/css"> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 8 | table, tr, td { border: 2px solid gray } | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 9 | th { border: 2px solid gray; font-weight: bold; } | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 10 | table { border-collapse: collapse; margin-top: 1em; margin-bottom: 1em } | 
|  | 11 | </style> | 
|  | 12 | </head> | 
|  | 13 | <body> | 
|  | 14 | <div class="doc_title"> LLVM Bytecode File Format </div> | 
|  | 15 | <ol> | 
|  | 16 | <li><a href="#abstract">Abstract</a></li> | 
| Reid Spencer | 6f1d699 | 2004-05-23 17:12:45 +0000 | [diff] [blame] | 17 | <li><a href="#general">General Concepts</a> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 18 | <ol> | 
|  | 19 | <li><a href="#blocks">Blocks</a></li> | 
|  | 20 | <li><a href="#lists">Lists</a></li> | 
|  | 21 | <li><a href="#fields">Fields</a></li> | 
| Reid Spencer | 7aa940d | 2004-05-25 15:47:57 +0000 | [diff] [blame] | 22 | <li><a href="#slots">Slots</a></li> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 23 | <li><a href="#encoding">Encoding Rules</a></li> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 24 | <li><a href="#align">Alignment</a></li> | 
|  | 25 | </ol> | 
| Reid Spencer | 6f1d699 | 2004-05-23 17:12:45 +0000 | [diff] [blame] | 26 | </li> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 27 | <li><a href="#details">Detailed Layout</a> | 
|  | 28 | <ol> | 
|  | 29 | <li><a href="#notation">Notation</a></li> | 
|  | 30 | <li><a href="#blocktypes">Block Types</a></li> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 31 | <li><a href="#signature">Signature Block</a></li> | 
|  | 32 | <li><a href="#module">Module Block</a></li> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 33 | <li><a href="#gtypepool">Global Type Pool</a></li> | 
|  | 34 | <li><a href="#modinfo">Module Info Block</a></li> | 
|  | 35 | <li><a href="#constants">Global Constant Pool</a></li> | 
| Chris Lattner | 2ca1fd1 | 2004-05-24 04:55:32 +0000 | [diff] [blame] | 36 | <li><a href="#functions">Function Blocks</a></li> | 
|  | 37 | <li><a href="#symtab">Module Symbol Table</a></li> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 38 | </ol> | 
|  | 39 | </li> | 
| Reid Spencer | 7c76d33 | 2004-06-08 07:41:41 +0000 | [diff] [blame^] | 40 | <li><a href="#versiondiffs">Version Differences</a> | 
|  | 41 | <ol> | 
|  | 42 | <li><a href="#vers12">Version 1.2 Differences From 1.3</a></li> | 
|  | 43 | <li><a href="#vers11">Version 1.1 Differences From 1.2</a></li> | 
|  | 44 | <li><a href="#vers10">Version 1.0 Differences From 1.1</a></li> | 
|  | 45 | </ol> | 
|  | 46 | </li> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 47 | </ol> | 
| Chris Lattner | 8dabb50 | 2004-05-25 17:44:58 +0000 | [diff] [blame] | 48 | <div class="doc_author"> | 
|  | 49 | <p>Written by <a href="mailto:rspencer@x10sys.com">Reid Spencer</a> | 
|  | 50 | </p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 51 | </div> | 
| Reid Spencer | c0a2af1 | 2004-06-05 14:18:02 +0000 | [diff] [blame] | 52 | <div class="doc_warning"> | 
|  | 53 | <p>Warning: This is a work in progress.</p> | 
|  | 54 | </div> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 55 | <!-- *********************************************************************** --> | 
|  | 56 | <div class="doc_section"> <a name="abstract">Abstract </a></div> | 
|  | 57 | <!-- *********************************************************************** --> | 
|  | 58 | <div class="doc_text"> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 59 | <p>This document describes the LLVM bytecode | 
|  | 60 | file format. It specifies the binary encoding rules of the bytecode file format | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 61 | so that equivalent systems can encode bytecode files correctly.  The LLVM | 
|  | 62 | bytecode representation is used to store the intermediate representation on | 
|  | 63 | disk in compacted form. | 
|  | 64 | </p> | 
|  | 65 | </div> | 
|  | 66 | <!-- *********************************************************************** --> | 
|  | 67 | <div class="doc_section"> <a name="general">General Concepts</a> </div> | 
|  | 68 | <!-- *********************************************************************** --> | 
|  | 69 | <div class="doc_text"> | 
|  | 70 | <p>This section describes the general concepts of the bytecode file format | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 71 | without getting into bit and byte level specifics.  Note that the LLVM bytecode | 
|  | 72 | format may change in the future, but will always be backwards compatible with | 
|  | 73 | older formats.  This document only describes the most current version of the | 
|  | 74 | bytecode format.</p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 75 | </div> | 
|  | 76 | <!-- _______________________________________________________________________ --> | 
|  | 77 | <div class="doc_subsection"><a name="blocks">Blocks</a> </div> | 
|  | 78 | <div class="doc_text"> | 
|  | 79 | <p>LLVM bytecode files consist simply of a sequence of blocks of bytes. | 
|  | 80 | Each block begins with an identification value that determines the type of | 
|  | 81 | the next block.  The possible types of blocks are described below in the section | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 82 | <a href="#blocktypes">Block Types</a>. The block identifier is used because | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 83 | it is possible for entire blocks to be omitted from the file if they are | 
|  | 84 | empty. The block identifier helps the reader determine which kind of block is | 
|  | 85 | next in the file.</p> | 
| Reid Spencer | 939290f | 2004-05-22 05:56:41 +0000 | [diff] [blame] | 86 | <p>The following block identifiers are currently in use | 
|  | 87 | (from llvm/Bytecode/Format.h):</p> | 
|  | 88 | <ol> | 
|  | 89 | <li><b>Module (0x01)</b>.</li> | 
|  | 90 | <li><b>Function (0x11)</b>.</li> | 
|  | 91 | <li><b>ConstantPool (0x12)</b>.</li> | 
|  | 92 | <li><b>SymbolTable (0x13)</b>.</li> | 
|  | 93 | <li><b>ModuleGlobalInfo (0x14)</b>.</li> | 
|  | 94 | <li><b>GlobalTypePlane (0x15)</b>.</li> | 
|  | 95 | <li><b>BasicBlock (0x31)</b>.</li> | 
|  | 96 | <li><b>InstructionList (0x32)</b>.</li> | 
|  | 97 | <li><b>CompactionTable (0x33)</b>.</li> | 
|  | 98 | </ol> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 99 | <p> All blocks are variable length, and the block header specifies the size of | 
|  | 100 | the block.  All blocks are aligned to 32-bit boundaries, so they always | 
|  | 101 | start and end on such a boundary.  Each block begins with an integer | 
|  | 102 | identifier and the length of the block, which does not include the padding | 
|  | 103 | bytes needed for alignment.</p> | 
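<p>As an illustration, the header and padding rules above can be sketched in
Python. This is only a sketch under stated assumptions: it assumes every block
starts with two 32-bit little-endian <tt>unsigned</tt> fields (identifier, then
size), as the module block table later in this document shows, and that the
size counts the bytes following the header. The function names are
hypothetical.</p>

```python
import struct

def padding_for(size):
    """Number of pad bytes (0-3) needed to reach the next 32-bit boundary."""
    return (-size) % 4

def read_block_header(data, offset=0):
    """Return (block_id, size, body_offset, next_block_offset).

    Assumption: an 8-byte header holding the identifier and the size, each a
    32-bit little-endian unsigned integer. The size excludes alignment padding
    and (by assumption) the header itself.
    """
    block_id, size = struct.unpack_from("<II", data, offset)
    body = offset + 8
    next_block = body + size + padding_for(size)
    return block_id, size, body, next_block
```

<p>A reader could call <tt>read_block_header</tt> repeatedly, jumping to the
returned next-block offset, to walk sibling blocks without interpreting their
contents.</p>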
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 104 | </div> | 
|  | 105 | <!-- _______________________________________________________________________ --> | 
|  | 106 | <div class="doc_subsection"><a name="lists">Lists</a> </div> | 
|  | 107 | <div class="doc_text"> | 
|  | 108 | <p>Most blocks are constructed of lists of information. Lists can be constructed | 
|  | 109 | of other lists, etc. This decomposition of information follows the containment | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 110 | hierarchy of the LLVM Intermediate Representation. For example, a function | 
|  | 111 | contains a list of instructions (the terminator instructions implicitly define | 
|  | 112 | the end of the basic blocks).</p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 113 | <p>A list is encoded into the file simply by encoding the number of entries as | 
|  | 114 | an integer followed by each of the entries. The reader knows when the list is | 
|  | 115 | done because it will have filled the list with the required number of entries. | 
|  | 116 | </p> | 
|  | 117 | </div> | 
|  | 118 | <!-- _______________________________________________________________________ --> | 
|  | 119 | <div class="doc_subsection"><a name="fields">Fields</a> </div> | 
|  | 120 | <div class="doc_text"> | 
|  | 121 | <p>Fields are units of information that LLVM knows how to write atomically. | 
|  | 122 | Most fields have a uniform length or some kind of length indication built into | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 123 | their encoding. For example, a constant string (array of bytes) is | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 124 | written simply as the length followed by the characters. Although this is | 
|  | 125 | similar to a list, constant strings are treated atomically and are thus | 
|  | 126 | fields.</p> | 
|  | 127 | <p>Fields use a condensed bit format specific to the type of information | 
|  | 128 | they must contain. As few bits as possible are written for each field. The | 
|  | 129 | sections that follow will provide the details on how these fields are | 
|  | 130 | written and how the bits are to be interpreted.</p> | 
|  | 131 | </div> | 
|  | 132 | <!-- _______________________________________________________________________ --> | 
| Reid Spencer | 7aa940d | 2004-05-25 15:47:57 +0000 | [diff] [blame] | 133 | <div class="doc_subsection"><a name="slots">Slots</a> </div> | 
|  | 134 | <div class="doc_text"> | 
|  | 135 | <p>The bytecode format uses the notion of a "slot" to reference Types and | 
|  | 136 | Values. Since the bytecode file is a <em>direct</em> representation of LLVM's | 
|  | 137 | intermediate representation, there is a need to represent pointers in the file. | 
|  | 138 | Slots are used for this purpose. For example, if one has the following assembly: | 
|  | 139 | </p> | 
| Chris Lattner | 8dabb50 | 2004-05-25 17:44:58 +0000 | [diff] [blame] | 140 |  | 
|  | 141 | <div class="doc_code"> | 
|  | 142 | %MyType = type { int, sbyte }<br> | 
|  | 143 | %MyVar = external global %MyType | 
|  | 144 | </div> | 
|  | 145 |  | 
|  | 146 | <p>there are two definitions. The definition of <tt>%MyVar</tt> uses | 
|  | 147 | <tt>%MyType</tt>. In the C++ IR this linkage between <tt>%MyVar</tt> and | 
|  | 148 | <tt>%MyType</tt> is | 
|  | 149 | explicit through the use of C++ pointers. In bytecode, however, there's no | 
| Reid Spencer | 7aa940d | 2004-05-25 15:47:57 +0000 | [diff] [blame] | 150 | ability to store memory addresses. Instead, we compute and write out slot | 
|  | 151 | numbers for every type and Value written to the file.</p> | 
|  | 152 | <p>A slot number is simply an unsigned 32-bit integer encoded in the variable | 
|  | 153 | bit rate scheme (see <a href="#encoding">encoding</a> below). This ensures that | 
|  | 154 | low slot numbers are encoded in one byte. Through various bits of magic LLVM | 
|  | 155 | attempts to always keep the slot numbers low. The first attempt is to associate | 
|  | 156 | slot numbers with their "type plane". That is, Values of the same type are | 
|  | 157 | written to the bytecode file in a list (sequentially). Their order in that list | 
|  | 158 | determines their slot number. This means that slot #1 doesn't mean anything | 
|  | 159 | unless you also specify for which type you want slot #1. Types are handled | 
|  | 160 | specially and are always written to the file first (in the Global Type Pool) and | 
| Chris Lattner | 8dabb50 | 2004-05-25 17:44:58 +0000 | [diff] [blame] | 161 | in such a way that both forward and backward references of the types can often be | 
| Reid Spencer | 7aa940d | 2004-05-25 15:47:57 +0000 | [diff] [blame] | 162 | resolved with a single pass through the type pool. </p> | 
|  | 163 | <p>Slot numbers are also kept small by rearranging their order. Because of the | 
|  | 164 | structure of LLVM, certain values are much more likely to be used frequently | 
|  | 165 | in the body of a function. For this reason, a compaction table is provided in | 
|  | 166 | the body of a function if its use would make the function body smaller. | 
|  | 167 | Suppose you have a function body that uses just the types "int*" and "{double}" | 
|  | 168 | but uses them thousands of times. It's worthwhile to ensure that the slot numbers | 
|  | 169 | for these types are low so they can be encoded in a single byte (via vbr). | 
|  | 170 | This is exactly what the compaction table does.</p> | 
|  | 171 | </div> | 
|  | 172 | <!-- _______________________________________________________________________ --> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 173 | <div class="doc_subsection"><a name="encoding">Encoding Primitives</a> </div> | 
|  | 174 | <div class="doc_text"> | 
|  | 175 | <p>Each field that can be put out is encoded into the file using a small set | 
|  | 176 | of primitives. The rules for these primitives are described below.</p> | 
|  | 177 | <h3>Variable Bit Rate Encoding</h3> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 178 | <p>Most of the values written to LLVM bytecode files are small integers.  To | 
|  | 179 | minimize the number of bytes written for these quantities, an encoding | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 180 | scheme similar to UTF-8 is used to write integer data. The scheme is known as | 
|  | 181 | variable bit rate (vbr) encoding.  In this encoding, the high bit of each | 
|  | 182 | byte is used to indicate if more bytes follow. If (byte & 0x80) is non-zero | 
|  | 183 | in any given byte, it means there is another byte immediately following that | 
|  | 184 | also contributes to the value. For the final byte (byte & 0x80) is false | 
|  | 185 | (the high bit is not set). In each byte only the low seven bits contribute to | 
|  | 186 | the value. Consequently 32-bit quantities can take from one to <em>five</em> | 
|  | 187 | bytes to encode. In general, smaller quantities will encode in fewer bytes, | 
|  | 188 | as follows:</p> | 
|  | 189 | <table class="doc_table_nw"> | 
|  | 190 | <tr> | 
|  | 191 | <th>Byte #</th> | 
|  | 192 | <th>Significant Bits</th> | 
|  | 193 | <th>Maximum Value</th> | 
|  | 194 | </tr> | 
|  | 195 | <tr><td>1</td><td>0-6</td><td>127</td></tr> | 
|  | 196 | <tr><td>2</td><td>7-13</td><td>16,383</td></tr> | 
|  | 197 | <tr><td>3</td><td>14-20</td><td>2,097,151</td></tr> | 
|  | 198 | <tr><td>4</td><td>21-27</td><td>268,435,455</td></tr> | 
|  | 199 | <tr><td>5</td><td>28-34</td><td>34,359,738,367</td></tr> | 
|  | 200 | <tr><td>6</td><td>35-41</td><td>4,398,046,511,103</td></tr> | 
|  | 201 | <tr><td>7</td><td>42-48</td><td>562,949,953,421,311</td></tr> | 
|  | 202 | <tr><td>8</td><td>49-55</td><td>72,057,594,037,927,935</td></tr> | 
|  | 203 | <tr><td>9</td><td>56-62</td><td>9,223,372,036,854,775,807</td></tr> | 
|  | 204 | <tr><td>10</td><td>63-69</td><td>1,180,591,620,717,411,303,423</td></tr> | 
|  | 205 | </table> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 206 | <p>Note that in practice, the tenth byte could only encode bit 63 | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 207 | since the maximum quantity to use this encoding is a 64-bit integer.</p> | 
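<p>The unsigned scheme described above can be sketched in Python as follows.
The low-order seven bits are emitted first, matching the table (byte 1 carries
bits 0-6). The function names are hypothetical and are not taken from the LLVM
sources.</p>

```python
def encode_vbr(value):
    """Encode a non-negative integer using variable bit rate encoding."""
    if value < 0:
        raise ValueError("unsigned vbr cannot encode negative values")
    out = bytearray()
    while value >= 0x80:
        # Low seven bits of the value, with the high bit set: more bytes follow.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # final byte, high bit clear
    return bytes(out)

def decode_vbr(data, offset=0):
    """Decode one vbr integer; return (value, offset of the next field)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7
```

<p>With this sketch, 127 fits in one byte, 128 needs two, and the largest
32-bit value needs five, in line with the table.</p>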
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 208 |  | 
|  | 209 | <p><em>Signed</em> VBR values are encoded with the standard vbr encoding, but | 
|  | 210 | with the sign bit as the low order bit instead of the high order bit.  This | 
|  | 211 | allows small negative quantities to be encoded efficiently.  For example, -3 | 
|  | 212 | is encoded as "((3 << 1) | 1)" and 3 is encoded as | 
|  | 213 | "((3 << 1) | 0)", emitted with the standard vbr encoding above.</p> | 
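<p>A minimal Python sketch of the sign-bit mapping follows. It shows only the
conversion between a signed integer and the unsigned quantity that is then
written with the standard vbr encoding. The function names are hypothetical,
and edge cases such as the most negative 64-bit value are not handled here.</p>

```python
def to_signed_vbr_payload(value):
    """Map a signed integer to the unsigned value that gets vbr-encoded.

    The magnitude is shifted left by one and the sign goes in the low bit.
    """
    if value < 0:
        return ((-value) << 1) | 1
    return value << 1

def from_signed_vbr_payload(payload):
    """Inverse mapping: recover the signed integer from the decoded payload."""
    magnitude = payload >> 1
    return -magnitude if payload & 1 else magnitude
```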
|  | 214 |  | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 215 | <p>The table below defines the encoding rules for type names used in the | 
|  | 216 | descriptions of blocks and fields in the next section. Any type name with | 
|  | 217 | the suffix <em>_vbr</em> indicates a quantity that is encoded using | 
|  | 218 | variable bit rate encoding as described above.</p> | 
|  | 219 | <table class="doc_table" > | 
|  | 220 | <tr> | 
|  | 221 | <th><b>Type</b></th> | 
|  | 222 | <th align="left"><b>Rule</b></th> | 
|  | 223 | </tr> | 
|  | 224 | <tr> | 
|  | 225 | <td>unsigned</td> | 
|  | 226 | <td align="left">A 32-bit unsigned integer that always occupies four | 
|  | 227 | consecutive bytes. The unsigned integer is encoded using LSB first | 
|  | 228 | ordering. That is bits 2<sup>0</sup> through 2<sup>7</sup> are in the | 
|  | 229 | byte with the lowest file offset (little endian).</td> | 
|  | 230 | </tr><tr> | 
|  | 231 | <td>uint_vbr</td> | 
|  | 232 | <td align="left">A 32-bit unsigned integer that occupies from one to five | 
|  | 233 | bytes using variable bit rate encoding.</td> | 
|  | 234 | </tr><tr> | 
|  | 235 | <td>uint64_vbr</td> | 
|  | 236 | <td align="left">A 64-bit unsigned integer that occupies from one to ten | 
|  | 237 | bytes using variable bit rate encoding.</td> | 
|  | 238 | </tr><tr> | 
|  | 239 | <td>int64_vbr</td> | 
|  | 240 | <td align="left">A 64-bit signed integer that occupies from one to ten | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 241 | bytes using the signed variable bit rate encoding.</td> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 242 | </tr><tr> | 
|  | 243 | <td>char</td> | 
|  | 244 | <td align="left">A single unsigned character encoded into one byte</td> | 
|  | 245 | </tr><tr> | 
|  | 246 | <td>bit</td> | 
|  | 247 | <td align="left">A single bit within a byte.</td> | 
|  | 248 | </tr><tr> | 
|  | 249 | <td>string</td> | 
|  | 250 | <td align="left">A uint_vbr indicating the length of the character string | 
|  | 251 | immediately followed by the characters of the string. There is no | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 252 | terminating null byte in the string.</td> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 253 | </tr><tr> | 
|  | 254 | <td>data</td> | 
|  | 255 | <td align="left">An arbitrarily long segment of data to which no | 
|  | 256 | interpretation is implied. This is used for float, double, and constant | 
|  | 257 | initializers.</td> | 
|  | 258 | </tr> | 
|  | 259 | </table> | 
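<p>To illustrate how these rules compose, the following Python sketch reads a
<tt>string</tt> field: a <tt>uint_vbr</tt> length followed by that many bytes,
with no terminating null. The helper names are hypothetical.</p>

```python
def read_uint_vbr(data, offset=0):
    """Decode one vbr integer; return (value, offset of the next field)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7

def read_string(data, offset=0):
    """Read a length-prefixed string; return (bytes, offset of the next field)."""
    length, offset = read_uint_vbr(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated string field")
    return data[offset:end], end
```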
|  | 260 | </div> | 
|  | 261 | <!-- _______________________________________________________________________ --> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 262 | <div class="doc_subsection"><a name="align">Alignment</a> </div> | 
|  | 263 | <div class="doc_text"> | 
|  | 264 | <p>To support cross-platform differences, the bytecode file is aligned on | 
|  | 265 | certain boundaries. This means that a small amount of padding (at most 3 bytes) | 
|  | 266 | will be added to ensure that the next entry is aligned to a 32-bit boundary. | 
|  | 267 | </p> | 
|  | 268 | </div> | 
|  | 269 | <!-- *********************************************************************** --> | 
|  | 270 | <div class="doc_section"> <a name="details">Detailed Layout</a> </div> | 
|  | 271 | <!-- *********************************************************************** --> | 
|  | 272 | <div class="doc_text"> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 273 | <p>This section provides the detailed layout of the LLVM bytecode file format, | 
|  | 274 | including bit and byte level specifics.</p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 275 | </div> | 
|  | 276 | <!-- _______________________________________________________________________ --> | 
|  | 277 | <div class="doc_subsection"><a name="notation">Notation</a></div> | 
|  | 278 | <div class="doc_text"> | 
|  | 279 | <p>The descriptions of the bytecode format that follow describe the bit | 
|  | 280 | fields in detail. These descriptions are provided in tabular form. Each table | 
|  | 281 | has five columns that specify:</p> | 
|  | 282 | <ol> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 283 | <li><b>Byte(s)</b>: The offset in bytes of the field from the start of | 
| Reid Spencer | 6f1d699 | 2004-05-23 17:12:45 +0000 | [diff] [blame] | 284 | its container (block, list, other field).</li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 285 | <li><b>Bit(s)</b>: The offset in bits of the field from the start of | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 286 | the byte field. Bits are always little endian. That is, bits with | 
| Reid Spencer | 6f1d699 | 2004-05-23 17:12:45 +0000 | [diff] [blame] | 287 | smaller place values have smaller addresses (i.e. 2<sup>0</sup> is at bit 0, | 
|  | 288 | 2<sup>1</sup> at 1, etc.) | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 289 | </li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 290 | <li><b>Align?</b>: Indicates if this field is aligned to 32 bits or not. | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 291 | This indicates where the <em>next</em> field starts, always on a 32 bit | 
|  | 292 | boundary.</li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 293 | <li><b>Type</b>: The basic type of information contained in the field.</li> | 
|  | 294 | <li><b>Description</b>: Describes the contents of the field.</li> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 295 | </ol> | 
|  | 296 | </div> | 
|  | 297 | <!-- _______________________________________________________________________ --> | 
|  | 298 | <div class="doc_subsection"><a name="blocktypes">Block Types</a></div> | 
|  | 299 | <div class="doc_text"> | 
|  | 300 | <p>The bytecode format encodes the intermediate representation into groups | 
|  | 301 | of bytes known as blocks. The blocks are written sequentially to the file in | 
|  | 302 | the following order:</p> | 
|  | 303 | <ol> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 304 | <li><a href="#signature">Signature</a>: This contains the file signature | 
|  | 305 | (magic number) that identifies the file as LLVM bytecode and the bytecode | 
|  | 306 | version number.</li> | 
|  | 307 | <li><a href="#module">Module Block</a>: This is the top level block in a | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 308 | bytecode file. It contains all the other blocks.</li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 309 | <li><a href="#gtypepool">Global Type Pool</a>: This block contains all the | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 310 | global (module) level types.</li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 311 | <li><a href="#modinfo">Module Info</a>: This block contains the types of the | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 312 | global variables and functions in the module as well as the constant | 
|  | 313 | initializers for the global variables.</li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 314 | <li><a href="#constants">Constants</a>: This block contains all the global | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 315 | constants except function arguments, global values and constant strings.</li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 316 | <li><a href="#functions">Functions</a>: One function block is written for | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 317 | each function in the module. </li> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 318 | <li><a href="#symtab">Symbol Table</a>: The module level symbol table that | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 319 | provides names for the various other entries in the file is the final block | 
|  | 320 | written.</li> | 
|  | 321 | </ol> | 
|  | 322 | </div> | 
|  | 323 | <!-- _______________________________________________________________________ --> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 324 | <div class="doc_subsection"><a name="signature">Signature Block</a> </div> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 325 | <div class="doc_text"> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 326 | <p>The signature occurs in every LLVM bytecode file and is always first. | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 327 | It simply provides a few bytes of data to identify the file as being an LLVM | 
|  | 328 | bytecode file. This block is always four bytes in length and differs from the | 
|  | 329 | other blocks because there is no identifier and no block length at the start | 
|  | 330 | of the block. Essentially, this block is just the "magic number" for the file.</p> | 
|  | 331 | <table class="doc_table_nw" > | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 332 | <tr> | 
|  | 333 | <th><b>Byte(s)</b></th> | 
|  | 334 | <th><b>Bit(s)</b></th> | 
|  | 335 | <th><b>Align?</b></th> | 
| Reid Spencer | 939290f | 2004-05-22 05:56:41 +0000 | [diff] [blame] | 336 | <th><b>Type</b></th> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 337 | <th align="left"><b>Field Description</b></th> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 338 | </tr><tr> | 
|  | 339 | <td>00</td><td>-</td><td>No</td><td>char</td> | 
| Reid Spencer | 939290f | 2004-05-22 05:56:41 +0000 | [diff] [blame] | 340 | <td align="left">Constant "l" (0x6C)</td> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 341 | </tr><tr> | 
|  | 342 | <td>01</td><td>-</td><td>No</td><td>char</td> | 
| Reid Spencer | 939290f | 2004-05-22 05:56:41 +0000 | [diff] [blame] | 343 | <td align="left">Constant "l" (0x6C)</td> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 344 | </tr><tr> | 
|  | 345 | <td>02</td><td>-</td><td>No</td><td>char</td> | 
| Reid Spencer | 939290f | 2004-05-22 05:56:41 +0000 | [diff] [blame] | 346 | <td align="left">Constant "v" (0x76)</td> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 347 | </tr><tr> | 
|  | 348 | <td>03</td><td>-</td><td>No</td><td>char</td> | 
| Reid Spencer | 939290f | 2004-05-22 05:56:41 +0000 | [diff] [blame] | 349 | <td align="left">Constant "m" (0x6D)</td> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 350 | </tr> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 351 | </table> | 
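<p>A reader might check the signature as in the following Python sketch, which
compares the first four bytes against the constants listed in the table. The
function name is hypothetical.</p>

```python
# The four signature bytes from the table: "l", "l", "v", "m".
SIGNATURE = bytes([0x6C, 0x6C, 0x76, 0x6D])

def has_bytecode_signature(data):
    """Return True if the buffer starts with the LLVM bytecode magic number."""
    return data[:4] == SIGNATURE
```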
|  | 352 | </div> | 
|  | 353 | <!-- _______________________________________________________________________ --> | 
|  | 354 | <div class="doc_subsection"><a name="module">Module Block</a> </div> | 
|  | 355 | <div class="doc_text"> | 
|  | 356 | <p>The module block contains a small pre-amble and all the other blocks in | 
|  | 357 | the file. Of particular note, the bytecode format number is simply a 28-bit | 
|  | 358 | monotonically increasing integer that identifies the version of the bytecode | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 359 | format (which is not directly related to the LLVM release number).  The | 
|  | 360 | bytecode versions defined so far are (note that this document only describes | 
|  | 361 | the latest version): </p> | 
|  | 362 |  | 
|  | 363 | <ul> | 
|  | 364 | <li>#0: LLVM 1.0 & 1.1</li> | 
|  | 365 | <li>#1: LLVM 1.2</li> | 
|  | 366 | <li>#2: LLVM 1.3</li> | 
|  | 367 | </ul> | 
|  | 368 |  | 
|  | 369 | <p>The table below shows the format of the module block header. The remainder | 
|  | 370 | of the module block consists of blocks described in other sections.</p> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 371 | <table class="doc_table_nw" > | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 372 | <tr> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 373 | <th><b>Byte(s)</b></th> | 
|  | 374 | <th><b>Bit(s)</b></th> | 
|  | 375 | <th><b>Align?</b></th> | 
|  | 376 | <th><b>Type</b></th> | 
|  | 377 | <th align="left"><b>Field Description</b></th> | 
|  | 378 | </tr><tr> | 
|  | 379 | <td>04-07</td><td>-</td><td>No</td><td>unsigned</td> | 
|  | 380 | <td align="left">Module Identifier (0x01)</td> | 
|  | 381 | </tr><tr> | 
|  | 382 | <td>08-11</td><td>-</td><td>No</td><td>unsigned</td> | 
|  | 383 | <td align="left">Size of the module block in bytes</td> | 
|  | 384 | </tr><tr> | 
|  | 385 | <td>12-15</td><td>00</td><td>Yes</td><td>uint32_vbr</td> | 
|  | 386 | <td align="left">Format Information</td> | 
|  | 387 | </tr><tr> | 
|  | 388 | <td>''</td><td>0</td><td>-</td><td>bit</td> | 
|  | 389 | <td align="left">Big Endian?</td> | 
|  | 390 | </tr><tr> | 
|  | 391 | <td>''</td><td>1</td><td>-</td><td>bit</td> | 
|  | 392 | <td align="left">Pointers Are 64-bit?</td> | 
|  | 393 | </tr><tr> | 
|  | 394 | <td>''</td><td>2</td><td>-</td><td>bit</td> | 
|  | 395 | <td align="left">Has No Endianness?</td> | 
|  | 396 | </tr><tr> | 
|  | 397 | <td>''</td><td>3</td><td>-</td><td>bit</td> | 
|  | 398 | <td align="left">Has No Pointer Size?</td> | 
|  | 399 | </tr><tr> | 
|  | 400 | <td>''</td><td>4-31</td><td>-</td><td>bit</td> | 
|  | 401 | <td align="left">Bytecode Format Version</td> | 
|  | 402 | </tr><tr> | 
|  | 403 | <td>16-end</td><td>-</td><td>-</td><td>blocks</td> | 
|  | 404 | <td align="left">The remaining bytes in the block consist | 
|  | 405 | solely of other block types in sequence.</td> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 406 | </tr> | 
|  | 407 | </table> | 
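<p>The Format Information field can be unpacked as in the following Python
sketch. It assumes <tt>word</tt> is the integer value already decoded from the
file. The function and key names are hypothetical. The bit positions follow
the table above.</p>

```python
def unpack_format_info(word):
    """Split the Format Information value into its flags and version number."""
    return {
        "big_endian": bool(word & 0x1),       # bit 0
        "pointers_64bit": bool(word & 0x2),   # bit 1
        "no_endianness": bool(word & 0x4),    # bit 2
        "no_pointer_size": bool(word & 0x8),  # bit 3
        "version": word >> 4,                 # bits 4-31
    }
```

<p>For example, a value of <tt>0x23</tt> would describe a big endian, 64-bit
pointer module in bytecode format version 2.</p>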
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 408 |  | 
|  | 409 | <p>Note that we plan to eventually expand the target description capabilities | 
|  | 410 | of bytecode files to <a href="http://llvm.cs.uiuc.edu/PR263">target | 
|  | 411 | triples</a>.</p> | 
|  | 412 |  | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 413 | </div> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 414 |  | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 415 | <!-- _______________________________________________________________________ --> | 
|  | 416 | <div class="doc_subsection"><a name="gtypepool">Global Type Pool</a> </div> | 
|  | 417 | <div class="doc_text"> | 
| Chris Lattner | 2b90565 | 2004-05-24 05:35:17 +0000 | [diff] [blame] | 418 | <p>The global type pool consists of type definitions. Their order of appearance | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 419 | in the file determines their slot number (0 based). Slot numbers are used to | 
|  | 420 | replace pointers in the intermediate representation. Each slot number uniquely | 
|  | 421 | identifies one entry in a type plane (a collection of values of the same type). | 
|  | 422 | Because every value has a type, and types are identified by the order in which the type | 
|  | 423 | pool is written, the global type pool <em>must</em> be written as the first | 
|  | 424 | block of a module. If it is not, attempts to read the file will fail because | 
|  | 425 | neither forward nor backward type resolution will be possible.</p> | 
|  | 426 | <p>The type pool is simply a list of type definitions, as shown in the table | 
|  | 427 | below.</p> | 
|  | 428 | <table class="doc_table_nw" > | 
|  | 429 | <tr> | 
|  | 430 | <th><b>Byte(s)</b></th> | 
|  | 431 | <th><b>Bit(s)</b></th> | 
|  | 432 | <th><b>Align?</b></th> | 
|  | 433 | <th><b>Type</b></th> | 
|  | 434 | <th align="left"><b>Field Description</b></th> | 
|  | 435 | </tr><tr> | 
|  | 436 | <td>00-03</td><td>-</td><td>No</td><td>unsigned</td> | 
|  | 437 | <td align="left">Type Pool Identifier (0x15)</td> | 
|  | 438 | </tr><tr> | 
|  | 439 | <td>04-07</td><td>-</td><td>No</td><td>unsigned</td> | 
|  | 440 | <td align="left">Size in bytes of the type pool block.</td> | 
|  | 441 | </tr><tr> | 
|  | 442 | <td>08-11<sup>1</sup></td><td>-</td><td>No</td><td>uint32_vbr</td> | 
|  | 443 | <td align="left">Number of entries in type plane</td> | 
|  | 444 | </tr><tr> | 
|  | 445 | <td>12-15<sup>1</sup></td><td>-</td><td>No</td><td>uint32_vbr</td> | 
|  | 446 | <td align="left">Type plane index for following entries</td> | 
|  | 447 | </tr><tr> | 
|  | 448 | <td>16-end<sup>1,2</sup></td><td>-</td><td>No</td><td>type</td> | 
|  | 449 | <td align="left">Each of the type definitions.</td> | 
|  | 450 | </tr><tr> | 
|  | 451 | <td align="left" colspan="5"><sup>1</sup>Maximum length shown, | 
|  | 452 | may be smaller<br><sup>2</sup>Repeated field.</td> | 
|  | 453 | </tr> | 
|  | 454 | </table> | 
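<p>The first two rows of the table (a four-byte identifier followed by a
four-byte size) could be read as in the following sketch. The helper name
<tt>read_block_header</tt> is hypothetical. The sketch assumes that both
fields are stored as little-endian unsigned 32-bit integers and that the size
counts only the bytes after the eight-byte header; neither assumption is
stated in the table itself.</p>

```python
import struct


def read_block_header(buf, pos=0):
    """Return (identifier, size, payload_start) for the block at pos.

    Assumptions: both fields are little-endian unsigned 32-bit values,
    and the payload begins immediately after the 8-byte header.
    """
    if len(buf) - pos < 8:
        raise ValueError("truncated block header")
    ident, size = struct.unpack_from("<II", buf, pos)
    return ident, size, pos + 8


# Hypothetical block: an arbitrary identifier 0x7F and a 4-byte payload.
example = struct.pack("<II", 0x7F, 4) + b"\x01\x02\x03\x04"
ident, size, start = read_block_header(example)
payload = example[start:start + size]
```

<p>A reader would compare the returned identifier with the value it expects
for the block and then use the size to locate the end of the block.</p>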
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 455 | </div> | 
|  | 456 | <!-- _______________________________________________________________________ --> | 
|  | 457 | <div class="doc_subsection"><a name="modinfo">Module Info</a> </div> | 
|  | 458 | <div class="doc_text"> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 459 | <p>To be determined.</p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 460 | </div> | 
|  | 461 | <!-- _______________________________________________________________________ --> | 
|  | 462 | <div class="doc_subsection"><a name="constants">Constants</a> </div> | 
|  | 463 | <div class="doc_text"> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 464 | <p>To be determined.</p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 465 | </div> | 
|  | 466 | <!-- _______________________________________________________________________ --> | 
|  | 467 | <div class="doc_subsection"><a name="functions">Functions</a> </div> | 
|  | 468 | <div class="doc_text"> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 469 | <p>To be determined.</p> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 470 | </div> | 
|  | 471 | <!-- _______________________________________________________________________ --> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 472 | <div class="doc_subsection"><a name="symtab">Symbol Table</a> </div> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 473 | <div class="doc_text"> | 
| Reid Spencer | b39021b | 2004-05-23 17:05:09 +0000 | [diff] [blame] | 474 | <p>A symbol table can be written in conjunction with a module or a function. | 
|  | 475 | A symbol table is a list of type planes. Each type plane starts with the number | 
|  | 476 | of entries in the plane and the type plane's slot number (so the type can be | 
|  | 477 | looked up in the global type pool). For each entry in a type plane, the slot | 
|  | 478 | number of the value and the name associated with that value are written.  The | 
|  | 479 | format is given in the table below. </p> | 
|  | 480 | <table class="doc_table_nw" > | 
|  | 481 | <tr> | 
|  | 482 | <th><b>Byte(s)</b></th> | 
|  | 483 | <th><b>Bit(s)</b></th> | 
|  | 484 | <th><b>Align?</b></th> | 
|  | 485 | <th><b>Type</b></th> | 
|  | 486 | <th align="left"><b>Field Description</b></th> | 
|  | 487 | </tr><tr> | 
|  | 488 | <td>00-03</td><td>-</td><td>No</td><td>unsigned</td> | 
|  | 489 | <td align="left">Symbol Table Identifier (0x13)</td> | 
|  | 490 | </tr><tr> | 
|  | 491 | <td>04-07</td><td>-</td><td>No</td><td>unsigned</td> | 
|  | 492 | <td align="left">Size in bytes of the symbol table block.</td> | 
|  | 493 | </tr><tr> | 
|  | 494 | <td>08-11<sup>1</sup></td><td>-</td><td>No</td><td>uint32_vbr</td> | 
|  | 495 | <td align="left">Number of entries in type plane</td> | 
|  | 496 | </tr><tr> | 
|  | 497 | <td>12-15<sup>1</sup></td><td>-</td><td>No</td><td>uint32_vbr</td> | 
|  | 498 | <td align="left">Type plane index for following entries</td> | 
|  | 499 | </tr><tr> | 
|  | 500 | <td>16-19<sup>1,2</sup></td><td>-</td><td>No</td><td>uint32_vbr</td> | 
|  | 501 | <td align="left">Slot number of a value.</td> | 
|  | 502 | </tr><tr> | 
|  | 503 | <td>variable<sup>1,2</sup></td><td>-</td><td>No</td><td>string</td> | 
|  | 504 | <td align="left">Name of the value in the symbol table.</td> | 
|  | 505 | </tr> | 
|  | 506 | <tr> | 
|  | 507 | <td align="left" colspan="5"><sup>1</sup>Maximum length shown, | 
|  | 508 | may be smaller<br><sup>2</sup>Repeated field.</td> | 
|  | 509 | </tr> | 
|  | 510 | </table> | 
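<p>The repeated structure described above (a count and a type slot per plane,
then a slot number and a name per entry) could be walked as in the sketch
below. It is not the LLVM reader. The names <tt>read_vbr</tt> and
<tt>read_symbol_table</tt> are hypothetical, and the sketch makes several
assumptions: the eight-byte block header has already been removed, a
<tt>uint32_vbr</tt> stores seven bits per byte with the low-order bits first
and the high bit marking continuation, a string is a <tt>uint32_vbr</tt>
length followed by that many bytes, and no alignment padding is present.</p>

```python
def read_vbr(buf, pos):
    """Read one variable-bit-rate unsigned integer; return (value, new_pos).

    Assumption: 7 payload bits per byte, low-order bits first, and a set
    high bit means that another byte follows.
    """
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_symbol_table(body):
    """Return a list of (type_slot, [(value_slot, name), ...]) planes."""
    planes = []
    pos = 0
    while pos < len(body):
        count, pos = read_vbr(body, pos)      # entries in this type plane
        type_slot, pos = read_vbr(body, pos)  # slot of the plane's type
        entries = []
        for _ in range(count):
            slot, pos = read_vbr(body, pos)    # slot number of the value
            length, pos = read_vbr(body, pos)  # assumed string length prefix
            name = body[pos:pos + length].decode("ascii")
            pos += length
            entries.append((slot, name))
        planes.append((type_slot, entries))
    return planes


# Hypothetical body: one plane for type slot 5 holding two named values.
body = bytes([2, 5, 0, 1]) + b"x" + bytes([1, 3]) + b"foo"
planes = read_symbol_table(body)
```

<p>The type slot in each plane is what a reader would use to look up the type
in the global type pool.</p>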
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 511 | </div> | 
| Reid Spencer | 7c76d33 | 2004-06-08 07:41:41 +0000 | [diff] [blame^] | 512 | <!-- *********************************************************************** --> | 
|  | 513 | <div class="doc_section"> <a name="versiondiffs">Version Differences</a> </div> | 
|  | 514 | <!-- *********************************************************************** --> | 
|  | 515 | <div class="doc_text"> | 
|  | 516 | <p>This section describes the differences in the Bytecode Format across LLVM | 
|  | 517 | versions. The versions are listed in reverse order because the previous sections | 
|  | 518 | are assumed to document the current version. Each section here | 
|  | 519 | describes the differences between that version and the one that <i>follows</i>. | 
|  | 520 | </p> | 
|  | 521 | </div> | 
|  | 522 | <!-- _______________________________________________________________________ --> | 
|  | 523 | <div class="doc_subsection"> | 
|  | 524 | <a name="vers12">Version 1.2 Differences From 1.3</a></div> | 
|  | 525 | <div class="doc_text"> | 
|  | 526 | <p>TBD: How version 1.2 differs from version 1.3</p> | 
|  | 527 | </div> | 
|  | 528 |  | 
|  | 529 | <!-- _______________________________________________________________________ --> | 
|  | 530 | <div class="doc_subsection"> | 
|  | 531 | <a name="vers11">Version 1.1 Differences From 1.2 </a></div> | 
|  | 532 | <div class="doc_text"> | 
|  | 533 | <p>TBD: How version 1.1 differs from version 1.2</p> | 
|  | 534 | </div> | 
|  | 535 |  | 
|  | 536 | <!-- _______________________________________________________________________ --> | 
|  | 537 | <div class="doc_subsection"> | 
|  | 538 | <a name="vers10">Version 1.0 Differences From 1.1</a></div> | 
|  | 539 | <div class="doc_text"> | 
|  | 540 | <p>TBD: How version 1.0 differs from version 1.1</p> | 
|  | 541 | </div> | 
| Reid Spencer | 5002661 | 2004-05-22 02:28:36 +0000 | [diff] [blame] | 542 |  | 
|  | 543 | <!-- *********************************************************************** --> | 
|  | 544 | <hr> | 
|  | 545 | <address> | 
|  | 546 | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | 547 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
|  | 548 | <a href="http://validator.w3.org/check/referer"><img | 
|  | 549 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
|  | 550 |  | 
|  | 551 | <a href="mailto:rspencer@x10sys.com">Reid Spencer</a> and | 
|  | 552 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
|  | 553 | <a href="http://llvm.cs.uiuc.edu">The LLVM Compiler Infrastructure</a><br> | 
|  | 554 | Last modified: $Date$ | 
|  | 555 | </address> | 
|  | 556 | </body> | 
|  | 557 | </html> | 
|  | 558 | <!-- vim: sw=2 | 
|  | 559 | --> |