| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | 2 | "http://www.w3.org/TR/html4/strict.dtd"> | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 3 | <html> | 
|  | 4 | <head> | 
|  | 5 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | 
|  | 6 | <title>LLVM Bitcode File Format</title> | 
|  | 7 | <link rel="stylesheet" href="llvm.css" type="text/css"> | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 8 | </head> | 
|  | 9 | <body> | 
|  | 10 | <div class="doc_title"> LLVM Bitcode File Format </div> | 
|  | 11 | <ol> | 
|  | 12 | <li><a href="#abstract">Abstract</a></li> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 13 | <li><a href="#overview">Overview</a></li> | 
|  | 14 | <li><a href="#bitstream">Bitstream Format</a> | 
|  | 15 | <ol> | 
|  | 16 | <li><a href="#magic">Magic Numbers</a></li> | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 17 | <li><a href="#primitives">Primitives</a></li> | 
|  | 18 | <li><a href="#abbrevid">Abbreviation IDs</a></li> | 
|  | 19 | <li><a href="#blocks">Blocks</a></li> | 
|  | 20 | <li><a href="#datarecord">Data Records</a></li> | 
| Chris Lattner | daeb63c | 2007-05-12 07:49:15 +0000 | [diff] [blame^] | 21 | <li><a href="#abbreviations">Abbreviations</a></li> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 22 | </ol> | 
|  | 23 | </li> | 
|  | 24 | <li><a href="#llvmir">LLVM IR Encoding</a></li> | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 25 | </ol> | 
|  | 26 | <div class="doc_author"> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 27 | <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>. | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 28 | </p> | 
|  | 29 | </div> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 30 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 31 | <!-- *********************************************************************** --> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 32 | <div class="doc_section"> <a name="abstract">Abstract</a></div> | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 33 | <!-- *********************************************************************** --> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 34 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 35 | <div class="doc_text"> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 36 |  | 
|  | 37 | <p>This document describes the LLVM bitstream file format and the encoding of | 
|  | 38 | the LLVM IR into it.</p> | 
|  | 39 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 40 | </div> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 41 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 42 | <!-- *********************************************************************** --> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 43 | <div class="doc_section"> <a name="overview">Overview</a></div> | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 44 | <!-- *********************************************************************** --> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 45 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 46 | <div class="doc_text"> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 47 |  | 
|  | 48 | <p> | 
|  | 49 | What is commonly known as the LLVM bitcode file format (also, sometimes | 
|  | 50 | anachronistically known as bytecode) is actually two things: a <a | 
|  | 51 | href="#bitstream">bitstream container format</a> | 
|  | 52 | and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p> | 
|  | 53 |  | 
|  | 54 | <p> | 
|  | 55 | The bitstream format is an abstract encoding of structured data, similar to | 
|  | 56 | XML in some ways.  Like XML, bitstream files contain tags and nested | 
|  | 57 | structures, and you can parse the file without having to understand the tags. | 
|  | 58 | Unlike XML, the bitstream format is a binary encoding, and it | 
|  | 59 | provides a mechanism for the file to self-describe "abbreviations", which are | 
|  | 60 | effectively size optimizations for the content.</p> | 
|  | 61 |  | 
|  | 62 | <p>This document first describes the LLVM bitstream format, then describes the | 
|  | 63 | record structure used by LLVM IR files. | 
|  | 64 | </p> | 
|  | 65 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 66 | </div> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 67 |  | 
|  | 68 | <!-- *********************************************************************** --> | 
|  | 69 | <div class="doc_section"> <a name="bitstream">Bitstream Format</a></div> | 
|  | 70 | <!-- *********************************************************************** --> | 
|  | 71 |  | 
|  | 72 | <div class="doc_text"> | 
|  | 73 |  | 
|  | 74 | <p> | 
|  | 75 | The bitstream format is literally a stream of bits, with a very simple | 
|  | 76 | structure.  This structure consists of the following concepts: | 
|  | 77 | </p> | 
|  | 78 |  | 
|  | 79 | <ul> | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 80 | <li>A "<a href="#magic">magic number</a>" that identifies the contents of | 
|  | 81 | the stream.</li> | 
|  | 82 | <li>Encoding <a href="#primitives">primitives</a> like variable bit-rate | 
|  | 83 | integers.</li> | 
|  | 84 | <li><a href="#blocks">Blocks</a>, which define nested content.</li> | 
|  | 85 | <li><a href="#datarecord">Data Records</a>, which describe entities within the | 
|  | 86 | file.</li> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 87 | <li><a href="#abbreviations">Abbreviations</a>, which specify compression optimizations for the file.</li> | 
|  | 88 | </ul> | 
|  | 89 |  | 
|  | 90 | <p>Note that the <a | 
|  | 91 | href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be | 
|  | 92 | used to dump and inspect arbitrary bitstreams, which is very useful for | 
|  | 93 | understanding the encoding.</p> | 
|  | 94 |  | 
|  | 95 | </div> | 
|  | 96 |  | 
|  | 97 | <!-- ======================================================================= --> | 
|  | 98 | <div class="doc_subsection"><a name="magic">Magic Numbers</a> | 
|  | 99 | </div> | 
|  | 100 |  | 
|  | 101 | <div class="doc_text"> | 
|  | 102 |  | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 103 | <p>The first four bytes of the stream identify the encoding of the file.  This | 
|  | 104 | is used by a reader to know what is contained in the file.</p> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 105 |  | 
|  | 106 | </div> | 
|  | 107 |  | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 108 | <!-- ======================================================================= --> | 
|  | 109 | <div class="doc_subsection"><a name="primitives">Primitives</a> | 
|  | 110 | </div> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 111 |  | 
|  | 112 | <div class="doc_text"> | 
|  | 113 |  | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 114 | <p> | 
|  | 115 | A bitstream literally consists of a stream of bits.  This stream is made up of a | 
|  | 116 | number of primitive values that encode a stream of integer values.  These | 
|  | 117 | integers are encoded in two ways: either as <a href="#fixedwidth">Fixed | 
|  | 118 | Width Integers</a> or as <a href="#variablewidth">Variable Width | 
|  | 119 | Integers</a>. | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 120 | </p> | 
|  | 121 |  | 
|  | 122 | </div> | 
|  | 123 |  | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 124 | <!-- _______________________________________________________________________ --> | 
|  | 125 | <div class="doc_subsubsection"> <a name="fixedwidth">Fixed Width Integers</a> | 
|  | 126 | </div> | 
|  | 127 |  | 
|  | 128 | <div class="doc_text"> | 
|  | 129 |  | 
|  | 130 | <p>Fixed-width integer values have their low bits emitted directly to the file. | 
|  | 131 | For example, a 3-bit integer value encodes 1 as 001.  Fixed-width integers | 
|  | 132 | are used when there is a well-known number of options for a field.  For | 
|  | 133 | example, boolean values are usually encoded with a 1-bit wide integer. | 
|  | 134 | </p> | 
|  | 135 |  | 
|  | 136 | </div> | 
|  | 137 |  | 
|  | 138 | <!-- _______________________________________________________________________ --> | 
|  | 139 | <div class="doc_subsubsection"> <a name="variablewidth">Variable Width | 
|  | 140 | Integers</a></div> | 
|  | 141 |  | 
|  | 142 | <div class="doc_text"> | 
|  | 143 |  | 
|  | 144 | <p>Variable-width integer (VBR) values encode values of arbitrary size, | 
|  | 145 | optimizing for the case where the values are small.  Given a 4-bit VBR field, | 
|  | 146 | any 3-bit value (0 through 7) is encoded directly, with the high bit set to | 
|  | 147 | zero.  In general, a value that needs more than N-1 bits in an N-bit VBR field is emitted as a series of N-1 bit | 
|  | 148 | chunks, where all but the last chunk set the high bit.</p> | 
|  | 149 |  | 
|  | 150 | <p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a | 
|  | 151 | vbr4 value.  The first set of four bits indicates the value 3 (011) with a | 
|  | 152 | continuation piece (indicated by a high bit of 1).  The next chunk indicates a | 
|  | 153 | value of 24 (011 &lt;&lt; 3) with no continuation.  The sum (3+24) yields the value | 
|  | 154 | 27. | 
|  | 155 | </p> | 
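<p>The chunking rule above can be sketched in a few lines of Python.  This is
an illustration only, not the LLVM implementation: the function names
<tt>encode_vbr</tt>, <tt>decode_vbr</tt> and <tt>show</tt> are hypothetical, and
each chunk is printed with its high (continuation) bit first, as in the
<tt>1011 0011</tt> example.  The order in which bits are physically written to
the stream is not modeled.</p>

```python
def encode_vbr(value, width):
    """Split a non-negative integer into VBR chunks of `width` bits each."""
    if value < 0 or width < 2:
        raise ValueError("need a non-negative value and a width of at least 2")
    data_bits = width - 1          # payload bits per chunk
    high_bit = 1 << data_bits      # continuation flag
    mask = high_bit - 1
    chunks = []
    while value > mask:
        # More bits remain after this chunk, so set the continuation flag.
        chunks.append(high_bit | (value & mask))
        value >>= data_bits
    chunks.append(value)           # last chunk: high bit clear
    return chunks


def decode_vbr(chunks, width):
    """Reassemble the integer from chunks emitted low-order first."""
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    value = 0
    for i, chunk in enumerate(chunks):
        value |= (chunk & mask) << (i * data_bits)
    return value


def show(chunks, width):
    """Render chunks as binary strings, high bit first."""
    return " ".join(format(c, "0{}b".format(width)) for c in chunks)


print(show(encode_vbr(27, 4), 4))
```

<p>With a width of 4, the value 27 produces the two chunks described above, and
any value from 0 through 7 fits in a single chunk.</p>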
|  | 156 |  | 
|  | 157 | </div> | 
|  | 158 |  | 
|  | 159 | <!-- _______________________________________________________________________ --> | 
|  | 160 | <div class="doc_subsubsection"> <a name="char6">6-bit characters</a></div> | 
|  | 161 |  | 
|  | 162 | <div class="doc_text"> | 
|  | 163 |  | 
|  | 164 | <p>6-bit characters encode common characters into a fixed 6-bit field.  They | 
|  | 165 | represent the following characters with the following 6-bit values:</p> | 
|  | 166 |  | 
|  | 167 | <ul> | 
|  | 168 | <li>'a' .. 'z' - 0 .. 25</li> | 
|  | 169 | <li>'A' .. 'Z' - 26 .. 51</li> | 
|  | 170 | <li>'0' .. '9' - 52 .. 61</li> | 
|  | 171 | <li>'.' - 62</li> | 
|  | 172 | <li>'_' - 63</li> | 
|  | 173 | </ul> | 
|  | 174 |  | 
|  | 175 | <p>This encoding is only suitable for encoding characters and strings that | 
|  | 176 | consist only of the above characters.  It is completely incapable of encoding | 
|  | 177 | characters not in the set.</p> | 
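<p>A minimal Python sketch of this mapping follows.  It assumes the ranges are
contiguous (26 lowercase letters, 26 uppercase letters, ten digits, then
<tt>'.'</tt> and <tt>'_'</tt>), which gives exactly 64 values.  The names
<tt>CHAR6_ALPHABET</tt>, <tt>is_char6</tt> and <tt>encode_char6</tt> are
hypothetical.</p>

```python
import string

# Position in this string is the 6-bit value of the character.
CHAR6_ALPHABET = (string.ascii_lowercase + string.ascii_uppercase
                  + string.digits + "._")
CHAR6_VALUE = {ch: i for i, ch in enumerate(CHAR6_ALPHABET)}


def is_char6(text):
    """True if every character of `text` is representable as a 6-bit char."""
    return all(ch in CHAR6_VALUE for ch in text)


def encode_char6(text):
    """Return the list of 6-bit values for `text`, or raise ValueError."""
    if not is_char6(text):
        raise ValueError("string contains characters outside the char6 set")
    return [CHAR6_VALUE[ch] for ch in text]


print(encode_char6("main_0"))
```

<p>A writer would typically check <tt>is_char6</tt> first and fall back to a
more general encoding when the check fails.</p>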
|  | 178 |  | 
|  | 179 | </div> | 
|  | 180 |  | 
|  | 181 | <!-- _______________________________________________________________________ --> | 
|  | 182 | <div class="doc_subsubsection"> <a name="wordalign">Word Alignment</a></div> | 
|  | 183 |  | 
|  | 184 | <div class="doc_text"> | 
|  | 185 |  | 
|  | 186 | <p>Occasionally, it is useful to emit zero bits until the bitstream is a | 
|  | 187 | multiple of 32 bits.  This ensures that the bit position in the stream can be | 
|  | 188 | represented as a multiple of 32-bit words.</p> | 
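<p>The amount of padding needed is simple arithmetic.  The sketch below is a
hypothetical helper (<tt>padding_to_word</tt> is not an LLVM function) that
returns how many zero bits must be emitted from a given bit position.</p>

```python
def padding_to_word(bit_position, word_bits=32):
    """Number of zero bits needed to reach the next multiple of `word_bits`."""
    return -bit_position % word_bits


for pos in (0, 1, 37, 64):
    print(pos, "->", pos + padding_to_word(pos))
```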
|  | 189 |  | 
|  | 190 | </div> | 
|  | 191 |  | 
|  | 192 |  | 
|  | 193 | <!-- ======================================================================= --> | 
|  | 194 | <div class="doc_subsection"><a name="abbrevid">Abbreviation IDs</a> | 
|  | 195 | </div> | 
|  | 196 |  | 
|  | 197 | <div class="doc_text"> | 
|  | 198 |  | 
|  | 199 | <p> | 
|  | 200 | A bitstream is a sequential series of <a href="#blocks">Blocks</a> and | 
|  | 201 | <a href="#datarecord">Data Records</a>.  Both of these start with an | 
|  | 202 | abbreviation ID encoded as a fixed-bitwidth field.  The width is specified by | 
|  | 203 | the current block, as described below.  The value of the abbreviation ID | 
|  | 204 | specifies either a builtin ID (which has a special meaning, defined below) or | 
|  | 205 | one of the abbreviation IDs defined by the stream itself. | 
|  | 206 | </p> | 
|  | 207 |  | 
|  | 208 | <p> | 
|  | 209 | The set of builtin abbrev IDs is: | 
|  | 210 | </p> | 
|  | 211 |  | 
|  | 212 | <ul> | 
|  | 213 | <li>0 - <a href="#END_BLOCK">END_BLOCK</a> - This abbrev ID marks the end of the | 
|  | 214 | current block.</li> | 
|  | 215 | <li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the | 
|  | 216 | beginning of a new block.</li> | 
| Chris Lattner | daeb63c | 2007-05-12 07:49:15 +0000 | [diff] [blame^] | 217 | <li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new | 
|  | 218 | abbreviation.</li> | 
|  | 219 | <li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the | 
|  | 220 | definition of an unabbreviated record.</li> | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 221 | </ul> | 
|  | 222 |  | 
| Chris Lattner | daeb63c | 2007-05-12 07:49:15 +0000 | [diff] [blame^] | 223 | <p>Abbreviation IDs 4 and above are defined by the stream itself, and specify | 
|  | 224 | an <a href="#abbrev_records">abbreviated record encoding</a>.</p> | 
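<p>As an illustration, a reader might classify an abbreviation ID as sketched
below in Python.  The names are hypothetical, and numbering the stream-defined
abbreviations from zero (so that ID 4 is the first one) is an assumption made
for the example.</p>

```python
BUILTIN_ABBREV_IDS = {
    0: "END_BLOCK",
    1: "ENTER_SUBBLOCK",
    2: "DEFINE_ABBREV",
    3: "UNABBREV_RECORD",
}
FIRST_STREAM_ABBREV_ID = 4


def classify_abbrev_id(abbrev_id, width):
    """Describe an abbreviation ID read from a `width`-bit fixed field."""
    if not 0 <= abbrev_id < (1 << width):
        raise ValueError("ID does not fit in the current abbrev id width")
    if abbrev_id in BUILTIN_ABBREV_IDS:
        return BUILTIN_ABBREV_IDS[abbrev_id]
    # Assumption: stream-defined abbreviations are numbered from zero.
    index = abbrev_id - FIRST_STREAM_ABBREV_ID
    return "stream-defined abbreviation #{}".format(index)


print(classify_abbrev_id(3, 2))
print(classify_abbrev_id(4, 3))
```

<p>Note that a width of 2 can only express the four builtin IDs, so a block that
uses its own abbreviations needs a width of at least 3.</p>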
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 225 |  | 
|  | 226 | </div> | 
|  | 227 |  | 
|  | 228 | <!-- ======================================================================= --> | 
|  | 229 | <div class="doc_subsection"><a name="blocks">Blocks</a> | 
|  | 230 | </div> | 
|  | 231 |  | 
|  | 232 | <div class="doc_text"> | 
|  | 233 |  | 
|  | 234 | <p> | 
|  | 235 | Blocks in a bitstream denote nested regions of the stream, and are identified by | 
|  | 236 | a content-specific id number (for example, LLVM IR uses an ID of 12 to represent | 
|  | 237 | function bodies).  Nested blocks capture the hierarchical structure of the data | 
|  | 238 | encoded in the stream, and various properties are associated with blocks as the file is | 
|  | 239 | parsed.  Because each block records its size, a reader can skip a block | 
|  | 240 | in constant time, either because it only wants a summary of the blocks or because | 
|  | 241 | the block holds data it does not understand.  The LLVM IR reader uses this | 
|  | 242 | mechanism to skip function bodies, lazily reading them on demand. | 
|  | 243 | </p> | 
|  | 244 |  | 
|  | 245 | <p> | 
|  | 246 | When reading and encoding the stream, several properties are maintained for the | 
|  | 247 | block.  In particular, each block maintains: | 
|  | 248 | </p> | 
|  | 249 |  | 
|  | 250 | <ol> | 
|  | 251 | <li>A current abbrev id width.  This value starts at 2, and is set every time a | 
|  | 252 | block record is entered.  The block entry specifies the abbrev id width for | 
|  | 253 | the body of the block.</li> | 
|  | 254 |  | 
|  | 255 | <li>A set of abbreviations.  Abbreviations may be defined within a block, or | 
|  | 256 | they may be associated with all blocks of a particular ID. | 
|  | 257 | </li> | 
|  | 258 | </ol> | 
|  | 259 |  | 
|  | 260 | <p>As sub-blocks are entered, these properties are saved and the new sub-block | 
|  | 261 | has its own set of abbreviations, and its own abbrev id width.  When a sub-block | 
|  | 262 | is popped, the saved values are restored.</p> | 
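<p>The save and restore behavior can be sketched as a small stack in Python.
The class and method names are hypothetical, and the sketch deliberately ignores
abbreviations that are associated with all blocks of a particular ID.</p>

```python
class BlockScopes:
    """Tracks the per-block reader state described above."""

    def __init__(self):
        self.abbrev_width = 2      # the width starts at 2
        self.abbrevs = []          # abbreviations defined in the current block
        self._saved = []           # state of the enclosing blocks

    def enter_subblock(self, new_abbrev_width):
        # Save the parent's state, then start the sub-block with a fresh one.
        self._saved.append((self.abbrev_width, self.abbrevs))
        self.abbrev_width = new_abbrev_width
        self.abbrevs = []

    def define_abbrev(self, abbrev):
        self.abbrevs.append(abbrev)

    def end_block(self):
        # Restore the state that was current before the block was entered.
        self.abbrev_width, self.abbrevs = self._saved.pop()


scopes = BlockScopes()
scopes.enter_subblock(3)
scopes.define_abbrev("outer abbreviation")
scopes.enter_subblock(5)
scopes.end_block()
print(scopes.abbrev_width, scopes.abbrevs)
```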
|  | 263 |  | 
|  | 264 | </div> | 
|  | 265 |  | 
|  | 266 | <!-- _______________________________________________________________________ --> | 
|  | 267 | <div class="doc_subsubsection"> <a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK | 
|  | 268 | Encoding</a></div> | 
|  | 269 |  | 
|  | 270 | <div class="doc_text"> | 
|  | 271 |  | 
|  | 272 | <p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>, | 
|  | 273 | &lt;align32bits&gt;, blocklen<sub>32</sub>]</tt></p> | 
|  | 274 |  | 
|  | 275 | <p> | 
|  | 276 | The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record. | 
|  | 277 | The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and indicates | 
|  | 278 | the type of block being entered (which is application specific).  The | 
|  | 279 | <tt>newabbrevlen</tt> value is a 4-bit VBR which specifies the | 
|  | 280 | abbrev id width for the sub-block.  The <tt>blocklen</tt> is a 32-bit aligned | 
|  | 281 | value that specifies the size of the sub-block, in 32-bit words.  This value | 
|  | 282 | allows the reader to skip over the entire block in one jump. | 
|  | 283 | </p> | 
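<p>The jump target can be computed without looking at the block body.  The
Python sketch below is hypothetical (<tt>end_of_block</tt> is not an LLVM
function) and assumes that <tt>blocklen</tt> counts the 32-bit words that follow
the length field itself.</p>

```python
WORD_BITS = 32


def end_of_block(bit_pos_after_newabbrevlen, blocklen):
    """Bit position just past a sub-block, given where its header fields end."""
    # Skip the alignment padding that precedes blocklen.
    aligned = bit_pos_after_newabbrevlen + (-bit_pos_after_newabbrevlen % WORD_BITS)
    # Skip the 32-bit blocklen field.
    body_start = aligned + WORD_BITS
    # Assumption: blocklen counts the words that follow the length field.
    return body_start + blocklen * WORD_BITS


print(end_of_block(44, 3))
```

<p>The result is always a multiple of 32, which matches the alignment applied at
the end of a block.</p>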
|  | 284 |  | 
|  | 285 | </div> | 
|  | 286 |  | 
|  | 287 | <!-- _______________________________________________________________________ --> | 
|  | 288 | <div class="doc_subsubsection"> <a name="END_BLOCK">END_BLOCK | 
|  | 289 | Encoding</a></div> | 
|  | 290 |  | 
|  | 291 | <div class="doc_text"> | 
|  | 292 |  | 
|  | 293 | <p><tt>[END_BLOCK, &lt;align32bits&gt;]</tt></p> | 
|  | 294 |  | 
|  | 295 | <p> | 
|  | 296 | The END_BLOCK abbreviation ID specifies the end of the current block record. | 
|  | 297 | Its end is aligned to 32 bits to ensure that the size of the block is a | 
|  | 298 | multiple of 32 bits.</p> | 
|  | 299 |  | 
|  | 300 | </div> | 
|  | 301 |  | 
|  | 302 |  | 
|  | 303 |  | 
|  | 304 | <!-- ======================================================================= --> | 
|  | 305 | <div class="doc_subsection"><a name="datarecord">Data Records</a> | 
|  | 306 | </div> | 
|  | 307 |  | 
|  | 308 | <div class="doc_text"> | 
| Chris Lattner | daeb63c | 2007-05-12 07:49:15 +0000 | [diff] [blame^] | 309 | <p> | 
|  | 310 | Data records consist of a record code and a number of (up to) 64-bit integer | 
|  | 311 | values.  The interpretation of the code and values is application specific, and | 
|  | 312 | there are several ways to encode a record (with an unabbrev record | 
|  | 313 | or with an abbreviation).  In the LLVM IR format, for example, there is a record | 
|  | 314 | which encodes the target triple of a module.  The code is MODULE_CODE_TRIPLE, | 
|  | 315 | and the values of the record are the ASCII codes for the characters in the | 
|  | 316 | string.</p> | 
|  | 317 |  | 
|  | 318 | </div> | 
|  | 319 |  | 
|  | 320 | <!-- _______________________________________________________________________ --> | 
|  | 321 | <div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD | 
|  | 322 | Encoding</a></div> | 
|  | 323 |  | 
|  | 324 | <div class="doc_text"> | 
|  | 325 |  | 
|  | 326 | <p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>, | 
|  | 327 | op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p> | 
|  | 328 |  | 
|  | 329 | <p>An UNABBREV_RECORD provides a default fallback encoding, which is both | 
|  | 330 | completely general and also extremely inefficient.  It can describe an arbitrary | 
|  | 331 | record, by emitting the code and operands as vbrs.</p> | 
|  | 332 |  | 
|  | 333 | <p>For example, emitting an LLVM IR target triple as an unabbreviated record | 
|  | 334 | requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the | 
|  | 335 | MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to | 
|  | 336 | the number of operands), and a vbr6 for each character.  Since there are no | 
|  | 337 | letters with value less than 32, each letter would need to be emitted as at | 
|  | 338 | least a two-part VBR, which means that each letter would require at least 12 | 
|  | 339 | bits.  This is not an efficient encoding, but it is fully general.</p> | 
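<p>The cost of this encoding can be estimated with a short Python sketch.  The
function names are hypothetical, and the record code, the abbrev id width, and
the triple string used in the usage lines are placeholder values chosen for
illustration.</p>

```python
def vbr_bits(value, width):
    """Bits needed to emit `value` as a VBR with `width`-bit chunks."""
    data_bits = width - 1
    chunks = 1
    while value >= (1 << data_bits):
        value >>= data_bits
        chunks += 1
    return chunks * width


def unabbrev_record_bits(code, operands, abbrev_id_width):
    """Total size in bits of [UNABBREV_RECORD, code, numops, op0, op1, ...]."""
    return (abbrev_id_width
            + vbr_bits(code, 6)
            + vbr_bits(len(operands), 6)
            + sum(vbr_bits(op, 6) for op in operands))


# Placeholder inputs: an example triple, record code 2, abbrev id width 3.
triple = "i686-pc-linux-gnu"
operands = [ord(ch) for ch in triple]
print(unabbrev_record_bits(2, operands, 3))
```

<p>Every character in the example costs 12 bits, compared with 8 bits for a
plain byte, which is why abbreviations matter for string-heavy records.</p>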
|  | 340 |  | 
|  | 341 | </div> | 
|  | 342 |  | 
|  | 343 | <!-- _______________________________________________________________________ --> | 
|  | 344 | <div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record | 
|  | 345 | Encoding</a></div> | 
|  | 346 |  | 
|  | 347 | <div class="doc_text"> | 
|  | 348 |  | 
|  | 349 | <p><tt>[&lt;abbrevid&gt;, fields...]</tt></p> | 
|  | 350 |  | 
|  | 351 | <p>An abbreviated record is an abbreviation ID followed by a set of fields that | 
|  | 352 | are encoded according to the <a href="#abbreviations">abbreviation | 
|  | 353 | definition</a>.  This allows records to be encoded significantly more densely | 
|  | 354 | than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> | 
|  | 355 | type, and allows the abbreviation types to be specified in the stream itself, | 
|  | 356 | which allows the files to be completely self describing.  The actual encoding | 
|  | 357 | of abbreviations is defined below. | 
|  | 358 | </p> | 
|  | 359 |  | 
|  | 360 | </div> | 
|  | 361 |  | 
|  | 362 | <!-- ======================================================================= --> | 
|  | 363 | <div class="doc_subsection"><a name="abbreviations">Abbreviations</a> | 
|  | 364 | </div> | 
|  | 365 |  | 
|  | 366 | <div class="doc_text"> | 
|  | 367 | <p> | 
|  | 368 | Abbreviations are an important form of compression for bitstreams.  The idea is | 
|  | 369 | to specify a dense encoding for a class of records once, then use that encoding | 
|  | 370 | to emit many records.  It takes space to emit the encoding into the file, but | 
|  | 371 | the space is recouped (hopefully plus some) when the records that use it are | 
|  | 372 | emitted. | 
|  | 373 | </p> | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 374 |  | 
|  | 375 | <p> | 
| Chris Lattner | daeb63c | 2007-05-12 07:49:15 +0000 | [diff] [blame^] | 376 | Abbreviations can be determined dynamically per client, per file.  Since the | 
|  | 377 | abbreviations are stored in the bitstream itself, different streams of the same | 
|  | 378 | format can contain different sets of abbreviations, and a stream can omit any that it does | 
|  | 379 | not need.  As a concrete example, LLVM IR files usually emit an abbreviation | 
|  | 380 | for binary operators.  If a specific LLVM module contains few or no binary | 
|  | 381 | operators, the abbreviation does not need to be emitted. | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 382 | </p> | 
| Chris Lattner | daeb63c | 2007-05-12 07:49:15 +0000 | [diff] [blame^] | 383 | </div> | 
|  | 384 |  | 
|  | 385 | <!-- _______________________________________________________________________ --> | 
|  | 386 | <div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV | 
|  | 387 | Encoding</a></div> | 
|  | 388 |  | 
|  | 389 | <div class="doc_text"> | 
|  | 390 |  | 
|  | 391 | <p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1, | 
|  | 392 | ...]</tt></p> | 
|  | 393 |  | 
|  | 394 | <p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed | 
|  | 395 | by a VBR that specifies the number of abbrev operands, then the abbrev | 
|  | 396 | operands themselves.  Abbreviation operands come in three forms.  They all start | 
|  | 397 | with a single bit that indicates whether the abbrev operand is a literal operand | 
|  | 398 | (when the bit is 1) or an encoding operand (when the bit is 0).</p> | 
|  | 399 |  | 
|  | 400 | <ol> | 
|  | 401 | <li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> - | 
|  | 402 | Literal operands specify that the value in the result | 
|  | 403 | is always a single specific value.  This specific value is emitted as a vbr8 | 
|  | 404 | after the bit indicating that it is a literal operand.</li> | 
|  | 405 | <li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt> | 
|  | 406 | - blah | 
|  | 407 | </li> | 
|  | 408 | <li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>, | 
|  | 409 | value<sub>vbr5</sub>]</tt> - | 
|  | 410 |  | 
|  | 411 | </li> | 
|  | 412 | </ol> | 
| Chris Lattner | 3a1716d | 2007-05-12 05:37:42 +0000 | [diff] [blame] | 413 |  | 
|  | 414 | </div> | 
|  | 415 |  | 
|  | 416 |  | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 417 | <!-- *********************************************************************** --> | 
|  | 418 | <div class="doc_section"> <a name="llvmir">LLVM IR Encoding</a></div> | 
|  | 419 | <!-- *********************************************************************** --> | 
|  | 420 |  | 
|  | 421 | <div class="doc_text"> | 
|  | 422 |  | 
|  | 423 | <p></p> | 
|  | 424 |  | 
|  | 425 | </div> | 
|  | 426 |  | 
|  | 427 |  | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 428 | <!-- *********************************************************************** --> | 
|  | 429 | <hr> | 
|  | 430 | <address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | 431 | src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> | 
|  | 432 | <a href="http://validator.w3.org/check/referer"><img | 
|  | 433 | src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> | 
| Chris Lattner | e9ef457 | 2007-05-12 03:23:40 +0000 | [diff] [blame] | 434 | <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> | 
| Reid Spencer | 2c1ce4f | 2007-01-20 23:21:08 +0000 | [diff] [blame] | 435 | <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> | 
|  | 436 | Last modified: $Date$ | 
|  | 437 | </address> | 
|  | 438 | </body> | 
|  | 439 | </html> |