<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>LLVM Bitcode File Format</title>
<link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<body>
<div class="doc_title"> LLVM Bitcode File Format </div>
<ol>
<li><a href="#abstract">Abstract</a></li>
<li><a href="#overview">Overview</a></li>
<li><a href="#bitstream">Bitstream Format</a>
<ol>
<li><a href="#magic">Magic Numbers</a></li>
<li><a href="#primitives">Primitives</a></li>
<li><a href="#abbrevid">Abbreviation IDs</a></li>
<li><a href="#blocks">Blocks</a></li>
<li><a href="#datarecord">Data Records</a></li>
<li><a href="#abbreviations">Abbreviations</a></li>
<li><a href="#stdblocks">Standard Blocks</a></li>
</ol>
</li>
<li><a href="#wrapper">Bitcode Wrapper Format</a>
</li>
<li><a href="#llvmir">LLVM IR Encoding</a>
<ol>
<li><a href="#basics">Basics</a></li>
</ol>
</li>
</ol>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
and <a href="http://www.reverberate.org">Joshua Haberman</a>.
</p>
</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="abstract">Abstract</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>This document describes the LLVM bitstream file format and the encoding of
the LLVM IR into it.</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="overview">Overview</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
What is commonly known as the LLVM bitcode file format (also, sometimes
anachronistically known as bytecode) is actually two things: a <a
href="#bitstream">bitstream container format</a>
and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p>

<p>
The bitstream format is an abstract encoding of structured data, very
similar to XML in some ways. Like XML, bitstream files contain tags and nested
structures, and you can parse the file without having to understand the tags.
Unlike XML, the bitstream format is a binary encoding, and it also provides a
mechanism for the file to self-describe "abbreviations", which are
effectively size optimizations for the content.</p>

<p>LLVM IR files may optionally be embedded into a <a
href="#wrapper">wrapper</a> structure that makes it easy to embed extra data
along with LLVM IR files.</p>

<p>This document first describes the LLVM bitstream format, then the
wrapper format, and finally the record structure used by LLVM IR files.
</p>

</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="bitstream">Bitstream Format</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>
The bitstream format is literally a stream of bits, with a very simple
structure. This structure consists of the following concepts:
</p>

<ul>
<li>A "<a href="#magic">magic number</a>" that identifies the contents of
the stream.</li>
<li>Encoding <a href="#primitives">primitives</a> like variable bit-rate
integers.</li>
<li><a href="#blocks">Blocks</a>, which define nested content.</li>
<li><a href="#datarecord">Data Records</a>, which describe entities within the
file.</li>
<li><a href="#abbreviations">Abbreviations</a>, which specify compression
optimizations for the file.</li>
</ul>

<p>Note that the <a
href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be
used to dump and inspect arbitrary bitstreams, which is very useful for
understanding the encoding.</p>

</div>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="magic">Magic Numbers</a>
</div>

<div class="doc_text">

<p>The first two bytes of a bitcode file are 'BC' (0x42, 0x43).
The second two bytes are an application-specific magic number. Generic
bitcode tools can look at only the first two bytes to verify the file is
bitcode, while application-specific programs will want to look at all four.</p>
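<p>As an illustration only, the two-level check described above might look like
the following Python sketch. The helper names <tt>is_bitcode</tt> and
<tt>app_magic</tt> are hypothetical and are not part of LLVM. An LLVM IR reader
would go on to compare the returned bytes with the LLVM IR magic number
described later in this document.</p>

<pre>
```python
def is_bitcode(data: bytes) -> bool:
    """Generic check: inspect only the first two bytes, 'B' (0x42) and 'C' (0x43)."""
    return data[:2] == b"BC"


def app_magic(data: bytes) -> bytes:
    """Return the two application-specific magic bytes that follow 'BC'."""
    if len(data) >= 4 and is_bitcode(data):
        return data[2:4]
    raise ValueError("not a bitcode stream")
```
</pre>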

</div>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="primitives">Primitives</a>
</div>

<div class="doc_text">

<p>
A bitstream literally consists of a stream of bits, which are read in order
starting with the least significant bit of each byte. The stream is made up of a
number of primitive values that encode a stream of unsigned integer values.
These integers are encoded in two ways: either as <a href="#fixedwidth">Fixed
Width Integers</a> or as <a href="#variablewidth">Variable Width
Integers</a>.
</p>

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="fixedwidth">Fixed Width Integers</a>
</div>

<div class="doc_text">

<p>Fixed-width integer values have their low bits emitted directly to the file.
For example, a 3-bit integer value encodes 1 as 001. Fixed-width integers
are used when there is a well-known number of options for a field. For
example, boolean values are usually encoded with a 1-bit wide integer.
</p>
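<p>The following Python sketch shows fixed-width fields being packed least
significant bit first, as described under Primitives, and read back.
<tt>BitWriter</tt> and <tt>BitReader</tt> are hypothetical helper classes
written for this illustration, not LLVM's own bitstream classes.</p>

<pre>
```python
class BitWriter:
    """Collects bits in stream order and packs them LSB-first into bytes."""

    def __init__(self):
        self.bits = []  # one 0/1 entry per bit, in stream order

    def emit_fixed(self, value, width):
        # Emit the low `width` bits of `value`, least significant bit first.
        for i in range(width):
            self.bits.append((value >> i) % 2)

    def to_bytes(self):
        # Stream bit k lands in bit (k % 8) of byte (k // 8); the last byte is zero-padded.
        out = bytearray((len(self.bits) + 7) // 8)
        for k, bit in enumerate(self.bits):
            out[k // 8] |= bit * 2 ** (k % 8)
        return bytes(out)


class BitReader:
    """Reads fixed-width fields back out of LSB-first packed bytes."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_fixed(self, width):
        value = 0
        for i in range(width):
            bit = (self.data[self.pos // 8] >> (self.pos % 8)) % 2
            value += bit * 2 ** i
            self.pos += 1
        return value
```
</pre>

<p>For example, emitting the 3-bit value 1 followed by a 1-bit boolean true
produces the stream bits 1, 0, 0, 1, which pack into the single byte 0x09.</p>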

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="variablewidth">Variable Width
Integers</a></div>

<div class="doc_text">

<p>Variable-width integer (VBR) values encode values of arbitrary size,
optimizing for the case where the values are small. Given a 4-bit VBR field,
any 3-bit value (0 through 7) is encoded directly, with the high bit set to
zero. More generally, each chunk of a vbrN field is N bits wide and carries N-1
bits of the value. Values that fit in N-1 bits are emitted as a single chunk;
larger values are emitted as a series of chunks, low-order bits first, where
all but the last chunk set the high bit.</p>

<p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a
vbr4 value. The first chunk of four bits indicates the value 3 (011) with a
continuation piece (indicated by a high bit of 1). The next chunk indicates a
value of 24 (011 &lt;&lt; 3) with no continuation. The sum (3+24) yields the value
27.
</p>
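<p>The chunking rule can be sketched in Python as follows, assuming each returned
chunk is later emitted as an N-bit fixed-width field. The function names are
hypothetical and the code is meant only to illustrate the rule.</p>

<pre>
```python
def encode_vbr(value, n):
    """Split a non-negative `value` into vbrN chunks, low-order chunk first."""
    payload = 2 ** (n - 1)  # each chunk carries N-1 data bits
    chunks = []
    while value >= payload:
        chunks.append(value % payload + payload)  # high bit set: more chunks follow
        value //= payload
    chunks.append(value)  # last chunk: high bit clear
    return chunks


def decode_vbr(chunks, n):
    """Reassemble a value, stopping at the first chunk whose high bit is clear."""
    payload = 2 ** (n - 1)
    value = 0
    scale = 1
    for chunk in chunks:
        value += (chunk % payload) * scale
        scale *= payload
        if chunk // payload == 0:
            break
    return value
```
</pre>

<p>Here <tt>encode_vbr(27, 4)</tt> returns the chunks 0b1011 and 0b0011,
matching the example above, and <tt>decode_vbr</tt> turns them back into 27.</p>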

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="char6">6-bit characters</a></div>

<div class="doc_text">

<p>6-bit characters encode common characters into a fixed 6-bit field. They
represent the following characters with the following 6-bit values:</p>

<ul>
<li>'a' .. 'z' - 0 .. 25</li>
<li>'A' .. 'Z' - 26 .. 51</li>
<li>'0' .. '9' - 52 .. 61</li>
<li>'.' - 62</li>
<li>'_' - 63</li>
</ul>

<p>This encoding is only suitable for encoding characters and strings that
consist only of the above characters. It is completely incapable of encoding
characters not in the set.</p>
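<p>A minimal Python sketch of the char6 mapping follows. The table is built in
the same order as the list above, so a character's index in it is its 6-bit
value. The names are hypothetical. A character such as '-' is rejected, so a
string containing it has to use a different encoding.</p>

<pre>
```python
import string

# Index in this string == char6 value: a-z 0-25, A-Z 26-51, 0-9 52-61, '.' 62, '_' 63.
CHAR6_ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"


def encode_char6(ch):
    """Map one character to its 6-bit value, rejecting characters outside the set."""
    if len(ch) != 1 or ch not in CHAR6_ALPHABET:
        raise ValueError(f"{ch!r} cannot be encoded as char6")
    return CHAR6_ALPHABET.index(ch)


def decode_char6(value):
    """Map a 6-bit value (0-63) back to its character."""
    if value not in range(64):
        raise ValueError(f"{value} is not a 6-bit value")
    return CHAR6_ALPHABET[value]
```
</pre>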

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="wordalign">Word Alignment</a></div>

<div class="doc_text">

<p>Occasionally, it is useful to emit zero bits until the number of bits
emitted so far is a multiple of 32. This ensures that the bit position in the
stream can be represented as a multiple of 32-bit words.</p>
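<p>The amount of padding follows directly from the current bit position. This
small Python sketch (hypothetical helper names) computes it and pads a list of
stream bits.</p>

<pre>
```python
def align32_padding(bit_pos):
    """Number of zero bits needed to move `bit_pos` up to the next multiple of 32."""
    return (32 - bit_pos % 32) % 32


def align32(bits):
    """Append zero bits to a list of stream bits until its length is a multiple of 32."""
    bits.extend([0] * align32_padding(len(bits)))
    return bits
```
</pre>

<p>For instance, at bit position 37 the writer emits 27 zero bits, reaching
position 64, which is the start of 32-bit word 2.</p>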

</div>


<!-- ======================================================================= -->
<div class="doc_subsection"><a name="abbrevid">Abbreviation IDs</a>
</div>

<div class="doc_text">

<p>
A bitstream is a sequential series of <a href="#blocks">Blocks</a> and
<a href="#datarecord">Data Records</a>. Both of these start with an
abbreviation ID encoded as a fixed-bitwidth field. The width is specified by
the current block, as described below. The value of the abbreviation ID
specifies either a builtin ID (each of which has a special meaning, defined
below) or one of the abbreviation IDs defined by the stream itself.
</p>

<p>
The set of builtin abbrev IDs is:
</p>

<ul>
<li>0 - <a href="#END_BLOCK">END_BLOCK</a> - This abbrev ID marks the end of the
current block.</li>
<li>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a> - This abbrev ID marks the
beginning of a new block.</li>
<li>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a> - This defines a new
abbreviation.</li>
<li>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a> - This ID specifies the
definition of an unabbreviated record.</li>
</ul>

<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
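<p>The following Python sketch (hypothetical names, for illustration only)
classifies an abbreviation ID and shows how the abbrev ID width limits the
number of stream-defined abbreviations. With the initial width of 2, only the
four builtin IDs can be represented, so stream-defined abbreviations become
usable only inside a block entered with a wider abbrev ID width.</p>

<pre>
```python
BUILTIN_ABBREV_IDS = {
    0: "END_BLOCK",
    1: "ENTER_SUBBLOCK",
    2: "DEFINE_ABBREV",
    3: "UNABBREV_RECORD",
}


def describe_abbrev_id(abbrev_id):
    """Name a builtin abbrev ID; anything from 4 up selects a stream-defined abbreviation."""
    return BUILTIN_ABBREV_IDS.get(abbrev_id, "abbreviated record")


def stream_defined_capacity(width):
    """How many stream-defined abbreviation IDs (4 and above) fit in a `width`-bit field."""
    return max(0, 2 ** width - 4)
```
</pre>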
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	241
				242	</div>
				243
				244	<!-- ======================================================================= -->
				245	<div class="doc_subsection"><a name="blocks">Blocks</a>
				246	</div>
				247
				248	<div class="doc_text">
				249
				250	<p>
				251	Blocks in a bitstream denote nested regions of the stream, and are identified by
				252	a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	253	function bodies). Block IDs 0-7 are reserved for <a href="#stdblocks">standard blocks</a>
				254	whose meaning is defined by Bitcode; block IDs 8 and greater are
				255	application specific. Nested blocks capture the hierachical structure of the data
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	256	encoded in it, and various properties are associated with blocks as the file is
				257	parsed. Block definitions allow the reader to efficiently skip blocks
				258	in constant time if the reader wants a summary of blocks, or if it wants to
				259	efficiently skip data they do not understand. The LLVM IR reader uses this
				260	mechanism to skip function bodies, lazily reading them on demand.
				261	</p>
				262
				263	<p>
				264	When reading and encoding the stream, several properties are maintained for the
				265	block. In particular, each block maintains:
				266	</p>
				267
				268	<ol>
				269	<li>A current abbrev id width. This value starts at 2, and is set every time a
				270	block record is entered. The block entry specifies the abbrev id width for
				271	the body of the block.</li>
				272
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	273	<li>A set of abbreviations. Abbreviations may be defined within a block, in
				274	which case they are only defined in that block (neither subblocks nor
				275	enclosing blocks see the abbreviation). Abbreviations can also be defined
				276	inside a <a href="#BLOCKINFO">BLOCKINFO</a> block, in which case they are
				277	defined in all blocks that match the ID that the BLOCKINFO block is describing.
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	278	</li>
				279	</ol>
				280
				281	<p>As sub blocks are entered, these properties are saved and the new sub-block
				282	has its own set of abbreviations, and its own abbrev id width. When a sub-block
				283	is popped, the saved values are restored.</p>
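<p>The save-and-restore behavior can be modeled with a stack of scopes. The
Python sketch below is a hypothetical model of that bookkeeping, not LLVM's
implementation. Abbreviations are treated as opaque values, and the top-level
scope is assumed to use a width of 2.</p>

<pre>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BlockScope:
    block_id: Optional[int]  # None for the hypothetical top-level scope
    abbrev_width: int
    abbrevs: list = field(default_factory=list)


class ScopeStack:
    """Tracks per-block abbrev ID width and abbreviation list while reading or writing."""

    def __init__(self):
        self.stack = [BlockScope(block_id=None, abbrev_width=2)]
        self.blockinfo = {}  # block ID -> abbreviations registered through BLOCKINFO

    @property
    def current(self):
        return self.stack[-1]

    def enter_subblock(self, block_id, new_abbrev_width):
        # A new block does NOT inherit its parent's abbreviations; it starts with
        # whatever BLOCKINFO registered for its block ID.
        inherited = list(self.blockinfo.get(block_id, []))
        self.stack.append(BlockScope(block_id, new_abbrev_width, inherited))

    def end_block(self):
        # Popping the scope restores the parent's width and abbreviation list.
        if len(self.stack) == 1:
            raise RuntimeError("END_BLOCK outside of any block")
        self.stack.pop()

    def define_abbrev(self, abbrev):
        """Add an abbreviation to the current block and return its ID (4, 5, ...)."""
        self.current.abbrevs.append(abbrev)
        return 3 + len(self.current.abbrevs)
```
</pre>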

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK
Encoding</a></div>

<div class="doc_text">

<p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>,
&lt;align32bits&gt;, blocklen<sub>32</sub>]</tt></p>

<p>
The ENTER_SUBBLOCK abbreviation ID specifies the start of a new block record.
The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and indicates
the type of block being entered (which can be a <a href="#stdblocks">standard
block</a> or an application-specific block). The
<tt>newabbrevlen</tt> value is a 4-bit VBR which specifies the
abbrev id width for the sub-block. The <tt>blocklen</tt> is a 32-bit aligned
value that specifies the size of the subblock, in 32-bit words. This value
allows the reader to skip over the entire block in one jump.
</p>
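<p>The following Python sketch appends an ENTER_SUBBLOCK header to a list of
stream bits. It assumes the body length in words is already known. A writer
that does not know the length in advance would typically reserve the
<tt>blocklen</tt> word and patch it once the block has been closed. All names
are hypothetical.</p>

<pre>
```python
ENTER_SUBBLOCK = 1


def emit_fixed(bits, value, width):
    # Low `width` bits of `value`, least significant bit first.
    for i in range(width):
        bits.append((value >> i) % 2)


def emit_vbr(bits, value, n):
    payload = 2 ** (n - 1)
    while value >= payload:
        emit_fixed(bits, value % payload + payload, n)  # continuation bit set
        value //= payload
    emit_fixed(bits, value, n)


def align32(bits):
    bits.extend([0] * ((32 - len(bits) % 32) % 32))


def emit_enter_subblock(bits, cur_width, block_id, new_width, block_len_words):
    """Append an ENTER_SUBBLOCK header to `bits` (a list of 0/1 in stream order)."""
    emit_fixed(bits, ENTER_SUBBLOCK, cur_width)  # abbrev ID, in the enclosing width
    emit_vbr(bits, block_id, 8)                  # blockid, vbr8
    emit_vbr(bits, new_width, 4)                 # newabbrevlen, vbr4
    align32(bits)                                # align to a 32-bit boundary
    emit_fixed(bits, block_len_words, 32)        # blocklen, in 32-bit words
```
</pre>

<p>Starting at a word boundary in a scope whose abbrev ID width is 2, entering
block 12 with a new width of 4 uses 2 + 8 + 4 = 14 bits. This is padded to 32
bits and followed by the 32-bit length, so the header occupies exactly two
words.</p>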

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="END_BLOCK">END_BLOCK
Encoding</a></div>

<div class="doc_text">

<p><tt>[END_BLOCK, &lt;align32bits&gt;]</tt></p>

<p>
The END_BLOCK abbreviation ID specifies the end of the current block record.
Its end is aligned to 32 bits to ensure that the size of the block is a whole
number of 32-bit words.</p>

</div>



<!-- ======================================================================= -->
<div class="doc_subsection"><a name="datarecord">Data Records</a>
</div>

<div class="doc_text">
<p>
Data records consist of a record code and a number of (up to) 64-bit integer
values. The interpretation of the code and values is application specific and
there are multiple different ways to encode a record (with an unabbreviated
record or with an abbreviation). In the LLVM IR format, for example, there is a
record which encodes the target triple of a module. The code is
MODULE_CODE_TRIPLE, and the values of the record are the ASCII codes for the
characters in the string.</p>

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
Encoding</a></div>

<div class="doc_text">

<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>

<p>An UNABBREV_RECORD provides a default fallback encoding, which is both
completely general and also extremely inefficient. It can describe an arbitrary
record by emitting the code and operands as VBRs.</p>

<p>For example, emitting an LLVM IR target triple as an unabbreviated record
requires emitting the UNABBREV_RECORD abbrevid, a vbr6 for the
MODULE_CODE_TRIPLE code, a vbr6 for the length of the string (which is equal to
the number of operands), and a vbr6 for each character. A single vbr6 chunk
carries only 5 bits of data (values 0 through 31), and no printable ASCII
character has a value less than 32, so each character would need to be emitted
as at least a two-part VBR, which means that each character would require at
least 12 bits. This is not an efficient encoding, but it is fully general.</p>
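<p>The size argument above can be checked with a short Python sketch that counts
bits rather than emitting them. It assumes an abbrev ID width of 3 and uses 2 as
the MODULE_CODE_TRIPLE code (the value given for TRIPLE in the abbreviation
example in this document). The function names are hypothetical.</p>

<pre>
```python
def vbr_bit_count(value, n):
    """Bits needed to emit `value` as a vbrN field (N-1 data bits per N-bit chunk)."""
    chunks = 1
    while value >= 2 ** (chunks * (n - 1)):
        chunks += 1
    return chunks * n


def unabbrev_record_bits(code, operands, abbrev_width):
    """Total size of an UNABBREV_RECORD: abbrev ID, code, numops, then each operand."""
    total = abbrev_width                                   # UNABBREV_RECORD abbrev ID
    total += vbr_bit_count(code, 6)                        # code, vbr6
    total += vbr_bit_count(len(operands), 6)               # numops, vbr6
    total += sum(vbr_bit_count(op, 6) for op in operands)  # each operand, vbr6
    return total
```
</pre>

<p>With these assumptions, the four-character triple "abcd" costs
3 + 6 + 6 + 4 &times; 12 = 63 bits as an unabbreviated record, compared with the
37 bits quoted for the abbreviated form in the Abbreviations section.</p>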

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
Encoding</a></div>

<div class="doc_text">

<p><tt>[&lt;abbrevid&gt;, fields...]</tt></p>

<p>An abbreviated record is an abbreviation ID followed by a set of fields that
are encoded according to the <a href="#abbreviations">abbreviation
definition</a>. This allows records to be encoded significantly more densely
than records encoded with the <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a>
type, and allows the abbreviation types to be specified in the stream itself,
which allows the files to be completely self-describing. The actual encoding
of abbreviations is defined below.
</p>

</div>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
</div>

<div class="doc_text">
<p>
Abbreviations are an important form of compression for bitstreams. The idea is
to specify a dense encoding for a class of records once, then use that encoding
to emit many records. It takes space to emit the encoding into the file, but
the space is recouped (hopefully plus some) when the records that use it are
emitted.
</p>

<p>
Abbreviations can be determined dynamically per client, per file. Since the
abbreviations are stored in the bitstream itself, different streams of the same
format can contain different sets of abbreviations, according to the needs of
each specific stream. As a concrete example, LLVM IR files usually emit an
abbreviation for binary operators. If a specific LLVM module contains few or no
binary operators, the abbreviation does not need to be emitted.
</p>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
Encoding</a></div>

<div class="doc_text">

<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
...]</tt></p>

<p>A DEFINE_ABBREV record adds an abbreviation to the list of currently
defined abbreviations in the scope of this block. This definition only
exists inside this immediate block; it is not visible in subblocks or
enclosing blocks.
Abbreviations are implicitly assigned IDs
sequentially starting from 4 (the first application-defined abbreviation ID).
Any abbreviations defined in a BLOCKINFO block receive IDs first, in order,
followed by any abbreviations defined within the block itself.
Abbreviated data records reference this ID to indicate what abbreviation
they are invoking.</p>

<p>An abbreviation definition consists of the DEFINE_ABBREV abbrevid followed
by a VBR that specifies the number of abbrev operands, then the abbrev
operands themselves. Abbreviation operands come in three forms. They all start
with a single bit that indicates whether the abbrev operand is a literal operand
(when the bit is 1) or an encoding operand (when the bit is 0).</p>

<ol>
<li>Literal operands - <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt> -
Literal operands specify that the value in the result
is always a single specific value. This specific value is emitted as a vbr8
after the bit indicating that it is a literal operand.</li>
<li>Encoding info without data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>]</tt>
- Operand encodings that do not have extra data are just emitted as their code.
</li>
<li>Encoding info with data - <tt>[0<sub>1</sub>, encoding<sub>3</sub>,
value<sub>vbr5</sub>]</tt> - Operand encodings that do have extra data are
emitted as their code, followed by the extra data.
</li>
</ol>

<p>The possible operand encodings are:</p>

<ul>
<li>1 - Fixed - The field should be emitted as a <a
href="#fixedwidth">fixed-width value</a>, whose width
is specified by the operand's extra data.</li>
<li>2 - VBR - The field should be emitted as a <a
href="#variablewidth">variable-width value</a>, whose width
is specified by the operand's extra data.</li>
<li>3 - Array - This field is an array of values. The array operand has no
extra data, but expects another operand to follow it which indicates the
element type of the array. When reading an array in an abbreviated record,
the first integer is a vbr6 that indicates the array length, followed by
the encoded elements of the array. An array may only occur as the last
operand of an abbreviation (except for the one final operand that gives
the array's type).</li>
<li>4 - Char6 - This field should be emitted as a <a href="#char6">char6-encoded
value</a>. This operand type takes no extra data.</li>
</ul>
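<p>To make the operand layout concrete, the following Python sketch emits a
DEFINE_ABBREV record into a list of stream bits. The tuple representation of
operands and the <tt>LITERAL</tt> tag are hypothetical conveniences of this
sketch. The encoding codes 1 through 4 are the ones listed above.</p>

<pre>
```python
DEFINE_ABBREV = 2
FIXED, VBR, ARRAY, CHAR6 = 1, 2, 3, 4
LITERAL = "literal"  # hypothetical tag for literal operands in this sketch


def emit_fixed(bits, value, width):
    # Low `width` bits of `value`, least significant bit first.
    for i in range(width):
        bits.append((value >> i) % 2)


def emit_vbr(bits, value, n):
    payload = 2 ** (n - 1)
    while value >= payload:
        emit_fixed(bits, value % payload + payload, n)
        value //= payload
    emit_fixed(bits, value, n)


def emit_define_abbrev(bits, abbrev_width, operands):
    """Append a DEFINE_ABBREV record.

    `operands` holds tuples: (LITERAL, value), (FIXED, width), (VBR, width),
    (ARRAY,) or (CHAR6,). An ARRAY is followed by the operand giving its element type.
    """
    emit_fixed(bits, DEFINE_ABBREV, abbrev_width)
    emit_vbr(bits, len(operands), 5)       # numabbrevops, vbr5
    for op in operands:
        if op[0] == LITERAL:
            emit_fixed(bits, 1, 1)         # literal flag
            emit_vbr(bits, op[1], 8)       # litvalue, vbr8
        else:
            emit_fixed(bits, 0, 1)         # encoding flag
            emit_fixed(bits, op[0], 3)     # encoding code
            if op[0] in (FIXED, VBR):      # only these carry extra data
                emit_vbr(bits, op[1], 5)
```
</pre>

<p>For example, defining the abbreviation <tt>[Fixed(4), Array, Char6]</tt>
with an abbrev ID width of 3 takes 3 + 5 + (1 + 3 + 5) + (1 + 3) + (1 + 3) = 25
bits.</p>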

<p>For example, target triples in LLVM modules are encoded as a record of the
form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>. Suppose the bitstream defined the
following abbreviation:</p>

<ul>
<li><tt>[0, Fixed, 4]</tt></li>
<li><tt>[0, Array]</tt></li>
<li><tt>[0, Char6]</tt></li>
</ul>

<p>When emitted with this abbreviation, the record above would be encoded
as:</p>

<p><tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>,
0<sub>6</sub>, 1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt></p>

<p>These values are:</p>

<ol>
<li>The first value, 4, is the abbreviation ID for this abbreviation.</li>
<li>The second value, 2, is the code for TRIPLE in LLVM IR files.</li>
<li>The third value, 4, is the length of the array.</li>
<li>The rest of the values are the char6 encoded values for "abcd".</li>
</ol>

<p>With this abbreviation, the triple is emitted with only 37 bits (assuming an
abbrev ID width of 3: 3 + 4 + 6 + 4 &times; 6 = 37). Without the abbreviation,
significantly more space would be required to emit the target triple. Also,
since the TRIPLE value is not emitted as a literal in the abbreviation, the
abbreviation can also be used for any other string value.
</p>
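<p>The following Python sketch emits a record through an abbreviation and
reproduces the 37-bit figure above. Record values are passed as integers, with
characters given as their ASCII codes as described under Data Records. The
operand tuples and the <tt>LITERAL</tt> tag are hypothetical conveniences of
this sketch, not LLVM's API.</p>

<pre>
```python
import string

FIXED, VBR, ARRAY, CHAR6 = 1, 2, 3, 4
LITERAL = "literal"  # hypothetical tag for literal operands in this sketch
CHAR6_ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._"


def emit_fixed(bits, value, width):
    for i in range(width):
        bits.append((value >> i) % 2)


def emit_vbr(bits, value, n):
    payload = 2 ** (n - 1)
    while value >= payload:
        emit_fixed(bits, value % payload + payload, n)
        value //= payload
    emit_fixed(bits, value, n)


def emit_scalar(bits, op, value):
    if op[0] == FIXED:
        emit_fixed(bits, value, op[1])
    elif op[0] == VBR:
        emit_vbr(bits, value, op[1])
    elif op[0] == CHAR6:
        emit_fixed(bits, CHAR6_ALPHABET.index(chr(value)), 6)  # value is an ASCII code
    else:
        raise ValueError(f"unsupported scalar operand {op!r}")


def emit_abbreviated_record(bits, abbrev_width, abbrev_id, ops, values):
    """Emit record `values` (code first) using the abbreviation operands `ops`."""
    emit_fixed(bits, abbrev_id, abbrev_width)
    i = 0
    for j, op in enumerate(ops):
        if op[0] == LITERAL:
            if values[i] != op[1]:
                raise ValueError("record does not match literal operand")
            i += 1                                # implied by the abbreviation; not emitted
        elif op[0] == ARRAY:
            rest = values[i:]
            emit_vbr(bits, len(rest), 6)          # array length, vbr6
            for v in rest:
                emit_scalar(bits, ops[j + 1], v)  # next operand is the element type
            return
        else:
            emit_scalar(bits, op, values[i])
            i += 1
    if i != len(values):
        raise ValueError("too many values for this abbreviation")
```
</pre>

<p>Replacing <tt>(FIXED, 4)</tt> with <tt>(LITERAL, 2)</tt> shrinks the same
record to 33 bits, but the abbreviation could then only be used for records
whose code is 2, which is the trade-off noted above.</p>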

</div>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="stdblocks">Standard Blocks</a>
</div>

<div class="doc_text">

<p>
In addition to the basic block structure and record encodings, the bitstream
also defines specific builtin block types. These block types specify how the
stream is to be decoded or carry other metadata. In the future, new standard
blocks may be added. Block IDs 0-7 are reserved for standard blocks.
</p>

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"><a name="BLOCKINFO">#0 - BLOCKINFO
Block</a></div>

<div class="doc_text">

<p>The BLOCKINFO block allows the description of metadata for other blocks. The
currently specified records are:</p>

<ul>
<li><tt>[SETBID (#1), blockid]</tt></li>
<li><tt>[DEFINE_ABBREV, ...]</tt></li>
</ul>

<p>
The SETBID record indicates which block ID is being described. SETBID
records can occur multiple times throughout the block to change which
block ID is being described. There must be a SETBID record prior to
any other records.
</p>

<p>
Standard DEFINE_ABBREV records can occur inside BLOCKINFO blocks, but unlike
their occurrence in normal blocks, the abbreviation is defined for blocks
matching the block ID we are describing, <i>not</i> the BLOCKINFO block itself.
The abbreviations defined in BLOCKINFO blocks receive abbreviation IDs
as described in <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a>.
</p>

<p>
Note that although the data in BLOCKINFO blocks is described as "metadata," the
abbreviations they contain are essential for parsing records from the
corresponding blocks. It is not safe to skip them.
</p>
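<p>A reader might process an already-decoded BLOCKINFO block roughly as in the
following Python sketch, which also shows the resulting ID assignment for a
block. Representing records as <tt>(kind, payload)</tt> pairs is an assumption
of this sketch, and the function names are hypothetical.</p>

<pre>
```python
def process_blockinfo(records):
    """Collect BLOCKINFO abbreviations per block ID.

    `records` is a sequence of already-decoded (kind, payload) pairs, such as
    ("SETBID", 12) or ("DEFINE_ABBREV", abbrev).
    """
    registry = {}
    current = None
    for kind, payload in records:
        if kind == "SETBID":
            current = payload  # subsequent records describe this block ID
            continue
        if current is None:
            raise ValueError(f"{kind} record before any SETBID")
        if kind == "DEFINE_ABBREV":
            registry.setdefault(current, []).append(payload)
        # Other record kinds are ignored in this sketch.
    return registry


def abbrev_ids_for_block(registry, block_id, local_abbrevs):
    """Assign IDs from 4 upward: BLOCKINFO abbreviations first, then local ones."""
    ordered = registry.get(block_id, []) + list(local_abbrevs)
    return {4 + k: abbrev for k, abbrev in enumerate(ordered)}
```
</pre>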

</div>

<!-- *********************************************************************** -->
<div class="doc_section"> <a name="wrapper">Bitcode Wrapper Format</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
structure. This structure contains a simple header that indicates the offset
and size of the embedded BC file. This allows additional information to be
stored alongside the BC file. The structure of this file header is:
</p>

<p><tt>[Magic<sub>32</sub>, Version<sub>32</sub>, Offset<sub>32</sub>,
Size<sub>32</sub>, CPUType<sub>32</sub>]</tt></p>

<p>Each of the fields is a 32-bit value stored in little-endian form (as with
the rest of the bitcode file fields). The Magic number is always
<tt>0x0B17C0DE</tt> and the version is currently always <tt>0</tt>. The Offset
field is the offset in bytes to the start of the bitcode stream in the file, and
the Size field is the size in bytes of the stream. CPUType is a target-specific
value that can be used to encode the CPU of the target.</p>
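<p>The following Python sketch parses this header and extracts the embedded
bitcode. Because the fields are little endian, the magic number appears on disk
as the bytes DE C0 17 0B. The function name and the returned dictionary keys
are hypothetical.</p>

<pre>
```python
WRAPPER_MAGIC = 0x0B17C0DE


def parse_wrapper(data):
    """Split a wrapped bitcode file into its header fields and the embedded stream."""
    header = data[:20]
    if len(header) != 20:
        raise ValueError("file too short for a wrapper header")
    # Five little-endian 32-bit fields: Magic, Version, Offset, Size, CPUType.
    magic, version, offset, size, cpu_type = (
        int.from_bytes(header[i:i + 4], "little") for i in range(0, 20, 4)
    )
    if magic != WRAPPER_MAGIC:
        raise ValueError("missing wrapper magic 0x0B17C0DE")
    bitcode = data[offset:offset + size]
    if len(bitcode) != size:
        raise ValueError("Offset/Size run past the end of the file")
    return {"version": version, "cpu_type": cpu_type, "bitcode": bitcode}
```
</pre>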
</div>


<!-- *********************************************************************** -->
<div class="doc_section"> <a name="llvmir">LLVM IR Encoding</a></div>
<!-- *********************************************************************** -->

<div class="doc_text">

<p>LLVM IR is encoded into a bitstream by defining blocks and records. Blocks
are used for things like constant pools, functions, and symbol tables. Records
are used for things like instructions, global variable descriptors, and type
descriptions. This document does not describe the set of abbreviations that
the writer uses. Abbreviations are fully self-described within the file, and
a reader must not build in any knowledge of them.</p>

</div>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="basics">Basics</a>
</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"><a name="ir_magic">LLVM IR Magic Number</a></div>

<div class="doc_text">

<p>
The magic number for LLVM IR files is:
</p>

<p><tt>[0x0<sub>4</sub>, 0xC<sub>4</sub>, 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt></p>

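<p>The bitstream fills each byte starting from its least significant bit. The
nibbles 0x0 and 0xC therefore form the byte <tt>0xC0</tt>, and 0xE and 0xD
form <tt>0xDE</tt>. Combined with the bitcode magic number and viewed as
bytes, an LLVM IR file therefore begins with "BC" followed by
<tt>0xC0 0xDE</tt>.</p>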
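<p>The least-significant-bit-first packing can be checked with a tiny bit
writer. This is a minimal sketch: <tt>bit_writer</tt>, <tt>emit_bits</tt>
and <tt>emit_ir_magic</tt> are hypothetical names, not LLVM APIs. The sketch
emits 'B' and 'C' as 8-bit fields and then the four 4-bit fields listed
above. The resulting first four bytes are 0x42, 0x43, 0xC0, 0xDE.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit writer; callers must zero-initialize it. */
typedef struct {
  unsigned char bytes[16];
  size_t bit_pos;                 /* number of bits written so far */
} bit_writer;

/* Append the low `width` bits of `value` (width <= 32), least significant
   bit first, filling each byte from its bit 0 upward. */
static void emit_bits(bit_writer *w, uint32_t value, unsigned width) {
  for (unsigned i = 0; i < width; ++i, ++w->bit_pos) {
    if (value & (1u << i))
      w->bytes[w->bit_pos / 8] |= (unsigned char)(1u << (w->bit_pos % 8));
  }
}

/* Emit the bitcode magic ('B', 'C') followed by the LLVM IR nibbles. */
static void emit_ir_magic(bit_writer *w) {
  emit_bits(w, 'B', 8);
  emit_bits(w, 'C', 8);
  emit_bits(w, 0x0, 4);
  emit_bits(w, 0xC, 4);
  emit_bits(w, 0xE, 4);
  emit_bits(w, 0xD, 4);
}
```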

</div>

<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"><a name="ir_signed_vbr">Signed VBRs</a></div>

<div class="doc_text">

<p>
<a href="#variablewidth">Variable Width Integers</a> are an efficient way to
encode arbitrarily sized unsigned values, but an extremely inefficient way to
encode signed values. A negative value in two's-complement form would
otherwise be treated as a maximally large unsigned value and need the largest
possible number of chunks.</p>

<p>Instead, signed VBR values of a specific width are emitted as follows:</p>

<ul>
<li>Non-negative values are emitted as VBRs of the specified width, with
their value shifted left by one (so the low bit is clear).</li>
<li>Negative values are emitted as VBRs of the specified width, with the
negated value shifted left by one and the low bit set.</li>
</ul>
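<p>The mapping in the list above can be sketched in C as follows. The
function names are hypothetical, and 64-bit values are an assumption of
this sketch. Negating <tt>INT64_MIN</tt> would overflow a signed integer,
so the sketch negates in unsigned arithmetic instead. As a result,
<tt>INT64_MIN</tt> encodes as 1, an otherwise unused "negative zero", and
the decoder reads 1 back as <tt>INT64_MIN</tt>. That edge-case convention
is the sketch's own choice; this document does not specify it.</p>

```c
#include <stdint.h>

/* Map a signed value to the unsigned value that is emitted as a VBR:
   v >= 0 becomes v << 1, v < 0 becomes (-v << 1) | 1. Unsigned
   arithmetic keeps INT64_MIN well defined (it maps to 1). */
static uint64_t encode_signed_vbr_value(int64_t v) {
  uint64_t u = (uint64_t)v;
  if (v >= 0)
    return u << 1;
  return ((0 - u) << 1) | 1;     /* 0 - u is the magnitude of v */
}

/* Inverse mapping; the "negative zero" encoding 1 decodes to INT64_MIN. */
static int64_t decode_signed_vbr_value(uint64_t u) {
  if ((u & 1) == 0)
    return (int64_t)(u >> 1);
  if (u != 1)
    return -(int64_t)(u >> 1);
  return INT64_MIN;
}
```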

<p>With this encoding the sign moves into the low bit, so small positive and
small negative values both map to small unsigned values and can be emitted
efficiently.</p>
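<p>A rough size comparison shows the benefit. Each VBR chunk of width
<i>n</i> carries <i>n</i>-1 payload bits plus a continuation bit. This sketch
counts the bits a value costs as a VBR. It assumes 64-bit values and widths
from 2 to 32, and <tt>vbr_bits</tt> and <tt>signed_vbr_value</tt> are
hypothetical helper names. With a VBR6, -1 reinterpreted as an unsigned
64-bit value needs 13 chunks (78 bits), while its signed VBR mapping, 3, fits
in a single 6-bit chunk.</p>

```c
#include <stdint.h>

/* Bits needed to emit `value` as a VBR of `width` (2 <= width <= 32
   assumed). Even zero needs one chunk. */
static unsigned vbr_bits(uint64_t value, unsigned width) {
  unsigned chunks = 1;
  while ((value >>= (width - 1)) != 0)
    ++chunks;
  return chunks * width;
}

/* Signed VBR mapping from the list above, using unsigned negation. */
static uint64_t signed_vbr_value(int64_t v) {
  uint64_t u = (uint64_t)v;
  return v >= 0 ? u << 1 : ((0 - u) << 1) | 1;
}
```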

</div>


<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection"><a name="ir_blocks">LLVM IR Blocks</a></div>

<div class="doc_text">

<p>
LLVM IR is defined with the following blocks:
</p>

<ul>
<li>8 - MODULE_BLOCK - This is the top-level block that contains the
entire module, and describes a variety of per-module information.</li>
<li>9 - PARAMATTR_BLOCK - This enumerates the parameter attributes.</li>
<li>10 - TYPE_BLOCK - This describes all of the types in the module.</li>
<li>11 - CONSTANTS_BLOCK - This describes constants for a module or
function.</li>
<li>12 - FUNCTION_BLOCK - This describes a function body.</li>
<li>13 - TYPE_SYMTAB_BLOCK - This describes the type symbol table.</li>
<li>14 - VALUE_SYMTAB_BLOCK - This describes a value symbol table.</li>
</ul>
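<p>A reader typically dispatches on these block IDs. The enum below mirrors
the numbering in the list. The enumerator names, the enum tag and the
<tt>ir_block_name</tt> lookup are illustrative choices for this sketch, not
a quotation of LLVM's headers.</p>

```c
#include <stddef.h>

/* Block IDs as listed above; names are illustrative. */
enum llvm_ir_block_id {
  MODULE_BLOCK_ID       = 8,
  PARAMATTR_BLOCK_ID    = 9,
  TYPE_BLOCK_ID         = 10,
  CONSTANTS_BLOCK_ID    = 11,
  FUNCTION_BLOCK_ID     = 12,
  TYPE_SYMTAB_BLOCK_ID  = 13,
  VALUE_SYMTAB_BLOCK_ID = 14
};

/* Return a printable name for an LLVM IR block ID, or NULL for IDs that
   are not in the list above. */
static const char *ir_block_name(unsigned id) {
  switch (id) {
  case MODULE_BLOCK_ID:       return "MODULE_BLOCK";
  case PARAMATTR_BLOCK_ID:    return "PARAMATTR_BLOCK";
  case TYPE_BLOCK_ID:         return "TYPE_BLOCK";
  case CONSTANTS_BLOCK_ID:    return "CONSTANTS_BLOCK";
  case FUNCTION_BLOCK_ID:     return "FUNCTION_BLOCK";
  case TYPE_SYMTAB_BLOCK_ID:  return "TYPE_SYMTAB_BLOCK";
  case VALUE_SYMTAB_BLOCK_ID: return "VALUE_SYMTAB_BLOCK";
  default:                    return NULL;
  }
}
```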

</div>

<!-- ======================================================================= -->
<div class="doc_subsection"><a name="MODULE_BLOCK">MODULE_BLOCK Contents</a>
</div>

<div class="doc_text">

<p>
</p>

</div>


<!-- *********************************************************************** -->
<hr>
<address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
<a href="http://validator.w3.org/check/referer"><img
src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
Last modified: $Date$
</address>
</body>
</html>