Blame - docs/BitCodeFormat.html - platform/external/llvm

Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	3	<html>
				4	<head>
				5	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				6	<title>LLVM Bitcode File Format</title>
				7	<link rel="stylesheet" href="llvm.css" type="text/css">
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	8	</head>
				9	<body>
				10	<div class="doc_title"> LLVM Bitcode File Format </div>
				11	<ol>
				12	<li><a href="#abstract">Abstract</a></li>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	13	<li><a href="#overview">Overview</a></li>
				14	<li><a href="#bitstream">Bitstream Format</a>
				15	<ol>
				16	<li><a href="#magic">Magic Numbers</a></li>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	17	<li><a href="#primitives">Primitives</a></li>
				18	<li><a href="#abbrevid">Abbreviation IDs</a></li>
				19	<li><a href="#blocks">Blocks</a></li>
				20	<li><a href="#datarecord">Data Records</a></li>
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	21	<li><a href="#abbreviations">Abbreviations</a></li>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	22	<li><a href="#stdblocks">Standard Blocks</a></li>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	23	</ol>
				24	</li>
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	25	<li><a href="#wrapper">Bitcode Wrapper Format</a>
				26	</li>
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	27	<li><a href="#llvmir">LLVM IR Encoding</a>
				28	<ol>
				29	<li><a href="#basics">Basics</a></li>
				30	</ol>
				31	</li>
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	32	</ol>
				33	<div class="doc_author">
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	34	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
				35	and <a href="http://www.reverberate.org">Joshua Haberman</a>.
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	36	</p>
				37	</div>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	38
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	39	<!-- *********************************************************************** -->
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	40	<div class="doc_section"> <a name="abstract">Abstract</a></div>
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	41	<!-- *********************************************************************** -->
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	42
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	43	<div class="doc_text">
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	44
				45	<p>This document describes the LLVM bitstream file format and the encoding of
				46	the LLVM IR into it.</p>
				47
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	48	</div>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	49
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	50	<!-- *********************************************************************** -->
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	51	<div class="doc_section"> <a name="overview">Overview</a></div>
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	52	<!-- *********************************************************************** -->
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	53
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	54	<div class="doc_text">
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	55
				56	<p>
				57	What is commonly known as the LLVM bitcode file format (also, sometimes
				58	anachronistically known as bytecode) is actually two things: a <a
				59	href="#bitstream">bitstream container format</a>
				60	and an <a href="#llvmir">encoding of LLVM IR</a> into the container format.</p>
				61
				62	<p>
Reid Spencer	58d0547	2007-05-12 08:01:52 +0000	[diff] [blame]	63	The bitstream format is an abstract encoding of structured data, very
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	64	similar to XML in some ways. Like XML, bitstream files contain tags, and nested
				65	structures, and you can parse the file without having to understand the tags.
				66	Unlike XML, the bitstream format is a binary encoding, and unlike XML it
				67	provides a mechanism for the file to self-describe "abbreviations", which are
				68	effectively size optimizations for the content.</p>
				69
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	70	<p>LLVM IR files may be optionally embedded into a <a
				71	href="#wrapper">wrapper</a> structure that makes it easy to embed extra data
				72	along with LLVM IR files.</p>
				73
				74	<p>This document first describes the LLVM bitstream format, then describes the
				75	wrapper format, and finally describes the record structure used by LLVM IR files.
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	76	</p>
				77
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	78	</div>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	79
				80	<!-- *********************************************************************** -->
				81	<div class="doc_section"> <a name="bitstream">Bitstream Format</a></div>
				82	<!-- *********************************************************************** -->
				83
				84	<div class="doc_text">
				85
				86	<p>
				87	The bitstream format is literally a stream of bits, with a very simple
				88	structure. This structure consists of the following concepts:
				89	</p>
				90
				91	<ul>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	92	<li>A "<a href="#magic">magic number</a>" that identifies the contents of
				93	the stream.</li>
				94	<li>Encoding <a href="#primitives">primitives</a> like variable bit-rate
				95	integers.</li>
				96	<li><a href="#blocks">Blocks</a>, which define nested content.</li>
				97	<li><a href="#datarecord">Data Records</a>, which describe entities within the
				98	file.</li>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	99	<li>Abbreviations, which specify compression optimizations for the file.</li>
				100	</ul>
				101
				102	<p>Note that the <a
				103	href="CommandGuide/html/llvm-bcanalyzer.html">llvm-bcanalyzer</a> tool can be
				104	used to dump and inspect arbitrary bitstreams, which is very useful for
				105	understanding the encoding.</p>
				106
				107	</div>
				108
				109	<!-- ======================================================================= -->
				110	<div class="doc_subsection"><a name="magic">Magic Numbers</a>
				111	</div>
				112
				113	<div class="doc_text">
				114
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	115	<p>The first two bytes of a bitcode file are 'BC' (0x42, 0x43).
				116	The second two bytes are an application-specific magic number. Generic
				117	bitcode tools can look at only the first two bytes to verify the file is
				118	bitcode, while application-specific programs will want to look at all four.</p>
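<p>The magic-number check can be sketched in a few lines. This is only an
illustration: the function names <tt>is_bitcode</tt> and <tt>app_magic</tt>
are hypothetical and are not part of any LLVM API.</p>

```python
def is_bitcode(data: bytes) -> bool:
    """Generic check: the stream must start with 'B' (0x42), 'C' (0x43)."""
    return len(data) >= 2 and data[0] == 0x42 and data[1] == 0x43


def app_magic(data: bytes) -> bytes:
    """Return the two application-specific magic bytes that follow 'BC'."""
    if len(data) < 4 or not is_bitcode(data):
        raise ValueError("not a bitcode stream, or shorter than four bytes")
    return data[2:4]
```

<p>A generic tool would stop after <tt>is_bitcode</tt>; an
application-specific reader would also compare the result of
<tt>app_magic</tt> against the value it expects.</p>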
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	119
				120	</div>
				121
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	122	<!-- ======================================================================= -->
				123	<div class="doc_subsection"><a name="primitives">Primitives</a>
				124	</div>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	125
				126	<div class="doc_text">
				127
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	128	<p>
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	129	A bitstream literally consists of a stream of bits, which are read in order
				130	starting with the least significant bit of each byte. The stream is made up of a
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	131	number of primitive values that encode a stream of unsigned integer values.
				132	These
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	133	integers are encoded in two ways: either as <a href="#fixedwidth">Fixed
				134	Width Integers</a> or as <a href="#variablewidth">Variable Width
				135	Integers</a>.
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	136	</p>
				137
				138	</div>
				139
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	140	<!-- _______________________________________________________________________ -->
				141	<div class="doc_subsubsection"> <a name="fixedwidth">Fixed Width Integers</a>
				142	</div>
				143
				144	<div class="doc_text">
				145
				146	<p>Fixed-width integer values have their low bits emitted directly to the file.
				147	For example, a 3-bit integer value encodes 1 as 001. Fixed-width integers
				148	are used when there is a well-known number of options for a field. For
				149	example, boolean values are usually encoded with a 1-bit wide integer.
				150	</p>
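<p>A minimal sketch of reading fixed-width fields, assuming that bits are
consumed starting at the least significant bit of each byte and that the first
bit read is the low bit of the value. The <tt>BitReader</tt> class is a
hypothetical helper written for this illustration, not LLVM's reader.</p>

```python
class BitReader:
    """Reads bits from a byte buffer, least significant bit of each byte first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in the stream, counted in bits

    def read_fixed(self, width: int) -> int:
        """Read a fixed-width unsigned integer of `width` bits."""
        value = 0
        for i in range(width):
            byte = self.data[self.pos >> 3]       # byte holding the next bit
            bit = (byte >> (self.pos & 7)) & 1    # pick the bit, LSB first
            value |= bit << i                     # first bit read is the low bit
            self.pos += 1
        return value
```

<p>For example, a buffer whose first byte is <tt>0b00001001</tt> yields 1 when
read as a 3-bit field, and then 1 again when the next bit is read as a 1-bit
boolean.</p>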
				151
				152	</div>
				153
				154	<!-- _______________________________________________________________________ -->
				155	<div class="doc_subsubsection"> <a name="variablewidth">Variable Width
				156	Integers</a></div>
				157
				158	<div class="doc_text">
				159
				160	<p>Variable-width integer (VBR) values encode values of arbitrary size,
				161	optimizing for the case where the values are small. Given a 4-bit VBR field,
				162	any 3-bit value (0 through 7) is encoded directly, with the high bit set to
				163	zero. In an N-bit VBR field, values that need more than N-1 bits are emitted as a series of N-1 bit
				164	chunks, low-order chunk first, where every chunk but the last sets the high bit.</p>
				165
				166	<p>For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a
				167	vbr4 value. The first set of four bits indicates the value 3 (011) with a
				168	continuation piece (indicated by a high bit of 1). The next word indicates a
				169	value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value
				170	27.
				171	</p>
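<p>The chunking rule can be sketched as a pair of functions. The names
<tt>encode_vbr</tt> and <tt>decode_vbr</tt> are hypothetical, and each chunk is
returned as a small integer rather than written to a bit stream.</p>

```python
from typing import List


def encode_vbr(value: int, width: int) -> List[int]:
    """Split `value` into `width`-bit VBR chunks, low-order chunk first."""
    if value < 0 or width < 2:
        raise ValueError("value must be unsigned and width at least 2")
    data_bits = width - 1
    high_bit = 1 << data_bits            # continuation flag
    chunks = []
    while value >= high_bit:
        chunks.append((value & (high_bit - 1)) | high_bit)
        value >>= data_bits
    chunks.append(value)                 # last chunk: continuation flag clear
    return chunks


def decode_vbr(chunks: List[int], width: int) -> int:
    """Rebuild the value from chunks produced by encode_vbr."""
    data_bits = width - 1
    high_bit = 1 << data_bits
    value = 0
    shift = 0
    for chunk in chunks:
        value |= (chunk & (high_bit - 1)) << shift
        if not chunk & high_bit:
            return value
        shift += data_bits
    raise ValueError("VBR ended with the continuation bit still set")
```

<p>With these definitions, <tt>encode_vbr(27, 4)</tt> produces the two chunks
<tt>0b1011</tt> and <tt>0b0011</tt> from the example above.</p>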
				172
				173	</div>
				174
				175	<!-- _______________________________________________________________________ -->
				176	<div class="doc_subsubsection"> <a name="char6">6-bit characters</a></div>
				177
				178	<div class="doc_text">
				179
				180	<p>6-bit characters encode common characters into a fixed 6-bit field. They
Chris Lattner	f1d64e9	2007-05-12 07:50:14 +0000	[diff] [blame]	181	represent the following characters with the following 6-bit values:</p>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	182
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	183	<div class="doc_code">
				184	<pre>
				185	'a' .. 'z' — 0 .. 25
				186	'A' .. 'Z' — 26 .. 51
				187	'0' .. '9' — 52 .. 61
				188	'.' — 62
				189	'_' — 63
				190	</pre>
				191	</div>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	192
				193	<p>This encoding is only suitable for encoding characters and strings that
				194	consist only of the above characters. It is completely incapable of encoding
				195	characters not in the set.</p>
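<p>The table maps directly onto a 64-character lookup string. The following
sketch uses hypothetical names (<tt>CHAR6_ALPHABET</tt>,
<tt>encode_char6</tt>, <tt>decode_char6</tt>, <tt>is_char6_string</tt>).</p>

```python
# Position in this string is the 6-bit value of the character.
CHAR6_ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"   # 0 .. 25
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # 26 .. 51
    "0123456789"                   # 52 .. 61
    "._"                           # 62, 63
)


def encode_char6(ch: str) -> int:
    """Return the 6-bit value for a single character, or raise ValueError."""
    if len(ch) != 1 or ch not in CHAR6_ALPHABET:
        raise ValueError("character cannot be represented as char6")
    return CHAR6_ALPHABET.index(ch)


def decode_char6(value: int) -> str:
    """Return the character for a 6-bit value."""
    if not 0 <= value < 64:
        raise ValueError("char6 values are in the range 0..63")
    return CHAR6_ALPHABET[value]


def is_char6_string(text: str) -> bool:
    """True if every character of `text` can be encoded as char6."""
    return all(ch in CHAR6_ALPHABET for ch in text)
```

<p>A writer could call <tt>is_char6_string</tt> to decide whether a string
qualifies for this encoding; a string containing, say, a hyphen does not.</p>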
				196
				197	</div>
				198
				199	<!-- _______________________________________________________________________ -->
				200	<div class="doc_subsubsection"> <a name="wordalign">Word Alignment</a></div>
				201
				202	<div class="doc_text">
				203
				204	<p>Occasionally, it is useful to emit zero bits until the bitstream is a
				205	multiple of 32 bits. This ensures that the bit position in the stream can be
				206	represented as a multiple of 32-bit words.</p>
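<p>The number of zero bits to emit follows from the current bit position. A
small sketch, with the hypothetical name <tt>padding_to_word</tt>:</p>

```python
def padding_to_word(bit_pos: int) -> int:
    """Number of zero bits needed to reach the next multiple of 32 bits."""
    if bit_pos < 0:
        raise ValueError("bit position must be non-negative")
    return (-bit_pos) % 32
```

<p>A position that is already a multiple of 32 needs no padding; a position of
14 bits needs 18.</p>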
				207
				208	</div>
				209
				210
				211	<!-- ======================================================================= -->
				212	<div class="doc_subsection"><a name="abbrevid">Abbreviation IDs</a>
				213	</div>
				214
				215	<div class="doc_text">
				216
				217	<p>
				218	A bitstream is a sequential series of <a href="#blocks">Blocks</a> and
				219	<a href="#datarecord">Data Records</a>. Both of these start with an
				220	abbreviation ID encoded as a fixed-bitwidth field. The width is specified by
				221	the current block, as described below. The value of the abbreviation ID
				222	specifies either a builtin ID (which has a special meaning, defined below) or
				223	one of the abbreviation IDs defined by the stream itself.
				224	</p>
				225
				226	<p>
				227	The set of builtin abbrev IDs is:
				228	</p>
				229
				230	<ul>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	231	<li><tt>0 - <a href="#END_BLOCK">END_BLOCK</a></tt> — This abbrev ID marks
				232	the end of the current block.</li>
				233	<li><tt>1 - <a href="#ENTER_SUBBLOCK">ENTER_SUBBLOCK</a></tt> — This
				234	abbrev ID marks the beginning of a new block.</li>
				235	<li><tt>2 - <a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt> — This defines
				236	a new abbreviation.</li>
				237	<li><tt>3 - <a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> — This ID
				238	specifies the definition of an unabbreviated record.</li>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	239	</ul>
				240
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	241	<p>Abbreviation IDs 4 and above are defined by the stream itself, and specify
				242	an <a href="#abbrev_records">abbreviated record encoding</a>.</p>
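<p>In a reader, this split between builtin and stream-defined IDs might be
represented as follows. The enum and the <tt>classify_abbrev_id</tt> function
are hypothetical names chosen for this sketch.</p>

```python
from enum import IntEnum


class BuiltinAbbrevID(IntEnum):
    END_BLOCK = 0
    ENTER_SUBBLOCK = 1
    DEFINE_ABBREV = 2
    UNABBREV_RECORD = 3


# IDs from this value upward refer to abbreviations defined by the stream.
FIRST_STREAM_ABBREV_ID = 4


def classify_abbrev_id(abbrev_id: int) -> str:
    """Name the builtin ID, or report that the ID is stream-defined."""
    if abbrev_id < 0:
        raise ValueError("abbreviation IDs are unsigned")
    if abbrev_id < FIRST_STREAM_ABBREV_ID:
        return BuiltinAbbrevID(abbrev_id).name
    return "stream-defined"
```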
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	243
				244	</div>
				245
				246	<!-- ======================================================================= -->
				247	<div class="doc_subsection"><a name="blocks">Blocks</a>
				248	</div>
				249
				250	<div class="doc_text">
				251
				252	<p>
				253	Blocks in a bitstream denote nested regions of the stream, and are identified by
				254	a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	255	function bodies). Block IDs 0-7 are reserved for <a href="#stdblocks">standard blocks</a>
				256	whose meaning is defined by Bitcode; block IDs 8 and greater are
				257	application specific. Nested blocks capture the hierarchical structure of the data
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	258	encoded in the stream, and various properties are associated with blocks as the file is
				259	parsed. Block definitions allow the reader to skip blocks
				260	in constant time, whether it wants only a summary of blocks or wants to
				261	skip data it does not understand. The LLVM IR reader uses this
				262	mechanism to skip function bodies, lazily reading them on demand.
				263	</p>
				264
				265	<p>
				266	When reading and encoding the stream, several properties are maintained for the
				267	block. In particular, each block maintains:
				268	</p>
				269
				270	<ol>
				271	<li>A current abbrev id width. This value starts at 2, and is set every time a
				272	block record is entered. The block entry specifies the abbrev id width for
				273	the body of the block.</li>
				274
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	275	<li>A set of abbreviations. Abbreviations may be defined within a block, in
				276	which case they are only defined in that block (neither subblocks nor
				277	enclosing blocks see the abbreviation). Abbreviations can also be defined
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	278	inside a <tt><a href="#BLOCKINFO">BLOCKINFO</a></tt> block, in which case
				279	they are defined in all blocks that match the ID that the BLOCKINFO block is
				280	describing.
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	281	</li>
				282	</ol>
				283
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	284	<p>
				285	As sub-blocks are entered, these properties are saved and the new sub-block has
				286	its own set of abbreviations, and its own abbrev id width. When a sub-block is
				287	popped, the saved values are restored.
				288	</p>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	289
				290	</div>
				291
				292	<!-- _______________________________________________________________________ -->
				293	<div class="doc_subsubsection"> <a name="ENTER_SUBBLOCK">ENTER_SUBBLOCK
				294	Encoding</a></div>
				295
				296	<div class="doc_text">
				297
				298	<p><tt>[ENTER_SUBBLOCK, blockid<sub>vbr8</sub>, newabbrevlen<sub>vbr4</sub>,
				299	<align32bits>, blocklen<sub>32</sub>]</tt></p>
				300
				301	<p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	302	The <tt>ENTER_SUBBLOCK</tt> abbreviation ID specifies the start of a new block
				303	record. The <tt>blockid</tt> value is encoded as an 8-bit VBR identifier, and
				304	indicates the type of block being entered, which can be
				305	a <a href="#stdblocks">standard block</a> or an application-specific block.
				306	The <tt>newabbrevlen</tt> value is a 4-bit VBR, which specifies the abbrev id
				307	width for the sub-block. The <tt>blocklen</tt> value is a 32-bit aligned value
				308	that specifies the size of the subblock in 32-bit words. This value allows the
				309	reader to skip over the entire block in one jump.
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	310	</p>
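<p>The following sketch reads the fields of an <tt>ENTER_SUBBLOCK</tt> record
and then skips the block body. It assumes the abbreviation ID itself has
already been consumed, and that bits are read least significant bit first. The
<tt>BitReader</tt> class and both functions are hypothetical helpers, not
LLVM's implementation.</p>

```python
class BitReader:
    """Reads bits from a byte buffer, least significant bit of each byte first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read_fixed(self, width: int) -> int:
        value = 0
        for i in range(width):
            bit = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value

    def read_vbr(self, width: int) -> int:
        data_bits = width - 1
        high_bit = 1 << data_bits
        value = 0
        shift = 0
        while True:
            chunk = self.read_fixed(width)
            value |= (chunk & (high_bit - 1)) << shift
            if not chunk & high_bit:
                return value
            shift += data_bits

    def align32(self) -> None:
        """Advance to the next 32-bit boundary (no-op if already aligned)."""
        self.pos += (-self.pos) % 32


def read_enter_subblock(reader: BitReader):
    """Read [blockid vbr8, newabbrevlen vbr4, <align32bits>, blocklen 32]."""
    block_id = reader.read_vbr(8)
    new_abbrev_width = reader.read_vbr(4)
    reader.align32()
    block_len_words = reader.read_fixed(32)
    return block_id, new_abbrev_width, block_len_words


def skip_block(reader: BitReader, block_len_words: int) -> None:
    """Jump over the block body without parsing it."""
    reader.pos += block_len_words * 32
```

<p>Because <tt>blocklen</tt> is stored right after the alignment point, a
reader that does not care about a block can call <tt>skip_block</tt>
immediately after <tt>read_enter_subblock</tt>.</p>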
				311
				312	</div>
				313
				314	<!-- _______________________________________________________________________ -->
				315	<div class="doc_subsubsection"> <a name="END_BLOCK">END_BLOCK
				316	Encoding</a></div>
				317
				318	<div class="doc_text">
				319
				320	<p><tt>[END_BLOCK, <align32bits>]</tt></p>
				321
				322	<p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	323	The <tt>END_BLOCK</tt> abbreviation ID specifies the end of the current block
				324	record. Its end is aligned to 32 bits to ensure that the size of the block is
				325	a multiple of 32 bits.
				326	</p>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	327
				328	</div>
				329
				330
				331
				332	<!-- ======================================================================= -->
				333	<div class="doc_subsection"><a name="datarecord">Data Records</a>
				334	</div>
				335
				336	<div class="doc_text">
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	337	<p>
				338	Data records consist of a record code and a number of (up to) 64-bit integer
				339	values. The interpretation of the code and values is application specific and
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	340	there are multiple different ways to encode a record (with an unabbrev record or
				341	with an abbreviation). In the LLVM IR format, for example, there is a record
				342	which encodes the target triple of a module. The code is
				343	<tt>MODULE_CODE_TRIPLE</tt>, and the values of the record are the ASCII codes
				344	for the characters in the string.
				345	</p>
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	346
				347	</div>
				348
				349	<!-- _______________________________________________________________________ -->
				350	<div class="doc_subsubsection"> <a name="UNABBREV_RECORD">UNABBREV_RECORD
				351	Encoding</a></div>
				352
				353	<div class="doc_text">
				354
				355	<p><tt>[UNABBREV_RECORD, code<sub>vbr6</sub>, numops<sub>vbr6</sub>,
				356	op0<sub>vbr6</sub>, op1<sub>vbr6</sub>, ...]</tt></p>
				357
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	358	<p>
				359	An <tt>UNABBREV_RECORD</tt> provides a default fallback encoding, which is both
				360	completely general and extremely inefficient. It can describe an arbitrary
				361	record by emitting the code and operands as vbrs.
				362	</p>
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	363
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	364	<p>
				365	For example, emitting an LLVM IR target triple as an unabbreviated record
				366	requires emitting the <tt>UNABBREV_RECORD</tt> abbrevid, a vbr6 for the
				367	<tt>MODULE_CODE_TRIPLE</tt> code, a vbr6 for the length of the string, which is
				368	equal to the number of operands, and a vbr6 for each character. Because there
				369	are no letters with values less than 32, each letter would need to be emitted as
				370	at least a two-part VBR, which means that each letter would require at least 12
				371	bits. This is not an efficient encoding, but it is fully general.
				372	</p>
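<p>The cost estimate above can be checked with a small calculation. The
function names are hypothetical; the record code 2 and the abbrev id width 3
used in the usage note are illustrative inputs.</p>

```python
def vbr_bits(value: int, width: int) -> int:
    """Number of bits a value occupies when emitted as a `width`-bit VBR."""
    data_bits = width - 1
    chunks = 1
    while value >= (1 << data_bits):
        value >>= data_bits
        chunks += 1
    return chunks * width


def unabbrev_record_bits(abbrev_width: int, code: int, operands) -> int:
    """Size of [UNABBREV_RECORD, code vbr6, numops vbr6, op vbr6, ...]."""
    operands = list(operands)
    total = abbrev_width                     # the UNABBREV_RECORD abbrev ID
    total += vbr_bits(code, 6)               # record code
    total += vbr_bits(len(operands), 6)      # operand count
    total += sum(vbr_bits(op, 6) for op in operands)
    return total
```

<p>For a four-letter string such as <tt>"abcd"</tt>, each ASCII code is at
least 32 and therefore takes 12 bits, so with an abbrev id width of 3 and a
record code of 2 the whole record takes 3 + 6 + 6 + 48 = 63 bits.</p>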
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	373
				374	</div>
				375
				376	<!-- _______________________________________________________________________ -->
				377	<div class="doc_subsubsection"> <a name="abbrev_records">Abbreviated Record
				378	Encoding</a></div>
				379
				380	<div class="doc_text">
				381
				382	<p><tt>[<abbrevid>, fields...]</tt></p>
				383
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	384	<p>
				385	An abbreviated record is an abbreviation ID followed by a set of fields that are
				386	encoded according to the <a href="#abbreviations">abbreviation definition</a>.
				387	This allows records to be encoded significantly more densely than records
				388	encoded with the <tt><a href="#UNABBREV_RECORD">UNABBREV_RECORD</a></tt> type,
				389	and allows the abbreviation types to be specified in the stream itself, which
				390	allows the files to be completely self-describing. The actual encoding of
				391	abbreviations is defined below.
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	392	</p>
				393
				394	</div>
				395
				396	<!-- ======================================================================= -->
				397	<div class="doc_subsection"><a name="abbreviations">Abbreviations</a>
				398	</div>
				399
				400	<div class="doc_text">
				401	<p>
				402	Abbreviations are an important form of compression for bitstreams. The idea is
				403	to specify a dense encoding for a class of records once, then use that encoding
				404	to emit many records. It takes space to emit the encoding into the file, but
				405	the space is recouped (hopefully plus some) when the records that use it are
				406	emitted.
				407	</p>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	408
				409	<p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	410	Abbreviations can be determined dynamically per client, per file. Because the
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	411	abbreviations are stored in the bitstream itself, different streams of the same
				412	format can contain different sets of abbreviations, according to the needs of
				413	the specific stream. As a concrete example, LLVM IR files usually emit an abbreviation
				414	for binary operators. If a specific LLVM module contains few or no binary
				415	operators, the abbreviation does not need to be emitted.
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	416	</p>
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	417	</div>
				418
				419	<!-- _______________________________________________________________________ -->
				420	<div class="doc_subsubsection"><a name="DEFINE_ABBREV">DEFINE_ABBREV
				421	Encoding</a></div>
				422
				423	<div class="doc_text">
				424
				425	<p><tt>[DEFINE_ABBREV, numabbrevops<sub>vbr5</sub>, abbrevop0, abbrevop1,
				426	...]</tt></p>
				427
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	428	<p>
				429	A <tt>DEFINE_ABBREV</tt> record adds an abbreviation to the list of currently
				430	defined abbreviations in the scope of this block. This definition only exists
				431	inside this immediate block — it is not visible in subblocks or enclosing
				432	blocks. Abbreviations are implicitly assigned IDs sequentially starting from 4
				433	(the first application-defined abbreviation ID). Any abbreviations defined in a
				434	<tt>BLOCKINFO</tt> record receive IDs first, in order, followed by any
				435	abbreviations defined within the block itself. Abbreviated data records
				436	reference this ID to indicate what abbreviation they are invoking.
				437	</p>
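<p>The ID assignment order can be sketched as follows. The function name
<tt>assign_abbrev_ids</tt> is hypothetical, and abbreviations are represented
here by arbitrary labels rather than real definitions.</p>

```python
def assign_abbrev_ids(blockinfo_abbrevs, local_abbrevs):
    """Map abbreviation IDs to abbreviations for one block.

    Abbreviations registered for this block ID through BLOCKINFO come first,
    in order, followed by those defined inside the block itself. Numbering
    starts at 4 because IDs 0 through 3 are builtin.
    """
    ids = {}
    next_id = 4
    for abbrev in list(blockinfo_abbrevs) + list(local_abbrevs):
        ids[next_id] = abbrev
        next_id += 1
    return ids
```

<p>For instance, two abbreviations from <tt>BLOCKINFO</tt> and one defined in
the block receive IDs 4, 5, and 6 respectively.</p>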
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	438
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	439	<p>
				440	An abbreviation definition consists of the <tt>DEFINE_ABBREV</tt> abbrevid
				441	followed by a VBR that specifies the number of abbrev operands, then the abbrev
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	442	operands themselves. Abbreviation operands come in three forms. They all start
				443	with a single bit that indicates whether the abbrev operand is a literal operand
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	444	(when the bit is 1) or an encoding operand (when the bit is 0).
				445	</p>
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	446
				447	<ol>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	448	<li>Literal operands — <tt>[1<sub>1</sub>, litvalue<sub>vbr8</sub>]</tt>
				449	— Literal operands specify that the value in the result is always a single
				450	specific value. This specific value is emitted as a vbr8 after the bit
				451	indicating that it is a literal operand.</li>
				452	<li>Encoding info without data — <tt>[0<sub>1</sub>,
				453	encoding<sub>3</sub>]</tt> — Operand encodings that do not have extra
				454	data are just emitted as their code.
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	455	</li>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	456	<li>Encoding info with data — <tt>[0<sub>1</sub>, encoding<sub>3</sub>,
				457	value<sub>vbr5</sub>]</tt> — Operand encodings that do have extra data are
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	458	emitted as their code, followed by the extra data.
Chris Lattner	daeb63c	2007-05-12 07:49:15 +0000	[diff] [blame]	459	</li>
				460	</ol>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	461
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	462	<p>The possible operand encodings are:</p>
				463
				464	<ul>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	465	<li>1 — Fixed — The field should be emitted as
				466	a <a href="#fixedwidth">fixed-width value</a>, whose width is specified by
				467	the operand's extra data.</li>
				468	<li>2 — VBR — The field should be emitted as
				469	a <a href="#variablewidth">variable-width value</a>, whose width is
				470	specified by the operand's extra data.</li>
				471	<li>3 — Array — This field is an array of values. The array operand
				472	has no extra data, but expects another operand to follow it which indicates
				473	the element type of the array. When reading an array in an abbreviated
				474	record, the first integer is a vbr6 that indicates the array length,
				475	followed by the encoded elements of the array. An array may only occur as
				476	the last operand of an abbreviation (except for the one final operand that
				477	gives the array's type).</li>
				478	<li>4 — Char6 — This field should be emitted as
				479	a <a href="#char6">char6-encoded value</a>. This operand type takes no
				480	extra data.</li>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	481	</ul>
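<p>Putting the operand forms and encodings together, a reader for the body of
a <tt>DEFINE_ABBREV</tt> record might look like this. It is a sketch under
stated assumptions: the <tt>Bits</tt> class is a hypothetical helper that
takes a list of 0/1 values in stream order, fields are read low bit first, and
the <tt>DEFINE_ABBREV</tt> abbrev ID has already been consumed.</p>

```python
class Bits:
    """A stream of individual bits (0 or 1), in the order they appear."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def fixed(self, width: int) -> int:
        value = 0
        for i in range(width):
            value |= self.bits[self.pos] << i   # first bit is the low bit
            self.pos += 1
        return value

    def vbr(self, width: int) -> int:
        data_bits = width - 1
        high_bit = 1 << data_bits
        value = 0
        shift = 0
        while True:
            chunk = self.fixed(width)
            value |= (chunk & (high_bit - 1)) << shift
            if not chunk & high_bit:
                return value
            shift += data_bits


ENCODINGS = {1: "Fixed", 2: "VBR", 3: "Array", 4: "Char6"}
HAS_EXTRA_DATA = {"Fixed", "VBR"}   # Array and Char6 carry no extra data


def read_abbrev_definition(stream: Bits):
    """Read [numabbrevops vbr5, abbrevop0, abbrevop1, ...] as a list of tuples."""
    operands = []
    for _ in range(stream.vbr(5)):
        if stream.fixed(1):                      # 1: literal operand
            operands.append(("Literal", stream.vbr(8)))
            continue
        name = ENCODINGS[stream.fixed(3)]        # 0: encoding operand
        extra = stream.vbr(5) if name in HAS_EXTRA_DATA else None
        operands.append((name, extra))
    return operands
```

<p>An unknown encoding number raises <tt>KeyError</tt> here; a production
reader would presumably report a malformed stream instead.</p>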
				482
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	483	<p>
				484	For example, target triples in LLVM modules are encoded as a record of the
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	485	form <tt>[TRIPLE, 'a', 'b', 'c', 'd']</tt>. Consider if the bitstream emitted
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	486	the following abbrev entry:
				487	</p>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	488
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	489	<div class="doc_code">
				490	<pre>
				491	[0, Fixed, 4]
				492	[0, Array]
				493	[0, Char6]
				494	</pre>
				495	</div>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	496
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	497	<p>
				498	When emitting a record with this abbreviation, the above entry would be emitted
				499	as:
				500	</p>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	501
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	502	<div class="doc_code">
Bill Wendling	903bcc4	2009-04-04 22:36:02 +0000	[diff] [blame^]	503	<p>
				504	<tt>[4<sub>abbrevwidth</sub>, 2<sub>4</sub>, 4<sub>vbr6</sub>, 0<sub>6</sub>,
				505	1<sub>6</sub>, 2<sub>6</sub>, 3<sub>6</sub>]</tt>
				506	</p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	507	</div>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	508
				509	<p>These values are:</p>
				510
				511	<ol>
				512	<li>The first value, 4, is the abbreviation ID for this abbreviation.</li>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	513	<li>The second value, 2, is the code for <tt>TRIPLE</tt> in LLVM IR files.</li>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	514	<li>The third value, 4, is the length of the array.</li>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	515	<li>The rest of the values are the char6 encoded values
				516	for <tt>"abcd"</tt>.</li>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	517	</ol>
				518
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	519	<p>
				520	With this abbreviation, the triple is emitted with only 37 bits (assuming an
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	521	abbrev id width of 3). Without the abbreviation, significantly more space would
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	522	be required to emit the target triple. Also, because the <tt>TRIPLE</tt> value
				523	is not emitted as a literal in the abbreviation, the abbreviation can also be
				524	used for any other string value.
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	525	</p>
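<p>The 37-bit figure can be reproduced by emitting the record with a toy bit
writer. This is a sketch only: <tt>BitWriter</tt> and
<tt>emit_string_record</tt> are hypothetical names, bits are collected in a
Python list instead of being packed into bytes, and the field layout is
hard-coded to the abbreviation from the example (<tt>[0, Fixed, 4]</tt>,
<tt>[0, Array]</tt>, <tt>[0, Char6]</tt>).</p>

```python
class BitWriter:
    """Collects bits in stream order; each field is written low bit first."""

    def __init__(self):
        self.bits = []

    def write_fixed(self, value: int, width: int) -> None:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in the given width")
        for i in range(width):
            self.bits.append((value >> i) & 1)

    def write_vbr(self, value: int, width: int) -> None:
        data_bits = width - 1
        high_bit = 1 << data_bits
        while value >= high_bit:
            self.write_fixed((value & (high_bit - 1)) | high_bit, width)
            value >>= data_bits
        self.write_fixed(value, width)


CHAR6 = ("abcdefghijklmnopqrstuvwxyz"
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
         "0123456789._")


def emit_string_record(writer, abbrev_id, abbrev_width, code, text):
    """Emit a record using the abbreviation [Fixed 4, Array of Char6]."""
    writer.write_fixed(abbrev_id, abbrev_width)   # which abbreviation is used
    writer.write_fixed(code, 4)                   # record code, Fixed width 4
    writer.write_vbr(len(text), 6)                # array length as vbr6
    for ch in text:
        writer.write_fixed(CHAR6.index(ch), 6)    # one char6 per element
```

<p>Emitting code 2 with the text <tt>"abcd"</tt>, abbreviation ID 4, and an
abbrev id width of 3 gives 3 + 4 + 6 + 4 &times; 6 = 37 bits.</p>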
				526
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	527	</div>
				528
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	529	<!-- ======================================================================= -->
				530	<div class="doc_subsection"><a name="stdblocks">Standard Blocks</a>
				531	</div>
				532
				533	<div class="doc_text">
				534
				535	<p>
				536	In addition to the basic block structure and record encodings, the bitstream
				537	also defines specific builtin block types. These block types specify how the
				538	stream is to be decoded or other metadata. In the future, new standard blocks
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	539	may be added. Block IDs 0-7 are reserved for standard blocks.
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	540	</p>
				541
				542	</div>
				543
				544	<!-- _______________________________________________________________________ -->
				545	<div class="doc_subsubsection"><a name="BLOCKINFO">#0 - BLOCKINFO
				546	Block</a></div>
				547
				548	<div class="doc_text">
				549
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	550	<p>
				551	The <tt>BLOCKINFO</tt> block allows the description of metadata for other
				552	blocks. The currently specified records are:
				553	</p>
				554
				555	<div class="doc_code">
				556	<pre>
				557	[SETBID (#1), blockid]
				558	[DEFINE_ABBREV, ...]
				559	</pre>
				560	</div>
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	561
				562	<p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	563	The <tt>SETBID</tt> record indicates which block ID is being
				564	described. <tt>SETBID</tt> records can occur multiple times throughout the
				565	block to change which block ID is being described. There must be
				566	a <tt>SETBID</tt> record prior to any other records.
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	567	</p>
				568
				569	<p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	570	Standard <tt>DEFINE_ABBREV</tt> records can occur inside <tt>BLOCKINFO</tt>
				571	blocks, but unlike their occurrence in normal blocks, the abbreviation is
				572	defined for blocks matching the block ID we are describing, <i>not</i> the
				573	<tt>BLOCKINFO</tt> block itself. The abbreviations defined
				574	in <tt>BLOCKINFO</tt> blocks receive abbreviation IDs as described
				575	in <tt><a href="#DEFINE_ABBREV">DEFINE_ABBREV</a></tt>.
Chris Lattner	f19b8e4	2007-10-08 18:42:45 +0000	[diff] [blame]	576	</p>
				577
				578	<p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	579	Note that although the data in <tt>BLOCKINFO</tt> blocks is described as
				580	"metadata," the abbreviations they contain are essential for parsing records
				581	from the corresponding blocks. It is not safe to skip them.
Chris Lattner	7300af5	2007-05-13 00:59:52 +0000	[diff] [blame]	582	</p>
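<p>
As a rough illustration of the <tt>SETBID</tt> rule above, the following Python
sketch groups abbreviation definitions by the block ID they describe. The
function name <tt>collect_blockinfo</tt> and the tuple-based record
representation are assumptions made for this example; a real reader would
decode the records from the bitstream first.
</p>

```python
from collections import defaultdict


def collect_blockinfo(records):
    """Group abbreviation definitions by the block ID they describe.

    `records` is a hypothetical, already-decoded view of a BLOCKINFO block:
    a sequence of ("SETBID", block_id) and ("DEFINE_ABBREV", definition)
    tuples, in stream order.
    """
    abbrevs = defaultdict(list)
    current = None  # no block ID is selected until the first SETBID
    for kind, payload in records:
        if kind == "SETBID":
            # Later SETBID records switch the block ID being described.
            current = payload
        elif kind == "DEFINE_ABBREV":
            if current is None:
                raise ValueError("DEFINE_ABBREV before any SETBID record")
            # The abbreviation belongs to the described block,
            # not to the BLOCKINFO block itself.
            abbrevs[current].append(payload)
    return dict(abbrevs)
```

<p>
Because <tt>SETBID</tt> may appear several times, definitions for the same
block ID can be interleaved with definitions for other blocks; the sketch
keeps them in stream order per block ID.
</p>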
				583
				584	</div>
Chris Lattner	3a1716d	2007-05-12 05:37:42 +0000	[diff] [blame]	585
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	586	<!-- *********************************************************************** -->
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	587	<div class="doc_section"> <a name="wrapper">Bitcode Wrapper Format</a></div>
				588	<!-- *********************************************************************** -->
				589
				590	<div class="doc_text">
				591
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	592	<p>
				593	Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	594	structure. This structure contains a simple header that indicates the offset
				595	and size of the embedded BC file. This allows additional information to be
				596	stored alongside the BC file. The structure of this file header is:
				597	</p>
				598
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	599	<div class="doc_code">
Bill Wendling	903bcc4	2009-04-04 22:36:02 +0000	[diff] [blame^]	600	<p>
				601	<tt>[Magic<sub>32</sub>, Version<sub>32</sub>, Offset<sub>32</sub>,
				602	Size<sub>32</sub>, CPUType<sub>32</sub>]</tt>
				603	</p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	604	</div>
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	605
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	606	<p>
Each field is a 32-bit value stored in little-endian form (as with
the rest of the bitcode file fields). The Magic number is always
<tt>0x0B17C0DE</tt> and the Version is currently always <tt>0</tt>. The Offset
field is the offset in bytes to the start of the bitcode stream in the file, and
the Size field is the size in bytes of the stream. CPUType is a target-specific
value that can be used to encode the CPU of the target.
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	613	</p>
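<p>
A minimal Python sketch of reading this header is shown below. The function
name <tt>unwrap_bitcode</tt> and the bounds checks are assumptions for
illustration, not part of the format description; the field layout and
constants come from the text above.
</p>

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE
HEADER = struct.Struct("<5I")  # five little-endian 32-bit fields


def unwrap_bitcode(data):
    """Return (cpu_type, bitcode_bytes) for a wrapped bitcode file."""
    if len(data) < HEADER.size:
        raise ValueError("too short for a wrapper header")
    magic, version, offset, size, cpu_type = HEADER.unpack_from(data)
    if magic != WRAPPER_MAGIC:
        raise ValueError("not a wrapped bitcode file")
    if version != 0:
        raise ValueError("unsupported wrapper version")
    if offset + size > len(data):
        raise ValueError("embedded stream extends past end of file")
    # Offset and Size locate the embedded bitcode stream within the file.
    return cpu_type, data[offset:offset + size]
```

<p>
Anything stored between the header and <tt>Offset</tt>, or after
<tt>Offset + Size</tt>, is the additional information the wrapper makes room
for; this sketch simply ignores it.
</p>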
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	614
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	615	</div>
Chris Lattner	6fa6a32	2008-07-09 05:14:23 +0000	[diff] [blame]	616
				617	<!-- *********************************************************************** -->
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	618	<div class="doc_section"> <a name="llvmir">LLVM IR Encoding</a></div>
				619	<!-- *********************************************************************** -->
				620
				621	<div class="doc_text">
				622
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	623	<p>
				624	LLVM IR is encoded into a bitstream by defining blocks and records. It uses
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	625	blocks for things like constant pools, functions, symbol tables, etc. It uses
				626	records for things like instructions, global variable descriptors, type
				627	descriptions, etc. This document does not describe the set of abbreviations
				628	that the writer uses, as these are fully self-described in the file, and the
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	629	reader is not allowed to build in any knowledge of this.
				630	</p>
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	631
				632	</div>
				633
				634	<!-- ======================================================================= -->
				635	<div class="doc_subsection"><a name="basics">Basics</a>
				636	</div>
				637
				638	<!-- _______________________________________________________________________ -->
				639	<div class="doc_subsubsection"><a name="ir_magic">LLVM IR Magic Number</a></div>
				640
				641	<div class="doc_text">
				642
				643	<p>
				644	The magic number for LLVM IR files is:
				645	</p>
				646
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	647	<div class="doc_code">
Bill Wendling	903bcc4	2009-04-04 22:36:02 +0000	[diff] [blame^]	648	<p>
				649	<tt>[0x0<sub>4</sub>, 0xC<sub>4</sub>, 0xE<sub>4</sub>, 0xD<sub>4</sub>]</tt>
				650	</p>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	651	</div>
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	652
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	653	<p>
				654	When combined with the bitcode magic number and viewed as bytes, this is
				655	<tt>"BC 0xC0DE"</tt>.
				656	</p>
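<p>
The byte view can be checked with a short Python sketch. It assumes that
fixed-width fields are packed starting at the least significant bit of each
byte, and that the bitcode magic number is the two bytes <tt>'B'</tt> and
<tt>'C'</tt>; the helper <tt>pack_fixed_fields</tt> is a name invented for
this example.
</p>

```python
def pack_fixed_fields(fields):
    """Pack (value, width) fixed-width fields into bytes, low bits first."""
    acc = 0
    nbits = 0
    for value, width in fields:
        # Place each field directly above the bits emitted so far.
        acc |= (value & ((1 << width) - 1)) << nbits
        nbits += width
    return acc.to_bytes((nbits + 7) // 8, "little")


magic = pack_fixed_fields([
    (ord("B"), 8), (ord("C"), 8),              # bitcode magic number
    (0x0, 4), (0xC, 4), (0xE, 4), (0xD, 4),    # LLVM IR magic number
])
```

<p>
Under these assumptions the four 4-bit fields pair up into the bytes
<tt>0xC0</tt> and <tt>0xDE</tt>, following the two ASCII bytes.
</p>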
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	657
				658	</div>
				659
				660	<!-- _______________________________________________________________________ -->
				661	<div class="doc_subsubsection"><a name="ir_signed_vbr">Signed VBRs</a></div>
				662
				663	<div class="doc_text">
				664
				665	<p>
<a href="#variablewidth">Variable Width Integers</a> are an efficient way to
encode arbitrarily sized unsigned values, but are an extremely inefficient way to
encode signed values (as negative values are otherwise treated as maximally large
unsigned values).
				670	</p>
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	671
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	672	<p>
				673	As such, signed vbr values of a specific width are emitted as follows:
				674	</p>
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	675
				676	<ul>
<li>Non-negative values are emitted as vbrs of the specified width, but with their
value shifted left by one.</li>
				679	<li>Negative values are emitted as vbrs of the specified width, but the negated
				680	value is shifted left by one, and the low bit is set.</li>
				681	</ul>
				682
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	683	<p>
				684	With this encoding, small positive and small negative values can both be emitted
				685	efficiently.
				686	</p>
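<p>
The two rules can be sketched in Python as follows. The function names are
invented for this example, and <tt>vbr_chunks</tt> is a simplified model of
the variable width encoding (each chunk carries <tt>width - 1</tt> data bits,
with the high bit marking continuation) that is included only to compare
sizes.
</p>

```python
def encode_signed(value):
    """Map a signed integer to the unsigned value that is emitted as a vbr."""
    if value >= 0:
        return value << 1          # low bit clear: non-negative
    return ((-value) << 1) | 1     # low bit set: negative


def decode_signed(encoded):
    """Invert encode_signed."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude


def vbr_chunks(value, width):
    """Split an unsigned value into vbr chunks of `width` bits each."""
    data_bits = width - 1
    mask = (1 << data_bits) - 1
    chunks = []
    while True:
        chunk = value & mask
        value >>= data_bits
        if value:
            chunks.append(chunk | (1 << data_bits))  # more chunks follow
        else:
            chunks.append(chunk)
            return chunks
```

<p>
For instance, treating -1 as a 64-bit unsigned value takes 22 vbr4 chunks in
this model, whereas <tt>encode_signed(-1)</tt> is 3 and fits in a single
chunk.
</p>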
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	687
				688	</div>
				689
				690
				691	<!-- _______________________________________________________________________ -->
				692	<div class="doc_subsubsection"><a name="ir_blocks">LLVM IR Blocks</a></div>
				693
				694	<div class="doc_text">
				695
				696	<p>
				697	LLVM IR is defined with the following blocks:
				698	</p>
				699
				700	<ul>
Bill Wendling	bb7425f	2009-04-04 22:27:03 +0000	[diff] [blame]	701	<li>8 — <tt>MODULE_BLOCK</tt> — This is the top-level block that
				702	contains the entire module, and describes a variety of per-module
				703	information.</li>
				704	<li>9 — <tt>PARAMATTR_BLOCK</tt> — This enumerates the parameter
				705	attributes.</li>
				706	<li>10 — <tt>TYPE_BLOCK</tt> — This describes all of the types in
				707	the module.</li>
				708	<li>11 — <tt>CONSTANTS_BLOCK</tt> — This describes constants for a
				709	module or function.</li>
				710	<li>12 — <tt>FUNCTION_BLOCK</tt> — This describes a function
				711	body.</li>
				712	<li>13 — <tt>TYPE_SYMTAB_BLOCK</tt> — This describes the type symbol
				713	table.</li>
				714	<li>14 — <tt>VALUE_SYMTAB_BLOCK</tt> — This describes a value symbol
				715	table.</li>
Chris Lattner	69b3e40	2007-05-13 01:39:44 +0000	[diff] [blame]	716	</ul>
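<p>
For reference, the list above can be written as a small Python enumeration.
The class name <tt>IRBlockID</tt> and the constant
<tt>FIRST_APPLICATION_BLOCK_ID</tt> are assumptions for this sketch; the
numeric values are the ones listed above.
</p>

```python
from enum import IntEnum

# Block IDs 0-7 are reserved for standard blocks such as BLOCKINFO.
FIRST_APPLICATION_BLOCK_ID = 8


class IRBlockID(IntEnum):
    MODULE_BLOCK = 8
    PARAMATTR_BLOCK = 9
    TYPE_BLOCK = 10
    CONSTANTS_BLOCK = 11
    FUNCTION_BLOCK = 12
    TYPE_SYMTAB_BLOCK = 13
    VALUE_SYMTAB_BLOCK = 14
```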
				717
				718	</div>
				719
				720	<!-- ======================================================================= -->
				721	<div class="doc_subsection"><a name="MODULE_BLOCK">MODULE_BLOCK Contents</a>
				722	</div>
				723
				724	<div class="doc_text">
				725
				726	<p>
				727	</p>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	728
				729	</div>
				730
				731
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	732	<!-- *********************************************************************** -->
				733	<hr>
				734	<address> <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
Misha Brukman	4440870	2008-12-11 17:34:48 +0000	[diff] [blame]	735	src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	736	<a href="http://validator.w3.org/check/referer"><img
Misha Brukman	4440870	2008-12-11 17:34:48 +0000	[diff] [blame]	737	src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
Chris Lattner	e9ef457	2007-05-12 03:23:40 +0000	[diff] [blame]	738	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
Reid Spencer	2c1ce4f	2007-01-20 23:21:08 +0000	[diff] [blame]	739	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				740	Last modified: $Date$
				741	</address>
				742	</body>
				743	</html>