<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
3<html>
4<head>
5 <meta http-equiv="content-type" content="text/html; charset=utf-8">
6 <title>The LLVM Target-Independent Code Generator</title>
7 <link rel="stylesheet" href="llvm.css" type="text/css">
8</head>
9<body>
10
11<div class="doc_title">
12 The LLVM Target-Independent Code Generator
13</div>
14
15<ol>
16 <li><a href="#introduction">Introduction</a>
17 <ul>
18 <li><a href="#required">Required components in the code generator</a></li>
19 <li><a href="#high-level-design">The high-level design of the code
20 generator</a></li>
21 <li><a href="#tablegen">Using TableGen for target description</a></li>
22 </ul>
23 </li>
24 <li><a href="#targetdesc">Target description classes</a>
25 <ul>
26 <li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
27 <li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
28 <li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
29 <li><a href="#mregisterinfo">The <tt>MRegisterInfo</tt> class</a></li>
30 <li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
31 <li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
32 <li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
33 <li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
34 </ul>
35 </li>
36 <li><a href="#codegendesc">Machine code description classes</a>
37 <ul>
38 <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
39 <li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
40 class</a></li>
41 <li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
42 </ul>
43 </li>
44 <li><a href="#codegenalgs">Target-independent code generation algorithms</a>
45 <ul>
46 <li><a href="#instselect">Instruction Selection</a>
47 <ul>
48 <li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
49 <li><a href="#selectiondag_process">SelectionDAG Code Generation
50 Process</a></li>
51 <li><a href="#selectiondag_build">Initial SelectionDAG
52 Construction</a></li>
53 <li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
54 <li><a href="#selectiondag_optimize">SelectionDAG Optimization
55 Phase: the DAG Combiner</a></li>
56 <li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
57 <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
58 Phase</a></li>
59 <li><a href="#selectiondag_future">Future directions for the
60 SelectionDAG</a></li>
61 </ul></li>
62 <li><a href="#liveintervals">Live Intervals</a>
63 <ul>
64 <li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
65 <li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
66 </ul></li>
67 <li><a href="#regalloc">Register Allocation</a>
68 <ul>
69 <li><a href="#regAlloc_represent">How registers are represented in
70 LLVM</a></li>
71 <li><a href="#regAlloc_howTo">Mapping virtual registers to physical
72 registers</a></li>
73 <li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
74 <li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
75 <li><a href="#regAlloc_fold">Instruction folding</a></li>
76 <li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
77 </ul></li>
78 <li><a href="#codeemit">Code Emission</a>
79 <ul>
80 <li><a href="#codeemit_asm">Generating Assembly Code</a></li>
81 <li><a href="#codeemit_bin">Generating Binary Machine Code</a></li>
82 </ul></li>
83 </ul>
84 </li>
85 <li><a href="#targetimpls">Target-specific Implementation Notes</a>
86 <ul>
87 <li><a href="#x86">The X86 backend</a></li>
88 <li><a href="#ppc">The PowerPC backend</a>
89 <ul>
90 <li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
91 <li><a href="#ppc_frame">Frame Layout</a></li>
92 <li><a href="#ppc_prolog">Prolog/Epilog</a></li>
93 <li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
94 </ul></li>
95 </ul></li>
96
97</ol>
98
99<div class="doc_author">
100 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
101 <a href="mailto:isanbard@gmail.com">Bill Wendling</a>,
102 <a href="mailto:pronesto@gmail.com">Fernando Magno Quintao
103 Pereira</a> and
104 <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
105</div>
106
107<div class="doc_warning">
108 <p>Warning: This is a work in progress.</p>
109</div>
110
111<!-- *********************************************************************** -->
112<div class="doc_section">
113 <a name="introduction">Introduction</a>
114</div>
115<!-- *********************************************************************** -->
116
117<div class="doc_text">
118
119<p>The LLVM target-independent code generator is a framework that provides a
120suite of reusable components for translating the LLVM internal representation to
121the machine code for a specified target&mdash;either in assembly form (suitable
122for a static compiler) or in binary machine code format (usable for a JIT
123compiler). The LLVM target-independent code generator consists of five main
124components:</p>
125
126<ol>
127<li><a href="#targetdesc">Abstract target description</a> interfaces which
128capture important properties about various aspects of the machine, independently
129of how they will be used. These interfaces are defined in
130<tt>include/llvm/Target/</tt>.</li>
131
132<li>Classes used to represent the <a href="#codegendesc">machine code</a> being
133generated for a target. These classes are intended to be abstract enough to
134represent the machine code for <i>any</i> target machine. These classes are
135defined in <tt>include/llvm/CodeGen/</tt>.</li>
136
137<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
138various phases of native code generation (register allocation, scheduling, stack
139frame representation, etc). This code lives in <tt>lib/CodeGen/</tt>.</li>
140
141<li><a href="#targetimpls">Implementations of the abstract target description
142interfaces</a> for particular targets. These machine descriptions make use of
143the components provided by LLVM, and can optionally provide custom
144target-specific passes, to build complete code generators for a specific target.
145Target descriptions live in <tt>lib/Target/</tt>.</li>
146
147<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
148completely target independent (it uses the <tt>TargetJITInfo</tt> structure to
149interface for target-specific issues. The code for the target-independent
150JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
151
152</ol>
153
154<p>
155Depending on which part of the code generator you are interested in working on,
156different pieces of this will be useful to you. In any case, you should be
157familiar with the <a href="#targetdesc">target description</a> and <a
158href="#codegendesc">machine code representation</a> classes. If you want to add
159a backend for a new target, you will need to <a href="#targetimpls">implement the
160target description</a> classes for your new target and understand the <a
161href="LangRef.html">LLVM code representation</a>. If you are interested in
162implementing a new <a href="#codegenalgs">code generation algorithm</a>, it
163should only depend on the target-description and machine code representation
164classes, ensuring that it is portable.
165</p>
166
167</div>
168
169<!-- ======================================================================= -->
170<div class="doc_subsection">
171 <a name="required">Required components in the code generator</a>
172</div>
173
174<div class="doc_text">
175
176<p>The two pieces of the LLVM code generator are the high-level interface to the
177code generator and the set of reusable components that can be used to build
178target-specific backends. The two most important interfaces (<a
179href="#targetmachine"><tt>TargetMachine</tt></a> and <a
180href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
181required to be defined for a backend to fit into the LLVM system, but the others
182must be defined if the reusable code generator components are going to be
183used.</p>
184
185<p>This design has two important implications. The first is that LLVM can
186support completely non-traditional code generation targets. For example, the C
187backend does not require register allocation, instruction selection, or any of
188the other standard components provided by the system. As such, it only
189implements these two interfaces, and does its own thing. Another example of a
190code generator like this is a (purely hypothetical) backend that converts LLVM
191to the GCC RTL form and uses GCC to emit machine code for a target.</p>
192
193<p>This design also implies that it is possible to design and
194implement radically different code generators in the LLVM system that do not
195make use of any of the built-in components. Doing so is not recommended at all,
196but could be required for radically different targets that do not fit into the
197LLVM machine description model: FPGAs for example.</p>
198
199</div>
200
201<!-- ======================================================================= -->
202<div class="doc_subsection">
203 <a name="high-level-design">The high-level design of the code generator</a>
204</div>
205
206<div class="doc_text">
207
<p>The LLVM target-independent code generator is designed to support efficient,
high-quality code generation for standard register-based microprocessors.  Code
210generation in this model is divided into the following stages:</p>
211
212<ol>
213<li><b><a href="#instselect">Instruction Selection</a></b> - This phase
214determines an efficient way to express the input LLVM code in the target
215instruction set.
This stage produces the initial code for the program in the target instruction
set, using virtual registers in SSA form and physical registers that represent
any required register assignments due to target constraints or calling
conventions.  This step turns the LLVM code into a DAG of target
instructions.</li>
221
222<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> - This
223phase takes the DAG of target instructions produced by the instruction selection
224phase, determines an ordering of the instructions, then emits the instructions
225as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering. Note
226that we describe this in the <a href="#instselect">instruction selection
227section</a> because it operates on a <a
228href="#selectiondag_intro">SelectionDAG</a>.
229</li>
230
231<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> - This
232optional stage consists of a series of machine-code optimizations that
233operate on the SSA-form produced by the instruction selector. Optimizations
234like modulo-scheduling or peephole optimization work here.
235</li>
236
237<li><b><a href="#regalloc">Register Allocation</a></b> - The
238target code is transformed from an infinite virtual register file in SSA form
239to the concrete register file used by the target. This phase introduces spill
240code and eliminates all virtual register references from the program.</li>
241
242<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> - Once the
243machine code has been generated for the function and the amount of stack space
244required is known (used for LLVM alloca's and spill slots), the prolog and
245epilog code for the function can be inserted and "abstract stack location
246references" can be eliminated. This stage is responsible for implementing
247optimizations like frame-pointer elimination and stack packing.</li>
248
249<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> - Optimizations
250that operate on "final" machine code can go here, such as spill code scheduling
251and peephole optimizations.</li>
252
253<li><b><a href="#codeemit">Code Emission</a></b> - The final stage actually
254puts out the code for the current function, either in the target assembler
255format or in machine code.</li>
256
257</ol>
258
259<p>The code generator is based on the assumption that the instruction selector
260will use an optimal pattern matching selector to create high-quality sequences of
261native instructions. Alternative code generator designs based on pattern
262expansion and aggressive iterative peephole optimization are much slower. This
263design permits efficient compilation (important for JIT environments) and
264aggressive optimization (used when generating code offline) by allowing
265components of varying levels of sophistication to be used for any step of
266compilation.</p>
267
268<p>In addition to these stages, target implementations can insert arbitrary
269target-specific passes into the flow. For example, the X86 target uses a
270special pass to handle the 80x87 floating point stack architecture. Other
271targets with unusual requirements can be supported with custom passes as
272needed.</p>
273
274</div>
275
276
277<!-- ======================================================================= -->
278<div class="doc_subsection">
279 <a name="tablegen">Using TableGen for target description</a>
280</div>
281
282<div class="doc_text">
283
284<p>The target description classes require a detailed description of the target
285architecture. These target descriptions often have a large amount of common
286information (e.g., an <tt>add</tt> instruction is almost identical to a
287<tt>sub</tt> instruction).
288In order to allow the maximum amount of commonality to be factored out, the LLVM
289code generator uses the <a href="TableGenFundamentals.html">TableGen</a> tool to
290describe big chunks of the target machine, which allows the use of
291domain-specific and target-specific abstractions to reduce the amount of
292repetition.</p>
293
294<p>As LLVM continues to be developed and refined, we plan to move more and more
295of the target description to the <tt>.td</tt> form. Doing so gives us a
296number of advantages. The most important is that it makes it easier to port
297LLVM because it reduces the amount of C++ code that has to be written, and the
298surface area of the code generator that needs to be understood before someone
299can get something working. Second, it makes it easier to change things. In
300particular, if tables and other things are all emitted by <tt>tblgen</tt>, we
301only need a change in one place (<tt>tblgen</tt>) to update all of the targets
302to a new interface.</p>
303
304</div>
305
306<!-- *********************************************************************** -->
307<div class="doc_section">
308 <a name="targetdesc">Target description classes</a>
309</div>
310<!-- *********************************************************************** -->
311
312<div class="doc_text">
313
314<p>The LLVM target description classes (located in the
315<tt>include/llvm/Target</tt> directory) provide an abstract description of the
316target machine independent of any particular client. These classes are
317designed to capture the <i>abstract</i> properties of the target (such as the
318instructions and registers it has), and do not incorporate any particular pieces
319of code generation algorithms.</p>
320
321<p>All of the target description classes (except the <tt><a
322href="#targetdata">TargetData</a></tt> class) are designed to be subclassed by
the concrete target implementation, which implements the virtual methods.  To
324get to these implementations, the <tt><a
325href="#targetmachine">TargetMachine</a></tt> class provides accessors that
326should be implemented by the target.</p>
327
328</div>
329
330<!-- ======================================================================= -->
331<div class="doc_subsection">
332 <a name="targetmachine">The <tt>TargetMachine</tt> class</a>
333</div>
334
335<div class="doc_text">
336
337<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
338access the target-specific implementations of the various target description
339classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
340<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
341designed to be specialized by
342a concrete target implementation (e.g., <tt>X86TargetMachine</tt>) which
343implements the various virtual methods. The only required target description
344class is the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the
345code generator components are to be used, the other interfaces should be
346implemented as well.</p>
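
<p>For example, a target-independent component typically just asks the
<tt>TargetMachine</tt> for whatever description classes it needs.  The fragment
below is only a sketch of that accessor pattern (the <tt>TM</tt> variable is an
assumed reference to some concrete target machine):</p>

<div class="doc_code">
<pre>
// Query a TargetMachine for its target description classes.  A null result
// means the target does not implement that particular interface.
const TargetInstrInfo *TII = TM.getInstrInfo();    // instruction descriptions
const MRegisterInfo   *MRI = TM.getRegisterInfo(); // register file description
const TargetFrameInfo *TFI = TM.getFrameInfo();    // stack frame layout
const TargetData      *TD  = TM.getTargetData();   // type sizes and alignment
</pre>
</div>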
347
348</div>
349
350
351<!-- ======================================================================= -->
352<div class="doc_subsection">
353 <a name="targetdata">The <tt>TargetData</tt> class</a>
354</div>
355
356<div class="doc_text">
357
358<p>The <tt>TargetData</tt> class is the only required target description class,
and it is the only class that is not extensible (you cannot derive a new
360class from it). <tt>TargetData</tt> specifies information about how the target
361lays out memory for structures, the alignment requirements for various data
362types, the size of pointers in the target, and whether the target is
363little-endian or big-endian.</p>
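
<p>For example, code that needs to reason about memory layout can query these
properties directly.  This is a minimal sketch; check
<tt>include/llvm/Target/TargetData.h</tt> for the exact set of methods in your
tree:</p>

<div class="doc_code">
<pre>
// Query basic layout properties of the target.
const TargetData &amp;TD = ...                    // e.g. *TM.getTargetData()
unsigned PointerSize = TD.getPointerSize();   // pointer size in bytes
bool IsLittleEndian  = TD.isLittleEndian();   // byte order of the target
</pre>
</div>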
364
365</div>
366
367<!-- ======================================================================= -->
368<div class="doc_subsection">
369 <a name="targetlowering">The <tt>TargetLowering</tt> class</a>
370</div>
371
372<div class="doc_text">
373
374<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
375selectors primarily to describe how LLVM code should be lowered to SelectionDAG
376operations. Among other things, this class indicates:</p>
377
378<ul>
379 <li>an initial register class to use for various <tt>ValueType</tt>s</li>
380 <li>which operations are natively supported by the target machine</li>
381 <li>the return type of <tt>setcc</tt> operations</li>
382 <li>the type to use for shift amounts</li>
383 <li>various high-level characteristics, like whether it is profitable to turn
384 division by a constant into a multiplication sequence</li>
385</ul>
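
<p>For example, a target's <tt>TargetLowering</tt> subclass typically records
this information in its constructor using <tt>addRegisterClass</tt> and
<tt>setOperationAction</tt> (both described further in the <a
href="#selectiondag_legalize">Legalize phase</a> section).  The sketch below
uses made-up register class and target names purely for illustration:</p>

<div class="doc_code">
<pre>
// Hypothetical target: declare which types map to which register classes and
// which operations the legalizer must expand.
MyTargetLowering::MyTargetLowering(TargetMachine &amp;TM) : TargetLowering(TM) {
  addRegisterClass(MVT::i32, MyTarget::GPRRegisterClass);
  addRegisterClass(MVT::f64, MyTarget::FPRRegisterClass);

  // No native integer remainder; ask the legalizer to expand it.
  setOperationAction(ISD::SREM, MVT::i32, Expand);
  setOperationAction(ISD::UREM, MVT::i32, Expand);

  computeRegisterProperties();
}
</pre>
</div>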
386
387</div>
388
389<!-- ======================================================================= -->
390<div class="doc_subsection">
391 <a name="mregisterinfo">The <tt>MRegisterInfo</tt> class</a>
392</div>
393
394<div class="doc_text">
395
396<p>The <tt>MRegisterInfo</tt> class (which will eventually be renamed to
397<tt>TargetRegisterInfo</tt>) is used to describe the register file of the
398target and any interactions between the registers.</p>
399
<p>Registers in the code generator are represented by
401unsigned integers. Physical registers (those that actually exist in the target
402description) are unique small numbers, and virtual registers are generally
403large. Note that register #0 is reserved as a flag value.</p>
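
<p>Code that manipulates register numbers therefore needs to distinguish the
two ranges.  The helpers below are purely illustrative (the real predicates are
provided by <tt>MRegisterInfo</tt>); they simply encode the numbering
convention described above, assuming virtual register numbers start at
1024:</p>

<div class="doc_code">
<pre>
// Illustration of the register numbering convention: register #0 is reserved,
// physical registers are small numbers, virtual registers start above them.
const unsigned FirstVirtualRegister = 1024;  // assumed boundary for this sketch

bool isPhysicalRegister(unsigned Reg) {
  return Reg != 0 &amp;&amp; Reg &lt; FirstVirtualRegister;
}
bool isVirtualRegister(unsigned Reg) {
  return Reg &gt;= FirstVirtualRegister;
}
</pre>
</div>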
404
405<p>Each register in the processor description has an associated
406<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
407register (used for assembly output and debugging dumps) and a set of aliases
408(used to indicate whether one register overlaps with another).
409</p>
410
411<p>In addition to the per-register description, the <tt>MRegisterInfo</tt> class
412exposes a set of processor specific register classes (instances of the
413<tt>TargetRegisterClass</tt> class). Each register class contains sets of
414registers that have the same properties (for example, they are all 32-bit
415integer registers). Each SSA virtual register created by the instruction
416selector has an associated register class. When the register allocator runs, it
417replaces virtual registers with a physical register in the set.</p>
418
419<p>
The target-specific implementations of these classes are auto-generated from a <a
421href="TableGenFundamentals.html">TableGen</a> description of the register file.
422</p>
423
424</div>
425
426<!-- ======================================================================= -->
427<div class="doc_subsection">
428 <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
429</div>
430
431<div class="doc_text">
432 <p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
433 instructions supported by the target. It is essentially an array of
434 <tt>TargetInstrDescriptor</tt> objects, each of which describes one
435 instruction the target supports. Descriptors define things like the mnemonic
436 for the opcode, the number of operands, the list of implicit register uses
437 and defs, whether the instruction has certain target-independent properties
438 (accesses memory, is commutable, etc), and holds any target-specific
439 flags.</p>
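
  <p>A simplified, hypothetical view of such a descriptor is shown below; the
  real <tt>TargetInstrDescriptor</tt> in
  <tt>include/llvm/Target/TargetInstrInfo.h</tt> has more fields, but the
  overall shape is the same:</p>

<div class="doc_code">
<pre>
// Conceptual sketch only -- not the actual TargetInstrDescriptor definition.
struct InstrDescriptorSketch {
  const char     *Name;          // assembly mnemonic for the opcode
  unsigned        NumOperands;   // number of operands the instruction takes
  const unsigned *ImplicitUses;  // registers implicitly read
  const unsigned *ImplicitDefs;  // registers implicitly written
  unsigned        Flags;         // target-independent properties
                                 // (may load/store, commutable, ...)
  unsigned        TSFlags;       // target-specific flags
};
</pre>
</div>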
440</div>
441
442<!-- ======================================================================= -->
443<div class="doc_subsection">
444 <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
445</div>
446
447<div class="doc_text">
448 <p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
449 stack frame layout of the target. It holds the direction of stack growth,
450 the known stack alignment on entry to each function, and the offset to the
451 local area. The offset to the local area is the offset from the stack
452 pointer on function entry to the first location where function data (local
453 variables, spill locations) can be stored.</p>
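
  <p>A target usually just constructs this object with those three values.  The
  numbers below are placeholders for illustration rather than a real target's
  parameters:</p>

<div class="doc_code">
<pre>
// Sketch: a downward-growing stack, 8-byte alignment on function entry, and a
// local area beginning 4 bytes below the entry stack pointer.
MyTargetFrameInfo::MyTargetFrameInfo()
  : TargetFrameInfo(TargetFrameInfo::StackGrowsDown, 8, -4) {}
</pre>
</div>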
454</div>
455
456<!-- ======================================================================= -->
457<div class="doc_subsection">
458 <a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
459</div>
460
461<div class="doc_text">
462 <p>The <tt>TargetSubtarget</tt> class is used to provide information about the
463 specific chip set being targeted. A sub-target informs code generation of
464 which instructions are supported, instruction latencies and instruction
465 execution itinerary; i.e., which processing units are used, in what order, and
466 for how long.</p>
467</div>
468
469
470<!-- ======================================================================= -->
471<div class="doc_subsection">
472 <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
473</div>
474
475<div class="doc_text">
476 <p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
477 Just-In-Time code generator to perform target-specific activities, such as
478 emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
479 should provide one of these objects through the <tt>getJITInfo</tt>
480 method.</p>
481</div>
482
483<!-- *********************************************************************** -->
484<div class="doc_section">
485 <a name="codegendesc">Machine code description classes</a>
486</div>
487<!-- *********************************************************************** -->
488
489<div class="doc_text">
490
491<p>At the high-level, LLVM code is translated to a machine specific
492representation formed out of
493<a href="#machinefunction"><tt>MachineFunction</tt></a>,
494<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>, and <a
495href="#machineinstr"><tt>MachineInstr</tt></a> instances
496(defined in <tt>include/llvm/CodeGen</tt>). This representation is completely
497target agnostic, representing instructions in their most abstract form: an
498opcode and a series of operands. This representation is designed to support
499both an SSA representation for machine code, as well as a register allocated,
500non-SSA form.</p>
501
502</div>
503
504<!-- ======================================================================= -->
505<div class="doc_subsection">
506 <a name="machineinstr">The <tt>MachineInstr</tt> class</a>
507</div>
508
509<div class="doc_text">
510
511<p>Target machine instructions are represented as instances of the
512<tt>MachineInstr</tt> class. This class is an extremely abstract way of
513representing machine instructions. In particular, it only keeps track of
514an opcode number and a set of operands.</p>
515
516<p>The opcode number is a simple unsigned integer that only has meaning to a
517specific backend. All of the instructions for a target should be defined in
518the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values
519are auto-generated from this description. The <tt>MachineInstr</tt> class does
520not have any information about how to interpret the instruction (i.e., what the
521semantics of the instruction are); for that you must refer to the
522<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
523
524<p>The operands of a machine instruction can be of several different types:
525a register reference, a constant integer, a basic block reference, etc. In
526addition, a machine operand should be marked as a def or a use of the value
527(though only registers are allowed to be defs).</p>
528
529<p>By convention, the LLVM code generator orders instruction operands so that
530all register definitions come before the register uses, even on architectures
531that are normally printed in other orders. For example, the SPARC add
532instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1", and "%i2" registers
533and stores the result into the "%i3" register. In the LLVM code generator,
534the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the destination
535first.</p>
536
537<p>Keeping destination (definition) operands at the beginning of the operand
538list has several advantages. In particular, the debugging printer will print
539the instruction like this:</p>
540
541<div class="doc_code">
542<pre>
543%r3 = add %i1, %i2
544</pre>
545</div>
546
547<p>Also if the first operand is a def, it is easier to <a
548href="#buildmi">create instructions</a> whose only def is the first
549operand.</p>
550
551</div>
552
553<!-- _______________________________________________________________________ -->
554<div class="doc_subsubsection">
555 <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
556</div>
557
558<div class="doc_text">
559
560<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
561located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
562<tt>BuildMI</tt> functions make it easy to build arbitrary machine
563instructions. Usage of the <tt>BuildMI</tt> functions look like this:</p>
564
565<div class="doc_code">
566<pre>
567// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
568// instruction. The '1' specifies how many operands will be added.
569MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
570
571// Create the same instr, but insert it at the end of a basic block.
572MachineBasicBlock &amp;MBB = ...
573BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
574
575// Create the same instr, but insert it before a specified iterator point.
576MachineBasicBlock::iterator MBBI = ...
577BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
578
579// Create a 'cmp Reg, 0' instruction, no destination reg.
580MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
581// Create an 'sahf' instruction which takes no operands and stores nothing.
582MI = BuildMI(X86::SAHF, 0);
583
584// Create a self looping branch instruction.
585BuildMI(MBB, X86::JNE, 1).addMBB(&amp;MBB);
586</pre>
587</div>
588
589<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
590have to specify the number of operands that the machine instruction will take.
This allows for efficient memory allocation.  Also note that operands
default to being uses of values, not definitions.  If you need to add a
593definition operand (other than the optional destination register), you must
594explicitly mark it as such:</p>
595
596<div class="doc_code">
597<pre>
598MI.addReg(Reg, MachineOperand::Def);
599</pre>
600</div>
601
602</div>
603
604<!-- _______________________________________________________________________ -->
605<div class="doc_subsubsection">
606 <a name="fixedregs">Fixed (preassigned) registers</a>
607</div>
608
609<div class="doc_text">
610
611<p>One important issue that the code generator needs to be aware of is the
612presence of fixed registers. In particular, there are often places in the
613instruction stream where the register allocator <em>must</em> arrange for a
614particular value to be in a particular register. This can occur due to
615limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
616with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like calling
617conventions. In any case, the instruction selector should emit code that
618copies a virtual register into or out of a physical register when needed.</p>
619
620<p>For example, consider this simple LLVM example:</p>
621
622<div class="doc_code">
623<pre>
624int %test(int %X, int %Y) {
625 %Z = div int %X, %Y
626 ret int %Z
627}
628</pre>
629</div>
630
631<p>The X86 instruction selector produces this machine code for the <tt>div</tt>
632and <tt>ret</tt> (use
633"<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to get this):</p>
634
635<div class="doc_code">
636<pre>
637;; Start of div
638%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
639%reg1027 = sar %reg1024, 31
640%EDX = mov %reg1027 ;; Sign extend X into EDX
641idiv %reg1025 ;; Divide by Y (in reg1025)
642%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
643
644;; Start of ret
645%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
646ret
647</pre>
648</div>
649
650<p>By the end of code generation, the register allocator has coalesced
651the registers and deleted the resultant identity moves producing the
652following code:</p>
653
654<div class="doc_code">
655<pre>
656;; X is in EAX, Y is in ECX
657mov %EAX, %EDX
658sar %EDX, 31
659idiv %ECX
660ret
661</pre>
662</div>
663
664<p>This approach is extremely general (if it can handle the X86 architecture,
665it can handle anything!) and allows all of the target specific
666knowledge about the instruction stream to be isolated in the instruction
667selector. Note that physical registers should have a short lifetime for good
668code generation, and all physical registers are assumed dead on entry to and
669exit from basic blocks (before register allocation). Thus, if you need a value
670to be live across basic block boundaries, it <em>must</em> live in a virtual
671register.</p>
672
673</div>
674
675<!-- _______________________________________________________________________ -->
676<div class="doc_subsubsection">
677 <a name="ssa">Machine code in SSA form</a>
678</div>
679
680<div class="doc_text">
681
682<p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and
683are maintained in SSA-form until register allocation happens. For the most
684part, this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
685become machine code PHI nodes, and virtual registers are only allowed to have a
686single definition.</p>
687
688<p>After register allocation, machine code is no longer in SSA-form because there
689are no virtual registers left in the code.</p>
690
691</div>
692
693<!-- ======================================================================= -->
694<div class="doc_subsection">
695 <a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
696</div>
697
698<div class="doc_text">
699
700<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
701(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
702corresponds to the LLVM code input to the instruction selector, but there can be
703a one-to-many mapping (i.e. one LLVM basic block can map to multiple machine
704basic blocks). The <tt>MachineBasicBlock</tt> class has a
705"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
706comes from.</p>
707
708</div>
709
710<!-- ======================================================================= -->
711<div class="doc_subsection">
712 <a name="machinefunction">The <tt>MachineFunction</tt> class</a>
713</div>
714
715<div class="doc_text">
716
717<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
718(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
719corresponds one-to-one with the LLVM function input to the instruction selector.
In addition to a list of basic blocks, the <tt>MachineFunction</tt> contains
a <tt>MachineConstantPool</tt>, a <tt>MachineFrameInfo</tt>, a
<tt>MachineFunctionInfo</tt>, an <tt>SSARegMap</tt>, and a set of live in and
723live out registers for the function. See
724<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
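
<p>Because these classes nest in the obvious way, a machine-code pass typically
just walks the function.  A minimal sketch (assuming <tt>MF</tt> is the
<tt>MachineFunction</tt> being processed):</p>

<div class="doc_code">
<pre>
// Walk every machine instruction in the function; MI-&gt;getOpcode() identifies
// the target instruction, which TargetInstrInfo can describe further.
unsigned NumInstrs = 0;
for (MachineFunction::iterator MBB = MF.begin(), MBBE = MF.end();
     MBB != MBBE; ++MBB)
  for (MachineBasicBlock::iterator MI = MBB-&gt;begin(), MIE = MBB-&gt;end();
       MI != MIE; ++MI)
    ++NumInstrs;   // or inspect MI-&gt;getOpcode(), MI-&gt;getOperand(i), ...
</pre>
</div>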
725
726</div>
727
728<!-- *********************************************************************** -->
729<div class="doc_section">
730 <a name="codegenalgs">Target-independent code generation algorithms</a>
731</div>
732<!-- *********************************************************************** -->
733
734<div class="doc_text">
735
736<p>This section documents the phases described in the <a
737href="#high-level-design">high-level design of the code generator</a>. It
738explains how they work and some of the rationale behind their design.</p>
739
740</div>
741
742<!-- ======================================================================= -->
743<div class="doc_subsection">
744 <a name="instselect">Instruction Selection</a>
745</div>
746
747<div class="doc_text">
748<p>
749Instruction Selection is the process of translating LLVM code presented to the
750code generator into target-specific machine instructions. There are several
well-known ways to do this in the literature.  LLVM uses a SelectionDAG based
instruction selector.
</p>
754
755<p>Portions of the DAG instruction selector are generated from the target
756description (<tt>*.td</tt>) files. Our goal is for the entire instruction
757selector to be generated from these <tt>.td</tt> files.</p>
758</div>
759
760<!-- _______________________________________________________________________ -->
761<div class="doc_subsubsection">
762 <a name="selectiondag_intro">Introduction to SelectionDAGs</a>
763</div>
764
765<div class="doc_text">
766
767<p>The SelectionDAG provides an abstraction for code representation in a way
768that is amenable to instruction selection using automatic techniques
769(e.g. dynamic-programming based optimal pattern matching selectors). It is also
770well-suited to other phases of code generation; in particular,
771instruction scheduling (SelectionDAG's are very close to scheduling DAGs
772post-selection). Additionally, the SelectionDAG provides a host representation
773where a large variety of very-low-level (but target-independent)
774<a href="#selectiondag_optimize">optimizations</a> may be
775performed; ones which require extensive information about the instructions
776efficiently supported by the target.</p>
777
778<p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
779<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
780operation code (Opcode) that indicates what operation the node performs and
781the operands to the operation.
782The various operation node types are described at the top of the
783<tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt> file.</p>
784
785<p>Although most operations define a single value, each node in the graph may
786define multiple values. For example, a combined div/rem operation will define
787both the dividend and the remainder. Many other situations require multiple
788values as well. Each node also has some number of operands, which are edges
789to the node defining the used value. Because nodes may define multiple values,
790edges are represented by instances of the <tt>SDOperand</tt> class, which is
791a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node and result
792value being used, respectively. Each value produced by an <tt>SDNode</tt> has
793an associated <tt>MVT::ValueType</tt> indicating what type the value is.</p>
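
<p>Concretely, walking a node's operands looks roughly like the fragment below
(<tt>SDOperand</tt> exposes the node/result-number pair directly; see
<tt>SelectionDAGNodes.h</tt> for the exact members):</p>

<div class="doc_code">
<pre>
// Inspect an SDNode and the value edges (SDOperands) feeding it.
SDNode *N = ...                                 // some node in the DAG
unsigned Opc = N-&gt;getOpcode();                  // the operation it performs
for (unsigned i = 0, e = N-&gt;getNumOperands(); i != e; ++i) {
  SDOperand Op = N-&gt;getOperand(i);              // &lt;SDNode, unsigned&gt; pair
  SDNode *DefNode = Op.Val;                     // node producing the used value
  unsigned WhichResult = Op.ResNo;              // which of its results is used
  MVT::ValueType VT = Op.getValueType();        // type of that value
}
</pre>
</div>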
794
795<p>SelectionDAGs contain two different kinds of values: those that represent
796data flow and those that represent control flow dependencies. Data values are
797simple edges with an integer or floating point value type. Control edges are
798represented as "chain" edges which are of type <tt>MVT::Other</tt>. These edges
799provide an ordering between nodes that have side effects (such as
800loads, stores, calls, returns, etc). All nodes that have side effects should
801take a token chain as input and produce a new one as output. By convention,
802token chain inputs are always operand #0, and chain results are always the last
803value produced by an operation.</p>
804
805<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
806always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root node
807is the final side-effecting node in the token chain. For example, in a single
808basic block function it would be the return node.</p>
809
810<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
811"illegal" DAG. A legal DAG for a target is one that only uses supported
812operations and supported types. On a 32-bit PowerPC, for example, a DAG with
813a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a
814SREM or UREM operation. The
815<a href="#selectiondag_legalize">legalize</a> phase is responsible for turning
816an illegal DAG into a legal DAG.</p>
817
818</div>
819
820<!-- _______________________________________________________________________ -->
821<div class="doc_subsubsection">
822 <a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
823</div>
824
825<div class="doc_text">
826
827<p>SelectionDAG-based instruction selection consists of the following steps:</p>
828
829<ol>
830<li><a href="#selectiondag_build">Build initial DAG</a> - This stage
831 performs a simple translation from the input LLVM code to an illegal
832 SelectionDAG.</li>
833<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> - This stage
834 performs simple optimizations on the SelectionDAG to simplify it, and
835 recognize meta instructions (like rotates and <tt>div</tt>/<tt>rem</tt>
836 pairs) for targets that support these meta operations. This makes the
837 resultant code more efficient and the <a href="#selectiondag_select">select
838 instructions from DAG</a> phase (below) simpler.</li>
839<li><a href="#selectiondag_legalize">Legalize SelectionDAG</a> - This stage
840 converts the illegal SelectionDAG to a legal SelectionDAG by eliminating
841 unsupported operations and data types.</li>
842<li><a href="#selectiondag_optimize">Optimize SelectionDAG (#2)</a> - This
843 second run of the SelectionDAG optimizes the newly legalized DAG to
844 eliminate inefficiencies introduced by legalization.</li>
845<li><a href="#selectiondag_select">Select instructions from DAG</a> - Finally,
846 the target instruction selector matches the DAG operations to target
847 instructions. This process translates the target-independent input DAG into
848 another DAG of target instructions.</li>
849<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
850 - The last phase assigns a linear order to the instructions in the
851 target-instruction DAG and emits them into the MachineFunction being
852 compiled. This step uses traditional prepass scheduling techniques.</li>
853</ol>
854
855<p>After all of these steps are complete, the SelectionDAG is destroyed and the
856rest of the code generation passes are run.</p>
857
858<p>One great way to visualize what is going on here is to take advantage of a
859few LLC command line options. In particular, the <tt>-view-isel-dags</tt>
860option pops up a window with the SelectionDAG input to the Select phase for all
861of the code compiled (if you only get errors printed to the console while using
862this, you probably <a href="ProgrammersManual.html#ViewGraph">need to configure
863your system</a> to add support for it). The <tt>-view-sched-dags</tt> option
864views the SelectionDAG output from the Select phase and input to the Scheduler
865phase.</p>
866
867</div>
868
869<!-- _______________________________________________________________________ -->
870<div class="doc_subsubsection">
871 <a name="selectiondag_build">Initial SelectionDAG Construction</a>
872</div>
873
874<div class="doc_text">
875
876<p>The initial SelectionDAG is na&iuml;vely peephole expanded from the LLVM
877input by the <tt>SelectionDAGLowering</tt> class in the
878<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of this
pass is to expose as much low-level, target-specific detail to the SelectionDAG
as possible.  This pass is mostly hard-coded (e.g. an LLVM <tt>add</tt> turns
into an <tt>SDNode add</tt> while a <tt>getelementptr</tt> is expanded into the
882obvious arithmetic). This pass requires target-specific hooks to lower calls,
883returns, varargs, etc. For these features, the
884<tt><a href="#targetlowering">TargetLowering</a></tt> interface is used.</p>
885
886</div>
887
888<!-- _______________________________________________________________________ -->
889<div class="doc_subsubsection">
890 <a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
891</div>
892
893<div class="doc_text">
894
895<p>The Legalize phase is in charge of converting a DAG to only use the types and
896operations that are natively supported by the target. This involves two major
897tasks:</p>
898
899<ol>
900<li><p>Convert values of unsupported types to values of supported types.</p>
901 <p>There are two main ways of doing this: converting small types to
902 larger types ("promoting"), and breaking up large integer types
903 into smaller ones ("expanding"). For example, a target might require
904 that all f32 values are promoted to f64 and that all i1/i8/i16 values
905 are promoted to i32. The same target might require that all i64 values
906 be expanded into i32 values. These changes can insert sign and zero
907 extensions as needed to make sure that the final code has the same
908 behavior as the input.</p>
909 <p>A target implementation tells the legalizer which types are supported
910 (and which register class to use for them) by calling the
911 <tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
912</li>
913
914<li><p>Eliminate operations that are not supported by the target.</p>
915 <p>Targets often have weird constraints, such as not supporting every
916 operation on every supported datatype (e.g. X86 does not support byte
917 conditional moves and PowerPC does not support sign-extending loads from
918 a 16-bit memory location). Legalize takes care of this by open-coding
919 another sequence of operations to emulate the operation ("expansion"), by
920 promoting one type to a larger type that supports the operation
921 ("promotion"), or by using a target-specific hook to implement the
922 legalization ("custom").</p>
923 <p>A target implementation tells the legalizer which operations are not
924 supported (and which of the above three actions to take) by calling the
925 <tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
926 constructor.</p>
927</li>
928</ol>
929
<p>Prior to the existence of the Legalize pass, we required that every target
<a href="#selectiondag_optimize">selector</a> support and handle every
operator and type even if they were not natively supported.  The introduction of
the Legalize phase allows all of the canonicalization patterns to be shared
across targets, and makes it very easy to optimize the canonicalized code
935because it is still in the form of a DAG.</p>
936
937</div>
938
939<!-- _______________________________________________________________________ -->
940<div class="doc_subsubsection">
941 <a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
942 Combiner</a>
943</div>
944
945<div class="doc_text">
946
947<p>The SelectionDAG optimization phase is run twice for code generation: once
948immediately after the DAG is built and once after legalization. The first run
949of the pass allows the initial code to be cleaned up (e.g. performing
950optimizations that depend on knowing that the operators have restricted type
951inputs). The second run of the pass cleans up the messy code generated by the
952Legalize pass, which allows Legalize to be very simple (it can focus on making
953code legal instead of focusing on generating <em>good</em> and legal code).</p>
954
955<p>One important class of optimizations performed is optimizing inserted sign
956and zero extension instructions. We currently use ad-hoc techniques, but could
957move to more rigorous techniques in the future. Here are some good papers on
958the subject:</p>
959
960<p>
961 "<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
962 integer arithmetic</a>"<br>
963 Kevin Redwine and Norman Ramsey<br>
964 International Conference on Compiler Construction (CC) 2004
965</p>
966
967
968<p>
969 "<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
970 sign extension elimination</a>"<br>
971 Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
972 Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
973 and Implementation.
974</p>
975
976</div>
977
978<!-- _______________________________________________________________________ -->
979<div class="doc_subsubsection">
980 <a name="selectiondag_select">SelectionDAG Select Phase</a>
981</div>
982
983<div class="doc_text">
984
985<p>The Select phase is the bulk of the target-specific code for instruction
986selection. This phase takes a legal SelectionDAG as input, pattern matches the
987instructions supported by the target to this DAG, and produces a new DAG of
988target code. For example, consider the following LLVM fragment:</p>
989
990<div class="doc_code">
991<pre>
992%t1 = add float %W, %X
993%t2 = mul float %t1, %Y
994%t3 = add float %t2, %Z
995</pre>
996</div>
997
998<p>This LLVM code corresponds to a SelectionDAG that looks basically like
999this:</p>
1000
1001<div class="doc_code">
1002<pre>
1003(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
1004</pre>
1005</div>
1006
1007<p>If a target supports floating point multiply-and-add (FMA) operations, one
1008of the adds can be merged with the multiply. On the PowerPC, for example, the
1009output of the instruction selector might look like this DAG:</p>
1010
1011<div class="doc_code">
1012<pre>
1013(FMADDS (FADDS W, X), Y, Z)
1014</pre>
1015</div>
1016
1017<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
1018first two operands and adds the third (as single-precision floating-point
1019numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
1020add instruction. To perform this pattern match, the PowerPC backend includes
1021the following instruction definitions:</p>
1022
1023<div class="doc_code">
1024<pre>
1025def FMADDS : AForm_1&lt;59, 29,
1026 (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
1027 "fmadds $FRT, $FRA, $FRC, $FRB",
1028 [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
1029 F4RC:$FRB))</b>]&gt;;
1030def FADDS : AForm_2&lt;59, 21,
1031 (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
1032 "fadds $FRT, $FRA, $FRB",
1033 [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
1034</pre>
1035</div>
1036
1037<p>The portion of the instruction definition in bold indicates the pattern used
1038to match the instruction. The DAG operators (like <tt>fmul</tt>/<tt>fadd</tt>)
1039are defined in the <tt>lib/Target/TargetSelectionDAG.td</tt> file.
1040"<tt>F4RC</tt>" is the register class of the input and result values.<p>
1041
1042<p>The TableGen DAG instruction selector generator reads the instruction
1043patterns in the <tt>.td</tt> file and automatically builds parts of the pattern
1044matching code for your target. It has the following strengths:</p>
1045
1046<ul>
1047<li>At compiler-compiler time, it analyzes your instruction patterns and tells
1048 you if your patterns make sense or not.</li>
1049<li>It can handle arbitrary constraints on operands for the pattern match. In
    particular, it is straightforward to say things like "match any immediate
1051 that is a 13-bit sign-extended value". For examples, see the
1052 <tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
1053 backend.</li>
1054<li>It knows several important identities for the patterns defined. For
1055 example, it knows that addition is commutative, so it allows the
1056 <tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
1057 well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
1058 to specially handle this case.</li>
1059<li>It has a full-featured type-inferencing system. In particular, you should
1060 rarely have to explicitly tell the system what type parts of your patterns
1061 are. In the <tt>FMADDS</tt> case above, we didn't have to tell
1062 <tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'. It
1063 was able to infer and propagate this knowledge from the fact that
1064 <tt>F4RC</tt> has type 'f32'.</li>
1065<li>Targets can define their own (and rely on built-in) "pattern fragments".
1066 Pattern fragments are chunks of reusable patterns that get inlined into your
1067 patterns during compiler-compiler time. For example, the integer
1068 "<tt>(not x)</tt>" operation is actually defined as a pattern fragment that
1069 expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not have a
1070 native '<tt>not</tt>' operation. Targets can define their own short-hand
1071 fragments as they see fit. See the definition of '<tt>not</tt>' and
1072 '<tt>ineg</tt>' for examples.</li>
1073<li>In addition to instructions, targets can specify arbitrary patterns that
1074 map to one or more instructions using the 'Pat' class. For example,
1075 the PowerPC has no way to load an arbitrary integer immediate into a
1076 register in one instruction. To tell tblgen how to do this, it defines:
1077 <br>
1078 <br>
1079 <div class="doc_code">
1080 <pre>
1081// Arbitrary immediate support. Implement in terms of LIS/ORI.
1082def : Pat&lt;(i32 imm:$imm),
1083 (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
1084 </pre>
1085 </div>
1086 <br>
1087 If none of the single-instruction patterns for loading an immediate into a
1088 register match, this will be used. This rule says "match an arbitrary i32
1089 immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and an
1090 <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to the
1091 left 16 bits') instruction". To make this work, the
1092 <tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate the
1093 input immediate (in this case, take the high or low 16-bits of the
1094 immediate).</li>
1095<li>While the system does automate a lot, it still allows you to write custom
1096 C++ code to match special cases if there is something that is hard to
1097 express.</li>
1098</ul>
1099
1100<p>While it has many strengths, the system currently has some limitations,
1101primarily because it is a work in progress and is not yet finished:</p>
1102
1103<ul>
1104<li>Overall, there is no way to define or match SelectionDAG nodes that define
1105 multiple values (e.g. <tt>ADD_PARTS</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
1106 etc). This is the biggest reason that you currently still <em>have to</em>
1107 write custom C++ code for your instruction selector.</li>
1108<li>There is no great way to support matching complex addressing modes yet. In
1109 the future, we will extend pattern fragments to allow them to define
1110 multiple values (e.g. the four operands of the <a href="#x86_memory">X86
1111 addressing mode</a>). In addition, we'll extend fragments so that a
1112 fragment can match multiple different patterns.</li>
1113<li>We don't automatically infer flags like isStore/isLoad yet.</li>
1114<li>We don't automatically generate the set of supported registers and
1115 operations for the <a href="#selectiondag_legalize">Legalizer</a> yet.</li>
1116<li>We don't have a way of tying in custom legalized nodes yet.</li>
1117</ul>
1118
1119<p>Despite these limitations, the instruction selector generator is still quite
1120useful for most of the binary and logical operations in typical instruction
1121sets. If you run into any problems or can't figure out how to do something,
1122please let Chris know!</p>
1123
1124</div>
1125
1126<!-- _______________________________________________________________________ -->
1127<div class="doc_subsubsection">
1128 <a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
1129</div>
1130
1131<div class="doc_text">
1132
1133<p>The scheduling phase takes the DAG of target instructions from the selection
1134phase and assigns an order. The scheduler can pick an order depending on
various constraints of the machine (e.g., an order that minimizes register pressure or
one that covers instruction latencies).  Once an order is established, the DAG is
1137converted to a list of <tt><a href="#machineinstr">MachineInstr</a></tt>s and
1138the SelectionDAG is destroyed.</p>
1139
1140<p>Note that this phase is logically separate from the instruction selection
1141phase, but is tied to it closely in the code because it operates on
1142SelectionDAGs.</p>
1143
1144</div>
1145
1146<!-- _______________________________________________________________________ -->
1147<div class="doc_subsubsection">
1148 <a name="selectiondag_future">Future directions for the SelectionDAG</a>
1149</div>
1150
1151<div class="doc_text">
1152
1153<ol>
1154<li>Optional function-at-a-time selection.</li>
1155<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
1156</ol>
1157
1158</div>
1159
1160<!-- ======================================================================= -->
1161<div class="doc_subsection">
1162 <a name="ssamco">SSA-based Machine Code Optimizations</a>
1163</div>
1164<div class="doc_text"><p>To Be Written</p></div>
1165
1166<!-- ======================================================================= -->
1167<div class="doc_subsection">
1168 <a name="liveintervals">Live Intervals</a>
1169</div>
1170
1171<div class="doc_text">
1172
1173<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
1174They are used by some <a href="#regalloc">register allocator</a> passes to
1175determine if two or more virtual registers which require the same physical
1176register are live at the same point in the program (i.e., they conflict). When
1177this situation occurs, one virtual register must be <i>spilled</i>.</p>
1178
1179</div>
1180
1181<!-- _______________________________________________________________________ -->
1182<div class="doc_subsubsection">
1183 <a name="livevariable_analysis">Live Variable Analysis</a>
1184</div>
1185
1186<div class="doc_text">
1187
1188<p>The first step in determining the live intervals of variables is to
1189calculate the set of registers that are immediately dead after the
1190instruction (i.e., the instruction calculates the value, but it is
1191never used) and the set of registers that are used by the instruction,
1192but are never used after the instruction (i.e., they are killed). Live
1193variable information is computed for each <i>virtual</i> register and
1194<i>register allocatable</i> physical register in the function. This
1195is done in a very efficient manner because it uses SSA to sparsely
1196compute lifetime information for virtual registers (which are in SSA
1197form) and only has to track physical registers within a block. Before
1198register allocation, LLVM can assume that physical registers are only
1199live within a single basic block. This allows it to do a single,
1200local analysis to resolve physical register lifetimes within each
1201basic block. If a physical register is not register allocatable (e.g.,
1202a stack pointer or condition codes), it is not tracked.</p>
1203
1204<p>Physical registers may be live in to or out of a function. Live in values
1205are typically arguments in registers. Live out values are typically return
1206values in registers. Live in values are marked as such, and are given a dummy
1207"defining" instruction during live intervals analysis. If the last basic block
1208of a function is a <tt>return</tt>, then it's marked as using all live out
1209values in the function.</p>
1210
1211<p><tt>PHI</tt> nodes need to be handled specially, because the calculation
1212of the live variable information from a depth first traversal of the CFG of
1213the function won't guarantee that a virtual register used by the <tt>PHI</tt>
node is defined before it's used. When a <tt>PHI</tt> node is encountered, only
1215the definition is handled, because the uses will be handled in other basic
1216blocks.</p>
1217
1218<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
1219assignment at the end of the current basic block and traverse the successor
1220basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
1221the <tt>PHI</tt> node's operands is coming from the current basic block,
1222then the variable is marked as <i>alive</i> within the current basic block
1223and all of its predecessor basic blocks, until the basic block with the
1224defining instruction is encountered.</p>
1225
1226</div>
1227
1228<!-- _______________________________________________________________________ -->
1229<div class="doc_subsubsection">
1230 <a name="liveintervals_analysis">Live Intervals Analysis</a>
1231</div>
1232
1233<div class="doc_text">
1234
1235<p>We now have the information available to perform the live intervals analysis
1236and build the live intervals themselves. We start off by numbering the basic
1237blocks and machine instructions. We then handle the "live-in" values. These
1238are in physical registers, so the physical register is assumed to be killed by
1239the end of the basic block. Live intervals for virtual registers are computed
1240for some ordering of the machine instructions <tt>[1, N]</tt>. A live interval
1241is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j < N</tt>, for which a
1242variable is live.</p>
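
<p>As an illustration (the numbering and registers below are hypothetical, and
the real implementation numbers instructions at a finer granularity), consider
the following fragment:</p>

<div class="doc_code">
<pre>
 4:  %reg1024 = MOV 0
 8:  %reg1025 = ADD %reg1024, 1
12:  STORE %reg1025, %mem_address
16:  RET %reg1024
</pre>
</div>

<p>Here the live interval of <tt>%reg1024</tt> stretches from its definition at
instruction 4 through its last use at instruction 16, while <tt>%reg1025</tt>
is only live from instruction 8 through instruction 12.</p>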
1243
1244<p><i><b>More to come...</b></i></p>
1245
1246</div>
1247
1248<!-- ======================================================================= -->
1249<div class="doc_subsection">
1250 <a name="regalloc">Register Allocation</a>
1251</div>
1252
1253<div class="doc_text">
1254
1255<p>The <i>Register Allocation problem</i> consists in mapping a program
1256<i>P<sub>v</sub></i>, that can use an unbounded number of virtual
1257registers, to a program <i>P<sub>p</sub></i> that contains a finite
1258(possibly small) number of physical registers. Each target architecture has
1259a different number of physical registers. If the number of physical
1260registers is not enough to accommodate all the virtual registers, some of
1261them will have to be mapped into memory. These virtuals are called
1262<i>spilled virtuals</i>.</p>
1263
1264</div>
1265
1266<!-- _______________________________________________________________________ -->
1267
1268<div class="doc_subsubsection">
1269 <a name="regAlloc_represent">How registers are represented in LLVM</a>
1270</div>
1271
1272<div class="doc_text">
1273
1274<p>In LLVM, physical registers are denoted by integer numbers that
1275normally range from 1 to 1023. To see how this numbering is defined
1276for a particular architecture, you can read the
1277<tt>GenRegisterNames.inc</tt> file for that architecture. For
1278instance, by inspecting
1279<tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the 32-bit
1280register <tt>EAX</tt> is denoted by 15, and the MMX register
1281<tt>MM0</tt> is mapped to 48.</p>
1282
1283<p>Some architectures contain registers that share the same physical
1284location. A notable example is the X86 platform. For instance, in the
1285X86 architecture, the registers <tt>EAX</tt>, <tt>AX</tt> and
1286<tt>AL</tt> share the first eight bits. These physical registers are
1287marked as <i>aliased</i> in LLVM. Given a particular architecture, you
1288can check which registers are aliased by inspecting its
1289<tt>RegisterInfo.td</tt> file. Moreover, the method
1290<tt>MRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
1291all the physical registers aliased to the register <tt>p_reg</tt>.</p>
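
<p>For example, a helper that tests whether a candidate physical register is
<tt>p_reg</tt> or one of its aliases could be written as follows. This is only
a sketch: the helper name is ours, and it assumes the array returned by
<tt>getAliasSet</tt> is zero-terminated.</p>

<div class="doc_code">
<pre>
// Returns true if candidate_reg is p_reg itself or one of its aliases.
// Sketch only: assumes getAliasSet() returns a zero-terminated array.
bool overlaps(const MRegisterInfo &amp;MRI, unsigned p_reg,
              unsigned candidate_reg) {
  if (p_reg == candidate_reg)
    return true;
  for (const unsigned *Alias = MRI.getAliasSet(p_reg);
       Alias &amp;&amp; *Alias; ++Alias)
    if (*Alias == candidate_reg)
      return true;
  return false;
}
</pre>
</div>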
1292
1293<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
1294Elements in the same register class are functionally equivalent, and can
1295be interchangeably used. Each virtual register can only be mapped to
1296physical registers of a particular class. For instance, in the X86
1297architecture, some virtuals can only be allocated to 8 bit registers.
1298A register class is described by <tt>TargetRegisterClass</tt> objects.
To discover if a virtual register is compatible with a given physical
register, code like the following can be used:
1301</p>
1302
1303<div class="doc_code">
1304<pre>
// Returns true if the physical register p_reg belongs to the register
// class of the virtual register v_reg, i.e. v_reg could be assigned to it.
bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
                                      unsigned v_reg,
                                      unsigned p_reg) {
  assert(MRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
         "Target register must be physical");
  // Look up the register class recorded for v_reg when it was created,
  // and ask whether p_reg is a member of that class.
  const TargetRegisterClass *trc = mf.getSSARegMap()->getRegClass(v_reg);
  return trc->contains(p_reg);
}
1313</pre>
1314</div>
1315
<p>Sometimes, mostly for debugging purposes, it is useful to change
the number of physical registers available in the target
architecture. This must be done statically, inside the
<tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt> for
<tt>RegisterClass</tt>, the last parameter of which is a list of
registers. Commenting some of them out is one simple way to avoid
them being used. A more polite way is to explicitly exclude some
registers from the <i>allocation order</i>. See the definition of the
<tt>GR</tt> register class in
<tt>lib/Target/IA64/IA64RegisterInfo.td</tt> for an example of this
(e.g., <tt>numReservedRegs</tt> registers are hidden.)</p>
1327
1328<p>Virtual registers are also denoted by integer numbers. Contrary to
1329physical registers, different virtual registers never share the same
1330number. The smallest virtual register is normally assigned the number
13311024. This may change, so, in order to know which is the first virtual
1332register, you should access
1333<tt>MRegisterInfo::FirstVirtualRegister</tt>. Any register whose
1334number is greater than or equal to
1335<tt>MRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
1336register. Whereas physical registers are statically defined in a
1337<tt>TargetRegisterInfo.td</tt> file and cannot be created by the
1338application developer, that is not the case with virtual registers.
1339In order to create new virtual registers, use the method
<tt>SSARegMap::createVirtualRegister()</tt>. This method returns a new
virtual register with the next available (highest) number.
1342</p>
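
<p>For instance, a register allocator that needs a scratch virtual register of
the same class as an existing one might do something like the following (the
helper name is ours; the <tt>SSARegMap</tt> calls are the ones described
above):</p>

<div class="doc_code">
<pre>
// Create a fresh virtual register with the same register class as old_vreg.
unsigned cloneVirtualRegister(MachineFunction &amp;MF, unsigned old_vreg) {
  SSARegMap *RegMap = MF.getSSARegMap();
  const TargetRegisterClass *RC = RegMap->getRegClass(old_vreg);
  return RegMap->createVirtualRegister(RC);  // next unused register number
}
</pre>
</div>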
1343
1344<p>Before register allocation, the operands of an instruction are
1345mostly virtual registers, although physical registers may also be
1346used. In order to check if a given machine operand is a register, use
1347the boolean function <tt>MachineOperand::isRegister()</tt>. To obtain
1348the integer code of a register, use
1349<tt>MachineOperand::getReg()</tt>. An instruction may define or use a
1350register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
defines register 1026, and uses registers 1025 and 1024. Given a
1352register operand, the method <tt>MachineOperand::isUse()</tt> informs
1353if that register is being used by the instruction. The method
<tt>MachineOperand::isDef()</tt> informs if that register is being
1355defined.</p>
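
<p>Putting these queries together, counting how many register operands an
instruction defines and uses could look like this (a sketch; the function name
is ours):</p>

<div class="doc_code">
<pre>
// Count the register operands defined and used by an instruction.
void countRegOperands(const MachineInstr &amp;MI,
                      unsigned &amp;NumDefs, unsigned &amp;NumUses) {
  NumDefs = NumUses = 0;
  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
    const MachineOperand &amp;MO = MI.getOperand(i);
    if (!MO.isRegister())
      continue;                 // skip immediates, frame indices, etc.
    if (MO.isDef()) ++NumDefs;
    if (MO.isUse()) ++NumUses;
  }
}
</pre>
</div>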
1356
1357<p>We will call physical registers present in the LLVM bitcode before
1358register allocation <i>pre-colored registers</i>. Pre-colored
1359registers are used in many different situations, for instance, to pass
parameters of function calls, and to store results of particular
1361instructions. There are two types of pre-colored registers: the ones
1362<i>implicitly</i> defined, and those <i>explicitly</i>
1363defined. Explicitly defined registers are normal operands, and can be
1364accessed with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In
1365order to check which registers are implicitly defined by an
1366instruction, use the
<tt>TargetInstrInfo::get(opcode).ImplicitDefs</tt> array, where
1368<tt>opcode</tt> is the opcode of the target instruction. One important
1369difference between explicit and implicit physical registers is that
1370the latter are defined statically for each instruction, whereas the
1371former may vary depending on the program being compiled. For example,
1372an instruction that represents a function call will always implicitly
1373define or use the same set of physical registers. To read the
1374registers implicitly used by an instruction, use
<tt>TargetInstrInfo::get(opcode).ImplicitUses</tt>. Pre-colored
1376registers impose constraints on any register allocation algorithm. The
register allocator must make sure that none of them is
1378overwritten by the values of virtual registers while still alive.</p>
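
<p>For example, a register allocator can query the implicit definitions of an
opcode as follows. This is a sketch only: the helper name is ours, and it
assumes the <tt>ImplicitDefs</tt> array is zero-terminated.</p>

<div class="doc_code">
<pre>
// Returns true if instructions with this opcode implicitly define p_reg.
// Sketch only: assumes ImplicitDefs is a zero-terminated array.
bool implicitlyDefines(const TargetInstrInfo &amp;TII, unsigned opcode,
                       unsigned p_reg) {
  for (const unsigned *ImpDef = TII.get(opcode).ImplicitDefs;
       ImpDef &amp;&amp; *ImpDef; ++ImpDef)
    if (*ImpDef == p_reg)
      return true;
  return false;
}
</pre>
</div>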
1379
1380</div>
1381
1382<!-- _______________________________________________________________________ -->
1383
1384<div class="doc_subsubsection">
1385 <a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
1386</div>
1387
1388<div class="doc_text">
1389
1390<p>There are two ways to map virtual registers to physical registers (or to
1391memory slots). The first way, that we will call <i>direct mapping</i>,
1392is based on the use of methods of the classes <tt>MRegisterInfo</tt>,
1393and <tt>MachineOperand</tt>. The second way, that we will call
1394<i>indirect mapping</i>, relies on the <tt>VirtRegMap</tt> class in
order to insert loads and stores that move values to and from
1396memory.</p>
1397
1398<p>The direct mapping provides more flexibility to the developer of
1399the register allocator; however, it is more error prone, and demands
1400more implementation work. Basically, the programmer will have to
1401specify where load and store instructions should be inserted in the
target function being compiled in order to load and store values in
1403memory. To assign a physical register to a virtual register present in
1404a given operand, use <tt>MachineOperand::setReg(p_reg)</tt>. To insert
1405a store instruction, use
1406<tt>MRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a load
1407instruction, use <tt>MRegisterInfo::loadRegFromStackSlot</tt>.</p>
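
<p>As a sketch of how these primitives fit together, the fragment below
rewrites a used operand to a physical register after reloading its value from
a stack slot. The helper name and the choice of insertion point are ours, and
the <tt>loadRegFromStackSlot</tt> parameter list shown is an assumption based
on the description above; check <tt>MRegisterInfo.h</tt> for the exact
interface.</p>

<div class="doc_code">
<pre>
// Rewrite operand OpNo of *MI to use p_reg, reloading the spilled value
// from FrameIndex immediately before the instruction.
void rewriteUseWithReload(MachineBasicBlock &amp;MBB,
                          MachineBasicBlock::iterator MI, unsigned OpNo,
                          unsigned p_reg, int FrameIndex,
                          const TargetRegisterClass *RC,
                          const MRegisterInfo &amp;MRI) {
  MRI.loadRegFromStackSlot(MBB, MI, p_reg, FrameIndex, RC);  // insert load
  MI->getOperand(OpNo).setReg(p_reg);     // operand now refers to p_reg
}
</pre>
</div>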
1408
1409<p>The indirect mapping shields the application developer from the
1410complexities of inserting load and store instructions. In order to map
1411a virtual register to a physical one, use
1412<tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In order to map a
1413certain virtual register to memory, use
1414<tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will
1415return the stack slot where <tt>vreg</tt>'s value will be located. If
1416it is necessary to map another virtual register to the same stack
1417slot, use <tt>VirtRegMap::assignVirt2StackSlot(vreg,
1418stack_location)</tt>. One important point to consider when using the
1419indirect mapping, is that even if a virtual register is mapped to
1420memory, it still needs to be mapped to a physical register. This
1421physical register is the location where the virtual register is
1422supposed to be found before being stored or after being reloaded.</p>
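
<p>The calls involved in the indirect mapping are illustrated below (a sketch;
the function name is ours, and the <tt>VirtRegMap</tt> methods are the ones
named above):</p>

<div class="doc_code">
<pre>
// Map v1 to a physical register, give v2 a fresh stack slot, and make v3
// share that same slot.
void exampleIndirectMapping(VirtRegMap &amp;VRM, unsigned v1, unsigned v2,
                            unsigned v3, unsigned p_reg) {
  VRM.assignVirt2Phys(v1, p_reg);
  int Slot = VRM.assignVirt2StackSlot(v2);
  VRM.assignVirt2StackSlot(v3, Slot);
}
</pre>
</div>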
1423
1424<p>If the indirect strategy is used, after all the virtual registers
1425have been mapped to physical registers or stack slots, it is necessary
1426to use a spiller object to place load and store instructions in the
1427code. Every virtual that has been mapped to a stack slot will be
stored to memory after being defined and will be loaded before being
1429used. The implementation of the spiller tries to recycle load/store
1430instructions, avoiding unnecessary instructions. For an example of how
1431to invoke the spiller, see
1432<tt>RegAllocLinearScan::runOnMachineFunction</tt> in
1433<tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
1434
1435</div>
1436
1437<!-- _______________________________________________________________________ -->
1438<div class="doc_subsubsection">
1439 <a name="regAlloc_twoAddr">Handling two address instructions</a>
1440</div>
1441
1442<div class="doc_text">
1443
1444<p>With very rare exceptions (e.g., function calls), the LLVM machine
1445code instructions are three address instructions. That is, each
1446instruction is expected to define at most one register, and to use at
1447most two registers. However, some architectures use two address
1448instructions. In this case, the defined register is also one of the
used registers. For instance, an instruction such as <tt>ADD %EAX,
1450%EBX</tt>, in X86 is actually equivalent to <tt>%EAX = %EAX +
1451%EBX</tt>.</p>
1452
1453<p>In order to produce correct code, LLVM must convert three address
1454instructions that represent two address instructions into true two
1455address instructions. LLVM provides the pass
1456<tt>TwoAddressInstructionPass</tt> for this specific purpose. It must
1457be run before register allocation takes place. After its execution,
1458the resulting code may no longer be in SSA form. This happens, for
1459instance, in situations where an instruction such as <tt>%a = ADD %b
1460%c</tt> is converted to two instructions such as:</p>
1461
1462<div class="doc_code">
1463<pre>
1464%a = MOVE %b
%a = ADD %a %c
1466</pre>
1467</div>
1468
1469<p>Notice that, internally, the second instruction is represented as
<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is
1471both used and defined by the instruction.</p>
1472
1473</div>
1474
1475<!-- _______________________________________________________________________ -->
1476<div class="doc_subsubsection">
1477 <a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
1478</div>
1479
1480<div class="doc_text">
1481
1482<p>An important transformation that happens during register allocation is called
1483the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many
1484analyses that are performed on the control flow graph of
1485programs. However, traditional instruction sets do not implement
1486PHI instructions. Thus, in order to generate executable code, compilers
1487must replace PHI instructions with other instructions that preserve their
1488semantics.</p>
1489
1490<p>There are many ways in which PHI instructions can safely be removed
1491from the target code. The most traditional PHI deconstruction
1492algorithm replaces PHI instructions with copy instructions. That is
1493the strategy adopted by LLVM. The SSA deconstruction algorithm is
implemented in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to
1495invoke this pass, the identifier <tt>PHIEliminationID</tt> must be
1496marked as required in the code of the register allocator.</p>
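
<p>In practice this is done in the register allocator's
<tt>getAnalysisUsage</tt> method, for example (the class name is ours, and the
surrounding pass boilerplate is omitted):</p>

<div class="doc_code">
<pre>
void MyRegAllocator::getAnalysisUsage(AnalysisUsage &amp;AU) const {
  AU.addRequiredID(PHIEliminationID);   // run SSA deconstruction first
  MachineFunctionPass::getAnalysisUsage(AU);
}
</pre>
</div>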
1497
1498</div>
1499
1500<!-- _______________________________________________________________________ -->
1501<div class="doc_subsubsection">
1502 <a name="regAlloc_fold">Instruction folding</a>
1503</div>
1504
1505<div class="doc_text">
1506
1507<p><i>Instruction folding</i> is an optimization performed during
1508register allocation that removes unnecessary copy instructions. For
1509instance, a sequence of instructions such as:</p>
1510
1511<div class="doc_code">
1512<pre>
1513%EBX = LOAD %mem_address
1514%EAX = COPY %EBX
1515</pre>
1516</div>
1517
<p>can be safely substituted by the single instruction:</p>
1519
1520<div class="doc_code">
1521<pre>
1522%EAX = LOAD %mem_address
1523</pre>
1524</div>
1525
1526<p>Instructions can be folded with the
1527<tt>MRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
1528taken when folding instructions; a folded instruction can be quite
1529different from the original instruction. See
1530<tt>LiveIntervals::addIntervalsForSpills</tt> in
1531<tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its use.</p>
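
<p>A minimal use of this interface might look like the following sketch. The
helper name is ours, and the exact <tt>foldMemoryOperand</tt> signature should
be checked against <tt>MRegisterInfo.h</tt> for the version of LLVM in use.</p>

<div class="doc_code">
<pre>
// Try to fold a stack slot reference directly into operand OpNum of MI.
// Returns the new folded instruction, or the original one if the target
// cannot fold this operand.  The caller is responsible for replacing and
// erasing MI when folding succeeds.
MachineInstr *tryFold(const MRegisterInfo &amp;MRI, MachineInstr *MI,
                      unsigned OpNum, int FrameIndex) {
  if (MachineInstr *FoldedMI = MRI.foldMemoryOperand(MI, OpNum, FrameIndex))
    return FoldedMI;
  return MI;
}
</pre>
</div>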
1532
1533</div>
1534
1535<!-- _______________________________________________________________________ -->
1536
1537<div class="doc_subsubsection">
1538 <a name="regAlloc_builtIn">Built in register allocators</a>
1539</div>
1540
1541<div class="doc_text">
1542
1543<p>The LLVM infrastructure provides the application developer with
1544three different register allocators:</p>
1545
1546<ul>
1547 <li><i>Simple</i> - This is a very simple implementation that does
1548 not keep values in registers across instructions. This register
1549 allocator immediately spills every value right after it is
1550 computed, and reloads all used operands from memory to temporary
1551 registers before each instruction.</li>
1552 <li><i>Local</i> - This register allocator is an improvement on the
1553 <i>Simple</i> implementation. It allocates registers on a basic
1554 block level, attempting to keep values in registers and reusing
1555 registers as appropriate.</li>
1556 <li><i>Linear Scan</i> - <i>The default allocator</i>. This is the
  well-known linear scan register allocator. Whereas the
1558 <i>Simple</i> and <i>Local</i> algorithms use a direct mapping
1559 implementation technique, the <i>Linear Scan</i> implementation
  uses a spiller in order to place loads and stores.</li>
1561</ul>
1562
1563<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
1564command line option <tt>-regalloc=...</tt>:</p>
1565
1566<div class="doc_code">
1567<pre>
1568$ llc -f -regalloc=simple file.bc -o sp.s;
1569$ llc -f -regalloc=local file.bc -o lc.s;
1570$ llc -f -regalloc=linearscan file.bc -o ln.s;
1571</pre>
1572</div>
1573
1574</div>
1575
1576<!-- ======================================================================= -->
1577<div class="doc_subsection">
1578 <a name="proepicode">Prolog/Epilog Code Insertion</a>
1579</div>
1580<div class="doc_text"><p>To Be Written</p></div>
1581<!-- ======================================================================= -->
1582<div class="doc_subsection">
1583 <a name="latemco">Late Machine Code Optimizations</a>
1584</div>
1585<div class="doc_text"><p>To Be Written</p></div>
1586<!-- ======================================================================= -->
1587<div class="doc_subsection">
1588 <a name="codeemit">Code Emission</a>
1589</div>
1590<div class="doc_text"><p>To Be Written</p></div>
1591<!-- _______________________________________________________________________ -->
1592<div class="doc_subsubsection">
1593 <a name="codeemit_asm">Generating Assembly Code</a>
1594</div>
1595<div class="doc_text"><p>To Be Written</p></div>
1596<!-- _______________________________________________________________________ -->
1597<div class="doc_subsubsection">
1598 <a name="codeemit_bin">Generating Binary Machine Code</a>
1599</div>
1600
1601<div class="doc_text">
1602 <p>For the JIT or <tt>.o</tt> file writer</p>
1603</div>
1604
1605
1606<!-- *********************************************************************** -->
1607<div class="doc_section">
1608 <a name="targetimpls">Target-specific Implementation Notes</a>
1609</div>
1610<!-- *********************************************************************** -->
1611
1612<div class="doc_text">
1613
1614<p>This section of the document explains features or design decisions that
1615are specific to the code generator for a particular target.</p>
1616
1617</div>
1618
1619
1620<!-- ======================================================================= -->
1621<div class="doc_subsection">
1622 <a name="x86">The X86 backend</a>
1623</div>
1624
1625<div class="doc_text">
1626
1627<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
1628code generator currently targets a generic P6-like processor. As such, it
1629produces a few P6-and-above instructions (like conditional moves), but it does
1630not make use of newer features like MMX or SSE. In the future, the X86 backend
1631will have sub-target support added for specific processor families and
1632implementations.</p>
1633
1634</div>
1635
1636<!-- _______________________________________________________________________ -->
1637<div class="doc_subsubsection">
1638 <a name="x86_tt">X86 Target Triples Supported</a>
1639</div>
1640
1641<div class="doc_text">
1642
1643<p>The following are the known target triples that are supported by the X86
1644backend. This is not an exhaustive list, and it would be useful to add those
1645that people test.</p>
1646
1647<ul>
1648<li><b>i686-pc-linux-gnu</b> - Linux</li>
1649<li><b>i386-unknown-freebsd5.3</b> - FreeBSD 5.3</li>
1650<li><b>i686-pc-cygwin</b> - Cygwin on Win32</li>
1651<li><b>i686-pc-mingw32</b> - MingW on Win32</li>
1652<li><b>i386-pc-mingw32msvc</b> - MingW crosscompiler on Linux</li>
1653<li><b>i686-apple-darwin*</b> - Apple Darwin on X86</li>
1654</ul>
1655
1656</div>
1657
1658<!-- _______________________________________________________________________ -->
1659<div class="doc_subsubsection">
1660 <a name="x86_cc">X86 Calling Conventions supported</a>
1661</div>
1662
1663
1664<div class="doc_text">
1665
<p>The following target-specific calling conventions are known to the backend:</p>
1667
1668<ul>
1669<li><b>x86_StdCall</b> - stdcall calling convention seen on Microsoft Windows
1670platform (CC ID = 64).</li>
1671<li><b>x86_FastCall</b> - fastcall calling convention seen on Microsoft Windows
1672platform (CC ID = 65).</li>
1673</ul>
1674
1675</div>
1676
1677<!-- _______________________________________________________________________ -->
1678<div class="doc_subsubsection">
1679 <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
1680</div>
1681
1682<div class="doc_text">
1683
1684<p>The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following form directly in integer
1686instructions (which use ModR/M addressing):</p>
1687
1688<div class="doc_code">
1689<pre>
1690Base + [1,2,4,8] * IndexReg + Disp32
1691</pre>
1692</div>
1693
1694<p>In order to represent this, LLVM tracks no less than 4 operands for each
1695memory operand of this form. This means that the "load" form of '<tt>mov</tt>'
1696has the following <tt>MachineOperand</tt>s in this order:</p>
1697
1698<pre>
1699Index: 0 | 1 2 3 4
1700Meaning: DestReg, | BaseReg, Scale, IndexReg, Displacement
1701OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm
1702</pre>
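
<p>For example (the register choices are purely illustrative), the load
<tt>movl 4(%EBX,%ESI,2), %EAX</tt>, which reads from the address
<tt>EBX + 2*ESI + 4</tt>, would carry the following operands:</p>

<pre>
Index:      0     |    1      2      3      4
Operand:    EAX   |    EBX    2      ESI    4
</pre>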
1703
1704<p>Stores, and all other instructions, treat the four memory operands in the
1705same way and in the same order.</p>
1706
1707</div>
1708
1709<!-- _______________________________________________________________________ -->
1710<div class="doc_subsubsection">
1711 <a name="x86_names">Instruction naming</a>
1712</div>
1713
1714<div class="doc_text">
1715
<p>An instruction name consists of the base name, a default operand size, and
1717a character per operand with an optional special size. For example:</p>
1718
1719<p>
1720<tt>ADD8rr</tt> -&gt; add, 8-bit register, 8-bit register<br>
1721<tt>IMUL16rmi</tt> -&gt; imul, 16-bit register, 16-bit memory, 16-bit immediate<br>
1722<tt>IMUL16rmi8</tt> -&gt; imul, 16-bit register, 16-bit memory, 8-bit immediate<br>
1723<tt>MOVSX32rm16</tt> -&gt; movsx, 32-bit register, 16-bit memory
1724</p>
1725
1726</div>
1727
1728<!-- ======================================================================= -->
1729<div class="doc_subsection">
1730 <a name="ppc">The PowerPC backend</a>
1731</div>
1732
1733<div class="doc_text">
1734<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
1735code generation is retargetable to several variations or <i>subtargets</i> of
the PowerPC ISA, including ppc32, ppc64 and altivec.
1737</p>
1738</div>
1739
1740<!-- _______________________________________________________________________ -->
1741<div class="doc_subsubsection">
1742 <a name="ppc_abi">LLVM PowerPC ABI</a>
1743</div>
1744
1745<div class="doc_text">
<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
PC-relative (PIC) or static addressing for accessing global values, so no TOC (r2)
1748is used. Second, r31 is used as a frame pointer to allow dynamic growth of a
1749stack frame. LLVM takes advantage of having no TOC to provide space to save
1750the frame pointer in the PowerPC linkage area of the caller frame. Other
details of the PowerPC ABI can be found at <a href=
1752"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
1753>PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The
64 bit ABI is similar except that space for GPRs is 8 bytes wide (not 4) and r13 is
1755reserved for system use.</p>
1756</div>
1757
1758<!-- _______________________________________________________________________ -->
1759<div class="doc_subsubsection">
1760 <a name="ppc_frame">Frame Layout</a>
1761</div>
1762
1763<div class="doc_text">
1764<p>The size of a PowerPC frame is usually fixed for the duration of a
1765function&rsquo;s invocation. Since the frame is fixed size, all references into
1766the frame can be accessed via fixed offsets from the stack pointer. The
exception to this is when dynamic alloca or variable sized arrays are present;
in that case, a base pointer (r31) is used as a proxy for the stack pointer, and
the stack pointer is free to grow or shrink. A base pointer is also used if llvm-gcc is
1770not passed the -fomit-frame-pointer flag. The stack pointer is always aligned to
177116 bytes, so that space allocated for altivec vectors will be properly
1772aligned.</p>
<p>An invocation frame is laid out as follows (low memory at top):</p>
1774</div>
1775
1776<div class="doc_text">
1777<table class="layout">
1778 <tr>
1779 <td>Linkage<br><br></td>
1780 </tr>
1781 <tr>
1782 <td>Parameter area<br><br></td>
1783 </tr>
1784 <tr>
1785 <td>Dynamic area<br><br></td>
1786 </tr>
1787 <tr>
1788 <td>Locals area<br><br></td>
1789 </tr>
1790 <tr>
1791 <td>Saved registers area<br><br></td>
1792 </tr>
1793 <tr style="border-style: none hidden none hidden;">
1794 <td><br></td>
1795 </tr>
1796 <tr>
1797 <td>Previous Frame<br><br></td>
1798 </tr>
1799</table>
1800</div>
1801
1802<div class="doc_text">
1803<p>The <i>linkage</i> area is used by a callee to save special registers prior
1804to allocating its own frame. Only three entries are relevant to LLVM. The
1805first entry is the previous stack pointer (sp), aka link. This allows probing
1806tools like gdb or exception handlers to quickly scan the frames in the stack. A
1807function epilog can also use the link to pop the frame from the stack. The
1808third entry in the linkage area is used to save the return address from the lr
1809register. Finally, as mentioned above, the last entry is used to save the
previous frame pointer (r31). The entries in the linkage area are the size of a
1811GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64
1812bit mode.</p>
1813</div>
1814
1815<div class="doc_text">
1816<p>32 bit linkage area</p>
1817<table class="layout">
1818 <tr>
1819 <td>0</td>
1820 <td>Saved SP (r1)</td>
1821 </tr>
1822 <tr>
1823 <td>4</td>
1824 <td>Saved CR</td>
1825 </tr>
1826 <tr>
1827 <td>8</td>
1828 <td>Saved LR</td>
1829 </tr>
1830 <tr>
1831 <td>12</td>
1832 <td>Reserved</td>
1833 </tr>
1834 <tr>
1835 <td>16</td>
1836 <td>Reserved</td>
1837 </tr>
1838 <tr>
1839 <td>20</td>
1840 <td>Saved FP (r31)</td>
1841 </tr>
1842</table>
1843</div>
1844
1845<div class="doc_text">
1846<p>64 bit linkage area</p>
1847<table class="layout">
1848 <tr>
1849 <td>0</td>
1850 <td>Saved SP (r1)</td>
1851 </tr>
1852 <tr>
1853 <td>8</td>
1854 <td>Saved CR</td>
1855 </tr>
1856 <tr>
1857 <td>16</td>
1858 <td>Saved LR</td>
1859 </tr>
1860 <tr>
1861 <td>24</td>
1862 <td>Reserved</td>
1863 </tr>
1864 <tr>
1865 <td>32</td>
1866 <td>Reserved</td>
1867 </tr>
1868 <tr>
1869 <td>40</td>
1870 <td>Saved FP (r31)</td>
1871 </tr>
1872</table>
1873</div>
1874
1875<div class="doc_text">
1876<p>The <i>parameter area</i> is used to store arguments being passed to a callee
1877function. Following the PowerPC ABI, the first few arguments are actually
1878passed in registers, with the space in the parameter area unused. However, if
1879there are not enough registers or the callee is a thunk or vararg function,
1880these register arguments can be spilled into the parameter area. Thus, the
1881parameter area must be large enough to store all the parameters for the largest
call sequence made by the caller. The size must also be minimally large enough
1883to spill registers r3-r10. This allows callees blind to the call signature,
1884such as thunks and vararg functions, enough space to cache the argument
1885registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64
bit mode.) Also note that since the parameter area is a fixed offset from the
top of the frame, a callee can access its spilled arguments using fixed
offsets from the stack pointer (or base pointer).</p>
1889</div>
1890
1891<div class="doc_text">
<p>Combining the information about the linkage and parameter areas with the
alignment requirement, a stack frame is minimally 64 bytes in 32 bit mode and
128 bytes in 64 bit mode.</p>
1895</div>
1896
1897<div class="doc_text">
1898<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
1899alloca then space is added to the stack, the linkage and parameter areas are
shifted to the top of the stack, and the new space is available immediately below the
1901linkage and parameter areas. The cost of shifting the linkage and parameter
1902areas is minor since only the link value needs to be copied. The link value can
1903be easily fetched by adding the original frame size to the base pointer. Note
that allocations in the dynamic space need to observe 16 byte alignment.</p>
1905</div>
1906
1907<div class="doc_text">
<p>The <i>locals area</i> is where the LLVM compiler reserves space for local
1909variables.</p>
1910</div>
1911
1912<div class="doc_text">
<p>The <i>saved registers area</i> is where the LLVM compiler spills callee saved
1914registers on entry to the callee.</p>
1915</div>
1916
1917<!-- _______________________________________________________________________ -->
1918<div class="doc_subsubsection">
1919 <a name="ppc_prolog">Prolog/Epilog</a>
1920</div>
1921
1922<div class="doc_text">
<p>The LLVM prolog and epilog are the same as described in the PowerPC ABI, with
the following exceptions. Callee saved registers are spilled after the frame is
created. This allows the LLVM epilog/prolog support to be common with other
targets. The base pointer callee saved register r31 is saved in the TOC slot of
the linkage area. This simplifies allocation of space for the base pointer and
makes it convenient to locate programmatically and during debugging.</p>
1929</div>
1930
1931<!-- _______________________________________________________________________ -->
1932<div class="doc_subsubsection">
1933 <a name="ppc_dynamic">Dynamic Allocation</a>
1934</div>
1935
1936<div class="doc_text">
1937<p></p>
1938</div>
1939
1940<div class="doc_text">
1941<p><i>TODO - More to come.</i></p>
1942</div>
1943
1944
1945<!-- *********************************************************************** -->
1946<hr>
1947<address>
1948 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1949 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1950 <a href="http://validator.w3.org/check/referer"><img
1951 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
1952
1953 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1954 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1955 Last modified: $Date$
1956</address>
1957
1958</body>
1959</html>