Blame - docs/CodeGenerator.html - fp2-dev/platform/external/llvm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
				5	<meta http-equiv="content-type" content="text/html; charset=utf-8">
				6	<title>The LLVM Target-Independent Code Generator</title>
				7	<link rel="stylesheet" href="llvm.css" type="text/css">
				8	</head>
				9	<body>
				10
				11	<div class="doc_title">
				12	The LLVM Target-Independent Code Generator
				13	</div>
				14
				15	<ol>
				16	<li><a href="#introduction">Introduction</a>
				17	<ul>
				18	<li><a href="#required">Required components in the code generator</a></li>
				19	<li><a href="#high-level-design">The high-level design of the code
				20	generator</a></li>
				21	<li><a href="#tablegen">Using TableGen for target description</a></li>
				22	</ul>
				23	</li>
				24	<li><a href="#targetdesc">Target description classes</a>
				25	<ul>
				26	<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
				27	<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
				28	<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
<li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
				31	<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
				32	<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
				33	<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
				34	</ul>
				35	</li>
				36	<li><a href="#codegendesc">Machine code description classes</a>
				37	<ul>
				38	<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
				39	<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
				40	class</a></li>
				41	<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
				42	</ul>
				43	</li>
				44	<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
				45	<ul>
				46	<li><a href="#instselect">Instruction Selection</a>
				47	<ul>
				48	<li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
				49	<li><a href="#selectiondag_process">SelectionDAG Code Generation
				50	Process</a></li>
				51	<li><a href="#selectiondag_build">Initial SelectionDAG
				52	Construction</a></li>
<li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
<li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
				55	<li><a href="#selectiondag_optimize">SelectionDAG Optimization
				56	Phase: the DAG Combiner</a></li>
				57	<li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
				58	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
				59	Phase</a></li>
				60	<li><a href="#selectiondag_future">Future directions for the
				61	SelectionDAG</a></li>
				62	</ul></li>
				63	<li><a href="#liveintervals">Live Intervals</a>
				64	<ul>
				65	<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
				66	<li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
				67	</ul></li>
				68	<li><a href="#regalloc">Register Allocation</a>
				69	<ul>
				70	<li><a href="#regAlloc_represent">How registers are represented in
				71	LLVM</a></li>
				72	<li><a href="#regAlloc_howTo">Mapping virtual registers to physical
				73	registers</a></li>
				74	<li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
				75	<li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
				76	<li><a href="#regAlloc_fold">Instruction folding</a></li>
				77	<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
				78	</ul></li>
				79	<li><a href="#codeemit">Code Emission</a>
				80	<ul>
				81	<li><a href="#codeemit_asm">Generating Assembly Code</a></li>
				82	<li><a href="#codeemit_bin">Generating Binary Machine Code</a></li>
				83	</ul></li>
				84	</ul>
				85	</li>
				86	<li><a href="#targetimpls">Target-specific Implementation Notes</a>
				87	<ul>
<li><a href="#tailcallopt">Tail call optimization</a></li>
<li><a href="#x86">The X86 backend</a></li>
				90	<li><a href="#ppc">The PowerPC backend</a>
				91	<ul>
				92	<li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
				93	<li><a href="#ppc_frame">Frame Layout</a></li>
				94	<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
				95	<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
				96	</ul></li>
				97	</ul></li>
				98
				99	</ol>
				100
				101	<div class="doc_author">
				102	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
				103	<a href="mailto:isanbard@gmail.com">Bill Wendling</a>,
				104	<a href="mailto:pronesto@gmail.com">Fernando Magno Quintao
				105	Pereira</a> and
				106	<a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
				107	</div>
				108
				109	<div class="doc_warning">
				110	<p>Warning: This is a work in progress.</p>
				111	</div>
				112
				113	<!-- *********************************************************************** -->
				114	<div class="doc_section">
				115	<a name="introduction">Introduction</a>
				116	</div>
				117	<!-- *********************************************************************** -->
				118
				119	<div class="doc_text">
				120
				121	<p>The LLVM target-independent code generator is a framework that provides a
				122	suite of reusable components for translating the LLVM internal representation to
				123	the machine code for a specified target—either in assembly form (suitable
				124	for a static compiler) or in binary machine code format (usable for a JIT
				125	compiler). The LLVM target-independent code generator consists of five main
				126	components:</p>
				127
				128	<ol>
				129	<li><a href="#targetdesc">Abstract target description</a> interfaces which
				130	capture important properties about various aspects of the machine, independently
				131	of how they will be used. These interfaces are defined in
				132	<tt>include/llvm/Target/</tt>.</li>
				133
				134	<li>Classes used to represent the <a href="#codegendesc">machine code</a> being
				135	generated for a target. These classes are intended to be abstract enough to
				136	represent the machine code for <i>any</i> target machine. These classes are
				137	defined in <tt>include/llvm/CodeGen/</tt>.</li>
				138
				139	<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
				140	various phases of native code generation (register allocation, scheduling, stack
				141	frame representation, etc). This code lives in <tt>lib/CodeGen/</tt>.</li>
				142
				143	<li><a href="#targetimpls">Implementations of the abstract target description
				144	interfaces</a> for particular targets. These machine descriptions make use of
				145	the components provided by LLVM, and can optionally provide custom
				146	target-specific passes, to build complete code generators for a specific target.
				147	Target descriptions live in <tt>lib/Target/</tt>.</li>
				148
<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
completely target independent (it uses the <tt>TargetJITInfo</tt> structure to
handle target-specific issues). The code for the target-independent
JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
				153
				154	</ol>
				155
				156	<p>
				157	Depending on which part of the code generator you are interested in working on,
				158	different pieces of this will be useful to you. In any case, you should be
				159	familiar with the <a href="#targetdesc">target description</a> and <a
				160	href="#codegendesc">machine code representation</a> classes. If you want to add
				161	a backend for a new target, you will need to <a href="#targetimpls">implement the
				162	target description</a> classes for your new target and understand the <a
				163	href="LangRef.html">LLVM code representation</a>. If you are interested in
				164	implementing a new <a href="#codegenalgs">code generation algorithm</a>, it
				165	should only depend on the target-description and machine code representation
				166	classes, ensuring that it is portable.
				167	</p>
				168
				169	</div>
				170
				171	<!-- ======================================================================= -->
				172	<div class="doc_subsection">
				173	<a name="required">Required components in the code generator</a>
				174	</div>
				175
				176	<div class="doc_text">
				177
				178	<p>The two pieces of the LLVM code generator are the high-level interface to the
				179	code generator and the set of reusable components that can be used to build
				180	target-specific backends. The two most important interfaces (<a
				181	href="#targetmachine"><tt>TargetMachine</tt></a> and <a
				182	href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
				183	required to be defined for a backend to fit into the LLVM system, but the others
				184	must be defined if the reusable code generator components are going to be
				185	used.</p>
				186
				187	<p>This design has two important implications. The first is that LLVM can
				188	support completely non-traditional code generation targets. For example, the C
				189	backend does not require register allocation, instruction selection, or any of
				190	the other standard components provided by the system. As such, it only
				191	implements these two interfaces, and does its own thing. Another example of a
				192	code generator like this is a (purely hypothetical) backend that converts LLVM
				193	to the GCC RTL form and uses GCC to emit machine code for a target.</p>
				194
				195	<p>This design also implies that it is possible to design and
				196	implement radically different code generators in the LLVM system that do not
				197	make use of any of the built-in components. Doing so is not recommended at all,
				198	but could be required for radically different targets that do not fit into the
				199	LLVM machine description model: FPGAs for example.</p>
				200
				201	</div>
				202
				203	<!-- ======================================================================= -->
				204	<div class="doc_subsection">
				205	<a name="high-level-design">The high-level design of the code generator</a>
				206	</div>
				207
				208	<div class="doc_text">
				209
				210	<p>The LLVM target-independent code generator is designed to support efficient and
				211	quality code generation for standard register-based microprocessors. Code
				212	generation in this model is divided into the following stages:</p>
				213
				214	<ol>
				215	<li><b><a href="#instselect">Instruction Selection</a></b> - This phase
				216	determines an efficient way to express the input LLVM code in the target
				217	instruction set.
				218	This stage produces the initial code for the program in the target instruction
				219	set, then makes use of virtual registers in SSA form and physical registers that
				220	represent any required register assignments due to target constraints or calling
				221	conventions. This step turns the LLVM code into a DAG of target
				222	instructions.</li>
				223
				224	<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> - This
				225	phase takes the DAG of target instructions produced by the instruction selection
				226	phase, determines an ordering of the instructions, then emits the instructions
				227	as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering. Note
				228	that we describe this in the <a href="#instselect">instruction selection
				229	section</a> because it operates on a <a
				230	href="#selectiondag_intro">SelectionDAG</a>.
				231	</li>
				232
				233	<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> - This
				234	optional stage consists of a series of machine-code optimizations that
				235	operate on the SSA-form produced by the instruction selector. Optimizations
				236	like modulo-scheduling or peephole optimization work here.
				237	</li>
				238
				239	<li><b><a href="#regalloc">Register Allocation</a></b> - The
				240	target code is transformed from an infinite virtual register file in SSA form
				241	to the concrete register file used by the target. This phase introduces spill
				242	code and eliminates all virtual register references from the program.</li>
				243
				244	<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> - Once the
				245	machine code has been generated for the function and the amount of stack space
				246	required is known (used for LLVM alloca's and spill slots), the prolog and
				247	epilog code for the function can be inserted and "abstract stack location
				248	references" can be eliminated. This stage is responsible for implementing
				249	optimizations like frame-pointer elimination and stack packing.</li>
				250
				251	<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> - Optimizations
				252	that operate on "final" machine code can go here, such as spill code scheduling
				253	and peephole optimizations.</li>
				254
				255	<li><b><a href="#codeemit">Code Emission</a></b> - The final stage actually
				256	puts out the code for the current function, either in the target assembler
				257	format or in machine code.</li>
				258
				259	</ol>
				260
				261	<p>The code generator is based on the assumption that the instruction selector
				262	will use an optimal pattern matching selector to create high-quality sequences of
				263	native instructions. Alternative code generator designs based on pattern
				264	expansion and aggressive iterative peephole optimization are much slower. This
				265	design permits efficient compilation (important for JIT environments) and
				266	aggressive optimization (used when generating code offline) by allowing
				267	components of varying levels of sophistication to be used for any step of
				268	compilation.</p>
				269
				270	<p>In addition to these stages, target implementations can insert arbitrary
				271	target-specific passes into the flow. For example, the X86 target uses a
				272	special pass to handle the 80x87 floating point stack architecture. Other
				273	targets with unusual requirements can be supported with custom passes as
				274	needed.</p>
				275
				276	</div>
				277
				278
				279	<!-- ======================================================================= -->
				280	<div class="doc_subsection">
				281	<a name="tablegen">Using TableGen for target description</a>
				282	</div>
				283
				284	<div class="doc_text">
				285
				286	<p>The target description classes require a detailed description of the target
				287	architecture. These target descriptions often have a large amount of common
				288	information (e.g., an <tt>add</tt> instruction is almost identical to a
				289	<tt>sub</tt> instruction).
				290	In order to allow the maximum amount of commonality to be factored out, the LLVM
				291	code generator uses the <a href="TableGenFundamentals.html">TableGen</a> tool to
				292	describe big chunks of the target machine, which allows the use of
				293	domain-specific and target-specific abstractions to reduce the amount of
				294	repetition.</p>
				295
				296	<p>As LLVM continues to be developed and refined, we plan to move more and more
				297	of the target description to the <tt>.td</tt> form. Doing so gives us a
				298	number of advantages. The most important is that it makes it easier to port
				299	LLVM because it reduces the amount of C++ code that has to be written, and the
				300	surface area of the code generator that needs to be understood before someone
				301	can get something working. Second, it makes it easier to change things. In
				302	particular, if tables and other things are all emitted by <tt>tblgen</tt>, we
				303	only need a change in one place (<tt>tblgen</tt>) to update all of the targets
				304	to a new interface.</p>
				305
				306	</div>
				307
				308	<!-- *********************************************************************** -->
				309	<div class="doc_section">
				310	<a name="targetdesc">Target description classes</a>
				311	</div>
				312	<!-- *********************************************************************** -->
				313
				314	<div class="doc_text">
				315
				316	<p>The LLVM target description classes (located in the
				317	<tt>include/llvm/Target</tt> directory) provide an abstract description of the
				318	target machine independent of any particular client. These classes are
				319	designed to capture the <i>abstract</i> properties of the target (such as the
				320	instructions and registers it has), and do not incorporate any particular pieces
				321	of code generation algorithms.</p>
				322
				323	<p>All of the target description classes (except the <tt><a
				324	href="#targetdata">TargetData</a></tt> class) are designed to be subclassed by
				325	the concrete target implementation, and have virtual methods implemented. To
				326	get to these implementations, the <tt><a
				327	href="#targetmachine">TargetMachine</a></tt> class provides accessors that
				328	should be implemented by the target.</p>
				329
				330	</div>
				331
				332	<!-- ======================================================================= -->
				333	<div class="doc_subsection">
				334	<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
				335	</div>
				336
				337	<div class="doc_text">
				338
				339	<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
				340	access the target-specific implementations of the various target description
				341	classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
				342	<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
				343	designed to be specialized by
				344	a concrete target implementation (e.g., <tt>X86TargetMachine</tt>) which
				345	implements the various virtual methods. The only required target description
				346	class is the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the
				347	code generator components are to be used, the other interfaces should be
				348	implemented as well.</p>
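
<p>The accessor pattern described above can be sketched in standalone C++. This
is a minimal illustration, not the real LLVM headers: every name ending in
<tt>Sketch</tt> is hypothetical, and the choice to return a null pointer for a
component the target does not provide is an assumption made for the example.</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for two target description interfaces.
struct InstrInfoSketch {
  virtual ~InstrInfoSketch() {}
  virtual unsigned getNumOpcodes() const = 0;
};
struct JITInfoSketch {
  virtual ~JITInfoSketch() {}
};

// Base class: optional components default to "not provided" (an assumption
// of this sketch).
class TargetMachineSketch {
public:
  virtual ~TargetMachineSketch() {}
  virtual const InstrInfoSketch *getInstrInfo() const { return nullptr; }
  virtual const JITInfoSketch *getJITInfo() const { return nullptr; }
};

// A toy target that provides instruction info but no JIT support.
class ToyInstrInfoSketch : public InstrInfoSketch {
public:
  unsigned getNumOpcodes() const override { return 3; }
};

class ToyTargetMachineSketch : public TargetMachineSketch {
  ToyInstrInfoSketch InstrInfo;

public:
  const InstrInfoSketch *getInstrInfo() const override { return &InstrInfo; }
};

// Client code reaches the target-specific object only through the base class.
inline unsigned countOpcodesSketch(const TargetMachineSketch &TM) {
  const InstrInfoSketch *II = TM.getInstrInfo();
  assert(II && "this client needs instruction info");
  return II->getNumOpcodes();
}
```

<p>A target-independent client written this way depends only on the abstract
interfaces, so it works unchanged with any target that implements them.</p>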
				349
				350	</div>
				351
				352
				353	<!-- ======================================================================= -->
				354	<div class="doc_subsection">
				355	<a name="targetdata">The <tt>TargetData</tt> class</a>
				356	</div>
				357
				358	<div class="doc_text">
				359
<p>The <tt>TargetData</tt> class is the only required target description class,
and it is the only class that is not extensible (you cannot derive a new
class from it). <tt>TargetData</tt> specifies information about how the target
lays out memory for structures, the alignment requirements for various data
types, the size of pointers in the target, and whether the target is
little-endian or big-endian.</p>
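
<p>To make the structure layout question concrete, here is a small standalone
sketch of how field offsets follow from per-type sizes and alignments. It is not
the <tt>TargetData</tt> implementation; <tt>FieldInfo</tt>,
<tt>LayoutSketch</tt>, <tt>alignTo</tt> and <tt>layOutStruct</tt> are
hypothetical names, and the sketch assumes natural (unpacked) layout with
power-of-two alignments.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one field: size and required alignment in
// bytes. Alignment is assumed to be a power of two.
struct FieldInfo {
  std::uint64_t Size;
  std::uint64_t Align;
};

// Hypothetical result: the offset of each field, plus the padded size and
// the alignment of the whole structure.
struct LayoutSketch {
  std::vector<std::uint64_t> Offsets;
  std::uint64_t Size = 0;
  std::uint64_t Align = 1;
};

// Round Value up to the next multiple of Align.
inline std::uint64_t alignTo(std::uint64_t Value, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

inline LayoutSketch layOutStruct(const std::vector<FieldInfo> &Fields) {
  LayoutSketch L;
  for (const FieldInfo &F : Fields) {
    L.Size = alignTo(L.Size, F.Align); // padding before the field, if any
    L.Offsets.push_back(L.Size);
    L.Size += F.Size;
    if (F.Align > L.Align)
      L.Align = F.Align; // the struct is as aligned as its strictest field
  }
  L.Size = alignTo(L.Size, L.Align); // tail padding
  return L;
}
```

<p>For example, under these assumptions a 1-byte field followed by a 4-byte
field aligned to 4 and an 8-byte field aligned to 8 lands at offsets 0, 4 and 8,
for a total size of 16 bytes.</p>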
				366
				367	</div>
				368
				369	<!-- ======================================================================= -->
				370	<div class="doc_subsection">
				371	<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
				372	</div>
				373
				374	<div class="doc_text">
				375
				376	<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
				377	selectors primarily to describe how LLVM code should be lowered to SelectionDAG
				378	operations. Among other things, this class indicates:</p>
				379
				380	<ul>
				381	<li>an initial register class to use for various <tt>ValueType</tt>s</li>
				382	<li>which operations are natively supported by the target machine</li>
				383	<li>the return type of <tt>setcc</tt> operations</li>
				384	<li>the type to use for shift amounts</li>
				385	<li>various high-level characteristics, like whether it is profitable to turn
				386	division by a constant into a multiplication sequence</li>
				387	</ul>
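
<p>One way to picture the "which operations are natively supported" item is a
table keyed by operation and value type. The sketch below is a standalone toy
model: the enums, the class name and the rule that unlisted combinations count
as legal are all assumptions made for illustration, not the
<tt>TargetLowering</tt> interface itself.</p>

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, heavily reduced sets of operations, value types and actions.
enum class OpSketch { Add, SDiv, Rotl };
enum class TypeSketch { i8, i32, i64 };
enum class ActionSketch { Legal, Promote, Expand, Custom };

class LoweringTableSketch {
  std::map<std::pair<OpSketch, TypeSketch>, ActionSketch> Actions;

public:
  // Record what has to happen to an operation on a given type.
  void setAction(OpSketch Op, TypeSketch VT, ActionSketch A) {
    Actions[std::make_pair(Op, VT)] = A;
  }

  // Combinations that were never mentioned are treated as legal
  // (an assumption of this sketch).
  ActionSketch getAction(OpSketch Op, TypeSketch VT) const {
    auto It = Actions.find(std::make_pair(Op, VT));
    return It == Actions.end() ? ActionSketch::Legal : It->second;
  }

  bool isLegal(OpSketch Op, TypeSketch VT) const {
    return getAction(Op, VT) == ActionSketch::Legal;
  }
};
```

<p>A target would fill such a table once, and the legalization phases would
consult it for each node to decide whether the node can be selected directly or
must be rewritten first.</p>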
				388
				389	</div>
				390
				391	<!-- ======================================================================= -->
				392	<div class="doc_subsection">
<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
</div>
				395
				396	<div class="doc_text">
				397
<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register
file of the target and any interactions between the registers.</p>

<p>Registers are represented in the code generator by
unsigned integers. Physical registers (those that actually exist in the target
description) are unique small numbers, and virtual registers are generally
large. Note that register #0 is reserved as a flag value.</p>
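
<p>The numbering scheme can be sketched as a pair of predicates. The boundary
value used here is an assumption chosen to match the <tt>%reg1024</tt> style
names that appear in this document's examples; the constant and function names
are hypothetical.</p>

```cpp
#include <cassert>

// Register #0 is a flag value meaning "no register".
const unsigned NoRegisterSketch = 0;

// Assumed boundary for illustration: numbers below it are physical
// registers, numbers at or above it are virtual registers.
const unsigned FirstVirtualRegisterSketch = 1024;

inline bool isPhysicalRegisterSketch(unsigned Reg) {
  assert(Reg != NoRegisterSketch && "register #0 is not a real register");
  return Reg < FirstVirtualRegisterSketch;
}

inline bool isVirtualRegisterSketch(unsigned Reg) {
  assert(Reg != NoRegisterSketch && "register #0 is not a real register");
  return Reg >= FirstVirtualRegisterSketch;
}
```

<p>Keeping both kinds of register in one integer space lets a machine operand
hold either kind without a separate tag.</p>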
				405
				406	<p>Each register in the processor description has an associated
				407	<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
				408	register (used for assembly output and debugging dumps) and a set of aliases
				409	(used to indicate whether one register overlaps with another).
				410	</p>
				411
<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
class exposes a set of processor specific register classes (instances of the
<tt>TargetRegisterClass</tt> class). Each register class contains sets of
registers that have the same properties (for example, they are all 32-bit
integer registers). Each SSA virtual register created by the instruction
selector has an associated register class. When the register allocator runs, it
replaces each virtual register with a physical register from its class.</p>

<p>
The target-specific implementations of these classes are auto-generated from a <a
href="TableGenFundamentals.html">TableGen</a> description of the register file.
</p>
				424
				425	</div>
				426
				427	<!-- ======================================================================= -->
				428	<div class="doc_subsection">
				429	<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
				430	</div>
				431
				432	<div class="doc_text">
<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
instructions supported by the target. It is essentially an array of
<tt>TargetInstrDescriptor</tt> objects, each of which describes one
instruction the target supports. Descriptors define things like the mnemonic
for the opcode, the number of operands, the list of implicit register uses
and defs, whether the instruction has certain target-independent properties
(accesses memory, is commutable, etc.), and any target-specific
flags.</p>
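
<p>The "array of descriptors" idea can be sketched as a table indexed by opcode.
The three toy opcodes, the flag names and the descriptor fields below are
hypothetical and far smaller than a real target's table; they only show how a
client would look up a property of an instruction from its opcode number.</p>

```cpp
#include <cassert>

// Hypothetical opcode numbers for a three-instruction toy target.
enum OpcodeSketch { ADD_SK, LOAD_SK, RET_SK, NUM_OPCODES_SK };

// Hypothetical target-independent property bits.
enum FlagSketch : unsigned {
  MayLoadSketch = 1u << 0,
  CommutableSketch = 1u << 1,
  TerminatorSketch = 1u << 2
};

struct InstrDescSketch {
  const char *Name;     // mnemonic
  unsigned NumOperands; // explicit operands
  unsigned Flags;       // bitwise OR of FlagSketch values
};

// One descriptor per opcode, in opcode order.
static const InstrDescSketch DescTableSketch[NUM_OPCODES_SK] = {
    {"add", 3, CommutableSketch},
    {"load", 2, MayLoadSketch},
    {"ret", 0, TerminatorSketch}};

inline const InstrDescSketch &getDescSketch(unsigned Opcode) {
  assert(Opcode < NUM_OPCODES_SK && "opcode out of range");
  return DescTableSketch[Opcode];
}

inline bool isCommutableSketch(unsigned Opcode) {
  return (getDescSketch(Opcode).Flags & CommutableSketch) != 0;
}
```

<p>Because the table is plain data, it is a natural candidate for generation
from a target description file rather than being written by hand.</p>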
				441	</div>
				442
				443	<!-- ======================================================================= -->
				444	<div class="doc_subsection">
				445	<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
				446	</div>
				447
				448	<div class="doc_text">
				449	<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
				450	stack frame layout of the target. It holds the direction of stack growth,
				451	the known stack alignment on entry to each function, and the offset to the
				452	local area. The offset to the local area is the offset from the stack
				453	pointer on function entry to the first location where function data (local
				454	variables, spill locations) can be stored.</p>
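
<p>As a rough illustration of how these three pieces of information combine,
the sketch below computes where equally sized local slots would sit relative to
the stack pointer at function entry. It is a simplified model under stated
assumptions: all names are hypothetical, every slot has the same size, and
alignment padding between slots is ignored.</p>

```cpp
#include <cassert>

enum class StackDirectionSketch { GrowsUp, GrowsDown };

// Hypothetical container for the three facts named in the text.
struct FrameInfoSketch {
  StackDirectionSketch Direction;
  unsigned StackAlignment; // known alignment on function entry, in bytes
  int LocalAreaOffset;     // offset from entry SP to the local area
};

// Offset, relative to the stack pointer on entry, of the Index-th local
// slot when every slot is SlotSize bytes.
inline int slotOffsetSketch(const FrameInfoSketch &FI, int Index,
                            int SlotSize) {
  assert(Index >= 0 && SlotSize > 0 && "invalid slot request");
  if (FI.Direction == StackDirectionSketch::GrowsDown)
    // The slot occupies the SlotSize bytes below the space already used.
    return FI.LocalAreaOffset - (Index + 1) * SlotSize;
  return FI.LocalAreaOffset + Index * SlotSize;
}
```

<p>With a downward-growing stack and a local area offset of -4 (an assumed
value, as if a 4-byte return address sat at the entry stack pointer), the first
4-byte slot is at offset -8 and the second at -12.</p>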
				455	</div>
				456
				457	<!-- ======================================================================= -->
				458	<div class="doc_subsection">
				459	<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
				460	</div>
				461
				462	<div class="doc_text">
				463	<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
				464	specific chip set being targeted. A sub-target informs code generation of
				465	which instructions are supported, instruction latencies and instruction
				466	execution itinerary; i.e., which processing units are used, in what order, and
				467	for how long.</p>
				468	</div>
				469
				470
				471	<!-- ======================================================================= -->
				472	<div class="doc_subsection">
				473	<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
				474	</div>
				475
				476	<div class="doc_text">
				477	<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
				478	Just-In-Time code generator to perform target-specific activities, such as
				479	emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
				480	should provide one of these objects through the <tt>getJITInfo</tt>
				481	method.</p>
				482	</div>
				483
				484	<!-- *********************************************************************** -->
				485	<div class="doc_section">
				486	<a name="codegendesc">Machine code description classes</a>
				487	</div>
				488	<!-- *********************************************************************** -->
				489
				490	<div class="doc_text">
				491
<p>At a high level, LLVM code is translated to a machine specific
representation formed out of
<a href="#machinefunction"><tt>MachineFunction</tt></a>,
<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>, and <a
href="#machineinstr"><tt>MachineInstr</tt></a> instances
(defined in <tt>include/llvm/CodeGen</tt>). This representation is completely
target agnostic, representing instructions in their most abstract form: an
opcode and a series of operands. This representation is designed to support
both an SSA representation for machine code and a register-allocated,
non-SSA form.</p>
				502
				503	</div>
				504
				505	<!-- ======================================================================= -->
				506	<div class="doc_subsection">
				507	<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
				508	</div>
				509
				510	<div class="doc_text">
				511
				512	<p>Target machine instructions are represented as instances of the
				513	<tt>MachineInstr</tt> class. This class is an extremely abstract way of
				514	representing machine instructions. In particular, it only keeps track of
				515	an opcode number and a set of operands.</p>
				516
				517	<p>The opcode number is a simple unsigned integer that only has meaning to a
				518	specific backend. All of the instructions for a target should be defined in
				519	the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values
				520	are auto-generated from this description. The <tt>MachineInstr</tt> class does
				521	not have any information about how to interpret the instruction (i.e., what the
				522	semantics of the instruction are); for that you must refer to the
				523	<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
				524
				525	<p>The operands of a machine instruction can be of several different types:
				526	a register reference, a constant integer, a basic block reference, etc. In
				527	addition, a machine operand should be marked as a def or a use of the value
				528	(though only registers are allowed to be defs).</p>
				529
<p>By convention, the LLVM code generator orders instruction operands so that
all register definitions come before the register uses, even on architectures
that are normally printed in other orders. For example, the SPARC add
instruction "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
and stores the result into the "%i3" register. In the LLVM code generator,
the operands should be stored as "<tt>%i3, %i1, %i2</tt>", with the destination
first.</p>
				537
				538	<p>Keeping destination (definition) operands at the beginning of the operand
				539	list has several advantages. In particular, the debugging printer will print
				540	the instruction like this:</p>
				541
				542	<div class="doc_code">
				543	<pre>
%i3 = add %i1, %i2
				545	</pre>
				546	</div>
				547
<p>Also, if the first operand is a def, it is easier to <a
href="#buildmi">create instructions</a> whose only def is the first
operand.</p>
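
<p>The defs-first convention can be sketched with a toy instruction type and
printer. This is not the <tt>MachineInstr</tt> class: the struct names are
hypothetical, registers are plain strings, and a mnemonic string stands in for
the opcode number purely to keep the example short.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical register operand: a name plus a def/use marker.
struct OperandSketch {
  std::string Reg;
  bool IsDef;
};

// Hypothetical instruction: a mnemonic and an operand list, defs first.
struct InstrSketch {
  std::string Mnemonic;
  std::vector<OperandSketch> Operands;
};

// Print leading defs to the left of '=', then the mnemonic, then the uses.
inline std::string printInstrSketch(const InstrSketch &MI) {
  std::string Out;
  std::size_t I = 0;
  for (; I < MI.Operands.size() && MI.Operands[I].IsDef; ++I) {
    if (I != 0)
      Out += ", ";
    Out += MI.Operands[I].Reg;
  }
  if (I != 0)
    Out += " = ";
  Out += MI.Mnemonic;
  for (std::size_t First = I; I < MI.Operands.size(); ++I) {
    assert(!MI.Operands[I].IsDef && "defs must precede uses in this sketch");
    Out += (I == First ? " " : ", ");
    Out += MI.Operands[I].Reg;
  }
  return Out;
}
```

<p>Because the defs form a prefix of the operand list, the printer needs only a
single forward scan to split the instruction into its result and its
inputs.</p>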
				551
				552	</div>
				553
				554	<!-- _______________________________________________________________________ -->
				555	<div class="doc_subsubsection">
				556	<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
				557	</div>
				558
				559	<div class="doc_text">
				560
<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
<tt>BuildMI</tt> functions make it easy to build arbitrary machine
instructions. Usage of the <tt>BuildMI</tt> functions looks like this:</p>
				565
				566	<div class="doc_code">
				567	<pre>
				568	// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
				569	// instruction. The '1' specifies how many operands will be added.
				570	MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
				571
				572	// Create the same instr, but insert it at the end of a basic block.
				573	MachineBasicBlock &MBB = ...
				574	BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
				575
				576	// Create the same instr, but insert it before a specified iterator point.
				577	MachineBasicBlock::iterator MBBI = ...
				578	BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
				579
				580	// Create a 'cmp Reg, 0' instruction, no destination reg.
				581	MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
				582	// Create an 'sahf' instruction which takes no operands and stores nothing.
				583	MI = BuildMI(X86::SAHF, 0);
				584
				585	// Create a self looping branch instruction.
				586	BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
				587	</pre>
				588	</div>
				589
<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
have to specify the number of operands that the machine instruction will take.
This allows for efficient memory allocation. Also note that operands are uses
of values by default, not definitions. If you need to add a
definition operand (other than the optional destination register), you must
explicitly mark it as such:</p>
				596
				597	<div class="doc_code">
				598	<pre>
				599	MI.addReg(Reg, MachineOperand::Def);
				600	</pre>
				601	</div>
				602
				603	</div>
				604
				605	<!-- _______________________________________________________________________ -->
				606	<div class="doc_subsubsection">
				607	<a name="fixedregs">Fixed (preassigned) registers</a>
				608	</div>
				609
				610	<div class="doc_text">
				611
				612	<p>One important issue that the code generator needs to be aware of is the
				613	presence of fixed registers. In particular, there are often places in the
				614	instruction stream where the register allocator <em>must</em> arrange for a
				615	particular value to be in a particular register. This can occur due to
				616	limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
				617	with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like calling
				618	conventions. In any case, the instruction selector should emit code that
				619	copies a virtual register into or out of a physical register when needed.</p>
				620
				621	<p>For example, consider this simple LLVM example:</p>
				622
				623	<div class="doc_code">
				624	<pre>
define i32 @test(i32 %X, i32 %Y) {
  %Z = sdiv i32 %X, %Y
  ret i32 %Z
}
				629	</pre>
				630	</div>
				631
				632	<p>The X86 instruction selector produces this machine code for the <tt>div</tt>
				633	and <tt>ret</tt> (use
				634	"<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to get this):</p>
				635
				636	<div class="doc_code">
				637	<pre>
				638	;; Start of div
				639	%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
				640	%reg1027 = sar %reg1024, 31
				641	%EDX = mov %reg1027 ;; Sign extend X into EDX
				642	idiv %reg1025 ;; Divide by Y (in reg1025)
				643	%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
				644
				645	;; Start of ret
				646	%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
				647	ret
				648	</pre>
				649	</div>
				650
				651	<p>By the end of code generation, the register allocator has coalesced
the registers and deleted the resultant identity moves, producing the
				653	following code:</p>
				654
				655	<div class="doc_code">
				656	<pre>
				657	;; X is in EAX, Y is in ECX
				658	mov %EAX, %EDX
				659	sar %EDX, 31
				660	idiv %ECX
				661	ret
				662	</pre>
				663	</div>
				664
				665	<p>This approach is extremely general (if it can handle the X86 architecture,
				666	it can handle anything!) and allows all of the target specific
				667	knowledge about the instruction stream to be isolated in the instruction
				668	selector. Note that physical registers should have a short lifetime for good
				669	code generation, and all physical registers are assumed dead on entry to and
				670	exit from basic blocks (before register allocation). Thus, if you need a value
				671	to be live across basic block boundaries, it <em>must</em> live in a virtual
				672	register.</p>
				673
				674	</div>
				675
				676	<!-- _______________________________________________________________________ -->
				677	<div class="doc_subsubsection">
				678	<a name="ssa">Machine code in SSA form</a>
				679	</div>
				680
				681	<div class="doc_text">
				682
<p><tt>MachineInstr</tt>s are initially selected in SSA form, and
are maintained in SSA form until register allocation happens. For the most
part, this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
become machine code PHI nodes, and virtual registers are only allowed to have a
single definition.</p>
				688
				689	<p>After register allocation, machine code is no longer in SSA-form because there
				690	are no virtual registers left in the code.</p>
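
<p>The single-definition rule can be stated as a small check. The sketch below
is a simplified, self-contained model rather than LLVM code: <tt>ToyInstr</tt>
and <tt>isSSA</tt> are hypothetical names introduced only for illustration, and
each toy instruction defines at most one virtual register.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: at most one defined virtual register
// (0 means "defines nothing") plus the virtual registers it reads.
struct ToyInstr {
  unsigned Def;
  std::vector<unsigned> Uses;
};

// Returns true if no virtual register is defined more than once, which is
// the property machine code keeps until register allocation.
bool isSSA(const std::vector<ToyInstr> &Code) {
  std::set<unsigned> Defined;
  for (const ToyInstr &I : Code) {
    if (I.Def == 0)
      continue; // instruction has no destination register
    if (!Defined.insert(I.Def).second)
      return false; // this register already had a definition
  }
  return true;
}
```

<p>Once virtual registers have been rewritten to physical registers, the same
register is routinely redefined, so a check of this kind no longer applies.</p>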
				691
				692	</div>
				693
				694	<!-- ======================================================================= -->
				695	<div class="doc_subsection">
				696	<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
				697	</div>
				698
				699	<div class="doc_text">
				700
				701	<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
				702	(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
				703	corresponds to the LLVM code input to the instruction selector, but there can be
				704	a one-to-many mapping (i.e. one LLVM basic block can map to multiple machine
				705	basic blocks). The <tt>MachineBasicBlock</tt> class has a
				706	"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
				707	comes from.</p>
				708
				709	</div>
				710
				711	<!-- ======================================================================= -->
				712	<div class="doc_subsection">
				713	<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
				714	</div>
				715
				716	<div class="doc_text">
				717
				718	<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
				719	(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
				720	corresponds one-to-one with the LLVM function input to the instruction selector.
In addition to a list of basic blocks, the <tt>MachineFunction</tt> contains
a <tt>MachineConstantPool</tt>, a <tt>MachineFrameInfo</tt>, a
Chris Lattner	b70e151	2007-12-31 04:16:08 +0000	[diff] [blame]	723	<tt>MachineFunctionInfo</tt>, and a <tt>MachineRegisterInfo</tt>. See
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	724	<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
				725
				726	</div>
				727
				728	<!-- *********************************************************************** -->
				729	<div class="doc_section">
				730	<a name="codegenalgs">Target-independent code generation algorithms</a>
				731	</div>
				732	<!-- *********************************************************************** -->
				733
				734	<div class="doc_text">
				735
				736	<p>This section documents the phases described in the <a
				737	href="#high-level-design">high-level design of the code generator</a>. It
				738	explains how they work and some of the rationale behind their design.</p>
				739
				740	</div>
				741
				742	<!-- ======================================================================= -->
				743	<div class="doc_subsection">
				744	<a name="instselect">Instruction Selection</a>
				745	</div>
				746
				747	<div class="doc_text">
				748	<p>
				749	Instruction Selection is the process of translating LLVM code presented to the
				750	code generator into target-specific machine instructions. There are several
Evan Cheng	bd8c49c	2007-10-08 17:54:24 +0000	[diff] [blame]	751	well-known ways to do this in the literature. LLVM uses a SelectionDAG based
				752	instruction selector.
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	753	</p>
				754
				755	<p>Portions of the DAG instruction selector are generated from the target
				756	description (<tt>*.td</tt>) files. Our goal is for the entire instruction
Dan Gohman	5ab9826	2007-12-13 20:43:47 +0000	[diff] [blame]	757	selector to be generated from these <tt>.td</tt> files, though currently
				758	there are still things that require custom C++ code.</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	759	</div>
				760
				761	<!-- _______________________________________________________________________ -->
				762	<div class="doc_subsubsection">
				763	<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
				764	</div>
				765
				766	<div class="doc_text">
				767
				768	<p>The SelectionDAG provides an abstraction for code representation in a way
				769	that is amenable to instruction selection using automatic techniques
				770	(e.g. dynamic-programming based optimal pattern matching selectors). It is also
				771	well-suited to other phases of code generation; in particular,
instruction scheduling (SelectionDAGs are very close to scheduling DAGs
				773	post-selection). Additionally, the SelectionDAG provides a host representation
				774	where a large variety of very-low-level (but target-independent)
				775	<a href="#selectiondag_optimize">optimizations</a> may be
				776	performed; ones which require extensive information about the instructions
				777	efficiently supported by the target.</p>
				778
				779	<p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
				780	<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
				781	operation code (Opcode) that indicates what operation the node performs and
				782	the operands to the operation.
				783	The various operation node types are described at the top of the
				784	<tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt> file.</p>
				785
				786	<p>Although most operations define a single value, each node in the graph may
				787	define multiple values. For example, a combined div/rem operation will define
both the quotient and the remainder. Many other situations require multiple
				789	values as well. Each node also has some number of operands, which are edges
				790	to the node defining the used value. Because nodes may define multiple values,
Dan Gohman	8181bd1	2008-07-27 21:46:04 +0000	[diff] [blame]	791	edges are represented by instances of the <tt>SDValue</tt> class, which is
a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node and result
				793	value being used, respectively. Each value produced by an <tt>SDNode</tt> has
Duncan Sands	92c4391	2008-06-06 12:08:01 +0000	[diff] [blame]	794	an associated <tt>MVT</tt> (Machine Value Type) indicating what the type of the
				795	value is.</p>
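
<p>To make the node/result pairing concrete, here is a minimal stand-alone
model. It is only a sketch: <tt>ToyNode</tt>, <tt>ToyValue</tt> and
<tt>typeOf</tt> are hypothetical names, and value types are plain strings
instead of <tt>MVT</tt>s.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyNode;

// An edge in the graph: which node is used, and which of its results.
struct ToyValue {
  const ToyNode *Node;
  unsigned ResNo;
};

// A node has an opcode, operand edges, and one type per value it produces.
struct ToyNode {
  std::string Opcode;
  std::vector<ToyValue> Operands;
  std::vector<std::string> ResultTypes;
};

// Looks up the type of the particular result an edge refers to.
const std::string &typeOf(ToyValue V) {
  assert(V.ResNo < V.Node->ResultTypes.size() && "no such result");
  return V.Node->ResultTypes[V.ResNo];
}
```

<p>With this layout, a user of a combined div/rem node refers to result 0 for
the quotient and result 1 for the remainder, while both edges point at the same
node.</p>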
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	796
				797	<p>SelectionDAGs contain two different kinds of values: those that represent
				798	data flow and those that represent control flow dependencies. Data values are
				799	simple edges with an integer or floating point value type. Control edges are
				800	represented as "chain" edges which are of type <tt>MVT::Other</tt>. These edges
				801	provide an ordering between nodes that have side effects (such as
				802	loads, stores, calls, returns, etc). All nodes that have side effects should
				803	take a token chain as input and produce a new one as output. By convention,
				804	token chain inputs are always operand #0, and chain results are always the last
				805	value produced by an operation.</p>
				806
				807	<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
				808	always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root node
				809	is the final side-effecting node in the token chain. For example, in a single
				810	basic block function it would be the return node.</p>
				811
				812	<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
				813	"illegal" DAG. A legal DAG for a target is one that only uses supported
				814	operations and supported types. On a 32-bit PowerPC, for example, a DAG with
a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses an
SREM or UREM operation. The
<a href="#selectiondag_legalize_types">legalize types</a> and
				818	<a href="#selectiondag_legalize">legalize operations</a> phases are
				819	responsible for turning an illegal DAG into a legal DAG.</p>
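
<p>The legality test itself is simple to express. The following sketch assumes
a toy target description with two sets, one of supported types and one of
supported operations; <tt>ToyOp</tt>, <tt>ToyTarget</tt> and
<tt>isLegalDAG</tt> are hypothetical names, and the sets used with it are
illustrative rather than taken from any real backend.</p>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One DAG node reduced to what matters for legality: its operation and
// the type of the value it produces.
struct ToyOp {
  std::string Opcode;
  std::string Type;
};

// Toy target description: the types and operations it supports natively.
struct ToyTarget {
  std::set<std::string> LegalTypes;
  std::set<std::string> LegalOps;
};

// A DAG is legal only if every node uses a supported type and operation.
bool isLegalDAG(const std::vector<ToyOp> &Nodes, const ToyTarget &T) {
  for (const ToyOp &N : Nodes)
    if (!T.LegalTypes.count(N.Type) || !T.LegalOps.count(N.Opcode))
      return false;
  return true;
}
```

<p>For a target that lists <tt>i32</tt> but not <tt>i64</tt>, and
<tt>sdiv</tt> but not <tt>srem</tt>, a DAG containing either an <tt>i64</tt>
value or an <tt>srem</tt> node is reported as illegal.</p>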
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	820
				821	</div>
				822
				823	<!-- _______________________________________________________________________ -->
				824	<div class="doc_subsubsection">
				825	<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
				826	</div>
				827
				828	<div class="doc_text">
				829
				830	<p>SelectionDAG-based instruction selection consists of the following steps:</p>
				831
				832	<ol>
				833	<li><a href="#selectiondag_build">Build initial DAG</a> - This stage
				834	performs a simple translation from the input LLVM code to an illegal
				835	SelectionDAG.</li>
				836	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> - This stage
				837	performs simple optimizations on the SelectionDAG to simplify it, and
				838	recognize meta instructions (like rotates and <tt>div</tt>/<tt>rem</tt>
				839	pairs) for targets that support these meta operations. This makes the
				840	resultant code more efficient and the <a href="#selectiondag_select">select
				841	instructions from DAG</a> phase (below) simpler.</li>
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	842	<li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a> - This
				843	stage transforms SelectionDAG nodes to eliminate any types that are
				844	unsupported on the target.</li>
				845	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> - The
				846	SelectionDAG optimizer is run to clean up redundancies exposed
				847	by type legalization.</li>
<li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> - This
stage transforms SelectionDAG nodes to eliminate any operations that are
unsupported on the target.</li>
				851	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> - The
				852	SelectionDAG optimizer is run to eliminate inefficiencies introduced
				853	by operation legalization.</li>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	854	<li><a href="#selectiondag_select">Select instructions from DAG</a> - Finally,
				855	the target instruction selector matches the DAG operations to target
				856	instructions. This process translates the target-independent input DAG into
				857	another DAG of target instructions.</li>
				858	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
				859	- The last phase assigns a linear order to the instructions in the
				860	target-instruction DAG and emits them into the MachineFunction being
				861	compiled. This step uses traditional prepass scheduling techniques.</li>
				862	</ol>
				863
				864	<p>After all of these steps are complete, the SelectionDAG is destroyed and the
				865	rest of the code generation passes are run.</p>
				866
				867	<p>One great way to visualize what is going on here is to take advantage of a
Dan Gohman	d8d7107	2008-09-10 22:23:41 +0000	[diff] [blame]	868	few LLC command line options. The following options pop up a window displaying
				869	the SelectionDAG at specific times (if you only get errors printed to the console
				870	while using this, you probably
				871	<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a> to
				872	add support for it).</p>
				873
				874	<ul>
				875	<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built, before
				876	the first optimization pass.</li>
				877	<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
				878	<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
				879	optimization pass.</li>
				880	<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
				881	<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
				882	</ul>
				883
<p>The <tt>-view-sunit-dags</tt> option displays the Scheduler's dependency graph.
				885	This graph is based on the final SelectionDAG, with nodes that must be
				886	scheduled together bundled into a single scheduling-unit node, and with
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	887	immediate operands and other nodes that aren't relevant for scheduling
Dan Gohman	d8d7107	2008-09-10 22:23:41 +0000	[diff] [blame]	888	omitted.
Dan Gohman	3e80ef8	2007-10-15 21:07:59 +0000	[diff] [blame]	889	</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	890
				891	</div>
				892
				893	<!-- _______________________________________________________________________ -->
				894	<div class="doc_subsubsection">
				895	<a name="selectiondag_build">Initial SelectionDAG Construction</a>
				896	</div>
				897
				898	<div class="doc_text">
pass is to expose as many low-level, target-specific details to the SelectionDAG
				900	<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
				901	input by the <tt>SelectionDAGLowering</tt> class in the
				902	<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of this
				903	pass is to expose as much low-level, target-specific details to the SelectionDAG
				904	as possible. This pass is mostly hard-coded (e.g. an LLVM <tt>add</tt> turns
Dan Gohman	6985124	2008-10-03 00:07:11 +0000	[diff] [blame]	905	into an <tt>SDNode add</tt> while a <tt>getelementptr</tt> is expanded into the
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	906	obvious arithmetic). This pass requires target-specific hooks to lower calls,
				907	returns, varargs, etc. For these features, the
				908	<tt><a href="#targetlowering">TargetLowering</a></tt> interface is used.</p>
				909
				910	</div>
				911
				912	<!-- _______________________________________________________________________ -->
				913	<div class="doc_subsubsection">
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	914	<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
				915	</div>
				916
				917	<div class="doc_text">
				918
<p>The Legalize Types phase is in charge of converting a DAG to only use the types
that are natively supported by the target.</p>
				921
				922	<p>There are two main ways of converting values of unsupported scalar types
				923	to values of supported types: converting small types to
				924	larger types ("promoting"), and breaking up large integer types
				925	into smaller ones ("expanding"). For example, a target might require
				926	that all f32 values are promoted to f64 and that all i1/i8/i16 values
				927	are promoted to i32. The same target might require that all i64 values
				928	be expanded into pairs of i32 values. These changes can insert sign and
				929	zero extensions as needed to make sure that the final code has the same
				930	behavior as the input.</p>
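
<p>As a worked illustration of expansion and promotion, the sketch below
performs a 64-bit add using only 32-bit operations, and promotes 8-bit values
to 32 bits with the appropriate extension. It models the arithmetic only; the
names <tt>Parts</tt>, <tt>split</tt>, <tt>join</tt>, <tt>addExpanded</tt>,
<tt>promoteSigned</tt> and <tt>promoteUnsigned</tt> are hypothetical and do not
correspond to legalizer functions.</p>

```cpp
#include <cassert>
#include <cstdint>

// An i64 value expanded into a pair of i32 halves.
struct Parts {
  uint32_t Lo;
  uint32_t Hi;
};

Parts split(uint64_t V) {
  return Parts{static_cast<uint32_t>(V), static_cast<uint32_t>(V >> 32)};
}

uint64_t join(Parts P) {
  return (static_cast<uint64_t>(P.Hi) << 32) | P.Lo;
}

// "Expanding": an i64 add becomes two i32 adds plus a carry.
Parts addExpanded(Parts A, Parts B) {
  uint32_t Lo = A.Lo + B.Lo;             // wraps modulo 2^32
  uint32_t Carry = (Lo < A.Lo) ? 1u : 0u; // wrapped iff the sum overflowed
  uint32_t Hi = A.Hi + B.Hi + Carry;
  return Parts{Lo, Hi};
}

// "Promoting": an i8 becomes an i32; signedness decides the extension.
int32_t promoteSigned(int8_t V) { return static_cast<int32_t>(V); }      // sign extend
uint32_t promoteUnsigned(uint8_t V) { return static_cast<uint32_t>(V); } // zero extend
```

<p>The carry computation is the part that keeps the expanded code equivalent to
the original 64-bit operation, in the same way that choosing sign or zero
extension keeps promoted code equivalent.</p>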
				931
				932	<p>There are two main ways of converting values of unsupported vector types
to values of supported types: splitting vector types, multiple times if
				934	necessary, until a legal type is found, and extending vector types by
				935	adding elements to the end to round them out to legal types ("widening").
				936	If a vector gets split all the way down to single-element parts with
				937	no supported vector type being found, the elements are converted to
				938	scalars ("scalarizing").</p>
				939
				940	<p>A target implementation tells the legalizer which types are supported
				941	(and which register class to use for them) by calling the
				942	<tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
				943
				944	</div>
				945
				946	<!-- _______________________________________________________________________ -->
				947	<div class="doc_subsubsection">
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	948	<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
				949	</div>
				950
				951	<div class="doc_text">
				952
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	953	<p>The Legalize phase is in charge of converting a DAG to only use the
				954	operations that are natively supported by the target.</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	955
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	956	<p>Targets often have weird constraints, such as not supporting every
				957	operation on every supported datatype (e.g. X86 does not support byte
				958	conditional moves and PowerPC does not support sign-extending loads from
				959	a 16-bit memory location). Legalize takes care of this by open-coding
				960	another sequence of operations to emulate the operation ("expansion"), by
				961	promoting one type to a larger type that supports the operation
				962	("promotion"), or by using a target-specific hook to implement the
				963	legalization ("custom").</p>
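
<p>As an example of what "expansion" means in practice, a signed remainder can
be open-coded with a divide, a multiply and a subtract on a target that lacks a
remainder instruction. The function below is a sketch of that identity in plain
C++, not the legalizer's own code; <tt>expandSRem</tt> is a hypothetical
name.</p>

```cpp
#include <cassert>
#include <cstdint>

// Signed remainder expressed with operations the target does support:
//   X srem Y  ==  X - (X sdiv Y) * Y
// Assumes Y != 0 and that X / Y does not overflow (X == INT32_MIN with
// Y == -1 is excluded), the same cases in which the remainder is undefined.
int32_t expandSRem(int32_t X, int32_t Y) {
  assert(Y != 0 && "remainder by zero is undefined");
  return X - (X / Y) * Y;
}
```

<p>Because the division truncates toward zero, the result has the sign of the
dividend, matching the behavior of a native signed remainder.</p>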
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	964
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	965	<p>A target implementation tells the legalizer which operations are not
				966	supported (and which of the above three actions to take) by calling the
				967	<tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
				968	constructor.</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	969
<p>Prior to the existence of the Legalize passes, we required that every target
<a href="#selectiondag_select">selector</a> support and handle every
operator and type even if they were not natively supported. The introduction of
the Legalize phases allows all of the canonicalization patterns to be shared
across targets, and makes it very easy to optimize the canonicalized code
because it is still in the form of a DAG.</p>
				976
				977	</div>
				978
				979	<!-- _______________________________________________________________________ -->
				980	<div class="doc_subsubsection">
				981	<a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
				982	Combiner</a>
				983	</div>
				984
				985	<div class="doc_text">
				986
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	987	<p>The SelectionDAG optimization phase is run multiple times for code generation,
				988	immediately after the DAG is built and once after each legalization. The first
				989	run of the pass allows the initial code to be cleaned up (e.g. performing
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	990	optimizations that depend on knowing that the operators have restricted type
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	991	inputs). Subsequent runs of the pass clean up the messy code generated by the
				992	Legalize passes, which allows Legalize to be very simple (it can focus on making
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	993	code legal instead of focusing on generating <em>good</em> and legal code).</p>
				994
				995	<p>One important class of optimizations performed is optimizing inserted sign
				996	and zero extension instructions. We currently use ad-hoc techniques, but could
				997	move to more rigorous techniques in the future. Here are some good papers on
				998	the subject:</p>
				999
				1000	<p>
				1001	"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
				1002	integer arithmetic</a>"<br>
				1003	Kevin Redwine and Norman Ramsey<br>
				1004	International Conference on Compiler Construction (CC) 2004
				1005	</p>
				1006
				1007
				1008	<p>
				1009	"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
				1010	sign extension elimination</a>"<br>
				1011	Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
				1012	Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
				1013	and Implementation.
				1014	</p>
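
<p>One simple instance of extension cleanup is that sign-extending an
already sign-extended value is redundant. The sketch below demonstrates the
identity that justifies such a fold; <tt>sextInReg8</tt> is a hypothetical
helper modelling an 8-bit sign extension inside a 32-bit register, and the
example does not claim to reproduce the combiner's actual rules.</p>

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 8 bits of X to 32 bits, ignoring the upper bits.
int32_t sextInReg8(int32_t X) {
  uint32_t Low = static_cast<uint32_t>(X) & 0xFFu;
  // Flipping bit 7 and subtracting 128 replicates bit 7 into the upper bits.
  return static_cast<int32_t>(Low ^ 0x80u) - 0x80;
}

// The pattern a combiner could simplify: the outer extension changes nothing,
// so (sext8 (sext8 X)) can be replaced by (sext8 X).
int32_t doubleExtend(int32_t X) { return sextInReg8(sextInReg8(X)); }
```

<p>Since <tt>doubleExtend(X)</tt> always equals <tt>sextInReg8(X)</tt>, the
second extension can be removed without changing the result.</p>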
				1015
				1016	</div>
				1017
				1018	<!-- _______________________________________________________________________ -->
				1019	<div class="doc_subsubsection">
				1020	<a name="selectiondag_select">SelectionDAG Select Phase</a>
				1021	</div>
				1022
				1023	<div class="doc_text">
				1024
				1025	<p>The Select phase is the bulk of the target-specific code for instruction
				1026	selection. This phase takes a legal SelectionDAG as input, pattern matches the
				1027	instructions supported by the target to this DAG, and produces a new DAG of
				1028	target code. For example, consider the following LLVM fragment:</p>
				1029
				1030	<div class="doc_code">
				1031	<pre>
				1032	%t1 = add float %W, %X
				1033	%t2 = mul float %t1, %Y
				1034	%t3 = add float %t2, %Z
				1035	</pre>
				1036	</div>
				1037
				1038	<p>This LLVM code corresponds to a SelectionDAG that looks basically like
				1039	this:</p>
				1040
				1041	<div class="doc_code">
				1042	<pre>
				1043	(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
				1044	</pre>
				1045	</div>
				1046
				1047	<p>If a target supports floating point multiply-and-add (FMA) operations, one
				1048	of the adds can be merged with the multiply. On the PowerPC, for example, the
				1049	output of the instruction selector might look like this DAG:</p>
				1050
				1051	<div class="doc_code">
				1052	<pre>
				1053	(FMADDS (FADDS W, X), Y, Z)
				1054	</pre>
				1055	</div>
				1056
				1057	<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
				1058	first two operands and adds the third (as single-precision floating-point
				1059	numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
				1060	add instruction. To perform this pattern match, the PowerPC backend includes
				1061	the following instruction definitions:</p>
				1062
				1063	<div class="doc_code">
				1064	<pre>
def FMADDS : AForm_1&lt;59, 29,
                    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
                    "fmadds $FRT, $FRA, $FRC, $FRB",
                    [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
                                           F4RC:$FRB))</b>]&gt;;
def FADDS : AForm_2&lt;59, 21,
                   (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
                   "fadds $FRT, $FRA, $FRB",
                   [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
				1074	</pre>
				1075	</div>
				1076
				1077	<p>The portion of the instruction definition in bold indicates the pattern used
				1078	to match the instruction. The DAG operators (like <tt>fmul</tt>/<tt>fadd</tt>)
				1079	are defined in the <tt>lib/Target/TargetSelectionDAG.td</tt> file.
Dan Gohman	33bb04f	2008-11-24 16:35:31 +0000	[diff] [blame^]	1080	"<tt>F4RC</tt>" is the register class of the input and result values.</p>
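
<p>To give a feel for what matching these two patterns involves, here is a
hand-written toy selector over expression trees. It is a sketch only: the
generated matcher is far more general, and <tt>Tree</tt>, <tt>make</tt>,
<tt>leaf</tt>, <tt>selectTree</tt> and <tt>toString</tt> are hypothetical names.
The toy tries both operand orders of <tt>fadd</tt>, and maps a lone
<tt>fmul</tt> to <tt>FMULS</tt> as an assumed single-precision multiply.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Tree;
using TreePtr = std::shared_ptr<Tree>;

// An expression tree node: an operator (or leaf name) and its operands.
struct Tree {
  std::string Op;
  std::vector<TreePtr> Kids; // empty for leaves
};

TreePtr make(const std::string &Op, std::vector<TreePtr> Kids) {
  return std::make_shared<Tree>(Tree{Op, std::move(Kids)});
}
TreePtr leaf(const std::string &Name) { return make(Name, {}); }

// Rewrites target-independent fadd/fmul nodes into toy target instructions.
TreePtr selectTree(const TreePtr &T) {
  if (T->Kids.empty())
    return T; // leaves stand for values that are already available
  if (T->Op == "fadd") {
    // Try (fadd (fmul A, C), B) and the commuted (fadd B, (fmul A, C)).
    for (int I = 0; I < 2; ++I) {
      const TreePtr &Mul = T->Kids[I];
      const TreePtr &Addend = T->Kids[1 - I];
      if (Mul->Op == "fmul")
        return make("FMADDS", {selectTree(Mul->Kids[0]),
                               selectTree(Mul->Kids[1]), selectTree(Addend)});
    }
    return make("FADDS", {selectTree(T->Kids[0]), selectTree(T->Kids[1])});
  }
  if (T->Op == "fmul")
    return make("FMULS", {selectTree(T->Kids[0]), selectTree(T->Kids[1])});
  return T; // anything else is left untouched in this toy
}

// Prints a tree in the parenthesized form used in this document.
std::string toString(const TreePtr &T) {
  if (T->Kids.empty())
    return T->Op;
  std::string S = "(" + T->Op + " ";
  for (std::size_t I = 0; I < T->Kids.size(); ++I) {
    if (I != 0)
      S += ", ";
    S += toString(T->Kids[I]);
  }
  return S + ")";
}
```

<p>Applied to <tt>(fadd (fmul (fadd W, X), Y), Z)</tt>, the toy selector folds
the outer add into the multiply and selects the inner add on its own.</p>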
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1081
				1082	<p>The TableGen DAG instruction selector generator reads the instruction
				1083	patterns in the <tt>.td</tt> file and automatically builds parts of the pattern
				1084	matching code for your target. It has the following strengths:</p>
				1085
				1086	<ul>
				1087	<li>At compiler-compiler time, it analyzes your instruction patterns and tells
				1088	you if your patterns make sense or not.</li>
				1089	<li>It can handle arbitrary constraints on operands for the pattern match. In
				1090	particular, it is straight-forward to say things like "match any immediate
				1091	that is a 13-bit sign-extended value". For examples, see the
				1092	<tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
				1093	backend.</li>
				1094	<li>It knows several important identities for the patterns defined. For
				1095	example, it knows that addition is commutative, so it allows the
				1096	<tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
				1097	well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
				1098	to specially handle this case.</li>
				1099	<li>It has a full-featured type-inferencing system. In particular, you should
				1100	rarely have to explicitly tell the system what type parts of your patterns
				1101	are. In the <tt>FMADDS</tt> case above, we didn't have to tell
				1102	<tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'. It
				1103	was able to infer and propagate this knowledge from the fact that
				1104	<tt>F4RC</tt> has type 'f32'.</li>
				1105	<li>Targets can define their own (and rely on built-in) "pattern fragments".
				1106	Pattern fragments are chunks of reusable patterns that get inlined into your
				1107	patterns during compiler-compiler time. For example, the integer
				1108	"<tt>(not x)</tt>" operation is actually defined as a pattern fragment that
				1109	expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not have a
				1110	native '<tt>not</tt>' operation. Targets can define their own short-hand
				1111	fragments as they see fit. See the definition of '<tt>not</tt>' and
				1112	'<tt>ineg</tt>' for examples.</li>
				1113	<li>In addition to instructions, targets can specify arbitrary patterns that
				1114	map to one or more instructions using the 'Pat' class. For example,
				1115	the PowerPC has no way to load an arbitrary integer immediate into a
				1116	register in one instruction. To tell tblgen how to do this, it defines:
				1117	<br>
				1118	<br>
				1119	<div class="doc_code">
				1120	<pre>
				1121	// Arbitrary immediate support. Implement in terms of LIS/ORI.
def : Pat&lt;(i32 imm:$imm),
          (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
				1124	</pre>
				1125	</div>
				1126	<br>
				1127	If none of the single-instruction patterns for loading an immediate into a
				1128	register match, this will be used. This rule says "match an arbitrary i32
				1129	immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and an
				1130	<tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to the
				1131	left 16 bits') instruction". To make this work, the
				1132	<tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate the
				1133	input immediate (in this case, take the high or low 16-bits of the
				1134	immediate).</li>
				1135	<li>While the system does automate a lot, it still allows you to write custom
				1136	C++ code to match special cases if there is something that is hard to
				1137	express.</li>
				1138	</ul>
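
<p>The arithmetic behind the <tt>LIS</tt>/<tt>ORI</tt> pattern in the list
above can be checked directly. In this sketch, <tt>hi16</tt> and <tt>lo16</tt>
model the node transformations, while <tt>lis</tt> and <tt>ori</tt> are toy
models of the two instructions operating on plain integers; all four, along
with <tt>materialize</tt>, are hypothetical helper names rather than backend
code.</p>

```cpp
#include <cassert>
#include <cstdint>

// Node transformations: pick the high or low 16 bits of the immediate.
uint32_t hi16(uint32_t Imm) { return Imm >> 16; }
uint32_t lo16(uint32_t Imm) { return Imm & 0xFFFFu; }

// Toy instruction models. LIS places a 16-bit immediate in the upper half
// of a register; ORI ors a zero-extended 16-bit immediate into a register.
uint32_t lis(uint32_t Imm16) { return Imm16 << 16; }
uint32_t ori(uint32_t Reg, uint32_t Imm16) { return Reg | Imm16; }

// The two-instruction sequence the pattern describes.
uint32_t materialize(uint32_t Imm) { return ori(lis(hi16(Imm)), lo16(Imm)); }
```

<p>Because the low half is combined with an or of a zero-extended immediate,
no adjustment of the high half is needed, and <tt>materialize</tt> reproduces
any 32-bit immediate exactly.</p>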
				1139
<p>While it has many strengths, the system currently has some limitations,
primarily because it is still a work in progress:</p>
				1142
				1143	<ul>
				1144	<li>Overall, there is no way to define or match SelectionDAG nodes that define
				1145	multiple values (e.g. <tt>ADD_PARTS</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
				1146	etc). This is the biggest reason that you currently still <em>have to</em>
				1147	write custom C++ code for your instruction selector.</li>
				1148	<li>There is no great way to support matching complex addressing modes yet. In
				1149	the future, we will extend pattern fragments to allow them to define
				1150	multiple values (e.g. the four operands of the <a href="#x86_memory">X86
Dan Gohman	5ab9826	2007-12-13 20:43:47 +0000	[diff] [blame]	1151	addressing mode</a>, which are currently matched with custom C++ code).
				1152	In addition, we'll extend fragments so that a
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1153	fragment can match multiple different patterns.</li>
				1154	<li>We don't automatically infer flags like isStore/isLoad yet.</li>
				1155	<li>We don't automatically generate the set of supported registers and
				1156	operations for the <a href="#selectiondag_legalize">Legalizer</a> yet.</li>
				1157	<li>We don't have a way of tying in custom legalized nodes yet.</li>
				1158	</ul>
				1159
				1160	<p>Despite these limitations, the instruction selector generator is still quite
				1161	useful for most of the binary and logical operations in typical instruction
				1162	sets. If you run into any problems or can't figure out how to do something,
				1163	please let Chris know!</p>
				1164
				1165	</div>
				1166
				1167	<!-- _______________________________________________________________________ -->
				1168	<div class="doc_subsubsection">
				1169	<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
				1170	</div>
				1171
				1172	<div class="doc_text">
				1173
				1174	<p>The scheduling phase takes the DAG of target instructions from the selection
phase and assigns an order. The scheduler can pick an order depending on
various constraints of the machine (e.g. to minimize register pressure or
to cover instruction latencies). Once an order is established, the DAG is
				1178	converted to a list of <tt><a href="#machineinstr">MachineInstr</a></tt>s and
				1179	the SelectionDAG is destroyed.</p>
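
<p>At its simplest, assigning an order means emitting every node after the
nodes it depends on. The sketch below does only that, using a plain topological
sort; it ignores register pressure and latencies, which real schedulers use to
choose among the valid orders. <tt>linearize</tt> is a hypothetical name, and
nodes are identified by their index.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Deps[I] lists the nodes that must be emitted before node I.
// Returns one valid emission order; the graph must be acyclic.
std::vector<std::size_t>
linearize(const std::vector<std::vector<std::size_t>> &Deps) {
  const std::size_t N = Deps.size();
  std::vector<std::size_t> Pending(N);          // unscheduled deps per node
  std::vector<std::vector<std::size_t>> Users(N); // reverse edges
  for (std::size_t I = 0; I < N; ++I) {
    Pending[I] = Deps[I].size();
    for (std::size_t D : Deps[I])
      Users[D].push_back(I);
  }

  std::queue<std::size_t> Ready;
  for (std::size_t I = 0; I < N; ++I)
    if (Pending[I] == 0)
      Ready.push(I);

  std::vector<std::size_t> Order;
  while (!Ready.empty()) {
    std::size_t Cur = Ready.front();
    Ready.pop();
    Order.push_back(Cur);
    for (std::size_t U : Users[Cur])
      if (--Pending[U] == 0)
        Ready.push(U); // all of U's dependencies have been emitted
  }
  assert(Order.size() == N && "dependency graph must be acyclic");
  return Order;
}
```

<p>A scheduler that cares about register pressure or latency would replace the
first-in, first-out ready queue with a priority queue ordered by its chosen
heuristic.</p>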
				1180
				1181	<p>Note that this phase is logically separate from the instruction selection
				1182	phase, but is tied to it closely in the code because it operates on
				1183	SelectionDAGs.</p>
				1184
				1185	</div>
				1186
				1187	<!-- _______________________________________________________________________ -->
				1188	<div class="doc_subsubsection">
				1189	<a name="selectiondag_future">Future directions for the SelectionDAG</a>
				1190	</div>
				1191
				1192	<div class="doc_text">
				1193
				1194	<ol>
				1195	<li>Optional function-at-a-time selection.</li>
				1196	<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
				1197	</ol>
				1198
				1199	</div>
				1200
				1201	<!-- ======================================================================= -->
				1202	<div class="doc_subsection">
				1203	<a name="ssamco">SSA-based Machine Code Optimizations</a>
				1204	</div>
				1205	<div class="doc_text"><p>To Be Written</p></div>
				1206
				1207	<!-- ======================================================================= -->
				1208	<div class="doc_subsection">
				1209	<a name="liveintervals">Live Intervals</a>
				1210	</div>
				1211
				1212	<div class="doc_text">
				1213
				1214	<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
				1215	They are used by some <a href="#regalloc">register allocator</a> passes to
				1216	determine if two or more virtual registers which require the same physical
				1217	register are live at the same point in the program (i.e., they conflict). When
				1218	this situation occurs, one virtual register must be <i>spilled</i>.</p>
				1219
				1220	</div>
				1221
				1222	<!-- _______________________________________________________________________ -->
				1223	<div class="doc_subsubsection">
				1224	<a name="livevariable_analysis">Live Variable Analysis</a>
				1225	</div>
				1226
				1227	<div class="doc_text">
				1228
				1229	<p>The first step in determining the live intervals of variables is to
				1230	calculate the set of registers that are immediately dead after the
				1231	instruction (i.e., the instruction calculates the value, but it is
				1232	never used) and the set of registers that are used by the instruction,
				1233	but are never used after the instruction (i.e., they are killed). Live
				1234	variable information is computed for each <i>virtual</i> register and
				1235	<i>register allocatable</i> physical register in the function. This
				1236	is done in a very efficient manner because it uses SSA to sparsely
				1237	compute lifetime information for virtual registers (which are in SSA
				1238	form) and only has to track physical registers within a block. Before
				1239	register allocation, LLVM can assume that physical registers are only
				1240	live within a single basic block. This allows it to do a single,
				1241	local analysis to resolve physical register lifetimes within each
				1242	basic block. If a physical register is not register allocatable (e.g.,
				1243	a stack pointer or condition codes), it is not tracked.</p>
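<p>As a rough illustration of the two sets described above, the following
self-contained C++ sketch computes, for a single basic block, the registers
killed by each instruction and the registers that are dead immediately after
it. It uses a plain backward scan rather than LLVM's sparse, SSA-based
algorithm, and the names <tt>SketchInstr</tt>, <tt>LiveFlags</tt> and
<tt>computeKillsAndDeads</tt> are hypothetical; they are not part of LLVM.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a machine instruction: only register numbers.
struct SketchInstr {
  std::vector<unsigned> Defs; // registers written by the instruction
  std::vector<unsigned> Uses; // registers read by the instruction
};

// Per-instruction result of the analysis.
struct LiveFlags {
  std::set<unsigned> Killed; // read here and never read afterwards
  std::set<unsigned> Dead;   // written here and never read afterwards
};

// Walk the block backwards, tracking which registers are still needed later.
// LiveOut holds the registers that are read after the end of the block.
std::vector<LiveFlags>
computeKillsAndDeads(const std::vector<SketchInstr> &Block,
                     const std::set<unsigned> &LiveOut) {
  std::vector<LiveFlags> Result(Block.size());
  std::set<unsigned> Live = LiveOut;
  std::set<unsigned> Defined;
  for (std::size_t I = Block.size(); I-- > 0;) {
    for (unsigned Reg : Block[I].Defs) {
      // Assumption: SSA form, so every register has exactly one definition.
      bool FirstDef = Defined.insert(Reg).second;
      assert(FirstDef && "register defined twice");
      (void)FirstDef;
      if (Live.erase(Reg) == 0)
        Result[I].Dead.insert(Reg); // no later instruction reads it
    }
    for (unsigned Reg : Block[I].Uses)
      if (Live.insert(Reg).second)
        Result[I].Killed.insert(Reg); // this instruction is the last reader
  }
  return Result;
}
```

<p>Passing the block's live-out registers as <tt>LiveOut</tt> keeps values
that are needed by later blocks from being reported as dead.</p>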
				1244
				1245	<p>Physical registers may be live in to or out of a function. Live in values
				1246	are typically arguments in registers. Live out values are typically return
				1247	values in registers. Live in values are marked as such, and are given a dummy
				1248	"defining" instruction during live intervals analysis. If the last basic block
				1249	of a function is a <tt>return</tt>, then it's marked as using all live out
				1250	values in the function.</p>
				1251
				1252	<p><tt>PHI</tt> nodes need to be handled specially, because the calculation
				1253	of the live variable information from a depth first traversal of the CFG of
				1254	the function won't guarantee that a virtual register used by the <tt>PHI</tt>
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	1255	node is defined before it's used. When a <tt>PHI</tt> node is encountered, only
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1256	the definition is handled, because the uses will be handled in other basic
				1257	blocks.</p>
				1258
				1259	<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
				1260	assignment at the end of the current basic block and traverse the successor
				1261	basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
				1262	the <tt>PHI</tt> node's operands is coming from the current basic block,
				1263	then the variable is marked as <i>alive</i> within the current basic block
				1264	and all of its predecessor basic blocks, until the basic block with the
				1265	defining instruction is encountered.</p>
				1266
				1267	</div>
				1268
				1269	<!-- _______________________________________________________________________ -->
				1270	<div class="doc_subsubsection">
				1271	<a name="liveintervals_analysis">Live Intervals Analysis</a>
				1272	</div>
				1273
				1274	<div class="doc_text">
				1275
				1276	<p>We now have the information available to perform the live intervals analysis
				1277	and build the live intervals themselves. We start off by numbering the basic
				1278	blocks and machine instructions. We then handle the "live-in" values. These
				1279	are in physical registers, so the physical register is assumed to be killed by
				1280	the end of the basic block. Live intervals for virtual registers are computed
				1281	for some ordering of the machine instructions <tt>[1, N]</tt>. A live interval
Dan Gohman	8e58bc5	2008-10-14 17:00:38 +0000	[diff] [blame]	1282	is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j < N</tt>, for which a
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1283	variable is live.</p>
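<p>The numbering scheme above can be sketched for the simplest case, a single
straight-line block whose instructions are numbered from 1. This is a minimal
model, not the LiveIntervals pass: the types <tt>NumberedInstr</tt> and
<tt>LiveRange</tt> and both functions are hypothetical, and the choice to end
an interval at the index of the last reading instruction (exclusive) is an
assumption made so that an unused definition yields the empty interval
<tt>[i, i)</tt>.</p>

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical instruction model: register numbers only.
struct NumberedInstr {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

// Half-open interval [Start, End) over instruction indices.
struct LiveRange {
  unsigned Start;
  unsigned End;
};

// Number the instructions 1..N and record, for each register, the index of
// its definition and the index of its last use.
std::map<unsigned, LiveRange>
buildIntervals(const std::vector<NumberedInstr> &Block) {
  std::map<unsigned, LiveRange> Live;
  for (unsigned Idx = 1; Idx <= Block.size(); ++Idx) {
    const NumberedInstr &MI = Block[Idx - 1];
    for (unsigned Reg : MI.Uses) {
      auto It = Live.find(Reg);
      assert(It != Live.end() && "use before definition");
      It->second.End = Idx; // extend the interval to this reader
    }
    for (unsigned Reg : MI.Defs)
      Live[Reg] = LiveRange{Idx, Idx}; // empty until a use is seen
  }
  return Live;
}

// Two registers conflict when their intervals share at least one index.
bool overlaps(const LiveRange &A, const LiveRange &B) {
  return A.Start < B.End && B.Start < A.End;
}
```

<p>With this convention, a register whose last use is at instruction
<tt>k</tt> does not overlap the register defined by instruction <tt>k</tt>,
so the two could share a physical register.</p>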
				1284
				1285	<p><i><b>More to come...</b></i></p>
				1286
				1287	</div>
				1288
				1289	<!-- ======================================================================= -->
				1290	<div class="doc_subsection">
				1291	<a name="regalloc">Register Allocation</a>
				1292	</div>
				1293
				1294	<div class="doc_text">
				1295
				1296	<p>The <i>Register Allocation problem</i> consists in mapping a program
				1297	<i>P<sub>v</sub></i>, that can use an unbounded number of virtual
				1298	registers, to a program <i>P<sub>p</sub></i> that contains a finite
				1299	(possibly small) number of physical registers. Each target architecture has
				1300	a different number of physical registers. If the number of physical
				1301	registers is not enough to accommodate all the virtual registers, some of
				1302	them will have to be mapped into memory. These virtuals are called
				1303	<i>spilled virtuals</i>.</p>
				1304
				1305	</div>
				1306
				1307	<!-- _______________________________________________________________________ -->
				1308
				1309	<div class="doc_subsubsection">
				1310	<a name="regAlloc_represent">How registers are represented in LLVM</a>
				1311	</div>
				1312
				1313	<div class="doc_text">
				1314
				1315	<p>In LLVM, physical registers are denoted by integer numbers that
				1316	normally range from 1 to 1023. To see how this numbering is defined
				1317	for a particular architecture, you can read the
				1318	<tt>GenRegisterNames.inc</tt> file for that architecture. For
				1319	instance, by inspecting
				1320	<tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the 32-bit
				1321	register <tt>EAX</tt> is denoted by 15, and the MMX register
				1322	<tt>MM0</tt> is mapped to 48.</p>
				1323
				1324	<p>Some architectures contain registers that share the same physical
				1325	location. A notable example is the X86 platform. For instance, in the
				1326	X86 architecture, the registers <tt>EAX</tt>, <tt>AX</tt> and
				1327	<tt>AL</tt> share the low eight bits. These physical registers are
				1328	marked as <i>aliased</i> in LLVM. Given a particular architecture, you
				1329	can check which registers are aliased by inspecting its
				1330	<tt>RegisterInfo.td</tt> file. Moreover, the method
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1331	<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1332	all the physical registers aliased to the register <tt>p_reg</tt>.</p>
				1333
				1334	<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
				1335	Elements in the same register class are functionally equivalent, and can
				1336	be interchangeably used. Each virtual register can only be mapped to
				1337	physical registers of a particular class. For instance, in the X86
				1338	architecture, some virtuals can only be allocated to 8 bit registers.
				1339	A register class is described by <tt>TargetRegisterClass</tt> objects.
				1340	To discover if a virtual register is compatible with a given physical,
				1341	this code can be used:
				1342	</p>
				1343
				1344	<div class="doc_code">
				1345	<pre>
				1346	bool RegMapping_Fer::compatible_class(MachineFunction &mf,
				1347	unsigned v_reg,
				1348	unsigned p_reg) {
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1349	assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1350	"Target register must be physical");
Chris Lattner	b70e151	2007-12-31 04:16:08 +0000	[diff] [blame]	1351	const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
				1352	return trc->contains(p_reg);
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1353	}
				1354	</pre>
				1355	</div>
				1356
				1357	<p>Sometimes, mostly for debugging purposes, it is useful to change
				1358	the number of physical registers available in the target
				1359	architecture. This must be done statically, inside the
				1360	<tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt> for
				1361	<tt>RegisterClass</tt>, the last parameter of which is a list of
				1362	registers. Just commenting some out is one simple way to avoid them
				1363	being used. A more polite way is to explicitly exclude some registers
				1364	from the <i>allocation order</i>. See the definition of the
				1365	<tt>GR</tt> register class in
				1366	<tt>lib/Target/IA64/IA64RegisterInfo.td</tt> for an example of this
				1367	(e.g., <tt>numReservedRegs</tt> registers are hidden.)</p>
				1368
				1369	<p>Virtual registers are also denoted by integer numbers. Contrary to
				1370	physical registers, different virtual registers never share the same
				1371	number. The smallest virtual register is normally assigned the number
				1372	1024. This may change, so, in order to know which is the first virtual
				1373	register, you should access
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1374	<tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1375	number is greater than or equal to
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1376	<tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1377	register. Whereas physical registers are statically defined in a
				1378	<tt>TargetRegisterInfo.td</tt> file and cannot be created by the
				1379	application developer, that is not the case with virtual registers.
				1380	In order to create new virtual registers, use the method
Chris Lattner	b70e151	2007-12-31 04:16:08 +0000	[diff] [blame]	1381	<tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method will return a
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1382	virtual register with the highest code.
				1383	</p>
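<p>The numbering convention described above can be modelled in a few lines.
The sketch below is not the LLVM implementation: <tt>VRegTable</tt> is a
hypothetical stand-in for <tt>MachineRegisterInfo</tt>, register classes are
reduced to plain integer IDs, the constant 1024 is taken from the text, and
treating register number 0 as "no register" is an assumption.</p>

```cpp
#include <cassert>
#include <vector>

// Value taken from the text above; real code should query the constant.
const unsigned FirstVirtualRegister = 1024;

// Assumption: 0 means "no register", so physical registers start at 1.
bool isPhysicalRegister(unsigned Reg) {
  return Reg != 0 && Reg < FirstVirtualRegister;
}

bool isVirtualRegister(unsigned Reg) { return Reg >= FirstVirtualRegister; }

// Hypothetical table that hands out fresh virtual register numbers.
class VRegTable {
  std::vector<int> ClassOf; // register class ID of each virtual register

public:
  // Returns a new virtual register; its number is the highest so far.
  unsigned createVirtualRegister(int RegClassID) {
    ClassOf.push_back(RegClassID);
    return FirstVirtualRegister + static_cast<unsigned>(ClassOf.size()) - 1;
  }

  int getRegClass(unsigned Reg) const {
    assert(isVirtualRegister(Reg) && "expected a virtual register");
    assert(Reg - FirstVirtualRegister < ClassOf.size() && "unknown register");
    return ClassOf[Reg - FirstVirtualRegister];
  }
};
```

<p>Because numbers are handed out sequentially, the register number doubles
as an index into per-register side tables, which is one reason to keep
virtual and physical numbers in disjoint ranges.</p>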
				1384
				1385	<p>Before register allocation, the operands of an instruction are
				1386	mostly virtual registers, although physical registers may also be
				1387	used. In order to check if a given machine operand is a register, use
				1388	the boolean function <tt>MachineOperand::isRegister()</tt>. To obtain
				1389	the integer code of a register, use
				1390	<tt>MachineOperand::getReg()</tt>. An instruction may define or use a
				1391	register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
				1392	defines the register 1026, and uses registers 1025 and 1024. Given a
				1393	register operand, the method <tt>MachineOperand::isUse()</tt> reports
				1394	whether that register is being used by the instruction. The method
				1395	<tt>MachineOperand::isDef()</tt> reports whether that register is being
				1396	defined.</p>
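<p>To make the def/use distinction concrete, here is a small self-contained
sketch that sorts the register operands of one instruction into definitions
and uses. <tt>SketchOperand</tt> only mimics the four
<tt>MachineOperand</tt> queries named above; its layout, its factory
functions and <tt>collectRegs</tt> are hypothetical.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand model exposing the queries named in the text.
struct SketchOperand {
  bool IsReg;
  bool IsDefFlag;
  unsigned Reg;
  long Imm;

  static SketchOperand reg(unsigned R, bool Def) {
    return SketchOperand{true, Def, R, 0};
  }
  static SketchOperand imm(long V) { return SketchOperand{false, false, 0, V}; }

  bool isRegister() const { return IsReg; }
  bool isDef() const { assert(IsReg && "not a register"); return IsDefFlag; }
  bool isUse() const { assert(IsReg && "not a register"); return !IsDefFlag; }
  unsigned getReg() const { assert(IsReg && "not a register"); return Reg; }
};

// Split the register operands of one instruction into defs and uses,
// skipping immediates and any other non-register operand.
void collectRegs(const std::vector<SketchOperand> &Ops,
                 std::vector<unsigned> &Defs, std::vector<unsigned> &Uses) {
  for (const SketchOperand &MO : Ops) {
    if (!MO.isRegister())
      continue;
    (MO.isDef() ? Defs : Uses).push_back(MO.getReg());
  }
}
```

<p>For <tt>ADD reg:1026 := reg:1025 reg:1024</tt> the operand list would be
<tt>reg(1026, true)</tt>, <tt>reg(1025, false)</tt>,
<tt>reg(1024, false)</tt>.</p>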
				1397
				1398	<p>We will call physical registers present in the LLVM bitcode before
				1399	register allocation <i>pre-colored registers</i>. Pre-colored
				1400	registers are used in many different situations, for instance, to pass
				1401	parameters of function calls, and to store results of particular
				1402	instructions. There are two types of pre-colored registers: the ones
				1403	<i>implicitly</i> defined, and those <i>explicitly</i>
				1404	defined. Explicitly defined registers are normal operands, and can be
				1405	accessed with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In
				1406	order to check which registers are implicitly defined by an
				1407	instruction, use
				1408	<tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>, where
				1409	<tt>opcode</tt> is the opcode of the target instruction. One important
				1410	difference between explicit and implicit physical registers is that
				1411	the latter are defined statically for each instruction, whereas the
				1412	former may vary depending on the program being compiled. For example,
				1413	an instruction that represents a function call will always implicitly
				1414	define or use the same set of physical registers. To read the
				1415	registers implicitly used by an instruction, use
				1416	<tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
				1417	registers impose constraints on any register allocation algorithm. The
				1418	register allocator must make sure that none of them is
				1419	overwritten by the values of virtual registers while still alive.</p>
				1420
				1421	</div>
				1422
				1423	<!-- _______________________________________________________________________ -->
				1424
				1425	<div class="doc_subsubsection">
				1426	<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
				1427	</div>
				1428
				1429	<div class="doc_text">
				1430
				1431	<p>There are two ways to map virtual registers to physical registers (or to
				1432	memory slots). The first way, which we will call <i>direct mapping</i>,
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1433	is based on the use of methods of the classes <tt>TargetRegisterInfo</tt>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1434	and <tt>MachineOperand</tt>. The second way, which we will call
				1435	<i>indirect mapping</i>, relies on the <tt>VirtRegMap</tt> class in
				1436	order to insert loads and stores sending and getting values to and from
				1437	memory.</p>
				1438
				1439	<p>The direct mapping provides more flexibility to the developer of
				1440	the register allocator; however, it is more error prone, and demands
				1441	more implementation work. Basically, the programmer will have to
				1442	specify where load and store instructions should be inserted in the
				1443	target function being compiled in order to get and store values in
				1444	memory. To assign a physical register to a virtual register present in
				1445	a given operand, use <tt>MachineOperand::setReg(p_reg)</tt>. To insert
				1446	a store instruction, use
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1447	<tt>TargetRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a load
				1448	instruction, use <tt>TargetRegisterInfo::loadRegFromStackSlot</tt>.</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1449
				1450	<p>The indirect mapping shields the application developer from the
				1451	complexities of inserting load and store instructions. In order to map
				1452	a virtual register to a physical one, use
				1453	<tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In order to map a
				1454	certain virtual register to memory, use
				1455	<tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will
				1456	return the stack slot where <tt>vreg</tt>'s value will be located. If
				1457	it is necessary to map another virtual register to the same stack
				1458	slot, use <tt>VirtRegMap::assignVirt2StackSlot(vreg,
				1459	stack_location)</tt>. One important point to consider when using the
				1460	indirect mapping, is that even if a virtual register is mapped to
				1461	memory, it still needs to be mapped to a physical register. This
				1462	physical register is the location where the virtual register is
				1463	supposed to be found before being stored or after being reloaded.</p>
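<p>The bookkeeping behind the indirect mapping can be pictured with a small
model. The class below is a sketch, not <tt>VirtRegMap</tt> itself: it
borrows the three method names from the text, while
<tt>SketchVirtRegMap</tt>, the accessor names and the sequential slot
numbering are assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <map>

// Hypothetical model of a virtual-register map.
class SketchVirtRegMap {
  std::map<unsigned, unsigned> Virt2Phys; // vreg -> physical register
  std::map<unsigned, int> Virt2Slot;      // vreg -> stack slot index
  int NextSlot;

public:
  SketchVirtRegMap() : NextSlot(0) {}

  void assignVirt2Phys(unsigned VReg, unsigned PReg) {
    assert(Virt2Phys.count(VReg) == 0 && "register already assigned");
    Virt2Phys[VReg] = PReg;
  }

  // Create a fresh stack slot for VReg and return its index.
  int assignVirt2StackSlot(unsigned VReg) {
    assert(Virt2Slot.count(VReg) == 0 && "slot already assigned");
    Virt2Slot[VReg] = NextSlot;
    return NextSlot++;
  }

  // Share an existing stack slot with another virtual register.
  void assignVirt2StackSlot(unsigned VReg, int Slot) {
    assert(Slot >= 0 && Slot < NextSlot && "slot must already exist");
    Virt2Slot[VReg] = Slot;
  }

  bool hasPhys(unsigned VReg) const { return Virt2Phys.count(VReg) != 0; }
  bool hasStackSlot(unsigned VReg) const { return Virt2Slot.count(VReg) != 0; }

  unsigned getPhys(unsigned VReg) const {
    auto It = Virt2Phys.find(VReg);
    assert(It != Virt2Phys.end() && "no physical register assigned");
    return It->second;
  }

  int getStackSlot(unsigned VReg) const {
    auto It = Virt2Slot.find(VReg);
    assert(It != Virt2Slot.end() && "no stack slot assigned");
    return It->second;
  }
};
```

<p>Note that a spilled register appears in both tables: the stack slot says
where the value lives in memory, and the physical register says where it
sits just before the store or just after the reload.</p>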
				1464
				1465	<p>If the indirect strategy is used, after all the virtual registers
				1466	have been mapped to physical registers or stack slots, it is necessary
				1467	to use a spiller object to place load and store instructions in the
				1468	code. Every virtual that has been mapped to a stack slot will be
				1469	stored to memory after being defined and will be loaded before being
				1470	used. The implementation of the spiller tries to recycle load/store
				1471	instructions, avoiding unnecessary instructions. For an example of how
				1472	to invoke the spiller, see
				1473	<tt>RegAllocLinearScan::runOnMachineFunction</tt> in
				1474	<tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
				1475
				1476	</div>
				1477
				1478	<!-- _______________________________________________________________________ -->
				1479	<div class="doc_subsubsection">
				1480	<a name="regAlloc_twoAddr">Handling two address instructions</a>
				1481	</div>
				1482
				1483	<div class="doc_text">
				1484
				1485	<p>With very rare exceptions (e.g., function calls), the LLVM machine
				1486	code instructions are three address instructions. That is, each
				1487	instruction is expected to define at most one register, and to use at
				1488	most two registers. However, some architectures use two address
				1489	instructions. In this case, the defined register is also one of the
				1490	used registers. For instance, an instruction such as <tt>ADD %EAX,
				1491	%EBX</tt>, in X86 is actually equivalent to <tt>%EAX = %EAX +
				1492	%EBX</tt>.</p>
				1493
				1494	<p>In order to produce correct code, LLVM must convert three address
				1495	instructions that represent two address instructions into true two
				1496	address instructions. LLVM provides the pass
				1497	<tt>TwoAddressInstructionPass</tt> for this specific purpose. It must
				1498	be run before register allocation takes place. After its execution,
				1499	the resulting code may no longer be in SSA form. This happens, for
				1500	instance, in situations where an instruction such as <tt>%a = ADD %b
				1501	%c</tt> is converted to two instructions such as:</p>
				1502
				1503	<div class="doc_code">
				1504	<pre>
				1505	%a = MOVE %b
Dan Gohman	64bbd90	2008-06-13 17:55:57 +0000	[diff] [blame]	1506	%a = ADD %a %c
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1507	</pre>
				1508	</div>
				1509
				1510	<p>Notice that, internally, the second instruction is represented as
Dan Gohman	64bbd90	2008-06-13 17:55:57 +0000	[diff] [blame]	1511	<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1512	both used and defined by the instruction.</p>
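<p>The rewrite can be sketched on a toy instruction representation. This is
only an illustration of the idea, not the logic of
<tt>TwoAddressInstructionPass</tt>: <tt>ThreeAddr</tt> and
<tt>toTwoAddress</tt> are hypothetical, registers are plain strings, and the
input is assumed to be in SSA form so the destination never appears among
the sources.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-address instruction: Dst = Opcode Src1 Src2.
// A MOVE uses only Src1 and leaves Src2 empty.
struct ThreeAddr {
  std::string Opcode;
  std::string Dst;
  std::string Src1;
  std::string Src2;
};

// Rewrite "Dst = Op Src1 Src2" as "Dst = MOVE Src1; Dst = Op Dst Src2".
std::vector<ThreeAddr> toTwoAddress(const ThreeAddr &MI) {
  // Assumption: SSA input, so Dst is a fresh register. If Dst could equal
  // Src2, the copy would clobber Src2 before the operation reads it.
  assert(MI.Dst != MI.Src1 && MI.Dst != MI.Src2 && "expected SSA input");
  std::vector<ThreeAddr> Out;
  Out.push_back(ThreeAddr{"MOVE", MI.Dst, MI.Src1, ""});
  Out.push_back(ThreeAddr{MI.Opcode, MI.Dst, MI.Dst, MI.Src2});
  return Out;
}
```

<p>After the rewrite the destination has two definitions, which is exactly
why the result is no longer in SSA form.</p>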
				1513
				1514	</div>
				1515
				1516	<!-- _______________________________________________________________________ -->
				1517	<div class="doc_subsubsection">
				1518	<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
				1519	</div>
				1520
				1521	<div class="doc_text">
				1522
				1523	<p>An important transformation that happens during register allocation is called
				1524	the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many
				1525	analyses that are performed on the control flow graph of
				1526	programs. However, traditional instruction sets do not implement
				1527	PHI instructions. Thus, in order to generate executable code, compilers
				1528	must replace PHI instructions with other instructions that preserve their
				1529	semantics.</p>
				1530
				1531	<p>There are many ways in which PHI instructions can safely be removed
				1532	from the target code. The most traditional PHI deconstruction
				1533	algorithm replaces PHI instructions with copy instructions. That is
				1534	the strategy adopted by LLVM. The SSA deconstruction algorithm is
				1535	implemented in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to
				1536	invoke this pass, the identifier <tt>PHIEliminationID</tt> must be
				1537	marked as required in the code of the register allocator.</p>
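<p>The copy-based strategy can be sketched as follows. This is a deliberately
naive model and not the code in <tt>PHIElimination.cpp</tt>: the types
<tt>SketchPhi</tt>, <tt>SketchCopy</tt> and <tt>SketchBlock</tt> are
hypothetical, blocks are identified by their index in a vector, and the
sketch ignores ordering hazards between copies that a real implementation
has to handle.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical PHI node: Dst receives a different source register depending
// on which predecessor block control arrived from.
struct SketchPhi {
  unsigned Dst;
  std::vector<std::pair<std::size_t, unsigned> > Incoming; // (pred, source)
};

struct SketchCopy {
  unsigned Dst;
  unsigned Src;
};

struct SketchBlock {
  std::vector<SketchPhi> Phis;
  std::vector<SketchCopy> TailCopies; // copies placed before the terminator
};

// Replace every PHI with one copy at the end of each predecessor block.
void eliminatePhis(std::vector<SketchBlock> &Blocks) {
  for (SketchBlock &BB : Blocks) {
    for (const SketchPhi &Phi : BB.Phis)
      for (const auto &In : Phi.Incoming) {
        assert(In.first < Blocks.size() && "unknown predecessor block");
        Blocks[In.first].TailCopies.push_back(SketchCopy{Phi.Dst, In.second});
      }
    BB.Phis.clear(); // the PHIs are now redundant
  }
}
```

<p>Each predecessor writes the PHI's destination just before branching, so
the destination holds the right value whichever edge is taken.</p>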
				1538
				1539	</div>
				1540
				1541	<!-- _______________________________________________________________________ -->
				1542	<div class="doc_subsubsection">
				1543	<a name="regAlloc_fold">Instruction folding</a>
				1544	</div>
				1545
				1546	<div class="doc_text">
				1547
				1548	<p><i>Instruction folding</i> is an optimization performed during
				1549	register allocation that removes unnecessary copy instructions. For
				1550	instance, a sequence of instructions such as:</p>
				1551
				1552	<div class="doc_code">
				1553	<pre>
				1554	%EBX = LOAD %mem_address
				1555	%EAX = COPY %EBX
				1556	</pre>
				1557	</div>
				1558
Dan Gohman	33bb04f	2008-11-24 16:35:31 +0000	[diff] [blame^]	1559	<p>can be safely substituted by the single instruction:</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1560
				1561	<div class="doc_code">
				1562	<pre>
				1563	%EAX = LOAD %mem_address
				1564	</pre>
				1565	</div>
				1566
				1567	<p>Instructions can be folded with the
Dan Gohman	1e57df3	2008-02-10 18:45:23 +0000	[diff] [blame]	1568	<tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1569	taken when folding instructions; a folded instruction can be quite
				1570	different from the original instruction. See
				1571	<tt>LiveIntervals::addIntervalsForSpills</tt> in
				1572	<tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its use.</p>
				1573
				1574	</div>
				1575
				1576	<!-- _______________________________________________________________________ -->
				1577
				1578	<div class="doc_subsubsection">
				1579	<a name="regAlloc_builtIn">Built in register allocators</a>
				1580	</div>
				1581
				1582	<div class="doc_text">
				1583
				1584	<p>The LLVM infrastructure provides the application developer with
				1585	three different register allocators:</p>
				1586
				1587	<ul>
				1588	<li><i>Simple</i> - This is a very simple implementation that does
				1589	not keep values in registers across instructions. This register
				1590	allocator immediately spills every value right after it is
				1591	computed, and reloads all used operands from memory to temporary
				1592	registers before each instruction.</li>
				1593	<li><i>Local</i> - This register allocator is an improvement on the
				1594	<i>Simple</i> implementation. It allocates registers on a basic
				1595	block level, attempting to keep values in registers and reusing
				1596	registers as appropriate.</li>
				1597	<li><i>Linear Scan</i> - <i>The default allocator</i>. This is the
				1598	well-known linear scan register allocator. Whereas the
				1599	<i>Simple</i> and <i>Local</i> algorithms use a direct mapping
				1600	implementation technique, the <i>Linear Scan</i> implementation
				1601	uses a spiller in order to place loads and stores.</li>
				1602	</ul>
				1603
				1604	<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
				1605	command line option <tt>-regalloc=...</tt>:</p>
				1606
				1607	<div class="doc_code">
				1608	<pre>
				1609	$ llc -f -regalloc=simple file.bc -o sp.s
				1610	$ llc -f -regalloc=local file.bc -o lc.s
				1611	$ llc -f -regalloc=linearscan file.bc -o ln.s
				1612	</pre>
				1613	</div>
				1614
				1615	</div>
				1616
				1617	<!-- ======================================================================= -->
				1618	<div class="doc_subsection">
				1619	<a name="proepicode">Prolog/Epilog Code Insertion</a>
				1620	</div>
				1621	<div class="doc_text"><p>To Be Written</p></div>
				1622	<!-- ======================================================================= -->
				1623	<div class="doc_subsection">
				1624	<a name="latemco">Late Machine Code Optimizations</a>
				1625	</div>
				1626	<div class="doc_text"><p>To Be Written</p></div>
				1627	<!-- ======================================================================= -->
				1628	<div class="doc_subsection">
				1629	<a name="codeemit">Code Emission</a>
				1630	</div>
				1631	<div class="doc_text"><p>To Be Written</p></div>
				1632	<!-- _______________________________________________________________________ -->
				1633	<div class="doc_subsubsection">
				1634	<a name="codeemit_asm">Generating Assembly Code</a>
				1635	</div>
				1636	<div class="doc_text"><p>To Be Written</p></div>
				1637	<!-- _______________________________________________________________________ -->
				1638	<div class="doc_subsubsection">
				1639	<a name="codeemit_bin">Generating Binary Machine Code</a>
				1640	</div>
				1641
				1642	<div class="doc_text">
				1643	<p>For the JIT or <tt>.o</tt> file writer</p>
				1644	</div>
				1645
				1646
				1647	<!-- *********************************************************************** -->
				1648	<div class="doc_section">
				1649	<a name="targetimpls">Target-specific Implementation Notes</a>
				1650	</div>
				1651	<!-- *********************************************************************** -->
				1652
				1653	<div class="doc_text">
				1654
				1655	<p>This section of the document explains features or design decisions that
				1656	are specific to the code generator for a particular target.</p>
				1657
				1658	</div>
				1659
Arnold Schwaighofer	0744492	2008-05-14 09:17:12 +0000	[diff] [blame]	1660	<!-- ======================================================================= -->
				1661	<div class="doc_subsection">
				1662	<a name="tailcallopt">Tail call optimization</a>
				1663	</div>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1664
Arnold Schwaighofer	0744492	2008-05-14 09:17:12 +0000	[diff] [blame]	1665	<div class="doc_text">
				1666	<p>Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:
				1667	<ul>
				1668	<li>Caller and callee have the calling convention <tt>fastcc</tt>.</li>
				1669	<li>The call is a tail call - in tail position (ret immediately follows call and ret uses value of call or is void).</li>
				1670	<li>Option <tt>-tailcallopt</tt> is enabled.</li>
				1671	<li>Platform specific constraints are met.</li>
				1672	</ul>
				1673	</p>
				1674
				1675	<p>x86/x86-64 constraints:
				1676	<ul>
				1677	<li>No variable argument lists are used.</li>
				1678	<li>On x86-64 when generating GOT/PIC code only module-local calls (visibility = hidden or protected) are supported.</li>
				1679	</ul>
				1680	</p>
				1681	<p>PowerPC constraints:
				1682	<ul>
				1683	<li>No variable argument lists are used.</li>
				1684	<li>No byval parameters are used.</li>
				1685	<li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
				1686	</ul>
				1687	</p>
				1688	<p>Example:</p>
				1689	<p>Call as <tt>llc -tailcallopt test.ll</tt>.
				1690	<div class="doc_code">
				1691	<pre>
				1692	declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
				1693
				1694	define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
				1695	%l1 = add i32 %in1, %in2
				1696	%tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
				1697	ret i32 %tmp
				1698	}</pre>
				1699	</div>
				1700	</p>
				1701	<p>Implications of <tt>-tailcallopt</tt>:</p>
				1702	<p>To support tail call optimization in situations where the callee has more arguments than the caller, a 'callee pops arguments' convention is used. This currently causes each <tt>fastcc</tt> call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack, so performance might be worse in such cases.</p>
				1703	<p>On x86 and x86-64 one register is reserved for indirect tail calls (e.g., via a function pointer), so there is one less register for integer argument passing. For x86 this means 2 registers (if the <tt>inreg</tt> parameter attribute is used) and for x86-64 this means 5 registers are used.</p>
				1704	</div>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1705	<!-- ======================================================================= -->
				1706	<div class="doc_subsection">
				1707	<a name="x86">The X86 backend</a>
				1708	</div>
				1709
				1710	<div class="doc_text">
				1711
				1712	<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
Dan Gohman	5ab9826	2007-12-13 20:43:47 +0000	[diff] [blame]	1713	code generator is capable of targeting a variety of x86-32 and x86-64
				1714	processors, and includes support for ISA extensions such as MMX and SSE.
				1715	</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1716
				1717	</div>
				1718
				1719	<!-- _______________________________________________________________________ -->
				1720	<div class="doc_subsubsection">
				1721	<a name="x86_tt">X86 Target Triples Supported</a>
				1722	</div>
				1723
				1724	<div class="doc_text">
				1725
				1726	<p>The following are the known target triples that are supported by the X86
				1727	backend. This is not an exhaustive list, and it would be useful to add those
				1728	that people test.</p>
				1729
				1730	<ul>
				1731	<li><b>i686-pc-linux-gnu</b> - Linux</li>
				1732	<li><b>i386-unknown-freebsd5.3</b> - FreeBSD 5.3</li>
				1733	<li><b>i686-pc-cygwin</b> - Cygwin on Win32</li>
				1734	<li><b>i686-pc-mingw32</b> - MingW on Win32</li>
				1735	<li><b>i386-pc-mingw32msvc</b> - MingW crosscompiler on Linux</li>
				1736	<li><b>i686-apple-darwin*</b> - Apple Darwin on X86</li>
				1737	</ul>
				1738
				1739	</div>
				1740
				1741	<!-- _______________________________________________________________________ -->
				1742	<div class="doc_subsubsection">
				1743	<a name="x86_cc">X86 Calling Conventions supported</a>
				1744	</div>
				1745
				1746
				1747	<div class="doc_text">
				1748
Dan Gohman	a5486e2	2008-11-24 16:27:17 +0000	[diff] [blame]	1749	<p>The following target-specific calling conventions are known to the backend:</p>
Dan Gohman	f17a25c	2007-07-18 16:29:46 +0000	[diff] [blame]	1750
				1751	<ul>
				1752	<li><b>x86_StdCall</b> - stdcall calling convention seen on Microsoft Windows
				1753	platform (CC ID = 64).</li>
				1754	<li><b>x86_FastCall</b> - fastcall calling convention seen on Microsoft Windows
				1755	platform (CC ID = 65).</li>
				1756	</ul>
				1757
				1758	</div>
				1759
				1760	<!-- _______________________________________________________________________ -->
				1761	<div class="doc_subsubsection">
				1762	<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
				1763	</div>
				1764
				1765	<div class="doc_text">
				1766
				1767	<p>The x86 has a very flexible way of accessing memory. It is capable of
				1768	forming memory addresses of the following expression directly in integer
				1769	instructions (which use ModR/M addressing):</p>
				1770
				1771	<div class="doc_code">
				1772	<pre>
				1773	Base + [1,2,4,8] * IndexReg + Disp32
				1774	</pre>
				1775	</div>
				1776
<p>In order to represent this, LLVM tracks no fewer than four operands for each
memory operand of this form. This means that the "load" form of '<tt>mov</tt>'
has the following <tt>MachineOperand</tt>s in this order:</p>
				1780
				1781	<pre>
Index:        0     |    1        2       3           4
Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement
OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm
				1785	</pre>
				1786
				1787	<p>Stores, and all other instructions, treat the four memory operands in the
				1788	same way and in the same order.</p>
				1789
				1790	</div>
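<div class="doc_text">
<p>As an illustration of the address expression above, the following is a minimal
sketch that evaluates it for concrete register values. The <tt>X86MemRef</tt>
struct and <tt>effectiveAddress</tt> function are hypothetical names used only
for this example; they are not LLVM types, and real <tt>MachineOperand</tt>s
hold register numbers rather than register contents.</p>
</div>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four operands that describe one x86 memory
// reference (assumption: fields hold values, not register numbers).
struct X86MemRef {
  uint32_t Base;   // value held in the base register
  uint32_t Scale;  // must be 1, 2, 4 or 8
  uint32_t Index;  // value held in the index register
  int32_t Disp;    // signed 32-bit displacement
};

// Computes Base + Scale * Index + Disp with 32-bit wraparound.
inline uint32_t effectiveAddress(const X86MemRef &M) {
  assert((M.Scale == 1 || M.Scale == 2 || M.Scale == 4 || M.Scale == 8) &&
         "scale must be 1, 2, 4 or 8");
  return M.Base + M.Scale * M.Index + static_cast<uint32_t>(M.Disp);
}
```

<div class="doc_text">
<p>For instance, a base of 0x1000, a scale of 4, an index of 3 and a
displacement of -8 give the address 0x1004.</p>
</div>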
				1791
				1792	<!-- _______________________________________________________________________ -->
				1793	<div class="doc_subsubsection">
				1794	<a name="x86_names">Instruction naming</a>
				1795	</div>
				1796
				1797	<div class="doc_text">
				1798
<p>An instruction name consists of the base name, a default operand size, and
a character per operand with an optional special size. For example:</p>
				1801
				1802	<p>
				1803	<tt>ADD8rr</tt> -> add, 8-bit register, 8-bit register<br>
				1804	<tt>IMUL16rmi</tt> -> imul, 16-bit register, 16-bit memory, 16-bit immediate<br>
				1805	<tt>IMUL16rmi8</tt> -> imul, 16-bit register, 16-bit memory, 8-bit immediate<br>
				1806	<tt>MOVSX32rm16</tt> -> movsx, 32-bit register, 16-bit memory
				1807	</p>
				1808
				1809	</div>
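<div class="doc_text">
<p>The naming scheme can be read mechanically. The sketch below splits a name
into its base, default size and operands, and is meant only to make the
convention concrete. The <tt>DecodedName</tt>, <tt>OperandDesc</tt>,
<tt>readNumber</tt> and <tt>decodeName</tt> names are hypothetical, and the
sketch assumes names shaped like the four examples above (an upper-case base
name with no digits in it); it is not how LLVM itself processes instruction
names.</p>
</div>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One operand of a decoded name: its kind character ('r', 'm' or 'i')
// and its size in bits.
struct OperandDesc {
  char Kind;
  unsigned Bits;
};

// Hypothetical decoded form of a name such as "IMUL16rmi8".
struct DecodedName {
  std::string Base;                   // e.g. "IMUL"
  unsigned DefaultBits = 0;           // e.g. 16
  std::vector<OperandDesc> Operands;  // e.g. r16, m16, i8
};

// Reads a run of decimal digits starting at Pos and advances Pos past it.
// Returns 0 if there are no digits at Pos.
inline unsigned readNumber(const std::string &S, std::size_t &Pos) {
  unsigned N = 0;
  while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
    N = N * 10 + static_cast<unsigned>(S[Pos++] - '0');
  return N;
}

// Splits Name into base name, default size and per-operand descriptors.
inline DecodedName decodeName(const std::string &Name) {
  DecodedName D;
  std::size_t Pos = 0;
  // The base name is the leading run of upper-case letters.
  while (Pos < Name.size() &&
         std::isupper(static_cast<unsigned char>(Name[Pos])))
    D.Base += Name[Pos++];
  D.DefaultBits = readNumber(Name, Pos);
  // Each remaining character is one operand; digits after it override
  // the default size for that operand only.
  while (Pos < Name.size()) {
    char Kind = Name[Pos++];
    unsigned Bits = readNumber(Name, Pos);
    D.Operands.push_back({Kind, Bits ? Bits : D.DefaultBits});
  }
  return D;
}
```

<div class="doc_text">
<p>Under these assumptions, <tt>decodeName("IMUL16rmi8")</tt> yields the base
<tt>IMUL</tt>, a default size of 16, and three operands: a 16-bit register, a
16-bit memory operand and an 8-bit immediate.</p>
</div>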
				1810
				1811	<!-- ======================================================================= -->
				1812	<div class="doc_subsection">
				1813	<a name="ppc">The PowerPC backend</a>
				1814	</div>
				1815
				1816	<div class="doc_text">
<p>The PowerPC code generator lives in the <tt>lib/Target/PowerPC</tt> directory.
Code generation is retargetable to several variations, or <i>subtargets</i>, of
the PowerPC ISA, including ppc32, ppc64 and altivec.
				1820	</p>
				1821	</div>
				1822
				1823	<!-- _______________________________________________________________________ -->
				1824	<div class="doc_subsubsection">
				1825	<a name="ppc_abi">LLVM PowerPC ABI</a>
				1826	</div>
				1827
				1828	<div class="doc_text">
<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses PC
relative (PIC) or static addressing for accessing global values, so no TOC (r2)
is used. Second, r31 is used as a frame pointer to allow dynamic growth of a
stack frame. LLVM takes advantage of having no TOC to provide space to save
the frame pointer in the PowerPC linkage area of the caller frame. Other
details of the PowerPC ABI can be found at <a href=
"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
>PowerPC ABI</a>. Note: This link describes the 32 bit ABI. The
64 bit ABI is similar except that the space for each GPR is 8 bytes wide (not 4)
and r13 is reserved for system use.</p>
				1839	</div>
				1840
				1841	<!-- _______________________________________________________________________ -->
				1842	<div class="doc_subsubsection">
				1843	<a name="ppc_frame">Frame Layout</a>
				1844	</div>
				1845
				1846	<div class="doc_text">
<p>The size of a PowerPC frame is usually fixed for the duration of a
function’s invocation. Since the frame is a fixed size, all references into
the frame can be made via fixed offsets from the stack pointer. The
exception is when dynamic alloca or variable sized arrays are present. In that
case a base pointer (r31) is used as a proxy for the stack pointer, and the
stack pointer is free to grow or shrink. A base pointer is also used if
llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
always aligned to 16 bytes, so that space allocated for altivec vectors will be
properly aligned.</p>
<p>An invocation frame is laid out as follows (low memory at top):</p>
</div>
				1858
				1859	<div class="doc_text">
				1860	<table class="layout">
				1861	<tr>
				1862	<td>Linkage<br><br></td>
				1863	</tr>
				1864	<tr>
				1865	<td>Parameter area<br><br></td>
				1866	</tr>
				1867	<tr>
				1868	<td>Dynamic area<br><br></td>
				1869	</tr>
				1870	<tr>
				1871	<td>Locals area<br><br></td>
				1872	</tr>
				1873	<tr>
				1874	<td>Saved registers area<br><br></td>
				1875	</tr>
				1876	<tr style="border-style: none hidden none hidden;">
				1877	<td><br></td>
				1878	</tr>
				1879	<tr>
				1880	<td>Previous Frame<br><br></td>
				1881	</tr>
				1882	</table>
				1883	</div>
				1884
				1885	<div class="doc_text">
<p>The <i>linkage</i> area is used by a callee to save special registers prior
to allocating its own frame. Only three entries are relevant to LLVM. The
first entry is the previous stack pointer (sp), also known as the link. This
allows probing tools like gdb or exception handlers to quickly scan the frames
in the stack. A function epilog can also use the link to pop the frame from the
stack. The third entry in the linkage area is used to save the return address
from the lr register. Finally, as mentioned above, the last entry is used to
save the previous frame pointer (r31). The entries in the linkage area are the
size of a GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48
bytes in 64 bit mode.</p>
				1896	</div>
				1897
				1898	<div class="doc_text">
				1899	<p>32 bit linkage area</p>
				1900	<table class="layout">
				1901	<tr>
				1902	<td>0</td>
				1903	<td>Saved SP (r1)</td>
				1904	</tr>
				1905	<tr>
				1906	<td>4</td>
				1907	<td>Saved CR</td>
				1908	</tr>
				1909	<tr>
				1910	<td>8</td>
				1911	<td>Saved LR</td>
				1912	</tr>
				1913	<tr>
				1914	<td>12</td>
				1915	<td>Reserved</td>
				1916	</tr>
				1917	<tr>
				1918	<td>16</td>
				1919	<td>Reserved</td>
				1920	</tr>
				1921	<tr>
				1922	<td>20</td>
				1923	<td>Saved FP (r31)</td>
				1924	</tr>
				1925	</table>
				1926	</div>
				1927
				1928	<div class="doc_text">
				1929	<p>64 bit linkage area</p>
				1930	<table class="layout">
				1931	<tr>
				1932	<td>0</td>
				1933	<td>Saved SP (r1)</td>
				1934	</tr>
				1935	<tr>
				1936	<td>8</td>
				1937	<td>Saved CR</td>
				1938	</tr>
				1939	<tr>
				1940	<td>16</td>
				1941	<td>Saved LR</td>
				1942	</tr>
				1943	<tr>
				1944	<td>24</td>
				1945	<td>Reserved</td>
				1946	</tr>
				1947	<tr>
				1948	<td>32</td>
				1949	<td>Reserved</td>
				1950	</tr>
				1951	<tr>
				1952	<td>40</td>
				1953	<td>Saved FP (r31)</td>
				1954	</tr>
				1955	</table>
				1956	</div>
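<div class="doc_text">
<p>Both linkage area tables follow from one rule: entry <i>n</i> sits at
<i>n</i> times the GPR size. The sketch below expresses that rule. The
<tt>LinkageSlot</tt> enum and the helper functions are hypothetical names for
this example and are not taken from the PowerPC backend sources.</p>
</div>

```cpp
#include <cassert>

// Entry indices in the linkage area, in the order shown in the tables above.
// These names are assumptions made for illustration.
enum LinkageSlot {
  SavedSP = 0,
  SavedCR = 1,
  SavedLR = 2,
  Reserved0 = 3,
  Reserved1 = 4,
  SavedFP = 5,  // r31, stored where the TOC would otherwise go
  NumLinkageSlots = 6
};

// Size of a general purpose register in bytes: 4 in 32 bit mode, 8 in 64.
inline unsigned linkageGPRBytes(bool Is64Bit) { return Is64Bit ? 8 : 4; }

// Byte offset of one entry from the start of the linkage area.
inline unsigned linkageSlotOffset(LinkageSlot Slot, bool Is64Bit) {
  assert(Slot < NumLinkageSlots && "not a linkage area entry");
  return static_cast<unsigned>(Slot) * linkageGPRBytes(Is64Bit);
}

// Total size of the linkage area in bytes.
inline unsigned linkageAreaSize(bool Is64Bit) {
  return static_cast<unsigned>(NumLinkageSlots) * linkageGPRBytes(Is64Bit);
}
```

<div class="doc_text">
<p>With these definitions, the saved LR is at offset 8 in 32 bit mode and 16 in
64 bit mode, and the area sizes come out to 24 and 48 bytes, matching the
tables.</p>
</div>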
				1957
				1958	<div class="doc_text">
<p>The <i>parameter area</i> is used to store arguments being passed to a callee
function. Following the PowerPC ABI, the first few arguments are actually
passed in registers, with the space in the parameter area unused. However, if
there are not enough registers or the callee is a thunk or vararg function,
these register arguments can be spilled into the parameter area. Thus, the
parameter area must be large enough to store all the parameters for the largest
call sequence made by the caller. The size must also be at least large enough
to spill registers r3-r10. This gives callees blind to the call signature,
such as thunks and vararg functions, enough space to cache the argument
registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64
bit mode). Also note that since the parameter area is at a fixed offset from the
top of the frame, a callee can access its spilt arguments using fixed
offsets from the stack pointer (or base pointer).</p>
				1972	</div>
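<div class="doc_text">
<p>The sizing rule for the parameter area can be sketched as the larger of two
quantities: the space needed by the largest call, and the space needed to spill
the eight argument registers r3-r10. The function and constant names below are
hypothetical, and the caller is assumed to have already computed the byte count
for its largest call.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>

// r3 through r10 are the eight argument registers named in the text.
const unsigned NumArgGPRs = 8;

// LargestCallBytes: bytes of outgoing arguments for the largest call made
// by the function (assumed to be computed elsewhere).
inline unsigned parameterAreaBytes(unsigned LargestCallBytes, bool Is64Bit) {
  unsigned GPRBytes = Is64Bit ? 8 : 4;
  // Minimum: room to spill every argument register.
  unsigned MinBytes = NumArgGPRs * GPRBytes;
  return std::max(LargestCallBytes, MinBytes);
}
```

<div class="doc_text">
<p>A function whose calls pass little or nothing on the stack therefore still
reserves 32 bytes in 32 bit mode and 64 bytes in 64 bit mode.</p>
</div>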
				1973
				1974	<div class="doc_text">
<p>Combining the information about the linkage area, the parameter area and
alignment, a stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in
64 bit mode.</p>
				1978	</div>
				1979
				1980	<div class="doc_text">
<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
alloca, then space is added to the stack, the linkage and parameter areas are
shifted to the top of the stack, and the new space is available immediately
below the linkage and parameter areas. The cost of shifting the linkage and
parameter areas is minor since only the link value needs to be copied. The link
value can be easily fetched by adding the original frame size to the base
pointer. Note that allocations in the dynamic space need to observe 16 byte
alignment.</p>
</div>
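<div class="doc_text">
<p>One common way to observe the 16 byte alignment requirement is to round each
dynamic allocation request up to a multiple of 16 before adjusting the stack
pointer. The sketch below shows that rounding only. The function names are
hypothetical, and this is an assumption about how the rule could be applied,
not a description of the code the backend emits.</p>
</div>

```cpp
#include <cassert>
#include <cstdint>

// Rounds Size up to the next multiple of Align, which must be a power of two.
inline uint64_t roundUpTo(uint64_t Size, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Size + Align - 1) & ~(Align - 1);
}

// Bytes by which the stack would grow for one dynamic allocation request,
// assuming each request is padded to keep the stack pointer 16 byte aligned.
inline uint64_t dynamicAllocBytes(uint64_t RequestedBytes) {
  return roundUpTo(RequestedBytes, 16);
}
```

<div class="doc_text">
<p>Under this scheme a 17 byte request consumes 32 bytes of stack, while a
16 byte request consumes exactly 16.</p>
</div>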
				1989
				1990	<div class="doc_text">
<p>The <i>locals area</i> is where the LLVM compiler reserves space for local
variables.</p>
				1993	</div>
				1994
				1995	<div class="doc_text">
<p>The <i>saved registers area</i> is where the LLVM compiler spills callee saved
registers on entry to the callee.</p>
				1998	</div>
				1999
				2000	<!-- _______________________________________________________________________ -->
				2001	<div class="doc_subsubsection">
				2002	<a name="ppc_prolog">Prolog/Epilog</a>
				2003	</div>
				2004
				2005	<div class="doc_text">
<p>The LLVM prolog and epilog are the same as described in the PowerPC ABI, with
the following exceptions. Callee saved registers are spilled after the frame is
created. This allows the LLVM epilog/prolog support to be common with other
targets. The base pointer callee saved register r31 is saved in the TOC slot of
the linkage area. This simplifies allocation of space for the base pointer and
makes it convenient to locate programmatically and during debugging.</p>
				2012	</div>
				2013
				2014	<!-- _______________________________________________________________________ -->
				2015	<div class="doc_subsubsection">
				2016	<a name="ppc_dynamic">Dynamic Allocation</a>
				2017	</div>
				2018
				2019	<div class="doc_text">
				2020	<p></p>
				2021	</div>
				2022
				2023	<div class="doc_text">
				2024	<p><i>TODO - More to come.</i></p>
				2025	</div>
				2026
				2027
				2028	<!-- *********************************************************************** -->
				2029	<hr>
				2030	<address>
				2031	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
				2032	src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
				2033	<a href="http://validator.w3.org/check/referer"><img
				2034	src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
				2035
				2036	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
				2037	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
				2038	Last modified: $Date$
				2039	</address>
				2040
				2041	</body>
				2042	</html>