Blame - llvm/docs/CodeGenerator.html - toolchain/llvm-project

Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	5	<meta http-equiv="content-type" content="text/html; charset=utf-8">
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	6	<title>The LLVM Target-Independent Code Generator</title>
				7	<link rel="stylesheet" href="llvm.css" type="text/css">
				8	</head>
				9	<body>
				10
				11	<div class="doc_title">
				12	The LLVM Target-Independent Code Generator
				13	</div>
				14
				15	<ol>
				16	<li><a href="#introduction">Introduction</a>
				17	<ul>
				18	<li><a href="#required">Required components in the code generator</a></li>
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	19	<li><a href="#high-level-design">The high-level design of the code
				20	generator</a></li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	21	<li><a href="#tablegen">Using TableGen for target description</a></li>
				22	</ul>
				23	</li>
				24	<li><a href="#targetdesc">Target description classes</a>
				25	<ul>
				26	<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
				27	<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	28	<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
Dan Gohman	3a4be0f	2008-02-10 18:45:23 +0000	[diff] [blame]	29	<li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	30	<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
				31	<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
Chris Lattner	c9afa28	2005-10-16 17:06:07 +0000	[diff] [blame]	32	<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	33	<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
				34	</ul>
				35	</li>
				36	<li><a href="#codegendesc">Machine code description classes</a>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	37	<ul>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	38	<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	39	<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
				40	class</a></li>
				41	<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	42	</ul>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	43	</li>
				44	<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	45	<ul>
				46	<li><a href="#instselect">Instruction Selection</a>
				47	<ul>
				48	<li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
				49	<li><a href="#selectiondag_process">SelectionDAG Code Generation
				50	Process</a></li>
				51	<li><a href="#selectiondag_build">Initial SelectionDAG
				52	Construction</a></li>
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	53	<li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	54	<li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
				55	<li><a href="#selectiondag_optimize">SelectionDAG Optimization
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	56	Phase: the DAG Combiner</a></li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	57	<li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	58	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	59	Phase</a></li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	60	<li><a href="#selectiondag_future">Future directions for the
				61	SelectionDAG</a></li>
				62	</ul></li>
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	63	<li><a href="#liveintervals">Live Intervals</a>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	64	<ul>
				65	<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	66	<li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	67	</ul></li>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	68	<li><a href="#regalloc">Register Allocation</a>
				69	<ul>
				70	<li><a href="#regAlloc_represent">How registers are represented in
				71	LLVM</a></li>
				72	<li><a href="#regAlloc_howTo">Mapping virtual registers to physical
				73	registers</a></li>
				74	<li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
				75	<li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
				76	<li><a href="#regAlloc_fold">Instruction folding</a></li>
				77	<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
				78	</ul></li>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	79	<li><a href="#codeemit">Code Emission</a>
				80	<ul>
				81	<li><a href="#codeemit_asm">Generating Assembly Code</a></li>
				82	<li><a href="#codeemit_bin">Generating Binary Machine Code</a></li>
				83	</ul></li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	84	</ul>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	85	</li>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	86	<li><a href="#targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	87	<ul>
Arnold Schwaighofer	2c6b888	2008-05-14 09:17:12 +0000	[diff] [blame]	88	<li><a href="#tailcallopt">Tail call optimization</a></li>
Evan Cheng	5967649	2010-03-08 21:05:02 +0000	[diff] [blame]	89	<li><a href="#sibcallopt">Sibling call optimization</a></li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	90	<li><a href="#x86">The X86 backend</a></li>
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	91	<li><a href="#ppc">The PowerPC backend</a>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	92	<ul>
				93	<li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
				94	<li><a href="#ppc_frame">Frame Layout</a></li>
				95	<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
				96	<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	97	</ul></li>
				98	</ul></li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	99
				100	</ol>
				101
				102	<div class="doc_author">
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	103	<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
Jim Laskey	7248e71	2007-03-14 19:30:33 +0000	[diff] [blame]	104	<a href="mailto:isanbard@gmail.com">Bill Wendling</a>,
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	105	<a href="mailto:pronesto@gmail.com">Fernando Magno Quintao
Jim Laskey	7248e71	2007-03-14 19:30:33 +0000	[diff] [blame]	106	Pereira</a> and
				107	<a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	108	</div>
				109
Chris Lattner	f249fdc	2004-06-01 17:18:11 +0000	[diff] [blame]	110	<div class="doc_warning">
				111	<p>Warning: This is a work in progress.</p>
				112	</div>
				113
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	114	<!-- *********************************************************************** -->
				115	<div class="doc_section">
				116	<a name="introduction">Introduction</a>
				117	</div>
				118	<!-- *********************************************************************** -->
				119
				120	<div class="doc_text">
				121
				122	<p>The LLVM target-independent code generator is a framework that provides a
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	123	suite of reusable components for translating the LLVM internal representation
				124	to the machine code for a specified target—either in assembly form
				125	(suitable for a static compiler) or in binary machine code format (usable for
				126	a JIT compiler). The LLVM target-independent code generator consists of five
				127	main components:</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	128
				129	<ol>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	130	<li><a href="#targetdesc">Abstract target description</a> interfaces which
				131	capture important properties about various aspects of the machine,
				132	independently of how they will be used. These interfaces are defined in
				133	<tt>include/llvm/Target/</tt>.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	134
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	135	<li>Classes used to represent the <a href="#codegendesc">machine code</a>
				136	being generated for a target. These classes are intended to be abstract
				137	enough to represent the machine code for <i>any</i> target machine. These
				138	classes are defined in <tt>include/llvm/CodeGen/</tt>.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	139
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	140	<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
				141	various phases of native code generation (register allocation, scheduling,
				142	stack frame representation, etc). This code lives
				143	in <tt>lib/CodeGen/</tt>.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	144
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	145	<li><a href="#targetimpls">Implementations of the abstract target description
				146	interfaces</a> for particular targets. These machine descriptions make
				147	use of the components provided by LLVM, and can optionally provide custom
				148	target-specific passes, to build complete code generators for a specific
				149	target. Target descriptions live in <tt>lib/Target/</tt>.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	150
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	151	<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
				152	completely target independent (it uses the <tt>TargetJITInfo</tt>
				153	structure to interface for target-specific issues). The code for the
				154	target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	155	</ol>
				156
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	157	<p>Depending on which part of the code generator you are interested in working
				158	on, different pieces of this will be useful to you. In any case, you should
				159	be familiar with the <a href="#targetdesc">target description</a>
				160	and <a href="#codegendesc">machine code representation</a> classes. If you
				161	want to add a backend for a new target, you will need
				162	to <a href="#targetimpls">implement the target description</a> classes for
				163	your new target and understand the <a href="LangRef.html">LLVM code
				164	representation</a>. If you are interested in implementing a
				165	new <a href="#codegenalgs">code generation algorithm</a>, it should only
				166	depend on the target-description and machine code representation classes,
				167	ensuring that it is portable.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	168
				169	</div>
				170
				171	<!-- ======================================================================= -->
				172	<div class="doc_subsection">
				173	<a name="required">Required components in the code generator</a>
				174	</div>
				175
				176	<div class="doc_text">
				177
				178	<p>The two pieces of the LLVM code generator are the high-level interface to the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	179	code generator and the set of reusable components that can be used to build
				180	target-specific backends. The two most important interfaces
				181	(<a href="#targetmachine"><tt>TargetMachine</tt></a>
				182	and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
				183	required to be defined for a backend to fit into the LLVM system, but the
				184	others must be defined if the reusable code generator components are going to
				185	be used.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	186
				187	<p>This design has two important implications. The first is that LLVM can
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	188	support completely non-traditional code generation targets. For example, the
				189	C backend does not require register allocation, instruction selection, or any
				190	of the other standard components provided by the system. As such, it only
				191	implements these two interfaces, and does its own thing. Another example of
				192	a code generator like this is a (purely hypothetical) backend that converts
				193	LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	194
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	195	<p>This design also implies that it is possible to design and implement
				196	radically different code generators in the LLVM system that do not make use
				197	of any of the built-in components. Doing so is not recommended at all, but
				198	could be required for radically different targets that do not fit into the
				199	LLVM machine description model: FPGAs for example.</p>
Chris Lattner	e6cad6c	2004-06-02 07:06:06 +0000	[diff] [blame]	200
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	201	</div>
				202
				203	<!-- ======================================================================= -->
				204	<div class="doc_subsection">
Chris Lattner	f249fdc	2004-06-01 17:18:11 +0000	[diff] [blame]	205	<a name="high-level-design">The high-level design of the code generator</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	206	</div>
				207
				208	<div class="doc_text">
				209
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	210	<p>The LLVM target-independent code generator is designed to support efficient
				211	and high-quality code generation for standard register-based microprocessors.
				212	Code generation in this model is divided into the following stages:</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	213
				214	<ol>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	215	<li><b><a href="#instselect">Instruction Selection</a></b> — This phase
				216	determines an efficient way to express the input LLVM code in the target
				217	instruction set. This stage produces the initial code for the program in
				218	the target instruction set, making use of virtual registers in SSA
				219	form and physical registers that represent any required register
				220	assignments due to target constraints or calling conventions. This step
				221	turns the LLVM code into a DAG of target instructions.</li>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	222
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	223	<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> —
				224	This phase takes the DAG of target instructions produced by the
				225	instruction selection phase, determines an ordering of the instructions,
				226	then emits the instructions
				227	as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
				228	Note that we describe this in the <a href="#instselect">instruction
				229	selection section</a> because it operates on
				230	a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	231
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	232	<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> —
				233	This optional stage consists of a series of machine-code optimizations
				234	that operate on the SSA-form produced by the instruction selector.
				235	Optimizations like modulo-scheduling or peephole optimization work
				236	here.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	237
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	238	<li><b><a href="#regalloc">Register Allocation</a></b> — The target code
				239	is transformed from an infinite virtual register file in SSA form to the
				240	concrete register file used by the target. This phase introduces spill
				241	code and eliminates all virtual register references from the program.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	242
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	243	<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> — Once
				244	the machine code has been generated for the function and the amount of
				245	stack space required is known (used for LLVM alloca's and spill slots),
				246	the prolog and epilog code for the function can be inserted and "abstract
				247	stack location references" can be eliminated. This stage is responsible
				248	for implementing optimizations like frame-pointer elimination and stack
				249	packing.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	250
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	251	<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> —
				252	Optimizations that operate on "final" machine code can go here, such as
				253	spill code scheduling and peephole optimizations.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	254
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	255	<li><b><a href="#codeemit">Code Emission</a></b> — The final stage
				256	actually puts out the code for the current function, either in the target
				257	assembler format or in machine code.</li>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	258	</ol>
				259
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	260	<p>The code generator is based on the assumption that the instruction selector
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	261	will use an optimal pattern matching selector to create high-quality
				262	sequences of native instructions. Alternative code generator designs based
				263	on pattern expansion and aggressive iterative peephole optimization are much
				264	slower. This design permits efficient compilation (important for JIT
				265	environments) and aggressive optimization (used when generating code offline)
				266	by allowing components of varying levels of sophistication to be used for any
				267	step of compilation.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	268
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	269	<p>In addition to these stages, target implementations can insert arbitrary
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	270	target-specific passes into the flow. For example, the X86 target uses a
				271	special pass to handle the 80x87 floating point stack architecture. Other
				272	targets with unusual requirements can be supported with custom passes as
				273	needed.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	274
				275	</div>
				276
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	277	<!-- ======================================================================= -->
				278	<div class="doc_subsection">
Chris Lattner	f249fdc	2004-06-01 17:18:11 +0000	[diff] [blame]	279	<a name="tablegen">Using TableGen for target description</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	280	</div>
				281
				282	<div class="doc_text">
				283
Chris Lattner	d9be5fa	2004-06-01 18:35:00 +0000	[diff] [blame]	284	<p>The target description classes require a detailed description of the target
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	285	architecture. These target descriptions often have a large amount of common
				286	information (e.g., an <tt>add</tt> instruction is almost identical to a
				287	<tt>sub</tt> instruction). In order to allow the maximum amount of
				288	commonality to be factored out, the LLVM code generator uses
				289	the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
				290	chunks of the target machine, which allows the use of domain-specific and
				291	target-specific abstractions to reduce the amount of repetition.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	292
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	293	<p>As LLVM continues to be developed and refined, we plan to move more and more
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	294	of the target description to the <tt>.td</tt> form. Doing so gives us a
				295	number of advantages. The most important is that it makes it easier to port
				296	LLVM because it reduces the amount of C++ code that has to be written, and
				297	the surface area of the code generator that needs to be understood before
				298	someone can get something working. Second, it makes it easier to change
				299	things. In particular, if tables and other things are all emitted
				300	by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
				301	update all of the targets to a new interface.</p>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	302
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	303	</div>
				304
				305	<!-- *********************************************************************** -->
				306	<div class="doc_section">
				307	<a name="targetdesc">Target description classes</a>
				308	</div>
				309	<!-- *********************************************************************** -->
				310
				311	<div class="doc_text">
				312
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	313	<p>The LLVM target description classes (located in the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	314	<tt>include/llvm/Target</tt> directory) provide an abstract description of
				315	the target machine independent of any particular client. These classes are
				316	designed to capture the <i>abstract</i> properties of the target (such as the
				317	instructions and registers it has), and do not incorporate any particular
				318	pieces of code generation algorithms.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	319
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	320	<p>All of the target description classes (except the
				321	<tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
				322	subclassed by the concrete target implementation, and have virtual methods
				323	implemented. To get to these implementations, the
				324	<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
				325	that should be implemented by the target.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	326
				327	</div>
				328
				329	<!-- ======================================================================= -->
				330	<div class="doc_subsection">
				331	<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
				332	</div>
				333
				334	<div class="doc_text">
				335
				336	<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	337	access the target-specific implementations of the various target description
				338	classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
				339	<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
				340	designed to be specialized by a concrete target implementation
				341	(e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
				342	methods. The only required target description class is
				343	the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
				344	generator components are to be used, the other interfaces should be
				345	implemented as well.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	346
				347	</div>
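
<div class="doc_text">

<p>The accessor pattern described above can be pictured with a small,
self-contained sketch. Every name in it (<tt>ToyTargetMachine</tt>,
<tt>ToyInstrInfo</tt>, <tt>ToyX86TargetMachine</tt>, <tt>describe</tt>) is
hypothetical and only mirrors the shape of the design; it is not the real
LLVM interface, and it assumes that an unimplemented accessor simply returns
null.</p>

```cpp
#include <string>

// Hypothetical stand-in for one target description class.
struct ToyInstrInfo {
  virtual ~ToyInstrInfo() = default;
  virtual std::string name() const = 0;
};

// Hypothetical stand-in for TargetMachine: virtual accessors that a
// concrete target overrides. Optional pieces default to null here.
class ToyTargetMachine {
public:
  virtual ~ToyTargetMachine() = default;
  virtual const ToyInstrInfo *getInstrInfo() const { return nullptr; }
};

// A concrete (toy) target provides its own description object.
struct ToyX86InstrInfo : ToyInstrInfo {
  std::string name() const override { return "toy-x86"; }
};

class ToyX86TargetMachine : public ToyTargetMachine {
  ToyX86InstrInfo InstrInfo;

public:
  const ToyInstrInfo *getInstrInfo() const override { return &InstrInfo; }
};

// Target-independent client code sees only the abstract interface.
std::string describe(const ToyTargetMachine &TM) {
  const ToyInstrInfo *II = TM.getInstrInfo();
  return II ? II->name() : "no instruction info";
}
```

<p>In this sketch, calling <tt>describe</tt> on a
<tt>ToyX86TargetMachine</tt> reaches the target-specific object through the
base-class accessor, while a bare <tt>ToyTargetMachine</tt> reports that the
optional interface is missing.</p>

</div>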
				348
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	349	<!-- ======================================================================= -->
				350	<div class="doc_subsection">
				351	<a name="targetdata">The <tt>TargetData</tt> class</a>
				352	</div>
				353
				354	<div class="doc_text">
				355
				356	<p>The <tt>TargetData</tt> class is the only required target description class,
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	357	and it is the only class that is not extensible (you cannot derive a new
				358	class from it). <tt>TargetData</tt> specifies information about how the
				359	target lays out memory for structures, the alignment requirements for various
				360	data types, the size of pointers in the target, and whether the target is
				361	little-endian or big-endian.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	362
				363	</div>
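
<div class="doc_text">

<p>To make the kind of information <tt>TargetData</tt> carries more concrete,
the sketch below computes field offsets for a structure from per-type sizes
and alignments. It assumes the common C-like rule (each field is padded up to
its own alignment, and the total size is padded to the largest alignment);
the names <tt>ToyType</tt>, <tt>ToyLayout</tt>, <tt>alignTo</tt> and
<tt>layoutStruct</tt> are hypothetical and are not part of the
<tt>TargetData</tt> interface.</p>

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one field type, in bytes.
struct ToyType {
  uint64_t Size;
  uint64_t Align; // assumed to be a power of two
};

struct ToyLayout {
  std::vector<uint64_t> Offsets; // byte offset of each field
  uint64_t Size = 0;             // total size, including tail padding
  uint64_t Align = 1;            // alignment of the whole structure
};

// Round Value up to the next multiple of Align (a power of two).
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Lay out fields in declaration order, padding each to its alignment.
ToyLayout layoutStruct(const std::vector<ToyType> &Fields) {
  ToyLayout L;
  for (const ToyType &F : Fields) {
    L.Size = alignTo(L.Size, F.Align);
    L.Offsets.push_back(L.Size);
    L.Size += F.Size;
    if (F.Align > L.Align)
      L.Align = F.Align;
  }
  L.Size = alignTo(L.Size, L.Align);
  return L;
}
```

<p>For a structure holding a 1-byte field, a 4-byte field and an 8-byte
pointer (sizes chosen for illustration), this rule places the fields at
offsets 0, 4 and 8 and gives the structure a size of 16 bytes with 8-byte
alignment. A target with 4-byte pointers would produce a different
layout, which is why this information belongs to the target description.</p>

</div>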
				364
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	365	<!-- ======================================================================= -->
				366	<div class="doc_subsection">
				367	<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
				368	</div>
				369
				370	<div class="doc_text">
				371
				372	<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	373	selectors primarily to describe how LLVM code should be lowered to
				374	SelectionDAG operations. Among other things, this class indicates:</p>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	375
				376	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	377	<li>an initial register class to use for various <tt>ValueType</tt>s,</li>
				378
				379	<li>which operations are natively supported by the target machine,</li>
				380
				381	<li>the return type of <tt>setcc</tt> operations,</li>
				382
				383	<li>the type to use for shift amounts, and</li>
				384
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	385	<li>various high-level characteristics, like whether it is profitable to turn
				386	division by a constant into a multiplication sequence</li>
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	387	</ul>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	388
				389	</div>
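
<div class="doc_text">

<p>One of the items listed above, recording which operations the target
supports natively, can be sketched as a lookup table keyed by operation and
value type. The class below is a toy written for this document: the
enumerations, the <tt>ToyLowering</tt> class and the assumption that
unlisted operations are legal by default are all illustrative and should not
be read as the actual <tt>TargetLowering</tt> interface.</p>

```cpp
#include <map>
#include <utility>

// Hypothetical operations, value types and actions.
enum class ToyOp { Add, Mul, SDiv };
enum class ToyVT { i32, i64 };
enum class ToyAction { Legal, Expand };

class ToyLowering {
  std::map<std::pair<ToyOp, ToyVT>, ToyAction> Actions;

public:
  // Record how a given operation on a given type should be treated.
  void setOperationAction(ToyOp Op, ToyVT VT, ToyAction A) {
    Actions[{Op, VT}] = A;
  }

  // Operations are assumed legal unless the target said otherwise.
  ToyAction getOperationAction(ToyOp Op, ToyVT VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? ToyAction::Legal : It->second;
  }

  bool isOperationLegal(ToyOp Op, ToyVT VT) const {
    return getOperationAction(Op, VT) == ToyAction::Legal;
  }
};
```

<p>A toy target without a 64-bit divide instruction would mark
<tt>ToyOp::SDiv</tt> on <tt>ToyVT::i64</tt> as <tt>Expand</tt> in its
constructor, and an instruction selector would query the table before
deciding whether an operation has to be rewritten.</p>

</div>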
				390
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	391	<!-- ======================================================================= -->
				392	<div class="doc_subsection">
Dan Gohman	3a4be0f	2008-02-10 18:45:23 +0000	[diff] [blame]	393	<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	394	</div>
				395
				396	<div class="doc_text">
				397
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	398	<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
				399	of the target and any interactions between the registers.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	400
				401	<p>Registers are represented in the code generator by
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	402	unsigned integers. Physical registers (those that actually exist in the
				403	target description) are unique small numbers, and virtual registers are
				404	generally large. Note that register #0 is reserved as a flag value.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	405
				406	<p>Each register in the processor description has an associated
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	407	<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
				408	register (used for assembly output and debugging dumps) and a set of aliases
				409	(used to indicate whether one register overlaps with another).</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	410
Dan Gohman	3a4be0f	2008-02-10 18:45:23 +0000	[diff] [blame]	411	<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	412	class exposes a set of processor specific register classes (instances of the
				413	<tt>TargetRegisterClass</tt> class). Each register class contains sets of
				414	registers that have the same properties (for example, they are all 32-bit
				415	integer registers). Each SSA virtual register created by the instruction
				416	selector has an associated register class. When the register allocator runs,
				417	it replaces each virtual register with a physical register from its class.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	418
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	419	<p>The target-specific implementations of these classes are auto-generated from
				420	a <a href="TableGenFundamentals.html">TableGen</a> description of the
				421	register file.</p>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	422
				423	</div>
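
<div class="doc_text">

<p>The register numbering scheme described in this section (small numbers for
physical registers, large numbers for virtual registers, and 0 as a flag
value) can be illustrated with a few helper predicates. The boundary constant
<tt>kFirstVirtualRegister</tt> and its value are assumptions made for this
sketch, as are the helper names; the real boundary is an implementation
detail of the code generator.</p>

```cpp
#include <cassert>

// Hypothetical boundary: numbers below it are physical registers.
constexpr unsigned kFirstVirtualRegister = 1024;
// Register #0 is reserved as a flag value ("no register").
constexpr unsigned kNoRegister = 0;

inline bool isPhysicalRegister(unsigned Reg) {
  return Reg != kNoRegister && Reg < kFirstVirtualRegister;
}

inline bool isVirtualRegister(unsigned Reg) {
  return Reg >= kFirstVirtualRegister;
}

// Map a virtual register to a dense index, e.g. for a side table that
// stores the register class of each virtual register.
inline unsigned virtualRegisterIndex(unsigned Reg) {
  assert(isVirtualRegister(Reg) && "not a virtual register");
  return Reg - kFirstVirtualRegister;
}
```

<p>With this encoding a single unsigned integer identifies any register, and
a pass can tell the two kinds apart with one comparison.</p>

</div>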
				424
				425	<!-- ======================================================================= -->
				426	<div class="doc_subsection">
Chris Lattner	f249fdc	2004-06-01 17:18:11 +0000	[diff] [blame]	427	<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	428	</div>
				429
Reid Spencer	e5dc84b	2005-07-19 01:36:35 +0000	[diff] [blame]	430	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	431
				432	<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
				433	instructions supported by the target. It is essentially an array of
				434	<tt>TargetInstrDescriptor</tt> objects, each of which describes one
				435	instruction the target supports. Descriptors define things like the mnemonic
				436	for the opcode, the number of operands, the list of implicit register uses
				437	and defs, whether the instruction has certain target-independent properties
				438	(accesses memory, is commutable, etc), and hold any target-specific
				439	flags.</p>
				440
Reid Spencer	e5dc84b	2005-07-19 01:36:35 +0000	[diff] [blame]	441	</div>
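
<div class="doc_text">

<p>The "array of descriptors" idea can be sketched as a table indexed by
opcode. The descriptor fields follow the properties listed in this section,
but the type names, flag names and the three toy instructions are
hypothetical and chosen only for illustration.</p>

```cpp
// Hypothetical target-independent property flags.
enum ToyInstrFlags : unsigned {
  ToyMayLoad = 1u << 0,
  ToyMayStore = 1u << 1,
  ToyCommutable = 1u << 2,
};

// Hypothetical per-instruction descriptor.
struct ToyInstrDesc {
  const char *Mnemonic;
  unsigned NumOperands;
  unsigned Flags;
};

// Opcodes index directly into the descriptor table.
enum ToyOpcode : unsigned { TOY_ADD, TOY_SUB, TOY_LOAD, TOY_NUM_OPCODES };

static const ToyInstrDesc ToyInstrs[TOY_NUM_OPCODES] = {
    {"add", 3, ToyCommutable}, // dst, lhs, rhs
    {"sub", 3, 0},             // dst, lhs, rhs
    {"load", 2, ToyMayLoad},   // dst, address
};

inline const ToyInstrDesc &getDesc(ToyOpcode Opc) { return ToyInstrs[Opc]; }

inline bool isCommutable(ToyOpcode Opc) {
  return (getDesc(Opc).Flags & ToyCommutable) != 0;
}
```

<p>Because the table is plain data, a description of this shape lends itself
to being generated by a tool rather than written by hand.</p>

</div>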
				442
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	443	<!-- ======================================================================= -->
				444	<div class="doc_subsection">
Chris Lattner	f249fdc	2004-06-01 17:18:11 +0000	[diff] [blame]	445	<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	446	</div>
				447
Reid Spencer	e5dc84b	2005-07-19 01:36:35 +0000	[diff] [blame]	448	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	449
				450	<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
				451	stack frame layout of the target. It holds the direction of stack growth, the
				452	known stack alignment on entry to each function, and the offset to the local
				453	area. The offset to the local area is the offset from the stack pointer on
				454	function entry to the first location where function data (local variables,
				455	spill locations) can be stored.</p>
				456
Reid Spencer	e5dc84b	2005-07-19 01:36:35 +0000	[diff] [blame]	457	</div>
Chris Lattner	c9afa28	2005-10-16 17:06:07 +0000	[diff] [blame]	458
				459	<!-- ======================================================================= -->
				460	<div class="doc_subsection">
				461	<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
				462	</div>
				463
				464	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	465
				466	<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
				467	specific chip set being targeted. A sub-target informs code generation of
				468	which instructions are supported, instruction latencies and instruction
				469	execution itinerary; i.e., which processing units are used, in what order,
				470	and for how long.</p>
				471
Chris Lattner	c9afa28	2005-10-16 17:06:07 +0000	[diff] [blame]	472	</div>
				473
				474
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	475	<!-- ======================================================================= -->
				476	<div class="doc_subsection">
Chris Lattner	f249fdc	2004-06-01 17:18:11 +0000	[diff] [blame]	477	<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	478	</div>
				479
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	480	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	481
				482	<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
				483	Just-In-Time code generator to perform target-specific activities, such as
				484	emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
				485	should provide one of these objects through the <tt>getJITInfo</tt>
				486	method.</p>
				487
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	488	</div>
				489
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	490	<!-- *********************************************************************** -->
				491	<div class="doc_section">
				492	<a name="codegendesc">Machine code description classes</a>
				493	</div>
				494	<!-- *********************************************************************** -->
				495
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	496	<div class="doc_text">
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	497
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	498	<p>At a high level, LLVM code is translated to a machine-specific
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	499	representation formed out of
				500	<a href="#machinefunction"><tt>MachineFunction</tt></a>,
				501	<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
				502	and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
				503	in <tt>include/llvm/CodeGen</tt>). This representation is completely target
				504	agnostic, representing instructions in their most abstract form: an opcode
				505	and a series of operands. This representation is designed to support both an
				506	SSA representation for machine code, as well as a register allocated, non-SSA
				507	form.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	508
				509	</div>
				510
				511	<!-- ======================================================================= -->
				512	<div class="doc_subsection">
				513	<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
				514	</div>
				515
				516	<div class="doc_text">
				517
				518	<p>Target machine instructions are represented as instances of the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	519	<tt>MachineInstr</tt> class. This class is an extremely abstract way of
				520	representing machine instructions. In particular, it only keeps track of an
				521	opcode number and a set of operands.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	522
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	523	<p>The opcode number is a simple unsigned integer that only has meaning to a
				524	specific backend. All of the instructions for a target should be defined in
				525	the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
				526	auto-generated from this description. The <tt>MachineInstr</tt> class does
				527	not have any information about how to interpret the instruction (i.e., what
				528	the semantics of the instruction are); for that you must refer to the
				529	<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	530
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	531	<p>The operands of a machine instruction can be of several different types: a
				532	register reference, a constant integer, a basic block reference, etc. In
				533	addition, a machine operand should be marked as a def or a use of the value
				534	(though only registers are allowed to be defs).</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	535
				536	<p>By convention, the LLVM code generator orders instruction operands so that
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	537	all register definitions come before the register uses, even on architectures
				538	that are normally printed in other orders. For example, the SPARC add
				539	instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
				540	and stores the result into the "%i3" register. In the LLVM code generator,
				541	the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the
				542	destination first.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	543
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	544	<p>Keeping destination (definition) operands at the beginning of the operand
				545	list has several advantages. In particular, the debugging printer will print
				546	the instruction like this:</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	547
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	548	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	549	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	550	%i3 = add %i1, %i2
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	551	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	552	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	553
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	554	<p>Also, if the first operand is a def, it is easier to <a href="#buildmi">create
				555	instructions</a> whose only def is the first operand.</p>
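
<p>The benefit for a debugging printer can be sketched in a few lines. The
code below is a standalone toy, not LLVM's printer: <tt>ToyOperand</tt>,
<tt>ToyInstr</tt>, and <tt>printToyInstr</tt> are hypothetical names. It
assumes only that definitions precede uses in the operand list.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical operand: a register name plus a def/use marker.
struct ToyOperand {
  std::string Reg;
  bool IsDef;
};

struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Operands;  // by convention, defs come first
};

// Print "defs = opcode uses", relying only on the defs-first ordering.
std::string printToyInstr(const ToyInstr &MI) {
  std::string Out;
  std::size_t I = 0;
  // Leading def operands go on the left-hand side of the '='.
  for (; I < MI.Operands.size() && MI.Operands[I].IsDef; ++I) {
    if (I != 0)
      Out += ", ";
    Out += MI.Operands[I].Reg;
  }
  if (I != 0)
    Out += " = ";
  Out += MI.Opcode;
  // Everything after the defs is printed as a use.
  for (std::size_t First = I; I < MI.Operands.size(); ++I) {
    Out += (I == First) ? " " : ", ";
    Out += MI.Operands[I].Reg;
  }
  return Out;
}
```

<p>Because the printer never has to search the operand list for the
destination, a single forward scan is enough.</p>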
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	556
				557	</div>
				558
				559	<!-- _______________________________________________________________________ -->
				560	<div class="doc_subsubsection">
				561	<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
				562	</div>
				563
				564	<div class="doc_text">
				565
				566	<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	567	located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
				568	<tt>BuildMI</tt> functions make it easy to build arbitrary machine
				569	instructions. Usage of the <tt>BuildMI</tt> functions looks like this:</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	570
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	571	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	572	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	573	// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
				574	// instruction. The '1' specifies how many operands will be added.
				575	MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	576
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	577	// Create the same instr, but insert it at the end of a basic block.
				578	MachineBasicBlock &MBB = ...
				579	BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	580
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	581	// Create the same instr, but insert it before a specified iterator point.
				582	MachineBasicBlock::iterator MBBI = ...
				583	BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	584
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	585	// Create a 'cmp Reg, 0' instruction, no destination reg.
				586	MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
				587	// Create an 'sahf' instruction which takes no operands and stores nothing.
				588	MI = BuildMI(X86::SAHF, 0);
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	589
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	590	// Create a self looping branch instruction.
				591	BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	592	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	593	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	594
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	595	<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	596	have to specify the number of operands that the machine instruction will
				597	take. This allows for efficient memory allocation. Also note that operands
				598	default to being uses of values, not definitions. If you need to
				599	add a definition operand (other than the optional destination register), you
				600	must explicitly mark it as such:</p>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	601
				602	<div class="doc_code">
				603	<pre>
Bill Wendling	f7b83c7	2009-05-13 21:33:08 +0000	[diff] [blame]	604	MI.addReg(Reg, RegState::Define);
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	605	</pre>
				606	</div>
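
<p>The chained style of these calls is a conventional builder pattern. The
following standalone sketch imitates it without any LLVM headers, so every
name in it (<tt>ToyOperand</tt>, <tt>ToyMachineInstr</tt>,
<tt>ToyInstrBuilder</tt>, <tt>toyBuildMI</tt>) is hypothetical, and a plain
<tt>bool</tt> stands in for the definition marker. It shows why the operand
count is passed up front and why operands default to uses.</p>

```cpp
#include <cstdint>
#include <vector>

struct ToyOperand {
  enum Kind { Register, Immediate };
  Kind K;
  std::int64_t Value;  // register number or immediate value
  bool IsDef;          // true only for register definitions
};

struct ToyMachineInstr {
  unsigned Opcode = 0;
  std::vector<ToyOperand> Operands;
};

class ToyInstrBuilder {
  ToyMachineInstr MI;

public:
  ToyInstrBuilder(unsigned Opcode, unsigned NumOperands) {
    MI.Opcode = Opcode;
    MI.Operands.reserve(NumOperands);  // one allocation, sized up front
  }
  // Registers default to uses; pass IsDef = true to mark a definition.
  ToyInstrBuilder &addReg(unsigned Reg, bool IsDef = false) {
    MI.Operands.push_back(
        {ToyOperand::Register, static_cast<std::int64_t>(Reg), IsDef});
    return *this;
  }
  ToyInstrBuilder &addImm(std::int64_t Imm) {
    MI.Operands.push_back({ToyOperand::Immediate, Imm, false});
    return *this;
  }
  ToyMachineInstr build() const { return MI; }
};

// Form without a destination register.
inline ToyInstrBuilder toyBuildMI(unsigned Opcode, unsigned NumOperands) {
  return ToyInstrBuilder(Opcode, NumOperands);
}

// Form with a destination: it becomes operand #0 and is marked as a def.
inline ToyInstrBuilder toyBuildMI(unsigned Opcode, unsigned NumOperands,
                                  unsigned DestReg) {
  ToyInstrBuilder B(Opcode, NumOperands + 1);
  B.addReg(DestReg, /*IsDef=*/true);
  return B;
}
```

<p>With these definitions, <tt>toyBuildMI(Opc, 1, Dest).addImm(42).build()</tt>
yields an instruction whose first operand is the definition of <tt>Dest</tt>
and whose second is the immediate, where <tt>Opc</tt> and <tt>Dest</tt> are
placeholder values.</p>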
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	607
				608	</div>
				609
				610	<!-- _______________________________________________________________________ -->
				611	<div class="doc_subsubsection">
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	612	<a name="fixedregs">Fixed (preassigned) registers</a>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	613	</div>
				614
				615	<div class="doc_text">
				616
				617	<p>One important issue that the code generator needs to be aware of is the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	618	presence of fixed registers. In particular, there are often places in the
				619	instruction stream where the register allocator <em>must</em> arrange for a
				620	particular value to be in a particular register. This can occur due to
				621	limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
				622	with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
				623	calling conventions. In any case, the instruction selector should emit code
				624	that copies a virtual register into or out of a physical register when
				625	needed.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	626
				627	<p>For example, consider this simple LLVM example:</p>
				628
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	629	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	630	<pre>
Matthijs Kooijman	a6bb22e	2008-06-04 15:46:35 +0000	[diff] [blame]	631	define i32 @test(i32 %X, i32 %Y) {
				632	%Z = sdiv i32 %X, %Y
				633	ret i32 %Z
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	634	}
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	635	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	636	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	637
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	638	<p>The X86 instruction selector produces this machine code for the <tt>div</tt>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	639	and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
				640	get this):</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	641
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	642	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	643	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	644	;; Start of div
				645	%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
				646	%reg1027 = sar %reg1024, 31
				647	%EDX = mov %reg1027 ;; Sign extend X into EDX
				648	idiv %reg1025 ;; Divide by Y (in reg1025)
				649	%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	650
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	651	;; Start of ret
				652	%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
				653	ret
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	654	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	655	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	656
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	657	<p>By the end of code generation, the register allocator has coalesced the
				658	registers and deleted the resultant identity moves, producing the following
				659	code:</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	660
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	661	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	662	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	663	;; X is in EAX, Y is in ECX
				664	mov %EAX, %EDX
				665	sar %EDX, 31
				666	idiv %ECX
				667	ret
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	668	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	669	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	670
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	671	<p>This approach is extremely general (if it can handle the X86 architecture, it
				672	can handle anything!) and allows all of the target specific knowledge about
				673	the instruction stream to be isolated in the instruction selector. Note that
				674	physical registers should have a short lifetime for good code generation, and
				675	all physical registers are assumed dead on entry to and exit from basic
				676	blocks (before register allocation). Thus, if you need a value to be live
				677	across basic block boundaries, it <em>must</em> live in a virtual
				678	register.</p>
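
<p>The clean-up of copies into and out of fixed registers can be sketched as
follows. This is a deliberately simplified standalone model, not LLVM's
coalescer: it assumes the virtual-to-physical assignment has already been
decided and is passed in as a map, and it only rewrites operands and drops
moves that become identities. <tt>ToyInstr</tt> and
<tt>rewriteAndDropIdentityMoves</tt> are hypothetical names.</p>

```cpp
#include <map>
#include <string>
#include <vector>

struct ToyInstr {
  std::string Opcode;
  std::vector<std::string> Operands;  // for "mov": destination, then source
};

// Replace each virtual register by its assigned physical register, then
// delete any "mov X, X" that the rewrite produced.
std::vector<ToyInstr> rewriteAndDropIdentityMoves(
    const std::vector<ToyInstr> &Code,
    const std::map<std::string, std::string> &Assignment) {
  std::vector<ToyInstr> Out;
  for (ToyInstr MI : Code) {  // copy, so the operands can be rewritten
    for (std::string &Op : MI.Operands) {
      std::map<std::string, std::string>::const_iterator It =
          Assignment.find(Op);
      if (It != Assignment.end())
        Op = It->second;
    }
    bool IsIdentityMove = MI.Opcode == "mov" && MI.Operands.size() == 2 &&
                          MI.Operands[0] == MI.Operands[1];
    if (!IsIdentityMove)
      Out.push_back(MI);
  }
  return Out;
}
```

<p>If the virtual registers holding X and Z are both assigned to
<tt>EAX</tt>, every copy between them and <tt>EAX</tt> turns into an identity
move and disappears, which is why short-lived copies to and from physical
registers cost nothing in the final code.</p>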
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	679
				680	</div>
				681
				682	<!-- _______________________________________________________________________ -->
				683	<div class="doc_subsubsection">
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	684	<a name="ssa">Machine code in SSA form</a>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	685	</div>
				686
				687	<div class="doc_text">
				688
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	689	<p><tt>MachineInstr</tt>s are initially selected in SSA-form, and are
				690	maintained in SSA-form until register allocation happens. For the most part,
				691	this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
				692	become machine code PHI nodes, and virtual registers are only allowed to have
				693	a single definition.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	694
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	695	<p>After register allocation, machine code is no longer in SSA-form because
				696	there are no virtual registers left in the code.</p>
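
<p>The single-definition rule for virtual registers is easy to state as a
check. The sketch below is standalone and illustrative only:
<tt>ToyInstr</tt>, <tt>isVirtualReg</tt>, and the convention that register
numbers from 1024 upward are virtual are assumptions made for this example.
Physical registers are skipped, since they may be defined many times.</p>

```cpp
#include <set>
#include <vector>

struct ToyInstr {
  std::vector<unsigned> Defs;  // registers written by the instruction
  std::vector<unsigned> Uses;  // registers read by the instruction
};

// Assumption for this sketch: numbers >= 1024 denote virtual registers.
const unsigned ToyFirstVirtualReg = 1024;

inline bool isVirtualReg(unsigned Reg) { return Reg >= ToyFirstVirtualReg; }

// Returns true if no virtual register is defined more than once.
bool virtualRegsHaveSingleDef(const std::vector<ToyInstr> &Code) {
  std::set<unsigned> Defined;
  for (const ToyInstr &MI : Code) {
    for (unsigned Reg : MI.Defs) {
      if (!isVirtualReg(Reg))
        continue;  // physical registers may be redefined freely
      if (!Defined.insert(Reg).second)
        return false;  // second definition of the same virtual register
    }
  }
  return true;
}
```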
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	697
				698	</div>
				699
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	700	<!-- ======================================================================= -->
				701	<div class="doc_subsection">
				702	<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
				703	</div>
				704
				705	<div class="doc_text">
				706
				707	<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	708	(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
				709	corresponds to the LLVM code input to the instruction selector, but there can
				710	be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
				711	machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
				712	"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
				713	comes from.</p>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	714
				715	</div>
				716
				717	<!-- ======================================================================= -->
				718	<div class="doc_subsection">
				719	<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
				720	</div>
				721
				722	<div class="doc_text">
				723
				724	<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	725	(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
				726	corresponds one-to-one with the LLVM function input to the instruction
				727	selector. In addition to a list of basic blocks,
				728	the <tt>MachineFunction</tt> contains a <tt>MachineConstantPool</tt>,
				729	a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
				730	<tt>MachineRegisterInfo</tt>. See
				731	<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	732
				733	</div>
				734
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	735	<!-- *********************************************************************** -->
				736	<div class="doc_section">
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	737	<a name="codegenalgs">Target-independent code generation algorithms</a>
				738	</div>
				739	<!-- *********************************************************************** -->
				740
				741	<div class="doc_text">
				742
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	743	<p>This section documents the phases described in the
				744	<a href="#high-level-design">high-level design of the code generator</a>.
				745	It explains how they work and some of the rationale behind their design.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	746
				747	</div>
				748
				749	<!-- ======================================================================= -->
				750	<div class="doc_subsection">
				751	<a name="instselect">Instruction Selection</a>
				752	</div>
				753
				754	<div class="doc_text">
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	755
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	756	<p>Instruction Selection is the process of translating LLVM code presented to
				757	the code generator into target-specific machine instructions. There are
				758	several well-known ways to do this in the literature. LLVM uses a
				759	SelectionDAG based instruction selector.</p>
				760
				761	<p>Portions of the DAG instruction selector are generated from the target
				762	description (<tt>*.td</tt>) files. Our goal is for the entire instruction
				763	selector to be generated from these <tt>.td</tt> files, though currently
				764	there are still things that require custom C++ code.</p>
				765
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	766	</div>
				767
				768	<!-- _______________________________________________________________________ -->
				769	<div class="doc_subsubsection">
				770	<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
				771	</div>
				772
				773	<div class="doc_text">
				774
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	775	<p>The SelectionDAG provides an abstraction for code representation in a way
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	776	that is amenable to instruction selection using automatic techniques
				777	(e.g. dynamic-programming based optimal pattern matching selectors). It is
				778	also well-suited to other phases of code generation; in particular,
				779	instruction scheduling (SelectionDAG's are very close to scheduling DAGs
				780	post-selection). Additionally, the SelectionDAG provides a host
				781	representation where a large variety of very-low-level (but
				782	target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
				783	performed; ones which require extensive information about the instructions
				784	efficiently supported by the target.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	785
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	786	<p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	787	<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
				788	operation code (Opcode) that indicates what operation the node performs and
				789	the operands to the operation. The various operation node types are
				790	described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
				791	file.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	792
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	793	<p>Although most operations define a single value, each node in the graph may
				794	define multiple values. For example, a combined div/rem operation will
				795	define both the quotient and the remainder. Many other situations require
				796	multiple values as well. Each node also has some number of operands, which
				797	are edges to the node defining the used value. Because nodes may define
				798	multiple values, edges are represented by instances of the <tt>SDValue</tt>
				799	class, which is a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node
				800	and result value being used, respectively. Each value produced by
				801	an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
				802	indicating what the type of the value is.</p>
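
<p>The reason an edge must name both a node and a result number can be seen
in a small standalone model. <tt>ToyNode</tt>, <tt>ToyValue</tt>, and
<tt>typeOf</tt> are hypothetical stand-ins (with value types reduced to
strings), not the real classes.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyNode;

// An edge: which node is used, and which of its results.
struct ToyValue {
  ToyNode *Node;
  unsigned ResNo;
};

struct ToyNode {
  std::string Opcode;
  std::vector<std::string> ValueTypes;  // one type per produced value
  std::vector<ToyValue> Operands;       // edges to the defining nodes
};

// The type of an edge is the type of the result it selects.
inline const std::string &typeOf(const ToyValue &V) {
  assert(V.Node && V.ResNo < V.Node->ValueTypes.size());
  return V.Node->ValueTypes[V.ResNo];
}
```

<p>A combined div/rem node in this model has two entries in
<tt>ValueTypes</tt>; a user that wants the remainder refers to the same node
as a user that wants the quotient, but with a different <tt>ResNo</tt>.</p>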
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	803
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	804	<p>SelectionDAGs contain two different kinds of values: those that represent
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	805	data flow and those that represent control flow dependencies. Data values
				806	are simple edges with an integer or floating point value type. Control edges
				807	are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
				808	These edges provide an ordering between nodes that have side effects (such as
				809	loads, stores, calls, returns, etc). All nodes that have side effects should
				810	take a token chain as input and produce a new one as output. By convention,
				811	token chain inputs are always operand #0, and chain results are always the
				812	last value produced by an operation.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	813
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	814	<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	815	always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root
				816	node is the final side-effecting node in the token chain. For example, in a
				817	single basic block function it would be the return node.</p>
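
<p>To illustrate how the chain orders side-effecting nodes, the sketch below
walks from the Root back to the Entry node by always following operand #0.
It is a standalone toy that assumes a purely linear chain (no node merges
several chains), and <tt>ToyNode</tt> and <tt>chainOrder</tt> are
hypothetical names.</p>

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ToyNode {
  std::string Opcode;
  // For side-effecting nodes, operand #0 is the incoming chain.
  std::vector<const ToyNode *> Operands;
};

// Follow operand #0 from the root until the entry token is reached, then
// reverse the visited list to obtain execution order.
std::vector<std::string> chainOrder(const ToyNode &Root) {
  std::vector<std::string> Order;
  const ToyNode *N = &Root;
  while (N) {
    Order.push_back(N->Opcode);
    if (N->Opcode == "EntryToken" || N->Operands.empty())
      break;
    N = N->Operands[0];
  }
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

<p>Nodes without side effects, such as constants, never appear on this walk;
they are ordered only by the data edges that use them.</p>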
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	818
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	819	<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	820	"illegal" DAG. A legal DAG for a target is one that only uses supported
				821	operations and supported types. On a 32-bit PowerPC, for example, a DAG with
				822	a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
				823	uses a SREM or UREM operation. The
				824	<a href="#selectiondag_legalize_types">legalize types</a> and
				825	<a href="#selectiondag_legalize">legalize operations</a> phases are
				826	responsible for turning an illegal DAG into a legal DAG.</p>
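
<p>Legality is simply a property of every node in the DAG, as the following
standalone sketch suggests. The sets of legal types and operations are
supplied by the caller and are purely illustrative; <tt>ToyNode</tt>,
<tt>ToyTarget</tt>, and <tt>isLegalDAG</tt> are hypothetical names.</p>

```cpp
#include <set>
#include <string>
#include <vector>

struct ToyNode {
  std::string Opcode;
  std::vector<std::string> ValueTypes;  // types of the values it produces
};

struct ToyTarget {
  std::set<std::string> LegalTypes;
  std::set<std::string> LegalOps;
};

// A DAG is legal only if every node uses a supported operation and
// produces only supported types.
bool isLegalDAG(const std::vector<ToyNode> &Nodes, const ToyTarget &T) {
  for (const ToyNode &N : Nodes) {
    if (T.LegalOps.count(N.Opcode) == 0)
      return false;
    for (const std::string &VT : N.ValueTypes)
      if (T.LegalTypes.count(VT) == 0)
        return false;
  }
  return true;
}
```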
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	827
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	828	</div>
				829
				830	<!-- _______________________________________________________________________ -->
				831	<div class="doc_subsubsection">
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	832	<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	833	</div>
				834
				835	<div class="doc_text">
				836
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	837	<p>SelectionDAG-based instruction selection consists of the following steps:</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	838
				839	<ol>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	840	<li><a href="#selectiondag_build">Build initial DAG</a> — This stage
				841	performs a simple translation from the input LLVM code to an illegal
				842	SelectionDAG.</li>
				843
				844	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — This
				845	stage performs simple optimizations on the SelectionDAG to simplify it,
				846	and recognize meta instructions (like rotates
				847	and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
				848	operations. This makes the resultant code more efficient and
				849	the <a href="#selectiondag_select">select instructions from DAG</a> phase
				850	(below) simpler.</li>
				851
				852	<li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
				853	— This stage transforms SelectionDAG nodes to eliminate any types
				854	that are unsupported on the target.</li>
				855
				856	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				857	SelectionDAG optimizer is run to clean up redundancies exposed by type
				858	legalization.</li>
				859
				860	<li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> &mdash;
				861	This stage transforms SelectionDAG nodes to eliminate any operations that are
				862	unsupported on the target.</li>
				863
				864	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				865	SelectionDAG optimizer is run to eliminate inefficiencies introduced by
				866	operation legalization.</li>
				867
				868	<li><a href="#selectiondag_select">Select instructions from DAG</a> —
				869	Finally, the target instruction selector matches the DAG operations to
				870	target instructions. This process translates the target-independent input
				871	DAG into another DAG of target instructions.</li>
				872
				873	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
				874	— The last phase assigns a linear order to the instructions in the
				875	target-instruction DAG and emits them into the MachineFunction being
				876	compiled. This step uses traditional prepass scheduling techniques.</li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	877	</ol>
				878
				879	<p>After all of these steps are complete, the SelectionDAG is destroyed and the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	880	rest of the code generation passes are run.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	881
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	882	<p>One great way to visualize what is going on here is to take advantage of a
				883	few LLC command line options. The following options pop up a window
				884	displaying the SelectionDAG at specific times (if you only get errors printed
				885	to the console while using this, you probably
				886	<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
				887	to add support for it).</p>
Dan Gohman	dd51d52	2008-09-10 22:23:41 +0000	[diff] [blame]	888
				889	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	890	<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
				891	before the first optimization pass.</li>
				892
				893	<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
				894
				895	<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
				896	optimization pass.</li>
				897
				898	<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
				899
				900	<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
Dan Gohman	dd51d52	2008-09-10 22:23:41 +0000	[diff] [blame]	901	</ul>
				902
				903	<p>The <tt>-view-sunit-dags</tt> displays the Scheduler's dependency graph.
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	904	This graph is based on the final SelectionDAG, with nodes that must be
				905	scheduled together bundled into a single scheduling-unit node, and with
				906	immediate operands and other nodes that aren't relevant for scheduling
				907	omitted.</p>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	908
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	909	</div>
				910
				911	<!-- _______________________________________________________________________ -->
				912	<div class="doc_subsubsection">
				913	<a name="selectiondag_build">Initial SelectionDAG Construction</a>
				914	</div>
				915
				916	<div class="doc_text">
				917
Bill Wendling	6737f5d	2006-08-28 03:04:05 +0000	[diff] [blame]	918	<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	919	input by the <tt>SelectionDAGLowering</tt> class in the
				920	<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of
				921	this pass is to expose as many low-level, target-specific details to the
				922	SelectionDAG as possible. This pass is mostly hard-coded (e.g. an
				923	LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
				924	<tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
				925	requires target-specific hooks to lower calls, returns, varargs, etc. For
				926	these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
				927	interface is used.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	928
				929	</div>
				930
				931	<!-- _______________________________________________________________________ -->
				932	<div class="doc_subsubsection">
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	933	<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
				934	</div>
				935
				936	<div class="doc_text">
				937
				938	<p>The Legalize Types phase is in charge of converting a DAG to only use the types
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	939	that are natively supported by the target.</p>
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	940
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	941	<p>There are two main ways of converting values of unsupported scalar types to
				942	values of supported types: converting small types to larger types
				943	("promoting"), and breaking up large integer types into smaller ones
				944	("expanding"). For example, a target might require that all f32 values are
				945	promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same
				946	target might require that all i64 values be expanded into pairs of i32
				947	values. These changes can insert sign and zero extensions as needed to make
				948	sure that the final code has the same behavior as the input.</p>
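<p>As a rough illustration of promotion and expansion, here is a minimal C++
sketch using plain integer arithmetic; it is not LLVM code. The names
<tt>I64Parts</tt>, <tt>expand</tt>, <tt>join</tt>, <tt>addExpanded</tt>,
<tt>promoteSigned</tt> and <tt>promoteUnsigned</tt> are made up for this
example. It shows an i64 add carried out on pairs of i32 values, and an i8
widened to i32 with either a sign or a zero extension.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; this is not an LLVM class.
struct I64Parts {
  uint32_t lo;  // low 32 bits
  uint32_t hi;  // high 32 bits
};

// "Expanding": split one i64 value into a pair of i32 values.
inline I64Parts expand(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

// Reassemble the pair, only used here to check the result.
inline uint64_t join(I64Parts p) {
  return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// An i64 add open-coded with i32 operations: add the low halves, detect the
// carry out of the low half, and feed it into the add of the high halves.
inline I64Parts addExpanded(I64Parts a, I64Parts b) {
  uint32_t lo = a.lo + b.lo;
  uint32_t carry = lo < a.lo ? 1u : 0u;
  uint32_t hi = a.hi + b.hi + carry;
  return {lo, hi};
}

// "Promoting": widen an i8 to i32. The extension chosen (sign or zero)
// must preserve the behavior of the original narrow operation.
inline int32_t promoteSigned(int8_t v) { return static_cast<int32_t>(v); }
inline uint32_t promoteUnsigned(uint8_t v) { return static_cast<uint32_t>(v); }
```

<p>The carry computation is the part that makes expansion produce more code
than the original operation, which is one reason the DAG combiner is run
again afterwards.</p>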
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	949
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	950	<p>There are two main ways of converting values of unsupported vector types to
				951	values of supported types: splitting vector types, multiple times if
				952	necessary, until a legal type is found, and extending vector types by adding
				953	elements to the end to round them out to legal types ("widening"). If a
				954	vector gets split all the way down to single-element parts with no supported
				955	vector type being found, the elements are converted to scalars
				956	("scalarizing").</p>
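<p>The vector strategies can be sketched with a toy model in which a vector
type is described only by its element count. The names <tt>LegalCounts</tt>,
<tt>splitVector</tt> and <tt>widenVector</tt> are hypothetical, and the sketch
assumes power-of-two element counts when splitting; the real legalizer works
on full value types.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model: the element counts a hypothetical target supports natively.
using LegalCounts = std::set<unsigned>;

// Splitting: halve the vector until each part has a legal element count.
// A part that reaches one element is treated as a scalar ("scalarizing").
// Assumes numElts is a power of two.
void splitVector(unsigned numElts, const LegalCounts &legal,
                 std::vector<unsigned> &parts) {
  if (numElts == 1 || legal.count(numElts)) {
    parts.push_back(numElts);
    return;
  }
  splitVector(numElts / 2, legal, parts);
  splitVector(numElts / 2, legal, parts);
}

// Widening: round up to the smallest legal element count that is at least
// numElts. Returns 0 if the target has no legal type that is wide enough.
unsigned widenVector(unsigned numElts, const LegalCounts &legal) {
  LegalCounts::const_iterator it = legal.lower_bound(numElts);
  return it == legal.end() ? 0 : *it;
}
```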
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	957
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	958	<p>A target implementation tells the legalizer which types are supported (and
				959	which register class to use for them) by calling the
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	960	<tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
				961
				962	</div>
				963
				964	<!-- _______________________________________________________________________ -->
				965	<div class="doc_subsubsection">
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	966	<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
				967	</div>
				968
				969	<div class="doc_text">
				970
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	971	<p>The Legalize phase is in charge of converting a DAG to only use the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	972	operations that are natively supported by the target.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	973
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	974	<p>Targets often have weird constraints, such as not supporting every operation
				975	on every supported datatype (e.g. X86 does not support byte conditional moves
				976	and PowerPC does not support sign-extending loads from a 16-bit memory
				977	location). Legalize takes care of this by open-coding another sequence of
				978	operations to emulate the operation ("expansion"), by promoting one type to a
				979	larger type that supports the operation ("promotion"), or by using a
				980	target-specific hook to implement the legalization ("custom").</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	981
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	982	<p>A target implementation tells the legalizer which operations are not
				983	supported (and which of the above three actions to take) by calling the
				984	<tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
				985	constructor.</p>
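<p>Conceptually, the information a target registers this way is a table
keyed by operation and type. The following is a simplified sketch of such a
table, assuming made-up names (<tt>ToyLowering</tt>, <tt>setAction</tt>,
<tt>getAction</tt>, and the small <tt>Op</tt>/<tt>Ty</tt> enums); the real
<tt>TargetLowering</tt> interface differs in its details.</p>

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, heavily simplified stand-ins for illustration only.
enum class Op { Add, SDiv, Select, SExtLoad };
enum class Ty { i8, i16, i32, i64 };
enum class Action { Legal, Promote, Expand, Custom };

class ToyLowering {
  std::map<std::pair<Op, Ty>, Action> actions_;

public:
  // Record how the legalizer should treat (op, ty).
  void setAction(Op op, Ty ty, Action a) {
    actions_[std::make_pair(op, ty)] = a;
  }

  // Anything the target did not mention is assumed to be natively supported.
  Action getAction(Op op, Ty ty) const {
    auto it = actions_.find(std::make_pair(op, ty));
    return it == actions_.end() ? Action::Legal : it->second;
  }
};

// A made-up target description loosely mirroring the constraints above.
ToyLowering makeToyTarget() {
  ToyLowering tl;
  tl.setAction(Op::Select, Ty::i8, Action::Promote);    // no byte conditional move
  tl.setAction(Op::SExtLoad, Ty::i16, Action::Expand);  // no 16-bit sign-extending load
  tl.setAction(Op::SDiv, Ty::i64, Action::Custom);      // handled by a target hook
  return tl;
}
```

<p>Defaulting to "legal" keeps the target description short: only the
exceptions need to be listed.</p>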
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	986
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	987	<p>Prior to the existence of the Legalize passes, we required that every target
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	988	<a href="#selectiondag_optimize">selector</a> support and handle every
				989	operator and type, even those not natively supported. The introduction
				990	of the Legalize phases allows all of the canonicalization patterns to be
				991	shared across targets, and makes it very easy to optimize the canonicalized
				992	code because it is still in the form of a DAG.</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	993
				994	</div>
				995
				996	<!-- _______________________________________________________________________ -->
				997	<div class="doc_subsubsection">
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	998	<a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
				999	Combiner</a>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1000	</div>
				1001
				1002	<div class="doc_text">
				1003
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1004	<p>The SelectionDAG optimization phase is run multiple times for code
				1005	generation, immediately after the DAG is built and once after each
				1006	legalization. The first run of the pass allows the initial code to be
				1007	cleaned up (e.g. performing optimizations that depend on knowing that the
				1008	operators have restricted type inputs). Subsequent runs of the pass clean up
				1009	the messy code generated by the Legalize passes, which allows Legalize to be
				1010	very simple (it can focus on making code legal instead of focusing on
				1011	generating <em>good</em> and legal code).</p>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1012
				1013	<p>One important class of optimizations performed is optimizing inserted sign
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1014	and zero extension instructions. We currently use ad-hoc techniques, but
				1015	could move to more rigorous techniques in the future. Here are some good
				1016	papers on the subject:</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1017
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1018	<p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
				1019	integer arithmetic</a>"<br>
				1020	Kevin Redwine and Norman Ramsey<br>
				1021	International Conference on Compiler Construction (CC) 2004</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1022
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1023	<p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
				1024	sign extension elimination</a>"<br>
				1025	Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
				1026	Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
				1027	and Implementation.</p>
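<p>To make the flavor of these cleanups concrete, here is a small sketch of
one way a redundant zero extension (expressed as an AND with a mask) can be
detected from bits that are already known to be zero. This is only an
illustration under simplifying assumptions; <tt>Val</tt>,
<tt>zextLoadI8</tt> and <tt>andIsRedundant</tt> are hypothetical names and
do not describe how the DAG combiner is implemented.</p>

```cpp
#include <cassert>
#include <cstdint>

// Toy value: a 32-bit quantity together with a mask of bits that are proven
// to be zero. Hypothetical; not an LLVM data structure.
struct Val {
  uint32_t knownZero;
};

// A zero-extending load of an i8 into an i32 leaves the top 24 bits zero.
inline Val zextLoadI8() { return {0xFFFFFF00u}; }

// "and v, mask" is redundant if every bit the mask would clear is already
// known to be zero in v.
inline bool andIsRedundant(Val v, uint32_t mask) {
  return (~mask & ~v.knownZero) == 0;
}
```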
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1028
				1029	</div>
				1030
				1031	<!-- _______________________________________________________________________ -->
				1032	<div class="doc_subsubsection">
				1033	<a name="selectiondag_select">SelectionDAG Select Phase</a>
				1034	</div>
				1035
				1036	<div class="doc_text">
				1037
				1038	<p>The Select phase is the bulk of the target-specific code for instruction
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1039	selection. This phase takes a legal SelectionDAG as input, pattern matches
				1040	the instructions supported by the target to this DAG, and produces a new DAG
				1041	of target code. For example, consider the following LLVM fragment:</p>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1042
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1043	<div class="doc_code">
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1044	<pre>
Dan Gohman	6f34abd	2010-03-02 01:11:08 +0000	[diff] [blame]	1045	%t1 = fadd float %W, %X
				1046	%t2 = fmul float %t1, %Y
				1047	%t3 = fadd float %t2, %Z
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1048	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1049	</div>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1050
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1051	<p>This LLVM code corresponds to a SelectionDAG that looks basically like
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1052	this:</p>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1053
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1054	<div class="doc_code">
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1055	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1056	(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1057	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1058	</div>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1059
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1060	<p>If a target supports floating point multiply-and-add (FMA) operations, one of
				1061	the adds can be merged with the multiply. On the PowerPC, for example, the
				1062	output of the instruction selector might look like this DAG:</p>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1063
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1064	<div class="doc_code">
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1065	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1066	(FMADDS (FADDS W, X), Y, Z)
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1067	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1068	</div>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1069
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1070	<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
				1071	first two operands and adds the third (as single-precision floating-point
				1072	numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
				1073	add instruction. To perform this pattern match, the PowerPC backend includes
				1074	the following instruction definitions:</p>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1075
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1076	<div class="doc_code">
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1077	<pre>
				1078	def FMADDS : AForm_1<59, 29,
				1079	    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
				1080	    "fmadds $FRT, $FRA, $FRC, $FRB",
				1081	    [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
				1082	                              F4RC:$FRB))</b>]>;
				1083	def FADDS : AForm_2<59, 21,
				1084	    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
				1085	    "fadds $FRT, $FRA, $FRB",
				1086	    [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]>;
				1087	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1088	</div>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1089
				1090	<p>The portion of the instruction definition in bold indicates the pattern used
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1091	to match the instruction. The DAG operators
				1092	(like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
Dan Gohman	2a02035	2010-03-25 00:03:04 +0000	[diff] [blame]	1093	the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file.
				1094	"<tt>F4RC</tt>" is the register class of the input and result values.</p>
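<p>The effect of these two patterns can be imitated with a small hand-written
matcher. The sketch below is a toy, assuming a made-up <tt>Node</tt> type and
helper names (<tt>leaf</tt>, <tt>node</tt>, <tt>selectNode</tt>,
<tt>toString</tt>); the matcher that TableGen generates is considerably more
elaborate. It tries the multiply in either operand position of the add,
which is what knowing that <tt>fadd</tt> is commutative buys.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; hypothetical and far simpler than an SDNode.
struct Node;
using NodePtr = std::shared_ptr<Node>;
struct Node {
  std::string op;             // "fadd", "fmul", "FMADDS", ... or a leaf name
  std::vector<NodePtr> kids;  // operands; empty for leaves
};

NodePtr leaf(const std::string &name) {
  return std::make_shared<Node>(Node{name, {}});
}
NodePtr node(const std::string &op, std::vector<NodePtr> kids) {
  return std::make_shared<Node>(Node{op, std::move(kids)});
}

// Top-down selection: at an fadd, prefer FMADDS if either operand is an fmul.
NodePtr selectNode(const NodePtr &n) {
  if (n->kids.empty())
    return n;  // leaves stand for already-available values
  if (n->op == "fadd") {
    for (std::size_t i = 0; i < 2; ++i) {
      const NodePtr &mul = n->kids[i];
      const NodePtr &addend = n->kids[1 - i];
      if (mul->op == "fmul")
        return node("FMADDS", {selectNode(mul->kids[0]),
                               selectNode(mul->kids[1]), selectNode(addend)});
    }
    return node("FADDS", {selectNode(n->kids[0]), selectNode(n->kids[1])});
  }
  if (n->op == "fmul")
    return node("FMULS", {selectNode(n->kids[0]), selectNode(n->kids[1])});
  return n;  // operations this toy does not know are left untouched
}

// Render a tree in the parenthesized notation used in this section.
std::string toString(const NodePtr &n) {
  if (n->kids.empty())
    return n->op;
  std::string s = "(" + n->op;
  for (std::size_t i = 0; i < n->kids.size(); ++i)
    s += (i ? ", " : " ") + toString(n->kids[i]);
  return s + ")";
}
```

<p>Applied to <tt>(fadd (fmul (fadd W, X), Y), Z)</tt>, this toy yields the
same shape as the PowerPC output shown earlier.</p>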
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1095
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1096	<p>The TableGen DAG instruction selector generator reads the instruction
				1097	patterns in the <tt>.td</tt> file and automatically builds parts of the
				1098	pattern matching code for your target. It has the following strengths:</p>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1099
				1100	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1101	<li>At compiler-compiler time, it analyzes your instruction patterns and tells
				1102	you if your patterns make sense or not.</li>
				1103
				1104	<li>It can handle arbitrary constraints on operands for the pattern match. In
				1105	particular, it is straightforward to say things like "match any immediate
				1106	that is a 13-bit sign-extended value". For examples, see the
				1107	<tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
				1108	backend.</li>
				1109
				1110	<li>It knows several important identities for the patterns defined. For
				1111	example, it knows that addition is commutative, so it allows the
				1112	<tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
				1113	well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
				1114	to specially handle this case.</li>
				1115
				1116	<li>It has a full-featured type-inferencing system. In particular, you should
				1117	rarely have to explicitly tell the system what type parts of your patterns
				1118	are. In the <tt>FMADDS</tt> case above, we didn't have to tell
				1119	<tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
				1120	It was able to infer and propagate this knowledge from the fact that
				1121	<tt>F4RC</tt> has type 'f32'.</li>
				1122
				1123	<li>Targets can define their own (and rely on built-in) "pattern fragments".
				1124	Pattern fragments are chunks of reusable patterns that get inlined into
				1125	your patterns during compiler-compiler time. For example, the integer
				1126	"<tt>(not x)</tt>" operation is actually defined as a pattern fragment
				1127	that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
				1128	have a native '<tt>not</tt>' operation. Targets can define their own
				1129	short-hand fragments as they see fit. See the definition of
				1130	'<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
				1131
				1132	<li>In addition to instructions, targets can specify arbitrary patterns that
				1133	map to one or more instructions using the 'Pat' class. For example, the
				1134	PowerPC has no way to load an arbitrary integer immediate into a register
				1135	in one instruction. To tell tblgen how to do this, it defines:
				1136	<br>
				1137	<br>
				1138	<div class="doc_code">
				1139	<pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1140	// Arbitrary immediate support. Implement in terms of LIS/ORI.
				1141	def : Pat<(i32 imm:$imm),
				1142	(ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1143	</pre>
				1144	</div>
				1145	<br>
				1146	If none of the single-instruction patterns for loading an immediate into a
				1147	register match, this will be used. This rule says "match an arbitrary i32
				1148	immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
				1149	an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
				1150	the left 16 bits') instruction". To make this work, the
				1151	<tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
				1152	the input immediate (in this case, take the high or low 16-bits of the
				1153	immediate).</li>
				1154
				1155	<li>While the system does automate a lot, it still allows you to write custom
				1156	C++ code to match special cases if there is something that is hard to
				1157	express.</li>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1158	</ul>
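<p>The arithmetic behind the <tt>LIS</tt>/<tt>ORI</tt> pattern in the list
above can be checked with a few lines of C++. The functions below are
hypothetical models written for this example (they are not the backend's
<tt>HI16</tt>/<tt>LO16</tt> definitions), and they assume that <tt>ORI</tt>
treats its 16-bit immediate as unsigned.</p>

```cpp
#include <cassert>
#include <cstdint>

// Models of what the HI16/LO16 node transformations compute.
inline uint32_t hi16(uint32_t imm) { return imm >> 16; }
inline uint32_t lo16(uint32_t imm) { return imm & 0xFFFFu; }

// Models of the two instructions as described in the text.
// LIS: load a 16-bit immediate shifted left by 16 bits.
inline uint32_t lis(uint32_t imm16) { return imm16 << 16; }
// ORI: OR a register with a 16-bit immediate.
inline uint32_t ori(uint32_t reg, uint32_t imm16) { return reg | imm16; }

// The two-instruction sequence the pattern expands to.
inline uint32_t materialize(uint32_t imm) {
  return ori(lis(hi16(imm)), lo16(imm));
}
```

<p>Because the high and low halves occupy disjoint bits, the OR reassembles
the original immediate exactly.</p>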
				1159
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1160	<p>While it has many strengths, the system currently has some limitations,
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1161	primarily because it is still a work in progress:</p>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1162
				1163	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1164	<li>Overall, there is no way to define or match SelectionDAG nodes that define
Dan Gohman	a4fea5b	2009-04-22 15:55:31 +0000	[diff] [blame]	1165	multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1166	etc). This is the biggest reason that you currently still <em>have
				1167	to</em> write custom C++ code for your instruction selector.</li>
				1168
				1169	<li>There is no great way to support matching complex addressing modes yet.
				1170	In the future, we will extend pattern fragments to allow them to define
				1171	multiple values (e.g. the four operands of the <a href="#x86_memory">X86
				1172	addressing mode</a>, which are currently matched with custom C++ code).
				1173	In addition, we'll extend fragments so that a fragment can match multiple
				1174	different patterns.</li>
				1175
				1176	<li>We don't automatically infer flags like isStore/isLoad yet.</li>
				1177
				1178	<li>We don't automatically generate the set of supported registers and
				1179	operations for the <a href="#selectiondag_legalize">Legalizer</a>
				1180	yet.</li>
				1181
				1182	<li>We don't have a way of tying in custom legalized nodes yet.</li>
Chris Lattner	721f3ce	2005-10-17 04:18:41 +0000	[diff] [blame]	1183	</ul>
Chris Lattner	17acad6	2005-10-16 20:02:19 +0000	[diff] [blame]	1184
				1185	<p>Despite these limitations, the instruction selector generator is still quite
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1186	useful for most of the binary and logical operations in typical instruction
				1187	sets. If you run into any problems or can't figure out how to do something,
				1188	please let Chris know!</p>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1189
				1190	</div>
				1191
				1192	<!-- _______________________________________________________________________ -->
				1193	<div class="doc_subsubsection">
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	1194	<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	1195	</div>
				1196
				1197	<div class="doc_text">
				1198
				1199	<p>The scheduling phase takes the DAG of target instructions from the selection
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1200	phase and assigns an order. The scheduler can pick an order depending on
				1201	various constraints of the machines (e.g. order for minimal register pressure
				1202	or try to cover instruction latencies). Once an order is established, the
				1203	DAG is converted to a list
				1204	of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
				1205	destroyed.</p>
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	1206
Jeff Cohen	dd24d7c	2005-10-24 16:54:55 +0000	[diff] [blame]	1207	<p>Note that this phase is logically separate from the instruction selection
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1208	phase, but is tied to it closely in the code because it operates on
				1209	SelectionDAGs.</p>
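<p>The core requirement on any schedule is that every node is emitted after
the nodes whose results it consumes. The sketch below shows only that
requirement, using a depth-first walk over a toy DAG; the names <tt>Dag</tt>,
<tt>visit</tt> and <tt>linearize</tt> are hypothetical, and no register
pressure or latency heuristics are modeled.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy DAG: entry n lists the indices of the nodes whose results node n
// consumes. Hypothetical; not the scheduler's real data structure.
using Dag = std::vector<std::vector<std::size_t>>;

// Emit all operands of n (recursively) before n itself.
static void visit(const Dag &dag, std::size_t n, std::vector<bool> &done,
                  std::vector<std::size_t> &order) {
  if (done[n])
    return;
  done[n] = true;  // safe to mark on entry because the graph is acyclic
  for (std::size_t operand : dag[n])
    visit(dag, operand, done, order);
  order.push_back(n);
}

// Returns one valid linear order of the DAG's nodes.
std::vector<std::size_t> linearize(const Dag &dag) {
  std::vector<bool> done(dag.size(), false);
  std::vector<std::size_t> order;
  for (std::size_t n = 0; n < dag.size(); ++n)
    visit(dag, n, done, order);
  return order;
}
```

<p>A real scheduler chooses among the many valid orders; this sketch simply
returns the first one the traversal finds.</p>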
Chris Lattner	9fcf000	2005-10-17 03:09:31 +0000	[diff] [blame]	1210
Chris Lattner	acf3d62	2005-10-16 00:36:38 +0000	[diff] [blame]	1211	</div>
				1212
				1213	<!-- _______________________________________________________________________ -->
				1214	<div class="doc_subsubsection">
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1215	<a name="selectiondag_future">Future directions for the SelectionDAG</a>
				1216	</div>
				1217
				1218	<div class="doc_text">
				1219
				1220	<ol>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1221	<li>Optional function-at-a-time selection.</li>
				1222
				1223	<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1224	</ol>
				1225
				1226	</div>
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	1227
				1228	<!-- ======================================================================= -->
				1229	<div class="doc_subsection">
				1230	<a name="ssamco">SSA-based Machine Code Optimizations</a>
				1231	</div>
				1232	<div class="doc_text"><p>To Be Written</p></div>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1233
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	1234	<!-- ======================================================================= -->
				1235	<div class="doc_subsection">
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	1236	<a name="liveintervals">Live Intervals</a>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1237	</div>
				1238
				1239	<div class="doc_text">
				1240
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	1241	<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1242	They are used by some <a href="#regalloc">register allocator</a> passes to
				1243	determine if two or more virtual registers which require the same physical
				1244	register are live at the same point in the program (i.e., they conflict).
				1245	When this situation occurs, one virtual register must be <i>spilled</i>.</p>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1246
				1247	</div>
				1248
				1249	<!-- _______________________________________________________________________ -->
				1250	<div class="doc_subsubsection">
				1251	<a name="livevariable_analysis">Live Variable Analysis</a>
				1252	</div>
				1253
				1254	<div class="doc_text">
				1255
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1256	<p>The first step in determining the live intervals of variables is to calculate
				1257	the set of registers that are immediately dead after the instruction (i.e.,
				1258	the instruction calculates the value, but it is never used) and the set of
				1259	registers that are used by the instruction, but are never used after the
				1260	instruction (i.e., they are killed). Live variable information is computed
				1261	for each <i>virtual</i> register and <i>register allocatable</i> physical
				1262	register in the function. This is done in a very efficient manner because it
				1263	uses SSA to sparsely compute lifetime information for virtual registers
				1264	(which are in SSA form) and only has to track physical registers within a
				1265	block. Before register allocation, LLVM can assume that physical registers
				1266	are only live within a single basic block. This allows it to do a single,
				1267	local analysis to resolve physical register lifetimes within each basic
				1268	block. If a physical register is not register allocatable (e.g., a stack
				1269	pointer or condition codes), it is not tracked.</p>
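<p>For a single basic block, the "dead" and "killed" sets can be found with
one backward scan. The following sketch assumes a toy instruction
representation (<tt>Instr</tt> and <tt>computeDeadAndKills</tt> are made-up
names) and a known set of registers that are live out of the block; it
ignores <tt>PHI</tt> nodes and everything that crosses block boundaries.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: register numbers it defines and uses, plus the results
// of the analysis. Hypothetical; not LLVM's MachineInstr.
struct Instr {
  std::vector<unsigned> defs, uses;
  std::vector<unsigned> dead;   // defined here but never read afterwards
  std::vector<unsigned> kills;  // used here for the last time
};

// Walk the block backwards, tracking which registers are live after each
// instruction. 'live' starts as the set of registers live out of the block.
void computeDeadAndKills(std::vector<Instr> &block, std::set<unsigned> live) {
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    for (unsigned d : it->defs)
      if (!live.erase(d))  // not live afterwards: the definition is dead
        it->dead.push_back(d);
    for (unsigned u : it->uses)
      if (live.insert(u).second)  // not live afterwards: this is the last use
        it->kills.push_back(u);
  }
}
```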
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1270
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1271	<p>Physical registers may be live in to or out of a function. Live in values are
				1272	typically arguments in registers. Live out values are typically return values
				1273	in registers. Live in values are marked as such, and are given a dummy
				1274	"defining" instruction during live intervals analysis. If the last basic
				1275	block of a function is a <tt>return</tt>, then it's marked as using all live
				1276	out values in the function.</p>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1277
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1278	<p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
				1279	the live variable information from a depth first traversal of the CFG of the
				1280	function won't guarantee that a virtual register used by the <tt>PHI</tt>
				1281	node is defined before it's used. When a <tt>PHI</tt> node is encountered,
				1282	only the definition is handled, because the uses will be handled in other
				1283	basic blocks.</p>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1284
				1285	<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1286	assignment at the end of the current basic block and traverse the successor
				1287	basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
				1288	the <tt>PHI</tt> node's operands is coming from the current basic block, then
				1289	the variable is marked as <i>alive</i> within the current basic block and all
				1290	of its predecessor basic blocks, until the basic block with the defining
				1291	instruction is encountered.</p>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1292
				1293	</div>
				1294
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	1295	<!-- _______________________________________________________________________ -->
				1296	<div class="doc_subsubsection">
				1297	<a name="liveintervals_analysis">Live Intervals Analysis</a>
				1298	</div>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1299
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	1300	<div class="doc_text">
Bill Wendling	34ab067	2006-10-11 06:30:10 +0000	[diff] [blame]	1301
Bill Wendling	f21825f	2006-10-11 18:00:22 +0000	[diff] [blame]	1302	<p>We now have the information available to perform the live intervals analysis
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1303	and build the live intervals themselves. We start off by numbering the basic
				1304	blocks and machine instructions. We then handle the "live-in" values. These
				1305	are in physical registers, so the physical register is assumed to be killed
				1306	by the end of the basic block. Live intervals for virtual registers are
				1307	computed for some ordering of the machine instructions <tt>[1, N]</tt>. A
				1308	live interval is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j
				1309	< N</tt>, for which a variable is live.</p>
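<p>As a simplified illustration of the interval notation, the sketch below
builds one half-open interval per virtual register for straight-line code.
It assumes a toy <tt>Instr</tt> type, one definition per register, and an
interval that runs from the defining instruction's number up to the number
of the last using instruction; these choices, and the names
<tt>buildIntervals</tt> and <tt>overlaps</tt>, are assumptions made for this
example rather than a description of the actual analysis.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Toy instruction: the register numbers it defines and uses.
struct Instr {
  std::vector<unsigned> defs, uses;
};

// Half-open interval [start, end) over instruction numbers.
struct Interval {
  unsigned start, end;
};

// Number the instructions 1..N in program order and record, per register,
// where it is defined and where it is last used.
std::map<unsigned, Interval> buildIntervals(const std::vector<Instr> &code) {
  std::map<unsigned, Interval> result;
  for (unsigned idx = 1; idx <= code.size(); ++idx) {
    const Instr &mi = code[idx - 1];
    for (unsigned u : mi.uses) {
      auto it = result.find(u);
      if (it != result.end())
        it->second.end = std::max(it->second.end, idx);
    }
    for (unsigned d : mi.defs)
      result.insert({d, Interval{idx, idx}});  // empty until a use is seen
  }
  return result;
}

// Two registers conflict if their intervals share at least one point.
bool overlaps(const Interval &a, const Interval &b) {
  return a.start < b.end && b.start < a.end;
}
```

<p>With this convention, a register whose last use is at instruction
<i>k</i> does not conflict with a register defined at <i>k</i>, so the two
could share a physical register.</p>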
Bill Wendling	34ab067	2006-10-11 06:30:10 +0000	[diff] [blame]	1310
Bill Wendling	f21825f	2006-10-11 18:00:22 +0000	[diff] [blame]	1311	<p><i><b>More to come...</b></i></p>
				1312
Bill Wendling	d495bd0	2006-09-06 18:42:41 +0000	[diff] [blame]	1313	</div>
Bill Wendling	bb902cf	2006-09-04 23:35:52 +0000	[diff] [blame]	1314
				1315	<!-- ======================================================================= -->
				1316	<div class="doc_subsection">
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	1317	<a name="regalloc">Register Allocation</a>
				1318	</div>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1319
				1320	<div class="doc_text">
				1321
Bill Wendling	34ab067	2006-10-11 06:30:10 +0000	[diff] [blame]	1322	<p>The <i>Register Allocation problem</i> consists in mapping a program
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1323	<i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
				1324	to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
				1325	number of physical registers. Each target architecture has a different number
				1326	of physical registers. If the number of physical registers is not enough to
				1327	accommodate all the virtual registers, some of them will have to be mapped
				1328	into memory. These virtuals are called <i>spilled virtuals</i>.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1329
				1330	</div>
				1331
				1332	<!-- _______________________________________________________________________ -->
				1333
				1334	<div class="doc_subsubsection">
				1335	<a name="regAlloc_represent">How registers are represented in LLVM</a>
				1336	</div>
				1337
				1338	<div class="doc_text">
				1339
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1340	<p>In LLVM, physical registers are denoted by integer numbers that normally
				1341	range from 1 to 1023. To see how this numbering is defined for a particular
				1342	architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
				1343	architecture. For instance, by
				1344	inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
				1345	32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
				1346	<tt>MM0</tt> is mapped to 48.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1347
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1348	<p>Some architectures contain registers that share the same physical location. A
				1349	notable example is the X86 platform. For instance, in the X86 architecture,
				1350	the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
				1351	bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
				1352	particular architecture, you can check which registers are aliased by
				1353	inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
				1354	<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
				1355	all the physical registers aliased to the register <tt>p_reg</tt>.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1356
				1357	<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1358	Elements in the same register class are functionally equivalent, and can be
				1359	interchangeably used. Each virtual register can only be mapped to physical
				1360	registers of a particular class. For instance, in the X86 architecture, some
				1361	virtuals can only be allocated to 8-bit registers. A register class is
				1362	described by <tt>TargetRegisterClass</tt> objects. To discover if a virtual
				1363	register is compatible with a given physical, this code can be used:</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1364
				1365	<div class="doc_code">
				1366	<pre>
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	1367	bool RegMapping_Fer::compatible_class(MachineFunction &mf,
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1368	                                      unsigned v_reg,
				1369	                                      unsigned p_reg) {
Dan Gohman	3a4be0f	2008-02-10 18:45:23 +0000	[diff] [blame]	1370	  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1371	         "Target register must be physical");
Chris Lattner	21ec2b4	2007-12-31 04:16:08 +0000	[diff] [blame]	1372	  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
				1373	  return trc->contains(p_reg);
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1374	}
				1375	</pre>
				1376	</div>
				1377
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1378	<p>Sometimes, mostly for debugging purposes, it is useful to change the number
				1379	of physical registers available in the target architecture. This must be done
				1380	statically, inside the <tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt>
				1381	for <tt>RegisterClass</tt>, the last parameter of which is a list of
				1382	registers. Just commenting some out is one simple way to avoid them being
				1383	used. A more polite way is to explicitly exclude some registers from
Dan Gohman	1715115	2009-07-24 00:30:09 +0000	[diff] [blame]	1384	the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
				1385	class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
				1386	</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1387
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1388	<p>Virtual registers are also denoted by integer numbers. Contrary to physical
				1389	registers, different virtual registers never share the same number. The
				1390	smallest virtual register is normally assigned the number 1024. This may
				1391	change, so, in order to know which is the first virtual register, you should
				1392	access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
				1393	number is greater than or equal
				1394	to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
				1395	register. Whereas physical registers are statically defined in
				1396	a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
				1397	application developer, that is not the case with virtual registers. In order
				1398	to create new virtual registers, use the
method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
returns a new virtual register with the highest number assigned so far.</p>
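<p>The numbering rule above can be sketched in a few lines. This is only a toy
model, not LLVM code: <tt>kFirstVirtualRegister</tt>, <tt>ToyRegInfo</tt>
and <tt>virtualRegisterIndex</tt> are hypothetical names, the value 1024 is the
"normal" value quoted in the text, and treating number 0 as "no register" is an
assumption of the sketch.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for TargetRegisterInfo::FirstVirtualRegister.
// The text says this is normally 1024 but may change.
const unsigned kFirstVirtualRegister = 1024;

// True if the register number denotes a virtual register.
bool isVirtualRegister(unsigned reg) {
  return reg >= kFirstVirtualRegister;
}

// True if the register number denotes a physical register.
// Number 0 is treated as "no register" here (an assumption).
bool isPhysicalRegister(unsigned reg) {
  return reg != 0 && reg < kFirstVirtualRegister;
}

// Dense, zero-based index of a virtual register (hypothetical helper).
unsigned virtualRegisterIndex(unsigned reg) {
  assert(isVirtualRegister(reg) && "register must be virtual");
  return reg - kFirstVirtualRegister;
}

// Toy model of createVirtualRegister(): each call hands out the next
// unused number, so the result is the highest number created so far.
class ToyRegInfo {
  unsigned next_;
public:
  ToyRegInfo() : next_(kFirstVirtualRegister) {}
  unsigned createVirtualRegister() { return next_++; }
};
```

<p>Real code should compare against
<tt>TargetRegisterInfo::FirstVirtualRegister</tt> rather than a literal, as the
paragraph above recommends.</p>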
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1401
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1402	<p>Before register allocation, the operands of an instruction are mostly virtual
				1403	registers, although physical registers may also be used. In order to check if
				1404	a given machine operand is a register, use the boolean
				1405	function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
				1406	a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
defines register 1026, and uses registers 1025 and 1024. Given a
register operand, the method <tt>MachineOperand::isUse()</tt> tells whether that
register is being used by the instruction. The
method <tt>MachineOperand::isDef()</tt> tells whether that register is being
defined.</p>
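<p>As an illustration of the def/use distinction, the following sketch models
the <tt>ADD</tt> example with plain structs. <tt>ToyOperand</tt>,
<tt>ToyInstr</tt>, <tt>makeAdd</tt>, <tt>collectRegs</tt>
and <tt>definedReg</tt> are hypothetical names; only the method
names <tt>isDef</tt>, <tt>isUse</tt> and <tt>getReg</tt> are borrowed from the
text.</p>

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a register operand: a register number plus a flag
// saying whether the instruction defines it (otherwise it is a use).
struct ToyOperand {
  unsigned reg;
  bool is_def;
  bool isDef() const { return is_def; }
  bool isUse() const { return !is_def; }
  unsigned getReg() const { return reg; }
};

// Toy instruction: just an ordered list of register operands.
typedef std::vector<ToyOperand> ToyInstr;

// Builds "ADD dst := lhs rhs": one def followed by two uses.
ToyInstr makeAdd(unsigned dst, unsigned lhs, unsigned rhs) {
  ToyInstr mi;
  mi.push_back(ToyOperand{dst, true});
  mi.push_back(ToyOperand{lhs, false});
  mi.push_back(ToyOperand{rhs, false});
  return mi;
}

// Collects the registers defined (want_defs == true) or used by mi.
std::vector<unsigned> collectRegs(const ToyInstr &mi, bool want_defs) {
  std::vector<unsigned> regs;
  for (const ToyOperand &op : mi)
    if (op.isDef() == want_defs)
      regs.push_back(op.getReg());
  return regs;
}

// Returns the single register defined by mi.
unsigned definedReg(const ToyInstr &mi) {
  std::vector<unsigned> defs = collectRegs(mi, true);
  assert(defs.size() == 1 && "expected exactly one def");
  return defs[0];
}
```

<p>For <tt>makeAdd(1026, 1025, 1024)</tt>, the only defined register is 1026
and the used registers are 1025 and 1024, in operand order.</p>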
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1413
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1414	<p>We will call physical registers present in the LLVM bitcode before register
				1415	allocation <i>pre-colored registers</i>. Pre-colored registers are used in
many different situations, for instance, to pass parameters of function
calls, and to store results of particular instructions. There are two types
of pre-colored registers: the ones <i>implicitly</i> defined, and
those <i>explicitly</i> defined. Explicitly defined registers are normal
operands, and can be accessed
with <tt>MachineInstr::getOperand(int).getReg()</tt>. In order to check
which registers are implicitly defined by an instruction, use
<tt>TargetInstrInfo::get(opcode).ImplicitDefs</tt>,
				1424	where <tt>opcode</tt> is the opcode of the target instruction. One important
				1425	difference between explicit and implicit physical registers is that the
				1426	latter are defined statically for each instruction, whereas the former may
				1427	vary depending on the program being compiled. For example, an instruction
				1428	that represents a function call will always implicitly define or use the same
				1429	set of physical registers. To read the registers implicitly used by an
				1430	instruction,
use <tt>TargetInstrInfo::get(opcode).ImplicitUses</tt>. Pre-colored
				1432	registers impose constraints on any register allocation algorithm. The
Bob Wilson	35e856a	2010-04-09 18:39:54 +0000	[diff] [blame]	1433	register allocator must make sure that none of them are overwritten by
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1434	the values of virtual registers while still alive.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1435
				1436	</div>
				1437
				1438	<!-- _______________________________________________________________________ -->
				1439
				1440	<div class="doc_subsubsection">
				1441	<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
				1442	</div>
				1443
				1444	<div class="doc_text">
				1445
				1446	<p>There are two ways to map virtual registers to physical registers (or to
memory slots). The first way, which we will call <i>direct mapping</i>, is
based on the use of methods of the classes <tt>TargetRegisterInfo</tt>
and <tt>MachineOperand</tt>. The second way, which we will call <i>indirect
				1450	mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
				1451	and stores sending and getting values to and from memory.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1452
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1453	<p>The direct mapping provides more flexibility to the developer of the register
				1454	allocator; however, it is more error prone, and demands more implementation
				1455	work. Basically, the programmer will have to specify where load and store
				1456	instructions should be inserted in the target function being compiled in
				1457	order to get and store values in memory. To assign a physical register to a
				1458	virtual register present in a given operand,
				1459	use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
				1460	use <tt>TargetRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a
load instruction, use <tt>TargetRegisterInfo::loadRegFromStackSlot(...)</tt>.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1462
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1463	<p>The indirect mapping shields the application developer from the complexities
				1464	of inserting load and store instructions. In order to map a virtual register
				1465	to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In
				1466	order to map a certain virtual register to memory,
				1467	use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
				1468	the stack slot where <tt>vreg</tt>'s value will be located. If it is
				1469	necessary to map another virtual register to the same stack slot,
use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
important point to consider when using the indirect mapping is that even if
				1472	a virtual register is mapped to memory, it still needs to be mapped to a
				1473	physical register. This physical register is the location where the virtual
				1474	register is supposed to be found before being stored or after being
				1475	reloaded.</p>
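<p>The bookkeeping behind the indirect mapping can be pictured with the
following toy class. It is a sketch under stated assumptions, not the
real <tt>VirtRegMap</tt>: the three <tt>assign...</tt> method names mirror the
calls named above, while <tt>ToyVirtRegMap</tt>, <tt>hasPhys</tt>,
<tt>hasStackSlot</tt>, <tt>getStackSlot</tt> and the use of small integers as
stack slots are hypothetical.</p>

```cpp
#include <cassert>
#include <map>

// Toy model of the indirect mapping; the bodies are illustrative only.
class ToyVirtRegMap {
  std::map<unsigned, unsigned> virt2phys_;
  std::map<unsigned, int> virt2slot_;
  int next_slot_;
public:
  ToyVirtRegMap() : next_slot_(0) {}

  // Records that vreg lives in physical register preg.
  void assignVirt2Phys(unsigned vreg, unsigned preg) {
    assert(virt2phys_.count(vreg) == 0 && "vreg already has a register");
    virt2phys_[vreg] = preg;
  }

  // Gives vreg a fresh stack slot and returns that slot.
  int assignVirt2StackSlot(unsigned vreg) {
    assert(virt2slot_.count(vreg) == 0 && "vreg already has a slot");
    virt2slot_[vreg] = next_slot_;
    return next_slot_++;
  }

  // Maps vreg to an existing slot so that two registers can share it.
  void assignVirt2StackSlot(unsigned vreg, int slot) {
    assert(virt2slot_.count(vreg) == 0 && "vreg already has a slot");
    virt2slot_[vreg] = slot;
  }

  bool hasPhys(unsigned vreg) const { return virt2phys_.count(vreg) != 0; }
  bool hasStackSlot(unsigned vreg) const {
    return virt2slot_.count(vreg) != 0;
  }
  int getStackSlot(unsigned vreg) const {
    assert(hasStackSlot(vreg) && "vreg has no stack slot");
    return virt2slot_.find(vreg)->second;
  }
};
```

<p>Note that a spilled virtual register ends up in both tables: it has a stack
slot, and it also has the physical register that holds its value just before
the store or just after the reload.</p>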
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1476
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1477	<p>If the indirect strategy is used, after all the virtual registers have been
				1478	mapped to physical registers or stack slots, it is necessary to use a spiller
object to place load and store instructions in the code. Every virtual register
that has been mapped to a stack slot will be stored to memory after being defined
				1481	and will be loaded before being used. The implementation of the spiller tries
				1482	to recycle load/store instructions, avoiding unnecessary instructions. For an
				1483	example of how to invoke the spiller,
				1484	see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
				1485	in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1486
				1487	</div>
				1488
				1489	<!-- _______________________________________________________________________ -->
				1490	<div class="doc_subsubsection">
				1491	<a name="regAlloc_twoAddr">Handling two address instructions</a>
				1492	</div>
				1493
				1494	<div class="doc_text">
				1495
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1496	<p>With very rare exceptions (e.g., function calls), the LLVM machine code
				1497	instructions are three address instructions. That is, each instruction is
				1498	expected to define at most one register, and to use at most two registers.
				1499	However, some architectures use two address instructions. In this case, the
defined register is also one of the used registers. For instance, an
instruction such as <tt>ADD %EAX, %EBX</tt> on X86 is actually equivalent
				1502	to <tt>%EAX = %EAX + %EBX</tt>.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1503
				1504	<p>In order to produce correct code, LLVM must convert three address
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1505	instructions that represent two address instructions into true two address
				1506	instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
				1507	this specific purpose. It must be run before register allocation takes
				1508	place. After its execution, the resulting code may no longer be in SSA
				1509	form. This happens, for instance, in situations where an instruction such
				1510	as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1511
				1512	<div class="doc_code">
				1513	<pre>
				1514	%a = MOVE %b
Dan Gohman	01cd2d9	2008-06-13 17:55:57 +0000	[diff] [blame]	1515	%a = ADD %a %c
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1516	</pre>
				1517	</div>
				1518
				1519	<p>Notice that, internally, the second instruction is represented as
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1520	<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
				1521	used and defined by the instruction.</p>
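<p>The rewrite shown above can be sketched as a small function. This is a toy
illustration, not the logic of <tt>TwoAddressInstructionPass</tt>:
<tt>ToyInstr</tt> and <tt>toTwoAddress</tt> are hypothetical names, registers
are plain strings, and the input is assumed to be in SSA form so that the
destination differs from both sources.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy three-address instruction: dst = opcode src1 src2.
// For MOVE, src2 is left empty.
struct ToyInstr {
  std::string opcode;
  std::string dst;
  std::string src1;
  std::string src2;
};

// Rewrites "dst = op src1 src2" as "dst = MOVE src1; dst = op dst src2",
// so that the defined register is also the first used register.
std::vector<ToyInstr> toTwoAddress(const ToyInstr &mi) {
  assert(!mi.src2.empty() && "expected a two-source instruction");
  assert(mi.dst != mi.src1 && mi.dst != mi.src2 &&
         "input is assumed to be in SSA form");
  std::vector<ToyInstr> out;
  out.push_back(ToyInstr{"MOVE", mi.dst, mi.src1, ""});
  out.push_back(ToyInstr{mi.opcode, mi.dst, mi.dst, mi.src2});
  return out;
}
```

<p>After the rewrite the destination register has two definitions, which is
why the resulting code is no longer in SSA form.</p>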
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1522
				1523	</div>
				1524
				1525	<!-- _______________________________________________________________________ -->
				1526	<div class="doc_subsubsection">
				1527	<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
				1528	</div>
				1529
				1530	<div class="doc_text">
				1531
				1532	<p>An important transformation that happens during register allocation is called
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1533	the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
				1534	that are performed on the control flow graph of programs. However,
				1535	traditional instruction sets do not implement PHI instructions. Thus, in
				1536	order to generate executable code, compilers must replace PHI instructions
				1537	with other instructions that preserve their semantics.</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1538
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1539	<p>There are many ways in which PHI instructions can safely be removed from the
				1540	target code. The most traditional PHI deconstruction algorithm replaces PHI
				1541	instructions with copy instructions. That is the strategy adopted by
				1542	LLVM. The SSA deconstruction algorithm is implemented
				1543	in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
				1544	identifier <tt>PHIEliminationID</tt> must be marked as required in the code
				1545	of the register allocator.</p>
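<p>The copy-based strategy can be illustrated with a deliberately naive sketch.
It is not the algorithm in <tt>PHIElimination.cpp</tt>: <tt>ToyPhi</tt>,
<tt>CopyPlan</tt> and <tt>lowerPhis</tt> are hypothetical names, blocks and
registers are plain strings, and the sketch ignores the ordering problems that
arise when several PHI instructions in one block read each other's
results.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy PHI: dst receives incoming[i].first when control arrives from the
// predecessor block named incoming[i].second.
struct ToyPhi {
  std::string dst;
  std::vector<std::pair<std::string, std::string> > incoming;
};

// Copies to insert, keyed by predecessor block name. Each entry is
// (dst, src), meaning "dst = COPY src" at the end of that block.
typedef std::map<std::string,
                 std::vector<std::pair<std::string, std::string> > >
    CopyPlan;

// Replaces every PHI by one copy per incoming edge.
CopyPlan lowerPhis(const std::vector<ToyPhi> &phis) {
  CopyPlan plan;
  for (const ToyPhi &phi : phis) {
    assert(!phi.incoming.empty() && "PHI needs an incoming value");
    for (const auto &in : phi.incoming)
      plan[in.second].push_back(std::make_pair(phi.dst, in.first));
  }
  return plan;
}
```

<p>For a PHI that merges <tt>%a</tt> from <tt>bb1</tt> and <tt>%b</tt>
from <tt>bb2</tt> into <tt>%x</tt>, the plan holds <tt>%x = COPY %a</tt>
for <tt>bb1</tt> and <tt>%x = COPY %b</tt> for <tt>bb2</tt>.</p>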
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1546
				1547	</div>
				1548
				1549	<!-- _______________________________________________________________________ -->
				1550	<div class="doc_subsubsection">
				1551	<a name="regAlloc_fold">Instruction folding</a>
				1552	</div>
				1553
				1554	<div class="doc_text">
				1555
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1556	<p><i>Instruction folding</i> is an optimization performed during register
				1557	allocation that removes unnecessary copy instructions. For instance, a
				1558	sequence of instructions such as:</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1559
				1560	<div class="doc_code">
				1561	<pre>
				1562	%EBX = LOAD %mem_address
				1563	%EAX = COPY %EBX
				1564	</pre>
				1565	</div>
				1566
Dan Gohman	970a547	2008-11-24 16:35:31 +0000	[diff] [blame]	1567	<p>can be safely substituted by the single instruction:</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1568
				1569	<div class="doc_code">
				1570	<pre>
				1571	%EAX = LOAD %mem_address
				1572	</pre>
				1573	</div>
				1574
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1575	<p>Instructions can be folded with
				1576	the <tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
				1577	taken when folding instructions; a folded instruction can be quite different
				1578	from the original
				1579	instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
				1580	in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
				1581	use.</p>
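<p>The <tt>LOAD</tt>/<tt>COPY</tt> example from this section can be written as
a small peephole over a toy instruction list. This is a sketch only and does
not use <tt>foldMemoryOperand</tt>: <tt>ToyInstr</tt>, <tt>usedLater</tt>
and <tt>foldLoadCopy</tt> are hypothetical names, and registers are assumed to
be dead at the end of the sequence.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction with one destination and one source. For LOAD the
// source names a memory address; for COPY it names a register.
struct ToyInstr {
  std::string opcode;
  std::string dst;
  std::string src;
};

// True if reg is read at or after index start, before being redefined.
// Registers are assumed dead at the end of the sequence (an assumption).
bool usedLater(const std::vector<ToyInstr> &code, std::size_t start,
               const std::string &reg) {
  for (std::size_t i = start; i < code.size(); ++i) {
    if (code[i].opcode != "LOAD" && code[i].src == reg)
      return true;
    if (code[i].dst == reg)
      return false;
  }
  return false;
}

// Folds "x = LOAD m; y = COPY x" into "y = LOAD m" when x has no other use.
std::vector<ToyInstr> foldLoadCopy(const std::vector<ToyInstr> &code) {
  std::vector<ToyInstr> out;
  for (std::size_t i = 0; i < code.size(); ++i) {
    assert(!code[i].dst.empty() && "every toy instruction defines a register");
    bool fold = code[i].opcode == "LOAD" && i + 1 < code.size() &&
                code[i + 1].opcode == "COPY" &&
                code[i + 1].src == code[i].dst &&
                !usedLater(code, i + 2, code[i].dst);
    if (fold) {
      out.push_back(ToyInstr{"LOAD", code[i + 1].dst, code[i].src});
      ++i;  // skip the COPY that was folded away
    } else {
      out.push_back(code[i]);
    }
  }
  return out;
}
```

<p>The extra liveness check is the kind of care the paragraph above asks for:
if the loaded register is read again later, the copy cannot simply be
dropped.</p>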
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1582
				1583	</div>
				1584
				1585	<!-- _______________________________________________________________________ -->
				1586
				1587	<div class="doc_subsubsection">
				1588	<a name="regAlloc_builtIn">Built in register allocators</a>
				1589	</div>
				1590
				1591	<div class="doc_text">
				1592
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1593	<p>The LLVM infrastructure provides the application developer with three
				1594	different register allocators:</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1595
				1596	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1597	<li><i>Linear Scan</i> — <i>The default allocator</i>. This is the
well-known linear scan register allocator. Unlike allocators that use a
direct mapping implementation technique, the <i>Linear Scan</i> implementation
uses a spiller in order to place loads and stores.</li>
Jakob Stoklund Olesen	ec2e964	2010-06-15 21:58:33 +0000	[diff] [blame]	1602
				1603	<li><i>Fast</i> — This register allocator is the default for debug
				1604	builds. It allocates registers on a basic block level, attempting to keep
				1605	values in registers and reusing registers as appropriate.</li>
				1606
				1607	<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
				1608	based register allocator. This allocator works by constructing a PBQP
				1609	problem representing the register allocation problem under consideration,
				1610	solving this using a PBQP solver, and mapping the solution back to a
				1611	register assignment.</li>
				1612
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1613	</ul>
				1614
				1615	<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1616	command line option <tt>-regalloc=...</tt>:</p>
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1617
				1618	<div class="doc_code">
				1619	<pre>
Dan Gohman	dd121d5	2009-08-25 15:54:01 +0000	[diff] [blame]	1620	$ llc -regalloc=linearscan file.bc -o ln.s;
Jakob Stoklund Olesen	ec2e964	2010-06-15 21:58:33 +0000	[diff] [blame]	1621	$ llc -regalloc=fast file.bc -o fa.s;
				1622	$ llc -regalloc=pbqp file.bc -o pbqp.s;
Bill Wendling	00c5aec	2006-09-01 21:46:00 +0000	[diff] [blame]	1623	</pre>
				1624	</div>
				1625
				1626	</div>
				1627
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	1628	<!-- ======================================================================= -->
				1629	<div class="doc_subsection">
				1630	<a name="proepicode">Prolog/Epilog Code Insertion</a>
				1631	</div>
				1632	<div class="doc_text"><p>To Be Written</p></div>
				1633	<!-- ======================================================================= -->
				1634	<div class="doc_subsection">
				1635	<a name="latemco">Late Machine Code Optimizations</a>
				1636	</div>
				1637	<div class="doc_text"><p>To Be Written</p></div>
				1638	<!-- ======================================================================= -->
				1639	<div class="doc_subsection">
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	1640	<a name="codeemit">Code Emission</a>
Reid Spencer	4da9784	2005-04-24 20:56:18 +0000	[diff] [blame]	1641	</div>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1642	<div class="doc_text"><p>To Be Written</p></div>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	1643	<!-- _______________________________________________________________________ -->
				1644	<div class="doc_subsubsection">
				1645	<a name="codeemit_asm">Generating Assembly Code</a>
				1646	</div>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1647	<div class="doc_text"><p>To Be Written</p></div>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	1648	<!-- _______________________________________________________________________ -->
				1649	<div class="doc_subsubsection">
				1650	<a name="codeemit_bin">Generating Binary Machine Code</a>
				1651	</div>
				1652
				1653	<div class="doc_text">
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1654	<p>For the JIT or <tt>.o</tt> file writer</p>
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	1655	</div>
				1656
				1657
Chris Lattner	54903b6	2005-01-28 17:22:53 +0000	[diff] [blame]	1658	<!-- *********************************************************************** -->
				1659	<div class="doc_section">
Chris Lattner	d6f1a33	2005-10-16 18:31:08 +0000	[diff] [blame]	1660	<a name="targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1661	</div>
				1662	<!-- *********************************************************************** -->
				1663
				1664	<div class="doc_text">
				1665
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1666	<p>This section of the document explains features or design decisions that are
				1667	specific to the code generator for a particular target.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1668
				1669	</div>
				1670
Arnold Schwaighofer	2c6b888	2008-05-14 09:17:12 +0000	[diff] [blame]	1671	<!-- ======================================================================= -->
				1672	<div class="doc_subsection">
				1673	<a name="tailcallopt">Tail call optimization</a>
				1674	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1675
Arnold Schwaighofer	2c6b888	2008-05-14 09:17:12 +0000	[diff] [blame]	1676	<div class="doc_text">
Arnold Schwaighofer	2c6b888	2008-05-14 09:17:12 +0000	[diff] [blame]	1677
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1678	<p>Tail call optimization, callee reusing the stack of the caller, is currently
				1679	supported on x86/x86-64 and PowerPC. It is performed if:</p>
				1680
				1681	<ul>
Chris Lattner	a179e4d	2010-03-11 00:22:57 +0000	[diff] [blame]	1682	<li>Caller and callee have the calling convention <tt>fastcc</tt> or
				1683	<tt>cc 10</tt> (GHC call convention).</li>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1684
				1685	<li>The call is a tail call - in tail position (ret immediately follows call
				1686	and ret uses value of call or is void).</li>
				1687
				1688	<li>Option <tt>-tailcallopt</tt> is enabled.</li>
				1689
				1690	<li>Platform specific constraints are met.</li>
				1691	</ul>
				1692
				1693	<p>x86/x86-64 constraints:</p>
				1694
				1695	<ul>
				1696	<li>No variable argument lists are used.</li>
				1697
				1698	<li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
				1699	= hidden or protected) are supported.</li>
				1700	</ul>
				1701
				1702	<p>PowerPC constraints:</p>
				1703
				1704	<ul>
				1705	<li>No variable argument lists are used.</li>
				1706
				1707	<li>No byval parameters are used.</li>
				1708
				1709	<li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
				1710	</ul>
				1711
				1712	<p>Example:</p>
				1713
				1714	<p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
				1715
				1716	<div class="doc_code">
				1717	<pre>
declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)

define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
  %l1 = add i32 %in1, %in2
  %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
  ret i32 %tmp
}
				1725	</pre>
				1726	</div>
				1727
				1728	<p>Implications of <tt>-tailcallopt</tt>:</p>
				1729
				1730	<p>To support tail call optimization in situations where the callee has more
arguments than the caller, a 'callee pops arguments' convention is used. This
currently causes each <tt>fastcc</tt> call that is not tail call optimized
(because one or more of the above constraints are not met) to be followed by a
				1734	readjustment of the stack. So performance might be worse in such cases.</p>
				1735
Arnold Schwaighofer	2c6b888	2008-05-14 09:17:12 +0000	[diff] [blame]	1736	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1737	<!-- ======================================================================= -->
				1738	<div class="doc_subsection">
Evan Cheng	5967649	2010-03-08 21:05:02 +0000	[diff] [blame]	1739	<a name="sibcallopt">Sibling call optimization</a>
				1740	</div>
				1741
				1742	<div class="doc_text">
				1743
				1744	<p>Sibling call optimization is a restricted form of tail call optimization.
				1745	Unlike tail call optimization described in the previous section, it can be
performed automatically on any tail calls when the <tt>-tailcallopt</tt> option
				1747	is not specified.</p>
				1748
				1749	<p>Sibling call optimization is currently performed on x86/x86-64 when the
				1750	following constraints are met:</p>
				1751
				1752	<ul>
<li>Caller and callee have the same calling convention. It can be either
<tt>c</tt> or <tt>fastcc</tt>.</li>

<li>The call is a tail call - in tail position (ret immediately follows call
and ret uses value of call or is void).</li>

<li>Caller and callee have matching return type or the callee result is not
used.</li>

<li>If any of the callee arguments are being passed on the stack, they must be
available in the caller's own incoming argument stack and the frame offsets
must be the same.</li>
				1765	</ul>
				1766
				1767	<p>Example:</p>
				1768	<div class="doc_code">
				1769	<pre>
declare i32 @bar(i32, i32)

define i32 @foo(i32 %a, i32 %b, i32 %c) {
entry:
  %0 = tail call i32 @bar(i32 %a, i32 %b)
  ret i32 %0
}
				1777	</pre>
				1778	</div>
				1779
				1780	</div>
				1781	<!-- ======================================================================= -->
				1782	<div class="doc_subsection">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1783	<a name="x86">The X86 backend</a>
				1784	</div>
				1785
				1786	<div class="doc_text">
				1787
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1788	<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1789	code generator is capable of targeting a variety of x86-32 and x86-64
				1790	processors, and includes support for ISA extensions such as MMX and SSE.</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1791
				1792	</div>
				1793
				1794	<!-- _______________________________________________________________________ -->
				1795	<div class="doc_subsubsection">
Nate Begeman	7ea4e86	2009-01-26 02:54:45 +0000	[diff] [blame]	1796	<a name="x86_tt">X86 Target Triples supported</a>
Chris Lattner	de69bf9	2005-07-12 00:20:49 +0000	[diff] [blame]	1797	</div>
				1798
				1799	<div class="doc_text">
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1800
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1801	<p>The following are the known target triples that are supported by the X86
				1802	backend. This is not an exhaustive list, and it would be useful to add those
				1803	that people test.</p>
Chris Lattner	de69bf9	2005-07-12 00:20:49 +0000	[diff] [blame]	1804
				1805	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1806	<li><b>i686-pc-linux-gnu</b> — Linux</li>
				1807
				1808	<li><b>i386-unknown-freebsd5.3</b> — FreeBSD 5.3</li>
				1809
				1810	<li><b>i686-pc-cygwin</b> — Cygwin on Win32</li>
				1811
				1812	<li><b>i686-pc-mingw32</b> — MingW on Win32</li>
				1813
				1814	<li><b>i386-pc-mingw32msvc</b> — MingW crosscompiler on Linux</li>
				1815
				1816	<li><b>i686-apple-darwin*</b> — Apple Darwin on X86</li>
Torok Edwin	4378bf0	2009-06-15 12:17:44 +0000	[diff] [blame]	1817
				1818	<li><b>x86_64-unknown-linux-gnu</b> — Linux</li>
Chris Lattner	de69bf9	2005-07-12 00:20:49 +0000	[diff] [blame]	1819	</ul>
				1820
				1821	</div>
				1822
				1823	<!-- _______________________________________________________________________ -->
				1824	<div class="doc_subsubsection">
Anton Korobeynikov	6f7072c	2006-09-17 20:25:45 +0000	[diff] [blame]	1825	<a name="x86_cc">X86 Calling Conventions supported</a>
				1826	</div>
				1827
				1828
				1829	<div class="doc_text">
				1830
<p>The following target-specific calling conventions are known to the backend:</p>
Anton Korobeynikov	6f7072c	2006-09-17 20:25:45 +0000	[diff] [blame]	1832
				1833	<ul>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1834	<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
				1835	Windows platform (CC ID = 64).</li>
				1836
				1837	<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
				1838	Windows platform (CC ID = 65).</li>
Anton Korobeynikov	6f7072c	2006-09-17 20:25:45 +0000	[diff] [blame]	1839	</ul>
				1840
				1841	</div>
				1842
				1843	<!-- _______________________________________________________________________ -->
				1844	<div class="doc_subsubsection">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1845	<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
				1846	</div>
				1847
				1848	<div class="doc_text">
				1849
Misha Brukman	3703685	2005-02-17 22:22:24 +0000	[diff] [blame]	1850	<p>The x86 has a very flexible way of accessing memory. It is capable of
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1851	forming memory addresses of the following expression directly in integer
				1852	instructions (which use ModR/M addressing):</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1853
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1854	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1855	<pre>
Chris Lattner	10a5a6f	2009-10-10 21:30:55 +0000	[diff] [blame]	1856	SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1857	</pre>
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1858	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1859
<p>In order to represent this, LLVM tracks no fewer than 5 operands for each
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1861	memory operand of this form. This means that the "load" form of
				1862	'<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1863
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1864	<div class="doc_code">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1865	<pre>
Index:        0     |    1        2       3           4          5
Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1869	</pre>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1870	</div>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1871
<p>Stores, and all other instructions, treat the five memory operands in the
Chris Lattner	10a5a6f	2009-10-10 21:30:55 +0000	[diff] [blame]	1873	same way and in the same order. If the segment register is unspecified
				1874	(regno = 0), then no segment override is generated. "Lea" operations do not
				1875	have a segment register specified, so they only have 4 operands for their
				1876	memory reference.</p>
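<p>To make the address expression concrete, the following sketch evaluates it
on plain integers. It assumes 32-bit addressing and leaves out the segment
register; <tt>ToyX86Address</tt> and <tt>effectiveAddress</tt> are hypothetical
names, and the fields hold register <i>values</i> rather than
the <tt>MachineOperand</tt>s that LLVM actually stores.</p>

```cpp
#include <cassert>
#include <cstdint>

// Toy form of the memory operands described above (segment omitted).
struct ToyX86Address {
  std::uint32_t base;   // value held in BaseReg
  unsigned scale;       // 1, 2, 4 or 8
  std::uint32_t index;  // value held in IndexReg
  std::int32_t disp;    // signed 32-bit displacement
};

// Computes Base + Scale * Index + Disp with 32-bit wraparound.
std::uint32_t effectiveAddress(const ToyX86Address &a) {
  assert((a.scale == 1 || a.scale == 2 || a.scale == 4 || a.scale == 8) &&
         "scale must be 1, 2, 4 or 8");
  return a.base + a.scale * a.index + static_cast<std::uint32_t>(a.disp);
}
```

<p>For example, a base of <tt>0x1000</tt>, a scale of 4, an index of 3 and a
displacement of -8 give the address <tt>0x1004</tt>.</p>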
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1877
				1878	</div>
				1879
				1880	<!-- _______________________________________________________________________ -->
				1881	<div class="doc_subsubsection">
Nate Begeman	7ea4e86	2009-01-26 02:54:45 +0000	[diff] [blame]	1882	<a name="x86_memory">X86 address spaces supported</a>
				1883	</div>
				1884
				1885	<div class="doc_text">
				1886
Dan Gohman	d99feb8	2009-05-05 20:48:47 +0000	[diff] [blame]	1887	<p>x86 has an experimental feature which provides
				1888	the ability to perform loads and stores to different address spaces
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1889	via the x86 segment registers. A segment override prefix byte on an
				1890	instruction causes the instruction's memory access to go to the specified
				1891	segment. LLVM address space 0 is the default address space, which includes
				1892	the stack, and any unqualified memory accesses in a program. Address spaces
				1893	1-255 are currently reserved for user-defined code. The GS-segment is
Chris Lattner	be9fa50	2009-05-05 18:52:19 +0000	[diff] [blame]	1894	represented by address space 256, while the FS-segment is represented by
				1895	address space 257. Other x86 segments have yet to be allocated address space
				1896	numbers.</p>
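<p>The numbering just described amounts to a small lookup, sketched below.
The address space numbers 256 and 257 come from the text; the prefix byte
values (<tt>0x65</tt> for GS, <tt>0x64</tt> for FS) are the standard x86
segment override encodings rather than something stated in this document, and
the constant and function names are hypothetical.</p>

```cpp
#include <cassert>

// Address space numbers taken from the text above.
const unsigned kGSAddressSpace = 256;
const unsigned kFSAddressSpace = 257;

// True if accesses in this address space need a segment override prefix.
bool hasSegmentOverride(unsigned address_space) {
  return address_space == kGSAddressSpace ||
         address_space == kFSAddressSpace;
}

// Returns the x86 segment override prefix byte: 0x65 for GS, 0x64 for FS.
unsigned char segmentOverridePrefix(unsigned address_space) {
  assert(hasSegmentOverride(address_space) &&
         "no segment is assigned to this address space");
  return address_space == kGSAddressSpace ? 0x65 : 0x64;
}
```

<p>Address space 0 and the user-defined range 1-255 get no prefix in this
sketch, matching the description of unqualified memory accesses above.</p>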
Nate Begeman	7ea4e86	2009-01-26 02:54:45 +0000	[diff] [blame]	1897
Dan Gohman	d99feb8	2009-05-05 20:48:47 +0000	[diff] [blame]	1898	<p>While these address spaces may seem similar to TLS via the
				1899	<tt>thread_local</tt> keyword, and often use the same underlying hardware,
				1900	there are some fundamental differences.</p>
				1901
				1902	<p>The <tt>thread_local</tt> keyword applies to global variables and
				1903	specifies that they are to be allocated in thread-local memory. There are
				1904	no type qualifiers involved, and these variables can be pointed to with
				1905	normal pointers and accessed with normal loads and stores.
				1906	The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
				1907	level (though LLVM doesn't yet have implementations of it for some
configurations).</p>
				1909
				1910	<p>Special address spaces, in contrast, apply to static types. Every
				1911	load and store has a particular address space in its address operand type,
				1912	and this is what determines which address space is accessed.
				1913	LLVM ignores these special address space qualifiers on global variables,
				1914	and does not provide a way to directly allocate storage in them.
				1915	At the LLVM IR level, the behavior of these special address spaces depends
				1916	in part on the underlying OS or runtime environment, and they are specific
				1917	to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
				1918
				1919	<p>Some operating systems and runtime environments use (or may in the future
				1920	use) the FS/GS-segment registers for various low-level purposes, so care
				1921	should be taken when considering them.</p>
Nate Begeman	7ea4e86	2009-01-26 02:54:45 +0000	[diff] [blame]	1922
				1923	</div>
				1924
				1925	<!-- _______________________________________________________________________ -->
				1926	<div class="doc_subsubsection">
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1927	<a name="x86_names">Instruction naming</a>
				1928	</div>
				1929
				1930	<div class="doc_text">
				1931
Bill Wendling	5c385de	2006-08-28 02:26:32 +0000	[diff] [blame]	1932	<p>An instruction name consists of the base name, a default operand size, and a
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1933	character per operand with an optional special size. For example:</p>
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1934
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1935	<div class="doc_code">
				1936	<pre>
				1937	ADD8rr -> add, 8-bit register, 8-bit register
				1938	IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
				1939	IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
				1940	MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
				1941	</pre>
				1942	</div>
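<p>The naming scheme can be sketched as a small decoder. This is only an
illustration in Python, not code from LLVM: the helper name
<tt>decode_name</tt> and the regular expression are assumptions, and they cover
just the simple pattern of the four examples above (base name, default size,
then <tt>r</tt>, <tt>m</tt> or <tt>i</tt> operands with an optional size). Real
X86 instruction names include forms that this sketch does not handle.</p>

```python
import re

# Hypothetical pattern: upper-case base name, default operand size,
# then one or more operand characters, each with an optional size.
_NAME = re.compile(r"^([A-Z]+)(\d+)((?:[rmi]\d*)+)$")
_KIND = {"r": "register", "m": "memory", "i": "immediate"}


def decode_name(name):
    """Split a name such as IMUL16rmi8 into (mnemonic, operand list)."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError("name does not follow the simple pattern: " + name)
    base, default_size, tail = match.group(1), int(match.group(2)), match.group(3)
    operands = []
    for kind, size in re.findall(r"([rmi])(\d*)", tail):
        # An operand without its own size uses the default operand size.
        operands.append((_KIND[kind], int(size) if size else default_size))
    return base.lower(), operands
```

<p>Under these assumptions, <tt>decode_name("MOVSX32rm16")</tt> yields the
mnemonic <tt>movsx</tt> with a 32-bit register operand and a 16-bit memory
operand.</p>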
Chris Lattner	b4e5664	2004-06-04 00:16:02 +0000	[diff] [blame]	1943
				1944	</div>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	1945
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1946	<!-- ======================================================================= -->
				1947	<div class="doc_subsection">
				1948	<a name="ppc">The PowerPC backend</a>
				1949	</div>
				1950
				1951	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1952
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1953	<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1954	code generation is retargetable to several variations or <i>subtargets</i> of
				1955	the PowerPC ISA, including ppc32, ppc64 and altivec.</p>
				1956
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1957	</div>
				1958
				1959	<!-- _______________________________________________________________________ -->
				1960	<div class="doc_subsubsection">
				1961	<a name="ppc_abi">LLVM PowerPC ABI</a>
				1962	</div>
				1963
				1964	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1965
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1966	<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1967	PC-relative (PIC) or static addressing for accessing global values, so no TOC
				1968	(r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
				1969	of a stack frame. LLVM takes advantage of having no TOC to provide space to
				1970	save the frame pointer in the PowerPC linkage area of the caller frame.
				1971	Other details of PowerPC ABI can be found at <a href=
				1972	"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
				1973	>PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The 64 bit ABI
				1974	is similar, except that space for GPRs is 8 bytes wide (not 4) and r13 is reserved
				1975	for system use.</p>
				1976
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1977	</div>
				1978
				1979	<!-- _______________________________________________________________________ -->
				1980	<div class="doc_subsubsection">
				1981	<a name="ppc_frame">Frame Layout</a>
				1982	</div>
				1983
				1984	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1985
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1986	<p>The size of a PowerPC frame is usually fixed for the duration of a
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1987	function's invocation. Since the frame is fixed size, all references
				1988	into the frame can be accessed via fixed offsets from the stack pointer. The
				1989	exception is when dynamic alloca or variable sized arrays are present; in
				1990	that case a base pointer (r31) is used as a proxy for the stack pointer, and
				1991	the stack pointer is free to grow or shrink. A base pointer is also used if
				1992	llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
				1993	always aligned to 16 bytes, so that space allocated for altivec vectors will
				1994	be properly aligned.</p>
				1995
Dan Gohman	1e6f511	2008-11-24 16:27:17 +0000	[diff] [blame]	1996	<p>An invocation frame is laid out as follows (low memory at top):</p>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1997
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	1998	<table class="layout">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	1999	<tr>
				2000	<td>Linkage<br><br></td>
				2001	</tr>
				2002	<tr>
				2003	<td>Parameter area<br><br></td>
				2004	</tr>
				2005	<tr>
				2006	<td>Dynamic area<br><br></td>
				2007	</tr>
				2008	<tr>
				2009	<td>Locals area<br><br></td>
				2010	</tr>
				2011	<tr>
				2012	<td>Saved registers area<br><br></td>
				2013	</tr>
				2014	<tr style="border-style: none hidden none hidden;">
				2015	<td><br></td>
				2016	</tr>
				2017	<tr>
				2018	<td>Previous Frame<br><br></td>
				2019	</tr>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2020	</table>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2021
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2022	<p>The <i>linkage</i> area is used by a callee to save special registers prior
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2023	to allocating its own frame. Only three entries are relevant to LLVM. The
				2024	first entry is the previous stack pointer (sp), aka link. This allows
				2025	probing tools like gdb or exception handlers to quickly scan the frames in
				2026	the stack. A function epilog can also use the link to pop the frame from the
				2027	stack. The third entry in the linkage area is used to save the return
				2028	address from the lr register. Finally, as mentioned above, the last entry is
				2029	used to save the previous frame pointer (r31). The entries in the linkage
				2030	area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
				2031	mode and 48 bytes in 64 bit mode.</p>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2032
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2033	<p>32 bit linkage area</p>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2034
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2035	<table class="layout">
				2036	<tr>
				2037	<td>0</td>
				2038	<td>Saved SP (r1)</td>
				2039	</tr>
				2040	<tr>
				2041	<td>4</td>
				2042	<td>Saved CR</td>
				2043	</tr>
				2044	<tr>
				2045	<td>8</td>
				2046	<td>Saved LR</td>
				2047	</tr>
				2048	<tr>
				2049	<td>12</td>
				2050	<td>Reserved</td>
				2051	</tr>
				2052	<tr>
				2053	<td>16</td>
				2054	<td>Reserved</td>
				2055	</tr>
				2056	<tr>
				2057	<td>20</td>
				2058	<td>Saved FP (r31)</td>
				2059	</tr>
				2060	</table>
				2061
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2062	<p>64 bit linkage area</p>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2063
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2064	<table class="layout">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2065	<tr>
				2066	<td>0</td>
				2067	<td>Saved SP (r1)</td>
				2068	</tr>
				2069	<tr>
				2070	<td>8</td>
				2071	<td>Saved CR</td>
				2072	</tr>
				2073	<tr>
				2074	<td>16</td>
				2075	<td>Saved LR</td>
				2076	</tr>
				2077	<tr>
				2078	<td>24</td>
				2079	<td>Reserved</td>
				2080	</tr>
				2081	<tr>
				2082	<td>32</td>
				2083	<td>Reserved</td>
				2084	</tr>
				2085	<tr>
				2086	<td>40</td>
				2087	<td>Saved FP (r31)</td>
				2088	</tr>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2089	</table>
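<p>Both linkage tables follow from one rule: six entries, each the size of a
GPR. A minimal Python sketch of that arithmetic is given here; the names
<tt>LINKAGE_ENTRIES</tt>, <tt>linkage_layout</tt> and <tt>linkage_size</tt> are
illustrative assumptions and do not appear in LLVM.</p>

```python
# Entry names copied from the linkage area tables, in slot order.
LINKAGE_ENTRIES = [
    "Saved SP (r1)",
    "Saved CR",
    "Saved LR",
    "Reserved",
    "Reserved",
    "Saved FP (r31)",
]


def linkage_layout(gpr_bytes):
    """Return (offset, entry) pairs; gpr_bytes is 4 (32 bit) or 8 (64 bit)."""
    return [(slot * gpr_bytes, name) for slot, name in enumerate(LINKAGE_ENTRIES)]


def linkage_size(gpr_bytes):
    """Total size of the linkage area in bytes."""
    return len(LINKAGE_ENTRIES) * gpr_bytes
```

<p>With <tt>gpr_bytes</tt> set to 4 the sketch gives a 24 byte area with the
saved frame pointer at offset 20; with 8 it gives 48 bytes and offset 40,
matching the tables.</p>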
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2090
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2091	<p>The <i>parameter area</i> is used to store arguments being passed to a callee
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2092	function. Following the PowerPC ABI, the first few arguments are actually
				2093	passed in registers, with the space in the parameter area unused. However,
				2094	if there are not enough registers or the callee is a thunk or vararg
				2095	function, these register arguments can be spilled into the parameter area.
				2096	Thus, the parameter area must be large enough to store all the parameters for
				2097	the largest call sequence made by the caller. The size must also be
				2098	minimally large enough to spill registers r3-r10. This allows callees blind
				2099	to the call signature, such as thunks and vararg functions, enough space to
				2100	cache the argument registers. Therefore, the parameter area is minimally 32
				2101	bytes (64 bytes in 64 bit mode). Also note that since the parameter area is
				2102	at a fixed offset from the top of the frame, a callee can access its spilled
				2103	arguments using fixed offsets from the stack pointer (or base pointer).</p>
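<p>The sizing rule for the parameter area can be written out as a short Python
sketch. It assumes the caller already knows how many bytes of arguments each
of its call sites needs; the names <tt>ARG_REGISTERS</tt> and
<tt>parameter_area_size</tt> are hypothetical and are used only for
illustration.</p>

```python
# r3 through r10 are the eight argument registers named above.
ARG_REGISTERS = 8


def parameter_area_size(call_arg_bytes, gpr_bytes):
    """Size the parameter area for a caller.

    call_arg_bytes: argument bytes needed by each call the caller makes.
    gpr_bytes: 4 in 32 bit mode, 8 in 64 bit mode.
    """
    # There must always be room to spill r3-r10.
    minimum = ARG_REGISTERS * gpr_bytes
    # The area must also fit the largest call sequence.
    return max([minimum] + list(call_arg_bytes))
```

<p>A caller whose calls all fit in registers still reserves the minimum, so
<tt>parameter_area_size([], 4)</tt> is 32 and
<tt>parameter_area_size([16], 8)</tt> is 64.</p>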
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2104
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2105	<p>Combining the information about the linkage area, the parameter area, and
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2106	alignment, a stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in
				2107	64 bit mode.</p>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2108
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2109	<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2110	alloca then space is added to the stack, the linkage and parameter areas are
				2111	shifted to top of stack, and the new space is available immediately below the
				2112	linkage and parameter areas. The cost of shifting the linkage and parameter
				2113	areas is minor since only the link value needs to be copied. The link value
				2114	can be easily fetched by adding the original frame size to the base pointer.
				2115	Note that allocations in the dynamic space need to observe 16 byte
				2116	alignment.</p>
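<p>One simple way to observe the 16 byte alignment is to round every requested
size up to the next multiple of 16 before growing the stack. The Python sketch
below shows only that rounding step; it is an assumption about how the
requirement could be met, not a description of what the PowerPC backend
actually emits, and the name <tt>aligned_alloca_size</tt> is hypothetical.</p>

```python
DYNAMIC_ALIGNMENT = 16  # bytes, as required for the dynamic area


def aligned_alloca_size(requested_bytes):
    """Round a dynamic allocation size up to a multiple of 16 bytes."""
    blocks = (requested_bytes + DYNAMIC_ALIGNMENT - 1) // DYNAMIC_ALIGNMENT
    return blocks * DYNAMIC_ALIGNMENT
```

<p>If every allocation is rounded this way and the stack pointer starts out
16 byte aligned, it remains aligned after each allocation.</p>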
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2117
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2118	<p>The <i>locals area</i> is where the llvm compiler reserves space for local
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2119	variables.</p>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2120
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2121	<p>The <i>saved registers area</i> is where the llvm compiler spills callee
				2122	saved registers on entry to the callee.</p>
				2123
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2124	</div>
				2125
				2126	<!-- _______________________________________________________________________ -->
				2127	<div class="doc_subsubsection">
				2128	<a name="ppc_prolog">Prolog/Epilog</a>
				2129	</div>
				2130
				2131	<div class="doc_text">
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2132
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2133	<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2134	the following exceptions. Callee saved registers are spilled after the frame
				2135	is created. This allows the llvm epilog/prolog support to be common with
				2136	other targets. The base pointer callee saved register r31 is saved in the
				2137	TOC slot of linkage area. This simplifies allocation of space for the base
				2138	pointer and makes it convenient to locate programmatically and during
				2139	debugging.</p>
				2140
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2141	</div>
				2142
				2143	<!-- _______________________________________________________________________ -->
				2144	<div class="doc_subsubsection">
				2145	<a name="ppc_dynamic">Dynamic Allocation</a>
				2146	</div>
				2147
				2148	<div class="doc_text">
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2149
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	2150	<p><i>TODO - More to come.</i></p>
Bill Wendling	64602b1	2009-04-15 02:12:37 +0000	[diff] [blame]	2151
Jim Laskey	5782584	2006-12-15 10:40:48 +0000	[diff] [blame]	2152	</div>
Jim Laskey	ef58334	2006-12-14 17:19:50 +0000	[diff] [blame]	2153
				2154
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	2155	<!-- *********************************************************************** -->
				2156	<hr>
				2157	<address>
				2158	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
Misha Brukman	86242e1	2008-12-11 17:34:48 +0000	[diff] [blame]	2159	src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	2160	<a href="http://validator.w3.org/check/referer"><img
Misha Brukman	21a6370	2008-12-11 18:23:24 +0000	[diff] [blame]	2161	src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	2162
				2163	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
Reid Spencer	ca05854	2006-03-14 05:39:39 +0000	[diff] [blame]	2164	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
Chris Lattner	565d7d5	2004-06-01 06:48:00 +0000	[diff] [blame]	2165	Last modified: $Date$
				2166	</address>
				2167
				2168	</body>
				2169	</html>