Blame - docs/CodeGenerator.html - fp2-dev/platform/external/llvm

blob: 651eb96603ece223d94644c0f039dc9790976df9 [file] [log] [blame]

Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	5	<meta http-equiv="content-type" content="text/html; charset=utf-8">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	6	<title>The LLVM Target-Independent Code Generator</title>
Daniel Dunbar	adea497	2012-04-19 20:20:34 +0000	[diff] [blame]	7	<link rel="stylesheet" href="_static/llvm.css" type="text/css">
Benjamin Kramer	943beeb	2010-10-30 21:07:28 +0000	[diff] [blame]	8
				9	<style type="text/css">
				10	.unknown { background-color: #C0C0C0; text-align: center; }
				11	.unknown:before { content: "?" }
				12	.no { background-color: #C11B17 }
				13	.no:before { content: "N" }
				14	.partial { background-color: #F88017 }
				15	.yes { background-color: #0F0; }
				16	.yes:before { content: "Y" }
				17	</style>
				18
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	19	</head>
				20	<body>
				21
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	22	<h1>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	23	The LLVM Target-Independent Code Generator
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	24	</h1>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	25
				26	<ol>
				27	<li><a href="#introduction">Introduction</a>
				28	<ul>
				29	<li><a href="#required">Required components in the code generator</a></li>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	30	<li><a href="#high-level-design">The high-level design of the code
				31	generator</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	32	<li><a href="#tablegen">Using TableGen for target description</a></li>
				33	</ul>
				34	</li>
				35	<li><a href="#targetdesc">Target description classes</a>
				36	<ul>
				37	<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
				38	<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	39	<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	40	<li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	41	<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
				42	<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	43	<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	44	<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
				45	</ul>
				46	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	47	<li><a href="#codegendesc">The "Machine" Code Generator classes</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	48	<ul>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	49	<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	50	<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
				51	class</a></li>
				52	<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
Evan Cheng	2e9c724	2011-12-14 21:32:14 +0000	[diff] [blame]	53	<li><a href="#machineinstrbundle"><tt>MachineInstr Bundles</tt></a></li>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	54	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	55	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	56	<li><a href="#mc">The "MC" Layer</a>
				57	<ul>
				58	<li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
				59	<li><a href="#mccontext">The <tt>MCContext</tt> class</a>
				60	<li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
				61	<li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
				62	<li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
				63	</ul>
				64	</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	65	<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	66	<ul>
				67	<li><a href="#instselect">Instruction Selection</a>
				68	<ul>
				69	<li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
				70	<li><a href="#selectiondag_process">SelectionDAG Code Generation
				71	Process</a></li>
				72	<li><a href="#selectiondag_build">Initial SelectionDAG
				73	Construction</a></li>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	74	<li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	75	<li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
				76	<li><a href="#selectiondag_optimize">SelectionDAG Optimization
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	77	Phase: the DAG Combiner</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	78	<li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	79	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	80	Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	81	<li><a href="#selectiondag_future">Future directions for the
				82	SelectionDAG</a></li>
				83	</ul></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	84	<li><a href="#liveintervals">Live Intervals</a>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	85	<ul>
				86	<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	87	<li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	88	</ul></li>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	89	<li><a href="#regalloc">Register Allocation</a>
				90	<ul>
				91	<li><a href="#regAlloc_represent">How registers are represented in
				92	LLVM</a></li>
				93	<li><a href="#regAlloc_howTo">Mapping virtual registers to physical
				94	registers</a></li>
				95	<li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
				96	<li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
				97	<li><a href="#regAlloc_fold">Instruction folding</a></li>
				98	<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
				99	</ul></li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	100	<li><a href="#codeemit">Code Emission</a></li>
Anshuman Dasgupta	6805b56	2011-12-06 23:12:42 +0000	[diff] [blame]	101	<li><a href="#vliw_packetizer">VLIW Packetizer</a>
				102	<ul>
				103	<li><a href="#vliw_mapping">Mapping from instructions to functional
				104	units</a></li>
				105	<li><a href="#vliw_repr">How the packetization tables are
				106	generated and used</a></li>
				107	</ul>
				108	</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	109	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	110	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	111	<li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
				112
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	113	<li><a href="#targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	114	<ul>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	115	<li><a href="#targetfeatures">Target Feature Matrix</a></li>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	116	<li><a href="#tailcallopt">Tail call optimization</a></li>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	117	<li><a href="#sibcallopt">Sibling call optimization</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	118	<li><a href="#x86">The X86 backend</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	119	<li><a href="#ppc">The PowerPC backend</a>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	120	<ul>
				121	<li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
				122	<li><a href="#ppc_frame">Frame Layout</a></li>
				123	<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
				124	<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	125	</ul></li>
Justin Holewinski	dceb002	2011-08-11 17:34:16 +0000	[diff] [blame]	126	<li><a href="#ptx">The PTX backend</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	127	</ul></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	128
				129	</ol>
				130
				131	<div class="doc_author">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	132	<p>Written by the LLVM Team.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	133	</div>
				134
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	135	<div class="doc_warning">
				136	<p>Warning: This is a work in progress.</p>
				137	</div>
				138
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	139	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	140	<h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	141	<a name="introduction">Introduction</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	142	</h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	143	<!-- *********************************************************************** -->
				144
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	145	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	146
				147	<p>The LLVM target-independent code generator is a framework that provides a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	148	suite of reusable components for translating the LLVM internal representation
				149	to the machine code for a specified target—either in assembly form
				150	(suitable for a static compiler) or in binary machine code format (usable for
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	151	a JIT compiler). The LLVM target-independent code generator consists of six
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	152	main components:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	153
				154	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	155	<li><a href="#targetdesc">Abstract target description</a> interfaces which
				156	capture important properties about various aspects of the machine,
				157	independently of how they will be used. These interfaces are defined in
				158	<tt>include/llvm/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	159
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	160	<li>Classes used to represent the <a href="#codegendesc">code being
				161	generated</a> for a target. These classes are intended to be abstract
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	162	enough to represent the machine code for <i>any</i> target machine. These
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	163	classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
				164	concepts like "constant pool entries" and "jump tables" are explicitly
				165	exposed.</li>
				166
				167	<li>Classes and algorithms used to represent code as the object file level,
				168	the <a href="#mc">MC Layer</a>. These classes represent assembly level
				169	constructs like labels, sections, and instructions. At this level,
				170	concepts like "constant pool entries" and "jump tables" don't exist.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	171
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	172	<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
				173	various phases of native code generation (register allocation, scheduling,
				174	stack frame representation, etc). This code lives
				175	in <tt>lib/CodeGen/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	176
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	177	<li><a href="#targetimpls">Implementations of the abstract target description
				178	interfaces</a> for particular targets. These machine descriptions make
				179	use of the components provided by LLVM, and can optionally provide custom
				180	target-specific passes, to build complete code generators for a specific
				181	target. Target descriptions live in <tt>lib/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	182
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	183	<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
				184	completely target independent (it uses the <tt>TargetJITInfo</tt>
				185	structure to interface for target-specific issues. The code for the
				186	target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	187	</ol>
				188
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	189	<p>Depending on which part of the code generator you are interested in working
				190	on, different pieces of this will be useful to you. In any case, you should
				191	be familiar with the <a href="#targetdesc">target description</a>
				192	and <a href="#codegendesc">machine code representation</a> classes. If you
				193	want to add a backend for a new target, you will need
				194	to <a href="#targetimpls">implement the target description</a> classes for
				195	your new target and understand the <a href="LangRef.html">LLVM code
				196	representation</a>. If you are interested in implementing a
				197	new <a href="#codegenalgs">code generation algorithm</a>, it should only
				198	depend on the target-description and machine code representation classes,
				199	ensuring that it is portable.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	200
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	201	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	202	<h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	203	<a name="required">Required components in the code generator</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	204	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	205
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	206	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	207
				208	<p>The two pieces of the LLVM code generator are the high-level interface to the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	209	code generator and the set of reusable components that can be used to build
				210	target-specific backends. The two most important interfaces
				211	(<a href="#targetmachine"><tt>TargetMachine</tt></a>
				212	and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
				213	required to be defined for a backend to fit into the LLVM system, but the
				214	others must be defined if the reusable code generator components are going to
				215	be used.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	216
				217	<p>This design has two important implications. The first is that LLVM can
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	218	support completely non-traditional code generation targets. For example, the
				219	C backend does not require register allocation, instruction selection, or any
				220	of the other standard components provided by the system. As such, it only
David Blaikie	722f254	2012-06-14 16:52:55 +0000	[diff] [blame]	221	implements these two interfaces, and does its own thing. Note that C backend
				222	was removed from the trunk since LLVM 3.1 release. Another example of
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	223	a code generator like this is a (purely hypothetical) backend that converts
				224	LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	225
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	226	<p>This design also implies that it is possible to design and implement
				227	radically different code generators in the LLVM system that do not make use
				228	of any of the built-in components. Doing so is not recommended at all, but
				229	could be required for radically different targets that do not fit into the
				230	LLVM machine description model: FPGAs for example.</p>
Chris Lattner	900bf8c	2004-06-02 07:06:06 +0000	[diff] [blame]	231
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	232	</div>
				233
				234	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	235	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	236	<a name="high-level-design">The high-level design of the code generator</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	237	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	238
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	239	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	240
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	241	<p>The LLVM target-independent code generator is designed to support efficient
				242	and quality code generation for standard register-based microprocessors.
				243	Code generation in this model is divided into the following stages:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	244
				245	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	246	<li><b><a href="#instselect">Instruction Selection</a></b> — This phase
				247	determines an efficient way to express the input LLVM code in the target
				248	instruction set. This stage produces the initial code for the program in
				249	the target instruction set, then makes use of virtual registers in SSA
				250	form and physical registers that represent any required register
				251	assignments due to target constraints or calling conventions. This step
				252	turns the LLVM code into a DAG of target instructions.</li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	253
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	254	<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> —
				255	This phase takes the DAG of target instructions produced by the
				256	instruction selection phase, determines an ordering of the instructions,
				257	then emits the instructions
				258	as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
				259	Note that we describe this in the <a href="#instselect">instruction
				260	selection section</a> because it operates on
				261	a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	262
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	263	<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> —
				264	This optional stage consists of a series of machine-code optimizations
				265	that operate on the SSA-form produced by the instruction selector.
				266	Optimizations like modulo-scheduling or peephole optimization work
				267	here.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	268
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	269	<li><b><a href="#regalloc">Register Allocation</a></b> — The target code
				270	is transformed from an infinite virtual register file in SSA form to the
				271	concrete register file used by the target. This phase introduces spill
				272	code and eliminates all virtual register references from the program.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	273
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	274	<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> — Once
				275	the machine code has been generated for the function and the amount of
				276	stack space required is known (used for LLVM alloca's and spill slots),
				277	the prolog and epilog code for the function can be inserted and "abstract
				278	stack location references" can be eliminated. This stage is responsible
				279	for implementing optimizations like frame-pointer elimination and stack
				280	packing.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	281
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	282	<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> —
				283	Optimizations that operate on "final" machine code can go here, such as
				284	spill code scheduling and peephole optimizations.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	285
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	286	<li><b><a href="#codeemit">Code Emission</a></b> — The final stage
				287	actually puts out the code for the current function, either in the target
				288	assembler format or in machine code.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	289	</ol>
				290
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	291	<p>The code generator is based on the assumption that the instruction selector
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	292	will use an optimal pattern matching selector to create high-quality
				293	sequences of native instructions. Alternative code generator designs based
				294	on pattern expansion and aggressive iterative peephole optimization are much
				295	slower. This design permits efficient compilation (important for JIT
				296	environments) and aggressive optimization (used when generating code offline)
				297	by allowing components of varying levels of sophistication to be used for any
				298	step of compilation.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	299
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	300	<p>In addition to these stages, target implementations can insert arbitrary
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	301	target-specific passes into the flow. For example, the X86 target uses a
				302	special pass to handle the 80x87 floating point stack architecture. Other
				303	targets with unusual requirements can be supported with custom passes as
				304	needed.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	305
				306	</div>
				307
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	308	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	309	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	310	<a name="tablegen">Using TableGen for target description</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	311	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	312
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	313	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	314
Chris Lattner	5489e93	2004-06-01 18:35:00 +0000	[diff] [blame]	315	<p>The target description classes require a detailed description of the target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	316	architecture. These target descriptions often have a large amount of common
				317	information (e.g., an <tt>add</tt> instruction is almost identical to a
				318	<tt>sub</tt> instruction). In order to allow the maximum amount of
				319	commonality to be factored out, the LLVM code generator uses
				320	the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
				321	chunks of the target machine, which allows the use of domain-specific and
				322	target-specific abstractions to reduce the amount of repetition.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	323
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	324	<p>As LLVM continues to be developed and refined, we plan to move more and more
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	325	of the target description to the <tt>.td</tt> form. Doing so gives us a
				326	number of advantages. The most important is that it makes it easier to port
				327	LLVM because it reduces the amount of C++ code that has to be written, and
				328	the surface area of the code generator that needs to be understood before
				329	someone can get something working. Second, it makes it easier to change
				330	things. In particular, if tables and other things are all emitted
				331	by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
				332	update all of the targets to a new interface.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	333
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	334	</div>
				335
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	336	</div>
				337
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	338	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	339	<h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	340	<a name="targetdesc">Target description classes</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	341	</h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	342	<!-- *********************************************************************** -->
				343
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	344	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	345
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	346	<p>The LLVM target description classes (located in the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	347	<tt>include/llvm/Target</tt> directory) provide an abstract description of
				348	the target machine independent of any particular client. These classes are
				349	designed to capture the <i>abstract</i> properties of the target (such as the
				350	instructions and registers it has), and do not incorporate any particular
				351	pieces of code generation algorithms.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	352
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	353	<p>All of the target description classes (except the
				354	<tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
				355	subclassed by the concrete target implementation, and have virtual methods
				356	implemented. To get to these implementations, the
				357	<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
				358	that should be implemented by the target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	359
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	360	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	361	<h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	362	<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	363	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	364
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	365	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	366
				367	<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	368	access the target-specific implementations of the various target description
				369	classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
				370	<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
				371	designed to be specialized by a concrete target implementation
				372	(e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
				373	methods. The only required target description class is
				374	the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
				375	generator components are to be used, the other interfaces should be
				376	implemented as well.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	377
				378	</div>
				379
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	380	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	381	<h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	382	<a name="targetdata">The <tt>TargetData</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	383	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	384
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	385	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	386
				387	<p>The <tt>TargetData</tt> class is the only required target description class,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	388	and it is the only class that is not extensible (you cannot derived a new
				389	class from it). <tt>TargetData</tt> specifies information about how the
				390	target lays out memory for structures, the alignment requirements for various
				391	data types, the size of pointers in the target, and whether the target is
				392	little-endian or big-endian.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	393
				394	</div>
				395
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	396	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	397	<h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	398	<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	399	</h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	400
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	401	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	402
				403	<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	404	selectors primarily to describe how LLVM code should be lowered to
				405	SelectionDAG operations. Among other things, this class indicates:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	406
				407	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	408	<li>an initial register class to use for various <tt>ValueType</tt>s,</li>
				409
				410	<li>which operations are natively supported by the target machine,</li>
				411
				412	<li>the return type of <tt>setcc</tt> operations,</li>
				413
				414	<li>the type to use for shift amounts, and</li>
				415
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	416	<li>various high-level characteristics, like whether it is profitable to turn
				417	division by a constant into a multiplication sequence</li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	418	</ul>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	419
				420	</div>
				421
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	422	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	423	<h3>
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	424	<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	425	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	426
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	427	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	428
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	429	<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
				430	of the target and any interactions between the registers.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	431
				432	<p>Registers in the code generator are represented in the code generator by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	433	unsigned integers. Physical registers (those that actually exist in the
				434	target description) are unique small numbers, and virtual registers are
				435	generally large. Note that register #0 is reserved as a flag value.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	436
				437	<p>Each register in the processor description has an associated
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	438	<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
				439	register (used for assembly output and debugging dumps) and a set of aliases
				440	(used to indicate whether one register overlaps with another).</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	441
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	442	<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	443	class exposes a set of processor specific register classes (instances of the
				444	<tt>TargetRegisterClass</tt> class). Each register class contains sets of
				445	registers that have the same properties (for example, they are all 32-bit
				446	integer registers). Each SSA virtual register created by the instruction
				447	selector has an associated register class. When the register allocator runs,
				448	it replaces virtual registers with a physical register in the set.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	449
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	450	<p>The target-specific implementations of these classes is auto-generated from
				451	a <a href="TableGenFundamentals.html">TableGen</a> description of the
				452	register file.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	453
				454	</div>
				455
				456	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	457	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	458	<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	459	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	460
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	461	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	462
				463	<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
				464	instructions supported by the target. It is essentially an array of
				465	<tt>TargetInstrDescriptor</tt> objects, each of which describes one
				466	instruction the target supports. Descriptors define things like the mnemonic
				467	for the opcode, the number of operands, the list of implicit register uses
				468	and defs, whether the instruction has certain target-independent properties
				469	(accesses memory, is commutable, etc), and holds any target-specific
				470	flags.</p>
				471
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	472	</div>
				473
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	474	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	475	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	476	<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	477	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	478
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	479	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	480
				481	<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
				482	stack frame layout of the target. It holds the direction of stack growth, the
				483	known stack alignment on entry to each function, and the offset to the local
				484	area. The offset to the local area is the offset from the stack pointer on
				485	function entry to the first location where function data (local variables,
				486	spill locations) can be stored.</p>
				487
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	488	</div>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	489
				490	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	491	<h3>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	492	<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	493	</h3>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	494
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	495	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	496
				497	<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
				498	specific chip set being targeted. A sub-target informs code generation of
				499	which instructions are supported, instruction latencies and instruction
				500	execution itinerary; i.e., which processing units are used, in what order,
				501	and for how long.</p>
				502
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	503	</div>
				504
				505
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	506	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	507	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	508	<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	509	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	510
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	511	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	512
				513	<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
				514	Just-In-Time code generator to perform target-specific activities, such as
				515	emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
				516	should provide one of these objects through the <tt>getJITInfo</tt>
				517	method.</p>
				518
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	519	</div>
				520
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	521	</div>
				522
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	523	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	524	<h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	525	<a name="codegendesc">Machine code description classes</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	526	</h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	527	<!-- *********************************************************************** -->
				528
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	529	<div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	530
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	531	<p>At the high-level, LLVM code is translated to a machine specific
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	532	representation formed out of
				533	<a href="#machinefunction"><tt>MachineFunction</tt></a>,
				534	<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
				535	and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
				536	in <tt>include/llvm/CodeGen</tt>). This representation is completely target
				537	agnostic, representing instructions in their most abstract form: an opcode
				538	and a series of operands. This representation is designed to support both an
				539	SSA representation for machine code, as well as a register allocated, non-SSA
				540	form.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	541
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	542	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	543	<h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	544	<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	545	</h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	546
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	547	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	548
				549	<p>Target machine instructions are represented as instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	550	<tt>MachineInstr</tt> class. This class is an extremely abstract way of
				551	representing machine instructions. In particular, it only keeps track of an
				552	opcode number and a set of operands.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	553
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	554	<p>The opcode number is a simple unsigned integer that only has meaning to a
				555	specific backend. All of the instructions for a target should be defined in
				556	the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
				557	auto-generated from this description. The <tt>MachineInstr</tt> class does
				558	not have any information about how to interpret the instruction (i.e., what
				559	the semantics of the instruction are); for that you must refer to the
				560	<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	561
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	562	<p>The operands of a machine instruction can be of several different types: a
				563	register reference, a constant integer, a basic block reference, etc. In
				564	addition, a machine operand should be marked as a def or a use of the value
				565	(though only registers are allowed to be defs).</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	566
				567	<p>By convention, the LLVM code generator orders instruction operands so that
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	568	all register definitions come before the register uses, even on architectures
				569	that are normally printed in other orders. For example, the SPARC add
				570	instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1", and "%i2" registers
				571	and stores the result into the "%i3" register. In the LLVM code generator,
				572	the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the
				573	destination first.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	574
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	575	<p>Keeping destination (definition) operands at the beginning of the operand
				576	list has several advantages. In particular, the debugging printer will print
				577	the instruction like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	578
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	579	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	580	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	581	%r3 = add %i1, %i2
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	582	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	583	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	584
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	585	<p>Also if the first operand is a def, it is easier to <a href="#buildmi">create
				586	instructions</a> whose only def is the first operand.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	587
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	588	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	589	<h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	590	<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	591	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	592
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	593	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	594
				595	<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	596	located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
				597	<tt>BuildMI</tt> functions make it easy to build arbitrary machine
				598	instructions. Usage of the <tt>BuildMI</tt> functions look like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	599
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	600	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	601	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	602	// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
				603	// instruction. The '1' specifies how many operands will be added.
				604	MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	605
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	606	// Create the same instr, but insert it at the end of a basic block.
				607	MachineBasicBlock &MBB = ...
				608	BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	609
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	610	// Create the same instr, but insert it before a specified iterator point.
				611	MachineBasicBlock::iterator MBBI = ...
				612	BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	613
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	614	// Create a 'cmp Reg, 0' instruction, no destination reg.
				615	MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
				616	// Create an 'sahf' instruction which takes no operands and stores nothing.
				617	MI = BuildMI(X86::SAHF, 0);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	618
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	619	// Create a self looping branch instruction.
				620	BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	621	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	622	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	623
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	624	<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	625	have to specify the number of operands that the machine instruction will
				626	take. This allows for efficient memory allocation. You also need to specify
				627	if operands default to be uses of values, not definitions. If you need to
				628	add a definition operand (other than the optional destination register), you
				629	must explicitly mark it as such:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	630
				631	<div class="doc_code">
				632	<pre>
Bill Wendling	587daed	2009-05-13 21:33:08 +0000	[diff] [blame]	633	MI.addReg(Reg, RegState::Define);
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	634	</pre>
				635	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	636
				637	</div>
				638
				639	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	640	<h4>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	641	<a name="fixedregs">Fixed (preassigned) registers</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	642	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	643
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	644	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	645
				646	<p>One important issue that the code generator needs to be aware of is the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	647	presence of fixed registers. In particular, there are often places in the
				648	instruction stream where the register allocator <em>must</em> arrange for a
				649	particular value to be in a particular register. This can occur due to
				650	limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
				651	with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
				652	calling conventions. In any case, the instruction selector should emit code
				653	that copies a virtual register into or out of a physical register when
				654	needed.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	655
				656	<p>For example, consider this simple LLVM example:</p>
				657
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	658	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	659	<pre>
Matthijs Kooijman	61399af	2008-06-04 15:46:35 +0000	[diff] [blame]	660	define i32 @test(i32 %X, i32 %Y) {
				661	%Z = udiv i32 %X, %Y
				662	ret i32 %Z
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	663	}
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	664	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	665	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	666
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	667	<p>The X86 instruction selector produces this machine code for the <tt>div</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	668	and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
				669	get this):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	670
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	671	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	672	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	673	;; Start of div
				674	%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
				675	%reg1027 = sar %reg1024, 31
				676	%EDX = mov %reg1027 ;; Sign extend X into EDX
				677	idiv %reg1025 ;; Divide by Y (in reg1025)
				678	%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	679
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	680	;; Start of ret
				681	%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
				682	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	683	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	684	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	685
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	686	<p>By the end of code generation, the register allocator has coalesced the
				687	registers and deleted the resultant identity moves producing the following
				688	code:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	689
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	690	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	691	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	692	;; X is in EAX, Y is in ECX
				693	mov %EAX, %EDX
				694	sar %EDX, 31
				695	idiv %ECX
				696	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	697	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	698	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	699
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	700	<p>This approach is extremely general (if it can handle the X86 architecture, it
				701	can handle anything!) and allows all of the target specific knowledge about
				702	the instruction stream to be isolated in the instruction selector. Note that
				703	physical registers should have a short lifetime for good code generation, and
				704	all physical registers are assumed dead on entry to and exit from basic
				705	blocks (before register allocation). Thus, if you need a value to be live
				706	across basic block boundaries, it <em>must</em> live in a virtual
				707	register.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	708
				709	</div>
				710
				711	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	712	<h4>
Jakob Stoklund Olesen	7739cad	2012-01-16 19:22:00 +0000	[diff] [blame]	713	<a name="callclobber">Call-clobbered registers</a>
				714	</h4>
				715
				716	<div>
				717
				718	<p>Some machine instructions, like calls, clobber a large number of physical
				719	registers. Rather than adding <code><def,dead></code> operands for
				720	all of them, it is possible to use an <code>MO_RegisterMask</code> operand
				721	instead. The register mask operand holds a bit mask of preserved registers,
				722	and everything else is considered to be clobbered by the instruction. </p>
				723
				724	</div>
				725
				726	<!-- _______________________________________________________________________ -->
				727	<h4>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	728	<a name="ssa">Machine code in SSA form</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	729	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	730
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	731	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	732
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	733	<p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are
				734	maintained in SSA-form until register allocation happens. For the most part,
				735	this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
				736	become machine code PHI nodes, and virtual registers are only allowed to have
				737	a single definition.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	738
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	739	<p>After register allocation, machine code is no longer in SSA-form because
				740	there are no virtual registers left in the code.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	741
				742	</div>
				743
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	744	</div>
				745
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	746	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	747	<h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	748	<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	749	</h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	750
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	751	<div>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	752
				753	<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	754	(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
				755	corresponds to the LLVM code input to the instruction selector, but there can
				756	be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
				757	machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
				758	"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
				759	comes from.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	760
				761	</div>
				762
				763	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	764	<h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	765	<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	766	</h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	767
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	768	<div>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	769
				770	<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	771	(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
				772	corresponds one-to-one with the LLVM function input to the instruction
				773	selector. In addition to a list of basic blocks,
				774	the <tt>MachineFunction</tt> contains a a <tt>MachineConstantPool</tt>,
				775	a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
				776	<tt>MachineRegisterInfo</tt>. See
				777	<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	778
				779	</div>
				780
Evan Cheng	2e9c724	2011-12-14 21:32:14 +0000	[diff] [blame]	781	<!-- ======================================================================= -->
				782	<h3>
				783	<a name="machineinstrbundle"><tt>MachineInstr Bundles</tt></a>
				784	</h3>
				785
				786	<div>
				787
				788	<p>LLVM code generator can model sequences of instructions as MachineInstr
				789	bundles. A MI bundle can model a VLIW group / pack which contains an
				790	arbitrary number of parallel instructions. It can also be used to model
				791	a sequential list of instructions (potentially with data dependencies) that
				792	cannot be legally separated (e.g. ARM Thumb2 IT blocks).</p>
				793
				794	<p>Conceptually a MI bundle is a MI with a number of other MIs nested within:
				795	</p>
				796
				797	<div class="doc_code">
				798	<pre>
				799	--------------
				800	\| Bundle \| ---------
				801	-------------- \
				802	\| ----------------
				803	\| \| MI \|
				804	\| ----------------
				805	\| \|
				806	\| ----------------
				807	\| \| MI \|
				808	\| ----------------
				809	\| \|
				810	\| ----------------
				811	\| \| MI \|
				812	\| ----------------
				813	\|
				814	--------------
				815	\| Bundle \| --------
				816	-------------- \
				817	\| ----------------
				818	\| \| MI \|
				819	\| ----------------
				820	\| \|
				821	\| ----------------
				822	\| \| MI \|
				823	\| ----------------
				824	\| \|
				825	\| ...
				826	\|
				827	--------------
				828	\| Bundle \| --------
				829	-------------- \
				830	\|
				831	...
				832	</pre>
				833	</div>
				834
				835	<p> MI bundle support does not change the physical representations of
				836	MachineBasicBlock and MachineInstr. All the MIs (including top level and
				837	nested ones) are stored as sequential list of MIs. The "bundled" MIs are
				838	marked with the 'InsideBundle' flag. A top level MI with the special BUNDLE
				839	opcode is used to represent the start of a bundle. It's legal to mix BUNDLE
				840	MIs with indiviual MIs that are not inside bundles nor represent bundles.
				841	</p>
				842
				843	<p> MachineInstr passes should operate on a MI bundle as a single unit. Member
				844	methods have been taught to correctly handle bundles and MIs inside bundles.
				845	The MachineBasicBlock iterator has been modified to skip over bundled MIs to
				846	enforce the bundle-as-a-single-unit concept. An alternative iterator
				847	instr_iterator has been added to MachineBasicBlock to allow passes to
				848	iterate over all of the MIs in a MachineBasicBlock, including those which
				849	are nested inside bundles. The top level BUNDLE instruction must have the
				850	correct set of register MachineOperand's that represent the cumulative
				851	inputs and outputs of the bundled MIs.</p>
				852
				853	<p> Packing / bundling of MachineInstr's should be done as part of the register
				854	allocation super-pass. More specifically, the pass which determines what
				855	MIs should be bundled together must be done after code generator exits SSA
				856	form (i.e. after two-address pass, PHI elimination, and copy coalescing).
				857	Bundles should only be finalized (i.e. adding BUNDLE MIs and input and
				858	output register MachineOperands) after virtual registers have been
				859	rewritten into physical registers. This requirement eliminates the need to
				860	add virtual register operands to BUNDLE instructions which would effectively
				861	double the virtual register def and use lists.</p>
				862
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	863	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	864
NAKAMURA Takumi	9c55f59	2012-03-27 11:25:16 +0000	[diff] [blame]	865	</div>
				866
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	867	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	868	<h2>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	869	<a name="mc">The "MC" Layer</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	870	</h2>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	871	<!-- *********************************************************************** -->
				872
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	873	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	874
				875	<p>
				876	The MC Layer is used to represent and process code at the raw machine code
				877	level, devoid of "high level" information like "constant pools", "jump tables",
				878	"global variables" or anything like that. At this level, LLVM handles things
				879	like label names, machine instructions, and sections in the object file. The
				880	code in this layer is used for a number of important purposes: the tail end of
				881	the code generator uses it to write a .s or .o file, and it is also used by the
Jay Foad	d61895a	2011-04-13 13:03:56 +0000	[diff] [blame]	882	llvm-mc tool to implement standalone machine code assemblers and disassemblers.
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	883	</p>
				884
				885	<p>
				886	This section describes some of the important classes. There are also a number
				887	of important subsystems that interact at this layer, they are described later
				888	in this manual.
				889	</p>
				890
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	891	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	892	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	893	<a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	894	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	895
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	896	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	897
				898	<p>
				899	MCStreamer is best thought of as an assembler API. It is an abstract API which
				900	is <em>implemented</em> in different ways (e.g. to output a .s file, output an
				901	ELF .o file, etc) but whose API correspond directly to what you see in a .s
				902	file. MCStreamer has one method per directive, such as EmitLabel,
				903	EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
				904	directly correspond to assembly level directives. It also has an
				905	EmitInstruction method, which is used to output an MCInst to the streamer.
				906	</p>
				907
				908	<p>
				909	This API is most important for two clients: the llvm-mc stand-alone assembler is
				910	effectively a parser that parses a line, then invokes a method on MCStreamer. In
				911	the code generator, the <a href="#codeemit">Code Emission</a> phase of the code
				912	generator lowers higher level LLVM IR and Machine* constructs down to the MC
				913	layer, emitting directives through MCStreamer.</p>
				914
				915	<p>
				916	On the implementation side of MCStreamer, there are two major implementations:
				917	one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
				918	file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation
				919	that prints out a directive for each method (e.g. EmitValue -> .byte), but
				920	MCObjectStreamer implements a full assembler.
				921	</p>
				922
				923	</div>
				924
				925	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	926	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	927	<a name="mccontext">The <tt>MCContext</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	928	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	929
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	930	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	931
				932	<p>
				933	The MCContext class is the owner of a variety of uniqued data structures at the
				934	MC layer, including symbols, sections, etc. As such, this is the class that you
				935	interact with to create symbols and sections. This class can not be subclassed.
				936	</p>
				937
				938	</div>
				939
				940	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	941	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	942	<a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	943	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	944
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	945	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	946
				947	<p>
				948	The MCSymbol class represents a symbol (aka label) in the assembly file. There
				949	are two interesting kinds of symbols: assembler temporary symbols, and normal
				950	symbols. Assembler temporary symbols are used and processed by the assembler
				951	but are discarded when the object file is produced. The distinction is usually
				952	represented by adding a prefix to the label, for example "L" labels are
				953	assembler temporary labels in MachO.
				954	</p>
				955
				956	<p>MCSymbols are created by MCContext and uniqued there. This means that
				957	MCSymbols can be compared for pointer equivalence to find out if they are the
				958	same symbol. Note that pointer inequality does not guarantee the labels will
				959	end up at different addresses though. It's perfectly legal to output something
				960	like this to the .s file:<p>
				961
				962	<pre>
				963	foo:
				964	bar:
				965	.byte 4
				966	</pre>
				967
				968	<p>In this case, both the foo and bar symbols will have the same address.</p>
				969
				970	</div>
				971
				972	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	973	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	974	<a name="mcsection">The <tt>MCSection</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	975	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	976
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	977	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	978
				979	<p>
				980	The MCSection class represents an object-file specific section. It is subclassed
				981	by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
				982	<tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
				983	by MCContext. The MCStreamer has a notion of the current section, which can be
				984	changed with the SwitchToSection method (which corresponds to a ".section"
				985	directive in a .s file).
				986	</p>
				987
				988	</div>
				989
				990	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	991	<h3>
Benjamin Kramer	943beeb	2010-10-30 21:07:28 +0000	[diff] [blame]	992	<a name="mcinst">The <tt>MCInst</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	993	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	994
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	995	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	996
				997	<p>
				998	The MCInst class is a target-independent representation of an instruction. It
				999	is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
				1000	that holds a target-specific opcode and a vector of MCOperands. MCOperand, in
				1001	turn, is a simple discriminated union of three cases: 1) a simple immediate,
				1002	2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
				1003	MCExpr.
				1004	</p>
				1005
				1006	<p>MCInst is the common currency used to represent machine instructions at the
				1007	MC layer. It is the type used by the instruction encoder, the instruction
				1008	printer, and the type generated by the assembly parser and disassembler.
				1009	</p>
				1010
				1011	</div>
				1012
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1013	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1014
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1015	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1016	<h2>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1017	<a name="codegenalgs">Target-independent code generation algorithms</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1018	</h2>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1019	<!-- *********************************************************************** -->
				1020
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1021	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1022
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1023	<p>This section documents the phases described in the
				1024	<a href="#high-level-design">high-level design of the code generator</a>.
				1025	It explains how they work and some of the rationale behind their design.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1026
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1027	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1028	<h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1029	<a name="instselect">Instruction Selection</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1030	</h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1031
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1032	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1033
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1034	<p>Instruction Selection is the process of translating LLVM code presented to
				1035	the code generator into target-specific machine instructions. There are
				1036	several well-known ways to do this in the literature. LLVM uses a
				1037	SelectionDAG based instruction selector.</p>
				1038
				1039	<p>Portions of the DAG instruction selector are generated from the target
				1040	description (<tt>*.td</tt>) files. Our goal is for the entire instruction
				1041	selector to be generated from these <tt>.td</tt> files, though currently
				1042	there are still things that require custom C++ code.</p>
				1043
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1044	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1045	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1046	<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1047	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1048
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1049	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1050
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1051	<p>The SelectionDAG provides an abstraction for code representation in a way
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1052	that is amenable to instruction selection using automatic techniques
				1053	(e.g. dynamic-programming based optimal pattern matching selectors). It is
				1054	also well-suited to other phases of code generation; in particular,
				1055	instruction scheduling (SelectionDAG's are very close to scheduling DAGs
				1056	post-selection). Additionally, the SelectionDAG provides a host
				1057	representation where a large variety of very-low-level (but
				1058	target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
				1059	performed; ones which require extensive information about the instructions
				1060	efficiently supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1061
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1062	<p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1063	<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
				1064	operation code (Opcode) that indicates what operation the node performs and
				1065	the operands to the operation. The various operation node types are
				1066	described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
				1067	file.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1068
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1069	<p>Although most operations define a single value, each node in the graph may
				1070	define multiple values. For example, a combined div/rem operation will
				1071	define both the dividend and the remainder. Many other situations require
				1072	multiple values as well. Each node also has some number of operands, which
				1073	are edges to the node defining the used value. Because nodes may define
				1074	multiple values, edges are represented by instances of the <tt>SDValue</tt>
				1075	class, which is a <tt><SDNode, unsigned></tt> pair, indicating the node
				1076	and result value being used, respectively. Each value produced by
				1077	an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
				1078	indicating what the type of the value is.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1079
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1080	<p>SelectionDAGs contain two different kinds of values: those that represent
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1081	data flow and those that represent control flow dependencies. Data values
				1082	are simple edges with an integer or floating point value type. Control edges
				1083	are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
				1084	These edges provide an ordering between nodes that have side effects (such as
				1085	loads, stores, calls, returns, etc). All nodes that have side effects should
				1086	take a token chain as input and produce a new one as output. By convention,
				1087	token chain inputs are always operand #0, and chain results are always the
				1088	last value produced by an operation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1089
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1090	<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1091	always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root
				1092	node is the final side-effecting node in the token chain. For example, in a
				1093	single basic block function it would be the return node.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1094
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1095	<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1096	"illegal" DAG. A legal DAG for a target is one that only uses supported
				1097	operations and supported types. On a 32-bit PowerPC, for example, a DAG with
				1098	a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
				1099	uses a SREM or UREM operation. The
				1100	<a href="#selectinodag_legalize_types">legalize types</a> and
				1101	<a href="#selectiondag_legalize">legalize operations</a> phases are
				1102	responsible for turning an illegal DAG into a legal DAG.</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1103
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1104	</div>
				1105
				1106	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1107	<h4>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1108	<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1109	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1110
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1111	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1112
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1113	<p>SelectionDAG-based instruction selection consists of the following steps:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1114
				1115	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1116	<li><a href="#selectiondag_build">Build initial DAG</a> — This stage
				1117	performs a simple translation from the input LLVM code to an illegal
				1118	SelectionDAG.</li>
				1119
				1120	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — This
				1121	stage performs simple optimizations on the SelectionDAG to simplify it,
				1122	and recognize meta instructions (like rotates
				1123	and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
				1124	operations. This makes the resultant code more efficient and
				1125	the <a href="#selectiondag_select">select instructions from DAG</a> phase
				1126	(below) simpler.</li>
				1127
				1128	<li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
				1129	— This stage transforms SelectionDAG nodes to eliminate any types
				1130	that are unsupported on the target.</li>
				1131
				1132	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1133	SelectionDAG optimizer is run to clean up redundancies exposed by type
				1134	legalization.</li>
				1135
Chris Lattner	7138863	2010-12-12 02:42:57 +0000	[diff] [blame]	1136	<li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> —
Chris Lattner	4c247f6	2010-12-13 00:17:12 +0000	[diff] [blame]	1137	This stage transforms SelectionDAG nodes to eliminate any operations
				1138	that are unsupported on the target.</li>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1139
				1140	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1141	SelectionDAG optimizer is run to eliminate inefficiencies introduced by
				1142	operation legalization.</li>
				1143
				1144	<li><a href="#selectiondag_select">Select instructions from DAG</a> —
				1145	Finally, the target instruction selector matches the DAG operations to
				1146	target instructions. This process translates the target-independent input
				1147	DAG into another DAG of target instructions.</li>
				1148
				1149	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
				1150	— The last phase assigns a linear order to the instructions in the
				1151	target-instruction DAG and emits them into the MachineFunction being
				1152	compiled. This step uses traditional prepass scheduling techniques.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1153	</ol>
				1154
				1155	<p>After all of these steps are complete, the SelectionDAG is destroyed and the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1156	rest of the code generation passes are run.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1157
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1158	<p>One great way to visualize what is going on here is to take advantage of a
				1159	few LLC command line options. The following options pop up a window
				1160	displaying the SelectionDAG at specific times (if you only get errors printed
				1161	to the console while using this, you probably
				1162	<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
				1163	to add support for it).</p>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1164
				1165	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1166	<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
				1167	before the first optimization pass.</li>
				1168
				1169	<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
				1170
				1171	<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
				1172	optimization pass.</li>
				1173
				1174	<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
				1175
				1176	<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1177	</ul>
				1178
				1179	<p>The <tt>-view-sunit-dags</tt> displays the Scheduler's dependency graph.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1180	This graph is based on the final SelectionDAG, with nodes that must be
				1181	scheduled together bundled into a single scheduling-unit node, and with
				1182	immediate operands and other nodes that aren't relevant for scheduling
				1183	omitted.</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1184
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1185	</div>
				1186
				1187	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1188	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1189	<a name="selectiondag_build">Initial SelectionDAG Construction</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1190	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1191
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1192	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1193
Bill Wendling	1644877	2006-08-28 03:04:05 +0000	[diff] [blame]	1194	<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1195	input by the <tt>SelectionDAGLowering</tt> class in the
				1196	<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of
				1197	this pass is to expose as much low-level, target-specific details to the
				1198	SelectionDAG as possible. This pass is mostly hard-coded (e.g. an
				1199	LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
				1200	<tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
				1201	requires target-specific hooks to lower calls, returns, varargs, etc. For
				1202	these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
				1203	interface is used.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1204
				1205	</div>
				1206
				1207	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1208	<h4>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1209	<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1210	</h4>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1211
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1212	<div>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1213
				1214	<p>The Legalize phase is in charge of converting a DAG to only use the types
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1215	that are natively supported by the target.</p>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1216
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1217	<p>There are two main ways of converting values of unsupported scalar types to
				1218	values of supported types: converting small types to larger types
				1219	("promoting"), and breaking up large integer types into smaller ones
				1220	("expanding"). For example, a target might require that all f32 values are
				1221	promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same
				1222	target might require that all i64 values be expanded into pairs of i32
				1223	values. These changes can insert sign and zero extensions as needed to make
				1224	sure that the final code has the same behavior as the input.</p>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1225
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1226	<p>There are two main ways of converting values of unsupported vector types to
				1227	value of supported types: splitting vector types, multiple times if
				1228	necessary, until a legal type is found, and extending vector types by adding
				1229	elements to the end to round them out to legal types ("widening"). If a
				1230	vector gets split all the way down to single-element parts with no supported
				1231	vector type being found, the elements are converted to scalars
				1232	("scalarizing").</p>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1233
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1234	<p>A target implementation tells the legalizer which types are supported (and
				1235	which register class to use for them) by calling the
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1236	<tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
				1237
				1238	</div>
				1239
				1240	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1241	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1242	<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1243	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1244
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1245	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1246
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1247	<p>The Legalize phase is in charge of converting a DAG to only use the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1248	operations that are natively supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1249
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1250	<p>Targets often have weird constraints, such as not supporting every operation
				1251	on every supported datatype (e.g. X86 does not support byte conditional moves
				1252	and PowerPC does not support sign-extending loads from a 16-bit memory
				1253	location). Legalize takes care of this by open-coding another sequence of
				1254	operations to emulate the operation ("expansion"), by promoting one type to a
				1255	larger type that supports the operation ("promotion"), or by using a
				1256	target-specific hook to implement the legalization ("custom").</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1257
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1258	<p>A target implementation tells the legalizer which operations are not
				1259	supported (and which of the above three actions to take) by calling the
				1260	<tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
				1261	constructor.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1262
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1263	<p>Prior to the existence of the Legalize passes, we required that every target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1264	<a href="#selectiondag_optimize">selector</a> supported and handled every
				1265	operator and type even if they are not natively supported. The introduction
				1266	of the Legalize phases allows all of the canonicalization patterns to be
				1267	shared across targets, and makes it very easy to optimize the canonicalized
				1268	code because it is still in the form of a DAG.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1269
				1270	</div>
				1271
				1272	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1273	<h4>
				1274	<a name="selectiondag_optimize">
				1275	SelectionDAG Optimization Phase: the DAG Combiner
				1276	</a>
				1277	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1278
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1279	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1280
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1281	<p>The SelectionDAG optimization phase is run multiple times for code
				1282	generation, immediately after the DAG is built and once after each
				1283	legalization. The first run of the pass allows the initial code to be
				1284	cleaned up (e.g. performing optimizations that depend on knowing that the
				1285	operators have restricted type inputs). Subsequent runs of the pass clean up
				1286	the messy code generated by the Legalize passes, which allows Legalize to be
				1287	very simple (it can focus on making code legal instead of focusing on
				1288	generating <em>good</em> and legal code).</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1289
				1290	<p>One important class of optimizations performed is optimizing inserted sign
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1291	and zero extension instructions. We currently use ad-hoc techniques, but
				1292	could move to more rigorous techniques in the future. Here are some good
				1293	papers on the subject:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1294
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1295	<p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
				1296	integer arithmetic</a>"<br>
				1297	Kevin Redwine and Norman Ramsey<br>
				1298	International Conference on Compiler Construction (CC) 2004</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1299
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1300	<p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
				1301	sign extension elimination</a>"<br>
				1302	Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
				1303	Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
				1304	and Implementation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1305
				1306	</div>
				1307
				1308	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1309	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1310	<a name="selectiondag_select">SelectionDAG Select Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1311	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1312
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1313	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1314
				1315	<p>The Select phase is the bulk of the target-specific code for instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1316	selection. This phase takes a legal SelectionDAG as input, pattern matches
				1317	the instructions supported by the target to this DAG, and produces a new DAG
				1318	of target code. For example, consider the following LLVM fragment:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1319
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1320	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1321	<pre>
Dan Gohman	a9445e1	2010-03-02 01:11:08 +0000	[diff] [blame]	1322	%t1 = fadd float %W, %X
				1323	%t2 = fmul float %t1, %Y
				1324	%t3 = fadd float %t2, %Z
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1325	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1326	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1327
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1328	<p>This LLVM code corresponds to a SelectionDAG that looks basically like
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1329	this:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1330
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1331	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1332	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1333	(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1334	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1335	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1336
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1337	<p>If a target supports floating point multiply-and-add (FMA) operations, one of
				1338	the adds can be merged with the multiply. On the PowerPC, for example, the
				1339	output of the instruction selector might look like this DAG:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1340
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1341	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1342	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1343	(FMADDS (FADDS W, X), Y, Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1344	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1345	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1346
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1347	<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
				1348	first two operands and adds the third (as single-precision floating-point
				1349	numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
				1350	add instruction. To perform this pattern match, the PowerPC backend includes
				1351	the following instruction definitions:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1352
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1353	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1354	<pre>
				1355	def FMADDS : AForm_1<59, 29,
				1356	(ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
				1357	"fmadds $FRT, $FRA, $FRC, $FRB",
				1358	[<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
				1359	F4RC:$FRB))</b>]>;
				1360	def FADDS : AForm_2<59, 21,
				1361	(ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
				1362	"fadds $FRT, $FRA, $FRB",
				1363	[<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]>;
				1364	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1365	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1366
				1367	<p>The portion of the instruction definition in bold indicates the pattern used
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1368	to match the instruction. The DAG operators
				1369	(like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
Dan Gohman	6a4824c	2010-03-25 00:03:04 +0000	[diff] [blame]	1370	the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file. "
				1371	<tt>F4RC</tt>" is the register class of the input and result values.</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1372
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1373	<p>The TableGen DAG instruction selector generator reads the instruction
				1374	patterns in the <tt>.td</tt> file and automatically builds parts of the
				1375	pattern matching code for your target. It has the following strengths:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1376
				1377	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1378	<li>At compiler-compiler time, it analyzes your instruction patterns and tells
				1379	you if your patterns make sense or not.</li>
				1380
				1381	<li>It can handle arbitrary constraints on operands for the pattern match. In
				1382	particular, it is straight-forward to say things like "match any immediate
				1383	that is a 13-bit sign-extended value". For examples, see the
				1384	<tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
				1385	backend.</li>
				1386
				1387	<li>It knows several important identities for the patterns defined. For
				1388	example, it knows that addition is commutative, so it allows the
				1389	<tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
				1390	well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
				1391	to specially handle this case.</li>
				1392
				1393	<li>It has a full-featured type-inferencing system. In particular, you should
				1394	rarely have to explicitly tell the system what type parts of your patterns
				1395	are. In the <tt>FMADDS</tt> case above, we didn't have to tell
				1396	<tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
				1397	It was able to infer and propagate this knowledge from the fact that
				1398	<tt>F4RC</tt> has type 'f32'.</li>
				1399
				1400	<li>Targets can define their own (and rely on built-in) "pattern fragments".
				1401	Pattern fragments are chunks of reusable patterns that get inlined into
				1402	your patterns during compiler-compiler time. For example, the integer
				1403	"<tt>(not x)</tt>" operation is actually defined as a pattern fragment
				1404	that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
				1405	have a native '<tt>not</tt>' operation. Targets can define their own
				1406	short-hand fragments as they see fit. See the definition of
				1407	'<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
				1408
				1409	<li>In addition to instructions, targets can specify arbitrary patterns that
				1410	map to one or more instructions using the 'Pat' class. For example, the
				1411	PowerPC has no way to load an arbitrary integer immediate into a register
				1412	in one instruction. To tell tblgen how to do this, it defines:
				1413	<br>
				1414	<br>
				1415	<div class="doc_code">
				1416	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1417	// Arbitrary immediate support. Implement in terms of LIS/ORI.
				1418	def : Pat<(i32 imm:$imm),
				1419	(ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1420	</pre>
				1421	</div>
				1422	<br>
				1423	If none of the single-instruction patterns for loading an immediate into a
				1424	register match, this will be used. This rule says "match an arbitrary i32
				1425	immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
				1426	an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
				1427	the left 16 bits') instruction". To make this work, the
				1428	<tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
				1429	the input immediate (in this case, take the high or low 16-bits of the
				1430	immediate).</li>
				1431
				1432	<li>While the system does automate a lot, it still allows you to write custom
				1433	C++ code to match special cases if there is something that is hard to
				1434	express.</li>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1435	</ul>
				1436
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1437	<p>While it has many strengths, the system currently has some limitations,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1438	primarily because it is a work in progress and is not yet finished:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1439
				1440	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1441	<li>Overall, there is no way to define or match SelectionDAG nodes that define
Dan Gohman	e370c80	2009-04-22 15:55:31 +0000	[diff] [blame]	1442	multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1443	etc). This is the biggest reason that you currently still <em>have
				1444	to</em> write custom C++ code for your instruction selector.</li>
				1445
				1446	<li>There is no great way to support matching complex addressing modes yet.
				1447	In the future, we will extend pattern fragments to allow them to define
				1448	multiple values (e.g. the four operands of the <a href="#x86_memory">X86
				1449	addressing mode</a>, which are currently matched with custom C++ code).
				1450	In addition, we'll extend fragments so that a fragment can match multiple
				1451	different patterns.</li>
				1452
				1453	<li>We don't automatically infer flags like isStore/isLoad yet.</li>
				1454
				1455	<li>We don't automatically generate the set of supported registers and
				1456	operations for the <a href="#selectiondag_legalize">Legalizer</a>
				1457	yet.</li>
				1458
				1459	<li>We don't have a way of tying in custom legalized nodes yet.</li>
Chris Lattner	7d6915c	2005-10-17 04:18:41 +0000	[diff] [blame]	1460	</ul>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1461
				1462	<p>Despite these limitations, the instruction selector generator is still quite
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1463	useful for most of the binary and logical operations in typical instruction
				1464	sets. If you run into any problems or can't figure out how to do something,
				1465	please let Chris know!</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1466
				1467	</div>
				1468
				1469	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1470	<h4>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1471	<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1472	</h4>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1473
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1474	<div>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1475
				1476	<p>The scheduling phase takes the DAG of target instructions from the selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1477	phase and assigns an order. The scheduler can pick an order depending on
				1478	various constraints of the machines (i.e. order for minimal register pressure
				1479	or try to cover instruction latencies). Once an order is established, the
				1480	DAG is converted to a list
				1481	of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
				1482	destroyed.</p>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1483
Jeff Cohen	0b81cda	2005-10-24 16:54:55 +0000	[diff] [blame]	1484	<p>Note that this phase is logically separate from the instruction selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1485	phase, but is tied to it closely in the code because it operates on
				1486	SelectionDAGs.</p>
Chris Lattner	c38959f	2005-10-17 03:09:31 +0000	[diff] [blame]	1487
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1488	</div>
				1489
				1490	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1491	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1492	<a name="selectiondag_future">Future directions for the SelectionDAG</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1493	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1494
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1495	<div>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1496
				1497	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1498	<li>Optional function-at-a-time selection.</li>
				1499
				1500	<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1501	</ol>
				1502
				1503	</div>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1504
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1505	</div>
				1506
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1507	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1508	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1509	<a name="ssamco">SSA-based Machine Code Optimizations</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1510	</h3>
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1511	<div><p>To Be Written</p></div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1512
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1513	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1514	<h3>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1515	<a name="liveintervals">Live Intervals</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1516	</h3>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1517
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1518	<div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1519
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1520	<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1521	They are used by some <a href="#regalloc">register allocator</a> passes to
				1522	determine if two or more virtual registers which require the same physical
				1523	register are live at the same point in the program (i.e., they conflict).
				1524	When this situation occurs, one virtual register must be <i>spilled</i>.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1525
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1526	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1527	<h4>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1528	<a name="livevariable_analysis">Live Variable Analysis</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1529	</h4>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1530
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1531	<div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1532
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1533	<p>The first step in determining the live intervals of variables is to calculate
				1534	the set of registers that are immediately dead after the instruction (i.e.,
				1535	the instruction calculates the value, but it is never used) and the set of
				1536	registers that are used by the instruction, but are never used after the
				1537	instruction (i.e., they are killed). Live variable information is computed
				1538	for each <i>virtual</i> register and <i>register allocatable</i> physical
				1539	register in the function. This is done in a very efficient manner because it
				1540	uses SSA to sparsely compute lifetime information for virtual registers
				1541	(which are in SSA form) and only has to track physical registers within a
				1542	block. Before register allocation, LLVM can assume that physical registers
				1543	are only live within a single basic block. This allows it to do a single,
				1544	local analysis to resolve physical register lifetimes within each basic
				1545	block. If a physical register is not register allocatable (e.g., a stack
				1546	pointer or condition codes), it is not tracked.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1547
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1548	<p>Physical registers may be live in to or out of a function. Live in values are
				1549	typically arguments in registers. Live out values are typically return values
				1550	in registers. Live in values are marked as such, and are given a dummy
				1551	"defining" instruction during live intervals analysis. If the last basic
				1552	block of a function is a <tt>return</tt>, then it's marked as using all live
				1553	out values in the function.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1554
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1555	<p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
				1556	the live variable information from a depth first traversal of the CFG of the
				1557	function won't guarantee that a virtual register used by the <tt>PHI</tt>
				1558	node is defined before it's used. When a <tt>PHI</tt> node is encountered,
				1559	only the definition is handled, because the uses will be handled in other
				1560	basic blocks.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1561
				1562	<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1563	assignment at the end of the current basic block and traverse the successor
				1564	basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
				1565	the <tt>PHI</tt> node's operands is coming from the current basic block, then
				1566	the variable is marked as <i>alive</i> within the current basic block and all
				1567	of its predecessor basic blocks, until the basic block with the defining
				1568	instruction is encountered.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1569
				1570	</div>
				1571
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1572	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1573	<h4>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1574	<a name="liveintervals_analysis">Live Intervals Analysis</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1575	</h4>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1576
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1577	<div>
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1578
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1579	<p>We now have the information available to perform the live intervals analysis
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1580	and build the live intervals themselves. We start off by numbering the basic
				1581	blocks and machine instructions. We then handle the "live-in" values. These
				1582	are in physical registers, so the physical register is assumed to be killed
				1583	by the end of the basic block. Live intervals for virtual registers are
				1584	computed for some ordering of the machine instructions <tt>[1, N]</tt>. A
				1585	live interval is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j
				1586	< N</tt>, for which a variable is live.</p>
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1587
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1588	<p><i><b>More to come...</b></i></p>
				1589
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1590	</div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1591
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1592	</div>
				1593
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1594	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1595	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1596	<a name="regalloc">Register Allocation</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1597	</h3>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1598
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1599	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1600
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1601	<p>The <i>Register Allocation problem</i> consists in mapping a program
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1602	<i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
				1603	to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
				1604	number of physical registers. Each target architecture has a different number
				1605	of physical registers. If the number of physical registers is not enough to
				1606	accommodate all the virtual registers, some of them will have to be mapped
				1607	into memory. These virtuals are called <i>spilled virtuals</i>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1608
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1609	<!-- _______________________________________________________________________ -->
				1610
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1611	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1612	<a name="regAlloc_represent">How registers are represented in LLVM</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1613	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1614
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1615	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1616
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1617	<p>In LLVM, physical registers are denoted by integer numbers that normally
				1618	range from 1 to 1023. To see how this numbering is defined for a particular
				1619	architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
				1620	architecture. For instance, by
Jakub Staszak	00920f6	2012-04-15 20:13:47 +0000	[diff] [blame]	1621	inspecting <tt>lib/Target/X86/X86GenRegisterInfo.inc</tt> we see that the
				1622	32-bit register <tt>EAX</tt> is denoted by 43, and the MMX register
				1623	<tt>MM0</tt> is mapped to 65.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1624
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1625	<p>Some architectures contain registers that share the same physical location. A
				1626	notable example is the X86 platform. For instance, in the X86 architecture,
				1627	the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
				1628	bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
				1629	particular architecture, you can check which registers are aliased by
Jakob Stoklund Olesen	22de91a	2012-06-01 23:28:34 +0000	[diff] [blame]	1630	inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the class
				1631	<tt>MCRegAliasIterator</tt> enumerates all the physical registers aliased to
				1632	a register.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1633
				1634	<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1635	Elements in the same register class are functionally equivalent, and can be
				1636	interchangeably used. Each virtual register can only be mapped to physical
				1637	registers of a particular class. For instance, in the X86 architecture, some
				1638	virtuals can only be allocated to 8 bit registers. A register class is
				1639	described by <tt>TargetRegisterClass</tt> objects. To discover if a virtual
				1640	register is compatible with a given physical, this code can be used:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1641
				1642	<div class="doc_code">
				1643	<pre>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	1644	bool RegMapping_Fer::compatible_class(MachineFunction &mf,
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1645	unsigned v_reg,
				1646	unsigned p_reg) {
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	1647	assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1648	"Target register must be physical");
Chris Lattner	534bcfb	2007-12-31 04:16:08 +0000	[diff] [blame]	1649	const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
				1650	return trc->contains(p_reg);
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1651	}
				1652	</pre>
				1653	</div>
				1654
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1655	<p>Sometimes, mostly for debugging purposes, it is useful to change the number
				1656	of physical registers available in the target architecture. This must be done
				1657	statically, inside the <tt>TargetRegsterInfo.td</tt> file. Just <tt>grep</tt>
				1658	for <tt>RegisterClass</tt>, the last parameter of which is a list of
				1659	registers. Just commenting some out is one simple way to avoid them being
				1660	used. A more polite way is to explicitly exclude some registers from
Dan Gohman	d2cb3d2	2009-07-24 00:30:09 +0000	[diff] [blame]	1661	the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
				1662	class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
				1663	</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1664
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1665	<p>Virtual registers are also denoted by integer numbers. Contrary to physical
Jakob Stoklund Olesen	3ca2102	2011-01-08 23:10:59 +0000	[diff] [blame]	1666	registers, different virtual registers never share the same number. Whereas
				1667	physical registers are statically defined in a <tt>TargetRegisterInfo.td</tt>
				1668	file and cannot be created by the application developer, that is not the case
				1669	with virtual registers. In order to create new virtual registers, use the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1670	method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
Jakob Stoklund Olesen	3ca2102	2011-01-08 23:10:59 +0000	[diff] [blame]	1671	will return a new virtual register. Use an <tt>IndexedMap<Foo,
				1672	VirtReg2IndexFunctor></tt> to hold information per virtual register. If you
				1673	need to enumerate all virtual registers, use the function
				1674	<tt>TargetRegisterInfo::index2VirtReg()</tt> to find the virtual register
				1675	numbers:</p>
				1676
				1677	<div class="doc_code">
				1678	<pre>
				1679	for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
				1680	unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
				1681	stuff(VirtReg);
				1682	}
				1683	</pre>
				1684	</div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1685
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1686	<p>Before register allocation, the operands of an instruction are mostly virtual
				1687	registers, although physical registers may also be used. In order to check if
				1688	a given machine operand is a register, use the boolean
				1689	function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
				1690	a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
				1691	or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
				1692	defines the registers 1024, and uses registers 1025 and 1026. Given a
				1693	register operand, the method <tt>MachineOperand::isUse()</tt> informs if that
				1694	register is being used by the instruction. The
				1695	method <tt>MachineOperand::isDef()</tt> informs if that registers is being
				1696	defined.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1697
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1698	<p>We will call physical registers present in the LLVM bitcode before register
				1699	allocation <i>pre-colored registers</i>. Pre-colored registers are used in
				1700	many different situations, for instance, to pass parameters of functions
				1701	calls, and to store results of particular instructions. There are two types
				1702	of pre-colored registers: the ones <i>implicitly</i> defined, and
				1703	those <i>explicitly</i> defined. Explicitly defined registers are normal
				1704	operands, and can be accessed
				1705	with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In order to check
				1706	which registers are implicitly defined by an instruction, use
				1707	the <tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>,
				1708	where <tt>opcode</tt> is the opcode of the target instruction. One important
				1709	difference between explicit and implicit physical registers is that the
				1710	latter are defined statically for each instruction, whereas the former may
				1711	vary depending on the program being compiled. For example, an instruction
				1712	that represents a function call will always implicitly define or use the same
				1713	set of physical registers. To read the registers implicitly used by an
				1714	instruction,
				1715	use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
				1716	registers impose constraints on any register allocation algorithm. The
Bob Wilson	0473868	2010-04-09 18:39:54 +0000	[diff] [blame]	1717	register allocator must make sure that none of them are overwritten by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1718	the values of virtual registers while still alive.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1719
				1720	</div>
				1721
				1722	<!-- _______________________________________________________________________ -->
				1723
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1724	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1725	<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1726	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1727
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1728	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1729
				1730	<p>There are two ways to map virtual registers to physical registers (or to
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1731	memory slots). The first way, that we will call <i>direct mapping</i>, is
				1732	based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
				1733	and <tt>MachineOperand</tt>. The second way, that we will call <i>indirect
				1734	mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
				1735	and stores sending and getting values to and from memory.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1736
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1737	<p>The direct mapping provides more flexibility to the developer of the register
				1738	allocator; however, it is more error prone, and demands more implementation
				1739	work. Basically, the programmer will have to specify where load and store
				1740	instructions should be inserted in the target function being compiled in
				1741	order to get and store values in memory. To assign a physical register to a
				1742	virtual register present in a given operand,
				1743	use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
Jakob Stoklund Olesen	297907f	2010-08-31 22:01:07 +0000	[diff] [blame]	1744	use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
				1745	load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1746
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1747	<p>The indirect mapping shields the application developer from the complexities
				1748	of inserting load and store instructions. In order to map a virtual register
				1749	to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In
				1750	order to map a certain virtual register to memory,
				1751	use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
				1752	the stack slot where <tt>vreg</tt>'s value will be located. If it is
				1753	necessary to map another virtual register to the same stack slot,
				1754	use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
				1755	important point to consider when using the indirect mapping, is that even if
				1756	a virtual register is mapped to memory, it still needs to be mapped to a
				1757	physical register. This physical register is the location where the virtual
				1758	register is supposed to be found before being stored or after being
				1759	reloaded.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1760
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1761	<p>If the indirect strategy is used, after all the virtual registers have been
				1762	mapped to physical registers or stack slots, it is necessary to use a spiller
				1763	object to place load and store instructions in the code. Every virtual that
				1764	has been mapped to a stack slot will be stored to memory after been defined
				1765	and will be loaded before being used. The implementation of the spiller tries
				1766	to recycle load/store instructions, avoiding unnecessary instructions. For an
				1767	example of how to invoke the spiller,
				1768	see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
				1769	in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1770
				1771	</div>
				1772
				1773	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1774	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1775	<a name="regAlloc_twoAddr">Handling two address instructions</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1776	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1777
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1778	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1779
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1780	<p>With very rare exceptions (e.g., function calls), the LLVM machine code
				1781	instructions are three address instructions. That is, each instruction is
				1782	expected to define at most one register, and to use at most two registers.
				1783	However, some architectures use two address instructions. In this case, the
				1784	defined register is also one of the used register. For instance, an
				1785	instruction such as <tt>ADD %EAX, %EBX</tt>, in X86 is actually equivalent
				1786	to <tt>%EAX = %EAX + %EBX</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1787
				1788	<p>In order to produce correct code, LLVM must convert three address
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1789	instructions that represent two address instructions into true two address
				1790	instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
				1791	this specific purpose. It must be run before register allocation takes
				1792	place. After its execution, the resulting code may no longer be in SSA
				1793	form. This happens, for instance, in situations where an instruction such
				1794	as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1795
				1796	<div class="doc_code">
				1797	<pre>
				1798	%a = MOVE %b
Dan Gohman	03e5857	2008-06-13 17:55:57 +0000	[diff] [blame]	1799	%a = ADD %a %c
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1800	</pre>
				1801	</div>
				1802
				1803	<p>Notice that, internally, the second instruction is represented as
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1804	<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
				1805	used and defined by the instruction.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1806
				1807	</div>
				1808
				1809	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1810	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1811	<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1812	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1813
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1814	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1815
				1816	<p>An important transformation that happens during register allocation is called
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1817	the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
				1818	that are performed on the control flow graph of programs. However,
				1819	traditional instruction sets do not implement PHI instructions. Thus, in
				1820	order to generate executable code, compilers must replace PHI instructions
				1821	with other instructions that preserve their semantics.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1822
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1823	<p>There are many ways in which PHI instructions can safely be removed from the
				1824	target code. The most traditional PHI deconstruction algorithm replaces PHI
				1825	instructions with copy instructions. That is the strategy adopted by
				1826	LLVM. The SSA deconstruction algorithm is implemented
				1827	in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
				1828	identifier <tt>PHIEliminationID</tt> must be marked as required in the code
				1829	of the register allocator.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1830
				1831	</div>
				1832
				1833	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1834	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1835	<a name="regAlloc_fold">Instruction folding</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1836	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1837
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1838	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1839
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1840	<p><i>Instruction folding</i> is an optimization performed during register
				1841	allocation that removes unnecessary copy instructions. For instance, a
				1842	sequence of instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1843
				1844	<div class="doc_code">
				1845	<pre>
				1846	%EBX = LOAD %mem_address
				1847	%EAX = COPY %EBX
				1848	</pre>
				1849	</div>
				1850
Dan Gohman	a7ab2bf	2008-11-24 16:35:31 +0000	[diff] [blame]	1851	<p>can be safely substituted by the single instruction:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1852
				1853	<div class="doc_code">
				1854	<pre>
				1855	%EAX = LOAD %mem_address
				1856	</pre>
				1857	</div>
				1858
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1859	<p>Instructions can be folded with
				1860	the <tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
				1861	taken when folding instructions; a folded instruction can be quite different
				1862	from the original
				1863	instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
				1864	in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
				1865	use.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1866
				1867	</div>
				1868
				1869	<!-- _______________________________________________________________________ -->
				1870
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1871	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1872	<a name="regAlloc_builtIn">Built in register allocators</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1873	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1874
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1875	<div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1876
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1877	<p>The LLVM infrastructure provides the application developer with three
				1878	different register allocators:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1879
				1880	<ul>
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1881	<li><i>Fast</i> — This register allocator is the default for debug
				1882	builds. It allocates registers on a basic block level, attempting to keep
				1883	values in registers and reusing registers as appropriate.</li>
				1884
Andrew Trick	5acaeb5	2011-07-26 18:31:49 +0000	[diff] [blame]	1885	<li><i>Basic</i> — This is an incremental approach to register
				1886	allocation. Live ranges are assigned to registers one at a time in
				1887	an order that is driven by heuristics. Since code can be rewritten
				1888	on-the-fly during allocation, this framework allows interesting
				1889	allocators to be developed as extensions. It is not itself a
				1890	production register allocator but is a potentially useful
				1891	stand-alone mode for triaging bugs and as a performance baseline.
				1892
				1893	<li><i>Greedy</i> — <i>The default allocator</i>. This is a
				1894	highly tuned implementation of the <i>Basic</i> allocator that
				1895	incorporates global live range splitting. This allocator works hard
				1896	to minimize the cost of spill code.
				1897
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1898	<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
				1899	based register allocator. This allocator works by constructing a PBQP
				1900	problem representing the register allocation problem under consideration,
				1901	solving this using a PBQP solver, and mapping the solution back to a
				1902	register assignment.</li>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1903	</ul>
				1904
				1905	<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1906	command line option <tt>-regalloc=...</tt>:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1907
				1908	<div class="doc_code">
				1909	<pre>
Dan Gohman	0cabaa5	2009-08-25 15:54:01 +0000	[diff] [blame]	1910	$ llc -regalloc=linearscan file.bc -o ln.s;
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1911	$ llc -regalloc=fast file.bc -o fa.s;
				1912	$ llc -regalloc=pbqp file.bc -o pbqp.s;
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1913	</pre>
				1914	</div>
				1915
				1916	</div>
				1917
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	1918	</div>
				1919
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1920	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1921	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1922	<a name="proepicode">Prolog/Epilog Code Insertion</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1923	</h3>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	1924
NAKAMURA Takumi	4b2e07a	2011-10-31 13:04:26 +0000	[diff] [blame]	1925	<div>
				1926
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	1927	<!-- _______________________________________________________________________ -->
				1928	<h4>
				1929	<a name="compact_unwind">Compact Unwind</a>
				1930	</h4>
				1931
				1932	<div>
				1933
Bill Wendling	75471d6	2011-07-26 07:58:09 +0000	[diff] [blame]	1934	<p>Throwing an exception requires <em>unwinding</em> out of a function. The
				1935	information on how to unwind a given function is traditionally expressed in
				1936	DWARF unwind (a.k.a. frame) info. But that format was originally developed
				1937	for debuggers to backtrace, and each Frame Description Entry (FDE) requires
				1938	~20-30 bytes per function. There is also the cost of mapping from an address
				1939	in a function to the corresponding FDE at runtime. An alternative unwind
				1940	encoding is called <em>compact unwind</em> and requires just 4-bytes per
				1941	function.</p>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	1942
				1943	<p>The compact unwind encoding is a 32-bit value, which is encoded in an
				1944	architecture-specific way. It specifies which registers to restore and from
Roman Divacky	4d7ce32	2011-08-01 20:38:27 +0000	[diff] [blame]	1945	where, and how to unwind out of the function. When the linker creates a final
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	1946	linked image, it will create a <code>__TEXT,__unwind_info</code>
				1947	section. This section is a small and fast way for the runtime to access
				1948	unwind info for any given function. If we emit compact unwind info for the
				1949	function, that compact unwind info will be encoded in
				1950	the <code>__TEXT,__unwind_info</code> section. If we emit DWARF unwind info,
				1951	the <code>__TEXT,__unwind_info</code> section will contain the offset of the
				1952	FDE in the <code>__TEXT,__eh_frame</code> section in the final linked
				1953	image.</p>
				1954
				1955	<p>For X86, there are three modes for the compact unwind encoding:</p>
				1956
Bill Wendling	75471d6	2011-07-26 07:58:09 +0000	[diff] [blame]	1957	<dl>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	1958	<dt><i>Function with a Frame Pointer (<code>EBP</code> or <code>RBP</code>)</i></dt>
				1959	<dd><p><code>EBP/RBP</code>-based frame, where <code>EBP/RBP</code> is pushed
				1960	onto the stack immediately after the return address,
				1961	then <code>ESP/RSP</code> is moved to <code>EBP/RBP</code>. Thus to
				1962	unwind, <code>ESP/RSP</code> is restored with the
				1963	current <code>EBP/RBP</code> value, then <code>EBP/RBP</code> is restored
				1964	by popping the stack, and the return is done by popping the stack once
				1965	more into the PC. All non-volatile registers that need to be restored must
				1966	have been saved in a small range on the stack that
				1967	starts <code>EBP-4</code> to <code>EBP-1020</code> (<code>RBP-8</code>
Bill Wendling	75471d6	2011-07-26 07:58:09 +0000	[diff] [blame]	1968	to <code>RBP-1020</code>). The offset (divided by 4 in 32-bit mode and 8
				1969	in 64-bit mode) is encoded in bits 16-23 (mask: <code>0x00FF0000</code>).
				1970	The registers saved are encoded in bits 0-14
				1971	(mask: <code>0x00007FFF</code>) as five 3-bit entries from the following
				1972	table:</p>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	1973	<table border="1" cellspacing="0">
				1974	<tr>
				1975	<th>Compact Number</th>
				1976	<th>i386 Register</th>
				1977	<th>x86-64 Regiser</th>
				1978	</tr>
				1979	<tr>
				1980	<td>1</td>
				1981	<td><code>EBX</code></td>
				1982	<td><code>RBX</code></td>
				1983	</tr>
				1984	<tr>
				1985	<td>2</td>
				1986	<td><code>ECX</code></td>
				1987	<td><code>R12</code></td>
				1988	</tr>
				1989	<tr>
				1990	<td>3</td>
				1991	<td><code>EDX</code></td>
				1992	<td><code>R13</code></td>
				1993	</tr>
				1994	<tr>
				1995	<td>4</td>
				1996	<td><code>EDI</code></td>
				1997	<td><code>R14</code></td>
				1998	</tr>
				1999	<tr>
				2000	<td>5</td>
				2001	<td><code>ESI</code></td>
				2002	<td><code>R15</code></td>
				2003	</tr>
				2004	<tr>
				2005	<td>6</td>
				2006	<td><code>EBP</code></td>
				2007	<td><code>RBP</code></td>
				2008	</tr>
				2009	</table>
				2010
				2011	</dd>
				2012
Bill Wendling	3ef750d	2011-07-25 20:25:03 +0000	[diff] [blame]	2013	<dt><i>Frameless with a Small Constant Stack Size (<code>EBP</code>
				2014	or <code>RBP</code> is not used as a frame pointer)</i></dt>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	2015	<dd><p>To return, a constant (encoded in the compact unwind encoding) is added
				2016	to the <code>ESP/RSP</code>. Then the return is done by popping the stack
				2017	into the PC. All non-volatile registers that need to be restored must have
				2018	been saved on the stack immediately after the return address. The stack
Bill Wendling	75471d6	2011-07-26 07:58:09 +0000	[diff] [blame]	2019	size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits
				2020	16-23 (mask: <code>0x00FF0000</code>). There is a maximum stack size of
				2021	1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers
				2022	saved is encoded in bits 9-12 (mask: <code>0x00001C00</code>). Bits 0-9
				2023	(mask: <code>0x000003FF</code>) contain which registers were saved and
				2024	their order. (See
				2025	the <code>encodeCompactUnwindRegistersWithoutFrame()</code> function
				2026	in <code>lib/Target/X86FrameLowering.cpp</code> for the encoding
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	2027	algorithm.)</p></dd>
				2028
Bill Wendling	3ef750d	2011-07-25 20:25:03 +0000	[diff] [blame]	2029	<dt><i>Frameless with a Large Constant Stack Size (<code>EBP</code>
				2030	or <code>RBP</code> is not used as a frame pointer)</i></dt>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	2031	<dd><p>This case is like the "Frameless with a Small Constant Stack Size"
Bill Wendling	3ef750d	2011-07-25 20:25:03 +0000	[diff] [blame]	2032	case, but the stack size is too large to encode in the compact unwind
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	2033	encoding. Instead it requires that the function contains "<code>subl
				2034	$nnnnnn, %esp</code>" in its prolog. The compact encoding contains the
Bill Wendling	3ef750d	2011-07-25 20:25:03 +0000	[diff] [blame]	2035	offset to the <code>$nnnnnn</code> value in the function in bits 9-12
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	2036	(mask: <code>0x00001C00</code>).</p></dd>
Bill Wendling	75471d6	2011-07-26 07:58:09 +0000	[diff] [blame]	2037	</dl>
Bill Wendling	66bc5c6	2011-07-25 20:19:48 +0000	[diff] [blame]	2038
				2039	</div>
				2040
NAKAMURA Takumi	4b2e07a	2011-10-31 13:04:26 +0000	[diff] [blame]	2041	</div>
				2042
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	2043	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2044	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	2045	<a name="latemco">Late Machine Code Optimizations</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2046	</h3>
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2047	<div><p>To Be Written</p></div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2048
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	2049	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2050	<h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2051	<a name="codeemit">Code Emission</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2052	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2053
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2054	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2055
				2056	<p>The code emission step of code generation is responsible for lowering from
				2057	the code generator abstractions (like <a
				2058	href="#machinefunction">MachineFunction</a>, <a
				2059	href="#machineinstr">MachineInstr</a>, etc) down
				2060	to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
				2061	<a href="#mcstreamer">MCStreamer</a>, etc). This is
				2062	done with a combination of several different classes: the (misnamed)
				2063	target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
				2064	(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
				2065
				2066	<p>Since the MC layer works at the level of abstraction of object files, it
				2067	doesn't have a notion of functions, global variables etc. Instead, it thinks
				2068	about labels, directives, and instructions. A key class used at this time is
				2069	the MCStreamer class. This is an abstract API that is implemented in different
				2070	ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively
				2071	an "assembler API". MCStreamer has one method per directive, such as EmitLabel,
				2072	EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
				2073	level directives.
				2074	</p>
				2075
				2076	<p>If you are interested in implementing a code generator for a target, there
				2077	are three important things that you have to implement for your target:</p>
				2078
				2079	<ol>
				2080	<li>First, you need a subclass of AsmPrinter for your target. This class
				2081	implements the general lowering process converting MachineFunction's into MC
				2082	label constructs. The AsmPrinter base class provides a number of useful methods
				2083	and routines, and also allows you to override the lowering process in some
				2084	important ways. You should get much of the lowering for free if you are
				2085	implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
				2086	class implements much of the common logic.</li>
				2087
				2088	<li>Second, you need to implement an instruction printer for your target. The
				2089	instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
				2090	raw_ostream as text. Most of this is automatically generated from the .td file
				2091	(when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
				2092	instructions), but you need to implement routines to print operands.</li>
				2093
				2094	<li>Third, you need to implement code that lowers a <a
				2095	href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
				2096	"<target>MCInstLower.cpp". This lowering process is often target
				2097	specific, and is responsible for turning jump table entries, constant pool
				2098	indices, global variable addresses, etc into MCLabels as appropriate. This
				2099	translation layer is also responsible for expanding pseudo ops used by the code
				2100	generator into the actual machine instructions they correspond to. The MCInsts
				2101	that are generated by this are fed into the instruction printer or the encoder.
				2102	</li>
				2103
				2104	</ol>
				2105
				2106	<p>Finally, at your choosing, you can also implement an subclass of
				2107	MCCodeEmitter which lowers MCInst's into machine code bytes and relocations.
				2108	This is important if you want to support direct .o file emission, or would like
				2109	to implement an assembler for your target.</p>
				2110
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2111	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2112
Anshuman Dasgupta	6805b56	2011-12-06 23:12:42 +0000	[diff] [blame]	2113	<!-- ======================================================================= -->
				2114	<h3>
				2115	<a name="vliw_packetizer">VLIW Packetizer</a>
				2116	</h3>
				2117
				2118	<div>
				2119
				2120	<p>In a Very Long Instruction Word (VLIW) architecture, the compiler is
				2121	responsible for mapping instructions to functional-units available on
				2122	the architecture. To that end, the compiler creates groups of instructions
				2123	called <i>packets</i> or <i>bundles</i>. The VLIW packetizer in LLVM is
				2124	a target-independent mechanism to enable the packetization of machine
				2125	instructions.</p>
				2126
				2127	<!-- _______________________________________________________________________ -->
				2128
				2129	<h4>
				2130	<a name="vliw_mapping">Mapping from instructions to functional units</a>
				2131	</h4>
				2132
				2133	<div>
				2134
				2135	<p>Instructions in a VLIW target can typically be mapped to multiple functional
				2136	units. During the process of packetizing, the compiler must be able to reason
				2137	about whether an instruction can be added to a packet. This decision can be
				2138	complex since the compiler has to examine all possible mappings of instructions
				2139	to functional units. Therefore to alleviate compilation-time complexity, the
				2140	VLIW packetizer parses the instruction classes of a target and generates tables
				2141	at compiler build time. These tables can then be queried by the provided
				2142	machine-independent API to determine if an instruction can be accommodated in a
				2143	packet.</p>
				2144	</div>
				2145
				2146	<!-- ======================================================================= -->
				2147	<h4>
				2148	<a name="vliw_repr">
				2149	How the packetization tables are generated and used
				2150	</a>
				2151	</h4>
				2152
				2153	<div>
				2154
				2155	<p>The packetizer reads instruction classes from a target's itineraries and
				2156	creates a deterministic finite automaton (DFA) to represent the state of a
				2157	packet. A DFA consists of three major elements: inputs, states, and
				2158	transitions. The set of inputs for the generated DFA represents the instruction
				2159	being added to a packet. The states represent the possible consumption
				2160	of functional units by instructions in a packet. In the DFA, transitions from
				2161	one state to another occur on the addition of an instruction to an existing
				2162	packet. If there is a legal mapping of functional units to instructions, then
				2163	the DFA contains a corresponding transition. The absence of a transition
				2164	indicates that a legal mapping does not exist and that the instruction cannot
				2165	be added to the packet.</p>
				2166
				2167	<p>To generate tables for a VLIW target, add <i>Target</i>GenDFAPacketizer.inc
				2168	as a target to the Makefile in the target directory. The exported API provides
				2169	three functions: <tt>DFAPacketizer::clearResources()</tt>,
				2170	<tt>DFAPacketizer::reserveResources(MachineInstr *MI)</tt>, and
				2171	<tt>DFAPacketizer::canReserveResources(MachineInstr *MI)</tt>. These functions
				2172	allow a target packetizer to add an instruction to an existing packet and to
				2173	check whether an instruction can be added to a packet. See
				2174	<tt>llvm/CodeGen/DFAPacketizer.h</tt> for more information.</p>
				2175
				2176	</div>
				2177
				2178	</div>
				2179
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2180	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2181
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2182	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2183	<h2>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2184	<a name="nativeassembler">Implementing a Native Assembler</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2185	</h2>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2186	<!-- *********************************************************************** -->
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2187
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2188	<div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2189
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2190	<p>Though you're probably reading this because you want to write or maintain a
				2191	compiler backend, LLVM also fully supports building a native assemblers too.
				2192	We've tried hard to automate the generation of the assembler from the .td files
				2193	(in particular the instruction syntax and encodings), which means that a large
				2194	part of the manual and repetitive data entry can be factored and shared with the
				2195	compiler.</p>
				2196
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2197	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2198	<h3 id="na_instparsing">Instruction Parsing</h3>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2199
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2200	<div><p>To Be Written</p></div>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2201
				2202
				2203	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2204	<h3 id="na_instaliases">
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2205	Instruction Alias Processing
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2206	</h3>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2207
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2208	<div>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2209	<p>Once the instruction is parsed, it enters the MatchInstructionImpl function.
				2210	The MatchInstructionImpl function performs alias processing and then does
				2211	actual matching.</p>
				2212
Chris Lattner	693173f	2010-10-30 19:23:13 +0000	[diff] [blame]	2213	<p>Alias processing is the phase that canonicalizes different lexical forms of
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2214	the same instructions down to one representation. There are several different
				2215	kinds of alias that are possible to implement and they are listed below in the
				2216	order that they are processed (which is in order from simplest/weakest to most
				2217	complex/powerful). Generally you want to use the first alias mechanism that
				2218	meets the needs of your instruction, because it will allow a more concise
				2219	description.</p>
				2220
				2221	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2222	<h4>Mnemonic Aliases</h4>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2223
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2224	<div>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2225
Chris Lattner	8cf8bcc	2010-10-30 19:47:49 +0000	[diff] [blame]	2226	<p>The first phase of alias processing is simple instruction mnemonic
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2227	remapping for classes of instructions which are allowed with two different
Chris Lattner	693173f	2010-10-30 19:23:13 +0000	[diff] [blame]	2228	mnemonics. This phase is a simple and unconditionally remapping from one input
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2229	mnemonic to one output mnemonic. It isn't possible for this form of alias to
				2230	look at the operands at all, so the remapping must apply for all forms of a
				2231	given mnemonic. Mnemonic aliases are defined simply, for example X86 has:
				2232	</p>
				2233
				2234	<div class="doc_code">
				2235	<pre>
				2236	def : MnemonicAlias<"cbw", "cbtw">;
				2237	def : MnemonicAlias<"smovq", "movsq">;
				2238	def : MnemonicAlias<"fldcww", "fldcw">;
				2239	def : MnemonicAlias<"fucompi", "fucomip">;
				2240	def : MnemonicAlias<"ud2a", "ud2">;
				2241	</pre>
				2242	</div>
				2243
				2244	<p>... and many others. With a MnemonicAlias definition, the mnemonic is
Chris Lattner	693173f	2010-10-30 19:23:13 +0000	[diff] [blame]	2245	remapped simply and directly. Though MnemonicAlias's can't look at any aspect
				2246	of the instruction (such as the operands) they can depend on global modes (the
				2247	same ones supported by the matcher), through a Requires clause:</p>
				2248
				2249	<div class="doc_code">
				2250	<pre>
				2251	def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
				2252	def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
				2253	</pre>
				2254	</div>
				2255
				2256	<p>In this example, the mnemonic gets mapped into different a new one depending
				2257	on the current instruction set.</p>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	2258
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2259	</div>
				2260
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	2261	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2262	<h4>Instruction Aliases</h4>
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	2263
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2264	<div>
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	2265
				2266	<p>The most general phase of alias processing occurs while matching is
				2267	happening: it provides new forms for the matcher to match along with a specific
				2268	instruction to generate. An instruction alias has two parts: the string to
				2269	match and the instruction to generate. For example:
				2270	</p>
				2271
				2272	<div class="doc_code">
				2273	<pre>
				2274	def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8 :$src)>;
				2275	def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
				2276	def : InstAlias<"movsx $src, $dst", (MOVSX32rr8 GR32:$dst, GR8 :$src)>;
				2277	def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
				2278	def : InstAlias<"movsx $src, $dst", (MOVSX64rr8 GR64:$dst, GR8 :$src)>;
				2279	def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
				2280	def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
				2281	</pre>
				2282	</div>
				2283
				2284	<p>This shows a powerful example of the instruction aliases, matching the
				2285	same mnemonic in multiple different ways depending on what operands are present
				2286	in the assembly. The result of instruction aliases can include operands in a
				2287	different order than the destination instruction, and can use an input
				2288	multiple times, for example:</p>
				2289
				2290	<div class="doc_code">
				2291	<pre>
				2292	def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>;
				2293	def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
				2294	def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
				2295	def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
				2296	</pre>
				2297	</div>
				2298
				2299	<p>This example also shows that tied operands are only listed once. In the X86
				2300	backend, XOR8rr has two input GR8's and one output GR8 (where an input is tied
				2301	to the output). InstAliases take a flattened operand list without duplicates
Chris Lattner	90fd797	2010-11-06 19:57:21 +0000	[diff] [blame]	2302	for tied operands. The result of an instruction alias can also use immediates
				2303	and fixed physical registers which are added as simple immediate operands in the
				2304	result, for example:</p>
Chris Lattner	98c870f	2010-11-06 19:25:43 +0000	[diff] [blame]	2305
				2306	<div class="doc_code">
				2307	<pre>
Chris Lattner	90fd797	2010-11-06 19:57:21 +0000	[diff] [blame]	2308	// Fixed Immediate operand.
Chris Lattner	98c870f	2010-11-06 19:25:43 +0000	[diff] [blame]	2309	def : InstAlias<"aad", (AAD8i8 10)>;
Chris Lattner	90fd797	2010-11-06 19:57:21 +0000	[diff] [blame]	2310
				2311	// Fixed register operand.
				2312	def : InstAlias<"fcomi", (COM_FIr ST1)>;
				2313
				2314	// Simple alias.
				2315	def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
Chris Lattner	98c870f	2010-11-06 19:25:43 +0000	[diff] [blame]	2316	</pre>
				2317	</div>
				2318
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	2319
				2320	<p>Instruction aliases can also have a Requires clause to make them
				2321	subtarget specific.</p>
				2322
Bill Wendling	3f58a51	2011-05-04 23:40:14 +0000	[diff] [blame]	2323	<p>If the back-end supports it, the instruction printer can automatically emit
				2324	the alias rather than what's being aliased. It typically leads to better,
				2325	more readable code. If it's better to print out what's being aliased, then
				2326	pass a '0' as the third parameter to the InstAlias definition.</p>
				2327
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	2328	</div>
				2329
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2330	</div>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2331
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2332	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2333	<h3 id="na_matching">Instruction Matching</h3>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2334
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2335	<div><p>To Be Written</p></div>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2336
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2337	</div>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2338
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	2339	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2340	<h2>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2341	<a name="targetimpls">Target-specific Implementation Notes</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2342	</h2>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2343	<!-- *********************************************************************** -->
				2344
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2345	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2346
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2347	<p>This section of the document explains features or design decisions that are
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2348	specific to the code generator for a particular target. First we start
				2349	with a table that summarizes what features are supported by each target.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2350
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2351	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2352	<h3>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2353	<a name="targetfeatures">Target Feature Matrix</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2354	</h3>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2355
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2356	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2357
				2358	<p>Note that this table does not include the C backend or Cpp backends, since
				2359	they do not use the target independent code generator infrastructure. It also
				2360	doesn't list features that are not supported fully by any target yet. It
				2361	considers a feature to be supported if at least one subtarget supports it. A
				2362	feature being supported means that it is useful and works for most cases, it
				2363	does not indicate that there are zero known bugs in the implementation. Here
				2364	is the key:</p>
				2365
				2366
				2367	<table border="1" cellspacing="0">
				2368	<tr>
				2369	<th>Unknown</th>
				2370	<th>No support</th>
				2371	<th>Partial Support</th>
				2372	<th>Complete Support</th>
				2373	</tr>
				2374	<tr>
				2375	<td class="unknown"></td>
				2376	<td class="no"></td>
				2377	<td class="partial"></td>
				2378	<td class="yes"></td>
				2379	</tr>
				2380	</table>
				2381
				2382	<p>Here is the table:</p>
				2383
				2384	<table width="689" border="1" cellspacing="0">
				2385	<tr><td></td>
Benjamin Kramer	943beeb	2010-10-30 21:07:28 +0000	[diff] [blame]	2386	<td colspan="13" align="center" style="background-color:#ffc">Target</td>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2387	</tr>
				2388	<tr>
				2389	<th>Feature</th>
				2390	<th>ARM</th>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2391	<th>CellSPU</th>
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2392	<th>Hexagon</th>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2393	<th>MBlaze</th>
				2394	<th>MSP430</th>
				2395	<th>Mips</th>
				2396	<th>PTX</th>
				2397	<th>PowerPC</th>
				2398	<th>Sparc</th>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2399	<th>X86</th>
				2400	<th>XCore</th>
				2401	</tr>
				2402
				2403	<tr>
				2404	<td><a href="#feat_reliable">is generally reliable</a></td>
				2405	<td class="yes"></td> <!-- ARM -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2406	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2407	<td class="yes"></td> <!-- Hexagon -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2408	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2409	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	9d9c4ad	2011-10-25 20:09:31 +0000	[diff] [blame]	2410	<td class="yes"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2411	<td class="no"></td> <!-- PTX -->
				2412	<td class="yes"></td> <!-- PowerPC -->
				2413	<td class="yes"></td> <!-- Sparc -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2414	<td class="yes"></td> <!-- X86 -->
				2415	<td class="unknown"></td> <!-- XCore -->
				2416	</tr>
				2417
				2418	<tr>
				2419	<td><a href="#feat_asmparser">assembly parser</a></td>
				2420	<td class="no"></td> <!-- ARM -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2421	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2422	<td class="no"></td> <!-- Hexagon -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2423	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2424	<td class="no"></td> <!-- MSP430 -->
				2425	<td class="no"></td> <!-- Mips -->
				2426	<td class="no"></td> <!-- PTX -->
				2427	<td class="no"></td> <!-- PowerPC -->
				2428	<td class="no"></td> <!-- Sparc -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2429	<td class="yes"></td> <!-- X86 -->
				2430	<td class="no"></td> <!-- XCore -->
				2431	</tr>
				2432
				2433	<tr>
				2434	<td><a href="#feat_disassembler">disassembler</a></td>
				2435	<td class="yes"></td> <!-- ARM -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2436	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2437	<td class="no"></td> <!-- Hexagon -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2438	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2439	<td class="no"></td> <!-- MSP430 -->
				2440	<td class="no"></td> <!-- Mips -->
				2441	<td class="no"></td> <!-- PTX -->
				2442	<td class="no"></td> <!-- PowerPC -->
				2443	<td class="no"></td> <!-- Sparc -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2444	<td class="yes"></td> <!-- X86 -->
				2445	<td class="no"></td> <!-- XCore -->
				2446	</tr>
				2447
				2448	<tr>
				2449	<td><a href="#feat_inlineasm">inline asm</a></td>
				2450	<td class="yes"></td> <!-- ARM -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2451	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2452	<td class="yes"></td> <!-- Hexagon -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2453	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2454	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	48461f6	2010-12-19 22:41:43 +0000	[diff] [blame]	2455	<td class="no"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2456	<td class="unknown"></td> <!-- PTX -->
				2457	<td class="yes"></td> <!-- PowerPC -->
				2458	<td class="unknown"></td> <!-- Sparc -->
Jakob Stoklund Olesen	f22e672	2011-09-19 18:15:46 +0000	[diff] [blame]	2459	<td class="yes"></td> <!-- X86 -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2460	<td class="unknown"></td> <!-- XCore -->
				2461	</tr>
				2462
				2463	<tr>
				2464	<td><a href="#feat_jit">jit</a></td>
				2465	<td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2466	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2467	<td class="no"></td> <!-- Hexagon -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2468	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2469	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	9d9c4ad	2011-10-25 20:09:31 +0000	[diff] [blame]	2470	<td class="yes"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2471	<td class="unknown"></td> <!-- PTX -->
				2472	<td class="yes"></td> <!-- PowerPC -->
				2473	<td class="unknown"></td> <!-- Sparc -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2474	<td class="yes"></td> <!-- X86 -->
				2475	<td class="unknown"></td> <!-- XCore -->
				2476	</tr>
				2477
				2478	<tr>
				2479	<td><a href="#feat_objectwrite">.o file writing</a></td>
				2480	<td class="no"></td> <!-- ARM -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2481	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2482	<td class="no"></td> <!-- Hexagon -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2483	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2484	<td class="no"></td> <!-- MSP430 -->
				2485	<td class="no"></td> <!-- Mips -->
				2486	<td class="no"></td> <!-- PTX -->
				2487	<td class="no"></td> <!-- PowerPC -->
				2488	<td class="no"></td> <!-- Sparc -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2489	<td class="yes"></td> <!-- X86 -->
				2490	<td class="no"></td> <!-- XCore -->
				2491	</tr>
				2492
				2493	<tr>
				2494	<td><a href="#feat_tailcall">tail calls</a></td>
				2495	<td class="yes"></td> <!-- ARM -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2496	<td class="no"></td> <!-- CellSPU -->
Tony Linthicum	b4b5415	2011-12-12 21:14:40 +0000	[diff] [blame]	2497	<td class="yes"></td> <!-- Hexagon -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2498	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2499	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	48461f6	2010-12-19 22:41:43 +0000	[diff] [blame]	2500	<td class="no"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2501	<td class="unknown"></td> <!-- PTX -->
				2502	<td class="yes"></td> <!-- PowerPC -->
				2503	<td class="unknown"></td> <!-- Sparc -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2504	<td class="yes"></td> <!-- X86 -->
				2505	<td class="unknown"></td> <!-- XCore -->
				2506	</tr>
				2507
Rafael Espindola	70d2b17	2011-11-27 22:05:46 +0000	[diff] [blame]	2508	<tr>
				2509	<td><a href="#feat_segstacks">segmented stacks</a></td>
				2510	<td class="no"></td> <!-- ARM -->
				2511	<td class="no"></td> <!-- CellSPU -->
Joe Abbey	a0c1fc3	2012-01-16 13:16:05 +0000	[diff] [blame]	2512	<td class="no"></td> <!-- Hexagon -->
Rafael Espindola	70d2b17	2011-11-27 22:05:46 +0000	[diff] [blame]	2513	<td class="no"></td> <!-- MBlaze -->
				2514	<td class="no"></td> <!-- MSP430 -->
				2515	<td class="no"></td> <!-- Mips -->
				2516	<td class="no"></td> <!-- PTX -->
				2517	<td class="no"></td> <!-- PowerPC -->
				2518	<td class="no"></td> <!-- Sparc -->
				2519	<td class="partial"><a href="#feat_segstacks_x86">*</a></td> <!-- X86 -->
				2520	<td class="no"></td> <!-- XCore -->
				2521	</tr>
				2522
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2523
				2524	</table>
				2525
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2526	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2527	<h4 id="feat_reliable">Is Generally Reliable</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2528
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2529	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2530	<p>This box indicates whether the target is considered to be production quality.
				2531	This indicates that the target has been used as a static compiler to
				2532	compile large amounts of code by a variety of different people and is in
				2533	continuous use.</p>
				2534	</div>
				2535
				2536	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2537	<h4 id="feat_asmparser">Assembly Parser</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2538
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2539	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2540	<p>This box indicates whether the target supports parsing target specific .s
				2541	files by implementing the MCAsmParser interface. This is required for llvm-mc
				2542	to be able to act as a native assembler and is required for inline assembly
				2543	support in the native .o file writer.</p>
				2544
				2545	</div>
				2546
				2547
				2548	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2549	<h4 id="feat_disassembler">Disassembler</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2550
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2551	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2552	<p>This box indicates whether the target supports the MCDisassembler API for
				2553	disassembling machine opcode bytes into MCInst's.</p>
				2554
				2555	</div>
				2556
				2557	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2558	<h4 id="feat_inlineasm">Inline Asm</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2559
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2560	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2561	<p>This box indicates whether the target supports most popular inline assembly
				2562	constraints and modifiers.</p>
				2563
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2564	</div>
				2565
				2566	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2567	<h4 id="feat_jit">JIT Support</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2568
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2569	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2570	<p>This box indicates whether the target supports the JIT compiler through
				2571	the ExecutionEngine interface.</p>
				2572
Chris Lattner	6fb9955	2010-10-24 16:24:22 +0000	[diff] [blame]	2573	<p id="feat_jit_arm">The ARM backend has basic support for integer code
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2574	in ARM codegen mode, but lacks NEON and full Thumb support.</p>
				2575
				2576	</div>
				2577
				2578	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2579	<h4 id="feat_objectwrite">.o File Writing</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2580
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2581	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2582
				2583	<p>This box indicates whether the target supports writing .o files (e.g. MachO,
				2584	ELF, and/or COFF) files directly from the target. Note that the target also
				2585	must include an assembly parser and general inline assembly support for full
				2586	inline assembly support in the .o writer.</p>
				2587
Chris Lattner	219ddf5	2010-10-28 02:22:02 +0000	[diff] [blame]	2588	<p>Targets that don't support this feature can obviously still write out .o
				2589	files, they just rely on having an external assembler to translate from a .s
				2590	file to a .o file (as is the case for many C compilers).</p>
				2591
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2592	</div>
				2593
				2594	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2595	<h4 id="feat_tailcall">Tail Calls</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2596
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2597	<div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2598
				2599	<p>This box indicates whether the target supports guaranteed tail calls. These
				2600	are calls marked "<a href="LangRef.html#i_call">tail</a>" and use the fastcc
				2601	calling convention. Please see the <a href="#tailcallopt">tail call section
				2602	more more details</a>.</p>
				2603
				2604	</div>
				2605
Rafael Espindola	70d2b17	2011-11-27 22:05:46 +0000	[diff] [blame]	2606	<!-- _______________________________________________________________________ -->
				2607	<h4 id="feat_segstacks">Segmented Stacks</h4>
				2608
				2609	<div>
				2610
				2611	<p>This box indicates whether the target supports segmented stacks. This
				2612	replaces the traditional large C stack with many linked segments. It
				2613	is compatible with the <a href="http://gcc.gnu.org/wiki/SplitStacks">gcc
				2614	implementation</a> used by the Go front end.</p>
				2615
Rafael Espindola	30c5fa2	2011-11-28 17:06:58 +0000	[diff] [blame]	2616	<p id="feat_segstacks_x86">Basic support exists on the X86 backend. Currently
				2617	vararg doesn't work and the object files are not marked the way the gold
Rafael Espindola	dda8c6f	2011-11-29 19:38:09 +0000	[diff] [blame]	2618	linker expects, but simple Go programs can be built by dragonegg.</p>
Rafael Espindola	70d2b17	2011-11-27 22:05:46 +0000	[diff] [blame]	2619
				2620	</div>
				2621
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2622	</div>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2623
				2624	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2625	<h3>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2626	<a name="tailcallopt">Tail call optimization</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2627	</h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2628
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2629	<div>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2630
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2631	<p>Tail call optimization, callee reusing the stack of the caller, is currently
				2632	supported on x86/x86-64 and PowerPC. It is performed if:</p>
				2633
				2634	<ul>
Chris Lattner	2968943	2010-03-11 00:22:57 +0000	[diff] [blame]	2635	<li>Caller and callee have the calling convention <tt>fastcc</tt> or
				2636	<tt>cc 10</tt> (GHC call convention).</li>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2637
				2638	<li>The call is a tail call - in tail position (ret immediately follows call
				2639	and ret uses value of call or is void).</li>
				2640
				2641	<li>Option <tt>-tailcallopt</tt> is enabled.</li>
				2642
				2643	<li>Platform specific constraints are met.</li>
				2644	</ul>
				2645
				2646	<p>x86/x86-64 constraints:</p>
				2647
				2648	<ul>
				2649	<li>No variable argument lists are used.</li>
				2650
				2651	<li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
				2652	= hidden or protected) are supported.</li>
				2653	</ul>
				2654
				2655	<p>PowerPC constraints:</p>
				2656
				2657	<ul>
				2658	<li>No variable argument lists are used.</li>
				2659
				2660	<li>No byval parameters are used.</li>
				2661
				2662	<li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
				2663	</ul>
				2664
				2665	<p>Example:</p>
				2666
				2667	<p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
				2668
				2669	<div class="doc_code">
				2670	<pre>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2671	declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
				2672
				2673	define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
				2674	%l1 = add i32 %in1, %in2
				2675	%tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
				2676	ret i32 %tmp
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2677	}
				2678	</pre>
				2679	</div>
				2680
				2681	<p>Implications of <tt>-tailcallopt</tt>:</p>
				2682
				2683	<p>To support tail call optimization in situations where the callee has more
				2684	arguments than the caller a 'callee pops arguments' convention is used. This
				2685	currently causes each <tt>fastcc</tt> call that is not tail call optimized
				2686	(because one or more of above constraints are not met) to be followed by a
				2687	readjustment of the stack. So performance might be worse in such cases.</p>
				2688
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2689	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2690	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2691	<h3>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	2692	<a name="sibcallopt">Sibling call optimization</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2693	</h3>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	2694
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2695	<div>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	2696
				2697	<p>Sibling call optimization is a restricted form of tail call optimization.
				2698	Unlike tail call optimization described in the previous section, it can be
				2699	performed automatically on any tail calls when <tt>-tailcallopt</tt> option
				2700	is not specified.</p>
				2701
				2702	<p>Sibling call optimization is currently performed on x86/x86-64 when the
				2703	following constraints are met:</p>
				2704
				2705	<ul>
				2706	<li>Caller and callee have the same calling convention. It can be either
				2707	<tt>c</tt> or <tt>fastcc</tt>.
				2708
				2709	<li>The call is a tail call - in tail position (ret immediately follows call
				2710	and ret uses value of call or is void).</li>
				2711
				2712	<li>Caller and callee have matching return type or the callee result is not
				2713	used.
				2714
				2715	<li>If any of the callee arguments are being passed in stack, they must be
				2716	available in caller's own incoming argument stack and the frame offsets
				2717	must be the same.
				2718	</ul>
				2719
				2720	<p>Example:</p>
				2721	<div class="doc_code">
				2722	<pre>
				2723	declare i32 @bar(i32, i32)
				2724
				2725	define i32 @foo(i32 %a, i32 %b, i32 %c) {
				2726	entry:
				2727	%0 = tail call i32 @bar(i32 %a, i32 %b)
				2728	ret i32 %0
				2729	}
				2730	</pre>
				2731	</div>
				2732
				2733	</div>
				2734	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2735	<h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2736	<a name="x86">The X86 backend</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2737	</h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2738
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2739	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2740
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2741	<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2742	code generator is capable of targeting a variety of x86-32 and x86-64
				2743	processors, and includes support for ISA extensions such as MMX and SSE.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2744
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2745	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2746	<h4>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2747	<a name="x86_tt">X86 Target Triples supported</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2748	</h4>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2749
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2750	<div>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2751
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2752	<p>The following are the known target triples that are supported by the X86
				2753	backend. This is not an exhaustive list, and it would be useful to add those
				2754	that people test.</p>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2755
				2756	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2757	<li><b>i686-pc-linux-gnu</b> — Linux</li>
				2758
				2759	<li><b>i386-unknown-freebsd5.3</b> — FreeBSD 5.3</li>
				2760
				2761	<li><b>i686-pc-cygwin</b> — Cygwin on Win32</li>
				2762
				2763	<li><b>i686-pc-mingw32</b> — MingW on Win32</li>
				2764
				2765	<li><b>i386-pc-mingw32msvc</b> — MingW crosscompiler on Linux</li>
				2766
				2767	<li><b>i686-apple-darwin*</b> — Apple Darwin on X86</li>
Torok Edwin	c457b65	2009-06-15 12:17:44 +0000	[diff] [blame]	2768
				2769	<li><b>x86_64-unknown-linux-gnu</b> — Linux</li>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2770	</ul>
				2771
				2772	</div>
				2773
				2774	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2775	<h4>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2776	<a name="x86_cc">X86 Calling Conventions supported</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2777	</h4>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2778
				2779
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2780	<div>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2781
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	2782	<p>The following target-specific calling conventions are known to backend:</p>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2783
				2784	<ul>
Chris Lattner	01ebd56	2011-05-22 22:28:47 +0000	[diff] [blame]	2785	<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
				2786	Windows platform (CC ID = 64).</li>
				2787	<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
				2788	Windows platform (CC ID = 65).</li>
				2789	<li><b>x86_ThisCall</b> — Similar to X86_StdCall. Passes first argument
				2790	in ECX, others via stack. Callee is responsible for stack cleaning. This
				2791	convention is used by MSVC by default for methods in its ABI
				2792	(CC ID = 70).</li>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2793	</ul>
				2794
				2795	</div>
				2796
				2797	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2798	<h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2799	<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2800	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2801
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2802	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2803
Misha Brukman	600df45	2005-02-17 22:22:24 +0000	[diff] [blame]	2804	<p>The x86 has a very flexible way of accessing memory. It is capable of
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2805	forming memory addresses of the following expression directly in integer
				2806	instructions (which use ModR/M addressing):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2807
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2808	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2809	<pre>
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2810	SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2811	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2812	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2813
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2814	<p>In order to represent this, LLVM tracks no less than 5 operands for each
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2815	memory operand of this form. This means that the "load" form of
				2816	'<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2817
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2818	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2819	<pre>
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2820	Index: 0 \| 1 2 3 4 5
				2821	Meaning: DestReg, \| BaseReg, Scale, IndexReg, Displacement Segment
				2822	OperandTy: VirtReg, \| VirtReg, UnsImm, VirtReg, SignExtImm PhysReg
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2823	</pre>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2824	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2825
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2826	<p>Stores, and all other instructions, treat the four memory operands in the
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2827	same way and in the same order. If the segment register is unspecified
				2828	(regno = 0), then no segment override is generated. "Lea" operations do not
				2829	have a segment register specified, so they only have 4 operands for their
				2830	memory reference.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2831
				2832	</div>
				2833
				2834	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2835	<h4>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2836	<a name="x86_memory">X86 address spaces supported</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2837	</h4>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2838
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2839	<div>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2840
Jay Foad	cb88ec3	2011-04-06 07:55:30 +0000	[diff] [blame]	2841	<p>x86 has a feature which provides
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2842	the ability to perform loads and stores to different address spaces
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2843	via the x86 segment registers. A segment override prefix byte on an
				2844	instruction causes the instruction's memory access to go to the specified
				2845	segment. LLVM address space 0 is the default address space, which includes
				2846	the stack, and any unqualified memory accesses in a program. Address spaces
				2847	1-255 are currently reserved for user-defined code. The GS-segment is
Chris Lattner	1777d0c	2009-05-05 18:52:19 +0000	[diff] [blame]	2848	represented by address space 256, while the FS-segment is represented by
				2849	address space 257. Other x86 segments have yet to be allocated address space
				2850	numbers.</p>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2851
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2852	<p>While these address spaces may seem similar to TLS via the
				2853	<tt>thread_local</tt> keyword, and often use the same underlying hardware,
				2854	there are some fundamental differences.</p>
				2855
				2856	<p>The <tt>thread_local</tt> keyword applies to global variables and
				2857	specifies that they are to be allocated in thread-local memory. There are
				2858	no type qualifiers involved, and these variables can be pointed to with
				2859	normal pointers and accessed with normal loads and stores.
				2860	The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
				2861	level (though LLVM doesn't yet have implementations of it for some
				2862	configurations).<p>
				2863
				2864	<p>Special address spaces, in contrast, apply to static types. Every
				2865	load and store has a particular address space in its address operand type,
				2866	and this is what determines which address space is accessed.
				2867	LLVM ignores these special address space qualifiers on global variables,
				2868	and does not provide a way to directly allocate storage in them.
				2869	At the LLVM IR level, the behavior of these special address spaces depends
				2870	in part on the underlying OS or runtime environment, and they are specific
				2871	to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
				2872
				2873	<p>Some operating systems and runtime environments use (or may in the future
				2874	use) the FS/GS-segment registers for various low-level purposes, so care
				2875	should be taken when considering them.</p>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2876
				2877	</div>
				2878
				2879	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2880	<h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2881	<a name="x86_names">Instruction naming</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2882	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2883
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2884	<div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2885
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2886	<p>An instruction name consists of the base name, a default operand size, and a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2887	a character per operand with an optional special size. For example:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2888
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2889	<div class="doc_code">
				2890	<pre>
				2891	ADD8rr -> add, 8-bit register, 8-bit register
				2892	IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
				2893	IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
				2894	MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
				2895	</pre>
				2896	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2897
				2898	</div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2899
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2900	</div>
				2901
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2902	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2903	<h3>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2904	<a name="ppc">The PowerPC backend</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2905	</h3>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2906
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2907	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2908
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2909	<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2910	code generation is retargetable to several variations or <i>subtargets</i> of
				2911	the PowerPC ISA; including ppc32, ppc64 and altivec.</p>
				2912
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2913	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2914	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2915	<a name="ppc_abi">LLVM PowerPC ABI</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2916	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2917
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2918	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2919
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2920	<p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2921	relative (PIC) or static addressing for accessing global values, so no TOC
				2922	(r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
				2923	of a stack frame. LLVM takes advantage of having no TOC to provide space to
				2924	save the frame pointer in the PowerPC linkage area of the caller frame.
				2925	Other details of PowerPC ABI can be found at <a href=
				2926	"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
				2927	>PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The 64 bit ABI
				2928	is similar except space for GPRs are 8 bytes wide (not 4) and r13 is reserved
				2929	for system use.</p>
				2930
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2931	</div>
				2932
				2933	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2934	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2935	<a name="ppc_frame">Frame Layout</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2936	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2937
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	2938	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2939
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2940	<p>The size of a PowerPC frame is usually fixed for the duration of a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2941	function's invocation. Since the frame is fixed size, all references
				2942	into the frame can be accessed via fixed offsets from the stack pointer. The
				2943	exception to this is when dynamic alloca or variable sized arrays are
				2944	present, then a base pointer (r31) is used as a proxy for the stack pointer
				2945	and stack pointer is free to grow or shrink. A base pointer is also used if
				2946	llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
				2947	always aligned to 16 bytes, so that space allocated for altivec vectors will
				2948	be properly aligned.</p>
				2949
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	2950	<p>An invocation frame is laid out as follows (low memory at top);</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2951
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2952	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2953	<tr>
				2954	<td>Linkage<br><br></td>
				2955	</tr>
				2956	<tr>
				2957	<td>Parameter area<br><br></td>
				2958	</tr>
				2959	<tr>
				2960	<td>Dynamic area<br><br></td>
				2961	</tr>
				2962	<tr>
				2963	<td>Locals area<br><br></td>
				2964	</tr>
				2965	<tr>
				2966	<td>Saved registers area<br><br></td>
				2967	</tr>
				2968	<tr style="border-style: none hidden none hidden;">
				2969	<td><br></td>
				2970	</tr>
				2971	<tr>
				2972	<td>Previous Frame<br><br></td>
				2973	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2974	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2975
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2976	<p>The <i>linkage</i> area is used by a callee to save special registers prior
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2977	to allocating its own frame. Only three entries are relevant to LLVM. The
				2978	first entry is the previous stack pointer (sp), aka link. This allows
				2979	probing tools like gdb or exception handlers to quickly scan the frames in
				2980	the stack. A function epilog can also use the link to pop the frame from the
				2981	stack. The third entry in the linkage area is used to save the return
				2982	address from the lr register. Finally, as mentioned above, the last entry is
				2983	used to save the previous frame pointer (r31.) The entries in the linkage
				2984	area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
				2985	mode and 48 bytes in 64 bit mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2986
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2987	<p>32 bit linkage area</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2988
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2989	<table class="layout">
				2990	<tr>
				2991	<td>0</td>
				2992	<td>Saved SP (r1)</td>
				2993	</tr>
				2994	<tr>
				2995	<td>4</td>
				2996	<td>Saved CR</td>
				2997	</tr>
				2998	<tr>
				2999	<td>8</td>
				3000	<td>Saved LR</td>
				3001	</tr>
				3002	<tr>
				3003	<td>12</td>
				3004	<td>Reserved</td>
				3005	</tr>
				3006	<tr>
				3007	<td>16</td>
				3008	<td>Reserved</td>
				3009	</tr>
				3010	<tr>
				3011	<td>20</td>
				3012	<td>Saved FP (r31)</td>
				3013	</tr>
				3014	</table>
				3015
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3016	<p>64 bit linkage area</p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3017
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3018	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3019	<tr>
				3020	<td>0</td>
				3021	<td>Saved SP (r1)</td>
				3022	</tr>
				3023	<tr>
				3024	<td>8</td>
				3025	<td>Saved CR</td>
				3026	</tr>
				3027	<tr>
				3028	<td>16</td>
				3029	<td>Saved LR</td>
				3030	</tr>
				3031	<tr>
				3032	<td>24</td>
				3033	<td>Reserved</td>
				3034	</tr>
				3035	<tr>
				3036	<td>32</td>
				3037	<td>Reserved</td>
				3038	</tr>
				3039	<tr>
				3040	<td>40</td>
				3041	<td>Saved FP (r31)</td>
				3042	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3043	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3044
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3045	<p>The <i>parameter area</i> is used to store arguments being passed to a callee
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3046	function. Following the PowerPC ABI, the first few arguments are actually
				3047	passed in registers, with the space in the parameter area unused. However,
				3048	if there are not enough registers or the callee is a thunk or vararg
				3049	function, these register arguments can be spilled into the parameter area.
				3050	Thus, the parameter area must be large enough to store all the parameters for
				3051	the largest call sequence made by the caller. The size must also be
				3052	minimally large enough to spill registers r3-r10. This allows callees blind
				3053	to the call signature, such as thunks and vararg functions, enough space to
				3054	cache the argument registers. Therefore, the parameter area is minimally 32
				3055	bytes (64 bytes in 64 bit mode.) Also note that since the parameter area is
				3056	a fixed offset from the top of the frame, that a callee can access its spilt
				3057	arguments using fixed offsets from the stack pointer (or base pointer.)</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3058
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3059	<p>Combining the information about the linkage, parameter areas and alignment. A
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3060	stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit
				3061	mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3062
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3063	<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3064	alloca then space is added to the stack, the linkage and parameter areas are
				3065	shifted to top of stack, and the new space is available immediately below the
				3066	linkage and parameter areas. The cost of shifting the linkage and parameter
				3067	areas is minor since only the link value needs to be copied. The link value
				3068	can be easily fetched by adding the original frame size to the base pointer.
				3069	Note that allocations in the dynamic space need to observe 16 byte
				3070	alignment.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3071
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3072	<p>The <i>locals area</i> is where the llvm compiler reserves space for local
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3073	variables.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3074
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3075	<p>The <i>saved registers area</i> is where the llvm compiler spills callee
				3076	saved registers on entry to the callee.</p>
				3077
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3078	</div>
				3079
				3080	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	3081	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3082	<a name="ppc_prolog">Prolog/Epilog</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	3083	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3084
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	3085	<div>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3086
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3087	<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3088	the following exceptions. Callee saved registers are spilled after the frame
				3089	is created. This allows the llvm epilog/prolog support to be common with
				3090	other targets. The base pointer callee saved register r31 is saved in the
				3091	TOC slot of linkage area. This simplifies allocation of space for the base
				3092	pointer and makes it convenient to locate programatically and during
				3093	debugging.</p>
				3094
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3095	</div>
				3096
				3097	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	3098	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3099	<a name="ppc_dynamic">Dynamic Allocation</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	3100	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3101
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	3102	<div>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3103
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	3104	<p><i>TODO - More to come.</i></p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	3105
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	3106	</div>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3107
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	3108	</div>
				3109
Justin Holewinski	dceb002	2011-08-11 17:34:16 +0000	[diff] [blame]	3110	<!-- ======================================================================= -->
				3111	<h3>
				3112	<a name="ptx">The PTX backend</a>
				3113	</h3>
				3114
				3115	<div>
				3116
				3117	<p>The PTX code generator lives in the lib/Target/PTX directory. It is
				3118	currently a work-in-progress, but already supports most of the code
				3119	generation functionality needed to generate correct PTX kernels for
				3120	CUDA devices.</p>
				3121
				3122	<p>The code generator can target PTX 2.0+, and shader model 1.0+. The
				3123	PTX ISA Reference Manual is used as the primary source of ISA
				3124	information, though an effort is made to make the output of the code
				3125	generator match the output of the NVidia nvcc compiler, whenever
				3126	possible.</p>
				3127
				3128	<p>Code Generator Options:</p>
				3129	<table border="1" cellspacing="0">
				3130	<tr>
				3131	<th>Option</th>
				3132	<th>Description</th>
				3133	</tr>
				3134	<tr>
				3135	<td><code>double</code></td>
				3136	<td align="left">If enabled, the map_f64_to_f32 directive is
				3137	disabled in the PTX output, allowing native double-precision
				3138	arithmetic</td>
				3139	</tr>
				3140	<tr>
				3141	<td><code>no-fma</code></td>
				3142	<td align="left">Disable generation of Fused-Multiply Add
				3143	instructions, which may be beneficial for some devices</td>
				3144	</tr>
				3145	<tr>
				3146	<td><code>smxy / computexy</code></td>
				3147	<td align="left">Set shader model/compute capability to x.y,
				3148	e.g. sm20 or compute13</td>
				3149	</tr>
				3150	</table>
				3151
				3152	<p>Working:</p>
				3153	<ul>
				3154	<li>Arithmetic instruction selection (including combo FMA)</li>
				3155	<li>Bitwise instruction selection</li>
				3156	<li>Control-flow instruction selection</li>
				3157	<li>Function calls (only on SM 2.0+ and no return arguments)</li>
				3158	<li>Addresses spaces (0 = global, 1 = constant, 2 = local, 4 =
				3159	shared)</li>
				3160	<li>Thread synchronization (bar.sync)</li>
				3161	<li>Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)</li>
				3162	</ul>
				3163
				3164	<p>In Progress:</p>
				3165	<ul>
				3166	<li>Robust call instruction selection</li>
				3167	<li>Stack frame allocation</li>
				3168	<li>Device-specific instruction scheduling optimizations</li>
				3169	</ul>
				3170
				3171
				3172	</div>
				3173
NAKAMURA Takumi	f5af6ad	2011-04-23 00:30:22 +0000	[diff] [blame]	3174	</div>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	3175
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	3176	<!-- *********************************************************************** -->
				3177	<hr>
				3178	<address>
				3179	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
Misha Brukman	4440870	2008-12-11 17:34:48 +0000	[diff] [blame]	3180	src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	3181	<a href="http://validator.w3.org/check/referer"><img
Misha Brukman	f00ddb0	2008-12-11 18:23:24 +0000	[diff] [blame]	3182	src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	3183
				3184	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
NAKAMURA Takumi	b9a3363	2011-04-09 02:13:37 +0000	[diff] [blame]	3185	<a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	3186	Last modified: $Date$
				3187	</address>
				3188
				3189	</body>
				3190	</html>